Today, if you’re working at a big tech company, the chances are that working on a web application is something like this:
- You’re building an app that comprises a list view and a detail view with some editing, and for bonus points the list view has to support reordering, filtering, and sorting. This is considered to be quite an undertaking.
- There is currently an app that does 90% or more of what your app is intended to do, but it’s not quite right, and because it was written in Angular/RxJS or React/Redux it is now considered to be unmaintainable or broken beyond repair.
- The services for the existing app deliver 100% of the back-end data and functionality necessary for your new app to work.
- (There’s also probably a mobile version of the app that likely consumes its own service layer, and somehow has been maintained for five years despite lacking the benefit of a productivity-enhancing framework.)
- Because the service layer assumes a specific client-side state-management model (i.e. the reason the existing app is unmaintainable), and because your security model is based on tying unique services to unique front-ends and marshaling other services (that each have their own security model) on the front-end server, there is absolutely no way your new app can consume the old app’s services. Also, they’re completely undocumented. (Or, “the code is the documentation”.)
- Your new app will comprise a front-end-server that dynamically builds a web-page (server-side) on demand and then sends it to the end-user. It will then consume services provided by the front-end-server OR a parallel server.
- When you start work each day you merge with master and then fire off a very complicated command (probably by using up-arrow or history because it’s impossible to remember) to build your server from soup-to-nuts so that, god-willing and assuming nothing in master has broken your build, in five to fifteen minutes you can reload the localhost page in your browser and see your app working.
- You then make some change to your code and, if you’re lucky, you can see it running 15s later. If you’ve made some trivial error the linters didn’t catch, you get 500 lines of crap telling you that everything is broken. If you made non-trivial dependency changes, you need to rebuild your server.
Imagine if we designed cars this way.
Wikipedia, which is usually good on CS topics, is utterly wrong on the topic of “leaky abstractions”. It wasn’t popularized in 2002 by Joel Spolsky (first, it’s never been “popularized”, and second it has been in use among CS types since at least the 80s, so I’m going to hazard that it comes from Dijkstra). And the problem it highlights isn’t…
the reliance of the software developer on an abstraction’s infallibility
In Joel’s article, his example is TCP, which he says is supposed to reliably deliver packets of information from one place to another. But sometimes it fails, and that’s the leak in the abstraction. And software that assumes it will always work is a problem.
I don’t particularly disagree with any of this; my problem is with the thing that’s defined as the “leaky abstraction”. “TCP is a reliable messaging system” is the leaky abstraction, not the programmer assuming it’s true. But it’s also a terrible abstraction that no-one actually uses.
I’d suggest that a leaky abstraction is actually one that requires users to poke through it into the underlying implementation to use it effectively. E.g. if you were to write a TCP library that didn’t expose failures, you’d need to bypass it and get at the actual errors to handle TCP in actual use.
In other words, an abstraction is “leaky” when you must violate it to use it effectively.
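Here is a minimal TypeScript sketch of that definition. The names `Transport`, `makeLeakySender`, and `makeHonestSender` are hypothetical, made up for illustration, and no real networking is involved.

```typescript
// Hypothetical low-level transport: delivery can fail.
interface Transport {
  deliver(message: string): boolean; // false means the message was lost
}

// Leaky: the wrapper promises "send always works" and discards the result,
// so callers must go around it to the transport to learn what happened.
function makeLeakySender(transport: Transport) {
  return {
    send(message: string): void {
      transport.deliver(message); // failure is silently swallowed
    },
  };
}

// Less leaky: failure is part of the contract, so nobody needs to bypass it.
type SendResult = { ok: true } | { ok: false; reason: string };

function makeHonestSender(transport: Transport) {
  return {
    send(message: string): SendResult {
      if (transport.deliver(message)) {
        return { ok: true };
      }
      return { ok: false, reason: "delivery failed" };
    },
  };
}

// A transport that drops every message, for demonstration.
const deadLink: Transport = { deliver: () => false };
```

With `deadLink`, the leaky sender gives the caller nothing to act on, while the honest sender returns `{ ok: false, ... }` and the caller can retry or report the failure without touching the transport.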
An example of this is “the user interface is a pure function of its data model” which turns out to be impractical, necessitating the addition of local “state” which in turn is recognized as a violation of the original abstraction and replaced with “pure components” which in turn are just as impractical as before and thus violated by “use hooks to hide state anywhere”.
Assuming abstractions are perfect can be a problem, but usually it’s actually the opposite of the real problem. It’s like the old chestnut “when you assume you make an ass out of u and me”. It’s utterly wrong: if you don’t assume you’ll never get anything done, although sometimes you’ll assume incorrectly.
The common and pernicious problem is software relying on or assuming flaws in the abstraction layer. A simple (and common) example of this is software developers discovering a private, undocumented, or deprecated call in an API and using it because it’s easy or they assume that, say, because it’s been around for five years it will always be there, or that because they work in the same company as the people who created the API they can rely on inside knowledge or contacts to mitigate any issues that arise.
The definition in the Wikipedia article (I’ve not read Joel Spolsky’s article, but I expect he actually knows what he’s talking about; he usually does) is something like “assuming arithmetic works”, “assuming the compiler works”, “assuming the API call works as documented”. These are, literally, what you do as a software engineer. Sometimes you might double-check something worked as expected, usually because it’s known to fail in certain cases.
- If an API call is not documented as failing in certain cases but it does, that’s a bug.
- Checking for unexpected failures and recovering from them is a workaround for a bug, not a leaky abstraction.
- Trying to identify causes of the bug and preventing the call from ever firing if the inputs seem likely to cause the failure is a leaky abstraction. In this case this code will fail even if the underlying bug is fixed. This is code that relies on the flaw it is trying to avoid to make sense.
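The difference between the last two bullets can be sketched in TypeScript. Everything here is hypothetical: `parseQuantityBuggy` stands in for an API documented to parse any decimal integer string that in practice throws on a leading "+", and `parseQuantityFixed` is the same API after the bug is fixed.

```typescript
// Hypothetical API. Documented contract: parse any decimal integer string.
// Actual (buggy) behavior: throws on a leading "+".
function parseQuantityBuggy(text: string): number {
  if (text.charAt(0) === "+") throw new Error("unexpected character");
  return parseInt(text, 10);
}

// The same hypothetical API after the bug is fixed.
function parseQuantityFixed(text: string): number {
  return parseInt(text, 10);
}

type QuantityParser = (text: string) => number;

// Workaround: call the API as documented and recover if it fails.
// It starts returning the right answer as soon as the bug is fixed.
function parseWithRecovery(parse: QuantityParser, text: string): number | null {
  try {
    return parse(text);
  } catch {
    return null;
  }
}

// Leaky: encodes a guess about the cause of the bug and never makes the call.
// It keeps rejecting "+5" even after the bug is fixed.
function parseWithGuard(parse: QuantityParser, text: string): number | null {
  if (text.charAt(0) === "+") return null;
  return parse(text);
}
```

Against the fixed API, `parseWithRecovery` returns 5 for "+5" while `parseWithGuard` still returns null: the guard only makes sense as long as the flaw it avoids exists.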
According to Wikipedia, Joel Spolsky argues that all non-trivial abstractions are leaky. If you consider the simplest kinds of abstraction, e.g. addition, then even this is leaky from the point of view of thinking of addition in your code matching addition in the “real world”. The “real world” doesn’t have limited precision, overflows, or underflows. And indeed, addition is defined for things like imaginary numbers and allows you to add integers to floats.
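A few lines of TypeScript show the leaks in “+”. These rely only on the fact that JavaScript numbers are IEEE 754 double-precision floats; the variable names are arbitrary.

```typescript
// Limited precision: 0.1 and 0.2 have no exact binary representation,
// so their sum is not exactly 0.3.
const precisionLeak = 0.1 + 0.2 !== 0.3;

// Integers stop being exact at 2^53: adding 1 changes nothing.
const twoTo53 = Math.pow(2, 53);
const integerLeak = twoTo53 + 1 === twoTo53;

// Overflow: the sum of two huge finite values saturates to Infinity.
const overflowLeak = Number.MAX_VALUE + Number.MAX_VALUE === Infinity;
```

All three flags evaluate to `true`, which is exactly the mismatch with “real world” addition.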
The idea that all non-trivial abstractions are leaky is something I’d file under “true but not useful”; it’s like “all non-trivial programs contain at least one bug”. What are we supposed to do? Avoid abstractions? Never write non-trivial programs? I suggest that abstractions are only useful if they’re non-trivial, and since they will probably be leaky, the goal is to make them as leak-free as possible, identify and document the leaks you know to exist, and mitigate any you stumble across.
I don’t know if some warped version of Spolsky’s idea infected the software world and led it to a situation in which, to think in terms of designing a car in the real world, to change the color of the headrest and see if it looks nice, we need to rebuild the car’s engine from ore and raw rubber using the latest design iteration of the assembly plant and the latest formulation of rubber, but it seems like it has.
In the case of a car, in the real world you build the thing largely out of interchangeable modules and there are expectations of how they fit together. E.g. the engine may be assumed to fit in a certain space, have certain standard hookups, and have standardized anchor points that will always be in the same spot. So, you can change the headrest confident that any new, improved engine that adheres to these rules won’t make your headrest color choices suddenly wrong.
If, on the other hand, the current engine you were working with happens to have some extra anchor point and your team has chosen to rely on that anchor point, now you have a leaky abstraction! (It still doesn’t affect the headrest.)
Abstractions have a cost
In software, abstractions “wrap” functionality in an “interface”. E.g. you can write an add() function that adds two numbers, i.e. wraps “+”. This abstracts “+”. You could check to see if the inputs can be added before trying to add them and um do something if you thought they couldn’t. Or you could check the result was likely to be correct by, say, subtracting the two values from zero and changing the sign to see if it generated the same result.
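To make that concrete, here is a sketch of such an add() in TypeScript. It is deliberately over-engineered; what it does with bad inputs (throwing) is my assumption, since “do something” is all the description above commits to.

```typescript
// A deliberately over-engineered wrapper around "+".
function add(a: number, b: number): number {
  // Check that the inputs can be added before trying.
  // (Throwing is an assumed choice for "do something".)
  if (!isFinite(a) || !isFinite(b)) {
    throw new Error("add: inputs must be finite numbers");
  }
  const result = a + b;
  // Cross-check: subtract both values from zero, then flip the sign.
  const check = -(0 - a - b);
  if (result !== check) {
    throw new Error("add: cross-check failed");
  }
  return result;
}
```

Every call now pays for two checks and a second computation to get the answer “+” would have given anyway.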
For the vast majority of cases, doing anything like this and, say, requiring everyone in your company to use add() instead of “+” would be incredibly stupid. The cost is significant:
- your add may have new bugs, and given how much less testing it has gotten than “+” the odds are pretty high
- your add consumes more memory and executes more code and will slow down every piece of code it runs in
- code using your add may not benefit from improvements to the underlying “+”
- time will be spent in code reviews telling new engineers to use add() instead of “+” and having back and forth chats and so on.
- some of those new engineers will start compiling a list of reasons to leave your company based on the above.
- all your programmers know how to use “+” and are comfortable with it
- knowing how “+” works is a valuable and useful thing anywhere you work in future; knowing how add() works is a hilarious war story you’ll tell future colleagues over drinks
So, an abstraction needs to justify itself in terms of what it simplifies or it shouldn’t exist. Ways in which an abstraction becomes a net benefit include:
- insulating your code from capricious behavior in dependencies (maybe the people who write your compiler are known to badly break “+” from time to time, so you want to implement it by subtracting from zero and flipping for safety). Usually this is a ridiculous argument, e.g. one of the funnier defenses I’ve heard of React was “well what if browsers went away?” Sure, but what are the odds React outlives browsers?
- insulating user code from breaking changes. E.g. if you’re planning on implementing betterAdd() but it can break in places where add() is used, you can get betterAdd() working, then deprecate add() and eventually switch to betterAdd() or, when all the incompatibilities are ironed out, replace add() with betterAdd().
- saving the user a lot of common busywork that you want to keep DRY (e.g. to avoid common pitfalls, reduce errors, improve readability, provide a single point to make fixes or improvements, etc.). Almost any function anyone writes is an example of this.
- eliminating the need for special domain knowledge so that non-specialists can do something that would otherwise require deep knowledge, e.g. the way threejs allows people with no graphics background to display 3d scenes, or the way statistics libraries allow programmers who never learned stats to calculate a standard deviation (and save those who did the trouble of looking up the formula and implementing it from scratch).
If your abstraction’s benefits don’t outweigh its costs, then it shouldn’t exist. Furthermore, if your abstraction’s benefit later disappears (e.g. the underlying system improves its behavior to the point where it’s just as easy to use it as your abstraction) then it should be easy to trivialize, deprecate, and dispose of the abstraction.
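The add()/betterAdd() migration described in the list above might look something like this in TypeScript. The specific incompatibility (betterAdd() rejecting results that overflow) is invented for illustration, as is the `deprecationWarned` flag.

```typescript
// Step 1: the new implementation ships alongside the old one.
// Hypothetical incompatibility: it rejects results that overflow.
function betterAdd(a: number, b: number): number {
  const result = a + b;
  if (!isFinite(result)) {
    throw new RangeError("betterAdd: result out of range");
  }
  return result;
}

// Step 2: the old entry point keeps its old behavior but is marked
// deprecated and warns once, so callers can migrate at their own pace.
let deprecationWarned = false;

/** @deprecated Use betterAdd() instead. */
function add(a: number, b: number): number {
  if (!deprecationWarned) {
    console.warn("add() is deprecated; use betterAdd()");
    deprecationWarned = true;
  }
  return a + b;
}

// Step 3 (later): once the incompatibilities are ironed out, the body of
// add() can shrink to `return betterAdd(a, b);` or add() can be removed.
```

The point is that user code calling add() never breaks during the transition; the abstraction absorbs the change.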
jQuery is an example of a library which provided huge advantages which slowly diminished as the ideas in jQuery were absorbed by the DOM API, and jQuery became, over time, a thinner and thinner wrapper over built-in functionality.
E.g. lots of front-end libraries provide abstraction layers over XMLHttpRequest. It’s gnarly and there are lots of edge cases. Even if you don’t account for all the edge cases, just building an abstraction around it affords the opportunity to fix it all later in one place (another possible benefit).
Since 2017 we have had fetch() in all modern browsers. Fetch is simple. It’s based on promises. New code should use fetch (or maybe a library wrapped around fetch). Old abstractions over XMLHttpRequest should be deprecated or rewritten to take advantage of fetch where possible.
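As a rough sketch of how thin a wrapper around fetch can be, here is one in TypeScript. `getJson`, `FetchLike`, and `ResponseLike` are names I made up; the fetch function is passed in as a parameter so the sketch doesn’t depend on browser typings, and that is a choice made for the example, not a requirement of fetch.

```typescript
// Minimal structural types covering only the parts of fetch used here.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

// Thin wrapper: one place to handle HTTP errors, nothing more.
// fetch() only rejects on network failure, so non-2xx statuses
// have to be checked explicitly.
async function getJson(url: string, fetchImpl: FetchLike): Promise<unknown> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("GET " + url + " failed with status " + response.status);
  }
  return response.json();
}
```

In a browser you would pass the built-in function, e.g. `getJson("/api/items", fetch)`, where "/api/items" is a placeholder URL. If the wrapper ever stops earning its keep, deleting it is a search-and-replace.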
Abstractions, Used Correctly, Are Good
The problem with abstractions in front-end development isn’t that they’re bad, it’s that they’re in the wrong places, as demonstrated by the fact that making a pretty minor front-end change requires rebuilding the web server.
A simple example is React vs. web-components. React allows you to make reusable chunks of user interface. So do web-components. React lets you insert these components into the DOM. So do web-components. React has a lot of overhead and often requires you to write code in a higher-level form that is compiled into code that actually runs on browsers. React changes all the time, breaking existing code. React components do not play nicely with other things (e.g. you can’t write code that treats a React Component like an <input> tag). If you insert web-components inside a React component it may lose its shit.
Why are we still using React?
- Does it, under the hood, use non-obvious best practices? At best arguable.
- Does it insulate you from capricious APIs? No, it’s much more capricious than the APIs it relies on.
- Does it insulate you from breaking changes? Somewhat. But if you didn’t use it you wouldn’t care.
- Does it save you busywork? No, in fact building web-components and using web-components is just as easy, if not easier, than using React. And learning how to do it is universal and portable.
- Does it eliminate the need for domain knowledge? Only to the extent that React coders learn React and not the underlying browser behavior OR need to relearn how to do things they already know in React.
Angular, of course, is far, far worse.
How could it be better?
If you’re working at a small shop you probably don’t do all this. E.g. I built a digital library system from “soup-to-nuts” on some random Ubuntu server running some random version of PHP and some random version of MySQL. I was pretty confident that if PHP or MySQL did anything likely to break my code (violate the abstraction) this would, at minimum, be a new version of the software and, most likely, a 1.0 or 0.1 increment in version. Even more, I was pretty confident that I would survive most major revisions unscathed, and if not, I could roll back and fix the issues.
It never burned me.
Why on Earth don’t we build our software out of reliable modular components that offer reliable abstraction layers? Is it because Joel Spolsky allegedly argued, in effect, that there’s no such thing as a reliable abstraction layer? Or is it because people took an argument that you should define your abstraction layers cautiously (aware of the benefit/cost issue), as leak-free as possible, and put them in the right places, and instead decided that was too hard: no abstraction layer, no leaks? Or is it the quantum improvement in dependency-management systems that allows (for example) a typical npm/yarn project to have thousands of transitive dependencies that update almost constantly without breaking all that often?
The big problem in software development is scale. Scaling capacity. Scaling users. Scaling use cases. Scaling engineers. Sure, my little digital library system worked aces, but that was one engineer, one server, maybe a hundred daily/active users. How does it scale to 10k engineers, 100k servers, 1B customers?
Scaling servers is a solved problem. We can use an off-the-shelf solution. Actually, the last two (servers and customers) are trivial.
10k engineers is harder. I would argue first off that having 10k engineers is an artifact of not having good abstraction layers. Improve your basic architecture by enforcing sensible abstractions and you only need 1k engineers, or fewer. At the R&D unit of a big tech company I worked at, every front-end server and every front-end was a unique snowflake with its own service layer. There was perhaps a 90% overlap in functionality between any given app and at least two other apps. Nothing survived more than 18 months. Software was often deprecated before it was finished.
Imagine if you have a global service directory and you only add new services when you need new functionality. All of a sudden you only need to build, debug, and maintain a given service once. Also, you have a unified security model. Seems only sensible, right?
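A toy version of that idea in TypeScript, just to show the shape of it. All of the names (`registerService`, `callService`, `isAuthorized`) are hypothetical, and the authorization check is a placeholder, not a real security model.

```typescript
type ServiceHandler = (request: unknown) => unknown;

interface ServiceEntry {
  owner: string;
  handler: ServiceHandler;
}

// One directory for the whole organization.
const serviceDirectory: { [serviceName: string]: ServiceEntry } = {};

function hasService(serviceName: string): boolean {
  return Object.prototype.hasOwnProperty.call(serviceDirectory, serviceName);
}

// A service can only be added if it does not already exist,
// so each one is built, debugged, and maintained once.
function registerService(serviceName: string, owner: string, handler: ServiceHandler): void {
  if (hasService(serviceName)) {
    const existing = serviceDirectory[serviceName];
    throw new Error(`"${serviceName}" already exists (owner: ${existing.owner}); reuse it`);
  }
  serviceDirectory[serviceName] = { owner, handler };
}

// Placeholder for a unified security model: every call goes through one check.
function isAuthorized(token: string): boolean {
  return token.length > 0;
}

function callService(serviceName: string, token: string, request: unknown): unknown {
  if (!isAuthorized(token)) throw new Error("not authorized");
  if (!hasService(serviceName)) throw new Error(`no such service: "${serviceName}"`);
  return serviceDirectory[serviceName].handler(request);
}
```

A second team trying to register a duplicate "catalog" service gets an error pointing at the existing owner instead of a fresh snowflake.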
Imagine if your front-end is purely static client-side code that can run entirely from cache (or even be mobilized as an offline web-app via manifest). Now it can be served on a commodity server and cached on edge-servers.
Conceptually, you’ve just gone from (for n apps) n different front-end servers you need to care for and feed to 0 front-end servers (it’s off-the-shelf and basically just works) and a single service server.
Holy Grail? We’ve already got one. It’s very nice.
(To paraphrase Monty Python.)
If you’ve already got a mobile client, it’s probably living on (conceptually, if not actually) a single service layer. New versions of the client use the same service layer (modulo specific changes to the contract). Multiple versions of the client coexist and can use the same service layer!