The Graph with a Trillion Edges

After looking at this thread on Hacker News about this paper from a bunch of Facebook researchers, and this excellent article by Frank McSherry, I tried to understand McSherry’s approach (since I found the paper kind of impenetrable) and this is my take on it.

The Basic Problem

Let’s suppose you’re Facebook and you have about 1B members who have an arbitrary number of links to each other. How do you scale and parallelize queries (especially iterated queries) on the data?

McSherry basically implements a single-threaded algorithm on his laptop using brute force. (He uses a publicly available database of 128B edges.) As I understand it, his approach is pretty simple:

  • Encode each connection as a 64-bit number (think of a connection from A (32 bits) to B (32 bits)).
  • Store the numbers in RAM as a sequence of variable-length integer encoded differences. E.g. a set of numbers beginning with 4, 100, 2003, 2005,… would be encoded as 4, 96, 1903, 2,… Since the average distance between 128B values scattered among 1T is 8, the expected distance between connections (64-bit values) will be around 9, which we can encode as ~4 bits of data (using variable-length encoding) instead of 64 bits, and the storage needed is proportional to the number of values stored [1].
  • Look for a connection from A to anyone by running through the list and looking for values starting with A’s 32 bits. (You can make this search arbitrarily efficient — trading memory requirements for performance — by partitioning the list into a lookup table.)
  • Store the numbers on disk as deltas between the numbers encoded as a Hilbert Curve [2] using variable-length integer encoding for efficiency. (The naive encoding has an average of ~5 bits between values; the optimized encoding was 2.)
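The delta-plus-varint idea in the list above can be sketched in a few lines of Python. This is a minimal sketch, not McSherry's code: the function names are made up for illustration, and it assumes a byte-oriented base-128 varint (7 payload bits per byte), which cannot get below 8 bits per delta. Reaching the ~4 bit averages quoted above would need a bit-level code instead.

```python
def pack_edge(a: int, b: int) -> int:
    """Pack a connection from A to B (32 bits each) into one 64-bit value."""
    return (a << 32) | b


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)


def encode_edges(sorted_edges) -> bytes:
    """Store each value as the varint-encoded difference from the previous one."""
    out = bytearray()
    prev = 0
    for e in sorted_edges:
        if e < prev:
            raise ValueError("edges must be sorted in ascending order")
        out += encode_varint(e - prev)
        prev = e
    return bytes(out)


def decode_edges(data: bytes) -> list:
    """Reverse encode_edges: rebuild the values by summing the deltas."""
    edges, prev, cur, shift = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7               # continuation byte
        else:
            prev += cur              # delta complete
            edges.append(prev)
            cur, shift = 0, 0
    return edges
```

With the example from the list, `encode_edges([4, 100, 2003, 2005])` stores the deltas 4, 96, 1903, 2 in five bytes, against 32 bytes for four raw 64-bit values, and `decode_edges` recovers the original list.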

[1] Each entry will be log(n) bits in size, there’ll be c of them, and the average difference (to be encoded) will be log(c/n) bits. If we assume c scales as n^2 (highly dubious — it means if the world’s population increased by a factor of 2, I’d have twice as many friends) then we get n^2 * log(n) for storage. If we assume c scales as n log(n) (probably still generous) then we get n log(n) * log(log(n)). At any given scale that function looks pretty linear, although it is actually faster than linear — the gradient hangs around 5 for n between 1B and 100B — I don’t think it’s a cause for concern.

[2] A simple way of looking at Hilbert Curve encoding is that it treats a binary number as a series of 2-bit numbers, with each pair of bits selecting a sector of a 2×2 square (in a somewhat non-obvious way). So all numbers starting with a given pair of bits are found in the same sector. Then, look at the next two bits and subdivide that sector in a similarly non-obvious way until you run out of bits. (In an earlier post, McSherry explains Hilbert Curve implementation graphically.) Incidentally, simply interleaving the bits of the two numbers has much the same effect.
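The "simply interleaving the bits" alternative from footnote [2] is easy to show. Note that this sketch is the Z-order (Morton) curve, not the Hilbert curve itself; the function names are hypothetical, and which id gets the higher bit of each pair is an arbitrary choice made here.

```python
def interleave(a: int, b: int, bits: int = 32) -> int:
    """Interleave the bits of a and b; each a bit sits just above the matching b bit."""
    z = 0
    for i in range(bits):
        z |= ((a >> i) & 1) << (2 * i + 1)
        z |= ((b >> i) & 1) << (2 * i)
    return z


def deinterleave(z: int, bits: int = 32) -> tuple:
    """Recover (a, b) from an interleaved value."""
    a = b = 0
    for i in range(bits):
        a |= ((z >> (2 * i + 1)) & 1) << i
        b |= ((z >> (2 * i)) & 1) << i
    return a, b
```

The top two bits of the result are the top bit of each id, so they pick one quadrant of the A-by-B grid, the next two bits pick a sub-quadrant, and so on. Sorting by the interleaved value therefore keeps edges with nearby endpoints close together, which is what keeps the deltas small.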

That’s it in a nutshell. Obviously there are plenty of simple ways to optimize the lookups.

He points out that the database he used comes in 700 files. These could be processed by 700 different computers, one file each (stored slightly less efficiently, because the average delta goes up slightly), and any request could easily be split among them.

The interesting thing is that running in one thread on a laptop, this approach runs faster and scales better than the 128 core systems he benchmarks against.


So let’s suppose I wanted to emulate Facebook and have access to 256 virtual machines. (I could easily afford to do this, as a startup, using AWS or similar.) How would I do this in practice? Remember that Facebook also has to deal with users creating new connections all the time.

First of all (kind of obviously) every connection gets stored twice (i.e. connections are two-way). We need this for worst-case scenarios.

Let’s suppose I number each of my virtual machines with an 8-bit value. When I store a connection between A and B, I encode the connection twice (as AB and BA), take the last 8 bits of each value, and record the connection on the machine with the corresponding number. Each machine stores the value in its representation of its link database.

How would this work in practice?

Well, each value being stored is effectively 56 bits (the 8 bits pulled off the 64-bit value to pick the machine are the same for every value on that machine). Let’s divide that into 24 bits and 32 bits and keep 2^24 lists of 32-bit numbers, each stored in some efficient manner (we can break down and use McSherry’s approach at this point; effectively we’re progressively using McSherry’s approach on different portions of the numbers anyway).

So any lookup will entail grabbing 24-bits of the remaining 56-bits leaving us with 1/2^32 of the original data to search. Our efficient encoding algorithm would be to split the remaining data using the same approach among the cores, but I’m sure you get the idea by now.

So in a nutshell:

To record the existence of the edge AB:

  1. Encode as 2 64-bit values.
  2. Reduce to two 56-bit values and 2 8-bit values.
  3. Ask the 2 (or occasionally 1) machines designated for the 2 8-bit values.
  4. Each machine reduces the 56-bit value to a 32-bit value with a 24-bit lookup.
  5. Then inserts the remaining 32-bit value in a list of ~E/2^32 values (where E is the total number of edges, so ~232 values for a 1T-edge DB).

Inserting a value in a list of n values is a solved O(log(n)) problem. Note that all connections Ax will be stored on one machine, but connections xA will be scattered (or vice versa) because we’re storing two-way connections. (Important for the worst case, see below.)
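Here is a minimal in-memory sketch of the store and lookup steps above, with 256 dictionaries standing in for the 256 machines. The names (`route`, `Cluster`) are made up for illustration. One assumption is worth flagging loudly: so that all connections Ax land on one machine, the sketch takes the 8 machine bits from the low bits of the first id, uses that id's remaining 24 bits as the lookup key, and keeps the second id as the 32-bit remainder.

```python
import bisect
from collections import defaultdict

MACHINE_BITS = 8   # 256 machines
ID_BITS = 32       # each user id is 32 bits


def route(src: int, dst: int) -> tuple:
    """Split the 64-bit edge src->dst into (machine, 24-bit bucket, 32-bit rest)."""
    machine = src & ((1 << MACHINE_BITS) - 1)   # assumed: low 8 bits of src
    bucket = src >> MACHINE_BITS                # remaining 24 bits of src
    rest = dst & ((1 << ID_BITS) - 1)           # the 32-bit value left to store
    return machine, bucket, rest


class Cluster:
    def __init__(self):
        # one {bucket: sorted list of 32-bit values} table per "machine"
        self.machines = [defaultdict(list) for _ in range(1 << MACHINE_BITS)]

    def _insert(self, src: int, dst: int) -> None:
        machine, bucket, rest = route(src, dst)
        values = self.machines[machine][bucket]
        i = bisect.bisect_left(values, rest)    # O(log n) search
        if i == len(values) or values[i] != rest:
            values.insert(i, rest)              # keep sorted, skip duplicates

    def add_edge(self, a: int, b: int) -> None:
        """Record the connection twice, as AB and as BA."""
        self._insert(a, b)
        self._insert(b, a)

    def edges_from(self, a: int) -> list:
        """Ask the one machine responsible for A for everything of the form Ax."""
        machine, bucket, _ = route(a, 0)
        return list(self.machines[machine].get(bucket, []))
```

A Python list makes the search O(log n) but the insert itself shifts elements; with lists of a few hundred values that hardly matters, and a real implementation would presumably use whatever compact sorted structure the delta encoding calls for.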

To find all edges from A:

  1. Grab 8-bits using the same algorithm used for storing AB.
  2. Ask the designated machine for the list of ~E/2^32 connections from A.

So to sum up: storing a connection AB involves bothering two machines. Determining A’s connections involves bothering 1 machine in most cases, and (see below) all the machines in rare cases. However, the rare cases should never actually exist.

Worst Case Response

One problem here is that the worst case response could be huge. If A has 10M outgoing edges then one machine is going to have to cough up one giant database. (If one node has more connections than can easily be stored on one machine then we can deal with that by getting our machine names from the entropic bits of both user ids, but let’s ignore that for now.)

In sufficiently bad cases we reverse the lookup. And reverse lookups will never be bad! If we hit the machine containing all connections of the form Ax and there are too many to deal with, the machine tells us to reverse the lookup, and we ask all our machines for connections xA, which we can reasonably expect to be evenly distributed (if 256 bots sign up for and follow A simultaneously, there will be 256 new entries of the form Ax on one machine, but only one connection of the form xA on each machine).

So, the worst case performance of this design comes down to the cost of transmitting the results (which we can evenly distribute across our machines). You can’t really do better than that, and the key step is treating each machine as having responsibility for one square in a grid picked using Hilbert Curves.

Note that the worst case isn’t terribly bad if we just want to count connections, so interrogating the database to find out how many outgoing links A has is practical.

Anyway, that never happens…

One can assume in the case of most social networks that while there will often be worst cases of the form A follows B (i.e. B has 10M followers) there will never be cases where A follows 10M users (indeed anyone like that could be banned for abuse long, long before that point is reached).

It follows that when someone posts something (on Twitter say) it only gets pulled by followers (it doesn’t get pushed to them). A posts something: solved problem, just update one list. B asks for an update: we need to check the users B follows, which is also a solved problem. So we don’t even need to store two-way connections or worry about reverse lookups.


This is apparently very similar to PageRank (Google’s algorithm for evaluating websites), which I’ve never bothered to try to understand.

Consider that each edge represents A links to B, and we rank a page by how many incoming links it has and we throw out pages with too many outgoing links (i.e. link farms, sitemaps). Then we iterate, weighting the value by the previously calculated values of each page. Google (et al) add to that by looking at the text in and around the link (e.g. “publicly available database of 128B edges”, above, links to a page about exactly that, so Google might infer that the linked page is “about” things in those words, or in the case of “click here” type links, look at words around the link).
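The "rank by incoming links, then iterate" idea can be sketched as below. This is the textbook damped iteration as I understand it, not whatever Google actually runs; the damping factor of 0.85 is the conventional default, and `max_out` (the cutoff for pages with too many outgoing links) is a hypothetical parameter added to match the description above.

```python
def rank_pages(edges, iterations=50, damping=0.85, max_out=None):
    """edges is a list of (a, b) pairs meaning page a links to page b."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
    if max_out is not None:
        for n in nodes:
            if len(out[n]) > max_out:
                out[n] = []          # ignore links from link farms and sitemaps

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for a in nodes:
            targets = out[a]
            if targets:
                # a page passes its current rank on to the pages it links to
                share = damping * rank[a] / len(targets)
                for b in targets:
                    new[b] += share
            else:
                # a page with no (counted) outgoing links spreads its rank evenly
                share = damping * rank[a] / len(nodes)
                for n in nodes:
                    new[n] += share
        rank = new
    return rank
```

For example, with `rank_pages([("a", "c"), ("b", "c"), ("c", "a")])`, page c (two incoming links) ends up ranked above a (one incoming link, but from a highly ranked page), and b (no incoming links) comes last.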

So if from incoming links we infer that a page is about “scalable algorithms”, and from the quality of the pages with incoming links we infer the quality of the page itself, and we maintain a database of connections between topics and pages, then when someone searches for “scalable algorithms”, we take the ids of “scalable” and “algorithms”, find pages about both, sort them by “quality”, create a web page, and festoon it with sponsored links and ads. By golly we’ve got a search engine!

The Myth of the $500 FX Sensor

Bubble defects in a silicon wafer — SEM image

Disclaimer: I am not an electrical engineer and have no special knowledge about any of this.

Some time ago Thom Hogan estimated the cost of an FX camera sensor to be around $500 (I don’t have the reference, but I’m pretty sure this is true since he said as much recently in a comment thread). Similarly, E. J. Pelker, who is an electrical engineer, estimated an FX sensor to cost around $385 based on industry standard cost and defect rates in 2006. So it seems like there’s this general acceptance of the idea that an FX sensor costs more than 10x what a DX sensor costs (Pelker estimates $34 for a Canon APS sensor, which is slightly smaller than DX, and $385 for a 5D sensor).

My assumptions can be dramatically off but the result will be the same.

E.J. Pelker

I don’t mean to be mean to Pelker. It’s a great and very useful article. I just think it’s not that the assumptions he knows he’s making are off; it’s that the tacit assumptions he doesn’t realize he’s made are completely and utterly wrong.

The assumption is that if you get an 80% yield making DX sensors then you’ll get a 64% (80% squared) yield from FX sensors (let’s ignore the fact that you’ll get slightly fewer than half as many possible FX sensors from a wafer owing to fitting rectangles into circles).
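The jump from 80% to 64% is what you get from a simple Poisson defect model in which any single defect kills the whole die. The sketch below assumes that model (I don't know that it is exactly the one Pelker used) and treats FX as twice the area of DX, as the "80% squared" shorthand does; the function names are made up for illustration.

```python
import math


def poisson_yield(defect_density: float, area: float) -> float:
    """Chance that a die of the given area has zero defects."""
    return math.exp(-defect_density * area)


def scaled_yield(base_yield: float, area_ratio: float) -> float:
    """Yield of a die area_ratio times larger, if any one defect is fatal."""
    return base_yield ** area_ratio


# Defect density implied by an 80% yield on a die of area 1 (arbitrary units)
density = -math.log(0.8)
fx_yield = poisson_yield(density, 2.0)   # same wafer, die twice as large
```

Both `fx_yield` and `scaled_yield(0.8, 2)` come out at about 0.64. Everything hinges on the "any one defect is fatal" assumption baked into `poisson_yield`, which is the assumption the next two sections take apart.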

Here are Pelker’s “unknown unknowns”:

Sensors are fault-tolerant, CPUs aren’t

First, Pelker assumes that a defect destroys a sensor. In fact if all the defect is doing is messing up a sensel then the camera company doesn’t care: it finds the bad sensel during QA, stores its location in firmware, and interpolates around it when capturing the image. How do we know? They tell us they do this. Whoa, you might say, I totally notice bad pixels on my HD monitors, I would totally notice bad pixels when I pixel peep my 36MP RAW files. Nope, you wouldn’t, because the camera writes interpolated data into the RAW file, and unless you shoot ridiculously detailed test charts and examine the images pixel by pixel or perform statistical analysis of large numbers of images you’ll never find the interpolated pixels. In any event (per the same linked article) camera sensors acquire more bad sensels as they age, and no-one seems to mind too much.

Sensor feature sizes are huge, so most “defects” won’t affect them

Next, Pelker also assumes industry standard defect rates. But industry standard defect rates are for things like CPUs, which usually have very small features and cannot recover from even a single defect. The problem with this assumption is that the vast majority of a camera sensor comprises sensels and wires hooking them up. Each sensel in a 24MP FX sensor is roughly 4,000nm across, and the supporting wiring is maybe 500nm across, with 500nm spacing, which is over 17x the minimum feature size for 28nm process wafers. If you look at what a defect in a silicon wafer actually is, it’s a slight smearing of a circuit, usually around the process size; if your feature size is 17x the process size, the defect rate will be vanishingly close to zero. So the only defects that affect a camera sensor will either be improbably huge or (more likely) in one of the areas with delicate supporting logic (i.e. a tiny proportion of any given camera sensor). If the supporting logic is similar in size to a CPU (which it isn’t) the yield rate will be more in line with CPUs (i.e. much higher).

This eliminates the whole diminishing yield argument (in fact, counter-intuitively, yield rates should be higher for larger sensors since their feature size is bigger and the proportion of the sensor given over to supporting logic is smaller).

(Note: there’s one issue here that I should mention. Defects are three dimensional, and the thickness of features is going to be constant. This may make yields of three dimensional wafers more problematic, e.g. BSI sensors. Thom Hogan recently suggested — I don’t know if he has inside information — that Sony’s new (i.e. BSI) FX sensors are turning out to have far lower yields — and thus far higher costs — than expected.)

Bottom Line

To sum up: an FX sensor would cost no more than slightly over double a DX sensor (defect rates are the same or lower, but you can fit slightly fewer than half as many sensors onto a wafer owing to geometry). So if a DX sensor costs $34, an FX sensor should cost no more than $70.

Affinity Photo — No Good For Photography

Sydney Harbor by night — processed using Photos

This is a pretty important addition to my first impressions article.

After reading a comment thread on a photography blog it occurred to me that I had not looked particularly hard at a core feature of Affinity Photo, namely the develop (i.e. semi-non-destructive RAW-processing) phase.

I assumed Affinity Photo used Apple’s OS-level RAW-processing (which is pretty good) since just writing a good RAW importer is a major undertaking (and an ongoing commitment, as new cameras with new RAW formats are released on an almost daily basis) and concentrated my attention on its editing functionality.

(There is a downside to using Apple’s RAW processor — Apple only provides updates for new cameras for recent OS releases, so if you were using Mac OS X 10.7 (Lion) and just bought a Nikon D750 you’d be out of luck.)

In the thread, one commenter suggested Affinity Photo as a cheaper alternative to Phase One (which misses the point of Phase One entirely) to which someone had responded that Affinity Photo was terrible at RAW-processing. I wanted to check if this was simply a random hater or actually true and a quick check showed it to be not only true but horribly true.

Photos default RAW import
Affinity Photo default RAW input

White Balance

Acorn’s RAW import dialog — it respects the camera’s white balance metadata and also lets you see what it is (temperature and tint).
Affinity simply ignores WB metadata by default
Affinity with WB adjustment turned on and the settings copied from Acorn (note that it still doesn’t match Acorn, and I know which I trust more at this point).

Affinity Photo ignores the white balance metadata in the RAW file. If you toggle on the white balance option in develop mode you still need to find out the white balance settings (somehow) and type them in yourself.

Good cameras do a very good job of automatically setting white balance for scenes. Serious photographers will often manually set white balance after every lighting change on a shoot. Either way, you want your RAW-processing software to use this valuable information.

Noise Reduction

Top-Right Corner at 200% — Photos on the left, Affinity on the right

Affinity Photo’s RAW processing is terrible. It somehow manages to create both chroma and tonal noise even for well-exposed images shot in bright daylight. Night shots at high ISO? Don’t even ask. (If you must, see the Sydney Harbor comparison, earlier.) It’s harder to say this definitively, but it seems to me that it also smears detail. It’s as if whoever wrote the RAW importer in Affinity Photo doesn’t actually know how to interpolate RAW images.

Incidentally, Affinity Photo’s noise reduction filter appears to have little or no effect. An image with noise reduction maxed out using Affinity Photo is far noisier than the same image processed without noise reduction using any decent program or Apple’s RAW importer’s noise reduction.

Now, if you’re using Affinity Photo in concert with a photo management program like Lightroom, Aperture, Photos, or iPhoto (programs which do the RAW processing and simply hand over a 16-bit TIFF image) you simply won’t notice a problem with the lack of white balance support or the noise creation. But if you actually use Affinity Photo to work on RAW images (i.e. if you actually try to use its semi-non-destructive “develop” mode) you’re basically working with garbage.

I can only apologize to any photographers who might have bought Affinity Photo based on my earlier post. I mainly use would-be Photoshop replacements for editing CG images where RAW processing isn’t a factor, but my failure to carefully check its RAW processing is egregious.

If you want to use Affinity Photo for working on photographs I strongly recommend you wait until its RAW processing is fixed (or it simply adopts the RAW processing functionality Apple provides “for free”).

Remember when I discovered that Affinity Designer’s line styling tools simply didn’t work at all? That’s ridiculous. Well, a self-declared photo editing tool that doesn’t do a halfway decent job of RAW processing is just as ridiculous.

So, what to do?

Photos offers powerful RAW processing if you figure out how to turn it on

Apple’s new(ish) Photos application is actually surprisingly good once you actually expose its useful features. By default it doesn’t even show a histogram, but with a few clicks you can turn it into a RAW-processing monster.

And, until Apple somehow breaks it, Aperture is still an excellent piece of software.

Acorn does a good job of using Apple’s RAW importer (it respects the camera’s metadata but allows you to override it). Unfortunately, the workflow is destructive (once you use the RAW importer if you want to second guess your import settings you need to start again from scratch).

Adobe still offers a discounted subscription for Photographers, covering Lightroom and Photoshop. It’s annoying to subscribe to software, but it may be the best and cheapest option right now (especially with Apple abandoning Aperture).

If noise reduction is your main concern, Lightroom, Aperture, Photoshop, and other generalist programs just don’t cut it. You either need a dedicated RAW processing program or a dedicated noise reduction program.

RAW Right Now speeds up RAW previews and makes QuickLook more useful

Finally, if you’re happy to use different programs for image management (I mainly use Finder these days), RAW processing, and editing then you have a lot of pretty attractive options. FastRAWViewer is incredibly good for triaging RAW photos (its Focus Peaking feature is just wonderful). DxO Optics Pro and Phase One’s Capture One offer almost universally admired RAW-processing capabilities and exceptionally good built-in noise handling. Many serious photographers consider the effect of switching to either of these programs for RAW processing as important as using a better lens. Even the free software offered by camera makers usually does a very good job of RAW processing (it just tends to suck for anything else). If you don’t use Affinity Photo for RAW processing there’s not much wrong with it (but you don’t have a non-destructive workflow).

Affinity Photo — First Impressions

Affinity Photo in action

Affinity Photo has just come out of beta and is being sold for a discounted price of $40 (its regular price will be $50). As with Affinity Designer, it’s well-presented, with an attractive icon and a dark interface that is reminiscent of late model Adobe Creative Cloud and Apple Pro software. So, where does it fit in the pantheon of would-be Photoshop alternatives?

In terms of core functionality, it appears to fit in above Acorn and below Photoline. In particular, Photoline supports HDR as well as 16-bit and LAB color, while Affinity Photo lacks support for HDR editing. Unless you work with HDR (and clearly not many people do), Affinity Photo is both less expensive than Photoline and far more polished in terms of the features it does support.

Affinity Photo supports non-destructive import of RAW files. When you open a RAW file you enter “Develop” mode where you can perform adjustments to exposure, curves, noise, and so forth on the RAW data before it gets converted to 8- or 16-bit RGB. Once you leave Develop mode, you can return and second-guess your adjustments (on a layer-by-layer basis). This alone is worth the price of admission, and leaves Acorn, Pixelmator, and Photoline in the dust.

In essence you get the non-destructive workflow of Lightroom and the pixel-manipulation capabilities of Photoshop in a single package, with the ability to move from one to the other at any point in your workflow. Let me repeat that — you can “develop” your raw, go mess with pixels in the resulting image, then go back and second-guess your “develop” settings (while retaining your pixel-level manipulations) and so on.

This feature isn’t quite perfect. E.g. you can’t go back and second-guess a crop, and vector layer operations, such as text overlays, get reduced to a “pixel” layer if you go back to develop mode. But it’s a big step in the right direction and for a lot of purposes it’s just dandy.

These are just my first impressions, but there are some things that could be better.

Affinity Photo provides adjustment layers, live filter layers, filters, and layer effects, in many cases providing multiple versions of the same filter in different places. Aside from having functionality scattered and in arbitrary buckets, you get several different user interfaces. This is a mess, and it is a direct result of copying Photoshop’s crazy UI (the product of decades of accumulated functionality) rather than having a consolidated, unified approach the way Acorn does.

At first I thought Affinity Photo didn’t support layer styles, but it does. Unfortunately you can’t simply copy and paste layer styles (the way you can in Photoshop and Acorn), so the workflow is a bit more convoluted (you need to create a style from a selection and then apply it elsewhere — often you just want to copy a style from A to B without creating a reusable (or linked) style so this is a bit unfortunate).

I really like the fact that the RGB histogram gives a quick “approximate” view but shows a little warning symbol on it. When you click it, it does a per-pixel histogram (quite quickly, at least on my 24MP images).

I don’t see any support for stitching images, so if that’s important to you (and it’s certainly very important to landscape photographers) then you’ll need to stick with Adobe, or specialized plugins or software.

It also seems to lack smart resize and smart delete or Photoshop’s new motion blur removal functions. (Photoline also does smart delete and smart resize.)

Anyway, it’s a great first release, and definitely fulfills the promise of the public betas. It seems to me that it’s a more solid overall effort than Affinity Designer was when first released, and I’m probably a more demanding user of Photoshop-like programs than I am of Illustrator-like programs. I can understand the desire to provide a user interface familiar to users of Adobe products even at the cost of making it unnecessarily confusing and poorly organized, but I hope that sanity prevails in the long run.

Bottom line: a more complete and attractive package than either Photoline or Acorn (its most credible competitors) and better in some ways than Photoshop.

Email E. Neumann

How stupid is email?

Actually, email is great. It’s robust, widely-supported, and highly accessible (in the 508 and economic senses of the word). The problem is email clients.


A colleague of mine and I once considered starting up a business around a new email client. The problem, though, is that it works best when someone sends email using your email client to someone else using your email client. E.g. you can easily implement PGP encryption:

  • if you’ve previously exchanged email, you both have each other’s keys, and snap, you’re done;
  • if you haven’t, your client asks whether you want it sent insecurely or asks you for authentication information (something you know about the person that a man-in-the-middle probably doesn’t, or an out-of-band mechanism for authentication such as calling you on the phone), and then sends an email initiating a secure authentication process OR allowing them to contact you and opt to receive insecure communication. All this can happen pretty seamlessly if the recipient is using your email client: they get asked the question and if they answer correctly keys get sent.

It’s relatively easy to create a secure encryption system if you (a) opt out of email, and (b) have a trusted middleman (e.g. if both parties trust a specific website and https then you’re done — even a simple forum will work). But then you lose the universality of email, which is kind of important.

The obvious goal was to create a transparently secure email client. The benefits are huge — e.g. spam can be dealt with more easily (even insecure email can be sent with authentication) and then you can add all the low-hanging fruit. But it’s the low-hanging fruit I really care about. After all, I figure if the NSA can hack my storage device’s firmware, my network card’s firmware, and subvert https, encryption standards, and TOR — and that’s just stuff we know about — the only paths to true security are anonymity (think of it as “personal steganography”) or extreme paranoia. When dealing with anyone other than the NSA, Google, China, Iran, etc. you can probably use ordinary caution.

Well, how come Windows Mail / Outlook and Apple Mail don’t do exactly what I’ve just said and automatically handshake, exchange keys and authentication questions, and make email between their own email clients secure? If it’s that easy (and really, it is that easy) why the hell? Oddly enough, Apple has done exactly this (using a semi-trusted middleman — itself) with Messages. Why not Mail?

OK, set all that aside.


  • Why can’t I conveniently send a new message the way I send a reply (i.e. “Reply with new subject and empty body” or “Reply all with new subject and empty body”)? When using an email client most people probably use Reply / Reply All most, then create a new message and copy/paste email addresses from some other message second, and create a new message and type in the email address or use some kind of autocomplete last. Furthermore, many replies are actually intended to be new emails to the sender or sender and recipients. Yet no email client I know of supports that second, very frequent, usage.
  • Why does my email client start me in the subject line? Here’s an idea: when you create a new email you start in the body. As you type the body the email client infers the subject from what you type (let’s say using the first sentence if it’s short, or the first clause with an ellipsis if that works, or a reasonable chunk of it with an ellipsis otherwise).
  • Why does my OS treat email, IMs, and SMSs as completely separate things? Studies show grown-ups use email and hardly use SMS. Younger people use SMS and hardly use email. Both probably need to communicate with each other, and both are generally sending short messages to a person, not a phone number or an email address.
  • (While I’m at it, why does an iPhone treat email and IMs as different buckets? How come they had the nous to merge IMs and SMSs, and even allow semi-transparent switching between secure and free iMessages and less secure and not-necessarily-free SMSs based on whether the recipient was using an Apple device or not? I don’t ask why Android (or heaven forfend Windows) does this because (a) Android generally hasn’t even integrated mailboxes, and (b) I don’t expect real UI innovation from Google; they can imitate, but when they originate it tends to be awful, aside from Google’s home page, which remains one of the most brilliant UI decisions in history.)
  • Oh yeah, and voicemail.
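The subject-inference idea from the list above (first sentence if it's short, else the first clause with an ellipsis, else a reasonable chunk with an ellipsis) could look something like this. It's a rough sketch: the function name and the 60-character cutoff are arbitrary choices of mine, and the sentence and clause detection is deliberately naive.

```python
import re


def infer_subject(body: str, max_len: int = 60) -> str:
    """Guess a subject line from the start of an email body."""
    text = " ".join(body.split())            # collapse whitespace and newlines
    if not text:
        return ""

    # First sentence: everything up to the first ., ! or ? followed by a space or the end
    match = re.match(r"(.+?[.!?])(?:\s|$)", text)
    sentence = match.group(1) if match else text
    if len(sentence) <= max_len:
        return sentence

    # First clause: everything before the first comma, semicolon, or colon
    clause = re.split(r"[,;:]", sentence, maxsplit=1)[0].strip()
    if clause and len(clause) < len(sentence) and len(clause) <= max_len:
        return clause + "…"

    # Otherwise a reasonable chunk, cut at a word boundary
    chunk = text[:max_len].rsplit(" ", 1)[0]
    return chunk + "…"
```

So a body starting "Lunch tomorrow? I was thinking noon." gets the subject "Lunch tomorrow?", and the client could keep updating the suggestion as you type until you edit the subject yourself.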


Now imagine a Contacts app that did all this stuff. I’d suggest it needs to be built into email because email is the richest of these things in terms of complexity and functionality, but let’s call it Contact. Consider the nirvana it would lead to:

  • Instantly, four icons on your iPhone merge into one (Mail, Phone, Messages, Contacts; the existence of the last has always bothered me, now it would make sense). Three of those are likely on your home screen; now you have more space.
  • You no longer have to check for messages in four different places (e.g. if you have a voicemail system that emails you transcripts of voicemails, you can mark them both as read in one place, or possibly even have them linked automatically.)
  • Similarly, when you reply to a given message, you can decide how to do so. (Is it urgent? Are they online? Is it the middle of the night? What is your preferred method of communicating with this person?) Maybe even multiple linked channels.
  • Message threads can cross message domains (imagine if you reply to an email with a phone call and Contacts knows this and attaches the record of the call to the thread containing the emails, SMSs, iMessages, voicemails, and so on). Some of this would require cleverness (e.g. Apple owns iMessages, so it could do things like add subject threads to messages on the side, but SMSs are severely constrained and would lose their thread context).
  • Oh, and you can use the same transparent encryption implementation across whichever bands make sense.
  • Obviously some of these things won’t work with some message channels, e.g. you can’t do much with SMS because the messages aren’t big enough, but MMS, which is what most of us are using, works fine. Similarly, Visual Voicemail could support metadata, but doing it with legacy voicemail systems isn’t going to happen.

Consider for a moment how much rocket science was involved in getting Continuity to work on iOS and OS X devices. To begin with it requires hardware that isn’t on older Macs and iOS devices. And what it does is pretty magical: I am working on a Keynote presentation, walk over to my Mac and automagically I am working on the same document in the same place. But how useful is this really, and how often? Usually when I switch devices I am also switching tasks. Maybe that’s because I grew up in a world without Continuity.

Now consider how this stuff would require almost no rocket science and how often it would be useful.