The Graph with a Trillion Edges

After looking at this thread on Hacker News about this paper from a bunch of Facebook researchers, and this excellent article by Frank McSherry, I tried to understand McSherry’s approach (since I found the paper kind of impenetrable) and this is my take on it.

The Basic Problem

Let’s suppose you’re Facebook and you have about 1B members who have an arbitrary number of links to each other. How do you scale and parallelize queries (especially iterated queries) on the data?

McSherry basically implements a single-threaded algorithm on his laptop using brute force. (He uses a publicly available database of 128B edges.) As I understand it, his approach is pretty simple:

  • Encode each connection as a 64-bit number (think: a connection from A (32 bits) to B (32 bits)).
  • Store the numbers in RAM as a sequence of variable-length-integer-encoded differences, e.g. a set of numbers beginning with 4, 100, 2003, 2005,… would be encoded as 4, 96, 1903, 2,… Since the average distance between 128B values scattered among 1T is 8, the expected distance between connections (64-bit values) will be around 9, which we can encode as ~4 bits of data (using variable-length encoding) instead of 64 bits, so the storage needed is proportional to the number of values stored [1]. (A rough sketch of this encoding follows the list.)
  • Look for a connection from A to anyone by running through the list and looking for values starting with A’s 32 bits. (You can make this search arbitrarily efficient — trading memory requirements for performance — by partitioning the list into a lookup table.)
  • Store the numbers on disk as deltas between the numbers encoded as a Hilbert Curve [2], using variable-length integer encoding for efficiency. (The naive encoding averaged ~5 bits between values; the optimized encoding got that down to ~2.)
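
To make the delta-plus-variable-length-integer idea concrete, here's a rough C# sketch. This is my own illustration, not McSherry's code, and it works a byte at a time, which is coarser than the bit-level encoding described above, but the principle is the same:

// My own illustration of the idea: store a sorted list of 64-bit edge codes as
// variable-length-encoded gaps, so small gaps cost one byte instead of eight.
using System.Collections.Generic;

static class DeltaVarint {
    // Encode an ascending sequence of values as varint-encoded differences.
    public static List<byte> Encode(IEnumerable<ulong> sortedValues) {
        var bytes = new List<byte>();
        ulong previous = 0;
        foreach (ulong v in sortedValues) {
            ulong gap = v - previous;              // e.g. 4, 100, 2003 -> gaps 4, 96, 1903
            previous = v;
            while (gap >= 0x80) {                  // 7 bits per byte; high bit means "more follows"
                bytes.Add((byte)((gap & 0x7F) | 0x80));
                gap >>= 7;
            }
            bytes.Add((byte)gap);
        }
        return bytes;
    }

    // Decode the byte stream back into the original values.
    public static IEnumerable<ulong> Decode(IReadOnlyList<byte> bytes) {
        ulong previous = 0;
        for (int i = 0; i < bytes.Count; ) {
            ulong gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                gap |= (ulong)(b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            yield return previous;
        }
    }
}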

[1] Each entry will be log(n) bits in size, there’ll be c of them, and the average difference (to be encoded) will be log(c/n) bits. If we assume c scales as n^2 (highly dubious — it means if the world’s population increased by a factor of 2, I’d have twice as many friends) then we get n^2 * log(n) for storage. If we assume c scales as n log(n) (probably still generous) then we get n log(n) * log(log(n)). At any given scale that function looks pretty linear, although it is actually faster than linear — the gradient hangs around 5 for n between 1B and 100B — so I don’t think it’s a cause for concern.

[2] A simple way of looking at Hilbert Curve encoding is that it treats a binary number as a series of 2-bit numbers, with each pair of bits selecting a sector of a 2×2 square (in a somewhat non-obvious way). So all numbers starting with a given pair of bits are found in the same sector. Then, look at the next two bits and subdivide that sector in a similarly non-obvious way until you run out of bits. (In an earlier post, McSherry explains the Hilbert Curve implementation graphically.) Incidentally, simply interleaving the bits of the two numbers has much the same effect.
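
To make that last remark concrete, here's a minimal sketch of bit interleaving (Morton / Z-order — not a true Hilbert curve, but with the same "nearby ids stay nearby" effect):

// Minimal sketch: interleave the bits of two 32-bit ids into one 64-bit key.
// As with a Hilbert curve, ids that share their high bits end up near each
// other in the combined ordering.
static ulong Interleave(uint a, uint b) {
    ulong result = 0;
    for (int i = 0; i < 32; i++) {
        result |= (ulong)((a >> i) & 1) << (2 * i + 1);  // bit i of a goes to an odd position
        result |= (ulong)((b >> i) & 1) << (2 * i);      // bit i of b goes to an even position
    }
    return result;
}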

That’s it in a nutshell. Obviously there are plenty of simple ways to optimize the lookups.

He points out that the database he used comes in 700 files, so each file could be processed by a different computer (700 in all, with the data stored slightly less efficiently, because the average distance between deltas goes up slightly), and any request could easily be split among them.

The interesting thing is that running in one thread on a laptop, this approach runs faster and scales better than the 128-core systems he benchmarks against.

Optimizing

So let’s suppose I wanted to emulate Facebook and have access to 256 virtual machines. (I could easily afford to do this, as a startup, using AWS or similar.) How would I do this in practice? Remember that Facebook also has to deal with users creating new connections all the time.

First of all (kind of obviously) every connection gets stored twice (i.e. connections are two-way). We need this for worst-case scenarios.

Let’s suppose I number each of my virtual machines with an 8-bit value. When I store a connection between A and B, I encode the connection twice (as AB and BA), use 8 bits of each value to pick a machine number, and record the connection on the machine with that number. Each machine stores the value in its representation of its link database.

How would this work in practice?

Well, each value being stored is effectively 56 bits (8 bits are pulled off the 64-bit value to pick the machine, and are thus the same for every value stored on that machine). Let’s divide that into 32 bits and 24 bits and store (in some efficient manner) 2^24 lists of 32-bit numbers, again stored in some efficient manner (we can break down and use McSherry’s approach at this point — effectively we’re progressively applying McSherry’s approach to different portions of the numbers anyway).

So any lookup entails grabbing 24 bits of the remaining 56 bits, leaving us with 1/2^32 of the original data to search. Our efficient encoding algorithm could be to split the remaining data among the cores using the same approach, but I’m sure you get the idea by now.

So in a nutshell:

To record the existence of the edge AB:

  1. Encode as 2 64-bit values.
  2. Reduce to two 56-bit values and 2 8-bit values.
  3. Ask the 2 (or occasionally 1) machines designated by the 2 8-bit values to store them.
  4. Each machine reduces the 56-bit value to a 32-bit value with a 24-bit lookup.
  5. Then inserts the remaining 32-bit value into a list of ~E/2^32 values (where E is the total number of edges, so ~232 values for a 1T edge DB).

Inserting a value in a list of n values is a solved O(log(n)) problem. Note that all connections Ax will be stored on one machine, but connections xA will be scattered (or vice versa) because we’re storing two-way connections. (Important for the worst case, see below.)
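
Here's a hypothetical sketch of those steps in code (including the lookup described next). The names and the choice of which bits pick the machine are mine, and a real system would be sending these requests over the network rather than poking an in-memory array:

// Hypothetical sketch: 8 bits of the source id pick one of 256 "machines",
// the remaining 24 bits pick a bucket, and the 32-bit destination id is
// inserted into that bucket's sorted list. Machines are simulated in memory.
using System.Collections.Generic;

class EdgeStore {
    readonly Dictionary<int, SortedSet<uint>>[] machines = new Dictionary<int, SortedSet<uint>>[256];

    public void RecordEdge(uint a, uint b) {
        Insert(a, b);   // AB, stored on the machine picked by a
        Insert(b, a);   // BA, stored on the machine picked by b
    }

    void Insert(uint from, uint to) {
        int machine = (int)(from & 0xFF);     // 8 bits pick one of 256 machines
        int bucket = (int)(from >> 8);        // 24 bits pick one of 2^24 buckets
        var m = machines[machine];
        if (m == null) machines[machine] = m = new Dictionary<int, SortedSet<uint>>();
        SortedSet<uint> list;
        if (!m.TryGetValue(bucket, out list)) m[bucket] = list = new SortedSet<uint>();
        list.Add(to);                         // the O(log n) insert into a sorted list
    }

    // "Find all edges from A": one machine, one bucket.
    public IEnumerable<uint> EdgesFrom(uint a) {
        var m = machines[(int)(a & 0xFF)];
        SortedSet<uint> list;
        if (m != null && m.TryGetValue((int)(a >> 8), out list)) return list;
        return new uint[0];
    }
}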

To find all edges from A:

  1. Grab 8 bits using the same algorithm used for storing AB.
  2. Ask the designated machine for A’s connections (a search through its list of ~E/2^32 stored values).

So to sum up — storing a connection AB involves bothering two machines. Determining A’s connections involves bothering one machine in most cases, and (see below) all the machines in rare cases. However, the rare cases should never actually exist.

Worst Case Response

One problem here is that the worst case response could be huge. If A has 10M outgoing edges then one machine is going to have to cough up one giant database. (If one node has more connections than can easily be stored on one machine then we can deal with that by getting our machine names from the entropic bits of both user ids, but let’s ignore that for now.)

In sufficiently bad cases we reverse the lookup. And reverse lookups will never be bad! If the machine containing all connections of the form Ax has too many to deal with, it tells us to reverse the lookup, and we ask all our machines for connections xA — and we can reasonably expect those to be evenly distributed (if 256 bots sign up for and follow A simultaneously, there will be 256 new entries of the form Ax on one machine, but only one connection of the form xA on each machine).

So, the worst case performance of this design comes down to the cost of transmitting the results (which we can evenly distribute across our machines) — you can’t really do better than that — and the key step is treating each machine as having responsibility for one square in a grid picked using Hilbert Curves.

Note that the worst case isn’t terribly bad if we just want to count connections, so interrogating the database to find out how many outgoing links A has is practical.

Anyway, that never happens…

One can assume in the case of most social networks that while there will often be worst cases of the form A follows B (i.e. B has 10M followers) there will never be cases where A follows 10M users (indeed anyone like that could be banned for abuse long, long before that point is reached).

It follows that when someone posts something (on Twitter, say) it only gets pulled by followers (it doesn’t get pushed to them). A posts something — solved problem, just update one list. B asks for updates — we need to check the people B follows — solved problem. So we don’t even need to store two-way connections or worry about reverse lookups.

Incidentally…

This is apparently very similar to PageRank (Google’s algorithm for evaluating websites), which I’ve never bothered to try to understand.

Consider that each edge represents A links to B, and we rank a page by how many incoming links it has, throwing out pages with too many outgoing links (i.e. link farms, sitemaps). Then we iterate, weighting the value by the previously calculated values of each page. Google (et al.) add to that by looking at the text in and around the link (e.g. “publicly available database of 128B edges”, above, links to a page about exactly that, so Google might infer that the linked page is “about” things in those words — or, in the case of “click here” type links, look at words around the link).
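
As a toy sketch of that iteration — just the ranking loop described above, emphatically not Google’s actual algorithm (no damping factor, no handling of dangling pages, and the link-farm filtering is ignored):

// Toy sketch: each page's score is recomputed from the scores of the pages
// linking to it, divided by how many outgoing links those pages have.
// (Needs using System.Collections.Generic.)
static double[] RankPages(List<int>[] outgoingLinks, int iterations) {
    int n = outgoingLinks.Length;
    var rank = new double[n];
    for (int i = 0; i < n; i++) rank[i] = 1.0 / n;       // start everyone equal

    for (int iter = 0; iter < iterations; iter++) {
        var next = new double[n];
        for (int page = 0; page < n; page++) {
            int outCount = outgoingLinks[page].Count;
            if (outCount == 0) continue;
            double share = rank[page] / outCount;        // a page divides its rank among the pages it links to
            foreach (int target in outgoingLinks[page]) next[target] += share;
        }
        rank = next;
    }
    return rank;
}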

So if from incoming links we infer that a page is about “scalable algorithms”, and from the quality of the pages with incoming links we infer the quality of the page itself — and we maintain a database of connections between topics and pages — then when someone searches for “scalable algorithms”, we take the ids of “scalable” and “algorithms”, find pages about both, sort them by “quality”, create a web page, and festoon it with sponsored links and ads. By golly, we’ve got a search engine!

Node-Webkit Development

I’m in the process of porting RiddleMeThis from Realbasic (er Xojo) to Node-Webkit (henceforth nw). The latest version of nw allows full menu customization, which means you can produce pretty decently behaved applications with it, and they have the advantage of launching almost instantly and being able to run the same codebase everywhere (including online and, using something like PhoneGap, on mobile). It has the potential to be cross-platform nirvana.

Now, there are IDEs for nw, but I’m not a big fan of IDEs in general, and I doubt that I’m going to be converted to them any time soon. In the meantime, I’ve been able to knock together applications using handwritten everything pretty damn fast (as fast as with Realbasic, for example, albeit with many of the usual UI issues of web applications).

Here’s my magic sauce — a single shell script that I simply put in a project directory and double-click to perform a build:

#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "$DIR"
cd "$DIR"
pwd
zip app.nw *.html package.json *.png *.js *.css *.map
mv app.nw ../node-webkit.app/Contents/Resources/
cp nw.icns ../node-webkit.app/Contents/Resources/
cp info.plist ../node-webkit.app/Contents/
open ../node-webkit.app

What this shell script does is set the working directory to the project directory, zip together the source files and name the archive app.nw, and then move that, the icon, and the (modified) info.plist into the right place (assuming the node-webkit app is in the parent directory), and launch it. This basically takes maybe 2s before my app is running in front of me.

info.plist is simply copied from the basic nw app, and modified so that the application name and version are what I want to appear in the about box and menubar.

This is how I make sure the standard menus are present (on Mac builds):

// appName is whatever string you want to appear in the application menu
var gui = require('nw.gui'),
    menubar = new gui.Menu({ type: 'menubar' });
if(process.platform === 'darwin'){
    menubar.createMacBuiltin(appName); // adds the standard App, Edit, and Window menus
}
gui.Window.get().menu = menubar;

And finally, to enable easy debugging:

if(dev){
    // add a Dev menu with an "Open Debugger" item (only when dev is truthy)
    var debugItem = new gui.MenuItem({ label: 'Dev' }),
        debugMenu = new gui.Menu(),
        menuItem;
    debugItem.submenu = debugMenu;
    menubar.append(debugItem);
    menuItem = new gui.MenuItem({ label: "Open Debugger" });
    debugMenu.append(menuItem);
    menuItem.click = function(){
        gui.Window.get().showDevTools();
    };
}

So with a few code snippets you get a Mac menubar (where appropriate), a Dev menu (if dev is truthy) and a single double-click to package and build your application in the blink of an eye. You can build your application with whatever Javascript / HTML / CSS combination suits you best.

Obviously, you don’t get a real “native” UI (but the next best thing is a web UI, since people expect to deal with web applications on every platform), and this isn’t going to be a great way to deliver performant applications that do heavy lifting (e.g. image processing), but it’s very easy to orchestrate command-line tools, and of course the chrome engine offers a ridiculous amount of functionality out of the box.

I should mention that Xojo is actually quite capable of producing decent Cocoa applications these days. (It sure took a while.) But its performance is nowhere near nw in practical terms. For example, C3D Buddy is able to load thousands of PNGs in the blink of an eye; the equivalent Xojo application is torpid. Now if I were doing heavy lifting with Javascript, perhaps it would tilt Xojo’s way, but it’s hard to think of just how heavy the lifting would need to get.

Daikon Forge

Daikon Forge lets you edit your UI live in the scene editor and preview it in the game view.

Progress continues on Project Weasel, and last weekend I bought a license for Daikon Forge ($90 in the Unity Asset Store) after mulling over the Unity GUI situation for over two years (basically since Unity announced its new GUI system in San Francisco in 2011). Like many Unity developers, I’ve been waiting for Unity to fix its execrable GUI library for some time, and had avoided investing in nGUI, Daikon Forge, or Coherent because something better — and free — was coming out real soon now.

I reached the point where not having a functional UI library was blocking progress, so I decided I was going to buy a better UI library — the question was which one? nGUI has been the market leader, but from what I’ve seen the API is pretty terrible (and the fact that the nGUI dev is the guy building the new Unity GUI does not fill me with enthusiasm). Coherent — based on HTML5 — seems like a great idea, but according to reviews I’ve read it consumes quite a bit of CPU on mobile devices (and you’d expect it to, given that web browsers can tax mobile devices). From what I could see, Daikon Forge was the least expensive option and possibly the best (if not as widely supported as nGUI). It’s very well reviewed, including by quite a few nGUI refugees. It also struck me that the Unity GUI would probably be, in effect, an improved nGUI, so I’ll have the best of both worlds once it actually ships.

Anyway, so far I’ve found Daikon Forge to be pretty nice, but not without issues. It’s worth noting, though, that all of the weaknesses I’ve seen so far affect the developer and not the delivered product.

(Mostly) Good Stuff

I whipped up a 3D pinned information panel in less than an hour.

To create a user interface with Daikon Forge you use a wizard to set up a “manager”. On the plus side, it works! On the minus side it’s a “wizard” — automatic fail. I’d prefer you to be able to drag a prefab into the scene, or drag a script onto a GameObject, or select a GameObject and pick a menu item. The sad thing is that there’s no reason it can’t work that way. Functionality A. Usability C.

Next, you need to assign a default skin and a default font to the UI gadget. In my opinion it should Just Work without this. (How? Easy — stick a default skin and a default font in a Resources folder and load them on the fly if needed.) This particular bit initially stymied me (the picker didn’t “see” the relevant assets), and I needed to find the relevant components manually and drag them into place. Functionality B. Usability F.

Daikon Forge makes creating and maintaining a UI atlas very easy.
Using a tool like Sketch (this is the old version) you can whip up a UI skin in a matter of minutes.

The tools for creating your own skins and fonts are almost brilliant. (Select the assets and pull a menu item — and you can easily add items to and remove items from your UI atlas any time you need to.) Functionality A. Usability B.

Next, you can right-click the UI “gizmo” in the scene editor and add controls to it directly. This also stymied me because the context menu does not work when your inspector panel is in debug mode, which mine happened to be in at the time. (Who knew?) Again, on the plus side it works, on the down side many controls come with idiotic defaults (e.g. huge or dimensionless). Functionality B. Usability C.

So I finally managed to get my Galaxy working so that I could click on a star and have a little information display pin itself to the star and show information like the name and spectral class of the star and how many planets it has orbiting it (and the display is not scaled, renders in a minimum of draw calls, and is pixel perfect, but is pinned to the projected 3d position of the star — very cool). On the downside, I found the documentation for binding values to controls to be incomprehensible, and did it all by brute force. Functionality C (so far). Usability F.

Bad Stuff

Every widget has a lot of properties, and they’re not given sensible defaults.

First of all, there’s a big workflow problem. When I first imported Daikon Forge into my project it didn’t work properly (e.g. when I tried to assign a skin to one of Daikon Forge’s UI managers it wouldn’t find the relevant assets). This problem eventually “went away” (no idea why) but it was a sign of things to come.

Second, I keep my project in Dropbox and would work on it from my laptop and desktop at different times. When I switched machines, the other machine would load the changed files and Just Work. But after I installed Daikon Forge this process broke completely: Daikon Forge would need to be reimported whenever I switched machines. Worse, every script component assignment would become detached from its relevant script (only for Daikon Forge scripts) and have to be manually repaired. I don’t have a fix for this except to stop using Dropbox, which might not be a Bad Thing.

Third, the documentation is good as a class reference, but bad as a guide/overview/cookbook. The tutorials are good but video only and somewhat lengthy. The examples are good but no substitute for actual documentation. Then there’s the website. To post on the website you need to register, and when I try to register it tells me my username or email address flags me as a spammer and that if I want to fix this I need to reply (how?) or use the contact form (where?). I tried my usual handle (podperson) and my usual email address, and my real name and my gmail address — all spammers apparently.

The WYSIWYG editing of your interface is great as far as it goes, but that’s not far. E.g. there are guides, but nothing snaps to them. There’s a grid but nothing snaps to it. There’s a tool to distribute multiple controls horizontally or vertically, but it doesn’t work properly. Worst of all, simply moving and editing things is very finicky. You can easily get into a state where the transform widget is overlapping some other control (e.g. a sizing handle) so that whenever you try to move a control you actually resize it. I ended up resorting to the keyboard for moving controls, and directly entering values for sizing them most of the time.

In a nut

Daikon Forge delivers on its promises in terms of functionality and performance. It renders user interfaces efficiently and attractively. It’s very easy to create and maintain texture atlases. And it supports scalable fonts. Furthermore, its design leverages idiomatic Unity development (e.g. you can make and reuse prefab controls in the obvious manner). But it is far from perfect: ironically for a UI building tool, its own UI is spotty, to put it charitably. Thus far I’ve found the website to be borderline useless and the documentation weak. I’m not sure the video tutorials and examples really fill the gap (I haven’t had the time or inclination to watch all the tutorials).

C#’s Belt and Suspenders

In discussing my (fairly minor) annoyances with C# with a colleague, I finally figured out the nub of the problem. C# tries to protect you from certain bugs (overflowing bounds and buffers in particular) in so many different ways that the relevant warnings and fixes become irritations and noise, probably making code less reliable rather than more.

E.g. why does assigning a double precision value to a float stop code from compiling? What’s the problem this solves?

Another example: why does moving values between unsigned and signed integer types stop compilation? The main reason for worrying about integer overflows, or getting a negative number when you expect a non-negative one, is array bounds checking — but C# has array bounds checking, so why do this at all?
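
For concreteness, these are the sorts of lines I mean (a contrived snippet, not from my project):

// A contrived snippet showing the friction:
double d = 1.5;
float f = d;               // does not compile: no implicit double -> float conversion
float g = (float)d;        // fine: explicit narrowing cast
float h = 1.5f;            // fine: the 'f' suffix makes the literal a float

int i = -1;
uint u = i;                // does not compile: no implicit int -> uint conversion
uint v = (uint)i;          // compiles, but silently wraps to 4294967295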

It seems to me that C# ought to treat all numbers with periods or exponents as double precision by default and allow assignment to float with implicit casting. This emphasizes correctness (precision) over performance by default, which is as it should be. Similarly, assigning an int to a uint should simply be allowed. (The compiler can insert code to throw an error if the value is negative if so desired.)

Now, I should say I’m using Unity’s (i.e. Mono’s) C# compiler here, so perhaps Microsoft deals with these annoyances at the IDE level (e.g. by offering to automatically fix the relevant problems), in much the same way as Apple has leveraged compiler technology to provide automatic fixes, optimizations, and suggested improvements in Xcode.

C# Part II

Galaxy Generated and Rendered in Unity 3D

In the end, it took me four evenings to replicate the functionality of my Javascript prototype. I should point out that I was able to do some really nice UI stuff (e.g. drag to rotate, mousewheel to zoom) in Unity that I hesitated to mess with in Javascript (where the galaxy was rendered to a canvas).

Now that it’s done, I have some impressions.

First, C# is very strict about types, and while I can see some benefits, I think a lot of it is utterly wasted. E.g. having to constantly cast from one numeric type to another is simply a pain in the butt (I routinely expect to have to tease apart a bunch of cryptic type-related errors every time I compile a few lines of new code).

// given an array of weights, pick an index with the corresponding weighted probability,
// e.g. [1,2,3] -> 1/6 chance of 0, 2/6 chance of 1, 3/6 chance of 2
public uint Pick( uint[] weights ){
    int s = 0;
    uint idx;
    foreach( uint w in weights ){
        s += (int)w;
    }
    s = Range (1, s); // returns a random int from 1 to s, inclusive
    for( idx = 0; idx < weights.Length; idx++ ){
        s -= (int)weights[idx];
        if(s <= 0){
            break;
        }
    }
    return idx;
}

And all of this rigor didn’t actually prevent or even help debug an error I ran into with overflowing a uint: I have a utility that picks weighted random numbers, but I overflowed the weights, leading to an out-of-bounds error. (A simple fix is to change idx < weights.Length to idx < weights.Length - 1.) On the whole it would be nice if you could simply live in a world of “numbers” (as in Javascript) and only convert explicitly to a specific representation when doing something low level.
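
For what it’s worth, here’s a sketch of one way to sidestep both problems — sum the weights in a long and clamp the returned index. This isn’t the code I actually use, and it assumes the same Range() helper as above:

// Sketch of a safer variant (not the code above): sum the weights in a long so
// they can't overflow, and clamp the returned index so it can't run off the end.
public uint PickSafe( uint[] weights ){
    long total = 0;
    foreach( uint w in weights ){
        total += w;                        // a long accumulator won't overflow for any sane weights array
    }
    // Clamping an absurdly large total distorts the distribution slightly, but can't crash.
    long s = Range( 1, (int)Math.Min( total, (long)int.MaxValue ) );
    for( uint idx = 0; idx < weights.Length; idx++ ){
        s -= weights[idx];
        if( s <= 0 ){
            return idx;
        }
    }
    return (uint)( weights.Length - 1 );   // the same clamp as the suggested fix above
}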

Second, there’s this weird restriction on naming classes within a namespace such that the namespace and class sometimes may not match and the class you want is often not in the namespace you expect. (E.g. I wanted to create the namespace LNM.PRNG and define the class PRNG in it, but this confuses the compiler, so I ended up calling the namespace LNM.Random — so code referring to this is “using LNM.Random” which mysteriously causes a class called PRNG to become available.) I don’t see why namespaces and class names can’t be the same.
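
In code, the workaround looks like this (just restating the arrangement described above):

// PRNG.cs -- the arrangement described above: the namespace is LNM.Random, the class is PRNG.
namespace LNM.Random {
    public class PRNG {
        // convenience wrappers around the Meisui.Random Mersenne Twister live here
    }
}

// In a consuming file:
//     using LNM.Random;      // this (unintuitively) is what brings PRNG into scope
//     var prng = new PRNG();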

Oddly enough in some cases I am allowed to name the class the same as the namespace, and I don’t know why. So LNM.Astrophysics implements the Astrophysics class, but I had to rename Badwords to BadwordFilter at some point because the compiler started complaining.

I’ve been using Monodevelop, which is an editor produced by the Mono folk and lightly customized to work with Unity. It’s a bit of a slug (if it’s developed using Mono, then it’s not a terrific advertisement for Mono). In particular, its autocomplete is great when it works, but utterly frustrating far more often. It fails to match obvious words (e.g. local variable names) and often makes it impossible to type something short (which matches something longer) on the first attempt. Still, the autocomplete is useful enough that I haven’t simply switched back to Sublime.

Flying my untextured spaceship around an undecorated, partially implemented solar system

So the current reckoning is that I ended up producing the following:

  • Astrophysics.cs — defines the LNM.Astrophysics namespace and implements the Astrophysics utility class, some useful enumerations, and the Galaxy, Star, and Planet classes.
  • PRNG.cs — defines the LNM.Random namespace and implements the PRNG class which provides convenience functions for the Meisui.Random Mersenne Twister implementation I’m using.
  • Badwords.cs — defines the LNM.Badwords namespace and implements the BadwordFilter class which checks to see if any of a pretty comprehensive list of nasty words is present in a given string. (Used to prevent obscene star names from being generated.)
  • Billboard.cs — a MonoBehaviour that keeps a sprite facing the camera. It’s a little cute in that it tries to save CPU time by only updating a given star every 10 physics updates. There’s probably a better way.
  • FixedCamera.cs — a MonoBehaviour that implements a mouse/touch controlled camera that keeps a fixed view of a specified target unless explicitly moved. I’m planning on using this as my main view control throughout the game.
  • NewtonianScoutship.cs — a MonoBehaviour implementing an Asteroids-style player ship. I’ve also experimented with a “Delta Vee” style abstracted Newtonian ship controller, but good old rotate-and-accelerate just feels better, and makes movement a pleasant challenge in and of itself. (It also makes becoming “stationary” in space almost impossible, which I think is a Good Thing.)
  • GalaxyGenerator.cs — a MonoBehaviour that instantiates a galaxy and renders it with sprites. (Right now every single star is a draw call, so I’m going to need to do some optimization at some point.)
  • Starsystem.cs — a MonoBehaviour that instantiates a Star (and its planets) and renders them, along with a navigation grid (to provide a sense of movement when passing through empty space) and orbits, using a bunch of Prefabs.

So, while I continue to find C# quite tedious to develop with, I am making significant progress for the first time in over a year, and while C# feels much less productive than real Javascript, I do think it’s very competitive with Unityscript, and it’s been quite easy to get over the hump.