Data-Table Part 2: literate programming, testing, optimization

I’ve been trying to build the ideas of literate programming into b8r (and its predecessor) for some time now. Of course, I haven’t actually read Knuth’s book or anything. I did play with Oberon for a couple of hours once, and I tried to use Light Table, and I have had people tell me what they think Knuth meant, but what I’ve tried to implement is the thing that goes off in my head in response to the term.

One of the things I’ve grown to hate is being forced to use a specific IDE. Rather than implement my own IDE (which would (a) be a lot of work and (b) at best produce Yet Another IDE people would hate being forced to use) I’ve tried to implement stuff that complements whatever editor, or whatever, you use.

Of course, I have implemented it for bindinator, and of course that’s pretty specific and idiosyncratic itself, but it’s a lot more real-world and less specific or idiosyncratic than — say — forcing folks to use a programming language no-one uses on an operating system no-one uses.

Anyway, this weekend’s exploit consisted of writing tests for my debounce, throttle, and — new — throttleAndDebounce utility functions. The first two functions have been around for a long time and are used extensively. I’m pretty sure they worked — but the problem is that the failure modes of such functions are pretty subtle.

Since all that they really do is make sure that if you call some function a lot of times it will not get called ALL of those times, and — in the case of debounce — that the function will be called at least once after the last time you call it, I thought it would be nice to make sure that they actually worked exactly as expected.
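For reference, a minimal sketch of the debounce contract described above (this is illustrative only, not b8r's actual implementation):

```javascript
// Minimal debounce sketch: fn runs once, `interval` ms after the *last* call.
// Illustrative only — not b8r's actual implementation.
const debounce = (fn, interval) => {
  let timeoutId = null
  return (...args) => {
    clearTimeout(timeoutId)                             // cancel any pending call
    timeoutId = setTimeout(() => fn(...args), interval) // reschedule from the latest call
  }
}
```

Every call resets the timer, so a burst of calls produces exactly one invocation after the burst ends — which is precisely the property the tests need to pin down.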

Spoiler alert: the functions did work, but writing solid tests around them was surprisingly gnarly.

The reason I suddenly decided to do this was that I had been trying to optimize my data-table (which is now virtual), and also show off how awesome it is, by adding a benchmark for it to my performance tools. This is a small set of benchmarks I use to check for performance regressions. In a nutshell it compares b8r’s ability to handle largish tables with vanilla js (the latter also cheating, e.g. by knowing exactly which DOM element to manipulate), the goal being for b8r to come as close to matching vanilla js in performance as possible.

I should note that these timings include the time spent building the data source, which turns out to be a mistake for the larger row counts.

So, on my laptop:

  • render table with 10k rows using vanilla js — ~900ms
  • render table with 10k rows using b8r — ~1300ms
  • render table with 10k rows using b8r’s data-table — ~40ms

Note: this is the latest code. On Friday it was a fair bit worse — maybe ~100ms.

Anyway, this is pretty damn cool. (What’s the magic? b8r is only rendering the visible rows of the table. It’s how Mac and iOS applications render tables. The lovely thing is that it “just works” — it even handles momentum scrolling — like iOS.)

Virtual javascript tables are nothing new. Indeed, I’ve had two implementations in b8r’s component examples for three years. But data-table is simple, powerful, and flexible in use.

So I added a 100k row button. Still pretty fast.

A little profiling revealed that it was wasting quite a bit of time on recomputing the slice and on type-checking (yes, the b8r benchmarks include dynamically type-checking all of the rows in the table against a heterogeneous array type where the matching type is the second type checked). So, I optimized the list computations and had the dynamic type-checking only check a subsample of arrays longer than 110 rows.
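The subsampling idea can be sketched like this (the name `matchesType` and the 110-row threshold are stand-ins for whatever the real benchmark uses):

```javascript
// Sketch: full type-check for small arrays, random subsample for big ones.
// `matchesType` is a stand-in for the real per-row type check.
const SUBSAMPLE_SIZE = 110
const checkRows = (rows, matchesType) => {
  if (rows.length <= SUBSAMPLE_SIZE) return rows.every(matchesType)
  // for big arrays, spot-check a random sample instead of every row
  for (let i = 0; i < SUBSAMPLE_SIZE; i++) {
    const row = rows[Math.floor(Math.random() * rows.length)]
    if (!matchesType(row)) return false
  }
  return true
}
```

The trade-off is obvious: a handful of malformed rows in a huge array might slip through, but the check goes from O(n) to O(1) in the row count.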

  • render table with 100k rows using b8r’s data-table — ~100ms

So I added a 1M row button. Not so fast. (Worse, scrolling was pretty bad.)

That was Thursday. And so in my spare time I’ve been hunting down the speed problems because (after all) displaying 40 rows of a 1M row table shouldn’t be that slow, should it?

Turns out that slicing a really big array can be a little slow (>1/60 of a second), and the scroll event that triggers it gets triggered every screen refresh. Ouchy. It seemed to me that throttling the array slices would solve this problem, but throttle was originally written to prevent a function from being called again within some interval of the last call starting (versus finishing). So I modified throttle to delay the next call based on when the function finishes executing.
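The modified behavior can be sketched like so — the cool-down timer starts when the wrapped function returns, not when it is called (illustrative, not b8r's actual code):

```javascript
// Throttle sketch: the cool-down interval is measured from when fn
// *finishes*, so a slow fn can't be re-entered on the very next tick.
const throttle = (fn, interval) => {
  let busy = false
  return (...args) => {
    if (busy) return          // drop calls during the cool-down
    busy = true
    fn(...args)               // run (synchronously, in this sketch)
    setTimeout(() => { busy = false }, interval) // cool-down starts after fn returns
  }
}
```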

Also, you don’t want to use throttle alone, because then the last call to the throttled function (the array slice) won’t be made, which could leave you with a missing array row on the screen. You also don’t want to use debounce, because then the list won’t update until it stops scrolling, which is worse. What you want is a function that is throttled AND debounced. (And composing the two does not work — either way you end up debounced.)

So I needed to write throttleAndDebounce and convince myself it worked. I also wanted to convince myself that the new, subtly different throttle worked as expected. And since the former was going to be used in data-table, which I think is going to be a very important component, I really wanted to be sure it was solid.
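The combined behavior can be sketched like this (again illustrative — the real throttleAndDebounce lives in b8r's utility library):

```javascript
// throttleAndDebounce sketch: leading calls are rate-limited (throttle)
// AND a trailing call always fires after the burst ends (debounce),
// so the final array slice is never dropped.
const throttleAndDebounce = (fn, interval) => {
  let trailingId = null
  let lastRun = 0
  return (...args) => {
    clearTimeout(trailingId)
    const now = Date.now()
    if (now - lastRun >= interval) {
      lastRun = now
      fn(...args)                       // throttled leading call
    } else {
      trailingId = setTimeout(() => {   // debounced trailing call
        lastRun = Date.now()
        fn(...args)
      }, interval)
    }
  }
}
```

During a scroll this fires immediately, at most once per interval while scrolling continues, and once more after scrolling stops — exactly the behavior the slice update needs.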

  • render table with 1M rows using b8r’s data-table — ~1100ms

Incidentally, ~600ms of that is rendering the list, the rest is just building the array.

At this point, scrolling is smooth but the row refreshes are not (i.e. the list scrolls nicely but the visible row rendering can fall behind; live filtering the array, on the other hand, is acceptably fast). Not too shabby, but perhaps meriting further work.

Profiling now shows the last remaining bottleneck I have any control over is buildIdPathValueMap — this is itself a function that optimizes id-path lookup deep in the bowels of b8r. (b8r implements the concept of a data-path, which lets you uniquely identify any part of a javascript object much as you would in javascript, e.g. foo.bar.baz[17].lurman, except that inside the square brackets you can put an id-path such as app.messages[uuid=0c44293e-a768-4463-9218-15d486c46291].body and b8r will look up that exact element in the array for you.)
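A toy resolver shows the idea (a simplified stand-in, not b8r's actual getByPath):

```javascript
// Toy id-path resolver: handles dotted keys, numeric indices, and
// [key=value] id-paths. Simplified stand-in for b8r's real getByPath.
const getByPath = (obj, path) => {
  const parts = path.match(/[^.[\]]+|\[[^\]]+\]/g) || []
  return parts.reduce((node, part) => {
    if (node == null) return undefined
    if (part.startsWith('[')) {
      const inner = part.slice(1, -1)
      if (inner.includes('=')) {                 // id-path: scan for a match
        const [key, value] = inner.split('=')
        return node.find(item => String(item[key]) === value)
      }
      return node[Number(inner)]                 // plain numeric index
    }
    return node[part]                            // ordinary property access
  }, obj)
}
```

Note the cost hiding in that `find`: a naive id-path lookup scans the array, which is exactly why big lists need the hash described below.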

The thing is, buildIdPathValueMap is itself an optimization specifically targeting big lists, added when I first wrote the benchmark. b8r was choking on big lists, and the id-path lookups were the culprit. So, when you use an id-path on a list, b8r builds a hash based on the id-path once, so you don’t need to scan the list each time. The first time the hash fails, it gets regenerated (for a big, static list the hash only needs to be generated once). For my 1M row array, building that hash takes ~300ms. The problem was that it kept regenerating that hash.
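The optimization amounts to trading one O(n) pass for O(1) lookups afterwards — something like this (a hedged sketch of the idea; the real buildIdPathValueMap handles arbitrary id-paths):

```javascript
// Sketch: map each row's id value to its index so id-path lookups
// become O(1). Assumes a simple top-level id property for brevity.
const buildIdPathValueMap = (list, idProp) => {
  const map = {}
  list.forEach((item, index) => { map[item[idProp]] = index })
  return map
}

// Lookup via the cached map; a stale entry signals the caller
// to rebuild the map and retry.
const lookupById = (list, map, idProp, value) => {
  const index = map[value]
  if (index === undefined || list[index] === undefined ||
      String(list[index][idProp]) !== String(value)) {
    return undefined // cache miss — caller rebuilds the map
  }
  return list[index]
}
```

The failure mode described next is exactly that "cache miss" branch firing when it shouldn't.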

So, eventually I discovered a subtle flaw in biggrid (the thing that powers the virtual slicing of rows for data-table). b8r uses id-paths to optimize list-bindings so that when a specific item in an array changes, only the exact elements bound to values in that list item get updated. And, to save developers the trouble of creating unique list keys (something ReactJS developers will know something about), b8r allows you to use _auto_ as your id-path and guarantees it will efficiently generate unique keys for bound lists.

b8r generates _auto_ keys in bindList, but bindList only generates keys for the list it sees. biggrid slices the list, exposing only visible items, which means _auto_ is missing for hidden elements. When the user scrolls, new elements are exposed which b8r then detects and treats as evidence that the lookup hash for the id-path (_auto_) is invalid. A 300ms thunk later, we have a new hash.

So, the fix (which impacts all filtered lists where the _auto_ id-path is used and the list is not initially displayed unfiltered, not just biggrid) is to force _auto_ keys to be generated for the entire list during the creation of the hash. As an added bonus, since this applies to just one specific property, but a common case, in the common case we can avoid using getByPath to evaluate each list item’s value at the id-path and just write item._auto_ instead.
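The fix boils down to something like this (a sketch of the idea, not b8r's actual code):

```javascript
// Sketch: assign _auto_ keys to *every* item up front, so rows that
// scroll into view later can't invalidate the id-path hash.
let autoCount = 0
const ensureAutoKeys = list => {
  for (const item of list) {
    if (item._auto_ === undefined) item._auto_ = ++autoCount
  }
  return list
}
```

Run once during hash creation, this guarantees hidden rows already carry keys when they scroll into view; and because _auto_ is a plain top-level property, the hash builder can read item._auto_ directly rather than going through the general-purpose path evaluator.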

  • render table with 1M rows using b8r’s data-table — ~800ms

And now scrolling does not trigger Chrome’s warning about scroll event handlers taking more than a few ms to complete. Scrolling is now wicked fast.

As the Wwworm Turns

Microsoft’s recent announcement that it is, in effect, abandoning the unloved and unlamented Edge browser stack in favor of Chromium is, well, both hilarious and dripping in irony.

Consider at first blush the history of the web in the barest terms:

  • 1991 — http, html, etc. invented using NeXT computers
  • 1992 — Early browsers (Mosaic, Netscape, etc.) implement and extend the standard; notably Netscape adds Javascript and tries to make frames and layers a thing. Also, the <blink> tag.
  • 1995 — Microsoft “embraces and extends” standards with Internet Explorer and eventually achieves a 95% stranglehold on the browser market.
  • 1997 — As Netscape self-destructs and Apple’s own OpenDoc-based browser “Cyberdog” fails to gain any users (mostly due to being OpenDoc-based), Apple begs Microsoft for a slightly-less-crummy version of IE5 to remain even vaguely relevant/useful in an era where most web stuff is only developed for whatever version of IE (for Windows) the web developer is using.
  • 2002 — Firefox rises from the ashes of Netscape. (It is essentially a cross-platform browser based on Camino, a similar Mac-only browser that was put together by developers frustrated by the lack of a decent Mac browser.)
  • 2003 — Stuck with an increasingly buggy and incompatible IE port, Apple develops its own browser based on KHTML after rejecting Netscape’s Gecko engine. The new browser is called “Safari”, and Apple’s customized version of KHTML is open-sourced as Webkit.
  • As a scrappy underdog, Apple starts a bunch of small PR wars to show that its browser is more standards-compliant and runs javascript faster than its peers.
  • Owing to bugginess, neglect, and all-round arrogance, Microsoft gradually loses a significant portion of market share to Firefox (and, on the Mac, Safari — which is at least as IE-compatible as the aging version of IE that runs on Macs). Google quietly funds Firefox via ad-revenue-sharing, since it is in Google’s interest to break Microsoft’s stranglehold on the web.
  • 2007 — Safari having slowly become more relevant to consumers as the best browser on the Mac (at least competitive with Firefox functionally and much faster and more power efficient than any competitor) is suddenly the only browser on the iPhone. Suddenly, making your stuff run on Safari matters.
  • 2008 — Google starts experimenting with making its own web browser. It looks around for the best open source web engine, rejects Gecko, and picks Webkit!
  • Flooded with ad revenue from Google, and divorced from any sense of user accountability, Firefox slowly becomes bloated and arrogant, developing an email client and new languages and mobile platforms rather than fixing or adding features to the only product it produces that anyone cares about. As Firefox grows bloated and Webkit improves, Google Chrome benefits as, essentially, Safari for Windows. (Especially since Apple’s official Safari for Windows is burdened with a faux-macOS-“metal” UI, and users are tricked into installing it with QuickTime.) When Google decides to turn Android from a Sidekick clone into an iPhone clone, it uses its Safari clone as the standard browser. When Android becomes a success, suddenly Webkit compatibility matters a whole lot more.
  • 2013 — Google is frustrated by Apple’s focus on end-users (versus developers). E.g. is the increase in size and power consumption justified by some kind of end-user benefit? If “no” then Apple simply won’t implement it. Since Google is trying to become the new Microsoft (“developers, developers, developers”) it forks Webkit so it can stop caring about users and just add features developers think they want at an insane pace. It also decides to completely undermine the decades-old conventions of software numbering and make new major releases at an insane pace.
  • Developers LOOOOVE Chrome (for the same reason they loved IE). It lets them reach lots of devices, it has lots of nooks and crannies, it provides functionality that lets developers outsource wasteful tasks to clients, if they whine about some bleeding edge feature Google will add it, whether or not it makes sense for anyone. Also it randomly changes APIs and adds bugs fast enough that you can earn a living by memorizing trivia (like the good old days of AUTOEXEC.BAT) allowing a whole new generation of mediocrities to find gainful employment. Chrome also overtakes Firefox as having the best debug tools (in large part because Firefox engages in a two year masturbatory rewrite of all its debugging tools which succeeds mainly in confusing developers and driving them away).
  • 2018 — Microsoft, having seen itself slide from utter domination (IE6) to laughingstock (IE11/Edge), does the thing-that-has-been-obvious-for-five-years and decides to embrace and extend Google’s Webkit fork (aptly named “Blink”).

b8r 1.0

I decided to make the current revision of b8r “1.0” (it’s still marked “prerelease”) based on the fact it’s proven solid and usable during a year of constant use and improvement. It has been at least as robust and easy to work with as the jQuery-dependent library we developed at USPTO. I’ve just updated bindinator.com and my galaxy demo.

Recently, I made the first deliberately breaking changes and the difficulty of adapting various codebases that use b8r was pretty minor. So, I’m pretty confident that b8r is in good shape.

b8r’s “fiddle” component in action — b8r’s fiddle.component.js currently weighs in at 272 loc including markup, css, and comments.

I also improved the appearance of the inline fiddle and test components, and added prism.js code-rendering to all the various inline code examples to make the documentation pages look snazzier. A nice change to the test library makes sure that async test results are consistently ordered, and adds a visible “pending” state so you can see tests that somehow failed to complete.

b8r has some pretty nice stuff. (Although much of this nice stuff needs to be documented.) You can download b8r, put nwjs in your Applications directory and/or npm install electron, and double-click a .command file to see the b8r documentation inside a desktop app. Or you can install nodejs, double-click a .command file, and serve it locally via https (I also provide a .command file that will generate local ssl keys). (The .command stuff is currently Mac-only, for which I apologize. I imagine it would be very easy to do it for Linux and Windows, but I haven’t tried.)

“electron-file-browser” component (running inside nwjs)

There’s a cute feature if you load the b8r documentation in nodejs or electron and command-click on a component — the component is loaded in a new window. I’m planning on leveraging this functionality to let the documentation app function as an IDE.

I’m currently working on convenience methods for multi-window desktop applications (it would be super cool if you could transparently bind across windows and browser tabs). I’ve also written a new version of foldermark that uses a very simple PHP back end (nodejs servers are still a much bigger pain to deal with than PHP) combined with b8r on the client-side.

The biggest shortcoming of b8r remains the fact that my team is the only group really using it. Because we’re developing a desktop app using Electron, we aren’t constantly testing on Edge, Firefox, Safari, etc. I know for a fact that it has problems in IE and Edge, that some of the example components aren’t touch-friendly, and that we’re definitely doing more stuff for Electron than for nwjs (nwjs is much simpler to work with, but it’s becoming increasingly irrelevant, I fear). But if you’re working in reasonably recent releases of Chrome or Chromium, b8r should be very solid.

So, that’s the way it is: b8r 1.0.

Heterogeneous Lists

b8r’s demo site uses a heterogeneous list to display source files with embedded documentation, tests, and examples

One of the things I wanted to implement in bindinator was heterogeneous lists, i.e. lists of things that aren’t all the same. Typically, this is implemented by creating homogeneous lists and then subclassing the list element, even though the individual list elements may have nothing more in common with one another than the fact that, for the moment, they’re in the same list.

This ended up being pretty easy to implement in two different ways.

The first thing I tried was effectively an empty (as in no markup) “router” component. Instead of binding the list to a superclass component, you bind to a content-free component (code, no content) which figures out which component you really want programmatically (so it can be arbitrarily complex) and inserts it over itself. This is a satisfactory option because it handles both simple and complex cases quite nicely, and didn’t actually touch the core of bindinator.

Here’s the file-viewer code in its entirety:

<script>
    switch (data.file_type || data.url.split('.').pop()) {
        case 'md':
        case 'markdown':
            b8r.component('components/markdown-viewer').then(viewer => {
                b8r.insertComponent(viewer, component, data);
            });
            break;

        case 'text':
            b8r.component('components/text-viewer').then(viewer => {
                b8r.insertComponent(viewer, component, data);
            });
            break;

        case 'js':
            b8r.component('components/literate-js-viewer').then(viewer => {
                b8r.insertComponent(viewer, component, data);
            });
            break;
    }
</script>

(I should note that this router is not used for a list in the demo site, since the next approach turned out to meet the specific needs for the demo site.)

The example of this approach in the demo code is the file viewer (used to display markdown, source files, and so on). You pass it a file and it figures out what type of file it is from the file type and then picks the appropriate viewer component to display it with. In turn this means that a PNG viewer, say, need have nothing in common with a markdown viewer, or an SVG viewer. Or, to put it another way, we can use a standalone viewer component directly, rather than creating a special list component and mixing-in or subclassing the necessary stuff in.

You’ll note that this case is pretty trivial — we’re making a selection based directly on the file_type property — and I thought it shouldn’t be necessary to use a component or write any code for such a simple case.

The second approach was that I added a toTarget called component_map that let you pick a component based on the value of a property. This maps onto a common JSON pattern where you have a heterogeneous array of elements, each of which has a common property (such as “type”). In essence, it’s a toTarget that acts like a simple switch statement, complete with allowing a default option.

The example of this in the demo app is the source code viewer which breaks up a source file into a list of different kinds of things (source code snippets, markdown fragments, tests, and demos). These in turn are rendered with appropriate components.

This is what a component_map looks like in action:

<div
  data-list="_component_.parts"
  data-bind="component_map(
    js:js-viewer|
    markdown:markdown-viewer|
    component:fiddle|
    test:test
  )=.type"
>
  <div data-component="loading"></div>
</div>
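The spec inside component_map(...) is essentially a tiny switch table; a hypothetical parser for it might look something like this (the names here are mine, not b8r's internals):

```javascript
// Hypothetical sketch: parse 'js:js-viewer|markdown:markdown-viewer'
// into a { type: componentName } lookup table.
const parseComponentMap = spec => {
  const map = {}
  for (const pair of spec.split('|')) {
    const [type, component] = pair.split(':').map(s => s.trim())
    map[type] = component
  }
  return map
}

// Picking a component then reduces to a single lookup,
// with an optional default for unmapped types.
const pickComponent = (map, type, fallback) => map[type] || fallback
```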

From my perspective, both of these seem like pretty clean and simple implementations of a common requirement. The only strike against component_map is obviousness, and in particular the quasi-magical difference between binding to _component_.parts vs. .type, which makes me think that while the latter is convenient to type, forcing the programmer to explicitly bind to _instance_.type might be clearer in the long run.

P.S.

Anyone know of a nice way to embed code in blog posts in WordPress? All I can find are tools for embedding hacks in wordpress.

Node-Webkit Development

I’m in the process of porting RiddleMeThis from Realbasic (er Xojo) to Node-Webkit (henceforth nw). The latest version of nw allows full menu customization, which means you can produce pretty decently behaved applications with it, and they have the advantage of launching almost instantly and being able to run the same codebase everywhere (including online and, using something like PhoneGap, on mobile). It has the potential to be cross-platform nirvana.

Now, there are IDEs for nw, but I’m not a big fan of IDEs in general, and I doubt that I’m going to be converted to them any time soon. In the meantime, I’ve been able to knock together applications using handwritten everything pretty damn fast (as fast as with Realbasic, for example, albeit with many of the usual UI issues of web applications).

Here’s my magic sauce — a single shell script that I simply put in a project directory and double-click to perform a build:

#!/bin/bash
# cd to the directory containing this script
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "$DIR"
cd "$DIR"
pwd
# zip the sources into app.nw and drop everything into the nw app bundle
zip app.nw *.html package.json *.png *.js *.css *.map
mv app.nw ../node-webkit.app/Contents/Resources/
cp nw.icns ../node-webkit.app/Contents/Resources/
cp info.plist ../node-webkit.app/Contents/
open ../node-webkit.app

What this shell script does is set the working directory to the project directory, zip together the source files and name the archive app.nw, and then move that, the icon, and the (modified) info.plist into the right place (assuming the node-webkit app is in the parent directory), and launch it. This basically takes maybe 2s before my app is running in front of me.

info.plist is simply copied from the basic nw app, and modified so that the application name and version are what I want to appear in the about box and menubar.

This is how I make sure the standard menus are present (on Mac builds):

var gui = require('nw.gui'),
    menubar = new gui.Menu({ type: 'menubar' });
if(process.platform === 'darwin'){
    // appName is defined elsewhere; it's what appears in the application menu
    menubar.createMacBuiltin(appName);
}
gui.Window.get().menu = menubar;

And finally, to enable easy debugging:

if(dev){
    var debugItem = new gui.MenuItem({ label: 'Dev' }),
        debugMenu = new gui.Menu(),
        menuItem;
    debugItem.submenu = debugMenu;
    menubar.append(debugItem);
    menuItem = new gui.MenuItem({ label: "Open Debugger" });
    debugMenu.append(menuItem);
    menuItem.click = function(){
        gui.Window.get().showDevTools();
    }
}

So with a few code snippets you get a Mac menubar (where appropriate), a Dev menu (if dev is truthy) and a single double-click to package and build your application in the blink of an eye. You can build your application with whatever Javascript / HTML / CSS combination suits you best.

Obviously, you don’t get a real “native” UI (but the next best thing is a web UI, since people expect to deal with web applications on every platform), and this isn’t going to be a great way to deliver performant applications that do heavy lifting (e.g. image processing), but it’s very easy to orchestrate command-line tools, and of course the chrome engine offers a ridiculous amount of functionality out of the box.

I should mention that Xojo is actually quite capable of producing decent Cocoa applications these days. (It sure took a while.) But its performance is nowhere near nw in practical terms. For example, C3D Buddy is able to load thousands of PNGs in the blink of an eye; the equivalent Xojo application is torpid. Now if I were doing heavy lifting with Javascript, perhaps it would tilt Xojo’s way, but it’s hard to think of just how heavy the lifting would need to get.