HTML5 vs. Flash Video Playback

While I was playing around with my audio/video tag swapping code, it struck me how lousy my test video looked in the Flash player in full-screen mode.

Cropped zoomed video from Flash video player

And for comparison:

Cropped zoomed video from HTML5 full-screen

The original video is 480×270 (it’s a tiny file!) and it’s being zoomed to 1680×1050 in both cases. The FLVPlayback component is set to maintain aspect ratio, and the underlying Flash player is on high quality with scale set to noscale. It’s a pretty stunning difference — Flash is using cheap “nearest neighbor” scaling while HTML5 is using the more expensive (but, with GPU acceleration, effectively free) bicubic scaling. WTF?
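To put a number on how hard the scaler is working here, this is a quick sketch of the aspect-ratio-preserving fit. The fitWithin helper is just my own illustration, not anything from Flash or the browser.

```javascript
// Hypothetical helper: scale a source rectangle to fit inside a target
// rectangle while preserving aspect ratio (the remainder is letterboxed).
function fitWithin(srcW, srcH, dstW, dstH) {
  const scale = Math.min(dstW / srcW, dstH / srcH);
  return {
    scale: scale,
    width: Math.round(srcW * scale),
    height: Math.round(srcH * scale),
  };
}

const fitted = fitWithin(480, 270, 1680, 1050);
console.log(fitted); // { scale: 3.5, width: 1680, height: 945 }
```

So every source pixel ends up covering a 3.5×3.5 patch of screen, which is exactly the situation where the choice of interpolation becomes painfully visible.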

It turns out that JWPlayer doesn’t have this problem. Why? Well, it uses fullScreenTakeover to enable hardware acceleration. I’d use the same approach, but that brings me back to puppeting the video control skin, which I can’t figure out how to do. (The basic problem is that the video playback controls aren’t standard controls, and exactly how the FLVPlayback component drives them is undocumented.)

Aside: note that Safari somehow manages to play hardware-accelerated h264 with perfect scaling in normal windows (not to mention on Windows). It doesn’t need to go fullscreen. If you think that’s because Apple uses private APIs, then check out Google Chrome, which manages the same trick (on Windows as well). And Flash has the same problem on Windows. So with Flash it’s “fullscreen mode” or “I’m sowwy but using hardware acceleration is too hard to implement”.

I’d love to just use JWPlayer, but it doesn’t support MP3, which gets me back to square one.

So I’m stuck with crappy (software) video scaling because (1) I can’t use Flash’s all-or-nothing full-screen video mode, which I can’t use because (2) Flash’s video playback controls are non-standard and how they work is not documented, and I had to go down this stupid route because (3) FLVPlayback doesn’t cope with audio files, and (4) the Sound and SoundChannel classes are completely different in architecture from FLVPlayback. In short, Flash sucks.

How to use HTML5 Video and Audio Tags Everywhere

My Flash Media Player in Action

A few weeks ago I started down the road of implementing some JavaScript that would scrape the HTML5 video and audio tags from a page and swap in Flash embeds on an as-needed basis. This would allow simple, clean HTML5 web pages to Just Work where the browser was capable, and Just Work (via Flash) otherwise.
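The decision at the heart of that kind of script looks roughly like the sketch below. The function name needsFlashFallback and the list of MIME types are illustrative, not lifted from my actual code; canPlayType is the standard media element method, which returns "", "maybe" or "probably". The stand-in objects at the bottom exist only so the sketch runs outside a browser.

```javascript
// Sketch only: decide whether a media element needs a Flash fallback.
// `el` is anything with the standard HTMLMediaElement canPlayType() method
// (or an object without it, as in a browser that doesn't know the tag).
// `types` is the list of MIME types for the sources on offer.
function needsFlashFallback(el, types) {
  if (!el || typeof el.canPlayType !== "function") {
    return true; // no native support for the tag at all
  }
  // canPlayType returns "", "maybe" or "probably"; only "" means "no".
  return !types.some(function (type) {
    return el.canPlayType(type) !== "";
  });
}

// Stand-ins for real elements (assumptions for the demo).
const h264Browser = { canPlayType: (t) => (t === "video/mp4" ? "maybe" : "") };
const oggOnlyBrowser = { canPlayType: (t) => (t === "video/ogg" ? "probably" : "") };

console.log(needsFlashFallback(h264Browser, ["video/mp4"]));    // false
console.log(needsFlashFallback(oggOnlyBrowser, ["video/mp4"])); // true
console.log(needsFlashFallback({}, ["video/mp4"]));             // true
```

In a page you would pass in document.createElement("video") or document.createElement("audio") and only build the Flash embed when the function returns true.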

Oddly enough, while I got all the video stuff done in an hour or two, with a few more hours of tweaking, audio proved to be surprisingly intractable.

To begin with, the audio tag itself is not terribly well implemented. E.g. the audio tag is supposed to support playlists, but as far as I can tell it isn’t implemented on any browser. So, scratch that functionality (for now, at least).

Next, unlike in QuickTime, audio doesn’t work like video with a “zero-size” visual component. So, you can’t dimension the audio tag’s control UI in any browser by setting the width. (In my ideal world, the audio, video, and img tags would be interchangeable and Just Work. But then I’ve been spoiled by QuickTime which has taken this view of media since version 1.0.)

Third, Flash, also unlike QuickTime, uses completely different and incompatible APIs to handle audio and video. (In Flash, as far as I can tell, external MP3s can only be played back in code using the Sound and SoundChannel APIs which lack rudimentary functionality, such as telling you whether or not something is actually playing.)

(It’s pretty depressing how all the folks who came after QuickTime and basically copied it did such a piss poor job.)

With Flash it gets quite a bit worse. To begin with, the FLVPlayback component is designed to work best with special controls that work in some bizarre and undocumented way. My initial thought was “why reinvent the wheel?” I built a bog-standard FLVPlayback wrapper and then tried to write glue code that would let the FLVPlayback controls work with the Sound and SoundChannel classes (which are just totally annoying, but that’s a separate piece). Big mistake. I even tried paring the problem down to an absolute minimum and posting it to Stack Overflow, with no luck. Oh well.

Every time I’ve built a video player in Flash before, I’ve ended up rolling my own controls from scratch. Now I knew why! So, back to the drawing board. I designed a very minimal UI in Illustrator, quickly got it working in Flash, and voilà.

Now, as a side note, let me just rant about how stupid the audio side of Flash is. With video you get the FLVPlayback component (actually two different ones) with a huge bunch of very easy-to-use (if ugly) controls that you can customize to any extent (you can wire just the controls you need, or simply pick from a huge array of prebuilt control panels and it all Just Works). At API level there are oodles of obvious convenience functions which will tell you where the playhead is, whether the video is playing, what its duration is, and so on. Exactly what you’d want and expect.

With sound there’s a Sound class and a SoundChannel class. To play a sound you create a SoundChannel (something like var s:SoundChannel = MySound.play( fromStartingPosition ) ). To stop a Sound, or even pause it, you stop the SoundChannel and then destroy it. (And you keep whatever state you’d like to track yourself, such as the playhead position, because it’s GONE.) There are precious few useful methods or properties (e.g. no way of actually telling if a sound is playing other than seeing if the SoundChannel still exists). The whole thing looks like an ugly afterthought that hasn’t received any love for years. (And I figured out how all this works by looking at the top-ranked open source Flash MP3 player around, so smarter people than I haven’t found any better option.)
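To make the bookkeeping concrete, here is the shape of the wrapper you end up writing, sketched in JavaScript rather than ActionScript. Everything here is a model, not real Flash code: PausableSound is a made-up name, and the fake sound object simply mimics the pattern described above (play from a position, get back a channel with a position in milliseconds and a stop method).

```javascript
// Sketch of the do-it-yourself state tracking, modelled in JavaScript.
// Assumption: sound.play(startMs) returns a channel object exposing
// `position` (milliseconds) and `stop()`.
function PausableSound(sound) {
  this.sound = sound;
  this.channel = null; // exists only while playing
  this.pausedAt = 0;   // the playhead we have to remember ourselves
}
PausableSound.prototype.play = function () {
  if (!this.channel) this.channel = this.sound.play(this.pausedAt);
};
PausableSound.prototype.pause = function () {
  if (!this.channel) return;
  this.pausedAt = this.channel.position; // save it before the channel goes away
  this.channel.stop();
  this.channel = null;
};
PausableSound.prototype.isPlaying = function () {
  return this.channel !== null; // the only "is it playing?" signal we have
};

// Fake sound for the demo: pretends 1.5 seconds have elapsed since play().
const fakeSound = {
  play: function (startMs) {
    return { position: startMs + 1500, stop: function () {} };
  },
};
const track = new PausableSound(fakeSound);
track.play();
track.pause();
console.log(track.isPlaying(), track.pausedAt); // false 1500
track.play();
console.log(track.isPlaying()); // true
```

Note that this wrapper has no idea when a track finishes on its own, so it will happily report that it is still playing after the sound has ended.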

Anyhoo, I’ve managed to create a reasonably elegant solution that Just Works for video and audio (and the edge case where you accidentally embed an MP3 in a video tag will also work, maybe). In essence, you can play H264 video and MP3s pretty much anywhere with this code, and all you have to do is use video and audio tags and link in the JavaScript file.

It’s not quite perfectly cleaned up yet, but it’s pretty nice. I need to do a bit of extra work to have the Flash audio player figure out when a track has actually ended, and (for my own purposes) I’ll need to extend the Flash player’s API and then write an abstraction layer for handling both HTML5 and Flash media transparently, and then I’ll implement track-list support on top of that (in essence, I need to do all this to meet functional requirements for Acumen).

Post Script

It’s amazing just how bad Flash is at its bread-and-butter functionality. If you look very closely at the screen grab at the top you’ll notice that there’s a pixel missing from the bottom-right of the progress rectangle. My immediate assumption was that either (a) I had inadvertently deleted one pixel of the outline somehow (it was an actual Flash “stroke”), or (b) there was a slight error in the dimensioning or positioning of the object (e.g. it wasn’t precisely located at 0,0 or its dimensions weren’t quite right).

I’ve been constantly amazed that a program chiefly aimed at and used by artists has problems rendering bitmaps correctly (until recently, any alpha-channelled bitmap was virtually guaranteed to be displayed incorrectly in Flash — it was such a common issue that artists would routinely add extra rows of pixels to their images in an attempt to forestall problems). It seems that Flash has now extended this to vector art. It turns out my rectangle was perfect in all respects, showed up correctly in the editing view, but mysteriously lost a pixel when “built”. So I deleted the rectangle and replaced the outline with a solid rectangle with another rectangle “subtracted from it”. And now it works.

Badly written Flash runs faster than even worse written HTML5

News at 11.

John Nack links to what Chris Black claims shows HTML5 being massively outperformed by Flash on mobile devices. Here’s the HTML5 link itself.

It’s a shame the JavaScript is so awful, e.g. here’s the refresh code (the comments are mine):

ctx.clearRect(0, 0, 500, 600) // erase the entire background, omit the semicolon because you can!
ctx.fillStyle = 'rgb(255,255,255)'; // set the background color to white
ctx.fillRect (0, 0, 500, 500); // erase it all again to be sure
ctx.fillStyle = 'rgb(0,0,0)';
ctx.beginPath();
ctx.arc(x, y, 20, 0, Math.PI * 2, true);
ctx.fill();
ctx.fillStyle = 'rgb(128,255,128)';
ctx.fillRect (0, 500-32, 500, 32); // draw the non-moving element every frame

You can be pretty sure that if you create a bouncing ball animation in Flash (and to do so you need not write a single line of code, which is definitely an advantage for Flash right now), Flash will not be so stupid as to erase the entire background twice each frame. In fact, Flash is highly optimized to draw as little as possible each frame (the SWF format also supports encoding deltas in the animation frames themselves, if I recall correctly, which means the runtime doesn’t even need to figure out how to minimally update the screen).

Anyway, I changed the code to (a) not erase the background even once each frame, and (b) only erase a reasonably minimal area around the previously rendered “ball” and voilà, massive framerate increase:

ctx.fillStyle = 'rgb(255,255,255)';
ctx.fillRect (lastball.x - 21, lastball.y - 21, 42, 42); // erase only the old ball (radius 20 plus a pixel of slop)
ctx.fillStyle = 'rgb(0,0,0)';
ctx.beginPath();
ctx.arc(x, y, 20, 0, Math.PI * 2, true);
lastball.x = x; // lastball is a global variable (eeew) = {x:0,y:0}
lastball.y = y;
ctx.fill();
ctx.fillStyle = 'rgb(128,255,128)';
ctx.fillRect (0, 500-32, 500, 32);

So, in a vaguely apples-to-apples comparison (at least without zooming), HTML5, when animating circles and rectangles, actually slightly outperforms Flash (based on my 150% framerate increase from a trivial optimization). It’s clear HTML5 is horribly unoptimized beyond this: e.g. zooming in* slows down the framerate significantly, even when the animation itself is offscreen (seriously, get an intern on that, stat, because not drawing stuff which is off-screen is not a rocket-science optimization).

Note: * on an iOS device. It seems to work just fine on the desktop.

As an aside: the way you’d optimize this properly in practice would be to clip all the rendering to an update region based on what’s moving around. (This would also optimize the redrawing of the static element.) If I were writing the backend of a tool that provided artists with the ability to create HTML5 animations, this is exactly how I would mechanically optimize the output at first pass. The way Flash optimizes (at least, iirc, by analyzing the vectors at “compile” time and optimizing deltas) is even more sophisticated and has an even greater upside. So the real question to my mind is, why does Flash perform so badly?
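Here is a rough sketch of what computing that update region might look like for the bouncing ball. All of the helper names (ballBounds, unionRect, clampRect, updateRegion) are mine and purely illustrative; the one-pixel pad mirrors the 21-pixel erase in my hack above.

```javascript
// Sketch: one update rectangle covering where the ball was and where it
// is now, clamped to the canvas. Helper names are illustrative only.
function ballBounds(x, y, radius, pad) {
  const r = radius + pad;
  return { x: x - r, y: y - r, w: 2 * r, h: 2 * r };
}
function unionRect(a, b) {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.w, b.x + b.w);
  const bottom = Math.max(a.y + a.h, b.y + b.h);
  return { x: x, y: y, w: right - x, h: bottom - y };
}
function clampRect(r, width, height) {
  const x = Math.max(0, r.x);
  const y = Math.max(0, r.y);
  const right = Math.min(width, r.x + r.w);
  const bottom = Math.min(height, r.y + r.h);
  return { x: x, y: y, w: Math.max(0, right - x), h: Math.max(0, bottom - y) };
}
function updateRegion(prev, next, radius, width, height) {
  const dirty = unionRect(ballBounds(prev.x, prev.y, radius, 1),
                          ballBounds(next.x, next.y, radius, 1));
  return clampRect(dirty, width, height);
}

// In a real frame you would then clip all drawing to the region r:
//   ctx.save(); ctx.beginPath(); ctx.rect(r.x, r.y, r.w, r.h); ctx.clip();
//   ...draw background, ball and green bar as usual...
//   ctx.restore();
const region = updateRegion({ x: 100, y: 100 }, { x: 104, y: 97 }, 20, 500, 500);
console.log(region); // { x: 79, y: 76, w: 46, h: 45 }
```

With the clip in place, the drawing code can stay naive (fill the background, draw everything) and only the pixels inside the region get touched, including whatever slice of the green bar the ball happens to overlap.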

Further aside: I’ve put a slightly more tuned version here. It uses a simple and more efficient method for calculating fps, removes the fps cap (of ~60fps), and reduces the amount of the green rectangle redrawn every frame. Incidentally, one thing I am consistently seeing on my iPhone 4 is that the frame rate appears to be capped (battery conservation, perhaps?).

I can get well over 75 fps zoomed in on the animation on my iPad, but the iPhone won’t get above around 20 fps. Similarly, both versions run at around 100 fps on my MacBook Pro, indicating that setInterval (or something) seems to be capped in Safari.
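For the curious, one simple way to measure fps is to count frames over a sampling window. This is a sketch of the general idea, not necessarily line-for-line what the tuned version does; FpsCounter is a made-up name, and timestamps are passed in so it can be exercised without a browser.

```javascript
// Sketch: count frames and report a rate once per sampling window.
// Timestamps (milliseconds) are passed in; in a page you would call
// tick(Date.now()) from each frame callback.
function FpsCounter(windowMs) {
  this.windowMs = windowMs;
  this.windowStart = null;
  this.frames = 0;
  this.fps = 0;
}
FpsCounter.prototype.tick = function (nowMs) {
  if (this.windowStart === null) {
    this.windowStart = nowMs; // the first frame just starts the clock
    return this.fps;
  }
  this.frames += 1;
  const elapsed = nowMs - this.windowStart;
  if (elapsed >= this.windowMs) {
    this.fps = (this.frames * 1000) / elapsed;
    this.frames = 0;
    this.windowStart = nowMs;
  }
  return this.fps;
};

const counter = new FpsCounter(1000);
for (let t = 0; t <= 1000; t += 20) counter.tick(t); // a frame every 20 ms
console.log(counter.fps); // 50
```

The per-frame cost is an increment and a subtraction, with a single division per window, so the measurement itself stays out of the way of what it is measuring.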

Final Thoughts

You might make the argument that “well, this is hand-tuned HTML5 code vs. a simple, unoptimized Flash animation”. But aside from the fact that I haven’t really tuned this code (and, arguably, the original test oddly favored Flash by featuring a large rect with nothing going on in it), the optimizations I did are nowhere near as clever as the obvious optimizations a decent tool that outputs canvas drawing commands would do. And in fact they’re not nearly as clever as what Flash is already doing (which is kind of sad, really).

On top of this, the mobile WebKit HTML5 engine clearly has a huge amount of headroom in terms of optimization. Flash is a very mature product, and version 10.1 was brought to market specifically to address performance concerns; if it has a lot of “low hanging fruit” in terms of optimization, I’d be very surprised.