How to use HTML5 <video> tags everywhere and have them Just Work™

Chrome's HTML5 video player interface (just a screenshot, it doesn't actually work)

I’ve been coding raw HTML since 1994 (when I learned just enough to create a home page for Prince of Destruction; check out all the <BR>s). I can remember all kinds of random CSS properties and half the jQuery API, but when I need to embed video in a web page I still need to google a bunch of references to get anything that works halfway decently.

Until Google pulled the rug out from under H264 there was some hope that embedding video inside a web page might soon become as simple as embedding images has been since the days of NCSA Mosaic. It’s hard to imagine a non-trivial piece of code simpler than a basic image tag. (If you have any doubts as to whether Google really wants the video tag to succeed, take a look up at how horrible the video player UI in Chrome still is as of version 5. I might add that it can hardly play video without skipping and flickering.)

Whether or not Chrome (and Android) support H264, we’re going to be stuck with a ton of legacy users (Firefox, IE6/7) for a long time, and the only viable solution if we want to encode video in as few formats as possible is Flash.

Well, Flash supports H264 and Webkit supports H264, so what I want is to:

  • encode my video once, in some flavor of H264 that “just works” — ideally I want to be able to open a movie in QuickTime, Save As… and I’m done.
  • embed my video using the simplest, easiest-to-remember html possible, i.e. <video src="foo.m4v" width="800" height="450" controls></video>
  • and have my video play back everywhere

So, here’s my solution. Use HTML5 video tags, and use a tiny bit of JavaScript to automagically convert the video tag into a Flash embed where necessary using the information already in the web page (which can be found in the video tag’s attributes).

I’ve built a simple prototype here. It uses video and h264 detection code from Dive into HTML5 (which means that for now Chrome plays back the video natively) and jQuery for cross-browser DOM manipulation, but hand-coding the relevant bits wouldn’t be that difficult if I wanted to make the code self-contained.
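For the curious, the detection step looks roughly like this. It's a sketch in the spirit of the Dive into HTML5 checks, not the prototype's actual code. The function name supportsH264Video and the doc parameter are my own inventions for illustration; passing the document in just makes the check easy to try outside a browser.

```javascript
// Returns true if the browser claims it can play H264 video in a <video> tag.
// `doc` is normally the global `document`; it is a parameter here only so the
// sketch can be exercised with a stand-in object (an assumption of this example).
function supportsH264Video(doc) {
  var video = doc.createElement('video');

  // Browsers with no <video> support create an unknown element
  // that has no canPlayType method.
  if (typeof video.canPlayType !== 'function') {
    return false;
  }

  // canPlayType answers "probably", "maybe", or "" (empty string).
  // The codecs string names baseline H264 video plus AAC audio in an MP4 container.
  var answer = video.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
  return answer === 'probably' || answer === 'maybe';
}
```

In a page you would call supportsH264Video(document) and only swap in the Flash embed when it returns false.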

In essence, all I need is: $( function(){ $('video').replaceWith( flash_embed_crap ) } );

Most of the code is simply regurgitating boilerplate html for Flash embedding. (Incidentally, the code will have issues in IE7 thanks to the “Eolas bug”, but if I were to put the script in an external JavaScript file that issue would disappear. Any weird IE behavior should be gone.)
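To make the "regurgitating boilerplate" part concrete, here is a minimal sketch of building the Flash markup from a video tag's attributes. It is not the prototype's code: the function names, the player.swf file name, the 320×240 fallback size, and the src FlashVars key are all hypothetical, and a real player would define its own parameters.

```javascript
// Minimal HTML-attribute escaping so a value cannot break out of the markup.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Builds <object> markup for a Flash player from the values found on a
// <video> tag. `attrs` is a plain object like { src, width, height }.
// "player.swf" and the "src" FlashVars key are placeholders for this sketch.
function flashEmbedMarkup(attrs, playerUrl) {
  var swf = escapeAttr(playerUrl || 'player.swf');
  var width = escapeAttr(attrs.width || 320);
  var height = escapeAttr(attrs.height || 240);
  var flashVars = escapeAttr('src=' + encodeURIComponent(attrs.src));

  return (
    '<object type="application/x-shockwave-flash" data="' + swf + '"' +
    ' width="' + width + '" height="' + height + '">' +
    '<param name="movie" value="' + swf + '">' +
    '<param name="allowFullScreen" value="true">' +
    '<param name="flashvars" value="' + flashVars + '">' +
    '</object>'
  );
}

// With jQuery loaded, the swap itself could then look something like:
//   $('video').each(function () {
//     var $v = $(this);
//     $v.replaceWith(flashEmbedMarkup({
//       src: $v.attr('src'), width: $v.attr('width'), height: $v.attr('height')
//     }));
//   });
```

Keeping the markup builder as a plain string function means the only jQuery-dependent part is the final replaceWith call.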

The Flash video player I’m using is very simple — it pretty much uses Flash’s built-in video components and has some simple glue code allowing the embedding web page to talk to it once it’s embedded (not that I’m taking advantage of this here).

It works.

P.S. As per the comments in my source code, I plan to extend this code to provide a number of useful convenience functions and support <audio> tags the same way, and finally to provide a simple but robust open source Flash media player. Aside from anything else, this will help me implement cross-platform media playback in Acumen.

P.P.S. My initial attempts to do for <audio> tags what I’ve just done for <video> tags have not met with much success.

My big problem is that Flash’s FLVPlayer component doesn’t seem to want to play MP3s for me. I’m not sure what’s going on there; it may be some odd combination of Flash security settings hosing locally hosted playback and my ISP’s server configuration hosing remote playback, or FLVPlayer may just be fragile. I’d like the audio and video player UIs to be as identical as possible and for the player to be as simple as possible — having two completely different code paths for audio and video seems like hard work (mostly from a UI point of view).

So the upshot is I can’t change this article’s title to include <audio> tags (yet). Sigh.

Google and the <video> tag

Though H.264 plays an important role in video, as our goal is to enable open innovation, support for the codec will be removed and our resources directed towards completely open codec technologies.

From HTML Video Codec Support in Chrome

Well that sucks.

Gruber asks a few “simple questions” here. Aside from the question of hypocrisy w.r.t. Flash bundling, I think his points are more than neutralized by these ten questions:

You are a proponent of Apple using its influence to diminish the importance of Flash for the web. Yet, when Google makes similar moves to rid the web of a similarly closed and patented, albeit different type of technology, you do not support them. Why is Apple promoting an open web a good thing, but Google promoting an open web a bad thing?

I think that if Google pulled Flash support from Chrome there would be no question that Google were on the side of the angels (although it would still be a dumb thing to do), but since there’s no hint of this it seems purely like a cynical move to hurt Apple’s anti-Flash campaign which will damage HTML5 <video> adoption. I think you can make the argument that HTML5 <video> adoption with H264 as the de facto standard codec is a Bad Thing.

Anyway it’s a bigger mess now than it was before Google decided to do this. Ultimately it will come down to “what will let the most people see the most porn using the most devices?”

Postscript: “standards”

One of the arguments made in favor of WebM/VP8 is that it can be part of the W3C standard, unlike H264, because it’s not encumbered by license fees. The problem here is that WebM/VP8 almost certainly is encumbered (as was GIF in earlier days), it just hasn’t been sued yet because no-one uses it. But this is beside the point — the CSS font-family property supports any font, and almost all the fonts that anyone cares about are encumbered (i.e. subject to royalties, copyright, and so on). Just as CSS font-family can specify a non-free non-open-source font, there’s no reason why a video tag can’t point to an arbitrarily encoded video.

To put it another way:

There’s no conflict between the HTML specification being open and royalty-free and H264 video playback being supported in HTML5 video tags as long as the codec doesn’t need to be implemented by the browser. Just as a slab of text with font-family “Verdana” won’t necessarily display on every browser correctly (if the font is not installed) it would follow that not every video will play back in every browser.

As a practical matter, it would be nice if serving a page with video were as simple an affair as possible: for example, if figuring out which video to serve didn’t involve sniffing the browser, operating system, and so forth or, better yet, if one video format worked everywhere. Right now H264 is the best candidate. VP8/WebM will never be the best candidate because by the time there’s a critical mass of hardware support out there it will be obsolete. This is a stupid, stupid fight.
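Short of one format working everywhere, the least-bad way to figure out which video to serve is to ask the browser what it can play rather than sniff what it is. Here is a rough sketch, assuming you have encoded the same video several ways; the function name pickPlayableSource and the file names in the usage note are made up for illustration.

```javascript
// Picks the first candidate the given <video> element says it can play.
// `candidates` is an array of { src, type } objects in order of preference.
// A "probably" answer wins outright; otherwise the first "maybe" is used.
// Returns null when nothing is playable (or <video> is unsupported).
function pickPlayableSource(video, candidates) {
  if (!video || typeof video.canPlayType !== 'function') {
    return null;
  }
  var firstMaybe = null;
  for (var i = 0; i < candidates.length; i++) {
    var answer = video.canPlayType(candidates[i].type);
    if (answer === 'probably') {
      return candidates[i];
    }
    if (answer === 'maybe' && firstMaybe === null) {
      firstMaybe = candidates[i];
    }
  }
  return firstMaybe;
}
```

In a browser this might be called as pickPlayableSource(document.createElement('video'), [{ src: 'foo.m4v', type: 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"' }, { src: 'foo.webm', type: 'video/webm; codecs="vp8, vorbis"' }, { src: 'foo.ogv', type: 'video/ogg; codecs="theora, vorbis"' }]). Note that it still means encoding the video more than once, which is the whole problem.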

And yet one more thing:

It’s interesting that the companies still in favor of h264 (Apple, Microsoft) are precisely those companies who do not implement the codec in the browser. Apple and Microsoft both implement h264 as a plugin architecture at OS level rather than a plugin at browser level (a much worse thing — see this excellent piece that daringfireball brought to my attention).

Is it a font or an image file format?

The flipside of my argument that H264 should be considered analogous to a font is that, generally speaking, text is still legible when presented in the wrong font. By that argument H264 is more like an image format (JPG, PNG, etc.). If we accept this argument — which I’d say is the most h264-hostile stance (within reason) to take with respect to video codecs — then consider that most browsers simply let you display pretty much any image that’s convenient inside an <img> tag (sometimes badly, as per Internet Explorer’s notorious mishandling of PNG files over the years), generally by using the underlying OS’s APIs for handling images, which is exactly what I’d suggest the idealistic and pragmatic approach for video ought to be.

Would it be great if there were one codec out there that worked everywhere that web developers could target? Sure. But that doesn’t mean not supporting video codecs that happen to be around anyway, just as I can click on a PSD or TIFF in Safari and see it in the browser.

Ultimately, Google’s stance would have web browsers simply refuse to play back non-standards-based content (unless it’s Flash). What kind of “principled” or “non-evil” position is that? Again, if Google were to drop Flash support and make the argument that HTML5 is “the platform”, then it could make some kind of argument about consistency, but that’s not what it’s doing. Google is making Flash part of “the platform” but not H264.

Free VP8 vs. Ogg/Theora

Ogg Theora seems to be really good at handling super-saturated images of vegetation

The Free Software Foundation has suggested to Google that it might want to make VP8 “irrevocably royalty free” (whatever that means—will it cover future developments of VP8?). Of course at the same time they argue that Ogg/Theora is as-good-or-better than H264 and of course it’s both open source and royalty free.

I’m not sure if I buy the quality argument (for a start, the article linked is almost entirely about image quality and not, for example, about compression time/effort or playback time/effort; next, Ogg/Theora may be very good at handling supersaturated images of vegetation growing near a river, but how does it do on more realistically saturated fleshtones? Most lossy codecs sacrifice blue detail in particular, so this may be a boneheaded comparison), but certainly Ogg/Theora is at least in the ballpark, and it’s free, so what’s the advantage of having an “irrevocably royalty free” VP8 as an alternative?

There are two major reasons why the world hasn’t embraced Ogg/Theora.

First of all, there’s lack of hardware support. Silicon that will decode H264 (not that desktop PCs or even laptops actually make much use of this hardware) with virtually no effort (for which read battery power) is available free with a box of cereal. No such hardware exists for VP8 or Ogg/Theora.

Let’s look ahead five years or so. We’d presumably like for there to be silicon that decodes our favorite video format free in boxes of cereal. We already have that for H264. Is having such a chip available for VP8 going to make Ogg/Theora hardware support more or less likely?

Second, there’s the question of whether Ogg/Theora is actually unencumbered. H264 is, in essence, a known quantity. It’s backed by a huge bunch of megacorporations who have pooled their patents; even if someone comes along who claims to have a vital patent covering some obscure nook of H264’s implementation, the participants in the H264 patent pool have lots of money, lawyers, and patents with which to defend themselves (and they’re juicier targets than we are).

Ogg/Theora hasn’t been sued because no-one is making significant (any?) money with it, which in turn means there’s no reason for someone who thinks they could sue over it (a) to do so, or (b) even to let it be known they think they could do so (which is condemnation enough of our patent system).

If this were 1990 and we were talking about image file formats, “GIF” would be unencumbered. The GIF lawsuit only happened when someone figured there’d be money in it, and that was only true when lots of people became invested in using GIFs.

Here, we assume, VP8 has a huge advantage over Ogg/Theora because, presumably, Google has money, lawyers, and a patent portfolio. But unless VP8 is open sourced, I really don’t see the point. We’ll just be trading one master for another in the long run, since either VP8 won’t continue to evolve and will need to be replaced in fairly short order or it will and then it’s up to Google to figure out whether the new version is covered under “irrevocably”.

Addendum: really nice article on the whole Flash / VP8 / Theora / H264 / HTML5 thing (courtesy of daringfireball as usual). And yup, Theora’s claims to superior quality are a Sad Joke.