Regex to find URLs in Web Pages

This is a really difficult thing to do, but I’m surprised that the top few links I found googling for solutions weren’t better. In fact they sucked. Here’s what I whipped up in a few minutes:

((href|src)\=(\"[^\"]*\"|\'[^\']*\'|[^\s\<\>\^]*\s))

Now, the code I tested it on was from the top link to my first search. (A+ for SEO, D– for actual code.) To begin with, not capturing src URLs is boneheaded. But the big problem dealing with this page is, first of all, sample code (which is why the filters for unquoted text have some odd characters in there) and JavaScript URL builders. No real way to deal with those because JavaScript expressions can be built entirely of legal URL characters.

And now for something completely different: mazes

Generating random mazes with a computer is pretty easy, but generating “interesting” mazes is an open-ended problem. I came across a blog full of information about maze generation algorithms a while back (thanks, I believe, to Hacker News), and immediately wasted an afternoon implementing the slickest algorithm in JavaScript.

The problem with your typical random computer maze is that it’s just too pathological. They’re great mazes for kids to solve in books of mazes, but kind of a pain to walk around in a first person shooter, say. So the question is how to make mazes that make a nice “dungeon map”.

Maze Generator UI
Maze Generator UI

Anyway, I’ve come up with what I think is a pretty nice pragmatic solution (which you can play with here). It uses a variation of the flexible tree algorithm to generate a base maze (but with a variable fill) and then performs a post-processing path that allows you to limit the “isolation” of individual cells, allowing the creation of everything from conventional mazes to city-like grids.

Rendered Maze Detail
The generator shows the isolation of each cell numerically.

Isolation is an interesting and useful concept. E.g. if you want to place goals such as “treasures” or “bosses” on a map, the obvious spot to put them is the most isolated cells. To place obstacles in the path of a player trying to reach a goal simply follow connections to less isolated cells and insert the obstacle.

My next plan is to support “templates” (which can also be randomly generated — e.g. use a maze generated with one set of parameters to create base structure and then use that as a template which fills out details) which should let me generate pretty nice “dungeon maps” — or levels for Project Weasel.

How to use HTML5 <video> tags everywhere and have them Just Work™

Chrome's HTML5 video player interface
Chrome's HTML5 video player interface (just a screenshot, it doesn't actually work)

I’ve been coding raw html since 1994 (when I learned just enough to create a home page for Prince of Destruction — check out the all the <BR>s). I can remember all kinds of random CSS properties and half the jQuery API, but when I need to embed video in a web page I still need to google a bunch of references to get anything that works halfway decently.

Until Google pulled the rug out from under H264 there was some hope that embedding video inside a web page might soon become as simple as embedding images has been since the days of NCSA Mosaic. It’s hard to imagine a non-trivial piece of code simpler than a basic image tag. (If you have any doubts as to whether Google really wants the video tag to succeed, take a look up at how horrible the video player UI in Chrome still is as of version 5. I might add that it can hardly play video without skipping and flickering.)

Whether or not Chrome (and Android) support H264, we’re going to be stuck with a ton of legacy users (Firefox, IE6/7) for a long time, and the only viable solution if we want to encode video in as few formats as possible is Flash.

Well, Flash supports H264 and Webkit supports H264, so what I want is to:

  • encode my video once, in some flavor of H264 that “just works” — ideally I want to be able to open a movie in QuickTime, Save As… and I’m done.
  • embed my video using the simplest, easiest-to-remember html possible, i.e. <video src="foo.m4v" width="800" height="450" controls="true">
  • and have my video play back everywhere

So, here’s my solution. Use HTML5 video tags, and use a tiny bit of JavaScript to automagically convert the video tag into a Flash embed where necessary using the information already in the web page (which can be found in the video tag’s attributes).

I’ve built a simple prototype here. It uses video and h264 detection code from Dive into HTML5 (which means that for now Chrome plays back the video natively) and jQuery for cross-browser DOM manipulation, but hand-coding the relevant bits wouldn’t be that difficult if I wanted to make the code self-contained.

In essence, all I need is: $( function(){ $('video').replaceWith( flash_embed_crap ) } );

Most of the code is simply regurgitating boilerplate html for Flash embedding. (Incidentally, the code will have issues in IE7 thanks to the “Eolas bug”, but if I were to put the script in an external JavaScript file that issue would disappear. Any weird IE behavior should be gone.)

The Flash video player I’m using is very simple — it pretty much uses Flash’s built-in video components and has some simple glue code allowing the embedding web page to talk to it once it’s embedded (not that I’m taking advantage of this here).

It works.

P.S. As per the comments in my source code, I plan to extend this code to provide a number of useful convenience functions and support <audio> tags the same way, and finally to provide a simple but robust open source Flash media player. Aside from anything else, this will help me implement cross-platform media playback in Acumen.

P.P.S. My initial attempts to do for <audio> tags what I’ve just done for <video> tags have not met with much success.

My big problem is that Flash’s FLVPlayer component doesn’t seem to want to play MP3s for me. I’m not sure what’s going on there; it may be some odd combination of Flash security settings hosing locally hosted playback and my ISP’s server configuration hosing remote playback, or FLVPlayer may just be fragile. I’d like the audio and video player UIs to be as identical as possible and for the player to be as simple as possible — having two completely different code paths for audio and video seems like hard work (mostly from a UI point of view).

So the upshot is I can’t change this article’s title to include <audio> tags (yet). Sigh.

Centering Images in a Box via CSS

It’s a testament to how badly thought out web standards are that various simple, common tasks, such as displaying an image centered in a box, are rocket science. (More examples: we still don’t have a decent mechanisms for handling headers and footers, text columns, footnotes, or non-scrolling UI elements.)

When I google for a solution, the top linked page addressing it appears to be here. I got a bit leery of the proposed solution when I realized it involved (a) IE-specific conditional hacks with comment jutsu, and (b) the insertion of empty spans. I became even more leery when I discovered I’d need to rework it to handle images wrapped in anchor tags.

There is one mechanism in CSS for centering images in boxes that Just Works. Unfortunately it involves setting them to be background-images (in other words, rather than being able to apply properties to an img tag to make it behave as desired, you need to replace it with some other element and then give it a background-image property — this is typical of the way CSS “solves” common problems).

Fortunately, this works for the problem I have at hand, and I can add the “linking” functionality back in either by wrapping my anchor tags around a transparent gif scaled to fill the relevant box or attaching an event handler to the box (a span). (Edit: note that my final solution is more elegant and doesn’t do any such nonsense.)

This leaves me with per-item html that looks like this:

<span class="grid_item" style="background-image: url(example.jpg)">
  &nbsp;
</span>

vs.

<div class="wraptocenter">
  <span></span>
  <img src="example.jpg">
</div>

But the win is that my CSS is shorter, simpler, and doesn’t involve unused tags and IE wankery (although it occurs to me that the linked solution might work better if they simply used spans instead of divs as the containers, since — like me — they’re completely overriding the display behavior of the element anyway, so why not take advantage of the fact that inline-block works for span (I believe) in IE 6/7?)

Now because I’m using a background-image I can’t take advantage of proportional scaling using min- and max- dimensions (so I can’t usefully display an image inside a box scaled to fit in that box). This is one of those cases where CSS is simply infuriating.

Anyway, here’s my CSS (the grid_item is intended to sit inside a larger div and act like a table cell):

span.grid_item {
  display: inline-block; /* should work for spans in IE 6/7 */
  width: 128px;
  height: 128px;
  background-position: center; /* why can't all centering be this easy? */
  background-repeat: no-repeat;
}

I haven’t actually tested this under an old IE yet, I’m relying on quirksmode, so take everything I say with a grain of salt and use with caution.

Post Script

I’ve put up my test page here, and tested it with Internet Explorer 7 and it works perfectly. (Thank you, again, quirksmode.) I think you’ll agree this is a much more elegant solution (perhaps not compatible with IE 5.x on a Mac but — seriously — who cares?). What’s more, I’ve made it even cleaner after realizing that anchor tags are inline by default and hence I don’t even need to use a span tag or jQuery kung fu to implement link behavior. (So if the HTML were generated server-side and the browser had no CSS support everything would behave correctly.)

The NeXT Machine Rocks

I just got the new Twitter client from the new Mac App Store. It’s lovely and minimalist and seems to be missing an integrated URL shortener. (Tweetie, upon which this app is supposedly based, offered a choice of three — go figure.)

Fact is, I’m sick of having some apps with URL shorteners and some without and for the mechanism being different from one place to another. Then it occurred to me that this was exactly the kind of thing “Application Services” — you know, that clumsy feature from NeXTStep that’s been in Mac OS X since it was called Rhapsody but no-one ever uses — are for. And I dimly recalled that Automator lets you create services.

So I googled “mac os x url shortener service” and found this. Here’s the AppleScript snippet from that article:

on run {input, parameters}
  set dlstring to ((path to temporary items folder as string) & "shortURL.html")
  tell application "URL Access Scripting"
    download ("http://bit.ly/api?url=" & (item 1 of the input)) to dlstring replacing yes
  end tell
  set x to open for access dlstring
  set aurl to read x
  close access x
  return aurl
end run

I then launched Automator, created a new service, and wasted a bunch of time trying to figure out how to import AppleScripts into Automator. (Apparently you don’t — you use the “Run Applescript” Automator Action.)

Automator in Action

And now I get system-wide URL shortening. (Sadly it’s two levels deep in the global context menu, but at least I know it’s always there.)

P.S. all this work was redundant because the new Twitter client automagically shortens URLs and counts the tweet’s length assuming a shortened URL. But because it does this completely transparently but with no visual indication (e.g. ghosting in the shortened URL or something) it’s not obvious. So while it’s nice to have a URL shortening service that works everywhere on my Mac now, I don’t need it for Twitter.app.