I’ve just spent about a day banging my head against a wall, only to discover that the problem was entirely caused by an XML attribute that was tacked onto the XML standard after XML 1.0 was “finalized”. (Fascinatingly, files which use this attribute are still labeled XML 1.0. It actually comes from a separate W3C recommendation, “Namespaces in XML”, published in 1999, the year after XML 1.0, so “1.0” apparently covers more than the 1.0 spec: a new meaning of “1.0” that I hadn’t previously encountered.)
That attribute is xmlns (the “ns” stands for “namespace”) and it’s kind of symbolic of what a clusterf*ck W3C has managed to be.
So here’s the deal. If you put an xmlns attribute in an XML node then it switches that node and all unprefixed descendants of that node into a new namespace (which is defined by a pretty much arbitrary string; by convention a URI pointing to documentation of the DTD). In my case the xmlns string was a URN referring to a physical book by ISBN. Handy.
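A minimal sketch of that scoping, assuming Python's standard xml.etree.ElementTree as the parser and a made-up document (the library, document, and note element names are hypothetical, chosen only for illustration). ElementTree reports each element's name as {namespace}local-name, so you can see exactly which nodes the attribute has moved:

```python
import xml.etree.ElementTree as ET

# Made-up sample: only <document> declares a default namespace.
SAMPLE = (
    '<library>'
    '<document xmlns="my-arbitrary-string"><title>Hello</title></document>'
    '<note>plain</note>'
    '</library>'
)

root = ET.fromstring(SAMPLE)

# Collect the fully qualified name of every element, in document order.
tags = [el.tag for el in root.iter()]
print(tags)
```

The element carrying xmlns and its child both pick up the namespace string, while the root and the sibling outside that subtree keep their plain names.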
But that wasn’t the problem. The problem was working out that a single attribute was making all the tags in the XML file I was trying to process disappear. If I saw a tag <title> then it would not be recognized by <xsl:template match="title">. Now, this is probably just an obvious gotcha for anyone who happens to know what xmlns does, but to my mind it’s simply a violation of everything XML is supposed to be about.
First of all, XML has explicit namespaces that work exactly as you’d expect. E.g. <xsl:template … > is a tag in the xsl namespace. The nice thing about this is that it’s really clear which tags are in the xsl namespace, and it stays clear even if I take a fragment of the XML tree, because every tag in the xsl namespace has an xsl: in front of its name. But xmlns simply turns all the tags in the tree below it to garbage. <document xmlns="my-arbitrary-string"><title>…</title> … </document> does not contain a document or title tag any more. You can’t “see” them until you map the arbitrary xmlns string to a local namespace prefix (e.g. <xsl:stylesheet …. xmlns:foo="my-arbitrary-string" …>) and then use that prefix everywhere (so <xsl:template match="foo:title"> would then match <title …>). But if you grabbed a subtree below <document xmlns=…> using your XML library then that subtree would lose its namespace and map to the unadorned tag as expected.
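The same gotcha is easy to reproduce outside XSLT. The sketch below assumes Python's standard xml.etree.ElementTree rather than an XSLT processor, so it is an analogy to the match="title" problem, not the stylesheet itself; the prefix foo is arbitrary, as in the text:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<document xmlns="my-arbitrary-string"><title>Hello</title></document>'
)

# Looking for the unadorned name finds nothing, much like match="title".
plain = doc.find('title')

# Binding a local prefix to the same arbitrary string makes the tag visible.
ns = {'foo': 'my-arbitrary-string'}
prefixed = doc.find('foo:title', ns)

print(plain)
print(prefixed.text)
```

The prefix only has to agree with the namespace string, not with anything written in the document, which is exactly why nothing in the file itself hints at what to search for.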
I hope I’m not the only person who thinks this is f*cked up.
Now, if you’re a programmer, you’re probably used to OO languages having some facility for making a given object or class easier to refer to within a specific context, e.g. with import foo.bar.someClass; someClass x; x = new someClass. This is handy, especially in overly normalized languages like Java, as it both saves typing and improves code readability.
But most programming languages, especially C-syntax languages, assume you need to read declarations to understand the code below them. XML is designed so that the content of any node of an XML tree should be valid XML that makes sense by itself.
The way it should work is that you should be forced to map a custom namespace tag (like foo) to a specific namespace string (e.g. <?xmlns foo="my-arbitrary-string">) and then use it consistently everywhere (<foo:document><foo:title>), so it’s explicit and subtrees will be explicitly assigned to that namespace (and even if you’d lost the mapping it would be clear that you needed it). That would make sense. After all, that’s how the normal namespace stuff works.
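For what it’s worth, the explicit style can already be written with the existing xmlns:foo attribute form, as opposed to the hypothetical <?xmlns …> declaration sketched above. A small check, again assuming Python's xml.etree.ElementTree and made-up documents, suggests a namespace-aware parser treats the two spellings as the same names (qualified_names is a hypothetical helper introduced here):

```python
import xml.etree.ElementTree as ET

# Implicit style: a default namespace silently applies to the subtree.
IMPLICIT = '<document xmlns="my-arbitrary-string"><title>Hello</title></document>'

# Explicit style: every tag carries the prefix it belongs to.
EXPLICIT = (
    '<foo:document xmlns:foo="my-arbitrary-string">'
    '<foo:title>Hello</foo:title>'
    '</foo:document>'
)

def qualified_names(xml_text):
    """Return the {namespace}local-name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

implicit_names = qualified_names(IMPLICIT)
explicit_names = qualified_names(EXPLICIT)
print(implicit_names == explicit_names)
```

So the difference between the two is purely what a human reading the markup gets to see, which is the whole complaint.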
But we can’t expect common sense from W3C.
W3C Considered Harmful
Most standards suck. The one thing that doesn’t suck about standards is that they’re standards. Except that all the W3C’s standards seem to be notable mainly for the utter lack of compliance. If W3C’s standards were like voltage, then no power outlet in the US would deliver a voltage in the range 100-120V. You’d just have to hope that the voltage from a given outlet wouldn’t blow up anything you plugged into it or electrocute you.
But, W3C’s standards suck worse than that.
Consider frames. Frames were created by Netscape in a misguided attempt to solve a really simple problem: web pages need static navigation/UI components. Frames sucked, but were adopted as a standard for pragmatic reasons, which is why standards suck. But the way in which standards don’t suck should be that a less horrible approach is also provided and gradually comes to dominate. Instead, we got a whole slew of progressive approximations (e.g. CSS) that failed to give us static panels or window-like elements without huge amounts of effort.
Consider CSS. CSS was designed to allow precise graphic design in web pages without messing up HTML’s content representation. A great idea. Pity that despite all that effort it’s almost impossible to use CSS to create a header, footer, snaking text columns, or vertically centered elements. I don’t know how many blogs there are explaining so-and-so’s semi-functional scheme to get a three column layout to work.
We could have gotten <div style="column-count: 3; column-spacing: 16px…."> but instead we have padding which increases the size of an element and margin which doesn’t, except when it does.
Then there’s plugins. To begin with, the standard mechanism for including plugin content is clumsy (<object …><embed /></object>), but why on earth isn’t there a simpler option for the most common media type? It seems like we’ll need to wait for HTML5 to be out and adopted before we can naively stick something like <video width="400" height="256">foo.mov</video> in a web page. Maybe by 2019 we’ll have <video width="400" height="256" allowFullScreen="1">. I hope the replicants don’t get me first.
So I guess in the greater scheme of things, xmlns is pretty much par for the course.