Posts 2 & 3: The URL as User Interface
A case for why the URL is a critical component of User Interface on the web, showing the role of the URL in (a) being human readable, (b) being an archival path, and (c) revealing server architecture.
First posted to the Sarai Reader List: http://mail.sarai.net/pipermail/reader-list/2005-February/005074.html and http://mail.sarai.net/pipermail/reader-list/2005-March/005246.html
Hello all,
Continuing from last month on how user interface affects online community, I look now at the URL, the Uniform Resource Locator, as a critical element of user interface. Jakob Nielsen wrote an excellent summary in 1999. His essay is dated but still relevant. What I have here is a collection of examples, showcasing both good and bad use.
First, a somewhat technical explanation. Here’s an example URL:
http://www.sarai.net/community/fellow.htm
What looks like a single line is actually several parts:
- The “http:” prefix refers to the protocol used to access the page.
- The “//” sequence indicates that what follows is a server name.
- “www.sarai.net” is the server where the resource (page) may be located.
- “/community/fellow.htm” is the path to the resource on the server.
W3C’s URI (Uniform Resource Identifier) specification explains this in more detail. For all practical purposes, URIs are the same as URLs (all URLs are URIs, and URIs that are not URLs are hard to come by).
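The breakdown above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's urllib.parse module; the variable names are mine and purely illustrative.

```python
from urllib.parse import urlsplit

# The example URL discussed above.
url = "http://www.sarai.net/community/fellow.htm"

parts = urlsplit(url)

print(parts.scheme)  # the protocol, without the trailing colon
print(parts.netloc)  # the server name that follows "//"
print(parts.path)    # the path to the resource on that server
```

The three attributes correspond to the protocol, the server name and the path described above.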
Windows users will notice the striking similarity between the URI syntax’s “//servername” and Windows Networking’s “\\servername” notation. Windows paths are URL-syntax-derived, but not valid URLs because the “protocol:” prefix is missing. Some applications compensate by requiring a prefix of “file:” (Windows) or “smb:” (Linux, Mac). Browsers typically prefix “http:” if none is specified. The use of backslashes instead of forward slashes in Windows dates to a conflicting use in MS-DOS 1.0. Windows internally supports use of either type of slash, but presents only backslashes in the UI.
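URLs play several roles, sometimes conflicting. We will look at three of these today: the URL as a brand identifier, as a permanent archival path, and as a window into server architecture.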
It is important to realise that a URL is not a hyperlink. A hyperlink is a reference from one (usually HTML) resource to another, where the other resource is identified by its URL. The hyperlink itself is not the URL. Hyperlinks have their own influence on user interface, but that is for discussion another day.
URLs as brand identifiers
Consider these example URLs:
- http://www.apple.com/ipod
- http://www.microsoft.com/office
- http://www.boingboing.net/2005/02/23/fake_astronaut_scams.html
Contrast:
- http://www.plusthought.org/article.php3?story_id=58
- http://timesofindia.indiatimes.com/articleshow/1021545.cms
- http://www.linuxjournal.com/article/3882
- http://news.postnuke.com/modules.php?op=modload&name=News&file=article&sid=2666
Notice that the first set of URLs gives you a fairly good idea of what each is about, while the second doesn’t.
Ideally, URLs would be hidden from users, masked by the page’s title and content, with pages reached only via links from other pages. In practice this is not how it works. Browsers display URLs prominently. Passing links via email requires users to copy and paste the URL, and a missing or changed character can mean a broken link. Recent phishing scams make it even more important to be aware of the current page’s URL.
Reality is, users read URLs, and site administrators who care about their sites being accessible should use readable URLs. The ideal URL is one you can read out on the phone to another person, who should be able to type it in without errors. Let’s look closely at some of the above examples.
http://www.apple.com/ipod
Think of any Apple brand. QuickTime, Mac OS X, iMac, iPod, iTunes. Any brand. Write that brand name in lower case, remove spaces, and stick it to the end of apple.com/. Note that the page you expect comes up. Apple is legendary for their attention to user interface, and it extends to their website.
http://www.microsoft.com/office
This one is a googly. It looks like a clean URL, but click on it and you are redirected to "http://office.microsoft.com/en-us/default.aspx", which is no longer a URL you can remember off the top of your head. Unlike with the Apple site, it’s not obvious that you can find the right page by going to microsoft.com/brandname (it works, but you find out only by testing for it).
http://www.plusthought.org/article.php3?story_id=58
This URL tells you nothing of what to expect when you click. Imagine if you had a site with URLs like this and you were replying to email asking for details about some programme.
“Yes, the programme is still open. We have details at our website. Please go to http://www.plusthought.org/article.php ...” umm, php3 or php4 or just php? ... umm, what is the story id number? ... open browser ... realise you are offline, wait several seconds for dialup to complete, open front page of site, realise you can’t see the link because it’s an image, curse at how slow dialup is, wait for all images to load, and there it is! Copy and paste in email. Or if you are in a hurry, you’ll just say “please go to our website and click on the Wanderer link,” which is sub-optimal. Imagine if you could just say, “yes, please go to plusthought.org/wanderer”.
Disclosure: I have previously worked with the fine folks at Synapse and they are fully aware of this problem and intend to fix it. Despite their poor URLs, they do excellent information design; by far the best I’ve seen anywhere.
http://timesofindia.indiatimes.com/articleshow/1021545.cms
http://www.linuxjournal.com/article/3882
Unlike Synapse’s site, these two are news sites with regularly updated content, making it hard to have short URLs for everything. At least they’re not as bad as the following:
http://news.postnuke.com/modules.php?op=modload&name=News&file=article&sid=2666
This is inexcusable. Not only is it lacking context, it is long enough to be unreliably reproduced in email. Several mail clients will wrap text at 72 or 80 columns, and even if yours doesn’t, the mailing list software (notably Yahoo! Groups) or the recipient’s mail client may. A wrapped URL is an unusable URL. Not everyone understands how or cares enough to join the lines and open the link.
http://www.boingboing.net/2005/02/23/fake_astronaut_scams.html
This URL is an example of how even a regularly updated news site can have meaningful URLs. The numbers are clearly a date, which tells you how old this page is, and the filename is a fragment of the headline. It’s enough to (a) let you decide if you want to open it, and (b) make it easier to tell whether you have already seen it (if someone forwards you a link you have already seen, it’s likely to be recent and interesting enough that the headline appears familiar). This URL scheme is standard with the Movable Type blogging software. Contrast with the URLs generated by LiveJournal and MSN Spaces.
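To make the date-plus-headline pattern concrete, here is a rough Python sketch that builds a path of the same shape. It is not Movable Type's actual algorithm; the function name dated_slug_path, the three-word cut-off and the sample headline are all assumptions of mine.

```python
import re
from datetime import date


def dated_slug_path(published: date, headline: str, max_words: int = 3) -> str:
    """Build a /YYYY/MM/DD/first_few_words.html path (hypothetical scheme)."""
    # Lower-case the headline and keep only runs of letters and digits,
    # so punctuation and spaces never reach the URL.
    words = re.findall(r"[a-z0-9]+", headline.lower())
    slug = "_".join(words[:max_words])
    return f"/{published:%Y/%m/%d}/{slug}.html"


# A made-up headline, chosen only to reproduce the shape of the example URL.
print(dated_slug_path(date(2005, 2, 23), "Fake astronaut scams women"))
```

Truncating the headline keeps the URL short enough to survive email wrapping while still hinting at the content.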
<http://sify.com/news/fullstory.php?id=13672097&headline=Groom~runs~away,~guest~marries~bride>
Sometimes when you are not in a position to fix your server software, a temporary kludge like this helps. It’s ugly, but it’s better than a meaningless number. Also notice that I encased this URL in angle brackets. Most mail clients understand that to indicate that the URL must not be wrapped, or if it arrived wrapped, to piece it back into a single line.
URLs as permanent archival paths
Web founder Tim Berners-Lee argues that this role is by far the most important. URLs should not change. A URL pointing to a particular resource should continue pointing to the same resource 2, 20 or 200 years from now. Take, for example, another page from Apple’s site:
You’ll see there Apple’s marketing pitch for their LCD-based iMac G5. The same page a year ago would have shown the lampshade-like iMac G4, and even earlier, the CRT-based iMac G3, all of which are entirely different computers. By emphasising the branding role, Apple’s URLs fail to serve the archival role. Sometimes this is a conscious decision. Perhaps Apple wants to keep simple URLs for their most current products, and aren’t concerned about discontinued products showing up at expected URLs.
More often, however, this is the result of poor planning. Take, for example, my own photo album. (Apologies for the personal plug here, but I didn’t have a better example at hand.) Last December [2004] I visited the Tibetan settlements in Bylakuppe in southern Karnataka and posted pictures here:
http://jace.seacrow.com/pics/places/bylakuppe
Earlier in the year, a friend and I drove to Madurai. I took pictures along the way, and now had two sets: pictures taken in Madurai, and pictures taken in Tamil Nadu outside Madurai. A hierarchical organisation made sense:
http://jace.seacrow.com/pics/places/tn
http://jace.seacrow.com/pics/places/tn/madurai
Before this, I visited Mysore and Karwar, both in Karnataka:
http://jace.seacrow.com/pics/places/mysore
http://jace.seacrow.com/pics/places/karwar
Notice the hierarchy is no longer consistent. Tamil Nadu pictures are in their own folder, while Karnataka pictures are scattered in the upper level. Ideally I’d place Bylakuppe, Karwar and Mysore in a Karnataka folder. In practice, this would mean changing URLs, undermining their permanency. In previous situations like this, I’ve set up redirects so links don’t break, but this is tedious work. Hierarchical systems inevitably change as the library grows. A URL that reflects hierarchy is friendly but not guaranteed permanent. A URL that simply shows a database id conveys little information, and is still at risk of impermanency if the database system is upgraded in future and all ids change. Berners-Lee recommends that URLs should be date-stamped in such situations, which incidentally is the method adopted by Movable Type, as shown in the example from BoingBoing.net earlier.
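For the curious, the redirect approach mentioned above can be sketched in a few lines of Python. This is an illustration, not what my server actually runs: the MOVED table, the "karnataka" folder paths, the resolve function and the temple.jpg file name are all hypothetical.

```python
# Hypothetical table of old paths and their new homes under a
# "karnataka" folder. These new paths are assumptions, not live URLs.
MOVED = {
    "/pics/places/bylakuppe": "/pics/places/karnataka/bylakuppe",
    "/pics/places/mysore": "/pics/places/karnataka/mysore",
    "/pics/places/karwar": "/pics/places/karnataka/karwar",
}


def resolve(path: str):
    """Return (status, location) for a requested path."""
    for old, new in MOVED.items():
        # Match the folder itself, or anything stored beneath it.
        if path == old or path.startswith(old + "/"):
            # 301 tells browsers and search engines the move is permanent.
            return 301, new + path[len(old):]
    # Not a moved path: serve it as requested.
    return 200, path


print(resolve("/pics/places/mysore/temple.jpg"))
print(resolve("/pics/places/tn/madurai"))
```

The tedium is in maintaining the table: every reorganisation adds entries, and old entries can never be dropped without breaking somebody's link.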
URLs exposing server architecture
An HTML file typically carries an extension of “.html”. However, consider these examples:
- http://www.bbc.co.uk/worldservice/index.shtml
- http://www.royal.gov.uk/output/Page1.asp
- http://www.xanga.com/register.aspx
- http://gimp-print.sourceforge.net/MacOSX.php3
- http://www.fanniemae.com/index.jhtml
- http://www.poets.org/index.cfm
- http://squishdot.org/987802018/index_html
- http://www.telegram.com/apps/pbcs.dll/frontpage
All of these have different extensions, revealing the technology platform in use. In order: Apache Server Side Includes, Microsoft ASP, ASP.NET, PHP 3, Java, ColdFusion, Zope and Windows Dynamic Link Libraries. The trouble with including such a blatant platform signature in the URL is that, should you choose to switch to a different platform, all your URLs change. Some platforms like Zope are insensitive to file extensions. You can use whatever you want and it’ll still work. (In a case of taking this insensitivity too far, Zope is littered with index_html URLs.) Others like Apache-based platforms can be configured to use different extensions, but this typically requires a system-wide configuration change which your ISP may not be willing to do for you.
It is best to avoid identifying platform in your URLs. These examples are even worse:
- http://www.amazon.com/exec/obidos/subst/home/home.html/104-0744072-3248744
- http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa
- http://plone.org/search?SearchableText=plone&b_start:int=30
- http://www.telegraph.co.uk/news/main.jhtml?xml=/news/2005/01/30/wgerm30.xml
Notice that all URLs at amazon.com begin with “/exec/obidos”, making that part of the URL semantically meaningless and unnecessary cruft. Further, home.html is followed by a slash and another path component. This breaks the file and folder hierarchy that the Web is built around. Browsers expect that folders contain other folders and files, and that files contain no sub-items. This is required for links with relative references to resolve properly. (When folder/a.html links to b.html, is it referring to folder/b.html or folder/a.html/b.html?) When a path component can behave like a file at times and a folder at other times, it risks confusing the browser. (Zope’s object database also has this problem. Zope solves it by inserting a base href tag in all HTML pages. This works well but is not an elegant solution.)
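The relative-reference question above can be tried out with Python's urllib.parse.urljoin, which follows the standard resolution rules for relative references. A small sketch; the example.com URLs and the "session-token" component are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URLs, used only for illustration.
file_like = "http://example.com/folder/a.html"
folder_like = "http://example.com/folder/a.html/session-token"

# Resolution drops everything after the last slash of the base URL,
# then appends the relative reference.
print(urljoin(file_like, "b.html"))
print(urljoin(folder_like, "b.html"))
```

In the first case a.html is treated as a file and b.html lands beside it in folder/. In the second, a.html is treated as a folder and b.html lands inside it, which is almost certainly not what the page author meant.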
The second is the home page of Apple’s online store, listing all their products. It seems like a simple matter to copy a link to any of the products listed there, but you’ll find the link does not work when used anywhere else. Jim Roepcke deconstructs the Apple Store URL to find that this is because it includes session related data that is not valid for anyone but the user it was generated for.
The third exhibits a characteristic of the Zope platform (which Plone is built on). The “b_start:int” in the URL signifies that b_start is an integer parameter. Zope includes several others like “:list” and “:tokens”. These are matters of internal architecture and should not appear in the URL.
The last, from the UK’s Telegraph, is rather interesting. main.jhtml is taking a parameter that appears to refer to a file on disk. What if you change the path and make it read another file, one that was not supposed to be shown to the public? This may seem a humorous hack, but it could be worse. Philip Greenspun describes a case of Harvard Business School rejecting 119 applicants who edited a URL to check their application status.
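If a script must accept a file path as a parameter, it should at least refuse paths that wander outside the folder it is meant to serve. Here is a minimal Python sketch of such a check. I have no knowledge of how the Telegraph's server works; ALLOWED_ROOT and is_safe_parameter are hypothetical names, and the check assumes the parameter has already been URL-decoded.

```python
import posixpath

# Hypothetical folder that the script is allowed to read from.
ALLOWED_ROOT = "/news"


def is_safe_parameter(xml_path: str) -> bool:
    """Accept only paths that stay inside ALLOWED_ROOT after normalisation."""
    # Force exactly one leading slash, then let normpath collapse
    # "." and ".." segments and duplicate slashes.
    normalised = posixpath.normpath("/" + xml_path.lstrip("/"))
    return normalised == ALLOWED_ROOT or normalised.startswith(ALLOWED_ROOT + "/")


print(is_safe_parameter("/news/2005/01/30/wgerm30.xml"))
print(is_safe_parameter("/news/../etc/passwd"))
```

A check like this only limits the damage. The cleaner fix is not to expose file paths in the URL at all.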
Further Reading
Matthew P. Thomas documents cruft in URLs generated by various weblogging systems: http://mpt.phrasewise.com/2003/07/26#a534
Mark Pilgrim documents the process to make Movable Type generate cruft-free URLs (warning! technical jargon): http://diveintomark.org/archives/2003/08/15/slugs
Nathan Ashby-Kuhlman presents more real world examples: http://www.ashbykuhlman.net/blog/2003/07/27/2227 http://www.ashbykuhlman.net/blog/2003/08/02/2224
Conclusion
We have looked at various ways to construct a URL and what roles they serve. If there is still any doubt about how URLs are relevant to community, the answer is simple. To discuss the content of any web page, you need a URL that can be shared. Without a URL, you are left attempting to reproduce the content (which may be non-trivial for graphical or Flash content), and have no reference that others can visit. A simple URL is friendlier, and therefore a better URL.
My next few posts will explore the human side of the UI-Community linkup.
