// Internet Duct Tape

What’s a URL to do? – How to Save URLs

Posted in Programming and Software Development, Technology, Web 2.0 and Social Media by engtech on February 28, 2007

The World Wide Web is an apt analogy. We’re all spiders spinning threads of links. Some people spin their threads with blogs, while others do it with social bookmarking sites like del.icio.us, Digg, reddit or Netscape (see them all at popurls).

One thing that bothers me about these social bookmarking sites is that they don’t do a good job of knowing when two links point to the same document. Ignoring the malicious users who purposely try to resubmit something using slightly different links, there are the flaws in the social bookmarking sites themselves.

As an example I’m going to look at one of my blog posts that has been saved to del.icio.us at least 10 different ways. It’s crazy how many ways a single URL [wiki] can be saved.

1682 people saved it using the trailing slash
132 people saved it with no trailing slash and the named anchor “holygrail”
36 people saved it with a bad URL
25 people saved it without the trailing slash
5 people saved it with the named anchor “comments”
4 people saved it with a different bad URL
4 people saved it with the trailing slash and the named anchor “holygrail”
2 people saved it with a trailing query string
… and several others saved cached copies of the document, or saved it through an anonymizer/translator proxy.

There are several reasons why the same document could be referenced by different URLs.

Easy to Fix Mistakes

Trailing slash – Remove the trailing slash from the end of the URL.

Query string – Remove it if it isn’t taking any arguments.

Named anchors – Remove completely before storing. The named anchor can be used with or without the trailing slash.
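Here is a minimal sketch of those three fixes in Python, using only the standard library. The function name `clean_url` and the example.com address are placeholders I made up for illustration; this is one way a bookmarking site could do it, not how any particular site actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(url: str) -> str:
    """Apply the three easy fixes before storing a URL."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    # Trailing slash: strip it from the end of the path.
    path = path.rstrip("/")
    # Query string: a bare "?" parses to an empty query, and urlunsplit
    # leaves the "?" out when the query is empty. Real arguments are kept.
    # Named anchor: pass an empty fragment so "#holygrail" and friends vanish.
    return urlunsplit((scheme, netloc, path, query, ""))


# Hypothetical address standing in for the blog post discussed above.
print(clean_url("http://example.com/post/#holygrail"))
```

Note that this also turns `http://example.com/` into `http://example.com`, which is fine for a storage key as long as every URL goes through the same function.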

Hard to Fix Mistakes

These mistakes can’t be reliably fixed from the outside. It is up to the website in question to redirect to the canonical URL and/or avoid creating duplicate ways of accessing the same content.

Index file – different web server software uses different names for the index file, but if you request the directory the server will grab the index file automatically.

WWW prefix – there is a movement to get rid of using www. at the start of domain names [wiki].

Useless query string – on many sites you can add any kind of query to the end of a URL and the server ignores it, so the same page comes back under a different URL.

Duplicate Pages – through poor planning the web developer can create multiple links to the same content.
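On the website’s side, the first three of these can be handled by computing one canonical form per request and redirecting to it. The sketch below is only an illustration under stated assumptions: the list of index file names, the choice to drop www., and the set of query arguments the site actually reads (`KNOWN_PARAMS`) are all hypothetical and would differ per site. Duplicate pages are a design problem and aren’t covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only; a real site would configure its own.
INDEX_FILES = {"index.html", "index.htm", "index.php", "default.aspx"}
KNOWN_PARAMS = {"p", "page"}  # query arguments this hypothetical site reads


def canonical_url(url: str) -> str:
    """Return the single URL this hypothetical site wants to be known by."""
    scheme, netloc, path, query, _fragment = urlsplit(url)

    # WWW prefix: this sketch assumes the site picked the no-www form.
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]

    # Index file: /blog/index.html is the same document as /blog/.
    head, _sep, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"

    # Useless query string: keep only the arguments the site reads.
    kept = [(key, value)
            for key, value in parse_qsl(query, keep_blank_values=True)
            if key in KNOWN_PARAMS]

    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))


def redirect_target(requested: str):
    """Return the URL to redirect to, or None if already canonical."""
    canonical = canonical_url(requested)
    return canonical if canonical != requested else None
```

A request handler would call `redirect_target` and answer with a permanent redirect whenever it returns something other than `None`. The canonical form here keeps the trailing slash on directories; which form a site picks matters less than picking exactly one.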

The moral of the story: when developing websites, always try to use semantic URLs [wiki] that are cruft-free, not too long, and readable by humans. When writing an application that uses URLs as keys to store data, make sure you clean them first.

Who knows where they’ve been?
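To see what cleaning buys you when URLs are storage keys, here is a rough sketch that merges save counts under one key. The counts are the ones from the del.icio.us list at the top of this post (leaving out the bad URLs and proxy copies); the address is a placeholder, and I’m guessing that the “comments” save had a trailing slash and that the query string was an empty `?`. The helper names `url_key` and `merge_saves` are made up.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def url_key(url: str) -> str:
    """Storage key: no named anchor, no empty query, no trailing slash."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path.rstrip("/"), query, ""))


def merge_saves(saves):
    """Add up save counts for URLs that clean to the same key."""
    totals = Counter()
    for url, count in saves:
        totals[url_key(url)] += count
    return totals


BASE = "http://example.com/post"  # placeholder, not the real post URL
saves = [
    (BASE + "/", 1682),            # trailing slash
    (BASE + "#holygrail", 132),    # no slash, named anchor
    (BASE, 25),                    # no trailing slash
    (BASE + "/#comments", 5),      # named anchor (slash is a guess)
    (BASE + "/#holygrail", 4),     # slash and named anchor
    (BASE + "/?", 2),              # trailing query string (assumed empty)
]

totals = merge_saves(saves)
print(totals)
```

Six different-looking bookmarks collapse into a single entry, which is what the bookmarking site should have stored in the first place.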

8 Responses


  1. […] including this one. But to be on the safe side, I would still use that trailing slash if present. What’s a URL to do? – How to Save URLs Site Search Tags: WordPress+Tips, […]

  2. Daniel said, on February 28, 2007 at 3:30 pm

    Good post. In my opinion there is a need for some standardization. It is crazy the number of options you have to name the same stuff…

  3. engtech said, on February 28, 2007 at 6:35 pm

    I’d like to see standardization on the name of the index file. That was standard once; I have no idea why so many applications came up with proprietary names for the index file.

    What might work best is a META header that stores the URL to use for bookmarking… but that would end up being abused very fast (spammers creating seemingly useful content with the bookmark META header pointing to their spam site).

    People are moving away from using query strings when designing a site, which is a good thing.

    It would be nice if the social bookmarking sites got their act together on trailing slashes and named anchors though. Del.icio.us seems to have stealth-fixed the trailing slash problem in the past few months.

  4. Lloyd Budd said, on March 02, 2007 at 4:13 am

    If you hover over your links in “Easy to Fix Mistakes”, you will see that they are not what the text describes in many cases.

  5. engtech said, on March 02, 2007 at 5:45 am

    Thank you kindly, Sir.

    *promises never to blog from Word again*

  6. engtech said, on March 02, 2007 at 5:47 am

    Interesting, it looks like wp.com is fixing the URLs behind my back. :)

    When I edit the document I see the “bad URLs” but when I look at it on the blog they’re fixed.

  7. […] 5, 2007 at 10:45 pm · Filed under Uncategorized What’s a URL to do? – How to Save URLs “One thing that bothers me about social bookmarking sites is that they don ’t do a good job […]


