SEO tip: rewriting URLs without Apache rewrite rules

As I begin my descent into the arcane and mystical realms of Search Engine Optimization, I have heard that some search engines prefer static-appearing content to dynamic. That is, a page whose URL ends in .pl, .php, or .asp would be spidered less often or ranked lower than a page ending in .htm or .html. Since search engine companies are secretive about how they spider and rank pages, I cannot say with certainty that this is true. However, the prejudice is widely accepted.

It is clear that dynamic pages offer many advantages over static content. The clearest is the use of common headers and footers to give a web site a consistent and easily changed look and feel. It is unlikely that web designers will move away from dynamic templating, but how can they create these valuable .html files for search engines?

One way is to create a process that runs through all the dynamic content on one’s site and produces static HTML files. When I worked for O’Reilly Media, that’s how the catalog pages were generated. But this solution would not work for pages that are updated daily or hourly. The better solution is to simply give your dynamic pages names ending in .htm or .html. Since you might actually want a static .html page now and again, we’ll focus on .htm.
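
The batch approach described above can be sketched in a few lines of shell. This is a minimal illustration, not the process O’Reilly actually used: the function name render_static and the RENDER_CMD variable are hypothetical, and the sketch assumes each page can be rendered by passing its path to a command (by default the PHP command-line binary) that prints the HTML to standard output.

```shell
#!/bin/sh
# Render every .php file in a source directory to a static .html file.
# Assumption: RENDER_CMD (hypothetical) names a command that prints the
# rendered page to stdout when given a file path. Defaults to "php".
render_static() {
    src_dir=$1
    out_dir=$2
    render_cmd=${RENDER_CMD:-php}

    mkdir -p "$out_dir" || return 1

    for page in "$src_dir"/*.php; do
        # If the glob matched nothing, the literal pattern is left behind
        [ -e "$page" ] || continue
        name=$(basename "$page" .php)
        "$render_cmd" "$page" > "$out_dir/$name.html" || return 1
    done
}
```

Run from cron, something like render_static ./site ./public would refresh the static copies on a schedule, which is exactly why the approach falls short for pages that change hourly.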

For the youngsters in the crowd, .htm is the extension HTML files carried on earlier versions of Windows (like Windows 95). Back then, there was still concern about backward compatibility with the DOS file naming scheme (see Wikipedia for the gory details). However, this three-letter extension is no longer needed and is rarely used.

There are at least two methods for dynamically redirecting incoming HTTP requests to a differently named resource on the web server. Again, the scenario is that you have a bunch of .php files but you want the public-facing URLs to end in .htm. Apache has the much-feared RewriteRule directive, which applies regular expressions to requests to find the local resource to serve. An example of this that might appear in an .htaccess file is:

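# Turn on the rewrite engine for this directory
RewriteEngine on
# Anchor the pattern so any subdirectory path is kept in $1
RewriteRule ^(.+)\.htm$ $1.php [L]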

This silently redirects all requests for any .htm file to the similarly named .php file. But I don’t like this solution for two reasons. First, regular expressions are easy to mess up and can be hard to debug. Second, Apache has to do this small amount of work for every request. I say, use the filesystem itself to make the magic happen. Enter the symbolic link.

Symlinks are available on Unix and Mac OS X. Simply link your .php files to .htm names:

$ ln -sv index.php index.htm

Then add the .htm extension to your PHP handler:

AddType application/x-httpd-php .php .htm
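
For a site with more than a handful of pages, the linking step can be scripted. The following is a sketch under a few assumptions: the function name link_php_as_htm is made up for illustration, all the .php files live in a single directory, and any .htm file that already exists is a genuine static page that should be left alone.

```shell
#!/bin/sh
# Create a NAME.htm symlink beside every NAME.php in a directory.
# link_php_as_htm is a hypothetical helper name, not a standard tool.
link_php_as_htm() {
    dir=${1:-.}

    for page in "$dir"/*.php; do
        # If the glob matched nothing, the literal pattern is left behind
        [ -e "$page" ] || continue
        base=$(basename "$page" .php)
        link="$dir/$base.htm"

        # Leave existing files and links (even dangling ones) untouched
        if [ -e "$link" ] || [ -L "$link" ]; then
            continue
        fi

        # A relative target keeps the link valid if the directory moves
        ln -s "$base.php" "$link"
    done
}
```

Calling link_php_as_htm /path/to/docroot after each deploy keeps the links in step with the PHP files. The function is safe to run repeatedly because it never overwrites anything.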

This symlink technique reduces the work Apache does, leaving it more capacity to serve pages than the rewrite trick allows. How much more efficient this trick is, I can’t say. I will note that rewrite rules fall into the category of “action at a distance,” which is a software engineering term meaning that the mechanism that causes something to happen isn’t in an obvious place. With symlinks, any new coder will be able to find index.htm on the web server and see that it points to index.php. This is not the case with rewrite rules.
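
To make that discoverability concrete, here is a small sketch that lists every .htm symlink under a directory along with the file it points to. The name list_htm_links is hypothetical, and the sketch assumes the readlink utility is installed, as it is on common Linux and Mac OS X systems.

```shell
#!/bin/sh
# Print "link -> target" for every .htm symlink under a directory.
# list_htm_links is a hypothetical helper name; readlink is assumed
# to be available on the system.
list_htm_links() {
    find "${1:-.}" -type l -name '*.htm' | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}
```

For a single file, ls -l index.htm shows the same information.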

Any time I can make the OS do work instead of Apache, I do. OS-level operations are often faster and more efficient than the equivalent work done inside Apache, so take advantage of them when you can.