Latest News
Due to a lingering cold, I'm taking a break from blogging for a bit. This is a great opportunity to use my RSS feed to keep up with my posts.

Comment spam has returned to Taskboy, but this time, it's personal.
The first version of comment spam appeared to be from a program that simply crawled sites and looked to hook into the comment systems of most blogs. The content of these comments was clearly mechanically produced and made little sense. Technologies like CAPTCHA have mostly eliminated that sort of noise.
I now get messages that are written either by the world's greatest ELIZA bot or by poorly paid folks with computers. The content of these human-generated messages is often on point and could be an attempt to participate in a dialog. However, the URL field always points to a business associated with the posts.
Some of these comments I post. Some I do not. I did not want to unfairly reject a comment. However, I'm pretty sure I see the pattern now and will be less lenient in the future.

Doc Searls points out our growing dependency on Google. In brief, he equates Google with a kind of free public utility that provides the following functionality:
- Maps/Satellite data
- Search
- Mobile phones (via Android)
To that list, I would add the following features on which many people and companies have come to depend:
- Advertising (via AdWords)
- Ad-based venue (via AdSense)
- Email and Instant Messaging (via Gmail)
- Voice Mail/Phone routing (via Talk)
- News aggregation (via News)
Searls, with a "me too" from Dave Winer, declares that Google has become "too big to fail." They worry that Google, fattened by and dependent on its advertising engine, is vulnerable to economic bubbles in advertising. They worry that Google is in a bubble right now.
I do not think this is the case. Google no doubt enjoys quite a bit of revenue from advertising right now. However, no one outside of Google has a complete accounting of the company's revenue. Given that Google survived the recessions of 2001-2003 and 2008-2010, there is good reason to believe that it will weather future economic storms.
However, Doc Searls points to a more immediate danger that Google presents to consumers: that of the digital monoculture. Google rarely extracts money from users directly; it offers only a handful of subscription services. However, the user base for Gmail is enormous. The same can be said for their advertising services. What would our online life be like if Google went offline tomorrow?
Just looking at one service, Gmail, illustrates the scope of the problem. Many companies have outsourced their mail handling entirely to Google. Recall that in the 1990s, IT staffs spent a considerable amount of time and money setting up corporate email systems. Although the largest companies still do this, many simply outsource this task and gain significant savings and reliability over in-house mail systems. Without Google, a very large number of companies and people would not be able to conduct business. Sure, there would be work-arounds: alternate email accounts, telephones, etc. However, this disruption would cost real and measurable dollars.
Perhaps the most immediate effect would be the loss of Google's wonderful, if easily forgotten, search function. To remind those readers recently recovered from a coma, the current neologism for searching online for something is "googling." Imagine the sort of day you would have if your browser returned a 404 missing page error when accessing http://www.google.com/. That would not be a salad day at all.
The open source community, with which both Searls and Winer are associated, has long battled against digital monocultures (e.g. IBM, Microsoft, Apple, etc.). Consumers usually benefit from choice (although not always [remember the mess of the home computer market in the 1980s]). Healthy competition promotes innovation and cost-savings. It also creates a healthier ecosystem in which the failure of one entity does not threaten the survival of everyone.
To this end, consumers of free digital products ought to consider how much they depend on these services. I practice what I preach. I pay for Yahoo Mail Plus, which is $20 a year. It's a fair deal: I see no ads and I can use POP mail. That's pretty short money for a service that has yet to have an outage longer than 5 minutes in three years. The same goes for my blog hosted on bluehost.com. For $7 a month, I get shell access to a very reasonable Linux environment. Of course, there are plenty of free choices for blog hosting these days, but I need to control my content and the context in which it appears. Free services can close shop without notice and there is little consumers can do to retrieve their content.
The dangers of monoculture become readily apparent after a failure. In groups, humans are not noted for their ability to successfully anticipate future disasters. It seems that now, we don't even recall past calamities all that well. One would think that the near-fatal collapse of traditional lending institutions that participated in rank speculation would produce a rapid and perhaps onerous regulatory response. However, nearly two years after the shock, that has not yet proven to be the case.
From a strictly selfish perspective, I would love to see Google fail completely tomorrow. Business opportunities abound in chaos and fortune favors the bold.
UPDATE: It looks like no one at Yale reads my blog.

I tweeted yesterday about my CentOS 5 Linux box losing sound (and frankly X, but that's another story). I tried to use the sound card detector (system-config-soundcard) to find the card again, but it silently failed (typical). The Linux Sound HOWTO is over nine years old, so that's pretty useless. What to do?
Well, I turned to my old friends lsmod, dmesg and strace. I could see that kernel sound modules were loaded. I also knew that I had a configuration that had been working. So I issued the following command:
# strace mpg123 /path/to/mp3file
I got quite a bit of output, but the relevant line was this:
open("/dev/snd/pcmC0D0p", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)
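When strace output runs to hundreds of lines, a few lines of script can pull out just the failed open() calls. The following is a minimal sketch in Python, assuming the trace was saved to a file with strace's -o option; the name failed_opens is hypothetical, and the pattern also accepts the openat() form that newer C libraries emit.

```python
import re

# Matches a failed open()/openat() line such as:
#   open("/dev/snd/pcmC0D0p", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)
# Assumes plain strace output (no "[pid N]" prefixes from strace -f).
FAILED_OPEN = re.compile(
    r'^open(?:at)?\((?:AT_FDCWD, )?"(?P<path>[^"]*)"[^)]*\)\s*=\s*-1\s+(?P<errno>E[A-Z]+)'
)


def failed_opens(strace_lines):
    """Return (path, errno name) pairs for every open call that returned -1."""
    failures = []
    for line in strace_lines:
        match = FAILED_OPEN.match(line.strip())
        if match:
            failures.append((match.group("path"), match.group("errno")))
    return failures
```

For example, after running `strace -o trace.txt mpg123 /path/to/mp3file`, passing the lines of trace.txt to failed_opens() would surface the missing /dev/snd/pcmC0D0p node directly.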
And sure enough, this is what was in my /dev/snd directory:
[root@durgan snd]# ls
controlC0  hwC1D2    pcmC1D0c  pcmC1D1p  timer
controlC1  pcmC0D0c  pcmC1D0p  seq
Oh look! There's no pcmC0D0p file. What to do? I could have tried to recreate the missing device file, but instead I just made pcmC0D0p a symlink to pcmC1D0p.
And yes, this horrible, horrible hack worked like a charm.
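The names in that listing follow ALSA's convention: pcmC<card>D<device> plus a suffix of p for playback or c for capture, so pcmC0D0p is the playback node for card 0, device 0. A small Python sketch can flag a device that has a capture node but no playback node, which is the symptom here. This is a heuristic chosen for illustration (some hardware is legitimately capture-only), and the function names are hypothetical.

```python
import re

# ALSA PCM device nodes are named pcmC<card>D<device><direction>,
# where the direction suffix is "p" (playback) or "c" (capture).
PCM_NODE = re.compile(r"^pcmC(?P<card>\d+)D(?P<device>\d+)(?P<dir>[pc])$")


def pcm_nodes(names):
    """Map (card, device) -> set of directions ("p", "c") present in names."""
    nodes = {}
    for name in names:
        match = PCM_NODE.match(name)
        if match:
            key = (int(match.group("card")), int(match.group("device")))
            nodes.setdefault(key, set()).add(match.group("dir"))
    return nodes


def missing_playback(names):
    """Hypothetical heuristic: list playback nodes absent for devices that have capture nodes."""
    missing = []
    for (card, device), dirs in sorted(pcm_nodes(names).items()):
        if "c" in dirs and "p" not in dirs:
            missing.append(f"pcmC{card}D{device}p")
    return missing
```

On a live system you could pass it os.listdir("/dev/snd"); it only reports the gap and does not repair it.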
What deleted this device file? Why couldn't I make the system re-detect and re-initialize the sound system? It's stupid, stupid user issues like this that have plagued Linux since 1995 and show no signs of getting better today. This is why you don't see a lot of Linux notebooks or games.

Increasingly, search engines like Google and Yahoo are supporting a kind of HTML markup schema called microformats. Microformats are a kind of an embedded document inside a standard web page (which can be thought of as a sort of macroformat, if you will).
There are two flavors of microformats: one for HTML (which introduces no new elements or attributes) and one for XHTML (which adds a few new attributes, but no new tags). The proposed RDFa standard can be thought of as the flavor of microformats for XHTML, since it expresses RDF data through XHTML attributes. In reality, there are two groups of people (RDFa, microformats) working in the same space, so there is overlap. For now, however, the HTML vs. XHTML partitioning works well to categorize these two embedded document efforts.
There are a few things you can do with these embedded syntaxes. If you regularly review businesses or products (like yelp.com does), you might consider using the hReview microformat to identify parts of your review, as shown in the following snippet:
<div class="hreview">
<span class="item">
<span class="fn">Taskboy Feed Bag</span>
</span>
Reviewed by <span class="reviewer">Joe Johnston</span>
on
<span class="dtreviewed">
<span class="value-title" title="2010-02-07"></span>Feb 7, 2010
</span>
<span class="summary">Terrific news source for free</span>
<span class="description">The Feed Bag RSS aggregator on Taskboy
has replaced Google News as my one-stop shop for the news of what's
going on in tech and politics now.</span>
<span class="rating">4.5</span>
</div>
As you can see, that's a lot of semantic markup for so little visible text. Google can pull this code apart and display it under the link for the product or service discussed.
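Google's own extractor is not public, but a rough idea of what "pulling this code apart" involves can be had with Python's standard-library html.parser. The following is a minimal sketch, not a complete hReview implementation: it only knows the property names used in the snippet above, reads the date from the value-title pattern, and assumes reasonably well-formed markup. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Text-valued property names taken from the snippet above (not the full spec).
TEXT_PROPS = ("fn", "reviewer", "summary", "description", "rating")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class HReviewParser(HTMLParser):
    """Collect hReview properties from the first class="hreview" element."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # class lists of elements open inside the root
        self.in_review = False
        self.done = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if not self.in_review:
            if "hreview" in classes and not self.done:
                self.in_review = True
                self.open_classes = [classes]
            return
        # value-title pattern: the machine-readable date lives in the title attribute.
        if "value-title" in classes and any("dtreviewed" in c for c in self.open_classes):
            self.fields["dtreviewed"] = attrs.get("title")
        if tag not in VOID_TAGS:
            self.open_classes.append(classes)

    def handle_endtag(self, tag):
        if not self.in_review or tag in VOID_TAGS:
            return
        self.open_classes.pop()
        if not self.open_classes:  # the hreview root element just closed
            self.in_review = False
            self.done = True

    def handle_data(self, data):
        if not self.in_review:
            return
        # Attribute text to the innermost enclosing property, if any.
        for classes in reversed(self.open_classes):
            prop = next((c for c in classes if c in TEXT_PROPS), None)
            if prop:
                self.fields[prop] = self.fields.get(prop, "") + data
                break


def parse_hreview(html):
    parser = HReviewParser()
    parser.feed(html)
    parser.close()
    # Collapse the whitespace left by line breaks inside the markup.
    return {k: " ".join(v.split()) if v else v for k, v in parser.fields.items()}
```

Fed the snippet above, parse_hreview() would return a dictionary keyed by fn, reviewer, dtreviewed, summary, description and rating, which is roughly the information a search engine needs to show a review summary under a link.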
In addition to reviews, Google and Yahoo understand at least four other species of embedded documents: people/businesses, products, events, and video. Because there are two standards efforts at work, there are microformat and RDFa specs for most of them. The following list summarizes them.
- Reviews: hReview (microformat), v:Review (RDFa)
- People/Businesses: hCard/vCard (microformat), v:Person (RDFa)
- Events: hCalendar (microformat)
- Products: hProduct (microformat), v:Product (RDFa)
- Video: Facebook Share (microformat), Yahoo! SearchMonkey for video (RDFa)
To test your embedded document, try Google's microformat tool.
I have a few concerns about microformats. The first is that they require a lot of additional markup. I understand that a blogging system can create a form to collect these discrete features, but it still seems like a lot of work for casual use. The second concern is that microformats reuse the class attribute that is normally used for CSS. This creates a whole bunch of reserved words to avoid as class names in your site's CSS. Perhaps it's not that big a deal, but I do not like namespace conflicts. I prefer the RDFa spec, which simply introduces new attributes (typeof, property, etc.) specific to its purpose. That seems a lot cleaner to me. However, the various RDFa formats are not as well documented as their microformat counterparts. As in all things, "good enough" often trumps "clean design."
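To make the reserved-word concern concrete, here is a minimal Python sketch that scans a stylesheet for class selectors that collide with microformat class names. The reserved set below holds only the names used in the hReview snippet above; a real list would have to be assembled from the hReview, hCard and hCalendar specifications. The regex is a rough approximation of CSS class-selector syntax, and the function name is hypothetical.

```python
import re

# Class names with microformat meaning in the hReview snippet above (partial list).
MICROFORMAT_CLASSES = {
    "hreview", "item", "fn", "reviewer", "dtreviewed",
    "value-title", "summary", "description", "rating",
}

# Rough approximation of a CSS class selector: a dot followed by an identifier.
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)")
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def conflicting_classes(css_text):
    """Return sorted class selectors in css_text that reuse microformat class names."""
    without_comments = CSS_COMMENT.sub("", css_text)
    return sorted(set(CLASS_SELECTOR.findall(without_comments)) & MICROFORMAT_CLASSES)
```

Any name it reports is one where a site-wide style rule would also restyle embedded review markup, which is exactly the namespace overlap that RDFa's separate attributes avoid.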
There is no doubt that embedded documents are a bit of a moving target. I don't expect the formats for things already defined to change, but more objects will be described by new specifications.
About this blog
The taskboy blog is an exploration of computer technology by Joe Johnston. Topics of posts include practical examples in Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.

