Latest News

I tweeted yesterday about my centos 5 linux box losing sound (and frankly X, but that's another story). I tried to use the sound card detector (system-config-soundcard) to find the card again, but it silently failed (typical). The Linux sound howto is over nine years old, so that's pretty useless. What to do?
Well, I turned to my old friends lsmod, dmesg and strace. I could see that there were kernel sound modules loaded. I knew that I had a configuration that was working. So I issued the following command:
# strace mpg123 /path/to/mp3file
I got quite a bit of output, but the relevant line was this:
open("/dev/snd/pcmC0D0p", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)
And sure enough, this is what was in my /dev/snd directory:
[root@durgan snd]# ls
controlC0  hwC1D2    pcmC1D0c  pcmC1D1p  timer
controlC1  pcmC0D0c  pcmC1D0p  seq
Oh look! There's no pcmC0D0p file. What to do? I could try to create the missing dev file, but I just symlinked pcmC1D0p to pcmC0D0p.
And yes, this horrible, horrible hack worked like a charm.
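For the record, here is a minimal sketch of that symlink trick in Python. It runs against a scratch directory rather than the real /dev/snd, and the helper name alias_device is made up for illustration. On the real box you would point it at /dev/snd as root, and the link may well vanish on reboot if the device directory is regenerated.

```python
import os
import tempfile


def alias_device(directory, existing, missing):
    """Create `missing` as a symlink to `existing` inside `directory`.

    Hypothetical helper: leaves an already-present `missing` entry alone
    and returns the path of the link.
    """
    target = os.path.join(directory, existing)
    link = os.path.join(directory, missing)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if not os.path.lexists(link):
        os.symlink(target, link)
    return link


# Demonstrate in a scratch directory instead of touching /dev/snd.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "pcmC1D0p"), "w").close()

link = alias_device(scratch, "pcmC1D0p", "pcmC0D0p")
print(os.path.realpath(link))
```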
What happened to delete this device file? Why couldn't I make the system re-detect and re-initialize the sound system? It's stupid, stupid user issues like this that have plagued linux since 1995 and show no signs of getting better today. This is why you don't see a lot of linux notebooks or games.

Increasingly, search engines like Google and Yahoo are supporting a kind of HTML markup schema called microformats. Microformats are a kind of embedded document inside a standard web page (which can be thought of as a sort of macroformat, if you will).
There are two flavors of microformats: one for HTML (which introduces no new elements or attributes) and one for XHTML (which adds a few new attributes, but no new tags). The proposed RDFa standard can be thought of as the flavor of microformats for XHTML, since it is based on the RDF XML format. The reality is that there are two groups of people (RDFa, microformats) working in the same space, so there is overlap. However, for now, the HTML vs. XHTML partitioning works well to categorize these two embedded document efforts.
There are a few things you can do with these embedded syntaxes. If you regularly review businesses or products (like yelp.com does), you might consider using the hReview microformat to identify parts of your review, as shown in the following snippet:
<div class="hreview">
<span class="item">
<span class="fn">Taskboy Feed Bag</span>
</span>
Reviewed by <span class="reviewer">Joe Johnston</span>
on
<span class="dtreviewed">
<span class="value-title" title="2010-02-07"></span>Feb 7, 2010
</span>
<span class="summary">Terrific news source for free</span>
<span class="description">The Feed Bag RSS aggregator on Taskboy
has replaced Google News as my one-stop shop to get the news of what's
going on in tech and politics now.</span>
<span class="rating">4.5</span>
</div>
As you can see, that's a lot of semantic markup for so little visible text. Google can pull this code apart and display it under the link for the product or service discussed.
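To make "pull this code apart" concrete, here is a rough sketch of how a consumer might read hReview fields using nothing but Python's standard html.parser. This is not how Google actually does it; the HReviewParser class and the FIELDS set are illustrative names, and the sketch only understands span elements and the value-title pattern used in the snippet above.

```python
from html.parser import HTMLParser

# hReview class names this sketch cares about (an illustrative subset).
FIELDS = {"fn", "reviewer", "dtreviewed", "summary", "description", "rating"}


class HReviewParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}       # field name -> extracted text
        self._open = []        # field name (or None) for each open <span>
        self._frozen = set()   # fields whose value came from a value-title

    def _current(self):
        # Innermost open span that carries a known field name.
        for name in reversed(self._open):
            if name:
                return name
        return None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        current = self._current()
        if "value-title" in classes and current:
            # The machine-readable value lives in the title attribute.
            self.fields[current] = attrs.get("title", "")
            self._frozen.add(current)
        self._open.append(next((c for c in classes if c in FIELDS), None))

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        name = self._current()
        if name and name not in self._frozen:
            self.fields[name] = self.fields.get(name, "") + data


sample = """
<div class="hreview">
<span class="item"><span class="fn">Taskboy Feed Bag</span></span>
Reviewed by <span class="reviewer">Joe Johnston</span> on
<span class="dtreviewed">
<span class="value-title" title="2010-02-07"></span>Feb 7, 2010
</span>
<span class="summary">Terrific news source for free</span>
<span class="rating">4.5</span>
</div>
"""

parser = HReviewParser()
parser.feed(sample)
# Collapse runs of whitespace left over from the markup's line breaks.
review = {k: " ".join(v.split()) for k, v in parser.fields.items()}
print(review)
```

Note that the date comes back as the title value (2010-02-07) rather than the human-readable text, which is the whole point of the value-title trick.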
In addition to reviews, Google and Yahoo understand at least four other species of embedded documents: people/businesses, products, events, and video. Because there are two standards bodies at work, there are microformats and RDFa specs for each. The following table summarizes this with links to give examples of these embedded documents.
- Reviews: hReview (microformat), v:Review (RDFa)
- People/Business: hCard/vCard (microformat), v:Person (RDFa)
- Events: hCalendar (microformat)
- Product: hProduct (microformat), v:Product (RDFa)
- Video: Facebook Share (microformat), Yahoo! SearchMonkey (for video) (RDFa)
To test your embedded document, try Google's microformat tool.
I have a few concerns about microformats. The first is that they require a lot of additional markup. I understand that a blogging system can create a form to collect these discrete features, but it still seems like a lot of work for casual use. The second concern I have is that microformats reuse the class attribute that is normally used for CSS. This creates a whole bunch of reserved words to avoid for class names in your site's CSS. Perhaps it's not that big a deal, but I do not like namespace conflicts. I prefer the RDFa spec, which simply introduces new attributes (typeof, property, etc.) specific to its purpose. That seems a lot cleaner to me. However, the various RDFa formats are not as well documented as their microformat counterparts. As in all things, "good enough" often trumps "clean design."
There is no doubt that embedded documents are a bit of a moving target. I don't expect the formats for things already defined to change, but more objects will be described by new specifications.

I'm currently developing a social media application using PHP and sqlite. I don't know if I'll deploy with sqlite, but for development, it works well. I have two CVS sandboxes that I work in for this project. One of these is on my macbook (which comes with sqlite-enabled PHP) and the other is on an Ubuntu virtual machine. There are a few gotchas to be aware of when using sqlite in this kind of environment.
Write-protected directories
In other RDBMSs like mysql or postgres, there is a server process that is responsible for reading and writing to the database disk files. With sqlite, this isn't the case. If you're using sqlite through PHP, then the process owner running the PHP must be able to read and write to the location of the sqlite database file. This requirement gets a little more complex when you are running PHP through apache, which has its own ideas about directory security.
In most apache setups, your PHP scripts will not be able to write in the web-accessible "document root", so you will not be able to keep your sqlite database file there. If you try, you will find errors on INSERTs and UPDATEs about "database file is locked." However, it is foolish from a security point of view to keep your database file in a web-accessible directory anyway.
This particular problem hit me hard on Mac OS X. Once user directories are enabled in the apache configuration, your Sites directory becomes a document root. You won't be able to keep your sqlite files in that directory or any subdirectory under it. Instead, create a ~/tmp directory and keep the sqlite database file there.
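The permission requirement is easy to check before you blame your SQL. Here is a small sketch in Python (the same idea applies to PHP); can_host_sqlite is a made-up helper name and the temp directory stands in for something like ~/tmp. It checks the directory as well as the file because, in its default journal mode, sqlite creates a journal file next to the database when it writes.

```python
import os
import sqlite3
import tempfile


def can_host_sqlite(db_path):
    """Return True if this process can read and write both the database
    file (if it already exists) and the directory that holds it."""
    directory = os.path.dirname(os.path.abspath(db_path))
    if not os.access(directory, os.R_OK | os.W_OK | os.X_OK):
        return False
    if os.path.exists(db_path) and not os.access(db_path, os.R_OK | os.W_OK):
        return False
    return True


# A scratch directory stands in for a location outside the document root.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

count = None
if can_host_sqlite(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute("INSERT INTO posts (body) VALUES (?)", ("hello",))
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    conn.close()
print(count)
```

Run the check as the same user the web server runs as; a result from your own login shell tells you nothing about what apache is allowed to do.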
SQLite version skew: apache/shell
Because sqlite is an embedded system, it is compiled into the program you are using. If you are using PHP, you can run into the following issue. I use PHP from the command line all the time. When I create the database, I run a PHP script from the shell to do this. Unfortunately, the command line PHP and the version of PHP compiled into apache may not be the same. Further, the apache PHP may not be compiled with the same version of sqlite. This is the case on Mac OS X. What a mess!
To get around this, always create your sqlite databases through apache/PHP. You will run into far fewer issues this way.
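If you suspect this kind of skew, ask each environment which sqlite it was built against and compare the answers. The sketch below does this in Python as an illustration; sqlite_version_report is a hypothetical helper, and you would need the equivalent check in both your shell PHP and your apache PHP. It reports the library version the runtime is linked against and the version the engine itself returns from the SQL function sqlite_version().

```python
import sqlite3


def sqlite_version_report():
    """Report the sqlite version from the binding and from the engine."""
    conn = sqlite3.connect(":memory:")
    try:
        engine = conn.execute("SELECT sqlite_version()").fetchone()[0]
    finally:
        conn.close()
    return {"library": sqlite3.sqlite_version, "engine": engine}


report = sqlite_version_report()
print(report)
```

Within one process the two values agree. The interesting comparison is between the output from the command line and the output from a page served by the web server.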
Changing the schema requires an apache restart
Recall that apache is a pre-forking server. If you change the schema of your sqlite database while apache's running, you could get an error in PHP that "schema has changed." Whatever SQL statement you were attempting to run will fail.
From the "don't do that" school of medicine comes this technical advice. If you need to change the schema of an sqlite database, shut down apache first, update the database and restart apache.
I hope this post helps others avoid the mistakes I made.

Facebook will shortly release a tool called HipHop for enhancing the performance of PHP. My understanding of the tool is that it compiles PHP code into C++ which is then compiled into a system native executable. While I have no doubt that this tool does produce significant speed gains over apache/PHP, I do think one needs to be aware of the trade-offs of this kind of system. After all, this isn't the first time a trick like this has been used for a dynamic language.
C++ and PHP are very different languages. I'm not talking about syntax, but about how source code is handled. In PHP, the source code is turned into op codes that the PHP interpreter understands. The interpreter knows how to make the operating system perform these op codes. In C++, source code is compiled into assembler, which is then linked into a system executable that can be run from the shell. Compiled code runs faster than interpreted code for a number of reasons, but the most important is that compiled code is closest to native assembler, which essentially is the op code system that the host CPU uses to make stuff happen.
The problem with compiling PHP into C++ is that you lose all the wonderful dynamic features of PHP since these cannot be easily or efficiently translated automatically into C++ source code. The very dynamic nature of PHP (or Perl or Ruby or Python, etc) is what makes these languages accelerate programmer productivity. I think facebook will see this performance hit later.
Let's not forget that Moore's law of CPU power often solves a great many performance issues. Hardware is always cheaper than developer time and less prone to bugs.
I favor architectures that take advantage of Moore's law and use horizontal scaling and commodity solutions over fancier tricks that require specialized talent (like erlang). I might suggest caching the opcodes that the PHP interpreter generates and simply running those. This is the essence of the Zend server and how apache/mod_perl/Apache::Registry work. Sure, you don't get quite the performance of compiled code, but you'll still see a noticeable boost. I believe PHP does some level of this kind of caching right now.
It's true that one can do amazing feats by being clever, but clever doesn't scale (unless you're Google).

An interesting chart comparing various PHP frameworks. I'm not sure that I can read it correctly. It seems to imply that Zend and CakePHP are the most popular frameworks.
Both frameworks are free, but Zend is clearly optimized for the Zend server platform, which isn't free. Also, I can't help thinking that the audience is somewhat different for these two. CakePHP seems aimed at the more opensource, DIY crowd while Zend is clearly pointed to the enterprise IT crowd. While there is overlap, you can see that Zend is a commercial venture.
I have very mixed emotions about using frameworks. On the one hand, frameworks deliver huge dollops of functionality right out of the box. This accelerates the completion of many IT projects. On the other, you get locked into another group's development schedule and, to some extent, the architectural choices they make. Projects built with these tools also expose themselves to bugs and security holes originating in the frameworks. Finally, you end up having to trust or vet the code in the framework.
For an inward-facing intranet product, I think frameworks are great. I'm not sure I'd want to launch something like twitter or facebook with one.
Current Status
Successfully installed and booted an XP guest on linux KVM. Seems stable, but let's give it a few weeks.
Posted: Mon Feb 08 22:23:46 +0000 2010
--Via identi.ca
About this blog
The taskboy blog is an exploration of computer technology by Joe Johnston. Topics of posts include practical examples in Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.

