Latest News
M3U files are more commonly known as MP3 playlist files. These are simple text files that contain URLs to MP3 files served by an HTTP server. These files may contain additional metadata that MP3 players (like Winamp) can use for display purposes. A few months ago, I built a simple playlist server in Perl so that I could listen selectively to my vast MP3 collection. You may find the entire source code for this playlist server, called Pixie, here. It has been tested under both Windows and Linux, but should work on Mac OS X too.
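To make the format concrete, here is a minimal Perl sketch that parses playlist text into URLs plus optional metadata. It assumes the common extended-M3U convention (an #EXTM3U header and an #EXTINF:<seconds>,<title> line before a URL); parse_m3u() is a hypothetical helper, not part of Pixie, and the sample URLs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: parse M3U text into a list of hashes.
# An "#EXTINF:<seconds>,<title>" line describes the URL that follows it;
# every other line starting with "#" is treated as a comment.
sub parse_m3u {
    my ($text) = @_;
    my (@tracks, %pending);
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /\S/;                 # skip blank lines
        if ($line =~ /^#EXTINF:(-?\d+),(.*)$/) {
            %pending = (seconds => $1, title => $2);
        } elsif ($line !~ /^#/) {
            push @tracks, { url => $line, %pending };
            %pending = ();                         # metadata applies to one URL
        }
    }
    return @tracks;
}

my $m3u = <<'END';
#EXTM3U
#EXTINF:215,Some Band - Some Song
http://localhost:8080/Rock/song.mp3
http://localhost:8080/Jazz/tune.mp3
END

for my $t (parse_m3u($m3u)) {
    printf "%s (%s)\n", $t->{url}, $t->{title} // 'no title';
}
```

Only the URL lines are required; players that find no #EXTINF line simply fall back to showing the file name.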
At its heart, Pixie is simply an embedded HTTP server. It serves four kinds of pages: an M3U playlist file, a CSS file, the HTML music selection page and individual MP3 files. In addition, it provides two HTTP services that are essential to this process: adding MP3s to the current playlist and clearing the list entirely. There can be only one playlist per user.
When a user first points a web browser to the URL belonging to Pixie, a page is presented with all the directories and MP3 files found in the top level of the directory specified by the "-d" parameter. In my case, that's the M:/mp3 folder.
Folders may be traversed and the assets of those directories may be added to the playlist. Notice that there is a crumb trail at the top of the page that leads you back to the root directory.
After a few music assets are selected, the current playlist is displayed. Notice that the assets come from different directories.
To listen to the playlist, simply click "Play now" in the Current Playlist section. What could be easier?
The pixie.pl script is somewhat long. It clocks in at 447 lines, though that count includes a small usage screen, a CSS file and an HTML template for the directory listing pages. That is a little long for a blow-by-blow description of each line of code, but a few points about it should prove illuminating for those wanting to write their own HTTP servers in Perl.
It may be useful to know that I structured the HTTP part of this code on the mod_perl/Apache model. That is, there are some global variables available to the functions that handle HTTP responses. The heart of the server can be seen in the relatively small mainline code below:
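my $S = HTTP::Daemon->new(LocalPort => $Opts->{p},
                          Reuse     => 1,
                          Listen    => 5,
                          timeout   => 10,
                         ) or die "Can't start server: $@";
# Handle one client connection at a time
while (my $c = $S->accept) {
    Log("Connection from: " . $c->peerhost);
    # A client may send several requests over one keep-alive connection
    while (my $r = $c->get_request) {
        # Publish the current request to the handlers (mod_perl style)
        $This_Request    = $r;
        $This_Connection = $c;
        handle_request();
    }
}
exit 0;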
This code snippet starts with a fairly standard instantiation of an HTTP::Daemon object, which is itself a subclass of IO::Socket. For servers, it is important to set the Reuse parameter, which allows the TCP port to be reused quickly after the last process has exited. Without this parameter, you'll find that you cannot restart a script on the same port until an OS-specific "cooldown" period (the TCP TIME_WAIT state) has passed.
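A quick way to see the option at work is a bare IO::Socket::INET listener. In that module, Reuse is an older alias for ReuseAddr, and both set the SO_REUSEADDR socket option. This sketch (open_listener() is a hypothetical helper) binds a free port, closes it and binds the same port again at once. It only exercises the option: a listener that never accepted a connection typically has no TIME_WAIT state to wait out anyway.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: open a listening TCP socket on 127.0.0.1.
# ReuseAddr => 1 sets SO_REUSEADDR, so a restarted server can bind a
# port whose old connections are still in TIME_WAIT.
sub open_listener {
    my ($port) = @_;
    my $sock = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "Cannot listen on port $port: $@";
    return $sock;
}

my $first  = open_listener(0);       # port 0: let the OS pick a free port
my $port   = $first->sockport;
close $first;                        # simulate the old server exiting
my $second = open_listener($port);   # bind the same port again right away
print "Listening again on port $port\n";
```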
With the server socket in place, Pixie waits for new client connections in an accept loop. From each client socket, HTTP::Request objects can be obtained. Both of these important objects are stored in global variables for use by handle_request() and later functions. Why not pass these objects into handle_request()? It turns out that these objects are useful in all kinds of places, and passing them explicitly gets to be a bit onerous. Let's look at handle_request().
sub handle_request {
    my ($c, $r) = ($This_Connection, $This_Request);
    if ($r->method ne 'GET') {
        # Only GET is supported; "return" (not "next") leaves the sub cleanly
        return $c->send_error(HTTP_FORBIDDEN);
    }
    my $path  = $r->uri->path;
    my @query = $r->uri->query_form;
    if ($path eq '/serve.m3u') {
        # Assemble this session's selections
        # into an m3u and serve that file
        do_serve_playlist($c, $r);
    } elsif ($path eq "/clear") {
        # Clear playlist
        do_clear_playlist($c, $r);
    } elsif ($path eq "/pixie.css") {
        do_serve_css($c, $r);
    } elsif (@query) {
        # Could be an add request
        # Set a cookie, if needed
        do_add_asset($c, $r);
    } else {
        do_browse($c, $r);
    }
}
This function can be thought of as trampoline code. Its only job is to route the request to the right routine, which I call a "page handler". Page handlers are functions whose names all start with "do_" and which are responsible for actually sending an HTTP response with content.
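One common alternative to an if/elsif trampoline is a dispatch table. The sketch below is not Pixie's code: route() and %Routes are hypothetical, and the handlers just return a label where Pixie would call its do_ functions. It keeps the precedence of handle_request(): exact paths first, then any request with a query string, then browsing.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: exact paths map to page handlers.
# These handlers only return a label; in Pixie they would call
# do_serve_playlist(), do_clear_playlist() and do_serve_css().
my %Routes = (
    '/serve.m3u' => sub { 'playlist' },
    '/clear'     => sub { 'clear' },
    '/pixie.css' => sub { 'css' },
);

# Same order as handle_request(): exact path, then "has a query
# string" (an add request), then browse.
sub route {
    my ($path, $has_query) = @_;
    return $Routes{$path} if exists $Routes{$path};
    return sub { 'add' }  if $has_query;
    return sub { 'browse' };
}

print route('/clear', 0)->(), "\n";    # prints "clear"
print route('/Rock/', 1)->(), "\n";    # prints "add"
print route('/Rock/', 0)->(), "\n";    # prints "browse"
```

Adding a page then means adding a hash entry rather than another elsif branch.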
The function handle_request() does its routing based on a quick analysis of the current request. Every HTTP::Request object contains an initialized URI object. The URI object breaks the requested URL into logical parts and saves us from writing custom parsing code. Notice that two paths look like they reference real files: pixie.css and serve.m3u. However, this is an illusion. All web servers can be thought of as file system proxies, and as with all proxies, you can never be quite sure how the resource you are requesting is stored on the back end.
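Here is what the URI object hands back for a hypothetical add request (the host, port, path and base64 values are invented for illustration). Note that path() returns the path still percent-encoded, which is why do_browse() has to decode it, while query_form() returns the parameters already unescaped as a flat key/value list.

```perl
use strict;
use warnings;
use URI;
use URI::Escape qw(uri_unescape);

# A hypothetical add request: a directory path plus two "a" parameters.
my $uri = URI->new('http://localhost:8080/Rock/Band%20Name/?a=TWFpbg%3D%3D&a=QmVl');

my $path  = $uri->path;          # path as sent, still percent-encoded
my @query = $uri->query_form;    # (key, value, key, value, ...), unescaped

print "path:    $path\n";                        # /Rock/Band%20Name/
print "decoded: ", uri_unescape($path), "\n";    # /Rock/Band Name/
print "query:   @query\n";                       # a TWFpbg== a QmVl
```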
There is also a magic path, "/clear", that signals the server to clear the current playlist from memory. Only one function does HTML form handling, because there is only one form and it only adds MP3 files to the current playlist. If none of these conditions is met, do_browse() is called, which serves either a specific file or a directory listing. I'd like to turn to this function next, since it contains the HTTP cookie handling.
sub do_browse {
    my ($c, $r, $cookie) = @_;
    my $path = urldecode($r->uri->path);
    # Refuse attempts to climb out of the MP3 root
    if ($path =~ /\.\./) {
        return $c->send_error(HTTP_FORBIDDEN);
    }
    my $fs = get_fs();
    my $real_dir = $Opts->{d} . $path;
    $real_dir =~ s!/!$fs!g;
    my $res = HTTP::Response->new(HTTP_OK);
    if (-d $real_dir) {
        $res->header("Content-type" => "text/html");
        if ($cookie) {
            $res->header("Set-Cookie" => "sid=$cookie; path=/");
        } else {
            my $sid = get_sid($r);
            if ($sid && !exists $Sessions{$sid}) {
                Log("Can't find SID '$sid' in: "
                    . join(", ", keys %Sessions)) if $DEBUG;
                # Expire the stale cookie; its path must match the one it was set with
                my $epoch = "Wed, 31-Dec-1969 01:00:00 GMT";
                $res->header("Set-Cookie" => "sid=$sid; path=/; expires=$epoch");
                Log("Deleting old cookie '$sid'");
            }
        }
        $res->content(make_page($real_dir,
                                $path,
                                ($cookie || get_sid($r))));
        return $c->send_response($res);
    } elsif (-e $real_dir) {
        # Serve the real file (e.g. the MP3 itself)
        return $c->send_file_response($real_dir);
    } else {
        return $c->send_error(HTTP_FORBIDDEN);
    }
}
This page handler is the most complicated one because it must decide whether the requested path is valid, whether a cookie needs to be set or removed, and whether a file or a directory listing needs to be sent. Let's start at the beginning.
The path in the URI may need URL decoding, so that is done first. Next, a quick sanity check makes sure the request isn't attempting to get a resource the server isn't meant to serve; the parent-directory ("..") URL hack was a common exploit in early web servers. Then all directory separators are converted to the one appropriate for the OS. Whatever happens next will require a new HTTP::Response object, so one is created.
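urldecode() and get_fs() aren't shown in the post, so here is a hedged sketch of the same two steps using URI::Escape and File::Spec; resolve_path() is hypothetical. Decoding before checking means an encoded "%2E%2E" is still caught, and testing whole path segments (splitting on both "/" and "\") rejects ".." without also rejecting a legitimate file name such as "Live..mp3", which the substring test /\.\./ would refuse.

```perl
use strict;
use warnings;
use File::Spec;
use URI::Escape qw(uri_unescape);

# Hypothetical helper: URL-decode a request path and map it onto the
# MP3 root. Returns nothing (undef in scalar context) if any segment
# is "..". Both "/" and "\" count as separators, for Windows' sake.
sub resolve_path {
    my ($root, $url_path) = @_;
    my $path  = uri_unescape($url_path);
    my @parts = grep { length } split m{[/\\]}, $path;
    return if grep { $_ eq '..' } @parts;
    return File::Spec->catfile($root, @parts);   # OS-appropriate separators
}

my $ok  = resolve_path('/mp3', '/Rock/Band%20Name/song.mp3');
my $bad = resolve_path('/mp3', '/Rock/%2E%2E/%2E%2E/etc');
print "$ok\n";
print defined $bad ? "allowed\n" : "refused\n";   # refused
```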
If the path refers to a directory, a directory listing is required. Directory listings are generated by the make_page() function. The content type is set in the response object, since Pixie will send some kind of HTML. If a session ID was passed in (do_add_asset() does this), the Pixie cookie is set with that ID. If the browser's cookie has a session ID but the server has no record of it, the cookie is deleted from the browser; that is to say, a new cookie is sent with an expiration date in the past.
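Here is a minimal sketch of building such a deletion header with HTTP::Response and HTTP::Date; the session ID is made up. The detail that matters is that a browser only replaces a cookie whose name and path match, so the expiring cookie carries the same path=/ the original was set with.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Status qw(HTTP_OK);
use HTTP::Date qw(time2str);

my $sid = '1252345678';                  # example stale session ID
my $res = HTTP::Response->new(HTTP_OK);

# Same name and path as the cookie being deleted, with an expiry in the
# past: time2str(0) formats the Unix epoch as an HTTP date.
$res->header('Set-Cookie' => "sid=$sid; path=/; expires=" . time2str(0));

print $res->header('Set-Cookie'), "\n";
# sid=1252345678; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT
```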
I've glossed over the details of Pixie session management in the above paragraph. When a user builds a playlist, the list needs to be kept somewhere. Pixie stores this list in server memory. Each list is assigned a number, which is its session ID. This ID is passed to the client in an HTTP cookie, and every time the client makes a request, the cookie is passed back to Pixie. A global hash table called %Sessions stores the association between ID and playlist.
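get_sid() isn't shown in the post. A plausible version, sketched here as an assumption, pulls the sid value out of the request's Cookie header. The sample data also shows the shape %Sessions is described as having: a session ID mapped to a hash of MP3 paths whose values record selection order, as do_add_asset() stores them.

```perl
use strict;
use warnings;
use HTTP::Request;

our %Sessions;   # session ID => { mp3 path => selection order }

# Hypothetical get_sid(): return the "sid" value from the request's
# Cookie header, or nothing if the browser sent no such cookie.
sub get_sid {
    my ($r) = @_;
    my $cookies = $r->header('Cookie') // '';
    for my $pair (split /;\s*/, $cookies) {
        my ($name, $value) = split /=/, $pair, 2;
        return $value if defined $value && $name eq 'sid';
    }
    return;
}

my $r = HTTP::Request->new(GET => 'http://localhost/');
$r->header(Cookie => 'theme=dark; sid=1252345678');
my $sid = get_sid($r);
$Sessions{$sid} = { '/Rock/a.mp3' => 1, '/Jazz/b.mp3' => 2 };
```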
To finish off do_browse(): if the path of the request points to a real file, it is served without further sanity checking. There is definitely room for improvement here in terms of security. The next page handler of interest is the one that handles requests to add files to the current playlist: do_add_asset().
sub do_add_asset {
    my ($c, $r) = @_;
    my $path  = $r->uri->path;
    my @query = $r->uri->query_form;
    # Is there a cookie for a session we still know about?
    my $sid = get_sid($r);
    unless ($sid && exists $Sessions{$sid}) {
        Log("Could not find SID '" . ($sid // '') . "' in: "
            . join(", ", keys %Sessions)) if $DEBUG;
        $sid = time();
        Log("Creating new SID '$sid'") if $DEBUG;
    }
    # For all the "a" params,
    # base64 decode and add to Sessions hash
    for (my $i = 0; $i < @query; $i += 2) {
        if ($query[$i] eq "a") {
            # retain order through value
            my $cnt = scalar keys %{$Sessions{$sid}};
            $Sessions{$sid}->{decode_base64($query[$i+1])} = ++$cnt;
        }
    }
    # "%%" prints a literal "%"; a lone "%S" is an invalid sprintf conversion
    Log(sprintf("%%Sessions has %d keys\n",
                (scalar keys %Sessions))) if $DEBUG;
    return do_browse($c, $r, $sid);
}
Much of the first part of this routine should be familiar by now. What's interesting is that if no valid session ID is found, a new one is created based on the epoch time. If security is a concern, you should use a different method to generate IDs, such as UUIDs; time-based IDs are easy to guess, and two users who start a session in the same second would share one. In any case, for each "a" parameter in the request (which is to say, each MP3 file path), the path is decoded from base64 and added to the sessions hash. This is complicated by wanting to preserve the order in which the songs are selected; that order is stored as the hash value for each path. Let's see how the actual playlist files are served.
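As a sketch of a less guessable ID (new_sid() is hypothetical, and this is a swapped-in technique rather than Pixie's own), the code below hashes 16 bytes from /dev/urandom with the core Digest::SHA module and retries in the unlikely event of a collision. Windows has no /dev/urandom, so there it falls back to hashing the time, process ID and rand(), which avoids the same-second collision but is not cryptographically strong.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical new_sid(): returns a 40-character hex ID that is not
# already a key of %$sessions.
sub new_sid {
    my ($sessions) = @_;
    my $sid;
    do {
        my $seed;
        # Prefer the OS random source where it exists (Linux, Mac OS X).
        if (open(my $fh, '<:raw', '/dev/urandom')) {
            my $got = read($fh, $seed, 16);
            undef $seed unless defined $got && $got == 16;
            close $fh;
        }
        # Fallback (e.g. Windows): unique enough, NOT cryptographically strong.
        $seed = join(':', time(), $$, rand(), scalar(keys %$sessions))
            unless defined $seed;
        $sid = sha1_hex($seed);
    } while (exists $sessions->{$sid});
    return $sid;
}

my %Sessions;
my $new = new_sid(\%Sessions);
print "New session ID: $new\n";
```

In do_add_asset(), $sid = time() would then become $sid = new_sid(\%Sessions).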
sub do_serve_playlist {
    my ($c, $r) = @_;
    my $sid = get_sid($r);
    if (!$sid || !defined $Sessions{$sid}) {
        # No playlist for this browser: show the top-level listing instead
        $r->uri->path("/");
        return do_browse($c, $r);
    }
    my $res = HTTP::Response->new(HTTP_OK);
    my @files = map {$Base_URL . $_} get_sorted_playlist($sid);
    my $out = make_playlist(@files);
    $res->header("Content-type" => "audio/x-mpegurl");
    $res->header("Content-Length" => length($out));
    $res->content($out);
    $c->send_response($res);
    # Shut down both directions so the browser knows the file is complete
    $c->shutdown(2);
    return;
}
The trickiest part about serving the playlist is getting the MIME type right. The MIME type gives the browser a hint about the kind of file being served and which external application it should use for it. Creating the playlist file is handled by make_playlist() and is pretty straightforward. Note the use of the draconian shutdown(2) on the client socket. I found on Windows that without this call, Winamp never launched. By shutting down both directions of the client socket, the server signals that the entire file has been sent, so the browser knows it is safe to launch the external program.
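get_sorted_playlist() and make_playlist() aren't listed in the post, so this is only a guess at their shape, with made-up session data and base URL. The order comes from the counters do_add_asset() stores as hash values, and the playlist is one URL per line under an #EXTM3U header.

```perl
use strict;
use warnings;

our %Sessions = (
    42 => { 'Rock/b.mp3' => 2, 'Jazz/a.mp3' => 1, 'Rock/c.mp3' => 3 },
);

# Hypothetical get_sorted_playlist(): paths ordered by the counter
# stored as each hash value, i.e. the order they were selected.
sub get_sorted_playlist {
    my ($sid) = @_;
    my $list = $Sessions{$sid} || {};
    return sort { $list->{$a} <=> $list->{$b} } keys %$list;
}

# Hypothetical make_playlist(): header line plus one URL per line.
sub make_playlist {
    my (@urls) = @_;
    return join '', "#EXTM3U\n", map { "$_\n" } @urls;
}

my $Base_URL = 'http://localhost:8080/';
print make_playlist(map { $Base_URL . $_ } get_sorted_playlist(42));
```

Real MP3 paths containing spaces or other special characters would also need URL escaping (URI::Escape's uri_escape, for instance) before being appended to $Base_URL; the post doesn't show whether Pixie does this.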
An interesting feature of Pixie is that the look and feel of the directory listings can be controlled with an external CSS file. Simply create a pixie.css file in the root of the MP3 directory and go to town. You can see what the default CSS file looks like simply by pointing your browser to http://localhost:[pixieport]/pixie.css.
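do_serve_css() isn't shown either. A minimal sketch of the override behavior described above might look like this; load_css() is hypothetical, and $DEFAULT_CSS is a placeholder standing in for the stylesheet embedded in pixie.pl.

```perl
use strict;
use warnings;
use File::Spec;

# Placeholder for the default stylesheet built into pixie.pl.
my $DEFAULT_CSS = "body { font-family: sans-serif; }\n";

# Hypothetical helper: prefer <MP3 root>/pixie.css when it exists,
# otherwise fall back to the built-in default.
sub load_css {
    my ($root) = @_;
    my $file = File::Spec->catfile($root, 'pixie.css');
    if (-f $file and open(my $fh, '<', $file)) {
        local $/;                 # slurp mode: read the whole file
        my $css = <$fh>;
        close $fh;
        return $css;
    }
    return $DEFAULT_CSS;
}
```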
Finally, there is ample room for improvement in the Pixie server. A number of security enhancements could be made to ensure that only authorized files are sent. Pixie is also a single-threaded application and does not handle concurrency at all.
Concurrency is a pretty thorny issue to get right for a platform-neutral server. The core of the issue is the way Perl handles sockets and filehandles. On Linux, I would fork a new process for each new client connection, which is a very clean way to make Pixie more responsive. Child processes inherit the open filehandles of the parent, so sockets can be handled independently in each process. On Windows, however, the fork() builtin merely emulates forking with threads. Unfortunately, since sockets look like filehandles, closing the client socket in the parent after fork (which is what you'd do on Linux) also closes it in the child. It's not clear to me what solution would work here. I thought perhaps IO::Select would be a good choice, but I suspect that sending music files would almost always block the directory listing traffic. I suppose this is a scaling mystery to be solved on another day.
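For the Linux case, a forking version of the main loop might look like the sketch below; serve_forking() is hypothetical and assumes a POSIX system. The child closes the listening socket and serves only its client, while the parent closes its copy of the client socket, which is exactly the step the paragraph says misbehaves under Windows' emulated fork(). One consequence worth noting: %Sessions lives in each process's own memory, so playlist changes made in a child would be lost when it exits, and sessions would need to move to shared storage, such as a file, for this to work with Pixie.

```perl
use strict;
use warnings;
use HTTP::Daemon;

# Let the kernel reap finished children (POSIX systems such as Linux).
$SIG{CHLD} = 'IGNORE';

# Hypothetical forking variant of Pixie's main loop: each client
# connection gets its own child process. $handler receives ($conn, $request).
sub serve_forking {
    my ($daemon, $handler) = @_;
    while (my $c = $daemon->accept) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: stop listening and serve only this connection.
            $daemon->close;
            while (my $r = $c->get_request) {
                $handler->($c, $r);
            }
            $c->close;
            exit 0;
        }
        # Parent: drop its copy of the client socket and keep accepting.
        $c->close;
    }
}

# Usage (not run here):
# my $d = HTTP::Daemon->new(LocalPort => 8080, ReuseAddr => 1) or die $@;
# serve_forking($d, sub { my ($c, $r) = @_; $c->send_error(403) });
```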
For a long time, I ignored the Representational State Transfer (REST) architecture. For one thing, I don't particularly agree with its premise that remote procedure calls (RPC) that use HTTP as a transport mechanism should obey the same semantics as regular web traffic. Things like XML-RPC and SOAP are, to my thinking, happening on an entirely different layer of the application stack than HTTP. Indeed, there are implementations of XML-RPC that do not use HTTP at all.
I remember witnessing some pretty heated arguments at tech conferences in the early 2000s about this seemingly unimportant technical point. For REST adherents, web services are just another form of web traffic and should be treated as such. Given that Twitter, Facebook and Bit.ly all use REST for their APIs while older apps like LiveJournal use XML-RPC/SOAP, I guess REST is the new hotness.
I've recently had reason to interact with the Twitter and Bit.ly APIs, which has made me come to terms with REST RPC mechanisms. I admit, the sad, sick part of me that enjoys playing around with low-level HTTP stuff finds satisfaction in the way these APIs leverage existing HTTP features like basic authentication, extra path info, and GET and POST semantics. In this post, I thought I would show a bit of Perl code I wrote to post status updates to Twitter, an activity more commonly referred to as "tweeting."
Twitter's API documentation is relatively straightforward, if you already have a solid grounding in HTTP. The API call for tweeting is "statuses/update". The basics of the RPC mechanism are easy enough:
- The caller makes an HTTP GET or POST request
- The server replies with content in the form of JSON or XML
Let's start with the request. Several bits of information are required by the API: user credentials, the URL and additional query parameters. The user credentials are passed in the HTTP request header as a basic authentication field, which is merely the base64 encoding of your Twitter username and password joined by a colon. Fortunately, Perl's HTTP::Request objects (through their authorization_basic() method) make it easy to add basic auth credentials to a request without knowing how this information is encoded.
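To see that the header really is just base64 of "username:password", this sketch compares what authorization_basic() produces with a header built by hand using MIME::Base64. The credentials are the same placeholder ones used in the code below; nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Request;
use MIME::Base64 qw(encode_base64);

my $req = HTTP::Request->new(POST => 'http://twitter.com/statuses/update.xml');
$req->authorization_basic('taskboy3000' => 's3cr3t');

# The same header built by hand: "Basic " plus base64("user:password").
# The empty second argument stops encode_base64 from appending a newline.
my $by_hand = 'Basic ' . encode_base64('taskboy3000:s3cr3t', '');

print $req->header('Authorization'), "\n";
print $by_hand eq $req->header('Authorization') ? "same\n" : "different\n";
```

Since base64 is an encoding rather than encryption, anyone who can see the request can recover the password.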
The next bit is the URL to the function. This is a core idea of REST --
function calls should have URIs and look like ordinary web resources.
In this case, the URL is http://twitter.com/statuses/update.xml.
Interestingly, the response from Twitter can be encoded in a number of formats.
These formats are determined by the extension you give to the URL. For
instance, I could have requested the metadata about myself in
JSON with the following URL:
http://twitter.com/users/show/taskboy3000.json.
The text of the tweet must be passed in a parameter named status, encoded
as if it were POSTed from an HTML form. Again, Perl makes this very
easy, as will be shown below.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common ('POST');
my $api_url = q[http://twitter.com/statuses/update.xml];
my $status = "Tweeting from the API!";
my $twitter_username = "taskboy3000";
my $twitter_password = "s3cr3t";
my $ua = LWP::UserAgent->new;
# Build a form-encoded POST request carrying the status parameter
my $req = POST($api_url => [status => $status]);
# Add the HTTP basic authentication header
$req->authorization_basic($twitter_username
                          => $twitter_password);
# Make the request
my $res = $ua->request($req);
The code above sets up and makes the status RPC call to Twitter. The first thing needed is an LWP::UserAgent object, which is kind of like a web browser: it makes HTTP requests of web servers. To construct the POST request, I use HTTP::Request::Common::POST. Because I can pass in form parameters as plain Perl data structures, it frees me from worrying about URL-encoding values and fooling around with HTTP headers that aren't germane to the task at hand. POST() returns an HTTP::Request object.
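Here's a small sketch of the encoding POST() takes care of: it defaults the content type to application/x-www-form-urlencoded and escapes the form fields, turning spaces into plus signs. No request is sent; the URL is only used to build the object.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $req = POST('http://twitter.com/statuses/update.xml',
               [status => 'Tweeting from the API!']);

print $req->method, "\n";         # POST
print $req->content_type, "\n";   # application/x-www-form-urlencoded
print $req->content, "\n";        # the encoded form body; spaces become "+"
```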
Adding my Twitter account credentials to the request is a simple one-line call to authorization_basic(). Very handy and very clean. That's all the setup I need to make the request. I pass the HTTP::Request object to the user agent object, which makes the actual network connection to the URL. The response comes back in the form of an HTTP::Response object, which I'll discuss next.
If all has gone well with the request, I'll get back an XML document that looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<status>
  <created_at>Tue Apr 07 22:52:51 +0000 2009</created_at>
  <id>1472669360</id>
  <text>At least I can get your humor through tweets. RT @abdur: I don't mean this in a bad way, but genetically speaking your a cul-de-sac.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id>1472669230</in_reply_to_status_id>
  <in_reply_to_user_id>10759032</in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <user>
    <id>1401881</id>
    <name>Doug Williams</name>
    <screen_name>dougw</screen_name>
    <location>San Francisco, CA</location>
    <description>Twitter API Support. Internet, greed, users, dougw and opportunities are my passions.</description>
    <url>http://www.igudo.com</url>
    <protected>false</protected>
    <followers_count>1027</followers_count>
    <profile_text_color>000000</profile_text_color>
    <profile_link_color>0000ff</profile_link_color>
    <friends_count>293</friends_count>
    <created_at>Sun Mar 18 06:42:26 +0000 2007</created_at>
    <favourites_count>0</favourites_count>
    <utc_offset>-18000</utc_offset>
    <time_zone>Eastern Time (US & Canada)</time_zone>
    <profile_background_tile>false</profile_background_tile>
    <statuses_count>3390</statuses_count>
    <notifications>false</notifications>
    <following>false</following>
    <verified>true</verified>
  </user>
</status>
Most of this, I don't care about. However, I do want to see if there's an error:
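unless ($res->is_success) {
    # Look for an <error> element in the body; fall back to the HTTP status line
    my $c = $res->content;
    my ($errstr) = ($c =~ m!<error>([^<]+)</error>!);
    $errstr = $res->status_line unless defined $errstr;
    warn(sprintf("Post failed (%d): %s\n", $res->code, $errstr));
    exit 1;
}
print "OK\n";
exit 0;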
Even without a full XML parser, it's relatively easy to look for an error tag and extract its contents for display. The error message I've encountered most is essentially "you used the API too much". Twitter does restrict the usage of some of its API calls, but not the status one.
If you collapse all the Perl code, you're looking at less than 20 lines of
code. If you wanted to, you could even make posts using the very handy
command line tool curl:
curl -u taskboy:s3cr3t -d "status=hello curl" \
http://twitter.com/statuses/update.xml
I will leave checking the error messages in curl's output as an exercise for the reader.
As I said, REST RPC mechanisms are fun and interesting if you already understand HTTP. However, not everyone does. I think XML-RPC and SOAP libraries do a better job of insulating the programmer from the HTTP protocol, allowing him to focus on the API task at hand.
UPDATE: It's nice to see that this script still works in 2009, four years after I wrote it. Must have done something right.
Whether pornography is psychologically damaging, socially corrosive or just plain nasty, it is clear that at least for men, it is a strong attractor and motivator. It is time we left behind antiquated nineteenth century morality and boldly embraced this core driver of male behavior to fill the more serious deficit of quality programmers in the U.S.A.
To this end, I submit my own work, a perl script that fetches the publicly available gallery at IShotMyself.com. The real benefit of this script (well, at least secondarily) is that it is my first concerted use of Andy Lester's WWW::Mechanize. Because of the positive reinforcement generated by this script, I am more likely to use this module in other, less pornographic work -- and so I move the Wheels of Industry forward! Adam Smith's Invisible Hand of Capitalism guides me!
Briefly, the script fetches the front page and looks for a link labeled "FOLIO [\d+]". This page is then fetched and links that start with "javascript:" are culled. By appending "&m=img" to these links, it is possible to get to the actual picture in question. These images are stored in a directory on my Winders box, hence the funny path names.
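The "unjavascript" step in the script below boils down to one regular expression. Isolated as a hypothetical helper and tried on an invented link, it looks like this:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the quoted URL out of a link such as
# javascript:popupLandscape('...') and append the "&m=img" suffix.
# Returns nothing (undef in scalar context) if there is no quoted URL.
sub unjavascript {
    my ($href) = @_;
    my ($target) = $href =~ /\('([^']+)'\)/ or return;
    return "$target&m=img";
}

print unjavascript("javascript:popupLandscape('view.php?id=7')"), "\n";
# view.php?id=7&m=img
```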
Based on the impressive success of this experiment, I recommend that we teach children to read by giving them digests of Penthouse Forum. We must always be thinking of the children!
I further recommend that we ease the suffering of the impoverished by eating their babies.
More Good Ideas (™) to come...
use strict;
use warnings;
use WWW::Mechanize;
my $base_url = qq[http://www.ishotmyself.com];
my $main_page= qq[/public/main.php];
my $dest_dir = "ishotmyself";
# $main_page already starts with "/", so no extra separator is needed
print "Fetching $base_url$main_page\n";
my $B = WWW::Mechanize->new;
$B->get(qq[$base_url$main_page]);
unless ($B->success()) {
    die "Couldn't fetch main page: ", $B->status();
}
print "Main page fetched\n";
print "Dumping folio links\n";
my $todays_gallery;
my $gallery = "unknown";
foreach my $l ($B->links()) {
    # Image links have no text, so guard against undef
    my $text = $l->text() // '';
    next if $text !~ /folio \[\d+\]/i;
    $todays_gallery = $l;
    # The gallery name is whatever follows the last "=" in the URL
    $gallery = $1 if $l->url =~ /([^=]+)$/;
    print "\t", $text, " : ", $l->url(), "\n";
}
die "No FOLIO link found on the main page\n" unless $todays_gallery;
# find today's folio name
print "Fetching ", $todays_gallery->url, "\n";
$B->get($todays_gallery->url);
unless ($B->success()) {
    print "Couldn't fetch '" . $todays_gallery->url(),
        "' : ", $B->status(), "\n";
    print $B->content, "\n";
}
#open OUT, ">out.txt";
#print OUT $B->content, "\n";
#close OUT;
print "Dumping folio links\n";
my @found;
foreach my $l ($B->links()) {
    next if $l->url !~ /^javascript:popupLandscape/;
    print "\t", ($l->text() // ''), " : ", $l->url(), "\n";
    # unjavascript! Pull the quoted URL out of popupLandscape('...')
    if (my ($target) = $l->url() =~ /\('([^']+)'\)/) {
        push @found, "$target&m=img";
    } else {
        warn "Couldn't unjavascriptify!\n";
    }
}
unless (-d $dest_dir) {
    mkdir $dest_dir or die "mkdir $dest_dir failed: $!";
}
$dest_dir .= "\\$gallery";
unless (-d $dest_dir) {
    print "Creating $dest_dir\n";
    # "or", not "||": with "||" the warning could never fire
    mkdir $dest_dir or warn "mkdir $dest_dir failed: $!";
}
print "chdir $dest_dir\n";
chdir $dest_dir or die "chdir $dest_dir failed: $!";
print "Fetching public images\n";
my $cnt = 1;
for my $l (@found) {
    $B->get($l);
    unless ($B->success()) {
        warn "picture fetch failed: ", $B->status, "\n";
        next;
    }
    my $imgfile = sprintf("${gallery}_%03d.jpg", $cnt++);
    if (-e $imgfile) {
        print "\tOVERWRITING $imgfile\n";
    }
    if (open(my $out, '>', $imgfile)) {
        binmode($out);
        print "\tWriting $imgfile\n";
        print {$out} $B->content();
        close $out;
    } else {
        warn("open $imgfile failed : $!");
    }
}
print "done\n";
About this blog
The taskboy blog is an exploration of computer technology by Joe Johnston. Topics of posts include practical examples in Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.