Pixie: a pure Perl HTTP server for M3U playlists

M3U files are more commonly known as MP3 playlist files. These are simple files that contain URLs to MP3 files served by an HTTP server. They may also contain additional metadata that MP3 players (like Winamp) can use for display purposes. A few months ago, I built a simple playlist server in Perl so that I could listen selectively to my vast MP3 collection. You may find the entire source code for this playlist server, called Pixie, here. It has been tested under both Windows and Linux, but should work on Mac OS X too.

At its heart, Pixie is simply an embedded HTTP server. It serves four specific kinds of pages: an M3U playlist file, a CSS file, the HTML music selection page and specific MP3 files. In addition, it has two HTTP services that are essential to this process: adding MP3s to the current playlist and clearing the list entirely. There can be only one playlist per user.

When a user first points a web browser to the URL belonging to Pixie, a page is presented with all the directories and MP3 files found in the top level of the directory specified by the “-d” parameter. In my case, that’s the M:/mp3 folder.

Folders may be traversed and the assets of those directories may be added to the playlist. Notice that there is a breadcrumb trail at the top of the page that leads you back to the root directory.

After a few music assets are selected, the current playlist is displayed. Notice that the assets come from different directories.

To listen to the playlist, simply click “Play now” in the Current Playlist section. What could be easier?

The pixie.pl script is somewhat long: it clocks in at 447 lines, although that includes a small usage screen, a CSS file and an HTML template for the directory listing pages. That is too long for a blow-by-blow description of each line of code, but a few points about it should prove illuminating for those wanting to write their own HTTP servers in Perl.

It is perhaps useful to know that I structured the HTTP part of this code on the mod_perl/Apache model. That is, there are some global variables available to the functions that handle HTTP responses. The heart of the server can be seen in the relatively small mainline code below:

my $S = HTTP::Daemon->new(LocalPort => $Opts->{p},
              Reuse  => 1,
              Listen  => 5,
              timeout => 10,
             );

while (my $c = $S->accept) {
  Log("Connection from: " . $c->peerhost);
  while(my $r = $c->get_request) {
    $This_Request = $r;
    $This_Connection = $c;
    handle_request();
  }
}
exit 0;

This code snippet starts with a pretty standard instantiation of an HTTP::Daemon object, which itself is a subclass of IO::Socket. For servers, it is important to set the Reuse parameter, which allows the TCP port to be reused quickly after the last process has exited. Without this parameter, you’ll find that you cannot start another script on the same port until an OS-specific “cooldown” period has passed.

With the server socket in place, Pixie waits for new client connections in an accept loop. From the client socket, the HTTP::Request object can be obtained. Both of these important objects are stored in global variables for use in handle_request() and later functions. Why not pass these objects into handle_request()? It turns out that these objects are useful in all kinds of places, and passing them explicitly everywhere gets to be a bit onerous. Let’s look at handle_request().

sub handle_request {
  my ($c, $r) = ($This_Connection, $This_Request);
  if ($r->method ne 'GET') {
    $c->send_error(HTTP_FORBIDDEN);
    return;
  }

  my $path = $r->uri->path;
  my @query = $r->uri->query_form;
  if ($path eq '/serve.m3u') {
    # Assemble this session's selections
    # into an m3u and serve that file
    do_serve_playlist($c, $r);
  } elsif ($path eq "/clear") {
    # Clear playlist
    do_clear_playlist($c, $r);
  } elsif ($path eq "/pixie.css") {
    do_serve_css($c, $r);
  } elsif (@query) {
    # Could be an add request
    # Set a cookie, if needed
    do_add_asset($c, $r);
  } else {
    do_browse($c, $r);
  }
}

This function can be thought of as trampoline code. Its job is just to route the handling of the request to the right routine. I call these routines “page handlers”. Page handlers are functions that all start with “do_” and are responsible for actually sending an HTTP response with content.
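The same routing could also be expressed as a dispatch table. The following is only a sketch of that alternative, not how pixie.pl is written: %Route_For and route_request() are hypothetical names, and the page handlers are stubs that return their own names so that the routing can be seen in isolation.

```perl
use strict;
use warnings;

# Stub page handlers; the real ones would send an HTTP response.
sub do_serve_playlist { return "do_serve_playlist" }
sub do_clear_playlist { return "do_clear_playlist" }
sub do_serve_css      { return "do_serve_css" }
sub do_add_asset      { return "do_add_asset" }
sub do_browse         { return "do_browse" }

# Hypothetical table mapping exact paths to page handlers.
my %Route_For = (
    '/serve.m3u' => \&do_serve_playlist,
    '/clear'     => \&do_clear_playlist,
    '/pixie.css' => \&do_serve_css,
);

# Pick a handler in the same order as the if/elsif chain: exact path
# first, then "has a query string", then fall back to browsing.
sub route_request {
    my ($path, @query) = @_;
    my $handler = $Route_For{$path}
               || (@query ? \&do_add_asset : \&do_browse);
    return $handler->();
}

print route_request('/clear'), "\n";              # do_clear_playlist
print route_request('/Rock', a => 'Zm9v'), "\n";  # do_add_asset
print route_request('/Rock'), "\n";               # do_browse
```

With only five routes the if/elsif chain is just as readable; a table like this mainly pays off once the list of magic paths starts to grow.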

The function handle_request() does its routing based on a quick analysis of the details of the current request. Every HTTP::Request object has an initialized URI object in it. The URI object breaks apart the requested URL into logical parts and saves us from writing custom parsing code. You’ll notice that two paths look like they reference real files: pixie.css and serve.m3u. However, this is an illusion. All web servers can be thought of as file system proxies. Like all proxies, you never can be quite sure how the resource you are requesting is stored on the back end.
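To make the URI object a little more concrete, here is a small sketch that uses the URI module on its own, outside of any server. The URL is a made-up example (the host, port and folder names are assumptions), and the two “a” values are simply the base64 encodings of “a.mp3” and “b.mp3”.

```perl
use strict;
use warnings;
use URI;

# Hypothetical request URL with two "a" parameters.
my $uri = URI->new('http://localhost:8080/Rock/Live?a=YS5tcDM%3D&a=Yi5tcDM%3D');

my $path  = $uri->path;        # "/Rock/Live"
my @query = $uri->query_form;  # flat list: ("a", "YS5tcDM=", "a", "Yi5tcDM=")

print "path: $path\n";

# query_form returns keys and values interleaved, which is why code that
# walks it steps through the list two elements at a time.
for (my $i = 0; $i < @query; $i += 2) {
    print "$query[$i] = $query[$i + 1]\n";
}
```

Note that query_form has already undone the percent-encoding, so the values arrive ready to be base64 decoded.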

There is also a magic path called “/clear” that signals the server to clear the current playlist from memory. There is only one function that does HTML form handling because there is only one form and it only adds MP3 files to the current playlist. If none of these conditions is met, do_browse() is called, which serves either a specific file or a directory listing. It is this function I’d like to turn to next, since it contains the HTTP cookie handling.

sub do_browse {
  my ($c, $r, $cookie) = @_;
  my $path = urldecode($r->uri->path);

  if ($path =~ /\.\./) {
    return $c->send_error(HTTP_FORBIDDEN);
  }

  my $fs = get_fs();
  my $real_dir = $Opts->{d} . $path;
  $real_dir =~ s!/!$fs!g;

  my $res = HTTP::Response->new(HTTP_OK);
  if (-d $real_dir) {
    $res->header("Content-type" => "text/html");
    if ($cookie) {
      $res->header("Set-Cookie" => "sid=$cookie; path=/");
    } else {
      my $sid = get_sid($r);
      if ($sid && !exists $Sessions{$sid}) {
        Log("Can't find SID '$sid' in: "
            . join(", ", keys %Sessions)) if $DEBUG;
        # Expire the stale cookie; the path must match the one it was set with
        my $epoch = "Wed, 31-Dec-1969 01:00:00 GMT";
        $res->header("Set-Cookie" => "sid=$sid; expires=$epoch; path=/");
        Log("Deleting old cookie '$sid'");
      }
    }
    $res->content(make_page($real_dir,
                            $path,
                            ($cookie||get_sid($r))));
    return $c->send_response($res);
  } elsif (-e $real_dir) {
    # Serve the real file directly from this process
    $c->send_file_response($real_dir);
  } else {
    return $c->send_error(HTTP_FORBIDDEN);
  }

}

This page handler is the most complicated because it must decide if the requested path is valid, if a cookie needs to be set or removed or if a file or directory listing needs to be sent. Let’s start at the beginning.

The path in the URI could need URL decoding, so that is done first. Next, a quick sanity check is performed to make sure the request isn’t attempting to get a resource the server isn’t meant to serve. The parent directory URL hack was a common exploit in early web servers. Next, all directory separators are converted to the OS-appropriate separator. Whatever happens next will require a new HTTP::Response object, so one is created.
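Neither urldecode() nor get_fs() is listed in this article, so the following is only a sketch of the decode, check and convert steps under my own assumptions. The names urldecode_sketch() and safe_local_path() are hypothetical, the check is done per path segment rather than with a single regex, and File::Spec stands in for the separator substitution.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-in for urldecode(): turn %XX escapes back into the
# characters they represent.
sub urldecode_sketch {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: map a request path onto a file under $root, or
# return undef if any segment is "..". The check runs after decoding, so
# an escaped "%2e%2e" is caught as well. Both separators are split on so
# that a backslash cannot smuggle in a segment on Windows.
sub safe_local_path {
    my ($root, $url_path) = @_;
    my @segments = grep { length } split m{[/\\]}, urldecode_sketch($url_path);
    return undef if grep { $_ eq '..' } @segments;
    return File::Spec->catfile($root, @segments);
}

my $ok  = safe_local_path('/music', '/Rock/My%20Song.mp3');
my $bad = safe_local_path('/music', '/%2e%2e/etc/passwd');
print "ok:  $ok\n";
print "bad: ", (defined $bad ? $bad : "rejected"), "\n";   # bad: rejected
```

This is not a complete defence (symbolic links inside the music folder, for example, are not considered), but it shows why the order of decoding and checking matters.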

If the path sent is a directory, a directory listing is required. Directory listings are generated by the make_page() function. The content type is set in the response object, since Pixie will send HTML. If do_browse() was handed a Session ID (as happens when it is called from do_add_asset()), a cookie carrying that ID is sent to the browser. Otherwise, if the browser’s cookie has a Session ID but the server has no record of it, the cookie is deleted from the browser; that is, a new cookie is sent with an expiration date in the past.

I’ve glossed over the details of Pixie session management in the above paragraph. When a user builds a playlist, the list needs to be kept somewhere. Pixie stores this list in server memory. Each list is identified by a session ID, which Pixie derives from the current time. This ID is passed to the client with HTTP cookies. Every time the client makes a request, this cookie is passed back to Pixie. There is a global hash table called %Sessions that stores the association between ID and playlist.
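The round trip between server and browser can be sketched with plain strings. Everything named below is hypothetical: get_sid() is not listed in this article, so sid_from_cookie_header() is only my guess at what such a function has to do, and new_session_id() is one possible way to mint an ID using core modules.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# Hypothetical helper: mint an ID that is harder to guess than a bare
# timestamp. rand() is not a cryptographic source, so treat this as a
# placeholder rather than a security measure.
my $Counter = 0;
sub new_session_id {
    return substr(sha256_hex(join ':', time(), $$, rand(), ++$Counter), 0, 32);
}

# Set-Cookie value that hands the ID to the browser.
sub set_cookie_value {
    my ($sid) = @_;
    return "sid=$sid; path=/";
}

# Set-Cookie value that makes the browser drop the cookie by giving it
# an expiry date in the past.
sub expire_cookie_value {
    my ($sid) = @_;
    return "sid=$sid; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/";
}

# Hypothetical stand-in for get_sid(): pull "sid" out of a Cookie header.
sub sid_from_cookie_header {
    my ($header) = @_;
    return undef unless defined $header;
    for my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        return $value if defined $value && $name eq 'sid';
    }
    return undef;
}

my %Sessions;                      # session ID => playlist data
my $sid = new_session_id();
$Sessions{$sid} = {};

print set_cookie_value($sid), "\n";
my $seen  = sid_from_cookie_header("theme=dark; sid=$sid");
my $known = defined $seen && exists $Sessions{$seen};
print $known ? "known session\n" : "unknown session\n";   # known session
```

Because the table lives only in memory, every session is forgotten when the server restarts, which is exactly the case the cookie-deleting branch of do_browse() has to deal with.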

To finish off do_browse(), if the path of the request points to a real file, it is served without much more sanity checking. There is definitely room for improvement here in terms of security. The next page handler of interest is the one that handles requests to add files to the current playlist: do_add_asset.

sub do_add_asset {
  my ($c, $r) = @_;
  my $path = $r->uri->path;
  my @query = $r->uri->query_form;

  # Is there a cookie?
  my $sid = get_sid($r);
  unless ($sid && exists $Sessions{$sid}) {
    Log("Could not find '" . ($sid || '') . "' in: "
        . join(", ", keys %Sessions)) if $DEBUG;
    $sid = time();
    Log("Creating new SID '$sid'") if $DEBUG;
  }

  # For all the "a" params, 
  # base64 decode and add to Sessions hash
  for (my $i=0; $i < @query; $i += 2) {
    if ($query[$i] eq "a") {
      # retain order through value
      my $cnt = scalar keys %{$Sessions{$sid}};
      $Sessions{$sid}->{decode_base64($query[$i+1])} = ++$cnt;
    }
  }
  Log(sprintf("%%Sessions has %d keys\n",
              (scalar keys %Sessions))) if $DEBUG;
  return do_browse($c, $r, $sid);
}

Much of the first part of this routine should be familiar by now. What’s interesting is that if no valid Session ID is found, a new one is created based on epoch time. If security is a concern, you should use a different method to generate IDs, like UUIDs. In any case, for each “a” query parameter in the request (which is to say, each MP3 file path), the path is decoded from base64 and added to the sessions hash. This is complicated by wanting to preserve the order in which the songs are selected. That ordering is preserved in the Sessions hash by storing a running count as the value for each path. Let’s see how the actual playlist files are served.

sub do_serve_playlist {
  my ($c, $r) = @_;
  my $sid = get_sid($r);
  if (!$sid || !defined $Sessions{$sid}) {
    $r->uri->path("/");
    return do_browse($c, $r);
  }

  my $res = HTTP::Response->new(HTTP_OK);
  my @files = map {$Base_URL . $_} get_sorted_playlist($sid);
  my $out = make_playlist(@files);
  $res->header("Content-type" => "audio/x-mpegurl");
  $res->header("Content-Length" => length($out));
  $res->content($out);
  $c->send_response($res);
  $c->shutdown(2);
  return;
}
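The get_sorted_playlist() function is not listed in this article. As a sketch of how the ordering stored by do_add_asset() could be recovered, assume that each session is a hash of path to sequence number. The names add_to_playlist() and sorted_playlist() are hypothetical, and unlike the original loop this version skips a path that is already in the list.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

my %Sessions;   # session ID => { decoded path => sequence number }

# Hypothetical helper modelled on the loop in do_add_asset(): store each
# decoded path with the next sequence number.
sub add_to_playlist {
    my ($sid, @encoded_paths) = @_;
    my $list = $Sessions{$sid} ||= {};
    for my $encoded (@encoded_paths) {
        my $path = decode_base64($encoded);
        next if exists $list->{$path};     # keep the position of the first add
        my $next = 1 + scalar keys %$list;
        $list->{$path} = $next;
    }
    return;
}

# Hypothetical stand-in for get_sorted_playlist(): paths ordered by the
# sequence number stored as the hash value.
sub sorted_playlist {
    my ($sid) = @_;
    my $list = $Sessions{$sid} || {};
    return sort { $list->{$a} <=> $list->{$b} } keys %$list;
}

add_to_playlist('demo', map(encode_base64($_, ''), '/Rock/b.mp3', '/Jazz/a.mp3'));
add_to_playlist('demo', encode_base64('/Rock/c.mp3', ''));
print "$_\n" for sorted_playlist('demo');
# Prints /Rock/b.mp3, /Jazz/a.mp3 and /Rock/c.mp3, in the order added.
```

A plain array per session would preserve order more directly; the hash has the side benefit of making duplicate selections easy to spot.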

The trickiest part about serving the playlist is getting the MIME type right. The MIME type gives a hint to the browser about the kind of file being served and what sort of external application the browser should use for it. Creating the playlist file is handled by make_playlist() and is pretty straightforward. Note the use of the draconian shutdown(2) on the client socket. I found on Windows that without this call, Winamp never launched. By closing both ends of the client socket, the web browser can be sure it has the entire file, which means that it is safe to launch the external program.
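Since make_playlist() is not listed here, the following is only a sketch of what such a function might produce, assuming the extended M3U layout: a #EXTM3U header followed by an #EXTINF line and a URL for each track. The function name, the example URLs and the choice of the file name as the title are all my assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for make_playlist(): build an extended M3U
# document from a list of URLs. The title is just the last path segment,
# and -1 marks the track length as unknown.
sub make_playlist_sketch {
    my (@urls) = @_;
    my @lines = ('#EXTM3U');
    for my $url (@urls) {
        my ($title) = $url =~ m{([^/]+)$};
        $title = $url unless defined $title;
        push @lines, "#EXTINF:-1,$title", $url;
    }
    return join("\n", @lines) . "\n";
}

print make_playlist_sketch(
    'http://localhost:8080/Rock/b.mp3',
    'http://localhost:8080/Jazz/a.mp3',
);
```

A real implementation would probably want to percent-encode the URLs and read titles and durations from the MP3 tags; both are left out to keep the file format itself in view.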

An interesting feature of Pixie is that the look and feel of the directory listings can be controlled with an external CSS file. Simply create a pixie.css file in the root of the MP3 directory and go to town. You can see what the default CSS file looks like simply by pointing your browser to http://localhost:[pixieport]/pixie.css.

Finally, there is ample room for improvement in the Pixie server. There are a number of security enhancements that can be made to ensure that only authorized files are sent. Pixie is a single-threaded application and does not handle concurrency at all. Concurrency is a pretty thorny issue to get right for a platform-neutral server. The core of the issue is the way Perl handles sockets and filehandles. On Linux, I would fork a new process for each new client request. That’s a very clean way to make Pixie more responsive. Child processes inherit the open filehandles of the parent, so sockets can be handled independently in each process. On Windows, the fork() builtin merely emulates forking behavior with threads. Unfortunately, since sockets look like filehandles, closing the client socket in the parent after fork (which is what you’d do on Linux) closes the socket in the child. It’s not clear to me what solution would work here. I thought perhaps IO::Select would be a good choice, but I suspect that sending music files will almost always block the directory listing traffic. I suppose this is a scaling mystery to be solved on another day.
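For the Linux case, the fork-per-connection idea might look something like the sketch below. It is not part of pixie.pl: serve_forking() is a hypothetical helper, it works on a plain IO::Socket::INET listener rather than HTTP::Daemon, and it assumes a real fork(), so the Windows problem described above still applies.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Hypothetical helper: accept up to $max_clients connections (or loop
# forever if it is undef) and hand each one to $handler in a forked child.
sub serve_forking {
    my ($listener, $handler, $max_clients) = @_;
    my @children;
    while (!defined $max_clients || @children < $max_clients) {
        my $client = $listener->accept or next;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: it has its own copy of the client socket and no
            # use for the listening socket.
            close $listener;
            $handler->($client);
            close $client;
            POSIX::_exit(0);
        }
        # Parent: the child holds its own copy, so drop ours and go
        # straight back to accepting.
        close $client;
        push @children, $pid;
    }
    waitpid($_, 0) for @children;   # reap the children started above
    return scalar @children;
}
```

A server would call this as serve_forking($listener, \&handle_client) with its own listener and handler. A long-running server would also need to reap finished children as it goes, for example from a SIGCHLD handler, rather than waiting until the loop ends.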