Latest News
"Do not meddle in the affairs of Wizards, for they are subtle and quick to anger."--J. R. R. Tolkien
Dynamic, weekly-typed languages like Perl and PHP are wonderful productivity engines. It's amazing how much work one can accomplish with so few lines of code. Both languages allow the programmer to treat simple scalar variables as numbers or strings without a lot of casting or explicit conversions. However, there is a price for this magic.
Consider the following PHP code:
array_push($array, '"' + $file + "'");
This looks harmless enough. It looks like contents of the $file string are being enclosed in double quotes and that new string is being pushed on the end of $array.
Not so fast! The + operator is a little magical. That is, it operates as a concatenation operator when the operands are strings and as a sum operator when the operands are integers. Wait, didn't I just say that values are weakly typed in PHP? How can the interpreter tell the difference between strings and ints?
The answer for both Perl and PHP is that strings that start with integers are considered to be integers for the purposes of magic.
In the code sample above, the filename in question indeed started with "2009-09". PHP took the integer part of the string, 2009, because of the + operator. Then it clearly had a string operand ('"') and a integer (2009), so it "promoted" the integer back to a string, "2009".
And that's how's how I lost my filename, which caused me to spend the next 30 minutes debugging the problem.
The Perl module Math::BigInt::GMP is a wrapper around the GNU Multiple Precision arithmetic C library and is useful for certain other math-intensive Perl operations. Ominously, the C library page warns that "GMP is very often miscompiled! Please never use your newly compiled libgmp.a or libgmp.so without first running make check" which makes the hairs on the back of my neck stand on end. However, I needed this lib.
On my Ubuntu linux machine, the Perl library (Math::BigInt::GMP) would not compile because of gmp was not installed. It would have been nice if the docs for this lib had mentioned this dependency. In any case, I install gmp with the following shell command:
apt-get install libgmp3-dev
If prompted by apt-get to install additional dependencies, agree. Then go back and install Math::BigInt::GMP.
make clean; perl Makefile.PL && make test && make install
I'm beginning to think that entropy is enclosing around the Great Camel.
M3U files are more commonly known as MP3 playlist files. These are simple files that contain URLs to MP3 files served over an HTTP server. These files may can additional metadata that can be used by MP3 players (like Winamp) for display purposes. I few months ago, I built a simple playlist server in Perl so that I could listen selectively to my vast MP3 collection. You may find the entire source code for this playlist server, called Pixie, here. It has been tested under both Windows and Linux, but should work on Mac OS X too.
At its heart, the Pixie is simply an embedded HTTP server. It serves four specific kinds of pages: an M3U playlist file, a CSS file, the HTML music selection page and specific MP3 files. In additional, it has two HTTP services that are essential to this process: adding MP3s to the current playlist and clearing the list entirely. There can be only one playlist per user.
When a user first points a web browser to the URL belonging to Pixie, a page is presented with all the directories and MP3 files found in the top level of the directory specified by the "-d" parameter. In my case, that's the M:/mp3 folder.
Folders may be traversed and the assets of those directories may be added to the playlist. Notice that there is a crumb trail at the top of the page that leads you back to the root directory.
After a few music assets are selected, the current play list is displayed. Notice that the assets come from different directories.
To listen the the playlist, simply click "Play now" in the Current Playlist section. What could be easier?
The pixie.pl script is somewhat long. It clocks in at 447 lines, even though that includes a small usage screen, a CSS file and an HTML template for the directory listing pages. This script is a little long for a blow-by-blow description of each line of code, but a few points about it should prove illuminating for those wanting to write their own HTTP servers in Perl.
It is perhaps useful to know that I structured the HTTP part of this code on the mod_perl/Apache model. That is, there are some global variables available to the fucntions that handle HTTP responses. The heart of the server can be seen in the relatively small main line code below:
my $S = HTTP::Daemon->new(LocalPort => $Opts->{p},
Reuse => 1,
Listen => 5,
timeout => 10,
);
while (my $c = $S->accept) {
Log("Connection from: " . $c->peerhost);
while(my $r = $c->get_request) {
$This_Request = $r;
$This_Connection = $c;
handle_request();
}
}
exit 0;
This code snippet starts with a pretty standard instantiation of an HTTP::Daemon object, which itself is a subclass of IO::Socket. For servers, it is important to set the Reuse parameter which allows the TCP port to be reused quickly after the last process has exited. Without this parameter, you'll find that you cannot invoke a script that uses the same port without a "cooldown" period specific to the OS.
With the server socket in place, pixie waits for new client connections in an accept loop. From the client socket, the HTTP::Request object can be obtained. Both of these important objects are stored in global variables for use in the handle_request() and later functions. Why not pass these objects into handle_request()? It turns out that there are all kinds of places these objects are useful for. Passing them explicitly gets to be a bit onerous. Let's look at handle_request().
sub handle_request {
my ($c, $r) = ($This_Connection, $This_Request);
if ($r->method ne 'GET') {
$c->send_error(HTTP_FORBIDDEN);
next;
}
my $path = $r->uri->path;
my @query = $r->uri->query_form;
if ($path eq '/serve.m3u') {
# Assemble this sessions selections
# into an m3u and serve that file
do_serve_playlist($c, $r);
} elsif ($path eq "/clear") {
# Clear playlist
do_clear_playlist($c, $r);
} elsif ($path eq "/pixie.css") {
do_serve_css($c, $r);
} elsif (@query) {
# Could be an add request
# Set a cookie, if needed
do_add_asset($c, $r);
} else {
do_browse($c, $r);
}
}
This function can be thought of as trampoline code. It's just is to route the handling of the request to the right routine, which in this case I call "page handlers". Page handlers are functions that all start with "do_" and are responsible for actually sending an HTTP response with content.
The function handle_request() does its routing based on a quick analysis of the details of the current request. Every HTTP::Request object has an initialed URI object in it. The URI object breaks apart the requested URL into logical parts and saves us from writing custom parsing code. You notice that two paths look like they reference real files: pixie.css and serve.m3u. However, this is an illusion. All web servers can be thought of as file systems proxies. Like all proxies, you never can be quite sure how the resource you are requesting is stored on the back end.
There is also a magic path called "/clear" that signals the server to clear the current playlist from memory. There is only one function that does HTML form handling because there is only one form and it only adds MP3 files to the current playlist. If none of these requirements are met, do_browse() is called which serves either a specific file or a directory listing. It is this function I'd like to turn next to since it contains HTTP Cookie handling.
sub do_browse {
my ($c, $r, $cookie) = @_;
my $path = urldecode($r->uri->path);
if ($path =~ /\.\./) {
return $c->send_error(HTTP_FORBIDDEN);
}
my $fs = get_fs();
my $real_dir = $Opts->{d} . $path;
$real_dir =~ s!/!$fs!g;
my $res = HTTP::Response->new(HTTP_OK);
if (-d $real_dir) {
$res->header("Content-type" => "text/html");
if ($cookie) {
$res->header("Set-Cookie" => "sid=$cookie; path=/");
} else {
my $sid = get_sid($r);
if ($sid && !exists $Sessions{$sid}) {
Log("Can't find SID '$sid' in: "
. join(", ", keys %Sessions)) if $DEBUG;
my $epoch = "Wed, 31-Dec-1969 01:00:00 GMT";
$res->header("Set-Cookie" => "sid=$sid; expires=$epoch;");
Log("Deleting old cookie '$sid'");
}
}
$res->content(make_page($real_dir,
$path,
($cookie||get_sid($r))));
return $c->send_response($res);
} elsif (-e $real_dir) {
# Serve real file in a new process
$c->send_file_response($real_dir);
} else {
return $c->send_error(HTTP_FORBIDDEN);
}
}
This page handler is the most complicated because it must decide if the requested path is valid, if a cookie needs to be set or removed or if a file or directory listing needs to be sent. Let's start at the beginning.
The path in the URI could need URL decoding, so that is done first. Next, a quick sanity check is performed to make sure the request isn't attempting to get a resources the server isn't meant to serve. The parent directory URL hack was a common exploit in early web servers. Next, all directory separators are converted to the OS appropriate. Whatever happens next will require a new HTTP::Response object, so one is created.
If the path sent is a directory, a directory listing is required. Directory listings are generated by the make_page() function. The content-type is set in the response object, as pixie will send some kind of HTML. If the browser sent us a Pixie cookie, we simply update it with the current Session ID. If cookie has a Session ID but the server has no record of it, the cookie is deleted from the browser. Which is to say, a new cookie is sent with an old expiration date.
I've glossed over the details of Pixie session management in the above paragraph. When a user builds a playlist, the list needs to be kept somewhere. Pixie stores this list in server memory. Each list is assigned a random number which is its session ID. This ID is passed to the client with HTTP cookies. Every time the client makes a request, this cookie is passed back to Pixie. There is a global hash table called %Sessions that stores the association between ID and play list.
To finish off do_browse(), if the path of the request points to a real file, it is served without much more sanity checking. There is definitely room for improvement here in terms of security. The next page handler of interest is the one that handles requests to add files to the current playlist: do_add_asset.
sub do_add_asset {
my ($c, $r) = @_;
my $path = $r->uri->path;
my @query = $r->uri->query_form;
# Is there a cookie?
my $sid = get_sid($r);
unless (exists $Sessions{$sid}) {
Log("Could not find $sid in: "
. join(", ", keys %Sessions)) if $DEBUG;
$sid = time();
Log("Creating new SID '$sid'") if $DEBUG;
}
# For all the "a" params,
# base64 decode and add to Sessions hash
for (my $i=0; $i < @query; $i += 2) {
if ($query[$i] eq "a") {
# retain order through value
my $cnt = scalar keys %{$Sessions{$sid}};
$Sessions{$sid}->{decode_base64($query[$i+1])} = ++$cnt;
}
}
Log(sprintf("\%Sessions has %d keys\n",
(scalar keys %Sessions))) if $DEBUG;
return do_browse($c, $r, $sid);
}
Much of the first part of this routine should be familiar by now. What's interesting is that if no valid Session ID is found, a new one is created based on epoch time. If security is a concern, you should use a different method to generate IDs, like UUIDs. In any case, for each query parameter in the request (which is to say, MP3 file paths), the path is decoded from base64 and added to the sessions hash. This is complicated by wanted to preserve the order in which the songs are selected. This ordering is perserved in the Sessions hash. Let's see how the actual playlist files are served.
sub do_serve_playlist {
my ($c, $r) = @_;
my $sid = get_sid($r);
if (!$sid || !defined $Sessions{$sid}) {
$r->uri->path("/");
return do_browse($c, $r);
}
my $res = HTTP::Response->new(HTTP_OK);
my @files = map {$Base_URL . $_} get_sorted_playlist($sid);
my $out = make_playlist(@files);
$res->header("Content-type" => "audio/x-mpegurl");
$res->header("Content-Length" => length($out));
$res->content($out);
$c->send_response($res);
$c->shutdown(2);
return;
}
The trickiest part about serving the playlist is getting the MIME type right. The MIME type gives a hint to the browse about the kind of file being served and what sort of external application the browser should use for it. Creating the playlist file is handled by make_playlist() and is pretty straight forward. Note the use of the draconian shutdown(2) on the client socket. I found on Windows that without this call, Winamp never launched. By closing both ends of the client socket, the web browser can be sure it has the entire file, which means that it is safe to launch the external program.
An interesting feature of Pixie is that the look and feel of the directory listings can be controlled with an external CSS file. Simply create a pixie.css file in the root of the MP3 directory and go to town. You can see what the default CSS file looks like simply by pointing your browser to http://localhost:[pixieport]/pixie.css.
Finally, there is ample room for improvement in the Pixie server. There are a number of security enhancements that can be made to ensure that only authorized files are sent. Pixie is a single threaded application and does not handle concurrency at all. Concurrency is a pretty thorny issue to get right for a platform neutral server. The core of the issue is the way Perl handles sockets and filehandles. On Linux, I would fork a new process for each new client request. That's a very clean way to make Pixie more responsive. Child processes inherit the open filehandles of the parent and so sockets can be handled independently in each process. On Windows, the fork() builtin merely emulates forking behavior with threads. Unfortunately since sockets look like filehandles, closing the client socket in the parent after fork (which is what you'd do on Linux) closes the socket in the child. It's not clear to me what solution would work here. I thought perhaps IO::Select would be a good choice, but then I suspect that when music files are sent, that will almost always block the directory listing traffic. I suppose this is a scaling mystery to be solved on another day.
Sometimes you need to secure a message could be intercepted by party for whom the message is not meant. One moderately good way to do this is using the Vigenere cipher.
The Vigenere cipher is a kind of caesar or subsistution cipher. It uses a sequence of letters called a Key to determine the substitution. Compare this to the constant shift function of caesar. In Caesar crypt text, all occurences of a particular clear text letter will be the same subsistute letter (i.e. 'Z' always stands in for 'E'). In Vigenere, the substitute for a clear text letter changes depending on where that letter occurs. Neat trick, eh?
The Key is a shared secret known only to those parties authorized to read the message. The encoded text cannot be cracked in the same way that a caesar inciphered text can be, although there are sophisticated avenues of attack known to the NSA and other professionals.
Vigenere works by using a tabula recta, which is is a table of the alphabet in rows and columns, like the following:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z B C D E F G H I J K L M N O P Q R S T U V W X Y Z A C D E F G H I J K L M N O P Q R S T U V W X Y Z A B D E F G H I J K L M N O P Q R S T U V W X Y Z A B C E F G H I J K L M N O P Q R S T U V W X Y Z A B C D F G H I J K L M N O P Q R S T U V W X Y Z A B C D E G H I J K L M N O P Q R S T U V W X Y Z A B C D E F H I J K L M N O P Q R S T U V W X Y Z A B C D E F G I J K L M N O P Q R S T U V W X Y Z A B C D E F G H J K L M N O P Q R S T U V W X Y Z A B C D E F G H I K L M N O P Q R S T U V W X Y Z A B C D E F G H I J L M N O P Q R S T U V W X Y Z A B C D E F G H I J K M N O P Q R S T U V W X Y Z A B C D E F G H I J K L N O P Q R S T U V W X Y Z A B C D E F G H I J K L M O P Q R S T U V W X Y Z A B C D E F G H I J K L M N P Q R S T U V W X Y Z A B C D E F G H I J K L M N O Q R S T U V W X Y Z A B C D E F G H I J K L M N O P R S T U V W X Y Z A B C D E F G H I J K L M N O P Q S T U V W X Y Z A B C D E F G H I J K L M N O P Q R T U V W X Y Z A B C D E F G H I J K L M N O P Q R S U V W X Y Z A B C D E F G H I J K L M N O P Q R S T V W X Y Z A B C D E F G H I J K L M N O P Q R S T U W X Y Z A B C D E F G H I J K L M N O P Q R S T U V X Y Z A B C D E F G H I J K L M N O P Q R S T U V W Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
Notice the letters in column 2 are shifted 1 from column 1. This is important, as you'll see in a moment.
To encode a message, you need a key. The key needs to be as long as the clear text message. Repeat the key as needed to make the key long enough. For example, if my key is "ORANGE" and my message is "IAMASECRETAGENT", then you would need to expanded the key to "ORANGEORANGEORA". To encipher the message, walk through each letter of the clear text. For each letter, find the column that starts with that letter of the clear text and look down that column for the row that starts with the corresponding letter of the Key. The letter you find there is the cipher text substitution for that letter. So, the first letter in this example is "I" with a key letter of "O", which means that the substitution is "W". Here's the completed example:
| Clear text: | IAMASECRETAGENT |
| Key: | ORANGEORANGEORA |
| Crypt text: | WRMNYIQIEGGKSET |
In the real world, messages will comprise more characters than just the uppercase letters of the alphabet. There are two ways to handle this when implementing the Vigenere cipher in code. Either reject all characters that are not part of the tablua recta or simply make the subsistute character the same as the input. The latter technique is the one I favor, but that may make the crypt text easier for someone to crack. For example, what do think this crypt text is refering to?
zT4N://eYSqnA.MxDEh7CmcGPT.jEA/aR/yO/?Hy=BRe7hS
Surely, this is a URL. Further, one can safely assume that "zT4N" is "http" in clear text. That's a lot of context to give away.
Implementing this complex cipher in Perl is straight forward. The first thing to do is to build the tablua rect:
sub build_matrix {
my @Wheel = ((0..9), ('A'..'Z'), ('a'..'z'));
my $M = [[]];
for (my $y=0; $y < @Wheel; $y++) {
for (my $x=0; $x < @Wheel; $x++) {
$M->[$y]->[$x] = $Wheel[($y + $x) % @Wheel];
}
}
return $M;
}
Notice that my alphabet is composed of numbers and both cases of letters. This set of characters is contained in the @Wheel array. By looping through the array twice, it is possible to create a two dimensional array. A careful look at the code reveals our old friend the shift function from the caesar cipher. Each column is shifted by one.
You may not be familiar with the modulus operator %. The modulus operator returns the remainder of an integer division. For example, 13 % 5 is 3. Modulus operations have the interesting property that they always return values between 0 and one less than the second operand. Of course, this fits neatly in with arrays that begin indexes their elements at zero.
One last note. Even though this function returns the tablua recta, the caller stores the array in a globally visible variable called $M. This global is referenced in the encoding function below.
sub enc {
my ($plain, $key) = @_;
# Find the row of the cipher
for (my $y=0; $y < @$M; $y++) {
if ($M->[$y]->[0] eq $key) {
# Find the column of plain text
for (my $x=0; $x < @{$M->[$y]}; $x++) {
if ($M->[0]->[$x] eq $plain) {
return $M->[$y]->[$x];
}
}
}
}
return $plain;
}
This function expects a clear text letter and the corresponding key letter. The row of the key letter is found in the tablua recta (i.e. $M) and then the column of the clear text letter is found. Whatever letter is found in $M at that location is returned to the caller. Notice that if the plain letter cannot be found in the tablua recta, that character is returned as the subsistution.
What remains is to show the main part of the program that calls these functions:
my @key_text = split //, $key;
my @plain_text = split//, $message;
my $new = "";
for (my $i=0; $i < @plain_text; $i++) {
$new .= enc($plain_text[$i], $key_text[$i % @key_text]);
}
print "$new\n";
Here, the Key and the message are hard coded in $key and $message respectively. Each of these is split into arrays of characters. For every character in the plain_text array, the encoding function is called and the cipher text is built and later printed.
Notice the return of the modulus operator. Instead of trying to expand the key to match the length of the message, a simple modulus operation is used. This saves a bit on memory and it is easier to implement than the literal expansion of the key.
To decrypt the message, replace the enc() call in the main line with dec(), shown below:
sub dec {
my ($cipher, $key) = @_;
# Find the row of the key
for (my $y=0; $y < @$M; $y++) {
if ($M->[$y]->[0] eq $key) {
# Find the column of the cipher
for (my $x; $x < @{$M->[0]}; $x++) {
if ($M->[$y]->[$x] eq $cipher) {
return $M->[0]->[$x];
}
}
}
}
return $cipher;
}
Notice that it is the inverse operation of enc(). It is passed a cipher letter and the corresponding key. The row of the key is found and the column that contains the cipher letter is found. Then the first letter of that corresponding column it found, which represents the clear text letter. If no match is found, the cipher text is returned.
Of course, you don't need to use this implementation. There is one on CPAN already, although it uses only 'A'..'Z' for its alphabet and strips off other characters.
Happy enciphering.
(Note: Thanks to gizmo, I have corrected an abbreviation expansion problem.)
Uniform Resource Locators
are an addressing scheme at the heart of the Web. Without them, there would
be no stardard way to refer to a resource offered by a web server. URLs
remove the ambiguity of addressing a resource, but at the cost of creating
some rather formidable namespaces (e.g.
https://addons.mozilla.org/en-US/firefox/addon/9549).
In general, long URLs aren't a problem. Either through web page hyperlinks or web browser bookmarks, URLs fade into the background for most users. However, sometimes it is more convenient to have a shorter reference to a resource than the fully qualified URL. For example in the late nineties on IRC, it was common to see tiny.cc URLs pasted into chat rooms. Long URLs tend to clutter up already busy chat room windows. With the advent of text message-based systems like Twitter, which limit status updates to 140 characters, long URLs are actually consuming a valuable resource. The most common URL shortener used on Twitter.com appears to be bit.ly
There are several URL shortening services out there and they all work
pretty much the same way. The user supplies the full URL. The service
hashes the URL into something smaller and appends this to its own namespace.
Using the bit.ly service, the mozilla URL becomes:
http://bit.ly/g0Z9. When someone accesses this bit.ly URL, he
will be seemlessly redirected to the original resource.
Bit.ly provides a REST interface to their service
(API).
To use this, create an account on
bit.ly's system. Now you are ready to build a Perl REST client for the shorten
service (http://api.bit.ly/shorten).
The following code is a listing of a small command line Perl script that expects to be passed a long URL. It uses the bit.ly REST service to return a shortened version.
use strict;
use LWP::UserAgent;
use Getopt::Std;
use HTTP::Request;
use URI;
my $VERSION = "1.0";
my $Opts = {};
my $bitly_api_url = q[http://api.bit.ly/shorten];
my $long_url = pop @ARGV;
getopts('u:p:?', $Opts);
if (!$long_url || $Opts->{'?'}) {
print usage();
exit;
}
set_defaults($Opts);
my $ua = LWP::UserAgent->new;
my $fetch_url = URI->new($bitly_api_url);
$fetch_url->query_form({'version' => "2.0.1",
'format' => "xml",
'longUrl' => $long_url,
});
my $req = HTTP::Request->new(GET => $fetch_url);
$req->authorization_basic($Opts->{u} => $Opts->{p});
my $res = $ua->request($req);
if ($res->code == 200) {
my ($url) = ($res->content
=~ m!([^<]+) !);
unless ($url) {
warn("FAIL: [". $res->content . "]\n");
exit 1;
}
print "$url\n";
exit;
} else {
warn("FAIL:[".$res->content."]\n");
exit 1;
}
#-----
# sub
#-----
sub usage {
return <
OPTIONS
? - Display this screen
u [USERNAME] - Bit.ly username
p [PASSWORD] - Bit.ly password
EOT
}
sub set_defaults {
my ($h) = @_;
$h->{u} ||= "taskboy3000";
$h->{p} ||= "s3c3rt";
}
This code uses the standard Perl module Getopt::Std to parse optinal
command line arguments. The set_defaults function merely uses
my bit.ly credentials if none are provided through optional parameters. Next,
a new LWP::UserAgent object is created to make client HTTP calls. The bit.ly
shorten service expects a GET request with optional arguments encoded as
query parameters in the URL. The bit.ly service can respond to requests with
data in various formats (e.g. XML, JSON). In this case, the format parameter
is set to "xml."
The URI class manages the extra parameters through the
query_form method and urlencodes these into the new
URL. A simple HTTP::Request object is passed the new URL and the bit.ly
credentials are added to the HTTP request header using the
authorization_basic method.
Once the HTTP request has all the information, it is ready to be sent to the bit.ly server. The HTTP::Request object is passed to the LWP::UserAgent::request method, which contacts the server and encodes the response as an HTTP::Response object.
If an error occurred in transmission, the response will have a HTTP status code other than 200. Even if the requests succeeds, the service might fail due to missing or bad credentials. A simple regex extracts the shortend URL from the XML message and reports on the command line for easy consumption by other command line tools.
This script will run on any platform supported by Perl.
For a long time, I've ignore the Representational State Transfer (REST) architecture. For one thing, I don't particularly agree with its premise that remote procedure calls (RPC) that use HTTP as a transport mechanism should obey the same semantics as regular web traffic. Things like XML-RPC and SOAP are, to my thinking, happening on an entirely different layer of the application stack than HTTP. Indeed, there are implementations of XML-RPC that do no use HTTP at all.
I remember pretty heated arguments I witnessed at tech conferences in the early 2000s about this seemingly unimportant technical point. For REST adherents, web services are another form of web traffic and should be treated as such. Given that Twitter, Facebook and Bit.ly all use REST for their APIs and older apps like liveJournal use XML-RPC/SOAP, I guess REST is the new hotness.
I've recently had reason to interact with the Twitter and Bit.ly APIs. This has made me come to terms with REST RPC mechnanisms. I admit, the sad, sick part of me that enjoys playing around with low-level HTTP stuff finds satisfaction in the way these API leverage existing HTTP features like basic authentication, extra path info, and GET and POST semantics. In this post, I thought I would show a bit of Perl code I wrote post status updates to Twitter, an activity more commonly referred to as "tweeting."
Twitter's API documentation is relatively straight forward, if you already have a solid grounding in HTTP. The API call to tweet is called "statuses/update". The basics of the RPC mechanism are easy enough:
- The caller makes a HTTP GET or POST request
- The sender replies with content in the form of JSON or XML
Let's start with the request. There are serveral bits of information required by the API: user credentials, the URL and additional query parameters. The user credentials are passed as part of the HTTP request header as a basic authentication field, which is merely a base64 string that is the concatenation of the username and password of your Twitter account. Fortunately, Perl's HTTP::Request::Common class makes it easy to add basic auth credentials to the request without knowing how this information is encoded in the HTTP request.
The next bit is the URL to the function. This is a core idea of REST --
function calls should have URIs and look like ordinary web resources.
In this case, the URL is http://twitter.com/statuses/update.xml.
Interestingly, the response from twitter can be encoded in a number of formats.
These formats are determined by the extension you give to the URL. For
instance, I could have request the metainformation about myself in
JSON with the following URL:
http://twitter.com/users/show/taskboy3000.json.
The text of the tweet must be passed to the URL as if it were POSTed from a
form. The parameter name is status. The status must be encoded
as if the data were submitted from an HTML form. Again, Perl makes this very
easy, as will be shown below.
use LWP::UserAgent;
use HTTP::Request::Common ('POST')
my $api_url = q[http://twitter.com/statuses/update.xml];
my $status = "Tweeting from the API!";
my $twitter_username = "taskboy3000";
my $twitter_password = "s3cr3t";
my $ua = LWP::UserAgent->new;
my $req = POST($api_url => [status => $status]);
$req->authorization_basic($twitter_username
=> $twitter_password);
# Make the request
my $res = $ua->request($req);
The code above is sets up and makes the status RPC call to twitter. The first thing needed is an LWP::UserAgent object, which is kind of like a web browser. It makes HTTP requests of web servers. To construct the POST request, I use HTTP::Request::Common::POST. Because I can pass in form parameters as plain perl data structures, it frees me from worrying about urlencoding values and fooling around with HTTP headers that are germain to the task at hand. POST() returns an HTTP::Request object.
Adding my twitter account credentials to the request is a simple one line call to authorization_basic(). Very handy and very clean. That's all the setup I need to make the request. I pass in the HTTP::Request object to the User Agent object. That makes the actual network connection to the URL. The response comes back in the form of an HTTP::Response object, which I'll discuss next.
If all has gone well with the request, I'll get back an XML document that looks something like this:
<?xml version="1.0" encoding="UTF-8"?> <status> <created_at>Tue Apr 07 22:52:51 +0000 2009</created_at> <id>1472669360</id> <text>At least I can get your humor through tweets. RT @abdur: I don't mean this in a bad way, but genetically speaking your a cul-de-sac.</text> <truncated>false</truncated> <in_reply_to_status_id>1472669230</in_reply_to_status_id> <in_reply_to_user_id>10759032</in_reply_to_user_id> <favorited>false</favorited> <in_reply_to_screen_name></in_reply_to_screen_name> <user> <id>1401881</id> <name>Doug Williams</name> <screen_name>dougw</screen_name> <location>San Francisco, CA</location> <description>Twitter API Support. Internet, greed, users, dougw and opportunities are my passions.</description> <url>http://www.igudo.com</url> <protected>false</protected> <followers_count>1027</followers_count> <profile_text_color>000000</profile_text_color> <profile_link_color>0000ff</profile_link_color> <friends_count>293</friends_count> <created_at>Sun Mar 18 06:42:26 +0000 2007</created_at> <favourites_count>0</favourites_count> <utc_offset>-18000</utc_offset> <time_zone>Eastern Time (US & Canada)</time_zone> <profile_background_tile>false</profile_background_tile> <statuses_count>3390</statuses_count> <notifications>false</notifications> <following>false</following> <verified>true</verified> </user> </status>
Most of this, I don't care about. However, I do want to see if there's an
unless ($res->is_success) {
my $c = $res->content;
my ($errstr) = ($c =~ m!<error>([^<]+)</error>!);
warn(sprintf("Post failed (%d): $errstr\n", $res->code));
exit 1;
}
print "OK\n";
exit 0;
Without the services of a full XML parser, it's relatively easy to look for an error tag and extract the contents for display. The error message I've encountered most is essentially "you used the API too much". Twitter does restrict the usage of some of their API calls, but not the status one.
If you collapse all the Perl code, you're looking at less than 20 lines of
code. If you wanted to, you could even make posts using the very handy
command line tool curl:
curl -u taskboy:s3cr3t -d "status=hello curl" \
http://twitter.com/statuses/update.xml
I will leave the checking of error messages from curl output as an excerise for the reader.
As I said, REST RPC mechanisms are fun and interesting if you already understand HTTP. However, not everyone does. I think XML-RPC and SOAP libraries to a better job of insulating the programmer from the HTTP protocol, allowing him to focus on the API task at hand.
Loyal Taskboy Readers,
Markdown is an open source text filter written in God's Own Perl. Even though the taskboy blog is written in PHP, I can shoehorn this fitler into the comment system to allow greater markup. I may write a primate forum for taskboy built on the comment system. Stack Overflow uses this system, which is what brought it to my attention.
I ask you, readers, is it desirable that I add the Markup filter to comments?
UPDATE: Thanks to the Internet's feature of building applications before I need them, I have just run across Markup for PHP.
It seems that a few of you are using or trying to use the XML::RSS::Podcast module I published on this blog. As was mentioned there, modern version of XML::RSS do not support the encode function required for generating well-formed XML documents.
That's all changed thanks to Rohan Carly. [Add your thanks to Rohan in the comments.]
I present the complete version of XML::RSS::Podcast below. If you find this module
useful, I'll attempt to create a CPAN module for it.
package XML::RSS::Podcast;
use XML::RSS;
use HTML::Entities qw[encode_entities_numeric encode_entities];
@XML::RSS::Podcast::ISA = qw[XML::RSS];
our $VERSION = q[1.1];
# encode by Rohan Carly
# Stolen from XML::RSS::Private::Output::Base;
sub encode {
my ($self,$text) = @_;
return unless defined($text);
my $encoded_text = '';
while ($text =~ s,(.*?)(<!\[CDATA\[.*?\]\]>),,s) {
# we use &named; entities here because it's HTML
$encoded_text .= encode_entities($1) . $2;
}
# we use numeric entities here because it's XML
$encoded_text .= encode_entities_numeric($text);
return $encoded_text;
};
sub as_string {
my $self = shift;
return $self->as_podcast_rss;
}
sub as_podcast_rss {
my $self = shift;
my $enc = $self->{encoding};
my $output = qq[<?xml version="1.0" encoding="$enc"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
version="2.0">];
$output .= $self->podcast_start_channel;
for my $i (@{$self->{items}}) {
$output .= $self->podcast_item($i);
}
$output .= $self->podcast_end_channel;
return $output .= "\n</rss>\n";
}
sub podcast_start_channel {
my $self = shift;
my @fields = qw[ttl title description link language
pubDate lastBuildDate creator
webMaster copyright
];
my @image_fields = qw[title url description link
width height];
my @itunes_fields = qw[subtitle author summary
image explicit]; # Thanks Rohan
my $output = "<channel>\n";
for my $f (@fields) {
if (defined($self->{channel}->{$f})) {
my $s = $self->encode($self->{channel}->{$f});
$output .= "\t<$f>$s</$f>\n";
}
}
my $seen_image = 0;
for my $f (@image_fields) {
if (defined($self->{image}->{$f})) {
unless ($seen_image) {
$output .= "\t<image>\n";
$seen_image = 1;
}
my $s = $self->encode($self->{image}->{$f});
$output .= "\t\t<$f>$s</$f>\n";
}
}
if ($seen_image) {
$output .= "\t</image>\n";
}
# Owner name/email not handled
for my $f (@itunes_fields) {
if (defined($self->{channel}->{itunes}->{$f})) {
my $s=$self->encode($self->{channel}->{itunes}->{$f});
$output .= "\t<itunes:$f>$s</itunes:$f>\n";
}
}
# Rohan's sub-cat handling code
# Expects an array: [category, sub-category]
if (ref $self->{channel}->{itunes}->{category} eq 'ARRAY') {
my $major = $self->encode(${$self->{channel}->{itunes}->{category}}[0]);
my $minor = $self->encode(${$self->{channel}->{itunes}->{category}}[1]);
$output .= qq[\t<itunes:category text="$major">\n];
$output .= qq[\t\t<itunes:category text="$minor">\n];
$output .= qq[\t\t</itunes:category>\n];
$output .= qq[\t</itunes:category>\n];
}
return $output . "\n";
}
sub podcast_end_channel {
return "</channel>\n";
}
sub podcast_item {
my $self = shift;
my $item = shift;
my @fields = qw[title guid pubDate description link];
my @itunes_fields = qw[author subtitle summary
duration keywords explicit];
my $output = "\t<item>\n";
for my $f (@fields) {
if (defined($item->{$f})) {
$s = $self->encode($item->{$f});
my $perma = "";
if ($f eq "guid") {
$perma = qq[isPermaLink="false"];
}
$output .= "\t\t<$f$perma>$s</$f>\n";
}
}
if (ref $item->{enclosure}) {
$output .= "<enclosure";
for my $f (qw[url length type]) {
if (defined $item->{enclosure}->{$f}) {
$output .= sprintf("$f=%s", $self->encode($item->{enclosure}->{$f}));
}
}
$output .= "/>";
}
for my $f (@itunes_fields) {
if (defined $item->{itunes}->{$f}) {
$s = $self->encode($item->{itunes}->{$f});
$output .= "\t\t<itunes:$f>$s</itunes:$f>\n";
}
}
return $output .= "\t</item>\n";
}
1;
For an example of usage, please see the previous post cited above. Report bugs in the comments or through email. Thanks.
If you are using the Perl module XML::RSS in the following way:
my $R = XML::RSS->new;
foreach my $url (@urls) {
my $content = wget($url);
$R->parse($content);
}
In older versions of the XML::RSS module, this code worked fine. However, if you have upgraded the module recently, you might have noticed the error message:
Modification of non-creatable array value attempted, subscript -1 at /usr/local/lib/perl5/site_perl/5.8.5/XML
/RSS.pm line 792.
It is not the feed that has gone rancid on you, but the library. Try instantiating a RSS object within the loop like this:
foreach my $url (@urls) {
my $content = wget($url);
my $R = XML::RSS->new;
eval{$R->parse($content)};
if ($@) {
warn("Parse error: $@");
next;
}
}
I've added some "exception" handling for free in this example so that parse errors don't blow up your program.
I'm afraid I did not dive into XML::RSS.pm to figure out the problem, but if someone with that knowledge wishes to post below, I'm sure won't be the only one who welcomes enlightenment.
Perl now has three implementions of XML::RPC, Frontier::RPC2, SOAP::Lite and RPC::XML. Each is interesting and somewhat broken in its own way.
Frontier::RPC2 0.06 has issues with Boolean, iso8601 and Base64 because these classes have a bug in their constructors.
Given:
sub new {
my $type = shift;
my $value = shift;
return bless \$value, $type;
}
Instead of checking to see if $type is a reference, this code blesses. If a reference is passed in, the blessed class is sometime like "Frontier::RPC2::Base64=SCALAR(0x65432)". This breaks code later than encodes/decodes these values into XML. Perltoot has the solution to this problem
RPC::XML is an interesting module in that it brings type checking to Perl, in a sense. When creating an XML-RPC server, each remote procedure has to the arguments it accepts along with its return type. This is call a signature. Although type checking is unPerlish, it is very helpful when dealing with other languages that are subPerl. More docs on signatures would be great.
Finally, SOAP::Lite brings its own bad self to the XML-RPC party. Although in my testing it works with all the XML-RPC datatypes, the programming interface is similar to SOAP::Lite (not surprisingly). It is a style which takes a bit of getting usef to.
The good news is that, at least from a client level, all these modules seemed to be compatible. That is, a Frontier::Client can talk to an XMLRPC::Lite server.
More to come.
From a truly ill-considered Ask Slashdot question came this pearl of wisdom. A former (award winning!) Java hacker confesses his adulterous thoughts for Our Favorite Language. I'd like to pick up on a few of his points and perhaps add a bit of my own.
- Unicode: it's a sore spot of Perl, but seems to be hunky-dory for java. I lay the blame for Perl's developmental problems here on DWIM. As Perl coders, we've come to expect Perl to handle simple (and not so simple) string manipulation without that much handholding from us. I suspect that this bit of Sloth is biting the 'porters. Certainly, the unicode issue is preventing Perl from being a first class XML processing language (no XS cheating here).
- Perl's threading system is still developing, but java's model seems to be working fine. Of course, Perl's forking model is probably better and easier to use than java's (cross-platform forking is difficult to guarentee). I say tit-for-tat on threading/forking here.
- Dedicated IDEs are bullshit. There, I said it. Coders shouldn't handle their code with the tongs of an IDE, like some poor MS Word shlub. If the java folks have more IDEs than Perl People, good on 'em.
- ALL PRAISE CPAN! Perl was years ahead of MOST other languages here and continues to put java to shame. CPAN is not only an FTP archive, it's damn protocol to download and install libraries! In your face, JAR!
- Elegance is something for which python and java are praised and for which Perl is found wanting. Clearly, I don't get the 'elegance' argument and I never have. If 'elegance' means fewer characters typed during coding, Perl wins for most applications. Consider perl one-liners, the inclusion of perl hashes, regexes, etc. Perl delivers the promises of 'less typing'. Is elegence the lack of 'weird characters?' It may be, but Perl's syntax acts like a road sign to tell the maintain what's going on. Sigils (those funny characters preceding variable names) are not a misfeature -- they are essential to maintaining a Perl program. You always know what to expect from a variable with a sigil. Without them, the reader (and coder!) needs to find the original declaration of the variable to figure out what it is. I hate that. Is elegance the object model of a language? Python and Perl's object model isn't terribly different. Shocked? Don't be. The difference between python and Perl objects is that python has a object type, where perl blesses variable data. Java and Perl object models are very different. Java's single inherentence (which is a Good Thing) and object data protection (which is a Useless Thing) definitely come from the C++ crowd.
- You always get the source code with every Perl script and library. This makes it simpler to debug programs in Perl because you can always step through almost all of the code, including the libraries.
- Readability. Perhaps this is meant to go under 'elegance,' but I think this is a different issue. All languages can be obfuscated. It's not the language designer's perview to make you code clearly. Any claim a language makes to being inherently cleaner to code in (I'm looking at you, Java and python) is naive. I don't expect a java programmer to maintain a Perl program, just as I don't expect a Perl programmer to maintain a java program. In fact, that's why I'm not an editor for a Japanese magazine -- I have no facility for the language. Does that mean Japanese is inferior to English?
Let's all say it together: XML::Parser Sucks! There, that was cleansing.
After a much prodding of my XML buddies (hi jmac!) and an evil notion of using goto (thankfully Perl doesn't let you jump into the middle of a function), I came across a seemingly little used XML::Parser function So, here's a very goofy example of how to work with this bod boy. I'll be looking to shove this into Frontier::RPC2 in a most Eee-VEIL way. ;-)
parse_start which returns a new XML::Parser::ExpatNB object (with oh so little documentation) that does EXACTLY WHAT I NEED! I need a parser that parses a stream in increments.
Consider how useful this is for dealing with XML messages coming across the network that might f'ing HUGE! This parsing method will at least give me an opportunity to chunk the data into smaller bits (save for the pathological 45TB between a single
use strict;
use warnings;
use XML::Parser;
my $p = XML::Parser->new(
Style => 'My::Pkg',
);
print "Reading from __DATA__\n";
my $data; # A place for my text data
# Don't be fooled: it's an
# object constructor
my $nb_p = $p->parse_start(data => \$data);
while(my $l = <DATA>){
chomp($l);
$nb_p->parse_more($l);
if(my $s = ${$nb_p->{data}}){
print "Back at the range, I got $s\n";
}
}
$nb_p->parse_done;
package My::Pkg;
sub Init {
my($expat) = @_;
print "Hello!\n";
}
sub Start {
my($expat, $tag, %attrs) = @_;
${$expat->{data}} = undef;
print "Start: $tag\n";
}
sub Char {
my($expat, $text) = @_;
${$expat->{data}} = undef;
return if $text =~ /^\s*$/;
$expat->{char_bag} = $text;
}
sub End {
my($expat, $tag) = @_;
print "End: $tag\n";
${$expat->{data}} =
$expat->{char_bag};
# clean up
$expat->{char_bag} = '';
return;
}
__END__
<?xml version="1.0" ?>
<a>
<b>
<c>
<d>fiddlesticks</d>
</c>
</b>
</a>
Here's some fun at my expense. This kept me in stitches for most of yesterday. After a good night's sleep (and more methodical debugging), I tracked the problem down to this block:
sub find_referral_by_ip {
my($self, $ip, $dbh) = @_;
# ... non-related code omitted
unless($ip){
my $ip = $r->connection->remote_ip;
}
The intention here is to get the IP address of the current request if it wasn't passed in (yup, it's mod_perl). In other words, if $ip isn't populated then ensure it is. Normally, I don't pass in the IP, since this method will figure out the IP on its own, right? I began wondering: "should I be looking a subrequest? Is my sql [not shown here] broken? What's going on here?!"
Have you seen the problem yet? Here's a hint: it has NOTHING to do with mod_perl, a bug in Perl or the way Perl handles method calls.
The upshot is that I'm an idiot.
Thank you!
I confess that I don't use the '-w' flag much in Perl except when I do syntax checking like:
$ perl -wc [script]
For the most part, the only warning perl issued during runtime for my programs 'Use of uninitialized value ...', which is a pretty lame warning since Perl defines new created variables to be, er, undefined. In C, you sure as heck don't want to be counting on any uninitialed variable values. Perl does the right thing, however, and that's why we use it.
Today, I found a Perl "feature" that I nearly reported as a bug. Take a look at this code:
use strict;
bar("value");
print "end\n";
sub bar {
for ($_[0]) {
/[a-z]/ && do {
print "Plunging into foo()\n";
foo();
print "You got me!\n";
return 1;
};
}
print "fall through\n";
return 1;
}
sub foo {
my $ans = '';
print "type 'q' to quit\n";
do{
last if lc (substr($ans, 0, 1)) eq 'q';
print "\nstill in loop\n";
} while( $ans = <> );
return;
}
Earlier today, I would have expected this program to print:
Plunging into foo() type 'q' to quit You got me end
This expectation would have also gone unfulfilled, since the real output is more like:
Plunging into foo() type 'q' to quit still in loop fall through end
What's going on here? It looks like the 'last' in foo() is jumping back to the loop in bar(). In fact, this is the case and -w provides some insight:
Exiting subroutine via last at ...
Kudos to p5p for adding this warning! I suspect the problem lies with the infrequently-seen-in-Perl-but-useful-Pascal construction 'do{}while()'. Since I haven't run into this behavior before and I rarely use 'do{}while(),' I'm pointing fingers at it. Still this behavior seems a little bold to me. What do you think?
I'm working on a game similar to Funeral Quest (an upgrade to which is due soon). It's a turned-based web game based on UFOlogy (go with what you know, I say). I'll have details on the game some. Naturally, the game is written in Perl.
The game includes a single-threaded web server implemented with Gisle Aas' HTTP::Daemon Class. It's single-threaded because I want to allow users to run the server on Winders, should they choose to and I couldn't figure out how to use threads or ithreads (am I supposed to have access to ithreads?) on win32.
For simple tests, getting a file or a small web page, HTTP::Daemon works fine. The problem started when it attempted to serve a web page with 15 images on it. When the server tried to server the third file (an image), I'd get a 'broken pipe' error my server would halt. As I worked to isolate the problem, I found that the error was coming from this simple loop in HTTP::Daemon::ClientConn::send_file:
while ($n = sysread($file, $buf, 8*1024)) {
last if !$n;
$cnt += $n;
print $self $buf;
}
As I step through this code in the debugger, I found that after the second iteration $self (a kind of IO::Socket::INET object) would choke. Bummer. I tried changing from print to syswrite, but that didn't change the behavoir at all.
I turned to Google.
Digging around, I found this post from 1997:
Gisle Aas (gisle@aas.no) 26 Nov 1997 11:39:19 +0100 Previous message: Gisle Aas: "Re: libwww-per-5.16" In reply to: Joerg Kammerer: "Broken Pipe" ------------------------------------------------------------------ Joerg Kammererwrites: ] Hope for a Tip... Add: $SIG{'PIPE'} = 'IGNORE'; to your script.
Sure enough, that seems to do the trick. An early hack I thought of that sort of worked was:
$SIG{WARN} = sub { exec "/usr/bin/perl $0" };
This restarts the server on a broken PIPE error. Unfortunately, the file that choke the server wouldn't get sent.
It would be super if Gisle added this little gotcha to his HTTP::Daemon man page. If not, I'm sure others will find this journal page when they Google for a solution, as I did.
Guru advice is welcomed. Any thoughts on why the server connection chokes? It looks like some kind of I/O buffer is getting full. Because I can safely ignore the signal, I'm guessing that the I/O that trigger the signal gets cleared and more traffic is accepted on that socket. Is this some kind of race condition?
Also, should I abandon the fantasy of expecting perl 5.6.1 threading to work reliably on both Linux and Windows?
You guys are the best...
Wouldn't it be cool to make your own synthesizer in Perl? With a little rudimentary physics and a CPAN module, you can produce the monophonic opus of your dreams without leaving the protective womb of your favorite editor.
Experienced readers of this journal will note that I am interested in music. Not only do I listen to quite a bit of it, but I've been playing, composing and recording music at the hobbist level (read: wanker level) for over ten years now. While you can limp along as an audio engineer without understanding the dynamics of audio waves, this knowledge can save quite a bit of time during tracking and mixing. Knowing which band of frequencies best accentuates those instruments in your mix can help you fill out your sound while reducing unintentional mush ("mush" is a technical term).
Listeners of my recent music will notice a lot of MIDI sequencing. From drums to bass to keyboards, it seems I've discovered the little Gary Neumann inside of me. I use Cakewalk Pro Audio 9 for sequencing the MIDI patches that my SoundBlast Live card has (with additional SoundFonts I bought from a third party). Pro Audio allows me to blend MIDI sequences with "live" audio tracks (vocals, guitars and other assorted noises captured by microphones). These audio tracks can be recorded with Pro Audio or WAV files can be imported into the current project. This is how I work with samples from various movies or CDs. My TV, VCR and CD player are components of my PC (even if they weren't, I have an external Mackie mixer which routes various sound sources through one set of speakers and to my PC, producing the same result for this purpose). I sample by playing back the orginal source while recording the audio signal with Cakewalk. As a WAV file, it is then easy to edit and modify that sample to taste.
Music is fun (but not always profitable) when its experimental. Just as I've recorded spatulas and toolboxes as percussion elements in the past, I found that my experiments with IBM's ViaVoice speech synthesis software to have pleasant musical applications (at least, pleasant by my reckoning). Since Perl is a big part of my life, I have wanted to incorporate some our favorite scripting into my music. When I found the Audio::WAV module on CPAN, I seized upon the opportunity to learn more about WAV files and audio dynamics.
I'll skip the high school physics introduction to sound and waves, since most of the readers here probably remember more of that stuff than I do. However, the important thing to remember is that sound moves in waves. The canonical example of a sound wave is one that takes the form of a sine wave. That is, a wave that smoothly oscillates from peak to valley (there are many other possible wave forms, true sine waves rarely occur naturally). The frequency at which that sine wave propagates is called fundamental or, in musical terms, the tonic note. While very important, sound that only consists of the fundamental frequency can fatigue the ear quickly. Additional frequencies that are even multiples of the fundamental make the final tone more complex and interesting. These additional frequencies are called harmonics and they interfere with the fundamental to produce a more complex wave form.
With this small bit of phyics and the Audio::WAV module, you can produce wave files of any tone you want. By extending the code shown here, your scripts can write out entire songs in glorius 16-bit, 44100hz WAV files. The key is to understand how to use Audio::WAV to write out audio information.
Because this is such a new module (it's only up to 0.2), the documentation is a little underpowered. However the core of what you need is there. The Audio::WAV class has two child classes it uses to read and write WAV files (called Audio::WAV::Read and Audio::WAV::Write respectively). Instead of directly instantiating an Audio::WAV::Write object, Audio::WAV has a write() method that returns a new Audio::WAV::Write object. For instance:
my $wav = Audio::Wav->new;
my $write = $wav->write($outfile,
{
bits_sample => $bits_sample,
sample_rate => $sample_rate,
channels => 1,
}
);
Audio::WAV::Write also has a write() method, but it expects to be passed at least one point of wave data to write out to the appropriate file.
$write->write( sin($pi * $time) * $max_no );
(note: the documentation claims that write() can take an array of samples, but that only produced empty 46 byte WAV files for me.)
Although it seems simple enough to feed write() random numbers, the trick is in understanding how to generate meaning data (isn't that always the way). This discussion is limited to talking about sine waves since that does not exceed my mathematical acumen.
Like the graph of a sine wave made by an eighth-grader, the WAV file consists of points that represent the wave's amplitude at a given point in time (it's a bit more complicated than that, but the Audio::WAV module lets me work at this level). Successive calls to write() place a new point on this imaginary graph at the next available time slot (see below for an explaination of how time is subdivided along this "X-axis" of time). Once the maximum and minimum values for wave's amplitude are know, it's a very simple math problem to determine the appropriate "y value."
y = sin(PI * x) # if you have an X value, find Y
In this case, the X value is going to be a slice of time which is
determined by both the frequency of the fundamental and the sampling rate
of the WAV file. The higher the sampling rate, the more X values are produced.
But, how many time slices (that is divisions of the X-axis) are needed?
This is a function of how many seconds you want the sound to last times
the sampling rate.
number of X-axis divisions = seconds * sample rate
The value of each X-axis point is:
X-axis value = (X-axis offset/sample rate) * hertz
Now we're getting somewhere! You can approximate PI with (22/7) and now
you know your X-axis values. You only need to know the range of allowable
amplitudes for this wave file to determine valid Y-values. Recall that sound
wave amplitude is perceived as loudness by human ears. It turns out that the
amplitude is governed by the bit resolution of the WAV file.
max_amplitude = (2 ** bit resolution) / 2
Why are we raising 2 by the power of the bit resolution? For the same reason that you set your video card to the highest video resolution. The more bits, the more graduation. I assume that WAV files allocate the number of bits designated by the bit resolution for each sample of sound to represent the amplitude of the wave file at that time. The result is divided by two because the wave has positive and negative peaks. In effect, it's like the number is signed (in fact, it may be in the WAV file).
Putting this mess together, the amplitude of the wave at a given sample
is found like this:
current amplitude = sin(PI * x-axis value) * max amplitude
Because sin() produces a number between 1 and -1, the amplitude will either be at the maximum amplitude or smaller. By added a scalar to the maximum amplitude, you can control the volume of the samples too. In Perl code, producing each point on the sine wave is done like this:
for my $pos (0..$len) {
my $time = ($pos/$sample_rate) * $hertz;
$write->write( sin($pi * $time) * $max_no );
}
This code produces a sine wave with only the fundamental frequency. If you wanted to add the second harmonic to this wave, simply double the hertz value every other iteration.
for my $pos (0..$len) {
my $hz = $hertz;
if ($pos % 2 == 1) {
$hz *= 2;
}
my $time = ($pos/$sample_rate) * $hz;
$write->write( sin($pi * $time) * $max_no );
}
It's easy enough to generalize this code to support any harmonic. I thought it would be fun to add an arbitrary number of harmonics to the fundamental.
my $next = 0;
for my $pos (0..$len) {
my $hz = $hertz;
# throw in some harmonics, but keep the tonic dominate
if ($pos % 2 == 1) {
$hz *= $harmonics->[$next++];
}
$next = 0 if $next >= @{$harmonics};
my $time = ($pos/$sample_rate) * $hz;
$write->write( sin($pi * $time) * $max_no );
}
Notice that the fundamental is represented at least as often as any additional harmonic. The more harmonics are added, the more the fundamental dominates. This may not be entirely what you want, but at least you now have some place to start tinkering.
Wouldn't it be great if someone wrapped this into an easy to use perl script? You bet it would be!
#!/usr/bin/perl
# Create sine wave WAV files
# Based on code found in Audio::WAV::Write POD
# jjohn 12/2002
use strict;
use Audio::Wav;
use Getopt::Std;
my %opts;
getopts('?hb:f:H:s:t:V:z:', \%opts);
if ($opts{h} || $opts{'?'}) {
print usage();
exit;
}
my $outfile = $opts{f} || 'out.wav';
my $hertz = $opts{z} || 440;
my $seconds = $opts{t} || 2;
my $harmonics = $opts{H} || 1;
my $sample_rate = $opts{s} || 44100; # CD quality;
my $bits_sample = $opts{b} || 16; # 4,8,16 are all good choices
my $volume_scalar = 1;
if ($opts{V} < 1 && $opts{V} > 0) {
$volume_scalar = $opts{V};
}
my $wav = Audio::Wav->new;
my $write = $wav->write($outfile,
{
bits_sample => $bits_sample,
sample_rate => $sample_rate,
channels => 1,
}
);
my $pi = (22/7); # close enough;
my $len = $seconds * $sample_rate;
my $max_no = (2 ** $bits_sample) / 2 * $volume_scalar;
# split Harmonics value into an array
$harmonics = [ split /\s*,\s*/, $harmonics ];
my $next = 0;
for my $pos (0..$len) {
my $hz = $hertz;
# throw in some harmonics, but keep the tonic dominate
if ($pos % 2 == 1) {
$hz *= $harmonics->[$next++];
}
$next = 0 if $next >= @{$harmonics};
my $time = ($pos/$sample_rate) * $hz;
$write->write( sin($pi * $time) * $max_no );
}
$write->finish;
sub usage {
return <<EOT;
$0 - Create fancy sine wave WAV files
USAGE:
# a 3 second 440hz WAV called 'outfile.wav'
$0 -f 'outfile.wav' -z 440 -t 3
OPTIONS:
? Print this screen
h Print this screen
b <num> bit resolution (defaults to 16-bit)
f <str> name of the outfile (defaults to 'out.wav')
H <num> Add this harmonic to the base tone. Can be a comma-separated list.
s <num> sample rate (defaults to 44100 (CD quality))
t <num> number of seconds to make the file (default is 2)
V <num> Volume multiplier (decimal values cut the default MAX volume)
z <num> Frequency in hertz of the WAV file (default is 440)
EOT
}
Next time, I'll look at managing the wave forms better to produced rudimentary FM synthesis. Together with Perl's ability to read MIDI files, you can turn existing MIDI files into "fully realized" WAV files without using a sound card!
Zen thought for the day: If a WAV file is produced on a machine without a sound card, is there any way to tell if the program worked correctly?
For State Secrets, I'm thinking about breaking out the interface into templates. These can be better manipulated by end users. Since I want SS to run on Winders with as little fuss as possible, I want something that PPM will install without fuss. Template Toolkit, my first choice, is unknown to PPM. Bummer. MJD's Text::Template is not. Here's a script that morphs Text::Template into something TT-ish. Note that does not get TT syntax with this, but it's close enough. I present this hear without intelligent comment so that I can find on the web later.
The processor a command line tool
#!/usr/bin/perl --
# See if I can make Text::Template more TT2 like
use strict;
use warnings;
use Text::Template qw(fill_in_file);
use constant TEMPLATES => './templates';
use constant SOURCE => './src';
use constant CONSTANTS => './templates/constants';
my $infile = (shift @ARGV || "");
while (! $infile || ! -e SOURCE . "/$infile") {
print "Which file should I process? \n";
$infile = <>;
chomp $infile;
}
get_constants(CONSTANTS);
my $config = {
TEMPLATE_DIR => TEMPLATES,
SOURCE_DIR => SOURCE,
# template functions
include => \&include,
};
my $processor = Text::Template->new(TYPE => 'FILE',
SOURCE => SOURCE . "/$infile",
DELIMITERS => ['[%', '%]']
);
my $text = $processor->fill_in(HASH => $config,
PACKAGE => "__CONSTANTS",
);
print $text, "\n";
#------
# subs
#------
sub get_constants {
my ($file) = @_;
return unless -e $file;
package __CONSTANTS;
do($file) or die "Can't parse $file: $@";
package main;
return;
}
# includes happen in the templates dir
sub include {
my ($file, %args) = @_;
return fill_in_file(TEMPLATES . "/$file", HASH => \%args);
}
Templates
header
<html>
<title>{ $title }</title>
<body>
footer
</body> </html>
Constants roughly like TT's 'config'
$foo = "bar";
# This is a nutty test
%requires = (bar => 1, b =>2);
$sam = [0,3,5];
sub hairy {
"I like beans!\n";
}
Source File <p>hello.html
[% include('header', title => 'hello') %]
Hello, [% $foo %]
Hello, [% hairy() %]
[% include('footer') %]
Here's the deal: you're on win32, you need to fork() and you're using Perl. Since version 5.6, win32 Perl has had fork() emulation. That's good. It's emulation (accomplished at the interpreter level with threading), so standard Unix tricks like using a signal handler to reap children and control the number of child process don't work. That's bad.
Here then is my Win32 Perl recipe for the week: Limiting forked child "processes" under Win32 Perl (versions 5.6 and higher).
use strict;
use POSIX ":sys_wait_h";
use constant MAX_KIDS => 4;
my (%children, $quit) = ((), 0);
print "[$$] Entering main loop\n";
while (!$quit) {
reap(\%children);
if ((keys %children) <= MAX_KIDS) {
print "[$$] Forking child $_\n";
if (my $pid = fork()) {
$children{$pid} = 1;
} else {
do_child();
exit;
}
} else {
$quit = 1;
}
sleep(1);
}
# reap existing kids
while (keys %children) {
reap(\%children);
sleep(1);
}
print "All children reaped\n";
#--------
# sub
#--------
sub reap {
my ($kids) = @_;
for (keys %{$kids}) {
print "[$$]child '$_' reapable?\n";
next if waitpid($_,WNOHANG()) != -1;
print "[$$]child '$_' reaped\n";
delete $kids->{$_};
}
}
sub do_child {
sleep(1) for 0..10;
print "[$$] done\n";
}
Note that you can do other work in the main reaping loop.
The main loop is really the service state loop for a Win32 service. Also note that reaping the child "processes" isn't strictly necessary. Not only are the children not real processes (they are interpreter threads), but also no zombies can be created if the parent process exits without calling wait(). All child "processes" will be terminated with the parent. So you've got that going for you.
See David Roth's web site, books or articles for more excellent details on the devilish art of creating Win32 services with Perl.
[Original use.perl.org post and comments. Minor cleanup on 11/30/2007.]
UPDATE: It's nice to see that this script still works in 2009, four years after I wrote it. Must have done something right.
Whether pornography is psychologically damaging, socially corrosive or just plain nasty, it is clear that at least for men, it is a strong attractor and motivator. It is time we left behind antiquated nineteenth century morality and boldly embraced this core driver of male behavior to fill the more serious deficit of quality programmers in the U.S.A.
To this end, I submit my own work, a perl script that fetches the publicly available gallery at IShotMyself.com. The real benefit of this script (well, at least secondarily) is that it is my first concerted use of Andy Lester's WWW::Mechanize. Because of the positive reinforcement generated by this script, I am more likely to use this module in other, less pornographic work -- and so I move the Wheels of Industry forward! Adam Smith's Invisible Hand of Capitalism guides me!
Briefly, the script fetches the front page and looks for a link labled "FOLIO [\d+]". This page is then fetched and links that start with "javascript:" are culled. By adding on "&m=img" to these links, it is possible to get to the actual picture in question. These images are stored in a directory on my Winders box, hence the funny path names.
Based on the impressive success of this experiment, I recommend that we teach children to read by giving them digests of Penthouse Forum. We must always be thinking of the children!
I further recommend that we ease the suffering of the impoverished by eating their babies.
More Good Ideas (™) to come...
use strict;
use WWW::Mechanize;
my $base_url = qq[http://www.ishotmyself.com];
my $main_page= qq[/public/main.php];
my $dest_dir = "ishotmyself";
print "Fetching $base_url/$main_page\n";
my $B = WWW::Mechanize->new;
$B->get(qq[$base_url/$main_page]);
unless ($B->success()) {
die "Couldn't fetch main page: ", $B->status();
}
print "Main page fetched\n";
print "Dumping folio links\n";
my $todays_gallery;
my $gallery = "unknown";
foreach my $l ($B->links()) {
next if $l->text() !~ /folio \[\d+\]/i;
$todays_gallery = $l;
($gallery = $l->url) =~ /([^=]+)$/;
$gallery = $1;
print "\t", $l->text(), " : ", $l->url(), "\n";
}
# find today's folio name
print "Fetching ", $todays_gallery->url, "\n";
$B->get($todays_gallery->url);
unless ($B->success()) {
print "Couldn't fetch '" . $todays_gallery->url(),
"' : ", $B->status(), "\n";
print $B->content, "\n";
}
#open OUT, ">out.txt";
#print OUT $B->content, "\n";
#close OUT;
print "Dumping folio links\n";
my @found;
foreach my $l ($B->links()) {
next if $l->url !~ /^javascript:popupLandscape/;
print "\t", $l->text(), " : ", $l->url(), "\n";
# unjavascript!
(my $url = $l->url()) =~ /\('([^']+)'\)/;
if ($1) {
push @found, "$1&m=img";
} else {
warn "Couldn't unjavascriptify!\n";
}
}
unless (-d $dest_dir) {
mkdir $dest_dir;
}
$dest_dir .= "\\$gallery";
unless (-d $dest_dir) {
print "Creating $dest_dir\n";
mkdir $dest_dir || warn "mkdir $dest_dir failed: $!";
}
print "chdir $dest_dir\n";
chdir $dest_dir;
print "Fetching public images\n";
my $cnt = 1;
for my $l (@found) {
$B->get($l);
unless ($B->success()) {
warn "picture fetch failed: ", $B->status, "\n";
next;
}
my $imgfile = sprintf("${gallery}_%03d.jpg", $cnt++);
if (-e $imgfile) {
print "\tOVERWRITING $imgfile\n";
}
if (open(OUT, ">$imgfile")) {
binmode(OUT);
print "\tWriting $imgfile\n";
print OUT $B->content();
close OUT;
} else {
warn("open $imgfile failed : $!");
}
}
print "done\n";
Here's a quick note to me on how to create a CGI script that streams a process to the browser in such a way as to prompt the user for a "save as" box. This is technique is older than dirt, but still useful.
#!/usr/bin/perl --
use strict;
use CGI;
use POSIX q[strftime];
my $q = CGI->new;
if (open my $in, "/usr/bin/zip -r - logs/ |") {
my $filename="lhp_logs-".strftime("%Y%m%d", localtime()) . ".zip";
print $q->header({ "-type"=>"application/x-unknown",
"-Content-Disposition" =>
"attachment; filename=$filename",
}
);
my $buf ="";
while (read($in, $buf, 4048)) {
print $buf;
}
} else {
die "Oops: $!";
}
The error checking isn't particularly robust here. Please gold-plate this code as needed for your next review.
SOAP::Lite is a suite of Perl modules for doing SOAP RPC. SOAP::Lite was originally written by mad Russians. Its incredible flexibility is also it's main drawback. Debugging isn't as obvious as it could be with this library.
Most of the time you the would-be SOAP scripter want to know what the request and response XML messages look like. The SOAP::Trace doc doesn't make this clear (since the sample code is filled with mistakes). Here is one way to get SOAP to spew the XML messages to STDOUT they way you'd expect from a more humble module like Frontier::Client.
use SOAP::Lite "trace" => ["transport" => \&log_it];
sub log_it {
my ($in) = @_;
if (ref $in && $in->can("content")) {
printf "**GOT: %sn", (ref $in);
print "-"x60, "n";
print $in->content, "n";
print "-"x60, "n";
}
}
There's a lot of fun things going on in the log_it() subroutine. Notice the use of the oft-forgotten can() method, which all Perl objects have. Yes, this is a very special code review.
Should SOAP::Lite have a simple flag like "trace_xml" which does all this for you? Sure it should. But until then, you've got this humble blog and your old Uncle jjohn to help you.
Now get the hell off my lawn, you kids!
Perl offers many ways to commafy a number. That is, insert commas every three number for integers larger than 999. Here is my commafy routine:
sub commafy {
my $num = shift;
my $new = "";
while ($num =~ s/(?<=d)(d{3})$//g) {
$new = ",$1$new";
}
$new = "$num$new";
return $new;
}
It works backwards from right to the left. It uses the "new" look behind assertion in the regex. Works fine in Perl 5.8, and I think Perl 5.6.
UPDATE 2: Please see this post for the latest working version of this module.
UPDATE 1: This code works with XML::RSS version 1.05 or so. The newest versions of this library removed the encode() method for reasons beyond my reckoning. You can either use this code as a starting point for porting to the new XML::RSS module (and tell me how you did it!) or simply use the older version, which is still available on CPAN.
Like searching for Bigfoot, creating a podcast feed that's recognized by Apple can be an elusive, furtive and lonely process. Most know that podcasts are really just RSS 2.0 feeds with some extra tags. This seems like something perl should handle. The perl module XML::RSS nearly has everything necessary, but there's always a catch: the itunes namespace.
All is not lost, because XML::RSS is a class that can be inherited from. With a little overriding goodness, you too get make valid feeds that even Apple's iTunes music store will accept. Here's my module that inherits from XML::RSS. While it's not a complete solution, it works well enough for me.
package XML::RSS::Podcast;
use XML::RSS;
@XML::RSS::Podcast::ISA = qw[XML::RSS];
sub as_string {
my $self = shift;
return $self->as_podcast_rss;
}
sub as_podcast_rss {
my $self = shift;
my $enc = $self->{encoding};
my $output = <<EOT;
<?xml version="1.0" encoding="$enc"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
version="2.0">
EOT
$output .= $self->podcast_start_channel;
for my $i (@{$self->{items}}) {
$output .= $self->podcast_item($i);
}
$output .= $self->podcast_end_channel;
return $output .= "n</rss>n";
}
sub podcast_start_channel {
my $self = shift;
my @fields = qw[ttl title description link language
pubDate lastBuildDate creator
webMaster copyright
];
my @image_fields = qw[title url description link
width height];
my @itunes_fields = qw[subtitle author summary
image];
my $output = "<channel>n";
for my $f (@fields) {
if (length($self->{channel}->{$f})) {
my $s = $self->encode($self->{channel}->{$f});
$output .= "t<$f>$s</$f>n";
}
}
my $seen_image = 0;
for my $f (@image_fields) {
if (length($self->{image}->{$f})) {
unless ($seen_image) {
$output .= "t<image>n";
$seen_image = 1;
}
my $s = $self->encode($self->{image}->{$f});
$output .= "tt<$f>$s</$f>n";
}
}
if ($seen_image) {
$output .= "t</image>n";
}
# Owner name/email not handled
for my $f (@itunes_fields) {
if (length($self->{channel}->{itunes}->{$f})) {
my $s=$self->encode($self->{channel}->{itunes}->{$f});
$output .= "t<itunes:$f>$s</itunes:$f>n";
}
}
# FIXME: Doesn't handle sub cats.
if (ref $self->{channel}->{itunes}->{category}) {
for my $c (@{$self->{channel}->{itunes}->{category}}) {
my $s = $self->encode($c);
$output .= qq[t<itunes:category text="$s" />n];
}
}
return $output . "n";
}
sub podcast_end_channel {
return "</channel>n";
}
sub podcast_item {
my $self = shift;
my $item = shift;
my @fields = qw[title guid pubDate description];
my @itunes_fields = qw[author subtitle summary
duration keywords explicit];
my $output = "t<item>n";
for my $f (@fields) {
if (defined $item->{$f}) {
$s = $self->encode($item->{$f});
$output .= "tt<$f>$s</$f>n";
}
}
if (ref $item->{enclosure}) {
$output .= "<enclosure";
for my $f (qw[url length type]) {
if (defined $item->{enclosure}->{$f}) {
$output .= qq[ $f="$item->{enclosure}->{$f}"];
}
}
$output .= "/>";
}
for my $f (@itunes_fields) {
if (defined $item->{itunes}->{$f}) {
$s = $self->encode($item->{itunes}->{$f});
$output .= "tt<itunes:$f>$s</itunes:$f>n";
}
}
return $output .= "t</item>n";
}
A word about the RFC822 pubDate. This seemingly arbitrary date format
can be easily generated with a call to strftime(). The
format string is "%a, %e %b %Y %H:%M:%S %z". You might think
that you can use mysql's DATE_FORMAT() to replicate this, but you'd be wrong.
Instead, generate mysql queries with UNIX_TIMESTAMP(), feed the result of
that to localtime() and feed that to strftime(). Simple, no? No, but such
are the challenges in programming.
Here's a sample of how I use this for pseudocertainty.com.
use strict;
use DBI;
use POSIX qw[strftime];
use MP3::Info;
my $rssfile = shift || "./ps-pod.rss";
my $dbh=DBI->connect("dbi:mysql:pseudo", "pwrUser", "s3cr3t")
or die "connect: $DBI::errstr";
my $shows = get_shows($dbh);
$dbh->disconnect;
my $rss = XML::RSS::Podcast->new(version => "2.0");
my $rfc822_fmt = '%a, %e %b %Y %H:%M:%S %z';
my $iMeta = { "author" => "Joe Johnston and Mike Lord",
"summary" => 'UFOlogy, Cryptozology and
the people who love them are
discussed on this
internet-only radio show',
"subtitle" => "Don't be Certain.
Be PseudoCertain.",
"category" => ["Talk Radio"],
};
$rss->channel(title => 'PseudoCertainty',
"ttl" => 60, # time to live
link => 'http://www.pseudocertainty.com/',
language => 'en-us',
description => 'UFOlogy, Cryptozology and the
people who love them are discussed
on this internet-only radio show',
copyright => "Copyright Joe Johnston and Mike Lord",
webMaster => "jjohn@pseudocertainty.com",
pubDate => strftime($rfc822_fmt,localtime()),
"itunes" => $iMeta,
);
for my $r (@$shows) {
# no more than 30 words
my @words = map { s/(<[^>]+>)//g; $_; }
split /s+/, $r->{about};
my $desc = "not set";
if (@words > 30) {
$desc = join " ", @words[0..29], "...";
} else {
$desc = join " ", @words;
}
my $finfo = get_mp3info("/path/to/shows/$r->{mp3_filename}");
my $pl = qq[http://pseudocertainty.com/$r->{mp3_filename}];
my $enc = { url => $pl,
length => -s "/path/to/shows/$r->{mp3_filename}",
type => "audio/mpeg",
};
my $itunes = {
explicit => "N",
keywords => "UFO aliens zorknapp",
summary => $desc,
duration => $finfo->{TIME},
}
$rss->add_item( title => $r->{title},
link => $pl,
pubDate => strftime($rfc822_fmt,
localtime($r->{pretty_created})),
enclosure => $enc,
permaLink => $pl,
description => $desc,
"itunes" => $itunes,
);
}
$rss->save($rssfile);
#------------------------
# subs
#-------------------------
sub get_shows {
my ($dbh) = shift;
my $sql = qq[
SELECT *,UNIX_TIMESTAMP(created) as pretty_created
FROM shows WHERE publish = 1 ORDER BY created DESC;
];
my $sth = $dbh->prepare($sql);
die "get_shows: '$sql': " . $sth->errstr unless $sth->execute;
return $sth->fetchall_arrayref({});
}
And they called me a fool for wanting to make XML::RSS spew podcast RSS. But I showed them! I showed them all! Bwahahhaha!
(This article continues my thoughts on the taskboy CMS.)
How the taskboy CMS works
Once I decided that content would be managed through emacs (to as large a degree as possible), the rest fell mostly into place. The blog, the music section, the polls and the ratings would all be stored in mysql and accessed through a XML-RPC API. I would use PHP to define the layout, pulling the content from the database where needed. Templates as such are not used. To my thinking, a PHP page is the template. I also decided against database abstraction classes, since I'm unlikely to move from mysql any time soon. I do have a collection of PHP utility functions (like, sql_insert, sql_delete, sql_select) to make database access less painful. Each PHP page calls the same header and footer pages. Much of this code was developed along side State Secrets. Together, this makes the PHP stuff pretty easy to modify.
Getting from emacs to PHP is a little circuitous, so please bear with me. It is straight forward to write a perl script that's an XML-RPC client using the Frontier::RPC2 library. So that's what I did first. I verified that I could talk to the PHP page that processes XML-RPC requests. Emacs is an extentable editor using the macro language lisp. The creator of Perl, Larry Wall, said of lisp that it had all the visual appeal of "porridge with toenail clippings" and I agree. However, I did learn just enough lisp to write the current emacs buffer to standard out to be read by a perl script which could then make the appropriate XML-RPC request and make some snappy response that emacs could deal with. This solution is what I wrote about on use.perl.org.
Gnu Privacy Guard
The new wrinkle for taskboy is security. The XML-RPC messages go across the network in clear text. The primary risk I wanted to address is not that someone will see my blog before it's posted, but that an unauthorized fool would mess with my XML-RPC service. Whatever authorization mechanism I choose would have to work over clear text. It's true I could have used SSL with HTTP Authentication for the web services PHP page, but I didn't want to. Fortunately there is already a solution for this kind of problem, but for a different form of internet messaging.
Back in the mid-80's, Phil Zimmerman had a problem: he couldn't prove he was him. That is, email that claimed to be from him could have been forged by some joker only claiming to be him. How could those receiving email from he be assured that the sender Phil Zimmerman was the Phil Zimmerman? The answer became known as Pretty Good Privacy and it involved some very scary math. But you can think of it as something like a lock and key mechanism. When an email is sent out, a Very Big Number is computed with the content of the message and your private key. Your private key has a sibling called a public key that the recipient of mail will already have (and verified). When the recipient gets this message, pgp uses the public key on file to decode the message (or signature). If nothing has been changed in the message, the math will work out (via magic) and you can be pretty sure that Phil indeed has told you to "go pound sand."
The important concept here is that PGP was meant to guarentee the identity of a sender using a message that anyone could read, but not change. Now in web services, I also have messages that anyone could read, but I want the server to accept only requests from me. Although it's not a seemless fit, PGP turns out to be a good authenication method for private web services. Here's how I modified Edd Dumbill's XML-RPC PHP library and Ken MacLeod's Frontier::RPC to use Gnu Privacy Guard (any open source version of PGP) to look down my web service. The strategy in both cases is that requests should be signed, not responses. It would be staight-forward to implement response signing too, but I don't deem it necessary for my application.
Tweaking the PHP server
This class merely extends the xmlrpc_server class found in xmlrpcs.inc. I need to intercept the content, verify the signature, remove it if the message checks out and pass the rest of the XML doc to the parent class for handling. Hats off to Edd and the boys for getting the class partitioned so that I needed to override only one method.
One PHP tip: name your class files with .php. That way, you can point a browser to them and check the syntax. After all the syntax typos are gone, the page will appear blank. The the contents of files with .inc extensions are typical just displayed by the web server without parsing.
VerifyRequest($data)) {
return $this->RPCError("Couldn't verify request");
}
$data = $this->RemoveSignature($data);
}
# pass off to parent
return parent::parseRequest($data);
}
#-----------------------------------------
# Look at the body of the request. Does it have
# a signature to verify?
function VerifyRequest ($data="") {
# BTW: I hate this solution
# write out to a tmpfile
$infile = "/tmp/" . posix_getpid() . ".vrf";
if ($fh = fopen($infile, "w")) {
fwrite($fh, $data);
fclose($fh);
} else {
return 0;
}
# is this signed by someone I trust?
$cmd = "/usr/bin/gpg --homedir=/path/to/gpg "
. "--verify <$infile 2> /dev/null";
$retval = 1; # default to failure
if (file_exists($infile)) {
system($cmd, $retval);
} else {
return 0;
}
unlink($infile);
return $retval ? 0 : 1;
}
#-------------------------------------------
# remove signature header/footer
function RemoveSignature ($data="") {
# for GPG
# strip of the GPG stuff to get the basic XML back
$preamble = "/-----BEGIN PGP SIGNED MESSAGE-----r?n"
. "Hash: SHA1r?nr?n/";
$footer = "/-----BEGIN PGP SIGNATURE-----r?n"
. "Version: .+r?nr?n(S+r?n)+"
. "-----END PGP SIGNATURE-----/";
$data = preg_replace($preamble,"", $data);
$data = preg_replace($footer,"",$data);
return $data;
}
#--------------------------------------------
# wrapper for easier (and non-granular) error reporting
function RPCError ($msg=0) {
return new xmlrpcresp(0,500,"Bad request: $msg");
}
}
?>
A few notes on this amateurish PHP code. First, any security wonk will tell you not to create temp files with PID names. In my case, I trust the other users on my server and don't feel compelled to improve the security here. You may want to. I'm using the fact that gpg process has an exit value of 0 if the verify succeeds. The only way I saw of getting the exit value of a process in PHP is by using system(). There are a couple of other process handling functions, but those didn't seem to give me this simple result to check (I could have used popen() and grepped through the output, but that seemed painful [although I might have done that if this were a perl module]).
parseRequest() is called by the parent class to unpack the XML request. Here, I look for the GPG signature and if all goes well, I pass just the XML string to the parent parseRequest() for processing.
Keep in mind that PHP runs as whichever user Apache runs as. This affects GPG. You have to set up the file ownership for the keys so that Apache can read and write to a directory. You should create keys specifically for this web service and not reuse your own GPG stuff. You were warned.
This class is used identically to the xmlrpc_service class defined in xmlrpcs.inc. No, I don't know what the "da_" stands for in the class name. I though I wrote "ds_", which would have stood for "digital signature."
Expanding the Frontier
For the perl client, I simply defined to classes at the start of the program. Keep in mind, this is a win32 perl program.
package RPCEncoder;
use Frontier::RPC2;
@RPCEncoder::ISA = qw[Frontier::RPC2];
sub encode_call {
my ($self) = shift;
my $request = $self->SUPER::encode_call(@_);
# sign it. 2-way opens hurt my brain
my $outfile = "C:/blog/tmp.txt";
unlink $outfile;
my $cmd = qq[|C:/blog/gnupg/gpg.exe --homedir=/blog/gnupg ]
. qq[--clearsign > $outfile];
open GPG, $cmd or die "Can't proc open: $!";
print GPG $request;
close GPG;
open IN, $outfile or die "Can't open signed $outfile: $!";
undef($request);
while () {
$request .= $_;
}
close IN;
unlink($outfile);
return $request;
}
sub decode {
my ($self) = shift;
my ($string) = shift;
my %args = ('Style' => 'Frontier::RPC2',
'use_objects' => $self->{'use_objects'},
);
$self->{'parser'} = XML::Parser->new(%args);
return $self->{'parser'}->parsestring($string);
}
#-----------------------------------------------------
package RPCClient;
use Frontier::Client;
@RPCClient::ISA = qw[Frontier::Client];
sub new {
my ($self) = shift->SUPER::new(@_);
my %args = ('encoding' => $self->{'encoding'},
'use_objects' => $self->{'use_objects'}
);
$self->{'enc'} = RPCEncoder->new(%args);
return $self;
}
The perl is a little weirder because of the way the Frontier Client works with XML::Parser, itself a horrible creation of Cthulhu. The Frontier::Client constructor needs to be overrided so that I can insert my custom RPCEncoder class, which is a thin coating over Frontier::RPC2. All the XML encoding and decoding happens in Frontier::RPC2 and that's what I need to intercept.
When making a request, I need to sign the XML string before it goes on the wire. All things being equal, I'll like to open the gpg process for reading (to feed it the string I've got in memory), but also read from it to get the output. This is a kind of double pipe, which is easy to do in shell, but weird to do with perl and especially so on Windows. Once again, I write a temp file and I don't even pretend to give security a mind. Windows boxes are typically single user machines and mine doubly so. Also note that I don't need to worry about running as a different user when I make the XML-RPC request. I'm in emacs (which runs as the current user); it spawns a shell to run perl; perl spawns a shell to run gpg.exe). All these processes run will run as me.
I had to also override decode(), because the parent uses ref($self) to determine the class name of the XML callbacks (n.b. BAD MONKEY!). This
really should have been hard coded to 'Frontier::RPC2' since the callbacks all
have hardcoded class names (see the code for the real scoop). I think this was
an attempt to make child classes easier to write, but this trick backfired.
A Quick Note on GPG setup
Getting up to speed on how GPG works took longer than integrating it into
the taskboy web service. I cannot go in to all the set up details here, but
if you are familiar with ssh key mananagement, you will be well ahead of the
game in GPG. If ssh keys make your brain hurt, GPG is a veritable migraine.
But it boils down to this: you must make a GPG key pair for the source machine
with the perl/emacs setup. You must copy the public key to the server. You
must import that key into GPG and verify it (with gpg --edit).
If you don't do all of these steps, this digital signature for XML-RPC hack
won't work and you'll be mystified at what went wrong.
Verify your GPG at all stages using test files, so that you can get the
GPG errors.
Note to jjohn: Move the *gpg files to wherever gpg want to find them. It will make things go easier on you.
Here's a perl tip for those trying to report progress on external programs that don't report that kind of information. The case this hack was designed for was gzip, but you'll think of many other examples of this class of problem.
Perl is an incredible flexible language. Of all its wonderous features, the ability to get a file handle to a process is the most arcane and little appreciated. It is, however, the key to reporting how much input is consumed by a process.
Here's a concrete example of what I'm talking about. The compression utility gzip is a stream-oriented program that works on chunks of data that it receives typically from stardard input (STDIN). You can therefore feed gzip a file of any size and it should work, given enough disk space. The larger the file, the longer gzip takes to run (I suppose this makes the runtime a Big O of (n), linear time [so much for using my comp sci degree]).
Occassionally, you'd like to know how far along gzip is in compressing a large file. Gzip does not report this, but does give you the compression ratio at the end of the run, if you called it with the -v flag.
Without hacking gzip, you can create a perl wrapper around gzip in which you can report how many bytes gzip has consumed of the source file. The idea is that the source file is read by perl and feed to gzip. Keep tracking of how many bytes are read in the perl script is simple. Here's some code.
my $infile = shift @ARGV || die "$0n"; open GZIP, "|/bin/gzip -c > out.gz" or die "can't open process to gzip: $!"; # disable output buffering to see the progress report $|++; open IN, $infile or die "Can't open $infile: $!"; my $original_size = -s $infile; my ($buf, $sum); my $chunk = 200; while (read(IN, $buf, $chunk)) { $sum += $chunk; print GZIP $buf; printf "progress: %02.2fr", ($sum/$original_size)*100; } print "n"; close GZIP; close IN;
This short script expects to be called with the name of the file to compress. The output file name is hard coded to be "out.gz", but it's a simple matter of programming to make this more flexible. The magic begins when we open the process to gzip. Here, the GZIP file handle will be written to. The source file is then opened for reading. I choose to read the source file in very tiny chunks to clearly see the progress indicate work. Here, 200 bytes are read from the source file and then feed to gzip. The number of bytes read is tracked and reported in a straight forward way.
Two penetrating glimpses into the obvious. One: this script is built
for some flavor of UNIX. Some modifications would be needed for Windows,
including the use of binmode(IN), binmode(GZIP).
Two: this is really just a specialized echo loop. While I'm not one to
yammer on about coding patterns, I would say that nearly 90% of the code I
write is some kind of echo loop, when you take away the business logic, error
checking and other distractions.
If you only learn one thing for a programming class, it should be the humble echo loop.
About this blog
The taskboy blog is a exploration of computer technology by Joe Johnston. Topics of posts include practical examples Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.
Latest Feedbag
- This Week In Design: Chair Inspired By A Mussel, Transparent Racecar, Uncorked Wine Timer
- Google Hopeful Of 2010 Launch For iTunes Rival Despite Lack Of Signed Label Deals
- Getting to know the 'Catfish'
- Project Dance Party: Facebooks Secret Twitter-Like Follow Feature
- Three Stations
- TI reveals new teensy projector chip
- Important: Gmail Priority Inbox Should Now Be Available To All
- YC-Funded Cloudant Launches Its NoSQL Cloud Database Platform
- A Singularity in English Spelling
- New Calculations May Lead To a Test For String Theory
Generated: 06:45 on 03/Sep/2010
Recent posts
- Very quick git primer for basic functionality
- Tips for spammers: don't insult me
- CakePHP vs. Symfony: a quick note
- Creating events for Yahoo and Google calendars
- SANs on a budget: iSCSI under Ubuntu
- iPad, iTouch and Kindle: Which is the better mousetrap?
- Rise of the Ad-Hocracy, Part II
- Rise of the Ad-Hocracy, Part I
- Small Hiatus



