Latest News

24th June 2010

NOTE: The full code archive of the mechanism described below can be found here

Joomla is a PHP-based CMS that enjoys widespread popularity. It has many built-in features that make it great for blogs and news-oriented sites right out of the gate. Additionally, it supports three kinds of extension mechanism: components, plugins and modules. Components are low-level facilities that generally support the other two. Modules are often user-visible blocks of HTML that can be selectively added to the page users see. Plugins respond to various events (page rendering, authentication requests, etc.) generated by the Joomla application.

Joomla comes with a variety of login plugins that all use the login module. These plugins allow users to be validated against an external authentication mechanism like LDAP or GMail.

Sometimes it is desirable to log users into the Joomla system who have already been authenticated by a different system without asking for their credentials again. This is called single sign-on (SSO). SSO is a very important usability and security feature of many Service-Oriented Architectures (SOA). In this article, I will present a token-based mechanism for creating SSO to Joomla using the standard extension methods.

To understand this problem a bit better, it is critical to realize that there are two separate notions of identity in an SSO scheme. There is the previously authorized identity (that is, the identity that the user supplied to the non-Joomla system that originally authenticated them) and the user account on the Joomla system that is stored in the local users table. One of the challenges of SSO is to map the remote identity to the local one. For the sake of this exercise, let's assume that the usernames in both the remote authentication system and the local Joomla one are the same.

The next problem is to create a protocol by which authentication credentials may be passed from the remote system to the local Joomla one. To accomplish this, I chose to copy the existing mod_login form and make some minor adjustments so that it accepts HTTP GET parameters. These GET parameters are translated into values in a form that can be processed by the default user component. Since the user component calls out to the enabled authentication plugins, this kind of routing is desirable.

This form really needs three bits of information to authorize a user: the username, the session token and a checksum. The username is self-evident. The session token is provided to all authenticated requestors and is discussed in more detail later. The checksum is a hash of the username, token and a shared secret known to this system and the remote system passing users to it. More on this later too.

Using a bit of javascript magic, this hidden form is submitted automatically.

Of course, a custom authentication plugin is also required. The plugin needs to read a few of the custom form values that are not passed in through the normal onAuthenticate() call, so it is necessary for the plugin to directly read from the superglobal $_POST. The job of this plugin is very simple. If the token is valid (that is, it can be found in a DB table and is younger than 4 hours) and the hash value of the username, token and shared secret matches the given hash, then the user is authenticated. The user is found in the local system and the response object is populated accordingly.

The session token can be any string identifier. In this case, it is the MD5 hash of the value returned by the PHP built-in uniqid(). This value is generated by a script called 'session.php'. The script generates this value, stuffs it into a DB table and simply echoes the value to the caller.
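The token handling described above might be sketched as follows. This is only an illustration, not the code from the archive: the function names and the in-memory array standing in for the DB table are assumptions, and the four-hour limit is taken from the plugin description.

```php
<?php
// Hypothetical stand-in for the token table; the real script uses a DB table.
$tokenTable = array();

// Create a token the way the article describes: the MD5 hash of uniqid().
function sso_issue_token(array &$table, $now) {
    $token = md5(uniqid());
    $table[$token] = $now;   // remember when the token was issued
    return $token;
}

// A token is valid if it is known and younger than the maximum age
// (4 hours = 14400 seconds).
function sso_token_is_fresh(array $table, $token, $now, $maxAge = 14400) {
    return isset($table[$token]) && ($now - $table[$token]) < $maxAge;
}

$token = sso_issue_token($tokenTable, time());
echo $token, "\n";   // session.php simply echoes the value to the caller
```

Passing the current time in as a parameter keeps the age check easy to exercise without waiting four hours.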

The key to the security of this system comes from the secret string known only to the remote system that wishes to pass users to the local Joomla system and to the authentication plugin. This secret is used to generate a hash of the username and the session token. By using a hashing mechanism like MD5 or SHA1, this checksum value provides pretty good assurance that the values passed in were from a known and trusted source.

The way the remote system and the Joomla system interact to make this autologin happen is as follows:

  • The remote client calls the session.php script on the local Joomla system
  • The remote client hashes the session token, username and secret
  • The remote client generates a URL to the local Joomla system's homepage that passes in the following GET parameters: u, t, s (for username, token and checksum respectively)
  • The remote client redirects the user to this URL
  • If the token is authenticated, the user is logged into the local Joomla system as a local Joomla user.
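The steps above can be sketched in a few lines of PHP. Treat this as a rough illustration under stated assumptions: the function names, the example host, and the order in which username, token and secret are concatenated are all made up here (the two sides only need to agree on one order), and SHA1 is used simply because it is one of the hashes mentioned.

```php
<?php
// Assumed checksum layout: username, token and secret concatenated in that
// order. The article does not fix the order; both sides must simply agree.
function sso_checksum($username, $token, $secret) {
    return sha1($username . $token . $secret);
}

// Remote side: build the redirect URL carrying the u, t and s parameters.
function sso_build_url($base, $username, $token, $secret) {
    $query = http_build_query(array(
        'u' => $username,
        't' => $token,
        's' => sso_checksum($username, $token, $secret),
    ), '', '&');
    return $base . '?' . $query;
}

// Joomla side: recompute the checksum and compare it with the one supplied.
// hash_equals() (PHP 5.6+) compares in constant time.
function sso_checksum_matches($username, $token, $given, $secret) {
    return hash_equals(sso_checksum($username, $token, $secret), $given);
}

// Hypothetical values for illustration only.
$url = sso_build_url('http://joomla.example.com/index.php',
                     'nemo', 'abc123', 'shared-secret');
echo $url, "\n";
```

Because the secret never travels in the URL, someone who alters the username or token in transit cannot produce a matching checksum without knowing it.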

You'll also notice that you could easily map all remote users to one generic Joomla user if that is desirable.

I hope you find this useful in crafting your own Joomla solutions.

25th May 2010

I've been working with the PHP CMS WordPress a lot lately. It's a pretty simple system that doesn't make its internals hard to get to, which I appreciate.

One of the internal functions WP provides is wp_mail. This, you might have guessed, is used to send SMTP mail. The parameter list for this function is a bit long, and long parameter lists are hard to remember:

wp_mail( $to, $sub, $body, $hdrs, $attach );

These parameters are pretty self-evident: mail recipient, subject line, body of message, SMTP headers, attachments. The last two parameters are optional. This works great for sending plain, unformatted text messages. However, you may want to tweak this a bit.

The first thing you might want to do is change the default sender. This is done by adding a header:

$to = "nemo@uptopia.com";
$sub = "Your submarine parts";
$msg = "I have the new parts for your fabulous machine.";

$headers = array("From: Joe Johnston <jjohn@taskboy.com>");
$h = implode("\r\n",$headers) . "\r\n";

wp_mail($to, $sub, $msg, $h);

This makes the email look like it was sent by me even though the web server process running the PHP script isn't owned by my account.

Another common task is to send HTML-formatted email using this system. To do this, you must change the content type of the message to text/html. This is most easily done through the headers, even though you are supposed to be able to do this through the filter wp_mail_content_type. In my testing, this did not work, but the following code did:

$to = "nemo@uptopia.com";
$sub = "Your submarine parts";
$msg = "<html><body><h1>Awesome news</h1>
    <p>I have the new parts for your fabulous machine.</p>
    <address>--Joe</address></body></html>";

$headers = array("From: Joe Johnston <jjohn@taskboy.com>",
	         "Content-Type: text/html"
	         );
$h = implode("\r\n",$headers) . "\r\n";

wp_mail($to, $sub, $msg, $h);

By adding the content type to the headers, the recipient's email client should format the message accordingly.

Of course, sending HTML email has risks. It could be caught in spam filters. The client may not support HTML formatting (although that's rare). The client may disable email HTML from using javascript, CSS or grabbing remote assets like images.

Caveat Spammer.

23rd April 2010

I have been working through both the symfony JobBeet tutorial and cakePHP's blog tutorial. Both are basic CRUD apps. Here's my considered opinion of both.

Symfony is by far the more sophisticated of the two. The class system, ORM integration and YAML configuration really put this framework on par with any written in any other language. It's truly enterprise ready. However, it is simply a beast to learn. Installation virtually requires using an apache virtual host. Creating a skeleton app is easy, but non-intuitive. Just to create and process a simple HTML form, you end up touching about 4 or so files in as many directories. Also, there is a heavy reliance on the PHP CLI, which can be an issue if your installation has different versions of php for the shell and apache (which the Mac does). While I can get the JobBeet tutorial going, I cannot get much further -- and I have two of the three books from the developers.

CakePHP is a lot simpler, if less ambitious, than symfony. It appears to be pretty lightweight and has no CLI dependency (although there is a lot of automation that is offered by the cake client). The relationship between the model, controller and view is pretty easily grasped. To create a form for an existing model, you're looking at editing two files in two closely related directories.

If you're looking for a solid MVC framework in PHP, you could do a lot worse than cakePHP.

5th February 2010

I'm currently developing a social media application using PHP and sqlite. I don't know if I'll deploy with sqlite, but for development, it works well. I have two CVS sandboxes that I work in for this project. One of these is on my macbook (which comes with sqlite-enabled PHP) and the other is on an Ubuntu virtual machine. There are a few gotchas to be aware of when using sqlite in this kind of environment.

Write-protected directories

In other RDBMS systems like mysql or postgres, there is a server process that is responsible for reading and writing to the database disk files. With sqlite, this isn't the case. If you're using sqlite through PHP, then the process owner running the PHP must be able to read and write to the location of the sqlite database file. This requirement gets a little more complex when you are running PHP through apache, which has its own ideas about directory security.

In most apache setups, your PHP scripts will not be able to write in the web-accessible "document root", so you will not be able to keep your sqlite database file there. If you try, you will find errors on INSERTs and UPDATEs about "database file is locked." However, it is foolish from a security point of view to keep your database file in a web-accessible directory anyway.

This particular problem hit me hard on Mac OS X. Once user directories are enabled in the apache configuration, your Sites directory becomes a document root. You won't be able to keep your sqlite files in that directory or any subdirectory under it. Instead, create a ~/tmp directory and keep the sqlite database file there.
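A quick way to catch this class of problem early is to check permissions before opening the database. The following is a minimal sketch; the function name and the file location are hypothetical, and it only tests what the current process (for example, the apache user) is allowed to do.

```php
<?php
// Return true if the current process could create or update a SQLite file at
// $dbFile. SQLite also writes journal files next to the database, so the
// containing directory must be writable too.
function sqlite_location_is_usable($dbFile) {
    $dir = dirname($dbFile);
    if (!is_dir($dir) || !is_writable($dir)) {
        return false;
    }
    // A missing file is fine (it will be created); an existing one must be writable.
    return !file_exists($dbFile) || is_writable($dbFile);
}

$dbFile = sys_get_temp_dir() . '/myapp.sqlite';   // hypothetical location
var_dump(sqlite_location_is_usable($dbFile));
```

Run through apache rather than the shell, this reports what the web server process can actually write to, which is the case that matters here.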

SQLite version skew: apache/shell

Because sqlite is an embedded system, it is compiled into the program you are using. If you are using PHP, you can run into the following issue. I use PHP from the command line all the time. When I create the database, I run a PHP script from the shell to do this. Unfortunately, the command line PHP and the version of PHP compiled into apache may not be the same. Further, the apache PHP may not be compiled with the same version of sqlite. This is the case on Mac OS X. What a mess!

To get around this, always create your sqlite databases through apache/PHP. You will run into far fewer issues this way.

Changing the schema requires an apache restart

Recall that apache is a pre-forking server. If you change the schema of your sqlite database while apache's running, you could get an error in PHP that "schema has changed." Whatever SQL statement you were attempting to run will fail.

From the "don't do that" school of medicine comes this technical advice. If you need to change the schema of an sqlite database, shut down apache first, update the database and restart apache.

I hope this post helps others avoid the mistakes I made.

4th February 2010

Facebook will shortly release a tool called HipHop for enhancing the performance of PHP. My understanding of the tool is that it compiles PHP code into C++ which is then compiled into a system native executable. While I have no doubt that this tool does produce significant speed gains over apache/PHP, I do think one needs to be aware of the trade-offs of this kind of system. After all, this isn't the first time a trick like this has been used for a dynamic language.

C++ and PHP are very different languages. I'm not talking about syntax, but how source code is handled. In PHP, the source code is turned into op codes that the PHP interpreter understands. The interpreter knows how to make the operating system perform these op codes. In C++, source code is compiled into assembler which is then linked into a system executable which can be run from the shell. Compiled code runs faster than interpreted code for a number of reasons, but the most important is that compiled code is closest to native assembler, which essentially is the op code system that the host CPU uses to make stuff happen.

The problem with compiling PHP into C++ is that you lose all the wonderful dynamic features of PHP since these cannot be easily or efficiently translated automatically into C++ source code. The very dynamic nature of PHP (or Perl or Ruby or Python, etc) is what makes these languages accelerate programmer productivity. I think facebook will see this performance hit later.

Let's not forget that Moore's law of CPU power often solves a great deal of performance issues. Hardware is always cheaper than developer time and less prone to bugs.

I favor architectures that take advantage of Moore's law and use horizontal scaling and commodity solutions over fancier tricks that require specialized talent (like erlang). I might suggest caching the opcodes that the PHP interpreter generates and simply running those. This is the essence of the Zend server and how apache/mod_perl/Apache::Registry work. Sure, you don't get quite the performance of compiled code but you'll still see a noticeable boost. I believe PHP does some level of this kind of caching right now.

It's true that one can do amazing feats by being clever, but clever doesn't scale (unless you're Google).

3rd February 2010

An interesting chart comparing various PHP frameworks. I'm not sure that I can read it correctly. It seems to imply that Zend and CakePHP are the most popular frameworks.

Both frameworks are free, but Zend is clearly optimized for the Zend server platform, which isn't free. Also, I can't help thinking that the audience is somewhat different for these two. CakePHP seems aimed at the more opensource, DIY crowd while Zend is clearly pointed to the enterprise IT crowd. While there is overlap, you can see that Zend is a commercial venture.

I have very mixed emotions about using frameworks. On the one hand, frameworks deliver huge dollops of functionality right out of the box. This accelerates the completion of many IT projects. On the other, you get locked into another group's development schedule and, to some extent, the architectural choices they make. Projects built with these tools also expose themselves to bugs and security holes originating in the frameworks. Finally, you end up having to trust or vet the code in the framework.

For an inward-facing intranet product, I think frameworks are great. I'm not sure I'd want to launch something like twitter or facebook with one.

2nd February 2010

There is a well-known design pattern called publish-subscribe or the Observer model. The problem this model attempts to solve is one in which an object requires one or more parties to act on it when it changes to a particular state. A concrete example of this is event handlers in GUIs, including the DOM. Actions may be associated with button presses.

The Observer model separates the subject from the parties (observers) that act on it. Observers register their interest with the subject. When the subject changes to the desired state, it notifies each of the registered observers.

In PHP, a subject class might be modeled like this:

class Subject {
  private $Q = array();
  function Attach($O) {
     array_push($this->Q, $O);
  }

  function Detach($O) {
    // Search the queue on this object ($this->Q), not an undefined local $Q
    for($i=0; $i < count($this->Q); $i++) {
      if ($this->Q[$i] === $O) {
        array_splice($this->Q, $i, 1);
        break;
      }
    }
  }
  
  function Notify() {
    foreach($this->Q as $O) {
      $O->Update($this);
    }
  }  
}

The Attach() and Detach() methods are the API by which Observers register or unregister their interest in a Subject object. When the subject changes into an interesting state, its Notify() method is called. This method in turn calls the Observer Update() method with a reference to the current Subject object. An Observer class might look like this:

class Observer {
  function Update($S) {
     // Do something interesting
  }	   
}

As you can see, an Observer need only implement one well-known method, Update(). This arrangement nicely decouples the Subject from the Observers.
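To see the pieces working together, here is a small self-contained sketch. The Button and PressLog classes are invented for this illustration, and the Subject class is a condensed restatement of the one above so that the example can run on its own.

```php
<?php
// Condensed version of the Subject class so this sketch stands alone.
class Subject {
    private $Q = array();
    function Attach($O) { $this->Q[] = $O; }
    function Detach($O) {
        foreach ($this->Q as $i => $candidate) {
            if ($candidate === $O) {
                array_splice($this->Q, $i, 1);
                break;
            }
        }
    }
    function Notify() {
        foreach ($this->Q as $O) { $O->Update($this); }
    }
}

// Hypothetical subject: a button that notifies its observers when pressed.
class Button extends Subject {
    public $label;
    function __construct($label) { $this->label = $label; }
    function Press() { $this->Notify(); }
}

// Hypothetical observer: records the label of every button it saw pressed.
class PressLog {
    public $seen = array();
    function Update($S) { $this->seen[] = $S->label; }
}

$button = new Button("OK");
$log = new PressLog();
$button->Attach($log);
$button->Press();          // $log is notified
$button->Detach($log);
$button->Press();          // nobody is listening any more
print_r($log->seen);
```

Note that Button knows nothing about PressLog beyond the fact that it has an Update() method, which is the decoupling the pattern is after.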

There may be times when a less formal, more functional mechanism is desirable. What if you just want certain actions to happen on an object when an interesting state obtains? You might use what I call an Action queue to do this. An Action queue is simply an array of function references that are called by an object at an interesting time. Here's what an Action queue might look like:

class MyClass {
   private $Q = array();

   function Attach($name, $func) {
      $this->Q[$name] = $func; 
   }

   function Detach($name) {
      unset($this->Q[$name]);
   }
  
   function Notify() { 
     foreach ($this->Q as $n=>$func) {
        $func($this);
     }
   }
} 

As you can see, there is no need for an Observer class. Bits of functionality created with create_function() can be attached ad hoc to this class, as the following snippet shows:

  // No backslash before $obj inside single quotes; the body needs its own semicolon
  $appender = create_function('$obj', 'return "Got => " . $obj;');
  $MyClassObj->Attach("append", $appender);

Because these code bits are anonymous, an arbitrary name is required during the Attach phase in the event that you might want to remove the behavior later.
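On newer PHP versions the same idea can be written with closures, which arrived in PHP 5.3; create_function() itself was deprecated in PHP 7.2 and removed in PHP 8.0. The sketch below is a variation rather than the class above: the ActionQueue name is made up, and Notify() is changed to return the results of the actions so that the effect is visible.

```php
<?php
class ActionQueue {
    private $Q = array();

    function Attach($name, $func) { $this->Q[$name] = $func; }

    function Detach($name) { unset($this->Q[$name]); }

    // Call every queued action with this object and collect the results by name.
    function Notify() {
        $results = array();
        foreach ($this->Q as $name => $func) {
            $results[$name] = $func($this);
        }
        return $results;
    }
}

$queue = new ActionQueue();
$queue->Attach("append", function ($obj) { return "Got => " . get_class($obj); });
$queue->Attach("count", function ($obj) { return 1; });
$queue->Detach("count");   // removed by name before anything fires
$results = $queue->Notify();
print_r($results);
```

The names passed to Attach() play the same role as before: they are the only handle available for detaching an anonymous function later.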

28th January 2010

This article explores the mechanisms by which PHP session handling can be customized to work with non-standard datastores like SQL databases using the PHP Data Object (PDO) interface.

To frame the context of this discussion, let's briefly look at how the HTTP protocol is designed. HTTP is a stateless client/server protocol. That is, an HTTP client, like a web browser, makes a request for a resource on a server, and the server responds with it. From the server's point of view, each request has no relationship with previous requests. For web application developers, this is a challenge. Often applications need to keep state information associated with a particular user. An application state that's associated with a user is called a session.

What is the default PHP way of addressing this?

Being that PHP was designed for web applications, it's not surprising to learn that it offers built-in support for sessions.

A typical "vanilla" use of PHP sessions looks like this:

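<?php
session_start();
// Initialize the counter on the first request to avoid an undefined-index notice
if (!isset($_SESSION["count"])) {
    $_SESSION["count"] = 0;
}
$_SESSION["count"]++;
session_write_close();
?>
<p>Count: <?=$_SESSION["count"]?></p>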

A quick note on session.auto_start and session_register(). You can make PHP start a session whenever any PHP page is loaded by setting the "session.auto_start" attribute to 1 in php.ini (or in a per-directory configuration such as .htaccess; it cannot be changed at runtime with ini_set()). However, I do not recommend this for applications that have named users. Sessions should begin only after a user has authenticated. PHP can also make session variables global scalars. This too is a bad idea. Global variables in general are a bad engineering practice. PHP's ability to make GET/POST parameters global variables too creates an opportunity for malicious users to pass in arbitrary session values. Instead, use the superglobal $_SESSION array to store session variables.

By default, PHP can pass a session ID in URLs or through HTTP cookies. Cookies offer the most transparent and arguably safer mechanism. Also by default, PHP stores session information in flat files on the web server. On a shared web host, this backing store presents a security problem as the files are often stored in the communal /tmp directory. Sessions in flat files also present a challenge to scaling a web application beyond the single host. By storing session information in an SQL system, a user can be shunted to any number of web servers that have access to the session database.

The authors of PHP have defined a simple procedural callback mechanism that allows developers to use any desirable data storage mechanism. Up to six session callback functions can be overridden to customize the way session data is stored using session_set_save_handler(). These six callbacks are: open, close, read, write, destroy and garbage collection. Let's look at what these callbacks are responsible for.

When the session begins, the open callback is invoked to initialize the datastore. This can mean that a file is opened and its handle stored in a global variable. The open callback is passed two strings: a path and the session name. Both of these parameters can be controlled by the developer. However for our database backing store, these values will be unused.

The close callback is called when the session has ended. All global resources, like filehandles, should be freed at this point. The close callback receives no additional parameters.

The read callback is invoked when the opened session attempts to retrieve previously stored session data. It is passed the current session ID. It should return the serialized data string that was passed to the write callback when it was last called for this session. You are not responsible for serializing or deserializing session data, which is a pity. There are some issues with the serialization method used by PHP sessions.

The write callback is invoked when the session is ready to be closed. It is passed a session ID and a serialized data string. The callback must store this information in a place where it can be retrieved later.

The destroy callback is invoked when the session is to be explicitly destroyed. It is passed a session ID. Session destruction often happens when a user explicitly signs out of an application.

Finally, the garbage collection callback is invoked automatically by the PHP session mechanism to clean-up old sessions. It is called with a maximum life value expressed in seconds. All sessions older than that value should be destroyed.
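Before bringing a database into it, the contract of the six callbacks can be seen in a toy store that keeps everything in a PHP array. This is only a sketch for illustration: the class name and the layout of the array are assumptions, and because the data lives in memory it would not survive between requests.

```php
<?php
// Hypothetical in-memory store: $rows maps session ID => array(data, updated).
class ArraySessionStore {
    public $rows = array();

    function open($path, $name) { return true; }   // nothing to initialize

    function close() { return true; }              // nothing to free

    function read($sid) {
        // Must return a string; an empty string means "no data yet".
        return isset($this->rows[$sid]) ? $this->rows[$sid]['d'] : '';
    }

    function write($sid, $data) {
        $this->rows[$sid] = array('d' => $data, 'updated' => time());
        return true;
    }

    function destroy($sid) {
        unset($this->rows[$sid]);
        return true;
    }

    function gc($maxLife) {
        // Drop every session last written more than $maxLife seconds ago.
        $threshold = time() - $maxLife;
        foreach ($this->rows as $sid => $row) {
            if ($row['updated'] < $threshold) {
                unset($this->rows[$sid]);
            }
        }
        return true;
    }
}

$store = new ArraySessionStore();
// Registration is left commented out so the sketch runs outside a web request:
// session_set_save_handler(array($store, 'open'), array($store, 'close'),
//     array($store, 'read'), array($store, 'write'),
//     array($store, 'destroy'), array($store, 'gc'));
$store->write('abc', 'count|i:1;');
echo $store->read('abc'), "\n";
```

The string handed to write() is opaque to the handler; it only has to hand the same string back from read().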

Even though this session customization mechanism is expressed as function callbacks, I find that it is architecturally cleaner to wrap up session handling into a static class. It puts all the callbacks and static members into one discrete namespace that can be more easily consumed by other web applications. Please note that much of the error checking has been left out.

First, you will need to create the following table in a mysql database (which is called 'myapp' in this code):

CREATE TABLE sessions (
  -- PHP hands the callbacks a string session ID, so the key is a string column
  id varchar(128) not null primary key,
  d text,
  updated int
);

<?php
class Sessions {
  public static $Db = null;
  public static $SessionName = "MySession";

  public static function Open($path, $session_name) {
    if (Sessions::$Db == null) {
      $dsn = 'mysql:host=localhost;dbname=myapp';
      Sessions::$Db = new PDO($dsn, 'db_user', 'db_pass');
    }
    return True;
  }

  public static function Close() {
    // Dropping the last reference closes the PDO connection
    Sessions::$Db = null;
    return True;
  }

  public static function Read($sid) {
    $sql = "SELECT d FROM sessions WHERE id=?";
    $sth = Sessions::$Db->prepare($sql);
    // execute() takes its placeholder values as an array
    $sth->execute(array($sid));
    $tbl = $sth->fetchAll();
    // The read callback must return a string, even for a brand new session
    return count($tbl) > 0 ? (string) $tbl[0]["d"] : "";
  }

  public static function Write($sid, $data) {
    $vals = array("id" => $sid,
                  "d" => $data,
                  "updated" => time());

    $sql = "SELECT COUNT(id) FROM sessions WHERE id=?";
    $sth = Sessions::$Db->prepare($sql);
    $sth->execute(array($sid));
    $tbl = $sth->fetchAll();

    if ($tbl[0][0] < 1) {
      // New session
      $sql = "INSERT INTO sessions (id, d, updated) VALUES (?,?,?)";
      $sth = Sessions::$Db->prepare($sql);
      $sth->execute(array($vals['id'], $vals['d'], $vals['updated']));
    } else {
      // Existing session
      $sql = "UPDATE sessions SET d=?,updated=? WHERE id=?";
      $sth = Sessions::$Db->prepare($sql);
      $sth->execute(array($vals['d'], $vals['updated'], $vals['id']));
    }
    return True;
  }

  public static function Destroy($sid) {
    if (strlen($sid) > 0) {
      $sql = "DELETE FROM sessions WHERE id=?";
      $sth = Sessions::$Db->prepare($sql);
      $sth->execute(array($sid));
    }
    return True;
  }

  public static function garbageCollect($max_life) {
    // Remove every session not written to in the last $max_life seconds
    $threshold = time() - $max_life;
    $sql = "DELETE FROM sessions WHERE updated < ?";
    $sth = Sessions::$Db->prepare($sql);
    $sth->execute(array($threshold));
    return True;
  }

  public static function initCallbacks() {
    $rc = session_set_save_handler(
                                   array("Sessions", "Open"),
                                   array("Sessions", "Close"),
                                   array("Sessions", "Read"),
                                   array("Sessions", "Write"),
                                   array("Sessions", "Destroy"),
                                   array("Sessions", "garbageCollect")
                                   );

    if ($rc === False) {
      error_log("Could not set session handlers.");
    }

    // The cookie parameters are passed positionally, not as a single array
    session_set_cookie_params(60*60*24*14,    // Lifetime in seconds
                              "/",            // Path where valid
                              ".example.com", // Domain where valid
                              False,          // secure: False allows plain HTTP
                              False           // httponly
                              );
    return True;
  }
}

Sessions::initCallbacks();
?>

This static class defines the session save handlers as static methods. When the file is included, it calls another static method to install the session handlers. Most of this code is pretty straightforward, thanks to the placeholder facility of PDO. I would note that the session table's updated column is a unix timestamp rather than a mysql datetime type. This makes the GC reaping code much easier to write.

A note about using auto-incrementing IDs for session identifiers. It is considered poor security to use an easily predictable sequence for session IDs, since an attacker can attempt to hijack legitimate sessions by guessing this number. PHP provides a uniqid() function, which is harder to guess than a simple counter, although its value is derived from the current time and so should not be treated as truly random.

18th January 2010

For those looking for a reasonable emacs mode for PHP work, consider this project. Simply find the php-mode.el file in the download and add the following to your .emacs file:

(load "/path/to/php-mode.el")

Replace "/path/to" with something appropriate for your system.

Now when you edit .php files, you'll get proper indenting and syntax coloring.

26th September 2009
"Do not meddle in the affairs of Wizards, for they are subtle and quick to anger."
--J. R. R. Tolkien

Dynamic, weakly typed languages like Perl and PHP are wonderful productivity engines. It's amazing how much work one can accomplish with so few lines of code. Both languages allow the programmer to treat simple scalar variables as numbers or strings without a lot of casting or explicit conversions. However, there is a price for this magic.

Consider the following PHP code:

  array_push($array, '"' + $file + '"');

This looks harmless enough. It looks like the contents of the $file string are being enclosed in double quotes and that new string is being pushed on the end of $array.

Not so fast! In PHP, the + operator is always arithmetic; string concatenation is spelled with a dot (.). Wait, didn't I just say that values are weakly typed in PHP? How does the interpreter turn a string into a number for +?

The answer for both Perl and PHP is that strings that start with integers are considered to be integers for the purposes of magic.

In the code sample above, the filename in question indeed started with "2009-09". PHP took the integer part of the string, 2009, because of the + operator. The quote characters on either side have no numeric value and count as zero, so the whole expression evaluated to 2009 and the rest of the filename was thrown away.

And that's how I lost my filename, which caused me to spend the next 30 minutes debugging the problem.
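The conversion at work here can be seen in isolation with a short sketch. The filename is made up for illustration, and an explicit (int) cast is used to show the numeric conversion so that the example behaves the same across PHP versions.

```php
<?php
$file = "2009-09-26-notes.txt";   // hypothetical filename

// In a numeric context only the leading digits of the string survive.
$asNumber = (int) $file;

// The dot operator is what actually joins strings in PHP.
$quoted = '"' . $file . '"';

var_dump($asNumber);
echo $quoted, "\n";
```

Swapping + for . in the original line is the whole fix.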

30th April 2007

I know that you've all been looking for an updated version of the BASIC program that appeared in (1982) Dragon #74 to generate D&D characters, so I pushed aside my important work to complete this awesome, gold-plated, first edition rules D&D character generator!

It's multi-user, for the web! It even allows you to store and edit an entire Rogues Gallery of NPCs.

All the values are accurate according to the 1981 rulebooks. The real kicker is that character homelands are assigned from pools of locations keyed on class and alignment. Now who's laughing at all my Gazetteers?

Don't all rush to thank me...

UPDATE: In case there wasn't enough retro-fun here, I've added a feature to this generator so that your character stats get written to a scanned character sheet. Now you can face your DM with pride!

10th December 2005

(This article continues my thoughts on the taskboy CMS.)

How the taskboy CMS works

Once I decided that content would be managed through emacs (to as large a degree as possible), the rest fell mostly into place. The blog, the music section, the polls and the ratings would all be stored in mysql and accessed through an XML-RPC API. I would use PHP to define the layout, pulling the content from the database where needed. Templates as such are not used. To my thinking, a PHP page is the template. I also decided against database abstraction classes, since I'm unlikely to move from mysql any time soon. I do have a collection of PHP utility functions (like sql_insert, sql_delete, sql_select) to make database access less painful. Each PHP page calls the same header and footer pages. Much of this code was developed alongside State Secrets. Together, this makes the PHP stuff pretty easy to modify.

Getting from emacs to PHP is a little circuitous, so please bear with me. It is straightforward to write a perl script that's an XML-RPC client using the Frontier::RPC2 library. So that's what I did first. I verified that I could talk to the PHP page that processes XML-RPC requests. Emacs is an extensible editor using the macro language lisp. The creator of Perl, Larry Wall, said of lisp that it had all the visual appeal of "porridge with toenail clippings" and I agree. However, I did learn just enough lisp to write the current emacs buffer to standard out to be read by a perl script which could then make the appropriate XML-RPC request and make some snappy response that emacs could deal with. This solution is what I wrote about on use.perl.org.

Gnu Privacy Guard

The new wrinkle for taskboy is security. The XML-RPC messages go across the network in clear text. The primary risk I wanted to address is not that someone will see my blog before it's posted, but that an unauthorized fool would mess with my XML-RPC service. Whatever authorization mechanism I chose would have to work over clear text. It's true I could have used SSL with HTTP Authentication for the web services PHP page, but I didn't want to. Fortunately there is already a solution for this kind of problem, but for a different form of internet messaging.

Back in the early '90s, Phil Zimmermann had a problem: he couldn't prove he was him. That is, email that claimed to be from him could have been forged by some joker only claiming to be him. How could those receiving email from him be assured that the sender Phil Zimmermann was the Phil Zimmermann? The answer became known as Pretty Good Privacy and it involved some very scary math. But you can think of it as something like a lock and key mechanism. When an email is sent out, a Very Big Number is computed with the content of the message and your private key. Your private key has a sibling called a public key that the recipient of mail will already have (and verified). When the recipient gets this message, pgp uses the public key on file to decode the message (or signature). If nothing has been changed in the message, the math will work out (via magic) and you can be pretty sure that Phil indeed has told you to "go pound sand."

The important concept here is that PGP was meant to guarantee the identity of a sender using a message that anyone could read, but not change. Now in web services, I also have messages that anyone could read, but I want the server to accept only requests from me. Although it's not a seamless fit, PGP turns out to be a good authentication method for private web services. Here's how I modified Edd Dumbill's XML-RPC PHP library and Ken MacLeod's Frontier::RPC to use GNU Privacy Guard (an open source version of PGP) to lock down my web service. The strategy in both cases is that requests should be signed, not responses. It would be straightforward to implement response signing too, but I don't deem it necessary for my application.

Tweaking the PHP server

This class merely extends the xmlrpc_server class found in xmlrpcs.inc. I need to intercept the content, verify the signature, remove it if the message checks out and pass the rest of the XML doc to the parent class for handling. Hats off to Edd and the boys for getting the class partitioned so that I needed to override only one method.

One PHP tip: name your class files with .php. That way, you can point a browser to them and check the syntax. After all the syntax typos are gone, the page will appear blank. The contents of files with .inc extensions are typically just displayed by the web server without parsing.

<?php
class ... extends xmlrpc_server {
    function parseRequest ($data="") {
        if (...) {
            if (!$this->VerifyRequest($data)) {
               return $this->RPCError("Couldn't verify request");
            }
            $data = $this->RemoveSignature($data);
        }
        # pass off to parent
        return parent::parseRequest($data);
    }
    #-----------------------------------------
    # Look at the body of the request.  Does it have
    # a signature to verify?
    function VerifyRequest ($data="") {
       # BTW: I hate this solution
       # write out to a tmpfile
       $infile = "/tmp/" . posix_getpid() . ".vrf";
       if ($fh = fopen($infile, "w")) {
           fwrite($fh, $data);
           fclose($fh);
       } else {
           return 0;
       }
       # is this signed by someone I trust?
       $cmd = "/usr/bin/gpg --homedir=/path/to/gpg "
              . "--verify <$infile 2> /dev/null";
       $retval = 1; # default to failure
       if (file_exists($infile)) {
          system($cmd, $retval);
       } else {
          return 0;
       }
       unlink($infile);
       return $retval ? 0 : 1;
    }
    #-------------------------------------------
    # remove signature header/footer
    function RemoveSignature ($data="") {
        # for GPG
        # strip off the GPG stuff to get the basic XML back
        $preamble = "/-----BEGIN PGP SIGNED MESSAGE-----\r?\n"
                    . "Hash: SHA1\r?\n\r?\n/";
        $footer = "/-----BEGIN PGP SIGNATURE-----\r?\n"
                  . "Version: .+\r?\n\r?\n(\S+\r?\n)+"
                  . "-----END PGP SIGNATURE-----/";
        $data = preg_replace($preamble,"", $data);
        $data = preg_replace($footer,"",$data);
        return $data;
    }
    #--------------------------------------------
    # wrapper for easier (and non-granular) error reporting
    function RPCError ($msg=0) {
        return new xmlrpcresp(0,500,"Bad request: $msg");
    }
}
?>

A few notes on this amateurish PHP code. First, any security wonk will tell you not to create temp files with PID names. In my case, I trust the other users on my server and don't feel compelled to improve the security here. You may want to. Second, I'm relying on the fact that the gpg process has an exit value of 0 if the verify succeeds. The only way I saw of getting the exit value of a process in PHP was by using system(). There are a couple of other process handling functions, but those didn't seem to give me this simple result to check (I could have used popen() and grepped through the output, but that seemed painful [although I might have done that if this were a perl module]).
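
For comparison, here is a minimal perl sketch of the two ideas in those notes: a temp file whose name can't be guessed from the PID, and a pass/fail decision taken from the child's exit status. The helper name verify_with_command and the choice to hand the file name to the command as its last argument are my assumptions for illustration, not part of the PHP class above. File::Temp and system() are standard perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data to a temp file with an unpredictable
# name, run @cmd with that file name appended, and return 1 if the
# command exits with status 0, otherwise 0.
sub verify_with_command {
    my ($data, @cmd) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);   # file is removed at program exit
    print {$fh} $data;
    close $fh or return 0;                     # treat a failed write as "not verified"
    system(@cmd, $path);                       # list form, so no shell is involved
    return $? == 0 ? 1 : 0;                    # $? is 0 only after a clean exit
}

# Assumed usage, reusing the gpg path and homedir from the PHP class:
# my $ok = verify_with_command($signed_xml,
#     '/usr/bin/gpg', '--homedir=/path/to/gpg', '--verify');
```

Anything other than a clean exit (a non-zero status, a signal, or a command that can't be started) counts as a failed verification, which matches the default-to-failure approach of the PHP version.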

parseRequest() is called by the parent class to unpack the XML request. Here, I look for the GPG signature and if all goes well, I pass just the XML string to the parent parseRequest() for processing.

Keep in mind that PHP runs as whichever user Apache runs as. This affects GPG. You have to set up the file ownership for the keys so that Apache can read and write to a directory. You should create keys specifically for this web service and not reuse your own GPG stuff. You were warned.

This class is used identically to the xmlrpc_server class defined in xmlrpcs.inc. No, I don't know what the "da_" stands for in the class name. I thought I wrote "ds_", which would have stood for "digital signature."

Expanding the Frontier

For the perl client, I simply defined two classes at the start of the program. Keep in mind, this is a win32 perl program.

package RPCEncoder;
use Frontier::RPC2;
@RPCEncoder::ISA = qw[Frontier::RPC2];
sub encode_call {
    my ($self) = shift;
    my $request = $self->SUPER::encode_call(@_);

    # sign it.  2-way opens hurt my brain
    my $outfile = "C:/blog/tmp.txt";
    unlink $outfile;

    my $cmd = qq[|C:/blog/gnupg/gpg.exe --homedir=/blog/gnupg ]
              . qq[--clearsign  > $outfile];
    open GPG, $cmd or die "Can't proc open: $!";
    print GPG $request;
    close GPG;
    
    open IN, $outfile or die "Can't open signed $outfile: $!";
    undef($request);
    while (<IN>) {
        $request .= $_;
    }
    close IN;
    unlink($outfile);
    return $request;
}

sub decode {
    my ($self) = shift;
    my ($string) = shift;
    my %args = ('Style' => 'Frontier::RPC2',
                'use_objects' => $self->{'use_objects'},
               );                          
    $self->{'parser'} = XML::Parser->new(%args);
    return $self->{'parser'}->parsestring($string);
}
#-----------------------------------------------------
package RPCClient;
use Frontier::Client;
@RPCClient::ISA = qw[Frontier::Client];
sub new {
    my ($self) = shift->SUPER::new(@_);
    my %args = ('encoding'    => $self->{'encoding'},
                'use_objects' => $self->{'use_objects'}
               );
    $self->{'enc'} = RPCEncoder->new(%args);
    return $self;
}

The perl is a little weirder because of the way the Frontier Client works with XML::Parser, itself a horrible creation of Cthulhu. The Frontier::Client constructor needs to be overridden so that I can insert my custom RPCEncoder class, which is a thin coating over Frontier::RPC2. All the XML encoding and decoding happens in Frontier::RPC2 and that's what I need to intercept.

When making a request, I need to sign the XML string before it goes on the wire. All things being equal, I'd like to open the gpg process for writing (to feed it the string I've got in memory), but also read from it to get the output. This is a kind of double pipe, which is easy to do in shell, but weird to do with perl and especially so on Windows. Once again, I write a temp file and I don't even pretend to pay security any mind. Windows boxes are typically single user machines and mine doubly so. Also note that I don't need to worry about running as a different user when I make the XML-RPC request. I'm in emacs (which runs as the current user); it spawns a shell to run perl; perl spawns a shell to run gpg.exe. All these processes will run as me.
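
For what it's worth, the double pipe can be sketched with the core IPC::Open2 module, which skips the temp file entirely. This is not what my client does, and I haven't tried it against gpg.exe on Windows, so treat it as an assumption-laden sketch. The function name filter_through is made up for illustration. It is only reasonable for small payloads like an XML-RPC request, because writing a large input before reading any output can deadlock.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: feed $input to an external command's stdin and
# return whatever the command prints on stdout. $cmd is an array ref
# holding the program and its arguments.
sub filter_through {
    my ($cmd, $input) = @_;
    my $pid = open2(my $from_child, my $to_child, @$cmd);
    print {$to_child} $input;
    close $to_child;                  # send EOF so the child can finish
    my $output = do { local $/; <$from_child> };   # slurp all output
    close $from_child;
    waitpid $pid, 0;                  # reap the child, which sets $?
    die "command failed with exit status " . ($? >> 8) . "\n" if $?;
    return $output;
}

# Assumed usage with the paths from the client above:
# my $signed = filter_through(
#     ['C:/blog/gnupg/gpg.exe', '--homedir=/blog/gnupg', '--clearsign'],
#     $request);
```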

I had to also override decode(), because the parent uses ref($self) to determine the class name of the XML callbacks (n.b. BAD MONKEY!). This really should have been hard coded to 'Frontier::RPC2' since the callbacks all have hardcoded class names (see the code for the real scoop). I think this was an attempt to make child classes easier to write, but this trick backfired.
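
To see why ref($self) causes trouble in a subclass, here's a tiny standalone sketch. The package names Base::Codec and My::Codec are made up for illustration and stand in for Frontier::RPC2 and RPCEncoder. The point is only that ref() reports the class the object was blessed into, not the class where the method was written.

```perl
use strict;
use warnings;

package Base::Codec;                  # hypothetical stand-in for the parent class
sub new { my $class = shift; return bless {}, $class; }
# Derives a callback package name from the object's class
sub callback_package { my $self = shift; return ref($self); }

package My::Codec;                    # hypothetical stand-in for RPCEncoder
our @ISA = ('Base::Codec');

package main;
my $parent_name = Base::Codec->new->callback_package;
my $child_name  = My::Codec->new->callback_package;
print "$parent_name\n";               # prints "Base::Codec"
print "$child_name\n";                # prints "My::Codec"
```

So when the callbacks live only in the parent's package, a name derived from ref($self) points at the child package, where nothing is defined. Passing the parent's name explicitly, as my decode() does, sidesteps that.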

A Quick Note on GPG setup

Getting up to speed on how GPG works took longer than integrating it into the taskboy web service. I cannot go into all the setup details here, but if you are familiar with ssh key management, you will be well ahead of the game in GPG. If ssh keys make your brain hurt, GPG is a veritable migraine. But it boils down to this: you must make a GPG key pair on the source machine with the perl/emacs setup. You must copy the public key to the server. You must import that key into GPG on the server and verify it (with gpg --edit-key). If you don't do all of these steps, this digital signature for XML-RPC hack won't work and you'll be mystified at what went wrong. Verify your GPG setup at every stage using test files, so that you can see the GPG errors.

Note to jjohn: Move the *gpg files to wherever gpg wants to find them. It will make things go easier on you.

Next: Some dirty thoughts on SOAP

About this blog

The taskboy blog is an exploration of computer technology by Joe Johnston. Topics of posts include practical examples in Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.
