Latest News

NOTE: The full code archive of the mechanism described below can be found here.
Joomla is a PHP-based CMS that enjoys widespread popularity. It has many built-in features that make it great for blogs and news-oriented sites right out of the gate. Additionally, it supports three kinds of extension mechanism: components, plugins and modules. Components are low-level facilities that generally support the other two. Modules are often user-visible blocks of HTML that can be selectively added to the page users see. Plugins respond to various events (page rendering, authentication requests, etc.) generated by the Joomla application.
Joomla comes with a variety of login plugins that all use the login module. These plugins allow users to be validated against an external authentication mechanism like LDAP or GMail.
Sometimes it is desirable to log users into the Joomla system who have already been authenticated by a different system without asking for their credentials again. This is called single sign-on (SSO). SSO is a very important usability and security feature of many Service-Oriented Architectures (SOA). In this article, I will present a token-based mechanism for creating SSO to Joomla using the standard extension methods.
To understand this problem a bit better, it is critical to realize that there are two separate notions of identity in an SSO scheme. There is the previously authorized identity (that is, the identity that the user supplied to the non-Joomla system that originally authenticated them) and the user account on the Joomla system that is stored in the local users table. One of the challenges of SSO is to map the remote identity to the local one. For the sake of this exercise, let's assume that the usernames in both the remote authentication system and the local Joomla one are the same.
The next problem is to create a protocol by which authentication credentials may be passed from the remote system to the local Joomla one. To accomplish this, I chose to copy the existing mod_login form and make some minor adjustments to accept HTTP GET parameters. These GET parameters are translated into values in a form that can be processed by the default user component. Since the user component calls out to the enabled authentication plugins, this kind of routing is desirable.
This form really needs three bits of information to authorize a user: the username, the session token and a checksum. The username is self-evident. The session token is provided to all authenticated requestors and is discussed more later. The checksum is a hash of the username, token and a shared secret known to this system and the remote system passing users to it. More on this later too.
Using a bit of javascript magic, this hidden form is submitted automatically.
Of course, a custom authentication plugin is also required. The plugin needs to read a few of the custom form values that are not passed in through the normal onAuthenticate() call, so it is necessary for the plugin to directly read from the superglobal $_POST. The job of this plugin is very simple. If the token is valid (that is, it can be found in a DB table and is younger than 4 hours) and the hash value of the username, token and shared secret matches the given hash, then the user is authenticated. The user is found in the local system and the response object is populated accordingly.
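The check the plugin performs can be sketched outside of Joomla. The following Python is only an illustration of the logic described above, not the plugin itself: the names make_checksum and is_login_valid, the concatenation order of the hashed fields, the choice of SHA1 and the dictionary standing in for the token table are all assumptions.

```python
import hashlib
import hmac
import time

TOKEN_MAX_AGE = 4 * 60 * 60  # tokens older than four hours are rejected


def make_checksum(username, token, secret):
    """Hash username + token + shared secret (the field order is an assumption)."""
    payload = (username + token + secret).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def is_login_valid(username, token, checksum, secret, issued_tokens, now=None):
    """Return True only if the token is known, fresh, and the checksum matches.

    issued_tokens maps token -> creation time in epoch seconds; it is a
    hypothetical stand-in for the DB table that session.php writes to.
    """
    now = time.time() if now is None else now
    created = issued_tokens.get(token)
    if created is None or now - created > TOKEN_MAX_AGE:
        return False
    expected = make_checksum(username, token, secret)
    # Compare as bytes in constant time to avoid leaking how much matched
    return hmac.compare_digest(expected.encode("utf-8"), checksum.encode("utf-8"))
```

Only after this check passes would the plugin look up the local user and populate the response object.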
The session token can be any string identifier. In this case, it is the MD5 hash of the value returned by the PHP built-in uniqid(). This value is generated by a script called 'session.php'. The script generates this value, stuffs it into a DB table and simply echoes the value to the caller.
The key to the security of this system comes from the secret string known only to the remote system that wishes to pass users to the local Joomla system and to the authentication plugin. This secret is used to generate a hash of the username and the session token. By using a hashing mechanism like MD5 or SHA1, this checksum value provides pretty good assurance that the values passed in were from a known and trusted source.
The way the remote system and the Joomla system interact to make this autologin happen is as follows:
- The remote client calls the session.php script on the local Joomla system
- The remote client hashes the session token, username and secret
- The remote client generates a URL to the local Joomla system's homepage that passes in the following GET parameters: u, t, s (for username, token and checksum respectively)
- The remote client redirects the user to this URL
- If the token is authenticated, the user is logged into the local Joomla system as a local Joomla user.
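The remote client's side of these steps can be sketched in a few lines. This Python is a rough illustration under assumptions: the function names are hypothetical, the hash is SHA1 over the concatenated username, token and secret (the article does not fix the order), and session.php is assumed to echo nothing but the token.

```python
import hashlib
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_session_token(session_url):
    """Step 1: ask the Joomla host's session.php for a fresh token."""
    return urlopen(session_url).read().decode("ascii").strip()


def checksum(username, token, secret):
    """Step 2: hash the username, token and shared secret (order assumed)."""
    return hashlib.sha1((username + token + secret).encode("utf-8")).hexdigest()


def build_login_url(home_url, username, token, secret):
    """Step 3: build the homepage URL carrying the u, t and s GET parameters."""
    query = urlencode({
        "u": username,
        "t": token,
        "s": checksum(username, token, secret),
    })
    return home_url + "?" + query
```

The remote system would then redirect the user's browser to the URL returned by build_login_url(). The shared secret never appears in the URL; only its hash does.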
You'll also notice that you could easily map all remote users to one generic Joomla user if that is desirable.
I hope you find this useful in crafting your own Joomla solutions.

I've been working with the PHP CMS WordPress a lot lately. It's a pretty simple system that doesn't make its internals hard to get to, which I appreciate.
One of the internal functions WP provides is wp_mail. This, you might have guessed, is used to send SMTP mail. The parameter list for this function is a bit long, and long parameter lists are hard to remember:
wp_mail( $to, $sub, $body, $hdrs, $attach );
These parameters are pretty self-evident: mail recipient, subject line, body of message, SMTP headers, attachments. The last two parameters are optional. This works great for sending plain, unformatted text messages. However, you may want to tweak this a bit.
The first thing you might want to do is change the default sender. This is done by adding a header:
$to = "nemo@uptopia.com";
$sub = "Your submarine parts";
$msg = "I have the new parts for your fabulous machine.";
$headers = array("From: Joe Johnston <jjohn@taskboy.com>");
$h = implode("\r\n",$headers) . "\r\n";
wp_mail($to, $sub, $msg, $h);
This makes the email look like it was sent by me even though the web server process running the PHP script isn't owned by my account.
Another common task is to send HTML-formatted email using this system. To do this, you must change the content type of the message to text/html. This is most easily done through the headers, even though you are supposed to be able to do this through the filter wp_mail_content_type. In my testing, this did not work, but the following code did:
$to = "nemo@uptopia.com";
$sub = "Your submarine parts";
$msg = "<html><body><h1>Awesome news</h1>
<p>I have the new parts for your fabulous machine.</p>
<address>--Joe</address></body></html>";
$headers = array("From: Joe Johnston <jjohn@taskboy.com>",
"Content-Type: text/html"
);
$h = implode("\r\n",$headers) . "\r\n";
wp_mail($to, $sub, $msg, $h);
By adding the content type to the headers, the recipient's email client should format the message accordingly.
Of course, sending HTML email has risks. It could be caught in spam filters. The client may not support HTML formatting (although that's rare). The client may disable email HTML from using javascript, CSS or grabbing remote assets like images.
Caveat Spammer.

The Apple iPhone OS platform has become an increasingly popular target for new development. This is the OS that powers the iPod touch, iPad and, of course, the iPhone.
There are two kinds of apps that one can create for these devices. The first is a native application, usually written in Objective-C and published through Apple's App Store. Not only is this kind of app harder to write, but developers may not like their hard work blocked by the whims of Apple.
Because these devices all have at least WiFi access to the internet, it is possible to design web applications tailored to these mobile devices. These "cloud" apps run like normal web applications but must constrain their UI a bit.
iWebKit (ignore the malware site warning if you see it) [download] makes this significantly easier to do. This project is a set of CSS and Javascript files that can give your web app the standard look and feel of other web apps for the iPhone.
There are a few conventions that iWebKit imposes, but these will be familiar to most XHTML developers anyway. In the space of a few hours using this project, I was able to produce an iPhone version of the Feed Bag. If you are using the Safari browser or Google's Chrome, that page will appear much as it would on an iPhone.
While this application is very simple, it does look like a native app. This makes me very happy.

Here's a post to remind me of simple Git procedures. Maybe other people will find this helpful. After using CVS for a decade, it's a bit hard to wrap my head around git, but I'm making that journey.
I'm using github as my repository.
START A NEW PROJECT
mkdir MYPROJECT
cd MYPROJECT
git init
touch README
git add README
git commit -m "first"
git remote add origin git@github.com:/GITHUB_USERNAME/MYPROJECT.git
git push origin master
A git repository URL looks something like this: git@github.com:/taskboy3000/Tester.git
WORK ON AN EXISTING PROJECT
git clone [GITPROJECTURL]
ADD FILES TO LOCAL REPOSITORY
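git add [FILE]
git commit -m "My comment about the files I added"
You can also add all the files in your project at once (git commit -a by itself only picks up changes to files git already tracks, so new files need git add first):
git add .
git commit -m "All files, including tilde files, have been added"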
UPDATE REMOTE MASTER WITH LOCAL SANDBOX
git commit -m "Final checkin"
git push origin master
UPDATE LOCAL SANDBOX WITH REMOTE MASTER
git fetch origin
git merge origin/master
REVERT LOCAL SANDBOX TO REMOTE MASTER
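git fetch origin
git reset --hard origin/master
This will restore deleted files, overwrite uncommitted changes and drop any local commits that were never pushed. (A bare git reset --hard only goes back to your last local commit.)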
Also see the github book.

I have been working through both the symfony JobBeet tutorial and cakePHP's blog tut. Both are basic CRUD apps. Here's my considered opinion of both.
Symfony is by far the more sophisticated of the two. The class system, ORM integration and YAML configuration really put this framework on par with any written in any other language. It's truly enterprise ready. However, it is simply a beast to learn. Installation virtually requires using an Apache virtual host. Creating a skeleton app is easy, but non-intuitive. Just to create and process a simple HTML form, you end up touching about 4 or so files in as many directories. Also, there is a heavy reliance on the PHP CLI, which can be an issue if your installation has different versions of PHP for the shell and Apache (which the Mac does). While I can get the JobBeet tutorial going, I cannot get much further -- and I have two of the three books from the developers.
CakePHP is a lot simpler, if less ambitious, than symfony. It appears to be pretty lightweight and has no CLI dependency (although there is a lot of automation that is offered by the cake client). The relationship between the model, controller and view is pretty easily grasped. To create a form for an existing model, you're looking at editing two files in two closely related directories.
If you're looking for a solid MVC framework in PHP, you could do a lot worse than cakePHP.

«If I could save time in a bottle
The first thing that I'd like to do
Is to save every day
Till Eternity passes away
Just to spend them with you»
--Jim Croce
If you're anything like me, you live and die by your calendar. Over the years, I've moved from simple paper calendars to the Palm Pilot back to paper and finally ending up online with Yahoo and Google. This brief post talks about creating hyperlinks that add an event to the clicker's yahoo or google calendar.
Of course, not everyone uses these online calendars. Some use Outlook or iCal or some other desktop solution. The key to publishing events for these systems is to use the iCalendar format, which most desktop apps can import. The MIME type for such an iCalendar file is text/calendar. You might also consider using the XHTML microformat hCalendar, which I mentioned briefly in an earlier post. However, this microformat isn't likely to be understood by most desktop apps.
First, let's suppose that we have an event that we want to publish. I turn 40 next year, so let's create a birthday party event for that. In the iCalendar format, such an event might look like this:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//taskboy calendar app
BEGIN:VEVENT
SUMMARY:Joe turns 40 just this once
ORGANIZER;CN=Joe Johnston:MAILTO:jjohn@taskboy.com
DTSTAMP:20100421T105300
DTSTART:20111212T190000
DTEND:20111212T200000
END:VEVENT
END:VCALENDAR
A great deal of this format is boilerplate stuff. The overall container is VCALENDAR, which has a BEGIN and END. Immediately after that is meta-information (like HTML's HEAD section) containing the format version and the app's product ID (which is arbitrary). After the header, the actual VEVENT section begins. There are several other types of objects a VCALENDAR can contain, including VFREEBUSY, VJOURNAL and VTODO, but this post isn't about those (read the RFC for more info). Let's look at the VEVENT attributes in tabular form:
| Attribute | Meaning |
|---|---|
| SUMMARY | A brief description of the event |
| ORGANIZER | Who organized this event, in LDAP format |
| DTSTAMP | When this iCalendar file was created |
| DTSTART | When this event starts |
| DTEND | When this event ends |
The purpose of these elements is relatively self-evident. The datetime format used throughout is the ever-popular ISO8601. The ORGANIZER is given in LDAP format, but the spec does not require it to be so. However, most applications, like Outlook, will attempt to locate ORGANIZER in an LDAP system, so this makes a bit of sense.
Above is a basic, serviceable ICalendar file. You can even check this using the handy ICalendar validator. Enough about the desktop. Let's move on to web-based calendars.
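If you need to produce such a file from a script, the format is simple enough to assemble by hand. Here is a minimal Python sketch under a few assumptions: the function names are mine, the property list is just the one shown above, and no escaping is done for commas or semicolons in the text values. The one detail worth knowing is that iCalendar lines are separated by CRLF.

```python
from datetime import datetime


def ical_datetime(dt):
    """Format a datetime the way the example above does, e.g. 20111212T190000."""
    return dt.strftime("%Y%m%dT%H%M%S")


def make_event(summary, organizer_name, organizer_email, start, end, stamp):
    """Return the text of a one-event iCalendar file (no text escaping is done)."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//taskboy calendar app",
        "BEGIN:VEVENT",
        "SUMMARY:" + summary,
        "ORGANIZER;CN=%s:MAILTO:%s" % (organizer_name, organizer_email),
        "DTSTAMP:" + ical_datetime(stamp),
        "DTSTART:" + ical_datetime(start),
        "DTEND:" + ical_datetime(end),
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated by CRLF
    return "\r\n".join(lines) + "\r\n"


birthday = make_event(
    "Joe turns 40 just this once",
    "Joe Johnston",
    "jjohn@taskboy.com",
    start=datetime(2011, 12, 12, 19, 0, 0),
    end=datetime(2011, 12, 12, 20, 0, 0),
    stamp=datetime(2010, 4, 21, 10, 53, 0),
)
```

Serve the resulting string with the text/calendar MIME type and most desktop calendar apps should offer to import it.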
Yahoo Calendar does not, as far as I can see, publish an API. However, people have ferreted out enough information to be useful. By creating a simple HTTP GET request (in the form of a hyperlink), events can be entered into Yahoo Calendar.
The base URL for Yahoo Calendar is: http://calendar.yahoo.com/. The following parameters are required: v=60 and TITLE=event. Next comes the metadata for the event itself:
| Parameter | Meaning |
|---|---|
| DESC | A brief description of the event |
| ST | ISO8601 datetime of when the event begins |
| DUR | How long the event lasts in HHMM format |
| URL | A URL to a page describing the event |
| in_loc | A brief label for the event location |
| in_st | Street address of the event |
| in_csz | City/State/Zip of the event |
Remember: all parameters must be URLencoded. There is a TYPE parameter that can specify the kind of event. The default is "Appointment" (type 10). See the link to Chris's notes for more options there. The following link will add my Birthday event to your Calendar:
Google Calendar has a similar system, but it is documented. The base URL for adding calendar events is http://www.google.com/calendar/event. There is one required parameter: action=TEMPLATE. Here is a rundown of their parameters:
| Parameter | Meaning |
|---|---|
| text | A label for the event |
| dates | Of the form: START/END where START and END are in ISO8601 format |
| name | Description of the event |
| details | A description of the event |
| location | Description of where the event takes place |
These parameters need to be URLencoded too. Google differs slightly from the above formats in that the event can have both a label and a description. The start and end time of the event are given in one parameter, which is unnerving. Here's my example event for Google:
Because these links will not return the user to the site they came from, consider using a popup window for these links.
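Both link formats boil down to URL-encoding a handful of parameters. The following Python sketch builds them from the parameter tables above. The function names are hypothetical, I am reading TITLE as carrying the event's title, and since Yahoo's interface is undocumented, treat that half as a best guess.

```python
from urllib.parse import urlencode


def yahoo_link(title, start, duration, desc=""):
    """Build a Yahoo Calendar link; start is ISO8601, duration is HHMM."""
    params = {"v": 60, "TITLE": title, "ST": start, "DUR": duration}
    if desc:
        params["DESC"] = desc
    return "http://calendar.yahoo.com/?" + urlencode(params)


def google_link(text, start, end, details="", location=""):
    """Build a Google Calendar link; start and end are joined into 'dates'."""
    params = {"action": "TEMPLATE", "text": text, "dates": start + "/" + end}
    if details:
        params["details"] = details
    if location:
        params["location"] = location
    return "http://www.google.com/calendar/event?" + urlencode(params)


yahoo = yahoo_link("Joe turns 40", "20111212T190000", "0100")
google = google_link("Joe turns 40", "20111212T190000", "20111212T200000")
```

urlencode() takes care of the URL encoding, turning spaces into + and the slash in the dates parameter into %2F.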
That's the gist of adding events to calendars. Of course, there are a lot more parameters that can be added to these publications and I haven't touched recurring events at all. Still, this is a good jumping off point for your own exploration of the magic of event scheduling.
jQuery in Action by Bear Bibeault and Yehuda Katz is a fine introduction to the wonderful jQuery. jQuery is a free javascript library that significantly eases the burden of manipulating the Document Object Model found in modern web browsers. The library has a distinctly functional feel to it, much like LISP. After reading this book, I'm humbled by the incredible job the architects of jQuery did in making such a clean and incredibly useful API. jQuery also bears the marks of Perl in its .map and .grep functions, and so warms the cockles of my heart.
Bibeault and Katz gently introduce the reader to this powerful library without overloading him with too much theory up front. This is not to say that the book is light on details. It is not. However, the less immediately useful details are pushed to the back of the book. I found the Appendix on Javascript's damnable Object system to be the most enlightening of all. While I understood the classless object system of Javascript before I read this, I did not really get the scoping rules (and I continue to be appalled at the insanity of it).
The order of the material is very reasonable. It begins with a bit of the philosophy of Unobtrusive Javascript, and then explains how to find DOM elements with jQuery selectors, how to manipulate those wrapped sets, how to install javascript events and how some of the built-in animation effects and utility functions work. All this is followed by a chapter on extending jQuery with plugins, a giant 50 page chapter on Ajax and a survey of useful plugins that are available on the jQuery site.
The highlights of this book include the discussion of Unobtrusive Javascript. This is literally a new concept to me and one that is much welcomed. The many references to the W3C HTML and DOM specs took me back a bit. During the ugly years of the Browser Wars, these specs were little more than wishful thinking. Yet now, most browsers (not IE) try to adhere to these documents. That's quite a sea change! The madness of event handling in DOM levels 0 through 2 is Lovecraftian (and perhaps explains a bit more of the problem my tic-tac-toe client has with IE).
The book is filled with practical examples of jQuery at work. The authors have even created little online tools to demonstrate these concepts. I didn't find these to be all that useful, but I think I'm not the target audience. I'd rather come up with my own projects and apply these ideas to them.
I recommend this book to novices and pros alike who wish to get up to speed quickly on the use and philosophy of jQuery.
As part of my technological foraging, I have been playing around with Ajax as presented through the new hotness that is jQuery. The result is a very humble, but all Ajax, tic-tac-toe game. This replaces the all flash version I had on this site for a while. Not only did I improve the AI of the computer player (thanks CS210!), but I have completely validated HTML and CSS files to boot.
Part of my inspiration comes from my current nighttime read, jQuery in Action by Bibeault and Katz. Besides gently easing the reader into jQuery concepts, the authors turned me on to the concept of unobtrusive javascript, which is the idea that HTML, CSS and javascript should be kept separate from each other, thus simplifying development considerably. One way to interpret this concept is that HTML, CSS and Javascript all need to be kept in discrete files. That means no "onClick" attributes in the HTML, please. In structure, one can find liberation.
The architecture of my tic-tac-toe game follows a pretty standard client-server model. The server bit consists of a single PHP script that handles game session and AI. The client bit is simply an HTML file supported by CSS and Javascript. The client API is narrowly defined to prevent obvious forms of cheating and to enforce the idea that the client is a display and input mechanism that has little idea about how to play tic-tac-toe. The game logic is enforced by the PHP script. I'll discuss the design of the game engine later, but let's look at the client first.
The entire source for tic-tac-toe can be found here.
The place to begin with the tic-tac-toe client is the index.html file. This is a validating HTML 4.01 Transitional document. It has no TABLE tags at all. It depends on CSS float to make the playing board and handle the rest of the page layout. Because of some problems with Internet Explorer, there is a bit of IE-specific goo that forces a warning to appear in that browser. That will be the subject I return to last.
That HTML file is so clean, you could eat off it. Anyone who has done PHP or even a lot of javascripting will be surprised that such a simple file could be the basis of anything complex. As far as the game is concerned, the most interesting elements in the HTML file are the DIVs that have an ID attribute beginning with "s". In traditional video game parlance, these function as both buttons and sprites. I could have used HTML buttons here. Part of me still thinks that's the right way to go, but just using DIVs looks a lot more "gamey" to me. All the wiring of event handlers for these DIVs is handled in the ttt.js file, described shortly. The look and layout of the client is provided by the ttt.css file, which holds no surprises.
The client-side magic happens in the javascript. As it promises to do, the jQuery library greatly reduced the burden of locating and attaching elements, attaching click handlers and Ajax processing. Because the jQuery library is loaded before ttt.js, jQuery functionality is accessible in this file. The first four lines of code are pretty harmless:
var Game = new Object();
Game.timeout = 5000;
Game.square_clicker_enabled = false;
$(document).ready(init);
A global object called Game is created that will hold client-side game state. Scoping in javascript is a little primitive, so having one global object in which to store various bits of information helps to reduce namespace clutter. The game property 'square_clicker_enabled' is set to false to prevent undesirable button clicking later on. I'll get to that. The most powerful statement here is the ready() function that calls init() (not shown yet) when the DOM is fully constructed in the browser. As any primer on jQuery will tell you, the more traditional onLoad() event for BODY elements does not run until all the graphics are loaded on the page. The ready() function is ideal for initialization code. Speaking of which...
function init() {
    $("#newGame").click(new_game_clicker);
    var arg = new Object();
    arg.get_board = 1;
    $.ajax({url: 'move.php',
            type: 'GET',
            data: arg,
            dataType: 'json',
            timeout: Game.timeout,
            error: bomb,
            success: function(a) { paint_board(a); }
    });
}
The init() function sets up the click handler for the "New Game?" button/DIV that ensures a new session and a blank board on the server. The fancy jQuery selector simply looks for an element with the ID of newGame. An object is created to hold parameters to be passed in the following Ajax request. The call to "get_board" simply requests the board state of the current game, if applicable. If the call is successful, the paint_board() function is invoked with the structure returned from the server. Even though the server returns a JSON serialized string, jQuery deserializes this structure and passes it to the function. That's some pretty terse code for an RPC mechanism!
function new_game_clicker() {
    if (confirm("Really start a new game?")) {
        var arg = new Object();
        arg["new"] = 1;
        $.ajax({url: 'move.php',
                type: 'POST',
                data: arg,
                dataType: 'json',
                timeout: Game.timeout,
                error: bomb,
                success: function(a) { paint_board(a); }
        });
    }
}
There's little new code here at all. All veteran javascripters will have seen the confirm() dialog function. The click handler for the new game button is mostly another Ajax call. This one creates a new session and empty tic-tac-toe board on the server. Again, whenever the state of the board might have changed, paint_board() needs to be invoked.
function square_clicker(e) {
    if (Game.square_clicker_enabled) {
        Game.square_clicker_enabled = false;
        var arg = new Object();
        arg.pos = this.id.charAt(1);
        $.ajax({url: 'move.php',
                type: 'GET',
                data: arg,
                dataType: 'json',
                timeout: Game.timeout,
                error: bomb,
                success: function(a) { paint_board(a); }
        });
    } else {
        alert("Thinking!");
    }
}
The click handler of each game board square is a little trickier. At its heart, the handler is merely responsible for sending the human's move to the computer using an Ajax call. When the player moves, the computer also makes a move and the new board state is returned and passed to paint_board(). Again, there are a lot of details behind that tiny bit of code!
The Ajax call is protected by a boolean. The idea is to prevent humans from wildly clicking on squares before the computer returns with the new game state. The opening move for the computer can take a few seconds to calculate, even when the depth of recursion is limited! I suppose that's why algorithms with a runtime performance of O(n!) are frowned upon.
As simple as this mechanism is, Internet Explorer does not seem to like it very much. When I use that browser, I get all kinds of weird board states. It is a mystery to me why this happens, but this is why I include a warning to IE users in the HTML document.
function paint_board(a) {
    if (a.board == null) {
        $("#status").text("No game in progress");
        Game.square_clicker_enabled = false;
    } else {
        for (var i = 0; i < a.board.length; i++) {
            $("#s" + i).empty();
            if (a.board.charAt(i) == '0') {
                var events = $("#s" + i).data("events");
                if (events == null) {
                    $("#s" + i).click(square_clicker);
                }
            } else {
                $("#s" + i).unbind('click', square_clicker);
                var e = a.board.charAt(i).toUpperCase();
                e = $("<span></span>").text(e).addClass("p");
                $("#s" + i).append(e);
            }
        }
        Game.square_clicker_enabled = true;
        if (a.msg == null) {
            a.msg = " ";
        }
        $("#status").text(a.msg + " ");
    }
}
Finally, the heart of the client code appears in paint_board. Given a structure from the server, it displays the game state on the board to the human. The board's state is presented as a nine character string where each position represents a place on the board. A zero is an unoccupied space, an 'x' represents the human's move and a 'y' represents the computer's. Notice that the client doesn't even know that much. It simply knows that 0 states are presented one way and non-zero another.
When the board is painted, the square_clicker_enabled flag is set to true, allowing the human to make a new move. If an open square is indicated by the board state, a click handler is installed if one does not exist. If the square is occupied, the click_handler is removed. Doing this without jQuery would have been horrible.
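To make the board encoding concrete, here is a small Python sketch that interprets the same nine-character string outside the browser. It is purely illustrative and not part of the game: the function names and the ASCII layout are my own, and like the client it treats any non-zero character as an occupied square.

```python
def open_squares(board):
    """Return the positions (0-8) that are still unoccupied."""
    return [i for i, c in enumerate(board) if c == "0"]


def render_board(board):
    """Draw the nine-character board string as a 3x3 text grid."""
    if len(board) != 9:
        raise ValueError("board must be exactly nine characters")
    # '0' is an empty square; anything else is shown upper-cased, as the client does
    cells = [" " if c == "0" else c.upper() for c in board]
    rows = ["|".join(cells[i:i + 3]) for i in range(0, 9, 3)]
    return "\n-+-+-\n".join(rows)


print(render_board("x0y000000"))
```

The positions returned by open_squares() correspond to the squares that would get a click handler in paint_board().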
And there's the client! Amazingly small and even valid HTML. Will wonders never cease?
This post is already long. At some point, I'll go into the server code which has some moderately clever AI and some awful code to determine a win condition. Keep reachin' for the stars!
Pymedia is a python wrapper around the ffmpeg C audio/video library. It is sometimes useful to compose a video from a series of graphic files or to decompose a video into a series of graphic files. Pymedia is one tool to do this.
There is a sample program on the pymedia site that claims to create a video from a series of separate graphic files. However, it contains a bug. Besides, it may not be as clearly written as it could be. Here's my rewrite of that code.
I have hard-coded in the function getFiles() a routine to find all the graphic files I want in the movie and sort them. If you reuse this script, you will need to write your own version of this.
import sys, os, time
import pymedia.video.vcodec as vcodec
import pygame
def makeVideo(files, outFile, outCodec='mpeg1video'):
    pygame.init()
    # open() raises IOError on failure; it never returns None
    try:
        fw = open(outFile, 'wb')
    except IOError:
        print("Cannot open file " + outFile)
        return
    if outCodec == 'mpeg1video':
        bitrate = 2700000
    else:
        bitrate = 9800000
    start = time.time()
    enc = None
    frame = 0  # number of frames written so far
    for fpath in files:
        img = pygame.image.load(fpath)
        # Init once, but need details from an image
        if enc is None:
            params = {
                'type': 0,
                'gop_size': 12,
                'frame_rate_base': 30,  # 125,
                'max_b_frames': 0,
                'height': img.get_height(),
                'width': img.get_width(),
                'frame_rate': 90,  # 2997,
                'deinterlace': 0,
                'bitrate': bitrate,
                'id': vcodec.getCodecID(outCodec)
            }
            enc = vcodec.Encoder(params)
        # Create VFrame
        bmpFrame = vcodec.VFrame(vcodec.formats.PIX_FMT_RGB24,
                                 img.get_size(),
                                 # Convert image to 24-bit RGB
                                 (pygame.image.tostring(img, "RGB"), None, None))
        # Convert to YUV, then encode
        d = enc.encode(bmpFrame.convert(vcodec.formats.PIX_FMT_YUV420P))
        fw.write(d.data)
        frame += 1
        if frame % 10 == 0:
            sys.stdout.write("Progress: %05d/%05d\r" % (frame, len(files)))
            sys.stdout.flush()
    fw.close()
    pygame.quit()
    elapsed = time.time() - start
    print('\nDone\n%d frames written in %.2f secs( %.2f fps )'
          % (frame, elapsed, float(frame) / elapsed))
def getFiles():
    # Change this function to get files in the order you want them
    files = []
    base_dir = 'M:/backups/hosted_sites/taskboy/public_html/spy'
    for f in os.listdir(base_dir):
        if f.startswith("20") and f.endswith("png"):
            files.append(os.path.join(base_dir, f))
    files.sort()
    return files
if __name__ == '__main__':
    files = getFiles()
    makeVideo(files, "out.mpg")
The core of this code is in the makeVideo() function. Pygame is used to load the graphic files into a format that pymedia's VFrame object can accept. That object is converted into a YUV format suitable for MPEG encoding.
You can control the speed of the images by setting the frame_rate in the params dictionary. Lower numbers are slower than bigger numbers. I leave as an exercise for the reader a refinement that allows for more fine-grained control of the MPEG encoding.
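If you do reuse the script, getFiles() is the part you will rewrite first. Here is one possible generalization, offered as a sketch: the name get_files and its prefix and suffix parameters are my own invention, and it keeps the original's assumption that sorting the file names alphabetically puts the frames in the right order.

```python
import os


def get_files(base_dir, prefix="20", suffix=".png"):
    """Return the full paths of matching image files, sorted by file name.

    Sorting by name only gives chronological order if the names begin with
    a sortable timestamp, as the spycam images are assumed to.
    """
    names = [
        name for name in os.listdir(base_dir)
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    return [os.path.join(base_dir, name) for name in sorted(names)]
```

The result can be passed straight to makeVideo() in place of the hard-coded list, for example makeVideo(get_files("/path/to/images"), "out.mpg"), where the directory is whatever holds your frames.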
XHTML is a great idea, once you understand the sheer Lovecraftian terror that is SGML, of which HTML is a child. Very few people understand all of SGML, including me, but here's the important point: SGML has all kinds of rules to imply elements. If you don't close your OPTION tag in HTML, the browser will create a node for the end tag anyway when it interprets that document. That's great, and in that case, the browser does something completely predictable. However, there are plenty of ambiguous cases of what missing nodes should be created and trust me, brother, you don't want to go there. But this is a horrid topic for another day.
The problem with XHTML is that it is XML. So when you need to add a SCRIPT section, you need to escape it XML-style like this:
<script type="text/javascript">
<![CDATA[
{js code here}
]]>
</script>
Wow! That's ugly! But wait, there's more. No Javascript engine knows anything about XML CDATA blocks, so you need to hide the markers with Javascript-style comments. So you get this:
<script type="text/javascript">
//<![CDATA[
{js code here}
//]]>
</script>
What a mess! If this alone doesn't make you run for keeping your JS in a separate file, nothing will.
From Joel's blog:
One principle duct tape programmers understand well is that any kind of coding technique that’s even slightly complicated is going to doom your project. Duct tape programmers tend to avoid C++, templates, multiple inheritance, multithreading, COM, CORBA, and a host of other technologies that are all totally reasonable, when you think long and hard about them, but are, honestly, just a little bit too hard for the human brain.
The Worse is Better article referred to in Joel's post was very influential to me. Also see Eric Raymond's The Cathedral and the Bazaar, which is a little different restatement of the issue.
The issue is software complexity and how to deal with it. It's an issue all programmers have been dealing with since there was such a thing as coding. In a perfect world, one could architect and plan every last detail of a software project before any coding happens. After all, that's what real architects of buildings do.
Unfortunately, software is inherently malleable and the hardware platforms change every 12 months. And new software solutions appear every day. Planning takes a long time and complete planning is a task that I think can never be completed. So how can systems be designed at all?
There are many methodologies that have been proposed to accomplish all the good things that top-down design gives without all the costs. I subscribe to no formal methodology, but believe in iteration from prototypes.
Start with a flexible platform that's amenable to constant change. The Linux, Apache, MySQL, Perl (LAMP) stack is exactly this kind of platform. Start building prototypes rapidly and get feedback from your clients on them. Apply changes as needed. Use source control to trim Bad Thinking. Design should be only as complex as necessary and no more. Don't paint yourself into a corner, but don't look to support all possible features that might or might not be requested of your software in the future.
As Fred Brooks said so wonderfully, there is no silver bullet to software design. And there probably will never be.
A feature I wanted for my Spycam page is an animated loop of the last 5 pictures. All the pictures are in PNG format. There are several techniques that would accomplish this: GIF animation, SVG/Flash and JavaScript. I chose to use a pure JavaScript (or ECMAScript, to be politically correct) solution, which I'll detail in a moment. But first, I'll briefly talk about the alternatives.
Animated GIFs have been around nearly as long as the web. Here is an example of one that I created for my late Aliens, Aliens, Aliens web site:
The GIF file format allows for multiple frames and timing instructions for the pace of animation. At first, I thought I could do the same kind of animation with PNG files, but that format does not allow animation. With image manipulation libraries, I could create a new GIF and stuff it with frames converted from the five most recent spycam photos. However, that seemed like a lot of work and I didn't really want to create yet another file to keep track of. The advantage of this solution over the one I chose is that it scales better. That is, by serving a static animated GIF, taskboy.com visitors will not be continually pulling down an image for each animation loop. Still, most browsers cache web assets, so it's likely that once the frames are loaded in the JS animation scheme, taskboy.com will be left relatively unmolested.
The next solution that occurred to me was to create a Flash SWF file of this loop. However, I'm not much of a Flash guy, nor am I emotionally ready to invest the time to learn the Perl interface to it. Of course, there is the XML-based, open source alternative to Flash called SVG. SVG does support animation, as you'd expect, but many browsers including Firefox do not yet support SVG animation. This was quite disappointing.
Finally, I hit on the relatively old idea of periodically switching the photo using DOM manipulation. The code to do this works in modern browsers (even IE 6). In pseudo-code, the strategy looks like this:
- Set up an array of images in order from oldest to newest
- Look for a well-known ID block and update the attributes of the child IMG tag with the next image in the array
- Update the description in a well-known ID block
- Schedule steps 2-3 to run again after a brief delay
Consistent browser support for DOM manipulation directly led to the Web 2.0 explosion of the early 2000s. It is possible to use Javascript to add, remove and edit nodes of the document tree in a consistent way across all major browsers. For those of you too young to remember, this was not always the case. There was an ugly time, known as the Browser Wars, that left many a young Javascript programmer maimed and bitter.
Let's back into this code from the HTML. After all, this is the structure to be changed dynamically. Fortunately, this document structure is pretty simple:
<div id="animator" align="center"> <div align="center" id="pic_desc"></div> </div>
This is simply a DIV block (ID = "animator") that encloses another DIV (ID = "pic_desc"). Where's the image tag? That will be inserted dynamically later and put ahead of the second DIV. However, an empty IMG tag could be added to the HTML right at the start. This will save some complexity in the Javascript later on, as will be noted.
The Javascript for this isn't complex in concept, but it may look a little daunting. However, it consists only of one globally scoped array and one function.
var Pictures = new Array("2009-09-24_21-10-01.png",
                         "2009-09-24_21-11-01.png",
                         "2009-09-24_21-12-02.png",
                         "2009-09-24_21-14-01.png"
                         );
function show_next_image(last_index) {
  // Calculate the next image's index
  last_index = (last_index + 1) % Pictures.length;
  // Find right element
  var e = document.getElementById("animator");
  if (e != null) {
    var i = null;
    for (var j=0; j < e.childNodes.length; j++) {
      if (e.childNodes[j].nodeName == "IMG") {
        i = e.childNodes[j];
        break;
      }
    }
    if (i == null) {
      i = document.createElement("img");
      e.appendChild(i);
    }
    i.setAttribute("src", Pictures[last_index]);
    i.setAttribute("title", Pictures[last_index]);
    // Update description
    e = document.getElementById("pic_desc");
    if (e != null) {
      var t = null;
      if (e.childNodes.length == 0) {
        t = document.createTextNode("");
        e.appendChild(t);
      } else {
        t = e.childNodes[0];
      }
      t.data = Pictures[last_index];
    }
  }
  setTimeout("show_next_image("+last_index+")", 5000);
}
The global array, Pictures, isn't difficult to understand. However, how those values are generated might be somewhat non-obvious. Recall that Javascript executes in the client's browser, which is to say, not on the server. How can the array know which files are current? The client doesn't have access to this information. The answer is that the PHP script on the server must determine these values. The script has access to the server and can see all the available files. The PHP code then has to generate the list of filenames that appears in the rendered Javascript code. This is the sort of thing that makes CGI scripting a little mind-blowing at times. You write code in one language that generates code for another.
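To make the "code that writes code" idea concrete, here is a minimal sketch of the generation step. It is written in JavaScript rather than the PHP the page actually uses, and renderPicturesArray() is a hypothetical name, not something from the real script. The only point is that the server ends up printing JavaScript source text with the current filenames baked in.

```javascript
// Hypothetical helper: turn a list of filenames (gathered on the server)
// into the JavaScript source text that gets shipped to the browser.
function renderPicturesArray(filenames) {
  // JSON.stringify quotes and escapes each name, so the result is
  // always a valid JavaScript string literal.
  const literals = filenames.map(function (name) {
    return JSON.stringify(name);
  });
  return "var Pictures = new Array(" + literals.join(",\n  ") + ");";
}

// Example input; on a real server this list would come from a directory scan.
const source = renderPicturesArray([
  "2009-09-24_21-10-01.png",
  "2009-09-24_21-11-01.png"
]);
console.log(source);
```

Quoting each name with an escaping function, instead of gluing quote characters on by hand, keeps an odd filename from breaking the generated script.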
The function, show_next_image(), depends on the global Pictures array. It is called with the index of the Pictures element that was used previously to update the IMG tag. The % operator may not be familiar to you. It is the modulus operator. It returns the remainder of the integer division between its operands. For non-negative operands, it has the handy property of generating values between 0 and one less than the right-hand value. So, n % 4 never generates values greater than 3. To determine the index of the new Pictures element to display, the passed-in last_index is incremented and the result modded by the length of the array. After all, it is desirable that when the last image in the array is used, the next image should be the first image in the Pictures array.
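The wrap-around is easier to see in isolation. This small sketch pulls the index arithmetic out into its own function; nextIndex() is a name made up for illustration and does not appear in the page's script.

```javascript
// Hypothetical helper: same arithmetic as the first line of show_next_image().
function nextIndex(lastIndex, length) {
  return (lastIndex + 1) % length;
}

// Walk a four-element array twice, starting "before" the first element.
let index = -1;
const visited = [];
for (let step = 0; step < 8; step++) {
  index = nextIndex(index, 4);
  visited.push(index);
}
console.log(visited.join(" ")); // 0 1 2 3 0 1 2 3
```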
After determining which Pictures element is to be used, the DOM needs to be examined to find the right nodes to update. Here, the getElementById() function is used to find the parent DIV. Initially, this DIV has two nodes as written: a text node containing a new line and another DIV. The enclosed child nodes are searched for one that is an IMG tag using the slightly misnamed nodeName property. If such a node is found, its reference will be stored in i. If no node reference is found (as will happen in the initial run), a new IMG node is created with createElement(). All nodes except text nodes are created with this DOM function. The new node then is added to the list of child nodes that the animator DIV has.
I need to point out to anyone following this code carefully that a small display bug has been introduced in this code. If there is no initial IMG tag, the created IMG node will appear below the description, which is undesirable. To prevent this, I simply have an empty IMG tag in the HTML above the "pic_desc" DIV. If you want a pure solution that does not require this HTML stuff, feel free to leave your solution in the comments section of this entry.
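For what it's worth, one possible pure-script approach is the DOM's insertBefore() method, which places a new node ahead of an existing child instead of at the end. The sketch below is only that, a sketch: ensureImageBeforeDescription() is a hypothetical helper, and it assumes the "pic_desc" DIV is a direct child of the "animator" DIV, as in the HTML shown earlier. The document is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: return the IMG inside "animator", creating it
// in front of the "pic_desc" DIV if it does not exist yet.
function ensureImageBeforeDescription(doc) {
  var container = doc.getElementById("animator");
  var desc = doc.getElementById("pic_desc");
  if (container == null || desc == null) {
    return null; // the expected markup is not on this page
  }
  var existing = container.getElementsByTagName("img");
  if (existing.length > 0) {
    return existing[0];
  }
  var img = doc.createElement("img");
  // insertBefore(newNode, referenceNode) puts newNode just ahead of referenceNode
  container.insertBefore(img, desc);
  return img;
}

// In the page itself the call would be: ensureImageBeforeDescription(document);
```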
An empty IMG tag isn't useful. An IMG node requires at least one attribute, SRC, to do something visible. The SRC attribute tells the browser where to fetch the image to be displayed. This is exactly the information that is stored in the Pictures array. Each element is the relative path to a graphic asset on my server.
Although it is possible to set node attributes with a number of different syntaxes, I recommend always using the setAttribute() function. Not only does it work with all kinds of DOM nodes in all major modern browsers, but if the attribute does not currently exist in the node, it is created. This Do What I Mean behavior may be a legacy of Perl, but I can't be sure. Certainly, I don't see a lot of other examples in DOM where one gets a free lunch.
Once the picture is updated, the description follows. The only thing really different about this node manipulation is that it deals with a text node. Text nodes are the stuff that falls outside of HTML tags. This includes the oft-forgotten newlines that are common in hand-written HTML documents. Text nodes are created with the appropriately named createTextNode() function. The text in a Text node is updated through its data property. The pic_desc DIV should only ever have one node that is a Text node. If it has one, it is updated. If not, one is created and inserted as a child of that DIV. Then the data property is updated.
The most interesting part of this code is also the shortest bit. The DOM function setTimeout() takes two arguments: a string that represents code to be executed at some future date and a delay in milliseconds that the browser will wait before executing. Notice that the code to be executed is a string. That is, it will be eval'ed when the delay time elapses. Eval'ed code is a little weird the first time you run into it. It's code that you write at runtime to be executed at runtime. Most code is traditionally written before compile or interpreter time. In this case, I want to schedule this function, show_next_image(), to run again. You'll notice that this will cause the function to run forever, which is the desired effect.
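setTimeout() also accepts a function in place of a string, which avoids the eval step entirely. Here is a rough sketch of the same endless loop built that way. The names makeLoop() and step are invented for this example, and the scheduler is passed in as an optional third argument purely so the loop can be tried without waiting on a real timer.

```javascript
// Hypothetical helper: build a self-rescheduling loop around a step function.
// step(index) does the work for one frame and returns the index it used.
function makeLoop(step, delayMs, schedule) {
  schedule = schedule || setTimeout; // the real timer unless a stand-in is given
  function tick(lastIndex) {
    var current = step(lastIndex);
    // A closure remembers 'current', so no code string has to be assembled.
    schedule(function () { tick(current); }, delayMs);
  }
  return tick;
}

// In the page this might be wired up as follows (not run here, since it never ends):
//   var loop = makeLoop(function (i) { return (i + 1) % Pictures.length; }, 5000);
//   loop(-1);
```

The string form still works in browsers, but the function form keeps the scheduled code visible to the parser and to anyone reading it.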
A quick word about the delay setting in this function. In my testing, I found the delay value unreliable. Here, 5000 milliseconds are specified, but I believe the real-time delay between executions of this script was much shorter. Is there a way to create a more highly reliable counter? You could keep track of when the timer last went off and make decisions about whether to run again, add or subtract a delay. If you're trying to write a game in JS with 20 updates per second using this method, you've gotten on the road to perdition.
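One way to do that bookkeeping is to aim each tick at a fixed schedule and compute how long is left until the next target, rather than always asking for the same delay. The sketch below shows only the arithmetic; delayUntilNextTick() and its parameter names are assumptions made for illustration.

```javascript
// Hypothetical helper: how long to wait so that tick number (ticksSoFar + 1)
// lands on startMs + (ticksSoFar + 1) * intervalMs, given the current time.
function delayUntilNextTick(startMs, ticksSoFar, intervalMs, nowMs) {
  var target = startMs + (ticksSoFar + 1) * intervalMs;
  // If the target has already passed, fire as soon as possible.
  return Math.max(0, target - nowMs);
}

// A tick that ran 200 ms late asks for a shorter wait next time.
console.log(delayUntilNextTick(0, 1, 5000, 5200)); // 4800
```

In a page, nowMs would come from new Date().getTime() and the result would be handed to setTimeout() as the delay.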
The last bit of code needed to kick off this animation extravaganza is something that initially invokes the code. In this simple example, using the onload event for body will work well.
<body onload="show_next_image(-1)">
This executes only once when the page is loaded and calls the desired function with an index before the first one. If calling show_next_image() with a -1 index seems like a cheat to you, you can rewrite the function so that it detects when it is called with a null value and assigns 0 to last_index. For me, I'll stick with -1.
If you did not want to use DOM events to invoke this function, you could simply put a script section at the bottom of the BODY tag that called show_next_image(-1) (or its moral equivalent).
Mastering this simple DOM technique opens the way for much more sophisticated work. But that's another post.
"Do not meddle in the affairs of Wizards, for they are subtle and quick to anger."--J. R. R. Tolkien
Dynamic, weakly-typed languages like Perl and PHP are wonderful productivity engines. It's amazing how much work one can accomplish with so few lines of code. Both languages allow the programmer to treat simple scalar variables as numbers or strings without a lot of casting or explicit conversions. However, there is a price for this magic.
Consider the following PHP code:
array_push($array, '"' + $file + "'");
This looks harmless enough. It looks like contents of the $file string are being enclosed in double quotes and that new string is being pushed on the end of $array.
Not so fast! In PHP, the + operator is strictly arithmetic; string concatenation is spelled with the . operator (the same is true in Perl). When + is handed strings, it quietly converts them to numbers first. Wait, didn't I just say that values are weakly typed in PHP? How does the interpreter turn a string into a number?
The answer for both Perl and PHP is that a string that starts with digits is treated as that number in numeric context, and a string with no leading digits is treated as zero.
In the code sample above, the filename in question indeed started with "2009-09". PHP took the leading integer part of the string, 2009, because of the + operator. The quote characters on either side have no leading digits, so they counted as zero. The sum, 0 + 2009 + 0, is the integer 2009, and that is what got pushed onto $array instead of a quoted filename. The fix is to use the concatenation operator, a dot, in place of each +.
And that's how I lost my filename, which caused me to spend the next 30 minutes debugging the problem.
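To see where the 2009 came from, it helps to imitate the numeric conversion by hand. The sketch below is JavaScript, not PHP, and leadingInteger() is a made-up function that mimics the "use the leading digits, otherwise zero" rule as PHP applied it when this happened; it is an emulation for illustration, not PHP's actual implementation.

```javascript
// Hypothetical emulation of how a string was read in numeric context:
// take the leading integer if there is one, otherwise use 0.
function leadingInteger(text) {
  var match = /^\s*[+-]?\d+/.exec(text);
  return match ? parseInt(match[0], 10) : 0;
}

var file = "2009-09-24_21-10-01.png";

// What the buggy line effectively computed: three numbers added together.
var viaPlus = leadingInteger('"') + leadingInteger(file) + leadingInteger("'");

// What was intended. JavaScript's + joins strings, which is the job
// PHP gives to its . operator.
var viaConcat = '"' + file + '"';

console.log(viaPlus);   // 2009
console.log(viaConcat); // "2009-09-24_21-10-01.png"
```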
The built-in webcam in the XO laptop fascinates me, in the same way parrots are fascinated by mirrors. I like to pick at it to see what I can do. I've been trying to find a simple command line invocation to take a picture and I have found it here:
gst-launch-0.10 v4l2src ! ffmpegcolorspace ! pngenc ! filesink location=foo.png
Please note that the first element in that pipeline is v4l2src, spelled with the letter L, not v412src with the digit one, as this post points out. That typo caused no end of annoyance for me.
This command snaps a photo and creates a PNG file called foo.png in the current directory.
The key to understanding video on the XO is to learn about GStreamer, a Gnome multimedia interface that works on pipelines. That is, you build up commands with GStreamer like you use command pipes in shell programming. Obviously, if you're not familiar with shell programming, GStreamer is likely to be quite opaque for a while.
In any case, GStreamer can be used to capture pictures, video and audio. It can also play audio files and stream video to other servers. Pretty strong ju-ju.
Also see this python hack for making the XO into a spy camera.
The Perl module Math::BigInt::GMP is a wrapper around the GNU Multiple Precision arithmetic C library and is useful for certain other math-intensive Perl operations. Ominously, the C library page warns that "GMP is very often miscompiled! Please never use your newly compiled libgmp.a or libgmp.so without first running make check" which makes the hairs on the back of my neck stand on end. However, I needed this lib.
On my Ubuntu Linux machine, the Perl library (Math::BigInt::GMP) would not compile because gmp was not installed. It would have been nice if the docs for this lib had mentioned this dependency. In any case, I installed gmp with the following shell command:
apt-get install libgmp3-dev
If prompted by apt-get to install additional dependencies, agree. Then go back and install Math::BigInt::GMP.
make clean; perl Makefile.PL && make test && make install
I'm beginning to think that entropy is closing in around the Great Camel.
M3U files are more commonly known as MP3 playlist files. These are simple files that contain URLs to MP3 files served over an HTTP server. These files may contain additional metadata that can be used by MP3 players (like Winamp) for display purposes. A few months ago, I built a simple playlist server in Perl so that I could listen selectively to my vast MP3 collection. You may find the entire source code for this playlist server, called Pixie, here. It has been tested under both Windows and Linux, but should work on Mac OS X too.
At its heart, Pixie is simply an embedded HTTP server. It serves four specific kinds of pages: an M3U playlist file, a CSS file, the HTML music selection page and specific MP3 files. In addition, it has two HTTP services that are essential to this process: adding MP3s to the current playlist and clearing the list entirely. There can be only one playlist per user.
When a user first points a web browser to the URL belonging to Pixie, a page is presented with all the directories and MP3 files found in the top level of the directory specified by the "-d" parameter. In my case, that's the M:/mp3 folder.
Folders may be traversed and the assets of those directories may be added to the playlist. Notice that there is a crumb trail at the top of the page that leads you back to the root directory.
After a few music assets are selected, the current play list is displayed. Notice that the assets come from different directories.
To listen to the playlist, simply click "Play now" in the Current Playlist section. What could be easier?
The pixie.pl script is somewhat long. It clocks in at 447 lines, even though that includes a small usage screen, a CSS file and an HTML template for the directory listing pages. This script is a little long for a blow-by-blow description of each line of code, but a few points about it should prove illuminating for those wanting to write their own HTTP servers in Perl.
It is perhaps useful to know that I structured the HTTP part of this code on the mod_perl/Apache model. That is, there are some global variables available to the functions that handle HTTP responses. The heart of the server can be seen in the relatively small main line code below:
my $S = HTTP::Daemon->new(LocalPort => $Opts->{p},
                          Reuse => 1,
                          Listen => 5,
                          timeout => 10,
                          );
while (my $c = $S->accept) {
    Log("Connection from: " . $c->peerhost);
    while(my $r = $c->get_request) {
        $This_Request = $r;
        $This_Connection = $c;
        handle_request();
    }
}
exit 0;
This code snippet starts with a pretty standard instantiation of an HTTP::Daemon object, which itself is a subclass of IO::Socket. For servers, it is important to set the Reuse parameter which allows the TCP port to be reused quickly after the last process has exited. Without this parameter, you'll find that you cannot invoke a script that uses the same port without a "cooldown" period specific to the OS.
With the server socket in place, pixie waits for new client connections in an accept loop. From the client socket, the HTTP::Request object can be obtained. Both of these important objects are stored in global variables for use in the handle_request() and later functions. Why not pass these objects into handle_request()? It turns out that there are all kinds of places these objects are useful for. Passing them explicitly gets to be a bit onerous. Let's look at handle_request().
sub handle_request {
    my ($c, $r) = ($This_Connection, $This_Request);
    if ($r->method ne 'GET') {
        $c->send_error(HTTP_FORBIDDEN);
        # Leave the sub with return; 'next' here would reach out
        # into the caller's loop and draw a warning
        return;
    }
    my $path = $r->uri->path;
    my @query = $r->uri->query_form;
    if ($path eq '/serve.m3u') {
        # Assemble this sessions selections
        # into an m3u and serve that file
        do_serve_playlist($c, $r);
    } elsif ($path eq "/clear") {
        # Clear playlist
        do_clear_playlist($c, $r);
    } elsif ($path eq "/pixie.css") {
        do_serve_css($c, $r);
    } elsif (@query) {
        # Could be an add request
        # Set a cookie, if needed
        do_add_asset($c, $r);
    } else {
        do_browse($c, $r);
    }
}
This function can be thought of as trampoline code. Its job is to route the handling of the request to the right routine, which in this case I call "page handlers". Page handlers are functions that all start with "do_" and are responsible for actually sending an HTTP response with content.
The function handle_request() does its routing based on a quick analysis of the details of the current request. Every HTTP::Request object has an initialized URI object in it. The URI object breaks apart the requested URL into logical parts and saves us from writing custom parsing code. You'll notice that two paths look like they reference real files: pixie.css and serve.m3u. However, this is an illusion. All web servers can be thought of as file system proxies. Like all proxies, you never can be quite sure how the resource you are requesting is stored on the back end.
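The routing decision itself is small enough to restate as a pure function. The sketch below does so in JavaScript (the language of the animation code earlier) rather than Pixie's Perl. route() is a hypothetical name, and it returns the name of the page handler that would be chosen instead of calling anything. It leans on the standard URL class to do the parsing, much as the Perl leans on the URI object.

```javascript
// Hypothetical restatement of handle_request()'s routing: given the HTTP
// method and the request target, name the page handler that would run.
function route(method, requestTarget) {
  if (method !== "GET") {
    return "forbidden";
  }
  // The base URL is a placeholder; only the path and query matter here.
  var url = new URL(requestTarget, "http://localhost");
  if (url.pathname === "/serve.m3u") return "do_serve_playlist";
  if (url.pathname === "/clear") return "do_clear_playlist";
  if (url.pathname === "/pixie.css") return "do_serve_css";
  // Any query parameters at all are treated as a possible add request.
  if (Array.from(url.searchParams).length > 0) return "do_add_asset";
  return "do_browse";
}

console.log(route("GET", "/Rock/?a=abc")); // do_add_asset
console.log(route("GET", "/Rock/"));       // do_browse
```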
There is also a magic path called "/clear" that signals the server to clear the current playlist from memory. There is only one function that does HTML form handling because there is only one form and it only adds MP3 files to the current playlist. If none of these requirements are met, do_browse() is called which serves either a specific file or a directory listing. It is this function I'd like to turn to next since it contains HTTP Cookie handling.
sub do_browse {
    my ($c, $r, $cookie) = @_;
    my $path = urldecode($r->uri->path);
    if ($path =~ /\.\./) {
        return $c->send_error(HTTP_FORBIDDEN);
    }
    my $fs = get_fs();
    my $real_dir = $Opts->{d} . $path;
    $real_dir =~ s!/!$fs!g;
    my $res = HTTP::Response->new(HTTP_OK);
    if (-d $real_dir) {
        $res->header("Content-type" => "text/html");
        if ($cookie) {
            $res->header("Set-Cookie" => "sid=$cookie; path=/");
        } else {
            my $sid = get_sid($r);
            if ($sid && !exists $Sessions{$sid}) {
                Log("Can't find SID '$sid' in: "
                    . join(", ", keys %Sessions)) if $DEBUG;
                my $epoch = "Wed, 31-Dec-1969 01:00:00 GMT";
                $res->header("Set-Cookie" => "sid=$sid; expires=$epoch;");
                Log("Deleting old cookie '$sid'");
            }
        }
        $res->content(make_page($real_dir,
                                $path,
                                ($cookie||get_sid($r))));
        return $c->send_response($res);
    } elsif (-e $real_dir) {
        # Serve real file in a new process
        $c->send_file_response($real_dir);
    } else {
        return $c->send_error(HTTP_FORBIDDEN);
    }
}
This page handler is the most complicated because it must decide if the requested path is valid, if a cookie needs to be set or removed or if a file or directory listing needs to be sent. Let's start at the beginning.
The path in the URI could need URL decoding, so that is done first. Next, a quick sanity check is performed to make sure the request isn't attempting to get a resource the server isn't meant to serve. The parent directory URL hack was a common exploit in early web servers. Next, all directory separators are converted to the OS-appropriate separator. Whatever happens next will require a new HTTP::Response object, so one is created.
If the path sent is a directory, a directory listing is required. Directory listings are generated by the make_page() function. The content-type is set in the response object, as pixie will send some kind of HTML. If the browser sent us a Pixie cookie, we simply update it with the current Session ID. If the cookie has a Session ID but the server has no record of it, the cookie is deleted from the browser. Which is to say, a new cookie is sent with an old expiration date.
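The decode, check and convert sequence can be sketched as a single function. This is a JavaScript rendering for illustration, not Pixie's code: resolveRequestPath() is a hypothetical name, and the root directory and separator are passed in where the Perl reads them from $Opts->{d} and get_fs().

```javascript
// Hypothetical sketch of the first steps of do_browse(): decode the URL path,
// refuse anything containing "..", then map it onto the music directory.
function resolveRequestPath(rootDir, rawPath, separator) {
  var decoded;
  try {
    decoded = decodeURIComponent(rawPath);
  } catch (e) {
    return null; // malformed percent-encoding
  }
  // Check after decoding so an encoded ".." (%2e%2e) is caught as well.
  if (decoded.indexOf("..") !== -1) {
    return null;
  }
  // Like the Perl substitution, swap every "/" for the OS separator.
  return (rootDir + decoded).split("/").join(separator);
}

console.log(resolveRequestPath("M:/mp3", "/Rock/a%20b.mp3", "\\")); // M:\mp3\Rock\a b.mp3
console.log(resolveRequestPath("M:/mp3", "/../secret.txt", "\\"));  // null
```

Doing the ".." test on the decoded path is the important ordering; checking the raw path first would let an encoded parent reference slip through.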
I've glossed over the details of Pixie session management in the above paragraph. When a user builds a playlist, the list needs to be kept somewhere. Pixie stores this list in server memory. Each list is assigned a random number which is its session ID. This ID is passed to the client with HTTP cookies. Every time the client makes a request, this cookie is passed back to Pixie. There is a global hash table called %Sessions that stores the association between ID and play list.
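The cookie side of this can be sketched in a few lines. The JavaScript below is an illustration only: getSid() stands in for the Perl helper of the same name (whose source is not shown in this article), and staleCookieHeader() is a hypothetical function that produces the "old expiration date" header described above when the server no longer knows the ID.

```javascript
// Stand-in for Pixie's get_sid(): pull the "sid" value out of a Cookie header.
function getSid(cookieHeader) {
  if (!cookieHeader) return null;
  var parts = cookieHeader.split(";");
  for (var i = 0; i < parts.length; i++) {
    var pair = parts[i].trim();
    var eq = pair.indexOf("=");
    if (eq > 0 && pair.slice(0, eq) === "sid") {
      return pair.slice(eq + 1);
    }
  }
  return null;
}

// Hypothetical helper: if the browser holds an ID the server has forgotten,
// return a Set-Cookie value that expires it; otherwise return null.
function staleCookieHeader(sid, sessions) {
  if (sid === null) return null;
  if (Object.prototype.hasOwnProperty.call(sessions, sid)) return null;
  return "sid=" + sid + "; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/";
}

var sessions = { "1253841001": {} }; // example data: one known session
console.log(staleCookieHeader(getSid("theme=dark; sid=999"), sessions));
```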
To finish off do_browse(), if the path of the request points to a real file, it is served without much more sanity checking. There is definitely room for improvement here in terms of security. The next page handler of interest is the one that handles requests to add files to the current playlist: do_add_asset.
sub do_add_asset {
    my ($c, $r) = @_;
    my $path = $r->uri->path;
    my @query = $r->uri->query_form;
    # Is there a cookie?
    my $sid = get_sid($r);
    unless (defined $sid && exists $Sessions{$sid}) {
        Log("Could not find " . (defined $sid ? $sid : "(no sid)") . " in: "
            . join(", ", keys %Sessions)) if $DEBUG;
        $sid = time();
        Log("Creating new SID '$sid'") if $DEBUG;
    }
    # For all the "a" params,
    # base64 decode and add to Sessions hash
    for (my $i=0; $i < @query; $i += 2) {
        if ($query[$i] eq "a") {
            # retain order through value
            my $cnt = scalar keys %{$Sessions{$sid}};
            $Sessions{$sid}->{decode_base64($query[$i+1])} = ++$cnt;
        }
    }
    # %% prints a literal percent sign in a sprintf format
    Log(sprintf("%%Sessions has %d keys\n",
                (scalar keys %Sessions))) if $DEBUG;
    return do_browse($c, $r, $sid);
}
Much of the first part of this routine should be familiar by now. What's interesting is that if no valid Session ID is found, a new one is created based on epoch time. If security is a concern, you should use a different method to generate IDs, like UUIDs. In any case, for each "a" query parameter in the request (which is to say, each MP3 file path), the path is decoded from base64 and added to the sessions hash. This is complicated by wanting to preserve the order in which the songs are selected. This ordering is preserved in the Sessions hash by storing a running count as each entry's value. Let's see how the actual playlist files are served.
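The ordering trick is worth a closer look, since hash keys by themselves come back in no useful order. Here is a minimal JavaScript sketch of the idea: store a running count as each entry's value, then sort the keys by that value when the list is needed. addAsset() and sortedPlaylist() are illustrative names; Pixie's own get_sorted_playlist() is not shown in this article, so this is a guess at its shape. Unlike the Perl, this sketch skips a path that is already in the list, so the counts stay unique.

```javascript
// Hypothetical helper: remember a path along with the order it was picked in.
function addAsset(session, path) {
  if (Object.prototype.hasOwnProperty.call(session, path)) {
    return; // assumption: picking the same file twice keeps its first position
  }
  session[path] = Object.keys(session).length + 1;
}

// Hypothetical helper: list the paths in the order they were picked.
function sortedPlaylist(session) {
  return Object.keys(session).sort(function (a, b) {
    return session[a] - session[b];
  });
}

var session = {};
addAsset(session, "/Rock/b.mp3");
addAsset(session, "/Jazz/a.mp3");
addAsset(session, "/Rock/c.mp3");
console.log(sortedPlaylist(session).join(", ")); // /Rock/b.mp3, /Jazz/a.mp3, /Rock/c.mp3
```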
sub do_serve_playlist {
    my ($c, $r) = @_;
    my $sid = get_sid($r);
    if (!$sid || !defined $Sessions{$sid}) {
        $r->uri->path("/");
        return do_browse($c, $r);
    }
    my $res = HTTP::Response->new(HTTP_OK);
    my @files = map {$Base_URL . $_} get_sorted_playlist($sid);
    my $out = make_playlist(@files);
    $res->header("Content-type" => "audio/x-mpegurl");
    $res->header("Content-Length" => length($out));
    $res->content($out);
    $c->send_response($res);
    $c->shutdown(2);
    return;
}
The trickiest part about serving the playlist is getting the MIME type right. The MIME type gives a hint to the browser about the kind of file being served and what sort of external application the browser should use for it. Creating the playlist file is handled by make_playlist() and is pretty straightforward. Note the use of the draconian shutdown(2) on the client socket. I found on Windows that without this call, Winamp never launched. By closing both ends of the client socket, the web browser can be sure it has the entire file, which means that it is safe to launch the external program.
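Since make_playlist() isn't shown here, the following is a guess at what such a function might produce, sketched in JavaScript. It writes the common extended M3U layout: a #EXTM3U header, then for each track an #EXTINF line (with -1 meaning the duration is unknown) followed by the URL. Whether Pixie emits the extended lines or just bare URLs is an assumption; the names makePlaylist() and titleFromUrl() are made up.

```javascript
// Hypothetical helper: use the last path segment of a URL as the display title.
function titleFromUrl(url) {
  var last = url.split("/").pop();
  try {
    return decodeURIComponent(last);
  } catch (e) {
    return last; // leave odd names untouched
  }
}

// Hypothetical stand-in for make_playlist(): build extended M3U text.
function makePlaylist(urls) {
  var lines = ["#EXTM3U"];
  urls.forEach(function (url) {
    lines.push("#EXTINF:-1," + titleFromUrl(url)); // -1: length not known
    lines.push(url);
  });
  return lines.join("\n") + "\n";
}

console.log(makePlaylist(["http://localhost:8080/Rock/song%20one.mp3"]));
```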
An interesting feature of Pixie is that the look and feel of the directory listings can be controlled with an external CSS file. Simply create a pixie.css file in the root of the MP3 directory and go to town. You can see what the default CSS file looks like simply by pointing your browser to http://localhost:[pixieport]/pixie.css.
Finally, there is ample room for improvement in the Pixie server. There are a number of security enhancements that can be made to ensure that only authorized files are sent. Pixie is a single threaded application and does not handle concurrency at all. Concurrency is a pretty thorny issue to get right for a platform neutral server. The core of the issue is the way Perl handles sockets and filehandles. On Linux, I would fork a new process for each new client request. That's a very clean way to make Pixie more responsive. Child processes inherit the open filehandles of the parent and so sockets can be handled independently in each process. On Windows, the fork() builtin merely emulates forking behavior with threads. Unfortunately since sockets look like filehandles, closing the client socket in the parent after fork (which is what you'd do on Linux) closes the socket in the child. It's not clear to me what solution would work here. I thought perhaps IO::Select would be a good choice, but then I suspect that when music files are sent, that will almost always block the directory listing traffic. I suppose this is a scaling mystery to be solved on another day.
xv is an ancient but small X graphic file viewer. If you try to compile the latest source (3.10a) on CentOS 5, you'll get the following fatal error message:
In file included from xv.c:11: xv.h:119 error: conflicting types for 'sys_errlist'
To get a successful compile, simply edit xv.h. Remove all the #defines around '#include <errno.h>' (about 4 or 5 lines). Replace with the single line:
#include <errno.h>
Type 'make' again and your compile should succeed.
Sometimes you need to secure a message that could be intercepted by a party for whom the message is not meant. One moderately good way to do this is using the Vigenere cipher.
The Vigenere cipher is a kind of Caesar or substitution cipher. It uses a sequence of letters called a Key to determine the substitution. Compare this to the constant shift function of Caesar. In Caesar crypt text, all occurrences of a particular clear text letter will be the same substitute letter (i.e. 'Z' always stands in for 'E'). In Vigenere, the substitute for a clear text letter changes depending on where that letter occurs. Neat trick, eh?
The Key is a shared secret known only to those parties authorized to read the message. The encoded text cannot be cracked in the same way that a Caesar-enciphered text can be, although there are sophisticated avenues of attack known to the NSA and other professionals.
Vigenere works by using a tabula recta, which is a table of the alphabet in rows and columns, like the following:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z B C D E F G H I J K L M N O P Q R S T U V W X Y Z A C D E F G H I J K L M N O P Q R S T U V W X Y Z A B D E F G H I J K L M N O P Q R S T U V W X Y Z A B C E F G H I J K L M N O P Q R S T U V W X Y Z A B C D F G H I J K L M N O P Q R S T U V W X Y Z A B C D E G H I J K L M N O P Q R S T U V W X Y Z A B C D E F H I J K L M N O P Q R S T U V W X Y Z A B C D E F G I J K L M N O P Q R S T U V W X Y Z A B C D E F G H J K L M N O P Q R S T U V W X Y Z A B C D E F G H I K L M N O P Q R S T U V W X Y Z A B C D E F G H I J L M N O P Q R S T U V W X Y Z A B C D E F G H I J K M N O P Q R S T U V W X Y Z A B C D E F G H I J K L N O P Q R S T U V W X Y Z A B C D E F G H I J K L M O P Q R S T U V W X Y Z A B C D E F G H I J K L M N P Q R S T U V W X Y Z A B C D E F G H I J K L M N O Q R S T U V W X Y Z A B C D E F G H I J K L M N O P R S T U V W X Y Z A B C D E F G H I J K L M N O P Q S T U V W X Y Z A B C D E F G H I J K L M N O P Q R T U V W X Y Z A B C D E F G H I J K L M N O P Q R S U V W X Y Z A B C D E F G H I J K L M N O P Q R S T V W X Y Z A B C D E F G H I J K L M N O P Q R S T U W X Y Z A B C D E F G H I J K L M N O P Q R S T U V X Y Z A B C D E F G H I J K L M N O P Q R S T U V W Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
Notice the letters in column 2 are shifted 1 from column 1. This is important, as you'll see in a moment.
To encode a message, you need a key. The key needs to be as long as the clear text message. Repeat the key as needed to make the key long enough. For example, if my key is "ORANGE" and my message is "IAMASECRETAGENT", then you would need to expand the key to "ORANGEORANGEORA". To encipher the message, walk through each letter of the clear text. For each letter, find the column that starts with that letter of the clear text and look down that column for the row that starts with the corresponding letter of the Key. The letter you find there is the cipher text substitution for that letter. So, the first letter in this example is "I" with a key letter of "O", which means that the substitution is "W". Here's the completed example:
| Clear text: | IAMASECRETAGENT |
| Key: | ORANGEORANGEORA |
| Crypt text: | WRMNYIQIEGGKSET |
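The table lookup is equivalent to a bit of arithmetic: number the letters A=0 through Z=25, add the clear text letter to the key letter, and wrap around at 26. For the first letter, I (8) plus O (14) is 22, which is W. The sketch below checks the whole example that way. It is JavaScript written for illustration, it assumes both strings contain only the uppercase letters A to Z, and vigenereEncode() is a made-up name.

```javascript
// Hypothetical helper: Vigenere by arithmetic instead of table lookup.
// Assumes clear and key contain only uppercase A-Z.
function vigenereEncode(clear, key) {
  var A = "A".charCodeAt(0);
  var out = "";
  for (var i = 0; i < clear.length; i++) {
    var p = clear.charCodeAt(i) - A;            // clear text letter, 0..25
    var k = key.charCodeAt(i % key.length) - A; // key letter, repeating as needed
    out += String.fromCharCode(A + (p + k) % 26);
  }
  return out;
}

console.log(vigenereEncode("IAMASECRETAGENT", "ORANGE")); // WRMNYIQIEGGKSET
```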
In the real world, messages will comprise more characters than just the uppercase letters of the alphabet. There are two ways to handle this when implementing the Vigenere cipher in code. Either reject all characters that are not part of the tabula recta or simply make the substitute character the same as the input. The latter technique is the one I favor, but that may make the crypt text easier for someone to crack. For example, what do you think this crypt text is referring to?
zT4N://eYSqnA.MxDEh7CmcGPT.jEA/aR/yO/?Hy=BRe7hS
Surely, this is a URL. Further, one can safely assume that "zT4N" is "http" in clear text. That's a lot of context to give away.
Implementing this complex cipher in Perl is straightforward. The first thing to do is to build the tabula recta:
sub build_matrix {
    my @Wheel = ((0..9), ('A'..'Z'), ('a'..'z'));
    my $M = [[]];
    for (my $y=0; $y < @Wheel; $y++) {
        for (my $x=0; $x < @Wheel; $x++) {
            $M->[$y]->[$x] = $Wheel[($y + $x) % @Wheel];
        }
    }
    return $M;
}
Notice that my alphabet is composed of numbers and both cases of letters. This set of characters is contained in the @Wheel array. By looping through the array twice, it is possible to create a two dimensional array. A careful look at the code reveals our old friend the shift function from the Caesar cipher. Each column is shifted by one.
You may not be familiar with the modulus operator %. The modulus operator returns the remainder of an integer division. For example, 13 % 5 is 3. Modulus operations have the interesting property that they always return values between 0 and one less than the second operand. Of course, this fits neatly in with arrays that begin indexing their elements at zero.
One last note. Even though this function returns the tabula recta, the caller stores the array in a globally visible variable called $M. This global is referenced in the encoding function below.
sub enc {
my ($plain, $key) = @_;
# Find the row of the cipher
for (my $y=0; $y < @$M; $y++) {
if ($M->[$y]->[0] eq $key) {
# Find the column of plain text
for (my $x=0; $x < @{$M->[$y]}; $x++) {
if ($M->[0]->[$x] eq $plain) {
return $M->[$y]->[$x];
}
}
}
}
return $plain;
}
This function expects a clear text letter and the corresponding key letter. The row of the key letter is found in the tabula recta (i.e. $M) and then the column of the clear text letter is found. Whatever letter is found in $M at that location is returned to the caller. Notice that if the plain letter cannot be found in the tabula recta, that character is returned as the substitution.
What remains is to show the main part of the program that calls these functions:
my @key_text = split //, $key;
my @plain_text = split//, $message;
my $new = "";
for (my $i=0; $i < @plain_text; $i++) {
$new .= enc($plain_text[$i], $key_text[$i % @key_text]);
}
print "$new\n";
Here, the Key and the message are hard coded in $key and $message respectively. Each of these is split into arrays of characters. For every character in the plain_text array, the encoding function is called and the cipher text is built and later printed.
Notice the return of the modulus operator. Instead of trying to expand the key to match the length of the message, a simple modulus operation is used. This saves a bit on memory and it is easier to implement than the literal expansion of the key.
To decrypt the message, replace the enc() call in the main line with dec(), shown below:
sub dec {
my ($cipher, $key) = @_;
# Find the row of the key
for (my $y=0; $y < @$M; $y++) {
if ($M->[$y]->[0] eq $key) {
# Find the column of the cipher
for (my $x=0; $x < @{$M->[0]}; $x++) {
if ($M->[$y]->[$x] eq $cipher) {
return $M->[0]->[$x];
}
}
}
}
return $cipher;
}
Notice that it is the inverse operation of enc(). It is passed a cipher letter and the corresponding key letter. The row of the key is found, and then the column in that row that contains the cipher letter is found. The first letter of that column is the clear text letter. If no match is found, the cipher letter is returned unchanged.
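Because every row of the table is just a rotation of the alphabet, the two lookups can also be written as index arithmetic. Below is a rough Python sketch of that idea. It assumes the same 62-character alphabet as @Wheel and the same pass-through rule for other characters; the function name is hypothetical, not part of the Perl program.

```python
import string

# Same ordering as the Perl @Wheel: digits, then uppercase, then lowercase
WHEEL = string.digits + string.ascii_uppercase + string.ascii_lowercase


def vigenere(text, key, decrypt=False, wheel=WHEEL):
    """Encipher (or decipher) text; characters outside the wheel pass through."""
    n = len(wheel)
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = key[i % len(key)]  # cycle through the key with modulus
        if ch in wheel and k in wheel:
            position = (wheel.index(ch) + sign * wheel.index(k)) % n
            out.append(wheel[position])
        else:
            out.append(ch)     # not in the alphabet: return it unchanged
    return "".join(out)


message = "http://example.com/?q=Hello World 42"
secret = vigenere(message, "ORANGE")
print(secret)
print(vigenere(secret, "ORANGE", decrypt=True))
```

As in the Perl main loop, the key position advances even when a character is passed through untouched, so encoding and decoding stay in step.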
Of course, you don't need to use this implementation. There is one on CPAN already, although it uses only 'A'..'Z' for its alphabet and strips off other characters.
Happy enciphering.
Security, it is said, is a process, not a product. For every method of securing information from unauthorized eyes, there is (or will be) a counter measure. There are two aspects of security: authentication and authorization. Authentication is concerned with determining the identity of a user wishing access to a secured resource through the use of some kind of credentials. Authorization is the set of rights a user has over a secured resource. In Unix terms, a user's account name and password as stored in the file /etc/passwd are the authentication credentials needed to log into the system. The permissions on files and directories represent the authorization mechanism.
In the world of web services, the need for authentication and authorization is clear. For example, twitter.com allows a user to update his status through a public API. However, only the owner of that account should be allowed to make updates. Reusing an existing authentication/authorization mechanism, Twitter's API expects account credentials to be passed through HTTP basic authentication.
An even more secure authentication/authorization mechanism is the X.509 PKI standard, in which clients and servers exchange certificates that identify themselves to each other. Each certificate has a "chain of trust": a chain of certificate authorities that can be consulted to establish the validity of its origin. Setting up this chain of trust, however, can be a daunting task.
Some applications may simply wish to hide information from unauthorized eyes without the complexity of a full PKI implementation. One old cryptographic technique to do this is the Vigenere cipher, whose origins date back to the sixteenth century (although it was only attributed to Blaise de Vigenere, himself a sixteenth century diplomat, in the nineteenth century). The Vigenere cipher is a novel twist on the very ancient Caesar cipher.
A Caesar cipher works by substituting a letter of the plain text message with a different letter of the alphabet. The substitution is not random, but represents a constant "shift." To understand a shift, imagine an ordered list of the capital letters from 'A' to 'Z', where 'A' occupies position 0 and 'Z' is at position 25. A shift is a function that takes a plain text letter, adds a fixed number of positions and returns the letter at the new position. For example, shift('A',1) would produce 'B' and shift('Z', 1) would produce 'A'. To decrypt a Caesar encrypted message easily, you need to know the alphabet used and the shift amount.
| Clear text: | ILIKEPIE |
| Caesar(2): | KNKMGRKG |
Caesar ciphers were used heavily for centuries, but are not particularly secure because they do not change the frequency of occurrence of letters. In languages with alphabets, some letters occur more frequently than others. In English, the most common letter is 'e'. If Caesar-encrypted text is long enough, whatever letter represents 'e' will also appear often. One can then work out the shift and from there the rest of the message in a process that resembles "Wheel of Fortune."
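To make the frequency attack concrete, here is a small Python sketch. It assumes an uppercase-only message that is long enough for 'E' to really be the most common letter; the sample message and function names are made up for illustration.

```python
from collections import Counter


def caesar(text, amount):
    """Shift each uppercase letter by `amount` positions, wrapping past 'Z'."""
    return "".join(
        chr(ord('A') + (ord(c) - ord('A') + amount) % 26) for c in text
    )


def guess_shift(cipher_text):
    """Guess the shift by assuming the most frequent cipher letter stands for 'E'."""
    most_common_letter = Counter(cipher_text).most_common(1)[0][0]
    return (ord(most_common_letter) - ord('E')) % 26


secret = caesar("MEETMEATTHESECRETTREEHOUSEATSEVEN", 2)
shift = guess_shift(secret)
print(shift, caesar(secret, -shift))
```

On short or unusual messages the guess can easily be wrong, in which case trying all 26 shifts is still cheap.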
For the curious, here's a snippet of Perl that represents the shift function:
sub encode {
my ($s, $shift) = @_;
my @parts = split //, $s;
my $t = "";
for my $p (@parts) {
# Convert the letter to a position (0..25), shift it, wrap around
$t .= chr(ord('A') + ((ord($p) - ord('A') + $shift) % 26));
}
return $t;
}
To decode a Caesar-enciphered message, the following function will work:
sub decode {
my ($s, $shift) = @_;
my @parts = split //, $s;
my $t = "";
for my $p (@parts) {
# Same as encode(), but shift in the opposite direction
$t .= chr(ord('A') + ((ord($p) - ord('A') - $shift) % 26));
}
return $t;
}
Next time, we'll look at how to implement a Vigenere cipher in Perl.
(Note: Thanks to gizmo, I have corrected an abbreviation expansion problem.)
Uniform Resource Locators
are an addressing scheme at the heart of the Web. Without them, there would
be no standard way to refer to a resource offered by a web server. URLs
remove the ambiguity of addressing a resource, but at the cost of creating
some rather formidable namespaces (e.g.
https://addons.mozilla.org/en-US/firefox/addon/9549).
In general, long URLs aren't a problem. Either through web page hyperlinks or web browser bookmarks, URLs fade into the background for most users. However, sometimes it is more convenient to have a shorter reference to a resource than the fully qualified URL. For example in the late nineties on IRC, it was common to see tiny.cc URLs pasted into chat rooms. Long URLs tend to clutter up already busy chat room windows. With the advent of text message-based systems like Twitter, which limit status updates to 140 characters, long URLs are actually consuming a valuable resource. The most common URL shortener used on Twitter.com appears to be bit.ly.
There are several URL shortening services out there and they all work
pretty much the same way. The user supplies the full URL. The service
hashes the URL into something smaller and appends this to its own namespace.
Using the bit.ly service, the mozilla URL becomes:
http://bit.ly/g0Z9. When someone accesses this bit.ly URL, he
will be seamlessly redirected to the original resource.
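I don't know how bit.ly actually computes its short keys, but the general idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the short.example namespace, the use of SHA-256, the base-62 key and the in-memory dictionary standing in for a database.

```python
import hashlib
import string

BASE62 = string.digits + string.ascii_letters
NAMESPACE = "http://short.example/"  # hypothetical service namespace
_store = {}                          # short key -> long URL (stand-in for a database)


def base62(number):
    """Render a non-negative integer using 62 URL-friendly characters."""
    if number == 0:
        return BASE62[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def shorten(long_url):
    """Hash the long URL down to a short key and remember the mapping."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    key = base62(int.from_bytes(digest[:4], "big"))  # 32 bits: at most 6 characters
    _store[key] = long_url  # a real service would also have to handle collisions
    return NAMESPACE + key


def expand(short_url):
    """Look up the original URL, as the redirect step would."""
    return _store[short_url[len(NAMESPACE):]]


short = shorten("https://addons.mozilla.org/en-US/firefox/addon/9549")
print(short)
print(expand(short))
```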
Bit.ly provides a REST interface to their service
(API).
To use this, create an account on
bit.ly's system. Now you are ready to build a Perl REST client for the shorten
service (http://api.bit.ly/shorten).
The following code is a listing of a small command line Perl script that expects to be passed a long URL. It uses the bit.ly REST service to return a shortened version.
use strict;
use LWP::UserAgent;
use Getopt::Std;
use HTTP::Request;
use URI;
my $VERSION = "1.0";
my $Opts = {};
my $bitly_api_url = q[http://api.bit.ly/shorten];
my $long_url = pop @ARGV;
getopts('u:p:?', $Opts);
if (!$long_url || $Opts->{'?'}) {
print usage();
exit;
}
set_defaults($Opts);
my $ua = LWP::UserAgent->new;
my $fetch_url = URI->new($bitly_api_url);
$fetch_url->query_form({'version' => "2.0.1",
'format' => "xml",
'longUrl' => $long_url,
});
my $req = HTTP::Request->new(GET => $fetch_url);
$req->authorization_basic($Opts->{u} => $Opts->{p});
my $res = $ua->request($req);
if ($res->code == 200) {
# Assumes the short URL arrives in a <shortUrl> element
my ($url) = ($res->content
=~ m!<shortUrl>([^<]+)</shortUrl>!);
unless ($url) {
warn("FAIL: [". $res->content . "]\n");
exit 1;
}
print "$url\n";
exit;
} else {
warn("FAIL:[".$res->content."]\n");
exit 1;
}
#-----
# sub
#-----
sub usage {
return <<EOT;
OPTIONS
? - Display this screen
u [USERNAME] - Bit.ly username
p [PASSWORD] - Bit.ly password
EOT
}
sub set_defaults {
my ($h) = @_;
$h->{u} ||= "taskboy3000";
$h->{p} ||= "s3c3rt";
}
This code uses the standard Perl module Getopt::Std to parse optional
command line arguments. The set_defaults function merely uses
my bit.ly credentials if none are provided through optional parameters. Next,
a new LWP::UserAgent object is created to make client HTTP calls. The bit.ly
shorten service expects a GET request with optional arguments encoded as
query parameters in the URL. The bit.ly service can respond to requests with
data in various formats (e.g. XML, JSON). In this case, the format parameter
is set to "xml."
The URI class manages the extra parameters through the
query_form method and urlencodes these into the new
URL. A simple HTTP::Request object is passed the new URL and the bit.ly
credentials are added to the HTTP request header using the
authorization_basic method.
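If you are curious what query_form produces, here is a rough Python equivalent using only the standard library. The parameter names come from the script above; the helper name is my own, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_URL = "http://api.bit.ly/shorten"  # endpoint used by the Perl script above


def build_fetch_url(long_url):
    """Append the urlencoded query parameters to the API URL."""
    params = {"version": "2.0.1", "format": "xml", "longUrl": long_url}
    return API_URL + "?" + urlencode(params)


fetch_url = build_fetch_url("https://addons.mozilla.org/en-US/firefox/addon/9549")
print(fetch_url)

# Decoding the query string recovers the original long URL
print(parse_qs(urlsplit(fetch_url).query)["longUrl"][0])
```

Characters like ':' and '/' in the long URL are percent-escaped so they cannot be confused with the structure of the API URL itself.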
Once the HTTP request has all the information, it is ready to be sent to the bit.ly server. The HTTP::Request object is passed to the LWP::UserAgent::request method, which contacts the server and encodes the response as an HTTP::Response object.
If an error occurred in transmission, the response will have an HTTP status code other than 200. Even if the request succeeds, the service might fail due to missing or bad credentials. A simple regex extracts the shortened URL from the XML message and prints it on the command line for easy consumption by other command line tools.
This script will run on any platform supported by Perl.
For a long time, I've ignored the Representational State Transfer (REST) architecture. For one thing, I don't particularly agree with its premise that remote procedure calls (RPC) that use HTTP as a transport mechanism should obey the same semantics as regular web traffic. Things like XML-RPC and SOAP are, to my thinking, happening on an entirely different layer of the application stack than HTTP. Indeed, there are implementations of XML-RPC that do not use HTTP at all.
I remember pretty heated arguments I witnessed at tech conferences in the early 2000s about this seemingly unimportant technical point. For REST adherents, web services are another form of web traffic and should be treated as such. Given that Twitter, Facebook and Bit.ly all use REST for their APIs and older apps like LiveJournal use XML-RPC/SOAP, I guess REST is the new hotness.
I've recently had reason to interact with the Twitter and Bit.ly APIs. This has made me come to terms with REST RPC mechanisms. I admit, the sad, sick part of me that enjoys playing around with low-level HTTP stuff finds satisfaction in the way these APIs leverage existing HTTP features like basic authentication, extra path info, and GET and POST semantics. In this post, I thought I would show a bit of Perl code I wrote to post status updates to Twitter, an activity more commonly referred to as "tweeting."
Twitter's API documentation is relatively straightforward, if you already have a solid grounding in HTTP. The API call to tweet is called "statuses/update". The basics of the RPC mechanism are easy enough:
- The caller makes a HTTP GET or POST request
- The server replies with content in the form of JSON or XML
Let's start with the request. There are several bits of information required by the API: user credentials, the URL and additional query parameters. The user credentials are passed as part of the HTTP request header as a basic authentication field, which is merely the base64 encoding of the username and password of your Twitter account joined by a colon. Fortunately, Perl's HTTP::Request::Common class makes it easy to add basic auth credentials to the request without knowing how this information is encoded in the HTTP request.
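For the curious, the encoding that the library hides is tiny. Here is a Python sketch of how a basic authentication header value is built; the helper name is hypothetical and the credentials are the well-known example pair from the HTTP authentication RFC, not a real account.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP 'Authorization' header for basic auth."""
    raw = f"{username}:{password}".encode("utf-8")  # join with a colon
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("Aladdin", "open sesame"))
```

Note that base64 is an encoding, not encryption: anyone who can read the header can recover the password.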
The next bit is the URL to the function. This is a core idea of REST --
function calls should have URIs and look like ordinary web resources.
In this case, the URL is http://twitter.com/statuses/update.xml.
Interestingly, the response from twitter can be encoded in a number of formats.
These formats are determined by the extension you give to the URL. For
instance, I could have requested the metainformation about myself in
JSON with the following URL:
http://twitter.com/users/show/taskboy3000.json.
The text of the tweet must be passed to the URL as if it were POSTed from a
form. The parameter name is status. The status must be encoded
as if the data were submitted from an HTML form. Again, Perl makes this very
easy, as will be shown below.
use LWP::UserAgent;
use HTTP::Request::Common ('POST');
my $api_url = q[http://twitter.com/statuses/update.xml];
my $status = "Tweeting from the API!";
my $twitter_username = "taskboy3000";
my $twitter_password = "s3cr3t";
my $ua = LWP::UserAgent->new;
my $req = POST($api_url => [status => $status]);
$req->authorization_basic($twitter_username
=> $twitter_password);
# Make the request
my $res = $ua->request($req);
The code above sets up and makes the status RPC call to Twitter. The first thing needed is an LWP::UserAgent object, which is kind of like a web browser. It makes HTTP requests of web servers. To construct the POST request, I use HTTP::Request::Common::POST. Because I can pass in form parameters as plain Perl data structures, it frees me from worrying about urlencoding values and fooling around with HTTP headers that are not germane to the task at hand. POST() returns an HTTP::Request object.
Adding my twitter account credentials to the request is a simple one line call to authorization_basic(). Very handy and very clean. That's all the setup I need to make the request. I pass in the HTTP::Request object to the User Agent object. That makes the actual network connection to the URL. The response comes back in the form of an HTTP::Response object, which I'll discuss next.
If all has gone well with the request, I'll get back an XML document that looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<status>
  <created_at>Tue Apr 07 22:52:51 +0000 2009</created_at>
  <id>1472669360</id>
  <text>At least I can get your humor through tweets. RT @abdur: I don't mean this in a bad way, but genetically speaking your a cul-de-sac.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id>1472669230</in_reply_to_status_id>
  <in_reply_to_user_id>10759032</in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <user>
    <id>1401881</id>
    <name>Doug Williams</name>
    <screen_name>dougw</screen_name>
    <location>San Francisco, CA</location>
    <description>Twitter API Support. Internet, greed, users, dougw and opportunities are my passions.</description>
    <url>http://www.igudo.com</url>
    <protected>false</protected>
    <followers_count>1027</followers_count>
    <profile_text_color>000000</profile_text_color>
    <profile_link_color>0000ff</profile_link_color>
    <friends_count>293</friends_count>
    <created_at>Sun Mar 18 06:42:26 +0000 2007</created_at>
    <favourites_count>0</favourites_count>
    <utc_offset>-18000</utc_offset>
    <time_zone>Eastern Time (US &amp; Canada)</time_zone>
    <profile_background_tile>false</profile_background_tile>
    <statuses_count>3390</statuses_count>
    <notifications>false</notifications>
    <following>false</following>
    <verified>true</verified>
  </user>
</status>
Most of this, I don't care about. However, I do want to see if there's an <error> element in the response:
unless ($res->is_success) {
my $c = $res->content;
my ($errstr) = ($c =~ m!<error>([^<]+)</error>!);
warn(sprintf("Post failed (%d): %s\n", $res->code, defined $errstr ? $errstr : "unknown error"));
exit 1;
}
print "OK\n";
exit 0;
Without the services of a full XML parser, it's relatively easy to look for an error tag and extract the contents for display. The error message I've encountered most is essentially "you used the API too much". Twitter does restrict the usage of some of their API calls, but not the status one.
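If you do have a parser handy, the same check is only a few lines. This is a Python sketch using the standard library's ElementTree; the sample error document is invented for illustration and only assumes that the message sits inside an error element somewhere in the response.

```python
import xml.etree.ElementTree as ET


def extract_error(xml_text):
    """Return the text of the first <error> element, or None if there is none."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None  # not well-formed XML, so nothing to extract
    if root.tag == "error":
        return root.text
    node = root.find(".//error")  # search all descendants
    return node.text if node is not None else None


# Hypothetical error response, made up for this example
sample = (
    "<hash><request>/statuses/update.xml</request>"
    "<error>Could not authenticate you.</error></hash>"
)
print(extract_error(sample))
```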
If you collapse all the Perl code, you're looking at less than 20 lines of
code. If you wanted to, you could even make posts using the very handy
command line tool curl:
curl -u taskboy:s3cr3t -d "status=hello curl" \
http://twitter.com/statuses/update.xml
I will leave the checking of error messages from curl output as an exercise for the reader.
As I said, REST RPC mechanisms are fun and interesting if you already understand HTTP. However, not everyone does. I think XML-RPC and SOAP libraries do a better job of insulating the programmer from the HTTP protocol, allowing him to focus on the API task at hand.
For complicated reasons, I started playing with Flat Assembler today. Ripping apart someone else's code, I was able to come up with this gem that creates a dialog box with a dead button. When compiled, the executable is a mere 2,500 bytes. I'm not sure I'd want to do an entire app in assembler, but it does seem to cut to the chase of the Windows API very well.
format PE GUI 4.0
entry codestart
include 'win32a.inc'
IDD_MAIN = 100
ID_START = 201
section '.data' data readable writeable
hInstance dd ?
section '.code' code readable executable
codestart:
invoke GetModuleHandle, 0
mov [hInstance], eax
invoke DialogBoxParam, eax, IDD_MAIN, HWND_DESKTOP, \
MainDlg, 0
invoke ExitProcess, 0
proc MainDlg hdlg, msg, wparam, lparam
push ebx esi edi
cmp [msg], WM_INITDIALOG
je .wminitdlg
cmp [msg], WM_COMMAND
je .wmcommand
cmp [msg], WM_CLOSE
je .wmclose
xor eax, eax
jmp .finish
.wminitdlg:
jmp .finish
.wmcommand:
cmp [wparam], BN_CLICKED shl 16 + ID_START
je .startbutton
.wmclose:
invoke EndDialog, [hdlg], 0
.startbutton:
jmp .finish
.finish:
pop edi esi ebx
ret
endp
section '.idata' import data readable writeable
library kernel, 'KERNEL32.DLL',\
user, 'USER32.DLL'
import kernel,\
GetModuleHandle,'GetModuleHandleA',\
ExitProcess, 'ExitProcess'
import user,\
DialogBoxParam, 'DialogBoxParamA',\
EndDialog, 'EndDialog'
section '.rsrc' resource data readable
directory RT_DIALOG, dialogs
resource dialogs,\
IDD_MAIN, LANG_ENGLISH + SUBLANG_DEFAULT, \
main_dialog
dialog main_dialog, 'Dialog test', 0, 0, 150, 50, \
WS_CAPTION + WS_POPUP + WS_SYSMENU +\
DS_MODALFRAME + DS_CENTER
dialogitem 'BUTTON', 'Hello', ID_START, \
50, 15, 50, 20, WS_VISIBLE + WS_TABSTOP
enddialog
If you want to spider a site from a path far down a branch of the document tree, try the following wget invocation:
wget --mirror --relative --no-parent [URL]
This prevents wget from traversing back up the parent and fetching the whole site.
Loyal Taskboy Readers,
Markdown is an open source text filter written in God's Own Perl. Even though the taskboy blog is written in PHP, I can shoehorn this filter into the comment system to allow greater markup. I may write a primate forum for taskboy built on the comment system. Stack Overflow uses this system, which is what brought it to my attention.
I ask you, readers, is it desirable that I add the Markdown filter to comments?
UPDATE: Thanks to the Internet's feature of building applications before I need them, I have just run across Markdown for PHP.
Blasterama was written by R. S. Brook as an example of python game programming for his sons.
This straightforward arcade game reminded me a lot of the Atari games I played growing up. Since I had access to the source code, I hacked in joystick support. Pygame does a great job at reducing the complexity of joysticks, so this was easy. Next I started cleaning up the code to make it easier to add the following features to the game:
- a lives system
- scoring
- background music
- a wave-level system
- ship defense shields
- scrolling starfield background
- game restart
- different sized aliens
- windows setup installer
- a backstory for the game :-D
So download the Windows package now or grab the source code. It's good for a laugh!
Following on the earlier post I made about programming hex maps for games, I have taken a stab at a Java program that generates different sized maps. You can download the zip file here. If you've got a recent version of the JRE (1.5), this should work for you out of the box by double clicking on it. Otherwise, you can run it by unpacking the archive and typing "java -jar HexMap.jar".
This toy just generates hex maps according to adjustable parameters. Hit the "Repaint" button to see the glory.
In a future post, I'll be explaining the math and heuristics I used. I'll also make the source code available then. However, there are better implementations I have in mind for this. Eventually, I should be able to emit an actual game that uses this work. No time, no time. Sigh.
Hex maps. Every wargamer knows them. Expert D&D players broke out of their dual-axis world during their Isle of Dread campaign to use these six-sided monsters. Here's a close up of one:
Hex maps, it is claimed, offer a more natural choice of direction for players who want to model open air terrain than the good old graph-paper, cartesian plane, 2 perpendicular axes maps. Graph paper maps do match up well to the four cardinal points of a compass, which is nice. Even better for programmers, these kinds of maps are easily represented with 2 dimensional arrays, a data structure that even grumpy old C supports!
And it's easy to find the neighboring squares in this kind of map. Consider a coordinate system whose origin (0,0) is the upper most left corner of a grid. Going to the right adds to the x coordinate, while going left subtracts from it. Similarly, going down to the bottom of the grid adds to the y coordinate, but going up lessens the y value. If the starting square can be thought of as being at the coordinate (x,y), then neighbors can be found at predictable offsets:
- Northern neighbor: (x, y-1)
- Eastern neighbor: (x+1, y)
- Southern neighbor: (x, y+1)
- Western neighbor: (x-1, y)
It's easy to extrapolate how you could find the diagonal squares too if you wanted to, but that's not desirable for wargamers. There are at most only four equidistant neighbors for any given square on the graph paper map. Sure, you could include the four squares pointed to by the diagonals of the square, but these are more distant from the center of the square than the neighbors at the cardinal points. And that makes wargamers mad! How can you accurately plot the range of your 18th century dutch cannon using only the cardinal compass points?! Hex maps offer up to six equidistant neighbors for any given hex. And that means more accurate pathing for cannon fire. Super!
But how do you model a map in which each element has six neighbors? You could make a linked list of structures that contain pointers to each of their neighbors. But that's not a particularly natural fit and would cause a great deal of programming overhead to populate and search. Is there a way to get all the benefits of a hex map into a 2D array implementation? Yes, there is, but you need to think about a hex map in a certain way.
Look at Figure 1 again, but try to see the map as three columns of hexes stacked on top of each other. Notice that although the first column (with hexes A and D) is horizontally aligned with the last column (with hexes C and F), the middle column (with hexes B, E and G) is a bit offset. This is an important detail that will affect our hex map model. We can order these hexes from left to right, top to bottom if we take into account the offset columns that happen every other time. That is, all the odd numbered columns are offset from the even numbered ones. We can make a horizontal row by starting with a left-most hex. Then, find the "northeastern" neighbor in the odd column. To find the horizontal neighbor of that hex, look to its "southeastern" side. In Figure 1, the first horizontal row is marked out as hexes A, B and C. Continue this alternation until you hit the last column of hexes.
Once we have a predictable sorting order for our hexes, we can use a 2D array to model this map. Why bother with a 2D array for a linear list? The answer is that a 2D array naturally maps to the way computer screens are represented in most drawing libraries. 2D arrays also fit easily into an SQL database system. So, even though the thing we want could be represented in a number of ways, there's a lot of convenience to be had in thinking of (x,y) coordinates. So given a hex coordinate, how can we determine its neighbors in a 2D array? The answer is: it depends!
It depends on whether the hex is in an odd numbered column or an even numbered one. Consult the two tables below. I look for neighboring hexes in a clockwise direction starting at the top of the hex (which would be the northern neighbor).
| Neighbor | Coordinate offset |
|---|---|
| North | (x, y-1) |
| Northeast | (x+1, y) |
| Southeast | (x+1, y+1) |
| South | (x, y+1) |
| Southwest | (x-1, y+1) |
| Northwest | (x-1, y) |
Figure 3: Calculating Neighbors for Hexes in Even Columns
| Neighbor | Coordinate offset |
|---|---|
| North | (x, y-1) |
| Northeast | (x+1, y-1) |
| Southeast | (x+1, y) |
| South | (x, y+1) |
| Southwest | (x-1, y) |
| Northwest | (x-1, y-1) |
Figure 4: Calculating Neighbors for Hexes in Odd Columns
And for the purposes of this discussion, column 0 counts as even. Does your brain hurt a little? Mine too. Let's make this more concrete.
Using Figure 1, let's create a 2D array that maps the hexes in the right order.
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | A (0,0) | B (1,0) | C (2,0) |
| 1 | D (0,1) | E (1,1) | F (2,1) |
| 2 | - | G (1,2) | - |
Figure 5: Mapping a Hex Map into a 2D Array
I have included the coordinates of the hex in this table for easier reference. Let's walk through our algorithm to find all the neighbors of each point.
| Hex | N | NE | SE | S | SW | NW |
|---|---|---|---|---|---|---|
| A (0,0) | (0,-1) | B (1,0) | E (1,1) | D (0,1) | (-1,1) | (-1,0) |
| B (1,0) | (1,-1) | (2,-1) | C (2,0) | E (1,1) | A (0,0) | (0,-1) |
| C (2,0) | (2,-1) | (3,0) | (3,1) | F (2,1) | E (1,1) | B (1,0) |
| D (0,1) | A (0,0) | E (1,1) | G (1,2) | (0,2) | (-1,2) | (-1,1) |
| E (1,1) | B (1,0) | C (2,0) | F (2,1) | G (1,2) | D (0,1) | A (0,0) |
| F (2,1) | C (2,0) | (3,1) | (3,2) | (2,2) | G (1,2) | E (1,1) |
| G (1,2) | E (1,1) | F (2,1) | (2,2) | (1,3) | (0,2) | D (0,1) |
Whew! That's quite a table of mind-numbing numbers! What this table is attempting to show is the neighboring hexes for each hex appearing in the left-hand column. Computations which result in a valid location are shown with that area's label. Computations that produce bum results are shown as bare coordinates with no label, just to give you some idea how the edge cases work.
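The two offset tables translate almost directly into code. Here is a small Python sketch that reproduces rows of the table above. The dictionary of labelled hexes mirrors Figure 5; the names are mine and this is only one way to organize it.

```python
# Offsets listed clockwise from north: N, NE, SE, S, SW, NW
EVEN_COLUMN_OFFSETS = [(0, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
ODD_COLUMN_OFFSETS = [(0, -1), (1, -1), (1, 0), (0, 1), (-1, 0), (-1, -1)]

# The seven hexes of Figure 5, keyed by (x, y)
HEXES = {
    (0, 0): "A", (1, 0): "B", (2, 0): "C",
    (0, 1): "D", (1, 1): "E", (2, 1): "F",
    (1, 2): "G",
}


def neighbors(x, y, hexes=HEXES):
    """Return the labels of the valid neighbors of (x, y), clockwise from north."""
    offsets = EVEN_COLUMN_OFFSETS if x % 2 == 0 else ODD_COLUMN_OFFSETS
    candidates = [(x + dx, y + dy) for dx, dy in offsets]
    return [hexes[c] for c in candidates if c in hexes]  # drop off-map results


print(neighbors(1, 1))  # ['B', 'C', 'F', 'G', 'D', 'A']
print(neighbors(0, 0))  # ['B', 'E', 'D']
```

Column 0 counts as even here because 0 % 2 is 0, which matches the convention used in the tables.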
Now that you can easily find the neighbors of any hex, you can implement any of the class of algorithms designed to calculate shortest path, distance, etc. Of particular note, A* pathfinding and shortest distance routines should fall out nicely once you wrap the neighbor calculations into some kind of function.
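As a taste of what falls out, here is a Python sketch that counts the fewest steps between two hexes. To keep it short it uses a plain breadth-first search rather than A*, and it assumes a fully populated rectangular map of a given width and height; all names are hypothetical.

```python
from collections import deque

# Offsets listed clockwise from north: N, NE, SE, S, SW, NW
EVEN_COLUMN_OFFSETS = [(0, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
ODD_COLUMN_OFFSETS = [(0, -1), (1, -1), (1, 0), (0, 1), (-1, 0), (-1, -1)]


def neighbors(x, y, width, height):
    """Yield the on-map neighbors of hex (x, y)."""
    offsets = EVEN_COLUMN_OFFSETS if x % 2 == 0 else ODD_COLUMN_OFFSETS
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield nx, ny


def hex_distance(start, goal, width, height):
    """Breadth-first search for the fewest hex-to-hex steps from start to goal."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return steps[current]
        for nxt in neighbors(current[0], current[1], width, height):
            if nxt not in steps:
                steps[nxt] = steps[current] + 1
                queue.append(nxt)
    return None  # goal is not on the map


print(hex_distance((0, 0), (2, 0), 3, 3))  # 2
print(hex_distance((0, 0), (2, 2), 3, 3))  # 3
```

Swapping the queue for a priority queue with a distance heuristic would turn this into A*, which pays off on larger maps with impassable terrain.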
Which is an exercise I leave for the reader. I realize that modeling hex maps is a solved problem, but I worked this stuff out for myself. Thinking of these sorts of things keeps me off the streets and high on life.
Peace out.
UPDATE: Other people are interested in hex maps too.
The silly D&D character generator I built does not print the character sheets any more. I'm trying to figure out why. It worked like a champ for a while.
UPDATE: I appear to have fixed it.
python -m pdb [script]
That is all.
This is my current hobby. It's giving me a much clearer understanding of how to actually use assembler to do common programming tasks. While there is little call for 6502 programmers these days, I feel like I can better understand general purpose CPU programming now.
This code is a moderate hack of the kernel code shown here.
;; Built with: http://www.atari2600.org/DASM/DASM22010b.zip
;; bin\DOS\DASm.exe playfield2.asm \
;; -IDASM\Machines\ATARI2600 \
;; -f3 -v5 -o..\ROMS\a.bin
;;
processor 6502
include "VCS.h"
include "macro.h"
PATTERN = $80 ; first byte of RAM
SHAPE = $55
BGCOLOR_ADDR = $81
BGCOLOR = $0A
BGTIMER_ADDR = $82
BGTIMER = 40
PLAYFIELD_COLOR = $0
TIMETOCHANGE = 20
SEG
ORG $F000
Reset
;; Zero out RAM, TIA
;; RAM - $80-$FF
;; TIA - 0 - $7F
ldx #0
lda #0
RAMLoop
sta 0,x
inx
bne RAMLoop
;; -------
;; Init playfield
;; -------
lda #SHAPE ; immediate: load the pattern value itself
sta PATTERN
lda #PLAYFIELD_COLOR
sta COLUPF
ldy #0 ; animation clock
lda #$10 ; Remember,
; high nibble
; is read
; backwards
sta PF0
lda #$80 ; Entire byte
; read backwards
sta PF2
lda #$01
sta CTRLPF
;; setup the bgcolor stuff
ldx #BGTIMER
stx BGTIMER_ADDR
ldx #BGCOLOR
stx BGCOLOR_ADDR
stx COLUBK
StartOfFrame
; Start of vblank processing
lda #0
sta VBLANK
lda #2
sta VSYNC
; 3 scanlines of VSYNCH
sta WSYNC
sta WSYNC
sta WSYNC
lda #0
sta VSYNC
; 37 lines of vertical blank
ldx #$DB ; $DB = 219 = 256-37
VertBlank
inx
sta WSYNC
bne VertBlank
;; Playfield animation
iny
cpy #TIMETOCHANGE
bne NOTYET
ldy #0 ; reset ani clock
;; bit shift the pattern
asl PATTERN
bne NOTYET
;; Reset pattern
lda #SHAPE ; immediate: load the pattern value itself
sta PATTERN
NOTYET
lda PATTERN
sta PF1
;; Change the color by adding
;; the aniclock tick to the
;; base color
clc
tya
adc #PLAYFIELD_COLOR ; immediate: add the base color value
sta COLUPF
;; 192 scanlines of picture
ldx #$C0
MainPicture
sta WSYNC ; strobe Wait for Sync
dex
bne MainPicture
;; Is it time to change
;; the BG COLOR?
dec BGTIMER_ADDR
bne LOAD_BGCOLOR
;; Yes - restore timer,
;; change color
ldx #BGTIMER
stx BGTIMER_ADDR
asl BGCOLOR_ADDR
bne LOAD_BGCOLOR
;; Reset BGCOLOR
ldx #BGCOLOR
stx BGCOLOR_ADDR
LOAD_BGCOLOR
ldx BGCOLOR_ADDR
stx COLUBK
;; Vertical Blanking
lda #%01000010
sta VBLANK
; 30 scanlines of overscan...
; $E2 = 226 = 256-30
ldx #$E2
OverScanLoop
inx
sta WSYNC
bne OverScanLoop
jmp StartOfFrame
ORG $FFFA
;; Interrupt Vectors
.word Reset ; NMI
.word Reset ; RESET
.word Reset ; IRQ
END
I'd like to make a playable game for the Atari 2600 some day. That would bring my childhood dreams to life.
The following is a follow-on to Jason Edelman's fine tutorial. (Also see listomatic.)
Here's what I want:
This is a pretty basic application of ID selectors and that nutty url() function in CSS. Even though you could view the source of this page to see how it's done, I'll just escape it here for the lazy:
<style type="text/css">
#nav {
width: 300px;
font-size: 12px;
line-height: 20px;
}
#nav UL {
list-style-type: none;
margin: 0;
padding: 0;
}
#nav LI {
width: 100px;
background: #DDD;
background-image: none;
list-style-type: none;
margin: 0;
padding: 0;
}
#nav A {
width: 100px;
display: block;
}
#nav A:hover {
color: #CC3;
background: #CFC;
background-image: url(img/arrows.gif);
background-repeat: no-repeat;
background-position: right;
}
</style>
<ul id="nav">
<li><a href="first.html">First item</a></li>
<li><a href="second.html">Second item</a></li>
</ul>
Notice that the width for the #nav A element and the #nav LI element should agree. I wish there were a more generic way of setting this size (I suppose some fancy javascript could do it), but since I intend to use this for navigational stuff, this is just fine for me.
This code was tested in IE and Firefox.
Suppose you are using the Perl module XML::RSS in the following way:
my $R = XML::RSS->new;
foreach my $url (@urls) {
my $content = wget($url);
$R->parse($content);
}
In older versions of the XML::RSS module, this code worked fine. However, if you have upgraded the module recently, you might have noticed the error message:
Modification of non-creatable array value attempted, subscript -1 at /usr/local/lib/perl5/site_perl/5.8.5/XML/RSS.pm line 792.
It is not the feed that has gone rancid on you, but the library. Try instantiating a RSS object within the loop like this:
foreach my $url (@urls) {
my $content = wget($url);
my $R = XML::RSS->new;
eval{$R->parse($content)};
if ($@) {
warn("Parse error: $@");
next;
}
}
I've added some "exception" handling for free in this example so that parse errors don't blow up your program.
I'm afraid I did not dive into XML::RSS.pm to figure out the problem, but if someone with that knowledge wishes to post below, I'm sure I won't be the only one who welcomes enlightenment.
This page contains a truly great hack to force XP's disk manager to offer disk mirroring. Microsoft frequently disables functionality to distinguish their products from one another. The difference between NT workstation and server was a few registry keys. Here, the hack is a little more complicated, but not for emacs or vi users.
Of course, you will want to back up the original files. And this hack voids warranties, etc...
Note that the change in dmconfig.dll isn't quite right. There should be only 4 null bytes after WINNT, not 7. This changes the file size and the offsets. Very bad.
UPDATE: Happy 1000th post, taskboy.
I woke up early this morning and started reading about A* pathfinding, which led me to implement a binary heap in Python.
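For anyone who hasn't met one, a binary heap is what A* typically uses to pull the cheapest open node quickly. What follows is not the Python version mentioned above, just a minimal sketch of a min-heap in JavaScript; the class and method names are made up for illustration.

```javascript
// Minimal binary min-heap (illustrative names; not the original Python code).
class MinHeap {
  constructor() {
    this.items = [];
  }

  size() {
    return this.items.length;
  }

  // Add a value, then sift it up until its parent is no larger.
  push(value) {
    const a = this.items;
    a.push(value);
    let i = a.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (a[parent] <= a[i]) break;
      [a[parent], a[i]] = [a[i], a[parent]];
      i = parent;
    }
  }

  // Remove and return the smallest value (undefined if the heap is empty).
  pop() {
    const a = this.items;
    if (a.length === 0) return undefined;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {
      // Move the last item to the root and sift it down.
      a[0] = last;
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = left + 1;
        let smallest = i;
        if (left < a.length && a[left] < a[smallest]) smallest = left;
        if (right < a.length && a[right] < a[smallest]) smallest = right;
        if (smallest === i) break;
        [a[smallest], a[i]] = [a[i], a[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

const heap = new MinHeap();
[5, 3, 8, 1, 9, 2].forEach((n) => heap.push(n));
const sorted = [];
while (heap.size() > 0) sorted.push(heap.pop());
console.log(sorted.join(" "));
```

Popping everything yields the values in ascending order, which is all a pathfinder's open list needs. Storing plain numbers keeps the sketch short; a real open list would probably store nodes and compare their costs.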
Once again, I've been playing with Python for game programming on Windows. I'm up to Atari 2600-level games now. Yay!
Compiling Python scripts into .EXE files is extraordinarily easy, which makes it possible to ship a Python game to a machine without Python installed. This is a huge advantage over Perl. Even the dependent libraries used by called modules are bundled easily.
I'll start posting some of my Python work as soon as I create a modestly interesting game.
UPDATE: For those interested in 30 seconds of Atari-like fun, I present Stubby Falls Down (3MB Windows Installer).
From Gamasutra (registration required):
«When I first started writing commercial game code, my code was liberally littered with comments, and I couldn't imagine any drawbacks to this. As time passed, I noticed something odd: the code and the comments grew increasingly out of sync, and I found that wrong comments cost me more time than correct comments saved me. The cutting and pasting, and late night alterations that happen when the pressure's on all meant that the code changed while the comments didn't.»
I couldn't be more in sync with the sentiments of this article. Copious comments can breed laziness. Debug code! If the code is hard to understand, rewrite it!
Because I can't do anything original, I copied Sean Burke and pulled my old posts from use.perl.org into taskboy, complete with the right post dates.
Now Taskboy is the COMPLETE REPOSITORY FOR WORLD KNOWLEDGE about all things Joe Johnston-ish.
Perl now has three implementations of XML-RPC: Frontier::RPC2, SOAP::Lite and RPC::XML. Each is interesting and somewhat broken in its own way.
Frontier::RPC2 0.06 has issues with Boolean, iso8601 and Base64 because these classes have a bug in their constructors.
Given:
sub new {
my $type = shift;
my $value = shift;
return bless \$value, $type;
}
Instead of checking to see if $type is a reference, this code blesses. If a reference is passed in, the blessed class is something like "Frontier::RPC2::Base64=SCALAR(0x65432)". This breaks code later that encodes/decodes these values into XML. Perltoot has the solution to this problem: take the class name with ref($type) || $type before blessing.
RPC::XML is an interesting module in that it brings type checking to Perl, in a sense. When creating an XML-RPC server, each remote procedure has to declare the arguments it accepts along with its return type. This is called a signature. Although type checking is unPerlish, it is very helpful when dealing with other languages that are subPerl. More docs on signatures would be great.
Finally, SOAP::Lite brings its own bad self to the XML-RPC party. Although in my testing it works with all the XML-RPC datatypes, the programming interface is similar to SOAP::Lite (not surprisingly). It is a style which takes a bit of getting used to.
The good news is that, at least from a client level, all these modules seemed to be compatible. That is, a Frontier::Client can talk to an XMLRPC::Lite server.
More to come.
I've been playing with Image::Magick lately. I started with photomosaics, which require a lot of CPU power and time. Although I had some success at creating them, I probably need to use larger source JPGs for a better mosaic.
The next thing I did was create a montage of graphics pulled from Rich Site Summary files. Check out these at Taskboy. Unfortunately, I don't have a general solution for creating these yet. To make them interesting, they should follow the links mentioned in each RSS item. Oh well. It's a fun toy anyway.
Let's all say it together: XML::Parser Sucks! There, that was cleansing.
After much prodding of my XML buddies (hi jmac!) and an evil notion of using goto (thankfully Perl doesn't let you jump into the middle of a function), I came across a seemingly little used XML::Parser function, parse_start, which returns a new XML::Parser::ExpatNB object (with oh so little documentation) that does EXACTLY WHAT I NEED! I need a parser that parses a stream in increments.
Consider how useful this is for dealing with XML messages coming across the network that might be f'ing HUGE! This parsing method will at least give me an opportunity to chunk the data into smaller bits (save for the pathological 45TB between a single pair of tags).
So, here's a very goofy example of how to work with this bad boy. I'll be looking to shove this into Frontier::RPC2 in a most Eee-VEIL way. ;-)
use strict;
use warnings;
use XML::Parser;
my $p = XML::Parser->new(
Style => 'My::Pkg',
);
print "Reading from __DATA__\n";
my $data; # A place for my text data
# Don't be fooled: it's an
# object constructor
my $nb_p = $p->parse_start(data => \$data);
while(my $l = <DATA>){
chomp($l);
$nb_p->parse_more($l);
if(my $s = ${$nb_p->{data}}){
print "Back at the range, I got $s\n";
}
}
$nb_p->parse_done;
package My::Pkg;
sub Init {
my($expat) = @_;
print "Hello!\n";
}
sub Start {
my($expat, $tag, %attrs) = @_;
${$expat->{data}} = undef;
print "Start: $tag\n";
}
sub Char {
my($expat, $text) = @_;
${$expat->{data}} = undef;
return if $text =~ /^\s*$/;
$expat->{char_bag} = $text;
}
sub End {
my($expat, $tag) = @_;
print "End: $tag\n";
${$expat->{data}} =
$expat->{char_bag};
# clean up
$expat->{char_bag} = '';
return;
}
__END__
<?xml version="1.0" ?>
<a>
<b>
<c>
<d>fiddlesticks</d>
</c>
</b>
</a>
I confess that I don't use the '-w' flag much in Perl except when I do syntax checking like:
$ perl -wc [script]
For the most part, the only warning perl issues during runtime for my programs is 'Use of uninitialized value ...', which is a pretty lame warning since Perl defines newly created variables to be, er, undefined. In C, you sure as heck don't want to be counting on any uninitialized variable values. Perl does the right thing, however, and that's why we use it.
Today, I found a Perl "feature" that I nearly reported as a bug. Take a look at this code:
use strict;
bar("value");
print "end\n";
sub bar {
for ($_[0]) {
/[a-z]/ && do {
print "Plunging into foo()\n";
foo();
print "You got me!\n";
return 1;
};
}
print "fall through\n";
return 1;
}
sub foo {
my $ans = '';
print "type 'q' to quit\n";
do{
last if lc (substr($ans, 0, 1)) eq 'q';
print "\nstill in loop\n";
} while( $ans = <> );
return;
}
Earlier today, I would have expected this program to print:
Plunging into foo()
type 'q' to quit
You got me
end
This expectation would have gone unfulfilled, since the real output is more like:
Plunging into foo()
type 'q' to quit
still in loop
fall through
end
What's going on here? It looks like the 'last' in foo() is jumping back to the loop in bar(). In fact, this is the case and -w provides some insight:
Exiting subroutine via last at ...
Kudos to p5p for adding this warning! I suspect the problem lies with the infrequently-seen-in-Perl-but-useful-Pascal construction 'do{}while()'. Since I haven't run into this behavior before and I rarely use 'do{}while(),' I'm pointing fingers at it. Still this behavior seems a little bold to me. What do you think?
Here's the deal: you're on win32, you need to fork() and you're using Perl. Since version 5.6, win32 Perl has had fork() emulation. That's good. It's emulation (accomplished at the interpreter level with threading), so standard Unix tricks like using a signal handler to reap children and control the number of child processes don't work. That's bad.
Here then is my Win32 Perl recipe for the week: Limiting forked child "processes" under Win32 Perl (versions 5.6 and higher).
use strict;
use POSIX ":sys_wait_h";
use constant MAX_KIDS => 4;
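# Declare these separately: in a list assignment the hash would swallow
# the 0 meant for $quit and end up with a bogus key.
my %children;
my $quit = 0;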
print "[$$] Entering main loop\n";
while (!$quit) {
reap(\%children);
if ((keys %children) <= MAX_KIDS) {
print "[$$] Forking child\n";
if (my $pid = fork()) {
$children{$pid} = 1;
} else {
do_child();
exit;
}
} else {
$quit = 1;
}
sleep(1);
}
# reap existing kids
while (keys %children) {
reap(\%children);
sleep(1);
}
print "All children reaped\n";
#--------
# sub
#--------
sub reap {
my ($kids) = @_;
for (keys %{$kids}) {
print "[$$]child '$_' reapable?\n";
next if waitpid($_,WNOHANG()) != -1;
print "[$$]child '$_' reaped\n";
delete $kids->{$_};
}
}
sub do_child {
sleep(1) for 0..10;
print "[$$] done\n";
}
Note that you can do other work in the main reaping loop.
The main loop is really the service state loop for a Win32 service. Also note that reaping the child "processes" isn't strictly necessary. Not only are the children not real processes (they are interpreter threads), but also no zombies can be created if the parent process exits without calling wait(). All child "processes" will be terminated with the parent. So you've got that going for you.
See David Roth's web site, books or articles for more excellent details on the devilish art of creating Win32 services with Perl.
[Original use.perl.org post and comments. Minor cleanup on 11/30/2007.]
Since use.perl.org has become my de facto backup solution, I now post the scripts I use to blog from winders. These are modified versions of the scripts I mentioned in a use.perl.org article published a while ago.
The emacs file:
(defvar prog
"C:/perl/bin/perl.exe F:/blog/use_perl_blog.pl"
"use_perl_journal: A SOAP client for use.perl journaling"
)
(defun edit-entry ()
"Add an entry or edit an existing one"
(interactive)
(setq cmd (concat prog " edit"))
(widen)
(shell-command-on-region (point-min) (point-max) cmd)
)
(defun get-entry (n)
"Get journal entry from use.perl.org"
(interactive "sJournal ID: ")
(setq buffer (generate-new-buffer "*use_perl_journal*"))
(switch-to-buffer buffer)
(setq cmd (concat prog (concat " -i " (concat n " get"))))
(shell-command-on-region (point-min) (point-max)
cmd 1 nil nil)
)
(defun list-entries (uid limit)
"Get journal entries"
(interactive "sUser ID: \nsLimit: ")
(setq buffer (generate-new-buffer
"*use_perl:list_entries*"))
(switch-to-buffer buffer)
(setq cmd (concat prog (concat " -l " (concat
limit " -i " (concat uid " list")))))
(shell-command-on-region (point-min) (point-max)
cmd 1 nil nil)
)
(defun delete-entry (jid)
"Delete journal entry"
(interactive "sEntry ID: ")
(setq cmd (concat prog (concat " -i " (concat jid
(concat " delete")))))
(shell-command-on-region (point-min) (point-max)
cmd 1 nil nil)
)
;; don't use tabs
(setq-default indent-tabs-mode nil)
(global-set-key "\C-xtl" 'list-entries)
(global-set-key "\C-xtg" 'get-entry)
(global-set-key "\C-xts" 'edit-entry)
(global-set-key "\C-xtm" 'edit-entry)
(global-set-key "\C-xtd" 'delete-entry)
The perl script:
# -*-cperl-*-
# A SOAP client to post USE.PERL.ORG journal entries
use strict;
use HTTP::Cookies;
use SOAP::Lite;
use File::Basename;
use Digest::MD5 'md5_hex';
use Data::Dumper;
use Getopt::Std;
use constant DEBUG => 0;
use constant UID => -1; # your UID here
use constant PW => 's3cr3t'; # your pw here
use constant URI => 'http://use.perl.org/Slash/Journal/SOAP';
use constant PROXY => 'http://use.perl.org/journal.pl';
my $Dispatch = {
'get' => \&get_entry,
'list' => \&list_entries,
'add' => \&add_entry,
'edit' => \&edit_entry,
'delete' => \&delete_entry,
};
my $opts = {};
getopts('h?vi:u:l:', $opts);
my $action = pop @ARGV;
unless ($action) {
print usage(), "\n";
exit;
}
my $soap_client = make_soap();
my $exit_value = 0;
if (defined $Dispatch->{$action}) {
$exit_value = !$Dispatch->{$action}->($opts, $soap_client);
} else {
warn("Unknown action '$action'");
print usage();
$exit_value = 1;
}
exit $exit_value;
#------
# subs
#------
sub usage {
my $base = basename($0);
return qq[
$base - manage use.perl.org blog
USAGE:
$base [options] [actions]
OPTIONS:
? print this screen
h print this screen
v verbose mode
i entry ID
l limit the number of listed entries to this number
u use.perl.org user ID
ACTIONS:
add
delete
edit
get
list
Input files take the following form:
id:
subject:
body:
];
}
sub make_soap {
my $cookie = HTTP::Cookies->new;
$cookie->set_cookie( 0,
user => bakeUserCookie(&UID, &PW),
"/",
"use.perl.org",
);
return SOAP::Lite->uri(URI)->proxy(PROXY,
cookie_jar => $cookie);
}
sub add_entry {
my ($opts, $c, $in) = @_;
$in ||= parse_input();
my $ret;
if ($in->{subject} && $in->{body}) {
if ($in->{id}) {
return edit_entry(@_, $in);
} else {
$ret = $c->add_entry($in->{subject}, $in->{body});
}
} else {
$ret = $c->add_entry("Random thought #$$", $in->{all});
}
return if had_transport_error($ret);
print "add_entry got articleID: ", $ret->result, "\n";
return 1;
}
sub delete_entry {
my ($opts, $c) = @_;
my ($id) = $opts->{i} ||
die "delete requires a journal ID\n";
my $ret = $c->delete_entry($id);
return if had_transport_error($ret);
print "Deleted article ID '$id'\n";
return 1;
}
sub edit_entry {
my ($opts, $c, $in) = @_;
# add_entry may have already read STDIN
$in ||= parse_input();
unless ($in->{id}) {
# warn("No article ID\n");
return add_entry($opts, $c, $in);
}
my $ret = $c->modify_entry($in->{id},
subject => $in->{subject},
body => $in->{body},
);
return if had_transport_error($ret);
print "Updated article $in->{id}\n";
return 1;
}
sub get_entry {
my ($opts, $c) = @_;
my $id = $opts->{i}
|| die "get_entry requires a journal ID\n";
my $ret = $c->get_entry($id);
return if had_transport_error($ret);
if (my $hr = $ret->result) {
while (my ($k,$v) = each %{$hr}) {
print "$k: $v\n";
}
} else {
warn ("Couldn't fetch journal entry '$id'\n");
return;
}
return 1;
}
sub list_entries {
my ($opts, $c) = @_;
my ($uid, $limit) = (($opts->{u} || &UID), $opts->{l});
my $ret = $c->get_entries($uid, $limit);
return if had_transport_error($ret);
my $ar = $ret->result;
for my $row (@{$ar}) {
while (my ($k,$v) = each %{$row}) {
print "$k: $v\n";
}
print "\n";
}
return 1;
}
sub parse_input {
my %rec;
my $last_field = 'all';
while (defined ($_ = <STDIN>)) {
chomp($_);
if (/^(\w+):\s*(.*)/) {
$last_field = $1;
$rec{$last_field} = $2;
} else {
$rec{$last_field} .= "\n$_";
}
}
# callers expect a hash reference
return \%rec;
}
sub bakeUserCookie {
my ($uid, $pw) = @_;
my $c = $uid . "::" . md5_hex($pw);
$c =~ s/(.)/sprintf("%%%02x", ord($1))/ge;
$c =~ s/%/%25/g;
return $c;
}
sub had_transport_error {
my ($ret) = @_;
if ($ret->fault) {
warn ("Oops: ", $ret->faultstring, "\n");
return 1;
}
return;
}
To post:
- M-x load-file
- new buffer with "id:\nsubject:\nbody:";
- add blog content to buffer
- C-x t s to publish blog to use.perl
UPDATE: It's nice to see that this script still works in 2009, four years after I wrote it. Must have done something right.
Whether pornography is psychologically damaging, socially corrosive or just plain nasty, it is clear that at least for men, it is a strong attractor and motivator. It is time we left behind antiquated nineteenth century morality and boldly embraced this core driver of male behavior to fill the more serious deficit of quality programmers in the U.S.A.
To this end, I submit my own work, a perl script that fetches the publicly available gallery at IShotMyself.com. The real benefit of this script (well, at least secondarily) is that it is my first concerted use of Andy Lester's WWW::Mechanize. Because of the positive reinforcement generated by this script, I am more likely to use this module in other, less pornographic work -- and so I move the Wheels of Industry forward! Adam Smith's Invisible Hand of Capitalism guides me!
Briefly, the script fetches the front page and looks for a link labeled "FOLIO [\d+]". This page is then fetched and links that start with "javascript:" are culled. By adding on "&m=img" to these links, it is possible to get to the actual picture in question. These images are stored in a directory on my Winders box, hence the funny path names.
Based on the impressive success of this experiment, I recommend that we teach children to read by giving them digests of Penthouse Forum. We must always be thinking of the children!
I further recommend that we ease the suffering of the impoverished by eating their babies.
More Good Ideas (™) to come...
use strict;
use WWW::Mechanize;
my $base_url = qq[http://www.ishotmyself.com];
my $main_page= qq[/public/main.php];
my $dest_dir = "ishotmyself";
print "Fetching $base_url$main_page\n";
my $B = WWW::Mechanize->new;
$B->get(qq[$base_url$main_page]);
unless ($B->success()) {
die "Couldn't fetch main page: ", $B->status();
}
print "Main page fetched\n";
print "Dumping folio links\n";
my $todays_gallery;
my $gallery = "unknown";
foreach my $l ($B->links()) {
next if $l->text() !~ /folio \[\d+\]/i;
$todays_gallery = $l;
($gallery = $l->url) =~ /([^=]+)$/;
$gallery = $1;
print "\t", $l->text(), " : ", $l->url(), "\n";
}
# find today's folio name
print "Fetching ", $todays_gallery->url, "\n";
$B->get($todays_gallery->url);
unless ($B->success()) {
print "Couldn't fetch '" . $todays_gallery->url(),
"' : ", $B->status(), "\n";
print $B->content, "\n";
}
#open OUT, ">out.txt";
#print OUT $B->content, "\n";
#close OUT;
print "Dumping folio links\n";
my @found;
foreach my $l ($B->links()) {
next if $l->url !~ /^javascript:popupLandscape/;
print "\t", $l->text(), " : ", $l->url(), "\n";
# unjavascript!
(my $url = $l->url()) =~ /\('([^']+)'\)/;
if ($1) {
push @found, "$1&m=img";
} else {
warn "Couldn't unjavascriptify!\n";
}
}
unless (-d $dest_dir) {
mkdir $dest_dir;
}
$dest_dir .= "\\$gallery";
unless (-d $dest_dir) {
print "Creating $dest_dir\n";
mkdir $dest_dir or warn "mkdir $dest_dir failed: $!";
}
print "chdir $dest_dir\n";
chdir $dest_dir;
print "Fetching public images\n";
my $cnt = 1;
for my $l (@found) {
$B->get($l);
unless ($B->success()) {
warn "picture fetch failed: ", $B->status, "\n";
next;
}
my $imgfile = sprintf("${gallery}_%03d.jpg", $cnt++);
if (-e $imgfile) {
print "\tOVERWRITING $imgfile\n";
}
if (open(OUT, ">$imgfile")) {
binmode(OUT);
print "\tWriting $imgfile\n";
print OUT $B->content();
close OUT;
} else {
warn("open $imgfile failed : $!");
}
}
print "done\n";
In my continuing quest to implement hot technology from the year 2000, I've created an RSS aggregator.
This application is more than a rehash of Rael Dornfest's Meerkat, which I miss.
I'm very fond of Firefox's Live Bookmarks, but I don't always have my computer with me. This way, I can get to the important feeds from any machine.
All VC funding offers should be directed to the email at the bottom of this page. The gravy train is pulling out of the station!
I'm also working on a sort of Wiki for Taskboy, but it's not quite ready for prime time yet.
Of late, I've had to think about mechanisms to protect web forms from getting spammed. For reasons that aren't very clear to me, spammers are writing bots to randomly fill in forms and submit them, as if that will really boost sales of the advertised (and often misspelled) product.
I apologize in advance for the scattered and disorganized state of this entry. I wanted to say something about what I've been working on.
You can look at a working version of the solution described below here.
One solution I've seen comes from Blogger, which can require users to type in the letters that appear in an image. The letters in the image are distorted and there is often some background noise. This is to throw off optical character recognition routines that spammers might employ to figure out what letters appear in the image. The distortions and noise thwart most OCR attempts.
I like this approach, but since I'm not very good with image manipulation, I needed another way to do something similar. The answer came from some reading I did years ago when I wanted to make video games for a living. In particular, I recalled that console fonts stored in PC ROM were typically composed of 8x8 matrices. I then thought that if I could recreate these matrices, I could represent these patterns as alternating colors in an HTML table.
Of course, it wouldn't be especially hard to create routines that read in these tables of characters and produce the original character, but that's a refinement for another day.
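To make the font trick concrete, here is a rough sketch in JavaScript of turning one 8x8 glyph into an HTML table. It is an illustration under my own assumptions, not the code behind the demo: the bitmap for "A" is hand-drawn, and the function name, colors and cell size are arbitrary.

```javascript
// One 8x8 glyph, one byte per row, most significant bit = leftmost pixel.
// This bitmap for "A" is hand-drawn for the example, not copied from a real ROM.
const GLYPH_A = [0x18, 0x24, 0x42, 0x42, 0x7e, 0x42, 0x42, 0x00];

// Render a glyph as an HTML table with one cell per pixel.
// glyphToTable, the colors and the 4px cell size are all illustrative choices.
function glyphToTable(rows, on = "#000", off = "#fff") {
  let html = '<table cellspacing="0" cellpadding="0">';
  for (const row of rows) {
    html += "<tr>";
    for (let bit = 7; bit >= 0; bit--) {
      // Test each bit from left to right and pick the matching color.
      const color = (row >> bit) & 1 ? on : off;
      html += `<td style="width:4px;height:4px;background:${color}"></td>`;
    }
    html += "</tr>";
  }
  return html + "</table>";
}

const tableHtml = glyphToTable(GLYPH_A);
console.log(tableHtml);
```

A whole secret word would just be several of these tables side by side, one per character.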
Once I could translate characters into an HTML table, it was an easy thing to create a random series of characters to feed to this routine. To prevent simple packet sniffing of the secret word, I needed a way to pass something back to the client that could be used to verify the word. Since sending the password in clear text was obviously a non-starter, I needed some kind of hash. The most natural hashing algorithm to use for what amounts to passwords is the DES-based crypt(), which has been used by UNIX systems to protect their authentication system for years (although it's not really as unbreakable as PKI schemes).
The next problem was to create a mechanism where an HTML page could request this bundle of HTML tables that represent a word. This is where the oft-talked about Ajax techniques come into play. Ajax is simply an asynchronous RPC mechanism that's built with javascript and some other server-side programming technology. It took me a good bit of time to work through the most common gotchas that prevent reliable transportation of the Ajax RPC messages. Popular browsers have different DOM parsers that treat large (> 4096 bytes) data differently.
There are two major problems with the approach described here. The first is that it would be easy to write an OCR routine to read the HTMLized font. The second problem is that anyone can bypass the security by creating their own DES hash of an arbitrary string and passing that string in clear text to the ajax server.
Let me tackle the second problem first. To ensure that the hash-value received by the ajax server can be trusted, two methods can be used. The first is to rewrite the secret word checker to use a session-based system. The user would be passed a session ID from the server. The server would remember what secret phrase it sent to the user. The user sends back their guess in clear text. The validator retrieves the secret from a backing store and makes the check.
I don't much like this stateful solution, although I believe it is superior to the original implementation. If you've got an app that already has sessions, perhaps this is a natural fit.
Another solution to this problem is to use a Public Key Infrastructure mechanism to encode the secret into a hexadecimal string. When the user returns this encoded secret along with the clear-text guess, the server decrypts the secret using a private key. If the encoded secret has been tampered with, the decryption will fail. I would recommend this solution as a refinement to the anti-spamming mechanism I've described.
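The public-key variant can be sketched with Node's built-in crypto module. This is a sketch under stated assumptions, not what the demo runs: the function names encodeSecret and checkGuess are invented, the key pair is generated on the spot instead of being loaded from disk, and RSA is simply the first algorithm that came to hand.

```javascript
const crypto = require("crypto");

// Throwaway key pair for the demo; a real server would load a persistent key.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", {
  modulusLength: 2048,
});

// Encrypt the secret word and hand it to the client as a hex string.
// (encodeSecret is a hypothetical name.)
function encodeSecret(secret) {
  return crypto
    .publicEncrypt(publicKey, Buffer.from(secret, "utf8"))
    .toString("hex");
}

// Decrypt the token and compare it with the clear-text guess.
// A tampered token fails to decrypt, which counts as a failed check.
// (checkGuess is a hypothetical name.)
function checkGuess(tokenHex, guess) {
  try {
    const secret = crypto.privateDecrypt(
      privateKey,
      Buffer.from(tokenHex, "hex")
    );
    return secret.toString("utf8") === guess;
  } catch (err) {
    return false;
  }
}

const token = encodeSecret("xkqzv");
const tampered = (token[0] === "0" ? "1" : "0") + token.slice(1);
console.log(checkGuess(token, "xkqzv"));
console.log(checkGuess(token, "wrong"));
console.log(checkGuess(tampered, "xkqzv"));
```

The server stays stateless, which is the point. Note that nothing here stops a bot from replaying one token it has already solved; putting an expiry time inside the encrypted payload would be one way to limit that.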
The second and more difficult problem with my mechanism is that it would be
trivial to write OCR for the HTML emitted. It's true that the spam bot would
have to speak javascript to get to this data at all, since the HTML is
injected into the static web page via innerHTML. But I don't
think that's obscure enough to baffle spammers. What would be
needed is some kind of "noise" in the HTML to make OCR difficult. Other
than adding spacing, altering capitalization and adding weird HTML attributes,
there's nothing that a solid HTML parser and a good coder couldn't get around.
All of this has made me appreciate just how good the human brain is at finding meaning in chaos. Digital creations just aren't up to that task yet.
Here's another boring programming note to myself. Please note that I had to break lines of code for formatting reasons. Please use this syntax as a guideline. Don't try to cut and paste it.
Ajax is all the rage now. To me, it's just web services done with Javascript and Javascript blows bigger chunks than Java (and we all know how I feel about that).
In playing with the simple code on Mozilla's site, I realized that the browser complicates using Ajax as a simple RPC mechanism. The reason? Browsers are inherently multithreaded applications and so all the RPC Ajax calls must be asynchronous. That means that you can't directly port something like XML-RPC or SOAP (both of which are more or less synchronous) to Ajax. Browsers run javascript on events. When the Ajax response finally comes back to the browser, a callback function is needed to continue processing the response.
That's not so bad, but what if you really want to do this:
<script>
var uid = ↵
RPCRequest('server.php?action=getUID&username=jjohn');
</script>
The problem is that RPCRequest returns as soon as the request is made, not when the response comes back.
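The ordering problem is easier to see with the browser taken out of the picture. In this sketch the pending array stands in for the browser's event queue and this RPCRequest is a made-up stand-in that takes a callback; neither is real Ajax code.

```javascript
// Stand-in for the browser's event queue: callbacks wait here until "later".
const pending = [];

// Hypothetical RPCRequest: queues the work and returns immediately,
// the way an asynchronous XMLHttpRequest does.
function RPCRequest(url, callback) {
  pending.push(() => callback("uid-for:" + url));
  return undefined; // nothing useful to return yet
}

const log = [];
const uid = RPCRequest("server.php?action=getUID&username=jjohn", (value) => {
  log.push("callback got " + value);
});
log.push("RPCRequest returned " + uid);

// Later, the "browser" delivers the response to the waiting callback.
pending.forEach((deliver) => deliver());
console.log(log.join("\n"));
```

The assignment to uid finishes before any response exists, so the only place the value ever shows up is inside the callback.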
Here's the typical Ajax code (via the Mozilla article) that makes the request and parses the response:
// file: ajax.js
// make the request, register the callback
function makeRequest(url, cb) {
var http_request = false;
if (window.XMLHttpRequest) { // Mozilla, Safari, ...
http_request = new XMLHttpRequest();
if (http_request.overrideMimeType) {
http_request.overrideMimeType('text/xml');
}
} else if (window.ActiveXObject) { // IE
try {
http_request = new ActiveXObject("Microsoft.XMLHTTP");
} catch (e) {
}
}
if (!http_request) {
alert("No HTTP object!");
return false;
}
http_request.onreadystatechange = function() {
alertContents(http_request, cb); };
http_request.open("GET", url, true);
http_request.send(null);
return http_request;
}
// the callback to handle the response from the server
function alertContents(http_request, cb) {
try {
if (http_request.readyState == 4) {
if (http_request.status == 200) {
// parse out response
var resp = http_request.responseXML;
var val = ↵
resp.getElementsByTagName("response"). ↵
item(0).firstChild.data;
cb(val);
} else {
alert('There was a problem with the request.');
}
}
} catch (e) {
alert("Exception: " + e.message);
}
}
In this code, I expect a rather meager XML response that has only a response tag.
I got the notion that what I should do with the response handler is send it a callback from the caller to handle the parsed response. Something like the following:
<!-- file: test.html -->
<html>
<head>
<script type="text/javascript"
language="JavaScript" src="ajax.js"></script>
</head>
<body>
<span
style="cursor: pointer; text-decoration: underline"
onclick="makeRequest('server.php', ↵
function(x){ alert(x) })">Make a request</span>
</body>
</html>
Again, this works fine when the anonymous function has a globally defined
function like alert(). But what if you want to define a function
in the caller's HTML page and have it called from code in the ajax.js page?
I haven't figured that out yet (but see below).
The problem is that the scope of functions defined on the calling page is not visible (it seems) to the functions in the ajax.js page. I think I could get around this using fully-qualified DOM names or something nutty like that.
I want to use objects, but of course, Javascript has classless objects which don't help here. I guess I could define the callback behavior for each call, but that seems dumb.
I need to sleep on this a bit.
UPDATE: What a difference eight hours of sleep makes!
After digging around the javascript books I have, I came up with this more object-oriented version of the ajax code that removes the callback function in favor of a user-overridden method. That nicely divides the generic RPC tasks from the caller-specific ones. The code appears to run in both Firefox and Internet Explorer on my XP box, but I'm sure it will break in Opera and Safari.
Let's look at the new ajax.js file. This defines a "class"
called RPCClient.
/* Much of this is from
* http://developer.mozilla.org/en/docs/AJAX:Getting_Started
*/
function makeRPCRequest(url) {
if (window.XMLHttpRequest) { // Mozilla, Safari, ...
this.http_request = new XMLHttpRequest();
if (this.http_request.overrideMimeType) {
this.http_request.overrideMimeType('text/xml');
}
} else if (window.ActiveXObject) { // IE
try {
this.http_request = ↵
new ActiveXObject("Microsoft.XMLHTTP");
} catch (e) {
this.errstr = e.message;
}
}
if (!this.http_request) {
alert("No HTTP object!");
return false;
}
// Can't use 'this' in the callback.
// There it refers to http_request instead.
var thisObj = this;
this.http_request.onreadystatechange = function() {
thisObj.watchResponse();
};
this.http_request.open("GET", url, true);
this.http_request.send(null);
}
function parseRPCResponse() {
try {
if (this.http_request.readyState == 4) {
if (this.http_request.status == 200) {
// parse out response
var resp = this.http_request.responseXML;
var val = ↵
resp.getElementsByTagName("response") ↵
.item(0).firstChild.data;
this.handler(val);
} else {
alert('There was a problem with the request.');
}
}
} catch (e) {
alert("Exception: " + e.message);
}
}
// define the RPCClient object
function RPCClient () {
this.errstr = false;
this.http_request = false;
}
// navigator 3 bug workaround for prototype
new RPCClient();
RPCClient.prototype.send = makeRPCRequest;
RPCClient.prototype.watchResponse = parseRPCResponse;
RPCClient.prototype.handler = function (x) ↵
{ alert("Default handler.\n" + x) };
The idea of this code is that the user overrides the handler
method to do something useful with the value that comes back from server.
Most of this code just replaces procedural elements with OO mechanisms. However, notice this stupid looking assignment:
var thisObj = this;
I had to do this because of scoping behavior that, as a Perl coder, I find
nutty and wrong. The trouble comes when I want to use the current object in
the http_request callback/method onreadystatechange. Like most
method declarations, I use an anonymous function to override the default
action. When I try to use the this in the callback/method, it
then refers to http_request object, not the RPCClient object.
Ugh.
I guess JavaScript has to work this way, since it doesn't have proper
namespaces or Perl closure rules. How can I tell JavaScript which object
this refers to? The answer is not to use this, but
a reference to the current RPCClient object.
This is a pretty subtle point and it's easy to forget. Of course, if you only use JavaScript, it probably makes perfect sense to you.
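The behavior can be reproduced without a browser. In the sketch below, FakeRequest is a made-up stand-in for XMLHttpRequest and Client for RPCClient. The second half uses Function.prototype.bind, which does the same job as thisObj; it arrived with ECMAScript 5, so I wouldn't count on it in the browsers this post targets.

```javascript
// Stand-in for XMLHttpRequest: it invokes its handler as a method of itself.
function FakeRequest() {
  this.name = "http_request";
  this.onreadystatechange = null;
}
FakeRequest.prototype.fire = function () {
  this.onreadystatechange();
};

// Stand-in for RPCClient.
function Client() {
  this.name = "RPCClient";
  this.seen = [];
}

Client.prototype.send = function () {
  const req = new FakeRequest();
  const thisObj = this;

  // Plain function: `this` is whatever object the function is called on,
  // so inside the handler it is the request, not the client.
  req.onreadystatechange = function () {
    thisObj.seen.push("this=" + this.name + ", thisObj=" + thisObj.name);
  };
  req.fire();

  // bind() pins `this` ahead of time, so no extra variable is needed.
  req.onreadystatechange = function () {
    this.seen.push("bound this=" + this.name);
  }.bind(this);
  req.fire();
};

const client = new Client();
client.send();
console.log(client.seen.join("\n"));
```

Either way the idea is the same: the handler has to capture the client object when the handler is created, because `this` is only decided at the moment of the call.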
Time for the calling HTML page.
<!-- file: test.html -->
<html><head>
<script type="text/javascript" language="JavaScript1.2"
src="ajax.js"></script>
<style type="text/css">
.fakeURL { text-decoration: underline; cursor: pointer;}
</style>
<script type="text/javascript" language="JavaScript1.2">
var rpc = new RPCClient();
rpc.handler = function(x) { alert("From the caller: " + x)};
function dump (obj) {
document.write("<div><p><b>Object properties</b></p><dl>\n");
// walk the object that was passed in, not the global rpc
for (var p in obj) {
document.write("<dt>property: <code>" + p
+ "</code></dt>" + "<dd>type: <code>"
+ typeof(obj[p]) + "</code></dd>\n");
}
document.write("</dl></div>\n");
}
</script>
</head>
<body>
<script type="text/javascript" language="JavaScript1.2">
//dump(rpc);
</script>
<span class="fakeURL"
onClick="rpc.send('ajax.php')">Make a request</span>
</body>
</html>
There's a bit of debugging code in there that dumps the properties of
the RPCClient object. I left that code here for future debugging needs.
A new RPCClient object is instantiated as you'd expect. The default
handler is overridden with a trivial method.
Now the RPC mechanism looks a little cleaner! It's still asynchronous, but all the caller side code can be stored with the caller.
For the record, here's the PHP server, which just echoes the values passed to it:
<?php
header("Content-Type: application/xml");
header("Cache-Control: no-cache");
print "<?xml version='1.0'?>";
?>
<response>
<? foreach($_GET as $k => $v) {
print "$k => $v\n";
}
?>
</response>
This PHP is pretty trivial, but there are a few gotchas.
You have to set the MIME type correctly and disable browser caching.
You also have to emit XML, which is no big thing, except that the XML declaration looks like PHP code to PHP, so I had to be a little crafty in emitting it.
Finally, I just echo the key-value pairs I receive. Simple!
The next iteration of the code will pass the response back as a base64 encoded string. I don't need to pass complex data to the browser, but I may want to pass HTML. From my years of XML-RPC hacking, the best way to do this is by encoding the HTML. Otherwise, you'll go mad.
Have I mentioned that XML sucks too? It does.
Happy hackin'.
The more I use Java, the more I realize what "ivory tower development" means. And I don't like it one bit, sir.
Java doesn't support printf. The C function
printf is something nearly
100% of all programmers worldwide know how to use because it's how you make
words appear on a screen or in a file. More importantly, it's the function
you use to format text. That is, it allows you to control how
many decimal places appear for floating point numbers, whether leading zeros
appear in front of numbers, whether to show numbers as hexadecimal, octal or
decimal, and a whole lot more. It's used in that most famous
of introductory programs, hello.c:
#include <stdio.h>
int main (int argc, char **argv) { printf("Hello, world!\n"); return 0; }
It's so basic a function that you can forget how darn useful it is. But Java doesn't have it. What Java has is a method, which is part of those kinds of classes that concern themselves with streams, which is a highly generic way of thinking about files and consoles and network sockets and your mom.
Ok, let's leave your mom out of this (for now).
That Java method is named println. It doesn't
know how to format your output (to make your decimal numbers all pretty or to
add fancy columns to your output), but it does know how to put the
platform-appropriate newline character on the end of your string! So,
you've got that going for you!
Now, the designers of Java aren't stupid (more on this later). There are
problems that programs using printf encounter. For instance,
internationalizing the output of programs with printf can be
difficult. Sometimes. In badly designed large programs, which few programmers
ever have to deal with.
More damningly, printf is ideologically impure. It's a
function and Java's all about the Objects, baby. Sure, they could
have made a static method of the String (or OutputStream or whatever) class
called printf, but that might cause
someone a little grief sometime somehow. Better to cheese off every programmer
coming to the language. Yup. Great thinking there.
Another noticeable absentee from Java is fopen. The C function
fopen is known by 99% of all
programmers worldwide and nearly all popular programming languages have
something like open (Perl has 3 closely related file open functions but you
probably only ever need open).
As programming interfaces go,
fopen is pretty nasty. You need to pass it a filename (and
all operating systems name files differently). You need to pass it a mode (are
you reading from, writing to, appending to the file or all of the above?).
And what do you get back from fopen? A file handle? A file
descriptor? A lottery number? Who knows? However, it's a big puddle of
programming poo that 99% of us have learned to deal with.
Instead, Java has many circumlocutions to address files and even more to get data into and out of them. Working with files is a task performed in 98% of all programs ever written. And Java makes getting to these files weird, frightening and confusing. You need a File object, which is then passed to an Output/Input Stream object, which in turn is passed to a Buffered Stream object. Nice and Object Oriented. And a complete pain in the genitals.
So, let's recap. Java, which was loosed upon the world in the nineties, purposefully ignored very popular input/output conventions so that 100% of the programmers who learned any other language (most have syntaxes derived from C) would be utterly baffled by the most common of all programming tasks.
Are there times where all this OO sugar pays off? Sure. And Java is very helpful in these cases. Could Java have provided a nice compatibility layer for the oceans of existing C-based programmers? Sure, it could have. Easily, in fact. But the Java language designers thought they knew more about how to tackle the jobs that Java programmers would face than the programmers themselves.
In classical literature, this type of overweening pride is called hubris. In my native country Massholia, we just call that "being a f*cktard."
In the realms of politics, computer engineering and cooking, so much effort is spent trying to find the perfect solution for a problem or to determine an ideal candidate for a task. Sometimes, you just have to pick the best of the choices available to you and move on with your life. It's so seductive to lavish attention on trivial details in the face of a large, seemingly insurmountable problem.
I may return to this topic later, since I have many examples from Java where good enough solutions were heinously over-engineered into unworkable messes in the pursuit of perfect. SOAP is my favorite whipping boy for a protocol done wrong by over-wrought planning. Right now, I'm preparing for the weekend.
Happy Labor-daybor.
Here's a quick note to me on how to create a CGI script that streams a process to the browser in such a way as to prompt the user with a "save as" box. This technique is older than dirt, but still useful.
#!/usr/bin/perl --
use strict;
use CGI;
use POSIX qw[strftime];
my $q = CGI->new;
if (open my $in, "/usr/bin/zip -r - logs/ |") {
my $filename="lhp_logs-".strftime("%Y%m%d", localtime()) . ".zip";
print $q->header({ "-type"=>"application/x-unknown",
"-Content-Disposition" =>
"attachment; filename=$filename",
}
);
my $buf ="";
while (read($in, $buf, 4048)) {
print $buf;
}
} else {
die "Oops: $!";
}
The error checking isn't particularly robust here. Please gold-plate this code as needed for your next review.
A project for work required me to program in that most rigid and litigious
of languages, Java. I used Apache
Axis to make SOAP calls over SSL, thus making the project buzzword
compliant. This post is about how, with the help of John Cho at VMware, I
was able to coax Java into talking with SSL hosts whose X509 certificates have
no identity recognized by the client. Typically, this means the hosts are
self-signed and are not known to trusted root certificate authorities. Without
this hack, Java will refuse to use the SSL connection until the user imports
the certificate from the self-signed server, using the
keytool utility from the JRE.
NOTE: Disabling authentication of SSL certs is not compatible with security-sensitive projects. Please understand that by using this hack, you are reducing the effectiveness of SSL security (although the actual network traffic through the SSL tunnel is still encrypted). The decision to use this hack is typical of the design choices made on the security-versus-convenience axis.
Although this code is specific to Axis, the careful reader may find this code applicable to their next Java project.
Apache Axis 1.4 (and earlier) uses the standard javax.net.ssl package that is distributed with Sun's JDK 1.5 (it's in older JDKs too). The Axis class that creates the SSL socket factory that contacts the class that actually does the cert authentication is:
org.apache.axis.components.net.JSSESocketFactory.java
What we need to do is create a special X509TrustManager that accepts all certs and install it as part of the SSL Factory. There's a lot of abstraction and action-at-a-distance code here, so let's start with how to implement an all-trusting X509TrustManager class.
class TrustyTrustManager implements javax.net.ssl.X509TrustManager {
public java.security.cert.X509Certificate[] getAcceptedIssuers() {
return null;
}
public void checkClientTrusted(
java.security.cert.X509Certificate[] c,
String authType) throws CertificateException {
// do nothing, accept by default
}
public void checkServerTrusted(
java.security.cert.X509Certificate[] c,
String authType) throws CertificateException {
// do nothing, accept by default
}
}
Recall that X509TrustManager is just an interface. It requires three
methods to be implemented. Fortunately, these methods are easy to deal with,
if you don't care about authentication. The first,
getAcceptedIssuers(), is supposed to return an array of certs
for trusted authority servers. Since we aren't going to consult any, we
can return null there. The last two methods throw an exception if the
described certificate cannot be authenticated by building a "trust path" from
the root authorities to the given one. Since not throwing an exception means
that the trust path was successfully constructed, we do absolutely nothing in
these methods. If only all code were so easy to write!
The only catch here is that we do need to import
java.security.cert.CertificateException to handle the exceptions
that we'll never throw, otherwise we get a compiler error. Thanks for the
extra hoop, Java.
I added this class as an inner, private class to the JSSESocketFactory.java file, but you may want to isolate this code into its own file for use in other projects.
Now we need to install this X509TrustManager class. This can be done by
changing the code in initFactory() to something like the following:
protected void initFactory() throws IOException {
// Inspired by John Cho
try {
javax.net.ssl.TrustManager[] trusty =
new javax.net.ssl.TrustManager[] {
new TrustyTrustManager()
};
javax.net.ssl.SSLContext sc =
javax.net.ssl.SSLContext.getInstance("SSL");
sc.init(null, trusty, new java.security.SecureRandom());
sslFactory = (SSLSocketFactory) sc.getSocketFactory();
} catch (Exception e) {
throw(new IOException("SSLFactory: " + e.getMessage()));
}
}
Let me step through the code backwards, since it will help to know where
we're going with this method. The whole point of initFactory()
is to initialize the protected class member sslFactory with a
valid object. SSLSocketFactory objects are returned by
javax.net.ssl.SSLContext objects, which the documentation describes like this:
«Instances of this class represent a secure socket protocol implementation which acts as a factory for secure socket factories. This class is initialized with an optional set of key and trust managers and source of secure random bytes.»
I emphasized "trust manager" here, since that's what we implemented with the TrustyTrustManager class. Now, we just need to create an SSLContext object for the SSL protocol (not sure why there are other options here), and shove in our TrustManager.
Well, not quite. It turns out that SSLContext expects to get an array of TrustManagers.
init(KeyManager[] km, TrustManager[] tm, SecureRandom random)
We don't need KeyManagers at all and the random seed can be generated from
a method in java.security. What we do need is an array of
TrustManagers with one TrustyTrustManager object in it. Once this is
assembled, init() can be called correctly and the
SSLSocketFactory obtained.
There you are! It's as simple as rocket science, but not as hard as brain surgery.
Some advanced Java coders will probably decry this code as
cheating. It's not very OO to manhandle an existing class like this. Surely,
there must be a way to subclass JSSESocketFactory and override the behavior of
initFactory() more cleanly. The answer is "yes," but that means finding all
those places that instantiate JSSESocketFactory objects and subclassing
those classes. And, of course, you can see how this subclassing will
quickly ripple through the entire object hierarchy of Axis to destroy all
my free time. In Perl, I might have simply poked through the package namespace
at some point to overwrite initFactory(). I'm not aware how to
do that in Java effectively, but I welcome your suggestions.
UPDATE: For additional information on doing this with Apache's XML-RPC lib for Java, check this out. I think this technique may work with axis too, since it is built of the standard Sun SSL stuff. But, I haven't tried.
SOAP::Lite is a suite of Perl modules for doing SOAP RPC. SOAP::Lite was originally written by mad Russians. Its incredible flexibility is also its main drawback. Debugging isn't as obvious as it could be with this library.
Most of the time, you, the would-be SOAP scripter, want to know what the request and response XML messages look like. The SOAP::Trace doc doesn't make this clear (since the sample code is filled with mistakes). Here is one way to get SOAP to spew the XML messages to STDOUT the way you'd expect from a more humble module like Frontier::Client.
use SOAP::Lite "trace" => ["transport" => \&log_it];
sub log_it {
my ($in) = @_;
if (ref $in && $in->can("content")) {
printf "**GOT: %s\n", (ref $in);
print "-"x60, "\n";
print $in->content, "\n";
print "-"x60, "\n";
}
}
There's a lot of fun things going on in the log_it() subroutine. Notice the use of the oft-forgotten can() method, which all Perl objects have. Yes, this is a very special code review.
Should SOAP::Lite have a simple flag like "trace_xml" which does all this for you? Sure it should. But until then, you've got this humble blog and your old Uncle jjohn to help you.
Now get the hell off my lawn, you kids!
Programmers love to argue about which programming languages are better than others. If you listen carefully, you'll notice that there's a hierarchy to the debate. Java programmers laugh at C++. C++ coders deride C monkeys. Everyone laughs at shell hackers. No one can understand assembly programmers and those that "think in LISP" occupy a sublime and knowing orbit above all rest (so they think).
But I'm a Perl hacker first and foremost. Despite some of the more vocal members of the community, Perl is a humble language (with, as its critics will quickly add, much to be humble about). In recent years, I've forced myself out of this comfort zone into the wider world of programming. Like all generalists, I try to pick the right tool for the task. For instance, PHP is the best tool I've found for general web applications, which is why this site uses it. For quick and dirty Windows games, Python (with its pygame library bindings to SDL) stands alone. For repetitive system admin tasks, shell scripts and cron combine like Voltron to create superlative software robots. The thing is, most programming languages are pretty similar. They have data types, loops and conditional branching. That's really the bare minimum you need to get things done. Where languages differ is in the services they offer the programmer.
I've heard that you love someone not for his winning attributes, but for his faults. However, I've found that there are definitely some faults that are harder to love than others. This missive is about those faults in Java that keep our relationship forever at the stage of the first date. If Java is your "Main Mama", you may want to stop reading this now.
For my work, I need to hack up a bit of Java that speaks XML-RPC. Now, you may recall that I'm no stranger to working with this protocol. I've written XML-RPC clients and servers in Python, Perl, PHP, ASP and even (God help me) C. But I missed out on Java until now.
No problem, I thought. I'll just look through my book at the chapter Simon St. Laurent wrote about using the helma XML-RPC library for Java. Simon did the lion's share of the book and did the most thorough job of any of us on the project, so I took a look. Funny thing: that library has morphed into the Apache XML-RPC library. Ok, fine. How different could it be? I mean, it was working fine before. How many changes were needed? Perl's Frontier::RPC library has hardly changed in six years (and we use it heavily at Leostream).
As it turns out, the one thing Java does well is facilitate abstractions. With the newest version of the library, you can tweak all kinds of parameters, swap out XML parsers, add additional data types (which defeats the whole effing point of XML-RPC), create new class factories -- the list goes on! In fact, there's a whole class just for configuring the XML-RPC client! Excessive you say? Just wait.
The one thing you can't do with this library is start using it quickly. The main culprit? Missing dependencies. When I tried to run a simple XML-RPC client, it complained about not knowing how to encode the XML-RPC timestamp thingie. Oy. But wait! Weren't JAR files supposed to solve this issue? Wrong again, Fatty!
Ok, so I needed to install subversion just to get the bleeding-edge version of the missing ws-common library. That's not so bad, right? Wrong. I also needed to get a nightly snapshot of the TRUNK code of the main library because the "release" version could not handle structures correctly (the unknown "string" problem). Fine. That happens. It's open source so you've got to expect the release management to get a little "cowboy" sometimes.
At length, my "hello, world" XML-RPC program got up and running. After several phone calls gloating about this teapot triumph, I proceeded on to handling more realistic and complicated data structures. I had my test server return a structure to my Java client that had a value that was an array. In Perl, the structure looks something like this:
{ "foo" => "bar",
"boz" => [ "boom", "doom", "soon" ],
}
Here's a quick test: how many dictionary classes does Java have? I'm talking about generic collection types that hold key-value pairs. 1? 3? 10? Wrong! It's a trick question. In Java, new Map classes spontaneously generate all the time.
I bring this up because in order to traverse this data structure, I need to know how to type the objects correctly. Now, in simple programs where you control the data, that's easy. When you have to deal with arbitrary data coming in from an unknown source, Java whips out the hate on you.
Because Java sucks, er, is very advanced, I have to iterate
through methods to traverse this structure. Something like the following:
Map my_struct = get_the_struct();
Iterator it = my_struct.keySet().iterator();
while (it.hasNext()) {
Object this_key = it.next();
Object raw_object = my_struct.get(this_key);
// here comes the good part
Class c = raw_object.getClass();
if (c.isArray()) {
// fetching this object was so nice, I do it twice!
Object [] this_value = (Object []) my_struct.get(this_key);
System.out.print(this_key + " => ");
int i;
for (i=0; i < this_value.length; i++ ) {
System.out.print(this_value[i] + ", ");
}
System.out.println();
} else {
// simple data type
System.out.println(this_key + " => " + raw_object);
}
}
It took me about 3 hours to puzzle out this code. It would have been swell for the docs to have an example of handling complex data like arrays and hashes, but then I would have missed out on my afternoon of personal discovery and emotional growth.
After many fruitless web, book, and source code searches, I managed to hack this code to handle my "weird" data. It's crappy, but it works.
The truly loathsome part is the way I had to work with hash values that are arrays. For reasons that aren't clear, I couldn't just use the value returned from a Map if it is an array of Objects. That would be too easy. I had to properly cast the data because Java is a bucket full of venomous hate. Of course, I need to check if the object is, in fact, an array and then fetch the object again for the cast!
Allow me to paint with a very broad brush for a moment. The stunt programming exhibited in the code above is exactly the kind of stupidity that prevents Java folks from learning about what's going on in the rest of the computer universe. Strict data typing is 100%, no-foolin' legalese.
All modern scripting languages handle this kind of "collection of random data types" better than Java. VBScript is only slightly less lawyerly about it, but it too sucks hard on big, stiff data structures (VBScript has two assignment operators: one for objects, one for everything else. Thanks for nothing, Microsoft).
It's enough to really bring me down, man.
What the hell is wrong with these language designers? Can they please stop worrying about continuations, anonymous classes, multiple inheritance, abstract interfaces, factory classes and orthogonality long enough to make a language that's useful for the kind of problems I have to deal with? I live in a world of strings. If your language makes dealing with strings hard for me, I will hate you with my fists.
Can I get a "hell ya!"?
Jesus H. Christ.
Recently, I had to wade deeper into the murky, fetid waters of JavaScript. Like an encounter with a cheap hooker, a session coding with JavaScript makes me want to shower and weep. What I had to struggle with on this occasion was JavaScript's notion of associative arrays, which are implemented as Objects. All you Perlers, Pythonistas and Rubyists out there, hold on! It's not that associative arrays are a type of object in JavaScript, it's that Objects are associative arrays (have I BLOWN your MIND yet, Java?).
Normally languages which provide associative arrays also build in
routines to list the keys, the values or both from an instance of these
variables. Not JavaScript. It's true that you can iterate through an Object's
attributes with a for/in loop, but that's a bit like using a
hammer to repair a watch, which is an analogy that also holds for parsing
query strings in URLs with JS. What a nightmare. It's always 1996 in
JavaScript's world.
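For the record, here is roughly what that hammer looks like: a small sketch of home-made keys() and values() routines built on the for/in loop. The function names and the sample hash are my own inventions for illustration. The hasOwnProperty check is there so that anything inherited from a prototype doesn't sneak into the list.

```javascript
// Collect the property names that belong to obj itself.
function keys(obj) {
  var out = [];
  for (var k in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, k)) {
      out.push(k);
    }
  }
  return out;
}

// Collect the matching values, in the same order as keys().
function values(obj) {
  var out = [];
  var ks = keys(obj);
  for (var i = 0; i < ks.length; i++) {
    out.push(obj[ks[i]]);
  }
  return out;
}

// Hypothetical sample data, mirroring a Perl-style hash.
var hash = { foo: "bar", boz: ["boom", "doom", "soon"] };
var hashKeys = keys(hash);
var hashValues = values(hash);
```

It works, but it's two functions you have to carry around yourself.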
Weirded out yet? Wait! I've held the most offensive bit of JavaScript classes for last. In nearly every other God-fearing language I've used, Objects are declared. That is, the object's attributes and methods are defined (usually) before an instance of that class can be used. So JavaScript, whose name clings to the fame of that paragon of OO-design Java, should have some kind of class declaration, right? I mean, declaring a class would make inheritance and object inspection easier and thus leverage the major benefit of Object Oriented (OO) programming, right? Therefore, JS must have class declarations. But sadly, this isn't the case. JavaScript sports Classless objects. Objects in JS are defined at run time procedurally, in what seems to be an attempt to make the brains of OO zealots melt.
Classes without an inheritance mechanism are like pants without bottoms and only David Lee Roth could get away with wearing assless pants.
How does JavaScript allow users to define their own classes? You make a generic Object and start assigning methods and attributes to it, as if it were a dictionary, which it really is! How does object inheritance work? Not very well, but you can try assigning to the pseudo-attribute .prototype. Or not. Apparently, prototype is sort of busted.
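To make that concrete, here is a rough sketch of both moves: building an object by assignment, and faking inheritance by assigning to .prototype. Every name in it (counter, Animal, Dog, rex) is made up for the example, and this is only one of several ways people wire up prototypes.

```javascript
// A "class" built procedurally: start with a generic Object
// and bolt attributes and methods onto it, dictionary style.
var counter = new Object();
counter.count = 0;
counter.increment = function () {
  this.count = this.count + 1;
  return this.count;
};
counter.increment();

// Poor man's inheritance via the .prototype pseudo-attribute.
function Animal(name) {
  this.name = name;
}
Animal.prototype.describe = function () {
  return this.name + " is an animal";
};

function Dog(name) {
  Animal.call(this, name);       // run the parent constructor on this object
}
Dog.prototype = new Animal();    // Dog objects now fall back to Animal's methods
Dog.prototype.constructor = Dog; // repair the link clobbered by the line above
Dog.prototype.speak = function () {
  return this.name + " says woof";
};

var rex = new Dog("Rex");
```

Note that nothing here is declared up front. The "class" is whatever happens to have been assigned by the time you call new.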
Another consequence of not having classes is that you can never find out
what kind of Object you're dealing with. That is, you can't ask an object,
"what class do you belong to, little fella?" The confused bastard will
answer "I'm an Object Object," which sounds a little desperate -- like the
JS object really wants you to believe it's a first class object, which
it isn't. This reminds me of a common folklore tenet in which all things
have a Truename that, if uttered, will give the speaker power over that thing.
Are JS objects superstitious? It's true you can "override" (or overwrite)
the .toString method with something about the class name, but
that's hackiferic.
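A quick sketch of that .toString hack, assuming a made-up constructor named RPCClient with nothing in it. The string returned by the override is whatever you care to type, which is exactly why it's a hack.

```javascript
// Hypothetical empty constructor, just to have something to name.
function RPCClient() {}

var plainObject = {};
var beforeOverride = String(new RPCClient()); // default Object description
var plainText = String(plainObject);

// "Override" (really overwrite) toString with the class name.
RPCClient.prototype.toString = function () {
  return "[object RPCClient]";
};

var afterOverride = String(new RPCClient());
```

Before the override, both objects give the same default answer. Afterwards, the RPCClient at least knows its own name, but only because you told it.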
Anyway, why bother with class hierarchies? Most JS scripts last only a short while and have so limited a scope. Then again, why bother pretending to have objects at all? Let's call a hash "a hash" and be done with it.
Classless classes are the assless pants of the Internet.
I've been working on building a new game since before last Christmas. I'm not ready to announce it yet, but I think it will be simpler to learn (and build) than State Secrets proved to be. As a warm-up to the real thing, I wanted to acquire the Flash skills that I think I'll need for the game, which I think will have a Flash interface that connects to a PHP script (a la Funeral Quest).
Surprisingly, I think that in just a few days of watching online tutorials about Flash and some reading of the docs, that I've got the core skills that I need. Flash is a weird environment to work in, but I appreciate that Macromedia has hidden away threads and forking from me. In any case, I have created this very easy-to-defeat Tic Tac Toe game in flash. The client doesn't store any game state information, but uses the super-weird LoadVars class to make RPC calls to a PHP script which plays like a drunk co-ed. This is a technology exhibition rather than a very enjoyable toy. You may download the amateur-ish flash source code here. There are problems on the server-side that I don't care to fix. Sessions aren't properly implemented either. Good thing the price is right!
Those suffering from low self-esteem will appreciate the near-impossibility of losing this game.
I've begun again to learn how to use Macromedia's Flash. It's hard for me to pick up because the tool that authors flash documents uses an animator's paradigm, which is very unfamiliar to me. However, I did manage to put this crappy demo together of a bouncing ball married to a crappy MIDI cover of scarface's "no tears."
By God, if that's not entertaining, I don't know what is!
Perl offers many ways to commafy a number. That is, insert a comma every three digits for integers larger than 999. Here is my commafy routine:
sub commafy {
my $num = shift;
my $new = "";
while ($num =~ s/(?<=\d)(\d{3})$//g) {
$new = ",$1$new";
}
$new = "$num$new";
return $new;
}
It works backwards from right to left. It uses the "new" look-behind assertion in the regex. Works fine in Perl 5.8, and I think Perl 5.6.
UPDATE 2: Please see this post for the latest working version of this module.
UPDATE 1: This code works with XML::RSS version 1.05 or so. The newest versions of this library removed the encode() method for reasons beyond my reckoning. You can either use this code as a starting point for porting to the new XML::RSS module (and tell me how you did it!) or simply use the older version, which is still available on CPAN.
Like searching for Bigfoot, creating a podcast feed that's recognized by Apple can be an elusive, furtive and lonely process. Most know that podcasts are really just RSS 2.0 feeds with some extra tags. This seems like something perl should handle. The perl module XML::RSS nearly has everything necessary, but there's always a catch: the itunes namespace.
All is not lost, because XML::RSS is a class that can be inherited from. With a little overriding goodness, you too can make valid feeds that even Apple's iTunes music store will accept. Here's my module that inherits from XML::RSS. While it's not a complete solution, it works well enough for me.
package XML::RSS::Podcast;
use XML::RSS;
@XML::RSS::Podcast::ISA = qw[XML::RSS];
sub as_string {
my $self = shift;
return $self->as_podcast_rss;
}
sub as_podcast_rss {
my $self = shift;
my $enc = $self->{encoding};
my $output = <<EOT;
<?xml version="1.0" encoding="$enc"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
version="2.0">
EOT
$output .= $self->podcast_start_channel;
for my $i (@{$self->{items}}) {
$output .= $self->podcast_item($i);
}
$output .= $self->podcast_end_channel;
return $output .= "\n</rss>\n";
}
sub podcast_start_channel {
my $self = shift;
my @fields = qw[ttl title description link language
pubDate lastBuildDate creator
webMaster copyright
];
my @image_fields = qw[title url description link
width height];
my @itunes_fields = qw[subtitle author summary
image];
my $output = "<channel>\n";
for my $f (@fields) {
if (length($self->{channel}->{$f})) {
my $s = $self->encode($self->{channel}->{$f});
$output .= "\t<$f>$s</$f>\n";
}
}
my $seen_image = 0;
for my $f (@image_fields) {
if (length($self->{image}->{$f})) {
unless ($seen_image) {
$output .= "\t<image>\n";
$seen_image = 1;
}
my $s = $self->encode($self->{image}->{$f});
$output .= "\t\t<$f>$s</$f>\n";
}
}
if ($seen_image) {
$output .= "\t</image>\n";
}
# Owner name/email not handled
for my $f (@itunes_fields) {
if (length($self->{channel}->{itunes}->{$f})) {
my $s=$self->encode($self->{channel}->{itunes}->{$f});
$output .= "\t<itunes:$f>$s</itunes:$f>\n";
}
}
# FIXME: Doesn't handle sub cats.
if (ref $self->{channel}->{itunes}->{category}) {
for my $c (@{$self->{channel}->{itunes}->{category}}) {
my $s = $self->encode($c);
$output .= qq[\t<itunes:category text="$s" />\n];
}
}
return $output . "\n";
}
sub podcast_end_channel {
return "</channel>\n";
}
sub podcast_item {
my $self = shift;
my $item = shift;
my @fields = qw[title guid pubDate description];
my @itunes_fields = qw[author subtitle summary
duration keywords explicit];
my $output = "\t<item>\n";
for my $f (@fields) {
if (defined $item->{$f}) {
my $s = $self->encode($item->{$f});
$output .= "\t\t<$f>$s</$f>\n";
}
}
if (ref $item->{enclosure}) {
$output .= "<enclosure";
for my $f (qw[url length type]) {
if (defined $item->{enclosure}->{$f}) {
$output .= qq[ $f="$item->{enclosure}->{$f}"];
}
}
$output .= "/>";
}
for my $f (@itunes_fields) {
if (defined $item->{itunes}->{$f}) {
my $s = $self->encode($item->{itunes}->{$f});
$output .= "\t\t<itunes:$f>$s</itunes:$f>\n";
}
}
return $output .= "\t</item>\n";
}
A word about the RFC822 pubDate. This seemingly arbitrary date format
can be easily generated with a call to strftime(). The
format string is "%a, %e %b %Y %H:%M:%S %z". You might think
that you can use mysql's DATE_FORMAT() to replicate this, but you'd be wrong.
Instead, generate mysql queries with UNIX_TIMESTAMP(), feed the result of
that to localtime() and feed that to strftime(). Simple, no? No, but such
are the challenges in programming.
Here's a sample of how I use this for pseudocertainty.com.
use strict;
use DBI;
use POSIX qw[strftime];
use MP3::Info;
my $rssfile = shift || "./ps-pod.rss";
my $dbh=DBI->connect("dbi:mysql:pseudo", "pwrUser", "s3cr3t")
or die "connect: $DBI::errstr";
my $shows = get_shows($dbh);
$dbh->disconnect;
my $rss = XML::RSS::Podcast->new(version => "2.0");
my $rfc822_fmt = '%a, %e %b %Y %H:%M:%S %z';
my $iMeta = { "author" => "Joe Johnston and Mike Lord",
"summary" => 'UFOlogy, Cryptozology and
the people who love them are
discussed on this
internet-only radio show',
"subtitle" => "Don't be Certain.
Be PseudoCertain.",
"category" => ["Talk Radio"],
};
$rss->channel(title => 'PseudoCertainty',
"ttl" => 60, # time to live
link => 'http://www.pseudocertainty.com/',
language => 'en-us',
description => 'UFOlogy, Cryptozology and the
people who love them are discussed
on this internet-only radio show',
copyright => "Copyright Joe Johnston and Mike Lord",
webMaster => "jjohn@pseudocertainty.com",
pubDate => strftime($rfc822_fmt,localtime()),
"itunes" => $iMeta,
);
for my $r (@$shows) {
# no more than 30 words
my @words = map { s/(<[^>]+>)//g; $_; }
split /\s+/, $r->{about};
my $desc = "not set";
if (@words > 30) {
$desc = join " ", @words[0..29], "...";
} else {
$desc = join " ", @words;
}
my $finfo = get_mp3info("/path/to/shows/$r->{mp3_filename}");
my $pl = qq[http://pseudocertainty.com/$r->{mp3_filename}];
my $enc = { url => $pl,
length => -s "/path/to/shows/$r->{mp3_filename}",
type => "audio/mpeg",
};
my $itunes = {
explicit => "N",
keywords => "UFO aliens zorknapp",
summary => $desc,
duration => $finfo->{TIME},
};
$rss->add_item( title => $r->{title},
link => $pl,
pubDate => strftime($rfc822_fmt,
localtime($r->{pretty_created})),
enclosure => $enc,
permaLink => $pl,
description => $desc,
"itunes" => $itunes,
);
}
$rss->save($rssfile);
#------------------------
# subs
#-------------------------
sub get_shows {
my ($dbh) = shift;
my $sql = qq[
SELECT *,UNIX_TIMESTAMP(created) as pretty_created
FROM shows WHERE publish = 1 ORDER BY created DESC;
];
my $sth = $dbh->prepare($sql);
die "get_shows: '$sql': " . $sth->errstr unless $sth->execute;
return $sth->fetchall_arrayref({});
}
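The description-trimming step in the loop above (strip markup, keep at most 30 words, append an ellipsis) is easy to get subtly wrong, so here is a minimal Python sketch of the same idea. The function name and sample text are made up for illustration. Unlike the Perl, it strips tags before splitting on whitespace, so a tag that contains spaces is removed whole.

```python
import re

# Matches anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")

def short_description(html_text, max_words=30):
    """Strip tags, then keep at most max_words words."""
    words = TAG_RE.sub("", html_text).split()
    if len(words) > max_words:
        return " ".join(words[:max_words]) + " ..."
    return " ".join(words)

if __name__ == "__main__":
    print(short_description("<p>Hello <a href='x'>big</a> world</p>"))
```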
And they called me a fool for wanting to make XML::RSS spew podcast RSS. But I showed them! I showed them all! Bwahahhaha!
(Note 1: This is the third and last part of a series on the taskboy CMS. It's the least informative of the three, substituting opinion for fact in a thuggish way.)
(Note 2: I stole this picture of TV cook and personal hero Alton Brown from the Internet Wayback Machine, since he no longer blogs. He had an ugly incident with the public, it seems. Personally, I think it became too much of a time sink for him. I hardly have a life at all and I can't add content here daily.)
XML, SOAP and other windbags
This is a very long post already, but I'll chime in on some recurring trash talk about XML-RPC that irks me. Before I start, I want to clarify one thing: this screed isn't directed against Mr. Harold, who is to be praised for making his manuscript available for free. That's something I'd like to see more of from all quarters. Mr. Harold has done his part to make the world a little brighter by providing a free and quality resource for Java programmers.
OK. Time for the shit storm.
In Processing XML with Java, Elliotte Rusty Harold asserts the following while introducing the section on SOAP:
«XML-RPC was in large part invented by a single person who really didn't know a lot about XML. Consequently he made many very questionable choices...
Whereas XML-RPC was a quick hack by one developer, SOAP has been developed by a committee of XML experts from various companies including IBM and Microsoft.
You've undoubtedly heard the old saw about a camel being a horse designed by committee... SOAP is a much more robust protocol than XML-RPC. It is much better designed from an XML standpoint as well. It takes advantage of numerous features of XML such as attributes, Unicode, and namespaces that XML-RPC either ignores or actively opposes. (my emphasis) XML-RPC is adequate for simple tasks. However, if you get serious with it you rapidly hit a wall...
The biggest conceptual difference between SOAP and XML-RPC is that XML-RPC exchanges a limited number of parameters of six fixed types, plus structs and arrays. However, SOAP allows you to send the server arbitrary XML elements. (my emphasis) This is a much more flexible approach. »
This is a very common argument made against XML-RPC and here's why it's dumb.
The beauty of web services doesn't lie in XML; it lies in the simplicity of the process. Yes, XML (sort of) helps this simplicity, but the real value of a web service is in the work it facilitates. It's the platform-neutral procedure calls that make web services interesting. Too many SOAP advocates are thinking in terms of only one language (*cough*Java*cough*) and so expect a very close mesh between their language and an RPC mechanism. News flash: XML-RPC ain't RMI. I've used XML-RPC in at least a half dozen languages. SOAP is just painful, even in Java (which I used for a cell phone application). Sometimes I expect an array of strings to just be an array of strings.
XML-RPC is caveman simple and that is a colossal advantage. Its restriction of datatypes is an advantage that many OO-heads should wake up to. Not only are most things in life not Objects or Classes, many things for which there are Objects and Classes shouldn't be. Sometimes a plain old ASCII string is Good Enough, despite political posturing.
I dislike XML and barely tolerate it in XML-RPC. All of the XML features mentioned in the quote, particularly namespaces and user-defined datatypes, make SOAP a digital Babel. I hardly like attributes in XML. SOAP's flexibility is a liability. How many times have Perl hackers been taken to task for their language's robust syntax? WSDL is little help in taming the complexity of web services here. It contributes to the traffic jam. I LIKE that there's no official DTD for XML-RPC. DTDs are sooo SGML. Well-formedness is all you need. Everything else is a data validation problem for the application.
That's right, I said: the app needs to validate data. Does your language make this easy? If not, get one that does.
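To make "the app needs to validate data" concrete, here is a small Python sketch of the kind of check an XML-RPC handler might run on an incoming parameter before trusting it. The function name and the item limit are assumptions for illustration, not part of any XML-RPC library.

```python
def validate_string_list(value, max_items=100):
    """Return value unchanged if it is a list of str; raise ValueError otherwise."""
    if not isinstance(value, list):
        raise ValueError("expected a list")
    if len(value) > max_items:
        raise ValueError("too many items")
    for item in value:
        if not isinstance(item, str):
            raise ValueError("expected only strings, got %r" % (item,))
    return value

if __name__ == "__main__":
    print(validate_string_list(["ufo", "bigfoot"]))
```

The point is that the check lives in a few lines of application code, not in a schema language.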
The KISS principle of XML-RPC goes very, very far here. The features that SOAP has are for a future web of web services that may never come. XML-RPC is here today; stable with a boatload of implementations in many languages. SOAP is one document format too far and too far out.
The End: "Well, I'm back."
(This article continues my thoughts on the taskboy CMS.)
How the taskboy CMS works
Once I decided that content would be managed through emacs (to as large a degree as possible), the rest fell mostly into place. The blog, the music section, the polls and the ratings would all be stored in mysql and accessed through an XML-RPC API. I would use PHP to define the layout, pulling the content from the database where needed. Templates as such are not used. To my thinking, a PHP page is the template. I also decided against database abstraction classes, since I'm unlikely to move from mysql any time soon. I do have a collection of PHP utility functions (like sql_insert, sql_delete, sql_select) to make database access less painful. Each PHP page calls the same header and footer pages. Much of this code was developed alongside State Secrets. Together, this makes the PHP stuff pretty easy to modify.
Getting from emacs to PHP is a little circuitous, so please bear with me. It is straightforward to write a perl script that's an XML-RPC client using the Frontier::RPC2 library. So that's what I did first. I verified that I could talk to the PHP page that processes XML-RPC requests. Emacs is an extensible editor using the macro language lisp. The creator of Perl, Larry Wall, said of lisp that it had all the visual appeal of "porridge with toenail clippings" and I agree. However, I did learn just enough lisp to write the current emacs buffer to standard out to be read by a perl script which could then make the appropriate XML-RPC request and make some snappy response that emacs could deal with. This solution is what I wrote about on use.perl.org.
Gnu Privacy Guard
The new wrinkle for taskboy is security. The XML-RPC messages go across the network in clear text. The primary risk I wanted to address is not that someone will see my blog before it's posted, but that an unauthorized fool would mess with my XML-RPC service. Whatever authorization mechanism I chose would have to work over clear text. It's true I could have used SSL with HTTP Authentication for the web services PHP page, but I didn't want to. Fortunately there is already a solution for this kind of problem, but for a different form of internet messaging.
Back in the mid-80's, Phil Zimmermann had a problem: he couldn't prove he was him. That is, email that claimed to be from him could have been forged by some joker only claiming to be him. How could those receiving email from him be assured that the sender Phil Zimmermann was the Phil Zimmermann? The answer became known as Pretty Good Privacy and it involved some very scary math. But you can think of it as something like a lock and key mechanism. When an email is sent out, a Very Big Number is computed with the content of the message and your private key. Your private key has a sibling called a public key that the recipient of mail will already have (and verified). When the recipient gets this message, pgp uses the public key on file to decode the message (or signature). If nothing has been changed in the message, the math will work out (via magic) and you can be pretty sure that Phil indeed has told you to "go pound sand."
The important concept here is that PGP was meant to guarantee the identity of a sender using a message that anyone could read, but not change. Now in web services, I also have messages that anyone could read, but I want the server to accept only requests from me. Although it's not a seamless fit, PGP turns out to be a good authentication method for private web services. Here's how I modified Edd Dumbill's XML-RPC PHP library and Ken MacLeod's Frontier::RPC to use Gnu Privacy Guard (an open source version of PGP) to lock down my web service. The strategy in both cases is that requests should be signed, not responses. It would be straightforward to implement response signing too, but I don't deem it necessary for my application.
Tweaking the PHP server
This class merely extends the xmlrpc_server class found in xmlrpcs.inc. I need to intercept the content, verify the signature, remove it if the message checks out and pass the rest of the XML doc to the parent class for handling. Hats off to Edd and the boys for getting the class partitioned so that I needed to override only one method.
One PHP tip: name your class files with .php. That way, you can point a browser to them and check the syntax. After all the syntax typos are gone, the page will appear blank. The contents of files with .inc extensions are typically just displayed by the web server without parsing.
VerifyRequest($data)) {
return $this->RPCError("Couldn't verify request");
}
$data = $this->RemoveSignature($data);
}
# pass off to parent
return parent::parseRequest($data);
}
#-----------------------------------------
# Look at the body of the request. Does it have
# a signature to verify?
function VerifyRequest ($data="") {
# BTW: I hate this solution
# write out to a tmpfile
$infile = "/tmp/" . posix_getpid() . ".vrf";
if ($fh = fopen($infile, "w")) {
fwrite($fh, $data);
fclose($fh);
} else {
return 0;
}
# is this signed by someone I trust?
$cmd = "/usr/bin/gpg --homedir=/path/to/gpg "
. "--verify <$infile 2> /dev/null";
$retval = 1; # default to failure
if (file_exists($infile)) {
system($cmd, $retval);
} else {
return 0;
}
unlink($infile);
return $retval ? 0 : 1;
}
#-------------------------------------------
# remove signature header/footer
function RemoveSignature ($data="") {
# for GPG
# strip of the GPG stuff to get the basic XML back
$preamble = "/-----BEGIN PGP SIGNED MESSAGE-----\r?\n"
. "Hash: SHA1\r?\n\r?\n/";
$footer = "/-----BEGIN PGP SIGNATURE-----\r?\n"
. "Version: .+\r?\n\r?\n(\S+\r?\n)+"
. "-----END PGP SIGNATURE-----/";
$data = preg_replace($preamble,"", $data);
$data = preg_replace($footer,"",$data);
return $data;
}
#--------------------------------------------
# wrapper for easier (and non-granular) error reporting
function RPCError ($msg=0) {
return new xmlrpcresp(0,500,"Bad request: $msg");
}
}
?>
A few notes on this amateurish PHP code. First, any security wonk will tell you not to create temp files with PID names. In my case, I trust the other users on my server and don't feel compelled to improve the security here. You may want to. I'm using the fact that the gpg process has an exit value of 0 if the verify succeeds. The only way I saw of getting the exit value of a process in PHP is by using system(). There are a couple of other process handling functions, but those didn't seem to give me this simple result to check (I could have used popen() and grepped through the output, but that seemed painful [although I might have done that if this were a perl module]).
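For readers who do want to tighten up the temp file handling, here is a hedged Python sketch of the same verify step: the data goes into a temp file created by the standard library (unpredictable name, private permissions) and the verdict comes from the child's exit status. The function name and the GPG_VERIFY list are my own illustration; the gpg path and homedir simply mirror the PHP command line above.

```python
import subprocess
import tempfile

# Hypothetical gpg invocation mirroring the PHP code; adjust paths to taste.
GPG_VERIFY = ["/usr/bin/gpg", "--homedir=/path/to/gpg", "--verify"]

def verify_signed(data, command=GPG_VERIFY):
    """Feed data (bytes) to command on stdin via a private temp file.

    Returns True when the command exits with status 0.
    """
    with tempfile.NamedTemporaryFile(suffix=".vrf") as tmp:
        tmp.write(data)
        tmp.flush()
        tmp.seek(0)  # rewind so the child reads from the start
        result = subprocess.run(
            command,
            stdin=tmp,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0
```

The temp file is removed automatically when the with block ends, which replaces the explicit unlink() in the PHP.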
parseRequest() is called by the parent class to unpack the XML request. Here, I look for the GPG signature and if all goes well, I pass just the XML string to the parent parseRequest() for processing.
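The signature-stripping half of that job is just two regular expression substitutions. Here is a minimal Python sketch of the idea, with names of my own choosing. It is slightly more permissive than the PHP patterns: it accepts any hash name and does not insist on a Version line, on the assumption that some GnuPG builds omit it.

```python
import re

# Clearsign preamble: marker line, Hash header, blank line.
HEADER_RE = re.compile(
    r"-----BEGIN PGP SIGNED MESSAGE-----\r?\nHash: \S+\r?\n\r?\n"
)
# Everything from the signature marker through the end marker.
FOOTER_RE = re.compile(
    r"-----BEGIN PGP SIGNATURE-----.*?-----END PGP SIGNATURE-----\r?\n?",
    re.DOTALL,
)

def remove_signature(data):
    """Return the clearsigned payload with the PGP armor removed."""
    data = HEADER_RE.sub("", data, count=1)
    return FOOTER_RE.sub("", data, count=1)
```

This only unwraps the message; it says nothing about whether the signature is valid, so it should run after verification, as in the PHP.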
Keep in mind that PHP runs as whichever user Apache runs as. This affects GPG. You have to set up the file ownership for the keys so that Apache can read and write to a directory. You should create keys specifically for this web service and not reuse your own GPG stuff. You were warned.
This class is used identically to the xmlrpc_server class defined in xmlrpcs.inc. No, I don't know what the "da_" stands for in the class name. I thought I wrote "ds_", which would have stood for "digital signature."
Expanding the Frontier
For the perl client, I simply defined two classes at the start of the program. Keep in mind, this is a win32 perl program.
package RPCEncoder;
use Frontier::RPC2;
@RPCEncoder::ISA = qw[Frontier::RPC2];
sub encode_call {
my ($self) = shift;
my $request = $self->SUPER::encode_call(@_);
# sign it. 2-way opens hurt my brain
my $outfile = "C:/blog/tmp.txt";
unlink $outfile;
my $cmd = qq[|C:/blog/gnupg/gpg.exe --homedir=/blog/gnupg ]
. qq[--clearsign > $outfile];
open GPG, $cmd or die "Can't proc open: $!";
print GPG $request;
close GPG;
open IN, $outfile or die "Can't open signed $outfile: $!";
undef($request);
while (<IN>) {
$request .= $_;
}
close IN;
unlink($outfile);
return $request;
}
sub decode {
my ($self) = shift;
my ($string) = shift;
my %args = ('Style' => 'Frontier::RPC2',
'use_objects' => $self->{'use_objects'},
);
$self->{'parser'} = XML::Parser->new(%args);
return $self->{'parser'}->parsestring($string);
}
#-----------------------------------------------------
package RPCClient;
use Frontier::Client;
@RPCClient::ISA = qw[Frontier::Client];
sub new {
my ($self) = shift->SUPER::new(@_);
my %args = ('encoding' => $self->{'encoding'},
'use_objects' => $self->{'use_objects'}
);
$self->{'enc'} = RPCEncoder->new(%args);
return $self;
}
The perl is a little weirder because of the way the Frontier Client works with XML::Parser, itself a horrible creation of Cthulhu. The Frontier::Client constructor needs to be overridden so that I can insert my custom RPCEncoder class, which is a thin coating over Frontier::RPC2. All the XML encoding and decoding happens in Frontier::RPC2 and that's what I need to intercept.
When making a request, I need to sign the XML string before it goes on the wire. All things being equal, I'd like to open the gpg process for writing (to feed it the string I've got in memory), but also read from it to get the output. This is a kind of double pipe, which is easy to do in shell, but weird to do with perl and especially so on Windows. Once again, I write a temp file and I don't even pretend to give security a mind. Windows boxes are typically single user machines and mine doubly so. Also note that I don't need to worry about running as a different user when I make the XML-RPC request. I'm in emacs (which runs as the current user); it spawns a shell to run perl; perl spawns a shell to run gpg.exe. All these processes will run as me.
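For comparison, the "double pipe" is a one-call affair in some other languages. The Python sketch below writes a string to a child process's standard input and collects its standard output with no temp file at all. The function name is mine, and the GPG_CLEARSIGN list is a hypothetical command line modeled on the Perl above, not something I have run on Windows.

```python
import subprocess

# Hypothetical command line, modeled on the Perl client's gpg call.
GPG_CLEARSIGN = ["gpg", "--homedir", "/blog/gnupg", "--clearsign"]

def filter_through(command, text):
    """Send text to command's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode("utf-8")
```

Calling filter_through(GPG_CLEARSIGN, request_xml) would then stand in for the open/print/close/read sequence in encode_call().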
I had to also override decode(), because the parent uses ref($self) to determine the class name of the XML callbacks (n.b. BAD MONKEY!). This
really should have been hard coded to 'Frontier::RPC2' since the callbacks all
have hardcoded class names (see the code for the real scoop). I think this was
an attempt to make child classes easier to write, but this trick backfired.
A Quick Note on GPG setup
Getting up to speed on how GPG works took longer than integrating it into
the taskboy web service. I cannot go into all the setup details here, but
if you are familiar with ssh key management, you will be well ahead of the
game in GPG. If ssh keys make your brain hurt, GPG is a veritable migraine.
But it boils down to this: you must make a GPG key pair for the source machine
with the perl/emacs setup. You must copy the public key to the server. You
must import that key into GPG and verify it (with gpg --edit).
If you don't do all of these steps, this digital signature for XML-RPC hack
won't work and you'll be mystified at what went wrong.
Verify your GPG at all stages using test files, so that you can get the
GPG errors.
Note to jjohn: Move the *gpg files to wherever gpg wants to find them. It will make things go easier on you.
The next three posts are about how XML-RPC makes taskboy go, why XML-RPC is better than SOAP and how PHP and perl excel in their own domains. I wrote these as one long article, but, taking pity on my readers, I broke the leviathan into three parts. I expect to be rewarded for my gesture of mercy with crazy-go-nuts xmas stocking stuffers this year.
Web Logs
Although web logs, more popularly called blogs, have recently come to most people's attention during the 2004 US presidential election, the idea of having an easy mechanism to post ideas large and small onto a web site is not new at all. After writing the first dozen or so HTML pages by hand, most of us started thinking there must be a better way.
Enter the Content Management System
The content management system (or CMS) is a software package that sits on the web server (and perhaps has a special client for your local machine) which facilitates publishing web content and maintaining a consistent look and feel from page to page. The core technology for a consistent look and feel is the use of a templating system. That is, a pattern of code that content can be paired with to generate the final HTML page that visitors to a site view. A natural fit with the use of templates is an underlying database in which to keep the core, user-defined content (and possibly the templates too). So much for web design history up to 1998.
Early CMS tools were written in a variety of languages. Back in the day, Perl was the king of dynamic web applications and was used to build many CMS systems. The trouble with Perl is that it's fidgety in spots to learn and (for some people) its complexity isn't welcomed as a templating language.
Enter PHP
PHP is a dynamic, perl-like language specifically designed for the web domain. It's both fast and feature-full (and not a little unwieldy at times from problems stemming from its early lack of namespaces). Knocking out simple, but powerful dynamic web sites with PHP is criminally easy and fast. PHP even has a Windows help file, for those of us workin' on Uncle Billy's farm.
But not everything needed to make a web site go happens in CGI land. There are plenty of system-level things and non-http tasks for which PHP isn't suited (despite some heroic hacks from the PHP community). Wouldn't it be nice to have a way to get the best of both worlds: to use PHP for web front end stuff and perl for backend management?
Enter XML-RPC
XML-RPC is a way for programs written in different languages to talk to each other. This concept has the buzzword-compliant name of "middleware" and it's far from a new idea. Programs are, almost by definition, fiefdoms unto themselves in which data gets trapped. It's hard enough to make computers do what you want them to today, let alone try to build software in such a way that it will be useful tomorrow. XML-RPC is the method by which one program's procedure call is serialized as XML, passed over a network using HTTP, and executed on the receiving machine, which then returns the response as XML to be unserialized back into a format most easily consumed by the caller. It is a "web service wire protocol" and is the forerunner of the more feature-rich (but strangely less useful) SOAP. This complex dance allows Perl and PHP each to be used in appropriate contexts. But what are those contexts? The answer lies with existing CMS.
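To show the serialize/unserialize round trip without a network in the way, here is a short Python sketch using the standard library's xmlrpc.client module. The method name blog.search and its arguments are invented for illustration; they are not part of the taskboy API.

```python
import xmlrpc.client

# Caller side: turn a procedure call into the XML that goes over HTTP.
request_xml = xmlrpc.client.dumps(
    (["ufo", "bigfoot"], 30),   # positional parameters
    methodname="blog.search",   # hypothetical method name
)

# Receiving side: recover the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)

if __name__ == "__main__":
    print(request_xml)
    print(method, params)
```

A list of strings goes in one end and a list of strings comes out the other, whatever language sits on the far side.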
Thin clients? Fat chance!
The web browser is arguably the most important application to be written in the past ten years. In network architecture terms, browsers are thin clients that handle a variety of display issues and a small subset of user input tasks. This is the key to the success of the web. The browser, as envisioned by Tim Berners-Lee, was for reading and changing web pages. By latching onto the read-only aspect, browser makers forestalled what would have been a virtually endless debate about how to implement security for changing web pages. Web page security became the domain of web servers. Which is why CMS systems must be installed on web servers and can, for convenience or to implement specific features, have a client piece on a web author's desktop. Many CMS just used the browser for their client.
And this is exactly the thing that most blogging software does: use the browser to compose entries and manage the content. Blogs, often being simply one page, wouldn't have gotten so popular if each required some kind of special input client on each author's machine. Blogs then are a very specialized form of CMS.
The problem is: I hate composing prose in web browsers. Being a programmer, I've invested a lot of time learning to use a programmer's editor (emacs, in my case). What I want from a blog CMS is to compose, publish and manage my blog entries from the emacs editor on my desktop. Now many a blogging CMS has some kind of web service API that allows you to hack up glue for your editor of choice to do the same thing that I've done with taskboy. Much of the taskboy system derives from what I did for my use.perl.org journal.
When I decided to move my first blog off use.perl.org, I knew that I wanted to continue using emacs to compose and edit my entries. However, I now had to deal with the front end display logic for the blog. Having written and modified several perl CGI/mod_perl HTML generating programs, I wanted to avoid that solution. Instead, I wanted to learn more about PHP and I'm glad I did.
Next: The dirty details
Here's a perl tip for those trying to report progress on external programs that don't report that kind of information. The case this hack was designed for was gzip, but you'll think of many other examples of this class of problem.
Perl is an incredibly flexible language. Of all its wondrous features, the ability to get a file handle to a process is the most arcane and little appreciated. It is, however, the key to reporting how much input is consumed by a process.
Here's a concrete example of what I'm talking about. The compression utility gzip is a stream-oriented program that works on chunks of data that it receives typically from standard input (STDIN). You can therefore feed gzip a file of any size and it should work, given enough disk space. The larger the file, the longer gzip takes to run (I suppose this makes the runtime a Big O of (n), linear time [so much for using my comp sci degree]).
Occasionally, you'd like to know how far along gzip is in compressing a large file. Gzip does not report this, but does give you the compression ratio at the end of the run, if you called it with the -v flag.
Without hacking gzip, you can create a perl wrapper around gzip in which you can report how many bytes gzip has consumed of the source file. The idea is that the source file is read by perl and fed to gzip. Keeping track of how many bytes are read in the perl script is simple. Here's some code.
my $infile = shift @ARGV || die "usage: $0 file\n";
open GZIP, "|/bin/gzip -c > out.gz"
or die "can't open process to gzip: $!";
# disable output buffering to see the progress report
$|++;
open IN, $infile or die "Can't open $infile: $!";
my $original_size = -s $infile;
my ($buf, $sum);
my $chunk = 200;
# read() returns the number of bytes actually read, which can be
# less than $chunk on the final pass
while (my $got = read(IN, $buf, $chunk)) {
$sum += $got;
print GZIP $buf;
printf "progress: %02.2f\r", ($sum/$original_size)*100;
}
print "\n";
close GZIP;
close IN;
This short script expects to be called with the name of the file to compress. The output file name is hard coded to be "out.gz", but it's a simple matter of programming to make this more flexible. The magic begins when we open the process to gzip. Here, the GZIP file handle will be written to. The source file is then opened for reading. I chose to read the source file in very tiny chunks to clearly see the progress indicator work. Here, 200 bytes are read from the source file and then fed to gzip. The number of bytes read is tracked and reported in a straightforward way.
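The same wrapper idea carries over to other languages. Here is a rough Python sketch: it reads the source file in chunks, writes each chunk to the child's standard input, and reports the percentage consumed. The function name, its parameters and the 64 KB default chunk size are my own choices; ["gzip", "-c"] would be the command to pass for the case described above.

```python
import os
import subprocess

def feed_with_progress(path, command, out_path,
                       chunk_size=64 * 1024, report=print):
    """Pipe the file at path through command, reporting percent consumed.

    The command's stdout is written to out_path. Returns bytes sent.
    """
    total = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=dst)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            proc.stdin.write(chunk)
            sent += len(chunk)  # count what was actually read
            report("progress: %6.2f%%" % (100.0 * sent / total))
        proc.stdin.close()  # signal end of input to the child
        proc.wait()
    return sent
```

A call such as feed_with_progress("big.log", ["gzip", "-c"], "out.gz") would mirror the Perl script, with the file names here being placeholders.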
Two penetrating glimpses into the obvious. One: this script is built
for some flavor of UNIX. Some modifications would be needed for Windows,
including the use of binmode(IN), binmode(GZIP).
Two: this is really just a specialized echo loop. While I'm not one to
yammer on about coding patterns, I would say that nearly 90% of the code I
write is some kind of echo loop, when you take away the business logic, error
checking and other distractions.
If you only learn one thing from a programming class, it should be the humble echo loop.
In my professional life, I'm often tasked with making a help system of some kind to help users understand the application I'm building. This is notoriously difficult to do. However, certain technologies have come along to make showing users what to do possible, even over dialup connections. One of those technologies is Flash, which combines lightweight vector graphics with mp3 support.
Unfortunately, the Flash studio app costs mucho dinero and I wouldn't use the tool enough to get the full value of my investment back. Also, what I want to do is capture stuff I'm doing on my desktop along with a voiceover of what's going on. This, to my knowledge, would require an additional package.
Fortunately, there have been a few open source projects that can manipulate the Flash file format to some degree. But what I need is to capture a Windows desktop session.
The solution comes in the python version of vnc2swf. This appears to be a platform-neutral solution, which is remarkable.
Very. Good. Hack.
I wish perl had done this first, but hats off to the python hackers that made this happen. I applaud the use of a scripting language I can hack. I particularly like the edit.py feature, which allows me to add an mp3 soundtrack. I have a lot of equipment and software to manipulate sound, including Sonar for the base recording and WAV editing and RazorLame for the mp3 encoding (which I did at 32kb sample rate monophonically). I did have to time shift the original VO track to 85% of the original. That's why the demo sounds like I just snorted crystal meth. I'm not sure if there's a better trick for syncing the video and audio.
Click on the picture for the full demo of my wonderful site.
A while ago, I wrote a script that presents a graphic representation of RSS feeds. It's mostly a toy, but kind of fun anyway. It looks at the links in the feed, fetches those pages and extracts images from them. It then creates a mosaic of these and turns that into an image map. As I said, it's a toy.
Yesterday, I noticed it wasn't working, so I hacked it a bit more. I also changed the way the images get packed, so that the pictures are more spread out (an idea I got from one of Jon Orwant's hacks he did for O'Reilly).
Enjoy!
For my long-term gaming project, State Secrets, I had to write this gem of SQL code:
SELECT c.*,length(session_id) as online FROM characters as c LEFT JOIN users as u ON u.current_char=c.id
LEFT JOIN sessions as s USING (username) WHERE u.username != "admin" ORDER BY c.savvy DESC
Even now, that double join makes my eyes bleed.
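For anyone who wants to poke at that double join, here is a self-contained Python sketch that runs a lightly adapted version of it against an in-memory SQLite database. The table layouts and sample rows are guesses made up for illustration (the real State Secrets schema is not shown here), and the query selects just the character name instead of c.* to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characters (id INTEGER PRIMARY KEY, name TEXT, savvy INTEGER);
CREATE TABLE users (username TEXT PRIMARY KEY, current_char INTEGER);
CREATE TABLE sessions (username TEXT, session_id TEXT);
INSERT INTO characters VALUES (1, 'Ace', 9), (2, 'Bolt', 5), (3, 'Root', 7);
INSERT INTO users VALUES ('alice', 1), ('bob', 2), ('admin', 3);
INSERT INTO sessions VALUES ('alice', 'abc123');
""")

# Characters, each with its owning user and (if any) that user's session.
rows = conn.execute("""
SELECT c.name, length(s.session_id) AS online
FROM characters AS c
LEFT JOIN users AS u ON u.current_char = c.id
LEFT JOIN sessions AS s USING (username)
WHERE u.username != 'admin'
ORDER BY c.savvy DESC
""").fetchall()

if __name__ == "__main__":
    for name, online in rows:
        print(name, "online" if online else "offline")
```

In this sketch the online column is NULL (None in Python) for characters whose user has no session row, which is what makes the second join a LEFT JOIN.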
About this blog
The taskboy blog is an exploration of computer technology by Joe Johnston. Topics of posts include practical examples of Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.




