Latest News

1st August 2008
Send to twitter Send to Facebook

It seems that a few of you are using or trying to use the XML::RSS::Podcast module I published on this blog. As was mentioned there, modern version of XML::RSS do not support the encode function required for generating well-formed XML documents.

That's all changed thanks to Rohan Carly. [Add your thanks to Rohan in the comments.]

I present the complete version of XML::RSS::Podcast below. If you find this module useful, I'll attempt to create a CPAN module for it.

package XML::RSS::Podcast;
use XML::RSS;
use HTML::Entities qw[encode_entities_numeric encode_entities];
@XML::RSS::Podcast::ISA = qw[XML::RSS];
our $VERSION = q[1.1];

# encode by Rohan Carly
# Stolen from XML::RSS::Private::Output::Base;
sub encode {
  my ($self,$text) = @_;
  return unless defined($text);
  
  my $encoded_text = '';
  
  while ($text =~ s,(.*?)(<!\[CDATA\[.*?\]\]>),,s) {
    # we use &named; entities here because it's HTML
    $encoded_text .= encode_entities($1) . $2;
  }
  
  # we use numeric entities here because it's XML
  $encoded_text .= encode_entities_numeric($text);
  
  return $encoded_text;
};

sub as_string {
  my $self = shift;
  return $self->as_podcast_rss;
}

sub as_podcast_rss {
  my $self = shift;
  my $enc = $self->{encoding};
  my $output = qq[<?xml version="1.0" encoding="$enc"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" 
     version="2.0">];
          
  $output .= $self->podcast_start_channel;
  
  for my $i (@{$self->{items}}) {
    $output .= $self->podcast_item($i);
  }

  $output .= $self->podcast_end_channel;
  return $output .= "\n</rss>\n";
}

sub podcast_start_channel {
  my $self = shift;
  my @fields = qw[ttl title description link language 
                  pubDate lastBuildDate creator 
                  webMaster copyright
                 ];
  my @image_fields  = qw[title url description link     
                         width height];
  my @itunes_fields = qw[subtitle author summary
                         image explicit]; # Thanks Rohan
  
  my $output = "<channel>\n";
  
  for my $f (@fields) {
    if (defined($self->{channel}->{$f})) {
      my $s = $self->encode($self->{channel}->{$f});
      $output .= "\t<$f>$s</$f>\n";
    }
  }
  
  my $seen_image = 0;
  for my $f (@image_fields) {
    if (defined($self->{image}->{$f})) {
      unless ($seen_image) {
        $output .= "\t<image>\n";
        $seen_image = 1;
      }
      my $s = $self->encode($self->{image}->{$f});
      $output .= "\t\t<$f>$s</$f>\n";
    }
  }
  
  if ($seen_image) {
    $output .= "\t</image>\n";
  }
  
  # Owner name/email not handled
  for my $f (@itunes_fields) {
    if (defined($self->{channel}->{itunes}->{$f})) {
      my $s=$self->encode($self->{channel}->{itunes}->{$f});
      $output .= "\t<itunes:$f>$s</itunes:$f>\n";
    }
  }
  # Rohan's sub-cat handling code
  # Expects an array: [category, sub-category]
  if (ref $self->{channel}->{itunes}->{category} eq 'ARRAY') {
    my $major = $self->encode(${$self->{channel}->{itunes}->{category}}[0]);
    my $minor = $self->encode(${$self->{channel}->{itunes}->{category}}[1]);
    $output .= qq[\t<itunes:category text="$major">\n];
    $output .= qq[\t\t<itunes:category text="$minor">\n];
    $output .= qq[\t\t</itunes:category>\n];
    $output .= qq[\t</itunes:category>\n];
  }

  return $output . "\n";
}
  
sub podcast_end_channel {
  return "</channel>\n";
}

sub podcast_item {
  my $self = shift;
  my $item = shift;
  
  my @fields = qw[title guid pubDate description link];
  my @itunes_fields = qw[author subtitle summary 
                         duration keywords explicit];
  
  my $output = "\t<item>\n";
  for my $f (@fields) {
                     
    if (defined($item->{$f})) {
      $s = $self->encode($item->{$f});
      my $perma = "";
      if ($f eq "guid") {
        $perma = qq[isPermaLink="false"];
      }
      $output .= "\t\t<$f$perma>$s</$f>\n";
    }
  }
  
  if (ref $item->{enclosure}) {
    $output .= "<enclosure";
    for my $f (qw[url length type]) {
      if (defined $item->{enclosure}->{$f}) {
        $output .= sprintf("$f=%s", $self->encode($item->{enclosure}->{$f}));
      }
    }
    $output .= "/>";
  }
  
  for my $f (@itunes_fields) {
    if (defined $item->{itunes}->{$f}) {
      $s = $self->encode($item->{itunes}->{$f});
      $output .= "\t\t<itunes:$f>$s</itunes:$f>\n";
    }
  }
  return $output .= "\t</item>\n";
}
1;

For an example of usage, please see the previous post cited above. Report bugs in the comments or through email. Thanks.

29th March 2002
Send to twitter Send to Facebook

Let's all say it together: XML::Parser Sucks! There, that was cleansing.

After a much prodding of my XML buddies (hi jmac!) and an evil notion of using goto (thankfully Perl doesn't let you jump into the middle of a function), I came across a seemingly little used XML::Parser function parse_start which returns a new XML::Parser::ExpatNB object (with oh so little documentation) that does EXACTLY WHAT I NEED! I need a parser that parses a stream in increments. Consider how useful this is for dealing with XML messages coming across the network that might f'ing HUGE! This parsing method will at least give me an opportunity to chunk the data into smaller bits (save for the pathological 45TB between a single [even then, there may be options]). Anyway, this is a BEAUTIFUL, LOVERLY THING!!!!

So, here's a very goofy example of how to work with this bod boy. I'll be looking to shove this into Frontier::RPC2 in a most Eee-VEIL way. ;-)

use strict;
use warnings;
use XML::Parser;

my $p = XML::Parser->new(
			 Style => 'My::Pkg',
			);

print "Reading from __DATA__\n";
my $data; # A place for my text data

# Don't be fooled: it's an 
# object constructor
my $nb_p = $p->parse_start(data => \$data);
	
while(my $l = <DATA>){
  chomp($l);
  $nb_p->parse_more($l);
  if(my $s = ${$nb_p->{data}}){
    print "Back at the range, I got $s\n";
  }
}
$nb_p->parse_done; 

package My::Pkg;

sub Init {
  my($expat) = @_;

  print "Hello!\n";
}

sub Start {
  my($expat, $tag, %attrs) = @_;
  ${$expat->{data}} = undef;
  print "Start: $tag\n";
}

sub Char {
  my($expat, $text) = @_;
  ${$expat->{data}} = undef;
  return if $text =~ /^\s*$/;

  $expat->{char_bag} = $text;
}

sub End {
  my($expat, $tag) = @_;
  print "End: $tag\n";
  ${$expat->{data}} = 
     $expat->{char_bag};

  # clean up
  $expat->{char_bag} = '';
  return;
}


__END__
<?xml version="1.0" ?>
<a>
  <b>
    <c>
         <d>fiddlesticks</d>
    </c>
  </b>
</a>

[Original use.perl.org post and comments.]

13th December 2005
Send to twitter Send to Facebook

UPDATE 2: Please see this post for the latest working version of this module.

UPDATE 1: This code works with XML::RSS version 1.05 or so. The newest versions of this library removed the encode() method for reasons beyond my reckoning. You can either use this code as a starting point for porting to the new XML::RSS module (and tell me how you did it!) or simply use the older version, which is still available on CPAN.

Like searching for Bigfoot, creating a podcast feed that's recognized by Apple can be an elusive, furtive and lonely process. Most know that podcasts are really just RSS 2.0 feeds with some extra tags. This seems like something perl should handle. The perl module XML::RSS nearly has everything necessary, but there's always a catch: the itunes namespace.

All is not lost, because XML::RSS is a class that can be inherited from. With a little overriding goodness, you too get make valid feeds that even Apple's iTunes music store will accept. Here's my module that inherits from XML::RSS. While it's not a complete solution, it works well enough for me.


package XML::RSS::Podcast;
use XML::RSS;
@XML::RSS::Podcast::ISA = qw[XML::RSS];


sub as_string {
  my $self = shift;
  return $self->as_podcast_rss;
}

sub as_podcast_rss {
  my $self = shift;
  my $enc = $self->{encoding};
  my $output = <<EOT;
<?xml version="1.0" encoding="$enc"?>
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" 
     version="2.0">

EOT

  $output .= $self->podcast_start_channel;

  for my $i (@{$self->{items}}) {
    $output .= $self->podcast_item($i);
  }

  $output .= $self->podcast_end_channel;
  return $output .= "n</rss>n";
}

sub podcast_start_channel {
  my $self = shift;
  my @fields = qw[ttl title description link language 
                  pubDate lastBuildDate creator 
                  webMaster copyright
                 ];
  my @image_fields  = qw[title url description link     
                         width height];
  my @itunes_fields = qw[subtitle author summary
                         image];

  my $output = "<channel>n";

  for my $f (@fields) {
    if (length($self->{channel}->{$f})) {
      my $s = $self->encode($self->{channel}->{$f});
      $output .= "t<$f>$s</$f>n";
    }
  }

  my $seen_image = 0;
  for my $f (@image_fields) {
    if (length($self->{image}->{$f})) {
      unless ($seen_image) {
        $output .= "t<image>n";
        $seen_image = 1;
      }
      my $s = $self->encode($self->{image}->{$f});
      $output .= "tt<$f>$s</$f>n";
    }
  }

  if ($seen_image) {
    $output .= "t</image>n";
  }

  # Owner name/email not handled
  for my $f (@itunes_fields) {
    if (length($self->{channel}->{itunes}->{$f})) {
      my $s=$self->encode($self->{channel}->{itunes}->{$f});
      $output .= "t<itunes:$f>$s</itunes:$f>n";
    }
  }

  # FIXME: Doesn't handle sub cats.
  if (ref $self->{channel}->{itunes}->{category}) {
    for my $c (@{$self->{channel}->{itunes}->{category}}) {
      my $s = $self->encode($c);
      $output .= qq[t<itunes:category text="$s" />n];
    }
  }

  return $output . "n";
}

sub podcast_end_channel {
  return "</channel>n";
}

sub podcast_item {
  my $self = shift;
  my $item = shift;

  my @fields = qw[title guid pubDate description];
  my @itunes_fields = qw[author subtitle summary 
                         duration keywords explicit];

  my $output = "t<item>n";

  for my $f (@fields) {
    if (defined $item->{$f}) {
      $s = $self->encode($item->{$f});
      $output .= "tt<$f>$s</$f>n";
    }
  }

  if (ref $item->{enclosure}) {
    $output .= "<enclosure";
    for my $f (qw[url length type]) {
      if (defined $item->{enclosure}->{$f}) {
        $output .= qq[ $f="$item->{enclosure}->{$f}"];
      }
    }
    $output .= "/>";
  }

  for my $f (@itunes_fields) {
    if (defined $item->{itunes}->{$f}) {
      $s = $self->encode($item->{itunes}->{$f});
      $output .= "tt<itunes:$f>$s</itunes:$f>n";
    }
  }

  return $output .= "t</item>n";
}

A word about the RFC822 pubDate. This seemingly arbitrary date format can be easily generated with a call to strftime(). The format string is "%a, %e %b %Y %H:%M:%S %z". You might think that you can use mysql's DATE_FORMAT() to replicate this, but you'd be wrong. Instead, generate mysql queries with UNIX_TIMESTAMP(), feed the result of that to localtime() and feed that to strftime(). Simple, no? No, but such are the challenges in programming.

Here's a sample of how I use this for pseudocertainty.com.

use strict;
use DBI;
use POSIX qw[strftime];
use MP3::Info;

my $rssfile = shift || "./ps-pod.rss";

my $dbh=DBI->connect("dbi:mysql:pseudo", "pwrUser", "s3cr3t") 
        or die "connect: $DBI::errstr";

my $shows    = get_shows($dbh);
$dbh->disconnect;

my $rss = XML::RSS::Podcast->new(version => "2.0");
my $rfc822_fmt = '%a, %e %b %Y %H:%M:%S %z';

my $iMeta = { "author" => "Joe Johnston and Mike Lord",
              "summary" => 'UFOlogy, Cryptozology and 
                            the people who love them are 
                            discussed on this 
                            internet-only radio show',
              "subtitle" => "Don't be Certain.  
                             Be PseudoCertain.",
              "category" => ["Talk Radio"],
                          };

$rss->channel(title => 'PseudoCertainty',
              "ttl" => 60, # time to live
              link  => 'http://www.pseudocertainty.com/',
              language => 'en-us',
              description => 'UFOlogy, Cryptozology and the 
                             people who love them are discussed 
                             on this internet-only radio show',
              copyright => "Copyright Joe Johnston and Mike Lord",
              webMaster => "jjohn@pseudocertainty.com",
              pubDate => strftime($rfc822_fmt,localtime()),
              "itunes" => $iMeta,
              );

for my $r (@$shows) {
    # no more than 30 words
    my @words = map { s/(<[^>]+>)//g; $_; } 
                  split /s+/, $r->{about};
    my $desc = "not set";
    if (@words > 30) {
        $desc = join " ", @words[0..29], "...";
    } else {
        $desc = join " ", @words;
    }

    my $finfo = get_mp3info("/path/to/shows/$r->{mp3_filename}");
    my $pl = qq[http://pseudocertainty.com/$r->{mp3_filename}];
    my $enc = { url => $pl,
                length => -s "/path/to/shows/$r->{mp3_filename}",
                type => "audio/mpeg",
              };
    my $itunes = {
                  explicit => "N",
                  keywords => "UFO aliens zorknapp",
                  summary  => $desc,
                  duration => $finfo->{TIME},
                                }
    $rss->add_item( title => $r->{title},
                    link  => $pl,
                    pubDate => strftime($rfc822_fmt, 
                               localtime($r->{pretty_created})),
                    enclosure => $enc,
                    permaLink => $pl,
                    description => $desc,
                    "itunes" => $itunes,
                   );
}

$rss->save($rssfile);

#------------------------
# subs
#-------------------------
sub get_shows {
    my ($dbh) = shift;
    my $sql = qq[
  SELECT *,UNIX_TIMESTAMP(created) as pretty_created 
    FROM shows WHERE publish = 1 ORDER BY created DESC;
                 ];

    my $sth = $dbh->prepare($sql);
    die "get_shows: '$sql': " . $sth->errstr unless $sth->execute;
    return $sth->fetchall_arrayref({});
}

And they called me a fool for wanting to make XML::RSS spew podcast RSS. But I showed them! I showed them all! Bwahahhaha!

About this blog

The taskboy blog is a exploration of computer technology by Joe Johnston. Topics of posts include practical examples Perl, PHP, Python and Java as well as book reviews, industry insights and miscellaneous good stuff.

Current Status

Watching _Brass Latern_. Ah IF, your coyness is your charm.

Posted: Sun Sep 05 16:02:15 +0000 2010