Posts tagged: perl

Reading Schedule Generator

Or: some stuff I worked on over break.

This is a small solution to (some would say) an even smaller problem: I used to ignominiously return unread books to the library, and face new issues of magazines after having only read a few items in the previous one. My to-be-read pile was only getting taller.

I found I was able to read more “reliably” if I had a specific daily page goal for each item. Somehow having an “official” schedule helped me overcome my affliction.

I’ve been doing this for a few years, and have written code to help out. (Certainly there must be a name for the psychological quirk that compels me to “write code to help out”. Let me know if you come across it.)  What follows is my current “solution”: a CGI-backed web form that accepts the schedule parameters for a single reading item (book or magazine) and generates a small HTML calendar schedule that I can print from a web browser, cut out, and use as a bookmark for the item.

This seems conceptually easy, if not trivial, so I imagine similar code is “out there somewhere”, but cursory Googling didn’t turn up anything.

Anyway: the finished product appears like this (specific example is the book Freedom™ by Daniel Suarez): (Not that I recommend this book: it’s pretty bad so far.)

Yes, cut on the dotted line. So my goal for today (January 6) is to get up to (or past) page 127. And I should be finished in a couple weeks.

Now I’m not psycho about this: it’s OK to get ahead of the goals if the material is compelling and I have the time. It’s also OK to fall behind if there’s just too much other stuff going on. (However, in my experience, just knowing that I’ve “fallen behind” is just enough motivation to carve out some extra time to catch up later.)

Enough for the mechanics, on to the code. For a long time I ran a Perl script from the command line to generate an HTML schedule as a static file, which I would then open/print in a browser. But over break I decided that it would be (slightly) more flexible to do the whole thing from the browser, using an HTML form to get the schedule parameters, calling a CGI backend to display the result.

The form is here (hosted on my workstation). What it looks like:

It’s a very rudimentary form, with few bells or whistles. Well, OK, one bell: the JavaScript jQuery UI datepicker widget for getting the schedule’s start and end dates. I didn’t know nothin’ about jQuery before I did this. (And I only know slightly more now; if you examine the source code for the form, it’s not very complicated.)

So you fill out the form. Using our example (and showing the datepicker in action):

… hit the submit button and the resulting page should produce the appropriate schedule. (I’m pretty sure it would work for you if you want to try it.)

The real work is performed by the Perl CGI script, which relies heavily on the smarts contained in the CGI, Date::Manip, and HTML::Template modules. (What’s left: some date arithmetic that’s only tricky if you’ve done it wrong a few times. I think it handles edge cases and DST changes correctly, although I haven’t checked lately.)  If you’d like to look at the script, the plain-Perl file is here, and the html-prettyprinted version is here. The HTML template used by the script is also important and that’s here.
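For readers who just want the flavor of the date arithmetic, here is a minimal sketch. It is not the script linked above: it swaps Date::Manip for the core Time::Piece module, and the helper name reading_schedule and its argument order are made up for illustration. It spreads the pages evenly over the days (inclusive) and rounds each day's goal up.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds qw(ONE_DAY);

# Hypothetical helper (not from the original script): returns a list of
# [date, page] pairs, one per day, where "page" is the page to reach by
# the end of that day.
sub reading_schedule {
    my ($start_date, $end_date, $first_page, $last_page) = @_;

    # strptime() yields UTC-based objects, so stepping by whole days
    # is not disturbed by local DST changes.
    my $start = Time::Piece->strptime($start_date, '%Y-%m-%d');
    my $end   = Time::Piece->strptime($end_date,   '%Y-%m-%d');

    my $days = int(($end->epoch - $start->epoch) / ONE_DAY) + 1;  # inclusive
    die "end date precedes start date\n" if $days < 1;

    my $pages = $last_page - $first_page + 1;

    my @schedule;
    for my $i (1 .. $days) {
        # Cumulative goal, rounded up so the last day lands on $last_page.
        my $goal = $first_page - 1 + int(($pages * $i + $days - 1) / $days);
        push @schedule, [ ($start + ONE_DAY * ($i - 1))->ymd, $goal ];
    }
    return @schedule;
}

# Example: a 100-page item read over ten days.
printf "%s  read to page %d\n", @$_
    for reading_schedule('2011-01-01', '2011-01-10', 1, 100);
```

Working in UTC sidesteps the DST edge cases mentioned above; a real script that cares about local midnight would need more care.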

Larry Wall on Software Patents

At about 1:30 into the video…

Reports from YAPC::NA 2013

Here I am again at YAPC (Yet Another Perl Conference), which I pledge to blog, as always, to the extent allowed by my attention span. I arrived here in Austin, Texas yesterday. The birds here are loud and aggressive, and the weather is nice.

YAPC::NA 2013

If you’d prefer not to read my drivel and instead eavesdrop on the conference itself, check out the live feeds.

Day 1

The keynote speaker this morning is reviewing 25 years of Perl, the last 15 of which I’ve been along for the ride. Time flies. I am currently forgiving myself for writing my own web framework in Perl, being reminded that Catalyst (probably the leading Perl framework today) was only three years old and not particularly mature when I was looking for such a solution.

First focused talk: Bruce Gray, “Exception to Rule”, about… yup, exceptions, in lieu of returned errors, for error handling. He’s talking about the frustration of a module throwing ‘die’, and of course, the utility of ‘eval’. The bottom line, however, is that the Perl 5 core still does not boast a consistent error handling approach.

“It is almost always better to die than to give the wrong information.” (which includes silent failure)

Another recommendation for autodie (wraps all core IO keywords so that they throw exceptions on failure rather than whatever they do natively) as well as Try::Tiny (for more advanced handling).
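To make the autodie point concrete, here is a small sketch using only core modules (plain eval rather than Try::Tiny). The file path is made up and assumed not to exist; nothing here comes from the talk's slides.

```perl
use strict;
use warnings;
use autodie;    # open(), close(), etc. now throw an exception on failure

# Hypothetical path, assumed not to exist on this machine.
my $path = '/no/such/dir/input.txt';

my $ok = eval {
    open(my $fh, '<', $path);   # no "or die ..." needed under autodie
    close($fh);
    1;
};
my $err = $@;                   # copy $@ right away, before it can be clobbered

if (!$ok) {
    # Under autodie the exception is an autodie::exception object
    # that stringifies to a readable message.
    print "open failed: $err\n";
}
```

Copying $@ immediately is one of the chores Try::Tiny takes care of for you.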

When writing a module however, it should be the user’s (higher-level programmer’s) choice as to what style of error handling to use, and for this Damian Conway will be revealing a Lexical::Failure module which module authors will be able to use to offer this choice to users. Apparently this will be introduced at OSCON.

Next up, John Anderson on how to Automate Yo Self. First he offers a common suggestion to manage your home directory and shell config with source control. He’s also written a few tools such as App::GitGot, for help managing multiple git repositories. These are some fairly specific tools which I won’t list out here. Lastly he recommended tuning your text editor to save you time.

After lunch, I am listening to Curtis Poe on his new module Test::Class::Moose. I’m not a Moose user yet (translation: I don’t always use objects in Perl, but when I do, I don’t use Moose. Stay thirsty, my procedural friends.). However, I am very likely to encounter Moose in the future, so this might come in handy. People have been using Test::Class to test Moose, but this new module irons out a lot of the cruft and edge cases when just using Test::Class.

Now, Bill Humphries on Perl Meets Modern Web UI. He’s demonstrating the idea of a ‘single page website’, which means, after an initial full page load, the rest of the calls are AJAX. I’ve never really done this… I use AJAX opportunistically when it makes sense… and to be honest I don’t think the pros of ‘single page website’ architecture quite outweigh the cons (I realize I am omitting most of both here since I can’t type that fast). The one compelling reason he did provide was for situations where you want to offload much of the work to the client, because your server (example: cPanel in shared hosting environments) has limited resources and needs to use them sparingly.

Time to finish off the day with Larry Wall’s keynote, which he has entitled “Stranger Than Fact”. Good quote: “We’re not only stranger than we imagine. We’re stranger than we can imagine.” Larry is the creator of Perl.

One thing I didn’t imagine was that Larry would announce he has prostate cancer. Here’s hoping he makes it out of this ok. There is much love in the room for this man. This is the cult of TIMTOWTDI (“There Is More Than One Way To Do It”), the cult of personal freedom in computing, and he is inarguably our leader. He spoke of his cancer, of programming language design, of the emerging codes of conduct at tech conferences. None have articulated my feelings about this latter better than Larry, who contrasted Law (the codes) with Grace. Have a big soul, he says (I paraphrase)… big enough that they can grind away as much as they want and you’ve still got more. That’s Grace.

I’ve rarely, if ever, encountered a person with such interdisciplinary reach in his worldview, someone not only with a deep technical mastery, but the ability to connect this with philosophy, cosmology, spirituality. Pfft, you say, I’ve seen that in a TED talk. No, you haven’t. Larry pulls this off with a down to earth humility that’ll make you feel both unworthy and totally welcome at the same time. All I can say is, long live the King! Long live Larry Wall and Perl.

Okay… there were also lightning talks after Larry. I’m no lightning typist, so, I give you this single highlight:

“Sufficiently encapsulated ugly is indistinguishable from beautiful.” -Matt Trout

Day 2

To start off the second day, I am in a session called Hack Your Mac With Perl by Walt Mankowski. First we are covering the OS X concept of ‘services’, which govern inter-application communication. These are not to be confused with Windows services– different mechanism entirely. First, download an app called ThisService which aids in the service creation. I can see myself using this. In the end, you get: a Perl script that is key-bindable to process highlighted text. Nice.

Getting rid of services is tricky; try something called ‘Service Scrubber’ for this.

Next he is talking about ‘FSEvents’. This is what tracks changing files/directories for apps like Spotlight and Time Machine. For this we use a module called Mac::FSEvents. Triggering a Perl script when something in the filesystem changes (an ‘upload this’ directory, for instance); I could see myself using this someday too. Nice talk.

Now it’s time for Tim Bunce’s talk about Profiling Memory Usage (in Perl). This is a highly technical session. I’m no expert on Perl internals but I like to hang around and pretend I know what’s up. Tim has written a module called Devel::SizeMe. Wow. He’s actually doing graph visualizations of nested data structures and the memory allocated for each element of each array, hash key/value pairs, etc. Also subroutines! Dizzying levels of detail here. It’s amazing to visualize the amount of memory/pointers that are set up by the interpreter even for a Perl process that does exactly nothing.

Now Liz is presenting an overview of offshoot projects across Perl history entitled Perl’s Diaspora. This covers Parrot, Perl 6, Rakudo, various VM projects targeting both Perl 5 and Perl 6, etc. Most of this is not worth recounting here despite being a great summary of the many and varied Herculean efforts by some of the smartest members of this community. If you have an interest in this stuff, you have probably already been following it.

Next up: Unicode Best Practices, a talk from Nick Patch. Unicode is notoriously difficult to work with, but you’ll have to if your applications need to be internationalized. Perl has some of the best Unicode support among programming languages. First off, put use utf8 at the top of your program, which will indicate that your source code itself will use the utf8 encoding. This frees you from having to use escape sequences and external reference tables. This is perhaps how I ought to approach a problem we have at UNH with our SOAP services; XML loves to barf on invalid characters and much of the data we’re shucking is user inputted.

Note: with utf8 on, don’t use \d in regular expressions, use [0-9] if you’re really only looking to match on standard ‘western’ digits. Because yes, even digits look different in many languages.

The properties matcher, \p, or non-matcher, \P, can be used to match or not-match ASCII like so: \p{ASCII}. Super useful, there. \p{L} stands for letter, and there are a number of incredibly useful properties matches (such as for currency symbols etc.).
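A quick sketch of those matchers in action. The sample strings are my own, and they are written with \x{...} escapes only so that the file stays pure ASCII; with use utf8 you could type the characters directly.

```perl
use strict;
use warnings;

# Sample data (made up for illustration).
my $arabic_digits = "\x{0664}\x{0662}";   # ARABIC-INDIC DIGIT FOUR, DIGIT TWO
my $word          = "na\x{00EF}ve";       # "naive" with a diaeresis on the i
my $euro          = "\x{20AC}";           # EURO SIGN

my %result = (
    'non-western digits match \d'     => ($arabic_digits =~ /\A\d+\z/)       ? 1 : 0,
    'non-western digits match [0-9]'  => ($arabic_digits =~ /\A[0-9]+\z/)    ? 1 : 0,
    'accented word is all \p{L}'      => ($word =~ /\A\p{L}+\z/)             ? 1 : 0,
    'accented word is all \p{ASCII}'  => ($word =~ /\A\p{ASCII}+\z/)         ? 1 : 0,
    'accented word has a \P{ASCII}'   => ($word =~ /\P{ASCII}/)              ? 1 : 0,
    'euro sign is \p{Sc} (currency)'  => ($euro =~ /\A\p{Sc}\z/)             ? 1 : 0,
);

printf "%-34s %s\n", $_, $result{$_} ? 'yes' : 'no' for sort keys %result;
```

The first two rows are the reason for the [0-9] advice: \d happily accepts digits you may not have planned to handle.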

Although internationalization is obviously a challenging aspect of programming, I really hope I get to tackle it at some point. Breaking down language barriers and unwinding that whole Tower of Babel (Babble? heh.) thing just seems like a noble application of labor. Babble on! (Babylon?)

After my first (I know! I’m sheltered…) meal at an Ethiopian restaurant (yummy) with a couple fellow hackers, I am now at Auditing Open Source Perl Code for Security by John Lightsey. Not seeing any code yet–so far, this is high-level “how to plan a security audit”. Now he has moved on to, once you have discovered a security vulnerability in a piece of open source code, what are the various options for disclosure (or less ethical options involving non-disclosure). So, this is more of a community management talk than a technical talk.

Wait, no… now he’s showing some of the vulnerability reports he’s given to CPAN module authors, some fixed, some not. Glad to see I’m not using any of the unpatched modules.

Now Karen Pauley will give an update on what The Perl Foundation has been up to in the past year. This was mostly a review of grant applications and money distribution, which I won’t recount, but various Perl efforts are always open to donations. Considering the extensive use of Perl in our systems, I would love to see UNH consider throwing a few bucks in this direction.

Onward we go… now listening to the ever-entertaining Matt Trout on Architecture Automation, One Alligator at Once. The thrust of this talk is how, when hired to consult on a codebase that needs an overhaul, and especially when all knowledge of said codebase has left the building, what investigations should you apply to learn about what in heck you’re dealing with? Dist::Surveyor is a notable mention here, which will tell you as best it can what your entire module dependency list is.

Finally, today’s keynote from Stevan Little is Perl – The Detroit of Scripting Languages. His main point is that when the Perl 6 design and implementation process began in 2000, Perl 5 development stalled. This is largely due to a not-entirely-wrongheaded commitment to preserving backward compatibility. But that can make it pretty tough to move forward, too.

Day 3

I’m starting off my third conference day with A Date With Perl (great title), a talk from Dave Rolsky who maintains the DateTime suite of modules. I saw Dave speak at YAPC::NA 2007 on this topic as well; the guy deserves a Presidential Medal of Honor for doing this work. When you realize that not only are there leap years, but leap seconds, and when you are told that there is an ‘EST’ time zone in both North America and Australia, and knowing that time zone and daylight savings time changes are at the discretion of politicians… you start to get an appreciation for Dave Rolsky. Don’t try to do datetime calculations by yourself, EVER, use a library like DateTime.pm, so you can get the benefit of all the hard work that’s been done for you.

And by the way, date and time related edge cases listed above are just the tip of the iceberg. There are hundreds if not thousands of weird exceptions to the rules that govern when. Don’t go it alone.

Dave’s jokes are hilarious; too bad it’s so early for a lot of the people here. These deserve more laughs.

Side note: I am sure a lot of people at this conference hold degrees and advanced degrees in computer science, but I haven’t actually spoken to one yet. Dave was a music major, I’ve been chumming around with a guy (about twice as smart as me) who never went to college, met another guy who majored in theatre like I did, etc. I love this field. I do work for a University, but clearly even UNH acknowledges it needs more IT help than it can find in credentialed applicants.

Next up: Unit-test CGI Scripts with mod_perl2 via Plack by Nathan Gray. I sorely need to understand Plack better, as it likely has a future in my stack. Unfortunately this talk is addressing using Plack for testing existing CGI scripts, and is not enlightening me in the ways that I need.

Now Sawyer X is speaking on Asynchronous Programming FTW (“For The Win”). This is a talk about event loops, although forking and threading are similar options for parallel processing. Sawyer X prefers the AnyEvent module (there are many choices) for handling event loops, due to its slim interface (as opposed to POE, which requires more lines of code to do the same thing). Hmmm… there is also an AnyEvent::XMLRPC which I could possibly use. I’ve never really considered that my XMLRPC services could be blocking, but of course they are. They’re just so darn fast in general that I haven’t seen the need to optimize them. And I still won’t…. but I might add some benchmarking code to the services themselves to see if they ever block for as long as a second or half-second, in which case, AnyEvent::XMLRPC could come to the rescue for me. Because let’s face it: calls to a service from different clients are going to be asynchronous by definition.

The next session for me is Inside Bokete: Tips of making web applications with Mojolicious and other components by Yusuke Wada who has traveled here from Yokohama, Japan. Mojolicious has views and controllers, but no object model (didn’t know this. My own web framework is similar in this way). He prefers DBIx::Skinny over DBIx::Class, since in Mojo you get to choose if and how to add your ORM (Object-Relational Mapper). He also uses Carton, which is still labeled as experimental, but many of us are so desperate for a method of pinning down CPAN module versions in our applications that we just might experiment. He also uses an interesting deploy-from-git module that is very like the work I’ve been doing to deploy from Subversion. Good talk, funny guy. You have to respect all the people here for whom English is not a first language. Coding is challenging enough, imagine if all the keywords, documentation, etc. were not in English.

After lunch, Ricardo Signes, the current Perl 5 pumpking, is bringing us up to date on the language; his talk is entitled Perl 5: Postcards from the Edge. Something soon to come in a Perl 5 release sounds like a great idea: lexical subroutines. In other words, you can do my sub foo {} or our sub foo {} and scope the availability of your subroutines. Forgive me if I’ve gotten this wrong, because I’m no Java guy, but I believe this is roughly the feature you get with public vs. private vs. protected etc., when you are defining a Java class. Please feel free to correct me in the comments because it’s likely these things are not 100% analogous.
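Here is a small sketch of what a lexical subroutine looks like in practice. It assumes Perl 5.26 or later, where the feature is no longer experimental (earlier releases from 5.18 on need the lexical_subs feature switched on), and the names area and checked are invented for the example. The effect is closer to a private helper scoped to a block than to Java's access modifiers.

```perl
use v5.26;      # lexical subs are stable (non-experimental) from 5.26 on
use warnings;

sub area {
    my ($w, $h) = @_;

    # Hypothetical helper: visible only inside area(), like a "my" variable.
    my sub checked {
        my ($n) = @_;
        die "negative dimension\n" if $n < 0;
        return $n;
    }

    return checked($w) * checked($h);
}

say area(3, 4);

# Outside area() there is no package-level checked() to call.
say defined(&main::checked) ? 'checked is visible' : 'checked is private to area()';
```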

Now Joe Axford is giving his Notes From A Newbie. It’s always impressive when a newbie looks to be about 20 years my senior… learning is a lifelong pursuit! What a positive and energetic dude. I aspire.

Now I’ve somewhat accidentally landed in Perl 6 Debugger Highlights, from Jonathan Worthington, which will likely be several meters over my head. But that’s ok. Just by looking up to try and see these things, I tend to get a little smarter. [... much interactive debugging on Perl 6 with jokes I wish I understood from Larry and Patrick in between ...] These guys are smoking my neurons. Yowza.

Day 4 and Day 5

I will not be blogging days 4 and 5, as I am in a class entitled Web and Mobile Development Using Perl, HTML5, CSS3, and JavaScript, taught by Gabor Szabo.

Unit Testing is a Little Bit Like Flossing

You know you should floss, but you don’t have time to do it right now.  Sure, you floss when you have an annoyance, like that piece of pulled pork caught between your molars.  And you floss for a couple of days, after your dental hygienist has given you ‘the lecture’.  But in the end, you don’t have time to do it right now.  Maybe later.

Writing unit tests is a little bit like flossing.  You know it’s a good thing to do.  It may not feel like fun while you’re doing it, but you’re almost always glad you did it when you’re done.  Like flossing, if you’re going to do it, and do it consistently, you need to try and establish a habit.  And there’s no time like now to get started.

In the directory where you stash your modules, create a subdirectory named “t”.  Yes, you’ve seen this directory name before when installing a module from CPAN.  Don’t panic!  You’re not going to have to build a complete distribution.  You’re just creating a place to stash your tests that is both convenient and follows the conventions used by the Perl prove utility.  Now, the next time you create some new function or method in one of your modules, create a unit test for that function in your “t” directory.  For example, let’s say I’m creating a new utility function called mySum() in the module MyMod.  I’ll also create a new unit test script named mySum.t.  As I write the function in the module, I’ll also start writing the unit tests in the test script.  Writing some logic to handle an edge case?  Write a test to probe that edge.  Test it.  Right now!  Now code up the next hard part, and test that.  Here’s a simple example you can use as a starting template:

#!/usr/bin/env perl
#
#       File: t/mySum.t
#      Usage: prove t/mySum.t   (from the module directory)
#   Abstract: Test my wonderful MyMod::mySum() service.

use warnings FATAL => qw(all);   # Make all warnings fatal.
use strict;                      # Keep things squeaky clean.
use Test::More;                  # Lots of testing goodies.
use File::Basename;              # Easy basic filespec parsing.
use FindBin qw($Bin);            # Directory holding this test script.
use lib "$Bin/..";               # Where to find our own modules.

my $ownName = basename($0); # For error reporting.

die("? $ownName: no command line args expected\n_ ", join(' ', @ARGV), "\n_")
 if (scalar(@ARGV) > 0);

use_ok('MyMod') or exit(1); # The module we're testing.

is( MyMod::mySum(),                 0,         "sum of nothing is zero"       );
is( MyMod::mySum(undef(), undef()), 0,         "sum of two nothings is zero"  );
is( MyMod::mySum(1,2,3,'fred'),     undef(),   "fred is not a number"         );
is( MyMod::mySum(2,2),              4,         "two small positive integers"  );

my $x = 2;

is( MyMod::mySum($x,$x),            4,         "two small positive integers"  );
is( MyMod::mySum($x,\$x),           undef(),   "can't sum a scalar reference" );

# etc.

done_testing();

# EOF: mySum.t

The example above uses the is() test function which, along with ok(), covers a lot of the sort of simple test cases you’ll want to do.  But there are lots of other very useful test functions beyond these, so be sure to check out the Test::More documentation for more details.

To run your test, from your module directory, you can do either:

$ t/mySum.t
ok 1 - use MyMod;
ok 2 - sum of nothing is zero
ok 3 - sum of two nothings is zero
ok 4 - fred is not a number
ok 5 - two small positive integers
ok 6 - two small positive integers
ok 7 - can't sum a scalar reference
1..7
$

…or….

$ prove t/mySum.t
t/mySum.t .. ok
All tests successful.
Files=1, Tests=7, 0 wallclock secs ( 0.02 usr 0.00 sys + 0.02 cusr 0.00 CPU)
Result: PASS
$

As your test library grows, you can just enter the prove command with no parameters and it will automatically run all of the *.t files it finds in your test directory.

So, don’t worry about back-filling all of those unit tests you should have written.  But do start getting into the habit of creating unit tests, function by function, as you create new code, particularly for those utility functions that have very narrow, well defined jobs.  For your convenience, here is the toy module that was used with the above unit test.

package MyMod;

# Simple demo module with simple demo function.

use warnings FATAL => qw(all);
use strict;
use Scalar::Util qw(looks_like_number);

sub mySum

#  Precondition: Parameters consist of zero or more numeric values.
#
# Postcondition: The sum of all the numeric values is returned as our
#                functional value.  Undefined parameters are silently
#                ignored.  We return undefined if one or more values
#                do not appear to be numeric (i.e. a usage error).  We
#                return 0 if there are no defined parameters.
#
#     Unit Test: t/mySum.t

{
  my $sum = 0;
  foreach my $x (@_)
    {
      next if (not defined($x));
      return() if (not looks_like_number($x));
      $sum += $x;
    }

  return($sum);
}

1;  # So endith your typical Perl module.

Happy Birthday, Perl

The Perl programming language turns 25 today. Little did I know when I was 12 years old that I would learn a language being birthed at that moment, and pursue a career in speaking it.

Here is a tribute to / history of Perl’s first 25 years.

Long live!

Perl pet peeves: named parameters for methods

A colleague was creating an app that used the Net::LDAP module, a module I have been using myself for a very long time now. In the code there was the statement:

my $msg = $ldap->search(bind_dn => $base_dn, filter => $filter);

…that was returning a “Bad filter” error. Turns out there was no problem with the filter text, but there was with the search statement—the parameter label bind_dn is not a valid keyword for this module.

I did something similar a long time ago, using a parameter label for the bind() method that actually belonged to new(). The parameter and value were accepted without comment. It didn’t cause an error, it was just silently ignored since it was totally unexpected.

As I’m sure you know, when passing named parameters to a method, what you are really passing is a hash. So:

filter => $filter

…is really a hash entry with ‘filter’ as the key getting assigned the value from the scalar $filter. Now what you would like is for the method to throw an error if you misspell a key, but for the Net::LDAP module, and some other CPAN code I’ve seen, no check is done to see if an unexpected key is defined. The justification for this is the possible use of inheritance. If you pass a “foo” parameter to an object, it may not know what to do with “foo” but one of the other methods inherited by that object may. That’s why you’ll see things like…

my($self, %opts) = @_;

and then later…

$self->someOtherMethod($this, $that, %opts);

As for my own code, I’m still working on coming up with a good way to use named parameters with methods, where each method that gets a set of named parameters has to ‘claim’ the ones it is using, so that in the end, if there are any unclaimed named parameters, an exception is thrown. I haven’t come up with what I think is a clean solution, but I’m still thinking about it.
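For the simple, no-inheritance case, the check itself is short. The sketch below is not Net::LDAP code and not the ‘claiming’ scheme described above; check_named_args and my_search are invented names, and the list of allowed keys is an assumption for the example.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: croaks if %opts contains a key not listed in $allowed.
sub check_named_args {
    my ($allowed, %opts) = @_;
    my %ok      = map { $_ => 1 } @$allowed;
    my @unknown = grep { !$ok{$_} } sort keys %opts;
    croak "unknown named parameter(s): @unknown" if @unknown;
    return %opts;
}

# Hypothetical method that accepts only the keys it knows about.
sub my_search {
    my ($self, @args) = @_;
    my %opts = check_named_args([qw(base filter scope)], @args);
    return "searching $opts{base} with $opts{filter}";
}

# A misspelled key is now reported instead of being silently ignored.
my $result = eval { my_search(undef, bind_dn => 'dc=example', filter => '(uid=x)') };
print "caught: $@" unless defined $result;
```

This deliberately ignores the hard part: once %opts is handed on to other methods, each one would need to report which keys it claimed.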

I hope this is the sort of thing Moose fixes.

Unslow Regexps

Our mail logs accumulate a few million lines per weekday. Some of them contain information on whether SpamAssassin considered a given message to be spam:

Jun 11 23:58:15 sunapee MailScanner[7997]: Message q5C3vVli019865 from 71.243.115.147 (info@softwareevals.org) to unh.edu is spam, SpamAssassin (not cached, score=15.565, required 5, BAYES_00 -1.90, DIGEST_MULTIPLE 0.29, HELO_DYNAMIC_IPADDR 1.95, HTML_MESSAGE 0.00, KHOP_DNSBL_BUMP 2.00, MIME_HTML_ONLY 0.72, PYZOR_CHECK 1.39, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.89, RAZOR2_CHECK 0.92, RCVD_IN_HOSTKARMA_BL 1.70, URIBL_BLACK 1.73, URIBL_JP_SURBL 1.25, URIBL_RHS_DOB 1.51, URIBL_WS_SURBL 1.61)

or “ham” (not spam):

Jun 11 23:59:54 sunapee MailScanner[7634]: Message q5C3xp77020291 from 208.117.48.80 (bounces+54769-353f-cfg6=cisunix.unh.edu@email.news.spotifymail.com) to cisunix.unh.edu is not spam, SpamAssassin (not cached, score=-2, required 5, autolearn=not spam, BAYES_00 -1.90, HTML_MESSAGE 0.00, RCVD_IN_HOSTKARMA_NO -0.09, SPF_PASS -0.00, T_RP_MATCHES_RCVD -0.01)

(Lines wrapped for clarity.)

Perl scripts scan through the logs and produce plots of ham/spam traffic, typical example here. Such scripts need to (for lines like the above) extract the date/time information (always the first 15 characters) and whether the line says “is spam” or “is not spam”. My initial approach (years ago) was very simple-minded. Old code snippet:

while (<>) {
    if (/is spam/ || /is not spam/) {
        $date = ParseDate(substr($_, 0, 15));
        $dt = UnixDate($date, "%Y-%m-%d %H:00");
        if (/is spam/) {
             $spamcount{$dt}++;
        }
        else {
             $hamcount{$dt}++;
        }
    }
}

(ParseDate and UnixDate are routines from the Date::Manip package. Without getting bogged down in details, they allow the messages to be counted in hour-sized buckets for the plot linked above.)

For not-particularly-important reasons, I decided to try an “all at once” regular expression on each line instead. New code snippet:

use Readonly;
use Regexp::Common qw/net/;

Readonly my $SPAMLINE_RE => qr {
    ^
    (\w{3} \s+ \d+ \s \d{2}:\d{2}:\d{2}) \s
    \w+ \s MailScanner \[\d+\]: \s
    Message \s \w+ \s
    from \s ($RE{net}{IPv4}) \s
    \([^)]*\) \s to \s \S+ \s
    (is \s (?:not \s)? spam)
}x;

while (<>) {
    if ( $_ =~ $SPAMLINE_RE ) {
        my $d        = $1;
        my $spamflag = $3;
        my $date     = ParseDate($d);
        my $dt       = UnixDate( $date, '%Y-%m-%d %H:00' );
        if ( $spamflag eq 'is spam' ) {
            $spamcount{$dt}++;
        }
        else {
            $hamcount{$dt}++;
        }
    }
}

There’s plenty to criticize about both snippets, but mainly I was thinking: running the second version’s huge, hairy regular expression on each log line will just have to be way slower than the simple fixed-string searching in the first version. (Even though there can be up to three searches per line in the first version, they’re searches for fixed strings, right? Right?)

Wrong, as it turns out. The hairy second version ran slightly faster than the simple-minded first version. (In addition it extracts the IP address of the server that sent us the message; not used here, but important in other scripts.)

Humbling moral: even after many years of Perl coding, my gut feelings about efficiency are not to be trusted.
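If you would rather measure than trust your gut, the core Benchmark module makes the comparison easy. This is a sketch, not the timing run behind the result above: the log lines are abbreviated samples, the date parsing is left out, and a plain dotted-quad pattern stands in for $RE{net}{IPv4} so that nothing outside core Perl is needed. Results will vary with Perl version and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Abbreviated sample lines (real log lines are much longer).
my @lines = (
    'Jun 11 23:58:15 sunapee MailScanner[7997]: Message q5C3vVli019865 from 71.243.115.147 (info@softwareevals.org) to unh.edu is spam, SpamAssassin (score=15.565)',
    'Jun 11 23:59:54 sunapee MailScanner[7634]: Message q5C3xp77020291 from 208.117.48.80 (bounces@example.com) to cisunix.unh.edu is not spam, SpamAssassin (score=-2)',
    'Jun 11 23:59:59 sunapee sendmail[123]: some unrelated line',
);

my $LINE_RE = qr{
    ^
    (\w{3} \s+ \d+ \s \d{2}:\d{2}:\d{2}) \s
    \w+ \s MailScanner \[\d+\]: \s
    Message \s \w+ \s
    from \s (\d{1,3}(?:\.\d{1,3}){3}) \s    # stand-in for $RE{net}{IPv4}
    \([^)]*\) \s to \s \S+ \s
    (is \s (?:not \s)? spam)
}x;

# Fixed-string approach: up to three searches per line.
sub count_simple {
    my ($spam, $ham) = (0, 0);
    for (@lines) {
        next unless /is spam/ || /is not spam/;
        if (/is spam/) { $spam++ } else { $ham++ }
    }
    return ($spam, $ham);
}

# All-at-once approach: one anchored regular expression per line.
sub count_regex {
    my ($spam, $ham) = (0, 0);
    for (@lines) {
        next unless $_ =~ $LINE_RE;
        if ($3 eq 'is spam') { $spam++ } else { $ham++ }
    }
    return ($spam, $ham);
}

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, { simple => \&count_simple, regex => \&count_regex });
```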

Reports from YAPC::NA 2012

Since 2007, I have followed the Perl community to Houston, Pittsburgh, Asheville NC, and today: Madison, WI. This is YAPC::NA, Yet Another Perl Conference, North America, and as usual I will try my best to “live blog” it for posterity.

This year, though, there is a yet higher res option available to you: the entire event will be live streamed. Rock on. There’s no substitute for being here, but the streams should be a close second.

UPDATE (6/25): They are now posting the presentation videos, in retrospect, on YouTube.

The first two days of the conference are ‘hackathon’ days; those working on open source projects together presumably do this, while others take the opportunity to socialize, work on work-work, and generally begin to get their geek on in the company of like-minded free software folk.

Below the break, live-ish blogging. No auto-refresh; that’s on you. :)

You may also enjoy a few pictures I took of scenes I saw in Madison.

a scene from Madison, WI
——-
Day 3 (regular sessions)

9:11pm: The conference is now over, but I just had great Hibachi at a Japanese restaurant on Madison’s State Street, complete with preparation performance and fire. Best meal I’ve had in recent memory.

5:11pm: Larry Time was a question and answer session, which turned out to be just as rewarding as a prepared speech. Larry is brilliant in a very interdisciplinary way… which may be redundant, considering brilliance might be something about integrating disparate ideas. But it strikes me that Larry Wall’s approach to designing Perl has been integral to its success, in that he made it capable of great things while maintaining an easy, unintimidating on-ramp for newbies. This was evident in one of his responses to a computer science guy, whose question I won’t embarrass myself by paraphrasing.

Following Larry was Damian Conway, who sent a prepared video in his absence. He is working on an amazing tool called rxrx (regular expression diagnostics– the name itself is genius) which actually shows you, visually, Perl’s thinking process as it attempts to match a string you provide with a regex you provide. This is nothing short of magick, and he hopes to release it by OSCON next month.

3:24pm: At long last… it’s Larry Time.

2:24pm: Time for Web Security 101. Please don’t be totally scared; I do know a little about the subject. But if you’re doing this stuff professionally, you can never attend too many ‘101’ sessions on security. A tidbit you might have missed just may be shared.

I am going to bullet point these tidbits today, as many as I can keep up with:

  • whitelist inputs, don’t blacklist
  • no easy solutions. Security is everywhere.
  • Perl has an excellent security track record, but know your tools. PHP, really, does not. (conference bias? not really)
  • File::Spec->no_upwards($path) to scrub input to be used for system commands. Nice. Taint mode won’t catch this on its own, it’ll just force you to have a look at the inputs in question. This one is going on the to-do.
  • Bind variables to prevent SQL injections. Check.
  • Interesting, you cannot leverage bind variables with LIMIT (a MySQL-ism), but since the LIMIT value must be a number, this is easy enough to sanitize on your own.
  • Wow. Paul Fenwick just picked out a potential security exploit in a code example, involving the possibility of a list coming in from CGI as opposed to a single value on a param. I bow now.
  • Now, XSS. Escape HTML characters on output, but not necessarily when storing it. Never output user-entered event handling, etc.
  • CSRF. Block important actions whose HTTP Referer is not your own site. Not always possible, because sometimes you actually want to support remote actions. You can also use per-user-request tokens which can help block CSRF attacks, but there is a tradeoff on how long these should be valid for.
  • DOS. Denyhosts and such at the host level, but stopping DOS at the network level is more effective.
  • DDOS. There are appliances as well as services that can help detect distributed denial of service, because this is hard. The services will often be aware of known botnets, which can be handy.
  • Buffer overflows. Use vetted, open software if it is written in C. Write your own C at your own peril, if it is internet exposed.
  • Salted hashes and rainbow tables. LinkedIn, we’re looking at you. Crypt::PBKDF2 does automatic salting and multiple iterations. That’s going on the to-do. A bit of discussion on the pain in converting to new encryption, considering you can’t reverse engineer even your older, weaker hashes.
  • Avoid leaking information through error messages (especially in production). Bill Costa handles this nicely in a module of his that I use. Thanks Bill.
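
A few of the tidbits above are easier to remember with code in front of you. What follows is only a rough sketch using core Perl, not anything shown in the talk: the sub names (safe_limit, strip_upwards, escape_html, fake_param) and the sample values are made up for illustration. One thing worth knowing is that File::Spec->no_upwards only filters list entries that are exactly "." or "..", so the sketch splits a path into components before calling it.

```perl
use strict;
use warnings;
use File::Spec;

# Whitelist check for a value headed into a LIMIT clause: digits only,
# capped at a caller-supplied maximum. Returns undef for anything else.
sub safe_limit {
    my ($value, $max) = @_;
    return undef unless defined $value && $value =~ /\A[0-9]{1,9}\z/;
    return $value > $max ? $max : $value + 0;
}

# Drop "." and ".." components from a user-supplied path.
# no_upwards() only removes entries that are exactly "." or "..",
# so the path is split into its components first.
sub strip_upwards {
    my ($path) = @_;
    my @parts = grep { length }
                File::Spec->no_upwards( File::Spec->splitdir($path) );
    return File::Spec->catdir(@parts);
}

# Escape HTML-significant characters at output time (not at storage time).
my %HTML_ESCAPE = (
    '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
    '"' => '&quot;', "'" => '&#39;',
);
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$HTML_ESCAPE{$1}/g;
    return $text;
}

# Hypothetical stand-in for a CGI-style param(): in list context it returns
# every value submitted under that name, which an attacker controls.
sub fake_param {
    my @values = ('guest', 'is_admin', 1);
    return wantarray ? @values : $values[0];
}

# List context lets the extra values spill into the hash and override a key.
my %unsafe = ( is_admin => 0, user => fake_param('user') );
# Forcing scalar context keeps the input to a single value.
my %safe   = ( is_admin => 0, user => scalar fake_param('user') );

print "limit:    ", safe_limit('25', 100), "\n";
print "path:     ", strip_upwards('../../etc/passwd'), "\n";
print "html:     ", escape_html('<b onclick="x">'), "\n";
print "is_admin: unsafe=$unsafe{is_admin} safe=$safe{is_admin}\n";
```

The last two hashes are the kind of thing that is easy to miss in review: the only difference between them is the word scalar.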

Great security war story was just told, about using mod_rewrite to reflect DDOS attacks back on the attackers.

Talk is over. Well worth attending. I got two to-do’s out of this. Worth the price of the conference alone.

1:38pm: I am now in The Lacuna Expanse. Well, not literally, because that is in a galaxy far, far away. How cool is this. A massive, multiplayer game written in Perl. They have a web client, a desktop client, an iPhone client, and a command line client. Said ‘wow’ yet?

Without sharing everything we learned about the stack this game is built on, I have to observe: games are among the more impressive technological achievements in programming. Really. Most completely wipe the floor with any business app I’ve ever seen. I guess (coding) time flies when you’re having (even more) fun.

11:30am: Next up, Baby XS To Get You Started. XS is the way to write a Perl module that is actually much faster C code. CSV_XS, mentioned below, is one of these. This is interesting because, within XS, you are not locked into pure C. You can leverage many of Perl’s niceties; you just have to use the ‘perlguts’ functions, which do not share names with Perl’s userspace keyword functions but provide the same support.

XS is definitely advanced stuff, but if your journeys in Devel::NYTProf reveal a bottleneck that is Perl itself, converting that code to XS may be an option to speed things up. Inline::C actually loads XS support and dynamically links it, so you can throw around C code in your scripts at will, should you be so inclined… of course, in that case, you are not building a module, you’re writing one-offs.

11:10am: I am currently attending Refactoring Perl Code. He begins with the risk/benefit analysis of doing so, and the importance of a test suite if you are refactoring a large codebase. The perils of global replace are now being mentioned, such as when changing variable names.

9:55am: Now: Introduction to Performance Tuning Perl Web Applications. This could prove interesting, since my only true adventures in performance tuning involved implementing FastCGI and rewriting some narsty SQL statements. Yes, that’s ‘narsty’, as ‘nasty’ does no justice to some of the sub-selects I have attempted.

As might be expected, Devel::NYTProf has a big place in this talk, as this module has been “the only game in town” for some years now. I don’t use it yet, because I mostly support *relatively* low traffic, internally used applications.

The speaker got a good chuckle out of the crowd with this one: “Now that you have this awesome Perl profiling tool… ” Next slide: IT’S PROBABLY YOUR DATABASE. Oh so true. SQL can be narsty. He recommends pt-query-advisor for MySQL, part of the Percona toolkit. He is also mentioning having to escape your ORM (object-relational mapper) in many cases, due to performance issues, which I’m sure I have mentioned before is one of my reasons for still having avoided ORMs.

DBI tip: for large numbers of inserts and updates, managing your db commits can speed you up immensely. That is, if things like LOAD DATA INFILE or CSV_XS aren’t options. Those will always be faster.

He is now addressing caching, which will make your code more complex, because you have to track your data dependencies and invalidate portions of the cache in order to pick up changes… lordy lord, please don’t let performance issues ever lead me down this path. If they do however, the speaker is recommending the CHI module from CPAN to help manage this stuff.

Now, the requisite mentions of the persistent daemon environments: FastCGI, mod_perl, and Plack. Implementing FastCGI was definitely one of the best decisions I ever made. Snappety snap snap page loads. Yum. I can still remember my initial joy over whatever X-Mas break I did that on.

“A boatload of RAM hides a multitude of sins.” SSDs, too. Of course, you need to know where your bottleneck is in order to throw the right hardware at it.

9:00am: After falling off the blogging wagon yesterday, here I am again to report on the final day of what has been (yet!) another great Perl conference.

First up this morning: Utils Are Your Friends. The ‘Util’ family of modules is one of the oldest, comprised of things that weren’t in Perl core… some of the ‘one more thing’ features. Perl core since 5.10 actually does contain features previously only supported by Util modules, but Util still does many things that core does not.

Some extended discussion of inside-out classes now. Apparently Scalar::Util provides the tools to make this more possible. Using inside-out classes has something to do with a module author wanting to better protect the data stored in an object. The concept was invented (in Perl) by DBI author Tim Bunce.
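
For anyone else hazy on inside-out classes, here is a minimal sketch of the idea as I understand it, not code from the talk. The Counter class and its methods are hypothetical. The object itself is just an empty scalar reference; the data lives in a lexical hash keyed by the object's address, which Scalar::Util's refaddr provides, so code outside the class cannot reach in and poke at it.

```perl
use strict;
use warnings;

package Counter;
use Scalar::Util qw(refaddr);

# Attribute storage lives in a lexical hash, keyed by object address.
# Nothing outside this file can see %count_of.
my %count_of;

sub new {
    my ($class, %args) = @_;
    # The object is an empty anonymous scalar; it carries no data itself.
    my $self = bless \do { my $anon_scalar }, $class;
    $count_of{ refaddr $self } = defined $args{start} ? $args{start} : 0;
    return $self;
}

sub increment {
    my ($self) = @_;
    return ++$count_of{ refaddr $self };
}

sub value {
    my ($self) = @_;
    return $count_of{ refaddr $self };
}

# Clean up by hand, since the data is not stored inside the object.
sub DESTROY {
    my ($self) = @_;
    delete $count_of{ refaddr $self };
}

package main;

my $counter = Counter->new( start => 5 );
$counter->increment;
print "value: ", $counter->value, "\n";
```

The price is the manual DESTROY: forget it and the hash keeps entries for dead objects around.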

Wow: This does not work (in all cases):

# check for number
if ( $var =~ m/^\d+/ ) {

… there is actually a Scalar::Util function to do this for you in a more reliable and way more readable way:

# really check for a number, in all exotic cases (avoiding 'false' positives)
use Scalar::Util qw(looks_like_number);
if ( looks_like_number($var) ) {
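
To see where the two checks disagree, here is a small comparison over a handful of sample values I picked myself. The regex is only anchored at the start, so "12abc" slips through, while a perfectly good "-3.5" gets rejected; looks_like_number gets both right.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Made-up sample inputs covering the easy cases and a few awkward ones.
for my $var ( '42', '12abc', '-3.5', '1e5', 'abc' ) {
    my $regex = $var =~ m/^\d+/           ? 'yes' : 'no';
    my $util  = looks_like_number($var)   ? 'yes' : 'no';
    printf "%-6s regex: %-3s looks_like_number: %s\n", $var, $regex, $util;
}
```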

There is also ‘set_prototype’ which allows you to override a sub’s prototype. The speaker calls into question if prototypes are even a good idea. Being able to override them makes the idea seem worse. But I’m going to admit right here that I still don’t completely grasp the pros and cons of this. I have yet to use prototypes myself.
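
Since I have not used prototypes, I tried a tiny sketch just to see the mechanics. The add_pair sub is hypothetical. Keep in mind that prototypes are applied when a call is compiled, so changing one at run time only affects code compiled afterwards.

```perl
use strict;
use warnings;
use Scalar::Util qw(set_prototype);

# A hypothetical sub declared without any prototype.
sub add_pair {
    my ($x, $y) = @_;
    return $x + $y;
}

# prototype() returns undef when no prototype is declared.
my $original = prototype( \&add_pair );
my $shown    = defined $original ? $original : '(none)';

# Attach a two-scalar prototype after the fact.
set_prototype( \&add_pair, '$$' );
my $changed = prototype( \&add_pair );

print "prototype: $shown -> $changed\n";
```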

a scene from Madison, WI

Day 2 (regular sessions)

6:10pm: Blogging lapsed today, as I lapsed into some work that needed to be done. See you tomorrow.

11:50am: Okay, it was entirely review. But that’s ok, since I can barely focus my eyes at this point. Time for a much needed lunch break.

11:23am: And now, Intro To Writing Perl Documentation. I already know me some POD (Perl’s ‘Plain Old Documentation’ markup), but this won’t be entirely review.

11:11am: PhoneGap is a nice cross-platform, HTML5 platform. Not quite as fast or feature-complete as native development, but pretty good. This is almost certainly the way I’d go. I’d never have time for native mobile app development in my current role. Neat little demo of using an Android API key and a small Perl script to send an alert to the phone. Android and iOS do this quite similarly, we are told.

11:00am: Next up, Perl, Mobile App. Glue. Despite being smartphone-free myself, I’d love to mess with mobile apps and increase the joy of others. The Perl is essentially for web services, regarding phone apps. Perl will only run on jailbroken phones.

“PHP is Perl’s ugly, fat sister”. Ouch! (May be worth pointing out that ‘fat’ is not used spuriously, but refers to a rather bloated function namespace in the PHP language.)

He’s lapsing a bit into mobile app marketing right now.

10:57am: git will definitely be a commitment to learn, and is a fairly big paradigm shift from centralized version control systems like Subversion (SVN). Great talk though… I am certainly closer to Grokville than I was before. Maybe I can spend some time on #git IRC and hitchhike into town.

10:18am: It was a late one at the club last night, so the man snoring loudly in this git presentation will be forgiven. :)

8:57am: Git: a brief introduction is the first session I will be attending this morning. As an SVN user, I am already behind the times. Still, I’ve met quite a few folks still on SVN here. As Randal commented last night however, “those who swear by SVN, swear at SVN”. True, true.

I’ll be trying pretty hard to absorb the information at this talk, similar to yesterday’s CPAN talk, so I’m not sure how much will be written here about it.

a scene from Madison, WI

Day 1 (regular sessions)

4:26pm: CPAN session done. That was quite a brain dump, and I mostly kept up. Am I an expert CPAN uploader yet? Not quite. But this is a great head start. I thanked Brian for the session and told him it really ought to be standard at every Perl conference. Facility with CPAN is definitely a barrier to entry for contributing.

2:31pm: Brian d Foy is now giving a hands-on CPAN authors class. I may not be blogging this much, but doing what he says instead.

1:33pm: Lunch has been had, after which I got caught up in work-work for a bit. This is pretty common for programmers here, as the world doesn’t stop for conferences.

So, I am coming in late to 29 Ways To Get Started In Open Source Today. The speaker, Andy Lester, has written quite a bit on this topic on his blog. He is also the author of ack, among other things, and the maintainer of the Perlbuzz blog. He preaches how low the bar to open source contribution is, including simply submitting bug reports, doing translations, adding to existing bug reports, closing tickets, and the like.

11:31am: I am now at a session entitled The Perl from Ipanema. I am here as an Antonio Carlos Jobim fan. Only partially kidding, there. And no, I am not intentionally avoiding the more technical sessions, it’s just happening that way so far.

I’ve long noticed that Latin America is huge on open source adoption and contribution. Brasil (do English speakers really need to use a ‘z’?) is no exception. It’s the 5th largest country in the world, and the 6th biggest economy (and rising).

And now, to Perl. The language’s usage growth in Brasil has mirrored its growth elsewhere… originally a tool reached for primarily by sysadmins, and a strong presence in biochemical research. There are 13 Perl Monger groups in the country now, and growing (3 new groups this year).

‘Only’ 24% of Brasilians in IT speak any English at all, and that’s often very little. Spoken language can certainly be a barrier when trying to work in a common computer language. The speaker’s English is impeccable though.

They give free courses in Perl at universities in Brasil. Not sure how common this is, but as a fellow American pointed out, we don’t do that here.

11:02am: Next up: There’s More Than One Way To Run A Project: The Apache Way. This talk relates to some of the points in the keynote, specifically that dictatorship is not the only way. The Apache Software Foundation actually shepherds many more projects than just the Apache web server. They take care of legal issues, project trademarks, marketing, project governance, and the like. In short, the Apache Foundation provides at least one model for how to effectively run a large open source effort.

Amongst all Apache Foundation-governed projects (current count, 102), the average contribution rate is about 1 commit (to version control) every 4 minutes. Pretty impressive activity there. Subversion itself is an Apache project.

Companies and organizations may donate, but they may not join. Every member of the Apache Software Foundation is an individual.

10:33am: The first session I am attending is Get More Out Of Your Meetings. This is a collection of suggestions for keeping meetings swift and productive… running them almost like lightning talks (30-60 seconds per team lead, timed, for a daily update meeting). Cancel if you need to, don’t be shy. Don’t be late. Most of the suggestions seem obvious, yet, they bear reminding. I think we all continue to waste quite a bit of time at meetings.

9:29am: Now Michael Schwern, with the keynote. He takes the podium to the tune of the Star Wars theme. He’s focusing on the gender gap in the Perl community, and in open source. Other gaps, too… but gender being perhaps the most obvious. He suggests that there are more ‘Michaels’ here than women. Funny, but likely true. He’s doing a pretty good job of alternating the seriousness of the topic with jokes. He’s now comparing Kirk and Picard, and how the world has changed between generations of Star Trek. There is rarely a second to forget that we are at a geek conference here. :)

“Perl has become an aristocracy.” CPAN module maintenance is done by dictators, who pass the baton to the next dictators. Schwern prefers a meritocracy, despite being one of those dictators. Picards (meetings, collaboration, merit) are better than Kirks (mavericks).

I hope some folks are live streaming this. I can do it no justice. Schwern is funny, on point and brilliant. Handles metaphor deftly.

9:23am: Now Karen Pauley, Perl Foundation President.

“There are very few people in the world capable of working on the Perl 5 core.” I’d believe that. That’s a lot of C. That’s a lot of backward compatibility.

9:17am: Beer on the roof of the Pyle Center tonight, sponsored by Linode. Game Night and MST3K-style Movies tomorrow, sponsored by cPanel.

9:05am: We’re getting started. Time to thank some sponsors (more than ever it seems?). JT Smith tells us that YAPC::NA has sold out this year, as YAPC::Asia does every year, apparently. There are 446 people here. I again have YAPC staff-shirt envy, as it is a mix of the Grateful Dead / Perl Onion logo.

They have cool raffles going this year, to raise money for The Perl Foundation. I put in $5 for a chance at ‘Lunch with Larry (Wall)’. Wish me luck!

a scene from Madison, WI

Cataloguing Perl Module Dependencies

So you have a bunch of Perl scripts and modules you’ve written over time, and you’d like to quickly determine what non-core Perl modules are required to make them run. Devel::Modlist to the rescue.

$ perl -d:Modlist=nocore some-funky-script
Module::ExtractUse 0.27
Module::ExtractUse::Grammar 0.25
Parse::RecDescent 1.967009
Pod::Strip 1.02
version::vxs 0.95
$

Unfortunately Devel::Modlist is not itself a core module, but it’s an easy install since it doesn’t have any dependencies of its own. The way it works is pretty clever. Rather than trying to parse the Perl scripts and modules, looking for “use” statements, it loads the code into the Perl debugger. This is a smart approach since the common wisdom is only perl can parse Perl.

But this is not a perfect solution. For one, this only works for code that already runs on your system, so you can’t use it to quickly list what modules are needed for some bit of Perl you just downloaded. It also cannot find and report on modules that get loaded dynamically, as would happen if you do something like:

if ($magicRequired) { require Magic; castSpell(); }
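
The dynamic-loading caveat is easy to see with perl's own %INC hash, which records every file loaded so far. This is only a sketch of the general idea of asking perl what it loaded, and I am not claiming it is how Devel::Modlist works internally. The loaded_modules sub and the $magic_required flag are made up, and Text::ParseWords just stands in for the Magic module.

```perl
use strict;
use warnings;

# Turn %INC keys such as "File/Spec.pm" into module names like "File::Spec".
sub loaded_modules {
    my @names;
    for my $file ( sort keys %INC ) {
        next unless $file =~ /\.pm\z/;
        ( my $name = $file ) =~ s{/}{::}g;
        $name =~ s/\.pm\z//;
        push @names, $name;
    }
    return @names;
}

# Hypothetical runtime condition: true only when an argument is passed.
my $magic_required = @ARGV ? 1 : 0;

my $before = grep { $_ eq 'Text::ParseWords' } loaded_modules();
if ($magic_required) { require Text::ParseWords; }
my $after  = grep { $_ eq 'Text::ParseWords' } loaded_modules();

print "loaded before: $before, loaded after: $after\n";
```

Run it with and without an argument: the module only shows up once the require line has actually executed, which is why a conditional dependency can go unreported.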

And finally this module is not designed to be used from inside of a Perl script since it would cause the dependency reporting every time that script is executed. The shell command line example shown above is the normal usage model for this module.

But even with those caveats, this is still a handy little tool to have in your Perl toolbox.

My Premature Optimization Problem

Many of you know the adage; but learning it anew can still be fun.

I had to trim leading and trailing whitespace in Perl. So:

# strip any leading or trailing whitespace
$string =~ s/^\s+//;
$string =~ s/\s+$//;

No prob.

Then I made the mistake I often make, and started thinking. Two lines for that? Please. That can be done in one line, especially in Perl. So I found this:

$string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;

Yuck! Oh well, at least it probably runs faster.

Think again.

Or better yet, Marcus, just stop thinking altogether. This one-liner idea would have made the routine both slower and less readable for the next person (notice I forgot the #comment the second time).
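
If you want to check this on your own perl, the core Benchmark module makes it a few lines. This is a quick sketch with a made-up sample string and sub names, not the measurement I originally did; the numbers will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# strip any leading or trailing whitespace, in two passes
sub trim_two_step {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# strip any leading or trailing whitespace, in one (uglier) pass
sub trim_one_liner {
    my ($string) = @_;
    $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    return $string;
}

# Made-up input with padding on both ends and some inner whitespace.
my $sample = "   some text with   inner spaces \t ";

# Run each version a fixed number of times and print a comparison table.
cmpthese( 200_000, {
    two_step  => sub { trim_two_step($sample) },
    one_liner => sub { trim_one_liner($sample) },
} );
```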
