Dijkstra’s Revenge

Wired posts the actual code at fault in the recently-revealed vulnerability in Apple’s iOS implementation of SSL:

And I remember enough C to recognize the problem. But I had a number of reactions of varying seriousness:

  1. “Hey, maybe the programmer thought that it would really work if he did it twice.”
  2. The late E.W. Dijkstra was right: go-to’s really are harmful. (There’s a good Wikipedia entry on “Considered Harmful”.)
  3. What would have helped? Enabling compiler warnings about unreachable code? That try/catch thing? Automatic code indentation? Always using the curly braces?

Quote du Jour

“So, you have a problem. You say “I know, I’ll use floating point!” Now you have 2.0001341678 problems.”

(Source.)

Reading Schedule Generator

Or: some stuff I worked on over break.

This is a small solution to (some would say) an even smaller problem: I used to ignominiously return unread books to the library, and face new issues of magazines after having only read a few items in the previous one. My to-be-read pile was only getting taller.

I found I was able to read more “reliably” if I had a specific daily page goal for each item. Somehow having an “official” schedule helped me overcome my affliction.

I’ve been doing this for a few years, and have written code to help out. (Certainly there must be a name for the psychological quirk that compels me to “write code to help out”. Let me know if you come across it.)  What follows is my current “solution”: a CGI-backed web form that accepts the schedule parameters for a single reading item (book or magazine) and generates a small HTML calendar schedule that I can print from a web browser, cut out, and use as a bookmark for the item.

This seems conceptually easy, if not trivial, so I imagine similar code is “out there somewhere”, but cursory Googling didn’t turn up anything.

Anyway: the finished product looks like this (the specific example is the book Freedom™ by Daniel Suarez, which I don’t particularly recommend; it’s pretty bad so far):

Yes, cut on the dotted line. So my goal for today (January 6) is to get up to (or past) page 127. And I should be finished in a couple weeks.

Now I’m not psycho about this: it’s OK to get ahead of the goals if the material is compelling and I have the time. It’s also OK to fall behind if there’s just too much other stuff going on. (However, in my experience, just knowing that I’ve “fallen behind” is just enough motivation to carve out some extra time to catch up later.)

Enough of the mechanics; on to the code. For a long time I ran a Perl script from the command line to generate an HTML schedule as a static file, which I would then open/print in a browser. But over break I decided that it would be (slightly) more flexible to do the whole thing from the browser, using an HTML form to get the schedule parameters and calling a CGI backend to display the result.
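
To make that division of labor concrete, here’s a minimal sketch of the CGI side. The field names, the toy inline template, and the exact calls are my own illustration, not the actual form or script:

#!/usr/bin/perl
# Minimal sketch of the CGI side of the round trip (illustrative only;
# parameter and template contents are made up, not the real ones).
use strict;
use warnings;
use CGI;
use HTML::Template;

my $q = CGI->new;
my %param = map { $_ => scalar $q->param($_) }
            qw(title first_page last_page start_date end_date);

# A toy inline template; the real one builds the printable calendar table.
my $tmpl_text = <<'HTML';
<h1><TMPL_VAR NAME=TITLE></h1>
<!-- day-by-day page goals would be rendered here -->
HTML

my $tmpl = HTML::Template->new(scalarref => \$tmpl_text);
$tmpl->param(TITLE => $param{title});
# ... compute the per-day page goals (see the date arithmetic sketch below) ...
print $q->header('text/html'), $tmpl->output;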

The form is here (hosted on my workstation). What it looks like:

It’s a very rudimentary form, with few bells or whistles. Well, OK, one bell: the Javascript jQuery UI datepicker widget for getting the schedule’s start and end dates. I didn’t know nothin’ about jQuery before I did this. (And I only know slightly more now; if you examine the source code for the form, it’s not very complicated.)

So you fill out the form. Using our example (and showing the datepicker in action):

… hit the submit button and the resulting page should produce the appropriate schedule. (I’m pretty sure it would work for you if you want to try it.)

The real work is performed by the Perl CGI script, which relies heavily on the smarts contained in the CGI, Date::Manip, and HTML::Template modules. (What’s left: some date arithmetic that’s only tricky if you’ve done it wrong a few times. I think it handles edge cases and DST changes correctly, although I haven’t checked lately.) If you’d like to look at the script, the plain-Perl file is here, and the HTML-prettyprinted version is here. The HTML template used by the script is also important; that’s here.
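
For the curious, the date arithmetic looks roughly like the following. This is a minimal sketch of the per-day page-goal calculation with made-up numbers, not the actual script; it just walks the calendar one day at a time and spreads the pages evenly:

#!/usr/bin/perl
# Sketch of the per-day page-goal calculation (made-up numbers; not the
# actual script). Walks the calendar one day at a time and spreads the
# pages evenly across the days.
use strict;
use warnings;
use Date::Manip;

my ($first_page, $last_page) = (1, 380);          # hypothetical book
my $start = ParseDate("2012-01-02");
my $end   = ParseDate("2012-01-20");

my @days;
for (my $d = $start; Date_Cmp($d, $end) <= 0; $d = DateCalc($d, "+1 day")) {
    push @days, $d;
}

my $total = $last_page - $first_page + 1;
for my $i (0 .. $#days) {
    # goal for day i: the page you should reach by the end of that day
    my $goal = $first_page - 1 + int($total * ($i + 1) / @days + 0.5);
    printf "%s  p. %d\n", UnixDate($days[$i], "%a %b %e"), $goal;
}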

out of sorts

Not that it matters, but while reading through the sort(1) man page, I noticed a new (to me) option:

-R, --random-sort
sort by random hash of keys

Yes, newer versions of sort will actually shuffle your input data. I’m not sure if that’s a cool thing for a command named sort to do, but I like it anyway.

A quick test (on Red Hat 6) shows that it really is random: you don’t get the same shuffle each time.
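
For instance (the output will, of course, differ from run to run):

$ seq 1 5 | sort -R
4
1
5
2
3
$ seq 1 5 | sort -R
2
5
1
3
4

One caveat: since -R sorts by a hash of the keys, identical input lines hash identically and end up next to each other, so it isn’t a true shuffle when the input contains duplicates.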

Good replacement for the Perl one-liner:

perl -MList::Util -e 'print List::Util::shuffle <>'

(Added later: many versions of Linux include the “shuf” command.)

Unslow Regexps

Our mail logs accumulate a few million lines per weekday. Some of them contain information on whether SpamAssassin considered a given message to be spam:

Jun 11 23:58:15 sunapee MailScanner[7997]: Message q5C3vVli019865 from 71.243.115.147 (info@softwareevals.org) to unh.edu is spam, SpamAssassin (not cached, score=15.565, required 5, BAYES_00 -1.90, DIGEST_MULTIPLE 0.29, HELO_DYNAMIC_IPADDR 1.95, HTML_MESSAGE 0.00, KHOP_DNSBL_BUMP 2.00, MIME_HTML_ONLY 0.72, PYZOR_CHECK 1.39, RAZOR2_CF_RANGE_51_100 0.50, RAZOR2_CF_RANGE_E8_51_100 1.89, RAZOR2_CHECK 0.92, RCVD_IN_HOSTKARMA_BL 1.70, URIBL_BLACK 1.73, URIBL_JP_SURBL 1.25, URIBL_RHS_DOB 1.51, URIBL_WS_SURBL 1.61)

or “ham” (not spam):

Jun 11 23:59:54 sunapee MailScanner[7634]: Message q5C3xp77020291 from 208.117.48.80 (bounces+54769-353f-cfg6=cisunix.unh.edu@email.news.spotifymail.com) to cisunix.unh.edu is not spam, SpamAssassin (not cached, score=-2, required 5, autolearn=not spam, BAYES_00 -1.90, HTML_MESSAGE 0.00, RCVD_IN_HOSTKARMA_NO -0.09, SPF_PASS -0.00, T_RP_MATCHES_RCVD -0.01)

(Lines wrapped for clarity.)

Perl scripts scan through the logs and produce plots of ham/spam traffic; a typical example is here. Such scripts need to extract, for lines like the above, the date/time information (always the first 15 characters) and whether the line says “is spam” or “is not spam”. My initial approach (years ago) was very simple-minded. Old code snippet:

while (<>) {
    if (/is spam/ || /is not spam/) {
        $date = ParseDate(substr($_, 0, 15));
        $dt = UnixDate($date, "%Y-%m-%d %H:00");
        if (/is spam/) {
            $spamcount{$dt}++;
        }
        else {
            $hamcount{$dt}++;
        }
    }
}

(ParseDate and UnixDate are routines from the Date::Manip package. Without getting bogged down in details, they allow the messages to be counted in hour-sized buckets for the plot linked above.)

For not-particularly-important reasons, I decided to try an “all at once” regular expression on each line instead. New code snippet:

use Readonly;
use Regexp::Common qw/net/;

Readonly my $SPAMLINE_RE => qr {
    ^
    (\w{3} \s+ \d+ \s \d{2}:\d{2}:\d{2}) \s    # capture 1: date/time, e.g. "Jun 11 23:58:15"
    \w+ \s MailScanner \[\d+\]: \s
    Message \s \w+ \s
    from \s ($RE{net}{IPv4}) \s                # capture 2: sending server's IP address
    \([^)]*\) \s to \s \S+ \s
    (is \s (?:not \s)? spam)                   # capture 3: "is spam" or "is not spam"
}x;

while (<>) {
    if ( $_ =~ $SPAMLINE_RE ) {
        my $d        = $1;
        my $spamflag = $3;
        my $date     = ParseDate($d);
        my $dt       = UnixDate( $date, '%Y-%m-%d %H:00' );
        if ( $spamflag eq 'is spam' ) {
            $spamcount{$dt}++;
        }
        else {
            $hamcount{$dt}++;
        }
    }
}

There’s plenty to criticize about both snippets, but mainly I was thinking: running the second version’s huge, hairy regular expression on each log line will just have to be way slower than the simple fixed-string searching in the first version. (Even though there can be up to three searches per line in the first version, they’re searches for fixed strings, right? Right?)

Wrong, as it turns out. The hairy second version ran slightly faster than the simple-minded first version. (In addition it extracts the IP address of the server that sent us the message; not used here, but important in other scripts.)
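
If you want to check that kind of hunch yourself, the core Benchmark module makes it easy. Here’s a stripped-down sketch (toy sample lines and a cut-down regex, comparing only the matching step, not the full scripts):

#!/usr/bin/perl
# Toy comparison of the two matching strategies with Benchmark's cmpthese.
# The sample lines and the cut-down regex are stand-ins, not the real data.
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @lines = (
    "Jun 11 23:58:15 host MailScanner[1]: Message x from 10.0.0.1 (a\@b) to c is spam, ...",
    "Jun 11 23:59:54 host MailScanner[2]: Message y from 10.0.0.2 (d\@e) to f is not spam, ...",
) x 5000;

my $re = qr{ (is \s (?:not \s)? spam) }x;

cmpthese(-2, {
    fixed_strings => sub {
        my $n = 0;
        for (@lines) { $n++ if /is spam/ || /is not spam/ }
    },
    one_regex => sub {
        my $n = 0;
        for (@lines) { $n++ if $_ =~ $re }
    },
});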

Humbling moral: even after many years of Perl coding, my gut feelings about efficiency are not to be trusted.

Passphrase Security

Bruce Schneier has a recent post about a new research paper that seems to throw a little bit of cold water on the obvious superiority of passphrases over passwords. Schneier has a pointer to the paper and a less-formal blog summary. The bottom line seems to be: users can choose poor “easily guessed” passphrases, and left to their own devices, they probably will. As usual with Schneier’s blog, many of the comments to the post are insightful and worth reading.

It seems that it might also be much more difficult to check the “quality” of a passphrase than a password. You’d like to be able to say things like: “Maybe you shouldn’t use Psalm 23:1 (King James Version) as a passphrase.”

Grammatically-Correct Random Pass Phrase Generator (in Perl)

I change my passwords every 3 months, but my passphrases are getting kind of stale. The argument for changing your passphrases is about the same as the one for changing your passwords. Being lazy (but curious), I went looking for a passphrase generator similar in philosophy to the apg password generator that I’ve been using for a long time.

I didn’t find anything comparable to apg, but I came across a blog post by Curtis Copley describing his algorithm for generating grammatically-correct random passphrases, a neat idea. He provided source code in PL/SQL and Java, both painful languages for me. How about a Perl translation? Looking at the code, I was discouraged: “This is impossible.”

But then, looking closer at the algorithm: “Hey, maybe not that bad.”

So I coded up a Perl implementation. As far as I can tell, it’s accurate. If you’d like to look, more information and the code are here.
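
For flavor, here’s the rough shape of the “grammatical template” idea in Perl. This is my own toy sketch, not the linked implementation and not necessarily Copley’s exact algorithm, and the word lists are placeholders far too small for real use:

#!/usr/bin/perl
# Toy sketch of a template-driven passphrase generator: pick a sentence
# template of parts of speech, then fill each slot from a word list.
# (Placeholder lists; a real generator needs much larger ones.)
use strict;
use warnings;

my %words = (
    adjective => [qw(purple hasty wooden silent nimble)],
    noun      => [qw(walrus teapot glacier bicycle sonnet)],
    verb      => [qw(devours paints misplaces serenades polishes)],
);

my @template = qw(adjective noun verb adjective noun);

my @phrase = map { my $w = $words{$_}; $w->[ int rand @$w ] } @template;
print "@phrase\n";    # e.g. "silent glacier misplaces purple teapot"

The strength comes from the list sizes: five slots drawn from lists of a few thousand words each gives a phrase that’s memorable yet still expensive to guess. (And rand here is only for illustration; a real generator would want a better randomness source.)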

Still haven’t changed my passphrases, though.

Assumptions

From Stack Overflow:

I am the developer of some family tree software (written in C++ and Qt). I had no problems until one of my customers mailed me a bug report. The problem is that he has two children with his own daughter, and he can’t use my software because of errors.

Those errors are the result of my various assertions and invariants about the family graph being processed (for example, after walking a cycle, the program states that X can’t be both father and grandfather of Y).

How can I resolve those errors without removing all data assertions?

A Joke At Which I Laughed

A wife asks her programmer husband, “Could you please go shopping for me and buy one carton of milk, and if they have eggs, get 6.”

A short time later the husband comes back with 6 cartons of milk.

The wife asks him, “Why the hell did you buy 6 cartons of milk?”

He replied, “They had eggs.”

Nine traits of the veteran Unix admin

I could not help but read an article with a title like “Nine traits of the veteran Unix admin”. It’s good! And not just because the first two traits are ones I bitterly cling to myself!

Can’t resist commenting on this one:

Veteran Unix admin trait No. 3: We wield regular expressions like weapons

“…and sometimes wind up shooting ourselves in the foot.” I was reminded of a Jamie Zawinski quote:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

And there’s this one:

Veteran Unix admin trait No. 5: We prefer elegant solutions

Well, who doesn’t? Let’s add: “…and we’re darn good at deluding ourselves that our solutions are elegant.”
