My Premature Optimization Problem

Many of you know the adage, but learning it anew can still be fun.

I had to trim leading and trailing whitespace in Perl. So:

# strip any leading or trailing whitespace
$string =~ s/^\s+//;
$string =~ s/\s+$//;

No prob.

Then I made the mistake I often make, and started thinking. Two lines for that? Please. That can be done in one line, especially in Perl. So I found this:

$string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;

Yuck! Oh well, at least it probably runs faster.

Think again.

Or better yet, Marcus, just stop thinking altogether. This one-liner idea would have made the routine both slower and less readable for the next person (notice I forgot the #comment the second time).
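For anyone who would rather measure than take my word for it, here is a minimal sketch of a timing comparison using Perl's core Benchmark module. The sub names and the test string are made up for illustration, and the numbers will vary by machine, Perl version, and input, so treat the output as a rough indication only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test input: padded on both sides, with interior spaces.
my $input = (' ' x 10) . join(' ', ('word') x 50) . (' ' x 10);

# The original two-statement trim.
sub trim_two_step {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# The "clever" single-regex trim.
sub trim_one_liner {
    my ($string) = @_;
    $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    return $string;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    two_step  => sub { trim_two_step($input) },
    one_liner => sub { trim_one_liner($input) },
});
```

Both subs work on a copy of the string, so every iteration starts from the same untrimmed input. Timing a snippet in isolation like this says nothing about whether trimming matters to the program as a whole.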

5 Responses to “My Premature Optimization Problem”

  1. Paul Sand says:

    I think the ‘g’ modifiers in the first snippet may be superfluous. Or am I missing something?

  2. Yes, yes they are! I think an elf put them there. I will get rid of them.

  3. Bill Costa says:

    Well, I always used to do this:

    $str =~ s/^\s+|\s+$//g;

    but a long time ago I too idly wondered whether this might be expensive. I found a very similar article here:

    Before you follow that link, I’ll just summarize that the author demonstrates using the Perl Benchmark module to actually time several different methods of trimming strings. Also note that the author is doing what I call string ‘compression’ which is also squeezing embedded sequences of multiple spaces into one. I generally don’t do this since, depending upon the context, the extra spaces may be meaningful. It all depends upon the kind of data you’re munging.

    In any case, to get back to the subject of premature optimization, a long time ago I was taught the 3 rules for code optimization:

    – The 1st rule of optimization: Don’t do it.

    – The 2nd rule (for experts): Don’t do it yet.

    – The 3rd rule of optimization: Get performance metrics and determine where to optimize.

    Now at first glance it may appear that the above author’s wonderful little program is indeed following the 3rd rule, but he isn’t. He is comparing the speed of different types of string trimming, on both short and long strings. But it is unlikely that is all his application is doing. The way to optimize is to profile the *entire* application and start looking at optimizing where the profiler says you are spending the most time.

    In any case, once you *do* know what method is faster, there is no harm in using it, so long as human readability doesn’t suffer — as Marcus has already pointed out. So as it turns out, for *just* trimming leading and trailing blanks:

    $str =~ s/^\s+//;
    $str =~ s/\s+$//;

    comes in a close second for short strings, and first for long strings, versus the more obscure:

    join ' ', split ' ', $str;

    which is a tad faster for short strings but comes in 2nd for long strings. And it turns out my one liner above is much slower for short strings and *way* slower for long strings.

    And for the record, based on these tests, here is what I would pick for compression:

    $str =~ s/\s+/ /g;
    $str =~ s/^\s//;
    $str =~ s/\s$//;

    While the join/split trick does win pretty big for short strings in this case, it is about the same for long strings, so I’d still opt for the readability of this triplet.

    BTW — Almost forgot to check that complex one liner:

    $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;

    Oddly enough, it comes in dead last for short strings, but comes in second for long strings. But the simpler 2 liner still beats it by a significant margin. The thing is, I don’t think I could tell you how that one liner regular expression even works!
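As for how that one-liner works, one way to read it is to spread the same pattern out with the /x modifier and comment each piece. This is only a sketch: the variable and sub names are made up for illustration, and the pattern itself is unchanged from the one-liner.

```perl
use strict;
use warnings;

# Same pattern as the one-liner, spread out with the x modifier
# so that each piece can carry its own comment.
my $trim_re = qr/
    ^ \s*                # leading whitespace, if any (kept outside the capture)
    (                    # start of capture group 1: the part to keep
        \S*              #   first run of non-whitespace (may be empty)
        (?: \s+ \S+ )*   #   zero or more "whitespace then word" pairs
    )                    # end of group 1, which can never end in whitespace
    \s* $                # trailing whitespace, if any, up to the end
/x;

sub trim_one_liner {
    my ($string) = @_;
    $string =~ s/$trim_re/$1/;   # replace the whole match with the kept part
    return $string;
}

print "[", trim_one_liner("  hello   world \t"), "]\n";   # prints [hello   world]
```

The capture group is built so that every run of whitespace inside it must be followed by a non-whitespace run, which is why trailing blanks fall outside it while interior spacing is preserved.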

  4. Ed Sawyer says:

    This reminds me of Paul’s post here:

    where he quotes:

    Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

    I feel that way about this sort of thing above. (puts flame-resistant suit on)… it’s nice to be able to use higher-level languages where we have nice functions like ltrim() and rtrim() and don’t have to worry about RegEx for simple stuff like trimming spaces from strings. ;-)


  5. Not being a PHP guy, I had to go confirm that ltrim() and rtrim() are indeed PHP built-in functions. I am half proud, half embarrassed.

    PHP and Perl are both considered ‘high level languages’. I believe most draw the high/low line on whether or not your language requires explicit memory management. But I could also see the keyword/built-in function count as another meaningful measure of ‘how high-level?’.

    PHP does have far, far more built-in functions than Perl does. With Perl you are more likely to pull in a module from CPAN to add some additional functionality. I might have done that here, but the task seemed too small. By posting about it, I probably have made it seem more laborious than it was. :)
