Optimizing Perl code

From AJS.COM

Jump to: navigation, search

Perl can be a very quirky language, and I've spent many years learning how to optimize complex programs. I'll try to share as much of that knowledge as I can, here, and use this is a scratchpad for noting future tricks that I find.

Contents

Simple stuff

These are going to be the obvious bits, but I'll go over them here just in case anyone missed them:

  • Don't call eval with a string if you can help it. That includes s/.../.../e as well as regular eval. If all you need to do is catch runtime exceptions, used the brace-delimited version of eval isntead.
  • Use -w or use warnings while developing and debugging, but take it out for performance turning.
  • Never use goto with a label (goto with a function name is another beast entirely).

Regular expressions

Over-specification

Be very careful with regular expressions. It's very easy to craft what looks like a reasonable regex and have it turn out to be the slowest thing about your code! Here's an example:

 while(<>) {
   if (/^.*foo/) {
     ...
   }
 }

This looks OK to many, but is actually fatally flawed. Imagine the work that it does on the input:

 foobarbizbaz

First, the .* will eat up the entire string, and then fail to match "foo". Then it will backtrack one character and try again, and repeat that over and over again until it finds the leading foo. Instead you could just remove the "^.*" and the code would behave exactly the same, but would execute much faster! Never over-specify a regexp like this unless you are sure that doing so will give you a performance improvement because of a bizarre special case in your data.

Careful use of non-greedy matching

Let's say you want to match the string inside the quotes in:

a "quick" brown fox

You could do this:

/"(.*)"/

But that will require quite a lot of backtracking. If you used a non-greedy match instead:

/"(.*?)"/

Then you have much less work to do. Always specify the text you want to match exactly if you know what it will look like:

/"([^"]*)"/

Which is even faster because it eats all of the non-quote characters and immediately matches the following quote.

Scoping issues

Scoping is expensive in Perl. There are dozens of ways that you can improve performance, knowing that, but often at the cost of readability and maintainability. Tread these woods at your own risk, and use common sense in making these decisions.

Create all lexicals at once

When entering a subroutine, you can create the variables that will hold your parameters like this:

 sub foo {
   my $a = shift;
   my $b = shift;
   my $c = shift;
   ...
 }

But it turns out that that's horribly slow. Not only do you have to perform 3 scoping operations, but you modify the parameter list 3 times as well. Instead, you can do this:

 sub foo {
   my($a, $b, $c) = @_;
   ...
 }

Which never modifies the parameter list and only performs one scoping operation (on three variables). Not naming parameters can be even faster, but at the cost of much readability.

Create lexicals outside of inner loops

Here's a spot I get myself into quite often:

 for(my $i=0;$i < 10_000;$i++) {
   my $a = $i*2;
   for(my $j=0;$j<$a;$j++) {
     my $b = sqrt($j);
     my $c = $j**2;
     foo($a,$b,$c);
   }
 }

It turns out that each of those scoping operations performed by the my operator are somewhat expensive, and performing them over and over again in an inner loop is a big hit to performance. Instead, this is much more efficient:

 my($a,$b,$c, $i,j);
 for($i=0;$i < 10_000;$i++) {
   $a = $i*2;
   for($j=0;$j<$a;$j++) {
     $b = sqrt($j);
     $c = $j**2;
     foo($a,$b,$c);
   }
 }

Just be careful that you re-initialize your values on every execution of the loop.

Prefer my to our

I'm not sure why, but in my testing, my is always more efficient than our, and by a rather measurable amount. Thus, if you find yourself needing global variables, only use our when you're sure that someone will need to access the variable from outside of your package (e.g. you'll be exporting it).

Subroutines

Other than scoping issues, subroutine calls are just expensive in their own right. Consider inlining subroutines by hand, where possible. If it's not possible, move calls to subroutines outside of inner loops if you can.

Recursion

Manual stack management is almost always preferable to recursion in Perl due to the high overhead of subroutine invocation. For example:

sub fib {
  my($a,$b) = @_;
  $a = 1 unless defined $a;
  $b = 1 unless defined $b;
  print $a+$b, "\n";
  fib($b,$a+$b);
}

becomes:

sub fib {
  my($a,$b) = (1,1);
  for(;;) {
    print $a+$b, "\n";
    ($a,$b) = ($b,$a+$b);
  }
}

or if you need to preserve the whole stack, say for tree walking:

sub find_a {
  # find an "a" in a tree datastructure with fields named:
  #  data - the payload we're searching on
  #  name - an arbitrary node name
  #  children - array ref of child nodes
  my($tree) = @_;
  my @stack = ($tree);
  while ((my $node = pop @stack)) {
    if ($node->{data} eq 'a') {
      print "Found an 'a' node at $node->{name}\n";
      return $node;
    }
    push @stack, @{$node->{children}} if $node->{children};
  }
  print "No 'a' children found\n";
  return undef;
}
Personal tools