Sunday, May 25, 2008

Clever Code

One criticism of opponents of Perl is that it is a "write-only" language - meaning that once the code is written, it is extremely difficult to maintain because it is difficult to understand upon re-examination. As with many criticisms, this should be aimed at those undisciplined developers who are writing the code, and not their tool of choice.

Having said that, I think it is also fair to say that Perl makes it very easy to write difficult-to-decipher code. This is the double-edged sword which is the shorthand Perl gives us to be very expressive in a small amount of space. A negative application of this is obfuscated Perl (where the author intentionally makes his code difficult to read), while a more positive application is the craft of creating Perl "one-liners" (trying to include a great deal of functionality in a single line of code). A one-liner can be a powerful weapon in the arsenal of a system administrator.

As a side note, I don't think obfuscated code is inherently evil. I think the Obfuscated Perl Contests have some pretty nifty entries, and when I spent time writing a couple obfuscated perls of my own, I learned a great deal about the Perl parser, so it was a good learning experience as well as fun.

Back to the point, listify() is a subroutine I wrote a while back that takes a reference to an array as its first argument, and a number n as its second argument, and transforms the original array into an array of arrays each having no more than n elements.

Implementation of listify() is here:

sub listify {
   my ($aref,$cc) = @_;
   if( ref $aref eq 'ARRAY' && $cc > 0 ) {
       my $j;
       for(my $i=0; $i<=$#$aref; $i+=$cc) {
           push @$j, [@$aref[$i..$i+$cc-1]];
       }
       $#{$j->[$#{$j}]}=$#$aref%$cc;
       @$aref = @$j;
       return 1;
   }
   return;
}
The point of the routine was to take very long lists of e-mail addresses that we use to notify customers of upcoming changes, and break them apart into smaller recipient lists of reasonable size that can be handled in a single outgoing e-mail.

Given that:

@my_array = ( 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
'eight', 'nine', 'ten' );
listify(\@my_array, 4) transforms @my_array into this:

@my_array = (
   [ 'one',  'two', 'three', 'four'  ],
   [ 'five', 'six', 'seven', 'eight' ],
   [ 'nine', 'ten' ],
);
So, the line that I'm citing as my "clever" line is this one:

$#{$j->[$#{$j}]}=$#$aref%$cc;
Full disclosure: I was pretty impressed with myself for this one at the time, I guess because I was jazzed to have written something so cryptic. And since it was a tiny part of a not-often-used routine, I wasn't worried about maintainability.

The purpose of this line of code is to truncate the final array so that, using the above example, we don't end up with this instead:

@my_array = (
   [ 'one',  'two', 'three', 'four'  ],
   [ 'five', 'six', 'seven', 'eight' ],
   [ 'nine', 'ten', undef,   undef   ],
);
So, what can be done to make this line more readable?

$#{$j->[$#{$j}]}=$#$aref%$cc;
One thing it's missing is whitespace to separate the different parts:

$#{ $j->[ $#{$j} ] } = $#$aref % $cc;
$#array yields the final index of @array. So, $#$array is the notation we'd use if $array is a reference to an array. $#{$array} is the same as $#$array, so we can reduce $#{$j} accordingly.

$#{ $j->[ $#$j ] } = $#$aref % $cc;
We can't do the same with the first $# because of the -> dereference following $j, which is evaluated first.

There's no reason $j has to be an array reference. It can just as easily be an array. We can also give the variables better names while we're at it.

$#{ $result_array[$#result_array] } = $#$in_aref % $elements_per_array;
Often, a line of code can benefit from being more than one line of code. Let's try this:

my $final_aref = $result_array[$#result_array];
$#$final_aref = $#$in_aref % $elements_per_array;
Now we still have our ugly $#$ but at least there's less going on. Then we can break off another piece and add some good whitespace, and a comment for good measure.

my $final_aref        = $result_array[ $#result_array ];
my $elements_in_final = $#$in_aref % $elements_per_array;

# Truncate final array
$#$final_aref = $elements_in_final;
This is certainly much easier to read than our original:

$#{$j->[$#{$j}]}=$#$aref%$cc;
Applying a few of the same principles to the rest of the routine, we get this:

sub listify {
   my ( $in_aref, $elements_per_array ) = @_;

   return if (
       ref $in_aref ne 'ARRAY' or
       $elements_per_array <= 0
   );

   my @result_array;
   for( my $i = 0; $i <= $#$in_aref; $i += $elements_per_array ) {
       push @result_array, [
           @$in_aref[ $i..$i + $elements_per_array - 1 ]
       ];
   }
   my $final_aref        = $result_array[ $#result_array ];
   my $elements_in_final = $#$in_aref % $elements_per_array;

   # Truncate final array
   $#$final_aref = $elements_in_final;
   @$in_aref = @result_array;
}
Much better. I realize there are lots of elegant solutions out there that I would happily break into multiple lines to the original developer's horror. For me, elegance is less importan than clarity. But, breaking up some lines can increase execution time, and that's certainly a factor to be weighed.

Clever code is fun to write, but it has no place in a production environment.

In my opinion, the best reference for writing maintainable code is Perl Best Practices by Damian Conway. It's Perl-centric but much of the advice can be taken into other languages. The module Perl::Critic is intended to critique code against the standards set forth in this book.

You can also "perldoc perlstyle" and check out perltidy for automated formatting.

No comments: