Friday, July 11, 2008

Why Dollar-Underscore Should Be Avoided

This bit me recently, so I condensed it into a short script to share with you fine people:

#!/usr/bin/perl

use strict;
use Data::Dumper;

my @my_array = ( 'one', 'two', 'three' );

for (@my_array) {
    print "Processing: {$_}\n";
    unknown_author::his_proc();
    print "\$_ is now toast {$_}\n";
}

print Data::Dumper::Dumper( \@my_array );

package unknown_author; # Hypothetically, someone else's module

sub his_proc {
    open my $fh, '<', $0 or die "Cannot open $0: $!";
    while (<$fh>) { # Assigns to the global $_, which the caller has aliased to $my_array[index]
        # Doesn't even matter what's inside - the loop ends with $_ set to undef
    }
    close $fh;
}

Output:

Processing: {one}
$_ is now toast {}
Processing: {two}
$_ is now toast {}
Processing: {three}
$_ is now toast {}
$VAR1 = [
          undef,
          undef,
          undef
        ];

Solution:

1. In your own calling code, you can protect yourself by looping with a named lexical variable instead of $_:

for my $item (@my_array) {
    print "Processing: {$item}\n";
    unknown_author::his_proc();
    print "\$item is fine: {$item}\n";
}

But in your modules/routines:

2. Don't modify $_ unless you know for sure what's in it. Implicit assignments, such as a while (<FILEHANDLE>) loop, count as modification too.

3. If you feel you must modify $_ in your routine, first say:

local $_;

As in:

sub his_proc {
    local $_;

    open my $fh, '<', $0 or die "Cannot open $0: $!";
    while (<$fh>) { # This version of $_ is mine now
        # Now I can do whatever I want with $_
    }
    close $fh;
}

Remember that $_ is global!
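
Another way to keep a routine from clobbering the caller's $_ is to never touch $_ in the first place and read each line into a lexical variable. Here is a minimal sketch of that idea. The sub name count_lines and its line-counting body are hypothetical, made up only to give the loop something to do, and it assumes the script is run from a file so that $0 can be opened:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical routine: counts the lines in a file without ever using $_
sub count_lines {
    my ($path) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {    # $line is lexical; the caller's $_ is untouched
        $count++;
    }
    close $fh;

    return $count;
}

my @my_array = ( 'one', 'two', 'three' );

for (@my_array) {
    my $lines = count_lines($0);    # reads this script, like his_proc did
    print "Still intact: {$_}\n";
}
```

Even though the caller is still looping with a bare $_ here, each element survives, because nothing inside count_lines assigns to the global.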

Incidentally, this is covered in Perl Best Practices on page 85, "Dollar-Underscore", as the rule: "Beware of any modification via $_." I like my example better because it is more concise and it happened to me. Also, instead of "be careful", I think we should just avoid $_ altogether.
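
The mechanism behind that rule is aliasing: inside a for loop, $_ is another name for the current element, not a copy of it, so any change made through $_ lands in the array. A minimal sketch with made-up data (the variable names are mine, purely for illustration):

```perl
use strict;
use warnings;

my @words = ( ' one', ' two', ' three' );    # made-up data

for (@words) {
    s/^\s+//;    # $_ is the element itself, so this edits @words in place
}
print join( ',', @words ), "\n";    # prints "one,two,three"

# Working on a copy leaves the original alone
my @padded = ( ' one', ' two', ' three' );
my @trimmed;
for my $word (@padded) {
    ( my $copy = $word ) =~ s/^\s+//;
    push @trimmed, $copy;
}
print join( ',', @padded ),  "\n";    # prints " one, two, three"
print join( ',', @trimmed ), "\n";    # prints "one,two,three"
```

Note that a named loop variable such as $word is still an alias to the element. The difference is that it is lexical, so code in some other routine has no way to reach it by accident.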

$_ offers a lot of conciseness, but it has serious disadvantages in production code: it invites subtle bugs like the one above, and it reduces readability (adding to the Perl-looks-like-line-noise meme). Clarity also suffers when $_ is the implicit, unstated argument to Perl builtins such as print.
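
To make the readability point concrete, here is a rough side-by-side sketch of the same routine written both ways. The sub names tidy_implicit and tidy_explicit and the sample data are hypothetical. chomp, length, and uc stand in here for the builtins that, like print, quietly default to $_:

```perl
use strict;
use warnings;

# Implicit style: every builtin below operates on an unnamed $_
sub tidy_implicit {
    my @names = @_;    # copy, so the caller's data is not modified
    my @out;
    for (@names) {
        chomp;
        next unless length;
        push @out, uc;
    }
    return @out;
}

# Explicit style: same behavior, but each call says what it works on
sub tidy_explicit {
    my @names = @_;    # copy, so the caller's data is not modified
    my @out;
    for my $name (@names) {
        chomp $name;
        next unless length $name;
        push @out, uc $name;
    }
    return @out;
}

print join( ',', tidy_implicit( "alpha\n", "\n", "beta\n" ) ), "\n";    # prints "ALPHA,BETA"
print join( ',', tidy_explicit( "alpha\n", "\n", "beta\n" ) ), "\n";    # prints "ALPHA,BETA"
```

Both produce the same result. The second costs a few extra keystrokes and in return tells the reader exactly which value is being chomped, measured, and upper-cased.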

Save it for the one-liners!