Tinypig Blog

Tuesday, December 13, 2016

Heap's Algorithm and Generating Perl Code From Pseudocode

I've been researching recursion lately, and permutation algorithms in particular. The interest was spurred by a real-life case where such an algorithm would come in handy (combinations of @clients, @users, @tickets). I came across Wikipedia's entry for Heap's algorithm and the pseudocode illustrating it. I found the non-recursive version even more interesting, precisely because it doesn't need to call itself, so I chose that version of the algorithm to study.

I thought it would be an interesting exercise to convert the pseudocode to Perl code. While I was converting it I was struck by how closely the Perl code could be made to look like the pseudocode, but also how easy idiomatic Perl made it to skip certain parts of the pseudocode altogether, so I wrote two separate Perl implementations of the algorithm: one to mirror the pseudocode as closely as possible, and one to cut corners using idiomatic Perl.

First, the "pseudocode" version:

 sub output {
     my (@to_print) = @_;
     print join( ',', @to_print ), "\n";
 }

 sub is_even {
     return $_[0] % 2 ? 0 : 1;
 }

 # Elements of @_ are aliases for the caller's arguments. By assigning to
 # $_[0] and $_[1] directly instead of copying their values to other
 # variables, we get pass-by-reference behavior, so the change impacts @A
 # directly even though we are in a separate function outside its scope.
 sub swap {
     ($_[0],$_[1]) = ($_[1],$_[0]);
 }

 sub generate {
     my ($n, @A) = @_;
     my @c;

     for ( my $i = 0; $i < $n; $i += 1 ) {
         $c[$i] = 0;
     }

     output(@A);

     my $i = 0;
     while ( $i < $n ) {
         if ( $c[$i] < $i ) {
             if ( is_even($i) ) {
                 swap( $A[0], $A[$i] );
             }
             else {
                 swap( $A[ $c[$i] ], $A[$i] );
             }
             output(@A);
             $c[$i] += 1;
             $i = 0;
         }
         else {
             $c[$i] = 0;
             $i += 1;
         } # end if
     } # end while
 }

 generate( 3, 'work', 'sleep', 'play' );

Output:

work,sleep,play
sleep,work,play
play,work,sleep
work,play,sleep
sleep,play,work
play,sleep,work
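The comment on swap relies on the fact that the elements of @_ are aliases for the caller's arguments. As a small illustration (the names swap_copy and swap_alias are made up for this sketch and are not part of the code above), compare a version that copies its arguments with one that works on @_ directly:

```perl
use strict;
use warnings;

# Hypothetical helper: copies its arguments first, so the caller's
# variables are left untouched.
sub swap_copy {
    my ( $x, $y ) = @_;
    ( $x, $y ) = ( $y, $x );
}

# Hypothetical helper: assigns through @_, whose elements alias the
# caller's variables, so the caller sees the swap.
sub swap_alias {
    ( $_[0], $_[1] ) = ( $_[1], $_[0] );
}

my @A = qw( work sleep play );

swap_copy( $A[0], $A[2] );
print "@A\n";    # work sleep play (unchanged)

swap_alias( $A[0], $A[2] );
print "@A\n";    # play sleep work
```

Only the second call changes @A, which is why the "pseudocode" version can keep swap as a separate function.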

Next, the idiomatic Perl version:

 sub output {
     print join( ',', @_ ), "\n";
 }

 sub generate {
     # We don't need to pass n here because we have @A. That's not at all
     # unique to Perl of course but the cool part comes later...
     my (@A) = @_;

     # I don't need to specify the length of array @c because as soon as we
     # refer to an element, it exists. I don't need to initialize @c or $i
     # because as soon as we start performing math on their values they will be
     # assumed to start at zero (under "use warnings" this does trigger
     # "uninitialized value" warnings, though).
     my (@c,$i);

     output(@A);

     # The cool part: we can refer to the length of the @A array as simply @A
     # in scalar context.
     while ( $i < @A ) {
         if ( $c[$i] < $i ) {
             # Test for is_odd by seeing if modulo 2 of $i is non-zero.
             # Since we check for is_odd vs is_even, we swap the code in the
             # if-else.
             if ( $i % 2 ) {
                 # The swap function was handy but idiomatic Perl allows us to
                 # swap variables in place
                 ( $A[ $c[$i] ], $A[$i] ) = ( $A[$i], $A[ $c[$i] ] );
             }
             else {
                 ( $A[0], $A[$i] ) = ( $A[$i], $A[0] );
             }
             output(@A);
             # Nitpicky but it's nice to have ++ instead of += 1. Again, not
             # limited to Perl.
             $c[$i]++;
             $i = 0;
         }
         else {
             $c[$i] = 0;
             $i++;
         }
     }
 }

 generate( split '', 'abc' );

Output:

a,b,c
b,a,c
c,a,b
a,c,b
b,c,a
c,b,a

In what ways could the program be further reduced?

EDIT: A change by an anonymous commenter makes this even more compact:

 sub output {
     print join( ',', @_ ), "\n";
 }

 sub generate {
     my (@A) = @_;
     my ( @c, $i );    # keep the state lexical so a second call starts fresh

     output(@A);

     while ( $i < @A ) {
         if ( $c[$i] < $i ) {
             my $x = $i % 2 ? $c[ $i ] : 0;
             ( $A[ $x ], $A[$i] ) = ( $A[$i], $A[ $x ] );
             output(@A);
             $c[$i]++;
             $i = 0;
         }
         else {
             $c[$i] = 0;
             $i++;
         }
     }
 }

 generate( split '', 'abcd' );
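One more variation, offered only as a sketch: instead of printing each permutation, the same loop can collect them and return the list, which makes the function easier to reuse and to test. The name permutations is hypothetical, and @c and $i are initialized explicitly here so the code stays quiet under "use warnings".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): the same non-recursive
# loop as above, but it returns the permutations as array references
# instead of printing them.
sub permutations {
    my @A      = @_;
    my @c      = (0) x @A;    # one counter per position, all starting at zero
    my $i      = 0;
    my @result = ( [@A] );    # the starting order is the first permutation

    while ( $i < @A ) {
        if ( $c[$i] < $i ) {
            # Odd $i swaps with position $c[$i], even $i with position 0
            my $x = $i % 2 ? $c[$i] : 0;
            @A[ $x, $i ] = @A[ $i, $x ];    # swap using array slices
            push @result, [@A];
            $c[$i]++;
            $i = 0;
        }
        else {
            $c[$i] = 0;
            $i++;
        }
    }
    return @result;
}

print join( ',', @$_ ), "\n" for permutations(qw( a b c ));
```

For qw( a b c ) this prints the same six lines, in the same order, as the idiomatic version above.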

Monday, September 26, 2016

Tattletale Variables

Sometimes you might be faced with a huge program that, somewhere, is changing a variable's value to something undesired.

 use Data::Dumper;

 sub some_long_faraway_function {
     my $href = shift;
     # Pretend there's a lot of code here I don't want to sift through
     $href->{bananas} = 'some bad value';
 }

 my $shopping_list = {
     apples => 1,
     pears => 3,
     bananas => 5,
 };
 some_long_faraway_function($shopping_list);
 warn Dumper(\$shopping_list);

Output:

$VAR1 = \{
            'apples' => 1,
            'bananas' => 'some bad value',
            'pears' => 3
        };

You don't know where it's being changed, but you need to find out. The trick is to tie the variable to a class whose STORE method reports a stack trace, so that the variable itself tells you where it's being changed:

 package TattletaleScalar;
 use Carp qw(cluck);
 require Tie::Scalar;
 our @ISA = qw(Tie::StdScalar);
 sub STORE {
     warn "TATTLETALE variable set to {$_[1]}";
     cluck();
     ${$_[0]} = $_[1];
 }

 package main;

 use Data::Dumper;

 sub some_long_faraway_function {
     my $href = shift;
     # Pretend there's a lot of code here I don't want to sift through
     $href->{bananas} = 'some bad value';
 }

 my $shopping_list = {
     apples => 1,
     pears => 3,
     bananas => 5,
 };
 my $tmp = $shopping_list->{bananas}; # Save current value
 tie $shopping_list->{bananas}, 'TattletaleScalar';
 $shopping_list->{bananas} = $tmp; # Restore saved value
 some_long_faraway_function($shopping_list);
 warn Dumper(\$shopping_list);

Now we can see the stack every time the variable is changed:

TATTLETALE variable set to {5} at example.pl line 7.
at example.pl line 8.
    TattletaleScalar::STORE(TattletaleScalar=SCALAR(0x7fac290d3260), 5) called at example.pl line 29
TATTLETALE variable set to {some bad value} at example.pl line 7.
at example.pl line 8.
    TattletaleScalar::STORE(TattletaleScalar=SCALAR(0x7fac290d3260), "some bad value") called at example.pl line 19
    main::some_long_faraway_function(HASH(0x7fac29026508)) called at example.pl line 30
$VAR1 = \{
            'apples' => 1,
            'pears' => 3,
            'bananas' => 'some bad value'
        };
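As a variation on the same idea, here is a minimal sketch that records each assignment in an array instead of printing a stack trace, using caller to find the assigning line. The package name TattletaleLog and its @LOG array are made up for this example. It also takes the initial value as an extra argument to tie, which avoids the save-and-restore step.

```perl
use strict;
use warnings;

package TattletaleLog;    # hypothetical name, not from the original post

our @LOG;                 # one entry per assignment to a tied variable

sub TIESCALAR {
    my ( $class, $initial ) = @_;
    return bless \$initial, $class;    # start out holding the initial value
}

sub FETCH { return ${ $_[0] } }

sub STORE {
    my ( $self, $value ) = @_;
    my ( undef, $file, $line ) = caller;    # where the assignment happened
    push @LOG, { value => $value, file => $file, line => $line };
    $$self = $value;
}

package main;

my $shopping_list = { apples => 1, pears => 3, bananas => 5 };

# Extra arguments to tie are passed on to TIESCALAR.
tie $shopping_list->{bananas}, 'TattletaleLog', $shopping_list->{bananas};

$shopping_list->{bananas} = 'some bad value';

printf "set to {%s} at %s line %d\n", @{$_}{qw( value file line )}
    for @TattletaleLog::LOG;
```

This only reports the immediate caller; when the full call chain matters, cluck as used above is the better tool.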

Monday, September 12, 2016

Build Your Memory Palace

Earlier this year, I gave a talk, "Building Your Memory Palace." Here are a few notes about the presentation:

The presentation itself
The original slides
Wikipedia has a good page on Method of loci
A Google search yields some good How-To videos
HowStuffWorks has a good write-up

Wednesday, August 26, 2015

Using Dispatch Tables To Improve Application Security

Update: I have changed the title to "Using Dispatch Tables To Improve Application Security" for clarity.

At a previous job, I saw some code that asked the user which function they wanted to run and then executed a subroutine with that name. This code demonstrates why such a practice is bad:

 use strict;
 use warnings;

 sub greet            { print "Hello!\n"       }
 sub inquire          { print "How are you?\n" }
 sub bye              { print "Farewell!\n"    }
 sub delete_all_files { print "*KABOOM*\n"     }

 sub insecure_call {
     no strict 'refs';
     shift->();
 }

 insecure_call('greet');
 insecure_call('inquire');
 insecure_call('bye');
 insecure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
*KABOOM*

One solution to this is the dispatch table. With a dispatch table, you define up front which calls are legal for an outsider to make:

 use strict;
 use warnings;

 my %dispatch = (
     greet   => \&greet,
     inquire => \&inquire,
     bye     => \&bye,
 );

 sub greet            { print "Hello!\n"       }
 sub inquire          { print "How are you?\n" }
 sub bye              { print "Farewell!\n"    }
 sub delete_all_files { print "*KABOOM*\n"     }

 sub secure_call {
     my $call = shift;
     if ( ref $dispatch{$call} eq 'CODE' ) {
         $dispatch{$call}->();
     }
     else {
         warn "Invalid call $call";
     }
 }

 secure_call('greet');
 secure_call('inquire');
 secure_call('bye');
 secure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files at example_2a line 22.

The thing that bugs me about this particular solution (and I'll admit it's minor) is the repetition:

 my %dispatch = (
     greet   => \&greet,
     inquire => \&inquire,
     bye     => \&bye,
 );

To me, this reads like:

To go to greet, type 'greet'.
To go to inquire, type 'inquire'.
To go to bye, type 'bye'.

When it could just be asking "Which function do you wish to use?"

So, we could build the dispatch table dynamically from a list of acceptable calls:

 use strict;
 use warnings;

 my %dispatch;
 my @valid_calls = qw( greet inquire bye );

 sub greet            { print "Hello!\n"       }
 sub inquire          { print "How are you?\n" }
 sub bye              { print "Farewell!\n"    }
 sub delete_all_files { print "*KABOOM*\n"     }

 sub build_dispatch_table {
     no strict 'refs';
     %dispatch = map { $_ => *{$_} } @valid_calls;
 }

 sub secure_call {
     my $call = shift;
     if ( $dispatch{$call} ) {
         $dispatch{$call}->();
     }
     else {
         warn "Invalid call $call\n";
     }
 }

 build_dispatch_table();
 secure_call('greet');
 secure_call('inquire');
 secure_call('bye');
 secure_call('delete_all_files');

 print "\nBut, now this works because of the typeglob *{}\n";

 our @greet = qw( This is an array );
 print "@{$dispatch{greet}}\n";

 print "which annoys me even though it's probably inconsequential\n";

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files

But, now this works because of the typeglob *{}
This is an array
which annoys me even though it's probably inconsequential
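The typeglob side effect comes from storing *{$_} in the table. One way to avoid it, sketched here as an alternative rather than a definitive fix, is to ask the package for a code reference with can, which works under strict and returns undef for names that are not defined. The return values of secure_call (1 or 0) are an addition for this sketch.

```perl
use strict;
use warnings;

my @valid_calls = qw( greet inquire bye );

sub greet            { print "Hello!\n"       }
sub inquire          { print "How are you?\n" }
sub bye              { print "Farewell!\n"    }
sub delete_all_files { print "*KABOOM*\n"     }

# Only names in @valid_calls are ever looked up, so delete_all_files
# never makes it into the table. Each value is a plain code reference.
my %dispatch = map { $_ => scalar __PACKAGE__->can($_) } @valid_calls;

sub secure_call {
    my $call = shift;
    my $code = $dispatch{$call};
    if ($code) {
        $code->(@_);
        return 1;
    }
    warn "Invalid call $call\n";
    return 0;
}

secure_call('greet');
secure_call('inquire');
secure_call('bye');
secure_call('delete_all_files');
```

Because the table now holds code references rather than globs, @{ $dispatch{greet} } dies instead of quietly returning the contents of @greet.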

In addition to the typeglob annoyance, there is still a little repetition there: greet, inquire and bye still appear more than once in the code. I don't actually find this to be a huge deal, but how might we solve those issues? One way is including the code itself in the dispatch table:

 use strict;
 use warnings;

 my %dispatch = (

 # Documentation for greet can go here
 greet =>
     sub {
         my $greeting = shift || 'Howdy!';
         print "$greeting\n";
     },

 # Documentation for inquire can go here
 inquire =>
     sub {
         print "How are you?\n";
     },

 # Documentation for farewell can go here
 farewell =>
     sub {
         print "Bye!\n";
     },


 );

 sub delete_all_files { print "*KABOOM*" }

 sub api {
     my $call = shift;
     if ( $dispatch{$call} ) {
         $dispatch{$call}->(@_);
     }
     else {
         warn "Not executing unknown API call $call\n";
     }
 }

 api('greet','Hello.');
 api('inquire');
 api('farewell');
 api('delete_all_files');

Output:

Hello.
How are you?
Bye!
Not executing unknown API call delete_all_files

One argument against this is that it adds visual complexity to the code: it's one more layer that a new developer on the project would need to mentally parse before coming up to speed on the code. But that may be minor, and I think these formatting choices are developer-friendly.

Friday, August 07, 2015

Accepting Input from Multiple Sources

One of the corners I often paint myself into when developing a tool is accepting only one type of input, usually STDIN, the standard input stream, whether through a pipeline (e.g. cat fruit.txt | grep apple) or a redirect (e.g. grep apple < fruit.txt).

What inevitably happens is I end up wanting the tool to work like any Unix tool and accept different kinds of input (filenames or arguments on the command line, for example).

Finally I got fed up with it and added a function called multi_input() to my library. Here is how it works:

First, the setup:

$ cat >meats
chicken
beef
^D
$ cat >fruits
apple
orange
banana
^D
$ cat >vegetables
carrot
lettuce
broccoli
cauliflower
^D
$ cat >a.out
this is just my
default input file
^D

To illustrate use of the function, I just reverse the input to do something "interesting" with it. The operative code is:

 my $input = multi_input();
 my $reversed = reverse $input;
 print "$input\n";
 print "$reversed\n";

So now I can interact with the tool in a variety of ways, starting with my "usual" way, STDIN:

$ ./reverse.pl < vegetables
current_input_type is: STDIN
carrot
lettuce
broccoli
cauliflower

rewolfiluac
iloccorb
ecuttel
torrac

Or STDIN by way of a pipe (this is the same mechanism in the code, but just to give another example):

$ cat fruits | ./reverse.pl
current_input_type is: STDIN
apple
orange
banana

ananab
egnaro
elppa

Or filenames provided on the command line:

$ ./reverse.pl meats fruits
current_input_type is: FILEARGS
chicken
beef
apple
orange
banana

ananab
egnaro
elppa
feeb
nekcihc

Or input provided on the command line:

$ ./reverse.pl this is not a list of filenames
current_input_type is: ARGS
this is not a list of filenames
semanelif fo tsil a ton si siht

And finally, the ultimate in laziness, my default input file a.out:

$ ./reverse.pl
current_input_type is: DEFAULT
this is just my
default input file

elif tupni tluafed
ym tsuj si siht

Here is the full code listing with comments:

 #!/usr/bin/perl

 use strict;
 use warnings;

 use Term::ReadKey; # for ReadMode() below

 sub multi_input {
     my $input = '';
     my $VERBOSE = 1;

     my %INPUT_TYPE = ( # names for self-documenting code
        NONE     => 0,
        ARGS     => 1,
        FILEARGS => 2,
        STDIN    => 3,
        DEFAULT  => 4,
     );
     my %INPUT_LABEL = reverse %INPUT_TYPE; # allow lookup by number

     my $current_input_type = $INPUT_TYPE{NONE};

     # I could have done this all in one "shot" but I wanted to keep the
     # detection of input type separate from the processing of input
     my $char;
     if ( @ARGV ) {
         # Note that a filename typo will result in processing of the command
         # line like it is normal input, but that won't matter in this example.
         if ( -f $ARGV[0] ) {
             $current_input_type = $INPUT_TYPE{FILEARGS};
         }
         else {
             $current_input_type = $INPUT_TYPE{ARGS};
         }
     }
     else {
         # Code from Perl Cookbook. We peek into STDIN stream to see if
         # anything's there. The read still counts, though, so we need to save
         # $char. perldoc Term::ReadKey for information on ReadMode() and
         # ReadKey()
         ReadMode('cbreak');
         if (defined ($char = ReadKey(-1)) ) {
             $current_input_type = $INPUT_TYPE{STDIN};
         }
         else {
             $current_input_type = $INPUT_TYPE{DEFAULT};
         }
         ReadMode('normal');
     }
     warn "current_input_type is: $INPUT_LABEL{$current_input_type}\n"
         if $VERBOSE;

     if ( $current_input_type == $INPUT_TYPE{FILEARGS} ) {
         local $/; # Slurp the whole file in at once, not line-by-line
         for my $file (@ARGV) {
             open(my $ifh, '<', $file) or die "Can't open $file: $!";
             $input .= <$ifh>;
             close($ifh) || warn "close failed: $!";
         }
     }
     elsif ( $current_input_type == $INPUT_TYPE{ARGS} ) {
         $input = join ' ', @ARGV;
     }
     elsif ( $current_input_type == $INPUT_TYPE{STDIN} ) {
         # Slurp all STDIN at once, not line-by-line
         $input = $char . do { local $/; <STDIN> };
     }
     else {
         my $file = "a.out";
         open(my $ifh, '<', $file) or die "Can't open $file: $!";
         $input = do { local $/; <$ifh> };
         close($ifh) || warn "close failed: $!";
     }
     return $input;
 }

 my $input = multi_input();
 my $reversed = reverse $input;
 print "$input\n";
 print "$reversed\n";
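A different way to do the detection step, sketched here as an alternative and not what the listing above does, is the -t file test, which reports whether a filehandle is attached to a terminal. That removes the need for Term::ReadKey and for saving the peeked character. The function name detect_input_type and its two parameters are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: decide the input type from the argument list and a
# flag saying whether STDIN is a terminal. Returns one of the labels used
# in the listing above.
sub detect_input_type {
    my ( $args, $stdin_is_terminal ) = @_;

    if (@$args) {
        # As above, only the first argument is checked for being a file
        return -f $args->[0] ? 'FILEARGS' : 'ARGS';
    }

    # No arguments: a pipe or redirect means STDIN is not a terminal
    return $stdin_is_terminal ? 'DEFAULT' : 'STDIN';
}

my $type = detect_input_type( \@ARGV, ( -t STDIN ) ? 1 : 0 );
warn "current_input_type is: $type\n";
```

One difference in behavior to be aware of: with -t, an empty pipe or a run without a terminal (from cron, for example) counts as STDIN, whereas the peek-based version falls back to the default file when nothing is waiting to be read.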

Sunday, April 26, 2015

Please ignore, just testing styles

The Comments Section of a Blog is Important

Some people still don't read a blog's comments. I encourage you to do so if the topic interests you. The original post is not complete without the comments, because in them you will often find corrections to the original post or suggestions that improve upon it. Sometimes you will read comments that you feel add little, or, if the blog is especially popular (not mine), flame wars and maybe some spam. But it's better to have the conversation than a lone blog post with a single person's opinions and experiences.

I have been tempted in the past to update my own posts with valuable input from the comment section, but I think it's better to encourage folks to read them. What's useful to me might not be useful to you.

There's no single person that knows everything I know, but for any given topic, there's someone who knows more about it than I do. That's why the comments are important.

But don't take my word for it. Trust me on that. :)

Sunday, April 05, 2015

Saving Vertical Space

I was reviewing some code I had written for a simple RPG dice algorithm (although there's already a good module for this, Games::Dice) and I realized again that I have a preference for functions that can fit on one screen. One strategy is breaking up the code into smaller routines, but I sometimes like to compact it vertically as much as possible first.

This function roll, given a string of "dice language," should return the results of such a dice roll. An example of this would be "3d10+1" to roll three 10-sided dice and then add 1, or "4d6b3" which says to roll four 6-sided dice and take the best three.

Here's the function before the refactor:

sub roll {
    my $input = shift;
    die unless $input =~ /d/;
    if ( $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/ ) {
        my $num   = $1 || 1;
        my $die   = $2;
        my $plus  = $3;
        my $end   = $4;
        my $total = 0;
        my @dice;
        for my $count ( 1 .. $num ) {
            my $single = int( rand($die) ) + 1;
            push @dice, $single;
            print "$single\n";
        }
        if ( $plus eq 'b' ) {
            if ( $end > $num ) {
                $end = $num;
            }
            @dice = sort { $b <=> $a } @dice;
            $#dice = $end - 1;
        }
        for my $die (@dice) {
            $total += $die;
        }
        if ( $plus eq '+' ) {
            $total += $end;
        }
        elsif ( $plus eq '-' ) {
            $total -= $end;
        }
        elsif ( $plus eq '*' ) {
            $total *= $end;
        }
        elsif ( $plus eq '/' ) {
            $total /= $end;
        }
        return $total;
    }
    return;
}

The first thing I did was to delete the first of this pair of lines, which was redundant, because the line that follows also checks the format of the input:

die unless $input =~ /d/;
if ( $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/ ) {

But instead of having that big if block, I changed it to this:

return unless $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/;

Then I combined these:

my $die   = $2;
my $plus  = $3;
my $end   = $4;

into this:

my ($die,$plus,$end) = ($2,$3,$4);

Once I decided I didn't need to print each individual die as it was rolled, I could reduce this:

for my $count ( 1 .. $num ) {
    my $single = int( rand($die) ) + 1;
    push @dice, $single;
    print "$single\n";
}

to this:

push @dice, int(rand($die))+1 for ( 1..$num );

Then, I changed this:

if ( $end > $num ) {
    $end = $num;
}

to use the postfix if:

$end = $num if $end > $num;

and this:

for my $die (@dice) {
    $total += $die;
}

to use postfix for:

$total += $_ for @dice;

One thing I like to do with an if/else chain like this:

if ( $plus eq '+' ) {
    $total += $end;
}
elsif ( $plus eq '-' ) {
    $total -= $end;
}
elsif ( $plus eq '*' ) {
    $total *= $end;
}
elsif ( $plus eq '/' ) {
    $total /= $end;
}

is to compress it like this:

if    ( $plus eq '+' ) { $total += $end }
elsif ( $plus eq '-' ) { $total -= $end }
elsif ( $plus eq '*' ) { $total *= $end }
elsif ( $plus eq '/' ) { $total /= $end }

It's still short in width, and the syntax can be lined up to be quite readable.

So the final version of the refactored function is:

sub roll {
    my $input = shift;
    return unless $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/;
    my $num = $1 || 1;
    my ($die,$plus,$end) = ($2,$3,$4);
    my $total = 0;
    my @dice;
    push @dice, int(rand($die))+1 for ( 1..$num );
    if ( $plus eq 'b' ) {
        $end = $num if $end > $num;
        @dice = sort { $b <=> $a } @dice;
        $#dice = $end-1;
    }
    $total += $_ for @dice;
    if    ( $plus eq '+' ) { $total += $end }
    elsif ( $plus eq '-' ) { $total -= $end }
    elsif ( $plus eq '*' ) { $total *= $end }
    elsif ( $plus eq '/' ) { $total /= $end }
    return $total;
}

Now you can make things a lot smaller (see Perl Golf examples) but readability is important to me, and I think this is arguably as readable as the original. I was actually a little surprised that perltidy barely touched the if/elsif structure, just screwing up the alignment a little on the first line:

if ( $plus eq '+' ) { $total += $end }
elsif ( $plus eq '-' ) { $total -= $end }
elsif ( $plus eq '*' ) { $total *= $end }
elsif ( $plus eq '/' ) { $total /= $end }

The code doesn't strictly adhere to Perl Best Practices, which is something I like to use as a guide for the most part, but perlcritic (which is based on Perl Best Practices) doesn't start to complain until the cruel setting, then bringing up things like postfix if, postfix for, and unless.

How would you make it smaller while still maintaining readability?
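One possible answer, offered as a sketch rather than the definitive version: replace the if/elsif chain with a small table of code references keyed by operator, and build @dice with map. The %apply table is made up for this example, and sum comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw( sum );

# Hypothetical operator table: each entry takes the running total and the
# trailing number and returns the new total.
my %apply = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

sub roll {
    my $input = shift;
    return unless $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/;
    my ( $num, $die, $plus, $end ) = ( $1 || 1, $2, $3, $4 );
    my @dice = map { int( rand($die) ) + 1 } 1 .. $num;
    if ( $plus eq 'b' ) {
        $end  = $num if $end > $num;
        @dice = ( sort { $b <=> $a } @dice )[ 0 .. $end - 1 ];    # best $end
    }
    my $total = sum( 0, @dice );
    $total = $apply{$plus}->( $total, $end ) if $apply{$plus};
    return $total;
}

print roll('3d10+1'), "\n";
print roll('4d6b3'),  "\n";
```

Whether this is more readable than the aligned if/elsif chain is a matter of taste. The table pays off mainly if more operators are likely to be added.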
