Wednesday, August 26, 2015

Using Dispatch Tables To Improve Application Security

Update: I have changed the title to "Using Dispatch Tables To Improve Application Security" for clarity.

At a previous job, I saw some code that asked the user which function they wanted to run and then executed a subroutine with that name. This code demonstrates why such a practice is bad:

  1. use strict;
  2. use warnings;
  3. sub greet { print "Hello!\n" }
  4. sub inquire { print "How are you?\n" }
  5. sub bye { print "Farewell!\n" }
  6. sub delete_all_files { print "*KABOOM*\n" }
  7. sub insecure_call {
  8. no strict 'refs';
  9. shift->();
  10. }
  11. insecure_call('greet');
  12. insecure_call('inquire');
  13. insecure_call('bye');
  14. insecure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
*KABOOM*

One solution to this is the dispatch table. With a dispatch table, you define up front which calls are legal for an outsider to make:

  1. use strict;
  2. use warnings;
  3. my %dispatch = (
  4. greet => \&greet,
  5. inquire => \&inquire,
  6. bye => \&bye,
  7. );
  8. sub greet { print "Hello!\n" }
  9. sub inquire { print "How are you?\n" }
  10. sub bye { print "Farewell!\n" }
  11. sub delete_all_files { print "*KABOOM*\n" }
  12. sub secure_call {
  13. my $call = shift;
  14. if ( ref $dispatch{$call} eq 'CODE' ) {
  15. $dispatch{$call}->();
  16. }
  17. else {
  18. warn "Invalid call $call";
  19. }
  20. }
  21. secure_call('greet');
  22. secure_call('inquire');
  23. secure_call('bye');
  24. secure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files at example_2a line 22.

The thing that bugs me about this particular solution (and I'll admit it's minor) is the repetition:

  1. my %dispatch = (
  2. greet => \&greet,
  3. inquire => \&inquire,
  4. bye => \&bye,
  5. );

To me, this reads like:

To go to greet, type 'greet'.
To go to inquire, type 'inquire'.
To go to bye, type 'bye'.

When it could just be asking "Which function do you wish to use?"

So, we could build the dispatch table dynamically from a list of acceptable calls:

  1. use strict;
  2. use warnings;
  3. my %dispatch;
  4. my @valid_calls = qw( greet inquire bye );
  5. sub greet { print "Hello!\n" }
  6. sub inquire { print "How are you?\n" }
  7. sub bye { print "Farewell!\n" }
  8. sub delete_all_files { print "*KABOOM*\n" }
  9. sub build_dispatch_table {
  10. no strict 'refs';
  11. %dispatch = map { $_ => *{$_} } @valid_calls;
  12. }
  13. sub secure_call {
  14. my $call = shift;
  15. if ( $dispatch{$call} ) {
  16. $dispatch{$call}->();
  17. }
  18. else {
  19. warn "Invalid call $call\n";
  20. }
  21. }
  22. build_dispatch_table();
  23. secure_call('greet');
  24. secure_call('inquire');
  25. secure_call('bye');
  26. secure_call('delete_all_files');
  27. print "\nBut, now this works because of the typeglob *{}\n";
  28. our @greet = qw( This is an array );
  29. print "@{$dispatch{greet}}\n";
  30. print "which annoys me even though it's probably inconsequential\n";

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files

But, now this works because of the typeglob *{}
This is an array
which annoys me even though it's probably inconsequential

In addition to the typeglob annoyance, there is still a little repetition there: greet, inquire and bye still appear more than once in the code. I don't actually find this to be a huge deal, but how might we solve those issues? One way is including the code itself in the dispatch table:

  1. use strict;
  2. use warnings;
  3. my %dispatch = (
  4. # Documentation for greet can go here
  5. greet =>
  6. sub {
  7. my $greeting = shift || 'Howdy!';
  8. print "$greeting\n";
  9. },
  10. # Documentation for inquire can go here
  11. inquire =>
  12. sub {
  13. print "How are you?\n";
  14. },
  15. # Documentation for farewell can go here
  16. farewell =>
  17. sub {
  18. print "Bye!\n";
  19. },
  20. );
  21. sub delete_all_files { print "*KABOOM*" }
  22. sub api {
  23. my $call = shift;
  24. if ( $dispatch{$call} ) {
  25. $dispatch{$call}->(@_);
  26. }
  27. else {
  28. warn "Not executing unknown API call $call\n";
  29. }
  30. }
  31. api('greet','Hello.');
  32. api('inquire');
  33. api('farewell');
  34. api('delete_all_files');

Output:

Hello.
How are you?
Bye!
Not executing unknown API call delete_all_files

One argument against this is it adds visual complexity to the code: it's one more layer that a new developer on the project would need to mentally parse before coming up-to-speed on the code. But, that may be minor, and I think these formatting choices are developer-friendly.

Friday, August 07, 2015

Accepting Input from Multiple Sources

One of the corners I often paint myself into when developing a tool is only accepting one type of input, usually STDIN, the standard input stream, like a pipeline (ex: cat fruit.txt | grep apple) or a redirect (ex: grep apple < fruit.txt)

What inevitably happens is I end up wanting the tool to work like any Unix tool and accept different kinds of input (filenames or arguments on the command line, for example.)

Finally I got fed up with it and added a function called multi_input() to my library. Here is how it works:

First, the setup:

$ cat >meats
chicken
beef
^D
$ cat >fruits
apple
orange
banana
^D
$ cat >vegetables
carrot
lettuce
broccoli
cauliflower
^D
$ cat >a.out
this is just my
default input file
^D

To illustrate use of the function, I just reverse the input to do something "interesting" with it. The operative code is:

  1. my $input = multi_input();
  2. my $reversed = reverse $input;
  3. print "$input\n";
  4. print "$reversed\n";

So now I can interact with the tool in a variety of ways, starting with my "usual" way, STDIN:

$ ./reverse.pl < vegetables
current_input_type is: STDIN
carrot
lettuce
broccoli
cauliflower

rewolfiluac
iloccorb
ecuttel
torrac

Or STDIN by way of a pipe (this is the same mechanism in the code, but just to give another example):

$ cat fruits | ./reverse.pl
current_input_type is: STDIN
apple
orange
banana

ananab
egnaro
elppa

Or filenames provided on the command line:

$ ./reverse.pl meats fruits
current_input_type is: FILEARGS
chicken
beef
apple
orange
banana

ananab
egnaro
elppa
feeb
nekcihc

Or input provided on the command line:

$ ./reverse.pl this is not a list of filenames
current_input_type is: ARGS
this is not a list of filenames
semanelif fo tsil a ton si siht

And finally, the ultimate lazy, my default input file a.out:

$ ./reverse.pl
current_input_type is: DEFAULT
this is just my
default input file

elif tupni tluafed
ym tsuj si siht

Here is the full code listing with comments:

  1. #!/usr/bin/perl
  2. use strict;
  3. use warnings;
  4. use Term::ReadKey; # for ReadMode() below
  5. sub multi_input {
  6. my $input = '';
  7. my $VERBOSE = 1;
  8. my %INPUT_TYPE = ( # names for self-documenting code
  9. NONE => 0,
  10. ARGS => 1,
  11. FILEARGS => 2,
  12. STDIN => 3,
  13. DEFAULT => 4,
  14. );
  15. my %INPUT_LABEL = reverse %INPUT_TYPE; # allow lookup by number
  16. my $current_input_type = $INPUT_TYPE{NONE};
  17. # I could have done this all in one "shot" but I wanted to keep the
  18. # detection of input type separate from the processing of input
  19. my $char;
  20. if ( @ARGV ) {
  21. # Note that a filename typo will result in processing of the command
  22. # line like it is normal input, but that won't matter in this example.
  23. if ( -f $ARGV[0] ) {
  24. $current_input_type = $INPUT_TYPE{FILEARGS};
  25. }
  26. else {
  27. $current_input_type = $INPUT_TYPE{ARGS};
  28. }
  29. }
  30. else {
  31. # Code from Perl Cookbook. We peek into STDIN stream to see if
  32. # anything's there. The read still counts, though, so we need to save
  33. # $char. perldoc Term::ReadKey for information on ReadMode() and
  34. # ReadKey()
  35. ReadMode('cbreak');
  36. if (defined ($char = ReadKey(-1)) ) {
  37. $current_input_type = $INPUT_TYPE{STDIN};
  38. }
  39. else {
  40. $current_input_type = $INPUT_TYPE{DEFAULT};
  41. }
  42. ReadMode('normal');
  43. }
  44. warn "current_input_type is: $INPUT_LABEL{$current_input_type}\n"
  45. if $VERBOSE;
  46. if ( $current_input_type == $INPUT_TYPE{FILEARGS} ) {
  47. local $/; # Slurp the whole file in at once, not line-by-line
  48. for my $file (@ARGV) {
  49. open(my $ifh, '<', $file) or die "Can't open $file: $!";
  50. $input .= <$ifh>;
  51. close($ifh) || warn "close failed: $!";
  52. }
  53. }
  54. elsif ( $current_input_type == $INPUT_TYPE{ARGS} ) {
  55. $input = join ' ', @ARGV;
  56. }
  57. elsif ( $current_input_type == $INPUT_TYPE{STDIN} ) {
  58. # Slurp all STDIN at once, not line-by-line
  59. $input = $char . do { local $/; <STDIN> };
  60. }
  61. else {
  62. my $file = "a.out";
  63. open(my $ifh, '<', $file) or die "Can't open $file: $!";
  64. $input = do { local $/; <$ifh> };
  65. close($ifh) || warn "close failed: $!";
  66. }
  67. return $input;
  68. }
  69. my $input = multi_input();
  70. my $reversed = reverse $input;
  71. print "$input\n";
  72. print "$reversed\n";

Sunday, April 26, 2015

Please ignore, just testing styles

At a previous job, I saw some code that asked the user which function they wanted to run and then executed a subroutine with that name. This code demonstrates why such a practice is bad:

  1. use strict;
  2. use warnings;
  3. sub greet { print "Hello!\n" }
  4. sub inquire { print "How are you?\n" }
  5. sub bye { print "Farewell!\n" }
  6. sub delete_all_files { print "*KABOOM*\n" }
  7. sub insecure_call {
  8. no strict 'refs';
  9. shift->();
  10. }
  11. insecure_call('greet');
  12. insecure_call('inquire');
  13. insecure_call('bye');
  14. insecure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
*KABOOM*

One solution to this is the dispatch table. With a dispatch table, you define up front which calls are legal for an outsider to make:

  1. use strict;
  2. use warnings;
  3. my %dispatch = (
  4. greet => \&greet,
  5. inquire => \&inquire,
  6. bye => \&bye,
  7. );
  8. sub greet { print "Hello!\n" }
  9. sub inquire { print "How are you?\n" }
  10. sub bye { print "Farewell!\n" }
  11. sub delete_all_files { print "*KABOOM*\n" }
  12. sub secure_call {
  13. my $call = shift;
  14. if ( ref $dispatch{$call} eq 'CODE' ) {
  15. $dispatch{$call}->();
  16. }
  17. else {
  18. warn "Invalid call $call";
  19. }
  20. }
  21. secure_call('greet');
  22. secure_call('inquire');
  23. secure_call('bye');
  24. secure_call('delete_all_files');

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files at example_2a line 22.

The thing that bugs me about this particular solution (and I'll admit it's minor) is the repetition:

  1. my %dispatch = (
  2. greet => \&greet,
  3. inquire => \&inquire,
  4. bye => \&bye,
  5. );

To me, this reads like:

To go to greet, type 'greet'.
To go to inquire, type 'inquire'.
To go to bye, type 'bye'.

When it could just be asking "Which function do you wish to use?"

So, we could build the dispatch table dynamically from a list of acceptable calls:

  1. use strict;
  2. use warnings;
  3. my %dispatch;
  4. my @valid_calls = qw( greet inquire bye );
  5. sub greet { print "Hello!\n" }
  6. sub inquire { print "How are you?\n" }
  7. sub bye { print "Farewell!\n" }
  8. sub delete_all_files { print "*KABOOM*\n" }
  9. sub build_dispatch_table {
  10. no strict 'refs';
  11. %dispatch = map { $_ => *{$_} } @valid_calls;
  12. }
  13. sub secure_call {
  14. my $call = shift;
  15. if ( $dispatch{$call} ) {
  16. $dispatch{$call}->();
  17. }
  18. else {
  19. warn "Invalid call $call\n";
  20. }
  21. }
  22. build_dispatch_table();
  23. secure_call('greet');
  24. secure_call('inquire');
  25. secure_call('bye');
  26. secure_call('delete_all_files');
  27. print "\nBut, now this works because of the typeglob *{}\n";
  28. our @greet = qw( This is an array );
  29. print "@{$dispatch{greet}}\n";
  30. print "which annoys me even though it's probably inconsequential\n";

Output:

Hello!
How are you?
Farewell!
Invalid call delete_all_files

But, now this works because of the typeglob *{}
This is an array
which annoys me even though it's probably inconsequential

In addition to the typeglob annoyance, there is still a little repetition there: greet, inquire and bye still appear more than once in the code. I don't actually find this to be a huge deal, but how might we solve those issues? One way is including the code itself in the dispatch table:

  1. use strict;
  2. use warnings;
  3. my %dispatch = (
  4. # Documentation for greet can go here
  5. greet =>
  6. sub {
  7. my $greeting = shift || 'Howdy!';
  8. print "$greeting\n";
  9. },
  10. # Documentation for inquire can go here
  11. inquire =>
  12. sub {
  13. print "How are you?\n";
  14. },
  15. # Documentation for farewell can go here
  16. farewell =>
  17. sub {
  18. print "Bye!\n";
  19. },
  20. );
  21. sub delete_all_files { print "*KABOOM*" }
  22. sub api {
  23. my $call = shift;
  24. if ( $dispatch{$call} ) {
  25. $dispatch{$call}->(@_);
  26. }
  27. else {
  28. print "Not executing unknown API call $call\n";
  29. }
  30. }
  31. api('greet','Hello.');
  32. api('inquire');
  33. api('farewell');
  34. api('delete_all_files');

Output:

Hello.
How are you?
Bye!
Not executing unknown API call delete_all_files

One argument against this is it adds visual complexity to the code: it's one more layer that a new developer on the project would need to mentally parse before coming up-to-speed on the code. But, that may be minor, and I think these formatting choices are developer-friendly.

Sunday, April 05, 2015

Saving Vertical Space

I was reviewing some code I had written for a simple RPG dice algorithm (although there's already a good module for this, Game::Dice) and I realized again that I have a prefererence for functions that can fit on one screen. One strategy is breaking up the code into smaller routines but I sometimes like to compact it vertically as much as possible first.

This function roll, given a string of "dice language," should return the results of such a dice roll. An example of this would be "3d10+1" to roll three 10-sided dice and then add 1, or "4d6b3" which says to roll four 6-sided dice and take the best three.

Here's the function before the refactor:

sub roll {
    my $input = shift;
    die unless $input =~ /d/;
    if ( $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/ ) {
        my $num   = $1 || 1;
        my $die   = $2;
        my $plus  = $3;
        my $end   = $4;
        my $total = 0;
        my @dice;
        for my $count ( 1 .. $num ) {
            my $single = int( rand($die) ) + 1;
            push @dice, $single;
            print "$single\n";
        }
        if ( $plus eq 'b' ) {
            if ( $end > $num ) {
                $end = $num;
            }
            @dice = sort { $b <=> $a } @dice;
            $#dice = $end - 1;
        }
        for my $die (@dice) {
            $total += $die;
        }
        if ( $plus eq '+' ) {
            $total += $end;
        }
        elsif ( $plus eq '-' ) {
            $total -= $end;
        }
        elsif ( $plus eq '*' ) {
            $total *= $end;
        }
        elsif ( $plus eq '/' ) {
            $total /= $end;
        }
        return $total;
    }
    return;
}

The first thing I did is to delete the first of this pair of lines, which was redundant, because the line that follows also checks the format of the input:

die unless $input =~ /d/;
if ( $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/ ) {

But instead of having that big if block, I changed it to this:

return unless $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/;

Then I combined these:

my $die   = $2;
my $plus  = $3;
my $end   = $4;

into this:

my ($die,$plus,$end) = ($2,$3,$4);

Once I decided I didn't need to print each individual die as it was rolled, I could reduce this:

for my $count ( 1 .. $num ) {
    my $single = int( rand($die) ) + 1;
    push @dice, $single;
    print "$single\n";
}

to this:

push @dice, int(rand($die))+1 for ( 1..$num );

Then, I changed this:

if ( $end > $num ) {
    $end = $num;
}

To use the postfix if:

$end =  $num if $end > $num;

and this:

for my $die (@dice) {
    $total += $die;
}

to use postfix for:

$total += $_ for @dice;

One thing I like to do with an if/else chain like this:

if ( $plus eq '+' ) {
    $total += $end;
}
elsif ( $plus eq '-' ) {
    $total -= $end;
}
elsif ( $plus eq '*' ) {
    $total *= $end;
}
elsif ( $plus eq '/' ) {
    $total /= $end;
}

is to compress it like this:

if    ( $plus eq '+' ) { $total += $end }
elsif ( $plus eq '-' ) { $total -= $end }
elsif ( $plus eq '*' ) { $total *= $end }
elsif ( $plus eq '/' ) { $total /= $end }

Since it's still short in width and the syntax can lined up to be quite readable.

So the final version of the refactored function is:

sub roll {
    my $input = shift;
    return unless $input =~ /(\d*)d(\d+)\s*(\D?)\s*(\d*)/;
    my $num = $1 || 1;
    my ($die,$plus,$end) = ($2,$3,$4);
    my $total = 0;
    my @dice;
    push @dice, int(rand($die))+1 for ( 1..$num );
    if ( $plus eq 'b' ) {
        $end =  $num if $end > $num;
        @dice = sort { $b <=> $a } @dice;
        $#dice = $end-1;
    }
    $total += $_ for @dice;
    if    ( $plus eq '+' ) { $total += $end }
    elsif ( $plus eq '-' ) { $total -= $end }
    elsif ( $plus eq '*' ) { $total *= $end }
    elsif ( $plus eq '/' ) { $total /= $end }
    return $total;
}

Now you can make things a lot smaller (see Perl Golf examples) but readability is important to me, and I think this is arguably as readable as the original. I was actually a little surprised that perltidy barely touched the if/elsif structure, just screwing up the alignment a little on the first line:

if ( $plus eq '+' ) { $total += $end }
elsif ( $plus eq '-' ) { $total -= $end }
elsif ( $plus eq '*' ) { $total *= $end }
elsif ( $plus eq '/' ) { $total /= $end }

The code doesn't strictly adhere to Perl Best Practices, which is something I like to use as a guide for the most part, but perlcritic (which is based on Perl Best Practices) doesn't start to complain until the cruel setting, then bringing up things like postfix if, postfix for, and unless.

How would you make it smaller while still maintaining readability?

Sunday, March 15, 2015

My Bad Communication Skills

A couple weeks ago I asked how you "join the conversation" but based on feedback I got, I don't think I communicated well. I think people thought I meant "which blogs do you read?" What I really meant was: when you write a blog entry, where do you post the link so that it's seen by people who are interested in that subject?

So for example, when I write about Perl, I post to blogs.perl.org. I want to blog about other topics too, like web development (JavaScript, CSS, etc); lifehacks; Unix, Linux, shell scripting; general tech / tech business; and database stuff. But when I blog I'd like it to have a chance of being seen. Please let me know your thoughts!

Saturday, February 21, 2015

How do you join the conversation?


blogs.perl.org is great in that it's a stream of blog posts around a specific technology. Since I, like many of you, blog about other technologies too, I'd like to learn from you about other conversation streams. For me personally, the list of topics include:

  • Web development (JavaScript, CSS, etc)
  • Lifehacks
  • Unix, Linux, shell scripting
  • General tech / tech business
  • Database

I'll add what little knowledge I have on the topic:

There's reddit with corresponding subreddits on Unix and Perl. It looks like it's pretty routine to post blog entries on the Perl one at least, I'm not sure about the climate of other tech subreddits.

There are also subreddits for LifeProTips and Life Hacks. I've not yet participated in either, so again I'm not sure what the expectations are.

The Unix and Linux Forums, while not a "stream" of conversation, seems to be receptive to sharing ideas, and the people there are friendly and helpful.

And of course, Twitter with and without #perl and #unix hashtags. I say with or without as it seems like more popular bloggers, at least, don't bother with the hashtags. I'll also say that with or without the hashtags I haven't seen a lot of traffic coming from Twitter.

I'd appreciate anyone's advice, experience, or knowledge on this topic.

Thanks!

Tuesday, February 10, 2015

Fix Those Legacy Subroutines or Methods


Maybe you know the feeling… you go to add an option to that method or subroutine and… cue Jaws theme

sub update_shopping_cart {
    my $cart_id        = shift;
    my $item           = shift;
    my $quantity       = shift;

Argh. You don’t want your legacy code to break but you also don’t want to add a fourth unnamed parameter to the existing problem. And the solution is simple:

sub update_shopping_cart {
    my $cart_id        = shift;
    my $item           = shift;
    my $quantity       = shift;
    my $apply_discount = 0;        # Initialize the fourth parameter

    my $param1 = $cart_id;
    if ( ref $param1 eq 'HASH' ) {
        $cart_id        = $param1->{cart_id};
        $item           = $param1->{item};
        $quantity       = $param1->{quantity};
        $apply_discount = $param1->{apply_discount};
    }

Now either of these work. The legacy call:

update_shopping_cart( 314, 'apples', 3 );

…or the new style:

update_shopping_cart({
    cart_id        => 314,
    item           => 'apples',
    quantity       => 3,
    apply_discount => 1,
});

Bonus: there is no way to use the new option with the old-style call. If someone wants to use it, they’ll need to switch to the new style.

Pros:
  • The new call is self-documenting. In the original form of the call, you see “314” in the code by itself, and it’s not immediately obvious what it is. Now it’s nicely labeled.
  • Now that you have added the new format, you can painlessly add additional named parameters as needed.

Cons:
  • It may be confusing to see two different styles of calls in your codebase. But, now you can transition the old code piecemeal.