Sunday, December 18, 2011

Shortcut Shell Script

I have a shell script I use to simplify my life on Unix systems. This is the "Git" version, saved as $HOME/bin/g:

usage() {
      sed -n '/^  *[a-zA-Z0-9][a-zA-Z0-9]*)/ {
s?^\(  *[a-zA-Z0-9][a-zA-Z0-9]*\)) *cmd="\(.*\)" *;;?\1  \2?
p
}' $0 >&2
}

lookup=$1
shift

case $lookup in
   a) cmd="git add $*" ;;
   b) cmd="git branch" ;;
   c) cmd="git commit -m \"$*\"" ;;
   d) cmd="git checkout develop $*" ;;
   m) cmd="git checkout master $*" ;;
  pd) cmd="git push origin develop $*" ;;
  pm) cmd="git push origin master $*" ;;
   r) cmd="git checkout -b $*" ;;
   s) cmd="git status -s $*" ;;
  "") usage;;
   *) echo "Not valid." >&2
      usage;;
esac

# Quote $cmd so it is not word-split or glob-expanded before eval parses it
echo "$cmd" >&2
eval "$cmd"
Ideally, any frequently executed commands take only a few keystrokes (not counting any parameters that need to be passed):


~/dev/TuSC (develop)$ g s

git status -s
M  tusc.ahk
?? tusc.ahk.bak

~/dev/TuSC (develop)$ g c fixed multi-lock bug plus usage comments for GoApp

git commit -m "fixed multi-lock bug plus usage comments for GoApp"
[develop 1a39e4d] fixed multi-lock bug plus usage comments for GoApp
 1 files changed, 67 insertions(+), 23 deletions(-)

~/dev/TuSC (develop)$ g pd

git push origin develop
Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 1.26 KiB, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:tinypigdotcom/TuSC.git
   ede7073..1a39e4d  develop -> develop

~/dev/TuSC (develop)$

Some people use aliases or functions for this kind of thing, but I like my little script because I feel like it's more portable and also has a built-in command list which I can access just by typing the command key by itself (plus Enter, of course):

~/dev/TuSC (develop)$ g
   a  git add $*
   b  git branch
   c  git commit -m \"$*\"
   d  git checkout develop $*
   m  git checkout master $*
  pd  git push origin develop $*
  pm  git push origin master $*
   r  git checkout -b $*
   s  git status -s $*

~/dev/TuSC (develop)$
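
One caveat with building a string and handing it to eval: the shell parses the text a second time, so a commit message containing a double quote, a backtick, or a dollar sign can be re-interpreted. Below is a minimal sketch of an eval-free variant, trimmed to three of the shortcuts. It assumes the script is rewritten as a shell function, and the name g_safe is made up for illustration. It keeps the command as a list of words (via set --) rather than as a string. The trade-off is that it gives up the sed-based usage listing, which depends on the cmd="..." lines.

```shell
# Hypothetical eval-free variant of the g script, written as a function.
g_safe() {
    lookup=$1
    # Only shift when there is something to shift.
    [ "$#" -gt 0 ] && shift

    # Rebuild the positional parameters as the full command, one word per argument.
    case $lookup in
        a) set -- git add "$@" ;;
        c) set -- git commit -m "$*" ;;   # all remaining words become one message
        s) set -- git status -s "$@" ;;
        *) echo "Not valid." >&2
           return 1 ;;
    esac

    echo "$*" >&2   # show the command, as the original does
    "$@"            # run it without a second round of parsing
}
```

With this form, g_safe c 'say "hi"' passes the message to git as a single argument with the inner quotes intact.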

Sunday, June 19, 2011

Keep a Debug Journal

Whether just starting a new job, or entering a long period of heads-down coding, something I have found helpful from time to time is keeping a "Debug Journal". This is another tool in your arsenal to solve problems. You may have spent an hour or twelve today struggling with an issue. In addition to making notes about what the fix is for that item so that you have it handy in case it reappears elsewhere, you can also look at the big picture and infer lessons which apply to a larger subset of problems. An entry from my own journal follows:
Problem: CGI Form submit, script is executed twice

Answer: HTML Validator plug-in for Firefox was making additional requests. I checked the Changelog and they were aware of the bug and it had been fixed. After I upgraded, everything was fine.

Lesson: Involving another pair of eyes and another brain helped me greatly. For one thing, the other person involved wasn't experiencing the error, even when hitting my code. That got us to thinking about the browser. Then he sent me a link regarding the plug-in and voilà. Also, again I made a bad assumption - that there was a problem with the server, and not the client. I need to start asking myself what I am assuming after being frustrated by a problem. I also need to involve other people sooner.


Having another person look at the problem may seem like an obvious step, but one thing I want to stress is that this journal can help alert you to your own blind spots. I'm not sure if it was vanity or stubbornness but it took me too long to ask for help in this case. This is something I can work on during future troubleshooting. It is also a good example of drawing a larger lesson from a specific fix. Here is another:
Problem: Does the user have sufficient privileges to see this particular control and perform its function? Code is not always very friendly in looking this sort of thing up.

Answer: In dev, I hard-coded the username to be her if it was initially me:

if ( $ENV{REMOTE_USER} eq 'david' ) {
    $ENV{REMOTE_USER} = 'betsy';
}
then I could see the screen from her point of view and saw that she did indeed have access.

Lesson: If the initial problem seems difficult, it might be worth a few minutes trying to figure out if there's an easier way to determine the answer.


In our software, we do have a feature that allows us to take the role of another user. However, the view is artificial and there are some constraints on the way it works, so it was not really useful to me in this case. The point here again is the larger picture: the privilege system is somewhat complex, both in code and in data. Hard coding the login was much faster and easier than the alternatives I was facing. Also, in this case, I ended up keeping this snippet handy and using it quite frequently to diagnose user issues. Another entry:
Problem: device_id's were showing up as 1 or 2 or 8 etc (expecting 5762,12428,etc).

Answer: That's what they freakin' were.

Lesson: Just because something might look a little weird, doesn't mean it isn't so. Check the source of the data first and see if it is correct before considering it a problem.


This one was a funny one. Since there are a significant number of devices being tracked by our system, I was used to seeing DEVICE_IDs in the database of say, "5728" or "14972". My brain did not want to accept a single-digit DEVICE_ID. It just so happened I was working with devices that were entered very early in the lifetime of the system. When I looked them up explicitly, they indeed had single-digit identifiers.

Assumptions are a major blind spot for me (and many other people, I suspect). Assumptions are difficult, since we can't really move forward diagnosing any problem without making some assumptions. We just have to try to keep combing through our minds for the bad or unnecessary ones.

And that's the gist of it. If nothing else, keeping a journal can be reinforcement of your experience, helping you to remember without even needing to look it up. But you can look it up if you need to! Several more entries follow without additional comment:


Problem: I keep saying "verified" is set to 1, so why isn't this code running?

Answer: Because that's not the only requirement to get to that code. "Submit" also needs to be set (case-sensitive), but instead, I have set "submit" (lower-case "s").

Fix: copied & pasted output of Data::Dumper of $qs right beside code, so I could compare what was being sent with how it was being tested

Lesson: Also, I was getting sleepy. That makes for bad debugging. Walk around a bit and come back to it.


Problem: alert() isn't working

Answer: the JavaScript command is there, the problem is that the JavaScript command preceding it causes an error. It was a focus() method called on a control that doesn't exist.

Lesson: If a JavaScript command isn't working, you should check the JavaScript console and also the HTML source. Furthermore, if any component of a program isn't working, check THAT component. No sense in adding debug code to a Perl script when the JavaScript is failing (at least as a first resort).


Problem: Script is plugging SYSDATE into records when I have a fixed date I want in there. I had changed the script so that, at the end of processing, it plugs the same fixed date into every record whose AS_OF_DATE field is NULL.

Answer: The problem is that when I changed the script to do this, I never went back and took out SYSDATE, so the records' AS_OF_DATE fields were never NULL by the time the final step ran.

Lesson: I solved this one pretty quickly, but I'm not sure what I could have done differently. There were two parts of the script that needed to be changed and I only changed one part. I guess when I see that setting (foo to bar if foo is baz) isn't working, I should make sure I have set foo to baz first. That seems to be a common theme in slow debugging - assumptions. When you assume...

Monday, June 13, 2011

Hard Drive Paranoia

Brought my laptop running XP out of hibernation mode today and tried to start Firefox, and got some weird error about it not being able to start due to a missing file.  Tried to start something else and started seeing weirdness in other apps as well and decided to reboot.  Upon reboot, CHKDSK automatically ran and found (and seemingly fixed) a TON of errors.

I went to the Event Viewer (Start->Control Panel->Administrative Tools->Event Viewer) and could only find this error: "The system failed to flush data to the transaction log. Corruption may occur." After a quick Google check, I came to believe it was only related to pulling out my USB drive.

I felt compelled to thoroughly check the drive before I put any more serious work onto this laptop only to risk losing it.

My idea was to copy a lot of data and then run a comparison.  First, I ran JDiskReport.  It's a great tool for identifying what's taking up space on your hard drive.  Using this, I located a local directory taking up about 25G.

I copied this directory to c:\testA

Then, though it might have been overkill, I copied it to c:\testB

Then I used another tool I'm liking: FreeFileSync.  This directory sync tool has a setting allowing you to compare file contents instead of just sizes and timestamps.

It took about an hour, but I compared c:\testA to c:\testB.  FreeFileSync reported no file differences.

Then, just to be safe I compared the original source directory to c:\testA.  Again, FreeFileSync reported no file differences.
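
For anyone with a Unix-like shell on hand (Cygwin, for instance), the same copy-then-compare idea can be sketched without a GUI tool. This is only an illustration, not the procedure described above. The helper name trees_match is made up, and the temporary directories stand in for the real source directory and c:\testA. It leans on diff -r, which compares file contents rather than sizes and timestamps.

```shell
# Hypothetical helper: succeeds only if two directory trees have identical contents.
trees_match() {
    # diff -r walks both trees and compares file contents;
    # its exit status is 0 only when nothing differs.
    diff -r "$1" "$2" >/dev/null 2>&1
}

# Throwaway directories standing in for the source directory and its copy
src=$(mktemp -d)
copy=$(mktemp -d)
echo "some data" > "$src/file.txt"
cp -R "$src/." "$copy/"

if trees_match "$src" "$copy"; then
    echo "no differences"
else
    echo "differences found"
fi
rm -rf "$src" "$copy"
```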

Since I had done a lot of writing and reading of the hard drive I checked the Event Viewer again and found nothing scary.

From a command prompt (Start->Run->CMD[Enter]), I ran chkdsk c: .  It did do a few more fixes but it also gave me a message to the effect of "This does not indicate disk corruption".  Still, I was a little concerned, so I ran chkdsk /f c: .  The /F switch tells CHKDSK to fix errors on the disk if found.  It can't run while I'm in Windows so it asks if I want to schedule it for the next boot and I answer Y.  Then I reboot.

It ran successfully, but just to be extra safe I ran chkdsk /r c: .  The /R switch tells CHKDSK to scan the entire disk for bad sectors.  Again, it can't run while I'm in Windows so it asks if I want to schedule it for the next boot and I answer Y.  Then I reboot.

It took a long time to run (maybe a couple hours), but it ran successfully.

Checked the Event Viewer again and this time I am seeing a couple entries from yesterday in the System category which seem to indicate a problem occurred:

Warning: "An error was detected on device \Device\Harddisk0\D during a paging operation."

Error: "A parity error was detected on \Device\Ide\iaStor0."

Still nothing from today, though.  Guess I will keep an eye on it.

Friday, July 11, 2008

Why Dollar-Underscore Should Be Avoided

This bit me recently, so I wrote it in a condensed script to share with you fine people:

#!/usr/bin/perl

use strict;
use Data::Dumper;

my @my_array = ( 'one', 'two', 'three' );

for (@my_array) {
    print "Processing: {$_}\n";
    unknown_author::his_proc();
    print "\$_ is toast {$_}\n";
}

print Data::Dumper::Dumper( \@my_array );

package unknown_author; # Hypothetically, someone else's module

sub his_proc {
    open I, $0;
    while (<I>) { # But $_ has been aliased to $my_array[index]
        # Doesn't even matter what's inside - the damage has been done
    }
    close I;
}
Output:

Processing: {one}
$_ is toast {}
Processing: {two}
$_ is toast {}
Processing: {three}
$_ is toast {}
$VAR1 = [
          undef,
          undef,
          undef
        ];
Solution:

1. You can control this by changing your own code:

for my $item (@my_array) {
    print "Processing: {$item}\n";
    unknown_author::his_proc();
    print "\$item is fine: {$item}\n";
}
But in your modules/routines:

2. Don't modify $_ unless you know for sure what's in it.

3. If you feel you must modify $_ in your routine, first say:

local $_;

As in:

sub his_proc {
    local $_;

    open I, $0;
    while (<I>) { # This version of $_ is mine now
        # Now I can do whatever I want with $_
    }
    close I;
}
Remember that $_ is global!

Incidentally, this is covered in Perl Best Practices on page 85, "Dollar-Underscore", as the rule: "Beware of any modification via $_." I like my example better because it is more concise and it happened to me. Also, instead of "be careful", I think we should just avoid $_ altogether.

$_ offers a lot of conciseness, but it has large disadvantages in production code, such as introducing subtle bugs, and reducing readability (adding to the Perl-looks-like-line-noise meme). In addition, clarity suffers when using it as the default and unspecified argument to Perl builtins, such as print.

Save it for the one-liners!

Wednesday, June 11, 2008

Access is denied.

In a Perl program running under Windows, I had this instruction which should launch a browser with a Help document:
system(1,"start tkquiz.html");
For the uninitiated, "start" will use Windows extension-to-program mapping to find the appropriate program to execute in order to read that file. Think of it as a way to double-click on a file programmatically.

Anyway, the above instruction yielded this less-than-helpful error message:

Access is denied.
I experimented a bit. I copied tkquiz.pl to tkquiz.txt and tried this:
system(1,"start tkquiz.txt");
It worked.

Running ls -l on the directory using Cygwin, those two files look like this:

-rw-r--r--  1 David None  6108 Jun  5 22:12 tkquiz.html
-rwxr-xr-x  1 David None 18402 Jun 11 20:52 tkquiz.txt
So Windows wants me to have execute privilege on a file to "start" it. That is a bit of a surprise, since I think of the document being launched as just data. On the other hand, the launching program could be using the data file as code (as in the case of tkquiz.pl which would be launched using Perl).

Still under Cygwin, chmod u+x tkquiz.html fixes my problem. But, I was driving myself a bit batty trying to figure out the Windows way of doing it. I right-click on the file in Explorer and click Properties, but I don't see the Security tab I am used to seeing. Turns out that it isn't there by default.
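
Under Cygwin, the check and the fix can be folded into one small step. A minimal sketch follows, using a made-up helper name, ensure_executable. Like the chmod above, it only touches the owner's execute bit, and it leaves files that are already executable alone.

```shell
# Hypothetical helper: add the owner's execute bit only if the file lacks it.
ensure_executable() {
    [ -x "$1" ] || chmod u+x "$1"
}

# Usage, run from the directory holding the Help document:
#   ensure_executable tkquiz.html
```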

I do wish Windows would stop saving me from myself. I understand that they have a wide variety of users, but there has to be a better way.

Monday, May 26, 2008

listify() and Perl::Critic

Before I start, I just want to make clear to anyone unfamiliar with them, both perltidy and Perl::Critic are pretty comprehensively customizable. But, I am interested in the default settings. The reason is this: on arbitrary choices like indenting by 4 spaces or indenting by 3, if everyone stuck to the defaults, code would be much more uniform across code bases. I guess as I get older, I see much more of the value in conformity. :)

In my last blog entry, I made manual changes to make my code more clear. So, I ran the code through Perl::Critic to gain some additional clarity. This was a full example program and not just the subroutine from the previous blog entry. The final program is below:

#!/usr/bin/perl

use strict;
use warnings;
use version; our $VERSION = qv('0.1');
use Data::Dumper;

my @list = (
'Alabama',        'Alaska',       'Arizona',      'Arkansas',
'California',     'Colorado',     'Connecticut',  'Delaware',
'Florida',        'Georgia',      'Hawaii',       'Idaho',
'Illinois',       'Indiana',      'Iowa',         'Kansas',
'Kentucky',       'Louisiana',    'Maine',        'Maryland',
'Massachusetts',  'Michigan',     'Minnesota',    'Mississippi',
'Missouri',       'Montana',      'Nebraska',     'Nevada',
'New Hampshire',  'New Jersey',   'New Mexico',   'New York',
'North Carolina', 'North Dakota', 'Ohio',         'Oklahoma',
'Oregon',         'Pennsylvania', 'Rhode Island', 'South Carolina',
'South Dakota',   'Tennessee',    'Texas',        'Utah',
'Vermont',        'Virginia',     'Washington',   'West Virginia',
'Wisconsin',      'Wyoming',
);

listify( \@list, 11 );

print Data::Dumper::Dumper( \@list );

sub listify {
   my ( $aref, $cc ) = @_;
   if ( ref $aref eq 'ARRAY' && $cc > 0 ) {
       my $j;
       for ( my $i = 0 ; $i <= $#$aref ; $i += $cc ) {
           push @$j, [ @$aref[ $i .. $i + $cc - 1 ] ];
       }
       $#{ $j->[ $#{$j} ] } = $#$aref % $cc;
       @$aref = @$j;
       return 1;
   }
   return;
}
Perl::Critic comes with a command-line utility, perlcritic. The default minimum severity level for deviations from the standard is 5, the most severe. Since I was looking for enlightenment, I wanted to see everything it had. I ran perlcritic -severity 1 example.pl. This is an example of its output:
Code is not tidy at line 1, column 1. See page 33 of PBP.
(Severity: 1)

RCS keywords $Id$ not found at line 1, column 1. See page 441 of
PBP. (Severity: 2)

RCS keywords $Revision$, $HeadURL$, $Date$ not found at line 1,
column 1. See page 441 of PBP. (Severity: 2)

RCS keywords $Revision$, $Source$, $Date$ not found at line 1,
column 1. See page 441 of PBP. (Severity: 2)

No "VERSION" variable found at line 1, column 1. See page 404 of
PBP. (Severity: 2)

Code before warnings are enabled at line 6, column 1. See page
431 of PBP.  (Severity: 4)

Double-sigil dereference at line 35, column 25. See page 228 of
PBP. (Severity: 2)

C-style "for" loop used at line 41, column 9. See page 100 of
PBP. (Severity: 2)
I ignored perlcritic's demands for RCS stuff like $Id$ which I don't need in a throw-away program. Other than that, I addressed all of its complaints and made the following changes:
  • perlcritic complains that code is not "tidy". That's pretty damn cool (though not unexpected as it is in PBP). I ran perltidy example.pl -o example.out, compared the differences, then copied example.out over example.pl . The prominent changes were:
    • Lines longer than 80 characters were broken up into multiple lines
    • Other whitespace changes were made. It did actually move a line up: where I had ended my if ( with a close-paren semicolon on its own line, it brought that up to the previous line. Not sure I agree with that choice, but I kept it in the name of Zen.
  • It wanted me to use warnings. I have mixed feelings about warnings because I hate things like having to no warnings 'once';, but I can't deny it has helped me before. Typically, I only compile with warnings (perl -Wc example.pl) as that usually gives me all I need.
  • It wanted me to use version. Even though it's a throw-away, I complied.
  • It did not like my c-style for loop. Mixed feelings again here - this particular for loop is very simple in structure. I re-wrote it as an exercise, though.
  • Final element in array should be -1, not $#array. Doh. $#array is just something I've been in the habit of using. But when you are actually referencing the final element, -1 looks a whole lot nicer.
  • In response to many double-sigil complaints, I created variable $aref_elements to reduce clutter. This was something that I should have discovered during refactoring anyway.
  • Perl::Critic prefers $#{$aref} to $#$aref. That is interesting. My initial feeling is that it looks much busier with the curlies, but I can see Damian's point. Once you are comfortable looking at them, consistent use erases a lot of ambiguity.

Sunday, May 25, 2008

Clever Code

One criticism from opponents of Perl is that it is a "write-only" language - meaning that once the code is written, it is extremely difficult to maintain because it is difficult to understand upon re-examination. As with many criticisms, this should be aimed at those undisciplined developers who are writing the code, and not their tool of choice.

Having said that, I think it is also fair to say that Perl makes it very easy to write difficult-to-decipher code. This is the double-edged sword which is the shorthand Perl gives us to be very expressive in a small amount of space. A negative application of this is obfuscated Perl (where the author intentionally makes his code difficult to read), while a more positive application is the craft of creating Perl "one-liners" (trying to include a great deal of functionality in a single line of code). A one-liner can be a powerful weapon in the arsenal of a system administrator.

As a side note, I don't think obfuscated code is inherently evil. I think the Obfuscated Perl Contests have some pretty nifty entries, and when I spent time writing a couple obfuscated perls of my own, I learned a great deal about the Perl parser, so it was a good learning experience as well as fun.

Back to the point, listify() is a subroutine I wrote a while back that takes a reference to an array as its first argument, and a number n as its second argument, and transforms the original array into an array of arrays each having no more than n elements.

Implementation of listify() is here:

sub listify {
   my ($aref,$cc) = @_;
   if( ref $aref eq 'ARRAY' && $cc > 0 ) {
       my $j;
       for(my $i=0; $i<=$#$aref; $i+=$cc) {
           push @$j, [@$aref[$i..$i+$cc-1]];
       }
       $#{$j->[$#{$j}]}=$#$aref%$cc;
       @$aref = @$j;
       return 1;
   }
   return;
}
The point of the routine was to take very long lists of e-mail addresses that we use to notify customers of upcoming changes, and break them apart into smaller recipient lists of reasonable size that can be handled in a single outgoing e-mail.

Given that:

@my_array = ( 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
'eight', 'nine', 'ten' );
listify(\@my_array, 4) transforms @my_array into this:

@my_array = (
   [ 'one',  'two', 'three', 'four'  ],
   [ 'five', 'six', 'seven', 'eight' ],
   [ 'nine', 'ten' ],
);
So, the line that I'm citing as my "clever" line is this one:

$#{$j->[$#{$j}]}=$#$aref%$cc;
Full disclosure: I was pretty impressed with myself for this one at the time, I guess because I was jazzed to have written something so cryptic. And since it was a tiny part of a not-often-used routine, I wasn't worried about maintainability.

The purpose of this line of code is to truncate the final array so that, using the above example, we don't end up with this instead:

@my_array = (
   [ 'one',  'two', 'three', 'four'  ],
   [ 'five', 'six', 'seven', 'eight' ],
   [ 'nine', 'ten', undef,   undef   ],
);
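
The arithmetic behind the truncation is easier to see with numbers plugged in. As a quick sketch at the shell prompt (final_last_index is a made-up helper, and the shell is only standing in for Perl here), the last valid index of the final sub-array is the last index of the whole list modulo the chunk size:

```shell
# Hypothetical helper mirroring $#$aref % $cc.
# $1 = number of elements in the list, $2 = elements per sub-array.
# Assumes at least one element: shell and Perl disagree on % with a negative operand.
final_last_index() {
    echo $(( ($1 - 1) % $2 ))
}

final_last_index 10 4   # prints 1: indices 0..1 survive, i.e. 'nine' and 'ten'
final_last_index 8 4    # prints 3: the final sub-array is already full
```

Assigning that value to $# of the final sub-array is what chops off the trailing undef entries.
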
So, what can be done to make this line more readable?

$#{$j->[$#{$j}]}=$#$aref%$cc;
One thing it's missing is whitespace to separate the different parts:

$#{ $j->[ $#{$j} ] } = $#$aref % $cc;
$#array yields the final index of @array. So, $#$array is the notation we'd use if $array is a reference to an array. $#{$array} is the same as $#$array, so we can reduce $#{$j} accordingly.

$#{ $j->[ $#$j ] } = $#$aref % $cc;
We can't do the same with the first $# because of the -> dereference following $j: without the braces, $# would bind to $j alone, before the -> subscript is applied.

There's no reason $j has to be an array reference. It can just as easily be an array. We can also give the variables better names while we're at it.

$#{ $result_array[$#result_array] } = $#$in_aref % $elements_per_array;
Often, a line of code can benefit from being more than one line of code. Let's try this:

my $final_aref = $result_array[$#result_array];
$#$final_aref = $#$in_aref % $elements_per_array;
Now we still have our ugly $#$ but at least there's less going on. Then we can break off another piece and add some good whitespace, and a comment for good measure.

my $final_aref          = $result_array[ $#result_array ];
my $last_index_in_final = $#$in_aref % $elements_per_array;

# Truncate final array
$#$final_aref = $last_index_in_final;
This is certainly much easier to read than our original:

$#{$j->[$#{$j}]}=$#$aref%$cc;
Applying a few of the same principles to the rest of the routine, we get this:

sub listify {
   my ( $in_aref, $elements_per_array ) = @_;

   return if (
       ref $in_aref ne 'ARRAY' or
       $elements_per_array <= 0
   );

   my @result_array;
   for( my $i = 0; $i <= $#$in_aref; $i += $elements_per_array ) {
       push @result_array, [
           @$in_aref[ $i..$i + $elements_per_array - 1 ]
       ];
   }
   my $final_aref          = $result_array[ $#result_array ];
   my $last_index_in_final = $#$in_aref % $elements_per_array;

   # Truncate final array
   $#$final_aref = $last_index_in_final;
   @$in_aref = @result_array;
   return 1;
}
Much better. I realize there are lots of elegant solutions out there that I would happily break into multiple lines to the original developer's horror. For me, elegance is less important than clarity. But, breaking up some lines can increase execution time, and that's certainly a factor to be weighed.

Clever code is fun to write, but it has no place in a production environment.

In my opinion, the best reference for writing maintainable code is Perl Best Practices by Damian Conway. It's Perl-centric but much of the advice can be taken into other languages. The module Perl::Critic is intended to critique code against the standards set forth in this book.

You can also "perldoc perlstyle" and check out perltidy for automated formatting.