Sunday, December 21, 2014

Vim Macros: A More Complicated Problem

Here’s a more difficult problem than the one in my previous post on Vim macros. I collected a bunch of links to post on my website in a simple 2-column table:

blogs.perl.org  |http://blogs.perl.org/mt/mt-cp.fcgi?__mode=view&id=3159
Wiki            |http://tinypig.pbworks.com/
Tumblr          |http://tinypigdotcom.tumblr.com/
WordPress       |http://tinypig.wordpress.com/
Perl Monks      |http://www.perlmonks.org/?node=tinypig
GitHub          |https://github.com/tinypigdotcom

My HTML links template I want to add them to is basically this:

<h3>Links</h3>
<ul>
<li><a href="">LINK</a><br/></li>
</ul>

Appending the first file to the second, I get this:

<h3>Links</h3>
<ul>
<li><a href="">LINK</a><br/></li>
</ul>

blogs.perl.org  |http://blogs.perl.org/mt/mt-cp.fcgi?__mode=view&id=3159
Wiki            |http://tinypig.pbworks.com/
Tumblr          |http://tinypigdotcom.tumblr.com/
WordPress       |http://tinypig.wordpress.com/
Perl Monks      |http://www.perlmonks.org/?node=tinypig
GitHub          |https://github.com/tinypigdotcom

Note the pipes (|) separating the columns. Having these (or some other unique character) makes it much easier to work with the data because we can use Vim’s f (find) command to move to the exact location we need. Also note there should be a blank line at the end of the file so that we can go down successfully and our find on the pipe symbol will fail, ending the recursive macro.

Step  Keys  Meaning
1     qqq   Start and immediately stop recording a macro into register ‘q’, to empty it
2     qq    Start recording a macro into register ‘q’
3     0     Go to the beginning of the line
4     f|    Find the next pipe symbol
5     0     Go to the beginning of the line
6     3k    Go up three lines
7     Y     “Yank” the current line (in this case, the link template line) into the register
8     p     Immediately “put” the line right below, effectively duplicating it
9     3j    Go down three lines
10    dt|   “Delete to” pipe, which also captures the deleted text into the register
11    4k    Go up four lines
12    f>f>  Find greater-than symbol, then find the next one
13    p     “Put” the register contents (the data from the first column we deleted 3 steps back)
14    ge    “Go” to the “end” of the previous word
15    l     Go right
16    dt<   Delete to the less-than symbol, removing the trailing spaces and the LINK placeholder
17    0     Go to the beginning of the line
18    4j    Go down four lines
19    l     Go right, skipping over the pipe symbol
20    D     Delete the rest of the line (again capturing the text)
21    4k    Go up four lines
22    f"    Find next quote symbol in the current line
23    p     “Put” the register contents (data from the second column)
24    0     Go to the beginning of the line
25    4j    Go down four lines
26    dd    Delete the current line
27    @q    Add the instruction to start running macro ‘q’ again, creating the recursive call
28    q     Stop recording
29    @q    Actually run macro ‘q’
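
For comparison, here is a minimal Perl sketch of the same transformation the macro performs: each "name |url" row becomes one filled-in copy of the template line. The function name rows_to_links is made up for illustration, and like the macro it does no HTML escaping of the URL.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn "Name   |URL" rows into filled-in copies of the <li> template line.
# rows_to_links is a hypothetical helper, not part of the original post.
sub rows_to_links {
    my (@rows) = @_;
    my @items;
    for my $row (@rows) {
        # Rows without a pipe carry no data, so they are ignored
        next unless $row =~ /\|/;
        my ( $name, $url ) = split /\|/, $row, 2;
        $name =~ s/\s+$//;          # drop the column padding after the name
        $url  =~ s/^\s+|\s+$//g;    # drop stray whitespace around the URL
        push @items, qq{<li><a href="$url">$name</a><br/></li>};
    }
    return @items;
}

my @rows = (
    'Wiki            |http://tinypig.pbworks.com/',
    'GitHub          |https://github.com/tinypigdotcom',
);
print "$_\n" for rows_to_links(@rows);
```

The macro is still the quicker choice for a one-off edit inside Vim; a script like this only earns its keep if the list gets regenerated often.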

Here’s what it looks like in practice:

Wednesday, December 17, 2014

Fixing Multiple Lines with Vim Macros


Sometimes I have a tedious code change where I have to perform the same task on multiple lines in Vim and I can’t use ‘.’ because each line is a little different or the changes are too complicated. In this situation, I use a recursive macro in Vim.

Here, I started with a list of items that I want to turn into a dispatch table - the code itself isn’t important here, it’s the transformation that I want to illustrate:

my %actions = (
    'start_shopping',
    'add_item',
    'remove_item',
    'use_coupon',
    'checkout',
);

Step  Keys                Meaning
1     qqq                 Start and immediately stop recording a macro into register ‘q’, to empty it
2     qq                  Start recording a macro into register ‘q’
3     0                   Go to the beginning of the line (always start there, if possible, for consistency)
4     f'                  Find the next single quote (always start with a find. That way, in the infinite loop we are creating, when it fails the loop will end)
5     l                   Go right to get to the first “real” character
6     yw                  Yank the word
7     $                   Go to the end of the line
8     P                   Put (before the comma)
9     F'                  Find the previous single quote
10    x                   Delete it
11    i                   Start inserting
12    <SPACE>=><SPACE>\&  (literal characters to insert)
13    <ESC>               Escape, leaving insert mode
14    F'                  Find the previous single quote
15    x                   Delete it
16    0                   Go to the beginning of the line
17    j                   Go down a line
18    @q                  Add the instruction to start running macro ‘q’ again, creating the recursive call
19    q                   Stop recording
20    @q                  Actually run macro ‘q’
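
As a rough sketch of where those steps end up: after the macro has run over every line, the list has become a working dispatch table. The five handler subs and the dispatch helper below are stubs I made up so the example runs; only the %actions keys come from the list above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stub handlers (hypothetical): each just reports that it ran
sub start_shopping { return 'start_shopping ran' }
sub add_item       { return "add_item ran for $_[0]" }
sub remove_item    { return "remove_item ran for $_[0]" }
sub use_coupon     { return "use_coupon ran for $_[0]" }
sub checkout       { return 'checkout ran' }

# The list as the macro leaves it (the => are not aligned yet)
my %actions = (
    start_shopping => \&start_shopping,
    add_item => \&add_item,
    remove_item => \&remove_item,
    use_coupon => \&use_coupon,
    checkout => \&checkout,
);

# Look an action up by name and call it with any remaining arguments
sub dispatch {
    my ( $name, @args ) = @_;
    my $handler = $actions{$name} or die "Unknown action: $name\n";
    return $handler->(@args);
}

print dispatch( 'add_item', 'widget' ), "\n";    # add_item ran for widget
```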

If you make a mistake and the macro starts going haywire (this happens to me once in a while), just press <CTRL>-c, then <ESCAPE>, then u for undo, and the file should go back to the way it was before you ran the macro.

Here’s what it looks like in practice:


Note: I have this line in my .vimrc file:

map z @qz

Since I don’t use z’s default functionality, I have it run @q in an infinite loop instead, which allows me to skip step 18 from the example, and step 20 just becomes “z”.

For this next group of commands, I’m going to use the same concept to align all the =>. The block must begin with the line where => is furthest to the right. If it doesn’t belong there, you can always put it there to align it and then move it back to its place afterward. We’ll start on the second line this time since the first line is already positioned correctly.

Step  Keys             Meaning
1     qqq              Start and immediately stop recording a macro into register ‘q’, to empty it
2     qq               Start recording a macro into register ‘q’
3     0                Go to the beginning of the line
4     f=               Find the next =
5     50i<SPACE><ESC>  Insert 50 spaces to make sure it’s long enough
6     0                Go to the beginning of the line
7     k                Go up
8     f=               Find the next =
9     j                Go down
10    dw               “Delete word” in this case will delete to the next non-space
11    0                Go to the beginning of the line
12    j                Go down a line
13    @q               Add the instruction to start running macro ‘q’ again, creating the recursive call
14    q                Stop recording
15    @q               Actually run macro ‘q’
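
The same alignment can be sketched in Perl. This is an illustration of the idea, not a replacement for the macro: the helper name align_fat_commas is made up, and unlike the macro it looks for the right-most => itself, so the widest line does not have to come first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pad every line so its first "=>" sits in the same column as the
# right-most "=>" in the block. align_fat_commas is a hypothetical name.
sub align_fat_commas {
    my (@lines) = @_;

    # First pass: find the column of the right-most "=>"
    my $target = 0;
    for my $line (@lines) {
        my $col = index $line, '=>';
        $target = $col if $col > $target;
    }

    # Second pass: insert the missing spaces just before each "=>"
    for my $line (@lines) {
        my $col = index $line, '=>';
        next if $col < 0;    # lines without "=>" are left alone
        substr( $line, $col, 0, ' ' x ( $target - $col ) );
    }
    return @lines;
}

my @block = (
    '    start_shopping => \&start_shopping,',
    '    add_item => \&add_item,',
    '    checkout => \&checkout,',
);
print "$_\n" for align_fat_commas(@block);
```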

Here’s what it looks like in practice:


Credit to Hongleong for the qqq emptying idea (there is much more info at that link regarding recursive vim macros)

For routinely aligning text in vim, I found this video to be extremely helpful.

Just in case that disappears, I have copied two of the links included there: the Tabular plugin and a macro which uses it.

Animated GIFs created with LICEcap

Friday, December 05, 2014

Using Files for Inter-process Communication



I was abusing a test suite I had written to add test data to the database and I was running it inside an infinite loop because it was going to take a while to generate the 10,000 records I wanted. When it came time to stop, I had difficulty either control-c-ing or killing the right processes to get it to quit, so the second time I ran it, I did this instead:

while true
do
    if [ -f "$HOME/stop_while" ]; then
        break
    fi
    ./thing_i_wanted_repeated
done

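To stop the loop, I create the file from another terminal (touch $HOME/stop_while), and the loop exits once the current run finishes. Using files for inter-process communication seems so much more straightforward to me.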
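
The same stop-file idea works inside a Perl program. Here is a minimal sketch under a few assumptions: run_until_stopped is a made-up helper, and to keep the demo self-contained the flag file lives in a temporary directory and is created by the task itself on its third run, where in real use another process would create it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Call $task over and over until the flag file shows up.
# run_until_stopped is a hypothetical helper name.
sub run_until_stopped {
    my ( $stop_file, $task ) = @_;
    my $runs = 0;
    until ( -f $stop_file ) {
        $task->();
        $runs++;
    }
    return $runs;
}

# Demo only: a throwaway directory stands in for $HOME
my $dir       = tempdir( CLEANUP => 1 );
my $stop_file = "$dir/stop_while";

my $count = 0;
my $runs  = run_until_stopped(
    $stop_file,
    sub {
        $count++;
        if ( $count == 3 ) {
            # Stand-in for someone running "touch" in another terminal
            open my $fh, '>', $stop_file or die "Cannot create $stop_file: $!";
            close $fh;
        }
    }
);
print "stopped after $runs runs\n";    # stopped after 3 runs
```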

Thursday, December 04, 2014

XML Nested CDATA sections




"Nested CDATA sections are not allowed." But I need that! I have XML that contains XHTML which in turn has a CDATA section of its own.  Surprisingly I found the answer at Wikipedia.

Here's my XML:

<apple>
    <banana>
    <![CDATA[
    <script type="text/javascript">
    /* <![CDATA[ */
    var language = "en";
    /* ]]]]><![CDATA[> */
    </script>
    <script type="text/javascript" src="//www.this.com/that.js">
    </script>
    <noscript>
    <div style="display:inline;">
    <img src="//www.this.com/that/?l=b5&amp;scrum=2"/>
    </div>
    </noscript>]]></banana>
</apple>
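
If the inner content is generated by a program, the split can be automated. A minimal sketch, assuming a helper named wrap_cdata (my name, not from any library): every "]]>" in the text is broken in two so that "]]" closes one CDATA section and ">" opens the next.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap text in a CDATA section. Any "]]>" inside the text is split so the
# "]]" ends one section and the ">" begins a new one.
# wrap_cdata is a hypothetical helper name.
sub wrap_cdata {
    my ($text) = @_;
    $text =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $text . ']]>';
}

my $script = "/* <![CDATA[ */\nvar language = \"en\";\n/* ]]> */";
print wrap_cdata($script), "\n";
```

Run on the three script lines, this produces the same "]]]]><![CDATA[>" sequence that appears in my XML above.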

Although the Wikipedia entry covers it sufficiently, I thought it would be interesting to diagram a bit further (click to enlarge):


Tuesday, July 08, 2014

Class Abuse

This situation arose recently and it got me curious...

What if you needed to access an algorithm in a very simple class method, but the functionality of the class is significant and would be a pain in the ass to set up and create an object in, not to mention resource-intensive, all to use one small function...

You could copy the algorithm, but what if it changes? Then you'll have to change it twice. That's no good. I don't like either option.

Remember that when you call an object's method (or a class method), the arrow syntax $self->method() or Whatever::Class->method() sends the thing before the -> as the first parameter.

So what if I just call that function directly? In some cases, I certainly could, for example: Whatever::Class::method($mydata)

In other cases, where the method looks in $self, I could just send the appropriate type of reference in with the value I needed, for example: Whatever::Class::method({data => $mydata})
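
Here is a small sketch of that second case. The body of Whatever::Class is invented for illustration (its method just upper-cases the data); the point is only that a plain hash reference is enough when the method reads straight from $self.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Whatever::Class;

# Hypothetical class body, for illustration only
sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Reads directly from the hash, so any hash reference will do as $self
sub method {
    my ($self) = @_;
    return uc $self->{data};
}

package main;

my $mydata = 'hello';

# Normal call: the object is passed as the first parameter
print Whatever::Class->new( data => $mydata )->method, "\n";    # HELLO

# Direct call: an unblessed hash reference stands in for the object
print Whatever::Class::method( { data => $mydata } ), "\n";     # HELLO
```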

In my case, though, the method calls an accessor method (get_data() in my example here) to get the value of one of its own properties. This is in line with some OOP philosophies, but it makes using that method as an individual function that much more difficult.

However, I can just call the function with a first parameter which is an object that responds to the call to get_data.

I wrote this naive decoder-ring algorithm to demonstrate.

#!/usr/bin/perl

use 5.14.0;
use autodie;

package Decoder_Ring;

sub new {
    my ( $class, $data ) = @_;
    return bless { data => $data }, $class;
}

sub decode {
    my ($self) = @_;
    return join( '',
        ( split( '', $self->get_data ) )
          [ 2, 7, 17, 24, 31, 37, 45, 54, 55, 65, 72, 77 ] );
}

sub get_data {
    my ($self) = @_;
    return $self->{data};
}

package main;

# Here's the normal object creation and method call to decode
my $normal_object =
  Decoder_Ring->new('?sSY y#ee Uiehor$cBc yL@rekLt AevLQYotUasjCGu!Z!a"');
print "normal result: ", $normal_object->decode, "\n";

# and here's the "fake" one:
# First, bless into unique class name to avoid polluting your namespace
my $pretender_object = bless {}, 'MyTemporaryClass';
my $input_data = 'l2a*lP lu*j!BfeCxssov@osol aupu Zwv+us P'
               . 'VSMvxekv vegA*crsY$L$2^ Sei!uVC$t!N?4.3';

# Next, push a method into the symbol table of the newly created class
*{MyTemporaryClass::get_data} = sub { return $input_data };

# Now our ad-hoc object will respond appropriately to the get_data call!
print "pretender result: ", Decoder_Ring::decode($pretender_object), "\n";

Output:

$ ./sandbox.pl
normal result: Secret!
pretender result: also secret.


But I wouldn't put this in production code, and I wouldn't recommend anyone else do it either; if you must, at least comment it heavily. However, it is an interesting exercise and illustrates Perl's highly flexible OO implementation.

Sunday, June 08, 2014

My First Crontab




Ok, it's not the first crontab I've ever written. This is a crontab template that I just created for myself because, even though I've been using cron since the late 80s, I still keep forgetting which side is minutes. How sad is that?!?

I copied the nice format from Wikipedia's cron page.

The first line (the only line that isn't a comment, and therefore an actual crontab entry) is my initial test to make sure cron is working for the account.

   * * * * *  echo hi >/tmp/a.hi
#  * * * * *  command to execute
#  _ _ _ _ _
#  | | | | |
#  | | | | |
#  | | | | +---day of week (0 - 6)
#  | | | +--------month (1 - 12)
#  | | +-------------day of month (1 - 31)
#  | +------------------hour (0 - 23)
#  +-------------------------min (0 - 59)
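
The same left-to-right field order can be written down as a tiny Perl sketch that labels the parts of an entry. The helper name label_cron_fields and the sample "30 2 * * 1" schedule are made up for illustration, and it only handles the plain five-field form shown in the template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names in the order cron reads them, left to right
my @FIELDS = ( 'minute', 'hour', 'day of month', 'month', 'day of week' );

# Split one crontab entry into labeled fields plus the command.
# label_cron_fields is a hypothetical helper name.
sub label_cron_fields {
    my ($entry) = @_;

    # split ' ' skips leading whitespace; the limit of 6 keeps the
    # command (which may contain spaces) in one piece
    my @parts = split ' ', $entry, 6;
    return if @parts < 6;

    my $command = pop @parts;
    my %labeled;
    @labeled{@FIELDS} = @parts;
    $labeled{command} = $command;
    return \%labeled;
}

my $entry = label_cron_fields('30 2 * * 1  echo hi >/tmp/a.hi');
print "$_: $entry->{$_}\n" for @FIELDS, 'command';
```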

Thursday, May 08, 2014

When a Tradeoff is not a Tradeoff


Years ago, I wrote a post on clever code and advised against it, saying that instead we should strive to make our code maintainable. Since then, I gave a talk at DCBPW called Writing Maintainable Perl. During the talk, I broke a single line of code into multiple lines of code. A member of the audience pointed out that this could have performance ramifications because every time there is a semicolon, Perl does some cleanup and other maintenance tasks. I don’t know much about Perl internals. I have since looked around for more info on this, but Google was not my friend in this case, and perlguts didn’t seem to have what I was looking for in this department either.

In the real world this type of optimization is unlikely to make much of a difference. If you have code that’s dealing with I/O from users, databases, filesystems, etc, an internal language mechanism is just not going to be a factor.

Anyway… thinking back to Damian Conway’s book Perl Best Practices, I remembered a suggestion: “Don’t optimize; benchmark.” Its accompanying text proposed that instead of looking through the code for things you might improve, you should run Benchmark tests against your code and see how it performs.

I decided to put both versions of the code to the test and see how they perform. Here are the results:

10:54 dbradford@dbradford-PC presentation ] > ./both_listifys.pl
Benchmark: timing 5000000 iterations of good_listify, orig_listify...
good_listify: 20 wallclock secs (19.75 usr +  0.19 sys = 19.94 CPU) @ 250789.99/s (n=5000000)
orig_listify: 23 wallclock secs (22.01 usr +  0.62 sys = 22.64 CPU) @ 220887.08/s (n=5000000)

“good_listify” is the cleaned-up version of the routine and “orig_listify” is the “bad” version. As you can see, the difference was negligible: over five million iterations the two were only about three seconds apart, which works out to well under a microsecond per call. However, it was a surprise to me to see that the more maintainable version was faster.

Sometimes it seems like we might trade away speed to get maintainability, but if you go back and check, you might find it doesn’t have to be a tradeoff at all. Taken to a wider view: don’t theorize; demonstrate!

Here is the code that produced the output above:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(:all) ;
use Data::Dumper;

sub orig_listify {
   my ($aref,$cc) = @_;
   if( ref $aref eq 'ARRAY' && $cc > 0 ) {
       my $j;
       for(my $i=0; $i<=$#$aref; $i+=$cc) {
           push @$j, [@$aref[$i..$i+$cc-1]];
       }

       # BAD!
       $#{$j->[$#{$j}]}=$#$aref%$cc;

       @$aref = @$j;
       return 1;
   }
   return;
}

sub good_listify {
   my ( $in_aref, $elements_per_array ) = @_;
   return if (
       ref $in_aref ne 'ARRAY' or
       $elements_per_array <= 0
   );
   my @result_array;
   for( my $i = 0; $i <= $#$in_aref; $i += $elements_per_array ) {
       push @result_array, [
           @$in_aref[ $i..$i + $elements_per_array - 1 ]
       ];
   }
   my $final_aref        = $result_array[ -1 ];
   my $elements_in_final = $#$in_aref % $elements_per_array;
   # Truncate final array
   $#$final_aref = $elements_in_final;
   @$in_aref = @result_array;
}

my @my_array = ( 'one', 'two', 'three', 'four', 'five', 'six', 'seven',
'eight', 'nine', 'ten' );

my $count=5_000_000;

timethese($count, {
    'orig_listify' => sub { orig_listify(\@my_array, 4) },
    'good_listify' => sub { good_listify(\@my_array, 4) },
});
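
One thing worth keeping in mind when adapting this: both routines overwrite the array they are handed, so when the same \@my_array is passed on every iteration, only the very first call sees the flat ten-element list. Below is a minimal sketch of one way around that, assuming you want every iteration to start from the same input: copy the array inside the timed sub. The routine listify_in_place is a compact stand-in I wrote for this example, not one of the two routines above, and the iteration count is kept small so it finishes quickly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Compact stand-in (hypothetical) for an in-place listify routine:
# replaces the array's contents with chunks of $per_chunk elements
sub listify_in_place {
    my ( $in_aref, $per_chunk ) = @_;
    return if ref $in_aref ne 'ARRAY' or $per_chunk <= 0;
    return 1 if !@$in_aref;    # nothing to chunk

    my @chunks;
    for ( my $i = 0 ; $i <= $#$in_aref ; $i += $per_chunk ) {
        push @chunks, [ @$in_aref[ $i .. $i + $per_chunk - 1 ] ];
    }

    # Truncate the final chunk so it holds no padding undefs
    $#{ $chunks[-1] } = $#$in_aref % $per_chunk;

    @$in_aref = @chunks;
    return 1;
}

my @my_array = qw(one two three four five six seven eight nine ten);

timethese(
    100_000,
    {
        fresh_copy => sub {
            my @copy = @my_array;    # every iteration sees the same flat input
            listify_in_place( \@copy, 4 );
        },
    }
);
```

The copy costs the same for every routine being compared, so it should not favor one over the other.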