Tuesday, January 08, 2019

debugging Perl regular expressions

Okay, I use Perl whenever I can, because it gets the job done. I like how it doesn't distract my train of thought while I tackle a problem, and how it produces scripts that are easy to hack on later. I love how it integrates regular expressions. Until now, though, I wasn't fond of having to test input patterns one after another against stripped-down versions of the regular expression to find out why it was failing.

And today I discovered use re 'debug'.

With that single line, I get an overview of what the regular expression engine tries to do while matching a string (coloring added manually).
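Here is a minimal sketch of switching it on. The date pattern and the input string are examples I made up, and the input is chosen so that the match fails on the trailing "x". The trace is written to STDERR, so pipe it with 2>&1 if you want to page through it. The same switch also works from the command line as perl -Mre=debug script.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use re 'debug';   # print compile and match traces for the regexes below (on STDERR)

# Made-up input: the last field is "0x" rather than two digits, so the match fails
my $subject = "2019-01-0x";

if ($subject =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    print "matched\n";
} else {
    print "no match\n";   # the STDERR trace shows each step the engine tried
}
```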

In yellow, you can track the position within the string, which shows how much of it has been accepted so far. In green is a quick view of the text just behind that position and just ahead of it. The blue code seems to identify the next operation the regexp will perform (its program counter, sort of). The white part details what is being tried or done.
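If the dump of the compiled program gets in the way, the re pragma also accepts finer-grained options. From its documentation, my understanding is that qw(Debug EXECUTE) keeps only the run-time part, that is, the trace described above. The log line used as input here is a made-up example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
# Execution-time debug output only; the compile-time program dump is skipped
use re qw(Debug EXECUTE);

my $log = "level=warn msg=disk";        # made-up input
my ($level) = $log =~ /level=(\w+)/;    # \w+ stops at the space
print "level: $level\n";
```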

edit: let's not paint the colors manually anymore! This filter colors each line of the trace (held in $_) automatically:


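 redebug=> sub {
           # dim the compiled-program dump that follows "Compiling REx", up to its END (0) node
           if (s/(END [(]0[)])$/$ign$1$norm/) {
             $compiling = 0;
           } elsif ($compiling) {
             s/(.*)/$ign$1$norm/;
           }
           if (s/^(Compiling .*)/$low$1$ign ___/) {
             $compiling = 1;
           }
           s/^(Freeing .*)/${ign}$1$norm/;
           # highlight the pattern ($2) and the subject string ($3)
           s/^(Matching REx) ("[^"]+"[.]*) against ("[^"]+"[.]*)/$1 $high$2$norm against $low$3$norm/;
           s/(Found anchored)/$blink$1$norm/;
           # trace line: offset, <before> <after> context (may be empty, as in <> at offset 0), node number
           s/^ ([0-9 ]+) (<[^>]*> <[^>]*>) * ([|0-9 ]+:)/ $blink$1 $high$2 $low$3$norm/;
           s/(Match successful.)/$blink$1$norm/;
         },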
        
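The substitutions above expect $blink, $ign, $high, $low and $norm to hold ANSI escape codes, and $compiling to start at 0. Below is a minimal standalone driver sketch. The color choices, the %filters hash and the file name redebug-color.pl are my own assumptions. To keep it short, only two of the substitutions are repeated, so paste the full redebug body in their place.

```perl
#!/usr/bin/perl
# redebug-color.pl (hypothetical name): colorize a `use re 'debug'` trace read on STDIN
use strict;
use warnings;

# ANSI escape codes; any SGR codes will do
my $norm  = "\e[0m";   # back to regular display
my $ign   = "\e[2m";   # faint
my $low   = "\e[36m";  # cyan
my $high  = "\e[33m";  # yellow
my $blink = "\e[5m";   # blinking
my $compiling = 0;     # state used by the full filter body

# Each filter edits the current line in $_ in place
my %filters = (
    redebug => sub {
        s/^(Matching REx) ("[^"]+"[.]*) against ("[^"]+"[.]*)/$1 $high$2$norm against $low$3$norm/;
        s/(Match successful.)/$blink$1$norm/;
    },
);

# Only read when input is piped in, so the script never waits on a terminal
unless (-t STDIN) {
    while (<STDIN>) {
        $filters{redebug}->();   # rewrites $_
        print;                   # prints the colored $_
    }
}
```

Usage: perl -Mre=debug yourscript.pl 2>&1 | perl redebug-color.pl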

1 comment:

PypeBros said...

$blink, $ign, $high and the like are ANSI escape codes that apply the colors you want. Just make sure $norm switches back to regular display.