
Parsing by indentation

by llarochelle (Beadle)
on Oct 24, 2018 at 18:18 UTC ( [id://1224600]=perlquestion )

llarochelle has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to create a config file parser to parse Cisco IOS configs and such. The final objective would be to show relevant data in contexts based on filters in a configuration file. For example, with such a config file it would display all interfaces where we've found the line "access vlan" as a child of the "interface" context and only show lines containing "speed", "duplex" and "description".

{
    'Context'   => '^interface',
    'Types'     => [ 'Switch', ],
    'Condition' => 'access vlan',
    'Filter'    => [ 'speed', 'duplex', 'description' ]
};

So far, so good. I read the "running-config" and index the depth of each line in an array (a non-empty line that does not begin with whitespace (\s) has a depth of 0). Then, in a second read, I use that index to go through the data again, this time using relative position based on depth to create the children of a context. Here's the function:

sub getDeep {
    my @data = (@_);
    my ($bighash, $hash);

    # First read: record the indentation depth of every line
    foreach my $idx (0 .. $#data) {
        my ($spaces, $content) = ($data[$idx] =~ m/^(\s*)(.*)/);
        my $depth = length $spaces;
        $bighash->{node}{$idx}{depth} = $depth;
    }

    # Variables for the second read
    my $ldepth = 0;
    my $lcontext;
    my $lid;

    # Second read: attach indented lines to the last depth-0 line seen
    foreach my $id (0 .. $#data) {
        $data[$id] =~ s/^\s*//;
        next if ($data[$id] =~ /^!/);    # skip "!" comment lines
        my $depth = $bighash->{node}{$id}{depth};
        if ($depth == 0) {
            push (@{$hash->{global}}, $data[$id]);
            $lcontext = $data[$id];
            $lid = $id;
        }
        if (($depth > 0) && ($id - $lid == 1)) {
            push (@{$hash->{$lcontext}}, (" " x $depth . $data[$id]));
            $lid = $id;
        }
    }
    return $hash;
}
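
To make the intended filtering concrete, here is a minimal sketch of how a rule like the one above might be applied to the hash that getDeep returns (context line => list of child lines). The helper name apply_rule and the sample data are assumptions made for illustration; they are not part of the original code, and the 'Types' key is ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): apply one rule to a
# hash shaped like getDeep()'s result, i.e. context line => [ child lines ].
sub apply_rule {
    my ($rule, $hash) = @_;
    my $filter = join '|', @{ $rule->{Filter} };
    my %report;
    for my $context (sort keys %$hash) {
        next if $context eq 'global';                  # list of top-level lines, not a context
        next unless $context =~ /$rule->{Context}/;    # e.g. '^interface'
        my @children = @{ $hash->{$context} };
        # Keep the context only if at least one child matches the condition
        next unless grep { /$rule->{Condition}/ } @children;
        # Show only the children that match one of the filter patterns
        $report{$context} = [ grep { /$filter/ } @children ];
    }
    return \%report;
}

my $rule = {
    Context   => '^interface',
    Types     => ['Switch'],
    Condition => 'access vlan',
    Filter    => [ 'speed', 'duplex', 'description' ],
};

# Made-up sample data in the shape getDeep() produces
my $hash = {
    global            => [ 'hostname sw1', 'interface Gi0/1', 'interface Gi0/2' ],
    'interface Gi0/1' => [ ' description uplink', ' switchport access vlan 10', ' speed 100' ],
    'interface Gi0/2' => [ ' description trunk', ' switchport mode trunk' ],
};

my $report = apply_rule($rule, $hash);
for my $context (sort keys %$report) {
    print "$context\n";
    print "$_\n" for @{ $report->{$context} };
}
```

With this sample data only "interface Gi0/1" is reported, along with its "description" and "speed" lines, because "interface Gi0/2" has no "access vlan" child.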

Using this sub, I can return a hash and then, based on the presence of an arrayref for a given key, apply filters as explained. This works pretty well, and so far I'm very proud of this piece of code. The problem comes when I want to find children of children. In the example below, the children of "given param2" represent my next challenge.

interface XYZ
  given param1 -> child of "interface XYZ"
  given param2 -> child of "interface XYZ"
    given param2.1 -> child of "given param2"
    given param2.2 -> child of "given param2"
  given param3 -> child of "interface XYZ"

So after thinking about this for a while and failing with different approaches, my question comes in 2 separate parts :

1) Is there a better way to do this that I'm not seeing ?

2) How could I keep tagging children of children as the lines dig deeper and identify them properly in a data structure ?

Thank you for reading up to this line :)

Replies are listed 'Best First'.
Re: Parsing by indentation
by tybalt89 (Monsignor) on Oct 24, 2018 at 19:30 UTC

    Something like this ?

    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1224600

    use strict;
    use warnings;
    use Data::Dump 'dd';

    my $data = <<'END';
    interface XYZ
      given param1 -> child of "interface XYZ"
      given param2 -> child of "interface XYZ"
        given param2.1 -> child of "given param2"
          given param2.1.1 -> child of "given param2.1"
          given param2.1.2 -> child of "given param2.1"
        given param2.2 -> child of "given param2"
      given param3 -> child of "interface XYZ"
      given param4 -> child of "interface XYZ"
    interface SECOND
      given param5 -> child of "interface SECOND"
    END

    my $struct = buildstruct($data);
    dd $struct;

    sub buildstruct
      {
      my $block = shift;
      my @answers;
      while( $block =~ /^( *)(.*)\n((?:\1 +.*\n)*)/gm )
        {
        my ($head, $rest) = ($2, $3);
        $head =~ s/ ->.*//;
        push @answers, $rest ? { $head => buildstruct($rest) } : $head;
        }
      \@answers;
      }

    Outputs:

    [
      {
        "interface XYZ" => [
          "given param1",
          {
            "given param2" => [
              { "given param2.1" => ["given param2.1.1", "given param2.1.2"] },
              "given param2.2",
            ],
          },
          "given param3",
          "given param4",
        ],
      },
      { "interface SECOND" => ["given param5"] },
    ]

    Or do I completely misunderstand what you're asking for ?

      This is exactly what I was looking for. In just one pass ... This is impressive. I tried it out in my code and it works flawlessly against a whole router config. I think I'd like to add some logic to capture first-level parents without children and push them into a separate array or into an arrayref inside the structure, but all I can say is WOW. Could you please explain the regex in the while loop? Otherwise, I get that you're using the ternary operator to dig deeper ... this solves my challenge in a single line. Respect!

        Same code with expanded regex with comments

        #!/usr/bin/perl

        # https://perlmonks.org/?node_id=1224600

        use strict;
        use warnings;
        use Data::Dump 'dd';

        my $data = <<'END';
        interface XYZ
          given param1 -> child of "interface XYZ"
          given param2 -> child of "interface XYZ"
            given param2.1 -> child of "given param2"
              given param2.1.1 -> child of "given param2.1"
              given param2.1.2 -> child of "given param2.1"
            given param2.2 -> child of "given param2"
          given param3 -> child of "interface XYZ"
          given param4 -> child of "interface XYZ"
        interface SECOND
          given param5 -> child of "interface SECOND"
        END

        my $struct = buildstruct($data);
        dd $struct;

        sub buildstruct
          {
          my $block = shift;
          my @answers;
          while( $block =~ /^   # make sure to start at beginning of a line (with m)
            (\ *)               # match leading spaces of header line
            (.*)                # match rest of line, save as head
            \n                  # and match the newline
            (
              (?:               # match all following lines with
                \1              # same whitespace as head
                \ +             # plus at least one more space ( i.e. indented )
                .*\n            # contents to be looked at later
              )*                # as many as possible
            )                   # save as rest
            /gmx )              # global, multiline, and extra whitespace
            {
            my ($head, $rest) = ($2, $3);
            $head =~ s/ ->.*//;
            push @answers, $rest ? { $head => buildstruct($rest) } : $head;
            }
          \@answers;
          }

        I hope this helps :)

        The regex matches each line, gets the indentation space string, then also matches all following lines that are indented that much plus at least one more space.
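
        As a small illustration of that description, the sketch below runs the same regex over a made-up five-line block and records what ends up in "head" and "rest" on each iteration. The sample lines and the @pairs array are assumptions added for the demonstration only.

        ```perl
        use strict;
        use warnings;

        # Made-up sample block; every line must end in a newline for the regex to see it.
        my $block = join( "\n",
            'interface A',
            '  speed 100',
            '    note x',
            '  duplex full',
            'interface B',
        ) . "\n";

        my @pairs;
        while ( $block =~ /^( *)(.*)\n((?:\1 +.*\n)*)/gm ) {
            my ($head, $rest) = ($2, $3);
            my $count = () = $rest =~ /\n/g;    # number of lines captured as "rest"
            push @pairs, [ $head, $rest ];
            print "head '$head' owns $count indented line(s)\n";
        }
        ```

        Here the first iteration takes "interface A" as head and all three indented lines as rest (including the deeper "note x" line, which the recursive call sorts out later), and the second iteration takes "interface B" with an empty rest. One consequence of the trailing \n in the pattern is that a final line without a newline would not be matched.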

        > In just one pass ...

        not really, it's recursive.

        This solution is one-pass.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

Re: Parsing by indentation
by atcroft (Abbot) on Oct 24, 2018 at 20:18 UTC

    Some time back I had to do something similar. While I am not able to pull example code at the moment, what I did was basically the following:

      1. Read each line from the config into an array.
      2. Keep a count of the number of leading spaces on each line in a hash.
      3. Compute the greatest common factor (GCF) of the non-zero leading space counts.

    I then processed the array of entries using the ( number of leading spaces / GCF ) value to determine indentation level.
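
    Since the example code is not available, the following is only a rough sketch of the steps described above, not the original implementation. The names gcf and indent_levels and the sample lines are assumptions; the sketch also assumes indentation uses spaces only.

    ```perl
    use strict;
    use warnings;

    # Greatest common factor of two non-negative integers (Euclid's algorithm).
    sub gcf {
        my ($x, $y) = @_;
        ($x, $y) = ($y, $x % $y) while $y;
        return $x;
    }

    # Hypothetical helper: return one [ level, text ] pair per input line.
    sub indent_levels {
        my @lines = @_;
        my @widths;
        for my $line (@lines) {
            my ($spaces) = $line =~ /^( *)/;
            push @widths, length $spaces;
        }
        # GCF of all non-zero leading-space counts = width of one indent step
        my $unit = 0;
        for my $width (@widths) {
            $unit = gcf($unit, $width) if $width > 0;
        }
        $unit ||= 1;    # no indented line at all: avoid dividing by zero
        my @result;
        for my $i (0 .. $#lines) {
            (my $text = $lines[$i]) =~ s/^ +//;
            push @result, [ $widths[$i] / $unit, $text ];
        }
        return @result;
    }

    # Made-up sample with a three-space indent step
    my @config = (
        'interface XYZ',
        '   given param1',
        '   given param2',
        '      given param2.1',
        '   given param3',
    );
    printf "%d  %s\n", @$_ for indent_levels(@config);
    ```

    The benefit of dividing by the GCF is that the indent width does not have to be known in advance, as long as every line is indented by a multiple of the same step.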

    Hope that helps.

Re: Parsing by indentation
by Corion (Patriarch) on Oct 24, 2018 at 18:25 UTC

    Crossposted from stackoverflow. Crossposting is ok, but it is considered polite to inform about it so that efforts are not duplicated.

      Thanks for the info Corion. You're absolutely right. The discussion in StackOverflow got very technical so I decided to post here to seek wisdom :)

        So, you're saying that you came to the Monastery to gain wisdom from people that aren't as technical ;)

        I'm very much just kidding. Your question was well laid out and understandable, ++.

Re: Parsing by indentation
by LanX (Saint) on Oct 25, 2018 at 04:03 UTC
    Maybe of help: Re: Parsing a Tree to a Table.

    It's a similar problem, in your case @path should hold references of the current parent containers.

    The parsed indentation level gives you the index for @path.

    Either add the current element to the correct parent container or expand path.

    HTH

    Cheers Rolf

      I think the following code is self explaining
      use strict;
      use warnings;
      use Data::Dump qw/pp dd/;

      my @parse;
      my @path;
      my $last_level=0;
      $path[$last_level] = \@parse;

      while (my $line = <DATA> ) {
          # pp
          my($white,$key,undef,$content) =
            $line =~ /^
                      (\s*)            # indent
                      (.*?)            # key
                      (  \s*->\s*      # ignore arrow
                         (.*)
                      )?               # optional group
                      $/x;

          my $level = length($white) / 2;

          # pp [$white, $key,$level,$last_level];

          die "indent-level $level too big (last level was $last_level)!"
            if $level > $last_level+1;

          my @children;

          push @{$path[$level]},
      #     {
      #      $key => {
      #               children => \@children,
      #               # level => $level,
      #               # content => $content,
      #              }
      #     };
            {
             $key => \@children    # terse output
            };

          $path[$level+1] = \@children;
          $last_level = $level;
      }

      warn "Output: " , pp \@parse;

      __DATA__
      interface XYZ
        given param1 -> child of "interface XYZ"
        given param2 -> child of "interface XYZ"
          given param2.1 -> child of "given param2"
          given param2.2 -> child of "given param2"
            given param2.2.1 -> child of "given param2.2"
        given param3 -> child of "interface XYZ"

      I extended your input to cover the case of a bigger indent gap.

      Output: [
        {
          "interface XYZ" => [
            { "given param1" => [] },
            {
              "given param2" => [
                { "given param2.1" => [] },
                { "given param2.2" => [{ "given param2.2.1" => [] }] },
              ],
            },
            { "given param3" => [] },
          ],
        },
      ] at d:/tmp/parse_indent.pl line 55, <DATA> line 7.

      You can uncomment various code sections to play with debug output and different data-structure patterns. YMMV.

      Cheers Rolf

      update

      Some people prefer to avoid empty "children" arrays for leaf nodes.

      In this case, avoid an empty default array and let $path[$level] point to an upper container where you check for the existence of an entry for children.

      Extending the code should be straightforward.
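
      One possible way to get that shape is sketched below. It is a variant rather than the exact extension hinted at above: each line is first stored as a plain string, and it is only replaced by a { key => [...] } hash when its first child shows up. The sub name parse_indented, the $unit parameter (spaces per indent level) and the sample lines are assumptions; the "->" annotations are left out of the sample to keep the regex short.

      ```perl
      use strict;
      use warnings;

      # Hypothetical sketch: leaves stay plain strings, parents become { key => [children] }.
      sub parse_indented {
          my ($lines, $unit) = @_;
          my @tree;
          my @path = (\@tree);    # $path[$level] is the container for that level
          my $last_level = 0;
          for my $line (@$lines) {
              my ($white, $key) = $line =~ /^( *)(.*?)\s*$/;
              next unless length $key;    # skip blank lines
              die "indent of '$key' is not a multiple of $unit"
                  if length($white) % $unit;
              my $level = length($white) / $unit;
              die "indent-level $level too big (last level was $last_level)!"
                  if $level > $last_level + 1;
              if ($level > $last_level) {
                  # First child seen: turn the parent leaf (a string) into a hash
                  my $parent = $path[$last_level];
                  die "first line must not be indented" unless @$parent;
                  my @children;
                  $parent->[-1] = { $parent->[-1] => \@children };
                  $path[$level] = \@children;
              }
              push @{ $path[$level] }, $key;
              $last_level = $level;
          }
          return \@tree;
      }

      # Made-up sample with a two-space indent step
      my $tree = parse_indented(
          [
              'interface XYZ',
              '  given param1',
              '  given param2',
              '    given param2.1',
              '  given param3',
          ],
          2,
      );
      ```

      For this sample, "given param1", "given param2.1" and "given param3" stay plain strings, while "interface XYZ" and "given param2" become single-key hashes holding their children.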

Re: Parsing by indentation
by karlgoethebier (Abbot) on Oct 27, 2018 at 14:15 UTC

    What about Cisco::Reconfig? Sure, I'm aware that I guessed once more. Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

      Hi Karl, I've played a little with Cisco::Reconfig and I'm still not sure why I didn't enjoy it as much as it promised ... In fact, this is exactly when I decided I should start working on a general parser to get context by indentation (node - child relation). I also needed something that would work with all the vendors, so I thought it might be easier to write vendor-specific rules only when necessary. Accessing the data structure was unfriendly and I would have had to build a config file around it to address my needs ... Well, thanks for the suggestion anyway, it's always good to evaluate many options. Luc

Node Type: perlquestion [id://1224600]
Approved by Corion
Front-paged by haukex