Re: Contextual find and replace large config file

It works fine (as long as the format does not change too much), however the more complex things that I want to do these kind of snippets tend to become very complex and difficult to maintain. ... I am looking for a very simple approach (search and replace, not reading the entire data file to memory)

It depends a lot on how much you can trust how strict the configuration file format is. For example, if you can be absolutely certain that, like in your example, the opening and closing braces are always on a line by themselves, then it'd be possible to implement a fairly simple line-by-line parser that keeps the names of the current sections on a stack, so that you can differentiate between different nested sections that happen to have the same name - I'm thinking something like the following:

use warnings;
use strict;

# $sep needs to be some char guaranteed not to be in the section name
# (not the most elegant solution, but this isn't meant to be)
my $sep = "\n";
my $target_sec = "ObjectType1${sep}NestedObject";
my $target_param = "Param1";
my $replace_val = "Hello";

my ($secname,@sec_stack,$cur_sec);
while (<DATA>) {
    next if /^\s*#/ || !/\S/;
    if ( /^ (\s*) (.+?) \s* = \s* (.+?) \s* $/x ) {
        die "section name without a following block"
            if defined $secname;
        my ($indent,$param,$val) = ($1,$2,$3);
        if ($cur_sec eq $target_sec && $param eq $target_param)
            { $_ = $indent.$param.' = '.$replace_val."\n" }
    }
    elsif ( /^ \s* (\w+) \s* $/x ) {
        die "two section names following one another"
            if defined $secname;
        $secname = $1;
    }
    elsif ( /^ \s* \{ \s* $/x ) {
        die "'{' without name" unless defined $secname;
        push @sec_stack, $secname;
        $cur_sec = join $sep, @sec_stack;
        $secname = undef;
    }
    elsif ( /^ \s* \} \s* $/x ) {
        die "section name without a following block"
            if defined $secname;
        die "'}' without '{'" unless @sec_stack;
        pop @sec_stack;
        $cur_sec = join $sep, @sec_stack;
    }
    else { die "Failed to parse '$_'" }
    print $_;
}

__DATA__
# comment
ObjectType1
{
    Param1 = Foo
    NestedObject
    {
        Param1 = Bar
    }
    # just another comment
}

ObjectType2
{
    Param1 = Quz
    NestedObject
    {
        Param1 = Baz
    }
}
[download]

But once things start getting more complex, I'd recommend a "real" parser instead. You can check the Config:: namespace to see if there happen to be any modules that match your config format. 500k lines isn't all too much to read into memory at once, IMO, unless you're running on some really memory-restricted machine. In the worst case, you can write a parser yourself, e.g. using the m/\G.../gc technique (there's one example in the Perl docs in perlop under "\G assertion"), or using a full grammar (Parse::RecDescent, Regexp::Grammars, Marpa::R2, ...).

Here's a solution using m/\G.../gc, followed by a Regexp::Grammars example (the latter only parses, it doesn't do the replacement). In both, I've made some assumptions about the file format, such as that a Name = Value pair must appear on a single line by itself, that the section names may or may not contain whitespace, and so on (I've chosen slightly different rules in both). What I like about these kind of solutions is that they're "just" regular expressions, and as long as one can deal with those, it should hopefully be understandable.

use warnings;
use strict;
use Data::Compare qw/Compare/;

my @target_block = ('Object Type1','NestedObject');
my $target_param = 'Param 1';
my $new_val = 'Hello!';
my $data = do { local $/; <DATA> };

my @stack;
pos($data)=0;
while ( pos($data)<length($data) ) {
    use re '/msx';
    my $repl;
    if ( $data=~m{\G ^ \h* \# [^\n]* (?:\z|\n) }gc )
        {} # comment, nothing to do
    elsif ( $data=~m{\G \s* ( \w(?:[\w\h]*\w)? ) \s* \{ \h*\n* }gc )
        { push @stack, $1 }
    elsif ( $data=~m{\G (?<pre> ^\h* ) (?<name> [^\n=]+?)
            (?<mid> \h*=\h* ) (?<value> [^\n]+? )
            (?<post> \h*(?:\z|\n) ) }gc ) {
        if ( Compare(\@stack,\@target_block)
                && $+{name} eq $target_param ) {
            $repl = $+{pre}.$+{name}.$+{mid}.$new_val.$+{post};
        }
    }
    elsif ( $data=~m{\G \s* \} \h*\n* }gc ) {
        die "'}' with no opening '{'?" unless @stack;
        pop @stack;
    }
    else { die "Failed to parse at: \""
        .substr($data, pos $data, 50)."...\"" }
    print $repl//substr($data, $-[0], $+[0]-$-[0]);
}

__DATA__
# comment
Object Type1 {
    Param1 = Foo
    NestedObject {
        Param 1 = Bar   
    }
    # just another comment
}

# comment
ObjectType2 {
    Param1 = Quz = z
    NestedObject {
        Param1 = Baz
} }
[download]

use warnings;
use strict;
use Regexp::Grammars;

my @blockstack;
my $grammar = do { use Regexp::Grammars; qr{
    \A (?: <.comment> | <[confblock]> )* \z
    <rule: confblock>
        ^ <blockname=([^\s\{\}=]+)> \{
        (?: <[param]> | <[confblock]> | <.comment> )*
        \} <.ws>
    <rule: param>
        ^ <name=([^\n=]+?)> = <value=([^\n]+?)> (?:\n|\z)
    <token: comment>
        ^ \h* \# [^\n]* (?:\n|\z)
}xms };

my $data = do { local $/; <DATA> };
$data =~ $grammar or die "failed to parse";
my %conf = %/; #/

use Data::Dump;
dd \%conf;

__DATA__
# comment
ObjectType1 {
    Param1 = Foo
    NestedObject {
        Param1 = Bar   
    }
    # just another comment
}

# comment
ObjectType2 {
    Param1 = Quz
    NestedObject {
        Param1 = Baz
} }
[download]

Comment on Re: Contextual find and replace large config file Select or Download Code

Replies are listed 'Best First'.
Re^2: Contextual find and replace large config file by Veltro (Hermit) on Jan 03, 2019 at 10:57 UTC
This is great stuff haukex I think that using Regexp::Grammars is probably the best solution, however I am getting this YACC feeling over me and think this kind of thing is programming on an entire different level. So currently I am looking at your second approach which I think will offer me the flexibility that I am looking for. Actually I think this will help me to take this even one step further and build a more advanced configuration which will allow me to specify a filter and formulas to act on parameters. And for this I am thinking in the same lines as LanX (using a cache, separate functionality in functions etc. etc.). I understand about 95% of the code, but I am still struggling with some of the regex items which are: Why `(?:\z\|\n)` and not just `\z` when `\z` is 'up to and including `\n`' Why `\h\n` and not `\s*` Thanks for your elaborate post	[reply] [d/l] [select]
Re^3: Contextual find and replace large config file by haukex (Archbishop) on Jan 03, 2019 at 12:40 UTC
Why `(?:\z\|\n)` and not just `\z` when `\z` is 'up to and including `\n`' Not quite, `\z` only ever matches at the very end of the string, whereas `\Z` also matches before the newline at the end of the string, and the meaning of `$` is changed by the `/m` modifier to match before every newline or at the end of the string. When I want to express "match up to the end of this line", I sometimes prefer `(?:\z\|\n)` over `$`+`/m` because the former explicitly consumes the `\n`. Why `\h\n` and not `\s`* Because `/\s/` would also match e.g. `\t\n\t`, which causes a following `/^.../` to no longer match, since `/\s/` consumed the `\t` at the beginning of the line. Update: Regarding the first point: `$ perl -MData::Dump -e 'dd split /($)/m, "x\ny\nz"' ("x", "", "\ny", "", "\nz") $ perl -MData::Dump -e 'dd split /(\z\|\n)/m, "x\ny\nz"' ("x", "\n", "y", "\n", "z")` [download]	[reply] [d/l] [select]


Your skill will accomplish what the force of many cannot
	PerlMonks