http://qs321.pair.com?node_id=11116846


in reply to Need your help in a pattern based text manipulation algorithm

Here's a more developed version of the code here. The subroutine  process_params() is no longer a stub. This code is only very minimally tested. With a params.dat input file that is

NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium
(note blank line at end as a test) and a a_copy.txt file that is initially
the rain CAPS|VALUE = in spain falls PKG_TYPE|VALUE = mainly
the converted a_copy.txt file
the rain CAPS|VALUE = 0 in spain falls PKG_TYPE|VALUE = premium mainly
is produced. A file c_paste.txt also exists and seems to be handled correctly, but no z_cut.txt file exists to test throwing an exception in this case (which happens).

Latest full script code is

use 5.010; # needs regex \K operator use strict; use warnings; use autodie; use Data::Dump qw(dd); use constant USAGE => <<"EOUSAGE"; usage: perl $0 param_file_name where: param_file_name parameter file name EOUSAGE use constant EXT => '.txt'; my @tags = qw(NAME TASK CAPS PKG_TYPE STACK SHIP); my $rx_value_intro = qr{ \s* [|] \s* VALUE \s+ = }xms; my $rx_value = qr{ [[:alnum:]]+ }xms; my ($rx_tag) = map qr{ \b (?: $_) \b }xms, join ' | ', map quotemeta, reverse sort @tags ; # print "rx_tag $rx_tag \n"; # for debug die USAGE unless @ARGV == 1; my $param_file_name = shift; die "'$param_file_name' does not exist \n", USAGE unless -e $param_file_name; open my $fh_param, '<', $param_file_name; local $/ = ''; # "paragrep" mode while (my $record = <$fh_param>) { my $got_params = my %params = $record =~ m{ \G \s* ($rx_tag) $rx_value_intro \s* ($rx_value) \s+ }xmsg; die "bad params record '$record'" unless $got_params; # dd \%params; # for debug process_params(%params); } close $fh_param; exit; # subroutines ###################################################### sub process_params { my (%params, ) = @_; exists $params{$_} or die "no '$_' parameter" for qw(NAME TASK); my $input_file_name = "$params{'NAME'}_$params{'TASK'}${ \EXT }"; die "input file '$input_file_name' does not exist" unless -e $input_file_name; delete @params{ qw(NAME TASK) }; my ($rx_search) = map qr{ \b (?: $_) \b }xms, join ' | ', map quotemeta, reverse sort keys %params ; # print "rx_search $rx_search \n\n"; # for debug open my $fh_in_text, '<', $input_file_name; my $content = do { local $/; <$fh_in_text>; }; close $fh_in_text; $content =~ s{ ^ ($rx_search) $rx_value_intro \K } { $params{$1}}xmsg; open my $fh_out_text, '>', $input_file_name; print $fh_out_text $content; close $fh_out_text; }
Note that Perl version 5.10+ is required because a regex extended pattern is used. I'm sorry that this post is a bit rushed; please feel free to ask any questions.

One thing that would be helpful if you have any further questions is the provision of an example input/parameter file and at least one corresponding short before/after pair of files to be processed.


Give a man a fish:  <%-{-{-{-<