Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: Need your help in a pattern based text manipulation algorithm

by AnomalousMonk (Bishop)
on May 17, 2020 at 02:04 UTC ( #11116846=note: print w/replies, xml ) Need Help??


in reply to Need your help in a pattern based text manipulation algorithm

Here's a more developed version of the code here. The subroutine  process_params() is no longer a stub. This code is only very minimally tested. With a params.dat input file that is

NAME|VALUE = a TASK|VALUE = copy CAPS|VALUE = 0 PKG_TYPE|VALUE = premium NAME|VALUE = c TASK|VALUE = paste STACK|VALUE = 2 SHIP|VALUE = lowtier NAME|VALUE = z TASK|VALUE = cut CAPS|VALUE = 0 PKG_TYPE|VALUE = premium
(note blank line at end as a test) and a a_copy.txt file that is initially
the rain CAPS|VALUE = in spain falls PKG_TYPE|VALUE = mainly
the converted a_copy.txt file
the rain CAPS|VALUE = 0 in spain falls PKG_TYPE|VALUE = premium mainly
is produced. A file c_paste.txt also exists and seems to be handled correctly, but no z_cut.txt file exists to test throwing an exception in this case (which happens).

Latest full script code is

use 5.010; # needs regex \K operator use strict; use warnings; use autodie; use Data::Dump qw(dd); use constant USAGE => <<"EOUSAGE"; usage: perl $0 param_file_name where: param_file_name parameter file name EOUSAGE use constant EXT => '.txt'; my @tags = qw(NAME TASK CAPS PKG_TYPE STACK SHIP); my $rx_value_intro = qr{ \s* [|] \s* VALUE \s+ = }xms; my $rx_value = qr{ [[:alnum:]]+ }xms; my ($rx_tag) = map qr{ \b (?: $_) \b }xms, join ' | ', map quotemeta, reverse sort @tags ; # print "rx_tag $rx_tag \n"; # for debug die USAGE unless @ARGV == 1; my $param_file_name = shift; die "'$param_file_name' does not exist \n", USAGE unless -e $param_file_name; open my $fh_param, '<', $param_file_name; local $/ = ''; # "paragrep" mode while (my $record = <$fh_param>) { my $got_params = my %params = $record =~ m{ \G \s* ($rx_tag) $rx_value_intro \s* ($rx_value) \s+ }xmsg; die "bad params record '$record'" unless $got_params; # dd \%params; # for debug process_params(%params); } close $fh_param; exit; # subroutines ###################################################### sub process_params { my (%params, ) = @_; exists $params{$_} or die "no '$_' parameter" for qw(NAME TASK); my $input_file_name = "$params{'NAME'}_$params{'TASK'}${ \EXT }"; die "input file '$input_file_name' does not exist" unless -e $input_file_name; delete @params{ qw(NAME TASK) }; my ($rx_search) = map qr{ \b (?: $_) \b }xms, join ' | ', map quotemeta, reverse sort keys %params ; # print "rx_search $rx_search \n\n"; # for debug open my $fh_in_text, '<', $input_file_name; my $content = do { local $/; <$fh_in_text>; }; close $fh_in_text; $content =~ s{ ^ ($rx_search) $rx_value_intro \K } { $params{$1}}xmsg; open my $fh_out_text, '>', $input_file_name; print $fh_out_text $content; close $fh_out_text; }
Note that Perl version 5.10+ is required because a regex extended pattern is used. I'm sorry that this post is a bit rushed; please feel free to ask any questions.

One thing that would be helpful if you have any further questions is the provision of an example input/parameter file and at least one corresponding short before/after pair of files to be processed.


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: Need your help in a pattern based text manipulation algorithm
by kaushik9918 (Sexton) on May 17, 2020 at 05:26 UTC

    Hello ,

    Thanks a lot for your time. To add to my post, my a_copy.txt is something like this:

    sjdsajdsd kjfksls'fsaf sandlksadnksalndop djsndsan;sa sdsakdslkdsa sakdsakdsa ..................... ................. NAME|VALUE = a ksadksad dewidewjrw djsdnsalkd dskdsa;dkjsa .................... ............................. TASK|VALUE = copy dsakdmsald;sadsa dsalkdsldk'dls ................................ dskmdsldsdsda CAPS|VALUE = 34 dksdmlsajdsajdsa .............................. dsdksld;sads PKG_TYPE|VALUE = minimal dsmds.dsa.d lsdksadjsldjsdjpos ........................... ............................ .................................. jndlkjsandlksandsndsa jdnsakjdnkjs;dsa kdjslkdjsa;lkds STACK|VALUE = 4 dfdskjff;sdf ............................. ........................ SHIP|VALUE = junk kdskdsakjdl'sa jdsajd;lksahdsakda .......................... dsjads .........................

    And input file is user based, so its not confined to only the content you see in my example. It can have multiple values under "NAME|VALUE =" and "TASK|VALUE =" . I have to read these two keywords from the input file, and open the correct (NAME|VALUE)_(TASK|VALUE).txt. In my example, the first one happens to be "a_copy.txt". Second one happens to be "z_cut.test" and third one is "c_paste.txt". There can be many more like this based on the user input. So, I have to grep for the lines that follow "NAME|VALUE = a" and "TASK|VALUE = copy" in the input file, and replace those lines in the .txt file For example, in this case, a_copy.txt, I have to replace the lines "CAPS|VALUE = 34" with "CAPS|VALUE = 0" and "PKG_TYPE|VALUE = minimal" with "PKG_TYPE|VALUE = premium". In the c_paste.txt I have to replace "STACK|VALUE = 65" with "STACK|VALUE = 2" and "SHIP|VALUE = medium" with "SHIP|VALUE = lowtier" and so on. And you are right, since z_cut.txt file doesnt exist, the code has to throw an error in the end giving a list of all the non existent files and exit. The exit should happen only after the code has successfully run on the existing files. For example, post processing , a_copy.txt should look like:

    sjdsajdsd kjfksls'fsaf sandlksadnksalndop djsndsan;sa sdsakdslkdsa sakdsakdsa ..................... ................. NAME|VALUE = a ksadksad dewidewjrw djsdnsalkd dskdsa;dkjsa .................... ............................. TASK|VALUE = copy dsakdmsald;sadsa dsalkdsldk'dls ................................ dskmdsldsdsda CAPS|VALUE = 0 dksdmlsajdsajdsa .............................. dsdksld;sads PKG_TYPE|VALUE = premium dsmds.dsa.d lsdksadjsldjsdjpos ........................... ............................ .................................. jndlkjsandlksandsndsa jdnsakjdnkjs;dsa kdjslkdjsa;lkds STACK|VALUE = 4 dfdskjff;sdf ............................. ........................ SHIP|VALUE = junk kdskdsakjdl'sa jdsajd;lksahdsakda .......................... dsjads .........................

      It's a few days since I posted and I don't know if you're still looking for support, but I thought I'd follow up and close the loop. I'd like to mention that the example input/output file pair posted here seems to contain much irrelevant data and thus to be overlong. The goal in composing such example files is to include all that is necessary (including, in this case, a bit of general text that is not subject to alteration) and little else. Please see Short, Self-Contained, Correct Example.

      In any event, here's a version of the full script previously posted, updated to reflect my current understanding of your requirements based upon this. As before, it is only minimally tested and I have made no attempt to provide a GUI wrapper. And if you're using a version of Perl prior to 5.10, a fix to the regex that uses the  \K operator can easily be made.

      PerlMonks exists to provide support and assistance to Perl users at all levels of expertise and is not, in general, a free, on-line code writing service. You say that you are a Perl novice, so I have provided a fair amount of code that, in other circumstances, I would have expected you to have contributed to substantially. I have no hesitation about providing help to you, but in future if you have any qustions, please provide the code with which you are working (or at least a reference to it), and please try to provide short, pertinent example files for development and testing.


      Give a man a fish:  <%-{-{-{-<

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11116846]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (11)
As of 2020-09-28 19:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    If at first I donít succeed, I Ö










    Results (144 votes). Check out past polls.

    Notices?