http://qs321.pair.com?node_id=670565

steph_bow has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Dear Monks

My script detects a line in "EXEMPLE.txt" that contains the element "coucou".

I would like to get the rest of the text from this line. Could you help me? Thanks.

#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
use Cwd;
use File::Copy;

my $element = "coucou";

open (INFILE, "<EXEMPLE.txt");
while (my $line = <INFILE>){
    if ($line =~ /$element/){
        my $outfile = "RESEARCHED_text_"."$element".".txt";
        open (OUTFILE, ">$outfile");
        print OUTFILE "$line";
        close OUTFILE;
    }
}
close INFILE;

Replies are listed 'Best First'.
Re: get the rest of the text
by johngg (Canon) on Feb 27, 2008 at 12:18 UTC
    Here are some thoughts about your code.

    • Check for the success or failure of open and close operations.

    • Use the three argument form of open and lexical filehandles.

    • Don't keep opening and closing your output file every time you find a line you are interested in; do both once, outside the while loop. Your code reopens the file for writing on every match, clobbering what was written in previous iterations.

    • Add captures to your regex to preserve what comes before and after $element and avoid the performance implications of $'.

    • Reading into $_ can save some typing as certain operations default to using $_ if no argument is given.

    Something along these lines (not tested).

    use strict;
    use warnings;

    my $element = q{coucou};
    my $inFile  = q{EXEMPLE.txt};
    my $outFile = qq{RESEARCHED_text_${element}.txt};

    open my $inFH, q{<}, $inFile
        or die qq{open: $inFile: $!\n};
    open my $outFH, q{>}, $outFile
        or die qq{open: $outFile: $!\n};

    while ( <$inFH> ) {
        next unless m{(.*?)$element(.*)};
        # With a lexical handle, $_ must be given explicitly; a bare
        # "print $outFH;" would print the handle itself to STDOUT.
        print {$outFH} $_;
        my $beforeElement = $1;
        my $afterElement  = $2;
        # Do something here with your captured text ...
    }

    close $inFH
        or die qq{close: $inFile: $!\n};
    close $outFH
        or die qq{close: $outFile: $!\n};

    I hope this is of use.

    Cheers,

    JohnGG

Re: get the rest of the text
by moritz (Cardinal) on Feb 27, 2008 at 09:58 UTC
    Everything in the line after the regex match is stored in the special variable $'.

    But note that using this variable (as well as $` and $&) slows down the regex engine in the whole program. So read the warnings in perlre and keep them in mind.
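
    As a hedged alternative to $', here is a minimal sketch assuming Perl 5.10 or later, where the /p modifier and ${^POSTMATCH} are available. With /p, the post-match text is saved only for that one match. The helper name text_after is hypothetical, and the \Q...\E quoting is an added assumption so that the search string is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text that follows
# the first occurrence of $element in $line, or undef if there is no match.
sub text_after {
    my ($line, $element) = @_;

    # /p asks Perl (5.10+) to save the pre/post-match text for this match.
    # \Q...\E makes any regex metacharacters in $element match literally.
    return unless $line =~ /\Q$element\E/p;
    return ${^POSTMATCH};
}

my $rest = text_after("abc coucou def\n", 'coucou');
print defined $rest ? "FOLLOWING:$rest" : "no match\n";
```

    ${^PREMATCH} and ${^MATCH} are the corresponding replacements for $` and $&.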

Re: get the rest of the text
by Punitha (Priest) on Feb 27, 2008 at 09:59 UTC

    Hi steph_bow,

    You can get the text before the match with $` and the text after the match with $', like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $element = "coucou";

    open (INFILE, "<EXEMPLE.txt");
    while (my $line = <INFILE>){
        if ($line =~ /$element/){
            my $pre  = $`;  ### to get the previous text from the matching text
            my $post = $';  ### to get the following text from the matching text
            print "PREVIOUS:$pre\nFOLLOWING:$post\n";
        }
    }
    close INFILE;

    Punitha

Re: get the rest of the text
by McDarren (Abbot) on Feb 27, 2008 at 12:30 UTC
    As has been pointed out, you can use $'.
    However, this is best avoided if possible, and in this case it is certainly possible to avoid it by using capturing parentheses in your pattern match.

    Here is a small snippet of code that demonstrates how this might be done:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $wanted = 'coucou';

    while (my $line = <DATA>) {
        chomp($line);
        my $rest = '';
        if ($line =~ m/$wanted(.*)$/) {
            $rest = $1;
            print "$wanted:$rest\n";
        }
        else {
            print "$wanted not found in $line\n";
        }
    }

    __DATA__
    abc coucou def
    not in this line
    in this line, but nothing following coucou
    coucou once, coucou twice, coucou three times - what should be done with this line?

    One thing you didn't specify in your question is what should happen if the string is found more than once in any line. The example above assumes that you would want everything after the first match.

    If instead you wanted everything after the last match, you could achieve this by inserting a "greedy" dot-star at the beginning of the pattern, like so:

    m/.*$wanted(.*)$/
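
    The two patterns above can be compared side by side. This is only a sketch: the helper names after_first and after_last are hypothetical, and the \Q...\E quoting is an added assumption so that the search string is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helpers (names are illustrative, not from the thread).

# Text after the FIRST occurrence: the regex engine finds the leftmost match.
sub after_first {
    my ($line, $wanted) = @_;
    return $line =~ m/\Q$wanted\E(.*)$/ ? $1 : undef;
}

# Text after the LAST occurrence: the leading greedy .* consumes as much as
# it can, then backtracks only far enough for $wanted to match.
sub after_last {
    my ($line, $wanted) = @_;
    return $line =~ m/.*\Q$wanted\E(.*)$/ ? $1 : undef;
}

my $line = 'coucou once, coucou twice, coucou three times';
print "first: '", after_first($line, 'coucou'), "'\n";
print "last:  '", after_last($line, 'coucou'), "'\n";
```

    Both helpers return undef when the string does not occur in the line, so the caller can tell "no match" apart from "matched with nothing following".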

    Cheers,
    Darren