PerlMonks  

Re: multiple OR match fails

by lune (Pilgrim)
on Jan 31, 2012 at 13:33 UTC ( [id://950971] )


in reply to multiple OR match fails

The obvious part of your question is how to return all matches from a regex match.

That can easily be done like this (I simplified your regex, since the missing parenthesis makes it unclear what you really want):

    while (<STDIN>) {
        # see previous answer
        #undef ($/);
        my $string = $_;
        # in list context, m//g returns every match on the line
        my @matches = ($string =~ m/(FINDINGS|COMPLICATIONS|:.*)/g);
        print STDOUT "@matches \n";
    }

    echo "FINDINGS COMPLICATIONS :something" | t.pl

However, from your question it seems that what you really want is not just a list of matches but some sort of parsing, e.g. extracting the text of the section "FINDINGS" and so on.

To answer this, it would be necessary to know where a section ends. If this is not what you wanted, please clarify.
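As a sketch only: assuming a section ends at the next line that begins with an all-upper-case title followed by a colon (or at the end of the input), the extraction could look roughly like this. The helper name extract_sections and the sample text are made up for illustration, and titles containing digits or punctuation would need a wider character class.

```perl
use strict;
use warnings;

# Hypothetical helper: return "TITLE: body" strings for the wanted sections.
# Assumption: a section runs until the next line that starts with an
# upper-case title and a colon, or until the end of the text.
sub extract_sections {
    my ($text, @wanted) = @_;
    my $titles = join '|', map { quotemeta } @wanted;
    my @found;
    # /m: ^ matches at every line start; /s: . also matches newlines;
    # the lookahead stops the body without consuming the next title.
    while ($text =~ m/^($titles)[ \t]*:[ \t]*(.*?)(?=^[A-Z][A-Z \t]*:|\z)/msg) {
        my ($title, $body) = ($1, $2);
        $body =~ s/\s+\z//;    # drop trailing blank lines
        push @found, "$title: $body";
    }
    return @found;
}

# Made-up sample input
my $note = join "\n",
    'ANESTHESIA: General.',
    '',
    'FINDINGS: Residual calcifications.',
    'No gross tumor.',
    '',
    'COMPLICATIONS : None.',
    '';

print "$_\n" for extract_sections($note, 'FINDINGS', 'COMPLICATIONS');
```

Using a lookahead instead of a capturing group for the next title means two wanted sections that directly follow each other are both found, and the last section in the text is not lost.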

Re^2: multiple OR match fails
by zzgulu (Novice) on Jan 31, 2012 at 15:36 UTC

    Thank you very much for your input, and sorry for the typo; one parenthesis was missing from the code. My text files are operative notes, and each note consists of sections that start with a title at the beginning of a line, all in upper case and ending in a colon. Sections are usually separated by an empty line, although this may not always be the case. The input directory contains 1000 files, and my intention is to write the files back to an output directory but with only the designated matched sections (title + content). As recommended, adding a while loop around my regex match seems to have fixed the issue, but please do advise me if you find other issues in the code. I seldom write code, but since I am working with text files, regexes are very powerful and help me with occasional data extraction. I am sure there are much easier ways to code what I wrote below. This is a sample input file:

    PREOPERATIVE DIAGNOSIS: Left invasive cancer, positive margins.

    TITLE OF OPERATION:

    1. Left needle-localized segmental mastectomy.

    2. intraoperative axillary lymphatic mapping.

    3. lymphadenectomy.

    ANESTHESIA: General.

    INDICATIONS FOR SURGERY: Invasive carcinoma with positive margins and residual calcifications.

    COMPLICATIONS : None.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $indir;
    my $file;
    my $new;
    my $string;
    my $outdir;

    $indir = 'C:/input';
    $outdir ='C:/output';

    if(-d $indir) {
        opendir(DIR, $indir) or die "can't open $!";
    }
    while ($file=readdir(DIR)) {
        my $fullpath=$indir.'/'.$file;
        open IN, "$indir/$file";
        $new= "$outdir/$file";
        open OUT, ">$new";
        while(<IN>) {
            undef ($/);
            $string=$_;
            while ($string =~m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm) {
                print "processing $file\n";
                print OUT "$1$2\t$3";
            }
        }
        close IN;
        close OUT;
    }
    closedir(DIR);
    exit;

      Since you asked for comments, I'll make a few:
      - The main improvement is better indentation.
      - The if (-d $indir) test is unnecessary; opendir ... or die already reports the failure if the directory cannot be opened.
      - readdir returns only the names (not full paths), and the list includes any directories (including the . and .. ones!). It is common to use a grep to filter out the entries that you don't want.
      - Always check whether a file operation succeeded.
      - Declare variables where you first use them.
      I didn't actually run this, so excuse me if I made a mistake.
      #!/usr/bin/perl
      use strict;
      use warnings;

      my $indir = 'C:/input';
      my $outdir ='C:/output';

      opendir(DIR, $indir) or die "can't open directory $indir $!";

      foreach my $file (grep{-f "$indir/$_"}readdir DIR) {
          open IN, '<', "$indir/$file" or die "can't open $indir/$file $!";
          my $new= "$outdir/$file";
          open OUT, '>', $new or die "can't open $new for output $!";
          while (my $string = <IN>) {
              undef ($/);
              while ($string =~m/(FINDINGS|COMPLICATIONS)(:)(.*?)(^[A-Z])/sgm) {
                  print "processing $file\n";
                  print OUT "$1$2\t$3";
              }
          }
          close IN;
          close OUT;
      }
      closedir(DIR);
      update: these "close" statements aren't strictly necessary; all file handles get closed when your program exits. When you open IN for the next file, this automatically closes the current IN file (if there is one). exit() wasn't necessary, so I took it out.
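One thing worth checking in both versions of the loop: undef ($/) only takes effect after the first line has already been read, so the first line of each file is matched on its own and only the remainder is slurped. A minimal sketch of one way around that, reading the whole file with a localized $/ before matching, is below. The sub name copy_sections is hypothetical, lexical handles replace the bareword ones, and the lookahead for "next upper-case title or end of file" is an assumption about the note format rather than something tested on real notes.

```perl
use strict;
use warnings;

# Hypothetical helper: write only the wanted sections of every plain file
# in $indir to a file of the same name in $outdir. Returns the file count.
sub copy_sections {
    my ($indir, $outdir, @wanted) = @_;
    my $titles = join '|', map { quotemeta } @wanted;

    opendir(my $dh, $indir) or die "can't open directory $indir: $!";
    # readdir returns bare names; -f skips ., .. and subdirectories
    my @files = grep { -f "$indir/$_" } readdir $dh;
    closedir $dh;

    for my $file (@files) {
        open(my $in, '<', "$indir/$file")
            or die "can't open $indir/$file: $!";
        # $/ is undefined *before* the read, so the whole file is one string
        my $text = do { local $/; <$in> };
        close $in;
        $text = '' unless defined $text;

        open(my $out, '>', "$outdir/$file")
            or die "can't open $outdir/$file for output: $!";
        # Assumed format: TITLE, optional blanks, colon, then the body up to
        # the next upper-case title line or the end of the file.
        while ($text =~ m/^($titles)[ \t]*:(.*?)(?=^[A-Z][A-Z \t]*:|\z)/msg) {
            print {$out} "$1:$2";
        }
        close $out or die "can't close $outdir/$file: $!";
    }
    return scalar @files;
}

# Usage: perl script.pl INDIR OUTDIR TITLE [TITLE ...]
copy_sections(@ARGV) if @ARGV >= 3;
```

Because the next title is only looked at and not consumed, a wanted section that directly follows another wanted section is still matched, and the [ \t]* before the colon also accepts headings written like "COMPLICATIONS : None." in the sample note.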
