Re: pattern matching

by Xiong (Hermit)
on Feb 10, 2012 at 11:49 UTC ( [id://953001]=note )


in reply to pattern matching

I'm not sure you've given enough information. We need a broader selection of example inputs and outputs.

For instance, here's a perfectly valid script that does exactly what you demand -- no more, no less:

#!/usr/bin/perl
#       xyzatgc.pl
#       = Copyright 2011 Xiong Changnian <xiong@cpan.org> =
#       = Free Software = Artistic License 2.0 = NO WARRANTY =

use 5.014002;
use strict;
use warnings;

#~ use Devel::Comments '#####', ({ -file => 'debug.log' });

#----------------------------------------------------------------------------#

# Pass input filename on command line: $ xyzatgc.pl infile.txt
my $in_filename = shift;

# Construct output filename: infile.txt => infile.out
$in_filename =~ /([^.]+)\.txt/;
my $out_filename = $1 . q|.out|;

# Slurp in entire input file.
my $indata ;
{
    open my $in_fh, '<', $in_filename
        or die "Couldn't open $in_filename for reading";
    local $/ = undef;                   # slurp
    $indata = <$in_fh>;
    close $in_fh
        or die "Couldn't close $in_filename";
};

# Substitute as required.
$indata =~ s/XYZATGC/XYZ/g;

# Write results to output file.
open my $out_fh, '>', $out_filename
    or die "Couldn't open $out_filename for writing";
say {$out_fh} $indata;
close $out_fh
    or die "Couldn't close $out_filename";

# Terminate.
say 'Done.';

__END__

Input:

XYZATGC XYZATGC XYZATGC XYZATGC XYZATGC XYZ xyz foo ATGC atgc JAPHATGC XYZATGC XYZATGCXYZATGCXYZATGC

Output:

XYZ XYZ XYZ XYZ XYZ XYZ xyz foo ATGC atgc JAPHATGC XYZ XYZXYZXYZ

Now I'm going to wager that's not quite what you want. Please don't try to explain in English words what you'd rather see. Instead, show us a fuller example of input and output.

We'll see what we can do.

I'm not the guy you kill, I'm the guy you buy. —Michael Clayton

Replies are listed 'Best First'.
Re^2: pattern matching
by Anonymous Monk on Feb 10, 2012 at 13:01 UTC

    Hi, this is not what I wanted.
    The problem with $indata =~ s/XYZATGC/XYZ/g; is that I don't know what string will follow ATGC. In a string like XYZATGCCVFGBGVFCD..., as soon as ATGC is found at some position, it should trim everything from there on (ATGC itself and all that follows), keep only the leading part (here XYZ), and write that to a new file. Am I clear now?

    Here are some examples.
    Input:

    XYZATGCACGTGFVGFCCV.......
    YZXCVFDCXZATGCXCCXZZSDD
    Output:
    XYZ new file1.txt
    YZXCVFDCXZ new file2.txt

      - # Substitute as required.
      - $indata =~ s/XYZATGC/XYZ/g;
      + # This is not terribly efficient.
      + my @outdata = split q|ATGC|, $indata;
      + $indata = $outdata[0];
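If each input line is meant to be trimmed on its own and written to its own file, as the file1.txt / file2.txt example seems to suggest, a per-line variant might look like the sketch below. Note that with the whole file slurped into one string, taking the first field of a split on ATGC keeps only the text before the very first ATGC in the file. This is only a sketch under assumptions: trim_at_marker and write_trimmed_files are made-up names, and the numbered output filenames are guessed from that example. It uses index and substr rather than split, so the search stops at the first ATGC on each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return everything before the first occurrence of $marker in $line.
# A line that does not contain $marker is returned unchanged.
sub trim_at_marker {
    my ( $line, $marker ) = @_;
    my $pos = index( $line, $marker );
    return $pos < 0 ? $line : substr( $line, 0, $pos );
}

# Hypothetical driver: read one record per line from $in_filename and
# write each trimmed record to file1.txt, file2.txt, ... (assumed naming).
# Returns the number of files written.
sub write_trimmed_files {
    my ( $in_filename, $marker ) = @_;

    open my $in_fh, '<', $in_filename
        or die "Couldn't open $in_filename for reading: $!";

    my $count = 0;
    while ( my $line = <$in_fh> ) {
        chomp $line;
        next if $line eq q{};               # skip blank lines

        my $out_filename = 'file' . ++$count . '.txt';
        open my $out_fh, '>', $out_filename
            or die "Couldn't open $out_filename for writing: $!";
        print {$out_fh} trim_at_marker( $line, $marker ), "\n";
        close $out_fh
            or die "Couldn't close $out_filename: $!";
    }

    close $in_fh
        or die "Couldn't close $in_filename: $!";
    return $count;
}

# Run only when an input filename is given: $ perl script.pl infile.txt
write_trimmed_files( $ARGV[0], 'ATGC' ) if @ARGV;
```

A line that begins with ATGC yields an empty string here, and a line with no ATGC at all is passed through untouched; whether that is what's wanted is for the original poster to say.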
      I'm not the guy you kill, I'm the guy you buy. —Michael Clayton
