
Loop to merge every two columns

by Renyulb28 (Novice)
on May 11, 2011 at 20:14 UTC ( [id://904239]=perlquestion )

Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, our current dilemma is figuring out how to merge every two columns in a data file. The file is laid out as follows: the first column is the ID, and every column after that is a marker, except that two markers make a whole, and the merged pair is what we want. The columns are space delimited. The colleague who knows python only suggested a long difficult method of writing out each combination (ie 1+2, 3+4, etc.) and with R it is even more problematic.

3851 A A G G T T ...
3854 A A G G T T ...

The output for the above example would look like this:

3851 AA GG TT ...
3854 AA GG TT ...

Is there a simple Perl script available for this task?

Replies are listed 'Best First'.
Re: Loop to merge every two columns
by wind (Priest) on May 11, 2011 at 20:30 UTC
    use strict;

    while (<DATA>) {
        s/( \w) /$1/g;
        print;
    }

    __DATA__
    3851 A A G G T T
    3854 A A G G T T

    Or as a one liner

    perl -pi.bak -e 's/( \w) /$1/g' file.dat
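
    For anyone unsure what the -p and -i.bak switches do, here is a rough long-hand sketch of the same in-place edit. The sub name merge_in_place is made up for illustration, and the backup is assumed to land next to the original with .bak appended (file.dat.bak for the example above).

```perl
use strict;
use warnings;

# Hypothetical helper: long-hand version of
#   perl -pi.bak -e 's/( \w) /$1/g' file.dat
sub merge_in_place {
    my ($path) = @_;
    local $^I   = '.bak';     # same effect as -i.bak: edit in place, keep a backup
    local @ARGV = ($path);    # the files that <> will read
    while (<>) {              # the loop that -p wraps around the -e code
        s/( \w) /$1/g;        # drop the space between the two alleles of a pair
        print;                # with $^I set, this writes to the rewritten file
    }
    return;
}

# Run as: perl script.pl file.dat
merge_in_place( shift @ARGV ) if @ARGV;
```

    Keeping the backup suffix is a cheap safety net while testing the regex on real data.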

    And if you want to use "newer" regex features, the following is shorter:

    s/ \w\K //g;
      Note that those "newer" regex features are present in all versions of Perl that aren't "end-of-life". The last major version of Perl that didn't have those was 5.8, which dates from July 2002 - less than a year after the release of IE 6.

        Yes, I provide some programming work for a couple non-profits that run perl 5.8 and perl 5.10 respectively. As a consequence, I don't always get to utilize the latest and greatest.

        And even though the 5.10 EOL was announced, I find that a lot of posters aren't always on a current version of perl. Therefore sometimes it's easier to just assume they don't have access to certain features.

        I definitely appreciate the \K marker though, and use it plenty now to clean up regexes when I can.
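
        For anyone following the \K discussion above: \K keeps everything matched so far out of the replaced text, so only the trailing space gets deleted. A minimal side-by-side sketch, assuming Perl 5.10 or later; the sub names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Capture-group form: match " X " and put " X" back.
sub merge_with_capture {
    my ($line) = @_;
    $line =~ s/( \w) /$1/g;
    return $line;
}

# \K form (Perl 5.10+): " X" is matched but kept, so only the
# space after it is replaced (with nothing).
sub merge_with_keep {
    my ($line) = @_;
    $line =~ s/ \w\K //g;
    return $line;
}

my $row = "3851 A A G G T T";
print merge_with_capture($row), "\n";    # 3851 AA GG TT
print merge_with_keep($row),    "\n";    # 3851 AA GG TT
```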

Re: Loop to merge every two columns
by LanX (Saint) on May 11, 2011 at 20:33 UTC
    DB<104> $a="3851 A A G G T T"
    DB<105> $a=~s/([ACGT]) ([ACGT])/$1$2/g
    DB<106> p $a
    3851 AA GG TT

    Cheers Rolf

    PS:

    > The colleague who knows python only suggested a long difficult method of writing out each combination (ie 1+2, 3+4, etc.)...

    This pretty much reflects the impression I have when reading in python boards... :)

Re: Loop to merge every two columns
by johngg (Canon) on May 11, 2011 at 22:57 UTC

    A solution using regular expressions and substitution, as already shown, is probably best but just to show another way -

    knoppix@Microknoppix:~$ perl -E '
    > $_ = q{3851 A A G G T T};
    > say;
    > @e = split m{( )};
    > $_ = join q{}, @e[ grep { ( $_ + 1 ) % 4 } 0 .. $#e ];
    > say;'
    3851 A A G G T T
    3851 AA GG TT
    knoppix@Microknoppix:~$
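
    Since the question asks for a loop, the same pairing can also be sketched field by field. This is only an illustration: the sub name merge_marker_pairs is invented, and keeping an odd trailing allele unchanged is an assumption, not something the original data is known to need.

```perl
use strict;
use warnings;

# Hypothetical helper: merge marker columns pairwise, leaving the ID column alone.
sub merge_marker_pairs {
    my ($line) = @_;
    my ( $id, @alleles ) = split ' ', $line;    # split on whitespace
    my @merged;

    # Walk the allele list two columns at a time.
    for ( my $i = 0 ; $i < @alleles ; $i += 2 ) {
        # Assumption: an odd trailing allele is kept as-is.
        my $second = $i + 1 < @alleles ? $alleles[ $i + 1 ] : '';
        push @merged, $alleles[$i] . $second;
    }
    return join ' ', $id, @merged;
}

print merge_marker_pairs($_), "\n"
    for "3851 A A G G T T", "3854 A A G G T T";
```

    This is longer than the regex, but it makes it easy to add checks later, such as rejecting rows with an odd number of markers.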

    I hope this is of interest.

    Cheers,

    JohnGG
