PerlMonks  

Re: Sorting a text file

by GrandFather (Saint)
on Mar 15, 2007 at 09:30 UTC ( [id://604949]=note )


in reply to Sorting a text file

There are many ways to do this trick, but they all depend on pulling out the sort key (in this case the second column). A simple way for small quantities of data is to use a hash:

    use strict;
    use warnings;

    my %data;

    while (<DATA>) {
        my $key = /^\d+\s+(\d+)/ ? $1 : next;
        $data{$key} = $_;
    }

    print $data{$_} for sort keys %data;

    __DATA__
    77876 8543 CA84985 54E
    77873 8003 CA84985 54E
    77875 7725 CA84985 54E
    77872 8511 CA84985 54E
    77873 8123 CA84985 54E
    77822 9908 CA84985 54E
    77819 8503 CA84985 54E
    77826 8040 CA84985 54E
    77822 7874 CA84985 54E
    77884 8543 CA84985 54E
    77809 7211 CA84985 54E

Prints:

    77809 7211 CA84985 54E
    77875 7725 CA84985 54E
    77822 7874 CA84985 54E
    77873 8003 CA84985 54E
    77826 8040 CA84985 54E
    77873 8123 CA84985 54E
    77819 8503 CA84985 54E
    77872 8511 CA84985 54E
    77884 8543 CA84985 54E
    77822 9908 CA84985 54E
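A plain hash holds one line per key, so lines that share a key overwrite each other. The following is a minimal sketch of one way around that, assuming it is acceptable to keep lines with equal keys in input order. The helper name sort_by_second_column is hypothetical, and the sketch also sorts the keys numerically, which matters once the keys differ in width.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines by the second column (hash of arrays),
# then emit the groups in ascending numeric key order.
# Lines that share a key are kept in input order.
sub sort_by_second_column {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        # Skip lines that do not start with two numeric columns.
        my ($key) = $line =~ /^\d+\s+(\d+)/ or next;
        push @{ $data{$key} }, $line;
    }
    return map { @{ $data{$_} } } sort { $a <=> $b } keys %data;
}

# A few lines taken from the sample data, including the duplicate key 8543.
my @sample = (
    "77876 8543 CA84985 54E\n",
    "77873 8003 CA84985 54E\n",
    "77884 8543 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);

print sort_by_second_column(@sample);
```

With this sample, both 8543 lines survive and appear in their original relative order after the 7211 and 8003 lines.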

DWIM is Perl's answer to Gödel

Re^2: Sorting a text file
by prasadbabu (Prior) on Mar 15, 2007 at 09:45 UTC

    GrandFather,

    I don't think we can use a hash directly as you did, because the second column has duplicates such as '8543', and each duplicate overwrites the line already stored under that key.

    Your input has 11 lines, but your output has only 10.

    If I am wrong, please correct me.

    Prasad

      You are quite right. However, if you have duplicate keys, how do you expect the sort to arrange those lines? Do you fall back to a secondary key, does it not matter, or do you retain the original file order? You could, for example, rely on sort's stability in recent versions of Perl to retain the lines with identical keys in file order:

          use strict;
          use warnings;

          # The sort key is the 4 characters starting at offset 9 of each line.
          print sort { substr($a, 9, 4) cmp substr($b, 9, 4) } <DATA>;

          __DATA__
          ...

      Prints (using the original data):

          77809 7211 CA84985 54E
          77875 7725 CA84985 54E
          77822 7874 CA84985 54E
          77873 8003 CA84985 54E
          77826 8040 CA84985 54E
          77873 8123 CA84985 54E
          77819 8503 CA84985 54E
          77872 8511 CA84985 54E
          77876 8543 CA84985 54E
          77884 8543 CA84985 54E
          77822 9908 CA84985 54E
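If falling back to a secondary key is preferred over file order, a sketch along these lines might do. It assumes every line starts with two numeric columns and uses the first column as the tie-breaker; the helper name sort_with_fallback is hypothetical. The keys are re-extracted on every comparison, which is simple but wasteful for large inputs.

```perl
use strict;
use warnings;

# Hypothetical helper: sort numerically on column 2 and, where those are
# equal, on column 1. Assumes every line starts with two numeric columns.
sub sort_with_fallback {
    return sort {
        my ($a1, $a2) = $a =~ /^(\d+)\s+(\d+)/;
        my ($b1, $b2) = $b =~ /^(\d+)\s+(\d+)/;
        $a2 <=> $b2 || $a1 <=> $b1;
    } @_;
}

# The two 8543 lines are deliberately given in reverse order of column 1.
my @sample = (
    "77884 8543 CA84985 54E\n",
    "77876 8543 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);

print sort_with_fallback(@sample);
```

Because the tie is broken explicitly, the result does not depend on whether the underlying sort is stable.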

      DWIM is Perl's answer to Gödel
        This seems to work with digits as shown above. However, I couldn't make it work for the following data when sorting on the second word (a combination of letters, digits and underscores):

            __DATA__
            CCCCCC-NF-33 LONDON_JAM_2AAA END-ONE_2TWO
            DDDDDD-NF_52 VENICE_CCC_1ZZZ WHAT_WHEN-1WHY
            KKKKK-INF_44 JAMAICA_AAA_3TTT HOW-WHERE_3WHAT
            AAAAA-INF-32 LONDON_JAM_2AAA END-ONE_2TWO
            BBBBB-INF-12 JAMAICA_AAA_3TTT WHAT_WHEN-1WHY
            VVVVV-INF_24 VENICE_CCC_1ZZZ END-ONE_2TWO

        Any suggestions?
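Fixed substr offsets only fit one particular column layout, so one possible approach for data like this is to split each line on whitespace and compare the second field as a string. This is only a sketch: the helper names second_field and sort_by_second_field are hypothetical, it assumes every line has at least two whitespace-separated fields, and it breaks ties by comparing whole lines, which is a choice made here rather than something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second whitespace-separated field of a line.
sub second_field {
    my ($line) = @_;
    return (split ' ', $line)[1];
}

# Hypothetical helper: sort lines by their second field as strings,
# comparing the whole line when the second fields are equal.
sub sort_by_second_field {
    return sort {
        second_field($a) cmp second_field($b) || $a cmp $b
    } @_;
}

my @lines = (
    "CCCCCC-NF-33 LONDON_JAM_2AAA END-ONE_2TWO\n",
    "DDDDDD-NF_52 VENICE_CCC_1ZZZ WHAT_WHEN-1WHY\n",
    "KKKKK-INF_44 JAMAICA_AAA_3TTT HOW-WHERE_3WHAT\n",
    "AAAAA-INF-32 LONDON_JAM_2AAA END-ONE_2TWO\n",
    "BBBBB-INF-12 JAMAICA_AAA_3TTT WHAT_WHEN-1WHY\n",
    "VVVVV-INF_24 VENICE_CCC_1ZZZ END-ONE_2TWO\n",
);

print sort_by_second_field(@lines);
```

The JAMAICA lines come out first, then the LONDON lines, then the VENICE lines.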
      You could turn GrandFather's approach around and swap keys and values in the hash:

          my %data;
          ( $data{ $_}) = /^\d+\s+(\d+)/ while <DATA>;
          print for sort { $data{ $a} cmp $data{ $b} } keys %data;

      This is essentially a variant of the Schwartzian Transform, using a hash in place of an array to keep the sort field(s).

      Anno
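For comparison, the array-based form of the transform that the hash variant is modelled on can be sketched as below. Each line is wrapped in a [key, line] pair, the pairs are sorted on the key, and the lines are unwrapped again, so the key is extracted only once per line. The helper name schwartzian_sort is hypothetical, and treating lines without a numeric second column as key 0 is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: decorate each line with its key, sort on the key,
# then strip the decoration again (map / sort / map, read bottom to top).
sub schwartzian_sort {
    return map  { $_->[1] }                            # 3. unwrap the line
           sort { $a->[0] <=> $b->[0] }                # 2. numeric sort on key
           map  { [ /^\d+\s+(\d+)/ ? $1 : 0, $_ ] }    # 1. wrap as [key, line]
           @_;
}

my @sample = (
    "77876 8543 CA84985 54E\n",
    "77873 8003 CA84985 54E\n",
    "77809 7211 CA84985 54E\n",
);

print schwartzian_sort(@sample);
```

Unlike the hash variant, identical input lines are not merged here, since each line gets its own pair.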
