PerlMonks  

Re: Sorting characters within a string

by kjherron (Pilgrim)
on Aug 24, 2001 at 04:13 UTC


in reply to Sorting characters within a string

I can think of a couple of other ways to do it, but both are worse than yours unless you're having performance problems:

1) Generate every possible string and its sorted version, storing them in a hash with the unsorted string as the key and the sorted string as the value. There are only, what, 45 possible strings? That's doable.

2) Split the string into characters, count the number of each character, then output the characters in order based on the counts. This is O(n) so it'd be a win if your strings were really long, but it's just overkill for these short strings.
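A minimal sketch of approach 2, assuming the fixed five-symbol alphabet A, C, G, N, T used in this thread; the sub name count_sort_bases is hypothetical. It makes one pass to count each symbol, then emits the symbols in alphabetical order, each repeated by its count. Characters outside the assumed alphabet are silently dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: sort a string of bases by counting occurrences.
# Assumes the alphabet A, C, G, N, T; any other character is dropped.
sub count_sort_bases {
    my ($bases) = @_;
    my %count;
    $count{$_}++ for split //, $bases;    # single pass: tally each symbol
    # Emit each symbol in alphabet order, repeated as often as it was seen.
    return join '', map { $_ x ($count{$_} // 0) } qw(A C G N T);
}

print count_sort_bases('TNGCA'), "\n";    # prints ACGNT
```

For strings of two to four characters the plain sort is just as good; the counting version only pays off when the strings get long.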

If performance is a problem, a fairly painless thing to do is cache the sorted strings as you calculate them:

if (!exists $sort_cache{$bases}) {
    $sort_cache{$bases} = join('', sort split('', $bases));
}
return $sort_cache{$bases};
This is of course just a lazy variant on #1 above.
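The same lazy caching is also available from Perl's core Memoize module, which wraps a named sub so that repeated calls with the same argument return the stored result. A sketch; sorted_bases is a hypothetical name:

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical sub: sort the characters of one string.
sub sorted_bases {
    my ($bases) = @_;
    return join '', sort split //, $bases;
}

# Replace sorted_bases with a caching wrapper; later calls with the same
# string are answered from Memoize's internal cache.
memoize('sorted_bases');

print sorted_bases('GATC'), "\n";    # prints ACGT
```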

Re: Re: Sorting characters within a string
by jlongino (Parson) on Aug 24, 2001 at 05:05 UTC
    I think you're right that the "all possibilities" hash would be more work initially. I'm no mathematician/statistician, but I think there are more like 5! = 120 possibilities (and I certainly wouldn't want to build that hash by hand). Tilly, you're a mathematician. What is the correct number of possibilities?

    Building the hash programmatically would be an interesting brain teaser.

    Update: This was assuming string lengths of up to 5.

    If the code and the comments disagree, then both are probably wrong. -- Norm Schryer

      Are duplicates allowed? If so then the correct number for 1 is 5, for 2 is 5*5=25, for 3 is 5*5*5=125, and for 4 is 5*5*5*5=625. For all strings of length 2-4 that comes out to a grand total of 775.
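      Written out: with repeats allowed, each of the k positions independently takes one of the 5 symbols, so there are 5^k strings of length k, and summing over lengths 2 to 4 gives:

```latex
\sum_{k=2}^{4} 5^k = 5^2 + 5^3 + 5^4 = 25 + 125 + 625 = 775
```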

      Were I autogenerating, my approach might be as follows (untested):

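      my %sorted_str;
      {
        my @c = qw(A T C G N);
        my @strings = @c;
        foreach (1..5) {
          # record the sorted form of every string of the current length
          foreach (@strings) {
            $sorted_str{$_} = join '', sort split //;
          }
          # extend every string by one more letter
          @strings = map { my $string = $_; map $string.$_, @c; } @strings;
        }
      }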
      Note that the nested map will be much slower than you think if you are pre 5.6.1. Personally I would be inclined to use the Orcish (for "Or Cache") maneuver for this:
      $bases = $sorted{$bases} ||= join '', sort split //, $bases;
      Building the hash programmatically would be an interesting brain teaser.

      Here is the worst way to do it:
      my @strings = (
          (grep /^[acgnt]{2}$/, 'aa' .. 'tt'),
          (grep /^[acgnt]{3}$/, 'aaa' .. 'ttt'),
          (grep /^[acgnt]{4}$/, 'aaaa' .. 'tttt'),
      );
      my %sort_cache;
      for my $key (@strings) {
          $sort_cache{$key} = join '', sort split('', $key);
      }

      Hey, don't take this seriously ;-) it does the job but it's so inefficient it's scary.
      Guillaume
        Guillaume,

        I think one of the assumptions (although not clearly stated) is that no string has repeated characters in it.

        Very inventive code though!

        Update: Maybe a regexp to eliminate any string with duplicate letters. As though things weren't bad enough :)
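        One way to do that filtering, sketched under the thread's assumptions (alphabet A, C, G, N, T; lengths 2 to 4). The strings are generated with a nested map in the style of the autogenerating code above rather than with string ranges. The pattern /(.).*\1/ matches when some character occurs again later in the string, so negating it keeps only strings whose letters are all distinct. The counts it produces reproduce the 775 and 200 figures worked out in this thread.

```perl
use strict;
use warnings;

my @alphabet = qw(A C G N T);

# Every string of length 2..4 over the alphabet, repeats allowed.
my @all;
my @current = @alphabet;                  # start from length-1 strings
for my $length (2 .. 4) {
    @current = map { my $s = $_; map { $s . $_ } @alphabet } @current;
    push @all, @current;
}

# /(.).*\1/ matches when some character occurs again later in the string,
# so the negated grep keeps only strings whose letters are all distinct.
my @distinct = grep { !/(.).*\1/ } @all;

my %sort_cache;
for my $str (@distinct) {
    $sort_cache{$str} = join '', sort split //, $str;
}

printf "%d strings, %d without repeats\n", scalar @all, scalar @distinct;
# prints "775 strings, 200 without repeats"
```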


      Boy, I really suck at this. One more try assuming strings of length 2-4:

      length 2: 5 * 4 = 20
      length 3: 5 * 4 * 3 = 60
      length 4: 5 * 4 * 3 * 2 = 120
      total: 20 + 60 + 120 = 200 possibilities.
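      As a formula: without repeats, a string of length k is an ordered choice of k of the 5 letters, which gives 5!/(5-k)! possibilities for each length:

```latex
\sum_{k=2}^{4} \frac{5!}{(5-k)!} = (5 \cdot 4) + (5 \cdot 4 \cdot 3) + (5 \cdot 4 \cdot 3 \cdot 2) = 20 + 60 + 120 = 200
```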


Re: Re: Sorting characters within a string
by dga (Hermit) on Aug 24, 2001 at 23:42 UTC

    Nice idea to precompute the values.

    I got 3901 which represents the entire set of 2-4 letter long unsorted inputs in this alphabet. This of course folds to a very small number of sorted outcomes.

    Here is the code.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my(%pp);
    # digit 0 (sprintf padding) = no base, digits 1-5 = A C G N T
    my(@acgnt)=( ' ', 'A', 'C', 'G', 'N', 'T' );
    my($i);
    for($i=11;$i<100000;$i++)
    {
      my($s, $o, @s);
      # jump over digits 6-9: adding 5 turns a 6 into a 1 with a carry
      while($i =~ /6/)
      {
        $o=index(reverse($i),'6');
        $i+=5*10**$o;
      }
      $s=sprintf "%04d", $i;
      @s=split('',$s);
      @s = map { $acgnt[$_] } @s;
      $s=join('', @s);
      $s =~ y/ //d;
      # sort the remaining bases, so padding spaces don't end up in the value
      $pp{$s}=join('', sort split('', $s));
    }
    #print out the lookup table (not really part of the initializer)
    my($k, $v);
    while(($k,$v)=each %pp)
    {
      print "$k = $v\n";
    }

    This creates a complete list of inputs you could obtain and builds a hash with the outputs you want to display. It does this fairly quickly and would only have to be done at startup time; after that, your print statement would basically be print "$pp{$_}\n";

    This could be made into an initializer function or the values could be computed and saved out and then read in for execution of the real program.
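    A sketch of the "compute once, save, read back" option using the core Storable module's store and retrieve. The tiny sample table stands in for the full %pp built above, and the temporary file stands in for wherever the real program would keep its table; both are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(store retrieve);

# Temporary file standing in for wherever the real program keeps its table.
my (undef, $file) = tempfile(UNLINK => 1);

# One-off step: build the lookup table (a small sample here) and save it.
my %pp;
for my $str (qw(TA GCA NTGC)) {
    $pp{$str} = join '', sort split //, $str;
}
store(\%pp, $file) or die "Cannot store table in $file\n";

# Startup of the real program: load the saved table instead of recomputing.
my $lookup = retrieve($file);
print "$lookup->{$_}\n" for sort keys %$lookup;
```

    Storable's binary format loads quickly, but it is not guaranteed to be readable by a different Perl or Storable version, so a plain "key = value" text file like the one printed above is the more portable choice.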
