PerlMonks  

Re: list of unique strings, also eliminating matching substrings

by BrowserUk (Patriarch)
on May 21, 2011 at 03:16 UTC


in reply to list of unique strings, also eliminating matching substrings

I have hundreds of sets of strings, each containing about 100,000 strings, and each string is about 300 characters long.

A few questions:

  1. You want to eliminate the dups in each of the files? Or across all of the files?
  2. What (roughly) are the maximum and minimum lengths of the strings?
  3. Do they consist solely of ACGT, or are there other characters (X, N, etc.)?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: list of unique strings, also eliminating matching substrings
by lindsay_grey (Novice) on May 21, 2011 at 03:44 UTC

    1. I want to eliminate the duplicates within each file.

    2. The strings range from 200 to 400 characters.

    3. The complete alphabet is A, G, C, T, N.

    Note on point 1: I want to eliminate not just the exact duplicates but also any string that is contained within a longer string.

      Presumably you've coded the obvious two-loop method and it is taking too long. Could you supply a timing for one of your datasets?
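The thread does not show the "obvious two loops method" itself. What follows is a minimal Python sketch of what it presumably looks like, not lindsay_grey's actual code. The function name `remove_contained` is hypothetical, and Python's `s in t` substring test stands in for Perl's `index($t, $s) >= 0`. Exact duplicates are collapsed first, then every remaining string is compared against every other one, so the work grows roughly with the square of the number of strings.

```python
def remove_contained(strings):
    """Naive two-loop filter (hypothetical helper): drop exact duplicates
    and any string that occurs as a substring of another string in the set."""
    # Step 1: collapse exact duplicates; dict keys are unique and keep
    # first-seen order (Python 3.7+).
    unique = list(dict.fromkeys(strings))

    kept = []
    # Step 2: outer loop over every candidate string ...
    for s in unique:
        contained = False
        # ... inner loop compares it against every other distinct string.
        for t in unique:
            if t != s and s in t:
                contained = True
                break
        if not contained:
            kept.append(s)
    return kept


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTA", "ACGTACGT", "GGNNA", "NNA", "TTT"]
    print(remove_contained(reads))  # ['ACGTACGT', 'GGNNA', 'TTT']
```

Collapsing exact duplicates first matters: otherwise two identical copies would each "contain" the other and both would be dropped. After that step, `s in t` with `t != s` can only hold when `t` is strictly longer than `s`.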



        7 hours! That's starting with about 200,000 strings. Thanks!
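A rough back-of-the-envelope count shows why the run takes hours. It assumes the two-loop approach compares each unordered pair of strings once; comparing every ordered pair would double the figure. For n = 200,000 strings there are about 2×10^10 pairs, and 7 hours is 25,200 seconds:

```latex
\frac{n(n-1)}{2}\bigg|_{n = 2\times10^{5}} \approx 2\times10^{10}\ \text{pairs},
\qquad
7\,\text{h} = 25\,200\,\text{s},
\qquad
\frac{2\times10^{10}}{25\,200\,\text{s}} \approx 7.9\times10^{5}\ \text{pairs/s}.
```

Because the pair count grows with n², a set of about 100,000 strings (the size given in the original question) has roughly a quarter as many pairs, about 5×10^9.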
