
Re^2: Comparing each line of a file to itself

by bliako (Monsignor)
on Jan 13, 2019 at 20:27 UTC ( #1228488=note )

in reply to Re: Comparing each line of a file to itself
in thread Comparing each line of a file to itself

 60 characters per line, that's 480 bits

Why 60 x 8 = 480 bits, when each character is one of [ATGC] and so needs only 2 bits?
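To make the arithmetic concrete, here is a minimal sketch of 2-bit packing using Perl's built-in vec(). The mapping in %CODE and the helper name pack_dna are made up for illustration, and the sketch assumes the line contains nothing but A, C, G and T.

```perl
use strict;
use warnings;

# Hypothetical mapping: any assignment of the four bases to 0..3 works.
my %CODE = (A => 0, C => 1, G => 2, T => 3);

# Pack a string of A/C/G/T into 2 bits per base using vec().
# Dies on any other character, since it cannot be encoded in 2 bits.
sub pack_dna {
    my ($seq) = @_;
    my $packed = '';
    my $i = 0;
    for my $base (split //, $seq) {
        die "unexpected character '$base'\n" unless exists $CODE{$base};
        vec($packed, $i++, 2) = $CODE{$base};
    }
    return $packed;
}

my $line   = 'ACGT' x 15;            # a 60-character line
my $packed = pack_dna($line);
printf "%d chars -> %d bytes (%d bits)\n",
    length($line), length($packed), 8 * length($packed);
# prints: 60 chars -> 15 bytes (120 bits)
```

One caveat with this sketch: the last byte is zero-padded, so lines of different lengths can pack to the same string (for example "AC" and "ACA" with A mapped to 0). It is only safe as a lookup key if all lines have the same length or the length is stored alongside.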

Re^3: Comparing each line of a file to itself
by kschwab (Vicar) on Jan 13, 2019 at 20:59 UTC

    Well, yes, a content-aware solution could mush each character down to 2 bits. I was proposing, though, something more memory efficient than $SEEN{$_}++.

    I don't know much about DNA, but googling around a bit, "one DNA sequence per line" could mean 237, 373, etc. characters per line. 373*2 = 746 bits even at 2 bits per character, so a 128-bit MD5 hash could still be significantly smaller.

    Also, I don't know if OP's file format has comments or other things besides A/T/G/C.
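As a rough illustration of the digest idea, the sketch below keys the hash on the 16-byte binary MD5 of each line instead of the line itself, using the core Digest::MD5 module. The subroutine name duplicate_lines and the in-memory sample data are assumptions for the example, not anything from OP's code. Because the digest does not care what the line contains, comments or other non-ATGC characters would not need special handling.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Return each line that occurs more than once (reported a single time),
# keeping only a 16-byte binary MD5 digest per distinct line in memory.
sub duplicate_lines {
    my ($fh) = @_;
    my (%seen, @dups);
    while (my $line = <$fh>) {
        chomp $line;
        my $key = md5($line);        # 128 bits, whatever the line length
        # Post-increment returns the old count, so this is true
        # exactly on the second occurrence of a line.
        push @dups, $line if $seen{$key}++ == 1;
    }
    return @dups;
}

# Sample data read through an in-memory filehandle.
my $data = "ACGT\nTTGA\nACGT\nACGT\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "$_\n" for duplicate_lines($fh);
# prints: ACGT
```

Two things to keep in mind: two different lines could in principle share an MD5 digest, so a match is only "almost certainly" a duplicate unless the lines are re-checked, and Perl's per-key hash overhead means the real saving is smaller than the raw 746-versus-128-bit comparison suggests.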
