PerlMonks  

Re^3: Reduce RAM required

by bliako (Monsignor)
on Jan 10, 2019 at 16:41 UTC ( [id://1228326]=note )


in reply to Re^2: Reduce RAM required
in thread Reduce RAM required

I have done an investigation of my own (on file ftp://ftp.ncbi.nih.gov/genomes/Homo_sapiens/CHR_20/hs_ref_GRCh38.p12_chr20.fa.gz).

Unless my method for calculating the frequencies is wrong or my script has a bug (quite likely, actually), the data show, empirically at least, that bases prefer some neighbours over others, and the longer the group, the pickier they get. The most common pair (TT) is about 7.4 times as frequent as the rarest (CG), and the most common triplet (TTT) is almost 14 times as frequent as the rarest (CGA). Note that this is for just one chromosome of Homo sapiens. The differences might even out over the whole genome, but I doubt it.

A few examples (rounded): CG => 0.0122, TT => 0.0902, TAC => 0.0099, TGA => 0.0196

Results when examining pairs of letters:

AA => 0.0876441581694274, AC => 0.050382172379123, AG => 0.0726163599915327, AT => 0.0680375702421899,
CA => 0.0744871531142771, CC => 0.0586105336500266, CG => 0.0121528607767549, CT => 0.072435600501404,
GA => 0.0616385773891699, GC => 0.0479870698325412, GG => 0.0597908940638241, GT => 0.0509971288041851,
TA => 0.0549250240353684, TC => 0.0606918145805057, TG => 0.0758458048289572, TT => 0.0902412592104137,
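Raw pair frequencies also reflect how common each single base is, so one way to check whether neighbours really matter is to compare each observed pair frequency with what independent bases would give. The following is a minimal sketch in Python, not the script used for the numbers above. It assumes the pair values posted here, normalises by their sum (they do not add up to exactly 1), and uses `obs_over_exp`, a hypothetical helper name. A ratio near 1 means "no preference". With these numbers CG comes out at roughly a quarter of its expected frequency, while TT is somewhat above 1.

```python
# Pair frequencies copied from the results above.
pair_freq = {
    "AA": 0.0876441581694274, "AC": 0.050382172379123,
    "AG": 0.0726163599915327, "AT": 0.0680375702421899,
    "CA": 0.0744871531142771, "CC": 0.0586105336500266,
    "CG": 0.0121528607767549, "CT": 0.072435600501404,
    "GA": 0.0616385773891699, "GC": 0.0479870698325412,
    "GG": 0.0597908940638241, "GT": 0.0509971288041851,
    "TA": 0.0549250240353684, "TC": 0.0606918145805057,
    "TG": 0.0758458048289572, "TT": 0.0902412592104137,
}

bases = "ACGT"
total = sum(pair_freq.values())

# Marginal frequency of each base in the first and second position of a pair.
first = {b: sum(pair_freq[b + c] for c in bases) / total for b in bases}
second = {b: sum(pair_freq[a + b] for a in bases) / total for b in bases}


def obs_over_exp(pair):
    """Observed pair frequency divided by the frequency expected
    if the two positions were independent (hypothetical helper)."""
    observed = pair_freq[pair] / total
    expected = first[pair[0]] * second[pair[1]]
    return observed / expected


# Print the pairs from most depleted to most enriched.
for pair in sorted(pair_freq, key=obs_over_exp):
    print(pair, round(obs_over_exp(pair), 3))
```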

Results when examining triplets of letters:

AAA => 0.0335770294488858, AAC => 0.0141652325290566, AAG => 0.0191070631163639, AAT => 0.0207919285470298,
ACA => 0.0195277849575718, ACC => 0.0122296472539126, ACG => 0.00301071191243409, ACT => 0.015618504735689,
AGA => 0.0218989913762551, AGC => 0.0155317569775754, AGG => 0.0196529454780131, AGT => 0.0155383292608308,
ATA => 0.016470891588768, ATC => 0.0127097589265764, ATG => 0.0175833302904619, ATT => 0.0212709235696259,
CAA => 0.0176484150372629, CAC => 0.0162805665608066, CAG => 0.022908155898834, CAT => 0.017649579543762,
CCA => 0.0198202675145805, CCC => 0.0158003751274237, CCG => 0.00344220145072305, CCT => 0.0195463851572704,
CGA => 0.00252629316093788, CGC => 0.00300099705684542, CGG => 0.00358382458359042, CGT => 0.00304285548223897,
CTA => 0.0116349353800189, CTC => 0.01868181086631, CTG => 0.0222707082316364, CTT => 0.0198463333175886,
GAA => 0.0196784370038435, GAC => 0.0100617509082632, GAG => 0.0187920401801322, GAT => 0.0131055794712754,
GCA => 0.0158572126158678, GCC => 0.0139523511697284, GCG => 0.00302848260065376, GCT => 0.0151525107103092,
GGA => 0.0176044828331722, GGC => 0.0139832983287465, GGG => 0.0158069633628229, GGT => 0.0123961876354327,
GTA => 0.0100987758340782, GTC => 0.0100467877973565, GTG => 0.0167909713545918, GTT => 0.0140548755980838,
TAA => 0.0167405625801087, TAC => 0.00987903505290584, TAG => 0.0118104408663679, TAT => 0.0164935276808539,
TCA => 0.0192795855518179, TCC => 0.0166299982712662, TCG => 0.00266904889601673, TCT => 0.022118317401688,
TGA => 0.0195997929347923, TGC => 0.015470963357463, TGG => 0.0207474061136176, TGT => 0.0200197809773843,
TTA => 0.0167229833176149, TTC => 0.0192543333081449, TTG => 0.019199106986227, TTT => 0.035068794178565,

I will shortly post the script I used in the Meditations section, as it may be of more general use.
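For readers who want to experiment in the meantime, here is a minimal sketch in Python of one way such frequencies could be computed. It is not the script mentioned above, and its choices are assumptions: windows overlap, input is upper-cased, and any window containing a letter other than A, C, G or T (for example N) is skipped. The function name `kmer_frequencies` is hypothetical.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-letter window in seq.

    Assumption: windows containing anything other than A, C, G, T
    are ignored, so the returned frequencies sum to 1.
    """
    seq = seq.upper()
    allowed = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= allowed:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in sorted(counts.items())}


# Toy input, not real genomic data.
freqs = kmer_frequencies("ACGTACGTTN", 2)
for kmer, f in freqs.items():
    print(f"{kmer} => {f}")
```

For a whole chromosome one would read the FASTA file record by record, skip the header lines that start with `>`, and count within each record so that windows do not span record boundaries.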

bw, bliako
