Re: doubt in storing a data of 2 lines in an array.

by johngg (Canon)
on Oct 30, 2006 at 14:30 UTC


in reply to doubt in storing a data of 2 lines in an array.

davorg's reply suggested that one way to approach the problem would be to read the whole file into memory and then parse it with regular expressions. The following script shows one possible way of doing this in two stages: the first breaks the text into records and the second breaks each record into fields. Here it is:

use strict;
use warnings;

# Stage 1: a record runs from ENTRY up to the next ENTRY or end of text.
my $rxRecord = qr
    {(?xs)
        (ENTRY.*?\n)
        (?=ENTRY|\z)
    };

# Stage 2: a field runs from one header up to the next header or end of record.
my $rxFieldHdrs = qr{(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)};
my $rxField = qr
    {(?xs)
        ($rxFieldHdrs.*?\n)
        (?=$rxFieldHdrs|\z)
    };

# Slurp the whole of the data into one string.
my $fileText;
{
    local $/;
    $fileText = <DATA>;
}

my @records = $fileText =~ m{$rxRecord}g;
foreach my $record (@records) {
    print qq{$record}, q{+} x 50, qq{\n};
    my @fields = $record =~ m{$rxField}g;
    foreach my $field (@fields) {
        print qq{$field}, q{-} x 50, qq{\n};
    }
    print q{*} x 50, qq{\n};
}

__END__
ENTRY CCHU #type complete
TITLE cytochrome c [validated] - human
      Homo sapiens
ORGANISM #formal_name Homo sapiens #common_name man
ACCESSIONS A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
ENTRY CCCZ #type complete
TITLE cytochrome c - chimpanzee (tentative sequence)
ORGANISM #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
ENTRY CCMQR #type complete
TITLE cytochrome c - rhesus macaque (tentative sequence)
      Macaca mulatta
ORGANISM #formal_name Macaca mulatta #common_name rhesus macaque
ACCESSIONS A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
ENTRY CCMKP #type complete
TITLE cytochrome c - spider monkey
ORGANISM #formal_name Ateles sp. #common_name spider monkey
ACCESSIONS A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR

And here is the output, showing for each record the whole record and then each individual field. As you can see, your two-line title is preserved.

ENTRY CCHU #type complete
TITLE cytochrome c [validated] - human
      Homo sapiens
ORGANISM #formal_name Homo sapiens #common_name man
ACCESSIONS A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY CCHU #type complete
--------------------------------------------------
TITLE cytochrome c [validated] - human
      Homo sapiens
--------------------------------------------------
ORGANISM #formal_name Homo sapiens #common_name man
--------------------------------------------------
ACCESSIONS A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKNKGIIWGEDTLMEYLENPKKYIP
--------------------------------------------------
**************************************************
ENTRY CCCZ #type complete
TITLE cytochrome c - chimpanzee (tentative sequence)
ORGANISM #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY CCCZ #type complete
--------------------------------------------------
TITLE cytochrome c - chimpanzee (tentative sequence)
--------------------------------------------------
ORGANISM #formal_name Pan troglodytes #common_name chimpanzee
--------------------------------------------------
ACCESSIONS A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
--------------------------------------------------
**************************************************
ENTRY CCMQR #type complete
TITLE cytochrome c - rhesus macaque (tentative sequence)
      Macaca mulatta
ORGANISM #formal_name Macaca mulatta #common_name rhesus macaque
ACCESSIONS A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY CCMQR #type complete
--------------------------------------------------
TITLE cytochrome c - rhesus macaque (tentative sequence)
      Macaca mulatta
--------------------------------------------------
ORGANISM #formal_name Macaca mulatta #common_name rhesus macaque
--------------------------------------------------
ACCESSIONS A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARKTGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
--------------------------------------------------
**************************************************
ENTRY CCMKP #type complete
TITLE cytochrome c - spider monkey
ORGANISM #formal_name Ateles sp. #common_name spider monkey
ACCESSIONS A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY CCMKP #type complete
--------------------------------------------------
TITLE cytochrome c - spider monkey
--------------------------------------------------
ORGANISM #formal_name Ateles sp. #common_name spider monkey
--------------------------------------------------
ACCESSIONS A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR
--------------------------------------------------
**************************************************
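
If you would rather keep the fields than print them, one possibility is to store each one in a hash keyed by its header. What follows is only a rough sketch under some assumptions: the %fieldFor hash and the inline $record string are names made up for illustration, and the record is the chimpanzee entry from the data above with single-space separators.

```perl
use strict;
use warnings;

# Same field patterns as in the script above.
my $rxFieldHdrs = qr{(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)};
my $rxField     = qr{(?xs) ($rxFieldHdrs.*?\n) (?=$rxFieldHdrs|\z) };

# One record, inlined here so the sketch runs on its own.
my $record = <<'END_RECORD';
ENTRY CCCZ #type complete
TITLE cytochrome c - chimpanzee (tentative sequence)
ORGANISM #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAANKNKGIIWGED
END_RECORD

# %fieldFor is a hypothetical name: header => rest of the field.
my %fieldFor;
foreach my $field ($record =~ m{$rxField}g) {
    # Split the header off from whatever follows it.
    my ($hdr, $content) = $field =~ m{\A($rxFieldHdrs)\s+(.*)\z}s;
    $content =~ s{\s+\z}{};    # drop the trailing newline
    $fieldFor{$hdr} = $content;
}

print qq{$_ => $fieldFor{$_}\n} for sort keys %fieldFor;
```

Note that the sequence line ends up inside the ACCESSIONS value, just as it does in the output above, because it follows that header and has no header of its own.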

I hope this is of use.

Cheers,

JohnGG

Re^2: doubt in storing a data of 2 lines in an array.
by Anonymous Monk on Oct 30, 2006 at 14:53 UTC
    Hi John, thank you for the reply. The program works very well, and the coding was really smart. I just need to clarify one last doubt: I am not able to print the TITLE's content on the same line; it prints on 2 lines whatever I do. Please reply. Thank you once again.
      You could either do a global substitution, something like $field =~ s{\n}{ }g, to replace every newline with a space, or you could achieve the same thing with split and join, something like $field = join q{ }, split m{\n}, $field;. In each case you will still have to handle a big gap in your line caused by the indentation of the second line of the title. However, this post should give you enough clues about s{this}{the other} to solve that for yourself. Big hint: \s+ means one or more white-space characters.
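
      For anyone who wants to check their answer, here is a minimal sketch of the two suggestions plus one way of following the \s+ hint. The sample $field string and its six-space indentation are assumptions for illustration, as are the $subst, $joined and $flat names.

```perl
use strict;
use warnings;

# A two-line TITLE field as the field regex would capture it; the
# indentation of the second line is assumed for illustration.
my $field = "TITLE cytochrome c [validated] - human\n      Homo sapiens\n";

# Suggestion 1: global substitution, every newline becomes a space.
# The indentation survives, so the big gap is still there.
(my $subst = $field) =~ s{\n}{ }g;

# Suggestion 2: split on newlines and join with a single space.
# The gap is still there for the same reason.
my $joined = join q{ }, split m{\n}, $field;

# Following the hint: collapse every run of white space (newlines
# included) to one space, then trim what the final newline left behind.
(my $flat = $field) =~ s{\s+}{ }g;
$flat =~ s{\s+\z}{};

print qq{$flat\n};
```

      The only difference between the first two is at the end of the string: split discards the trailing empty piece, so the joined version has no trailing space.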

      Best of luck,

      JohnGG

        hey john, i got it..thanks.
