Re: doubt in storing a data of 2 lines in an array.

davorg's reply suggested that one way to approach the problem is to read the whole file into memory and then parse it with regular expressions. The following script shows one possible way of doing this in two stages: the first breaks the text into records and the second breaks each record into fields. Here it is:

use strict;
use warnings;

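# Stage 1: a record runs from "ENTRY" up to the newline that precedes
# the next "ENTRY" or the end of the text.
my $rxRecord = qr
   {(?xs)
       (ENTRY.*?\n)
       (?=ENTRY|\z)
   };

# Stage 2: a field runs from one header up to the newline that precedes
# the next header or the end of the record.
my $rxFieldHdrs = qr{(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)};
my $rxField = qr
   {(?xs)
       ($rxFieldHdrs.*?\n)
       (?=$rxFieldHdrs|\z)
   };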

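# Slurp the whole of the DATA section into one string by temporarily
# undefining the input record separator.
my $fileText;
{
    local $/;
    $fileText = <DATA>;
}

# In list context a global match returns every captured record.
my @records = $fileText =~ m{$rxRecord}g;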

foreach my $record (@records)
{
    print qq{$record}, q{+} x 50, qq{\n};
    my @fields = $record =~ m{$rxField}g;
    foreach my $field (@fields)
    {
        print qq{$field}, q{-} x 50, qq{\n};
    }
    print q{*} x 50, qq{\n};
}

__END__
ENTRY            CCHU       #type complete
TITLE            cytochrome c [validated] - human
                 Homo sapiens
ORGANISM         #formal_name Homo sapiens #common_name man
ACCESSIONS       A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKN
+KGIIWGEDTLMEYLENPKKYIP
ENTRY            CCCZ       #type complete
TITLE            cytochrome c - chimpanzee (tentative sequence)
ORGANISM         #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS       A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAAN
+KNKGIIWGED
ENTRY            CCMQR      #type complete
TITLE            cytochrome c - rhesus macaque (tentative sequence)
                 Macaca mulatta 
ORGANISM         #formal_name Macaca mulatta #common_name rhesus macaq
+ue
ACCESSIONS       A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARK
+TGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
ENTRY            CCMKP      #type complete
TITLE            cytochrome c - spider monkey
ORGANISM         #formal_name Ateles sp. #common_name spider monkey
ACCESSIONS       A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR

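And here is the output, showing for each record the whole record followed by each individual field. As you can see, your two-line title is preserved. Note that the sequence lines stay attached to the ACCESSIONS field because they have no header of their own.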

ENTRY            CCHU       #type complete
TITLE            cytochrome c [validated] - human
                 Homo sapiens
ORGANISM         #formal_name Homo sapiens #common_name man
ACCESSIONS       A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKN
+KGIIWGEDTLMEYLENPKKYIP
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY            CCHU       #type complete
--------------------------------------------------
TITLE            cytochrome c [validated] - human
                 Homo sapiens
--------------------------------------------------
ORGANISM         #formal_name Homo sapiens #common_name man
--------------------------------------------------
ACCESSIONS       A31764; A05676; I55192; A00001
MGDVEKGKKIFIMKCSQCHTVEMGDVEKGGKHKTGPNLHGMIYARAJLFGRKTSEKGQAPGYSYTAANKN
+KGIIWGEDTLMEYLENPKKYIP
--------------------------------------------------
**************************************************
ENTRY            CCCZ       #type complete
TITLE            cytochrome c - chimpanzee (tentative sequence)
ORGANISM         #formal_name Pan troglodytes #common_name chimpanzee
ACCESSIONS       A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAAN
+KNKGIIWGED
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY            CCCZ       #type complete
--------------------------------------------------
TITLE            cytochrome c - chimpanzee (tentative sequence)
--------------------------------------------------
ORGANISM         #formal_name Pan troglodytes #common_name chimpanzee
--------------------------------------------------
ACCESSIONS       A00002
GDVEKGKKIFIMKCSQCHTSEKVEKGSSSKHKSSSTGPNLHGLMIYARAJFGRKTGSEKQAPGYSYTAAN
+KNKGIIWGED
--------------------------------------------------
**************************************************
ENTRY            CCMQR      #type complete
TITLE            cytochrome c - rhesus macaque (tentative sequence)
                 Macaca mulatta 
ORGANISM         #formal_name Macaca mulatta #common_name rhesus macaq
+ue
ACCESSIONS       A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARK
+TGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY            CCMQR      #type complete
--------------------------------------------------
TITLE            cytochrome c - rhesus macaque (tentative sequence)
                 Macaca mulatta 
--------------------------------------------------
ORGANISM         #formal_name Macaca mulatta #common_name rhesus macaq
+ue
--------------------------------------------------
ACCESSIONS       A00003
GDVEKGKKIFIMKCSQSEKCHTVEKGGSSSSKHKTGPNLHGSSEKEMIYARAJKSEKLFGAAAAAAAARK
+TGQAPGYSYTAANKSSSSNKGITWGEDTLMEYLENPKKYIPGTKMIFVGIKKKEE
--------------------------------------------------
**************************************************
ENTRY            CCMKP      #type complete
TITLE            cytochrome c - spider monkey
ORGANISM         #formal_name Ateles sp. #common_name spider monkey
ACCESSIONS       A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR
++++++++++++++++++++++++++++++++++++++++++++++++++
ENTRY            CCMKP      #type complete
--------------------------------------------------
TITLE            cytochrome c - spider monkey
--------------------------------------------------
ORGANISM         #formal_name Ateles sp. #common_name spider monkey
--------------------------------------------------
ACCESSIONS       A00004
GDVFKGKRIFIMKCSQCHTVESSSSKGGKHKTGPNLHGLMIYARAJSEKFGSSSSSSSSSSR
--------------------------------------------------
**************************************************
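
If you would rather look fields up by name than walk them in order, the same second-stage idea can fill a hash instead of an array. This is only a sketch under assumptions: `record_to_hash` and `$rhFields` are names made up for illustration, the header list is copied from the script above, and the sequence lines still end up inside the ACCESSIONS value.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original script: split one record
# into a hash keyed by field header. The header list is copied from the
# script above.
sub record_to_hash
{
    my ($record) = @_;
    my $rxFieldHdrs = qr{(?:ENTRY|TITLE|ORGANISM|ACCESSIONS)};
    my %fields;

    # Capture the header name, skip the padding, then take everything up
    # to the next header (or the end of the record) as the field's value.
    while ( $record =~ m{^($rxFieldHdrs)[ \t]+(.*?\n)(?=$rxFieldHdrs|\z)}msg )
    {
        $fields{$1} = $2;
    }
    return \%fields;
}

# One record from the sample data, without its sequence lines.
my $record = join q{},
    qq{ENTRY            CCHU       #type complete\n},
    qq{TITLE            cytochrome c [validated] - human\n},
    qq{                 Homo sapiens\n},
    qq{ORGANISM         #formal_name Homo sapiens #common_name man\n},
    qq{ACCESSIONS       A31764; A05676; I55192; A00001\n};

my $rhFields = record_to_hash($record);

# The TITLE value still holds both of its lines.
print $rhFields->{TITLE};
```

Each value keeps its embedded newlines, so a two-line title comes back as one string holding both lines.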

I hope this is of use.

Cheers,

JohnGG


Re^2: doubt in storing a data of 2 lines in an array. by Anonymous Monk on Oct 30, 2006 at 14:53 UTC
Hi John, thank you for the reply. The program works very well, and the coding was really smart. I just need to clarify one last doubt of mine: I am not able to print the TITLE's content on the same line; it prints on 2 lines whatever I do. Please reply. Thank you once again.
Re^3: doubt in storing a data of 2 lines in an array. by johngg (Canon) on Oct 30, 2006 at 15:09 UTC
You could either do a global substitution, something like `$field =~ s{\n}{ }g`, to replace every newline with a space, or you could achieve the same thing with `split` and `join`, something like `$field = join q{ }, split m{\n}, $field;`. In each case you will have to handle a big gap in your line caused by the indentation of the second line of the title. However, this post should give you enough clues about `s{this}{the other}` to solve that for yourself. Big hint: `\s+` means one or more white-space characters. Best of luck, JohnGG
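
For anyone reading later, here is one way those hints might fit together. Treat it as a sketch, not the only answer: `$oneLine` and `$viaSplit` are names invented for this example, and the sample `$field` is a two-line TITLE field as captured by the script earlier in the thread.

```perl
use strict;
use warnings;

# A two-line TITLE field as captured by the script earlier in the thread.
my $field = qq{TITLE            cytochrome c [validated] - human\n}
          . qq{                 Homo sapiens\n};

# Substitution approach: swap each newline together with the indentation
# that follows it for a single space, then trim trailing white-space.
( my $oneLine = $field ) =~ s{\n\s+}{ }g;
$oneLine =~ s{\s+\z}{};

# split/join approach: break on a newline plus any following white-space
# and glue the pieces back together with single spaces.
my $viaSplit = join q{ }, split m{\n\s*}, $field;

print qq{$oneLine\n};
print qq{$viaSplit\n};
```

Both approaches give the same single line here. The padding between the TITLE header and its text is left alone; only the line break and the continuation indent are collapsed.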
Re^4: doubt in storing a data of 2 lines in an array. by Anonymous Monk on Oct 31, 2006 at 10:22 UTC
Hey John, I got it. Thanks.

