Hi
I am not familiar with the OBO data set, but I looked through your data and thought I could share some technical input on it.
Your requirement is clear: you want to build a data structure, such as a hash of arrays, from which you can readily extract your data by the id key.
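To make the target shape concrete, here is a minimal sketch of a hash of arrays with a lookup by id. The ids and names are copied from the output at the end of this post purely for illustration; `%is_a` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hash of arrays: each id key maps to a reference to an anonymous array.
# Sample values are taken from the script output, for illustration only.
my %is_a = (
    'HP:0000007' => [ 'Mode of inheritance' ],
    'HP:0000008' => [ 'Abnormal internal genitalia',
                      'Abnormality of the female genitalia' ],
);

# Extract all values for one id by dereferencing its array reference;
# exists() avoids creating an empty entry for an unknown id
my $id    = 'HP:0000008';
my @names = exists $is_a{$id} ? @{ $is_a{$id} } : ();
print "$id: $_\n" for @names;

# A single element uses the arrow syntax
print "first parent of $id: $is_a{$id}->[0]\n";
```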
However, your data set has some interesting features. I hope the OBO parser solves your problems, because code written from scratch for data sets like this is hardly extensible.
In any case, I was able to structure your data set using two delimiters: the first space on each line, and the '!' character. On reflection it might not be a great approach, but then again, that is your data set :D
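Here is a small sketch of the two-delimiter idea applied to a single line. The line itself is hypothetical (the parent id `HP:0000005` is made up); it only assumes the `is_a: <id> ! <name>` layout. Note that a plain `split(/!/)` would leave a leading space on the name, so this version absorbs the spaces around the `!`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical is_a line in the layout "is_a: <parent id> ! <parent name>"
my $line = 'is_a: HP:0000005 ! Mode of inheritance';

# 1. Split on the first space only (limit 2): the tag, then everything else
my ($tag, $rest) = split / /, $line, 2;

# 2. Split the remainder on '!', swallowing the whitespace around it
my ($parent_id, $parent_name) = split /\s*!\s*/, $rest, 2;

print "tag=[$tag] id=[$parent_id] name=[$parent_name]\n";
```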
To do this in practice you need a good understanding of Perl's data structures. Don't worry, though: the docs will get you there. Start with perldata and perlreftut.
The key features I needed for the extraction were anonymous arrays and references.
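A minimal sketch of those two features: `[ ... ]` builds an anonymous array and returns a reference, and pushing through `@{ $hash{$id} }` lets Perl autovivify the array on first use. That means you do not need to copy the array into a temporary and store a fresh reference each time a value is added. The ids and names are sample values for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash;

# [ ... ] creates an anonymous array and returns a reference to it
$hash{'HP:0000007'} = [ 'Mode of inheritance' ];

# Pushing through a dereference: if $hash{$id} does not exist yet,
# Perl autovivifies a new anonymous array, so no copying is needed
my $id = 'HP:0000008';
push @{ $hash{$id} }, 'Abnormal internal genitalia';
push @{ $hash{$id} }, 'Abnormality of the female genitalia';

# ref() reports "ARRAY" for each value, confirming it is an array reference
for my $key (sort keys %hash) {
    printf "%s (%s): %s\n", $key, ref $hash{$key}, join(', ', @{ $hash{$key} });
}
```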
Even though the code below works, I suspect it will be hard to extend if your requirements change. If you have to analyse a data set of a million or so records, I think it is best to use a standard module (such as OBO::Parser).
Note: I saved the data from your original post (OP) to a file and passed it as an argument to the script below.
#!/usr/bin/perl
use strict;
use warnings;

# Maps each term id to a reference to an anonymous array of its is_a names
my %hash;
my $hash_id;

die "usage: $0 FILE\n" unless @ARGV;
open(my $fh, "<", $ARGV[0])
    or die "$0: can't open $ARGV[0] for reading: $!";

LINE: while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\s+$//;    # drop trailing whitespace (also stray \r)
    # skip blank lines and the Term stanza header
    next LINE if $line eq "" || $line =~ /^\[?Term\]?$/;
    # split on the first blank space only: the tag, then the rest of the line
    my @TermRow = split(/ /, $line, 2);
    next LINE unless defined $TermRow[1];
    if ($TermRow[0] eq 'id:') {
        $hash_id = $TermRow[1];
    }
    elsif ($TermRow[0] eq 'is_a:' and defined $hash_id) {
        # "HP:0000005 ! Some name" -> parent id before '!', name after it
        my ($parent_id, $parent_name) = split(/\s*!\s*/, $TermRow[1], 2);
        # push through the array reference; Perl creates the anonymous
        # array automatically the first time an id gets an is_a value
        push @{ $hash{$hash_id} }, defined $parent_name ? $parent_name : $parent_id;
    }
}
close($fh);

print "Result of Extraction:\n";
foreach my $id (sort keys %hash) {
    print "key : $id\n";
    print "list of values\n";
    foreach my $value (@{ $hash{$id} }) {
        print $value, "\n";
    }
    print "\n";
}
Output
XXXXXX:progs$ perl term_reader.pl ./term.txt
Result of Extraction:
key : HP:0000007
list of values
Mode of inheritance

key : HP:0000008
list of values
Abnormal internal genitalia
Abnormality of the female genitalia
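If you want to see exactly what structure was built, the core Data::Dumper module can print it. This is a sketch only: the `%hash` below is a hypothetical stand-in filled with the values from the output, and `Sortkeys`/`Indent` are set just to make the dump stable and compact.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for the %hash the script builds (values copied from its output)
my %hash = (
    'HP:0000007' => [ 'Mode of inheritance' ],
    'HP:0000008' => [ 'Abnormal internal genitalia',
                      'Abnormality of the female genitalia' ],
);

# Sort keys so the dump is deterministic; Indent = 1 gives compact nesting
local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Indent   = 1;
my $dump = Dumper(\%hash);
print $dump;
```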
The Great Programmer is one who inspires others to code, not just one who writes great code