[Bioperl-l] Entrez Gene ASN.1 solution
Mark Lambrecht
mark_lambrecht at yahoo.com
Wed Apr 13 05:09:49 EDT 2005
We have developed our own interface to the NCBI
Entrez Gene ASN.1 flat files. We needed this
internally to replace the bioperl LocusLink parser.
Because we have used so much great bioperl code over
the last years, we hope that people can benefit
from our work. This system has already proven its
value, at least for us.
The module consists of the following objects:
=> Bio::_GeneData.pm : abstract engine for parsing "type blocks"
   within the NCBI ASN.1 files
=> Bio::Gene.pm : Entrez Gene object (replaces the Bioperl sequence
   object that is normally returned by an IO object); it keeps only
   relevant data and can easily be extended to map additional data
   using the GeneData engine
=> Bio::GeneIO.pm : iterator derived from RootIO (similar to the
   SeqIO objects); implements the next_gene method
Subdirectory Index, with:
=> Bio::Index::EntrezGene.pm : object that can index and consult
   an ASN.1 file; inherits from Bio::Index::Abstract
Test scripts will be committed too:
=> a few small test records (with extension asn1)
=> t_gene_indexer.pl : test script that indexes an ASN.1 file and
   returns an example record
   # example:
   my $file = "gene_hs.asn1";
   my $inx = Bio::Index::EntrezGene->new(
       '-filename'   => $file . ".inx",
       '-write_flag' => 'WRITE',
   );
   $inx->make_index("/usr/local/datasets/ncbi/gene/$file");
=> testGene.pl : tests that a Gene object returns the appropriate
   data fields
   # example that extracts only the track info from the asn1 file;
   # this is a dynamic way of choosing which data to parse
   # (assumes $gene already holds a Gene object)
   use Data::Dumper;
   my $track_info = Bio::Gene::GeneTrack->new;
   $track_info->geneid(1);
   $gene->type('test_type');
   $gene->track_info($track_info);
   print "dump:\n" . Dumper($gene) . "\n";
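For readers unfamiliar with flat-file indexing, the general idea behind an indexer of this kind can be sketched roughly as follows. This is not the code of Bio::Index::EntrezGene; it is a minimal, self-contained illustration which assumes that each record starts with a line beginning "Entrezgene ::= {" and carries its identifier as "geneid <number>". The toy data, the %index hash and the fetch_record helper are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Toy stand-in for an Entrez Gene ASN.1 text file (heavily abbreviated,
# made-up records; real records contain far more fields).
my $data = join '',
    "Entrezgene ::= {\n  track-info {\n    geneid 1\n  }\n}\n",
    "Entrezgene ::= {\n  track-info {\n    geneid 2\n  }\n}\n";

# Pass 1: note the character offset at which each record starts.
my @starts;
while ($data =~ /^Entrezgene ::= \{/mg) {
    push @starts, $-[0];    # $-[0] is the start offset of the last match
}

# Pass 2: map each geneid to the [offset, length] of its record.
my %index;
for my $i (0 .. $#starts) {
    my $end    = $i < $#starts ? $starts[$i + 1] : length $data;
    my $length = $end - $starts[$i];
    my $record = substr $data, $starts[$i], $length;
    my ($geneid) = $record =~ /\bgeneid\s+(\d+)/ or next;
    $index{$geneid} = [ $starts[$i], $length ];
}

# Look up one record by geneid without rescanning the data.
sub fetch_record {
    my ($geneid) = @_;
    my $entry = $index{$geneid} or return;
    return substr $data, $entry->[0], $entry->[1];
}

print fetch_record(2);
```

A real indexer would store the offsets on disk and seek into the file instead of holding the data in memory; the sketch only shows the lookup principle.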
Stefan Kirov and Mingyi Liu have produced similar
solutions (which we didn't test); we believe that ours
is different because it is an all-in-one lightweight
Entrez Gene ASN.1 parser that captures only
essential data (thereby making it rather fast). We
deliberately chose not to map the data onto a Seq
object. At the same time, a bioperl-compliant indexer
has been written.
We hope that this code can somehow be useful.
We will commit the code to bioperl cvs if people
agree, as soon as we obtain a login.
Kris Ulens (bioinformatics software developer)
Mark Lambrecht (scientist bioinformatics)
Galapagos Genomics
http://www.galapagosgenomics.com