[Bioperl-l] Entrez Gene ASN.1 solution

Mark Lambrecht mark_lambrecht at yahoo.com
Wed Apr 13 05:09:49 EDT 2005


We have developed our own interface to the NCBI
Entrez Gene ASN.1 flat files. We needed this
internally to replace the bioperl LocusLink parser.
Because we have used so much great bioperl code over
the last years, we hope that others can benefit
from our work. The system has already proven its
value, at least for us.

The module consists of the following objects:

     => Bio::_GeneData.pm : abstract engine for parsing "type blocks"
        within the NCBI ASN.1 files
     => Bio::Gene.pm : Entrez Gene object (replaces the Bioperl sequence
        object that is normally returned by an IO object) and only keeps
        relevant data; can easily be extended to map additional needed
        data using the GeneData engine
     => Bio::GeneIO.pm : iterator derived from RootIO (similar to the
        SeqIO objects); implements next_gene method.

     subdirectory Index with
        => Bio::Index::EntrezGene.pm : object with capability to index
           and consult an ASN.1 file, inherits from Bio::Index::Abstract
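
To illustrate the iterator idea behind next_gene, here is a minimal
sketch. It is not the Bio::GeneIO code: the package name Sketch::GeneIO,
its internals, and the tiny hand-written sample data are all assumptions
made for illustration, and the records are returned as raw text rather
than as Gene objects. It only assumes that each record starts with a
line beginning "Entrezgene ::= {".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the iterator idea; NOT the actual Bio::GeneIO.
package Sketch::GeneIO;

sub new {
    my ($class, %args) = @_;
    my $file = $args{'-file'};
    open my $fh, '<', $file or die "Cannot open input: $!";
    return bless { fh => $fh, pending => undef }, $class;
}

# Return the text of the next record, or undef at end of input.
sub next_gene {
    my ($self) = @_;
    my $fh = $self->{fh};
    # Start with the record header saved by the previous call, if any.
    my $record = delete $self->{pending};
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^Entrezgene ::= \{/) {
            if (defined $record) {
                # Next record begins: keep its header for the next call.
                $self->{pending} = $line;
                return $record;
            }
            $record = $line;
        }
        elsif (defined $record) {
            $record .= $line;
        }
    }
    return $record;
}

package main;

# Simplified, hand-written sample data (not a real NCBI record).
my $data = <<'END';
Entrezgene ::= {
  track-info { geneid 1 }
}
Entrezgene ::= {
  track-info { geneid 2 }
}
END

# A reference to a scalar opens an in-memory file.
my $io = Sketch::GeneIO->new('-file' => \$data);
my @ids;
while (defined(my $rec = $io->next_gene)) {
    push @ids, $1 if $rec =~ /geneid (\d+)/;
}
print "gene ids: @ids\n";
```

Reading one record per call keeps memory use flat regardless of the
size of the ASN.1 file, which is the same reason the SeqIO objects
hand out one sequence at a time.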

     test scripts will be committed too :
     => a few small test records (with extension asn1)
     => t_gene_indexer.pl : test file to index an ASN.1 file and return
        an example record

        #example:
        my $file = "gene_hs.asn1";

        my $inx = Bio::Index::EntrezGene->new(
            '-filename'   => $file . ".inx",
            '-write_flag' => 'WRITE',
        );
        $inx->make_index("/usr/local/datasets/ncbi/gene/$file");

     => testGene.pl : tests a Gene object for return of the appropriate
        data fields

        #example for only extracting track info from the asn1 file;
        #this is a dynamic way of choosing which data to parse
        use Data::Dumper;

        # $gene is assumed to be an existing Gene object
        my $track_info = Bio::Gene::GeneTrack->new;

        $track_info->geneid(1);
        $gene->type('test_type');
        $gene->track_info($track_info);
        print "dump:\n" . Dumper($gene) . "\n";
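
The indexing idea mentioned above (record where each gene starts so
that a single record can be fetched without rescanning the file) can be
sketched as follows. This is not the Bio::Index::EntrezGene code: the
names %offset_of and fetch_record and the sample data are hypothetical,
and the offsets live in a plain in-memory hash here rather than in an
index file on disk.

```perl
use strict;
use warnings;

# Simplified, hand-written sample data (not a real NCBI record).
my $data = <<'END';
Entrezgene ::= {
  track-info { geneid 1 }
}
Entrezgene ::= {
  track-info { geneid 2 }
}
END

# A reference to a scalar opens an in-memory file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Pass 1: remember the byte offset at which each record starts,
# keyed by gene ID.
my (%offset_of, $start);
while (1) {
    my $pos  = tell $fh;
    my $line = <$fh>;
    last unless defined $line;
    $start = $pos if $line =~ /^Entrezgene ::= \{/;
    $offset_of{$1} = $start
        if defined $start && $line =~ /geneid (\d+)/;
}

# Lookup: seek straight to a record and read until the next one begins.
sub fetch_record {
    my ($id) = @_;
    return unless exists $offset_of{$id};
    seek $fh, $offset_of{$id}, 0 or die "seek failed: $!";
    my $record = <$fh>;
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^Entrezgene ::= \{/;
        $record .= $line;
    }
    return $record;
}

my $rec = fetch_record(2);
print $rec;
```

The lookup cost depends only on the size of the requested record, not
on its position in the file.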

Stefan Kirov and Mingyi Liu have produced similar
solutions (which we didn't test); we believe that ours
is different because it is an all-in-one lightweight
Entrez Gene ASN.1 parser that captures only
essential data (thereby making it rather fast). We
deliberately chose not to map the data onto a Seq
object. At the same time, a bioperl-compliant indexer
has been written.
We hope that this code can somehow be useful.

We will commit the code to bioperl cvs if people
agree, as soon as we obtain a login.

 Kris Ulens (bioinformatics software developer)
 Mark Lambrecht (scientist bioinformatics)

Galapagos Genomics
http://www.galapagosgenomics.com


		

