[Bioperl-l] Entrez Gene solution : code available

Mark Lambrecht mark_lambrecht at yahoo.com
Mon Apr 25 05:45:27 EDT 2005


The mail below was sent to the bioperl list on April
13th. So that people can use and test the code,
we have decided to make it available.

You can download the code from the following link:

http://users.pandora.be/akasha/BioPerl.tar.gz

Kris Ulens
  bioinformatics software developer, Galapagos Genomics
  tel.: + 32 (0) 486 683 532
  e-mail: fantom at earthling.net

Mark Lambrecht, PhD
  scientist bioinformatics, Galapagos Genomics
  and K.U.Leuven, Faculty of Applied Bioscience and Engineering
  tel.: + 32 (0) 495 944 125
  e-mail: mark_lambrecht at yahoo.com


=====================================
Mail to the bioperl-l list, April 13th:
We have developed our own interface to the NCBI
Entrez Gene ASN.1 flat files. We needed this
internally to replace the bioperl LocusLink parser.
Because we have used so much great bioperl code over
the last years, we hope that people can benefit
from our work. This system has already proven its
value, at least for us.

The module consists of the following objects:

     => Bio::_GeneData.pm : abstract engine for parsing "type blocks"
        within the NCBI ASN.1 files
     => Bio::Gene.pm : Entrez Gene object (replaces the Bioperl sequence
        object that is normally returned by an IO object); it keeps only
        relevant data and can easily be extended to map additional data
        using the GeneData engine
     => Bio::GeneIO.pm : iterator derived from RootIO (similar to the
        SeqIO objects); implements the next_gene method

     subdirectory Index with
        => Bio::Index::EntrezGene.pm : object that can index and
           consult an ASN.1 file; inherits from Bio::Index::Abstract

     test scripts will be committed too:
     => a few small test records (with extension asn1)
     => t_gene_indexer.pl : test script that indexes an ASN.1 file and
        returns an example record

        # example:
        my $file = "gene_hs.asn1";

        my $inx = Bio::Index::EntrezGene->new(
            '-filename'   => $file . ".inx",
            '-write_flag' => 'WRITE',
        );
        $inx->make_index("/usr/local/datasets/ncbi/gene/$file");
     => testGene.pl : tests that a Gene object returns the
        appropriate data fields

        # example for extracting only the track info from the asn1 file;
        # this is a dynamic way of choosing which data to parse
        # ($gene is assumed to be an existing Bio::Gene object)
        use Data::Dumper;

        my $track_info = Bio::Gene::GeneTrack->new;

        $track_info->geneid(1);
        $gene->type('test_type');
        $gene->track_info($track_info);
        print "dump:\n" . Dumper($gene) . "\n";
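
As a rough illustration of what indexing and consulting an ASN.1 flat file can involve, here is a minimal sketch in plain Perl. It is not the Bio::Index::EntrezGene implementation. The helper names build_offset_index and fetch_record are hypothetical, and the sketch assumes that every record starts with a line beginning "Entrezgene ::= {" and that the first "geneid N" field inside a record is that record's own ID.

```perl
use strict;
use warnings;

# Hypothetical helper: map each gene ID to the byte offset of its record.
# Assumption: a record starts with a line beginning "Entrezgene ::= {".
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%offset_of, $record_start);
    my $pos = tell $fh;                 # offset of the line about to be read
    while (my $line = <$fh>) {
        if ($line =~ /^Entrezgene ::= \{/) {
            $record_start = $pos;       # remember where this record begins
        }
        elsif (defined $record_start && $line =~ /\bgeneid\s+(\d+)/) {
            $offset_of{$1} = $record_start;
            undef $record_start;        # ignore later geneid fields in this record
        }
        $pos = tell $fh;
    }
    close $fh;
    return \%offset_of;
}

# Hypothetical helper: read one record back, starting at a stored offset.
sub fetch_record {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my $record = <$fh>;                 # header line of the requested record
    while (my $line = <$fh>) {
        last if $line =~ /^Entrezgene ::= \{/;   # the next record begins here
        $record .= $line;
    }
    close $fh;
    return $record;
}
```

With such an index, a single record can be read back with one seek instead of rescanning the whole file, which is the general idea behind an indexed flat file.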

Stefan Kirov and Mingyi Liu have produced similar
solutions (which we didn't test); we believe that ours
is different because it is an all-in-one lightweight
Entrez Gene ASN.1 parser that captures only
essential data (which makes it rather fast). We
deliberately chose not to map the data onto a Seq
object. At the same time, a bioperl-compliant indexer
has been written.
We hope that this code can somehow be useful.
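
To illustrate the idea of capturing only essential data, here is a minimal sketch in plain Perl that pulls two fields out of one record's text and ignores everything else. It is not the code in the tarball. The function name extract_essentials, the choice of fields, and the assumed record layout (a "geneid N" field and a quoted "locus" field holding the gene symbol) are assumptions made for the example, and the sample record is hand-written.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only a few scalar fields from one record.
# Assumption: the record text contains 'geneid N' and 'locus "SYMBOL"'.
sub extract_essentials {
    my ($record) = @_;
    my %gene;
    # A list-context match returns the captured group, or nothing on failure,
    # so a missing field simply ends up undefined.
    ($gene{geneid}) = $record =~ /\bgeneid\s+(\d+)/;
    ($gene{symbol}) = $record =~ /\blocus\s+"([^"]*)"/;
    return \%gene;
}

# Hand-written sample record, for illustration only
my $sample = join "\n",
    'Entrezgene ::= {',
    '  track-info {',
    '    geneid 1 ,',
    '    status live',
    '  } ,',
    '  gene {',
    '    locus "A1BG" ,',
    '    desc "alpha-1-B glycoprotein"',
    '  }',
    '}';

my $essentials = extract_essentials($sample);
print "geneid=$essentials->{geneid} symbol=$essentials->{symbol}\n";
```

Skipping the rest of the record avoids building a full object tree for fields that are never read, which is one plausible reason a selective parser can be fast.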

We will commit the code to bioperl cvs if people
agree, as soon as we obtain a login.

 Kris Ulens (bioinformatics software developer)
 Mark Lambrecht (scientist bioinformatics)

Galapagos Genomics
http://www.galapagosgenomics.com

