[Bioperl-l] Entrez Gene ASN.1 solution

Mark Lambrecht mark_lambrecht at yahoo.com
Thu Apr 14 08:11:49 EDT 2005


Hi Stefan,

Thanks for your response.
I will contribute the code to cvs as soon as I obtain
a cvs account with write permissions.  Mingyi
suggested that the indexer (Bio::Index::EntrezGene.pm)
could also be used to return your Seq-derived object,
which I feel is a good idea.  In this way, your ASN.1
object will have an indexer attached to it.  The
indexer could then return either a Bio::Seq-derived
object or a Bio::Gene object.

The Bio::Gene object implements a number of methods
that return pieces of data, such as:
$gene->get_urls;
It is generic, though, and could easily be changed to
return different or more data from the Entrez Gene
ASN.1 record.  Each part of the ASN.1 is parsed into
separate small objects, such as Gene::GeneCommentary,
Gene::GeneTrack, ...
So the gene id is retrieved with
$gene->get_gene_id() or
$gene->get_genetrack->get_gene_id();
These getter methods are autoloaded by _GeneData.pm, so
if a new piece of data needs to be accessed, no new
method needs to be implemented.
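
For readers unfamiliar with the mechanism, autoloaded getters can be
sketched roughly as follows. This is only an illustration and not the
actual _GeneData.pm code: the package name My::GeneData, the
hash-based storage and the field names are assumptions made for the
example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an autoloading base class
# (NOT the real _GeneData.pm).
package My::GeneData;

our $AUTOLOAD;

# Store the supplied fields in a blessed hash (assumed layout).
sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

# Perl calls AUTOLOAD for any method that is not defined explicitly.
sub AUTOLOAD {
    my ($self) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;              # strip the package prefix
    return if $name eq 'DESTROY';   # never autoload the destructor

    # get_<field> returns the stored field if it exists.
    if ($name =~ /^get_(\w+)$/ && ref $self && exists $self->{$1}) {
        return $self->{$1};
    }
    die "No such method: $name\n";
}

package main;

my $track = My::GeneData->new(gene_id => 1);
my $gene  = My::GeneData->new(genetrack => $track, gene_id => 1);

print $gene->get_gene_id, "\n";
print $gene->get_genetrack->get_gene_id, "\n";
```

With this layout, a new field becomes readable through get_<field> as
soon as it is stored in the object; no extra method has to be written.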

Regards,
Mark


=============================================
 Kris Ulens 
  bioinformatics software developer
  tel.: 0032 (0) 486 683 532
  e-mail: fantom at earthling.net
  
 Mark Lambrecht, PhD
  K.U.Leuven, Faculty of Applied Bioscience 
              and Engineering 
  tel.: 0032 (0) 495 944 125
  e-mail: mark at lambrecht.com
          mark.lambrecht at biw.kuleuven.be

--- Stefan Kirov <skirov at utk.edu> wrote:
> Could you please post a description of the Entrez Gene object? I am
> also not very happy with creating a Bio::Seq object, as I don't think
> this object should be a "one size fits all" solution, so I am very
> curious to see what your design is.
> I find the indexing very useful for a particular group of people
> (actually we discussed this before and agreed it is a good idea).
> I think having two parsers for the same format is OK for bioperl, so
> I don't see any reason for your parser not to be in Bioperl.
> Stefan
> 
> Mark Lambrecht wrote:
> 
> >We have developed our own interface to the NCBI Entrez Gene ASN.1
> >flat files. We needed this internally to replace the bioperl
> >LocusLink parser. Because we have used so much great bioperl code
> >over the last years, we hope that people can benefit from our work.
> >This system has already proven its value, at least for us.
> >
> >The module consists of the following objects:
> >
> >     => Bio::_GeneData.pm : abstract engine for parsing "type blocks"
> >        within the NCBI ASN.1 files
> >     => Bio::Gene.pm : Entrez Gene object (replaces the Bioperl
> >        sequence object that is normally returned by an IO object)
> >        and only keeps relevant data, can easily be extended to map
> >        additional needed data using the GeneData engine
> >     => Bio::GeneIO.pm : iterator derived from RootIO (similar to the
> >        SeqIO objects); implements next_gene method.
> >
> >     subdirectory Index with
> >        => Bio::Index::EntrezGene.pm : object with capability to
> >           index and consult an ASN.1 file, inherits from
> >           Bio::Index::Abstract
> >
> >     test scripts will be committed too :
> >     => few small test records (with extension asn1)
> >     => t_gene_indexer.pl : test file to index asn.1 file and return
> >        an example record
> >
> >        #example:
> >        my $file = "gene_hs.asn1";
> >
> >        my $inx = Bio::Index::EntrezGene->new(
> >            '-filename'   => $file.".inx",
> >            '-write_flag' => 'WRITE');
> >
> >        $inx->make_index("/usr/local/datasets/ncbi/gene/$file");
> >     => testGene.pl : tests a Gene object for return of appropriate
> >        data fields
> >
> >        #example for only extracting track info from the asn1 file;
> >        #this is a dynamic way of choosing which data to parse
> >        use Data::Dumper;   # provides Dumper()
> >        my $track_info = Bio::Gene::GeneTrack->new;
> >
> >        #$gene is assumed to be a Bio::Gene object created earlier
> >        $track_info->geneid(1);
> >        $gene->type('test_type');
> >        $gene->track_info($track_info);
> >        print "dump:\n".Dumper($gene)."\n";
> >
> >Stefan Kirov and Mingyi Liu have produced similar solutions (which
> >we didn't test); we believe that ours is different because it is an
> >all-in-one lightweight Entrez Gene ASN.1 parser that will only
> >capture essential data (thereby making it rather fast). We
> >deliberately chose not to map the data onto a Seq object. At the
> >same time, a bioperl-compliant indexer has been written.
> >We hope that this code can somehow be useful.
> >
> >We will commit the code to bioperl cvs if people agree, as soon as
> >we obtain a login.
> >
> > Kris Ulens (bioinformatics software developer)
> > Mark Lambrecht (scientist bioinformatics)
> >
> >Galapagos Genomics
> >http://www.galapagosgenomics.com
> 
> -- 
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 5700 bldg, PO BOX 2008 MS6164
> Oak Ridge TN 37831-6164
> USA
> tel +865 576 5120
> fax +865-576-5332
> e-mail: skirov at utk.edu
> sao at ornl.gov
> 
> "And the wars go on with brainwashed pride
> For the love of God and our human rights
> And all these things are swept aside"
> 
> 



		