[Bioperl-l] Entrez Gene and bioperl-db

Peter Robinson Peter.Robinson at t-online.de
Mon Jan 3 17:40:01 EST 2005


Hi Bioperlers, hi Hilmar,

after some thinking I have embarked on a lex/yacc parser for the Entrez
Gene ASN.1 format as the path of least resistance, although I am not sure
how that would fit into BioPerl. If anyone is interested in this (or
has a better idea of how to go about it), please drop me a line.

In the meantime I have been looking at writing code to parse some of the
"easy" Entrez Gene files, starting off with gene_info. This file
includes the NCBI taxon id for each entry. I would like to convert this
to a Bio::Species object to pass as -species in the following call:
	my $seq = $self->sequence_factory->create(
			     -verbose          => $self->verbose(),
			     -accession_number => $geneID,
			     -desc             => $description,
			     -display_id       => $symbol,
			     -species          => ???,   # the Bio::Species for the taxon id
			     -annotation       => $ann);

I saw the Bio::Taxonomy::FactoryI code, which appears to be meant for
this sort of thing. However, that code is still pretty preliminary. Is
anyone working on it at the moment? Or is there a better way of doing
this? (It seems a shame not to provide the actual species name if one has
the taxid.)
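
For illustration, here is a minimal sketch of the taxid-to-name step that
does not depend on Bio::Taxonomy at all. It assumes a local copy of
names.dmp from the NCBI taxonomy dump (fields separated by tab, pipe, tab)
and assumes that gene_info has tax_id, GeneID and Symbol in the first three
columns and the description in the ninth. The sub names and the sample
records are made up for the example, so treat it as a starting point and
check the column positions against the real files.

```perl
use strict;
use warnings;

# Build a taxid => scientific name map from lines in names.dmp layout:
#   tax_id <TAB>|<TAB> name_txt <TAB>|<TAB> unique name <TAB>|<TAB> name class <TAB>|
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # strip the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line, -1;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Pick out of one gene_info line the fields needed for the create() call.
# ASSUMPTION: 0 = tax_id, 1 = GeneID, 2 = Symbol, 8 = description.
sub parse_gene_info_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;    # skip comment and blank lines
    my @f = split /\t/, $line, -1;
    return {
        taxid       => $f[0],
        gene_id     => $f[1],
        symbol      => $f[2],
        description => $f[8],
    };
}

# Small in-memory stand-ins for the real files (illustrative sample records).
my $names_dmp =
      "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    . "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n";
my $gene_info_line = join("\t",
    9606, 1, 'A1BG', '-', '-', '-', 19, '19q13.4', 'alpha-1-B glycoprotein') . "\n";

open my $names_fh, '<', \$names_dmp or die "cannot open in-memory names: $!";
my $name_for = load_scientific_names($names_fh);
close $names_fh;

my $gene = parse_gene_info_line($gene_info_line);
my $species_name = exists $name_for->{ $gene->{taxid} }
    ? $name_for->{ $gene->{taxid} }
    : 'unknown';

printf "%s (GeneID %s): %s [taxid %s]\n",
    $gene->{symbol}, $gene->{gene_id}, $species_name, $gene->{taxid};
```

The looked-up name could then be used to fill a Bio::Species (for example
through its classification and ncbi_taxid methods) before passing it as
-species. That step is left out here so that the sketch runs without
BioPerl installed.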

best

Peter



On Tue, 2004-12-28 at 07:17, Hilmar Lapp wrote:
> Great to hear that someone is giving this a shot. Yes, at this point it
> appears that NCBI is only offering the ASN.1, not a conversion to XML.
> Their asn2xml tool will not work with this ASN.1 format either, just  
> checked it to be sure. They do seem to be mulling the option of XML  
> though on the Gene FAQ. Maybe if enough people get in their ears they  
> will spend some effort towards that. After all, the entrez gene web  
> interface can display XML on demand - even though it looks fairly  
> hideous.
> 
> There is no ASN.1 support in bioperl at all. Also, ASN.1 support in
> perl is actually thin - all I could find is Convert::ASN1, at version
> 0.18 from two years ago ... doesn't make me feel warm and fuzzy.
> 
> In the absence of any XML available from NCBI, gene_info might be the  
> best start. An option could be to check for the presence of the other  
> tab-delimited files and use those that are present. These are  
> tab-delimited and hence the format itself is trivial so you can focus  
> entirely on setting up a Bio::Seq plus annotation that's  
> comparable/compatible to what the current SeqIO::locuslink does.
> 
> My $0.02 (worth less and less almost every day).
> 
> 	-hilmar
> 
> On Thursday, December 23, 2004, at 10:51  AM, Peter Robinson wrote:
> 
> > Hi,
> >
> > I have been thinking about giving a BioPerl EntrezGene parser a try since
> > I have been a heavy user of LocusLink to date. One issue is that the
> > files that correspond to LL_tmpl (which was a flat file) are now in ASN.1
> > format:
> > http://www.ncbi.nlm.nih.gov/entrez/query/static/help/genehelp.html#query
> > Although I saw some mention of ASN support in BioPerl by googling, I
> > can't seem to find any module that does this in the present
> > distribution. What is the status on that? In any case, I will be working
> > on this in the next month or two, and if anything nice comes of it I will
> > send it to you / BioPerl.
> >
> > best wishes & happy holidays
> >
> > Peter
> >
> > On Tue, 2004-12-14 at 09:00, Hilmar Lapp wrote:
> >> Since load_seqdatabase.pl will use bioperl's SeqIO parsers for parsing
> >> any input file, what you're asking is whether or not there is a SeqIO
> >> parser for NCBI Gene.
> >>
> >> The answer to that question is no, not yet. Anybody who feels motivated
> >> is welcome to give it a try ... Since I'll need it, I'll write the
> >> parser if nobody else does within the next 3 months, but I'm not going
> >> to promise when exactly this will happen.
> >>
> >> 	-hilmar
> >>
> >> On Monday, December 13, 2004, at 08:03  AM, Law, Annie wrote:
> >>
> >>> Hi,
> >>>
> >>> I was wondering, with regard to the bioperl-db scripts, schema, and
> >>> load_seqdatabase.pl: has there been any preparation for integrating
> >>> Entrez Gene information when LocusLink is phased out? Or, if this has
> >>> already been changed, could somebody point me to the documentation or
> >>> the changed code?
> >>>
> >>> Thanks,
> >>> Annie.
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at portal.open-bio.org
> >>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> > -- 
> > Peter N. Robinson
> > peter.robinson at t-online.de
> > peter.robinson at charite.de
> > http://www.charite.de/ch/medgen/robinson/
> >
> >
-- 
Peter N. Robinson
peter.robinson at t-online.de
peter.robinson at charite.de
http://www.charite.de/ch/medgen/robinson/


