[Bioperl-l] parsing entrezgene file (lost data)

Wed Jul 6 01:00:57 UTC 2011

On Jul 5, 2011, at 5:38 PM, Carnë Draug wrote:

> Well, I updated the bug report with what you found, thank you.
> 
> 2011/7/5 Smithies, Russell <Russell.Smithies at agresearch.co.nz>:
>> Bio::ASN1::EntrezGene is not the easiest to work with but you can access everything if you try hard enough.
>> I used it last year for transforming ASN.1 gene records from NCBI into fully annotated Wiki pages and it was very successful, though I got sick of typing so many curly brackets ;-)
> 
> You mean I should access the data "manually" rather than using
> methods? It will have to do for now, although that's kind of the
> opposite of what objects are meant for (I think, I'm no programmer).
> 
> My plan is to make an application that can be reused by other people,
> which is why I'm trying to do it in a nice maintainable way without too
> many hacks, and why I can't just parse the gene2refseq file.
> 
> Since what I want is to get the transcripts and proteins given a gene
> UID, I can see two options.
>  1 - parse the ASN1 file and access the data 'manually' until this is
> fixed (and then fix the code to use the methods)
>  2 - use elink from EUtilities. But since it fails around half the
> time, I'd have to check if it's a pseudo gene first. If it's not, it
> should link to at least one place in the nucleotide database, so I'd
> wrap the connection in an eval block and retry until an id is returned.
> 
> I think I'll go for the first option but opinions are welcome.
> 
> Carnë

I assume the problem is that not every piece of data has been mapped to a relevant BioPerl class so that it can be stored in the Bio::Seq object, hence the lack of method support for it.  Bio::ASN1::EntrezGene itself is a fairly generic parser (as Russell points out, the data is there, it just hasn't been mapped to objects).  Maybe someone with a bit more experience with this parser can chip in?
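
For what it's worth, the retry idea in the second option quoted above could look roughly like the sketch below. It is written in Python purely for illustration (in Perl the try/except would be the eval block), and fetch_linked_ids is a hypothetical stand-in for whatever actually performs the elink request, not a real EUtilities call. The attempt cap and the delay are also my assumptions: a pseudogene may never return a nucleotide link, so an unbounded loop seems risky.

```python
import time


def retry_until_ids(fetch_linked_ids, gene_uid, max_attempts=5, delay=1.0):
    """Call fetch_linked_ids(gene_uid) until it returns a non-empty result.

    fetch_linked_ids is a hypothetical callable standing in for the elink
    request. It may raise an exception or return an empty list on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            ids = fetch_linked_ids(gene_uid)
        except Exception:
            # Treat a failed request the same as an empty answer and retry.
            ids = []
        if ids:
            return list(ids)
        if attempt < max_attempts:
            # Pause between attempts so the remote service is not hammered.
            time.sleep(delay)
    # Give up after max_attempts; the caller decides how to handle this
    # (for example, by treating the gene as having no linked records).
    return []
```

The caller would pass in its own request function and check for an empty list, which covers both "the service kept failing" and "there really is nothing linked".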

chris