[Bioperl-l] parsing entrezgene file (lost data)
Chris Fields
cjfields at illinois.edu
Wed Jul 6 01:00:57 UTC 2011
On Jul 5, 2011, at 5:38 PM, Carnë Draug wrote:
> Well, I updated the bug report with what you found, thank you.
>
> 2011/7/5 Smithies, Russell <Russell.Smithies at agresearch.co.nz>:
>> Bio::ASN1::EntrezGene is not the easiest to work with but you can access everything if you try hard enough.
>> I used it last year for transforming ASN.1 gene records from NCBI into fully annotated Wiki pages and it was very successful, though I got sick of typing so many curly brackets ;-)
>
> You mean I should access the data "manually" rather than using
> methods? It will have to do for now, although that's kind of the
> opposite of what objects are meant for (I think, I'm no programmer).
>
> My plan is to make an application that can be reused by other people,
> which is why I'm trying to do it in a nice, maintainable way without
> too many hacks, and why I can't just parse the gene2refseq file.
>
> Since what I want is to get the transcripts and proteins given a gene
> UID, I can see two options.
> 1 - parse the ASN1 file and access the data 'manually' until this is
> fixed (and then fix the code to use the methods)
> 2 - use elink from EUtilities. But since it fails around half the
> time, I'd have to check if it's a pseudogene first. If it's not, it
> should link to at least one place in the nucleotide database, so I'd
> wrap the connection in an eval block and retry until an id is returned.
>
> I think I'll go for the first option but opinions are welcome.
>
> Carnë
I assume the problem is that not every piece of data has been mapped to the relevant BioPerl classes for storage in the Bio::Seq object, hence the lack of methods for it. Bio::ASN1::EntrezGene is a fairly generic parser, though: as Russell points out, the data is there but hasn't been mapped to objects. Maybe someone with a bit more experience with this parser can chip in?
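
In the meantime, here is a rough sketch of what "manual" access could look like. It assumes the parsed record is a plain nested structure of hashes and arrays, which is what all the curly brackets suggest. The record below is a hand-written stand-in, not real parser output: the key names and accessions are made-up placeholders, and collect_key is a hypothetical helper, not part of any BioPerl module. Dump a real record (e.g. with Data::Dumper) to check the actual layout before relying on any key.

```perl
use strict;
use warnings;

# Hypothetical helper: recursively collect every scalar value stored
# under a given hash key, however deeply it is nested.
sub collect_key {
    my ($node, $key, $found) = @_;
    $found //= [];
    if (ref $node eq 'HASH') {
        for my $k (sort keys %$node) {
            if ($k eq $key && !ref $node->{$k}) {
                push @$found, $node->{$k};
            }
            else {
                collect_key($node->{$k}, $key, $found);
            }
        }
    }
    elsif (ref $node eq 'ARRAY') {
        collect_key($_, $key, $found) for @$node;
    }
    return $found;
}

# ASSUMPTION: a hand-built stand-in for one parsed Entrez Gene record.
# The real structure returned by the parser may be laid out differently.
my $record = {
    'track-info' => [ { geneid => 12345 } ],
    locus        => [ {
        products => [
            {
                accession => 'NM_000000',    # placeholder transcript
                products  => [ { accession => 'NP_000000' } ],    # placeholder protein
            },
        ],
    } ],
};

my $accessions = collect_key($record, 'accession');
print join(', ', @$accessions), "\n";
```

Searching by key name rather than hard-coding the full path means less to rewrite if the data is later mapped to proper methods. The cost is that it will also pick up any unrelated field that happens to share the key name.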
chris