[Bioperl-l] Entrez Gene ASN

Peter.Robinson at t-online.de
Thu Mar 10 16:28:23 EST 2005


On Thu, Mar 10, 2005 at 09:14:40AM -0500, Stefan Kirov wrote:
> Hi guys!
> I have done some (mostly) serious thinking about ASN Entrez Gene parsing
> and I propose we do my favorite thing: postpone everything we cannot
> deal with right now. If you want it to sound better: take a gradual
> approach where we store the data we can deal with in the existing
> Bioperl objects and skip the rest for now.
> In detail:
> An ASN gene record can be correctly represented as a tree. I have written a
> simple parser for my own purposes which stores the following for each node:
> node_id---|
>                  --parent
>                  --level
>                  --tag
>                  --values
> What I then do is select specific levels and tags and build different
> objects from them. So level 2 with parent EntrezGene (which is the root
> level and has no information) is the gene description and has tags such as
> gene, name, etc.; at levels 3, 5 and 6 you can get the complete species
> definition by looking for orgname and org as tags and for records with
> parent mod (mod is a value of orgname, so descend down that branch).
> I am using this approach to store most of the data in a relational
> database without going through Bioperl. What I ultimately want to do is
> use standard Bioperl modules. However, I don't think we have an object
> that can efficiently represent the structure (correct me if I am wrong).
> I think it may be a good idea to have a container object, possibly
> Bio::Gene, that may contain multiple Bio::Seq objects (with or without
> real sequence). I believe we can borrow some structure and code from
> the EnsEMBL gene representation (the way it contains multiple
> transcripts, etc., certainly not the database interactions).
> Please let me know what you think.
> Stefan


Hi Stefan,

From the work I have done on this issue, your suggestion seems quite promising. Let me know if you need some help with it. What performance are you seeing so far?
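
To make the node table in the quoted message concrete, here is a minimal sketch of one way such a flat table could be built and queried. It is written in Python purely for illustration and is not Stefan's parser or any Bioperl API. The names Node, flatten and select are hypothetical, the input is an already-parsed nested tuple rather than real ASN.1 text, and the record values are made-up placeholders. The root is assumed to sit at level 1, so that its children are at level 2 as described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    node_id: int
    parent: Optional[int]  # node_id of the parent, None for the root
    level: int             # depth in the tree, root assumed to be level 1
    tag: str
    values: List[str] = field(default_factory=list)


def flatten(tree, parent=None, level=1, table=None) -> List[Node]:
    """Walk a nested (tag, values, children) tuple depth-first and
    append one Node per element to a flat table."""
    if table is None:
        table = []
    tag, values, children = tree
    node = Node(len(table), parent, level, tag, list(values))
    table.append(node)
    for child in children:
        flatten(child, node.node_id, level + 1, table)
    return table


def select(table, level=None, tag=None, parent_tag=None) -> Iterator[Node]:
    """Yield nodes matching every criterion that is not None."""
    for node in table:
        if level is not None and node.level != level:
            continue
        if tag is not None and node.tag != tag:
            continue
        if parent_tag is not None:
            if node.parent is None or table[node.parent].tag != parent_tag:
                continue
        yield node


# Toy record with placeholder values (not real Entrez Gene data).
record = (
    "EntrezGene", [], [
        ("gene", [], [("locus", ["EXAMPLE1"], [])]),
        ("name", ["example gene"], []),
    ],
)

table = flatten(record)
level2 = list(select(table, level=2, parent_tag="EntrezGene"))
print([n.tag for n in level2])
```

Because node_id is simply the position in the list, looking up a parent is a single index operation, and each Node maps directly onto one row of a relational table with the same five columns.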

best,
Peter

