[Bioperl-l] Entrez Gene ASN

Stefan Kirov skirov at utk.edu
Thu Mar 10 09:14:40 EST 2005


Hi guys!
I have done some (mostly) serious thinking about ASN Entrez Gene parsing 
and I propose we do my favorite thing: postpone everything we cannot 
deal with right now. If you want it to sound better: take a gradual 
approach where we store the data we can deal with in the existing 
Bioperl objects and skip the rest for now.
In detail:
An ASN gene record can be correctly represented as a tree. I have written a 
simple parser for my own purposes which stores the following for each node:
node_id --|
          |-- parent
          |-- level
          |-- tag
          |-- values
What I do then is select specific levels and tags and build different 
objects from them. So level 2 with parent EntrezGene (which is the root 
level and has no information) is the gene description and has tags such as 
gene, name, etc.; at levels 3, 5 and 6 you can get the complete species 
definition by looking for orgname and org as tags and for records with 
parent mod (which is a value for orgname; descend down the branch).
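As an illustration only, the node table and the level/tag lookups described above could be sketched as follows. This is written in Python for brevity and is not the parser mentioned in this message. The field names follow the list above; the sample records, the "source" tag, the assumption that the root sits at level 1, and the helpers find_nodes and descendants are all made up for the example and do not mirror a real Entrez Gene record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    parent: Optional[int]  # node_id of the parent, None for the root
    level: int             # depth in the tree (root assumed to be level 1)
    tag: str
    values: List[str] = field(default_factory=list)


# Hypothetical toy records, NOT taken from a real Entrez Gene entry.
nodes = [
    Node(1, None, 1, "EntrezGene"),
    Node(2, 1, 2, "gene", ["GENE1"]),
    Node(3, 1, 2, "name", ["example gene"]),
    Node(4, 1, 2, "source"),            # hypothetical tag
    Node(5, 4, 3, "org"),
    Node(6, 5, 4, "orgname"),
    Node(7, 6, 5, "mod"),
    Node(8, 7, 6, "subname", ["example strain"]),
]


def find_nodes(nodes, level=None, tag=None, parent_tag=None):
    """Return nodes matching the given level, tag and/or parent tag."""
    by_id = {n.node_id: n for n in nodes}
    hits = []
    for n in nodes:
        if level is not None and n.level != level:
            continue
        if tag is not None and n.tag != tag:
            continue
        if parent_tag is not None:
            parent = by_id.get(n.parent)
            if parent is None or parent.tag != parent_tag:
                continue
        hits.append(n)
    return hits


def descendants(nodes, root_id):
    """Collect every node below root_id by descending down the branch."""
    children = {}
    for n in nodes:
        children.setdefault(n.parent, []).append(n)
    found, stack = [], [root_id]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            found.append(child)
            stack.append(child.node_id)
    return found


# Level 2 records directly under the root: the gene description.
gene_description = find_nodes(nodes, level=2, parent_tag="EntrezGene")

# Everything below the org node: the material for a species definition.
species_branch = descendants(nodes, 5)
```

Keeping the tree as a flat list of records with a parent pointer is what makes it easy to load into a relational table, at the cost of having to rebuild the parent-child index whenever a branch has to be walked.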
I am using this approach to store most of the data in a relational 
database without going through Bioperl. What I ultimately want to do is 
use standard Bioperl modules. However, I don't think we have an object 
that can efficiently represent the structure (correct me if I am wrong). 
I think it may be a good idea to have a container object, possibly 
Bio::Gene, that may contain multiple Bio::Seq objects (with or without 
real sequence). I believe we can borrow some structure and code from 
the EnsEMBL gene representation (the way it contains multiple 
transcripts, etc., certainly not the database interactions).
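To make the container idea concrete, here is a rough sketch of the shape such an object might take, again in Python for brevity. It is not Bioperl or EnsEMBL code: the class names GeneContainer and SeqEntry, their fields and their methods are all hypothetical, and a real Bio::Gene would hold Bio::Seq objects instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeqEntry:
    """Hypothetical stand-in for a sequence object such as Bio::Seq."""
    display_id: str
    description: str = ""
    sequence: Optional[str] = None  # None when only annotation is known


@dataclass
class GeneContainer:
    """Hypothetical gene-level container holding several sequence entries."""
    gene_id: str
    symbol: str
    seqs: List[SeqEntry] = field(default_factory=list)

    def add_seq(self, entry: SeqEntry) -> None:
        self.seqs.append(entry)

    def seqs_with_sequence(self) -> List[SeqEntry]:
        # Only the entries that carry real sequence data.
        return [s for s in self.seqs if s.sequence is not None]


# Made-up identifiers, for illustration only.
gene = GeneContainer(gene_id="0000", symbol="GENE1")
gene.add_seq(SeqEntry("transcript_a", "annotation only"))
gene.add_seq(SeqEntry("transcript_b", "with sequence", sequence="ATGC"))
```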
Please let me know what you think.
Stefan

