[BioSQL-l] Re: getting exon information from genbank files

Tue Apr 12 05:09:38 EDT 2005

Sorry for the confusion; the values were masked and were not actual values.
Later I was able to figure out how to do what I needed.
I am developing a few example SQL queries, which I will post on the list soon.

Thanks for helping.
Ankit Soni

On Mon, 11 Apr 2005 11:55:09 -0700, Hilmar Lapp <hlapp at gnf.org> wrote:
> Ankit, the values you're showing in your sample record, did you make  
> them up entirely or is this an actual query result?
> 
> Note that all columns in the location table are numeric, so it only  
> creates confusion if you choose letters as characters to mask the real  
> values. If they are the real values then you must have changed the
> schema and not used load_seqdatabase.pl to load records.
> 
> Note also that generally what's in biosql will closely resemble the  
> object model that was built by the SeqIO bioperl parser run on your  
> input record(s) - provided you used load_seqdatabase.pl to load the  
> record(s). So, what ends up in biosql as the result of loading a  
> genbank record greatly depends on the genbank record itself. As a rule,  
> what the genbank record had in its feature table you'll also find in  
> biosql as a seqfeature record, and what wasn't in the feature table you  
> also won't find in biosql. Introns are usually not annotated in Genbank
> explicitly, they are only implicit as the region between exons, so
> unless the genbank records you loaded were exceptions you won't find
> introns in biosql either. How to find
> exons again depends on the feature table of the original records: some  
> have a single cDNA feature with a composite ('split') location, which  
> will end up in biosql as one seqfeature that has many locations  
> attached. Genomic contigs sometimes have the exons annotated as  
> individual features, and then this is what you'll find in biosql too:  
> one seqfeature per exon, each with a single location.
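
To make the two cases above concrete, here is a minimal sketch of a query that labels each location with the type of the feature it belongs to. It assumes the standard schema, where seqfeature.type_term_id references term.term_id. The tables are a cut-down stand-in built in SQLite so the example runs on its own, and the sample rows and the helper name feature_locations are made up for illustration.

```python
import sqlite3

# Reduced stand-in for three BioSQL tables (assumption: the real tables
# have more columns; only the ones used by the join are kept here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         bioentry_id INTEGER, type_term_id INTEGER);
CREATE TABLE location (location_id INTEGER PRIMARY KEY,
                       seqfeature_id INTEGER, start_pos INTEGER,
                       end_pos INTEGER, strand INTEGER, rank INTEGER);
""")

# Made-up sample data: one CDS with a split location (three parts) and
# two exons annotated as individual features, all on bioentry 1.
conn.executemany("INSERT INTO term VALUES (?, ?)", [(1, "CDS"), (2, "exon")])
conn.executemany("INSERT INTO seqfeature VALUES (?, ?, ?)",
                 [(10, 1, 1), (11, 1, 2), (12, 1, 2)])
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 100, 200, 1, 1),
    (2, 10, 300, 380, 1, 2),
    (3, 10, 500, 650, 1, 3),
    (4, 11, 100, 200, 1, 1),
    (5, 12, 300, 380, 1, 1),
])


def feature_locations(conn, bioentry_id):
    """Return (seqfeature_id, type name, rank, start, end, strand) rows."""
    sql = """
        SELECT f.seqfeature_id, t.name, l.rank,
               l.start_pos, l.end_pos, l.strand
        FROM seqfeature f
        JOIN term t ON t.term_id = f.type_term_id
        JOIN location l ON l.seqfeature_id = f.seqfeature_id
        WHERE f.bioentry_id = ?
        ORDER BY f.seqfeature_id, l.rank
    """
    return conn.execute(sql, (bioentry_id,)).fetchall()


for row in feature_locations(conn, 1):
    print(row)
```

A feature with a split location shows up as several rows sharing one seqfeature_id, while individually annotated exons each get their own seqfeature_id with a single row.
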
> 
> The bottom line is, if you load through load_seqdatabase.pl the content  
> in biosql will closely match the object tree in bioperl - which often  
> times will be close to the data structure of the original input record.  
> Features that weren't there to begin with you won't find magically  
> added.
> 
> So, to come back to your question, there is no good answer because it  
> greatly depends on what your input was. Most likely though you'll have
> to impute introns by fetching the locations of the cDNA (or mRNA)  
> feature or the locations of the exon features, order them properly, and  
> then infer introns between consecutive exons.
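
The inference step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: coordinates are taken to be 1-based and inclusive with start_pos <= end_pos, all exons are on the same strand of the same sequence, and the function name infer_introns and the sample coordinates are made up.

```python
def infer_introns(exons):
    """Infer intron ranges as the gaps between consecutive exons.

    exons: iterable of (start_pos, end_pos) tuples, 1-based inclusive,
    all on the same strand of the same sequence (assumption).
    Returns a list of (start_pos, end_pos) tuples in ascending order.
    """
    ordered = sorted(exons)  # order by start position
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Only emit an intron if there is a real gap between the exons.
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Made-up exon locations, deliberately out of order.
exons = [(500, 650), (100, 200), (300, 380)]
print(infer_introns(exons))
```

On the minus strand the biological order of the exons is reversed, but the gap coordinates come out the same, so sorting by start position is enough for computing the ranges.
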
> 
> If this is what you need to do all the time I'd write a script that  
> does this in an automated fashion against all newly loaded records and  
> inserts the introns as features back into the database.
> 
> 	-hilmar
> 
> On Sunday, April 10, 2005, at 11:04  AM, ankit soni wrote:
> 
> > Hi all,
> > I have just started using BioSQL for one of my projects and I have
> > loaded a few genbank files in the MySQL database using BioPerl and the
> > standard schema.
> > I wanted to ask how I can get the information about the exons and introns
> > from the database.
> > If I use the following query I get the start and end positions, but I
> > am not able to find out what the limits (start_pos and end_pos) stand for,
> > i.e. gene or exon or intron.
> > mysql> select * from location where seqfeature_id='XXXX';
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> > | location_id | seqfeature_id | dbxref_id | term_id | start_pos | end_pos | strand | rank |
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> > |        YYYY |          XXXX |      NULL |    NULL |       ABC |     EFG |      1 |    1 |
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> >
> > It would be very helpful if somebody can guide me.
> > I am sorry if I am unable to use the correct biological terms as I
> > know very little of biology.
> >
> > Ankit Soni
> > Junior Undergraduate
> > Dept. of Computer Science
> > IIT kanpur
> > India
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
> 
> 
>