[BioSQL-l] Re: getting exon information from genbank files
ankit soni
ankitson at gmail.com
Tue Apr 12 05:09:38 EDT 2005
Sorry for the confusion; the values were masked and were not the actual values.
I was later able to figure out how to do what I needed.
I am developing a few example SQL queries, which I will post on the list soon.
Thanks for helping.
Ankit Soni
On Mon, 11 Apr 2005 11:55:09 -0700, Hilmar Lapp <hlapp at gnf.org> wrote:
> Ankit, the values you're showing in your sample record, did you make
> them up entirely or is this an actual query result?
>
> Note that all columns in the location table are numeric, so it only
> creates confusion if you choose letters as characters to mask the real
> values. If they are the real values then you must have changed the
> schema and not used load_seqdatabase.pl to load records.
>
> Note also that generally what's in biosql will closely resemble the
> object model that was built by the SeqIO bioperl parser run on your
> input record(s) - provided you used load_seqdatabase.pl to load the
> record(s). So, what ends up in biosql as the result of loading a
> genbank record greatly depends on the genbank record itself. As a rule,
> what the genbank record had in its feature table you'll also find in
> biosql as a seqfeature record, and what wasn't in the feature table you
> also won't find in biosql. Introns are usually not annotated in Genbank
> explicitly, they are only implicit as the region between exons, so
> unless the genbank records you loaded were exceptions you won't find
> them in biosql either. How to find
> exons again depends on the feature table of the original records: some
> have a single cDNA feature with a composite ('split') location, which
> will end up in biosql as one seqfeature that has many locations
> attached. Genomic contigs sometimes have the exons annotated as
> individual features, and then this is what you'll find in biosql too:
> one seqfeature per exon, each with a single location.
>
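A note on the paragraph above: the location table on its own does not say whether a row belongs to a gene, an exon or a CDS; that label is the type of the seqfeature the location hangs off. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL and joins location to seqfeature and term so the type appears next to each position. It assumes the standard BioSQL column names seqfeature.type_term_id and term.name; the cut-down tables, the helper names and all sample rows are made up for illustration.

```python
import sqlite3

# Assumption: a cut-down stand-in for three BioSQL tables; the real schema
# has more columns and constraints. "rank" is quoted because newer MySQL
# versions treat RANK as a reserved word (use backticks there).
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         bioentry_id INTEGER, type_term_id INTEGER);
CREATE TABLE location (location_id INTEGER PRIMARY KEY,
                       seqfeature_id INTEGER, start_pos INTEGER,
                       end_pos INTEGER, strand INTEGER, "rank" INTEGER);
"""

# The feature type ('gene', 'exon', 'CDS', ...) is stored in term.name and
# is reached through seqfeature.type_term_id.
QUERY = """
SELECT t.name, l.seqfeature_id, l."rank", l.start_pos, l.end_pos, l.strand
FROM location l
JOIN seqfeature f ON f.seqfeature_id = l.seqfeature_id
JOIN term t ON t.term_id = f.type_term_id
WHERE f.bioentry_id = ?
ORDER BY l.seqfeature_id, l."rank"
"""


def typed_locations(conn, bioentry_id):
    """Return every location of one bioentry, labelled with its feature type."""
    return conn.execute(QUERY, (bioentry_id,)).fetchall()


def demo_connection():
    """Build an in-memory database with made-up rows (not real GenBank data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO term VALUES (?, ?)",
                     [(1, "gene"), (2, "CDS"), (3, "exon")])
    conn.executemany("INSERT INTO seqfeature VALUES (?, ?, ?)",
                     [(10, 1, 1), (11, 1, 2), (12, 1, 3)])
    conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?, ?)",
                     [(100, 10, 1, 900, 1, 1),     # gene: one span
                      (101, 11, 101, 200, 1, 1),   # CDS: split location, part 1
                      (102, 11, 301, 450, 1, 2),   # CDS: split location, part 2
                      (103, 12, 101, 200, 1, 1)])  # exon annotated on its own
    return conn


if __name__ == "__main__":
    for row in typed_locations(demo_connection(), 1):
        print(row)
```

In the made-up data the CDS feature has two location rows (a split location) while the exon feature has one, which mirrors the two cases described above.
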
> The bottom line is, if you load through load_seqdatabase.pl the content
> in biosql will closely match the object tree in bioperl - which often
> times will be close to the data structure of the original input record.
> Features that weren't there to begin with you won't find magically
> added.
>
> So, to come back to your question, there is no good answer because it
> greatly depends on what your input was. Most likely though you'll have
> to impute introns by fetching the locations of the cDNA (or mRNA)
> feature or the locations of the exon features, order them properly, and
> then infer introns between consecutive exons.
>
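The imputation step described above can be sketched in a few lines of Python. This is only an illustration: the function name infer_introns is hypothetical, and it assumes the exon coordinates are 1-based and inclusive and that all exons passed in belong to the same gene and strand. The input would be the (start_pos, end_pos) pairs fetched from the location table for one split feature or for a set of exon features.

```python
def infer_introns(exons):
    """Return (start, end) spans of the gaps between consecutive exons.

    exons: iterable of (start_pos, end_pos) pairs, assumed 1-based and
    inclusive, all taken from the same gene and the same strand.
    """
    introns = []
    covered_to = None  # highest exon end position seen so far
    for start, end in sorted(exons):
        # A gap of at least one base between the covered region and the
        # next exon is reported as an intron; adjacent or overlapping
        # exons leave no gap and produce nothing.
        if covered_to is not None and start > covered_to + 1:
            introns.append((covered_to + 1, start - 1))
        covered_to = end if covered_to is None else max(covered_to, end)
    return introns


if __name__ == "__main__":
    print(infer_introns([(301, 450), (101, 200), (600, 700)]))
```

The result is in ascending coordinate order; for a feature on the minus strand the introns would need to be reversed to follow transcript order.
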
> If this is what you need to do all the time I'd write a script that
> does this in an automated fashion against all newly loaded records and
> inserts the introns as features back into the database.
>
> -hilmar
>
> On Sunday, April 10, 2005, at 11:04 AM, ankit soni wrote:
>
> > Hi all,
> > I have just started using BioSQL for one of my projects and I have
> > loaded a few genbank files in the MySQL database using BioPerl and the
> > standard schema.
> > I wanted to ask how I can get the information about the exons and introns
> > from the database.
> > If I use the following query I get the start and end positions, but I
> > am not able to find out what the limits (start_pos and end_pos) stand for,
> > i.e. gene or exon or intron.
> > mysql> select * from location where seqfeature_id='XXXX';
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> > | location_id | seqfeature_id | dbxref_id | term_id | start_pos | end_pos | strand | rank |
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> > | YYYY        | XXXX          | NULL      | NULL    | ABC       | EFG     | 1      | 1    |
> > +-------------+---------------+-----------+---------+-----------+---------+--------+------+
> >
> > It would be very helpful if somebody can guide me.
> > I am sorry if I am unable to use the correct biological terms as I
> > know very little of biology.
> >
> > Ankit Soni
> > Junior Undergraduate
> > Dept. of Computer Science
> > IIT Kanpur
> > India
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp email: lapp at gnf.org
> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>