[Bioperl-l] Re: Bio::DB::GFF trouble + Blast question
Scott Cain
cain at cshl.org
Fri Jul 4 04:29:51 EDT 2003
The main things I look for are that the loader reports success and
that it reports the right number of rows loaded. Also, after the load is
done, if I am skeptical, I will do a `select count(*) from fdata`, which
should return the same number as there are lines in the GFF file.
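
A minimal sketch of the line-count side of that check, in plain Perl. It
assumes that blank lines and lines starting with '#' (comments and ##
directives) do not become rows in fdata; the sub name count_gff_features
and the file name features.gff are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the feature lines in a GFF file: every line that is neither
# blank nor a comment/directive starting with '#'.
# Assumption (not from the loader docs): only these lines end up as
# rows in the fdata table.
sub count_gff_features {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*$/;   # skip blank lines
        next if $line =~ /^#/;      # skip comments and ## directives
        $count++;
    }
    close($fh);
    return $count;
}

# Usage: perl count_gff.pl features.gff
if (@ARGV) {
    my $gff = shift @ARGV;
    print count_gff_features($gff), " feature lines in $gff\n";
}
```

The number it prints is the one to compare against the result of
`select count(*) from fdata`.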
Scott
On Thu, 2003-07-03 at 18:33, Venky Nandagopal wrote:
> Thanks Scott, that indeed was the problem. I dropped and rebuilt the
> database and it seems to work fine (I think) now, although I can't figure
> out how the database got screwed up in the first place. Is there a way to
> check that the database has been loaded completely/correctly?
>
> Venky
>
> On 03 Jul 2003 16:21:15 -0400, Scott Cain <cain at cshl.org> wrote:
>
> > Venky,
> >
> > Fair enough--if I change my example script to have the @genes line like
> > yours, it works for CG6667 and several other nearby numbers.
> >
> > I am more inclined to distrust the database itself. Perhaps a row in
> > the database is corrupted. Try these queries:
> >
> > mysql> select * from fgroup where gname='CG6667';
> > +------+--------+--------+
> > | gid  | gclass | gname  |
> > +------+--------+--------+
> > | 8185 | Gene   | CG6667 |
> > +------+--------+--------+
> >
> > mysql> select * from fdata where gid=8185; (using the gid from above)
> > +-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+
> > | fid   | fref | fstart   | fstop    | fbin          | ftypeid | fscore | fstrand | fphase | gid  | ftarget_start | ftarget_stop |
> > +-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+
> > | 23255 | 2L   | 17414916 | 17428443 | 100000.000174 | 5       | NULL   | -       | NULL   | 8185 | NULL          | NULL         |
> > +-------+------+----------+----------+---------------+---------+--------+---------+--------+------+---------------+--------------+
> >
> > select * from fdna where fref='2L' and foffset>=17414000 and
> > foffset<=17430000;
> >
> > which should give 9 rows of DNA chunks, each 2000 bases long.
> >
> > If any of your results are different from mine (excluding ids), then I
> > think your database has a problem.
> >
> > Scott
> >
> >
> >
> > On Thu, 2003-07-03 at 15:56, Venky Nandagopal wrote:
> >> Scott,
> >>
> >> Thanks for the reply. I should have been more careful with my email --
> >> my script actually has the following lines
> >>
> >> @genes = $db->get_feature_by_name("Gene" => $gene_id);
> >> print $genes[0]->seq;
> >>
> >> The script works for every other CG number I've tried --- it only fails
> >> for CG6667, which makes me think that there must be something weird
> >> going on with Bio::DB::GFF, not the script.
> >>
> >> Venky
> >>
> >>
> >>
> >> On 03 Jul 2003 13:46:08 -0400, Scott Cain <cain at cshl.org> wrote:
> >>
> >> > Venky,
> >> >
> >> > It is not all that clear to me why in this case you need to, but you
> >> > need to specify the class of the object, in this case 'Gene'.
> >> >
> >> > Here is an example script that works for me:
> >> >
> >> > #!/usr/bin/perl
> >> > use strict;
> >> > use Bio::DB::GFF;
> >> > my $db = Bio::DB::GFF->new(-adaptor => 'dbi::mysql',
> >> > -dsn => 'fly');
> >> > my @genes = $db->get_feature_by_name(-class=>'Gene',-name=>'CG6665');
> >> > print $genes[0]->seq,"\n";
> >> >
> >> > Scott
> >> >
> >> > On Thu, 2003-07-03 at 11:58, bioperl-l-request at portal.open-bio.org
> >> > wrote:
> >> >> Message: 12
> >> >> Date: Thu, 03 Jul 2003 01:43:57 -0700
> >> >> From: Venky Nandagopal <venky at OCF.Berkeley.EDU>
> >> >> Subject: [Bioperl-l] Bio::DB::GFF trouble + Blast question
> >> >> To: bioperl-l at portal.open-bio.org
> >> >> Message-ID: <oprrp7vjpxqwe008 at mail.ocf.berkeley.edu>
> >> >> Content-Type: text/plain; charset=utf-8; format=flowed
> >> >>
> >> >> Hi,
> >> >>
> >> >> I have a couple of problems: (1) I use a database created using
> >> >> process_gadfly.pl to access the D.mel genome, via Bio::DB::GFF. I have a
> >> >> utility script that returns the sequence of a gene given the CG number,
> >> >> using @genes = get_feature_by_name(CG####); print $genes[0]->seq;
> >> >> This script seems to work fine for most CG numbers, except for CG6667,
> >> >> which is the ID for the dorsal gene. For some reason, no sequence is
> >> >> returned by the seq() method. The gene object is not undefined though,
> >> >> since $genes[0]->asString returns "gene:gadfly(CG6667)"; similarly the
> >> >> start, end, strand methods work fine. I have tried getting transcripts
> >> >> instead of the gene etc etc, but CG6667 refuses to yield any sequence.
> >> >> Can anyone provide an explanation for this?
> >> >>
> >> >>
> >> >> (2) This is not directly connected to Bioperl, but BLAST reports
> >> >> sometimes provide Expect values in the form "Expect(3)=0.0". What
> >> >> does the 3 refer to? Sometimes it says "Expect(7+)=1e-23".
> >> >>
> >> >>
> >> >> Thanks
> >> >> Venky
> >> >
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory