[Bioperl-l] RE: [Gmod-gbrowse] features not displaying

Marc Logghe Marc.Logghe at devgen.com
Wed Feb 2 05:52:44 EST 2005


All right, here is a status update on my little 'gbrowse on top of biosql' project.

I have prepared a fresh Linux box with bioperl-live tagged bioperl-release-1-5-0, bioperl-db and bioperl-schema HEAD, and Generic-Genome-Browser from the release_1_62-bugfixes branch.
I loaded some GenBank proteins into biosql (thanks, Hilmar, for helping me out!) and adapted 06.biosql.conf.
Three observations can be made at this (early) point:
1) init_code (which provided some missing methods to make Bio::SeqFeature::Generic and Bio::Graphics::Feature more compliant) is no longer needed, which is good (thanks, Lincoln).
2) All features are displayed in duplicate (I'll flesh that one out). This confirms what Genevieve is seeing.
3) Bio::SeqFeature::Generic::get_tag_values no longer exists: Bio::SeqFeature::Generic does not inherit from Bio::AnnotatableI, which implements that method. That is bad (how can it pass the tests, then?). I don't know what the situation is in bioperl-live HEAD.
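Regarding point 3: the old init_code.pl aliased methods with `\&Bio::SeqFeature::Generic::get_tag_values`, which names the sub in that exact package and does not follow inheritance, so if the method is gone (or only inherited) the alias points at an undefined sub and dies when called. A minimal sketch of a more defensive shim, assuming one is still wanted on some setup, resolves the target with Perl's built-in `can` and installs the alias only when it is missing and the target actually resolves. `install_alias` and the stand-in class `My::Feature` are hypothetical names for illustration; this has not been tried against bioperl-release-1-5-0.

```perl
use strict;
use warnings;

# Hypothetical helper: make $alias a method of $class that behaves like
# $target. Nothing is installed when $class already provides $alias, or
# when $target cannot be resolved (directly or through @ISA), so no
# dangling stub is created. Returns 1 if an alias was installed, else 0.
sub install_alias {
    my ($class, $alias, $target) = @_;
    return 0 if $class->can($alias);
    my $code = $class->can($target) or return 0;
    no strict 'refs';
    *{"${class}::$alias"} = $code;
    return 1;
}

# Hypothetical stand-in for Bio::SeqFeature::Generic: it has primary_tag
# but, like the class described in point 3, no get_tag_values.
package My::Feature;
sub new         { my ($class, %args) = @_; return bless {%args}, $class }
sub primary_tag { my $self = shift; return $self->{primary_tag} }

package main;

my $installed_method = install_alias('My::Feature', 'method',     'primary_tag');
my $installed_attrs  = install_alias('My::Feature', 'attributes', 'get_tag_values');

my $feat = My::Feature->new(primary_tag => 'CDS');
print "method alias installed:     $installed_method\n";
print "attributes alias installed: $installed_attrs\n";
print "method() returns:           ", $feat->method, "\n";
```

With 'Bio::SeqFeature::Generic' as the class name, calls like these would presumably replace the unconditional typeglob assignments in init_code.pl; the 'attributes' alias is then simply skipped instead of pointing at a missing sub.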

Cheers,
Marc


> -----Original Message-----
> From: gmod-gbrowse-admin at lists.sourceforge.net
> [mailto:gmod-gbrowse-admin at lists.sourceforge.net]On Behalf Of Genevieve DeClerck
> Sent: Tuesday, February 01, 2005 10:57 PM
> To: gmod-gbrowse at lists.sourceforge.net
> Cc: Lincoln Stein
> Subject: Re: [Gmod-gbrowse] features not displaying
> 
> 
> 
> I've been playing around with the NC_004578.gbk (P. syringae DC3000
> chromosome) file I got from NCBI that I've been trying to display in a
> biosql/gbrowse setup -- I got some of the data to partially display.
> 
> I found that if I remove the whole genome sequence (ORIGIN record) at 
> the end of NC_004578.gbk so that just the feature records remain, and 
> load this into a fresh biosql db, CDS's display in GBrowse. But there 
> are a couple of problems with what's displayed:
> 
> 1) no features with a stop coordinate > 500,000 are shown, so for DC3000
> only about 1/6 of the genome features are shown;
> 
> 2) all the CDS features are duplicated: there are two colored bars for
> each CDS. When you mouse over the duplicate bars, they both display
> something like "CDS: :813..1916". I thought this might have something
> to do with the fact that in the gbk file most, if not all, of the CDS
> records have corresponding 'gene' records with the exact same
> coordinates, but the genes have their own duplication problem: the gene
> records also display in duplicate. Mousing over the gene glyphs in the
> same region as the CDS above shows this: "Gene: :813..1916".
> 
> Another test I performed with the NC_004578.gbk file: I severely
> truncated the whole-genome sequence at the end of the file, removing
> all sequence after the '7141' line, and I removed all features in the
> file that had a coord > 7000. This left 4 CDSs and 4 genes. The file
> loads fine and displays all the features, in duplicate. So when
> selecting the landmark "NC_004578:1..7200", all features are visible:
> 16 glyphs total due to duplication.
> 
> The web log is not showing any errors of note.
> 
> -Genevieve
> 
> 
> 
> Lincoln Stein wrote:
> 
> > Sounds like Simon and I will have to work through this. Probably the
> > display_name needs to get set more broadly.
> > 
> > Lincoln
> > 
> > On Friday 28 January 2005 07:30 pm, Hilmar Lapp wrote:
> > 
> >>I haven't had a chance yet to look at this in the code, so I am, and
> >>probably sound, a bit confused. In order to display something in
> >>gbrowse, wouldn't you need coordinates?
> >>
> >>If what the code does is to fetch sequences, then they don't have
> >>coordinates - only their feature(s) do/does (which are
> >>auto-retrieved along with the seqs). This means that you'd have to
> >>have loaded the database with sequence entries whose feature table
> >>gives the coordinates of the *sequence*. Usually the feature table
> >>will give the coordinates of the features on the sequence, no?
> >>(unless it's a remote location)
> >>
> >>Assuming in your question you mean retrieving features by name, the
> >>naive answer is similar to how you do this for seq by accession,
> >>except you ask for the SeqFeatureI adaptor, provide a SeqFeatureI
> >>factory, specify SeqFeatureI in the collections, and constrain by
> >>the display_name property of the SeqFeatureI entity.
> >>
> >>However, this probably won't work for most dataload situations
> >>because bioperl doesn't set the display_name property in any of its
> >>rich format parsers. The corresponding column is optional in
> >>biosql, so there's no error or warning from that.
> >>
> >>So, if your name or identifier to search by is the value of a
> >>feature tag, technically it will be in the value column of the
> >>seqfeature_qualifier_value table unless you massaged the data
> >>structure prior to upload. You then need to set up a BioQuery with
> >>SeqFeatureI and Bio::Annotation::SimpleValue as entities to be
> >>joined, and constrain by value of the latter. This may mean nothing
> >>to most, but I hope Simon has some idea on what I'm talking about.
> >>
> >>	-hilmar
> >>
> >>On Jan 28, 2005, at 12:14 PM, Lincoln Stein wrote:
> >>
> >>>There's a central call in the gbrowse/biosql adaptor called
> >>>get_feature_by_name() that calls
> >>>biosql->fetch_Seq_by_accession(). What it should do is look
> >>>for seqfeatures if fetch_Seq_by_accession() doesn't return a
> >>>result.
> >>>
> >>>This should be a simple fix. Simon, Hilmar, what is the
> >>>appropriate biosql call to retrieve seqfeatures?
> >>>
> >>>Lincoln
> >>>
> >>>On Friday 28 January 2005 02:26 pm, Hilmar Lapp wrote:
> >>>
> >>>>So how do others do this on biosql, then? Simon ran a benchmark
> >>>>that included biosql. Ah - that was on an artificial dataset,
> >>>>right? I thought Simon had been running gbrowse on top of biosql
> >>>>for some demo site on real data?
> >>>>
> >>>>So this may need more work then in that the bridging code should
> >>>>query either bioentry, or seqfeature, or possibly both, maybe
> >>>>configurable?
> >>>>
> >>>>	-hilmar
> >>>>
> >>>>On Thursday, January 27, 2005, at 12:59  PM, Marc Logghe wrote:
> >>>>
> >>>>>Hi Genevieve,
> >>>>>my posts to gmod-gbrowse at lists.sourceforge.net seem to get lost
> >>>>>one way or another.
> >>>>>
> >>>>>Anyhow, I am afraid I will not be able to help you out ...
> >>>>>I can only make a few remarks.
> >>>>>As far as I understand, your pseudomonas biosql database
> >>>>>contains 1 bioentry with the accession NC_004578 and sequence
> >>>>>length 6397126 bp. A query in the gbrowse/biosql combination
> >>>>>will only work for that accession and will return the complete
> >>>>>segment. So I guess you get a kind of timeout because the
> >>>>>fetching of the bioentry from biosql takes too long.
> >>>>>Also, as far as I know, it is not possible to query for
> >>>>>features residing on that segment, e.g. the gene PSPTO0041.
> >>>>>That is because gbrowse/biosql will only look for the bioentry
> >>>>>in biosql with the accession number PSPTO0041 and not the gene
> >>>>>feature with that locus_tag !
> >>>>>This is probably not the behaviour you want.
> >>>>>An option I see, is that you convert the genbank record into
> >>>>>gff and load that in gbrowse/chado or gbrowse/gff.
> >>>>>I think genbank2gff3.pl is especially suited to do the
> >>>>>conversion. Hope I am not terribly wrong here ...
> >>>>>
> >>>>>Regards,
> >>>>>Marc
> >>>>>
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: Genevieve DeClerck [mailto:gad14 at cornell.edu]
> >>>>>>Sent: Thursday, January 27, 2005 6:14 PM
> >>>>>>To: Marc Logghe
> >>>>>>Cc: gmod-gbrowse at lists.sourceforge.net
> >>>>>>Subject: Re: [Gmod-gbrowse] features not displaying
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Right, thanks Marc - I lost that line when I created my conf
> >>>>>>file from scratch (from 06.biosql.conf) for the umpteenth time.
> >>>>>>With that line (and init_code.pl) I can successfully get the CDS
> >>>>>>in the 'parkin' example biosql database to display! So I moved to
> >>>>>>the next step and created a new biosql db and loaded it with
> >>>>>>data that I'm more interested in: the gbk file from NCBI of
> >>>>>>Pseudomonas syringae features
> >>>>>>(ftp://bio-mirror.net/biomirror/ncbigenomes/Bacteria/Pseudomonas_syringae/NC_004578.gbk).
> >>>>>>
> >>>>>>
> >>>>>>I seem to be encountering a problem similar to the one I had with
> >>>>>>the example parkin db before adding back that line - everything
> >>>>>>appears to be operating correctly in the browser, no errors in the
> >>>>>>web log, but no features appear for CDSs or anything else.
> >>>>>>Translation frames, GC content and sequence display fine.
> >>>>>>
> >>>>>>I'm starting to think that there's something amiss with how
> >>>>>>the NC_004578.gbk data is stored in the database... or
> >>>>>>something that biosql/gbrowse doesn't like about it. However,
> >>>>>>I do not see anything weird with the data when I browse
> >>>>>>through it in mysql. The conf file I'm using for the NC_004578
> >>>>>>db is pretty much the same as the one I used for the parkin db,
> >>>>>>just with the appropriate changes in the db_args definition.
> >>>>>>
> >>>>>>Any thoughts?
> >>>>>>
> >>>>>>Thanks,
> >>>>>>Genevieve
> >>>>>>
> >>>>>>Marc Logghe wrote:
> >>>>>>
> >>>>>>>Hi Genevieve,
> >>>>>>>I am pretty sure it will be fixed when you adjust your conf like I
> >>>>>>>suggested earlier.
> >>>>>>>
> >>>>>>>init_code    = do "$main::CONF_DIR/init_code.pl";
> >>>>>>>
> >>>>>>>
> >>>>>>>and put init_code.pl in your gbrowse.conf directory.
> >>>>>>>
> >>>>>>>
> >>>>>>>*Bio::SeqFeature::Generic::attributes = \&Bio::SeqFeature::Generic::get_tag_values;
> >>>>>>>*Bio::SeqFeature::Generic::method = \&Bio::SeqFeature::Generic::primary_tag;
> >>>>>>>*Bio::SeqFeature::Generic::type = sub {
> >>>>>>>   my $feat = shift;
> >>>>>>>   my ($method) = $feat->primary_tag;
> >>>>>>>   my ($source) = $feat->source_tag;
> >>>>>>>   return $method;
> >>>>>>>};
> >>>>>>>*Bio::SeqFeature::Generic::name = sub {
> >>>>>>>   my $feat = shift;
> >>>>>>>   my $name = eval {($feat->get_tag_values('name'))[0]};
> >>>>>>>   $name ||= eval {($feat->get_tag_values('label'))[0]};
> >>>>>>>   $name ||= eval {($feat->get_tag_values('db_xref'))[0]};
> >>>>>>>#   $name ||= 'unknown';
> >>>>>>>   print STDERR "name = $name\n";
> >>>>>>>   return $name;
> >>>>>>>};
> >>>>>>>
> >>>>>>>
> >>>>>>>I tested it by commenting out the init_code line in my conf, and I
> >>>>>>>have the same things happening as you mention: no features show up,
> >>>>>>>no errors in the log.
> >>>>>>>
> >>>>>>>ML
> >>>>>>>
> >>>>>>>
> >>>>>>>>Hi,
> >>>>>>>>
> >>>>>>>>I can't for the life of me figure out why features/glyphs are not
> >>>>>>>>displaying in gbrowse. Everything else in the browser page is
> >>>>>>>>displaying fine - the header section, the footer, etc. A couple of
> >>>>>>>>things display OK in the feature window: GC Content, Translation
> >>>>>>>>fwd and rev (including DNA sequence when zoomed all the way down).
> >>>>>>>>But no real features, such as CDS, are displayed. The CDS track has
> >>>>>>>>a label, "CDS", but the rest is just plain empty, where I know there
> >>>>>>>>should be a feature.
> >>>>>>>>
> >>>>>>>>I have gbrowse 1.62 running with a biosql-mysql database loaded with
> >>>>>>>>the bioperl-db example data file 'parkin.gb', which has one CDS.
> >>>>>>>>[I just got this setup working yesterday, thanks to help from this
> >>>>>>>>list and BioSQL-l - see posts from Jan 25 2005: "gbrowse on top of
> >>>>>>>>biosql".] There are no errors in the web error log or the mysql log.
> >>>>>>>>I triple-checked that the parkin data is actually in the database,
> >>>>>>>>and it is.
> >>>>>>>>
> >>>>>>>>I'm thinking there must be something wrong with my gbrowse conf
> >>>>>>>>file for this database, in particular in the track section. But if
> >>>>>>>>that were the case, wouldn't I be seeing errors in my httpd error
> >>>>>>>>log? The conf file I have is almost an exact copy of a conf file
> >>>>>>>>that I've seen referred to in the docs: 06.biosql.conf. The track
> >>>>>>>>stanzas look reasonable, going from info in the docs and tutorial.
> >>>>>>>>How do I debug the conf file? Or should I be looking somewhere else?
> >>>>>>>>
> >>>>>>>>My biosql.conf file is pasted below.
> >>>>>>>>
> >>>>>>>>Any clues much appreciated.
> >>>>>>>>
> >>>>>>>>Thanks,
> >>>>>>>>Genevieve
> >>>>>>>>
> >>>>>>>>#************** biosql.conf *****************
> >>>>>>>>
> >>>>>>>>[GENERAL]
> >>>>>>>>description = biosql
> >>>>>>>>db_adaptor  = Bio::DB::Das::BioSQL
> >>>>>>>>db_args     = driver    mysql
> >>>>>>>>	      dbname    biosql
> >>>>>>>>	      namespace genbank
> >>>>>>>>	      host      localhost
> >>>>>>>>	      user      nobody
> >>>>>>>>              pass      ''
> >>>>>>>>
> >>>>>>>>plugins = SequenceDumper FastaDumper RestrictionAnnotator
> >>>>>>>>
> >>>>>>>># Web site configuration info
> >>>>>>>>stylesheet  = /gbrowse/gbrowse.css
> >>>>>>>>buttons     = /gbrowse/images/buttons
> >>>>>>>>tmpimages   = /gbrowse/tmp
> >>>>>>>>
> >>>>>>>># where to link to when user clicks in detailed view
> >>>>>>>>#link          = http://localhost/perl/gbrowse?ref=$ref;start=$start;stop=$end
> >>>>>>>>link          = AUTO
> >>>>>>>>
> >>>>>>>># what image widths to offer
> >>>>>>>>image widths  = 450 640 800 1024
> >>>>>>>>
> >>>>>>>># default width of detailed view (pixels)
> >>>>>>>>default width = 800
> >>>>>>>>default features = CDS
> >>>>>>>>
> >>>>>>>>
> >>>>>>>># max and default segment sizes for detailed view
> >>>>>>>>max segment     = 500000
> >>>>>>>>default segment = 50000
> >>>>>>>>
> >>>>>>>># zoom levels
> >>>>>>>>zoom levels    = 100 200 1000 2000 5000 10000 20000 40000 100000 200000 500000 1000000
> >>>>>>>>low res = 200000
> >>>>>>>>
> >>>>>>>># colors of the overview, detailed map and key
> >>>>>>>>overview bgcolor = wheat
> >>>>>>>>detailed bgcolor = white
> >>>>>>>>key bgcolor      = beige
> >>>>>>>>
> >>>>>>>>footer = <hr>
> >>>>>>>>	<table width="100%">
> >>>>>>>>	<TR>
> >>>>>>>>	<TD align="LEFT" class="databody">
> >>>>>>>>	For the source code for this browser, see the <a href="http://www.gmod.org">
> >>>>>>>>	Generic Model Organism Database Project.</a>  For other questions, send
> >>>>>>>>	mail to <a href="mailto:lstein at cshl.org">lstein at cshl.org</a>. </TD>
> >>>>>>>>	</TR>
> >>>>>>>>	</table>
> >>>>>>>>	<hr>
> >>>>>>>>	<pre>$Id: 06.biosql.conf,v 1.1 2003/06/26 12:32:23 lstein Exp $</pre>
> >>>>>>>>
> >>>>>>>># examples to show in the introduction
> >>>>>>>>examples = AB019558
> >>>>>>>>
> >>>>>>>># "automatic" classes to try when an unqualified identifier is given
> >>>>>>>>automatic classes = Accession
> >>>>>>>>
> >>>>>>>>[TRACK DEFAULTS]
> >>>>>>>>glyph       = generic
> >>>>>>>>height      = 8
> >>>>>>>>bgcolor     = cyan
> >>>>>>>>fgcolor     = cyan
> >>>>>>>>fontcolor   = black
> >>>>>>>>font2color  = blue
> >>>>>>>>label density = 25
> >>>>>>>>bump density  = 100
> >>>>>>>>description = 1
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>################################################################################
> >>>>>>>># the remainder of the sections configure particular features to show
> >>>>>>>>################################################################################
> >>>>>>>>
> >>>>>>>>[CDS]
> >>>>>>>>feature      = CDS
> >>>>>>>>glyph        = transcript2
> >>>>>>>>#glyph        = generic
> >>>>>>>>bgcolor      = turquoise
> >>>>>>>>fgcolor      = black
> >>>>>>>>height       = 10
> >>>>>>>>connector    = solid
> >>>>>>>>key           = CDS
> >>>>>>>>
> >>>>>>>>[REPEAT]
> >>>>>>>>feature       = repeat_region
> >>>>>>>>glyph         = generic
> >>>>>>>>bgcolor       = red
> >>>>>>>>height        = 10
> >>>>>>>>description   = 1
> >>>>>>>>key           = Repeat regions
> >>>>>>>>
> >>>>>>>>[TranslationF]
> >>>>>>>>glyph        = translation
> >>>>>>>>global feature = 1
> >>>>>>>>frame0       = cadetblue
> >>>>>>>>frame1       = blue
> >>>>>>>>frame2       = darkblue
> >>>>>>>>height       = 20
> >>>>>>>>fgcolor      = purple
> >>>>>>>>strand       = +1
> >>>>>>>>translation  = 3frame
> >>>>>>>>key          = 3-frame translation (forward)
> >>>>>>>>
> >>>>>>>>[DNA/GC Content]
> >>>>>>>>glyph        = dna
> >>>>>>>>global feature = 1
> >>>>>>>>height       = 40
> >>>>>>>>do_gc        = 1
> >>>>>>>>fgcolor      = red
> >>>>>>>>axis_color   = blue
> >>>>>>>>
> >>>>>>>>[TranslationR]
> >>>>>>>>glyph        = translation
> >>>>>>>>global feature = 1
> >>>>>>>>frame0       = darkred
> >>>>>>>>frame1       = red
> >>>>>>>>frame2       = crimson
> >>>>>>>>height       = 20
> >>>>>>>>fgcolor      = blue
> >>>>>>>>strand       = -1
> >>>>>>>>translation  = 3frame
> >>>>>>>>key          = 3-frame translation (reverse)
> >>>>>>>>
> >>>>>>>>#****************************************************
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>-------------------------------------------------------
> >>>>>>>>This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
> >>>>>>>>Tool for open source databases. Create drag-&-drop reports. Save time
> >>>>>>>>by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
> >>>>>>>>Download a FREE copy at http://www.intelliview.com/go/osdn_nl
> >>>>>>>>_______________________________________________
> >>>>>>>>Gmod-gbrowse mailing list
> >>>>>>>>Gmod-gbrowse at lists.sourceforge.net
> >>>>>>>>https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >>>>>
> >>>
> >>>--
> >>>Lincoln D. Stein
> >>>Cold Spring Harbor Laboratory
> >>>1 Bungtown Road
> >>>Cold Spring Harbor, NY 11724
> >>>
> >>>NOTE: Please copy Sandra Michelsen <michelse at cshl.edu> on
> >>>all emails regarding scheduling and other time-critical topics.
> > 
> > 
> 
> 
> 
> 


