[Bioperl-l] [Gmod-gbrowse] scores in Bio::DB::BigBed

Chris Fields cjfields at illinois.edu
Mon Jul 4 16:10:43 UTC 2011


I generally follow these rules when I want to keep a common set of possibly volatile features (e.g. a specific transcriptome analysis) separate from my main 'stable' feature database (e.g. gene models):

1) BigBed - lightweight bundle of simple features where the ranges may overlap, but where I'm not concerned about score.  I have found BED/BigBed scores to be of limited use in most cases unless I scale the data (since they must be integer values from 0 to 1000).  If you do any scaling, document it very well! YMMV

2) SAM/BAM - bundle of (possibly overlapping) features where summary stats are needed.  I've seen these used for BLAST/BLAT runs, etc.

3) BigWig - quantitative data over fixed or variable-width ranges covering the entire genome; ranges can't overlap

4) BedGraph - quantitative sparse data, ranges can't overlap (these are converted over to BigWig for GBrowse, though)

5) Of course, one can also set up separate Bio::DB::SeqFeature::Store databases, depending on your needs (I have used both the SQLite and MySQL adaptors for this).
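
To make the scaling in (1) concrete, here is a minimal sketch in Python of one way to squeeze arbitrary scores into the 0-1000 integer range that BED expects. The function name scale_scores and the lo/hi parameters are my own invention for illustration, and a linear min/max mapping is only one of several reasonable choices (log scaling may suit expression data better). Whatever you pick, record the lo/hi bounds you used so the original values can be recovered approximately.

```python
def scale_scores(values, lo=None, hi=None):
    """Linearly map raw scores onto the 0-1000 integer range used by BED.

    lo/hi are hypothetical knobs: if omitted, the observed min/max are used.
    Values outside [lo, hi] are clamped to the nearest bound.
    """
    values = list(values)
    if not values:
        return []
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # No spread to scale over; fall back to 0 for every feature.
        return [0 for _ in values]
    scaled = []
    for v in values:
        clamped = min(max(v, lo), hi)
        scaled.append(int(round(1000 * (clamped - lo) / (hi - lo))))
    return scaled
```

For example, scale_scores([2, 4, 6, 10]) uses 2 and 10 as the bounds, while scale_scores(raw, lo=0, hi=50) pins the bounds so that several files end up on the same scale.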

I think this is almost begging for a 'best practices' chart/table somewhere, maybe a GBrowse 'cookbook' of common data representation cases.
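
As a possible first cookbook entry, here is a rough Python sketch of turning scored BED lines into bedGraph lines (chrom, start, end, value), which could then be handed to bedGraphToBigWig. The function name bed_to_bedgraph is hypothetical, and the sketch assumes the score sits in the fifth BED column and that header lines start with '#', 'track' or 'browser'. Since bedGraph ranges can't overlap, it refuses overlapping input rather than guessing how to merge the scores.

```python
def bed_to_bedgraph(bed_lines):
    """Convert BED lines (chrom, start, end, name, score, ...) to bedGraph lines.

    Output is sorted by chromosome name, then start. Raises ValueError if a
    line has no score column or if two features on a chromosome overlap.
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blanks and header lines (assumed prefixes).
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"no score column in: {line!r}")
        records.append((fields[0], int(fields[1]), int(fields[2]), float(fields[4])))

    records.sort(key=lambda r: (r[0], r[1], r[2]))

    out = []
    prev_chrom, prev_end = None, 0
    for chrom, start, end, score in records:
        # BED/bedGraph intervals are half-open, so start == prev_end is fine.
        if chrom == prev_chrom and start < prev_end:
            raise ValueError(f"overlapping ranges on {chrom} at {start}")
        out.append(f"{chrom}\t{start}\t{end}\t{score:g}")
        prev_chrom, prev_end = chrom, end
    return out
```

If your features do overlap, you would need to decide on a rule first (sum, mean, max per base), which is exactly the kind of thing such a cookbook should spell out.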

chris

On Jul 4, 2011, at 8:22 AM, Lincoln Stein wrote:

> I had a look at the output of bigBedSummary, which is from Jim Kent's source
> tree (no Perl involved), and it appears that the statistics it provides are
> limited to coverage; so I don't think you can do anything with the scores if
> you're using BigBed indexing. Have a look at BedGraph=>BigWig and see if it
> meets your needs.
> 
> Lincoln
> 
> On Mon, Jul 4, 2011 at 9:04 AM, Lincoln Stein <lincoln.stein at gmail.com> wrote:
> 
>> Hi Dan,
>> 
>> The documentation for BigBed is scanty; all I know about it is what is
>> provided in Jim Kent's bigbed.h include file. I had
>> thought that the scores in BED files would come through into the summary
>> statistics like those in BigWig, but now I'm looking at the example data
>> provided in Jim's source code, and see that the BigBed example source file
>> has scores of "0".
>> 
>> I'll investigate whether there is an issue in the Perl layer, but it could
>> easily be a limitation in the library itself. Have you considered using a
>> BedGraph file and indexing it with bedGraphToBigWig? I know that the
>> Bio::DB::BigWig interface works perfectly to retrieve and summarize the
>> scores.
>> 
>> Lincoln
>> 
>> 
>> On Sun, Jul 3, 2011 at 5:48 AM, Daniel Lang <
>> Daniel.Lang at biologie.uni-freiburg.de> wrote:
>> 
>>> Hi,
>>> 
>>> quick question about the BigBed adaptor: Is it correct that the bin and
>>> summary functions only return statistics about the number of features in
>>> the defined intervals?
>>> I was expecting them to deliver statistics about the score if the
>>> respective bb file has a defined score field.
>>> If this is true, does this also mean that I cannot plot the distribution
>>> of scores in BigBed files in gbrowse?
>>> 
>>> This is the first time I'm using BigBed, maybe I'm doing something
>>> wrong...
>>> 
>>> I had some trouble formatting the bed files correctly in order to see
>>> the score in the features returned by the Bio::DB::BigBed::features()
>>> routine. It seems the bigbed entries will only have a correctly assigned
>>> score field if you also provide a non-empty name field. Initially I
>>> thought that the order of columns is irrelevant if you use an .as file
>>> in the bedToBigBed call, but that doesn't seem to be the case.
>>> 
>>> Best,
>>> Daniel
>>> --
>>> 
>>> Dr. Daniel Lang
>>> University of Freiburg, Plant Biotechnology
>>> Schaenzlestr. 1, D-79104 Freiburg
>>> fax:        +49 761 203 6945
>>> phone:      +49 761 203 6989
>>> homepage:   http://www.plant-biotech.net/
>>>           http://www.cosmoss.org/
>>> e-mail <http://www.cosmoss.org/e-mail>:
>>> daniel.lang at biologie.uni-freiburg.de
>>> 
>>> #################################################
>>> My software never has bugs.
>>> It just develops random features.
>>> #################################################
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Gmod-gbrowse mailing list
>>> Gmod-gbrowse at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>> 
>> 
>> 
>> 
>> --
>> Lincoln D. Stein
>> Director, Informatics and Biocomputing Platform
>> Ontario Institute for Cancer Research
>> 101 College St., Suite 800
>> Toronto, ON, Canada M5G0A3
>> 416 673-8514
>> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




