[Bioperl-l] Next Gen Formats

Chris Fields cjfields at illinois.edu
Fri Mar 12 14:06:51 UTC 2010


For the color-space FASTA, we could derive a dedicated parser from the current FASTA parser.  The sequences could retain their original color-space designation (maybe via a meta designation), and possibly be converted to base calls using the color-to-base mapping (if the following link is current):

http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/SOLiD_Dibase_Sequencing_and_Color_Space_Analysis.pdf
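
To make the conversion idea concrete, here is a rough sketch in Python (not BioPerl, and not a proposed API). It assumes the usual di-base encoding, where each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3) and the leading letter is the primer base rather than part of the read. The function name `decode_colorspace` is made up for illustration.

```python
# Assumption: standard di-base encoding, color = code(base1) XOR code(base2)
# with A=0, C=1, G=2, T=3. The function name is hypothetical.
BASES = "ACGT"

def decode_colorspace(read):
    """Translate a color-space read (primer base + colors) into bases.

    Each base depends on the previous one, so everything from the first
    uncalled color ('.') onward is reported as 'N'.
    """
    prev = BASES.index(read[0])  # leading primer base, not part of the read
    colors = read[1:]
    out = []
    for i, color in enumerate(colors):
        if color == ".":
            # Cannot decode past an uncalled color; pad the rest with N.
            out.append("N" * (len(colors) - i))
            break
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)

print(decode_colorspace("T210320010.200"))
```

Note that a single miscalled color corrupts every base after it with this naive approach, which is one reason to keep the original color calls around rather than only the converted sequence.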

Did the sequencing facility provide the actual sequence, though, and not just the color calls and qual?  Seems strange to not provide it...

chris

On Mar 12, 2010, at 7:43 AM, Ryan Golhar wrote:

> Direct from sequencing machine
> 
> ------Original Message------
> From: Peter
> Sender: p.j.a.cock at googlemail.com
> To: golharam at umdnj.edu
> Cc: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Next Gen Formats
> Sent: Mar 12, 2010 8:26 AM
> 
> On Fri, Mar 12, 2010 at 1:09 PM, Ryan Golhar <golharam at umdnj.edu> wrote:
>> 
>> Here is an example of a color-space sequence:
>> 
>> In one file (something.csfasta):
>> 
>> >1_30_226_F3
>> T210320010.200.03.0110320320220212200122200.2220200
>> >1_30_252_F3
>> T322220212.133.00.2202322132022202221002011.0011020
>> 
>> The '.' means the color could not be called
>> 
>> In another file (something.qual):
>> 
>> >1_30_226_F3
>> 4 4 27 17 31 7 24 26 13 -1 10 25 14 -1 26 4 -1 19 9 5 6 14 12 6 9 4 4 7 7 20
>> 4 4 19 12 12 4 4 12 10 10 5 4 -1 13 16 8 4 15 4 4
>> >1_30_252_F3
>> 18 4 19 15 9 4 4 5 4 -1 6 4 5 -1 5 6 -1 9 6 4 4 4 6 4 4 4 4 5 8 4 8 7 4 7 5
>> 4 4 10 9 12 8 4 -1 6 5 5 4 10 4 12
>> 
>> The -1 represents those colors that could not be called.
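
For illustration, a minimal Python sketch of reading a .qual file like the one above. It assumes FASTA-style '>' headers, whitespace-separated integer scores that may wrap over several lines, and optional '#' comment lines at the top of the file. Mapping negative scores to 0 is a choice made here, not something the file format prescribes, and `read_qual` is a hypothetical name.

```python
def read_qual(lines, uncalled=0):
    """Yield (name, scores) pairs from FASTA-style .qual lines.

    Scores for one record may span several lines. Negative values
    (uncalled colors, -1 in the example) are replaced by `uncalled`.
    """
    name, scores = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed '#' comment lines
        if line.startswith(">"):
            if name is not None:
                yield name, scores
            name, scores = line[1:].strip(), []
        else:
            for token in line.split():
                q = int(token)
                scores.append(q if q >= 0 else uncalled)
    if name is not None:
        yield name, scores  # emit the final record

example = [
    ">1_30_226_F3",
    "4 4 27 17 31 7 24 26 13 -1",
    "10 25 14 -1",
]
for record_name, quals in read_qual(example):
    print(record_name, quals)
```

The same loop works on an open file handle, since it only iterates over lines.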
> 
> Now that is funny (using -1). True PHRED scores are defined with a
> logarithm and can't be negative. A score of zero is normally used in
> this situation since that maps to a probability of error of 1 (i.e. the
> call is 100% wrong, or 0% true).
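
The point about the logarithm can be checked with a few lines of Python. This sketch uses the standard PHRED definition, Q = -10 * log10(P), where P is the probability that the call is wrong. The function names are just for illustration.

```python
import math

def phred_to_error_prob(q):
    """Error probability implied by a PHRED score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """PHRED score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)

for q in (-1, 0, 10, 20):
    print(q, phred_to_error_prob(q))
```

A score of 0 gives P = 1, while -1 would give a "probability" above 1, so -1 in these files can only be a flag value and not a real PHRED score.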
> 
> Where did these files come from? Direct from a sequencing
> machine or via some third party script?
> 
> Peter
> 
> 




