[Bioperl-l] Bio::SeqIO::scf traces scrambled?

Charles Tilford charles.tilford at bms.com
Thu Jun 18 19:59:01 UTC 2009


Chris Fields wrote:
> Charles,
>
> The best way to make sure this is addressed is to file a ticket (bug  
> report) on it so we can properly track it.
Ok, I'll put that in.
>
> AFAIK this module doesn't use staden::read but is pure Perl.
Yes, that's my understanding too. I'm using the SeqIO module because of
ongoing hiccups with the Staden installation.
> Note: there is also Bio::SCF (non-BioPerl):
>
> http://search.cpan.org/~lds/Bio-SCF-1.01/
>   
I have that installed, but have not tried it out yet.

Thanks!
-CAT
> chris
>
> On Jun 18, 2009, at 8:38 AM, Charles Tilford wrote:
>
>   
>> Nutshell: Bio::SeqIO::scf seems to be mixing up A/C/G/T trace  
>> channels. Can anyone confirm?
>>
>> Hi all,
>>
>> I'm using the SCF Bio::SeqIO module to parse trace data out of
>> chromatograms. The SCF files are being produced by phred using the
>> "-cd" parameter. The traces come out great, and the corresponding base
>> calls from the .phd files align with the peaks wonderfully when I
>> visualize them on a rendered trace. However, only the A bases align
>> to the appropriate trace channel; the rest are mixed up. I find that
>> if I do the following re-mapping, the phred base calls match the
>> traces:
>>
>> SeqIO : Remapped
>> A : A
>> C : G
>> G : T
>> T : C
>>
>> The relevant part of Bio::SeqIO::scf is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE9
>>
>> ... which indicates that it expects the pack()ed trace data to be in  
>> order ATGC. The base call parsing code is here:
>>
>> http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/SeqIO/scf.html#CODE8
>>
>> ... which is unpacking in order ACGT. As far as I can tell, the  
>> relevant official SCF documentation is here:
>>
>> http://staden.sourceforge.net/manual/formats_unix_4.html
>>
>> ... which indicates that both trace and base order should be ACGT
>> (matching the SeqIO unpack() for bases, but not for traces). My
>> empirical remapping implies the order ACTG, which differs from both
>> of the orders above. The sequence in the SCF file (which I think
>> comes from the original AB1 file) is not identical to the one called
>> by phred, but it is very similar, as expected; that is, I don't need
>> to remap C, G and T to get it to align with the phred data.
>>
>> So it looks like the SeqIO module is not mapping the sections of the
>> packed trace data to the appropriate bases. The unpack order differs
>> from the Staden documentation ... but so does the order I impose to
>> correct the problem. I am still unclear about the differences between
>> V2 and V3 of the format. The major difference appears to be whether
>> trace values are encoded absolutely (V2) or relative to prior values
>> (V3); I'd expect that if I were using one version and SeqIO tried to
>> parse it as the other, I would get garbage out. Running in verbose
>> mode reports "scf.pm is working with a version 2 scf."
>>
>> Thoughts on this would be appreciated - can anyone confirm a problem  
>> with trace extraction from SCF?
>>
>> I'm hoping that once I convince our admin to (properly) install
>> staden::read, I can work directly with the ab1 files, but I need
>> to stopgap on SCF for the time being...
>>
>> -CAT
>>     
>
>
>
>   
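As a stopgap, the empirical remapping in the table from the quoted message can be applied after extraction. This is a minimal Python sketch, not BioPerl code. It assumes the four traces have already been pulled out into a dict keyed by the base label that Bio::SeqIO::scf assigned; both that dict layout and the `remap_channels` name are hypothetical.

```python
from typing import Dict, List

# Empirical mapping reported in the quoted message (hypothetical constant):
# label assigned by Bio::SeqIO::scf -> label that matches the phred calls.
SEQIO_TO_CORRECTED = {"A": "A", "C": "G", "G": "T", "T": "C"}


def remap_channels(traces: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return a copy of `traces` keyed by the corrected base labels."""
    return {SEQIO_TO_CORRECTED[label]: list(values)
            for label, values in traces.items()}


seqio_traces = {"A": [0, 5], "C": [1, 6], "G": [2, 7], "T": [3, 8]}
print(remap_channels(seqio_traces))
# {'A': [0, 5], 'G': [1, 6], 'T': [2, 7], 'C': [3, 8]}
```

Because the mapping is a permutation, no channel is lost. If the parser is fixed upstream, this step should simply be dropped.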

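To check the trace channel order independently of Bio::SeqIO::scf, the V2 sample block can be decoded directly. This Python sketch follows one reading of the Staden SCF description linked in the quoted message. It assumes a big-endian file in which each V2 sample point stores the A, C, G and T values consecutively, each 1 or 2 bytes wide according to the header's sample-size field. The function name, and passing the raw block plus header values as arguments, are illustrative assumptions; the 128-byte header itself is not parsed here.

```python
import struct
from typing import Dict, List

CHANNELS = "ACGT"  # channel order given in the Staden SCF documentation


def unpack_v2_samples(block: bytes, num_samples: int,
                      sample_size: int) -> Dict[str, List[int]]:
    """Split an interleaved SCF V2 sample block into per-base traces.

    Assumes big-endian values stored point by point as A, C, G, T,
    each 1 byte (sample_size=1) or 2 bytes (sample_size=2) wide.
    """
    code = {1: "B", 2: "H"}[sample_size]
    fmt = f">{4 * num_samples}{code}"
    values = struct.unpack_from(fmt, block)
    # Value i of each 4-value point belongs to CHANNELS[i]; step through by 4.
    return {base: list(values[i::4]) for i, base in enumerate(CHANNELS)}


raw = struct.pack(">8H", 10, 20, 30, 40, 11, 21, 31, 41)
print(unpack_v2_samples(raw, num_samples=2, sample_size=2))
# {'A': [10, 11], 'C': [20, 21], 'G': [30, 31], 'T': [40, 41]}
```

Setting `CHANNELS = "ATGC"` reproduces the trace order the quoted message attributes to Bio::SeqIO::scf, which makes it easy to compare the two readings on the same file.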

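The base-call side can be checked the same way. This Python sketch assumes the V2 base record layout described in the Staden SCF documentation as we understand it: a 12-byte big-endian record holding a 32-bit peak index, four 8-bit probabilities in A, C, G, T order, one ASCII base character and three spare bytes. `BASE_RECORD`, `BaseCall` and `unpack_v2_bases` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

# Assumed 12-byte V2 base record: uint32 peak index, four uint8
# probabilities (A, C, G, T), one ASCII base call, 3 skipped spare bytes.
BASE_RECORD = struct.Struct(">I4Bc3x")


class BaseCall(NamedTuple):
    peak_index: int
    prob_a: int
    prob_c: int
    prob_g: int
    prob_t: int
    base: str


def unpack_v2_bases(block: bytes, num_bases: int) -> List[BaseCall]:
    """Decode consecutive V2 base records from a raw byte block."""
    calls = []
    for i in range(num_bases):
        peak, pa, pc, pg, pt, base = BASE_RECORD.unpack_from(
            block, i * BASE_RECORD.size)
        calls.append(BaseCall(peak, pa, pc, pg, pt, base.decode("ascii")))
    return calls


raw = BASE_RECORD.pack(42, 0, 0, 9, 0, b"G") + BASE_RECORD.pack(57, 12, 0, 0, 0, b"A")
for call in unpack_v2_bases(raw, 2):
    print(call.base, call.peak_index)
# G 42
# A 57
```

Comparing each call's `base` with whichever trace channel is tallest at its `peak_index` is one way to confirm a suspected channel mix-up on real files.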

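On the V2/V3 question: as we understand the Staden format and its io_lib implementation, V3 stores the four channels one after another rather than interleaved, and stores each channel as second-order differences. Decoding one channel is then two running sums, wrapping at 16 bits for 2-byte samples. A minimal Python sketch under those assumptions (`undelta_v3` is a hypothetical name):

```python
from typing import List


def undelta_v3(deltas: List[int], bits: int = 16) -> List[int]:
    """Undo second-order delta encoding of one V3 trace channel.

    Assumption: stored values are differences of differences, and
    arithmetic wraps at 2**bits like unsigned C integers.
    """
    mask = (1 << bits) - 1
    out: List[int] = []
    first = 0  # running sum 1 recovers the first differences
    value = 0  # running sum 2 recovers the sample values
    for d in deltas:
        first = (first + d) & mask
        value = (value + first) & mask
        out.append(value)
    return out


# Samples 5, 7, 12, 12 have second differences 5, -3, 3, -5,
# stored as unsigned 16-bit values.
print(undelta_v3([5, 65533, 3, 65531]))
# [5, 7, 12, 12]
```

Reading such differences as if they were V2 absolute values would likely give spiky, meaningless traces rather than cleanly swapped peaks. That fits the expectation in the quoted message that a version mismatch would produce garbage.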