[Biojava-l] concatenating chromatograms

Andy Yates ady at sanger.ac.uk
Thu Feb 2 11:23:54 EST 2006


Throwing my opinion into the ring on this, I've got to agree with Russ 
here. I would think that SCF is a more sensible format for this kind of 
procedure, but there is the added bonus that the SCF parser does not 
encode delta-delta values, which the SCF specification is completely 
dependent on.

SCF files do have the advantage that nothing "really" assumes anything 
about them, so you can fiddle about with the chromatogram and, so long as 
the things you create in the output Chromatogram are normalised with 
respect to the cuts, everything should be hunky dory.

If you're doing this for space concerns, can I suggest passing the SCF 
files through a compression filter? You get the best results with 
BZIP2 compression (the format was developed for bzip compression), but 
GZIP works really well and is the compression format of choice here at 
the Sanger Centre.
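
As a rough sketch of the GZIP route, using only the standard java.util.zip 
classes (BZIP2 is not in the standard library, so it is left out here). 
The class and method names are my own for illustration; nothing here is 
BioJava API, and the file names in main are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: ScfGzip is a hypothetical helper, not a BioJava class.
final class ScfGzip {

    // Compress the raw bytes of a trace file with GZIP.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed by now, so the GZIP trailer has been written.
        return buffer.toByteArray();
    }

    // Recover the original bytes from GZIP-compressed data.
    static byte[] gunzip(byte[] packed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names: pass your own as arguments.
        String in = args.length > 0 ? args[0] : "trace.scf";
        String outName = args.length > 1 ? args[1] : in + ".gz";
        byte[] raw = Files.readAllBytes(Paths.get(in));
        Files.write(Paths.get(outName), gzip(raw));
    }
}
```

To read such a file back, run its bytes through gunzip before handing 
them to whatever SCF parser you are using.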

Hope that helps,

Andy Yates
~~~~~~~~~~~~~~~
Senior Computer Biologist,
Cancer Genome Project.

Wellcome Trust Sanger Institute,
Hinxton, Cambridge

Russ Kepler wrote:
> On Wednesday 01 February 2006 11:41 pm, Heather Kent wrote:
>> I would like to write a small application that would concatenate ABI or SCF
>> chromatograms and write out a new chromatogram file.
>> Has anyone done something similar to this or seen any code that would be
>> helpful for me? I am new at programming
>> and have been looking through the BioJava API.
> 
> I'm familiar with the ABI trace code and what you want to do would not be 
> difficult, but the result may not work the way that you want it to.  A 
> basecaller will likely be fooled in the transition between the traces and 
> miscall or call no peaks for some time unless you match the local frequencies 
> of each trace around the transition, and tagging the start of one run to the 
> end of the other is a pretty good way to not do that.
> 
> If you're not going to run things through a basecaller all you really need to 
> do is catenate the trace and basecall arrays and sequences.  These are 
> all exposed in gets().  If the data is coming from a newish AB instrument you 
> may want to add code to handle the Q values from the KB caller and catenate 
> those arrays as well.
> 
> Writing the new file would be a new capability, but the existing reader should 
> show you the way to do it.
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
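
To make Russ's suggestion concrete, here is a minimal sketch of 
catenating two traces once the arrays have been pulled out of the 
parsed files. It works on plain Java arrays and deliberately does not 
use any BioJava accessor names; TraceConcat and its methods are 
hypothetical. It assumes each channel (A, C, G, T) is an int array of 
samples and that each basecall carries an offset into the sample 
arrays, so the second file's offsets have to be shifted by the length 
of the first trace.

```java
import java.util.Arrays;

// Illustrative only: TraceConcat is a hypothetical helper, not a BioJava class.
final class TraceConcat {

    // Join two sample arrays for the same channel end to end.
    static int[] concatSamples(int[] first, int[] second) {
        int[] joined = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    // Join basecall offsets, shifting the second file's offsets so that
    // they still point at the right samples in the joined trace.
    static int[] concatOffsets(int[] firstOffsets, int[] secondOffsets,
                               int firstTraceLength) {
        int[] joined = Arrays.copyOf(firstOffsets,
                firstOffsets.length + secondOffsets.length);
        for (int i = 0; i < secondOffsets.length; i++) {
            joined[firstOffsets.length + i] = secondOffsets[i] + firstTraceLength;
        }
        return joined;
    }

    // The called sequences simply go one after the other.
    static String concatBases(String firstBases, String secondBases) {
        return firstBases + secondBases;
    }

    public static void main(String[] args) {
        int[] traceA1 = {0, 5, 9, 5, 0};
        int[] traceA2 = {1, 8, 1};
        System.out.println(Arrays.toString(concatSamples(traceA1, traceA2)));
        System.out.println(Arrays.toString(
                concatOffsets(new int[]{2}, new int[]{1}, traceA1.length)));
        System.out.println(concatBases("A", "A"));
    }
}
```

The same concatSamples call would be applied to each of the four 
channels, and to per-base quality arrays if the files carry them. This 
does nothing about the basecaller problem Russ describes at the join.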

