[Biojava-l] concatenating chromatograms
Andy Yates
ady at sanger.ac.uk
Thu Feb 2 11:23:54 EST 2006
Throwing my opinion into the ring on this, I've got to agree with Russ
here. I would think that SCF is a more sensible format for this kind of
procedure, but there is the added bonus that the SCF parser does not
encode delta-delta values, which the SCF specification is completely
dependent on.
SCF does have the advantage that nothing "really" assumes anything about
them, so you can fiddle about with the chromatogram and, so long as the
things you create in the output Chromatogram are normalised with respect
to the cuts, everything should be hunky dory.
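As a rough illustration of keeping things normalised with respect to the
cuts, here is a minimal sketch on plain int arrays. It does not use the
BioJava Chromatogram API; the class TraceConcat and its method names are
made up for this example, and it assumes one channel at a time with
base-call positions stored as sample indices into the trace.

```java
import java.util.Arrays;

// Hypothetical helper, not part of BioJava: joins two traces held as plain arrays.
final class TraceConcat {

    /** Appends the samples of trace b to the samples of trace a (one channel). */
    static int[] concatSamples(int[] a, int[] b) {
        int[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /**
     * Appends the base-call peak positions of the second trace to those of the
     * first, shifting each one by the number of samples in the first trace so
     * that it still points at the right sample in the joined trace.
     */
    static int[] concatPeakOffsets(int[] offsetsA, int[] offsetsB, int samplesInA) {
        int[] out = Arrays.copyOf(offsetsA, offsetsA.length + offsetsB.length);
        for (int i = 0; i < offsetsB.length; i++) {
            out[offsetsA.length + i] = offsetsB[i] + samplesInA;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] first = {10, 20, 30};
        int[] second = {40, 50};
        System.out.println(Arrays.toString(concatSamples(first, second)));
        // Peaks at sample 1 in the first trace and samples 0 and 1 in the second.
        System.out.println(Arrays.toString(
                concatPeakOffsets(new int[] {1}, new int[] {0, 1}, first.length)));
    }
}
```

The same shift would apply to anything else indexed by sample position. The
called bases themselves (and any quality values) are per-base rather than
per-sample, so they can simply be appended.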
If you're doing this for space concerns, can I suggest passing the SCF
files through a compression filter? You get the best results with the
BZIP2 compression algorithm (the format was developed for bzip
compression), but GZIP works really well and is the choice of compression
format here at the Sanger Centre.
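For what it's worth, a compression filter of this sort can be sketched in
Java with the gzip streams in java.util.zip. This is only a sketch: the
class name ScfGzip and its methods are made up for the example, and it uses
gzip rather than bzip2 because the JDK standard library has no bzip2 stream.
It treats the SCF file as opaque bytes and knows nothing about the format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip and gunzip a trace file as plain bytes.
final class ScfGzip {

    /** Writes a gzip-compressed copy of the file at in to the path out. */
    static void compress(Path in, Path out) throws IOException {
        try (OutputStream dst = new GZIPOutputStream(Files.newOutputStream(out))) {
            Files.copy(in, dst);
        }
    }

    /** Writes the decompressed content of the gzip file at in to the path out. */
    static void decompress(Path in, Path out) throws IOException {
        try (InputStream src = new GZIPInputStream(Files.newInputStream(in))) {
            Files.copy(src, out, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A reader that accepts an InputStream could be handed the GZIPInputStream
directly, which avoids writing the decompressed file to disk at all.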
Hope that helps,
Andy Yates
~~~~~~~~~~~~~~~
Senior Computer Biologist,
Cancer Genome Project.
Wellcome Trust Sanger Institute,
Hinxton, Cambridge
Russ Kepler wrote:
> On Wednesday 01 February 2006 11:41 pm, Heather Kent wrote:
>> I would like to write a small application that would concatenate abi or scf
>> chromatograms and write out a new chromatogram file..
>> has anyone done something similar to this or seen any code that would be
>> helpful for me, i am new at programming
>> and have been looking through the Biojava API
>
> I'm familiar with the ABI trace code and what you want to do would not be
> difficult, but the result may not work the way that you want it to. A
> basecaller will likely be fooled in the transition between the traces and
> miscall or call no peaks for some time unless you match the local frequencies
> of each trace around the transition, and tagging the start of one run to the
> end of the other is a pretty good way to not do that.
>
> If you're not going to run things through a basecaller all you really need to
> do is to catenate the trace and basecalls arrays and sequences. These are
> all exposed in gets(). If the data is coming from a newish AB instrument you
> may want to add code to handle the Q values from the KB caller and catenate
> those arrays as well.
>
> Writing the new file would be a new capability, but the existing reader should
> show you the way to do it.
> _______________________________________________
> Biojava-l mailing list - Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l