[Bioperl-l] scf version 2 traces

Anthony Underwood Anthony.Underwood at hpa.org.uk
Thu Oct 9 17:08:31 UTC 2008


Hi all,

A long time ago (March 2004) I had a discussion with Chad about reading
scf files in Bioperl. I noticed then that there may be some problems
with version 2 files. I now code mostly in Ruby and so am contributing
to BioRuby.

I have been writing code to extract trace information from scf files,
based on some code from another BioRubyist for reading ABI files and on
the code in Bioperl. I now have this working, along with a much better
understanding of reading binary files. I believe I have found the bugs
in Bioperl's reading of version 2 scf traces.

In scf.pm, in the _parse_v2_traces method, I believe the lines that push
the samples onto the traces arrays should be as below, since the order
is specified here:
http://staden.sourceforge.net/manual/formats_unix_4.html#SEC4

    push @{$traces->{'a'}}, $read[$offset2];
    push @{$traces->{'t'}}, $read[$offset2+1];
    push @{$traces->{'g'}}, $read[$offset2+3];
    push @{$traces->{'c'}}, $read[$offset2+2];

Also, the $buffer passed to this method from the next_seq method is
incorrect because it is read from the wrong offset. In the next_seq
method, the last of the following lines should be changed:

    $creator->{header} = $self->_get_header($buffer);
    if ($creator->{header}->{'version'} lt "3.00") {
        $self->debug("scf.pm is working with a version 2 scf.\n");
        # first gather the trace information
        $length = $creator->{header}->{'samples'} *
                  $creator->{header}->{sample_size} * 4;
        $buffer = $self->read_from_buffer($fh, $buffer, $length,
                      $creator->{header}->{samples_offset});

to:

        $buffer = $self->read_from_buffer($fh, $buffer, $length,
                      $creator->{header}->{sample_offset});

Note sample_offset, not samples_offset.
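
To make the offset and length arithmetic concrete, here is a minimal
standalone sketch; it is not the Bioperl code itself. It unpacks the
leading fields of a 128-byte SCF header as laid out in the Staden format
description linked above, then works out where the version 2 sample
block starts and how long it is. The helper names parse_scf_header and
trace_block, the spec-style hash keys (which differ from the keys scf.pm
uses internally), and the header values are all made up for
illustration.

```perl
use strict;
use warnings;

# Unpack the leading fields of the 128-byte SCF header (big-endian uint32s).
# Keys follow the field names in the format description; they are NOT
# necessarily the keys Bioperl's scf.pm stores in its own header hash.
sub parse_scf_header {
    my ($raw) = @_;
    die "SCF header needs 128 bytes\n" if length($raw) < 128;
    my %h;
    @h{qw(magic samples samples_offset bases bases_left_clip
          bases_right_clip bases_offset comments_size comments_offset
          version sample_size)} = unpack 'a4 N8 a4 N', $raw;
    return \%h;
}

# Version 2 stores one four-channel record per sample point, so the block
# is samples * sample_size * 4 bytes long and starts at samples_offset.
sub trace_block {
    my ($h) = @_;
    return ($h->{samples_offset}, $h->{samples} * $h->{sample_size} * 4);
}

# Hypothetical header: 1000 sample points, 2 bytes per sample, with the
# samples placed directly after the 128-byte header.
my $raw = pack 'a4 N8 a4 N x84',
    '.scf', 1000, 128, 250, 0, 0, 8128, 0, 11128, '2.00', 2;

my ($offset, $length) = trace_block(parse_scf_header($raw));
printf "read %d bytes starting at byte %d\n", $length, $offset;
# prints: read 8000 bytes starting at byte 128
```

If the hash key used to look up the offset does not exist, Perl yields
undef rather than an error, so a misspelt key can silently read from the
wrong place in the file.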

 

I have tested these corrections against other sequence viewers (Chromas,
FinchTV), and with these changes the output is now correct.

Could these be applied to the live code and included in the next
release?

Thanks,

Anthony

Dr Anthony Underwood
Bioinformatics Unit | Statistics, Modelling and Bioinformatics
Department
Centre for Infections
Health Protection Agency
61 Colindale Avenue
London
NW9 5HT
t: 0208 3276466  f: 0208 3276738  e:anthony.underwood at hpa.org.uk

 



