[Bioperl-l] ABI.pm and .ab1 files
Malay
mbasu at mail.nih.gov
Wed Apr 28 21:40:10 EDT 2004
Kevin Roland Viel wrote:
> Malay,
>
> Thanks. I diverted some attention to perl, but just not enough :(
>
> This still frustrates me, but I have found a kludge. I use phred so I
> just added the options -cd <directory> -cp 2. The SCF file produced has a 128
> byte header followed by the data of interest. I read this file directly
> using SAS (using the S370FPIB2. informat). The file is in big endian
> format. If anyone following this thread might suggest where and how
> the ab1 format stores these data, I would be very thankful as it would
> save space (why have a .scf and .ab1 file?).
Here you go:
An ABI file can start with a 128-byte header (if it was generated on a
Mac). So check bytes 0-3 and bytes 128-131 for the string "ABIF". If the
Mac header is present, add 128 to every byte position used in the
calculations below.
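In Python, that check might look like the sketch below. It is only an illustration: the function name find_abif_base is made up, the whole file is assumed to be in memory as bytes, and the signature is taken to be the 4 bytes "ABIF".

```python
def find_abif_base(data: bytes) -> int:
    """Return 0 for a plain file, 128 if a Mac header precedes the data."""
    for base in (0, 128):
        # The signature fills bytes base .. base+3.
        if data[base:base + 4] == b"ABIF":
            return base
    raise ValueError("no ABIF signature at byte 0 or byte 128")
```

The returned value is the amount to add to every byte position that follows.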
Bytes 18-21 hold a 32-bit integer; read it as N, the number of index
entries.
Bytes 26-29 hold the offset of the index table. Read that number (A).
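A sketch of those two reads in Python, assuming the integers are big-endian (as they are in ABI files) and that `base` is 0 or 128 depending on the Mac header. The function name read_index_location is hypothetical.

```python
import struct

def read_index_location(data: bytes, base: int = 0):
    """Return (N, A): the entry count and the position of the index table."""
    # ">i" means one big-endian 32-bit integer.
    (n_entries,) = struct.unpack_from(">i", data, base + 18)
    (table_offset,) = struct.unpack_from(">i", data, base + 26)
    # Shift the stored offset by the Mac header, if there is one.
    return n_entries, base + table_offset
```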
Go to position A and loop N times, reading 28 bytes each time. On each
pass, check whether the first 4 bytes are the string "DATA" and
increment a counter whenever they are. DATA segments 9 - 12 contain the
addresses of the traces. We don't know which segment represents which
base. To find out, you also have to look for the 28-byte segment
starting with "FWO_". The address of each segment is given as a 32-bit
integer in bytes 20-23 of its 28-byte segment. Note all five
addresses. In each 28-byte DATA segment, bytes 12-15 represent a 32-bit
integer containing the number of points in the trace.
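The loop could be sketched in Python as follows. The names scan_index and ENTRY_SIZE are made up for illustration. Two assumptions: the integers are big-endian, and the point count is read from bytes 12-15 of the entry, which is where the ABIF index entry layout keeps the number of elements.

```python
import struct

ENTRY_SIZE = 28  # each index entry is 28 bytes long

def scan_index(data: bytes, table_offset: int, n_entries: int, base: int = 0):
    """Collect DATA segments 9-12 and the raw 4-byte field of FWO_."""
    traces = {}    # DATA counter (9..12) -> (number of points, address)
    fwo = None     # bytes 20-23 of the FWO_ entry, if found
    data_seen = 0
    for i in range(n_entries):
        start = table_offset + i * ENTRY_SIZE
        entry = data[start:start + ENTRY_SIZE]
        tag = entry[0:4]
        if tag == b"DATA":
            data_seen += 1
            if 9 <= data_seen <= 12:
                (n_points,) = struct.unpack_from(">i", entry, 12)
                (address,) = struct.unpack_from(">i", entry, 20)
                # Addresses shift by the Mac header, if there is one.
                traces[data_seen] = (n_points, base + address)
        elif tag == b"FWO_":
            fwo = entry[20:24]
    return traces, fwo
```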
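For "FWO_" there is nothing to follow: the four base letters fit in 4
bytes, so they are stored directly in bytes 20-23 of its segment. Read
those 4 bytes, each containing a base. For example, if you read A, then
G, then C, then T, DATA segments 9 - 12 are in that same order.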
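As a small Python illustration of that mapping (bases_to_segments is a made-up name, and `fwo` stands for the 4 bytes read for "FWO_"):

```python
def bases_to_segments(fwo: bytes) -> dict:
    """Map each base letter to its DATA segment number (9-12)."""
    order = fwo.decode("ascii")  # e.g. "AGCT"
    # The first letter belongs to DATA 9, the second to DATA 10, and so on.
    return {base: 9 + i for i, base in enumerate(order)}
```

With the example above, reading A, G, C, T pairs A with DATA 9, G with DATA 10, C with DATA 11 and T with DATA 12.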
Now go to the address of each DATA segment (segments 9 - 12) and read a
series of 16-bit integers, as many as the length recorded for that DATA
segment.
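A Python sketch of that last read. It assumes the values are big-endian signed 16-bit integers; if your files turn out to hold unsigned values, swap "h" for "H". The name read_trace is hypothetical.

```python
import struct

def read_trace(data: bytes, address: int, n_points: int) -> list:
    """Read n_points big-endian 16-bit values starting at address."""
    # ">{n}h" unpacks n consecutive big-endian signed 16-bit integers.
    return list(struct.unpack_from(f">{n_points}h", data, address))
```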
There you are: you have your trace values. :)
>
> For what it is worth, I have attached a gif of my subregion. I have found
> it very useful and very informative for review.
>
> Regards,
>
> Kevin
>
> Kevin Viel
> Department of Epidemiology
> Rollins School of Public Health
> Emory University
> Atlanta, GA 30322
>
> On Wed, 28 Apr 2004, Malay wrote:
>
>
>>Here is a way to do this outside of BioPerl:
>>
>>Download and install ABI.pm
>>
>>http://cpan.uwinnipeg.ca/cpan/authors/id/M/MA/MALAY/ABI-0.01.tar.gz
>>
>>use ABI;
>>
>>my $abi = ABI->new(-file=>"mysequence.abi");
>>my $seq = $abi->get_sequence(); # To get the sequence
>>my @trace_a = $abi->get_trace("A"); # Get the raw traces for "A"
>>my @trace_g = $abi->get_trace("G"); # Get the raw traces for "G"
>>my @base_calls = $abi->get_base_calls(); # Get the base calls
>>
>>Malay
>>malay at mail.nih.gov
>>
>
>
I hope this helped.
Malay
mbasu at mail.nih.gov