[Bioperl-l] Re: Question on the scf file format
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Mon Oct 27 11:24:52 EST 2003
Guillaume,
In other words: Tony's fixes are only in the CVS head and in the developer
releases on our website. They never made it into the 1.2 release
series. It was an oversight, for which I apologise profusely.
Please install 1.3.02 from
http://bioperl.org/DIST/current_core_unstable.tar.gz
and try again.
Yours,
-Heikki
On Mon, 2003-10-27 at 15:38, Tony Cox wrote:
> On Mon, 27 Oct 2003, Guillaume Giraudon wrote:
>
> Hi Guillaume,
>
> This looks like it may be the "cast" you have to apply if you are using
> 8-bit/16-bit data values. Check the code I added to the Bioperl SCF.pm module to
> make this work properly for all SCF files.
>
> Tony
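A minimal Python sketch of one plausible reading of the "cast" mentioned above: the raw sample bytes are read as big-endian integers and reinterpreted as signed values. This is an illustration only, not the code in the Bioperl SCF.pm module; the function name `to_signed` is hypothetical.

```python
import struct

def to_signed(raw, sample_size):
    """Reinterpret raw SCF sample bytes as signed big-endian integers.

    sample_size is the header field: 1 for 8-bit, 2 for 16-bit samples.
    (Hypothetical helper, written for illustration.)
    """
    if sample_size == 1:
        fmt = ">%db" % len(raw)          # signed 8-bit values
    elif sample_size == 2:
        fmt = ">%dh" % (len(raw) // 2)   # signed 16-bit values, big-endian
    else:
        raise ValueError("unsupported sample_size: %r" % sample_size)
    return list(struct.unpack(fmt, raw))

# The four bytes found at offset 0x80 in the hex dump quoted in this thread.
print(to_signed(bytes.fromhex("00A0FF60"), 2))   # [160, -160]
```

Read this way, the bytes FF 60 are -160 rather than 65376.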
>
>
>
> +>Hi Jason, Hi Tony, Hi Heikki
> +>
> +>I'm sorry to bother you with this simple question, but I saw one of your source files (scf.pm) on the web and thought you might be able to help me out on this matter: I am trying to write a web-based SCF file viewer (in PHP). I came across a lot of documents that all seem to be based on the RFC I found at
> +>http://www.mrc-lmb.cam.ac.uk/pubseq/scf-rfc.html
> +>
> +>I have attached a zip file of the files I'm working with so that you might take a look at them. I'm comparing the results I get from my program with what Chromas (v1.45) gives me. So far, I believe I'm parsing the header correctly. What I get makes sense :
> +>
> +>scf_header Object
> +>(
> +> [magic_number] => 779314022
> +> [samples] => 10934
> +> [samples_offset] => 128
> +> [bases] => 899
> +> [bases_left_clip] => 0
> +> [bases_right_clip] => 0
> +> [bases_offset] => 87600
> +> [comments_size] => 364
> +> [comments_offset] => 98388
> +> [version] => 3.00
> +> [sample_size] => 2
> +> [code_set] => 0
> +> [private_size] => 0
> +> [private_offset] => 0
> +> [spare] =>
> +>)
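For comparison, a short Python sketch that unpacks the same 128-byte header. The field order is taken from the dump above; the field widths (big-endian 32-bit integers, a 4-byte version string, 18 spare words) are assumed from the SCF format description. `parse_header`, `HEADER_FORMAT` and `FIELDS` are names made up for this example.

```python
import struct

# Assumed layout: 9 uint32 fields, a 4-byte version string, 4 more uint32
# fields, then 18 spare uint32 words, for 128 bytes in total.
HEADER_FORMAT = ">9I4s4I18I"
FIELDS = (
    "magic_number", "samples", "samples_offset", "bases",
    "bases_left_clip", "bases_right_clip", "bases_offset",
    "comments_size", "comments_offset", "version",
    "sample_size", "code_set", "private_size", "private_offset",
)

def parse_header(data):
    """Return the named header fields as a dict (spare words are ignored)."""
    values = struct.unpack(HEADER_FORMAT, data[:128])
    header = dict(zip(FIELDS, values))   # zip stops before the spare words
    header["version"] = header["version"].decode("ascii")
    return header

# First 48 bytes copied from the hex dump in this thread; the remaining
# 80 header bytes are all zero in that dump.
raw = bytes.fromhex(
    "2E73636600002AB60000008000000383"
    "0000000000000000000156300000016C"
    "00018054332E30300000000200000000"
) + bytes(80)

header = parse_header(raw)
print(header["samples"], header["bases_offset"], header["version"])
# 10934 87600 3.00
```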
> +>
> +>Now when I start parsing the Samples section, I get confused. From what I can gather, it's composed of delta differences between each sample (and not the values themselves, as I originally thought).
> +>
> +>Strangely, I believe I'm calculating my offsets fine, because the very first values of A, C, G and T all match what I have in the raw_data.txt file (exported with Chromas). But I can't seem to read the rest of the samples correctly.
> +>
> +>Here is a little HEX extract from the scf file :
> +>
> +>00000000h: 2E 73 63 66 00 00 2A B6 00 00 00 80 00 00 03 83 ; .scf..*¶...€...ƒ
> +>00000010h: 00 00 00 00 00 00 00 00 00 01 56 30 00 00 01 6C ; ..........V0...l
> +>00000020h: 00 01 80 54 33 2E 30 30 00 00 00 02 00 00 00 00 ; ..€T3.00........
> +>00000030h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000040h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000050h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000060h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000070h: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>00000080h: 00 A0 FF 60 00 00 00 10 00 03 FF FE FF FD FF FB ; . ÿ`......ÿþÿýÿû
> +>00000090h: FF FE FF FA FF FB FF FD FF FB FF FF FF FF 00 00 ; ÿþÿúÿûÿýÿûÿÿÿÿ..
> +>000000a0h: FF FF 00 01 00 00 FF FD 00 01 FF FD 00 03 00 00 ; ÿÿ....ÿý..ÿý....
> +>000000b0h: 00 04 00 01 00 03 00 01 00 03 00 00 00 02 FF FF ; ..............ÿÿ
> +>000000c0h: 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 ; ................
> +>
> +>The first 128 bytes are the header and I seem to read that part fine. My samples_offset is 128 (80h) and my sample_size is 2. At that location (80h), the first value is '00 A0', which is 160 in decimal. That's exactly my first value, so that looks about right. But then I have FF 60. Is this supposed to be a delta?
> +>I thought it could be, but then, if it's an unsigned value, it's a bit huge.
> +>I tried to consider it as a signed value, but then again, I can't seem to get the same thing as Chromas.
> +>
> +>Here is a short extract of what I get :
> +>
> +>   A     C     G     T
> +> 160     0    72     6
> +>-160     0   -84    -8
> +>   0     0    -5    -1
> +>  16     0     2     2
> +>   3     0     3     1
> +>  -2     0     2     0
> +>  -3     0     5     0
> +>  -5     0     4     0
> +>  -2     0     1     0
> +>  -6     0     0     0
> +>  -5     0     0     0
> +>  -3     0     0     0
> +>  -5     0     0     0
> +>  -1     0     0     0
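The signed values in the table are consistent with a second-order difference ("delta of deltas"), which is how the SCF format description says version 3 files store each trace. A minimal Python sketch of undoing that, assuming this encoding and 16-bit wrap-around arithmetic: take a running sum twice. The function name `undelta` is hypothetical, and the output has not been checked against the Chromas export.

```python
def undelta(deltas, sample_size=2):
    """Undo a second-order difference by taking a running sum twice.

    Sums wrap modulo 2**(8 * sample_size), mimicking unsigned 8/16-bit
    storage (an assumption of this sketch).
    """
    mask = (1 << (8 * sample_size)) - 1
    values = list(deltas)
    for _ in range(2):                 # two passes: delta-delta -> delta -> value
        total = 0
        for i, v in enumerate(values):
            total = (total + v) & mask
            values[i] = total
    return values

# Column A from the table above (the signed 16-bit readings).
a_column = [160, -160, 0, 16, 3, -2, -3, -5, -2, -6, -5, -3, -5, -1]
print(undelta(a_column))
# [160, 160, 160, 176, 195, 212, 226, 235, 242, 243, 239, 232, 220, 207]
```

The first value stays 160, which agrees with the observation that only the very first sample of each trace matched.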
> +>
> +>Any idea of what I might be doing wrong ?
> +>
> +>Thank you in advance,
> +>
> +>G.Giraudon
> +>
> +>
> +>
>
> ******************************************************
> Tony Cox Email:avc at sanger.ac.uk
> Sanger Institute WWW:www.sanger.ac.uk
> Wellcome Trust Genome Campus Head,Software Services
> Hinxton Tel: +44 1223 834244
> Cambs. CB10 1SA Fax: +44 1223 494919
> ******************************************************
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________