[Biopython-dev] [Bug 2382] Generic FASTA parser

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Oct 16 21:58:38 UTC 2007


http://bugzilla.open-bio.org/show_bug.cgi?id=2382





------- Comment #5 from jflatow at northwestern.edu  2007-10-16 17:58 EST -------
Nope, they actually have a file format that looks like this:

Position        Consensus       Quality Score   Depth   Signal  StdDeviation
>contig00001    1
1       G       64      2       1.00    0.00
2       A       64      2       1.00    0.00
3       G       64      2       1.00    0.00
4       A       64      2       1.00    0.00
5       G       64      2       2.00    0.00
6       G       64      2       2.00    0.00
7       A       64      2       3.00    0.00
8       A       64      2       3.00    0.00
9       A       64      2       3.00    0.00
10      C       64      2       2.00    0.00
11      C       64      2       2.00    0.00
12      T       64      2       1.00    0.00
13      C       64      2       3.00    0.00
14      C       64      2       3.00    0.00
15      C       64      2       3.00    0.00
16      G       64      2       1.00    0.00
17      T       64      2       1.00    0.00
18      G       64      2       1.00    0.00
19      A       64      2       1.00    0.00
20      T       64      2       1.00    0.00
21      C       64      2       2.00    0.00
22      C       64      2       2.00    0.00

Note the file-wide header at the top of the file. A generic FASTA-like parser
might simply skip to the first '>'. Alternatively, we could strip the header
beforehand, but it would be nice if the parser handled it on its own.
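
To make the "skip to the first '>'" idea concrete, here is a minimal sketch in
Python. It is not an existing Biopython API: the function name
iter_fasta_like is hypothetical, and the sample text is a short excerpt of the
file above with whitespace-separated columns assumed.

```python
import io


def iter_fasta_like(handle):
    """Yield (title, body_lines) for each '>' record in a FASTA-like file.

    Hypothetical helper, not part of Biopython. Any lines before the first
    '>' (such as a file-wide column header) are silently skipped.
    """
    title = None
    body = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if title is not None:
                yield title, body
            title = line[1:].strip()
            body = []
        elif title is not None and line.strip():
            # Non-blank line inside a record; keep it unparsed.
            body.append(line)
    if title is not None:
        yield title, body


# Excerpt of the sample file above (column separators assumed to be whitespace).
sample = """\
Position        Consensus       Quality Score   Depth   Signal  StdDeviation
>contig00001    1
1       G       64      2       1.00    0.00
2       A       64      2       1.00    0.00
"""

records = list(iter_fasta_like(io.StringIO(sample)))
for title, body in records:
    # Split each body line into its whitespace-separated columns.
    rows = [row.split() for row in body]
    print(title.split()[0], rows)
```

The generator leaves the body lines unparsed, so the same loop could serve
other '>'-delimited formats, with format-specific code layered on top.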

Also, here is another sample FASTA-like file format they use for pairwise
alignments:

>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268   (29/29 ident)
         2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
         1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 2..29 of 95 and ERSGEES01DMS5T, 1..28 of 259   (28/28 ident)
         2 CGGTGACCCGGGAGATCTGAATTCCTGG 29
         1 CGGTGACCCGGGAGATCTGAATTCCTGG 28
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232   (28/28 ident)
        29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
       205 CCAGGAATTCAGATCTCCCGGGTCACCG 232
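
As a rough illustration of how the alignment format above could be read, the
sketch below assumes each record is a '>' title line followed by lines of the
form "start sequence end". The function name iter_pair_alignments is
hypothetical, not an existing Biopython function, and the title line is left
unparsed.

```python
import io


def iter_pair_alignments(handle):
    """Yield (title, rows) per record; rows are (start, sequence, end) tuples.

    Hypothetical helper. Assumes every non-title line inside a record has
    exactly three whitespace-separated fields: start, sequence, end.
    """
    title = None
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record; emit the previous one, if any.
            if title is not None:
                yield title, rows
            title = line[1:].strip()
            rows = []
        elif title is not None:
            start, seq, end = line.split()
            rows.append((int(start), seq, int(end)))
    if title is not None:
        yield title, rows


# Two records copied from the sample above.
sample = """\
>ERSGEES01EM5WC, 2..30 of 95 and ERSGEES01C1ZV2, 1..29 of 268   (29/29 ident)
         2 CGGTGACCCGGGAGATCTGAATTCCTGGT 30
         1 CGGTGACCCGGGAGATCTGAATTCCTGGT 29
>ERSGEES01EM5WC, 29..2 of 95 and ERSGEES01D8GDV, 205..232 of 232   (28/28 ident)
        29 CCAGGAATTCAGATCTCCCGGGTCACCG 2
       205 CCAGGAATTCAGATCTCCCGGGTCACCG 232
"""

records = list(iter_pair_alignments(io.StringIO(sample)))
for title, rows in records:
    print(title)
    for start, seq, end in rows:
        # start > end indicates the reverse-orientation case (e.g. 29..2).
        print("  ", start, seq, end)
```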




