[Biojava-l] Biojava Parsers : Apply quality values for contig ?

Ashika Umanga Umagiliya aumanga at biggjapan.com
Wed Feb 25 07:44:24 UTC 2009


Greetings all,


I am using phred/phrap to assemble DNA sequences. For each assembly, 
phrap generates a contig file and a contig quality file.
I want to parse these two files and generate the final contig by 
masking every base whose quality value is 0.

For example:
CGACTATG + 0 42 54 59 48 0 0 0 > _GACT___
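
To make the intent concrete, here is a minimal sketch of the masking in 
plain Java. It does not use any BioJava classes; the class name 
QualityMask and the method mask are hypothetical names chosen only for 
illustration, and it assumes there is exactly one quality value per base.

```java
/** Hypothetical helper (not part of BioJava): masks bases that have quality 0. */
final class QualityMask {

    private QualityMask() {
        // static utility, no instances
    }

    /**
     * Returns a copy of {@code bases} in which every position whose quality
     * value is 0 is replaced by {@code maskChar}. All other bases are kept.
     *
     * Assumption: qualities[i] is the quality of bases.charAt(i).
     */
    static String mask(String bases, int[] qualities, char maskChar) {
        if (bases.length() != qualities.length) {
            throw new IllegalArgumentException(
                "Expected one quality value per base, got " + bases.length()
                + " bases and " + qualities.length + " quality values");
        }
        StringBuilder out = new StringBuilder(bases.length());
        for (int i = 0; i < bases.length(); i++) {
            // Keep the base unless its quality is exactly 0.
            out.append(qualities[i] == 0 ? maskChar : bases.charAt(i));
        }
        return out.toString();
    }
}
```

Calling QualityMask.mask("CGACTATG", new int[] {0, 42, 54, 59, 48, 0, 0, 0}, '_') 
on the example above returns one output character per input base, with 
the first base and the last three replaced by '_'.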

The reason I want to do this is that only this masking gives a contig 
similar to the one generated by ChromasPro.

I can use the FASTA parser to parse the contig file, but I wonder whether 
there is any way to parse the quality file in BioJava.

Below I have given the structure of the two file types:

thanks in advance,
Umanga



contig file:
------------
>seqs_fasta.Contig1
TTGGAGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACA
CATGCAAGTCGAACGGTAACAGGAAGCAGCTTGCTGCTTTGCTGACGAGT
GGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGGGATAAC
TACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGG
GACCTTCGGGCCTCTTGCCATCGGATGTGCCCAGATGGGATTAGCTAGTA
GGTGGGGTAACGGCTCACCTAGGCGACGATCCCTAGCTGGTCTGAGAGGA
TGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCA
GCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGC
GTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAG
GGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCAC
CGGCTAATTCCGTGCCAGCAGCCGCGGTAATATTNTTATTCTTTATGTAT
ACATATTCTTTTTACTTTATTCTATTAAATTTATTCTTTCATAATTAAAC
CTTCCCTTACACCCATTCCACCTCCCATCCCTCTTCCCCTCCCACTCTCC
ATCTCATATGGCGTTCGCGCCTCTCTCTTCATCTCCTCCTATATTTATTC
TAACTTCTTTCATCTCAATCATTTCTTCTGTCTCATCCTTCCATTCTTTC
CATGATCTCCCCCATTGTCATGTCTTCAAAAAACCACACAAAACACTAGA
ATCTTTTCTTATTACACACAAGTATATACAATTTTTAACAATCCATTAAA
ACACACACAACACCTAGCAATCAACAACGCTACCATCCCCAATATTCTCT
GTTCTCCTCTCTTTCTCCGCGTGCATCTGCGCACTACTCTCTAATTTCAT
CTCTATTATCTTTTTTTCTTAACTCATCCGCATACATCCAAGACTCTAGA
CCCATTTCTCGCCTCTTTCATTTACTGCCGATACAGAGCTTATAAATTCT
ATATCATTTATCCACACTCATTATTAAATAGGCTGACACCTCTAACCGTC
CACTACACCACCTTTCCCATGCCATCTCCCTAACACTGCACTCATCCGTA
ACTTCCTACTCTACCCTCTCTTTCTTTCCTTACTTTCTTTTCTTTCTCTT
ACATTTTTATTTAAAATTCCTCTTTTAGCCTCTATTTTCTGTTATCTACT
TTTCTCCTAAATTCCCCCTATTCTTCACGTCCCATACCTATCCCTACCAC
CACCACTACCACCCCTCTCTTCATTCTACTCGCTCTAAACCCTCCACCCT
CCCCTCCTTGCTCTTATGTATCTCCTCATCTTTTAAT



quality file:
------------
>seqs_fasta.Contig1
0 23 23 33 33 33 33 33 31 41 47 47 47 47 47 47 47 50 47 47 57 59 59 59 
42 42 35 42 42 54 59 48 48 48 48 48 48 54 57 57 57 57 57 54 54 57 54 54 
54 74
74 74 74 59 57 57 57 57 72 72 84 76 73 72 72 72 79 81 74 74 62 50 50 50 
59 39 43 32 35 32 43 58 44 48 70 70 58 73 55 69 67 87 87 90 90 90 90 90 
90 90
90 90 90 90 90 90 90 90 89 90 90 90 90 90 90 90 85 87 87 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 77 77 77 81 81 90 90 90 90 90 90 90 90 
90 90
90 90 90 90 90 87 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90
90 90 90 90 74 74 85 90 90 90 90 90 90 90 90 90 90 90 90 83 83 90 90 90 
90 90 90 90 75 83 83 89 89 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90
90 72 72 72 57 57 43 37 37 43 72 72 72 72 72 72 90 90 90 90 90 90 90 86 
90 90 90 90 90 90 90 79 85 83 90 90 90 89 87 87 90 90 90 90 67 67 79 78 
90 86
88 82 73 68 65 61 59 63 62 68 71 72 59 56 41 35 30 30 28 32 41 47 40 56 
49 42 49 51 50 37 37 39 39 37 52 54 51 46 20 20 27 24 32 24 20 20 21 24 
16 19
19 33 29 22 23 12 11 11 12 20 23 40 32 31 28 22 13 13 18 26 28 28 34 28 
25 24 28 23 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
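
Until I find a BioJava class for this, the quality file could be read 
with something like the following plain-Java sketch. QualFileReader and 
its read method are hypothetical names, not an existing BioJava API. The 
sketch assumes the layout shown above: a '>' header line followed by 
whitespace-separated integer quality values. Line breaks are treated as 
ordinary whitespace, so it does not matter how the values are wrapped.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader (not part of BioJava) for FASTA-style quality files. */
final class QualFileReader {

    private QualFileReader() {
        // static utility, no instances
    }

    /** Returns the quality values of each record, keyed by record name, in file order. */
    static Map<String, int[]> read(Reader in) {
        Map<String, List<Integer>> records = new LinkedHashMap<>();
        List<Integer> current = null;
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    // Record name = first token after '>'.
                    String name = line.substring(1).trim().split("\\s+")[0];
                    current = new ArrayList<>();
                    records.put(name, current);
                } else {
                    if (current == null) {
                        throw new IllegalArgumentException(
                            "Quality values found before the first '>' header");
                    }
                    for (String token : line.split("\\s+")) {
                        current.add(Integer.parseInt(token));
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Convert the boxed lists to int[] for convenient indexed access.
        Map<String, int[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> entry : records.entrySet()) {
            List<Integer> values = entry.getValue();
            int[] array = new int[values.size()];
            for (int i = 0; i < array.length; i++) {
                array[i] = values.get(i);
            }
            result.put(entry.getKey(), array);
        }
        return result;
    }
}
```

The int[] stored under "seqs_fasta.Contig1" could then be compared 
position by position with the contig sequence of the same name to decide 
which bases to mask.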



-- 
Ashika Umanga Umagiliya
International Bioinformatics Institute Co., Ltd. (BiGG)
140-0001
Ando Building 8F, 3-6-9 Kita-Shinagawa, Shinagawa-ku, Tokyo
TEL: 03-6679-8763
FAX: 03-6679-8764



