[Biojava-l] Question about AB1 parser (ABIFChromatogram)
Andy Yates
ayates at ebi.ac.uk
Tue Jun 16 09:56:10 UTC 2009
Well, the most likely explanation is that, since Phred does its own
basecalling, it is using the unprocessed (raw) data in the AB1
file. This is always longer than the processed data you normally see in
the AB1 file and has not been massaged with respect to trace trimming,
noise reduction and baselining. So my guess about such a huge number of
bases is that Phred has gone into the "soup" regions at the beginning
and the end of the trace.
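
To see where the processed calls sit inside the longer Phred read, one
option is to reverse-complement the 520 BioJava calls and look for their
first few bases in the RD sequence from the ACE file. The following is
only a rough sketch in plain Java, with no BioJava calls; the class
name, method names and the seed length are made up for illustration.
Because Phred calls the bases itself, the two sequences need not agree
base for base, so this only looks for a short exact seed rather than the
whole read.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava.
class ReadCompare {

    /** Reverse-complements a DNA string; anything other than a/c/g/t becomes 'n'. */
    static String reverseComplement(String seq) {
        String s = seq.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'a': out.append('t'); break;
                case 't': out.append('a'); break;
                case 'c': out.append('g'); break;
                case 'g': out.append('c'); break;
                default:  out.append('n');
            }
        }
        return out.toString();
    }

    /**
     * Takes the first seedLength bases of the reverse-complemented calls
     * and returns the 0-based position of their first exact,
     * case-insensitive match in the ACE read, or -1 if there is none.
     */
    static int findSeed(String abiCalls, String aceRead, int seedLength) {
        String rc = reverseComplement(abiCalls);
        String seed = rc.substring(0, Math.min(seedLength, rc.length()));
        return aceRead.toLowerCase(Locale.ROOT).indexOf(seed);
    }
}
```

Calling ReadCompare.findSeed(calls, aceRead, 20) with the two sequences
pasted in as strings (line breaks removed) should then tell you roughly
how many extra bases Phred has called ahead of the processed region.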
Andy
Ashika Umanga Umagiliya wrote:
> Hi Andy,
>
> Thank you for the tip. I checked the AB1 file with 'trev' and it gave
> the number of bases as 520.
> But I wonder why the Phrap ACE file shows 1404 bases.
>
> Best Regards,
> Umanga
>
> Andy Yates wrote:
>> Hi,
>>
>> Are you sure that the AB1 file actually contains more than 520bp? Have a
>> check there first (something like trev should display the AB1
>> processed data) and then we can see if it's a problem in the data or
>> the program.
>>
>> Andy
>>
>> Ashika Umanga Umagiliya wrote:
>>
>>> Greetings all,
>>>
>>> When I parse an AB1 file using 'ABIFChromatogram', the number of
>>> basecalls read is lower than the number of basecalls in a Phrap ACE file.
>>>
>>> For example, I parsed the AB1 file using BioJava's ABIFChromatogram and
>>> it gave the following basecalls:
>>>
>>> ttaagcaggttaagcgtcctccctgttggtaccgtcaagagtgcacaaa
>>> ttacttacacatatgttcttccctaataacagagttttacgatccgaag
>>> accttcatcactcacgcggcgttgctccgtcaggctttcgcccattgcg
>>> gaagattccctactgctgcctcccgtaggagtctggaccgtgtctcagt
>>> tccagtgtggccgatcaccctctcaggtcggctatgcatcgttgccttg
>>> gtaagccgttaccttaccaactagctaatgcagcgcggatccatctata
>>> agtgacagcaagaccgtctttcacttttgaaccatgcggttcaaaatat
>>> tatccggtattagctccggtttcccgaagttatcccagtcttataggta
>>> ggttatccacgtgttactcacccgtccgccgctaacatcagagaagcaa
>>> gcttctcgtccgttcgctcgatttgcatgtattaggcacgccgccagcg
>>> ttcatcctgagccaggatcaaactctccaa
>>>
>>> The total number is 520.
>>>
>>>
>>> But in the Phrap ACE file the corresponding entry is:
>>> (you need to take the complement sequence to compare)
>>>
>>> RD 2008-10-24_A02_S_R.ab1 1404 0 0
>>> gaatggttgtagagagattggtatgttgtagtggtgtgttgtgtgtgaat
>>> aggtagtaatggtaggagatgttcatgttttcgttgtgtaagaagattat
>>> cgagagagaagatatgttggtattaaatgggaggggataagaacaaaaga
>>> gaacaaatatgtgtgatatatagatttggggaaagggaggtggtagatat
>>> aattgggttgggtgggtaggactggtgattggattggtgtgatggggagg
>>> agttggtagtaatgtgttgtggttttttgattttgcgtttagttagacat
>>> atatgtaacgagaggattgattgagatagtaaatatgagacgggattagc
>>> caaggatcagaagaggaggggggaaaggtggggagaggaagggaggattg
>>> acgataggagaggtgtctagtgtgggtgagaggtgggtgaatatttggtg
>>> gagatgtgtgggtatttagatgttgtgagattggattgtggacgtagggt
>>> ggatgtggtttgggagttggagaatgggtgtgtagttgtggatatagtta
>>> tcaggttgtgagaagtggagaagaggggagagaagagagggggggaaaga
>>> ggaaagaagagagaaatagaaggagattatgggagggtggaagggacgta
>>> atgagtgttgatttatgatgtatagtttagatgggtggtatggatatttt
>>> tgggcgagggaaggagtgaggggatagatggacagagggttntggagcta
>>> ttgttttgtttcttgatgtgggtgttggggtgttgattttgtagttctac
>>> tttagtttgggtgtagaacagggggattcaaggtcagagtgagtgtgggg
>>> gtaaggaagttgatagttgtcagttccttntgaagtTGGAGAGTTTGATC
>>> CTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCG
>>> AACGGACGAGAAGCTTGCTTCTCTGATGTTAGCGGCGGACGGGTGAGTAA
>>> CACGTGGATAACCTACCTATAAGACTGGGATAACTTCGGGAAACCGGAGC
>>> TAATACCGGATAATATTTTGAACCGCATGGTTCAAAAGTGAAAGACGGTC
>>> TTGCTGTCACTTATAGATGGATCCGCGCTGCATTAGCTAGTTGGTAAGGT
>>> AACGGCTTACCAAGGCAACGATGCATAGCCGACCTGAGAGGGTGATCGGC
>>> CACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTAGG
>>> GAATCTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGTGA
>>> TGAAGGTCTTCGGATCGTAAAACTCTGTTATTAGGGAAGAACATATGTGT
>>> AAGTAACTGTGCACATCTTGACGGTACCtaacatggaggcctgttcctcg
>>> ttaa
>>>
>>> My question is: why does the BioJava parser give only 520 bases? Is there
>>> a way to parse all 1404 bases as seen in the Phrap ACE file?
>>>
>>> Thanks in advance,
>>> umanga