[Bioperl-l] WGS, WGS_SCAFLD support added for GenBank files

Chris Fields cjfields at uiuc.edu
Fri Mar 10 13:48:40 UTC 2006


The second sequence was built by bioperl, so postprocess_data isn't
working as expected.  Yesterday I committed a change to NCBIHelper in CVS
that fixes this by retrieving the sequence directly from NCBI in 'fasta'
format.
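
For anyone following along, here is a rough Python sketch of what building
a scaffold from its contigs involves. It is not the NCBIHelper code. It
assumes the scaffold is described by a GenBank-style CONTIG join(...)
string, and the contig names (CTG_A, CTG_B, CTG_C) and the toy data are
hypothetical, with only short fragments borrowed from the sequence quoted
below. It covers the three things that go wrong in the second sequence:
gaps must be filled with Ns, 1-based inclusive coordinates must keep the
last base, and complement() pieces must be reverse-complemented.

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

# One element of the join: gap(50), ACC:1..19, or complement(ACC:1..19)
ELEMENT = re.compile(
    r"gap\((\d+)\)|(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?"
)

def assemble(contig_line, contigs):
    """Build the scaffold sequence from a join(...) description.

    contigs maps a (hypothetical) contig name to its sequence.
    """
    parts = []
    for m in ELEMENT.finditer(contig_line):
        if m.group(1) is not None:
            # A gap of known length is written out as a run of Ns.
            parts.append("N" * int(m.group(1)))
            continue
        start, end = int(m.group(4)), int(m.group(5))
        # Coordinates are 1-based and inclusive, so the slice is
        # [start - 1:end]. An off-by-one here drops the final base.
        piece = contigs[m.group(3)][start - 1:end]
        parts.append(revcomp(piece) if m.group(2) else piece)
    return "".join(parts)

# Toy data (hypothetical names and orientation).
contigs = {
    "CTG_A": "GGATTAAGCTCAGGCCTCT",
    "CTG_B": "CTGAGTACTG",
    "CTG_C": "ACAAGTA",
}
line = "join(CTG_A:1..19,gap(50),CTG_B:1..10,gap(50),complement(CTG_C:1..7))"
scaffold = assemble(line, contigs)
print(len(scaffold))
```

Fetching the finished FASTA record from NCBI sidesteps all of this, which
is why I went that way.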

Chris

On Mar 9, 2006, at 8:38 PM, Brian Osborne wrote:

> Chris,
>
> Below...
>

....

>>
>> >CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005
>> genomic scaffold, whole genome shotgun sequence (from NCBI)
>> ....
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT
>> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC
>> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA
>> GGATTAAGCTCAGGCCTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>> NNNNNNNNNCTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCCCCTTCAGTA
>> AGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAGTTGAGCGCC
>> TGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATTGTCAGGCCT
>> TAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCTTTTATATCA
>> TGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAGCGTTCGGGA
>> AAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAATGACATATC
>> CTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGGGGTGGAAAA
>> ACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACCCGAGATGCA
>> TAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGTGCCCATGGA
>> GATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTGCAAATTGTG
>> GCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGTACAGAGCCA
>> GAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACCCCGGTCCCT
>> GAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGACTCCAAGTCT
>> AAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTAAGGAACGTG
>> CCAAACTCAGAGATGATGACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>> NNNNNNNNNNNNNTACTTGTTGCAATAATCTTGCTCCGGAGTAAGTGGTTATAGGATGCA
>> AGTACAATAACTAGTTGTAGACAAAGTCAATGACGATACGGAGAAGAATAAGCGCAATGT
>>
>>
>>
>> >CH398085 Oryza sativa (indica cultivar-group) chromosome 1 scaffold000005
>> genomic scaffold, whole genome shotgun sequence (bioperl's version)
>> ....
>> TTAGGTGGTTTTATAACTTTAGACTTTGGGAATTTTCATATCACCTGGACACTATGGAAT
>> TGTTGGATGATGGTGGAATTGGACATACACCTCTCTTCCTCTTTCAAAACCCCTAAAACC
>> TGTTTTCGGTGGGGTTTGGGTGCATGCCAGTTGTGGGAAGTAGCACCCCGGGCACTATAA
>> GGATTAAGCTCAGGCCTC  <----no gap, missing base
>>
>> CTGAGTACTGTGGTTGTACTCATTCTTGCTCAATCTTTTCCC
>> CCTTCAGTAAGAGAAGATTTGGAGAAGAAGTCTTAGGTGGAGTCCTGGCTTATACCCCAG
>> TTGAGCGCCTGTGAAGATGGAGCCGTAGGCCCGCTAGTCCGCTGCTGTTTATTTTTGATT
>> GTCAGGCCTTAAGTGCCTTTGTAATAATGTAAATATTATCGATATAATAAAGATGTGTCT
>> TTTATATCATGTTTGTGTGGTGTACCCCGGCTTTTCCTGGGACGGGGATTAATACACTAG
>> CGTTCGGGAAAAGGCAATTTTCCCGGTCGCGACAGAACTTGTAATTCTCTAGCACTAGAA
>> TGACATATCCTTTGGATTGTGCACCAATGCCACGCGAAAACCCATGGTGCCAAAACTAGG
>> GGTGGAAAAACCTCCGAGACCTCCTCCGAAGAGGCAGGTGACAGGTAAGGCGGAGGAACC
>> CGAGATGCATAAGGAAAATCCAGTGCCGGAAGTGCCACCGGAGATTGCAGTGCCGGAGGT
>> GCCCATGGAGATTGTAGTGCCGTTGTCCCAATGGAGATTACAGTGGCAGAACCAGAGGTG
>> CAAATTGTGGCATCAGTCGGGACATATATAGAAGAAGTAGTACGATTGGAATGGGACGGT
>> ACAGAGCCAGAAATATTTGAAGACCCTTCTCCTGCGAAAGACCCCGAGGTGCAAGAAACC
>> CCGGTCCCTGAGAAGGCCACTGACAATTCTAAGGTGCCTAAAGTGCTTATGAGCCACGAC
>> TCCAAGTCTAAAGATGAGAACAATGAGAAGTTCATGGGCTAACCATCTTCAGAGGGGGTA
>> AGGAACGTGCCAAACTCAGAGATGATGACCC  <---- no gap, missing base
>>
>> GATGGTGGGTTAGCCTGCCTAGCTAGTTC  <---- should be revcomp
>> GAAGCGGCACTCCTTTTAATTATTTGATATTAGATCATTTTTTAATATTTGTGTTTTTAC
>> AAGTACCGCGAGGTACAACCTCATGGACAGGAACAACGCTTTTTTGCAACATATATTTTA
>> TACGAAATCTATGCTTTCTGTAAAGTTAAAGCACACTAAATCTAAAGCTTAATATACAAC
>> CATGCCACATCATCACCCACTAGCAATAATTATATATTTAATCTCATACAAGCATACAAA
>
> Here's the sequence from NCBI:
>
>      1621 ttaggtggtt ttataacttt agactttggg aattttcata tcacctggac actatggaat
>      1681 tgttggatga tggtggaatt ggacatacac ctctcttcct ctttcaaaac ccctaaaacc
>      1741 tgttttcggt ggggtttggg tgcatgccag ttgtgggaag tagcaccccg ggcactataa
>      1801 ggattaagct caggcctct
>           [gap 50 bp]
>      1870          c tgagtactgt ggttgtactc attcttgctc aatcttttcc cccttcagta
>      1921 agagaagatt tggagaagaa gtcttaggtg gagtcctggc ttatacccca gttgagcgcc
>      1981 tgtgaagatg gagccgtagg cccgctagtc cgctgctgtt tatttttgat tgtcaggcct
>      2041 taagtgcctt tgtaataatg taaatattat cgatataata aagatgtgtc ttttatatca
>      2101 tgtttgtgtg gtgtaccccg gcttttcctg ggacggggat taatacacta gcgttcggga
>      2161 aaaggcaatt ttcccggtcg cgacagaact tgtaattctc tagcactaga atgacatatc
>      2221 ctttggattg tgcaccaatg ccacgcgaaa acccatggtg ccaaaactag gggtggaaaa
>      2281 acctccgaga cctcctccga agaggcaggt gacaggtaag gcggaggaac ccgagatgca
>      2341 taaggaaaat ccagtgccgg aagtgccacc ggagattgca gtgccggagg tgcccatgga
>      2401 gattgtagtg ccgttgtccc aatggagatt acagtggcag aaccagaggt gcaaattgtg
>      2461 gcatcagtcg ggacatatat agaagaagta gtacgattgg aatgggacgg tacagagcca
>      2521 gaaatatttg aagacccttc tcctgcgaaa gaccccgagg tgcaagaaac cccggtccct
>      2581 gagaaggcca ctgacaattc taaggtgcct aaagtgctta tgagccacga ctccaagtct
>      2641 aaagatgaga acaatgagaa gttcatgggc taaccatctt cagagggggt aaggaacgtg
>      2701 ccaaactcag agatgatgac ccc
>           [gap 50 bp]
>      2774               tacttgt tgcaataatc ttgctccgga gtaagtggtt ataggatgca
>      2821 agtacaataa ctagttgtag acaaagtcaa tgacgatacg gagaagaata agcgcaatgt
>      2881 cagaccagct tgttataatc cagtaacagt aagtaaactc cgtaccgttc gtttttttca
>      2941 ttcattttaa ttattgtccg ttgcaggctt gcagcagtca catgagtgcg tataaatgca
>      3001 ccgatttcaa gcccggtgct attaatcaat agattcttct tcactgtggt tcgacaaaca
>      3061 atgaaactag tataactata gtataactag gtgattcctc acgctttccc gtgctttgtt
>      3121 gtaaaattta ctaagaaatt ctcaatatgt tttttttaca atcaaactag gattacgaag
>
> It agrees with the 1st sequence, not the second sequence.
>
> Brian O.
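
To pin down where the two versions part ways without eyeballing them, a
small Python sketch like the one below does the job. The function name
first_difference is just something I made up for illustration; the two
strings are short fragments copied from the sequences quoted above, and
1801 is the coordinate of the first base of that fragment in the NCBI
listing.

```python
def first_difference(a, b):
    """Return the 0-based index of the first mismatch, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the shared prefix: either identical or one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))

# Fragment starting at scaffold position 1801.
ncbi = "GGATTAAGCTCAGGCCTCT" + "N" * 50 + "CTGAGTACTG"
built = "GGATTAAGCTCAGGCCTC" + "CTGAGTACTG"

idx = first_difference(ncbi, built)
# Prints the scaffold coordinate and the differing bases.
print(1801 + idx, ncbi[idx], built[idx])
```

The first mismatch lands on the last base before the gap, which is the
base missing from the second sequence.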

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





