[Bioperl-l] writing genbank files
Jason Stajich
jason@cgt.mc.duke.edu
Wed, 18 Sep 2002 12:46:59 -0400 (EDT)
I made this fix about a week ago. I don't see how your parsing would have
worked before, unless the GenBank parsing also changed from the version you
were using.
All I did was:
RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v
retrieving revision 1.16
retrieving revision 1.17
diff -r1.16 -r1.17
282c282
< return 1 if $string =~ /^[A-Z][a-z]+$/;
---
> return 1 if $string =~ /^[A-Z][\sa-z]+$/;
Someone needs to revisit the ideas behind the Species object and the
taxonomic fields in GenBank/EMBL records. Either we are parsing things
differently, or the values that can appear in the field are changing. Many
more taxonomic fields now fail to match what was expected when this module
was built (James G did the bulk of the work back in the day).
Anyway, I am perfectly happy to turn off name validation altogether.
Basically, it required every classification field other than the species
to start with a capital letter.
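The diff above swaps a single regular expression. The following is a minimal sketch, assuming nothing beyond the two patterns shown in the diff: it splits the ORGANISM classification line of the attached AB000094 record and tests each name against the revision 1.16 and 1.17 patterns. It does not call Bio::Species itself.

```perl
use strict;
use warnings;

# ORGANISM classification line copied from the attached AB000094 record.
my $lineage = 'Eukaryota; Viridiplantae; Streptophyta; Embryophyta; '
            . 'Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; '
            . 'core eudicots; Rosidae; eurosids II; Brassicales; '
            . 'Brassicaceae; Arabidopsis.';

# Drop the terminating period, then split on ';' plus surrounding whitespace.
(my $clean = $lineage) =~ s/\.\s*$//;
my @fields = split /\s*;\s*/, $clean;

# Only the check on line 282 of Bio/Species.pm, before and after the fix.
my $old_re = qr/^[A-Z][a-z]+$/;     # revision 1.16
my $new_re = qr/^[A-Z][\sa-z]+$/;   # revision 1.17

for my $name (@fields) {
    printf "%-16s old: %-3s new: %s\n", $name,
        ($name =~ $old_re ? 'ok' : 'no'),
        ($name =~ $new_re ? 'ok' : 'no');
}
```

On this lineage, eudicotyledons, core eudicots and eurosids II fail under both patterns. Each starts with a lower-case letter, and eurosids II also has upper-case letters after the space. The 1.17 pattern additionally accepts only capitalised multi-word names, such as a hypothetical "Core eudicots". The diff shows only line 282, so any other branches of validate_name are outside this sketch.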
-jason
On Wed, 18 Sep 2002, Hilmar Lapp wrote:
> I believe Jason fixed something yesterday in Species.pm in order to
> allow spaces in certain places. Jason?
>
> -hilmar
>
> On Wednesday, September 18, 2002, at 01:48 AM, gert thijs wrote:
>
> > Hilmar,
> >
> > I just installed the modules from the main trunk and tried to test
> > them, but now I am unable to parse input sequences in GenBank format:
> > loading a GenBank flat file fails. The problem seems to occur while
> > parsing the species name; I guess that a name without an upper-case
> > first letter stops the GenBank parser. Attached is a file on which
> > the parser throws the exception.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Invalid name 'eurosids II' (Wrong case?)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Root/Root.pm:318
> > STACK: Bio::Species::validate_name /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:283
> > STACK: Bio::Species::classification /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:121
> > STACK: Bio::SeqIO::genbank::_read_GenBank_Species /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:884
> > STACK: Bio::SeqIO::genbank::next_seq /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:229
> > STACK: AnnotatedSequence::new /users/sista/thijs/perl/lib//AnnotatedSequence.pm:66
> > STACK: GeneIndex.pl:168
> > -----------------------------------------------------------
> >
> > Gert
> >
> >
> > Hilmar Lapp wrote:
> >> It should be written as join(complement(...),complement(...),...).
> >> This is in the main trunk only, though. Do you have an example where
> >> this is not true?
> >> -hilmar
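Under the INSDC feature table conventions, complement(join(a..b,c..d)) and join(complement(c..d),complement(a..b)) describe the same minus-strand location. The complemented pieces are listed in reverse genomic order. The sketch below builds both strings from one list of exons. The helper names and coordinates are hypothetical, and this is not BioPerl's Bio::Location API.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioPerl. Exons are [start, end] pairs
# given in ascending genomic order, all on the minus strand.
sub complement_join {
    my @exons = @_;
    my $inner = join ',', map { "$_->[0]..$_->[1]" } @exons;
    return "complement(join($inner))";
}

sub join_complements {
    my @exons = @_;
    # Reading the minus strand 5'->3' starts at the right-most exon,
    # so the complemented sub-locations are listed in reverse order.
    my $inner = join ',', map { "complement($_->[0]..$_->[1])" } reverse @exons;
    return "join($inner)";
}

# Hypothetical two-exon CDS on the minus strand.
my @minus_exons = ([100, 200], [300, 400]);
print complement_join(@minus_exons), "\n";   # complement(join(100..200,300..400))
print join_complements(@minus_exons), "\n";  # join(complement(300..400),complement(100..200))
```

Either form should round-trip. The bug reported below is the writer dropping the complement entirely and emitting a bare join().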
> >> On Tuesday, September 17, 2002, at 02:06 AM, gert thijs wrote:
> >>> Hello,
> >>>
> >>> I have a question about the current status of the GenBank file
> >>> parser/writer. I noticed that a CDS with a location of the type
> >>> complement(join()) is written as a join() without the complement.
> >>> I saw that this problem was a major thread on the list a few weeks
> >>> ago, but I could not find out whether it has been solved by now, or,
> >>> if it has, how such locations should be written.
> >>>
> >>> Gert
> >>>
> >>>
> >>>
> >>> --
> >>> + Gert Thijs
> >>> + K.U.Leuven
> >>> + ESAT-SCD
> >>> + Kasteelpark Arenberg 10
> >>> + B-3001 Leuven-Heverlee
> >>> + Belgium
> >>> +
> >>> + Tel : +32 16 32 85 88
> >>> + Fax : +32 16 32 19 70
> >>> + email: gert.thijs@esat.kuleuven.ac.be
> >>> +
> >>> + http://www.esat.kuleuven.ac.be/~thijs
> >>> + http://www.esat.kuleuven.ac.be/~dna/BioI/
> >>> +
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@bioperl.org
> >>> http://bioperl.org/mailman/listinfo/bioperl-l
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >
> >
> > LOCUS       AB000094               15532 bp    DNA     linear   PLN 13-JAN-1998
> > DEFINITION  Arabidopsis thaliana gene for inorganic phosphate transporter,
> >             protein phosphatase 1 catalytic subunit, complete cds.
> > ACCESSION   AB000094
> > VERSION     AB000094.1  GI:2780346
> > KEYWORDS    protein phosphatase 1 catalytic subunit inorganic phosphate
> > SOURCE      Arabidopsis thaliana (strain:Columbia) DNA.
> >   ORGANISM  Arabidopsis thaliana
> >             Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
> >             Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
> >             Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
> > REFERENCE   1  (bases 0 to 0)
> >   AUTHORS   Mitsukawa,N., Okumura,S. and Shibata,D.
> >   TITLE     High-affinity phosphate transporter genes of Arabidopsis thaliana
> >   JOURNAL   Soil Sci. Plant Nutr. 43, 971-974 (1997)
> > REFERENCE   2  (bases 0 to 0)
> >   AUTHORS   Mitsukawa,N., Okumura,S. and Shibata,D.
> >   TITLE     Isolation of a gene that encode a protein phosphatase 1 catalytic
> >             subunit in Arabidopsis thaliana
> >   JOURNAL   Unpublished (1996)
> > REFERENCE   3  (bases 1 to 15532)
> >   AUTHORS   Mitsukawa,N.
> >   TITLE     Direct Submission
> >   JOURNAL   Submitted (24-DEC-1996) to the DDBJ/EMBL/GenBank databases.
> >             Norihiro Mitsukawa, Mitsui Plant Biotech.Res.Inst., Research
> >             Division; TCI-D21, Sengen 2-1-6, Tsukuba, Ibaraki 305, JAPAN
> >             (E-mail:tsu01129@koryu.statci.go.jp, Tel:0298-58-6252,
> >             Fax:0298-58-6234)
> > COMMENT     Sequence updated (08-Jan-1998).
> > FEATURES             Location/Qualifiers
> >      source          1..15532
> >                      /chromosome=5
> >                      /strain="Columbia"
> >                      /organism="Arabidopsis thaliana"
> >                      /db_xref="taxon:3702"
> >      gene            join(2560..2960,3086..4250)
> >                      /gene="PHT3"
> >      CDS             join(2560..2960,3086..4250)
> >                      /product="inorganic phosphate transporter"
> >                      /gene="PHT3"
> >                      /protein_id="BAA24281.1"
> >                      /codon_start=1
> >                      /translation="MADQQLGVLKALDVAKTQLYHFTAIVIAGMGFFTDAYDLFCVSL
> >                      VTKLLGRLYYFNPTSAKPGSLPPHVAAAVNGVALCGTLAGQLFFGWLGDKLGRKKVYG
> >                      ITLIMMILCSVASGLSLGNSAKGVMTTLCFFRFWLGFGIGGDYPLSATIMSEYANKKT
> >                      RGAFIAAVFAMQGVGILAGGFVALAVSSIFDKKFPSPTYEQDRFLSTPPQADYIWRII
> >                      VMFGALPAALTYYWRMKMPETARYTALVAKNIKQATADMSKVLQTDLELEERVEDDVK
> >                      DPKKNYGLFSKEFLRRHGLHLLGTTSTWFLLDIAFYSQNLFQKDIFSAIGWIPKAATM
> >                      NAIHEVFKIARAQTLIALCSTVPGYWFTVAFIDIIGRFAIQLMGFFMMTVFMFAIAFP
> >                      YNHWILPDNRIGFVVMYSLTFFFANFGPNATTFIVPAEIFPARLRSTCHGISAATGKA
> >                      GAIVGAFGFLYAAQPQDKTKTDAGYPPGIGVKNSLIMLGVINFVGMLFTFLVPEPKGK
> >                      SLEELSGEAEVDK"
> >                      /db_xref="GI:2780347"
> >      gene            join(9110..9510,9661..10834)
> >                      /gene="PHT2"
> >      CDS             join(9110..9510,9661..10834)
> >                      /product="inorganic phosphate transporter"
> >                      /gene="PHT2"
> >                      /protein_id="BAA24282.1"
> >                      /codon_start=1
> >                      /translation="MAEQQLGVLKALDVAKTQLYHFTAIVIAGMGFFTDAYDLFCVSL
> >                      VTKLLGRIYYFNPESAKPGSLPPHVAAAVNGVALCGTLSGQLFFGWLGDKLGRKKVYG
> >                      LTLIMMILCSVASGLSFGNEAKGVMTTLCFFRFWLGFGIGGDYPLSATIMSEYANKKT
> >                      RGAFIAAVFAMQGVGILAGGFVALAVSSIFDKKFPAPTYAVNRALSTPPQVDYIWRII
> >                      VMFGALPAALTYYWRMKMPETARYTALVAKNIKQATADMSKVLQTDIELEERVEDDVK
> >                      DPRQNYGLFSKEFLRRHGLHLLGTTSTWFLLDIAFYSQNLFQKDIFSAIGWIPKAATM
> >                      NATHEVFRIARAQTLIALCSTVPGYWFTVAFIDTIGRFKIQLNGFFMMTVFMFAIAFP
> >                      YNHWIKPENRIGFVVMYSLTFFFANFGPNATTFIVPAEIFPARLRSTCHGISAAAGKA
> >                      GAIIGAFGFLYAAQNQDKAKVDAGYPPGIGVKNSLIVLGVLNFIGMLFTFLVPEPKGK
> >                      SLEELSGEAEVSHDEK"
> >                      /db_xref="GI:2780348"
> >      CDS             join(13719..13967,14046..14605,14688..14862)
> >                      /product="protein phosphatase 1 catalytic subunit"
> >                      /protein_id="BAA24283.1"
> >                      /codon_start=1
> >                      /translation="MDPGTLNSVINRLLEAREKPGKIVQLSETEIKQLCFVSRDIFLR
> >                      QPNLLELEAPVKICGDIHGQYPDLLRLFEHGGYPPNSNYLFLGDYVDRGKQSLETICL
> >                      LLAYKIKFPENFFLLRGNHESASINRIYGFYDECKRRFSVKIWRIFTDCFNCLPVAAL
> >                      IDERIFCMHGGLSPELLSLRQIRDIRRPTDIPDRGLLCDLLWSDPDKDVRGWGPNDRG
> >                      VSYTFGSDIVSGFLKRLDLDLICRAHQVVEDGFEFFANKQLVTIFSAPNYCGEFDNAG
> >                      AMMSVSEDLTCSFQILKSNDKKSKFSFGSRGGAKTSFPYPKVKVCINHITF"
> >                      /db_xref="GI:2780349"
> > BASE COUNT     5197 a   2674 c   2593 g   5068 t
> > ORIGIN
> >         1 cttttaaaaa atttacaaag atttttaaaa gcattttttt ggctgggaaa aaaattttac
> >        61 gagaaaacat ttttggcggg aaattttttt tttggcggga aaaaaagttt ggcggaaatt
> >       121 ttttgtttgg cgggaaaaat aaattttggt acatgttatt attaaatgag ggtaatttgg
> >       181 tcattctgtt caatagaagg gatattttta aaaataaaca atataaaatg gtattgttac
> >       241 aaaagggtag taaaaaaagg gtagttttgc aaatctccct cgagaatttg caatttctct
> >       301 ctcatagcat aacatgtctt ttttttagta aatttcaaga tgacaaaaag aaaaaaaaaa
> >       361 cataaagaaa caaattgaaa gcaacatcta tttttctgac ttgtaaaaat gtggtgttac
> >       421 tagcaaatat caaatttttg tatctataaa ctagatttta acccgcggta tactaggaga
> >       481 acgatttatt ttttaaagtt aatatatata caagtttaca aattgaatat atttataaaa
> >       541 aaaataaatt ttttagttta caattattat taggtaacat cccgccaaat ctgttccatc
> >       601 aaaaagctta ttaattttat taatgttaac cttagttatg atatgattct aaatcattgt
> >       661 ctagatattt tagccaatgt taggacgttg aacccacata gttttcgtca attgaataat
> >       721 atatgttcgt tttataaatt tcgaatcaca atttatgcag aaaatgtttt gaaatttttc
> >       781 ttaactttac atattataca ataaaaaatc aaacaaagat gataggaaat tcatagtttt
> >       841 aaatcttaat caaatcttct aaatatcaat attgttaaat atagaagtaa tgagtataag
> >       901 agttgggttt tatttgaacc aacccaacta agatttaatg aacatcataa actaatgggt
> >       961 ctggtatcaa atgtaatcag atgcatttta acccacaatc aaaaggtgac tgcaaaaata
> >      1021 ctatataaga ttcatagaag tagatcgtct atataggcct gaaaaatatc agttttaaat
> >      1081 taaattattt agagatctta ttttgttggg cttaaatact ggagaataag actactttag
> >      1141 taatcaaaat atttgggcca tgtaactaat gcaaaccaaa atcattattg acaaaagtca
> >      1201 ttgcatcaaa tgattttctt accaaaatgt ctttaaaata tatatatata tatatatata
> >      1261 tatatgtacc ttcgtacggt acggttgctt ttatataaaa aggtgttgta cggttgcatt
> >      1321 gacttttgat cgtaaatact atatagtccg gagactcaaa tgcaaaccac tttctctatt
> >      1381 tttttggacg gaagttatgt taacctaaat tatctaaaat attacacaaa taatttttaa
> >      1441 acagtttaaa atggcatata attatgttaa tagaaatttg acttttagta cggagataca
> >      1501 ttagctgaga agtttgacct caccttattt agtgggaacc ttgttacttt tgaattagtt
> >      1561 tgaacaatat atctctaatt catataaagt attttgaaat ataaagcttt ataaatataa
> >      1621 catatttttt ggtgggtaat gcttcttttt ttctgtggct tataactacg tggagatctt
> >      1681 atcttctgtc cgcttatatt agtatttcac catagtgatt ttaataaaaa tcagattaaa
> >      1741 caatatataa cagctatttt agtttacagg tttaaacttt tattttagtt ctttgatttt
> >      1801 ttttttcttt cagaaaatgg tggacatctt ctaaagtttg ttttattaat ctaggtagaa
> >      1861 actttcaaac aaaagacgaa aacagaaaag ttgactttag tacggaaata catgatatga
> >      1921 gataactgat aagataactt gttagatggt ttccactctc ataaacaata taaaaaaatt
> >      1981 attaacgata gtgaactcga tttttagtaa ctctatgaaa atcttcttgt gagatcataa
> >      2041 cattttagaa gactaggctt ttcgtgtaat atgatctgta gaccattcca atgttatcat
> >      2101 atgcagaaca gcaacacaat gaaataaatt aagagatgga aaataaatta agatcaaata
> >      2161 aaatgtggat atacccttaa gattccggag acaatacttc tccagttctc ctctcgtcta
> >      2221 taaatttctg ccgtctcagc caacataatc acaaccacca cctctctcaa taattctctc
> >      2281 tcttgtcctt cacaacctgt tactacttag gtgaatatca tctacatgtt tattttgtct
> >      2341 taatccaaat tcacatttaa ttttgttttc actcttttta tcatgtatac ttttttaatt
> >      2401 agctagccga gattatatat gttaaaacta cctctatttc taacacaccc aagcaatata
> >      2461 tcatctttcc aatatttacg aaagaagaaa tttctataca aatattaacc tttactttgt
> >      2521 gcaaaatagg agagaagaaa aacaaaaaga ctgactgaga tggccgatca acagctagga
> >      2581 gtgctaaagg cacttgatgt tgcgaagacg caactttacc atttcacggc tattgtcatt
> >      2641 gccggtatgg gcttctttac ggacgcgtac gatctctttt gtgtgtcctt ggtgaccaag
> >      2701 cttcttggcc gcctctacta cttcaatcca acgtcagcaa agcctggctc acttccccct
> >      2761 catgttgcgg ctgcagtcaa cggtgtggcc ctttgtggaa cccttgccgg tcaactcttc
> >      2821 ttcggatggc ttggtgacaa actcggacgg aaaaaggtgt acggtatcac tttgatcatg
> >      2881 atgattctct gctcagttgc ttccggtctt tccttgggca attcggccaa gggtgtcatg
> >      2941 acgactcttt gcttcttcag gtacaattta tttagccaca aacctaatat cacatacgtc
> >      3001 acagatacaa gctcgagaga ttagtcacta tttcgaccta gattatggtt acttaagata
> >      3061 ctgatatcta gacgattata tataggtttt ggctcgggtt tggcattgga ggtgactacc
> >      3121 ctctatctgc caccatcatg tctgaatacg ctaacaagaa gactcgtggg gctttcatcg
> >      3181 cggcagtgtt cgccatgcaa ggtgtaggta tcttggcggg aggttttgtg gcacttgcag
> >      3241 tttcttccat atttgacaaa aagttcccat cgccgacgta tgagcaagac aggtttctat
> >      3301 caacgcctcc tcaagctgat tacatttggc gaatcatcgt catgtttggt gctttacccg
> >      3361 cagctttgac ttactattgg cgtatgaaga tgcctgaaac agcccgttac accgctttag
> >      3421 ttgccaagaa catcaaacaa gccacagcag acatgtccaa ggtcttacaa acagatctcg
> >      3481 agcttgagga aagggtggag gatgacgtca aggaccccaa aaaaaactat ggcttgttct
> >      3541 ccaaggaatt ccttagacgc catgggcttc atctccttgg gactacctcc acttggtttt
> >      3601 tgcttgacat cgccttctac agccaaaact tgttccaaaa ggatattttc tcggccattg
> >      3661 gatggatccc aaaggcagcc actatgaacg ccatccatga ggttttcaag attgctaggg
> >      3721 ctcagactct cattgccctc tgcagtacag tcccaggtta ctggttcaca gtagccttta
> >      3781 ttgatatcat tggaaggttt gcgatccaac taatgggatt tttcatgatg accgttttta
> >      3841 tgtttgctat tgccttccca tacaaccact ggattttacc agataatcgt atcggattcg
> >      3901 tggttatgta ctcactcaca tttttcttcg ccaactttgg acccaatgca actactttca
> >      3961 ttgtcccagc tgaaatcttt ccagcaaggc taaggtctac gtgccatgga atatcagccg
> >      4021 caactggtaa ggctggagcc atcgttggag ccttcgggtt cctatatgct gctcaaccac
> >      4081 aggataagac caagacagac gcaggatacc caccgggcat cggagtcaag aactcattga
> >      4141 tcatgcttgg tgtcattaac tttgttggta tgctcttcac cttcctcgtc cctgagccca
> >      4201 agggcaagtc ccttgaagaa ctctccggcg aggctgaggt tgataaatga ttatgccgtc
> >      4261 atatatgttt gtcattggtt ttgcgatgtg tgaattatat ttgtaatggt gtactacttt
> >      4321 tacgttttac gttctttgcg atgagtgaat tatatttgta acggtgtact actttcgctt
> >      4381 tttgtttaaa tgtgtgtgca agtgcaactt gttaagatgt aaactcatgt tatggtcata
> >      4441 tcctagtaat gctataagtt tggaagcaat aaagacatga acaatcaaac aaaaaatatg
> >      4501 cttagtggaa agtttgaaat gaaagatata agggctagtg gatgtaattc tagggatcaa
> >      4561 tattgataga actgataata caaaaaatca gcttgtactg gtatattagg ttgaaaagat
> >      4621 aaatgagata ttttagaact actgaactat aaagattggt atattgttgt tcatgtaatc
> >      4681 gaaagatttt ttttgtaaga ttgtagatga atttatcatc aaatgaacta caaagactag
> >      4741 ggactcccaa gagtccaaga ccctactctt cacttgtcta aacatttaag tgaagcaatg
> >      4801 tcaaattagg agtatgatgc aacagtctat ttgatatatt tgaccagttg tactagactc
> >      4861 aatttgatgc tgccagtgtg ccgtagaagc cttccaaata gacattatag ctttagtcaa
> >      4921 agcaaatgag gaagccgagt ggtttcagaa cttgaaaata tttcgatggg agaaaccagt
> >      4981 gttggcacta cgtgtaaact aaactctgat agcaattatt tattgttcga gcattgcgct
> >      5041 cttagtttaa tgttaaatct tgtctaatta gaagacaaca caaaaaccaa tagcaactaa
> >      5101 tctcaaatag tgtaatctca cttgattata tcaaatcgga tgataaccta tgacctattt
> >      5161 tactaaaagg tctgtcacaa aagtaagtta tatatattca aaacaaaaaa aatataagat
> >      5221 aaaaaataaa acgaagaaaa actgggaatg tcaaatatga taaatgattt aaacaaatat
> >      5281 gtcatgacaa gaaaaagaaa aaagaaaaaa aactgtcacc gttacaaaaa actgtcacca
> >      5341 ttataagtgt tattatactt ataaaaatca gcaagaacaa caaattttgg tcatcataag
> >      5401 aattcaaaat tacaaatata tcgtaacata aaacattata gtggtcattc catcaatttt
> >      5461 gtttatcgtt taacaaatct gcgatcaaac gatatatttt agtaatataa tttcatctat
> >      5521 cacaacataa ccaattatgt cataactaaa aatactatat tttaaaaaca aatatgttgt
> >      5581 tacatgaaaa tattttggta acattataaa aatctgtcat aagatataac aaatctgcta
> >      5641 tcattagagg tcattatatt tattagtttt ttgtacataa atttcttata acatgataga
> >      5701 tgtttttaga aaattataaa tctaccgagt cattcaaaat aaaacagcca gacactacag
> >      5761 taagtctgtt ataaactatg aaagtatgta gttgaataaa aaaatctgtc aaaaaaataa
> >      5821 atatgtcaaa aaaataaatc agctacaatt gttttagtct attgtaacca tattctaaaa
> >      5881 aatctgtcaa ccaaaaaagt ctgccaaaag aaatatttac ggacaaatag atctgtagta
> >      5941 tataaattag atattaaact tttgtaaaat tattgctaga attaatttga aaggacattt
> >      6001 ttctatttat ttttttacac atgagacaag tgtattttag acatgaaaat aatttaaccc
> >      6061 aaattgtttt taaaataatc tagtaattaa actgataatt tatgttaatc tagtaacttt
> >      6121 cccattatta taatgataaa ttgttgagaa aatttacttt attttagtgt tactttgttc
> >      6181 tagagtaaaa tcgaagaata cttagggagg gacatagggt cagtgccggc ttaacatgga
> >      6241 gaggagggag tgcaacaacg agtggcccat aaagcaaaag ggacacatac aattctttcc
> >      6301 ttttgtaaac ataaaaagaa aaagaaaaaa atatgtgcaa atataacaga aaattgcaaa
> >      6361 tttttactaa aaagctcata tatatttgca ttccttttaa atataaggta aaaaaaaatt
> >      6421 ctaaatttat tgggcccaaa ttatatgttg atatatttta atccaacaac tctaatacaa
> >      6481 atattgattt attgggccaa atctcttttg taaaatgaga ttgttgtttt attttatttt
> >      6541 aaatgatgtt tactaagtgt tttacttttg tacgaaaaca atttccaata tattaagggt
> >      6601 ccaatttttt ttttgcccta ggacctagaa taagcttgag ccggcactgc agagggtcct
> >      6661 aggatttttg tattttttac tttgtaagtg aatgccatag caagccaagg actgaagaag
> >      6721 aacaacggaa aatatagaga atatgaattt gtgtgatgcg aagacgaaaa gaaattatag
> >      6781 aaaatcaact gaaattgatg acgtcgaatt tgattgagga gaaaaaatag ctacaaaata
> >      6841 agaggaagaa aaaggaaaat aaagaaaaga acgaagagaa tgagaaggag aagccgaaaa
> >      6901 gggtctagag aagaggggtg ttttttcatg aagacaagat actacgaaag aaggaagatg
> >      6961 aagggctagt tataagtaga tgtagatctc tgaatatata tatatatata tatatataaa
> >      7021 gtgtctccct aaaccggaaa aaacgatggg ttttgatatg ctgaacgggg gacaaaaccc
> >      7081 gagaaccgaa atcaatttaa aagttcaccg atcactatac aactctgcac ttattgaaac
> >      7141 cgaaaaccaa aattctccgt ttagacggtt ttggttcaat acggttctta ttgaacgtca
> >      7201 atcactctat ctactactat acttgaaaaa caaaattaaa ttttttgcct tgacttgatc
> >      7261 cagagagatg ttacgcggcg gcgatgcatt aatcgataaa atcaaacaac atgccacgca
> >      7321 ttgacttttg caagaagcca gcaactttat ataaaacaat cgatgataat tgtgatcgtt
> >      7381 acaactatgt aataatgcat atatataagc tatgtattaa ctgtaaactt gatttaaata
> >      7441 atactgatgg ggagtatata ccttgatgca tagaataaaa atgtaaaata cgtatgaatt
> >      7501 actagcttag tggtagcctt gttacttgga ttagtttagc aagcaaatat ctagctctaa
> >      7561 ttcacaagct aaccctaaaa gttggttgga catatacatc gtcaaatata tcgaagctat
> >      7621 atatatatat ttaacatatt tgtttttgtt acattaaaac aaacaaaaca aaagaaaaaa
> >      7681 gttttttttg tgctcctcct ttttaatatc ttttatgaaa atggctgact gttatggaaa
> >      7741 atgatgtttt tattaatcta agtagtgcct ttcaccaaag aaaaaaaaaa accaaatata
> >      7801 gtagacttta gtacggagat acgtgaacat aacttgttag attggttttc aacaaaagag
> >      7861 tatagaatgg aaaatcgtta catactgact ctcatgaaca aacatatttt gcttaatgaa
> >      7921 aagtcaactc gatctttatt atttatgagg ataaaaatca aataggctct gtcgacaaaa
> >      7981 cgaaacaatc agactaccct ttagaatata tttaaaatta ccaaaaatat taatctttta
> >      8041 ggaaattaca catgatgata ttttagcata tggcggatat ttacctccaa ctccaaatga
> >      8101 ggatatttta gcatatggcg gtctttattc gtacgtacta ttttgcaaat cattttatat
> >      8161 gattgtatat accatagatt ataatcacta ctcttagtcc cataatttgt aaataatagt
> >      8221 aattcgtaga catagatgat caaagttctt tgtaatctta ctacatacac aaaaagtaat
> >      8281 aaaacacaac atttttgttg tattaaatgt gcttgtccgt tgacaaaaaa gataaaagaa
> >      8341 tagctttgtt tgtcctttct cgacaagttt gtattataat cgtctcacgt tcatttcaag
> >      8401 attacatgca gaatttgtca aatgaagtct aaatcatcaa aagtaatttt tccttatttg
> >      8461 ttctcacttg aatataaatt aacgaaatat gattcaccat aaattacacg aactatagga
> >      8521 tacgggtact tatttattcc cttgcggtaa tacgaatagg tccaagattg gaaatctcag
> >      8581 agttgtcatg aatactaagt taaaatatgt gcagaccata caaaacatag acgagcatag
> >      8641 aaagagaagt tgatcaagtg atggaaaaag attaaaataa aatataaata tggatatacc
> >      8701 cttaagattc cggagactat gcttctccga tctcgtctat aaatttcagc cgtctcagcc
> >      8761 aacacaatca caaccaccac ctccctccct ctctctctct taatctttct gcccccacca
> >      8821 ttagcgcaca acggtgagat tcgttagatg tttattatct atgcatccaa attcacatat
> >      8881 agatagatag tacatatata gttgtttata tcaatagaaa cttttgtttc taaattgtaa
> >      8941 tcgcttaaat tatttgattt atgtataggt ttaattataa atgttattct gttggttgta
> >      9001 ttggtactaa actaaaataa tttagtagca tctttccaat attattataa aattatagtt
> >      9061 aagatattta tttttacttc atgaagcagg aaagacaaga gaggcttaga tggctgaaca
> >      9121 acaactagga gtgctaaagg cactcgatgt tgcgaagacg caactttatc atttcacggc
> >      9181 gattgtcatc gccggtatgg gtttctttac cgatgcgtac gatctttttt gcgtgtcctt
> >      9241 ggtaacaaag ctccttggcc gcatctacta cttcaatccg gagtcagcga agcctggctc
> >      9301 acttccccct catgttgcgg ccgctgtcaa tggtgtggcc ctttgtggaa ccctttctgg
> >      9361 tcaactcttc ttcggttggc tcggtgacaa actcggacgg aaaaaagtgt acggtcttac
> >      9421 tttgatcatg atgatcttat gctctgtcgc ttctggcctc tcttttggca acgaagccaa
> >      9481 gggtgtcatg accacccttt gcttcttcag gtacagtttt catccaatta caatattata
> >      9541 tacacacatg attaaccaat aatctaataa cgaatgcaag ttttaaaagt tagtcacgct
> >      9601 tcgaactgat ttaggtattt ttctttgaga aagttttttt atatataaac cattacgtag
> >      9661 gttttggttg ggatttggta ttggaggtga ctacccactt tctgccacca tcatgtctga
> >      9721 atacgcaaac aagaagaccc gtggggcttt catcgcagct gtcttcgcca tgcaaggtgt
> >      9781 cggtatcttg gctggaggtt tcgtggcact cgcagtatct tctatattcg acaaaaagtt
> >      9841 cccagctcca acatatgcag taaacagggc cctctcaacg cctcctcaag ttgattacat
> >      9901 ttggcgaatc atcgtcatgt ttggtgcttt acccgcagct ttgacttact actggcgtat
> >      9961 gaagatgcct gaaactgccc gttacaccgc tttggttgcc aagaacatca aacaagccac
> >     10021 agccgacatg tccaaggtct tacaaacaga tatcgagctt gaggaaaggg tggaggatga
> >     10081 cgtcaaagac cccagacaaa actatggctt gttctccaag gaattcctta gacgccatgg
> >     10141 acttcatctc cttggaacta cctccacttg gtttttgctt gacattgcct tctacagcca
> >     10201 aaacttgttc cagaaggata ttttctcggc catcggatgg atcccaaagg cagccaccat
> >     10261 gaacgccacc catgaggttt tcaggattgc tagggctcag actcttatcg ccctttgcag
> >     10321 tacagtccca ggctactggt tcacagttgc gtttattgat accattggaa ggtttaagat
> >     10381 ccaactaaat ggatttttca tgatgaccgt gtttatgttt gccattgcct tcccttacaa
> >     10441 ccactggatc aaaccagaaa accgtatcgg atttgtggtt atgtactctc ttactttctt
> >     10501 cttcgccaat tttggtccaa atgcaaccac ttttattgtc cctgctgaga tattcccggc
> >     10561 caggctaagg tcaacatgtc atggaatatc agccgcggct ggtaaggctg gagccatcat
> >     10621 tggagccttc gggttcttat atgcggctca aaatcaagac aaggctaagg tggatgcagg
> >     10681 atacccacca ggtatcggag ttaagaactc attgatcgtg cttggtgttc ttaacttcat
> >     10741 cggtatgctc ttcaccttcc ttgtcccaga gcccaagggc aagtccctcg aagaactctc
> >     10801 tggtgaagct gaagttagcc atgacgagaa ataatttact ttgtgatcaa atgtggagtt
> >     10861 gtttgtggtt tgtttgattt ttttgtttct ttccaaattt tttttttcct ttccttgatg
> >     10921 cagtaagaaa tgtttgattg tgtggaatac ttcagtctta ctgccgtgag tgcaaatctt
> >     10981 gtaatccact tttttgtgtt ctaattaaca ttgtcaaata aagtttctta aactttgatt
> >     11041 ctaactttcc aaacactaac gaactctctg ttgtgcgttt atgtttatca atttgaagca
> >     11101 tgaaaatttg attgttcata tgtttgctat atactgagaa acaccaaaga atagttttta
> >     11161 tggatttcga ttatctgatc aacacattag ttggttatgg tgttcaatgt tcagtgtctt
> >     11221 gtcatcacta acgttatctc tttgccttat atcgacacat tcaagtgggc gtagttgtag
> >     11281 accattagtc ttgtgctctg tttgtaaata acgctattgg ttagttagct ttctttttgg
> >     11341 ttggtgtata tattcttttt gggttgaagc tctgtgtagc gtttagaaat gttaaaagtt
> >     11401 cgatcccaaa atgtagcaag atttgaaatt tctttttgga ttgatgtaat gtatttaact
> >     11461 ggcttcttat gaatcacggt gcttattgtt tgattaatgt gaactgaatt gttctgtgct
> >     11521 tgagattcaa aaagctcaag aggatctaat actaaatagt ctcactaatg catatgtttg
> >     11581 atttgtttca aatattgatt gttctctctg acgcatttgt ttgattattg tgcactacct
> >     11641 gtttgggatg ggcattctat tggtctcact ataccaaatt accaactttg tgaaactcac
> >     11701 cataacttgt gtgatccata aaatcttaga ttaattactg ttaagtttca tgaaatttta
> >     11761 actcaaaaca atttggcaat aaatagattg acattgtatg tccaaaaacc aaactgagta
> >     11821 tttaacaaaa ccaactattt ccgaccaatg aatgcgtcta aaatgtttgt ttccaacaaa
> >     11881 tttctgatca ttgatttcgt cggaactaac acaaataata attcgcataa tcataataat
> >     11941 tctcctaatc agttcataaa cttgaatgac aagttaattt actcatttta ccatgaccat
> >     12001 taattcataa attgagtcac aaattaatat caagatttga ccatgtccct tatgcaagag
> >     12061 ggggttggaa agcgatgggt ctatagtgtt acaattaatt ctctagtatt catataacca
> >     12121 aaacacaatt ataatatttt cacattaaag ataaattaaa aacatccata taatgtttca
> >     12181 catttaataa tatacatatt ttaaatattt tgttaacatt taaaattatg atttttctat
> >     12241 catttttttt attttttaaa ctactttgaa tcatcatata atacataaca ctaattaaaa
> >     12301 tctaattttt tctggatata ttgtgattcg gatgtttgaa aacagccata tattactcac
> >     12361 ataatgaaat aaaattattt gtgttcaaag tctcacatca gctaaaactt tgtgtccaaa
> >     12421 atctcataaa taagaccaaa ttgaacccaa agaaatatga gtgagtagga atcctaactt
> >     12481 aacaaacatc caaatttaaa tatttgttca gtttaattcg gttctttagt caaaccgttt
> >     12541 aattatacca gatatcataa attctaaact ttttaaaaaa ttaaaatatt ataattgatc
> >     12601 tatttaaagt gtataaaaac ttcaaaatat tatcaacgtt taacttataa aaagtttaaa
> >     12661 atatcacggt atgggttata tggtttgacg agtttgaatc ttcattaata taaatttaaa
> >     12721 tatcattaac tcggtggtac aacatgggta agtcatcatt ttaatatttt gttaacacta
> >     12781 gcaatattat aatgatatgt caagttaaat tgattcaatt atgattatgt attaaaacaa
> >     12841 gatatataag tattagtttt ctaggtatct gcaatattag aaaccagata tccaaaagtc
> >     12901 acggatctgg attgttaaag cacagttctg aatacttttc aaattctagg atatccggat
> >     12961 ccattctacc aattacaatc ctgcagaagc cttgttggcc acacaatcga cacaagagtt
> >     13021 agctagccaa aaagtgcctt caacttaaga agatgcaaac cggaaagaaa aaaacatatc
> >     13081 tttggtccga aggaaatcat gttttattat caaatttaac caccaacaac tacaaaccaa
> >     13141 atttacaaca acaacaacca aaaataaaaa cacatacaca caagtgatag cagcagacaa
> >     13201 cacaaaacat aacaaaaaca gaagcttccc aaaattccga aattcaaaat cctctttaat
> >     13261 ttctcatcat atagtagtcg agttacgttg gcatggtttc taatagttat tatgcggagt
> >     13321 tttaaatctt catcagttgt attctttcga attttgcgaa ctcagaatcg actgcaaata
> >     13381 aacatgttta attagatcaa atggttagag attccgagct tgccttgtaa catgacatga
> >     13441 caacaatgat ttggctataa attacttgta agttactagt gagaaacata gcaaaccaat
> >     13501 tttcaagtat accacagtgt tagttttgac taacattagg tctaattcgg tgaagtcaat
> >     13561 actaatccaa agtttcatct atcaatgcga atagatttaa aattattgag gagcacaaac
> >     13621 agattcaaaa tgttaaatca ttcaaaaaca tttagtaata gagcaaatta ttatgcattt
> >     13681 accatatttc attttttgtg ctacagtata tccacaaatt aaaaagtgat gtgattaata
> >     13741 cataccttca ctttaggata agggaagcta gttttagcac cacctctgct tccgaaactg
> >     13801 aactttgatt tcttgtcatt agattttaag atctgaaaag agcaggtcaa atcctcagac
> >     13861 acactcatca tcgcacctgc attgtcaaat tccccacagt aattcggcgc agagaatatc
> >     13921 gttacgagct gcttattcgc aaagaactcg aatccatctt caacaacctg ttacaaaatc
> >     13981 gttgcttgtg tcaaaaggaa acacgttaaa aatgtttctt tgggcacaag gaacaataaa
> >     14041 caaacctggt gagccctaca aatgaggtca agatcgagtc ttttaagaaa tccagaaact
> >     14101 atatctgatc caaaagtgta agaaactccg cgatcgttag gcccccaacc tctaacatct
> >     14161 ttatcaggat cagaccacaa gagatcacag agtaaaccac gatcaggaat atccgttgga
> >     14221 cgacgaatat ccctaatctg cctcaagctt agcagctccg gggagagccc accatgcata
> >     14281 caaaaaatcc gctcatcgat gagtgcagcg acggggagac agttgaagca atcagtgaag
> >     14341 attcgccaaa tcttgacact gaatctacgt ttacactcgt catagaagcc gtaaatacga
> >     14401 ttgattgatg cactttcatg gtttcctctg agaaggaaga agttttcagg gaacttaatc
> >     14461 ttgtaagcaa gtaaaagaca aatcgtttcg aggctttgct tgccgcgatc gacataatct
> >     14521 ccaagaaaca agtagtttga attaggaggg tatccgccat gttcgaatag tctcaagaga
> >     14581 tccggatatt gtccatgaat gtcccctgtt tattatttgt ttaaccaaaa gattacatag
> >     14641 gagcacaatc tcagaaaaga gtcagagaga aacagagaga aacgtaccac atattttaac
> >     14701 aggagcttca agttccaaga gatttggttg tctcaagaag atatctctag agacgaaaca
> >     14761 gagctgtttg atctctgttt cagacaactg aacaatcttt cctggttttt ctctagcttc
> >     14821 aagcaaccta ttgatcaccg agtttaaagt accaggatcc atttataaaa cctcaagaaa
> >     14881 cagagcaaca atgagacaaa acaacaacaa cgcctagaag ctttataata tacaaaagct
> >     14941 attgacaata tggaataaat ttaaaaagta tggagctttt atgttctttt atgcttgctc
> >     15001 tttgagaaaa agaatggtga ttgtcaaatt caagaagggc tttataaagc cgcgtttaag
> >     15061 aatatattct tcataccttt ttagcaaaag tcacatgtgt tcatacctac ctacattgtt
> >     15121 ctaacttcaa ctaaatcata taataatcag attctagcta atttagtaac acattgtttg
> >     15181 tgtgtgtgtt aagcttcaca ggattgaatc ctgtccggta cgtcatgatt caaatttaag
> >     15241 atctttccgc cggtactttg agtgcgaaaa acgctctttc ttcttcttct agtatcgatg
> >     15301 tttttggatt cgtccgcgtc atattaagca acttggctat ttacaaaaaa catattcttc
> >     15361 ttcctttgta gccaaaagcc atgggtaaaa acctacccta attgttctaa cttcaactac
> >     15421 atctaataac caaaaatgtt agctatattt tctcttttat ggtcaaaaca tattagctag
> >     15481 tttagacact ttctttgtgt tgagcttttt aggcttgaat ctaatcgaat tc
> > //
> >
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu