[Bioperl-l] writing genbank files

Jason Stajich jason@cgt.mc.duke.edu
Wed, 18 Sep 2002 12:46:59 -0400 (EDT)


I made this fix about a week ago. I don't see how your parsing would have
worked before unless the GenBank parsing also changed from the version you
were using.

All I did was:

RCS file: /home/repository/bioperl/bioperl-live/Bio/Species.pm,v
retrieving revision 1.16
retrieving revision 1.17
diff -r1.16 -r1.17
282c282
<     return 1 if $string =~ /^[A-Z][a-z]+$/;
---
>     return 1 if $string =~ /^[A-Z][\sa-z]+$/;
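The one-line change can be made concrete with a small standalone Perl sketch. It runs both versions of the pattern from the diff on lineage names taken from the attached AB000094 record. The wrapper subs `valid_old` and `valid_new` are hypothetical and are not Bio::Species methods. 'Core eudicots' is a made-up capitalised name, included only to show what r1.17 newly accepts.

Read literally, neither pattern accepts 'eurosids II'. Both still demand an uppercase first letter, and the new one only allows lowercase letters and whitespace after it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Both patterns are copied from the Species.pm diff (r1.16 vs r1.17);
# the wrapper subs themselves are hypothetical.
sub valid_old { my ($name) = @_; return $name =~ /^[A-Z][a-z]+$/    ? 1 : 0 }
sub valid_new { my ($name) = @_; return $name =~ /^[A-Z][\sa-z]+$/ ? 1 : 0 }

# Three names from the ORGANISM lineage of AB000094, plus one invented
# capitalised two-word name ('Core eudicots') for comparison.
my @names = ('Brassicales', 'eurosids II', 'core eudicots', 'Core eudicots');

for my $name (@names) {
    printf "%-15s old=%d new=%d\n", $name, valid_old($name), valid_new($name);
}
# Results: Brassicales 1/1, eurosids II 0/0, core eudicots 0/0,
# Core eudicots 0/1 (only the whitespace rule changed).
```

Only the capitalised name containing a space changes result between the two revisions. Lowercase-initial lineage entries are still rejected under either pattern.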


Someone needs to revisit the ideas behind the Species object and the
taxonomic fields in GenBank/EMBL records. Either we are parsing things
differently or the values that can go in those fields are changing. We
now see many more taxonomic fields that do not match what was expected
when this module was built (James G did the brunt of the work back in the
day).

Anyway, I am perfectly happy to turn off the name validation
altogether; basically it required all fields other than the species to
start with a capital letter.

-jason


On Wed, 18 Sep 2002, Hilmar Lapp wrote:

> I believe Jason fixed something yesterday in Species.pm in order to
> allow spaces in certain places. Jason?
>
> 	-hilmar
>
> On Wednesday, September 18, 2002, at 01:48 AM, gert thijs wrote:
>
> > Hilmar,
> >
> > I just installed the modules from the main trunk. When I tried to test
> > them, I was unable to parse input sequences in GenBank format: loading
> > a GenBank flat file fails while parsing the species name. I guess a
> > name that does not start with an uppercase letter stops the GenBank
> > parser. Attached is a file on which the parser throws the exception.
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Invalid name 'eurosids II' (Wrong case?)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Root/Root.pm:318
> > STACK: Bio::Species::validate_name /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:283
> > STACK: Bio::Species::classification /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/Species.pm:121
> > STACK: Bio::SeqIO::genbank::_read_GenBank_Species /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:884
> > STACK: Bio::SeqIO::genbank::next_seq /users/sista/thijs/perl/lib/site_perl/5.6.0/Bio/SeqIO/genbank.pm:229
> > STACK: AnnotatedSequence::new /users/sista/thijs/perl/lib//AnnotatedSequence.pm:66
> > STACK: GeneIndex.pl:168
> > -----------------------------------------------------------
> >
> > Gert
> >
> >
> > Hilmar Lapp wrote:
> >> It should be written as join(complement(...),complement(...),...).
> >> This is in the main trunk only, though. Do you have an example where
> >> this is not true?
> >>     -hilmar
> >> On Tuesday, September 17, 2002, at 02:06 AM, gert thijs wrote:
> >>> Hello,
> >>>
> > >>> I have a question about the current status of the GenBank file
> > >>> parser/writer. I noticed that a CDS with a location of the type
> > >>> complement(join()) is written as a join() without the complement.
> > >>> I saw that this problem was a major thread on the list a few
> > >>> weeks ago, but I could not find out whether it has been solved
> > >>> and, if so, how.
> >>>
> >>> Gert
> >>>
> >>>
> >>>
> >>> -- + Gert Thijs
> >>> +  K.U.Leuven
> >>> +  ESAT-SCD
> >>> +  Kasteelpark Arenberg 10
> >>> +  B-3001 Leuven-Heverlee
> >>> +  Belgium
> >>> +
> >>> + Tel  : +32 16 32 85 88
> >>> + Fax  : +32 16 32 19 70
> >>> + email: gert.thijs@esat.kuleuven.ac.be
> >>> +
> >>> +  http://www.esat.kuleuven.ac.be/~thijs
> >>> +  http://www.esat.kuleuven.ac.be/~dna/BioI/
> >>> +
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l@bioperl.org
> >>> http://bioperl.org/mailman/listinfo/bioperl-l
> >>>
> >> -- -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >
> >
> > LOCUS       AB000094               15532 bp    DNA     linear   PLN 13-JAN-1998
> > DEFINITION  Arabidopsis thaliana gene for inorganic phosphate transporter,
> >             protein phosphatase 1 catalytic subunit, complete cds.
> > ACCESSION   AB000094
> > VERSION     AB000094.1  GI:2780346
> > KEYWORDS    protein phosphatase 1 catalytic subunit inorganic phosphate
> > SOURCE      Arabidopsis thaliana (strain:Columbia) DNA.
> >   ORGANISM  Arabidopsis thaliana
> >             Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
> >             Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
> >             Rosidae; eurosids II; Brassicales; Brassicaceae; Arabidopsis.
> > REFERENCE   1  (bases 0 to 0)
> >   AUTHORS   Mitsukawa,N., Okumura,S. and Shibata,D.
> >   TITLE     High-affinity phosphate transporter genes of Arabidopsis thaliana
> >   JOURNAL   Soil Sci. Plant Nutr. 43, 971-974 (1997)
> > REFERENCE   2  (bases 0 to 0)
> >   AUTHORS   Mitsukawa,N., Okumura,S. and Shibata,D.
> >   TITLE     Isolation of a gene that encode a protein phosphatase 1 catalytic
> >             subun it in Arabidopsis thaliana
> >   JOURNAL   Unpublished (1996)
> > REFERENCE   3  (bases 1 to 15532)
> >   AUTHORS   Mitsukawa,N.
> >   TITLE     Direct Submission
> >   JOURNAL   Submitted (24-DEC-1996) to the DDBJ/EMBL/GenBank databases.
> >             Norihiro Mitsukawa, Mitsui Plant Biotech.Res.Inst., Research
> >             Division; TCI-D21, Sengen 2-1-6, Tsukuba, Ibaraki 305, JAPAN
> >             (E-mail:tsu01129@koryu.statci.go.jp, Tel:0298-58-6252,
> >             Fax:0298-58-6234)
> > COMMENT     Sequence updated (08-Jan-1998).
> > FEATURES             Location/Qualifiers
> >      source          1..15532
> >                      /chromosome=5
> >                      /strain="Columbia"
> >                      /organism="Arabidopsis thaliana"
> >                      /db_xref="taxon:3702"
> >      gene            join(2560..2960,3086..4250)
> >                      /gene="PHT3"
> >      CDS             join(2560..2960,3086..4250)
> >                      /product="inorganic phosphate transporter"
> >                      /gene="PHT3"
> >                      /protein_id="BAA24281.1"
> >                      /codon_start=1
> >                      /translation="MADQQLGVLKALDVAKTQLYHFTAIVIAGMGFFTDAYDLFCVSL
> >                      VTKLLGRLYYFNPTSAKPGSLPPHVAAAVNGVALCGTLAGQLFFGWLGDKLGRKKVYG
> >                      ITLIMMILCSVASGLSLGNSAKGVMTTLCFFRFWLGFGIGGDYPLSATIMSEYANKKT
> >                      RGAFIAAVFAMQGVGILAGGFVALAVSSIFDKKFPSPTYEQDRFLSTPPQADYIWRII
> >                      VMFGALPAALTYYWRMKMPETARYTALVAKNIKQATADMSKVLQTDLELEERVEDDVK
> >                      DPKKNYGLFSKEFLRRHGLHLLGTTSTWFLLDIAFYSQNLFQKDIFSAIGWIPKAATM
> >                      NAIHEVFKIARAQTLIALCSTVPGYWFTVAFIDIIGRFAIQLMGFFMMTVFMFAIAFP
> >                      YNHWILPDNRIGFVVMYSLTFFFANFGPNATTFIVPAEIFPARLRSTCHGISAATGKA
> >                      GAIVGAFGFLYAAQPQDKTKTDAGYPPGIGVKNSLIMLGVINFVGMLFTFLVPEPKGK
> >                      SLEELSGEAEVDK"
> >                      /db_xref="GI:2780347"
> >      gene            join(9110..9510,9661..10834)
> >                      /gene="PHT2"
> >      CDS             join(9110..9510,9661..10834)
> >                      /product="inorganic phosphate transporter"
> >                      /gene="PHT2"
> >                      /protein_id="BAA24282.1"
> >                      /codon_start=1
> >                      /translation="MAEQQLGVLKALDVAKTQLYHFTAIVIAGMGFFTDAYDLFCVSL
> >                      VTKLLGRIYYFNPESAKPGSLPPHVAAAVNGVALCGTLSGQLFFGWLGDKLGRKKVYG
> >                      LTLIMMILCSVASGLSFGNEAKGVMTTLCFFRFWLGFGIGGDYPLSATIMSEYANKKT
> >                      RGAFIAAVFAMQGVGILAGGFVALAVSSIFDKKFPAPTYAVNRALSTPPQVDYIWRII
> >                      VMFGALPAALTYYWRMKMPETARYTALVAKNIKQATADMSKVLQTDIELEERVEDDVK
> >                      DPRQNYGLFSKEFLRRHGLHLLGTTSTWFLLDIAFYSQNLFQKDIFSAIGWIPKAATM
> >                      NATHEVFRIARAQTLIALCSTVPGYWFTVAFIDTIGRFKIQLNGFFMMTVFMFAIAFP
> >                      YNHWIKPENRIGFVVMYSLTFFFANFGPNATTFIVPAEIFPARLRSTCHGISAAAGKA
> >                      GAIIGAFGFLYAAQNQDKAKVDAGYPPGIGVKNSLIVLGVLNFIGMLFTFLVPEPKGK
> >                      SLEELSGEAEVSHDEK"
> >                      /db_xref="GI:2780348"
> >      CDS             join(13719..13967,14046..14605,14688..14862)
> >                      /product="protein phosphatase 1 catalytic subunit"
> >                      /protein_id="BAA24283.1"
> >                      /codon_start=1
> >                      /translation="MDPGTLNSVINRLLEAREKPGKIVQLSETEIKQLCFVSRDIFLR
> >                      QPNLLELEAPVKICGDIHGQYPDLLRLFEHGGYPPNSNYLFLGDYVDRGKQSLETICL
> >                      LLAYKIKFPENFFLLRGNHESASINRIYGFYDECKRRFSVKIWRIFTDCFNCLPVAAL
> >                      IDERIFCMHGGLSPELLSLRQIRDIRRPTDIPDRGLLCDLLWSDPDKDVRGWGPNDRG
> >                      VSYTFGSDIVSGFLKRLDLDLICRAHQVVEDGFEFFANKQLVTIFSAPNYCGEFDNAG
> >                      AMMSVSEDLTCSFQILKSNDKKSKFSFGSRGGAKTSFPYPKVKVCINHITF"
> >                      /db_xref="GI:2780349"
> > BASE COUNT     5197 a   2674 c   2593 g   5068 t
> > ORIGIN
> >         1 cttttaaaaa atttacaaag atttttaaaa gcattttttt ggctgggaaa aaaattttac
> >        61 gagaaaacat ttttggcggg aaattttttt tttggcggga aaaaaagttt ggcggaaatt
> >       121 ttttgtttgg cgggaaaaat aaattttggt acatgttatt attaaatgag ggtaatttgg
> >       181 tcattctgtt caatagaagg gatattttta aaaataaaca atataaaatg gtattgttac
> >       241 aaaagggtag taaaaaaagg gtagttttgc aaatctccct cgagaatttg caatttctct
> >       301 ctcatagcat aacatgtctt ttttttagta aatttcaaga tgacaaaaag aaaaaaaaaa
> >       361 cataaagaaa caaattgaaa gcaacatcta tttttctgac ttgtaaaaat gtggtgttac
> >       421 tagcaaatat caaatttttg tatctataaa ctagatttta acccgcggta tactaggaga
> >       481 acgatttatt ttttaaagtt aatatatata caagtttaca aattgaatat atttataaaa
> >       541 aaaataaatt ttttagttta caattattat taggtaacat cccgccaaat ctgttccatc
> >       601 aaaaagctta ttaattttat taatgttaac cttagttatg atatgattct aaatcattgt
> >       661 ctagatattt tagccaatgt taggacgttg aacccacata gttttcgtca attgaataat
> >       721 atatgttcgt tttataaatt tcgaatcaca atttatgcag aaaatgtttt gaaatttttc
> >       781 ttaactttac atattataca ataaaaaatc aaacaaagat gataggaaat tcatagtttt
> >       841 aaatcttaat caaatcttct aaatatcaat attgttaaat atagaagtaa tgagtataag
> >       901 agttgggttt tatttgaacc aacccaacta agatttaatg aacatcataa actaatgggt
> >       961 ctggtatcaa atgtaatcag atgcatttta acccacaatc aaaaggtgac tgcaaaaata
> >      1021 ctatataaga ttcatagaag tagatcgtct atataggcct gaaaaatatc agttttaaat
> >      1081 taaattattt agagatctta ttttgttggg cttaaatact ggagaataag actactttag
> >      1141 taatcaaaat atttgggcca tgtaactaat gcaaaccaaa atcattattg acaaaagtca
> >      1201 ttgcatcaaa tgattttctt accaaaatgt ctttaaaata tatatatata tatatatata
> >      1261 tatatgtacc ttcgtacggt acggttgctt ttatataaaa aggtgttgta cggttgcatt
> >      1321 gacttttgat cgtaaatact atatagtccg gagactcaaa tgcaaaccac tttctctatt
> >      1381 tttttggacg gaagttatgt taacctaaat tatctaaaat attacacaaa taatttttaa
> >      1441 acagtttaaa atggcatata attatgttaa tagaaatttg acttttagta cggagataca
> >      1501 ttagctgaga agtttgacct caccttattt agtgggaacc ttgttacttt tgaattagtt
> >      1561 tgaacaatat atctctaatt catataaagt attttgaaat ataaagcttt ataaatataa
> >      1621 catatttttt ggtgggtaat gcttcttttt ttctgtggct tataactacg tggagatctt
> >      1681 atcttctgtc cgcttatatt agtatttcac catagtgatt ttaataaaaa tcagattaaa
> >      1741 caatatataa cagctatttt agtttacagg tttaaacttt tattttagtt ctttgatttt
> >      1801 ttttttcttt cagaaaatgg tggacatctt ctaaagtttg ttttattaat ctaggtagaa
> >      1861 actttcaaac aaaagacgaa aacagaaaag ttgactttag tacggaaata catgatatga
> >      1921 gataactgat aagataactt gttagatggt ttccactctc ataaacaata taaaaaaatt
> >      1981 attaacgata gtgaactcga tttttagtaa ctctatgaaa atcttcttgt gagatcataa
> >      2041 cattttagaa gactaggctt ttcgtgtaat atgatctgta gaccattcca atgttatcat
> >      2101 atgcagaaca gcaacacaat gaaataaatt aagagatgga aaataaatta agatcaaata
> >      2161 aaatgtggat atacccttaa gattccggag acaatacttc tccagttctc ctctcgtcta
> >      2221 taaatttctg ccgtctcagc caacataatc acaaccacca cctctctcaa taattctctc
> >      2281 tcttgtcctt cacaacctgt tactacttag gtgaatatca tctacatgtt tattttgtct
> >      2341 taatccaaat tcacatttaa ttttgttttc actcttttta tcatgtatac ttttttaatt
> >      2401 agctagccga gattatatat gttaaaacta cctctatttc taacacaccc aagcaatata
> >      2461 tcatctttcc aatatttacg aaagaagaaa tttctataca aatattaacc tttactttgt
> >      2521 gcaaaatagg agagaagaaa aacaaaaaga ctgactgaga tggccgatca acagctagga
> >      2581 gtgctaaagg cacttgatgt tgcgaagacg caactttacc atttcacggc tattgtcatt
> >      2641 gccggtatgg gcttctttac ggacgcgtac gatctctttt gtgtgtcctt ggtgaccaag
> >      2701 cttcttggcc gcctctacta cttcaatcca acgtcagcaa agcctggctc acttccccct
> >      2761 catgttgcgg ctgcagtcaa cggtgtggcc ctttgtggaa cccttgccgg tcaactcttc
> >      2821 ttcggatggc ttggtgacaa actcggacgg aaaaaggtgt acggtatcac tttgatcatg
> >      2881 atgattctct gctcagttgc ttccggtctt tccttgggca attcggccaa gggtgtcatg
> >      2941 acgactcttt gcttcttcag gtacaattta tttagccaca aacctaatat cacatacgtc
> >      3001 acagatacaa gctcgagaga ttagtcacta tttcgaccta gattatggtt acttaagata
> >      3061 ctgatatcta gacgattata tataggtttt ggctcgggtt tggcattgga ggtgactacc
> >      3121 ctctatctgc caccatcatg tctgaatacg ctaacaagaa gactcgtggg gctttcatcg
> >      3181 cggcagtgtt cgccatgcaa ggtgtaggta tcttggcggg aggttttgtg gcacttgcag
> >      3241 tttcttccat atttgacaaa aagttcccat cgccgacgta tgagcaagac aggtttctat
> >      3301 caacgcctcc tcaagctgat tacatttggc gaatcatcgt catgtttggt gctttacccg
> >      3361 cagctttgac ttactattgg cgtatgaaga tgcctgaaac agcccgttac accgctttag
> >      3421 ttgccaagaa catcaaacaa gccacagcag acatgtccaa ggtcttacaa acagatctcg
> >      3481 agcttgagga aagggtggag gatgacgtca aggaccccaa aaaaaactat ggcttgttct
> >      3541 ccaaggaatt ccttagacgc catgggcttc atctccttgg gactacctcc acttggtttt
> >      3601 tgcttgacat cgccttctac agccaaaact tgttccaaaa ggatattttc tcggccattg
> >      3661 gatggatccc aaaggcagcc actatgaacg ccatccatga ggttttcaag attgctaggg
> >      3721 ctcagactct cattgccctc tgcagtacag tcccaggtta ctggttcaca gtagccttta
> >      3781 ttgatatcat tggaaggttt gcgatccaac taatgggatt tttcatgatg accgttttta
> >      3841 tgtttgctat tgccttccca tacaaccact ggattttacc agataatcgt atcggattcg
> >      3901 tggttatgta ctcactcaca tttttcttcg ccaactttgg acccaatgca actactttca
> >      3961 ttgtcccagc tgaaatcttt ccagcaaggc taaggtctac gtgccatgga atatcagccg
> >      4021 caactggtaa ggctggagcc atcgttggag ccttcgggtt cctatatgct gctcaaccac
> >      4081 aggataagac caagacagac gcaggatacc caccgggcat cggagtcaag aactcattga
> >      4141 tcatgcttgg tgtcattaac tttgttggta tgctcttcac cttcctcgtc cctgagccca
> >      4201 agggcaagtc ccttgaagaa ctctccggcg aggctgaggt tgataaatga ttatgccgtc
> >      4261 atatatgttt gtcattggtt ttgcgatgtg tgaattatat ttgtaatggt gtactacttt
> >      4321 tacgttttac gttctttgcg atgagtgaat tatatttgta acggtgtact actttcgctt
> >      4381 tttgtttaaa tgtgtgtgca agtgcaactt gttaagatgt aaactcatgt tatggtcata
> >      4441 tcctagtaat gctataagtt tggaagcaat aaagacatga acaatcaaac aaaaaatatg
> >      4501 cttagtggaa agtttgaaat gaaagatata agggctagtg gatgtaattc tagggatcaa
> >      4561 tattgataga actgataata caaaaaatca gcttgtactg gtatattagg ttgaaaagat
> >      4621 aaatgagata ttttagaact actgaactat aaagattggt atattgttgt tcatgtaatc
> >      4681 gaaagatttt ttttgtaaga ttgtagatga atttatcatc aaatgaacta caaagactag
> >      4741 ggactcccaa gagtccaaga ccctactctt cacttgtcta aacatttaag tgaagcaatg
> >      4801 tcaaattagg agtatgatgc aacagtctat ttgatatatt tgaccagttg tactagactc
> >      4861 aatttgatgc tgccagtgtg ccgtagaagc cttccaaata gacattatag ctttagtcaa
> >      4921 agcaaatgag gaagccgagt ggtttcagaa cttgaaaata tttcgatggg agaaaccagt
> >      4981 gttggcacta cgtgtaaact aaactctgat agcaattatt tattgttcga gcattgcgct
> >      5041 cttagtttaa tgttaaatct tgtctaatta gaagacaaca caaaaaccaa tagcaactaa
> >      5101 tctcaaatag tgtaatctca cttgattata tcaaatcgga tgataaccta tgacctattt
> >      5161 tactaaaagg tctgtcacaa aagtaagtta tatatattca aaacaaaaaa aatataagat
> >      5221 aaaaaataaa acgaagaaaa actgggaatg tcaaatatga taaatgattt aaacaaatat
> >      5281 gtcatgacaa gaaaaagaaa aaagaaaaaa aactgtcacc gttacaaaaa actgtcacca
> >      5341 ttataagtgt tattatactt ataaaaatca gcaagaacaa caaattttgg tcatcataag
> >      5401 aattcaaaat tacaaatata tcgtaacata aaacattata gtggtcattc catcaatttt
> >      5461 gtttatcgtt taacaaatct gcgatcaaac gatatatttt agtaatataa tttcatctat
> >      5521 cacaacataa ccaattatgt cataactaaa aatactatat tttaaaaaca aatatgttgt
> >      5581 tacatgaaaa tattttggta acattataaa aatctgtcat aagatataac aaatctgcta
> >      5641 tcattagagg tcattatatt tattagtttt ttgtacataa atttcttata acatgataga
> >      5701 tgtttttaga aaattataaa tctaccgagt cattcaaaat aaaacagcca gacactacag
> >      5761 taagtctgtt ataaactatg aaagtatgta gttgaataaa aaaatctgtc aaaaaaataa
> >      5821 atatgtcaaa aaaataaatc agctacaatt gttttagtct attgtaacca tattctaaaa
> >      5881 aatctgtcaa ccaaaaaagt ctgccaaaag aaatatttac ggacaaatag atctgtagta
> >      5941 tataaattag atattaaact tttgtaaaat tattgctaga attaatttga aaggacattt
> >      6001 ttctatttat ttttttacac atgagacaag tgtattttag acatgaaaat aatttaaccc
> >      6061 aaattgtttt taaaataatc tagtaattaa actgataatt tatgttaatc tagtaacttt
> >      6121 cccattatta taatgataaa ttgttgagaa aatttacttt attttagtgt tactttgttc
> >      6181 tagagtaaaa tcgaagaata cttagggagg gacatagggt cagtgccggc ttaacatgga
> >      6241 gaggagggag tgcaacaacg agtggcccat aaagcaaaag ggacacatac aattctttcc
> >      6301 ttttgtaaac ataaaaagaa aaagaaaaaa atatgtgcaa atataacaga aaattgcaaa
> >      6361 tttttactaa aaagctcata tatatttgca ttccttttaa atataaggta aaaaaaaatt
> >      6421 ctaaatttat tgggcccaaa ttatatgttg atatatttta atccaacaac tctaatacaa
> >      6481 atattgattt attgggccaa atctcttttg taaaatgaga ttgttgtttt attttatttt
> >      6541 aaatgatgtt tactaagtgt tttacttttg tacgaaaaca atttccaata tattaagggt
> >      6601 ccaatttttt ttttgcccta ggacctagaa taagcttgag ccggcactgc agagggtcct
> >      6661 aggatttttg tattttttac tttgtaagtg aatgccatag caagccaagg actgaagaag
> >      6721 aacaacggaa aatatagaga atatgaattt gtgtgatgcg aagacgaaaa gaaattatag
> >      6781 aaaatcaact gaaattgatg acgtcgaatt tgattgagga gaaaaaatag ctacaaaata
> >      6841 agaggaagaa aaaggaaaat aaagaaaaga acgaagagaa tgagaaggag aagccgaaaa
> >      6901 gggtctagag aagaggggtg ttttttcatg aagacaagat actacgaaag aaggaagatg
> >      6961 aagggctagt tataagtaga tgtagatctc tgaatatata tatatatata tatatataaa
> >      7021 gtgtctccct aaaccggaaa aaacgatggg ttttgatatg ctgaacgggg gacaaaaccc
> >      7081 gagaaccgaa atcaatttaa aagttcaccg atcactatac aactctgcac ttattgaaac
> >      7141 cgaaaaccaa aattctccgt ttagacggtt ttggttcaat acggttctta ttgaacgtca
> >      7201 atcactctat ctactactat acttgaaaaa caaaattaaa ttttttgcct tgacttgatc
> >      7261 cagagagatg ttacgcggcg gcgatgcatt aatcgataaa atcaaacaac atgccacgca
> >      7321 ttgacttttg caagaagcca gcaactttat ataaaacaat cgatgataat tgtgatcgtt
> >      7381 acaactatgt aataatgcat atatataagc tatgtattaa ctgtaaactt gatttaaata
> >      7441 atactgatgg ggagtatata ccttgatgca tagaataaaa atgtaaaata cgtatgaatt
> >      7501 actagcttag tggtagcctt gttacttgga ttagtttagc aagcaaatat ctagctctaa
> >      7561 ttcacaagct aaccctaaaa gttggttgga catatacatc gtcaaatata tcgaagctat
> >      7621 atatatatat ttaacatatt tgtttttgtt acattaaaac aaacaaaaca aaagaaaaaa
> >      7681 gttttttttg tgctcctcct ttttaatatc ttttatgaaa atggctgact gttatggaaa
> >      7741 atgatgtttt tattaatcta agtagtgcct ttcaccaaag aaaaaaaaaa accaaatata
> >      7801 gtagacttta gtacggagat acgtgaacat aacttgttag attggttttc aacaaaagag
> >      7861 tatagaatgg aaaatcgtta catactgact ctcatgaaca aacatatttt gcttaatgaa
> >      7921 aagtcaactc gatctttatt atttatgagg ataaaaatca aataggctct gtcgacaaaa
> >      7981 cgaaacaatc agactaccct ttagaatata tttaaaatta ccaaaaatat taatctttta
> >      8041 ggaaattaca catgatgata ttttagcata tggcggatat ttacctccaa ctccaaatga
> >      8101 ggatatttta gcatatggcg gtctttattc gtacgtacta ttttgcaaat cattttatat
> >      8161 gattgtatat accatagatt ataatcacta ctcttagtcc cataatttgt aaataatagt
> >      8221 aattcgtaga catagatgat caaagttctt tgtaatctta ctacatacac aaaaagtaat
> >      8281 aaaacacaac atttttgttg tattaaatgt gcttgtccgt tgacaaaaaa gataaaagaa
> >      8341 tagctttgtt tgtcctttct cgacaagttt gtattataat cgtctcacgt tcatttcaag
> >      8401 attacatgca gaatttgtca aatgaagtct aaatcatcaa aagtaatttt tccttatttg
> >      8461 ttctcacttg aatataaatt aacgaaatat gattcaccat aaattacacg aactatagga
> >      8521 tacgggtact tatttattcc cttgcggtaa tacgaatagg tccaagattg gaaatctcag
> >      8581 agttgtcatg aatactaagt taaaatatgt gcagaccata caaaacatag acgagcatag
> >      8641 aaagagaagt tgatcaagtg atggaaaaag attaaaataa aatataaata tggatatacc
> >      8701 cttaagattc cggagactat gcttctccga tctcgtctat aaatttcagc cgtctcagcc
> >      8761 aacacaatca caaccaccac ctccctccct ctctctctct taatctttct gcccccacca
> >      8821 ttagcgcaca acggtgagat tcgttagatg tttattatct atgcatccaa attcacatat
> >      8881 agatagatag tacatatata gttgtttata tcaatagaaa cttttgtttc taaattgtaa
> >      8941 tcgcttaaat tatttgattt atgtataggt ttaattataa atgttattct gttggttgta
> >      9001 ttggtactaa actaaaataa tttagtagca tctttccaat attattataa aattatagtt
> >      9061 aagatattta tttttacttc atgaagcagg aaagacaaga gaggcttaga tggctgaaca
> >      9121 acaactagga gtgctaaagg cactcgatgt tgcgaagacg caactttatc atttcacggc
> >      9181 gattgtcatc gccggtatgg gtttctttac cgatgcgtac gatctttttt gcgtgtcctt
> >      9241 ggtaacaaag ctccttggcc gcatctacta cttcaatccg gagtcagcga agcctggctc
> >      9301 acttccccct catgttgcgg ccgctgtcaa tggtgtggcc ctttgtggaa ccctttctgg
> >      9361 tcaactcttc ttcggttggc tcggtgacaa actcggacgg aaaaaagtgt acggtcttac
> >      9421 tttgatcatg atgatcttat gctctgtcgc ttctggcctc tcttttggca acgaagccaa
> >      9481 gggtgtcatg accacccttt gcttcttcag gtacagtttt catccaatta caatattata
> >      9541 tacacacatg attaaccaat aatctaataa cgaatgcaag ttttaaaagt tagtcacgct
> >      9601 tcgaactgat ttaggtattt ttctttgaga aagttttttt atatataaac cattacgtag
> >      9661 gttttggttg ggatttggta ttggaggtga ctacccactt tctgccacca tcatgtctga
> >      9721 atacgcaaac aagaagaccc gtggggcttt catcgcagct gtcttcgcca tgcaaggtgt
> >      9781 cggtatcttg gctggaggtt tcgtggcact cgcagtatct tctatattcg acaaaaagtt
> >      9841 cccagctcca acatatgcag taaacagggc cctctcaacg cctcctcaag ttgattacat
> >      9901 ttggcgaatc atcgtcatgt ttggtgcttt acccgcagct ttgacttact actggcgtat
> >      9961 gaagatgcct gaaactgccc gttacaccgc tttggttgcc aagaacatca aacaagccac
> >     10021 agccgacatg tccaaggtct tacaaacaga tatcgagctt gaggaaaggg tggaggatga
> >     10081 cgtcaaagac cccagacaaa actatggctt gttctccaag gaattcctta gacgccatgg
> >     10141 acttcatctc cttggaacta cctccacttg gtttttgctt gacattgcct tctacagcca
> >     10201 aaacttgttc cagaaggata ttttctcggc catcggatgg atcccaaagg cagccaccat
> >     10261 gaacgccacc catgaggttt tcaggattgc tagggctcag actcttatcg ccctttgcag
> >     10321 tacagtccca ggctactggt tcacagttgc gtttattgat accattggaa ggtttaagat
> >     10381 ccaactaaat ggatttttca tgatgaccgt gtttatgttt gccattgcct tcccttacaa
> >     10441 ccactggatc aaaccagaaa accgtatcgg atttgtggtt atgtactctc ttactttctt
> >     10501 cttcgccaat tttggtccaa atgcaaccac ttttattgtc cctgctgaga tattcccggc
> >     10561 caggctaagg tcaacatgtc atggaatatc agccgcggct ggtaaggctg gagccatcat
> >     10621 tggagccttc gggttcttat atgcggctca aaatcaagac aaggctaagg tggatgcagg
> >     10681 atacccacca ggtatcggag ttaagaactc attgatcgtg cttggtgttc ttaacttcat
> >     10741 cggtatgctc ttcaccttcc ttgtcccaga gcccaagggc aagtccctcg aagaactctc
> >     10801 tggtgaagct gaagttagcc atgacgagaa ataatttact ttgtgatcaa atgtggagtt
> >     10861 gtttgtggtt tgtttgattt ttttgtttct ttccaaattt tttttttcct ttccttgatg
> >     10921 cagtaagaaa tgtttgattg tgtggaatac ttcagtctta ctgccgtgag tgcaaatctt
> >     10981 gtaatccact tttttgtgtt ctaattaaca ttgtcaaata aagtttctta aactttgatt
> >     11041 ctaactttcc aaacactaac gaactctctg ttgtgcgttt atgtttatca atttgaagca
> >     11101 tgaaaatttg attgttcata tgtttgctat atactgagaa acaccaaaga atagttttta
> >     11161 tggatttcga ttatctgatc aacacattag ttggttatgg tgttcaatgt tcagtgtctt
> >     11221 gtcatcacta acgttatctc tttgccttat atcgacacat tcaagtgggc gtagttgtag
> >     11281 accattagtc ttgtgctctg tttgtaaata acgctattgg ttagttagct ttctttttgg
> >     11341 ttggtgtata tattcttttt gggttgaagc tctgtgtagc gtttagaaat gttaaaagtt
> >     11401 cgatcccaaa atgtagcaag atttgaaatt tctttttgga ttgatgtaat gtatttaact
> >     11461 ggcttcttat gaatcacggt gcttattgtt tgattaatgt gaactgaatt gttctgtgct
> >     11521 tgagattcaa aaagctcaag aggatctaat actaaatagt ctcactaatg catatgtttg
> >     11581 atttgtttca aatattgatt gttctctctg acgcatttgt ttgattattg tgcactacct
> >     11641 gtttgggatg ggcattctat tggtctcact ataccaaatt accaactttg tgaaactcac
> >     11701 cataacttgt gtgatccata aaatcttaga ttaattactg ttaagtttca tgaaatttta
> >     11761 actcaaaaca atttggcaat aaatagattg acattgtatg tccaaaaacc aaactgagta
> >     11821 tttaacaaaa ccaactattt ccgaccaatg aatgcgtcta aaatgtttgt ttccaacaaa
> >     11881 tttctgatca ttgatttcgt cggaactaac acaaataata attcgcataa tcataataat
> >     11941 tctcctaatc agttcataaa cttgaatgac aagttaattt actcatttta ccatgaccat
> >     12001 taattcataa attgagtcac aaattaatat caagatttga ccatgtccct tatgcaagag
> >     12061 ggggttggaa agcgatgggt ctatagtgtt acaattaatt ctctagtatt catataacca
> >     12121 aaacacaatt ataatatttt cacattaaag ataaattaaa aacatccata taatgtttca
> >     12181 catttaataa tatacatatt ttaaatattt tgttaacatt taaaattatg atttttctat
> >     12241 catttttttt attttttaaa ctactttgaa tcatcatata atacataaca ctaattaaaa
> >     12301 tctaattttt tctggatata ttgtgattcg gatgtttgaa aacagccata tattactcac
> >     12361 ataatgaaat aaaattattt gtgttcaaag tctcacatca gctaaaactt tgtgtccaaa
> >     12421 atctcataaa taagaccaaa ttgaacccaa agaaatatga gtgagtagga atcctaactt
> >     12481 aacaaacatc caaatttaaa tatttgttca gtttaattcg gttctttagt caaaccgttt
> >     12541 aattatacca gatatcataa attctaaact ttttaaaaaa ttaaaatatt ataattgatc
> >     12601 tatttaaagt gtataaaaac ttcaaaatat tatcaacgtt taacttataa aaagtttaaa
> >     12661 atatcacggt atgggttata tggtttgacg agtttgaatc ttcattaata taaatttaaa
> >     12721 tatcattaac tcggtggtac aacatgggta agtcatcatt ttaatatttt gttaacacta
> >     12781 gcaatattat aatgatatgt caagttaaat tgattcaatt atgattatgt attaaaacaa
> >     12841 gatatataag tattagtttt ctaggtatct gcaatattag aaaccagata tccaaaagtc
> >     12901 acggatctgg attgttaaag cacagttctg aatacttttc aaattctagg atatccggat
> >     12961 ccattctacc aattacaatc ctgcagaagc cttgttggcc acacaatcga cacaagagtt
> >     13021 agctagccaa aaagtgcctt caacttaaga agatgcaaac cggaaagaaa aaaacatatc
> >     13081 tttggtccga aggaaatcat gttttattat caaatttaac caccaacaac tacaaaccaa
> >     13141 atttacaaca acaacaacca aaaataaaaa cacatacaca caagtgatag cagcagacaa
> >     13201 cacaaaacat aacaaaaaca gaagcttccc aaaattccga aattcaaaat cctctttaat
> >     13261 ttctcatcat atagtagtcg agttacgttg gcatggtttc taatagttat tatgcggagt
> >     13321 tttaaatctt catcagttgt attctttcga attttgcgaa ctcagaatcg actgcaaata
> >     13381 aacatgttta attagatcaa atggttagag attccgagct tgccttgtaa catgacatga
> >     13441 caacaatgat ttggctataa attacttgta agttactagt gagaaacata gcaaaccaat
> >     13501 tttcaagtat accacagtgt tagttttgac taacattagg tctaattcgg tgaagtcaat
> >     13561 actaatccaa agtttcatct atcaatgcga atagatttaa aattattgag gagcacaaac
> >     13621 agattcaaaa tgttaaatca ttcaaaaaca tttagtaata gagcaaatta ttatgcattt
> >     13681 accatatttc attttttgtg ctacagtata tccacaaatt aaaaagtgat gtgattaata
> >     13741 cataccttca ctttaggata agggaagcta gttttagcac cacctctgct tccgaaactg
> >     13801 aactttgatt tcttgtcatt agattttaag atctgaaaag agcaggtcaa atcctcagac
> >     13861 acactcatca tcgcacctgc attgtcaaat tccccacagt aattcggcgc agagaatatc
> >     13921 gttacgagct gcttattcgc aaagaactcg aatccatctt caacaacctg ttacaaaatc
> >     13981 gttgcttgtg tcaaaaggaa acacgttaaa aatgtttctt tgggcacaag gaacaataaa
> >     14041 caaacctggt gagccctaca aatgaggtca agatcgagtc ttttaagaaa tccagaaact
> >     14101 atatctgatc caaaagtgta agaaactccg cgatcgttag gcccccaacc tctaacatct
> >     14161 ttatcaggat cagaccacaa gagatcacag agtaaaccac gatcaggaat atccgttgga
> >     14221 cgacgaatat ccctaatctg cctcaagctt agcagctccg gggagagccc accatgcata
> >     14281 caaaaaatcc gctcatcgat gagtgcagcg acggggagac agttgaagca atcagtgaag
> >     14341 attcgccaaa tcttgacact gaatctacgt ttacactcgt catagaagcc gtaaatacga
> >     14401 ttgattgatg cactttcatg gtttcctctg agaaggaaga agttttcagg gaacttaatc
> >     14461 ttgtaagcaa gtaaaagaca aatcgtttcg aggctttgct tgccgcgatc gacataatct
> >     14521 ccaagaaaca agtagtttga attaggaggg tatccgccat gttcgaatag tctcaagaga
> >     14581 tccggatatt gtccatgaat gtcccctgtt tattatttgt ttaaccaaaa gattacatag
> >     14641 gagcacaatc tcagaaaaga gtcagagaga aacagagaga aacgtaccac atattttaac
> >     14701 aggagcttca agttccaaga gatttggttg tctcaagaag atatctctag agacgaaaca
> >     14761 gagctgtttg atctctgttt cagacaactg aacaatcttt cctggttttt ctctagcttc
> >     14821 aagcaaccta ttgatcaccg agtttaaagt accaggatcc atttataaaa cctcaagaaa
> >     14881 cagagcaaca atgagacaaa acaacaacaa cgcctagaag ctttataata tacaaaagct
> >     14941 attgacaata tggaataaat ttaaaaagta tggagctttt atgttctttt atgcttgctc
> >     15001 tttgagaaaa agaatggtga ttgtcaaatt caagaagggc tttataaagc cgcgtttaag
> >     15061 aatatattct tcataccttt ttagcaaaag tcacatgtgt tcatacctac ctacattgtt
> >     15121 ctaacttcaa ctaaatcata taataatcag attctagcta atttagtaac acattgtttg
> >     15181 tgtgtgtgtt aagcttcaca ggattgaatc ctgtccggta cgtcatgatt caaatttaag
> >     15241 atctttccgc cggtactttg agtgcgaaaa acgctctttc ttcttcttct agtatcgatg
> >     15301 tttttggatt cgtccgcgtc atattaagca acttggctat ttacaaaaaa catattcttc
> >     15361 ttcctttgta gccaaaagcc atgggtaaaa acctacccta attgttctaa cttcaactac
> >     15421 atctaataac caaaaatgtt agctatattt tctcttttat ggtcaaaaca tattagctag
> >     15481 tttagacact ttctttgtgt tgagcttttt aggcttgaat ctaatcgaat tc
> >     15541
> > //
> >
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
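Hilmar's answer in the quoted thread says a split location on the minus strand should be written as join(complement(...),complement(...),...). A small pure-Perl sketch can illustrate this by rendering one such location both ways. The helpers `complement_join` and `join_complement` are hypothetical string formatters, not BioPerl's Bio::Location classes, and the exon coordinates are arbitrary.

The sketch assumes the feature-table convention that complement(join(A,B)) and join(complement(B),complement(A)) describe the same location. Under that convention the complemented parts are listed in reverse coordinate order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical formatters (not part of BioPerl). Each exon is an
# array ref [start, end] with start <= end; exons are passed in
# ascending coordinate order and all lie on the minus strand.

# Complement applied once, around the whole join.
sub complement_join {
    my @exons = @_;
    my @parts = map { "$_->[0]..$_->[1]" } @exons;
    return 'complement(join(' . join(',', @parts) . '))';
}

# Complement applied to each part; the parts are listed in reverse
# coordinate order so they follow the minus strand 5' to 3'.
sub join_complement {
    my @exons = @_;
    my @parts = map { "complement($_->[0]..$_->[1])" } reverse @exons;
    return 'join(' . join(',', @parts) . ')';
}

# Arbitrary example coordinates.
my @exons = ([100, 200], [300, 400]);
print complement_join(@exons), "\n";  # complement(join(100..200,300..400))
print join_complement(@exons), "\n";  # join(complement(300..400),complement(100..200))
```

Both strings describe the same two minus-strand segments. The second is the form Hilmar describes for the main-trunk writer, while Gert's report is about the complement being dropped altogether, leaving a plain join().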