[Bioperl-l] module for unflattening GenBank/EMBL/DDBJ records
Peili Zhang
peili at morgan.harvard.edu
Thu Jul 17 12:35:25 EDT 2003
Hi Chris,
I tried to use your Unflattener, but I don't understand the results I got
back. Can you take a look and let me know if I'm using the Unflattener
correctly?
I have my test script (testUnflattener.pl) and one of the ARGS GB files
(AnnIX.v003) attached. Below is the output from running testUnflattener.pl
('unknown' marks features without the /symbol tag):
source: unknown
exon: unknown
exon: unknown
exon: unknown
exon: unknown
exon: unknown
exon: unknown
mRNA: AnnIX-RA
mRNA: AnnIX-RB
intron: unknown
CDS: AnnIX-P2
CDS: AnnIX-P1
intron: unknown
intron: unknown
intron: unknown
intron: unknown
Then I added the /gene tag to all the mRNA/CDS/exon/source features in
AnnIX.v003 and changed the 'source' feature to a 'gene' feature. The output
changed to:
gene: unknown
mRNA: AnnIX-RA
CDS: AnnIX-P1
mRNA: AnnIX-RB
CDS: AnnIX-P2
exon: unknown
exon: unknown
exon: unknown
exon: unknown
exon: unknown
exon: unknown
intron: unknown
intron: unknown
intron: unknown
intron: unknown
intron: unknown
I'm not worried about the introns; they're not going into chado. But I'm
concerned that the exons are not put into the hierarchy while the CDSs are. This
comes back to our discussion of the chado feature graph/object model. I
understand I can infer exons from the join locations of the mRNAs, but I have to
include the exons in the tree if they're explicitly listed in the GB file, since
the tags on the exons are important annotation information to be loaded into
chado. Is it hard for you to make such a change to your code? Furthermore,
according to our chado implementation, I'll need to change the CDSs to 'protein'
features.
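The point above about inferring exons from the mRNA join locations can be
sketched in plain Perl. This is only an illustration, not how the Unflattener
works internally: the helper segments_of and the hash names are made up for
this example, it avoids BioPerl so that it runs on its own, and it only handles
simple forward-strand join(...) strings like the two mRNA locations in
AnnIX.v003.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# mRNA locations copied from the attached AnnIX.v003 record.
my %mrna_location = (
    'AnnIX-RA' => 'join(4191..4247,6408..6461,7031..7739,7800..7981,9240..9499)',
    'AnnIX-RB' => 'join(4191..4247,6408..6461,7031..7739,7800..7981,9686..9879)',
);

# Hypothetical helper: split a simple join(...) string into [start, end] pairs.
# Fuzzy bounds (<, >), complement() and remote locations are not handled.
sub segments_of {
    my ($location) = @_;
    my @segments;
    while ($location =~ /(\d+)\.\.(\d+)/g) {
        push @segments, [$1, $2];
    }
    return @segments;
}

# Return the start coordinate of a "start..end" range string.
sub start_of {
    my ($range) = @_;
    my ($start) = $range =~ /^(\d+)/;
    return $start;
}

# Map each distinct segment to the transcripts that use it.
my %used_by;
for my $symbol (sort keys %mrna_location) {
    for my $seg (segments_of($mrna_location{$symbol})) {
        push @{ $used_by{"$seg->[0]..$seg->[1]"} }, $symbol;
    }
}

# Each distinct segment is one inferred exon; order them by start coordinate.
my @inferred_exons = sort { start_of($a) <=> start_of($b) } keys %used_by;

for my $exon (@inferred_exons) {
    print "exon $exon: ", join(', ', @{ $used_by{$exon} }), "\n";
}
```

For the two AnnIX mRNAs this yields six segments, the same six ranges as the
explicit exon features in the record. The first four are shared by both
transcripts and the last two are each used by only one. What the inference
cannot recover is the qualifiers on the explicit exon features (/number,
/primary and so on), which is why they still need to be in the tree.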
Let me know what you think. Thanks.
Peili
>Date: Tue, 15 Jul 2003 11:49:40 -0700 (PDT)
>From: Chris Mungall <cjm at fruitfly.org>
>X-X-Sender: <cjm at heartbroken.lbl.gov>
>To: Peili Zhang <peili at morgan.harvard.edu>
>Cc: <birney at ebi.ac.uk>, <bioperl-l at bioperl.org>, <emmert at morgan.harvard.edu>
>Subject: Re: [Bioperl-l] module for unflattening GenBank/EMBL/DDBJ records
>
>Yes, it is committed
>
>Bio::SeqFeature::Tools::Unflattener
>
>cheers
>Chris
>
-------------- next part --------------
LOCUS DMSOS 14000 bp DNA INV 21-Aug-2001
DEFINITION D.melanogaster FlyBase-curated sequence: AnnIX.v003
ACCESSION AnnIX.v003
SOURCE fruit fly.
ORGANISM Drosophila melanogaster
Eukaryotae; mitochondrial eukaryotes; Metazoa; Arthropoda;
Tracheata; Insecta; Pterygota; Diptera; Brachycera; Muscomorpha;
Ephydroidea; Drosophilidae; Drosophila.
REFERENCE 1
AUTHORS FBrf0104946 == FlyBase, 1996-, other
COMMENT Reference sequence of AnnIX == FBgn0000083
COMMENT This record is derived from the following:
AC009344 AC009344.8 17-FEB-2001
AY007377 AY007377.1 14-SEP-2000
AF261718 AF261718.1 28-AUG-2000
M34068 M34068.1 26-APR-1993
AA390914 AA390914.1 23-APR-2001
AW942105 AW942105.1 23-APR-2001
COMMENT The following contributed to reference sequence development:
bases 1..14000 == AC009344 14575..28574
COMMENT Reference sequence based on BDGP genomic sequence.
FEATURES Location/Qualifiers
gene 1..14000
/gene="AnnIX"
/organism="Drosophila melanogaster"
exon 4191..4247
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|1"
/number="1"
/primary="AF261718:1..57"
/primary="AY007377:1..57"
mRNA join(4191..4247,6408..6461,7031..7739,7800..7981,
9240..9499)
/gene="AnnIX"
/comment="mRNA structure inferred from FlyBase alignment of
cDNA to reference sequence"
/evidence="experimental"
/label="AnnIX-RA|mRNA"
/primary="AY007377:1..1263"
/primary="M34068:<1..1095"
/symbol="AnnIX-RA"
mRNA join(4191..4247,6408..6461,7031..7739,7800..7981,
9686..9879)
/gene="AnnIX"
/comment="mRNA structure inferred from FlyBase alignment of
EST and cDNA to reference sequence"
/evidence="experimental"
/label="AnnIX-RB|mRNA"
/primary="AF261718:1..1197"
/primary="AW942105:complement(>512..1)"
/symbol="AnnIX-RB"
intron 4248..6407
/label="AnnIX|intron|1-2"
exon 6408..6461
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|2"
/number="2"
/primary="AF261718:58..111"
/primary="AY007377:58..111"
CDS join(6432..6461,7031..7739,7800..7981,9686..9739)
/gene="AnnIX"
/aa_size="324"
/derived_from="AnnIX-RB"
/evidence="predicted"
/label="AnnIX-P2|CDS"
/symbol="AnnIX-P2"
/translation="MSSAEYYPFKCTPTVYPADPFDPVEDAAILRKAMKGFGTDEKAII
EILARRGIVQRLEIAEAFKTSYGKDLISDLKSELGGKFEDVILALMTPLPQFYAQELHD
AISGLGTDEEAIIEILCTLSNYGIKTIAQFYEQSFGKSLESDLKGDTSGHFKRLCVSLV
QGNRDENQGVDEAAAIADAQALHDAGEGQWGTDESTFNSILITRSYQQLRQIFLEYENL
SGNDIEKAIKREFSGSVEKGFLAIVKCCKSKIDYFSERLHDSMAGMGTKDKTLIRIIVS
RSEIDLGDIKEAFQNKYGKSLESWIKDDLSGDYSYVLQCLASY"
CDS join(6432..6461,7031..7739,7800..7981,9240..9293)
/gene="AnnIX"
/aa_size="324"
/derived_from="AnnIX-RA"
/evidence="predicted"
/label="AnnIX-P1|CDS"
/symbol="AnnIX-P1"
/translation="MSSAEYYPFKCTPTVYPADPFDPVEDAAILRKAMKGFGTDEKAII
EILARRGIVQRLEIAEAFKTSYGKDLISDLKSELGGKFEDVILALMTPLPQFYAQELHD
AISGLGTDEEAIIEILCTLSNYGIKTIAQFYEQSFGKSLESDLKGDTSGHFKRLCVSLV
QGNRDENQGVDEAAAIADAQALHDAGEGQWGTDESTFNSILITRSYQQLRQIFLEYENL
SGNDIEKAIKREFSGSVEKGFLAIVKCCKSKIDYFSERLHDSMAGMGTKDKTLIRIIVS
RSEIDLGDIKEAFQNKYGKSLESWIKEDAETDIGYVLVTLTAW"
intron 6462..7030
/label="AnnIX|intron|2-3"
exon 7031..7739
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|3"
/number="3"
/primary="AW942105:complement(512..376)"
/primary="AF261718:112..820"
/primary="AY007377:112..820"
/primary="M34068:1..655"
intron 7740..7799
/label="AnnIX|intron|3-4"
exon 7800..7981
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|4"
/number="4"
/primary="M34068:656..837"
/primary="AF261718:821..1002"
/primary="AW942105:complement(375..193)"
/primary="AY007377:821..1002"
intron 7982..9239
/label="AnnIX|intron|4-5"
intron 7982..9685
/label="AnnIX|intron|4-6"
exon 9240..9499
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|5"
/number="5"
/primary="AY007377:1003..1263"
/primary="M34068:838..1095"
exon 9686..9879
/comment="exon boundaries inferred from FlyBase alignment
of cDNA to reference sequence"
/evidence="experimental"
/gene="AnnIX"
/label="AnnIX|exon|6"
/number="6"
/primary="AF261718:1003..1197"
/primary="AW942105:complement(>193..1)"
BASE COUNT 3803 a 3178 c 2983 g 4036 t
ORIGIN
1 accgttagaa atgttatgcg ggatacatag ttaagttgca taccctttga gttacaatca
61 ctagttaata atatctacgt tattaccaac acgcacactt tatcgtaata cctccttgaa
121 gtttaattta tacatcaact ttatcagtca aaactttgat ttcgtctgac acttttttcg
181 attacgatcc gtcgccaata attgcgataa atcttatcaa gtctttttgg gattggcgct
241 caaatttaca atatggccgt acatcctact tatgtatgtt ttttaactaa ttaatcacca
301 caatgcaaag tactctttct ttgttgagcc catatgcact cacatttgca ccatgaatca
361 tgtcagtagc tcgtttcatg taacaatttc tactttgcca gattacgatg cgttcggaac
421 aggcagataa gaattcggcc catccaagaa aggccttgac agttctaccc caaaatagag
481 atatcctcgt gatattagaa ggaacccaac aatatgctcg ttcttatctt cttatagaaa
541 tttgtgaatt cccgtatcca atgaaatcat tttacttagt aaaatgattt gttaggcctt
601 aaaaaaaaac aaaaacaccc gaactatcag taccacaatt taagagagaa ctcgttatta
661 tttaacttta ttaattatgt atttctttat caaaagagca gactttttgt ttgtgactgt
721 cttcaacatt agatccgtct ttaacattag atcagatcac ctgacacggg aaactctcgt
781 agactttata caattcaaaa aaaccaaaat cgttacttga cacaaatatc ataactaatg
841 cataaaatat gaaatgagag atatctaaaa tagcttggca tattttcttg gtaaaataaa
901 tgtgttaaat acaaagaatg taaaatgcaa taaatgatac atatatcaaa aatggaatac
961 cacggttact ttaagtgcta gcataacaaa ttacaataat aattcaatat ctagccattc
1021 gttgcacata atttggaggg ttaagagggt aaacaatgga tgggaaatgg gctgggttag
1081 ttgcgaattg gtttatatgg tttataatcc aacatcagta ttatcgtttt cgttagtatg
1141 taaatacaat ttgattttgt tctatcgtga atttcaattg gaagctttta tgagttctgt
1201 cccaccacct gcacgtagtt ggctggaaag agtccgtagc ggttcttgca caatcccctc
1261 caccacccat cgtcgatctt ctcgatgtgg gtgatcacgt catctggatc aaaggatatc
1321 tcgtcgtcat ccgccgcctg gtagtcgtac agggcaatgg cgtggattcc ggtgtcctcc
1381 agatagtcgg ccaaattgtc tgagttcgcg tagattgcct cctcgggaac ggttgcggtg
1441 ccactcggag caacagcctc tgaaacggtt ccattcgttg gtggcaaggg cgaggcagcc
1501 ttaatctcgg cctggttctg gtacagtggc tcttcttcaa ctactggctc aggatcggct
1561 tgaggttgcg gctcaggttg gacttggact tgggcgtgga cttcgggttg aggcgtcggc
1621 acatacactg gtgattgcgg ctcacttctg ggaggcgtgt ccacggtctc aacctcaatt
1681 tggggcacca catctggtgg tggagctgca gccttggcca ctggagctgt ttccggctct
1741 cgagcgggga ccactgtcgg agtaggagca accgcagctg atgttgtact tgccgttggc
1801 tcctcctttg cttctaattc tattttaaca ggctgcgcct tgggaatgat aatgggttcc
1861 tttcgtgctg gaggggtttc agaaacaggc gactgcatct ggttaaatgc actgatagca
1921 ttgccaatgc caccggtccg acctgtctgg atggccgccc tgctgccctt tggaggcggt
1981 gcttccgtcg aggtacgagg agtgttctcg gccacagtct tcttggcggc ttcctcccga
2041 tcccgcttgt ccttggcctc acggagacgt ttctgttcct cggcccgctt acgtgactcc
2101 tcctcggagt ttttagccag attctcgaac ttggcacgca aattggaagg cttggcacct
2161 tcgatgactg gcttgacctt gtggtccacc tggctggcgt gcttctgcgg agcctctttg
2221 tggtcccatc ccaccgcgga cttgtccttt cggtcctcct gaacgccaaa cttgccgcca
2281 aagcccttgg agtaatctta tgttgggaga agatataaat aaaataatca gacatgaaca
2341 tactaagaaa aaatcaatga gaaattcatt tccgttaggt aacttaccct tctgagactc
2401 gtgcttctcc accttctcga tgtgatccca gcccacggcg gatttatcca cccgatcgga
2461 ctgcactcca aatttgcctc cgaagccggt ggcgtaatcc ttctgggagg cgtgcttctc
2521 gaccttttcg acgtggtccc agcccacggc agacttgtcc ttgcgatcct cttgcactcc
2581 gaactttccc ccaaaaccat cgctgtagtc cttctgcgag gcgtgctttc ccaccttgcc
2641 ctggtagtca tgacccacag ccgacttgtc catgcggtcc ttctcaacgc caaacttgcc
2701 cccgtatccg tatcctgcgt tctgatcctt cagcagctgc ttcttcttgt ccagatcggc
2761 ctgctccgtc tcctcgcgca gcttgtccat gctaaaaaga ataaagaaag ggaacaatga
2821 acccgatacc cagtgtttcg agaatttcgg atgactcact cgatggtgcc tgctgtgcga
2881 ccgctgccgt cgatcgtctt tgatccccag cgctgctcct gctcactgac atcgttcacg
2941 aagtccggat ccgtctccca gtcgtcgtcc tccgcggagg ctgcactggt ggcctgaatc
3001 tggtgaccgg cacttgcctt ccacattctt tttaccggtt tctggtcgct ggttactcgc
3061 ttatccttat ctgctgggta gatagataaa caccatgacg agtgtgagta gtcggggcag
3121 gttttcagtt gcaggggctc cgcttccgat tcgtggccca gataacaagt cacaaaacac
3181 cgaagaaggg aggggccgac aaagccggat cgggaacaag atgccactgg cgctctatgt
3241 tcatatgtat actcacatcc cgatgactaa ctcctactgc aagaggaact tatgatctga
3301 tgccgctagg gatgtctgca aatttctggg aggagaaggg ttaaacaatt gattcataaa
3361 gagaccaaca aacgggaaac taactgaact aaatttcatt cgatacaatg gatgcatcga
3421 tatgctacga aaagccgata aattattgtg atgagcttga caaaaggtag ttgctgcagt
3481 tttaggggcg cccaatttaa atgattaaga gatacagtcg tggttctttt actttattcg
3541 ataagtcaat gcaccgatat atactgcagt accaatatat gtgatcgtga tcaatagaac
3601 tgagtgctcg ggcaaaaggt atttgatgtt ataccaagtg gcgccatagt aatttgtata
3661 attaataaga ttctgccaga acggtaatta acaggtataa gagttgactt ttttcatata
3721 accgcaaaca tatcatcgat atatcgatag ttttggattt gaatttgctt cgatgtcggc
3781 tgacaattaa cgtaattggt ttttatatat agttataaaa atacaaataa aaatagttat
3841 aaaacggctt aggaaggaac aggatattta tttaactgta ttgccctaac ggctgaatat
3901 tgaagtcaaa ggttacattt tgaaattcaa tgcgtaatca gatcttattt tcccaatgtc
3961 gtcattatca caacggaaat atcatgaatg tctcaacagc ttccaatgcc gatgattcat
4021 ccatctcgga acactttaga ttctacggtg tatactatga cgacgcagaa gataccagct
4081 tccagttagg cgctaccacc gaatatttcc tagcaacaac aacctatcgc cggttggttc
4141 atctctaatc gacttccagc gagagagcgc gagtataaaa gccaaagacg gagcgcaaac
4201 agctatggta tttattcaca tcgcttgcag ttcgcatcgt ctgacttgta cgttgaaaat
4261 cgaatatcat tggtaaatca aaaggaattt gtagataaaa ctcattgttt actctcgctg
4321 tgccagaaat cgcaaaaaat caagcggaaa atcagtcgga aaagtgagaa aagaaagtgt
4381 gtgctaaaca ctcaaagata attttcattt gtgccattcc aaaatgggca ttttgtagaa
4441 ttctagttgt gggtgcggta tcgcgagaat tctccagaat cttgtttctc tatttacttt
4501 atatttcaat aataaaagcc tcgggtggga gaaaaattca aaaccgaaag atgacatcat
4561 caaactcaaa agcaaaagcc ccactctttt ggcgttttct ctcgcgacgc gtgtgacttt
4621 tctcatctcc gacgaaatct aagtaaactt ttctgttcct ttcttgcttt cccattttac
4681 gtaggaatga gaagcggaat ttttcgaaat atctctatcc cccagtgccc tgcttaattt
4741 agtaattcaa tcaacttcaa ttgcccgaca ggcacttcaa actacagtaa gcactgtgta
4801 tgagcgcctg tgccttgctc agcattactc accggaatcg gatgtggcca cattccaccg
4861 tccacaaaat atagtagtag tgtactcgag gccgtgcaat tatagtagaa taggtgtgtc
4921 atcaaggttt attactaagt gaattgctca gtgcatttgc aaaacatttt tgaaacattt
4981 cgaggtacag tccatcgagg tctccaaaag aagttggcaa aagcttttgt ttatggagtt
5041 tttcacacat ggtcacgagc tattccagtt tttacaccta tattataata tctctttttt
5101 aaagtcgtgc tcagagctta aaattgacgt cacaaatctg gtgaatcact gagtgctgta
5161 atttcaatgt atcatcagtc tatttcaaga acaacattac cagatacgct atgctaaagc
5221 atatctatca atcttgtttt ttccagaccc caaccacttt ttataatgaa acaaaagaat
5281 tgtaattaag ggtgttttca aacacaaatg tgttttctac attgaggtca aactattacg
5341 gagaactcca atcaatattt agagagtcta accaattatt attgattaca ttattctcaa
5401 cttgcattat aatttattta aaaagaaatc atatattaac taaagtagtc tataaacact
5461 agaatcttga gtttaattgt acgccatacg taaaagaacc ggtttcaacg ttggccaaaa
5521 cattttgttt gtttatagct cgactgcaga ggcgtttgta gttgtatctt tttgttaact
5581 tttttgtttt tttggatggc gtcgccacgt caagtctttt ataatgcttt tttcggctgc
5641 cttttgttga ctttctactt tggcggcagc atagaaaggc aaacatgtat tttcagcact
5701 gtcgcactgt gattactaat ttggggaagc tatcaaaaat ctcaatccat ttatcattag
5761 gtaatttggc cctattgatt ttagggatta tgtggactag tccctcgaga tagtgttcca
5821 cttgacaact gaagtggagt ggtagatcta gaacgcaaac gggtgacgag tcagacattt
5881 ccctgaagct ttctcttggc cacaaccaag ccgtgcccga tgttttccaa gccagaacgc
5941 aatgagctca tccgtttatg aggccacttc gtgtgggttt ggtctggcct atcacgaaac
6001 tgccgcgcca tgccatttgg ctatcagacg cacccggttc caaggttcgt tcctccgcaa
6061 actgcgtact gaaaagtgga aactttcact tttccccggc agtacgttga acttcgattt
6121 gggcaccggt cggcaaggaa aactaaaaaa aaaaacatct cggtaaataa actaggaaaa
6181 aaaatcaatg ggtaaaaatg ctctcagacc agcgccaggc tggctacggg gcgtatgcgt
6241 aatgtgaggc ttttacgagt tgatgatgtc accaggcagg aattaaccga aggccgggct
6301 tattgggtct gggaaacata acgtatccaa cattgctggg ggttcagttt tcatcacgat
6361 ttcggcgtag ccattattca ttatgatttc tcacttccta cttacagatt tttcacaaca
6421 aaccaatcaa aatgagttcc gctgagtact acccattcaa ggtgagttta caatggattg
6481 tactttatga gtctggtata aatcaattat ttcatgcgat tagcgccgtt aatgtaaaaa
6541 atcagatcaa attagattca acagatagat gagaatatct taatattatt ttttctaaaa
6601 cgtgtgttgt tgtcaagtaa gcaagatttt tctcgatgca atctataaat tattacaacg
6661 accagatgct acgaaattat ctataattgg gctattaaat tatcatcaga ggtgtatact
6721 agacaatcgt tgacaaacaa gtaccgattg gtgggagaag agagtgataa gagggtttca
6781 gttatgagtt ccttagataa gagtcacaac gaaaaaaaaa gtcaatttag aaagctaact
6841 ttattgcaga ggaattcatg caatactgag ataatacatg tagggaataa ccacatatgt
6901 atttgtattt agaacaagtg ccactgagta gtgttgagtc atttctttgg gattacgtgc
6961 cctgcattaa aatacaccca attcttcttt gttgttactc atttgctgat tgcctatatt
7021 cgatttgcag tgcacaccca ctgtctaccc ggcggatccc ttcgatcccg tcgaggatgc
7081 ggctattctg cgcaaggcga tgaaaggctt cggcaccgac gagaaggcca tcatcgagat
7141 cctggccagg cgtggcatcg tccagcgttt ggagatcgct gaggcgttca agacctcgta
7201 cggcaaggac ctgatctcgg acctcaagtc cgaactgggc ggcaaattcg aggatgttat
7261 cctggctctg atgacgccgc tgccccagtt ctatgcccag gagctgcacg acgccatctc
7321 gggactggga accgacgagg aggccatcat cgagatcctc tgcacgctgt ccaactacgg
7381 catcaagacc attgcccagt tctacgagca gagcttcggc aagtccctag agtccgacct
7441 gaagggcgac accagtggcc acttcaagcg gctgtgtgtc tcgctcgtcc agggcaaccg
7501 ggatgagaac cagggcgtgg acgaggccgc ggccatcgcc gatgcccagg ctctgcacga
7561 cgccggcgag ggacagtggg gcacagatga gtccaccttc aactcgatcc tgatcacccg
7621 ctcctaccag cagctgcgcc agatcttcct cgaatacgag aatctgtcgg gcaacgacat
7681 cgagaaggcc atcaagcggg agtttagcgg ctccgtggag aagggtttcc tggccatcgg
7741 tacgttctta tagcatccta ttctttaggg tcccttctaa ctgatgcatt gctctgcagt
7801 caagtgctgc aagtccaaga tcgactactt ttcggagcgc ctgcacgact caatggccgg
7861 catgggcacc aaggacaaga cgctgatccg catcattgtc agccggtcgg agatcgatct
7921 gggtgacatc aaggaggcat tccagaacaa gtacggcaag agcttggagt cctggatcaa
7981 ggtaaatacc gatttcaatt acattcatat ctgcgtgtgc ttgccagaac tttcgattct
8041 gcaccctgtt caatgtgcca ctaactcgca ttcgattgca cctgcaacaa atcccattaa
8101 ttgtggctcc atcaaagttt aataatcgtt catccaagct ggcttctcct gttgttgtta
8161 ctgctccttt gcccaacact ttcttgccga tttctgaagc cattatccct tcccgcccga
8221 ttgcttcatt tgtgtgcata aaacattaaa acttggcata ttctatattt ttagggcgat
8281 acatccggcg attataagcg tgccctattg gctattgttg gcttctaaaa agaaccccat
8341 ccaacaataa tttatctctt tcgtctgttc cacgctctaa actatatgca aacagaatgt
8401 acaaacaaaa ttccgatatc aaatagttga caatgtatag tttttgaatt ggaacacgtt
8461 ttaacgaaga cgcagtgcat ttaagtcgta gaatcagaac cccagtctcg catcctgttg
8521 attattataa ccattgtgac ttttattatt atgactatgc acgccacatg cacataattg
8581 tatctctata attactacac ctcaggctac ttgcattgct gtgtaggtat actttcagtt
8641 ttgttttgag tctcatttgc aagatatttt aacttttaaa aaatacgaaa ataaaaaata
8701 cgaaaaaatg aaatacaaaa ttcaaatcga gtttctgtta cctttagcag aggtctctgc
8761 actgcttgtc atgtaaataa cagcgctaca ttgggtcgcc taacatcaaa acattaaaca
8821 ttaaaaaggg cgtggattaa accaacttaa aaatcgattt aaatggggct aaatgagtat
8881 attagccctc tttaattgtc tatataaact agatcagcaa gtgtataagg tatacaaact
8941 gttaaatata gttcaattta gatctaaata tacttgcact gcttgctaaa agtacatgtc
9001 aattacatgt aaatataatg tacatacaat ttcaagatgt aaaactttaa atgttatgtt
9061 aaatttgaaa gacattcatt tgctgatcag gtagatatat agttgattac cccttggagg
9121 agtagcttcc ggcaattaac caaaccataa gccatgtata caaagtaaaa ggcgtttaat
9181 gctctgaccc tctgctcttt tcacgctttt ctctacccgt ttaaacgaac caacaacagg
9241 aggatgccga gaccgatatt ggatacgtcc tggtcactct tacggcttgg tagacggaag
9301 cagccggaat atccgaatat ctatgagcaa taccccactg ttcaagtaga aaatgccaaa
9361 aacaaaaaaa cgttgcattt ccccaaaaaa aagtataaca aaagcgaaga acaaatggag
9421 ttggtctata tacagtagtt gtgatgtgtt ctaaaaatcc aatctacaaa acgcttagta
9481 ttttccctct gtgcaataac gtctaacgtt caacgattat ttaacatttt tacgtatttt
9541 tattttgtat acatgtcttt ttttattgta aattatggcg catcaaagtc gtatgcgtag
9601 tttgtgcttg tattaactaa taaagttggc ttacactcaa cggcaaagct gggtcacatt
9661 caccatccac tgatctcctt tccaggacga cctctccgga gactacagct acgtcttgca
9721 gtgcctggcc tcctactaag gatttcctcg ttggatcgat tgttaaccat tctatttgtt
9781 gtaactctta ctttaaggca agcatcgttt gccaactgtt ttgcggaaga ttcatagcct
9841 atgttcaatt cataaatgca ctgtaaaatc gcggtaaata attggaagat tttttcactt
9901 atctagggta accgaaacca agggggaatg ggtattgggg aatattcgtt aagggggaat
9961 gctgtttggg ttttctactt ggcaactcga tcctggtagg ctgcccacag gcggtcgcgg
10021 tacgtctgag ccgcctgctt cgggctgctg gccttcagaa tgcccctgcc caccactccg
10081 atgtcggcac cccgctcctt gaccacatgc tccggactct ggtactgctg gcccaattgg
10141 tccacgccct cgtctatctt cacaccgggc gtcagttgca gaagtccggg gaaggcaaag
10201 gcatcggagg attggcatac cacaccggca acgaaatcta catcggctcc ctcggttgcg
10261 atcttgttgc tgttctcctt gtacttggcg tcgatcaggt tgccgctggc agacatctcc
10321 gccagcagga agacgccgcg ttccttgccg gctcctcctt cgccaaggcc cgccttcaga
10381 ccctgcagaa tactacgtcc aggtaaggtg tgggccgtga ccagatctgc ccaactggaa
10441 atcttataga tgcccttgcc gtactgcagg gacaccgtgt tgccgatgtc tgcaaacttg
10501 cgatcctcca tcagcaggaa attgtgccgc tgggccagag cttgcaggtc agcgatgaat
10561 ttatcactga aatcctccac aatgtctacg tgcgtcttca gcaggcaaat gtacggaccg
10621 cacttgtcgg ccacgtccag gatctcatcg gcgtgtgtca ggtcggcggc caagcagagg
10681 ttcgtctgtt tgctggctat caggttgaag aggcgcttgg ccaccgcgct cttagccaga
10741 ttggcgcgat tctcgtaggt cagtttagtg cgctgcaagt cgttggctgc ggggaaagtg
10801 atggtgatta gttaacactt caattttgaa acgtgcacgt agacctatct tatcgcctct
10861 tcttatcgct atatggttgt tttgtaccta tgccaatttt cttcaatagc ttcacacttt
10921 ggcgatccca aagtacataa ctcttttcaa ctgggaatca attttggcta ttgtttatac
10981 agtatcttaa gagtatatac aaaagttact ccccaaaggc aaagacgatg attatgatga
11041 gattagaata ctgtgacttt acaggtggaa acaaaaagca aaccggttcc catacatctc
11101 atcaaagcgc cataaacgaa atgtgattat ttatagaaga tgggcttgat tgaaggagca
11161 ggctggctct cgagtgggct gataagcgtg gtggcataat tgagtgcgga ccggacactg
11221 ccctgaatca gaatcacgtg ccgcattccc gggaatgatc cggtccgata gtggaatcca
11281 gatcggccta ccgccgttta tggcggttgt ataactttat agtagcgcga ctggagcgct
11341 ccctaattga atttgttcga ggccatgtgc tggccgagtg tcatgtactt attatctatc
11401 tatgcatcta tctatctggc agatatcacg gggatctgga gcatacgcac ctctgacaac
11461 gtcgccctta tcgccgccga cgaaggtacc atcgctgttg atttgcacgg cggcgatgta
11521 cttggccacc gcctccactg tggacttctc gatgcgcccg gcctcgtgaa gtgtgttcag
11581 cagaaaggag agcgtgaaga gcgagtgcat ccgcacgccg tgcttggcga tgttggccac
11641 tccgccctgc tcacggtcta cgacgaccac ggcgtcggtg accacaatgc cctcgccttg
11701 cagatcccgc accgtgtcca ggatgctgga gccggaggtg accacgtcct cgacaatcag
11761 acaggtgtcg ccagcattga agatgccctc gaccagcttc ttggtgccgt acgccttggc
11821 ctccttgcgc cgcaccagca tgggagttcc ctgctgcacg gacacaatgg tggccagcgg
11881 gagcgccgtg tagggaacac cgcacacgtg tttggcgctc agctgcttgt ccttgatgtg
11941 ttccaccagc aggtcggata cggtttgctg cgggaagacc aagttagcta ctgccattga
12001 catggagcac taatcccaaa tactgaccat cacatccgga taactgacga tcactcggag
12061 gtcgaagtag accggcgaat ttatgcccac tttcatcttg aagtcgccga acttgaaggc
12121 attgatctcg aagagcttca gggccagggc ccgcattttg tcggagttct gggcaaccat
12181 gctggcaatt ctaaatctcg atcttaattc ttcacacacg tgctagctag gctccaataa
12241 gaaccgtcca attgggagtc tacgcttttt aaacatgctg ccagtgtgca cgtatctgct
12301 gtgacattgg ggcacatttc gaacacccta attaaggtac aagttctggt tgcgccgcct
12361 gggtggttaa cttcgctatg ccgccaaact tatcgaagtt caaattatta taaatgtcgt
12421 agattttatc aacattggct tcgaattaat aaacgtttat tattagttat agggtaacaa
12481 agtagcataa gtgttaaggt tttgaaataa actattttgc atgtgaaata tttcccaaat
12541 tcataaaata tataaccttt agtttctgag aagtcttaag aaatttcaag gaaatgaatg
12601 gatggattat atacaatatt ttgtccgctg cattgctgta ttgttcacta cttagcattt
12661 gtaatctgaa agctttggct ccgcccacaa acccttgact gcaacttcaa ggggaggagt
12721 ctagtcattt tgcaaccacc aatgacagtc ctgtaagctc atattgcaaa atgaaagcca
12781 aagcgcctgc gtaagtcaac aaagtttgcg ccattcattg aaacaattcc agatcctttg
12841 gcgcgtgttt ccaaaattta gtttcttttc gctggtctcc aaataagtcg caaatttgtg
12901 ctccaaaagc ggcaacttct tagtcgaaaa atcggttttc tctcaatcca tttctcgcct
12961 gcgttgcgat ggccagttca agtggtgaac ctgctgatga agtggctaat aagcgtcctc
13021 gtcttgtggc taatcccaag gccaccaaaa tagttgaacc cacaccggcc aaggtcacca
13081 atcgggtgcc caagtgcgcc cgctgccgga accatgggat catttcagag ctgcggggtc
13141 acaagaagct ctgcacctac aagaactgca agtgcgccaa gtgtgtcctg atctttgaga
13201 ggcagcggat catggccgct caggtaagtt aggatttata tgcacgatga caagcaatcc
13261 tttctgctta attggcaatc attgacatta ctatcctcat tttatgatta ctgcccactt
13321 gctaacttta atgtcatcta tcctggatgg taaaatcgct atcccaaaaa tagcttttta
13381 aaaattcggt gcattcgaat acagaaaatt gcctggttag atccatccat agacatccaa
13441 accatccaga ccagatatat tgctctaaca ttcggagact ttattcccag tcctttagaa
13501 aatttcttct tgtaaaaaca tattcccttt attagcattt acttaaaatg acatatcaaa
13561 tattctcaaa agccaaaagt tttctaaaat aacttcagga tattaatgta taaatgtata
13621 agcataaacg taattgtgtt tcatgttgta ttgttcgcaa tggattccgt gatcgatttt
13681 tactaggtat aactttgaaa cccaatttaa gcctttcgat tataatttaa cttgattaat
13741 gtcactgtta tatttataat ttactaacct gggacgacaa acaaaaacac ctattagcaa
13801 ggggagctta aattaacaat agcaccgaaa actccgacat tttcttatat cgtgttttgt
13861 ga
-------------- next part --------------
#! /usr/local/bin/perl
use strict;
use warnings;
use lib $ENV{CodeBase};
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# Read the first record from the attached GenBank file.
my $seqio = Bio::SeqIO->new(-format => 'genbank', -file => 'AnnIX.v003');
my $seq   = $seqio->next_seq();

# Build the feature hierarchy and collect the top-level features.
my $u       = Bio::SeqFeature::Tools::Unflattener->new;
my @top_sfs = $u->unflatten_seq($seq);

printfeatures(\@top_sfs, 0);

# Print each feature as "type: symbol", indented one tab per nesting level.
sub printfeatures {
    my ($features, $ind) = @_;
    foreach my $f (@$features) {
        # Features without a /symbol tag are reported as 'unknown'.
        my $symbol = 'unknown';
        if ($f->has_tag('symbol')) {
            ($symbol) = $f->each_tag_value('symbol');
        }
        print "\t" x $ind, ' ', $f->primary_tag, ": $symbol\n";

        # Recurse into subfeatures, if any. The original test, "defined @sf",
        # is rejected by current Perl; testing the array itself is equivalent.
        my @sf = $f->get_SeqFeatures();
        printfeatures(\@sf, $ind + 1) if @sf;
    }
}