[Bioperl-l] bp_genbank2gff3.pl error: "MSG: structure_type 2 is currently unknown"
Dave Clements
clements at nescent.org
Wed Oct 29 21:53:31 UTC 2008
Hello all,
I'm trying to translate the threespine stickleback genome from Ensembl in
GenBank format (
ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/) into GFF3
format using the bp_genbank2gff3.pl script. I get several data errors and
I've contacted Ensembl about some of them.
However, I also have a question about one of the errors. I get this error
many times while parsing the files:
---
# working on region:scaffold:BROADS1:scaffold_180:1:137802:1,
Gasterosteus aculeatus, 30-JUN-2008, Gasterosteus aculeatus scaffold
scaffold_180 BROADS1 full sequence 1..137802 reannotated via EnsEMBL
scaffold:BROADS1:scaffold_180:1:137802:1 Unflattening error:
Details:
------------- EXCEPTION -------------
MSG: structure_type 2 is currently unknown
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.8.8/Bio/SeqFeature/Tools/Unflattener.pm:1445
STACK (eval) /usr/local/bin/bp_genbank2gff3.pl:895
STACK main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:894
STACK toplevel /usr/local/bin/bp_genbank2gff3.pl:411
-------------------------------------
# Possible gene unflattening error
withscaffold:BROADS1:scaffold_180:1:137802:1: consult STDERR
---
The code snippet that generates this message is:
1432 # TYPE CONTAINMENT HIERARCHY (aka partonomy)
1433 # set the containment hierarchy if desired
1434 # see docs for structure_type() method
1435 if ($structure_type) {
1436 if ($structure_type == 1) {
1437 $self->partonomy(
1438 {CDS => 'gene',
1439 exon => 'CDS',
1440 intron => 'CDS',
1441 }
1442 );
1443 }
1444 else {
1445 $self->throw("structure_type $structure_type is currently unknown");
1446 }
1447 }
I get this error if I specify --noCDS or --CDS. I also get it if I parse
the EMBL format files instead. However, if I specify "--filter exon
--filter mRNA" (I have to specify both) the errors go away. According to
http://search.cpan.org/~birney/bioperl/Bio/SeqFeature/Tools/Unflattener.pm#structure_type
(and my copy of the PM), 0 and 1 are the only valid values for structure_type.
However, $structure_type gets set by this chunk of code:
1337 # Are there any mRNA features in the record?
1338 if ($n_mrnas == 0) {
1339 # NO mRNAs:
1340 # looks like structure_type == 1
* 1341 $structure_type = 1;
1342 $need_to_infer_mRNAs = 1;
1343 }
1344 elsif ($n_mrnas_attached_to_gene == 0) {
1345 # $n_mrnas > 0
1346 # $n_mrnas_attached_to_gene = 0
1347 #
1348 # The entries _do_ contain mRNA features,
1349 # but none of them are part of a group/gene, i.e. they
1350 # are 'floating'
1351
1352 # this is an annoying weird file that has some floating
1353 # mRNA features;
1354 # eg ftp.ncbi.nih.gov/genomes/Schizosaccharomyces_pombe/
1355
1356 if ($self->verbose) {
1357 my @floating_mrnas =
1358 grep {$_->primary_tag eq 'mRNA' &&
1359 !$_->has_tag($group_tag)} @flat_seq_features;
1360 printf STDERR "Unattached mRNAs:\n";
1361 foreach my $mrna (@floating_mrnas) {
1362 $self->_write_sf_detail($mrna);
1363 }
1364 printf STDERR "Don't know how to deal with these; filter at source?\n";
1365 }
1366
1367 foreach (@flat_seq_features) {
1368 if ($_->primary_tag eq 'mRNA') {
1369 # what should we do??
1370
1371 # I think for pombe we just have to filter
1372 # out bogus mRNAs prior to starting
1373 }
1374 }
1375
1376 # looks like structure_type == 2
* 1377 $structure_type = 2;
1378 $need_to_infer_mRNAs = 1;
1379 }
1380 else {
1381 }
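To make the branching above easier to follow, here is a minimal standalone sketch of the decision as I read it. It is not the code in Unflattener.pm: the helper name `infer_structure_type`, the plain-hash feature representation, and the rule that "attached to gene" means "carries the group tag" (inferred from the verbose block, which lists mRNAs lacking `$group_tag`) are all assumptions. The quoted `else` branch is empty, so returning 0 there is also an assumption.

```perl
use strict;
use warnings;

# Illustrative model only; not the code in Unflattener.pm.
# Each feature is a plain hash: { primary_tag => ..., tags => { ... } }.
sub infer_structure_type {
    my ($features, $group_tag) = @_;

    my @mrnas   = grep { $_->{primary_tag} eq 'mRNA' } @$features;
    my $n_mrnas = scalar @mrnas;

    # Assumption: "attached to gene" means the mRNA carries the group tag.
    my $n_mrnas_attached_to_gene =
        grep { exists $_->{tags}{$group_tag} } @mrnas;

    return 1 if $n_mrnas == 0;                   # no mRNAs at all
    return 2 if $n_mrnas_attached_to_gene == 0;  # only "floating" mRNAs
    return 0;                                    # assumed default otherwise
}

# Three hypothetical feature lists, one per branch.
my @grouped = (
    { primary_tag => 'gene', tags => { gene => 'g1' } },
    { primary_tag => 'mRNA', tags => { gene => 'g1' } },
    { primary_tag => 'CDS',  tags => { gene => 'g1' } },
);
my @floating = (
    { primary_tag => 'mRNA', tags => { note => 'transcript_id=t1' } },
    { primary_tag => 'CDS',  tags => { gene => 'g1' } },
);
my @no_mrna = grep { $_->{primary_tag} ne 'mRNA' } @grouped;

printf "grouped mRNAs:  %d\n", infer_structure_type(\@grouped,  'gene');
printf "floating mRNAs: %d\n", infer_structure_type(\@floating, 'gene');
printf "no mRNAs:       %d\n", infer_structure_type(\@no_mrna,  'gene');
```

Under these assumptions only the "floating" list yields 2, which is the value the throw at line 1445 rejects. If --filter mRNA removes the mRNA features before unflattening (an assumption about the script), the "no mRNAs" case would apply, which would be consistent with the errors going away.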
I've attached a file containing only scaffold_180 (cleaned up some), but it
may be too big to make it through the list's filters. If that happens the
files are at
ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/.
Scaffold_180 is in the "0" data file. I've also appended the relevant
parts of the file at the end.
Can someone explain what the comments mean by:
The entries _do_ contain mRNA features, but none of them are part of a
group/gene, i.e. they are 'floating'
this is an annoying weird file that has some floating mRNA features;
The mRNAs all appear to have gene names associated with them. What am I
missing?
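One way to check this independently of BioPerl is to scan the feature table as text and count how many mRNA features carry a /gene qualifier. The sketch below is only that: the helper name `count_mrnas_by_tag` is made up, 'gene' as the group tag is an assumption, and it relies on the standard GenBank layout (feature key starting in column 6, qualifiers in column 22), which the copy pasted into this message has lost to mail wrapping. The sample text is built from two features of the record in this message.

```perl
use strict;
use warnings;

# Count mRNA features in GenBank feature-table text, and how many of them
# carry a given qualifier (the "group tag"; 'gene' is assumed by the caller).
sub count_mrnas_by_tag {
    my ($text, $group_tag) = @_;
    my ($n_mrnas, $n_tagged, $in_mrna, $seen) = (0, 0, 0, 0);

    for my $line (split /\n/, $text) {
        if ($line =~ /^ {5}(\S+)\s+\S/) {
            # A new feature starts: key in column 6, then the location.
            $in_mrna = ($1 eq 'mRNA') ? 1 : 0;
            $seen    = 0;
            $n_mrnas++ if $in_mrna;
        }
        elsif ($in_mrna && !$seen
               && $line =~ m!^ {21}/(\w+)=! && $1 eq $group_tag) {
            # First occurrence of the group tag on the current mRNA.
            $n_tagged++;
            $seen = 1;
        }
    }
    return ($n_mrnas, $n_tagged);
}

# Build a small sample with the column layout spelled out explicitly.
my $pad    = ' ' x 21;
my $sample = join '',
    sprintf("     %-16s%s\n", 'gene', 'complement(16523..17577)'),
    "$pad/gene=ENSGACG00000001598\n",
    sprintf("     %-16s%s\n", 'mRNA', 'join(complement(16523..16551),'),
    "${pad}complement(16802..17577))\n",
    "$pad/gene=\"ENSGACG00000001598\"\n",
    "$pad/note=\"transcript_id=ENSGACT00000002091\"\n",
    sprintf("     %-16s%s\n", 'exon', 'complement(16523..16551)'),
    "$pad/note=\"exon_id=ENSGACE00000016940\"\n";

my ($n_mrnas, $n_tagged) = count_mrnas_by_tag($sample, 'gene');
print "mRNA features: $n_mrnas, with /gene: $n_tagged\n";
```

Running the same function over the feature table of a downloaded file would show whether any mRNA really lacks /gene in the text. It says nothing about what the parser hands to the Unflattener, which may differ.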
Any ideas?
Thanks,
Dave C
LOCUS scaffold_180 137802 bp DNA HTG 30-JUN-2008
DEFINITION Gasterosteus aculeatus scaffold scaffold_180 BROADS1 full sequence
1..137802 reannotated via EnsEMBL
ACCESSION scaffold:BROADS1:scaffold_180:1:137802:1
VERSION scaffold_180BROADS1
KEYWORDS .
SOURCE three-spined stickleback
ORGANISM Gasterosteus aculeatus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
Acanthomorpha; Acanthopterygii; Percomorpha; Gasterosteiformes;
Gasterosteidae; Gasterosteus.
COMMENT This sequence was annotated by the Ensembl system. Please visit the
Ensembl web site, http://www.ensembl.org/ for more information.
COMMENT All feature locations are relative to the first (5') base of the
sequence in this file. The sequence presented is always the
forward strand of the assembly. Features that lie outside of the
sequence contained in this file have clonal location coordinates in
the format: <clone accession>.<version>:<start>..<end>
COMMENT The /gene indicates a unique id for a gene,
/note="transcript_id=..." a unique id for a transcript, /protein_id
a unique id for a peptide and note="exon_id=..." a unique id for an
exon. These ids are maintained wherever possible between versions.
COMMENT All the exons and transcripts in Ensembl are confirmed by
similarity to either protein or cDNA sequences.
FEATURES Location/Qualifiers
source 1..137802
/organism="Gasterosteus aculeatus"
/db_xref="taxon:69293"
gene complement(1399..13644)
/gene=ENSGACG00000001596
/locus_tag="TOP1 (2 of 2)"
/note="DNA topoisomerase 1 (EC 5.99.1.2) (DNA topoisomerase I).
[Source:Uniprot/SWISSPROT;Acc:P11387]"
mRNA join(complement(2230..2423),complement(1399..1718),
complement(3787..3936),complement(4166..4260),
complement(4370..4497),complement(5297..5451),
complement(5953..5999),complement(6147..6212),
complement(6228..6374),complement(6548..6582),
complement(6594..6760),complement(7035..7222),
complement(7299..7421),complement(7497..7662),
complement(7676..7735),complement(8704..8786),
complement(8863..8950),complement(9267..9302),
complement(9718..9792),complement(9899..9954),
complement(10037..10139),complement(10661..10751),
complement(13289..13313),complement(13612..13644))
/gene="ENSGACG00000001596"
/note="transcript_id=ENSGACT00000002089"
CDS join(
complement(2321..2423),
complement(3787..3936),complement(4166..4260),
complement(4370..4497),complement(5297..5451),
complement(5953..5999),complement(6147..6212),
complement(6228..6374),complement(6548..6582),
complement(6594..6760),complement(7035..7222),
complement(7299..7421),complement(7497..7662),
complement(7676..7735),complement(8704..8786),
complement(8863..8950),complement(9267..9302),
complement(9718..9792),complement(9899..9954),
complement(10037..10139),complement(10661..10751),
complement(13289..13313),complement(13612..13644))
/gene="ENSGACG00000001596"
/protein_id="ENSGACP00000002084"
/note="transcript_id=ENSGACT00000002089"
/db_xref="HGNC_curated_gene:TOP1 (2 of 2)"
/translation="MSGGHAHAHAQVNSGSKGSETHKHKEKHKEHRHKEHRKEKEREK
LKHSNSEHKDPAEKKLRDKQKLKHSNGSSEKPREKRREEKIQPSHVEKPKKEKENGFV
RERSPSALKSEPEEDNGFYPSPQHLNTCRAESAGRDVGLEYRPKKIKSEHDKKAKKRK
QEYEEDEEEDIKPKKKTRDQKATQGKKIKKEEEKWKCVCKERTETSRRHSLVCGPTFL
TPSWIVDLWDLFAGKPMKLKPPAEEVATFFAKMLDHEYTTKDIFRKNFFKDWRKEMTS
EEKSKLSDLNKCDFGEMSEYFKAQSEARKQMSKEEKQKLKEENERLLQEYGFCIMDNH
KERIGNFRIEPPGLFRGRGDHPKMGMLKRRIRPEDIIINCSKDSKQPKPPPGTKWKEV
RHDNKVTWLASWTENIQGSIKYIMLNPSSRIKVPCQHMTTEKKGSVSLMNNSLRWEPA
ALSLRAPGGVRFLLERFGLRIVSEQQLQRNSGFDENTFFFNKMDSWTIMAASYAGKEK
QNCCHKLHAEAEVYAGEQYLLCPRAPFESSSSPLQTSILNKHLQELMDGLTAKVFRTY
NASITLQQQLKELACPDDSLPAKVLSYNRANRAVAILCNHQRAPPKTFEKSMQNLQTK
IDEKQNQLSAARKQLKSAKAAHKTSHDDKSRKAWKVKRKAVQRIEEQLMKLQVQATDR
EENKQIALGTSKLNYLDPRISVAWCKKWAVPIEKIYNKTQREKFAWAIDMAEKDFEF"
gene complement(16523..17577)
/gene=ENSGACG00000001598
mRNA join(complement(16523..16551),
complement(16802..17577))
/gene="ENSGACG00000001598"
/note="transcript_id=ENSGACT00000002091"
CDS join(complement(16523..16551),
complement(16802..17399))
/gene="ENSGACG00000001598"
/protein_id="ENSGACP00000002086"
/note="transcript_id=ENSGACT00000002091"
/translation="MSPPPAPQVKGQPSPAPAVVSATADSHQSLVERTGQGPPGAVPP
QVLHPPAIQIEAIAPPTSAPAASNNITAPTASSPTPAASQVAVPTPIISQAPVPSTAA
ASNQAQAVAPQPPAVALAGASTSVAATLVSTAAPVQRPVPSVVPIVAGSGPSLEAVAT
TSSPVANPSGVPPAQPNPPAVERPMPPTAASAAITQTSPVSIQQAPPSQ"
gene complement(18492..25815)
/gene=ENSGACG00000001600
mRNA join(complement(18492..19760),
complement(19856..20080),
complement(20334..20468),
complement(20661..20713),
complement(20841..20959),
complement(21093..21501),
complement(21610..21727),
complement(21929..22470),
complement(23568..23708),
complement(23816..24424),
complement(24488..24643),
complement(24749..24877),
complement(24989..25111),
complement(25218..25373),
complement(25716..25815))
/gene="ENSGACG00000001600"
/note="transcript_id=ENSGACT00000002099"
CDS join(complement(18492..19760),
complement(19856..20080),
complement(20334..20468),
complement(20661..20713),
complement(20841..20959),
complement(21093..21501),
complement(21610..21727),
complement(21929..22470),
complement(23568..23708),
complement(23816..24424),
complement(24488..24643),
complement(24749..24877),
complement(24989..25111),
complement(25218..25373),
complement(25716..25815))
/gene="ENSGACG00000001600"
/protein_id="ENSGACP00000002094"
/note="transcript_id=ENSGACT00000002099"
/translation="SAVFIAFRGNMEDEDFSLKLDSILSGIPNMLDMASERLQPQHVE
PWNSVRVTFNIPRDAAERLRLLAQNNQQQLRDLGILSVQIEGEGAINVAVGPNRGQDV
RVNGPTGAPGQMRMDVGFSGQPGPGGVRMANPAMVPPGPGIAGQAMVPGSSGQMHPRI
QRPTSQTGSDGTDPMMAGMSVQQQQQPLQHQQAGPHVPGPMPQAAHHLQALQGGRPLN
PAAQAQLSQLGPRPPFNPSGQMAVPPGWNQLPSGVLQPPATQGSPAWRKPPPQAQMVP
RPPSLATVQTPSHPPPPYPFGSQQAGQVFNAIGQLQQQQQTGVGQFAAPQPKGLQTGP
GGVAGPPRPPPPLPPTSGPQGNLTAKSPGSSSSPFQQGSPGTPPMRPTTPQGFPQGVG
SPGRAALGQPGNMQQGFMGMPQHGQPGAQVHPVITGMPKRPMGFPNPNFVQGQVSGST
PGTPVGGASQQLQGNQAMTHTGALPSASTPNSMQGPPHAQPNVMGVQSGMAGLPPGTT
AGPSMGQQQPGLQTQMMGLQHQAQPVSSSPSQKVQGQGGGQTVLSRPLSQGQRGGMTP
PKQMMPQQGQGVMHGQGQMVGGQGHQAMLMQQQQQQNSMMEQMVANQMQGNKQPFGGK
IPAGVMPGQMMRGPAPNVPGNMVQFQGQQQHQQMNQQQPQQVPIAGNPNQAMGMHGQQ
LRLPAGHPLTAQQHPHPLGDPNGGTGDLGVQQMVPDMQAQQQQGMMGGPQHMQMGNGH
FAGHGMNFNSQFQGQMPMAGACVQPGGFPVSKDVTLTSPLLVNLLQSDISASQFGPGG
KQGAGGGNQAKPKKKKPARKKKSKEGDGPHGLDAAAGMEDSELPNLGGEQSLGLENSG
QKLPEFANRPAGFPGQAGDQRVLQQVPMQMQMQSLQNAQGPQGMTGPQAPGQGQPQMH
PHQLQQQPQQSNLLQQMLMMLKMQQEQAKNRMSIPPGGQIPPRGMGNPPEVQRLPVSQ
QNNMPVMISLPGHGGVPPSPDKARGMPLMVNPQLAGAVRRMSHPDAGQGLQGAGSEEA
IAHQKQPGGPDVGLQHPGNGNQQMMANQGSNAHMMKQGPGPSPMPQHTGASPQQQLPS
QPQQGGPMPGLHFPNVPTTSQSSRPKTPNRASPRPYHHPLTPTNRPPSTEPSEINLSP
ERLNASIAGLFPPKINIPLPPRQPNLNRGFDQQGLNPTTLKAIGQAPPSLTLPGNSNN
GSVGGNNNQQPFSTGSGVGGAGGKQDKQPGGQAKRASPSNSRRSSPASSRKSATPSPG
RQKGTKMAINCPPPQQQLVGSQAQTTMLSPASALPNPLSMPSQVSGAVEAQQTQSPFH
GMQGNAAEGIRESQGMATAEQRQVPQTPPQPLRELSAPRMASPRFPLPQQPKPDLEVK
AGTVDRLPVQTPPVPDSEASPTLRAAPTSLNQLLDNSAIANMPPRAGQNT"
gene complement(28650..36301)
/gene=ENSGACG00000001608
/locus_tag="GGTL3 (1 of 2)"
/note="Gamma-glutamyltransferase 4 precursor (EC 2.3.2.2)
(Gamma- glutamyltranspeptidase 4) (GGT 4) (Gamma-
glutamyltransferase-like 3) (Gamma-glutamyltransferase-
like 5) [Contains: Gamma- glutamyltransferase 4 heavy
chain; Gamma-glutamyltransferase 4 light chain
[Source:Uniprot/SWISSPROT;Acc:Q9UJ14]"
mRNA join(complement(28650..28810),
complement(29299..29398),
complement(29498..29635),
complement(29741..29855),
complement(30289..30441),
complement(30537..30625),
complement(31122..31249),
complement(31894..31981),
complement(32781..32974),
complement(33108..33181),
complement(33575..33642),
complement(34074..34191),
complement(34272..34423),
complement(34505..34731),
complement(35555..35616),
complement(35650..35756))
/gene="ENSGACG00000001608"
/note="transcript_id=ENSGACT00000002107"
CDS join(complement(28650..28810),
complement(29299..29398),
complement(29498..29635),
complement(29741..29855),
complement(30289..30441),
complement(30537..30625),
complement(31122..31249),
complement(31894..31981),
complement(32781..32974),
complement(33108..33181),
complement(33575..33642),
complement(34074..34191),
complement(34272..34423),
complement(34505..34731),
complement(35555..35616),
complement(35650..35756))
/gene="ENSGACG00000001608"
/protein_id="ENSGACP00000002102"
/note="transcript_id=ENSGACT00000002107"
/db_xref="HGNC_curated_gene:GGTL3 (1 of 2)"
/translation="RTEDKSANPETTLGSAYSPVDYMSITSFPRLPEDDKGDNTLKLR
KGEENALSEQDTDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRD
GLTVIITAGLTFALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSS
VDAAIAAALCLGIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTK
LHLNPGLLVGVPGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKA
KDQNMSDAFGHLFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAA
VQAAGGVLTEEDFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNIT
SQVPRNSTYHWIAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDS
QAFPVGHYAPSFTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQ
ILDFSWPNKTRGSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKAL
SGITQVLMNVLSSRKNMSDSLAYGRLHPHLLPNMLLVDSEFEDEDVELLQAKGHKVER
RDVLSLVEGTRRTNDLIIGVKDPRSADASALTMS"
mRNA join(complement(29318..29398),
complement(29498..29635),
complement(29741..29855),
complement(30289..30441),
complement(30537..30625),
complement(31122..31249),
complement(31894..31981),
complement(32781..32974),
complement(33108..33181),
complement(33575..33642),
complement(34074..34191),
complement(34272..34423),
complement(34505..34731),
complement(35555..35849),
complement(36242..36301))
/gene="ENSGACG00000001608"
/note="transcript_id=ENSGACT00000002113"
CDS join(complement(29318..29398),
complement(29498..29635),
complement(29741..29855),
complement(30289..30441),
complement(30537..30625),
complement(31122..31249),
complement(31894..31981),
complement(32781..32974),
complement(33108..33181),
complement(33575..33642),
complement(34074..34191),
complement(34272..34423),
complement(34505..34731),
complement(35555..35690))
/gene="ENSGACG00000001608"
/protein_id="ENSGACP00000002108"
/note="transcript_id=ENSGACT00000002113"
/translation="MSITSFPRLPEDDNAAAAAAPAPAPGDNTLKLRKGEENALSEQD
TDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRDGLTVIITAGLT
FALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSSVDAAIAAALCL
GIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTKLHLNPGLLVGV
PGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKAKDQNMSDAFGH
LFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAAVQAAGGVLTEE
DFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNITSQVPRNSTYHW
IAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDSQAFPVGHYAPS
FTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQILDFSWPNKTR
GSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKALSGITQVLMNVL
SSRKNMSDSLAYGRLHPHLLP"
gene 53987..55637
/gene=ENSGACG00000001618
/locus_tag="SNAI1 (2 of 2)"
/note="Zinc finger protein SNAI1 (Protein snail homolog 1)
(Protein sna). [Source:Uniprot/SWISSPROT;Acc:O95863]"
mRNA join(53987..54136,54343..54534,54562..54707,54756..54915,
55148..55314,55605..55637)
/gene="ENSGACG00000001618"
/note="transcript_id=ENSGACT00000002120"
CDS join(54052..54136,54343..54534,54562..54707,54756..54915,
55148..55314,55605..55637)
/gene="ENSGACG00000001618"
/protein_id="ENSGACP00000002115"
/note="transcript_id=ENSGACT00000002120"
/db_xref="HGNC_curated_gene:SNAI1 (2 of 2)"
/translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN
GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSNGGRTSDPPS
PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQRPAFHCKHCPKEYTSLGAL
KMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFADRSNLRAHL
QKHPEVKKYQCGSCSRTFSRMFLLLNTAPPGAGVCAPLRGNIQ"
mRNA join(54052..54136,54343..54535,54563..54731,54756..54915,
55148..55290,55293..55316,55420..55428)
/gene="ENSGACG00000001618"
/note="transcript_id=ENSGACT00000002124"
CDS join(54052..54136,54343..54535,54563..54731,54756..54915,
55148..55290,55293..55316,55420..55428)
/gene="ENSGACG00000001618"
/protein_id="ENSGACP00000002119"
/note="transcript_id=ENSGACT00000002124"
/translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN
GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSSGGRTSDPPS
PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQQHSSPLPPRPAFHCKHCPK
EYTSLGALKMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFAD
RSNLRAHLQKHPEVKKYQCGSCSRTFSRMFLLQHSASGCCPPC"
gene 69424..106380
/gene=ENSGACG00000001624
mRNA join(69424..70049,70523..70546,105631..105758,
106305..106380)
/gene="ENSGACG00000001624"
/note="transcript_id=ENSGACT00000002125"
CDS 69516..69824
/gene="ENSGACG00000001624"
/protein_id="ENSGACP00000002120"
/note="transcript_id=ENSGACT00000002125"
/translation="MKRLKNLIMLTIDLTKIPSQRRSLPLLTRGRFVRRPQAFLAAFV
VVWPDCRRVQSSEDPSIAARSLHLNICFKGCDRRREHDLLHLISNKTNIKKGKTKTKC
L"
gene complement(69458..72266)
/gene=ENSGACG00000001627
mRNA join(complement(69458..70045),
complement(70528..70553),
complement(71520..71631),
complement(72179..72266))
/gene="ENSGACG00000001627"
/note="transcript_id=ENSGACT00000002128"
CDS complement(69515..69880)
/gene="ENSGACG00000001627"
/protein_id="ENSGACP00000002123"
/note="transcript_id=ENSGACT00000002128"
/translation="MRYSGPFACVFKLFYYNTHKHLVLVLPFLMLVLFDIKCNRSCSL
LLSQPLKHIFKCKLRAAMDGSSLLCTRRQSGQTTTNAAKNACGRRTKRPRVSNGSERR
WEGIFVKSIVNIIKFLSLFI"
gene complement(74610..77849)
/gene=ENSGACG00000001630
/locus_tag="SAMHD1 (2 of 3)"
/note="SAM domain and HD domain-containing protein 1
(Dendritic cell-derived IFNG-induced protein) (DCIP)
(Monocyte protein 5) (MOP-5).
[Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
mRNA join(complement(74610..74684),
complement(74761..74861),
complement(74977..75120),
complement(75749..75819),
complement(75907..76022),
complement(76434..76594),
complement(77706..77849))
/gene="ENSGACG00000001630"
/note="transcript_id=ENSGACT00000002132"
CDS join(complement(74612..74684),
complement(74761..74861),
complement(74977..75120),
complement(75749..75819),
complement(75907..76022),
complement(76434..76594),
complement(77706..77738))
/gene="ENSGACG00000001630"
/protein_id="ENSGACP00000002127"
/note="transcript_id=ENSGACT00000002132"
/db_xref="HGNC_curated_gene:SAMHD1 (2 of 3)"
/translation="MAGRPSDLLGKVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQ
LGGVYFVFPGASHNRFEHSLGVAHLAGELVRDLKQRQPDLNITDRDVLCVQIAGLCHD
LGHGPFSHMFDGMFIPKARPGLTWKHEKASVEMFDHLVADNDLKPVMKEHGLKLPEDL
VFIKELMDPKDPKDPWSYKGRLENKSFLYEIVSNKRNAIDVDKWDYFARDCYHLGIKN
NFDHGRCLMFARVCE"
gene complement(90678..100224)
/gene=ENSGACG00000001632
/locus_tag="SAMHD1 (1 of 3)"
/note="SAM domain and HD domain-containing protein 1
(Dendritic cell-derived IFNG-induced protein) (DCIP)
(Monocyte protein 5) (MOP-5).
[Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
mRNA join(complement(90678..91320),
complement(91530..91667),
complement(91829..91933),
complement(92020..92107),
complement(92441..92582),
complement(93438..93553),
complement(93628..93719),
complement(94565..94673),
complement(94750..94850),
complement(94942..95100),
complement(95645..95715),
complement(95803..95918),
complement(98829..98989),
complement(99304..99376),
complement(99751..99817),
complement(99921..100224))
/gene="ENSGACG00000001632"
/note="transcript_id=ENSGACT00000002142"
CDS join(complement(91198..91320),
complement(91530..91667),
complement(91829..91933),
complement(92020..92107),
complement(92441..92582),
complement(93438..93553),
complement(93628..93719),
complement(94565..94673),
complement(94750..94850),
complement(94942..95100),
complement(95645..95715),
complement(95803..95918),
complement(98829..98989),
complement(99304..99376),
complement(99751..99817),
complement(99921..100089))
/gene="ENSGACG00000001632"
/protein_id="ENSGACP00000002136"
/note="transcript_id=ENSGACT00000002142"
/db_xref="HGNC_curated_gene:SAMHD1 (1 of 3)"
/translation="MASRKRSFPPDSSLSAPGKRAPGPGAPQTDYAGWGAEETCRYLR
AEGLGEWEDAFREHRITGVGLRYLADADLEKMGLKFLGDRLRVLHSLRTLWQIEVEPS
KVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQLGGAYFVFPGASHNRFEHSLGVGH
LAGQLVRALDQRQPELHITRRDVLCVQIAGLCHDLGHGPFSHMFDGKFIPKARPGFTW
KHEDASVKMFDHLVADNDLQPVMKEHGLVLPEDLDFIKEQIAGPMDPKDMKKLEWPYR
GRPKDKSFLYEIVSNKRNGIDVDKWDYFARDCYHLGIKNNFDYGRCLMFAKVCEVDGQ
KHICTRDKEVGNLYDMFHTRNCLHRRAYQHKVAKIVETMITEAFLKADGHILFEGSKG
KMFSLSTAIDDMEAYTKVTVDNVFEQILNSSSAALKDSREILKNVVCRRLYKCLGHTQ
ADQHENVPQKERIASWEADLARCASQDVVLNPEDFIIDVINLDYGMKEKNPINSVRFY
SKDDPSKAVQIRKNQVSKLLPEQFAEQLIRVYCKKLDSRSLEAAKKNFVQWCMDENFS
KPQDGDIIAPELTPLKPSRQEDDDNNKKEVNPVGKARIQLFER"
gene complement(105654..110675)
/gene=ENSGACG00000001639
/locus_tag="SAMHD1 (3 of 3)"
/note="SAM domain and HD domain-containing protein 1
(Dendritic cell-derived IFNG-induced protein) (DCIP)
(Monocyte protein 5) (MOP-5).
[Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
mRNA join(complement(105654..105755),
complement(106302..106406),
complement(106493..106528),
complement(106762..106946),
complement(107965..108080),
complement(108157..108248),
complement(108694..108802),
complement(108879..109033),
complement(109155..109214),
complement(109846..109916),
complement(110004..110119),
complement(110524..110675))
/gene="ENSGACG00000001639"
/note="transcript_id=ENSGACT00000002145"
CDS join(complement(105654..105755),
complement(106302..106406),
complement(106493..106528),
complement(106762..106946),
complement(107965..108080),
complement(108157..108248),
complement(108694..108802),
complement(108879..109033),
complement(109155..109214),
complement(109846..109916),
complement(110004..110119),
complement(110524..110675))
/gene="ENSGACG00000001639"
/protein_id="ENSGACP00000002139"
/note="transcript_id=ENSGACT00000002145"
/db_xref="HGNC_curated_gene:SAMHD1 (3 of 3)"
/translation="DPIHGHMEMHPLLIRIIDTPQFQRLRRIKQLGGAYFVFPGASHN
RFEHSLGVAHLAGKLVRALDQRQGDLHIDDRDVLCVQIAGLCHDLGHGPFSHMFDGKF
IPKARPGFTWKHEKASVEMFDHLVADNDLQPNDHVVLFVPDVTVVSSPQWPYRGRLEN
KSFLYEIVSNKRNCIDVDKWDYFARDCYHLGIKNNFDHGRCLMFARVCEVDGQKQICF
RDKEVEDLYDMFYTRICLHRRAYQHKAANIVETMITEAFWKADGHIEFEGSGGQKFKL
SDTIKDMEAYTKVTDDVFEKILNSSSDELKDSREILQDVVCRRIYKCIGQAQPTQPTT
VTVSVIIFSYFTLEKLEADVVLNPEDFIIDVINLDYGMKEENPIDRVRFYSKDDPDKG
FQIPQNQVFGFLPEKFTKELIRVYCKKLDSESLKAAKDNFK"
exon complement(10037..10139)
/note="exon_id=ENSGACE00000016865"
exon complement(8863..8950)
/note="exon_id=ENSGACE00000016876"
exon complement(7299..7421)
/note="exon_id=ENSGACE00000016885"
exon complement(9899..9954)
/note="exon_id=ENSGACE00000016866"
exon complement(5297..5451)
/note="exon_id=ENSGACE00000016904"
exon complement(4370..4497)
/note="exon_id=ENSGACE00000016905"
exon complement(7676..7735)
/note="exon_id=ENSGACE00000016881"
exon complement(7035..7222)
/note="exon_id=ENSGACE00000016889"
exon complement(5953..5999)
/note="exon_id=ENSGACE00000016902"
exon complement(6228..6374)
/note="exon_id=ENSGACE00000016896"
exon complement(4166..4260)
/note="exon_id=ENSGACE00000016911"
exon complement(1399..1718)
/note="exon_id=ENSGACE00000016924"
exon complement(10661..10751)
/note="exon_id=ENSGACE00000016863"
exon complement(6548..6582)
/note="exon_id=ENSGACE00000016893"
exon complement(13289..13313)
/note="exon_id=ENSGACE00000016862"
exon complement(9267..9302)
/note="exon_id=ENSGACE00000016873"
exon complement(9718..9792)
/note="exon_id=ENSGACE00000016870"
exon complement(13612..13644)
/note="exon_id=ENSGACE00000016858"
exon complement(3787..3936)
/note="exon_id=ENSGACE00000016915"
exon complement(6147..6212)
/note="exon_id=ENSGACE00000016899"
exon complement(6594..6760)
/note="exon_id=ENSGACE00000016891"
exon complement(2230..2423)
/note="exon_id=ENSGACE00000016920"
exon complement(8704..8786)
/note="exon_id=ENSGACE00000016877"
exon complement(7497..7662)
/note="exon_id=ENSGACE00000016884"
exon complement(16523..16551)
/note="exon_id=ENSGACE00000016940"
exon complement(16802..17577)
/note="exon_id=ENSGACE00000016938"
exon complement(23568..23708)
/note="exon_id=ENSGACE00000016969"
exon complement(18492..19760)
/note="exon_id=ENSGACE00000017004"
exon complement(20661..20713)
/note="exon_id=ENSGACE00000016989"
exon complement(21093..21501)
/note="exon_id=ENSGACE00000016980"
exon complement(19856..20080)
/note="exon_id=ENSGACE00000016996"
exon complement(24749..24877)
/note="exon_id=ENSGACE00000016956"
exon complement(25218..25373)
/note="exon_id=ENSGACE00000016951"
exon complement(25716..25815)
/note="exon_id=ENSGACE00000016949"
exon complement(21929..22470)
/note="exon_id=ENSGACE00000016971"
exon complement(24989..25111)
/note="exon_id=ENSGACE00000016953"
exon complement(20841..20959)
/note="exon_id=ENSGACE00000016984"
exon complement(20334..20468)
/note="exon_id=ENSGACE00000016991"
exon complement(24488..24643)
/note="exon_id=ENSGACE00000016961"
exon complement(23816..24424)
/note="exon_id=ENSGACE00000016965"
exon complement(21610..21727)
/note="exon_id=ENSGACE00000016974"
exon complement(36242..36301)
/note="exon_id=ENSGACE00000017125"
exon complement(29741..29855)
/note="exon_id=ENSGACE00000017086"
exon complement(34272..34423)
/note="exon_id=ENSGACE00000017035"
exon complement(33575..33642)
/note="exon_id=ENSGACE00000017051"
exon complement(35650..35756)
/note="exon_id=ENSGACE00000017021"
exon complement(29318..29398)
/note="exon_id=ENSGACE00000017135"
exon complement(34074..34191)
/note="exon_id=ENSGACE00000017044"
exon complement(29299..29398)
/note="exon_id=ENSGACE00000017095"
exon complement(30537..30625)
/note="exon_id=ENSGACE00000017071"
exon complement(33108..33181)
/note="exon_id=ENSGACE00000017056"
exon complement(31894..31981)
/note="exon_id=ENSGACE00000017062"
exon complement(35555..35849)
/note="exon_id=ENSGACE00000017127"
exon complement(35555..35616)
/note="exon_id=ENSGACE00000017028"
exon complement(31122..31249)
/note="exon_id=ENSGACE00000017066"
exon complement(29498..29635)
/note="exon_id=ENSGACE00000017091"
exon complement(28650..28810)
/note="exon_id=ENSGACE00000017102"
exon complement(34505..34731)
/note="exon_id=ENSGACE00000017031"
exon complement(32781..32974)
/note="exon_id=ENSGACE00000017060"
exon complement(30289..30441)
/note="exon_id=ENSGACE00000017076"
exon 55148..55290
/note="exon_id=ENSGACE00000017228"
exon 54343..54535
/note="exon_id=ENSGACE00000017218"
exon 55293..55316
/note="exon_id=ENSGACE00000017233"
exon 54563..54731
/note="exon_id=ENSGACE00000017224"
exon 55605..55637
/note="exon_id=ENSGACE00000017197"
exon 54756..54915
/note="exon_id=ENSGACE00000017188"
exon 54343..54534
/note="exon_id=ENSGACE00000017167"
exon 53987..54136
/note="exon_id=ENSGACE00000017156"
exon 54052..54136
/note="exon_id=ENSGACE00000017212"
exon 55148..55314
/note="exon_id=ENSGACE00000017193"
exon 54562..54707
/note="exon_id=ENSGACE00000017179"
exon 55420..55428
/note="exon_id=ENSGACE00000017240"
exon 106305..106380
/note="exon_id=ENSGACE00000017258"
exon 69424..70049
/note="exon_id=ENSGACE00000017248"
exon 105631..105758
/note="exon_id=ENSGACE00000017255"
exon 70523..70546
/note="exon_id=ENSGACE00000017250"
exon complement(70528..70553)
/note="exon_id=ENSGACE00000017275"
exon complement(69458..70045)
/note="exon_id=ENSGACE00000017281"
exon complement(72179..72266)
/note="exon_id=ENSGACE00000017268"
exon complement(71520..71631)
/note="exon_id=ENSGACE00000017272"
exon complement(75907..76022)
/note="exon_id=ENSGACE00000017299"
exon complement(74977..75120)
/note="exon_id=ENSGACE00000017305"
exon complement(74761..74861)
/note="exon_id=ENSGACE00000017308"
exon complement(76434..76594)
/note="exon_id=ENSGACE00000017296"
exon complement(75749..75819)
/note="exon_id=ENSGACE00000017303"
exon complement(77706..77849)
/note="exon_id=ENSGACE00000017294"
exon complement(74610..74684)
/note="exon_id=ENSGACE00000017310"
exon complement(98829..98989)
/note="exon_id=ENSGACE00000017342"
exon complement(93438..93553)
/note="exon_id=ENSGACE00000017375"
exon complement(90678..91320)
/note="exon_id=ENSGACE00000017391"
exon complement(94750..94850)
/note="exon_id=ENSGACE00000017366"
exon complement(99921..100224)
/note="exon_id=ENSGACE00000017324"
exon complement(99751..99817)
/note="exon_id=ENSGACE00000017329"
exon complement(94565..94673)
/note="exon_id=ENSGACE00000017369"
exon complement(91530..91667)
/note="exon_id=ENSGACE00000017386"
exon complement(95645..95715)
/note="exon_id=ENSGACE00000017355"
exon complement(91829..91933)
/note="exon_id=ENSGACE00000017381"
exon complement(92020..92107)
/note="exon_id=ENSGACE00000017379"
exon complement(99304..99376)
/note="exon_id=ENSGACE00000017335"
exon complement(94942..95100)
/note="exon_id=ENSGACE00000017360"
exon complement(92441..92582)
/note="exon_id=ENSGACE00000017377"
exon complement(93628..93719)
/note="exon_id=ENSGACE00000017372"
exon complement(95803..95918)
/note="exon_id=ENSGACE00000017347"
exon complement(110004..110119)
/note="exon_id=ENSGACE00000017407"
exon complement(110524..110675)
/note="exon_id=ENSGACE00000017404"
exon complement(107965..108080)
/note="exon_id=ENSGACE00000017418"
exon complement(106493..106528)
/note="exon_id=ENSGACE00000017422"
exon complement(108157..108248)
/note="exon_id=ENSGACE00000017414"
exon complement(108694..108802)
/note="exon_id=ENSGACE00000017413"
exon complement(109846..109916)
/note="exon_id=ENSGACE00000017408"
exon complement(109155..109214)
/note="exon_id=ENSGACE00000017410"
exon complement(106762..106946)
/note="exon_id=ENSGACE00000017420"
exon complement(105654..105755)
/note="exon_id=ENSGACE00000017425"
exon complement(106302..106406)
/note="exon_id=ENSGACE00000017424"
exon complement(108879..109033)
/note="exon_id=ENSGACE00000017412"
misc_feature 1..5487
/note="contig contig_13399 1..5487(1)"
misc_feature 5852..7735
/note="contig contig_13400 1..1884(1)"
misc_feature 8660..14728
/note="contig contig_13401 1..6069(1)"
misc_feature 15200..39327
/note="contig contig_13402 1..24128(1)"
misc_feature 46327..47864
/note="contig contig_13403 1..1538(1)"
misc_feature 50118..51320
/note="contig contig_13404 1..1203(1)"
misc_feature 53911..55318
/note="contig contig_13405 1..1408(1)"
misc_feature 55419..56091
/note="contig contig_13406 1..673(1)"
misc_feature 56664..57183
/note="contig contig_13407 1..520(1)"
misc_feature 57284..83877
/note="contig contig_13408 1..26594(1)"
misc_feature 83978..137802
/note="contig contig_13409 1..53825(1)"
BASE COUNT 32610 a 28563 c 28961 g 33195 t 14473 n
ORIGIN
1 GGTTTACCTC CCGGGGGGGG GGCGACACGG CGGAGTTGCC CCCCCGGAGG GAACCAGCCG
--
Fill out the GMOD Community Survey NOW and win some GMOD Gear:
http://gmod.org/wiki/GMOD_News#2008_GMOD_Community_Survey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scaffold_180.genbank
Type: application/octet-stream
Size: 213889 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20081029/1ac1ae41/attachment-0004.obj>