[Bioperl-l] Memory requirements for conversion from embl to genbank
Martin MOKREJŠ
mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 20:45:56 UTC 2006
Sorry, I sent that off too quickly and forgot to correct one part.
Martin MOKREJŠ wrote:
[...]
> But the original record in both GenBank and EMBL does make sense, right?
>
>
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21727885
>
> LOCUS GVI428955 8027 bp mRNA linear VRL 15-APR-2005
> DEFINITION Hepatitis GB virus B subgenomic replicon neoRepB.
> ACCESSION AJ428955
> VERSION AJ428955.1 GI:21727885
> KEYWORDS core-neo fusion protein; core-neo gene; polyprotein.
> SOURCE Hepatitis GB virus B
> ORGANISM Hepatitis GB virus B
> Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae.
> REFERENCE 1
> AUTHORS De Tomassi,A., Pizzuti,M., Graziani,R., Sbardellati,A.,
> Altamura,S., Paonessa,G. and Traboni,C.
> TITLE Cell clones selected from the Huh7 human hepatoma cell line support
> efficient replication of a subgenomic GB virus B replicon
> JOURNAL J. Virol. 76 (15), 7736-7746 (2002)
> PUBMED 12097587
> REFERENCE 2 (bases 1 to 8027)
> AUTHORS Traboni,C.
> TITLE Direct Submission
> JOURNAL Submitted (22-JAN-2002) Traboni C., Biochemistry, IRBM P.Angeletti,
> via Pontina, km.30, 600. 00040 Pomezia (Roma), ITALY
> COMMENT related sequence AJ277947.
> FEATURES Location/Qualifiers
> source join(1..1281,1893..8027)
> /organism="Hepatitis GB virus B"
> /mol_type="mRNA"
> /isolate="FL3"
> /db_xref="taxon:39113"
> /focus
> source 1282..1892
> /organism="Encephalomyocarditis virus"
> /mol_type="mRNA"
> /db_xref="taxon:12104"
> 5'UTR 1..445
> /experiment="experimental evidence, no additional details
> recorded"
> CDS 446..1273
> /function="core-neo fusion protein"
> /codon_start=1
> /product="neomycin phosphotransferase"
> /protein_id="CAD21956.1"
> /db_xref="GI:21727886"
> /db_xref="GOA:Q8JKE5"
> /db_xref="InterPro:IPR002575"
> /db_xref="UniProtKB/TrEMBL:Q8JKE5"
> /translation="MPVISTQTGRAMIEQDGLHAGSPAAWVERLFGYDWAQQTIGCSD
> AAVFRLSAQGRPVLFVKTDLSGALNELQDEAARLSWLATTGVPCAAVLDVVTEAGRDW
> LLLGEVPGQDLLSSHLAPAEKVSIMADAMRRLHTLDPATCPFDHQAKHRIERARTRME
> AGLVDQDDLDEEHQGLAPAELFARLKARMPDGEDLVVTHGDACLPNIMVENGRFSGFI
> DCGRLGVADRYQDIALATRDIAEELGGEWADRFLVLYGIAAPDSQRIAFYRLLDEFF"
> misc_feature 1282..1892
> /note="internal ribosome entry site (IRES)"
> [...]
>
>
> The above official GenBank record cannot be parsed and the parsing code
> silently leaks through and exits with no data written out. I have filed
> bug #2087.
No, that was my fault: I forgot to specify that the input is in GenBank format.
BioPerl therefore expected EMBL input and silently ran through it without
reporting any problem. Still not nice.
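
A guard against this kind of silent mismatch could look at the first non-blank
line of the input: GenBank flat files start with a LOCUS line, EMBL flat files
with an ID line. The Python below is only a rough sketch of that idea, not
BioPerl code; sniff_format and check_declared_format are hypothetical helper
names made up for this illustration.

```python
def sniff_format(text):
    """Guess the flat-file format from the first non-blank line.

    Returns "genbank", "embl", or None if the line is not recognised.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID "):
            return "embl"
        return None  # first real line matches neither format
    return None  # empty input


def check_declared_format(text, declared):
    """Raise instead of silently producing no output (hypothetical helper)."""
    guessed = sniff_format(text)
    if guessed is None:
        raise ValueError("input is not recognisable as GenBank or EMBL")
    if guessed != declared:
        raise ValueError(
            f"input looks like {guessed}, but {declared} was requested"
        )
    return guessed
```

Called with the record quoted above and a declared format of "embl", such a
check would raise an error rather than exit with nothing written.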
When I correctly set the input format to genbank and the output format to embl,
I did get an EMBL record out, but the second 'OS' line, for the second
organism, is missing. I haven't yet inspected all the differences with diff(1).
These would certainly make nice test cases: convert from EMBL to GenBank and
back to EMBL, then use diff to see what changed.
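
Such a round-trip test might be sketched as follows. This is Python using only
the standard difflib module, not BioPerl; the two converter arguments are
placeholders for whatever actually performs the conversion, and the sample
text and drop_repeated_os are stand-ins invented purely to show what a lost
second OS line would look like in the diff.

```python
import difflib


def round_trip_diff(original, to_genbank, to_embl):
    """Convert EMBL -> GenBank -> EMBL and return the unified diff lines.

    to_genbank and to_embl are caller-supplied callables (assumptions here),
    each taking and returning the record as a string.
    """
    genbank_text = to_genbank(original)
    back = to_embl(genbank_text)
    return list(
        difflib.unified_diff(
            original.splitlines(),
            back.splitlines(),
            fromfile="original.embl",
            tofile="roundtrip.embl",
            lineterm="",
        )
    )


def drop_repeated_os(text):
    """Stand-in lossy converter: keeps only the first OS line."""
    seen_os = False
    kept = []
    for line in text.splitlines():
        if line.startswith("OS "):
            if seen_os:
                continue
            seen_os = True
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Fragment for illustration only, not a complete EMBL record.
    sample = "OS   Hepatitis GB virus B\nOS   Encephalomyocarditis virus\n"
    for line in round_trip_diff(sample, lambda t: t, drop_repeated_os):
        print(line)
```

An empty diff would mean the round trip preserved the record; any removed
line, such as a dropped OS line, points at information lost in conversion.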
m.