[Bioperl-l] Memory requirements for conversion from embl to genbank

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Aug 31 20:45:56 UTC 2006


Sorry, I sent that off too quickly and forgot to correct one part.

Martin MOKREJŠ wrote:

[...]

> But the original record in both GenBank and EMBL does make sense, right?
> 
> 
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=21727885
> 
> LOCUS       GVI428955               8027 bp    mRNA    linear   VRL 15-APR-2005
> DEFINITION  Hepatitis GB virus B subgenomic replicon neoRepB.
> ACCESSION   AJ428955
> VERSION     AJ428955.1  GI:21727885
> KEYWORDS    core-neo fusion protein; core-neo gene; polyprotein.
> SOURCE      Hepatitis GB virus B
>   ORGANISM  Hepatitis GB virus B
>             Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae.
> REFERENCE   1
>   AUTHORS   De Tomassi,A., Pizzuti,M., Graziani,R., Sbardellati,A.,
>             Altamura,S., Paonessa,G. and Traboni,C.
>   TITLE     Cell clones selected from the Huh7 human hepatoma cell line support
>             efficient replication of a subgenomic GB virus B replicon
>   JOURNAL   J. Virol. 76 (15), 7736-7746 (2002)
>    PUBMED   12097587
> REFERENCE   2  (bases 1 to 8027)
>   AUTHORS   Traboni,C.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (22-JAN-2002) Traboni C., Biochemistry, IRBM P.Angeletti,
>             via Pontina, km.30, 600. 00040 Pomezia (Roma), ITALY
> COMMENT     related sequence AJ277947.
> FEATURES             Location/Qualifiers
>      source          join(1..1281,1893..8027)
>                      /organism="Hepatitis GB virus B"
>                      /mol_type="mRNA"
>                      /isolate="FL3"
>                      /db_xref="taxon:39113"
>                      /focus
>      source          1282..1892
>                      /organism="Encephalomyocarditis virus"
>                      /mol_type="mRNA"
>                      /db_xref="taxon:12104"
>      5'UTR           1..445
>                      /experiment="experimental evidence, no additional details
>                      recorded"
>      CDS             446..1273
>                      /function="core-neo fusion protein"
>                      /codon_start=1
>                      /product="neomycin phosphotransferase"
>                      /protein_id="CAD21956.1"
>                      /db_xref="GI:21727886"
>                      /db_xref="GOA:Q8JKE5"
>                      /db_xref="InterPro:IPR002575"
>                      /db_xref="UniProtKB/TrEMBL:Q8JKE5"
>                      /translation="MPVISTQTGRAMIEQDGLHAGSPAAWVERLFGYDWAQQTIGCSD
>                      AAVFRLSAQGRPVLFVKTDLSGALNELQDEAARLSWLATTGVPCAAVLDVVTEAGRDW
>                      LLLGEVPGQDLLSSHLAPAEKVSIMADAMRRLHTLDPATCPFDHQAKHRIERARTRME
>                      AGLVDQDDLDEEHQGLAPAELFARLKARMPDGEDLVVTHGDACLPNIMVENGRFSGFI
>                      DCGRLGVADRYQDIALATRDIAEELGGEWADRFLVLYGIAAPDSQRIAFYRLLDEFF"
>      misc_feature    1282..1892
>                      /note="internal ribosome entry site (IRES)"
> [...]
> 
> 
> The above official GenBank record cannot be parsed and the parsing code
> silently leaks through and exits with no data written out. I have filed
> bug #2087.

No, that was my fault: I forgot to specify that the input is in GenBank
format. BioPerl therefore expected EMBL input and silently ran through.
Still, not nice. After correctly setting the input format to genbank and the
output format to embl, I did get an EMBL record out, but the second 'OS' line,
with the second organism, is missing. I haven't yet checked all the
differences with diff(1). These would certainly make nice test cases: convert
from EMBL to GenBank and back to EMBL, then use diff to see what changed.
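
A minimal sketch of that comparison step, in Python with only the standard
library. It assumes the conversion itself has already been done elsewhere
(e.g. by BioPerl) and that both records are available as text. The function
names and the two short record fragments are hypothetical and heavily
simplified; they are not real converter output.

```python
import difflib


def roundtrip_diff(original, roundtripped):
    """Return unified-diff lines between the original record text and
    the text obtained after converting it to another format and back."""
    return list(difflib.unified_diff(
        original.splitlines(),
        roundtripped.splitlines(),
        fromfile="original.embl",    # labels only; no files are read
        tofile="roundtrip.embl",
        lineterm="",
    ))


def missing_lines(original, roundtripped, tag):
    """Return lines starting with `tag` (e.g. 'OS') that are present in
    the original record but absent from the round-tripped one."""
    kept = set(roundtripped.splitlines())
    return [line for line in original.splitlines()
            if line.startswith(tag) and line not in kept]


# Simplified, made-up fragments that only mimic the symptom described
# above: the second organism line is lost after the round trip.
original = (
    "OS   Hepatitis GB virus B\n"
    "OS   Encephalomyocarditis virus\n"
)
roundtripped = (
    "OS   Hepatitis GB virus B\n"
)

for line in roundtrip_diff(original, roundtripped):
    print(line)

print(missing_lines(original, roundtripped, "OS"))
```

A check like missing_lines() could be turned into a regression test once the
expected round-trip behaviour for multi-source records is agreed on.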

m.


