[Biopython-dev] [Bug 2907] New: When a genomic record has been loaded using eFetch, if it is written to genbank format the header line refers to 'aa'

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Aug 25 01:34:02 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2907

           Summary: When a genomic record has been loaded using eFetch, if
                    it is written to genbank format the header line refers
                    to 'aa'
           Product: Biopython
           Version: 1.51b
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: david.wyllie at ndm.ox.ac.uk


When a genomic record has been loaded using eFetch, if it is written to genbank
format the header line refers to 'aa' not 'bp' although the .seq.alphabet is
set (correctly, I think) to generic_dna.

The background here is that we're annotating some viral genomes computationally
(however, the annotation isn't necessary for the problem here, see below) and
then writing the output to .gb format.  After this we load the file using
LaserGene (a commercial sequence editing program) to have a look at it etc. 
This doesn't work terribly well because of the 'aa' designation in the header
line.  Apart from this, the export seems ok.
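
To check which unit a written file declares, the LOCUS line can be inspected directly. This is only a minimal sketch: it assumes the header fields are whitespace-separated and appear in the order shown in the sample output quoted in this report (keyword, name, length, unit). `locus_units` is a hypothetical helper written for illustration, not part of Biopython.

```python
def locus_units(locus_line):
    """Return the unit token ('bp' or 'aa') that follows the sequence length.

    Assumes whitespace-separated fields: LOCUS, name, length, unit, ...
    """
    fields = locus_line.split()
    if len(fields) < 4 or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % locus_line)
    return fields[3]


# The header line reported in this bug
header = "LOCUS       DQ923122               34250 aa    DNA              VRL 01-JAN-1980"
units = locus_units(header)
```

For a nucleotide record one would expect `units` to be "bp"; for the header above it is "aa".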

I'm using a git download from mid-June 2009.
Here is an example which illustrates this:

# load dependencies
from Bio import Entrez
from Bio import SeqIO
from Bio.Alphabet import generic_dna

# get a sequence from Genbank
print "going to recover a sequence from genbank...."
ifh = Entrez.efetch(db="nucleotide",id="DQ923122",rettype="gb")

# parse the file handle
recordlist=[]
print "OK, got the records from genbank, parsing ..."
for record in SeqIO.parse(ifh, "genbank"):    
        recordlist.append(record)
ifh.close()

# write it to a file
for thisrecord in recordlist:
        # confirm it's dna
        assert type(thisrecord.seq.alphabet) == type(generic_dna), \
            "We are supposed to be dealing with a DNA sequence, but we aren't, can't continue."

        # write to gb
        ofn=thisrecord.id+".gb"
        print "Writing thisrecord to ",ofn
        ofh=open(ofn,"w")
        SeqIO.write([thisrecord], ofh, "gb")
        ofh.close()

exit()

# the top lines of the genbank file read as follows
#
#LOCUS       DQ923122               34250 aa    DNA              VRL 01-JAN-1980
#DEFINITION  Human adenovirus 52 isolate T03-2244, complete genome.
#ACCESSION   DQ923122
#VERSION     DQ923122.2  GI:124375632
#KEYWORDS    
#SOURCE      Human adenovirus 52
#  ORGANISM  Human adenovirus 52
#            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus;
#            unclassified Human adenoviruses
#FEATURES             Location/Qualifiers
#     source          1..34250
#                     /country="USA"
#                     /isolate="T03-2244"
#                     /mol_type="genomic DNA"
#                     /organism="Human adenovirus 52"
#                     /db_xref="taxon:332179
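
As a stopgap until the writer is fixed, the header could be patched after the file has been written. What follows is a sketch of a workaround, not a fix in Biopython itself. It assumes the LOCUS layout shown above and that the molecule-type field contains "DNA" or "RNA" for nucleotide records. `fix_locus_units` is a hypothetical name.

```python
import re

# LOCUS <name> <length> aa <molecule type containing DNA or RNA>
_LOCUS_RE = re.compile(r"^(LOCUS\s+\S+\s+\d+\s+)aa(\s+\S*(?:DNA|RNA)\b)")


def fix_locus_units(genbank_text):
    """Replace 'aa' with 'bp' on LOCUS lines that declare a DNA/RNA molecule.

    'aa' and 'bp' have the same width, so column alignment is preserved.
    Protein LOCUS lines (no DNA/RNA field) are left unchanged.
    """
    fixed = []
    for line in genbank_text.splitlines(True):  # keep line endings
        if line.startswith("LOCUS"):
            line = _LOCUS_RE.sub(r"\1bp\2", line)
        fixed.append(line)
    return "".join(fixed)


sample = (
    "LOCUS       DQ923122               34250 aa    DNA              VRL 01-JAN-1980\n"
    "DEFINITION  Human adenovirus 52 isolate T03-2244, complete genome.\n"
)
patched = fix_locus_units(sample)
```

The text read from the written .gb file would be passed through `fix_locus_units` and written back before opening it in another program.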

Thank you for any advice you have to offer.




