[Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files
    Peter Cock 
    p.j.a.cock at googlemail.com
       
    Tue Aug 14 15:38:42 UTC 2012
    
    
  
On Tue, Aug 14, 2012 at 3:54 PM, Susan Wilson <smwilson at hpc.unm.edu> wrote:
> Hi Peter,
>
> Thanks for quick response. I have downloaded the files from
> ftp://ftp.ensembl.org/pub/release-68/genbank/homo_sapiens/. Got version 1.53
> of biopython. Maybe I should try 1.6?
Biopython 1.53 was released over two years ago (December 2009). The
current release is 1.60 (one dot sixty); there never was a 1.6 (one dot six).
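Read as decimal numbers, "1.6" and "1.60" look identical, which is where the confusion comes from. A version check is safer on integer tuples. This is a minimal sketch that assumes only that the installed version is a dotted string such as `Bio.__version__`. The helper name `version_tuple` is hypothetical.

```python
import re


def version_tuple(version):
    """Turn a dotted version string like '1.60' into (1, 60).

    Parsing stops at the first piece with no leading digits,
    so a suffix like '.dev0' is dropped.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # leading digits of this piece, if any
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)
```

For example, `import Bio` followed by `version_tuple(Bio.__version__) >= (1, 60)` checks for at least Biopython 1.60. The same comparison on floats would treat 1.6 and 1.60 as equal.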
Yes, please try the current Biopython. It seems fine here, at least: with
this quick test I only get strands of +1 or -1, as expected:
from __future__ import print_function  # lets print() behave the same on Python 2 and 3
from Bio import SeqIO
genome = SeqIO.read("Homo_sapiens.GRCh37.68.chromosome.1.dat", "gb")
for f in genome.features:
    print(f.strand, f.location, f.qualifiers.get("gene"))
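On a whole chromosome, the printed list is too long to read through, so it helps to tally the strand values instead. This is a minimal sketch that uses the same `f.strand` attribute as the test above. The helper names `strand_summary` and `unexpected_strands` are hypothetical.

```python
from collections import Counter


def strand_summary(features):
    """Count how many features carry each strand value (+1, -1, None, ...)."""
    return Counter(f.strand for f in features)


def unexpected_strands(features):
    """Return only the strand values other than +1/-1, with their counts."""
    counts = strand_summary(features)
    return {strand: n for strand, n in counts.items() if strand not in (1, -1)}
```

For example, `unexpected_strands(genome.features)` should return an empty dict on a correctly parsed nucleotide GenBank file. A `None` key points to the problem described in this thread.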
Going back to Biopython 1.53 on my machine (which didn't accept a filename
in SeqIO and thus needs an explicit open), I get a parser warning:
UserWarning: Malformed LOCUS line found - is this correct?
LOCUS       1 249250621 bp DNA HTG 14-JUL-2012
You should have seen this warning on your machine. Did you?
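Warnings like this are easy to miss in a long run, or they may already have been shown once and then suppressed by Python's default filter. This sketch uses only the standard `warnings` module to capture them explicitly. The wrapper name `collect_warnings` is hypothetical.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, list of warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default filter from hiding repeats of a warning
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]
```

With an explicitly opened file, as Biopython 1.53 requires, you might call `record, messages = collect_warnings(SeqIO.read, handle, "gb")`. You can then look for any message containing "LOCUS".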
This meant the sequence wasn't treated as DNA or RNA (it was given an
unspecified alphabet instead), and as a result the strand wasn't set to +1
but left as None (which would normally only happen with proteins).
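The rule described here can be written as a toy model. This is not Biopython's code: the function `default_strand` and the alphabet labels it accepts are hypothetical simplifications.

```python
# Hypothetical labels for alphabets that count as nucleotide sequences
NUCLEOTIDE_ALPHABETS = {"DNA", "RNA", "nucleotide"}


def default_strand(alphabet, is_complement):
    """Toy model: only nucleotide sequences get a +1/-1 strand, others get None."""
    if alphabet not in NUCLEOTIDE_ALPHABETS:
        # e.g. protein, or the unspecified alphabet from the malformed LOCUS line
        return None
    return -1 if is_complement else 1
```

Under this model, a file whose LOCUS line isn't recognised ends up with `None` for every feature, whatever its location says. That matches the symptom reported.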
At some point the LOCUS line handling was updated, so it now
does recognise this as a nucleotide sequence.
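The newer parser's exact logic isn't covered in this thread. The sketch below only illustrates how a whitespace-tolerant reading can still pick out the molecule type from this LOCUS line. It splits on whitespace rather than relying on fixed columns, and the function `guess_alphabet` is hypothetical, not Biopython's implementation.

```python
def guess_alphabet(locus_line):
    """Guess the sequence type from a GenBank LOCUS line by token, not column."""
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % locus_line)
    if "aa" in tokens:
        return "protein"
    if "bp" in tokens:
        idx = tokens.index("bp")
        # molecule type (e.g. DNA, mRNA) usually follows the 'bp' unit
        mol = tokens[idx + 1].upper() if idx + 1 < len(tokens) else ""
        if "DNA" in mol:
            return "DNA"
        if "RNA" in mol:
            return "RNA"
        return "nucleotide"
    return "unknown"
```

For the Ensembl line quoted above, `guess_alphabet("LOCUS       1 249250621 bp DNA HTG 14-JUL-2012")` finds `DNA`, even though the line is laid out differently from a typical NCBI LOCUS line.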
Peter
    
    