[Biopython] Problem with parsing strand in Homo_sapiens.GRCh37.68 genbank files

Susan Wilson smwilson at hpc.unm.edu
Tue Aug 14 14:10:53 UTC 2012


Hi,

I am parsing the Ensembl GenBank (.dat) files with Biopython. My problem 
is that none of the SeqFeature.strand values come back as the plus 
strand (value == 1).

The commands below are slightly simplified. (For instance, I have left 
out the opening and closing of fout.) I read 
Homo_sapiens.GRCh37.68.chromosome.1.dat in with SeqIO.read. The file 
written by command [13] contains only "-1" and "None", never "1". Is 
there a bug in the parser? Or am I making a mistake of some sort?

Thanks.
Susan

In [10]: genome
Out[10]: 
SeqRecord(seq=Seq('NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNN', 
Alphabet()), id='1GRCh37', name='1', description='Homo sapiens 
chromosome 1 GRCh37 full sequence 1..249250621 reannotated via EnsEMBL', 
dbxrefs=[])

In [11]: len(genome)
Out[11]: 249250621

In [12]: len(genome.features)
Out[12]: 109751

In [13]: for f in genome.features:
      ...:     fout.write(str(f.strand) + "~" + str(f.location) + \
      ...: "~" + str(f.qualifiers.get('gene')) + "\n")
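
A quicker way to check the same thing, without writing a file, might be to tally the strand values per feature type. The following is only a sketch under some assumptions: the helper names (feature_strand, tally_strands, tally_file) are made up for illustration, the file is assumed to parse as "genbank", and the strand is looked up on f.location first and then on f.strand, since where it lives may depend on the Biopython version.

```python
from collections import Counter


def feature_strand(feature):
    """Return the strand of a feature (1, -1, 0 or None).

    Assumption: the strand may be stored on feature.location or
    directly on feature.strand, so both places are checked.
    """
    location = getattr(feature, "location", None)
    if location is not None and hasattr(location, "strand"):
        return location.strand
    return getattr(feature, "strand", None)


def tally_strands(features):
    """Count features for each (feature type, strand) pair."""
    return Counter(
        (getattr(f, "type", None), feature_strand(f)) for f in features
    )


def tally_file(path, fmt="genbank"):
    """Hypothetical helper: read one record and tally its strands."""
    from Bio import SeqIO  # imported here so the helpers above work alone

    record = SeqIO.read(path, fmt)
    return tally_strands(record.features)
```

Calling tally_file("Homo_sapiens.GRCh37.68.chromosome.1.dat") and printing the items of the result should show at a glance whether any feature type ever has strand 1, or whether only -1 and None occur.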



