[Biopython] GenBank parser

Timothy Wu 2huggie at gmail.com
Wed Mar 16 08:26:44 UTC 2011


Hi,

I'm using Biopython to parse human genome files with code like this:

        from Bio import SeqIO

        for seq_record in SeqIO.parse(fd, "genbank"):
            pass  # do something with seq_record

However, parsing fails with the following error:

Traceback (most recent call last):
  File "./buildSyn.py", line 26, in <module>
    main()
  File "./buildSyn.py", line 19, in main
    gene2SynMapping, syn2GeneMapping = mapper.getMappingDicts(files)
  File "/home/thw/MyPythonPackage/frameworks/BioProg/idmapping/idmapper/human_genome_id_mapper.py", line 29, in getMappingDicts
    self.parseAndGetMapping(fd, gene2syn)
  File "/home/thw/MyPythonPackage/frameworks/BioProg/idmapping/idmapper/human_genome_id_mapper.py", line 74, in parseAndGetMapping
    for seq_record in SeqIO.parse(fd, "genbank"):
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 525, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 437, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 420, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 392, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/Scanner.py", line 344, in _feed_feature_table
    consumer.location(location_string)
  File "/usr/lib/pymodules/python2.6/Bio/GenBank/__init__.py", line 975, in location
    raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: 958574^958575..958886

The offending features in the GenBank file look like this one:

    CDS             958574^958575..958772
                     /gene="CSH2"
                     /gene_synonym="CS-2; CSB; hCS-B"
                     /exception="unclassified translation discrepancy"
                     /note="placental lactogen; chorionic somatomammotropin B;
                     Derived by automated computational analysis using gene
                     prediction method: Curated Genomic."
                     /codon_start=1
                     /product="chorionic somatomammotropin hormone 2 isoform 3"
                     /protein_id="NP_072171.1"
                     /db_xref="GI:12408694"
                     /db_xref="CCDS:CCDS42368.1"
                     /db_xref="GeneID:1443"
                     /db_xref="HGNC:2441"
                     /db_xref="MIM:118820"
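The traceback shows that the parsed records feed gene-to-synonym dictionaries (`getMappingDicts`, `gene2syn`). As a minimal sketch of that step, assuming the synonyms always arrive as a semicolon-separated `/gene_synonym` value like the one above, the qualifier can be split and stored in both directions. `add_synonyms` and `build_mapping_dicts` are hypothetical names, not the poster's actual code; the sketch relies on Biopython exposing each qualifier in `feature.qualifiers` as a list of strings.

```python
from collections import defaultdict


def add_synonyms(gene, synonym_field, gene2syn, syn2gene):
    """Record synonyms from a /gene_synonym value such as "CS-2; CSB; hCS-B"."""
    # Split on ";" and drop surrounding whitespace and empty pieces.
    synonyms = [s.strip() for s in synonym_field.split(";") if s.strip()]
    gene2syn[gene].update(synonyms)
    for syn in synonyms:
        syn2gene[syn].add(gene)


def build_mapping_dicts(records):
    """Hypothetical helper: collect gene <-> synonym sets from parsed SeqRecords."""
    gene2syn = defaultdict(set)
    syn2gene = defaultdict(set)
    for record in records:
        for feature in record.features:
            # Biopython stores each qualifier value as a list of strings.
            quals = feature.qualifiers
            for gene in quals.get("gene", []):
                for field in quals.get("gene_synonym", []):
                    add_synonyms(gene, field, gene2syn, syn2gene)
    return gene2syn, syn2gene
```

Sets are used so that the same gene appearing on several features (for example a gene and its CDS) does not produce duplicate synonyms.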

This isn't the first such location in this file. For the earlier ones I
manually deleted the part equivalent to "^958575" from the location, and
parsing then worked.
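The manual edit described above can be automated before the text reaches the parser. This is a minimal sketch, not a Biopython feature: it assumes every problematic location has the form `A^B..C`, rewrites it to `A..C`, and assumes that pattern never appears inside qualifier text. It discards the information carried by the caret, which is only acceptable because the location data isn't needed here. `strip_caret_ranges` and `parse_cleaned` are hypothetical helpers, and `parse_cleaned` reads the whole file into memory, which may be costly for a full chromosome.

```python
import io
import re

# Matches "A^B" only when it is immediately followed by ".."
# (e.g. "958574^958575..958772") and captures "A" so it can be kept.
_CARET_RANGE = re.compile(r"(\d+)\^\d+(?=\.\.)")


def strip_caret_ranges(genbank_text):
    """Return genbank_text with every "A^B..C" location rewritten to "A..C"."""
    return _CARET_RANGE.sub(r"\1", genbank_text)


def parse_cleaned(path):
    """Hypothetical helper: clean the file in memory, then hand it to SeqIO."""
    # Imported here so strip_caret_ranges can be used without Biopython installed.
    from Bio import SeqIO

    with open(path) as handle:
        cleaned = strip_caret_ranges(handle.read())
    # SeqIO.parse accepts any text handle, including an in-memory StringIO.
    return SeqIO.parse(io.StringIO(cleaned), "genbank")
```

A plain site such as `123^124` (no following `..`) is left untouched by the pattern.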

Is there something I can do on the Biopython side? For now I edit the
GenBank file by hand instead, since I won't need the location information.
Also, I'm not sure what the caret is supposed to represent.

Thanks for your attention.

Timothy


