[Biopython-dev] GenBank structured comments

Peter Cock p.j.a.cock at googlemail.com
Wed Sep 9 14:27:27 UTC 2015


This sounds good - would you turn these into a Python dict?
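
A rough sketch of what I have in mind. Note that parse_structured_comment
is a made-up name for illustration, not an existing Biopython function, and
the sketch assumes the comment text arrives with its line breaks intact (as
in your proposed output), keyed by the prefix taken from the ##...-START##
marker:

```python
import re
from collections import OrderedDict

# Hypothetical helper for illustration only; not part of Biopython.
START = re.compile(r"^##(.+)-START##$")
END = re.compile(r"^##(.+)-END##$")


def parse_structured_comment(text):
    """Return {prefix: {tag: value}} for each ##X-START## ... ##X-END## block."""
    blocks = OrderedDict()
    current = None  # dict for the block being read, or None outside a block
    for line in text.splitlines():
        line = line.strip()
        start = START.match(line)
        if start:
            current = blocks.setdefault(start.group(1), OrderedDict())
        elif END.match(line):
            current = None
        elif current is not None and "::" in line:
            # Split on the first "::" only, so values may contain "::" too.
            tag, value = line.split("::", 1)
            current[tag.strip()] = value.strip()
    return blocks


example = """\
##FluData-START##
EPI_ISOLATE_ID        :: EPI_ISL_77637
NAME                  :: A/California/07/2009
TYPE                  :: H1N1
##FluData-END##"""

comment = parse_structured_comment(example)
print(comment["FluData"]["NAME"])  # prints A/California/07/2009
```

A "normal" free-text comment has no START marker, so this returns an empty
dict for it and the existing string handling could stay as it is.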

Peter

On Wed, Sep 9, 2015 at 2:56 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
> All,
>
> I noticed that Biopython, like the versions of BioPerl in CPAN, does not
> handle GenBank structured comments
> (http://www.ncbi.nlm.nih.gov/genbank/structuredcomment) in the ideal way.
> Here’s an example structured comment:
>
> COMMENT     ##FluData-START##
>            EPI_ISOLATE_ID        :: EPI_ISL_77637
>            NAME                  :: A/California/07/2009
>            TYPE                  :: H1N1
>            Segment_name          :: M'
>            HOST_AGE              :: 54
>            HOST_GENDER           :: F'
>            PASSAGE               :: M1/C1 (2009-04-24)
>            LOCATION              :: United States / California'
>            COLLECT_DATE          :: 09-Apr-2009
>            Lineage               :: A(H1N1)pdm09
>            RESIST_TO_ADAMANTANES :: Resistant'
>            RESIST_TO_OSELTAMIVIR :: Sensitive'
>            RESIST_TO_ZANAMVIR    :: Sensitive'
>            SPECIMEN_ID           :: H13596
>            SENDER_LAB            :: Naval Health Research Center'
>            SEQLAB_SAMPLE_ID      :: 2009712111
>            EPI_SEQUENCE_ID       :: EPI273604
>            ##FluData-END##
>
> Or here: http://www.ncbi.nlm.nih.gov/nuccore/291609868
>
> A table, with tag/value pairs. A fair number of bacterial genomes in GenBank
> use the structured comment to hold MIGS/MIMS data. The comment() method
> should return something like this, which is easily parsed:
>
> ##FluData-START##
> EPI_ISOLATE_ID        :: EPI_ISL_77637
> NAME                  :: A/California/07/2009
> TYPE                  :: H1N1
> Segment_name          :: M'
> HOST_AGE              :: 54
> HOST_GENDER           :: F'
> PASSAGE               :: M1/C1 (2009-04-24)
> LOCATION              :: United States / California'
> COLLECT_DATE          :: 09-Apr-2009
> Lineage               :: A(H1N1)pdm09
> RESIST_TO_ADAMANTANES :: Resistant'
> RESIST_TO_OSELTAMIVIR :: Sensitive'
> RESIST_TO_ZANAMVIR    :: Sensitive'
> SPECIMEN_ID           :: H13596
> SENDER_LAB            :: Naval Health Research Center'
> SEQLAB_SAMPLE_ID      :: 2009712111
> EPI_SEQUENCE_ID       :: EPI273604
> ##FluData-END##
>
> Rather than this, which is what it currently returns:
>
> ##FluData-START## EPI_ISOLATE_ID        :: EPI_ISL_77637 NAME
> :: A/California/07/2009 TYPE                  :: H1N1 Segment_name
> :: M' HOST_AGE              :: 54 HOST_GENDER           :: F' PASSAGE
> :: M1/C1 (2009-04-24) LOCATION              :: United States / California'
> COLLECT_DATE          :: 09-Apr-2009 Lineage               :: A(H1N1)pdm09
> RESIST_TO_ADAMANTANES :: Resistant' RESIST_TO_OSELTAMIVIR :: Sensitive'
> RESIST_TO_ZANAMVIR    :: Sensitive' SPECIMEN_ID           :: H13596
> SENDER_LAB            :: Naval Health Research Center' SEQLAB_SAMPLE_ID
> :: 2009712111 EPI_SEQUENCE_ID       :: EPI273604 ##FluData-END##
>
> Are there any objections to me putting in a pull request with this change? I
> made this same fix in BioPerl. Of course, if the comment is a “normal” one,
> it will be treated the same as it is treated now. In other words, the vast
> majority of comments stay the same.
>
> I’ll also add tests.
>
> Thanks again,
>
> Brian O.


