[Bioperl-l] genbank parsing of multiple 'function' tags within primary tag

galeb abu-ali abualiga2 at gmail.com
Thu Sep 8 15:39:20 UTC 2011


I guess I was not clear. Each feature has only one 'locus_tag' qualifier, but a
primary feature (e.g. 'CDS') can carry multiple 'function' qualifiers.

# gbk file
LOCUS       NC_011748            5154862 bp    DNA     circular BCT 15-MAY-2010

# example feature
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /db_xref="GeneID:7145846"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /inference="ab initio prediction:AMIGene:2.0"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH when
                     cloned and expressed in Escherichia coli"
                     /codon_start=1
                     /transl_table=11
                     /product="potassium/proton antiporter"
                     /protein_id="YP_002402372.1"
                     /db_xref="GI:218694705"
                     /db_xref="GeneID:7145846"
                     /translation="MDATTIISLFILGSILVTSSILLSSFSSRLGIPILVIFLAIGML
                     AGVDGVGGIPFDNYPFAYMVSNLALAIILLDGGMRTQASSFRVALGPALSLATLGVLI
                     TSGLTGMMAAWLFNLDLIEGLLIGAIVGSTDAAAVFSLLGGKGLNERVGSTLEIESGS
                     NDPMAVFLTITLIAMIQQHESSVSWMFVVDILQQFGLGIVIGLGGGYLLLQMINRIAL
                     PAGLYPLLALSGGILIFALTTALEGSGILAVYLCGFLLGNRPIRNRYGILQNFDGLAW
                     LAQIAMFLVLGLLVNPSDLLPIAIPALILSAWMIFFARPLSVFAGLLPFRGFNLRERV
                     FISWVGLRGAVPIILAVFPMMAGLENARLFFNVAFFVVLVSLLLQGTSLSWAAKKAKV
                     VVPPVGRPVSRVGLDIHPENPWEQFVYQLSADKWCVGAALRDLHMPKETRIAALFRDN
                     QLLHPTGSTRLREGDVLCVIGRERDLPALGKLFSQSPPVALDQRFFGDFILEASAKYA
                     DVALIYGLEDGREYRDKQQTLGEIVQQLLGAAPVVGDQVEFAGMIWTVAEKEDNEVLK
                     IGVRVAEEEAES"
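The way to link the 'function' values to a 'locus_tag' is to work one feature at a time: every qualifier on the CDS above belongs to that CDS, so collecting all of its /function values alongside its single /locus_tag keeps them together. In Bioperl, a feature object's get_tag_values('function') returns all values of the tag as a list. The snippet below is only a minimal, language-neutral sketch of that per-feature grouping, written in Python with the standard library. It is not Bioperl's API. The helper names parse_features and functions_by_locus_tag are hypothetical, and it assumes the fixed-column feature-table layout (keys in column 6, qualifiers in column 22).

```python
import re
from collections import defaultdict

# Hypothetical helpers for illustration; not part of Bioperl or Biopython.
# Assumed fixed-column GenBank feature-table layout: feature keys start in
# column 6, qualifiers and their wrapped continuations in column 22.
FEATURE_LINE = re.compile(r"^ {5}(\S+) +(\S.*)$")
QUALIFIER_LINE = re.compile(r"^ {21}/([^=\s]+)(?:=(.*))?$")
CONTINUATION_LINE = re.compile(r"^ {21}(\S.*)$")


def parse_features(text):
    """Return a list of features, each with a key, a location and qualifiers.

    Qualifiers map a name to a *list* of values, so repeated qualifiers such
    as /function are all kept, in the order they appear in the record.
    """
    features = []
    for line in text.splitlines():
        feat = FEATURE_LINE.match(line)
        qual = QUALIFIER_LINE.match(line)
        cont = CONTINUATION_LINE.match(line)
        if feat:
            features.append({"key": feat.group(1),
                             "location": feat.group(2), "raw": []})
        elif qual and features:
            # Start a new qualifier: [name, list of text pieces].
            features[-1]["raw"].append([qual.group(1), [qual.group(2) or ""]])
        elif cont and features and features[-1]["raw"]:
            # A wrapped line continues the most recent qualifier value.
            features[-1]["raw"][-1][1].append(cont.group(1))
    for feature in features:
        quals = defaultdict(list)
        for name, pieces in feature.pop("raw"):
            # Sequences wrap without spaces; free text wraps at word breaks.
            sep = "" if name == "translation" else " "
            quals[name].append(sep.join(pieces).strip('"'))
        feature["qualifiers"] = dict(quals)
    return features


def functions_by_locus_tag(features, feature_key="CDS"):
    """Map each locus_tag of one feature type to all of its /function values."""
    result = {}
    for feature in features:
        if feature["key"] != feature_key:
            continue
        quals = feature["qualifiers"]
        for tag in quals.get("locus_tag", []):
            result[tag] = quals.get("function", [])
    return result


# Trimmed copy of the example feature above.
EXAMPLE = """\
     gene            complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
     CDS             complement(1336169..1337905)
                     /gene="cvrA"
                     /locus_tag="EC55989_1287"
                     /function="7 : Transport and binding proteins"
                     /function="15.10 : Adaptations to atypical conditions"
                     /function="16.1 : Circulate"
                     /note="the Vibrio parahaemolyticus gene VP2867 was found
                     to be a potassium/proton antiporter; can rapidly extrude
                     potassium against a potassium gradient at alkaline pH when
                     cloned and expressed in Escherichia coli"
                     /product="potassium/proton antiporter"
"""

if __name__ == "__main__":
    table = functions_by_locus_tag(parse_features(EXAMPLE))
    for locus, functions in table.items():
        print(locus)
        for function in functions:
            print("   ", function)
```

Run as a script, this prints EC55989_1287 followed by its three function categories. The design choice that matters is storing qualifiers per feature as lists, so a repeated /function never gets separated from the locus_tag of the same CDS.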

On Thu, Sep 8, 2011 at 11:32 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> On Sep 8, 2011, at 10:27 AM, Peter Cock wrote:
>
> > On Thu, Sep 8, 2011 at 3:44 PM, galeb abu-ali <abualiga2 at gmail.com> wrote:
> >> Hi,
> >>
> >> I'm parsing a genbank file with Bio::SeqIO and am stuck on instances of
> >> multiple tags within a primary tag.  E.g., when there are several 'function'
> >> tag-values within a 'CDS' primary tag, I don't know how to link those
> >> 'function' tag-values to a particular 'locus_tag'.
> >
> > Do you have GenBank features with multiple locus_tag qualifiers?
> > That would be very unusual...
> >
> > Peter
>
> Agreed; in order to clarify what you mean, I think we would need to see the
> record in question to get a better idea of the problem.
>
> chris


