[Bioperl-l] Tag in feature not written correctly

Michael Muratet mam@torchconcepts.com
Thu, 22 Aug 2002 12:38:21 -0500


Greetings All

I am working on an application with bioperl 1.0.1 that reads records in GenBank or EMBL format, extracts the CDS data, processes it, and writes a subset of the data, with new features, to a new file in EMBL format. Subsequent processing of this new file crashes because the /product tag in the CDS feature is not written correctly. Here's an example:

Starting with....

     CDS             join(97289..99104,99214..99330,99429..99604,
                     100135..100280,100599..100617)
                     /gene="OSJNBa0082M15.18"
                     /codon_start=1
                     /product="putative
                     UDP-N-acetylmuramoylananyl-D-glutamate-2,6
                     -diaminopimelate ligase"
                     /protein_id="AAK43503.1"
                     /db_xref="GI:13876527"
                     /translation="MATAPLAFHLPFPFPSASRPPPRLLPPSRRPPAARLAATRRFRP
                     PTADDEPPEAAEDSSHGLNRYDQLTRHVERARRRQQAEQPEITPDHPLFSSPPSSGEA
                     GSYDPDDEFFDEIDRAIAEKREEFTRRGLIKPSAPAPSQPEEEDGLADELSPEEVIDL
                     DEIRRLQGLSVVSLADEEDEEANGGGGGVDYGDDGVPLDDDGEVFDVADEVGLEGARV
                     RYPAFRMTLAELLDESKLVPVAVTGDQDVALAGVQRDASLVAAGDLYVCVGEEGLAGL
                     TEADKRGAVAVVADQTVDIEGTLACRALVIVDDITAALRMLPACLYRRPSKDMAVIGV
                     AGTDGVTTTAHLVRAMYEAMGVRTGMVGVLGAYAFGNNKLDAQPDASGDPIAVQRLMA
                     TMLYNGAEAALLEATTDGMPSSGVDSEIDYDIAVLTNVRHAGDEAGMTYEEYMNSMAS
                     LFSRMVDPERHRKVVNIDDPSAPFFAAQGGQDVPVVTYSFENKKADVHTLKYQLSLFE
                     TEVLVQTPHGILEISSGLLGRDNIYNILASVAVGVAVGAPLEDIVKGIEEVDAIPGRC
                     ELIDEEQAFGVIVDHARTPESLSRLLDGVKELGPRRIVTVIGCCGERERGKRPVMTKV
                     AAEKSDVVMLTSDNPANEDPLDILDDMLAGVGWTMEEYLKHGTNDYYPPLPNGHRIFL
                     HDIRRVAVRAAVAMGEQGDVVVITGKGNDTYQIEVDKKEFFDDREECREALQYVDQLH
                     RAGIDTSEFPWRLPESH"

The output looks like...

/gene="OSJNBa0082M15.18"
roduct="putativeUDP-N-acetylmuramoylananyl-D-glutamate-2,6
-diaminopimelate ligase"
/proteinId="AAK43503.1"
/ec_number="NA"
/translation="MATAPLAFHLPFPFPSASRPPPRLLPPSRRPPAARLAATRRFRPP
TADDEPPEAAEDSSHGLNRYDQLTRHVERARRRQQAEQPEITPDHPLFSSPPSSGEAGS
YDPDDEFFDEIDRAIAEKREEFTRRGLIKPSAPAPSQPEEEDGLADELSPEEVIDLDEI
RRLQGLSVVSLADEEDEEANGGGGGVDYGDDGVPLDDDGEVFDVADEVGLEGARVRYPA
FRMTLAELLDESKLVPVAVTGDQDVALAGVQRDASLVAAGDLYVCVGEEGLAGLTEADK
RGAVAVVADQTVDIEGTLACRALVIVDDITAALRMLPACLYRRPSKDMAVIGVAGTDGV
TTTAHLVRAMYEAMGVRTGMVGVLGAYAFGNNKLDAQPDASGDPIAVQRLMATMLYNGA
EAALLEATTDGMPSSGVDSEIDYDIAVLTNVRHAGDEAGMTYEEYMNSMASLFSRMVDP
ERHRKVVNIDDPSAPFFAAQGGQDVPVVTYSFENKKADVHTLKYQLSLFETEVLVQTPH
GILEISSGLLGRDNIYNILASVAVGVAVGAPLEDIVKGIEEVDAIPGRCELIDEEQAFG
VIVDHARTPESLSRLLDGVKELGPRRIVTVIGCCGERERGKRPVMTKVAAEKSDVVMLT
SDNPANEDPLDILDDMLAGVGWTMEEYLKHGTNDYYPPLPNGHRIFLHDIRRVAVRAAV
AMGEQGDVVVITGKGNDTYQIEVDKKEFFDDREECREALQYVDQLHRAGIDTSEFPWRL
PESH"

The program that reads this file back in then throws:

------------- EXCEPTION  -------------
MSG: Can't see new qualifier in: roduct="putativeUDP-N-acetylmuramoylananyl-D-glutamate-2,6
from:

Note that the product tag is missing "/p".
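
For anyone who wants to check a written file for the same symptom before the reader chokes on it, here is a rough sketch in Python. It is not bioperl code and not how bioperl parses features. It assumes the input is only the qualifier lines of one feature, with any "FT" prefix and leading whitespace already stripped, and it treats an odd number of double quotes on a line as opening or closing a multi-line value. The function name find_broken_qualifiers is made up for this example.

```python
def find_broken_qualifiers(lines):
    """Return (line_number, text) for lines that should start a qualifier but do not.

    Assumption: `lines` holds only the qualifier lines of a single feature,
    not the feature key or location lines.
    """
    problems = []
    in_quoted_value = False  # True while a quoted value continues on later lines
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue
        # Outside a quoted value, every line must begin a new "/qualifier".
        if not in_quoted_value and not text.startswith("/"):
            problems.append((number, text))
        # An odd number of double quotes opens or closes a multi-line value.
        if text.count('"') % 2 == 1:
            in_quoted_value = not in_quoted_value
    return problems


# The first lines of the bad output from the example above.
sample = [
    '/gene="OSJNBa0082M15.18"',
    'roduct="putativeUDP-N-acetylmuramoylananyl-D-glutamate-2,6',
    '-diaminopimelate ligase"',
    '/proteinId="AAK43503.1"',
    '/ec_number="NA"',
]

for number, text in find_broken_qualifiers(sample):
    print(f"line {number}: expected '/' at start of qualifier: {text}")
```

On the sample it flags only the second line, the one that starts with roduct=. The continuation line is not flagged because it falls inside the quoted value.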

I've looked in the bioperl source, but I can't locate the modules that read or write tags. If somebody could point me towards those, I'll chase it. Better yet, if anyone recognizes a silly error I've made, pass it along.

Thanks.

Mike