[Bioperl-l] SeqIO-based parser for Vector NTI sequence files
Scott Markel
SMarkel at accelrys.com
Mon Feb 9 18:34:51 UTC 2009
Malcolm,
It looks like Vector NTI puts features into COMMENT lines rather than leveraging
the DDBJ/EMBL/GenBank Feature table syntax. I'd like to treat these features the
same way I treat other features, hence my interest in parsing them.
My only example file is from a customer so the following snippets have been tweaked
a bit. My replacements are in angle brackets: <...>.
COMMENT <date here> <user name here> wrote:
<user comment here>
COMMENT This file is created by Vector NTI
http://www.informaxinc.com/
COMMENT ORIGDB|GenBank
COMMENT VNTDATE|<integers here>|
COMMENT VNTDBDATE|<integers here>|
COMMENT LSOWNER|
COMMENT VNTNAME|<string here>|
COMMENT VNTAUTHORNAME|<user name here>|
COMMENT VNTREPLTYPE|<string here>
COMMENT VNTEXTCHREPL|Animal/Other Eukaryotic
COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT (SXF
COMMENT (CGexDoc "<string here>" 0 7616
COMMENT (CDBMol 0 0 1 1 1 0 0 0 0 "" "" 0 0 0 0 (CObList) (CObList) (CObList)
COMMENT (CObList) -1 "")
COMMENT (CDocSetData 1 1 0 1 0 1 "MAIN" 1 1 1 1 1 0 1 1 1 0 10 10 4294967295 50 0
COMMENT 1 0 (CHomObj 1 0 0 3 100) (CWordArray 23) (CWordArray)
COMMENT (CStringList <multiple quoted strings here>)
COMMENT (CStringList <multiple quoted strings here>) (CStringList <multiple quoted strings here>)
COMMENT (CObList
COMMENT #0=(COligo <quoted string here> <quoted string here>
COMMENT "Tm: 52.1C Length: 16mer GC: 56.3%" 0 (CStringList) 0)
COMMENT #1=(COligo <quoted string here> <quoted string here>
COMMENT "Tm: 56.8C Length: 18mer GC: 61.1%" 0 (CStringList) 0)
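As a rough illustration of how the flat part of the snippet above might be read, here is a minimal Python sketch. It assumes the metadata lines always have the form KEY|value or KEY|value|, and that the oligo summary strings always follow the "Tm: ...C Length: ...mer GC: ...%" pattern seen here. The function names and dictionary keys are hypothetical; nothing here is part of BioPerl or of Vector NTI itself.

```python
import re

# Pattern for the oligo summary strings seen in the example file (assumed stable).
OLIGO_RE = re.compile(r"Tm: ([\d.]+)C Length: (\d+)mer GC: ([\d.]+)%")


def parse_vnti_fields(lines):
    """Collect KEY|value pairs from COMMENT lines (hypothetical helper)."""
    fields = {}
    for line in lines:
        if not line.startswith("COMMENT"):
            continue
        body = line[len("COMMENT"):].strip()
        # Accept KEY|value and KEY|value| forms; anything else is skipped.
        match = re.fullmatch(r"([A-Za-z_]+)\|(.*?)\|?", body)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def parse_oligo_summary(text):
    """Extract Tm, length and GC content from an oligo summary string."""
    match = OLIGO_RE.search(text)
    if match is None:
        return None
    return {
        "tm_celsius": float(match.group(1)),
        "length": int(match.group(2)),
        "gc_percent": float(match.group(3)),
    }
```

Free-text lines such as "This file is created by Vector NTI" contain no pipe character and are ignored by the first function.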
There are also some hierarchical sections.
COMMENT (CObList) (CObList) (CObList)
COMMENT (CTextView 0
COMMENT #120=(CGroupPar (CParagraph 0 (0 0) 1 2 0 0 180)
COMMENT (CObjectList
COMMENT #121=(CRefLinePar
COMMENT (CLinePar (CParagraph 0 (0 0) 0 2 0 1 233) <quoted string here> 2) 5
COMMENT "" 0 4)
COMMENT #122=(CFolderPar
COMMENT (CGroupPar (CParagraph 1 (0 0) 1 1 0 0 178)
COMMENT (CObjectList
COMMENT #123=(CLinePar (CParagraph 0 (0 0) 1 2 1 0 180)
COMMENT <quoted string here> 1)
COMMENT #124=(CLinePar (CParagraph 0 (0 0) 1 2 1 0 180)
COMMENT <quoted string here> 1)
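The hierarchical sections look like parenthesized, Lisp-style lists, so one possible approach is to join the COMMENT lines and read them into nested lists. The Python sketch below assumes that quoted strings never contain a double quote and never span lines, and it keeps labels such as #121= as ordinary atoms. It also tolerates unclosed parentheses, since the snippets above are truncated. All names are hypothetical, and this is not an existing BioPerl parser.

```python
import re

# A token is a quoted string, a single parenthesis, or a run of other characters.
TOKEN_RE = re.compile(r'"[^"]*"|[()]|[^\s()"]+')


def comment_text(lines):
    """Strip the COMMENT prefix and join the remaining text with spaces."""
    parts = []
    for line in lines:
        if line.startswith("COMMENT"):
            parts.append(line[len("COMMENT"):].strip())
    return " ".join(parts)


def parse_nested(text):
    """Turn parenthesized text into nested Python lists of strings."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for token in TOKEN_RE.findall(text):
        if token == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == ")":
            if len(stack) > 1:  # ignore a stray closing parenthesis
                stack.pop()
        elif token.startswith('"'):
            stack[-1].append(token[1:-1])  # drop the surrounding quotes
        else:
            stack[-1].append(token)
    return root
```

Numbers are left as strings here; converting them, or mapping class names such as CLinePar to feature objects, would be a separate step.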
Scott
> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers.org]
> Sent: Monday, 09 February 2009 6:51 AM
> To: Scott Markel; 'bioperl-ml'
> Subject: RE: [Bioperl-l] SeqIO-based parser for Vector NTI sequence files
>
> Scott,
>
> What do you expect to extract from the COMMENT lines?
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Scott Markel
> Sent: Tuesday, October 21, 2008 3:49 PM
> To: bioperl-ml
> Cc: smarkel at accelrys.com
> Subject: [Bioperl-l] SeqIO-based parser for Vector NTI sequence files
>
> I'm looking for a BioPerl-related solution to parsing Vector NTI sequence
> files. The genbank.pm parser will work, but it doesn't parse the COMMENT
> lines beyond grabbing the simple string value, so it misses all of the
> added information in those lines.
>
> If you know of any existing code, I'd be interested in hearing about it.
> I checked BioPerl, BioJava, and EMBOSS documentation.
> I also checked the Invitrogen web site.
>
> Scott
>
> --
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect email: smarkel at accelrys.com
> Accelrys (SciTegic R&D) mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100 voice: +1 858 799 5603
> San Diego, CA 92121 fax: +1 858 799 5222
> USA web: http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Board of Directors: International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l