[Bioperl-l] SeqIO-based parser for Vector NTI sequence files

Chris Fields cjfields at illinois.edu
Mon Feb 9 17:49:03 UTC 2009


I think the best short-term thing may be to wrap the genbank.pm parser  
and simply reparse/rework the relevant Bio::Annotation::Comment  
instance containing the COMMENT data.
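
A rough sketch of that short-term approach follows. It assumes each Vector NTI COMMENT line carries one pipe-delimited field such as VNTNAME|pUC19|; that layout is an assumption, so check it against your own files. The helper names parse_vnti_comment and vnti_fields_for_file are made up for illustration, and Bio::SeqIO is loaded lazily so the comment parsing can be tried on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: turn COMMENT text into { KEY => [values] }.
# Assumes one "KEY|value|" field per line; other lines are ignored.
sub parse_vnti_comment {
    my ($text) = @_;
    my %fields;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*([A-Z][A-Z0-9_]*)\|(.*)$/;
        my ($key, $value) = ($1, $2);
        $value =~ s/\|?\s*$//;    # drop trailing pipe and whitespace
        push @{ $fields{$key} }, $value;
    }
    return \%fields;
}

# Read records with the stock genbank parser, then reparse every
# Bio::Annotation::Comment attached to each sequence.
sub vnti_fields_for_file {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded lazily; needs BioPerl installed
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my @records;
    while (my $seq = $in->next_seq) {
        my %merged;
        for my $comment ($seq->annotation->get_Annotations('comment')) {
            my $fields = parse_vnti_comment($comment->text);
            push @{ $merged{$_} }, @{ $fields->{$_} } for keys %$fields;
        }
        push @records, { id => $seq->display_id, fields => \%merged };
    }
    return \@records;
}
```

Calling vnti_fields_for_file('plasmid.gb') (file name hypothetical) would return one hash per record, with the sequence ID and whatever fields the COMMENT lines yielded.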

Long-term, I would like to have an XML-like parser that just takes the  
data and passes it in to a handler (so you could customize what  
happens to data, create objects, load databases, etc).  Along these  
lines I've been (very slowly) reworking GenBank/EMBL/UniProt parsing  
so it generically parses data and passes it on to a relevant handler  
instance (in this case it just generates a Bio::Seq::RichSeq as the  
regular parser does).
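
To make the handler idea concrete, here is a toy sketch (not the gbdriver code): a driver that only splits a flat file into (tag, text) chunks and hands each one to a callback, so that creating objects or loading a database is entirely up to the handler. The name parse_with_handler and the chunk layout are invented for illustration.

```perl
use strict;
use warnings;

# Toy driver: split GenBank-style text into chunks keyed by the
# keyword in the first column and pass each chunk to a handler.
# It knows nothing about what the handler does with the data.
sub parse_with_handler {
    my ($fh, $handler) = @_;
    my ($tag, $text);
    while (my $line = <$fh>) {
        chomp $line;
        last if $line =~ m{^//};                    # end of record
        if ($line =~ /^([A-Z]+)\s*(.*)$/) {         # new top-level keyword
            $handler->($tag, $text) if defined $tag;
            ($tag, $text) = ($1, $2);
        }
        elsif (defined $tag && $line =~ /^\s+(\S.*)$/) {
            $text .= "\n$1";                        # continuation line
        }
    }
    $handler->($tag, $text) if defined $tag;        # flush the last chunk
    return;
}

# Example handler: collect chunks in a hash instead of building objects.
my %seen;
my $record = join "\n",
    'LOCUS       TEST 10 bp DNA',
    'COMMENT     VNTNAME|pUC19|',
    'COMMENT     first line',
    '            second line',
    '//', '';
open my $fh, '<', \$record or die "cannot open in-memory file: $!";
parse_with_handler($fh, sub {
    my ($tag, $text) = @_;
    push @{ $seen{$tag} }, $text;
});
close $fh;
```

Swapping in a different callback changes what gets built without touching the parsing loop, which is the point of the split.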

It still needs a bit more work, though, particularly the internals.  
If you want to test the drivers out, the modules are in the last 1.6.0  
release as Bio::SeqIO::gbdriver/embldriver/swissdriver.

chris

On Feb 9, 2009, at 8:50 AM, Cook, Malcolm wrote:

> Scott,
>
> What do you expect to extract from the COMMENT lines?
>
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Scott Markel
> Sent: Tuesday, October 21, 2008 3:49 PM
> To: bioperl-ml
> Cc: smarkel at accelrys.com
> Subject: [Bioperl-l] SeqIO-based parser for Vector NTI sequence files
>
> I'm looking for a BioPerl-related solution to parsing Vector NTI  
> sequence files.  The genbank.pm parser will work, but it doesn't  
> parse the COMMENT lines beyond grabbing the simple string value, so  
> it misses all of the added information in those lines.
>
> If you know of any existing code, I'd be interested in hearing  
> about it.  I checked BioPerl, BioJava, and EMBOSS documentation.
> I also checked the Invitrogen web site.
>
> Scott
>
> --
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
>
> http://www.linkedin.com/in/smarkel
> Board of Directors: International Society for Computational Biology
> Co-chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



