[Biojava-l] Reading pfam/Stockholm format

Michael Muratet michael.muratet@invitrogen.com
Wed, 12 Sep 2001 09:25:40 -0500


Greetings All:

I've been trying to read Pfam files to extract the protein accessions for
each family. This almost works with the EMBL format readers, but the
Stockholm-format alignment in the fifth or sixth family throws an
exception:

     [java] Exception in thread "main" rethrown as org.biojava.bio.BioError: FIXME
     [java] 	at org.biojava.bio.seq.io.EmblProcessor.addSequenceProperty(EmblProcessor.java:111)
     [java] 	at org.biojava.bio.seq.io.EmblLikeFormat.readSequence(EmblLikeFormat.java:128)
     [java] org.biojava.bio.BioException: bad locator: GAKRSLRAELKQRLRAISAE
GAKRSLRAELKQRLRAISAE.ERLRCQRLLTQKVIAHRQYQKSQ-.--.-R-ISIFLSMPDEIET-EEIIKDIFQQGKV-CFIPRYRLQSNHMDMVKLASADEISSLPKT.......SWNIHQPSESDTREEALAT-.........GGLDLIFMPGLGFDRN-GNRLGRGRGYYDTYLQRCL-Q.QQGAKPYTIALAFREQICPQ-.VPVDD.-.TDVSVDEVLYV
     [java] 	at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:99)
     [java] 	at resgen.biotools.util.EmblFileReader.next(EmblFileReader.java:72)
     [java] 	at org.biojava.bio.seq.io.FeatureTableParser.parseLocation(FeatureTableParser.java:209)
     [java] 	at org.biojava.bio.seq.io.FeatureTableParser.featureData(FeatureTableParser.java:91)
     [java] 	at resgen.biotools.util.EmblFileReader.main(EmblFileReader.java:121)
     [java] 	at org.biojava.bio.seq.io.EmblProcessor.addSequenceProperty(EmblProcessor.java:99)
     [java] Java Result: 1

I can't see anything in the API that says it supports the Stockholm
format per se, although there appear to be some HMMER-related classes
that might work. Does anybody have any suggestions? Is there another
toolset I should investigate (the WashU tools, BioPerl, etc.)?
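If all that's needed is the protein accessions per family, one workaround is to skip the EMBL reader and scan the Stockholm markup directly. The sketch below uses only the Java standard library (no BioJava API). It assumes each family carries a `#=GF AC` line for the family accession, per-sequence `#=GS <seqname> AC <accession>` lines, and a `//` terminator, as Pfam's Stockholm files generally do. The class name `StockholmAccessions` and the toy input in `main` (including `PF99999`, `P00001`, `Q00002`) are hypothetical, for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects "#=GS ... AC ..." accessions per "#=GF AC" family.
public class StockholmAccessions {

    public static Map<String, List<String>> read(Reader in) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String family = null;
        List<String> accessions = new ArrayList<>();
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("#=GF")) {
                    // Layout assumed: #=GF <tag> <value>; keep only the AC tag
                    String[] f = line.trim().split("\\s+", 3);
                    if (f.length == 3 && f[1].equals("AC")) {
                        family = f[2].trim();
                    }
                } else if (line.startsWith("#=GS")) {
                    // Layout assumed: #=GS <seqname> <feature> <value>
                    String[] f = line.trim().split("\\s+", 4);
                    if (f.length == 4 && f[2].equals("AC")) {
                        accessions.add(f[3].trim());
                    }
                } else if (line.startsWith("//")) {
                    // End of one family: store what was collected and reset
                    if (family != null) {
                        result.put(family, accessions);
                    }
                    family = null;
                    accessions = new ArrayList<>();
                }
                // Alignment rows and other markup (#=GR, #=GC) are ignored
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy input with made-up accessions, for illustration only
        String sample = String.join("\n",
            "# STOCKHOLM 1.0",
            "#=GF ID   EXAMPLE",
            "#=GF AC   PF99999",
            "#=GS SEQ1_HUMAN/1-20 AC P00001",
            "#=GS SEQ2_MOUSE/5-24 AC Q00002",
            "SEQ1_HUMAN/1-20  GAKRSLRAELKQRLRAISAE",
            "SEQ2_MOUSE/5-24  GAKRSLRAELKQRLRAISAE",
            "//");
        System.out.println(read(new StringReader(sample))); // {PF99999=[P00001, Q00002]}
    }
}
```

Because the alignment rows are never parsed, gap characters like `.` and `-` in the sequence lines (the ones that tripped the EMBL locator parser above) don't matter here.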

Thanks.

Mike

-- 
Michael A. Muratet
Senior Software Engineer
Bioinformatics
ResGen, Invitrogen Corp.
(800) 533-4363 x74431
(256) 539-4086 FAX