[Biojava-l] need help for SimpleSequenceBuilder class

Cox, Greg gcox@netgenics.com
Mon, 23 Jul 2001 08:44:31 -0400


Okay, this has already been fixed for RefSeq files.  The demo TestRefSeqPrt
has the code for parsing these files, and I've attached the relevant
snippet. "**" marks what should change.  Let me know if you run into any
problems with RefSeq; you get to be tester #1.

Greg

SequenceFormat gFormat = new GenbankFormat();
BufferedReader gReader = new BufferedReader(
    new InputStreamReader(new FileInputStream(genbankFile)));
**  SequenceBuilderFactory sbFact =
**      new ProteinRefSeqProcessor.Factory(SimpleSequenceBuilder.FACTORY);
Alphabet alpha = ProteinTools.getTAlphabet();
SymbolParser rParser = alpha.getParser("token");
SequenceIterator seqI =
    new StreamReader(gReader, gFormat, rParser, sbFact);
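
For anyone wondering why swapping the builder factory matters, here is a
minimal, self-contained sketch of the general idea. It does not use BioJava
and is not how ProteinRefSeqProcessor is actually implemented; every type in
it (StrandedFeatureSketch, AlphabetKind, Template, Builder, SimpleBuilder,
UnstrandedFilter) is a hypothetical stand-in. It assumes a parser that always
emits stranded feature templates, a builder that rejects them for protein
sequences, and a wrapper that drops the strand before delegating.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, BioJava-free sketch: none of these types exist in BioJava.
class StrandedFeatureSketch {

    enum AlphabetKind { DNA, PROTEIN }

    /** A feature description; strand is null when the feature is unstranded. */
    static final class Template {
        final String type;
        final int start;
        final int end;
        final Character strand;

        Template(String type, int start, int end, Character strand) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }
    }

    /** Collects feature templates, then realizes them for a given alphabet. */
    interface Builder {
        void addFeature(Template t);
        List<String> build(AlphabetKind kind);
    }

    /** Plain builder: refuses to realize a stranded feature on a protein. */
    static final class SimpleBuilder implements Builder {
        private final List<Template> templates = new ArrayList<>();

        @Override
        public void addFeature(Template t) {
            templates.add(t);
        }

        @Override
        public List<String> build(AlphabetKind kind) {
            List<String> features = new ArrayList<>();
            for (Template t : templates) {
                if (t.strand != null && kind == AlphabetKind.PROTEIN) {
                    throw new IllegalStateException(
                        "Can not create a stranded feature within a sequence of type " + kind);
                }
                String suffix = (t.strand == null) ? "" : " (" + t.strand + ")";
                features.add(t.type + " " + t.start + ".." + t.end + suffix);
            }
            return features;
        }
    }

    /** Wrapper: strips the strand from each template, then delegates. */
    static final class UnstrandedFilter implements Builder {
        private final Builder delegate;

        UnstrandedFilter(Builder delegate) {
            this.delegate = delegate;
        }

        @Override
        public void addFeature(Template t) {
            delegate.addFeature(new Template(t.type, t.start, t.end, null));
        }

        @Override
        public List<String> build(AlphabetKind kind) {
            return delegate.build(kind);
        }
    }

    /** Stand-in for a parser that emits a stranded template by default. */
    static List<String> parse(Builder builder, AlphabetKind kind) {
        builder.addFeature(new Template("CDS", 1, 120, '+'));
        return builder.build(kind);
    }

    public static void main(String[] args) {
        System.out.println(parse(new SimpleBuilder(), AlphabetKind.DNA));
        System.out.println(
            parse(new UnstrandedFilter(new SimpleBuilder()), AlphabetKind.PROTEIN));
        try {
            parse(new SimpleBuilder(), AlphabetKind.PROTEIN);
        } catch (IllegalStateException ex) {
            System.out.println("failed: " + ex.getMessage());
        }
    }
}
```

The parsing code stays the same in all three calls; only the builder handed
to it changes, which is the same shape as the "**" lines above.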

> -----Original Message-----
> From: Cox, Greg [mailto:gcox@netgenics.com]
> Sent: Monday, July 23, 2001 8:06 AM
> To: 'xling@tularik.com'; td2@sanger.ac.uk
> Cc: biojava-l@biojava.org
> Subject: RE: [Biojava-l] need help for SimpleSequenceBuilder class
> 
> 
> Hi Bruce,
>     I've done a lot with Genbank files, and the problem isn't actually in
> SimpleSequenceBuilder; that's just the symptom.  The feature table renderer
> builds a stranded feature by default, and that's not acceptable for
> proteins.  I'll look into your case and try to get a fix into CVS later
> today.
>  
> Greg
> 
> -----Original Message-----
> From: Bruce Ling [mailto:xling@tularik.com]
> Sent: Sunday, July 22, 2001 11:17 AM
> To: td2@sanger.ac.uk
> Cc: biojava-l@biojava.org
> Subject: [Biojava-l] need help for SimpleSequenceBuilder class
> 
> 
> Hi, Thomas,
>  
> The docs list you as the author of the SimpleSequenceBuilder class, so I
> am asking you for help with the following problem.
>  
> I am trying to use the BioJava GenbankFormat class with the following
> code:
>  
>  {
>    SequenceFormat gFormat = new GenbankFormat();
>    SequenceBuilderFactory sbFact =
>      new GenbankProcessor.Factory(SimpleSequenceBuilder.FACTORY);
>    //Alphabet alpha = DNATools.getDNA();
>    // this following line does not work for protein, need more work to
>    // figure out the library
>    Alphabet alpha = ProteinTools.getAlphabet();
>    SymbolParser rParser = alpha.getParser("token");
>    seqI =
>      new StreamReader(gReader, gFormat, rParser, sbFact);
>  }
>  
> See the commented-out line: if I use a DNA Genbank file such as the
> sample in the demo, it works fine.  But if I use the above code with the
> PROTEIN alphabet to parse a protein record in Genbank format, such as
> http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=NP_005154&form=6&db=p&Dopt=g
> it throws the exception shown at the end of this email.
>  
> I have traced the problem to the TemplateWithChildren handling in the
> SimpleSequenceBuilder class.  By default it seems to assume a DNA Genbank
> record, which is why it tries to create a stranded feature, something a
> protein record does not have.
>  
>    public Sequence makeSequence() {
>      SymbolList symbols = slBuilder.makeSymbolList();
>      Sequence seq = new SimpleSequence(symbols, uri, name, annotation);
>      try {
>        for (Iterator i = rootFeatures.iterator(); i.hasNext(); ) {
>          TemplateWithChildren twc = (TemplateWithChildren) i.next();
>          Feature f = seq.createFeature(twc.template);
>          if (twc.children != null) {
>            makeChildFeatures(f, twc.children);
>          }
>        }
>      } catch (Exception ex) {
>        throw new BioError(ex, "Couldn't create feature");
>      }
>      return seq;
>    }
> 
> ==================================
> java Exceptions
> ==================================
> java.lang.reflect.InvocationTargetException:
> org.biojava.bio.symbol.IllegalAlphabetException: Can not create a stranded feature within a sequence of type PROTEIN
>  at org.biojava.bio.seq.impl.SimpleStrandedFeature.<init>(SimpleStrandedFeature.java:76)
>  at java.lang.reflect.Constructor.newInstance(Native Method)
>  at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:136)
> rethrown as org.biojava.bio.BioException: Couldn't realize feature
>  at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
>  at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:92)
>  at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:176)
>  at org.biojava.bio.seq.impl.SimpleSequence.createFeature(SimpleSequence.java:182)
>  at org.biojava.bio.seq.io.SimpleSequenceBuilder.makeSequence(SimpleSequenceBuilder.java:154)
> rethrown as org.biojava.bio.BioError: Couldn't create feature
>  at org.biojava.bio.seq.io.SimpleSequenceBuilder.makeSequence(SimpleSequenceBuilder.java:160)
>  at org.biojava.bio.seq.io.SequenceBuilderFilter.makeSequence(SequenceBuilderFilter.java:98)
>  at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
> 
> 
> Thanks.
> 
> Bruce Ling, Ph.D.
> Director, Bioinformatics
> Tularik, Inc -- http://www.tularik.com
> Email: bruce@tularik.com
> Phone: 650-825-7143
> fax: 1-435-804-4009 
> 
>  
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>