[Biojava-l] need help for SimpleSequenceBuilder class

Bruce Ling xling@tularik.com
Sun, 22 Jul 2001 08:16:37 -0700


Hi Thomas,

The documentation lists you as the author of the SimpleSequenceBuilder class,
so I am hoping you can help with the following problem.

I am using the BioJava GenbankFormat class with the following code:

 {
   SequenceFormat gFormat = new GenbankFormat();
   SequenceBuilderFactory sbFact =
     new GenbankProcessor.Factory(SimpleSequenceBuilder.FACTORY);
   // Alphabet alpha = DNATools.getDNA();
   // The following line does not work for protein records; I need to
   // dig further into the library to figure out why.
   Alphabet alpha = ProteinTools.getAlphabet();
   SymbolParser rParser = alpha.getParser("token");
   seqI =
     new StreamReader(gReader, gFormat, rParser, sbFact);
 }

Note the commented-out line: with the DNA alphabet and a DNA GenBank file such
as the sample in the demos, this code works fine. But when I switch to the
PROTEIN alphabet and parse a protein record in GenBank format, such as:
http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=NP_005154&form=6&db=p&Dopt=g

it throws the exception shown at the end of this email.

I have traced the problem to the TemplateWithChildren handling in the
SimpleSequenceBuilder class. It seems to assume by default that the input is a
DNA GenBank record, which is why it tries to create a stranded feature,
something a protein record does not have.

    public Sequence makeSequence() {
        SymbolList symbols = slBuilder.makeSymbolList();
        Sequence seq = new SimpleSequence(symbols, uri, name, annotation);
        try {
            for (Iterator i = rootFeatures.iterator(); i.hasNext(); ) {
                TemplateWithChildren twc = (TemplateWithChildren) i.next();
                Feature f = seq.createFeature(twc.template);
                if (twc.children != null) {
                    makeChildFeatures(f, twc.children);
                }
            }
        } catch (Exception ex) {
            throw new BioError(ex, "Couldn't create feature");
        }
        return seq;
    }
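
To make the failure mode concrete, here is a minimal stand-alone sketch of what I think is happening, plus one possible workaround. The class names (StrandedTemplateSketch, Kind, Template) and the adapt() step are hypothetical stand-ins of my own, not BioJava classes, and I am assuming that every feature-table entry reaches the builder as a stranded template:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Stand-alone illustration only: every type here is a hypothetical
 * stand-in, not part of the BioJava API.
 */
public class StrandedTemplateSketch {

    /** Hypothetical stand-in for the alphabet of the sequence being built. */
    public enum Kind { DNA, PROTEIN }

    /** Hypothetical stand-in for a feature template read from the feature table. */
    public static class Template {
        public final String type;
        public final boolean stranded;

        public Template(String type, boolean stranded) {
            this.type = type;
            this.stranded = stranded;
        }
    }

    /** Mimics the failing check: a stranded feature needs a DNA sequence. */
    public static String realize(Kind kind, Template t) {
        if (t.stranded && kind != Kind.DNA) {
            throw new IllegalArgumentException(
                "Can not create a stranded feature within a sequence of type " + kind);
        }
        return (t.stranded ? "stranded " : "plain ") + t.type;
    }

    /** Possible workaround (an assumption): drop strand information on non-DNA sequences. */
    public static Template adapt(Kind kind, Template t) {
        if (t.stranded && kind != Kind.DNA) {
            return new Template(t.type, false);
        }
        return t;
    }

    /** Same loop shape as makeSequence() above, with the adapt() step added. */
    public static List<String> build(Kind kind, List<Template> templates) {
        List<String> features = new ArrayList<>();
        for (Template t : templates) {
            features.add(realize(kind, adapt(kind, t)));
        }
        return features;
    }

    public static void main(String[] args) {
        List<Template> templates = new ArrayList<>();
        templates.add(new Template("CDS", true));
        templates.add(new Template("source", true));
        System.out.println(build(Kind.PROTEIN, templates)); // [plain CDS, plain source]
        System.out.println(build(Kind.DNA, templates));     // [stranded CDS, stranded source]
    }
}
```

Calling realize() directly with Kind.PROTEIN and a stranded template reproduces the kind of error I am seeing. Whether the real fix belongs in the builder or further upstream in the GenBank parsing code, I cannot tell yet.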

==================================
java Exceptions
==================================
java.lang.reflect.InvocationTargetException: org.biojava.bio.symbol.IllegalAlphabetException: Can not create a stranded feature within a sequence of type PROTEIN
 at org.biojava.bio.seq.impl.SimpleStrandedFeature.<init>(SimpleStrandedFeature.java:76)
 at java.lang.reflect.Constructor.newInstance(Native Method)
 at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:136)
rethrown as org.biojava.bio.BioException: Couldn't realize feature
 at org.biojava.bio.seq.SimpleFeatureRealizer$TemplateImpl.realize(SimpleFeatureRealizer.java:138)
 at org.biojava.bio.seq.SimpleFeatureRealizer.realizeFeature(SimpleFeatureRealizer.java:92)
 at org.biojava.bio.seq.impl.SimpleSequence.realizeFeature(SimpleSequence.java:176)
 at org.biojava.bio.seq.impl.SimpleSequence.createFeature(SimpleSequence.java:182)
 at org.biojava.bio.seq.io.SimpleSequenceBuilder.makeSequence(SimpleSequenceBuilder.java:154)
rethrown as org.biojava.bio.BioError: Couldn't create feature
 at org.biojava.bio.seq.io.SimpleSequenceBuilder.makeSequence(SimpleSequenceBuilder.java:160)
 at org.biojava.bio.seq.io.SequenceBuilderFilter.makeSequence(SequenceBuilderFilter.java:98)
 at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)

Thanks.

Bruce Ling, Ph.D.
Director, Bioinformatics
Tularik, Inc -- http://www.tularik.com
Email: bruce@tularik.com
Phone: 650-825-7143
fax: 1-435-804-4009


