[Biojava-l] 3 questions and problems

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Sep 20 05:08:26 EDT 2005


Protein-Term is an Alphabet that includes the * symbol (for a termination 
codon). It is useful when doing a six-frame translation and you want to 
allow for the possibility of a *. Protein-Term completely contains 
Protein and, due to some wizardry I'm not even going to try to explain, it 
treats all Protein symbols as members of Protein-Term as well.
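
To illustrate why a * can turn up at all, here is a minimal six-frame 
translation sketch in plain Java. It does not use the BioJava API; the 
class and method names (SixFrameSketch, translate, reverseComplement, 
sixFrames) are made up for this example. It assumes upper-case DNA and the 
standard genetic code, and maps any codon with an unrecognised base to X.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BioJava: plain-string six-frame translation.
final class SixFrameSketch {
    // Standard genetic code (NCBI table 1), codons ordered over bases T, C, A, G.
    private static final String BASES = "TCAG";
    private static final String AMINO =
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

    // Translate one reading frame starting at the given offset (0, 1 or 2).
    static String translate(String dna, int offset) {
        StringBuilder protein = new StringBuilder();
        for (int i = offset; i + 3 <= dna.length(); i += 3) {
            int a = BASES.indexOf(dna.charAt(i));
            int b = BASES.indexOf(dna.charAt(i + 1));
            int c = BASES.indexOf(dna.charAt(i + 2));
            // Codons containing anything other than T, C, A, G become 'X'.
            boolean unknown = a < 0 || b < 0 || c < 0;
            protein.append(unknown ? 'X' : AMINO.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    // Reverse complement of an upper-case DNA string; unknown bases become 'N'.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder();
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append('N'); break;
            }
        }
        return out.toString();
    }

    // Three forward frames followed by three frames of the reverse complement.
    static List<String> sixFrames(String dna) {
        List<String> frames = new ArrayList<>();
        String reverse = reverseComplement(dna);
        for (int offset = 0; offset < 3; offset++) {
            frames.add(translate(dna, offset));
        }
        for (int offset = 0; offset < 3; offset++) {
            frames.add(translate(reverse, offset));
        }
        return frames;
    }

    public static void main(String[] args) {
        // Frames that run through a stop codon contain '*'.
        for (String frame : sixFrames("ATGTAA")) {
            System.out.println(frame);
        }
    }
}
```

Any frame that runs through a stop codon produces a string containing *, 
which is a symbol the plain Protein alphabet has no place for.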

The best solution might be to change biojava such that the name Protein is 
stored in the database and that biojava will know that the best option to 
read this back in is Protein-Term.
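
A rough sketch of that mapping in plain Java follows. Nothing here is 
existing BioJava or BioSQL code: the class AlphabetNameSketch and its 
methods are hypothetical, and the exact name strings and their case are 
assumptions based on the names quoted in this thread.

```java
// Hypothetical helper, not part of BioJava: maps alphabet names on the way
// into and out of a BioSQL-style VARCHAR(10) alphabet column.
final class AlphabetNameSketch {
    // Column width reported for biosequence.alphabet in this thread.
    static final int COLUMN_WIDTH = 10;

    // Name written to the database: PROTEIN-TERM is stored as PROTEIN.
    static String nameForStorage(String biojavaName) {
        return "PROTEIN-TERM".equals(biojavaName) ? "PROTEIN" : biojavaName;
    }

    // Name used when reading back: PROTEIN is widened to PROTEIN-TERM,
    // which also accepts every plain protein symbol.
    static String nameForLoading(String storedName) {
        return "PROTEIN".equals(storedName) ? "PROTEIN-TERM" : storedName;
    }

    // True if the name fits the column without being truncated.
    static boolean fitsColumn(String name) {
        return name.length() <= COLUMN_WIDTH;
    }

    public static void main(String[] args) {
        String stored = nameForStorage("PROTEIN-TERM");
        System.out.println(stored + " fits: " + fitsColumn(stored));
        System.out.println("read back as: " + nameForLoading(stored));
    }
}
```

With this mapping the 12-character name never reaches the column, so the 
schema would not need to change.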

My other suggestion would be that it is highly unlikely anyone would want 
to store a sequence with * in it so all biojava protein sequences could be 
stored and read as protein. Having said that I'm sure someone somewhere 
will try it : )

- Mark





"Richard HOLLAND" <hollandr at gis.a-star.edu.sg>
Sent by: biojava-l-bounces at portal.open-bio.org
09/20/2005 01:59 PM

 
        To:     "Andreas Dräger" <duze at gmx.de>, <biojava-l at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [Biojava-l] 3 questions and problems


Here's my 2P:

1. Don't know what's causing it, but it does not occur when using the new 
BioJavaX GenBank file former. That is still undergoing testing and 
documentation at present, but if you feel like risking the cutting edge 
it's in CVS under biojava-live. org.biojavax.bio.seq.io.SeqIOTools behaves 
almost identically to the one you mention below. It reads and writes 
instances of org.biojavax.bio.seq.RichSequence. If you pass it a plain old 
Sequence it'll do its best, but you'll probably lose detail. At the moment 
it treats GenPept format as identical to GenBank format, unless anyone can 
tell me the exact difference beyond the symbol frequency line.

2. Protein-Term is a weird BioJava specific thing - I asked Hilmar about 
this before and he says there is no concept of it in BioPerl, and he would 
not alter BioSQL to allow for it. I'm not even sure what it's for myself. 
Is using just Protein a viable alternative?

3. Dunno, that's a question that sounds like something Mark might be able 
to answer.

cheers,
Richard


Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of 
> "Andreas Dräger"
> Sent: Tuesday, September 13, 2005 7:10 PM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] 3 questions and problems
> 
> 
> Hello,
> 
> I would like to ask three questions, or rather report three problems.
> 
> 1. Trying to write a protein sequence to a GenPept file resulted in the
> following error message: ClassCastException in GenpeptFileFormer line 361.
> What does this mean and how can I write my sequences?
> 
> 2. There is a problem with BioSQL. The attribute alphabet in the table
> biosequence has the type VARCHAR(10). The BioJava alphabet name
> PROTEIN-TERM has 12 characters. I always got an error message when I
> tried to get a protein sequence with this alphabet from the database. A
> simple select statement showed that the alphabet in the table is
> truncated to PROTEIN-TE, which is not equal to the BioJava name and
> causes trouble. I solved this problem by altering the table declaration
> to VARCHAR(12). Now it works fine. Is there another solution for this,
> or is this the only one?
> 
> 3. I also experimented with the HMM for pairwise sequence alignments
> that was proposed in the cookbook. Does anybody have an idea how one
> could combine this HMM with the SubstitutionMatrix from the alignment
> package? I don't see how we can produce a sensible distribution that
> incorporates a substitution matrix in the match state. This might be
> especially hard to realize because we can't rule out that the sequences
> to be aligned contain some ambiguous symbols that are not in the
> substitution matrix at all. I am thankful for any good ideas.
> 
> 
> Sincerely
> Andreas Dräger
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
> 
