[Biojava-l] 3 questions and problems

Richard HOLLAND hollandr at gis.a-star.edu.sg
Tue Sep 20 01:59:23 EDT 2005


Here's my 2P:

1. I don't know what's causing it, but it does not occur when using the new BioJavaX GenBank file former. That code is still undergoing testing and documentation at present, but if you feel like risking the cutting edge, it's in CVS under biojava-live: org.biojavax.bio.seq.io.SeqIOTools behaves almost identically to the one you mention below. It reads and writes instances of org.biojavax.bio.seq.RichSequence; if you pass it a plain old Sequence it'll do its best, but you'll probably lose detail. At the moment, GenPept format = GenBank format, unless anyone can tell me the exact difference beyond the symbol frequency line.

2. Protein-Term is a weird BioJava-specific thing. I asked Hilmar about this before; he says there is no concept of it in BioPerl, and he would not alter BioSQL to allow for it. I'm not even sure what it's for myself. Is using just Protein a viable alternative?
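
To make the length mismatch concrete, here is a minimal sketch in plain Java, with no BioJava or database involved. The helper storeInVarchar is hypothetical and only mimics a database that silently cuts a value down to the column width; a real database may raise an error instead.

```java
public class AlphabetNameTruncation {
    // Hypothetical helper: mimics a VARCHAR(width) column that silently
    // truncates values longer than the declared width.
    static String storeInVarchar(String value, int width) {
        return value.length() <= width ? value : value.substring(0, width);
    }

    public static void main(String[] args) {
        String name = "PROTEIN-TERM";

        // The alphabet name is 12 characters long, so it does not fit VARCHAR(10).
        System.out.println(name.length());               // 12

        String stored = storeInVarchar(name, 10);
        System.out.println(stored);                      // PROTEIN-TE

        // A lookup by the full name no longer matches what was stored.
        System.out.println(stored.equals(name));         // false

        // With VARCHAR(12) the name round-trips unchanged.
        System.out.println(storeInVarchar(name, 12).equals(name)); // true
    }
}
```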

3. Dunno, that's a question that sounds like something Mark might be able to answer.
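
One possible starting point, offered only as a sketch and not as anything BioJava provides: log-odds substitution matrices are commonly read as s(a,b) = (1/lambda) * ln(p_ab / (q_a * q_b)), so pair probabilities for a match state can be recovered as p_ab proportional to q_a * q_b * exp(lambda * s(a,b)). The code below assumes a toy two-symbol alphabet, made-up scores, uniform background frequencies, and a hypothetical scale lambda. It uses plain arrays rather than the BioJava Distribution or SubstitutionMatrix classes, and averaging over the concrete symbols behind an ambiguity symbol is just one convention among several.

```java
public class MatchEmissionSketch {
    // Turns log-odds scores into a normalized pair distribution:
    // weight(a,b) = q_a * q_b * exp(lambda * s(a,b)), then divide by the total.
    static double[][] pairDistribution(int[][] scores, double[] background, double lambda) {
        int n = background.length;
        double[][] p = new double[n][n];
        double total = 0.0;
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                p[a][b] = background[a] * background[b] * Math.exp(lambda * scores[a][b]);
                total += p[a][b];
            }
        }
        for (int a = 0; a < n; a++) {
            for (int b = 0; b < n; b++) {
                p[a][b] /= total;
            }
        }
        return p;
    }

    // Assumed convention for ambiguity symbols: each side is given as the set of
    // concrete symbol indices it may stand for, and the emission is the average
    // over all concrete pairs.
    static double ambiguousEmission(double[][] p, int[] left, int[] right) {
        double sum = 0.0;
        for (int a : left) {
            for (int b : right) {
                sum += p[a][b];
            }
        }
        return sum / (left.length * right.length);
    }

    public static void main(String[] args) {
        int[][] scores = { { 1, -1 }, { -1, 1 } };   // made-up toy scores
        double[] background = { 0.5, 0.5 };          // assumed uniform background
        double lambda = Math.log(2.0);               // hypothetical scale factor

        double[][] p = pairDistribution(scores, background, lambda);
        System.out.println("match pair (0,0):    " + p[0][0]);
        System.out.println("mismatch pair (0,1): " + p[0][1]);
        System.out.println("ambiguous vs 0:      "
                + ambiguousEmission(p, new int[] { 0, 1 }, new int[] { 0 }));
    }
}
```

The resulting table could then be used to fill the emission distribution of the match state; symbols missing from the matrix would still need a separate decision, such as falling back to the background frequencies.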

cheers,
Richard


Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of 
> "Andreas Dräger"
> Sent: Tuesday, September 13, 2005 7:10 PM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] 3 questions and problems
> 
> 
> Hello,
> 
> I would like to raise three questions or problems.
> 
> 1. Trying to write a protein sequence to a GenPept file resulted in the
> following error message: ClassCastException in GenpeptFileFormer line 361.
> What does this mean, and how can I write my sequences?
> 
> 2. There is a problem with BioSQL. The attribute alphabet in the table
> biosequence has the type VARCHAR(10). The BioJava alphabet name
> PROTEIN-TERM has 12 characters. I always got an error message when I tried
> to get a protein sequence with this alphabet from the database. A simple
> select statement showed that the alphabet in the table is abbreviated to
> PROTEIN-TE, which is not equal to the BioJava name and causes trouble. I
> solved this problem by altering the table declaration to VARCHAR(12). Now
> it works fine. Is there another solution for this, or is this the only one?
> 
> 3. I also experimented with the HMM for pairwise sequence alignments
> proposed in the cookbook. Does anybody have an idea how one could combine
> this HMM with the SubstitutionMatrix from the alignment package? I don't
> see how we can produce a sensible distribution that includes a substitution
> matrix in the match state. This might be especially hard to realize because
> we can't rule out ambiguous symbols in the sequences to be aligned that
> are not in the substitution matrix at all. I am thankful for any good
> ideas.
> 
> 
> Sincerely
> Andreas Dräger
> 


