[Biojava-l] 3 questions and problems

mark.schreiber at novartis.com mark.schreiber at novartis.com
Sun Sep 18 23:36:57 EDT 2005


>Hello,
>
>I would like to ask three questions or to mention problems, respectively.
>
>1. Trying to write a protein-sequence in a GenPept file resulted in the
>following error message: ClassCastException in GenpeptFileFormer line 361.
>What does this mean and how can I write my sequences?

The class casts an object called value to a List without checking its
type. Apparently, in your case, value is not an instance of List.

Try changing the code to this and let me know if it fixes the problem. If 
it does I'll commit it to CVS.

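            ub.append("ACCESSION   ");
            // value may be a List or a single object; wrap a single
            // object in a List so both cases share one code path
            List l;
            if(value instanceof List){
                l = (List)value;
            }else{
                l = new ArrayList();
                l.add(value);
            }
            // append each accession in turn
            for (Iterator ai = l.iterator(); ai.hasNext();)
            {
                ub.append((String) ai.next());
            }
            acb = new StringBuffer(ub.toString());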
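
For trying the idea outside GenpeptFileFormer, here is a minimal
self-contained sketch of the same instanceof check. The class AccessionLine
and its methods are hypothetical and not part of BioJava; the space between
accessions and the use of String.valueOf instead of a String cast are
assumptions of the sketch, not what the snippet above does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only; not a BioJava class.
class AccessionLine {

    // Return the value as a List: a List is copied, a single object is wrapped,
    // and null becomes an empty List.
    static List<Object> asList(Object value) {
        List<Object> result = new ArrayList<Object>();
        if (value instanceof List) {
            result.addAll((List<?>) value);
        } else if (value != null) {
            result.add(value);
        }
        return result;
    }

    // Build an ACCESSION line from either a single accession or a List of them.
    static String format(Object value) {
        StringBuilder sb = new StringBuilder("ACCESSION   ");
        for (Iterator<Object> it = asList(value).iterator(); it.hasNext();) {
            // String.valueOf avoids a second ClassCastException on non-String items
            sb.append(String.valueOf(it.next()));
            if (it.hasNext()) {
                sb.append(' '); // assumption: accessions separated by one space
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("P12345"));

        List<String> ids = new ArrayList<String>();
        ids.add("P12345");
        ids.add("Q67890");
        System.out.println(format(ids));
    }
}
```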


>2. There is a problem with BioSQL. The attribute alphabet in the table
>biosequence has the type VARCHAR(10). The BioJava alphabet PROTEIN-TERM
>has 12 characters. I always got an error message when I tried to get a
>protein sequence with this alphabet from the database. A simple select
>statement showed that the alphabet in the table is abbreviated to
>PROTEIN-TE, which is not equal to the BioJava name and causes trouble. I
>solved this problem by altering the table declaration to VARCHAR(12). Now
>it works fine. Is there another solution for this or should this be the
>only one?

This is probably the best fix for now. Ideally BioSQL would standardise
some alphabet names, but this might not happen for a while. It might be
worth suggesting to the BioSQL list that the size of the alphabet name
field be increased.
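
To see which alphabet names would be affected by a given column width, a
small check along these lines might help. It is only a sketch: the class
AlphabetNameCheck is hypothetical, and it assumes the database truncates
silently to the column width, as the select statement described above
suggests.

```java
// Hypothetical check for illustration only; not part of BioJava or BioSQL.
class AlphabetNameCheck {

    // Mimic a VARCHAR(width) column that silently truncates longer values.
    static String storeInVarchar(String value, int width) {
        return value.length() <= width ? value : value.substring(0, width);
    }

    // True if the name read back from the column equals the name written.
    static boolean survivesRoundTrip(String name, int width) {
        return storeInVarchar(name, width).equals(name);
    }

    public static void main(String[] args) {
        String name = "PROTEIN-TERM";
        int[] widths = {10, 12};
        for (int i = 0; i < widths.length; i++) {
            System.out.println("VARCHAR(" + widths[i] + ") stores \""
                    + storeInVarchar(name, widths[i]) + "\", round trip ok: "
                    + survivesRoundTrip(name, widths[i]));
        }
    }
}
```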

>3. I also experimented with the HMM for pairwise sequence alignments,
>which was proposed in the cookbook. Does anybody have an idea how one
>could combine this HMM with the SubstitutionMatrix from the alignment
>package? I don't see how we can produce a sensible distribution including
>a substitution matrix in the match state. This might be especially hard
>to realize because we can't exclude that there are some ambiguous symbols
>in the sequences to be aligned, which are not in the substitution matrix
>at all. I am thankful for any good ideas.

It is possible in theory to make a Distribution from a similarity matrix,
provided you know how it was made. Typically similarity matrices are log
odds scores that are multiplied by a constant and then rounded to an
integer. The value of the constant is probably irrelevant (it's a
constant), so you could convert back again as long as you can normalize to
1.0. This is not perfect, as you get some rounding errors, but it should be
close enough.
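
A rough sketch of that conversion in plain Java, without the BioJava
Distribution classes, could look like the following. It assumes the scores
were built as rounded log odds of the pair probability over the product of
the background frequencies. The class name, the background frequencies and
the scale factor lambda are all assumptions of the sketch; lambda is kept as
an explicit parameter so that different values can be tried, and symbols
missing from the matrix are not handled.

```java
// Hypothetical sketch for illustration only; not a BioJava class.
class LogOddsToDistribution {

    // Convert integer log-odds scores into pair probabilities that sum to 1.0.
    // scores[i][j]  : similarity score for symbols i and j
    // background[i] : assumed background frequency of symbol i
    // lambda        : assumed scale factor applied before exponentiating
    static double[][] pairProbabilities(int[][] scores, double[] background,
                                        double lambda) {
        int n = scores.length;
        double[][] q = new double[n][n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // undo the log odds: q_ij is proportional to p_i * p_j * exp(lambda * s_ij)
                q[i][j] = background[i] * background[j]
                        * Math.exp(lambda * scores[i][j]);
                total += q[i][j];
            }
        }
        // normalize so that all pair probabilities sum to 1.0
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                q[i][j] /= total;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        // toy two-symbol matrix, made up for this example
        int[][] scores = {{2, -1}, {-1, 2}};
        double[] background = {0.5, 0.5};
        double lambda = Math.log(2.0) / 2.0; // assumed half-bit scaling

        double[][] q = pairProbabilities(scores, background, lambda);
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < q[i].length; j++) {
                System.out.println("q[" + i + "][" + j + "] = " + q[i][j]);
            }
        }
    }
}
```

The resulting pair probabilities could then serve as weights for the match
state of the pair HMM.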

By the way, it seems your alignment classes have not been checked in. Are 
you going to do this soon?

- Mark


Sincerely
Andreas Dräger

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l





