[Biojava-l] SequenceMixin Error in BioJava3 Alignment

Chris Friedline cfriedline at vcu.edu
Mon Dec 6 18:41:17 UTC 2010


OK, so here's a quick fix now that I know where to look.  In my local
source I added the following line to the constructor of DNACompoundSet
and recompiled.

addNucleotideCompound("-", "-");

I'm not sure this is the correct place for it in terms of what the devs
want to do globally, but it gets me moving forward again.  Gap
characters are already in AminoAcidCompoundSet, so I'm wondering if this
was just a tiny oversight on the nucleotide front.
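
For anyone following along, here is a minimal, self-contained sketch of
what I think is going on.  It does not use BioJava at all: ToyCompoundSet,
lookup(), dnaSet() and asList() are hypothetical stand-ins I made up for
illustration, and the assumption is that the nulls come from a symbol
lookup that has no entry for "-".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GapCompoundSketch {

    /** Hypothetical stand-in for a compound set: maps a symbol to a compound. */
    static final class ToyCompoundSet {
        private final Map<String, String> compounds = new HashMap<>();

        void addCompound(String symbol) {
            compounds.put(symbol, symbol);
        }

        /** Returns null for any symbol that was never registered. */
        String lookup(String symbol) {
            return compounds.get(symbol);
        }
    }

    /** Builds a toy DNA set, optionally registering the gap symbol. */
    static ToyCompoundSet dnaSet(boolean withGap) {
        ToyCompoundSet set = new ToyCompoundSet();
        for (String s : new String[] {"A", "C", "G", "T"}) {
            set.addCompound(s);
        }
        if (withGap) {
            set.addCompound("-"); // the analogue of the one-line fix above
        }
        return set;
    }

    /** Resolves each character of an aligned string through the set. */
    static List<String> asList(String aligned, ToyCompoundSet set) {
        List<String> out = new ArrayList<>();
        for (char c : aligned.toCharArray()) {
            out.add(set.lookup(String.valueOf(c)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(asList("AC-GT", dnaSet(false))); // [A, C, null, G, T]
        System.out.println(asList("AC-GT", dnaSet(true)));  // [A, C, -, G, T]
    }
}
```

Without the gap entry the lookup misses and a null lands in the list,
which matches the output in my earlier message quoted below.  Whether the
real code path behaves the same way is my guess, not something I have
confirmed.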

Thanks again for the help everyone,
Chris

On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <cfriedline at vcu.edu> wrote:
> That does help, thanks.  However, when calling getAsList() on the
> aligned sequences and printing, this is what I see.  Something seems
> wrong.  It appears as though null is being inserted where there
> should be gaps:
>
> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> null, null, null, null, null, null]
> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
>
> Chris
>
> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> Hi Andy,
>>
>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
>> Does that help?
>>
>> Andreas
>>
>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>> Chris and I have discussed this off list, and we believe it's caused by a null compound element in the Sequence given to the SequenceMixin method.
>>>
>>> Does anyone on list know how the AlignedSequence code encodes gaps and the like?
>>>
>>> Andy
>>>
>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> Well, that's going into my toStringBuilder() method, and that particular line is concerned with asking a compound for its String representation. The question is how often we get nulls in our Sequences and how to deal with them. After all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.
>>>>
>>>> Andy
>>>>
>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Found another potential error case, this time in beta2 (fresh pull
>>>>> from git last evening).  For more info, please see
>>>>> http://pastie.org/1351388 for test case and stack trace.  The JUnit
>>>>> test passes simply because the pair object is not null, but fails when
>>>>> trying to extract any information from the pair itself (toString(),
>>>>> getIdenticals(), etc). The substitution matrix file is from
>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices.  I'm doing large numbers of
>>>>> pairwise alignments, which do not all fail, but most do with this same
>>>>> error.
>>>>>
>>>>> Thanks,
>>>>> Chris
>>>>>
>>>>> --
>>>>> PhD Candidate, Integrative Life Sciences
>>>>> Virginia Commonwealth University
>>>>> Richmond, VA
>>>>>
>>>>> _______________________________________________
>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>>> --
>>> Andrew Yates                   Ensembl Genomes Engineer
>>> EMBL-EBI                       Tel: +44-(0)1223-492538
>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------------------------
>> Dr. Andreas Prlic
>> Senior Scientist, RCSB PDB Protein Data Bank
>> University of California, San Diego
>> (+1) 858.246.0526
>> -----------------------------------------------------------------------
>>
>
>



-- 
PhD Candidate, Integrative Life Sciences
Virginia Commonwealth University
Richmond, VA



