[Biojava-l] SequenceMixin Error in BioJava3 Alignment
Andy Yates
ayates at ebi.ac.uk
Mon Dec 6 19:32:55 UTC 2010
I would say partially an oversight on my part & partially done on purpose (a gap is not a nucleotide after all). However, I'm all in favour of being pragmatic here, so let's add them in. If I get an okay from the relevant parties I'll commit the change.
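
To make the symptom concrete, here is a toy sketch in plain Java of why a missing gap entry can surface as null in an aligned sequence. It assumes a compound set behaves roughly like a symbol-to-compound lookup; the class GapCompoundSketch and its asList method are hypothetical names made up for illustration and are not BioJava code, which may well work differently internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are not the BioJava API.
final class GapCompoundSketch {

    // Maps a symbol such as "A" to its compound; stands in for a compound set.
    private final Map<String, String> compounds = new HashMap<>();

    GapCompoundSketch(boolean includeGap) {
        for (String base : new String[] {"A", "C", "G", "T"}) {
            compounds.put(base, base);
        }
        if (includeGap) {
            // Analogue of the proposed one-line addition for the gap symbol.
            compounds.put("-", "-");
        }
    }

    // Looks up each character of an aligned row; unregistered symbols yield null.
    List<String> asList(String alignedRow) {
        List<String> result = new ArrayList<>();
        for (char c : alignedRow.toCharArray()) {
            result.add(compounds.get(String.valueOf(c)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new GapCompoundSketch(false).asList("TC-GT")); // [T, C, null, G, T]
        System.out.println(new GapCompoundSketch(true).asList("TC-GT"));  // [T, C, -, G, T]
    }
}
```

Without the gap entry the lookup has nothing to return for "-", which matches the nulls Chris saw at the gap positions; with it, the gap prints as an ordinary symbol.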
Andy
On 6 Dec 2010, at 18:41, Chris Friedline wrote:
> OK, so here's a quick fix now that I know where to look. In my local
> source I added the following line to the constructor of DNACompoundSet
> and recompiled.
>
> addNucleotideCompound("-", "-");
>
> Not sure if this is the correct place for it in terms of what the devs
> want to do globally, but it gets me moving forward again. Gap
> characters are in AminoAcidCompoundSet so I'm wondering if this was
> just a tiny oversight on the nucleotide front.
>
> Thanks again for the help everyone,
> Chris
>
> On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <cfriedline at vcu.edu> wrote:
>> That does help, thanks. However, when calling getAsList() on the
>> aligned sequences and printing, this is what I see. Something seems
>> wrong. It does appear as though null is being inserted where there
>> should be gaps.
>>
>> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
>> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
>> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
>> null, null, null, null, null, null]
>> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
>> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
>> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
>> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
>>
>> Chris
>>
>> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>> Hi Andy,
>>>
>>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
>>> Does that help?
>>>
>>> Andreas
>>>
>>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>>> So Chris & I have discussed this off list & we believe it's because of a NULL compound element in the Sequence given to the SequenceMixin method.
>>>>
>>>> Does anyone on list know how the AlignedSequence code encodes gaps & the like?
>>>>
>>>> Andy
>>>>
>>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> Well, that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences, and how should we deal with them? After all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.
>>>>>
>>>>> Andy
>>>>>
>>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Found another potential error case, this time in beta2 (fresh pull
>>>>>> from git last evening). For more info, please see
>>>>>> http://pastie.org/1351388 for test case and stack trace. The JUnit
>>>>>> test passes simply because the pair object is not null, but fails when
>>>>>> trying to extract any information from the pair itself (toString(),
>>>>>> getIdenticals(), etc). The substitution matrix file is from
>>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices. I'm doing large numbers of
>>>>>> pairwise alignments, which do not all fail, but most do with this same
>>>>>> error.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>> --
>>>>>> PhD Candidate, Integrative Life Sciences
>>>>>> Virginia Commonwealth University
>>>>>> Richmond, VA
>>>>>>
>>>>>> _______________________________________________
>>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
>>>> --
>>>> Andrew Yates Ensembl Genomes Engineer
>>>> EMBL-EBI Tel: +44-(0)1223-492538
>>>> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
>>>> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------------------------
>>> Dr. Andreas Prlic
>>> Senior Scientist, RCSB PDB Protein Data Bank
>>> University of California, San Diego
>>> (+1) 858.246.0526
>>> -----------------------------------------------------------------------
>>>
>>
>>
>
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/