[Biojava-l] SequenceMixin Error in BioJava3 Alignment

Andy Yates ayates at ebi.ac.uk
Wed Dec 8 09:06:32 UTC 2010


I've added the gap symbol to the DNA & RNA compound sets, so hopefully this error will go away. If not, we'll have to look into the alignment code & get it to use the gap symbol.

Andy

On 6 Dec 2010, at 20:00, Scooter Willis wrote:

> It would be nice to have a cool indexing system that allowed dynamic indexes of the data model, but it's not worth the headache. If we are going to go big, we should use the same gap symbols that were added for protein sequences.
> 
> Scooter
> 
> On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
> I would say it was partially an oversight on my part & partially done on purpose (a gap is not a nucleotide, after all). However, I'm all in favour of being pragmatic here, so let's add them in. If I get an okay from the relevant parties I'll commit the change.
> 
> Andy
> 
> On 6 Dec 2010, at 18:41, Chris Friedline wrote:
> 
> > OK, so here's a quick fix now that I know where to look.  In my local
> > source I added the following line to the constructor of DNACompoundSet
> > and recompiled.
> >
> > addNucleotideCompound("-", "-");
> >
> > Not sure if this is the correct place for it in terms of what the devs
> > want to do globally, but it gets me moving forward again.  Gap
> > characters are in AminoAcidCompoundSet so I'm wondering if this was
> > just a tiny oversight on the nucleotide front.
> >
> > Thanks again for the help everyone,
> > Chris
> >
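Chris's workaround amounts to registering one more compound when the set is constructed, so that a lookup for "-" no longer comes back empty. A minimal sketch of that idea follows; `ToyNucleotideCompoundSet`, `compoundFor` and the `includeGap` flag are hypothetical names, and the class is a simplified model, not BioJava's actual `DNACompoundSet`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a nucleotide compound set.
// NOT BioJava's DNACompoundSet; all names are invented for illustration.
final class ToyNucleotideCompoundSet {

    /** A compound reduced to its one-letter string form. */
    static final class Compound {
        private final String symbol;

        Compound(String symbol) {
            this.symbol = symbol;
        }

        @Override
        public String toString() {
            return symbol;
        }
    }

    private final Map<String, Compound> compounds = new LinkedHashMap<>();

    ToyNucleotideCompoundSet(boolean includeGap) {
        for (String base : new String[] {"A", "C", "G", "T"}) {
            addCompound(base);
        }
        if (includeGap) {
            // Analogue of Chris's one-line fix: register "-" alongside the bases.
            addCompound("-");
        }
    }

    private void addCompound(String symbol) {
        compounds.put(symbol, new Compound(symbol));
    }

    /** Returns the compound for a symbol (case-insensitive), or null if the set lacks it. */
    Compound compoundFor(String symbol) {
        return compounds.get(symbol.toUpperCase());
    }

    int size() {
        return compounds.size();
    }
}
```

With `includeGap` false, `compoundFor("-")` returns null; with it true, the gap resolves to a compound like any base. The trade-off Andy raises still applies: the set then contains a symbol that is not strictly a nucleotide.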
> > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <cfriedline at vcu.edu> wrote:
> >> That does help, thanks.  However, when calling getAsList() on the
> >> aligned sequences and printing, this is what I see.  Something seems
> >> wrong.  It does appear as though null is being inserted where there
> >> should be gaps.
> >>
> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> >> null, null, null, null, null, null]
> >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
> >>
> >> Chris
> >>
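One plausible way to end up with nulls at the gap positions, assuming the aligned view resolves every symbol (including "-") through the compound set, is sketched below. `ToyAlignedSequence.asList` and its parameters are hypothetical; this is not necessarily how BioJava's SimpleAlignedSequence is implemented. Gap positions here are 0-based indexes into the aligned sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of an aligned sequence; not BioJava's SimpleAlignedSequence.
final class ToyAlignedSequence {

    private ToyAlignedSequence() {}

    /**
     * Builds the aligned view of 'original' with gaps at the given 0-based alignment
     * positions. Every symbol, including the gap "-", is resolved through 'compoundSet';
     * a symbol missing from the set resolves to null, and that null lands in the list.
     */
    static List<String> asList(String original, Set<Integer> gapPositions,
                               Map<String, String> compoundSet) {
        int length = original.length() + gapPositions.size();
        for (int p : gapPositions) {
            if (p < 0 || p >= length) {
                throw new IllegalArgumentException("Gap position out of range: " + p);
            }
        }
        List<String> aligned = new ArrayList<>(length);
        int next = 0; // index of the next residue of the original sequence
        for (int pos = 0; pos < length; pos++) {
            String symbol = gapPositions.contains(pos)
                    ? "-"
                    : String.valueOf(original.charAt(next++));
            aligned.add(compoundSet.get(symbol)); // Map.get returns null for unknown symbols
        }
        return aligned;
    }
}
```

For example, aligning "ACGT" with a gap at index 2 against a set without "-" yields [A, C, null, G, T], the same pattern as the output above; adding "-" to the set yields [A, C, -, G, T].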
> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
> >>> Hi Andy,
> >>>
> >>> Check out the SimpleAlignedSequence class for how gaps are handled.
> >>> Does that help?
> >>>
> >>> Andreas
> >>>
> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
> >>>> So Chris & I have discussed this off list & we believe it's caused by a NULL compound element in the Sequence given to the SequenceMixin method.
> >>>>
> >>>> Does anyone on list know how the AlignedSequence code encodes gaps & the like?
> >>>>
> >>>> Andy
> >>>>
> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Well, that's going into my toStringBuilder() method & that particular line is concerned with asking a compound for its String representation. How often do we get nulls in our Sequences, and how should we deal with them? After all, the Sequence AGTCNULLAGTC is probably more harmful than helpful.
> >>>>>
> >>>>> Andy
> >>>>>
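Andy's point is that a null compound should not silently become text. A minimal sketch of a string builder that fails fast and names the offending position, instead of throwing a bare NullPointerException or producing something like AGTCnullAGTC, follows. `ToySequenceStrings.toStringBuilder` is a hypothetical helper, not BioJava's SequenceMixin method, and reporting a 1-based position is a choice made for this sketch.

```java
import java.util.List;

// Hypothetical null-aware sequence-to-string helper; not BioJava's SequenceMixin.
final class ToySequenceStrings {

    private ToySequenceStrings() {}

    /**
     * Concatenates the string form of each compound. A null compound is reported with
     * its 1-based position rather than surfacing as a bare NullPointerException or being
     * appended as the text "null".
     */
    static StringBuilder toStringBuilder(List<?> compounds) {
        StringBuilder sb = new StringBuilder(compounds.size());
        for (int i = 0; i < compounds.size(); i++) {
            Object compound = compounds.get(i);
            if (compound == null) {
                throw new IllegalStateException("Null compound at position " + (i + 1)
                        + "; is the gap symbol missing from the compound set?");
            }
            sb.append(compound.toString());
        }
        return sb;
    }
}
```

Failing fast is one option; rendering nulls as "-" would be the other, but that would hide exactly the kind of missing compound seen in this thread.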
> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Found another potential error case, this time in beta2 (fresh pull
> >>>>>> from git last evening).  For more info, please see
> >>>>>> http://pastie.org/1351388 for test case and stack trace.  The JUnit
> >>>>>> test passes simply because the pair object is not null, but fails when
> >>>>>> trying to extract any information from the pair itself (toString(),
> >>>>>> getIdenticals(), etc). The substitution matrix file is from
> >>>>>> ftp://ftp.ncbi.nih.gov/blast/matrices.  I'm doing large numbers of
> >>>>>> pairwise alignments, which do not all fail, but most do with this same
> >>>>>> error.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>>
> >>>>>> --
> >>>>>> PhD Candidate, Integrative Life Sciences
> >>>>>> Virginia Commonwealth University
> >>>>>> Richmond, VA
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Andrew Yates                   Ensembl Genomes Engineer
> >>>> EMBL-EBI                       Tel: +44-(0)1223-492538
> >>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> >>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -----------------------------------------------------------------------
> >>> Dr. Andreas Prlic
> >>> Senior Scientist, RCSB PDB Protein Data Bank
> >>> University of California, San Diego
> >>> (+1) 858.246.0526
> >>> -----------------------------------------------------------------------
> >>>
> >>>
> >>
> >
> 
