[Biojava-l] SequenceMixin Error in BioJava3 Alignment

Scooter Willis willishf at ufl.edu
Mon Dec 6 20:00:18 UTC 2010


It would be nice to have a cool indexing system that allowed dynamic indexes
of the data model, but it is not worth the headache. If we are going to go
big, we should use the same gap symbols that were added for protein sequences.

Scooter

On Mon, Dec 6, 2010 at 2:32 PM, Andy Yates <ayates at ebi.ac.uk> wrote:

> I would say partially an oversight on my part & partially done on purpose
> (a gap is not a nucleotide after all). However, I'm all in favour of being
> pragmatic here, so let's add them in. If I get an okay from the relevant
> parties I'll commit the change.
>
> Andy
>
> On 6 Dec 2010, at 18:41, Chris Friedline wrote:
>
> > OK, so here's a quick fix now that I know where to look.  In my local
> > source I added the following line to the constructor of DNACompoundSet
> > and recompiled.
> >
> > addNucleotideCompound("-", "-");
> >
> > Not sure if this is the correct place for it in terms of what the devs
> > want to do globally, but it gets me moving forward again.  Gap
> > characters are in AminoAcidCompoundSet so I'm wondering if this was
> > just a tiny oversight on the nucleotide front.
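
For illustration, here is a minimal self-contained sketch of why an unregistered gap symbol can surface as null and how registering it changes the lookup. SketchCompoundSet and lookup() are hypothetical stand-ins, not BioJava's DNACompoundSet; only the addNucleotideCompound("-", "-") call mirrors the line quoted above, and the idea that an unknown symbol resolves to null is an assumption based on the null entries reported in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a nucleotide compound set (not BioJava's class). */
class SketchCompoundSet {
    // Maps each registered symbol to its complement.
    private final Map<String, String> compounds = new LinkedHashMap<>();

    /** Registers a symbol together with its complement. */
    void addNucleotideCompound(String base, String complement) {
        compounds.put(base, complement);
    }

    /** Returns the symbol if it is registered, or null if the set does not know it. */
    String lookup(String symbol) {
        return compounds.containsKey(symbol) ? symbol : null;
    }

    /** Builds a small DNA set, optionally registering the gap symbol. */
    static SketchCompoundSet dna(boolean withGap) {
        SketchCompoundSet set = new SketchCompoundSet();
        set.addNucleotideCompound("A", "T");
        set.addNucleotideCompound("T", "A");
        set.addNucleotideCompound("C", "G");
        set.addNucleotideCompound("G", "C");
        if (withGap) {
            // The one-line fix from the message above: a gap is its own complement.
            set.addNucleotideCompound("-", "-");
        }
        return set;
    }
}
```

In this sketch, SketchCompoundSet.dna(false).lookup("-") returns null, while SketchCompoundSet.dna(true).lookup("-") returns "-".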
> >
> > Thanks again for the help everyone,
> > Chris
> >
> > On Mon, Dec 6, 2010 at 1:28 PM, Chris Friedline <cfriedline at vcu.edu> wrote:
> >> That does help, thanks.  However, when calling getAsList() on the
> >> aligned sequences and printing, this is what I see.  Something seems
> >> wrong.  It does appear as though null is being inserted where there
> >> should be gaps.
> >>
> >> seq = [A, A, C, A, C, T, T, G, A, C, A, T, G, T, T, C, null, G, T, C,
> >> G, C, A, A, C, T, T, T, T, A, A, G, A, G, A, T, T, A, G, A, G, T, T,
> >> T, T, C, G, G, T, T, C, G, G, C, C, G, G, A, C, G, A, A, A, C, A, C,
> >> null, null, null, null, null, null]
> >> seq = [T, A, C, C, C, T, T, A, A, C, A, T, null, null, T, C, A, G, T,
> >> G, A, C, A, A, C, C, T, C, null, null, A, G, A, G, A, T, G, A, G,
> >> null, G, C, T, T, T, C, T, C, T, T, C, G, G, null, null, null, null,
> >> null, null, null, A, G, A, C, A, C, T, G, G, G, A, T]
> >>
> >> Chris
> >>
> >> On Mon, Dec 6, 2010 at 12:22 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
> >>> Hi Andy,
> >>>
> >>> Check out the SimpleAlignedSequence class, for how Gaps are handled...
> >>> Does that help?
> >>>
> >>> Andreas
> >>>
> >>> On Mon, Dec 6, 2010 at 7:13 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
> >>>> Chris and I have discussed this off list, and we believe it's
> >>>> because of a NULL compound element in the Sequence given to the
> >>>> SequenceMixin method.
> >>>>
> >>>> Does anyone on list know how the AlignedSequence code encodes gaps &
> >>>> the like?
> >>>>
> >>>> Andy
> >>>>
> >>>> On 6 Dec 2010, at 13:50, Andy Yates wrote:
> >>>>
> >>>>> Hi Chris,
> >>>>>
> >>>>> Well, that's going into my toStringBuilder() method & that particular
> >>>>> line is concerned with asking a compound for its String
> >>>>> representation. How often do we get nulls in our Sequences, and how
> >>>>> should we deal with them? After all, the Sequence AGTCNULLAGTC is
> >>>>> probably more harmful than helpful.
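
The two obvious ways to deal with nulls when building the string can be sketched as below: substitute a gap symbol, or fail fast and report the position. This is a minimal sketch under stated assumptions, not the SequenceMixin code. NullSafeRender, render() and renderStrict() are hypothetical names, and compounds are modelled as plain Strings rather than BioJava Compound objects.

```java
import java.util.List;

/** Hypothetical helper showing two ways to treat null compounds. */
class NullSafeRender {

    /** Lenient variant: every null entry is rendered as the given gap symbol. */
    static String render(List<String> compounds, String gapSymbol) {
        StringBuilder sb = new StringBuilder(compounds.size());
        for (String c : compounds) {
            sb.append(c == null ? gapSymbol : c);
        }
        return sb.toString();
    }

    /** Strict variant: a null entry is treated as a bug and reported with its index. */
    static String renderStrict(List<String> compounds) {
        StringBuilder sb = new StringBuilder(compounds.size());
        for (int i = 0; i < compounds.size(); i++) {
            String c = compounds.get(i);
            if (c == null) {
                throw new IllegalStateException("null compound at index " + i);
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, NullSafeRender.render(java.util.Arrays.asList("T", "C", null, "G"), "-") yields "TC-G", while renderStrict on the same list throws an IllegalStateException naming index 2. Which behaviour is preferable depends on whether a null is meant to stand for a gap or signals a missing compound in the set.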
> >>>>>
> >>>>> Andy
> >>>>>
> >>>>> On 6 Dec 2010, at 12:45, Chris Friedline wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> Found another potential error case, this time in beta2 (fresh pull
> >>>>>> from git last evening).  For more info, please see
> >>>>>> http://pastie.org/1351388 for test case and stack trace.  The JUnit
> >>>>>> test passes simply because the pair object is not null, but fails
> >>>>>> when trying to extract any information from the pair itself
> >>>>>> (toString(), getIdenticals(), etc). The substitution matrix file is
> >>>>>> from ftp://ftp.ncbi.nih.gov/blast/matrices.  I'm doing large numbers
> >>>>>> of pairwise alignments, which do not all fail, but most do with this
> >>>>>> same error.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>>
> >>>>>> --
> >>>>>> PhD Candidate, Integrative Life Sciences
> >>>>>> Virginia Commonwealth University
> >>>>>> Richmond, VA
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>>>
> >>>> --
> >>>> Andrew Yates                   Ensembl Genomes Engineer
> >>>> EMBL-EBI                       Tel: +44-(0)1223-492538
> >>>> Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
> >>>> Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> -----------------------------------------------------------------------
> >>> Dr. Andreas Prlic
> >>> Senior Scientist, RCSB PDB Protein Data Bank
> >>> University of California, San Diego
> >>> (+1) 858.246.0526
> >>> -----------------------------------------------------------------------


