[Bioperl-l] SeqIO - masked seqs

Nathan Haigh nathanhaigh at ukonline.co.uk
Thu Mar 17 04:06:28 EST 2005


Without going back and double-checking, I think this is how things stand
with the current CVS (and probably the 1.5 release). There was a
modification in the module that tries to guess the alphabet of the
sequence in question: X was added to the set of characters that are
removed from the sequence before the guess is attempted. As a result, a
fully masked sequence has no characters left to guess from, which
produces the error shown. I think the fix I implemented was in
Bio::SeqIO::fasta, which lets you set the alphabet manually so that
Bioperl does not try to guess it.
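
To make the failure concrete, here is a rough sketch of the behaviour
described above. It is not the actual Bioperl source: the sub name
guess_alphabet_sketch, the exact character set and the 0.85 cut-off are
assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the guessing step; NOT the Bioperl source.
sub guess_alphabet_sketch {
    my ($str) = @_;

    # Drop characters that carry no alphabet information (gaps, masks).
    $str =~ s/[-.?xX]//g;

    # A fully masked sequence is empty at this point, so nothing is left
    # to guess from.
    die "Got a sequence with no letters in it cannot guess alphabet [$str]\n"
        if length($str) == 0;

    # Crude heuristic (assumed cut-off): mostly ACGTUN means nucleotide.
    my $nuc = ($str =~ tr/ACGTUNacgtun//);
    return $nuc / length($str) > 0.85 ? 'dna' : 'protein';
}

# Try a partly masked, a fully masked, and a "masked plus one base" read.
for my $s ('ACGTXXXXACGT', 'XXXXXXXX', 'XXXXXXXXA') {
    my $alphabet = eval { guess_alphabet_sketch($s) };
    print defined $alphabet ? "$s: $alphabet\n" : "$s: failed: $@";
}
```

This would also be consistent with the observation quoted below that
tacking a single valid base onto the masked read is enough for it to be
read.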

Something like this should circumvent the problem:

use Bio::SeqIO;

# Set the alphabet explicitly so that Bioperl does not try to guess it
my $in = Bio::SeqIO->new(-file     => "inputfilename",
                         -format   => 'Fasta',
                         -alphabet => 'dna');
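
Building on that constructor call, a minimal sketch of reading a whole
file and reporting how much of each read is masked might look like the
following. It assumes Bioperl is installed; masked_summary is a
hypothetical helper name, not part of Bioperl, and the tab-separated
output is just one possible layout.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: one [id, length, masked_count] triple per record.
sub masked_summary {
    my ($path) = @_;

    # Setting -alphabet means no guess is needed for all-X reads.
    my $in = Bio::SeqIO->new(-file     => $path,
                             -format   => 'Fasta',
                             -alphabet => 'dna');

    my @rows;
    while (my $seq = $in->next_seq) {
        my $residues = $seq->seq;
        $residues = '' unless defined $residues;

        # tr/// with an empty replacement only counts the matches.
        my $masked = ($residues =~ tr/Xx//);
        push @rows, [$seq->display_id, length($residues), $masked];
    }
    return @rows;
}

# Usage: perl script.pl inputfilename
if (@ARGV) {
    printf "%s\t%d\t%d\n", @$_ for masked_summary($ARGV[0]);
}
```

Reads whose masked count equals their length are the fully masked ones,
so they can be skipped or counted separately downstream.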

Let us know how you get on.
Nathan


chauser wrote:

> Hi Marc,
>
> I updated to the current CVS and get the same error. If I tack a
> single valid base onto the offending clone (below), SeqIO reads it.
>
> # $Id: README,v 1.37 2005/03/01 16:56:02 amackey Exp $
>
> o Version
>
> This is Bioperl version 1.5 from CVS HEAD
>
>
>
> >1115008E10.y1 CHROMAT_FILE: 1115008E10.y1 PHD_FILE: 1115008E10.y1.phd.1 CHEM: term
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXX
>
>
>
> ------------- EXCEPTION -------------
> MSG: Got a sequence with no letters in it cannot guess alphabet []
> STACK Bio::PrimarySeq::_guess_alphabet 
> /usr/local/src/bioperl/core/Bio/PrimarySeq.pm:837
> STACK Bio::Seq::SeqFastaSpeedFactory::create 
> /usr/local/src/bioperl/core/Bio/Seq/SeqFastaSpeedFactory.pm:137
> STACK Bio::SeqIO::fasta::next_seq 
> /usr/local/src/bioperl/core/Bio/SeqIO/fasta.pm:143
> STACK main::RAW ESTcount.pl:81
> STACK toplevel ESTcount.pl:49
>
>
> Chuck
>
>
>
> On Mar 16, 2005, at 2:16 AM, Marc Logghe wrote:
>
>
>         All,
>
>         I ran into a glitch when reading sets of EST reads where some
>         reads are masked in their entirety - i.e. all bases are X's.
>         Is there a way to either modify the alphabet to accept X or
>         some other solution?
>
>
>     I was not able to trace the actual fix, but there was a thread in
>     December/January about that.
>     In one of the last messages Nathan was about to fix this:
>     http://bioperl.org/pipermail/bioperl-l/2005-January/017829.html
>
>     Brian added a comment on this alphabet() issue:
>     http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO.pm?cvsroot=bioperl
>     Have you tried bioperl release 1.5.0 or bioperl-release-1-5-0-rc2?
>     I guess it should be fixed there.
>     Is bioperl-release-1-5-0-rc2 behaving better than 1.5.0 with respect
>     to the Bio::SeqFeatureI architecture?
>     Marc
>
>