[Bioperl-l] Bio::SeqIO issue

Chris Fields cjfields at illinois.edu
Wed Aug 5 21:04:14 UTC 2009


On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:

> Is my impression correct that Bio::SeqIO just assumes that sequences are
> being submitted in FASTA format?

No. See:

http://www.bioperl.org/wiki/HOWTO:SeqIO

SeqIO tries to guess the format from the file extension and, if one  
isn't present, falls back on Bio::Tools::GuessSeqFormat.  It's  
possible that the extension is causing the problem, or that  
GuessSeqFormat is guessing wrong (it's apt to do that, since all it  
can do is guess).  In any case, it's always advisable to indicate  
the format explicitly when possible.

Relevant lines:

    return 'fasta'   if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
...
    return 'raw'     if /\.(txt)$/i;
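To show what those two rules do on their own, here is a simplified, self-contained sketch. It copies only the two quoted regexes into a hypothetical `guess_by_extension` sub (not a BioPerl function) and returns undef where the real module would go on to other rules or to content-based guessing:

```perl
use strict;
use warnings;

# Hypothetical, simplified sketch: only the two extension rules quoted
# above are reproduced; the real Bio::SeqIO check has many more.
sub guess_by_extension {
    my ($filename) = @_;
    local $_ = $filename;
    return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if /\.(txt)$/i;
    # undef stands in for "no extension match; guess from content instead"
    return undef;
}

print guess_by_extension('sample.seq'), "\n";   # matched by the FASTA rule
print guess_by_extension('notes.txt'), "\n";    # matched by the raw rule
```

Note that a sequence-only file named `*.seq` is matched by the first rule and handed to the FASTA parser, not the raw one; the match is also case-insensitive, so `reads.FA` counts as FASTA.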

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'.  Ones in FASTA are 'fasta'.
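A minimal sketch of naming the format explicitly, assuming BioPerl is installed. `read_ids` is a hypothetical helper, not part of BioPerl; it only wraps the documented `Bio::SeqIO->new(-file => ..., -format => ...)` / `next_seq` loop so that nothing depends on the file extension:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every record from $path with an explicitly
# named format, so Bio::SeqIO never has to guess from the extension.
sub read_ids {
    my ($path, $format) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => $format);
    my @ids;
    while (my $seq = $in->next_seq) {
        push @ids, $seq->display_id;   # the ID taken from each record
    }
    return @ids;
}
```

For instance, `read_ids('submission.txt', 'fasta')` parses a FASTA file even though its `.txt` extension would otherwise select 'raw'; pass 'raw' instead for files that contain only sequence.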

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your  
description.  Which version of BioPerl are you using, and on what OS?  
What does your data look like?  What file extension does it have?

chris

> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> Uwe Hilgert, Ph.D.
> Dolan DNA Learning Center
> Cold Spring Harbor Laboratory
>
> V: (516) 367-5185
> E: hilgert at cshl.edu <mailto:hilgert at cshl.edu>
> F: (516) 367-5182
> W: http://www.dnalc.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
