[Bioperl-l] a problem when using the Bio::DB::Fasta

Peter biopython at maubp.freeserve.co.uk
Tue Aug 24 13:28:33 UTC 2010


On Tue, Aug 24, 2010 at 12:28 PM, Guifeng Wei <guifengwei at gmail.com> wrote:
> Hi,
>
> I have revised my scripts according to the previous email from Florent.
> However, there are still some errors, which is very frustrating.
>
> The errors are as follows:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Each line of the fasta entry must be the same length except the last.
>    Line above #301451 '
> ..' is 22 != 51 chars.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> STACK: Bio::DB::Fasta::calculate_offsets
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> STACK: Bio::DB::Fasta::index_dir
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> STACK: Bio::DB::Fasta::new
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> STACK: bed2fasta.pl:13
> -----------------------------------------------------------
> indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory.index at
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> But the directory /home/wgf/elegans190.dna/ contains 6 files, each
> holding the complete sequence of a single chromosome in FASTA format.
> The extension of the FASTA files is .fa. Every file starts with
> ">chromosoemeXXX" followed by thousands of sequence lines.
>
> It therefore warns me that "Each line of the fasta entry must be the
> same length except the last" and "indexing was interrupted, so unlinking
> /home/wgf/elegans190.dna//directory".
>
> I am quite confused by this, so I am asking for help.
>
> Wei Guifeng

Hi Wei,

It sounds like there is inconsistent line wrapping in your FASTA file.
This is often not a problem at all, but the DB indexing system (and
indeed other indexing tools like the samtools fasta index) requires
that every sequence line within an entry has the same length, apart
from the last line of that entry.

e.g. This is a valid FASTA file but would not be suitable for indexing:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGT
ACGT
T

Ignoring the final line (a special case, here of length one), this
entry uses a mixture of line lengths, 8 and 4. If you had wrapped it
like this instead, it should be fine:

>Test
ACGTACGT
ACGTACGT
ACGTACGT
ACGTACGT
T

All the lines are now wrapped at length 8 (and the final line is
less than or equal to length 8).
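
If it helps to locate where a file breaks this rule, here is a rough
sketch in plain Python (standard library only). The function name
find_bad_lines is made up for illustration, and this does not reproduce
the exact check Bio::DB::Fasta performs; it simply takes the first
sequence line of each entry as the expected width and reports any line
that is longer, or shorter without being the last line of its entry.
Blank lines are treated as sequence lines of length zero, which is my
assumption.

```python
import sys


def find_bad_lines(lines):
    """Return (line_number, length, expected) for lines that break the wrapping."""
    problems = []
    expected = None    # width of the first sequence line in the current entry
    short_seen = None  # (line_number, length) of a short line, fine only if last
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # New entry: the previous short line (if any) was its last line.
            expected = None
            short_seen = None
            continue
        if expected is None:
            expected = len(line)
            continue
        if short_seen is not None:
            # Another sequence line follows, so the short line was not the last.
            problems.append((short_seen[0], short_seen[1], expected))
            short_seen = None
        if len(line) > expected:
            problems.append((number, len(line), expected))
        elif len(line) < expected:
            short_seen = (number, len(line))
    return problems


if __name__ == "__main__":
    # Usage (file names are up to you): python check_wrap.py file1.fa file2.fa
    for path in sys.argv[1:]:
        with open(path) as handle:
            for number, length, expected in find_bad_lines(handle):
                print(f"{path}:{number}: {length} chars, expected {expected}")
```

Run on the first example above, this should flag the two lines of
length 4; on the second example it should report nothing.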

Of course, in a real file wrapping at 60 or 80 characters is more
common ;)
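
One way to repair a file is to re-wrap every entry at a fixed width
before indexing. The following is only a sketch in plain Python, not
part of BioPerl or Biopython; the name rewrap_fasta and the default
width of 60 are my own choices. It holds one entry in memory at a time,
which I am assuming is acceptable for a single chromosome, and it drops
blank lines.

```python
import sys


def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with every sequence re-wrapped at a fixed width."""
    chunks = []  # sequence pieces of the entry currently being read

    def flush():
        # Join the collected pieces and emit them in slices of `width`.
        sequence = "".join(chunks)
        chunks.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            yield from flush()  # finish the previous entry, if any
            yield line          # header lines are passed through unchanged
        elif line:
            chunks.append(line)
    yield from flush()          # finish the last entry


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are up to you): python rewrap.py input.fa output.fa
    source, target = sys.argv[1:3]
    with open(source) as infile, open(target, "w") as outfile:
        for line in rewrap_fasta(infile):
            outfile.write(line + "\n")
```

Called with width=8 on the first example above, this should produce the
second example. Write the result to a new file rather than over the
original, and index the re-wrapped copies.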

Peter



