[Bioperl-l] a problem when using the Bio::DB::Fasta

Tue Aug 24 15:54:20 UTC 2010

Please keep all responses on-list.  

Regarding sreformat:

http://tinyurl.com/28q75rr
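
If installing sreformat is a hurdle, the rewrapping it does for this case can be approximated with plain awk. The following is only a sketch under a few assumptions: rewrap_fasta is a hypothetical helper name (not part of BioPerl or sreformat), the width is whatever you pass in, and each sequence is held in memory in full before it is printed, which should be acceptable for chromosome-sized records but is not tuned for speed.

```shell
# Hypothetical helper (not a BioPerl tool): rewrap_fasta WIDTH < in.fa > out.fa
# Joins the sequence lines of each record and prints them WIDTH characters
# per line, so every line except the last of a record has the same length.
rewrap_fasta() {
    awk -v w="$1" '
        # Print the accumulated sequence in chunks of w characters.
        function emit(   i, n) {
            n = length(seq)
            for (i = 1; i <= n; i += w) print substr(seq, i, w)
            seq = ""
        }
        # Header line: write out the previous record, then the header itself.
        /^>/ { emit(); print; next }
        # Sequence line: strip whitespace (blank lines vanish) and accumulate.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END { emit() }
    '
}
```

For example, `rewrap_fasta 60 < ORIGINAL > NEW` followed by `mv NEW ORIGINAL` mirrors the two-step pattern in Jason's instructions quoted below.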

Judging by the stack traces below, you are running on a UNIX-like system.  To concatenate files, use 'cat'.  For example, for all files ending in .fa:

cat *.fa >> all.fa
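
One caveat with the one-liner above: '>>' appends, and all.fa itself matches *.fa, so running it a second time adds the sequences to the merged file again. A slightly more defensive sketch follows. It assumes nothing beyond a POSIX shell and awk; merge_fasta is a hypothetical helper name, and dropping empty lines is my own choice, made because a trailing blank line was reported in the files.

```shell
# Hypothetical helper: merge_fasta OUT IN...
# Concatenates the input FASTA files into OUT, overwriting OUT each time.
merge_fasta() {
    out=$1
    shift
    for f in "$@"; do
        # NF is 0 on empty lines, so those are skipped. awk also ends the
        # final line with a newline if the input lacks one, so the next
        # ">" header always starts on a line of its own.
        awk 'NF { print }' "$f"
    done > "$out"
}
```

Usage, with the output written one directory up so that the *.fa glob never picks it up:

merge_fasta ../all_chromosomes.fasta *.fa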

chris

On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:

> Hello Fields,
>  
> I have checked the FASTA files. I found that the last line is a blank line, and the second-to-last line is shorter than the others.
>  
> I am not able to run the command Jason advised because I am not familiar with "sreformat".
>  
> I also have one more question: I want to merge the several single-chromosome sequence files into one. Is that OK?
>  
> Thank you very much.
>  
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Guifeng,
> 
> Did you follow Jason's advice yesterday about converting the FASTA files to a consistent line length?  Or checking the database itself?  Both points were reiterated by Florent and Peter.
> 
> From Jason's last response:
> 
> -------------------------
> Wei -
> 
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> 
> The sequence lines in the fasta file aren't all the same length.
> 
> You need to run this:
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> 
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
> -------------------------
> 
> chris
> 
> 
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> 
> > Hi,
> >
> > I have revised my scripts according to the previous email from Florent.
> > However, there are still some errors, which frustrate me a great deal.
> >
> > The errors are as follows:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> >   Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> >
> > The directory /home/wgf/elegans190.dna/ contains 6 files, each holding
> > the complete sequence of a single chromosome in FASTA format. The
> > extension of the FASTA files is .fa. Each file starts with a
> > ">chromosoemeXXX" header line followed by thousands of sequence lines.
> >
> > Nevertheless, it warns me that "Each line of the fasta entry must be the
> > same length except the last" and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> >
> > I am very confused by this, so I am asking for help.
> >
> > Wei Guifeng
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> 
> -- 
> 危贵峰 Wei Guifeng
> 
> 
>