Bioperl: Re: Bio::Tools::Blast

Rubin Eitan bcrubin@dapsas1.weizmann.ac.il
Thu, 27 Aug 1998 11:53:07 +0300 (IDT)


I would like to warn all Bio::PreSeq::parse_fasta() users. Some FASTA 
databases (such as RepBase, if I'm not mistaken) use non-\S characters 
in their naming scheme. Most FASTA parsers fail when they see headers like
>gb|AC000254 blah blah blah
>AC000254_1 blah blah blah

In my case I work around this with sed 's/^>gb|/>/' etc. (keeping the 
leading '>') or with Perl scripts. It may pose a serious problem, though, 
if you want the Bio::PreSeq package to be universal.
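
For illustration, here is a minimal sketch of what the header pattern 
quoted further down in this thread does with headers like these. The 
helper name split_fasta_header is hypothetical and is not part of 
Bio::PreSeq; only the regular expression is taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::PreSeq): applies the header
# pattern quoted in this thread to a single FASTA header line and
# returns the identifier and the description.
sub split_fasta_header {
    my ($head) = @_;
    # The identifier is the first run of non-whitespace after '>';
    # everything after the following spaces/tabs is the description.
    my ($id, $desc) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
    return ($id, $desc);
}

for my $head ('>gb|AC000254 blah blah blah',
              '>AC000254_1 blah blah blah',
              '>notch4 exon #1') {
    my ($id, $desc) = split_fasta_header($head);
    print "id=[$id] desc=[$desc]\n";
}
```

With this pattern the "gb|" prefix stays inside the identifier and the 
identifier ends at the first space, so ">notch4 exon #1" yields the id 
"notch4". Whether downstream tools accept such identifiers is a separate 
question that this sketch does not answer.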

       Eitan.  


======================================================================
Eitan Rubin,
Plant Genetics, Weizmann Inst of Science, Rehovot, Israel.  
EMail: bcrubin@dapsas1.weizmann.ac.il
Tel: (00972)-(8)9342421 Fax: (00972)-(8)9344181
EitanR@BioMOO (http://bioinfo.weizmann.ac.il/BioMOO) - visit the GCG help desk

in vivo -> in vitro -> in silico
======================================================================

On Wed, 26 Aug 1998, Steve A. Chervitz wrote:

> 
> Lincoln, 
> 
> Spaces are not permitted in identifiers in Blast.pm. In the Fasta
> files I've seen, a space is used to separate the identifier from the 
> description line. Here's how Bio::PreSeq::parse_fasta() grabs the 
> identifier and description:
> 
> ($self->{"id"}, $self->{"desc"}) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
> 
> BTW, I just updated the Blast distribution (now 0.061). It includes   
> an important memory management fix that helps when crunching lots of 
> reports. 
> 
> Steve Chervitz
> sac@genome.stanford.edu
> 
> 
> On 26 Aug 1998, Lincoln Stein wrote:
> 
> > Hi Steve,
> > 
> > Does Blast.pm not deal correctly with sequence identifiers that
> > contain spaces?  I just tried to blast a database made from
> > identifiers like this:
> > 
> > >notch4 exon #1
> > atgcagccccagttgctgctgctgctgctcttgccactcaatttccctgtcatcctgacc
> > agag
> > 
> > >notch4 exon #2
> > agcttctgtgtggaggatccccagagccctgtgccaacggaggcacctgcctgaggctat
> > ctcggggacaagggatctgcca
> > 
> > >notch4 exon #3
> > gtgtgcccctggatttctgggtgagacttgccagtttcctgacccctgcagggataccca
> > actctgcaagaatggtggcagctgccaagccctgctccccacacccccaagctcccgtag
> > tcctacttctccactgacccctcacttctcctgcacctgcccctctggcttcaccggtga
> > tcgatgccaaacccatctggaagagctctgtccaccttctttctgttccaacgggggtca
> > ctgctatgttcaggcctcaggccgcccacagtgctcctgcgagcctgggtggacag
> > 
> > but I only got "notch4" as the hit.  When I changed the spaces to
> > dots, I got the full identifier.
> > 
> > I don't think the FASTA format forbids spaces in the identifiers.
> > 
> > Oh, this is with 0.06, just downloaded today.
> > 
> > Lincoln
> > 
> > -- 
> > ========================================================================
> > Lincoln D. Stein                           Cold Spring Harbor Laboratory
> > lstein@cshl.org                                   Cold Spring Harbor, NY
> > ========================================================================
> > 
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 