Bioperl: Re: Bio::Tools::Blast

Lincoln Stein lstein@cshl.org
Thu, 27 Aug 1998 10:33:58 -0400


I guess this is just another illustration of why it's useful to have
solid WRITTEN standards for file formats, not just informal standards
based on how some program written years ago parsed its input.

Lincoln

Rubin Eitan writes:
 > I would like to warn all Bio::PreSeq::parse_fasta() users. Some FASTA 
 > databases (such as RepBase, if I'm not mistaken) use non-\S characters 
 > in their naming scheme. Most FASTA parsers fail when they see
 > >gb|AC000254 blah blah blah
 > >AC000254_1 blah blah blah
 > 
 > In my case I work around this with sed 's/^>gb|//' etc. or with Perl 
 > scripts. It may pose a serious problem, though, if you want the 
 > Bio::PreSeq package to be universal.
 > 
 >        Eitan.  
 > 
 > 
 > ======================================================================
 > Eitan Rubin,
 > Plant Genetics, Weizmann Inst of Science, Rehovot, Israel.  
 > EMail: bcrubin@dapsas1.weizmann.ac.il
 > Tel: (00972)-(8)9342421 Fax: (00972)-(8)9344181
 > EitanR@BioMOO (http://bioinfo.weizmann.ac.il/BioMOO) - visit the
 > GCG help desk
 > 
 > in vivo -> in vitro -> in silico
 > ======================================================================
 > 
 > On Wed, 26 Aug 1998, Steve A. Chervitz wrote:
 > 
 > > 
 > > Lincoln, 
 > > 
 > > Spaces are not permitted in identifiers in Blast.pm. In the Fasta
 > > files I've seen, a space is used to separate the identifier from the 
 > > description line. Here's how Bio::PreSeq::parse_fasta() grabs the 
 > > identifier and description:
 > > 
 > > ($self->{"id"}, $self->{"desc"}) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
 > > 
 > > BTW, I just updated the Blast distribution (now 0.061). It includes   
 > > an important memory management fix that helps when crunching lots of 
 > > reports. 
 > > 
 > > Steve Chervitz
 > > sac@genome.stanford.edu
 > > 
 > > 
 > > On 26 Aug 1998, Lincoln Stein wrote:
 > > 
 > > > Hi Steve,
 > > > 
 > > > Does Blast.pm not deal correctly with sequence identifiers that
 > > > contain spaces?  I just tried to blast a database made from
 > > > identifiers like this:
 > > > 
 > > > >notch4 exon #1
 > > > atgcagccccagttgctgctgctgctgctcttgccactcaatttccctgtcatcctgacc
 > > > agag
 > > > 
 > > > >notch4 exon #2
 > > > agcttctgtgtggaggatccccagagccctgtgccaacggaggcacctgcctgaggctat
 > > > ctcggggacaagggatctgcca
 > > > 
 > > > >notch4 exon #3
 > > > gtgtgcccctggatttctgggtgagacttgccagtttcctgacccctgcagggataccca
 > > > actctgcaagaatggtggcagctgccaagccctgctccccacacccccaagctcccgtag
 > > > tcctacttctccactgacccctcacttctcctgcacctgcccctctggcttcaccggtga
 > > > tcgatgccaaacccatctggaagagctctgtccaccttctttctgttccaacgggggtca
 > > > ctgctatgttcaggcctcaggccgcccacagtgctcctgcgagcctgggtggacag
 > > > 
 > > > but I only got "notch4" as the hit.  When I changed the spaces to
 > > > dots, I got the full identifier.
 > > > 
 > > > I don't think the FASTA format forbids spaces in the identifiers.
 > > > 
 > > > Oh, this is with 0.06, just downloaded today.
 > > > 
 > > > Lincoln
 > > > 
 > > > -- 
 > > > ========================================================================
 > > > Lincoln D. Stein                           Cold Spring Harbor Laboratory
 > > > lstein@cshl.org			                  Cold Spring Harbor, NY
 > > > ========================================================================
 > > > 
 > > =========== Bioperl Project Mailing List Message Footer =======
 > > Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
 > > For info about how to (un)subscribe, where messages are archived, etc:
 > > http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
 > > ====================================================================
 > > 
 > 
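To make the behaviour discussed in the quoted messages concrete, here is a
minimal sketch that applies the pattern Steve quotes from
Bio::PreSeq::parse_fasta() to a few of the header lines mentioned in this
thread. The helper name split_fasta_header() is hypothetical and is not part
of Bio::PreSeq or Blast.pm; only the regular expression itself comes from the
thread.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Bio::PreSeq): split a FASTA header line
# into identifier and description with the pattern quoted in the thread.
sub split_fasta_header {
    my ($head) = @_;
    # Identifier = first run of non-whitespace after '>';
    # description = everything after the following spaces/tabs.
    my ($id, $desc) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
    return ($id, $desc);
}

for my $head ('>notch4 exon #1',
              '>notch4.exon.#1',
              '>gb|AC000254 blah blah blah') {
    my ($id, $desc) = split_fasta_header($head);
    printf "id=%s | desc=%s\n", $id, $desc;
}
```

With this pattern, ">notch4 exon #1" yields the identifier "notch4" and the
description "exon #1", which matches what Lincoln observed; replacing the
spaces with dots keeps the whole string in the identifier. Note that "|" is a
non-whitespace character, so this particular pattern keeps "gb|AC000254"
together as the identifier.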
-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org			                  Cold Spring Harbor, NY
========================================================================