Bioperl: Re: Bio::Tools::Blast
Rubin Eitan
bcrubin@dapsas1.weizmann.ac.il
Thu, 27 Aug 1998 11:53:07 +0300 (IDT)
I would like to warn all Bio::PreSeq::parse_fasta() users. Some FASTA
databases (such as RepBase, if I'm not mistaken) use non-\w characters
such as '|' in their naming scheme. Most FASTA parsers fail when they see:
>gb|AC000254 blah blah blah
>AC000254_1 blah blah blah
In my case I work around this with sed 's/^>gb|/>/' etc. or with Perl
scripts. It may pose a serious problem, though, if you want the
Bio::PreSeq package to be universal.
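
As a rough illustration only, here is a minimal Perl sketch of that
workaround combined with the header pattern Steve quotes below. The helper
names split_header and strip_gb_tag are hypothetical and are not part of
Bio::PreSeq; the sketch assumes the only prefix to remove is a leading
"gb|" tag.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::PreSeq): split a FASTA header
# line into identifier and description, using the pattern quoted below.
sub split_header {
    my ($head) = @_;
    my ($id, $desc) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
    return ($id, $desc);
}

# Hypothetical helper: drop a leading "gb|" database tag but keep the ">".
# Assumption: "gb|" is the only prefix that needs stripping.
sub strip_gb_tag {
    my ($head) = @_;
    (my $clean = $head) =~ s/^>gb\|/>/;
    return $clean;
}

for my $head ('>gb|AC000254 blah blah blah', '>notch4 exon #1') {
    my ($id, $desc) = split_header(strip_gb_tag($head));
    print "id=[$id] desc=[$desc]\n";
}
```

With this pattern the identifier always ends at the first space or tab, so
a header such as ">notch4 exon #1" yields "notch4" as the identifier and
"exon #1" as the description.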
Eitan.
======================================================================
Eitan Rubin,
Plant Genetics, Weizmann Inst of Science, Rehovot, Israel.
EMail: bcrubin@dapsas1.weizmann.ac.il
Tel: (00972)-(8)9342421 Fax: (00972)-(8)9344181
EitanR@BioMOO (http://bioinfo.weizmann.ac.il/BioMOO) - visit the GCG help desk
in vivo -> in vitro -> in silico
======================================================================
On Wed, 26 Aug 1998, Steve A. Chervitz wrote:
>
> Lincoln,
>
> Spaces are not permitted in identifiers in Blast.pm. In the Fasta
> files I've seen, a space is used to separate the identifier from the
> description line. Here's how Bio::PreSeq::parse_fasta() grabs the
> identifier and description:
>
> ($self->{"id"}, $self->{"desc"}) = $head =~ /^>[ \t]*(\S*)[ \t]*(.*)$/;
>
> BTW, I just updated the Blast distribution (now 0.061). It includes
> an important memory management fix that helps when crunching lots of
> reports.
>
> Steve Chervitz
> sac@genome.stanford.edu
>
>
> On 26 Aug 1998, Lincoln Stein wrote:
>
> > Hi Steve,
> >
> > Does Blast.pm not deal correctly with sequence identifiers that
> > contain spaces? I just tried to blast a database made from
> > identifiers like this:
> >
> > >notch4 exon #1
> > atgcagccccagttgctgctgctgctgctcttgccactcaatttccctgtcatcctgacc
> > agag
> >
> > >notch4 exon #2
> > agcttctgtgtggaggatccccagagccctgtgccaacggaggcacctgcctgaggctat
> > ctcggggacaagggatctgcca
> >
> > >notch4 exon #3
> > gtgtgcccctggatttctgggtgagacttgccagtttcctgacccctgcagggataccca
> > actctgcaagaatggtggcagctgccaagccctgctccccacacccccaagctcccgtag
> > tcctacttctccactgacccctcacttctcctgcacctgcccctctggcttcaccggtga
> > tcgatgccaaacccatctggaagagctctgtccaccttctttctgttccaacgggggtca
> > ctgctatgttcaggcctcaggccgcccacagtgctcctgcgagcctgggtggacag
> >
> > but I only got "notch4" as the hit. When I changed the spaces to
> > dots, I got the full identifier.
> >
> > I don't think the FASTA format forbids spaces in the identifiers.
> >
> > Oh, this is with 0.06, just downloaded today.
> >
> > Lincoln
> >
> > --
> > ========================================================================
> > Lincoln D. Stein Cold Spring Harbor Laboratory
> > lstein@cshl.org Cold Spring Harbor, NY
> > ========================================================================
> >
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
>