[Bioperl-l] new modules for searching for patterns in FASTA files
markus.riester at student.uni-tuebingen.de
Tue Aug 9 15:32:09 EDT 2005
With a cheap trick, yes: we split each FASTA file into two files, with the IDs
in one file and the sequences (one per line) in the second.
This should be OK for cDNA/protein FASTA files. I am currently writing tests,
so some serious problems with the characters-per-line limitations may still
show up, but it looked good in the first tests.
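A minimal sketch of the split trick, assuming plain FASTA input where every header line starts with `>` and sequence lines may be wrapped. The names `read_fasta`, `split_fasta` and `find_pattern` are hypothetical and this is not the code from the attached Weigel-Search modules. Line i of the ID file corresponds to line i of the sequence file, so a match that spanned a line break in the original FASTA file now lies on a single line. The naive exact-match `find_pattern` only stands in for a real backend such as agrep or vmatch, which would be run on the sequence file instead and support mismatches.

```python
"""Split a FASTA file into parallel ID and sequence files.

Hypothetical illustration of the 'one sequence per line' trick;
not the actual Weigel-Search implementation.
"""
import io
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(fasta: TextIO, ids_out: TextIO, seqs_out: TextIO) -> int:
    """Write headers and sequences line-by-line in parallel; return record count."""
    count = 0
    for header, seq in read_fasta(fasta):
        ids_out.write(header + "\n")
        seqs_out.write(seq + "\n")
        count += 1
    return count


def find_pattern(pattern: str, ids: List[str], seqs: List[str]) -> List[Tuple[str, int]]:
    """Naive exact search (no mismatches); returns (id, 0-based offset) hits."""
    hits = []
    for rec_id, seq in zip(ids, seqs):
        start = seq.find(pattern)
        while start != -1:
            hits.append((rec_id, start))
            start = seq.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 example\nACGTAC\nGTTT\n>seq2\nTTACGT\n")
    ids_buf, seqs_buf = io.StringIO(), io.StringIO()
    n = split_fasta(fasta, ids_buf, seqs_buf)
    ids = ids_buf.getvalue().splitlines()
    seqs = seqs_buf.getvalue().splitlines()
    # The hit at offset 4 in seq1 spans the original line break (ACGTAC|GTTT).
    print(n, find_pattern("ACGT", ids, seqs))
```

Running it prints `2 [('seq1 example', 0), ('seq1 example', 4), ('seq2', 2)]`. The cost of the trick is very long lines for long sequences, which is where per-line length limits in the search tools could become a problem.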
We don't use agrep anymore, because vmatch is really, really good. agrep only
seems to be a bit faster with many mismatches and short query sequences.
markus
"Aaron J. Mackey" <amackey at pcbi.upenn.edu> wrote:
> Out of curiosity, are your patterns allowed to cross newlines
> embedded in the FASTA file? This is the typical problem with using
> grep/agrep directly with sequence files ...
>
> -Aaron
>
> On Aug 8, 2005, at 1:12 PM, <markus.riester at student.uni-tuebingen.de> wrote:
>
> >
> > Hi,
> >
> > I've made some modules for searching for patterns in FASTA files with
> > different (really fast) backends like agrep and vmatch. I don't think you
> > want to include this in standard BioPerl, but we think it is useful code
> > and we'd like to share it on CPAN. The main reason for this email is a
> > discussion about the right namespace for this module. What do you think?
> >
> > Markus
> >
> > (I hope the attachment reaches the mailing list; if not, please send me
> > an email if you are interested in this code)
> >
> >
> > <Weigel-Search-0.03.tar.gz>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email: amackey at pcbi.upenn.edu
> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> fax: 215-746-6697
> postal: Penn Genomics Institute
> Goddard Labs 212
> 415 S. University Avenue
> Philadelphia, PA 19104-6017
>
>