[Bioperl-l] new modules for searching for patterns in FASTA files

markus.riester at student.uni-tuebingen.de
Tue Aug 9 15:32:09 EDT 2005


With a cheap trick, yes: we split the FASTA file into two files, the IDs in one
file and the sequences (one per line) in the second.

This should be OK for cDNA/protein FASTA files. I am still writing tests, so
maybe some serious problems with the characters-per-line limitations will show
up, but it did look good in the first tests.
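A minimal Python sketch of the split described above, assuming plain FASTA input with `>` header lines and keeping only the first word of each header as the ID. The function names `read_fasta` and `split_fasta` are hypothetical; this is not the code in Weigel-Search-0.03. The demo also shows why the trick answers the newline question: a pattern wrapped across two lines of the raw file is invisible to a line-based search, but is found in the one-sequence-per-line file, and the line number maps back to the ID.

```python
"""Sketch: split FASTA into an ID file and a one-sequence-per-line file.

Hypothetical helper names; illustrates the idea, not the Weigel-Search code.
"""
import io
from typing import Iterable, List, TextIO, Tuple


def read_fasta(handle: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (id, sequence) pairs, joining wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    seq_id = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # Assumption: the first word of the header line is the ID.
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def split_fasta(fasta: Iterable[str], id_out: TextIO, seq_out: TextIO) -> None:
    """Write IDs to id_out and sequences, one per line, to seq_out.

    Line N of id_out names the sequence on line N of seq_out.
    """
    for seq_id, seq in read_fasta(fasta):
        id_out.write(seq_id + "\n")
        seq_out.write(seq + "\n")


fasta_text = ">seq1 example\nACGTAC\nGGTTAA\n>seq2\nTTTTGG\n"
ids, seqs = io.StringIO(), io.StringIO()
split_fasta(io.StringIO(fasta_text), ids, seqs)

pattern = "ACGG"  # spans the line break inside seq1
# A line-by-line search of the raw FASTA text misses the match...
print(any(pattern in line for line in fasta_text.splitlines()))  # False
# ...but the one-line-per-sequence file finds it; the line index gives the ID.
id_lines = ids.getvalue().splitlines()
hits = [id_lines[i] for i, s in enumerate(seqs.getvalue().splitlines()) if pattern in s]
print(hits)  # ['seq1']
```

Because both files are written in the same order, no index structure is needed to map a hit back to its ID; the price is that each sequence becomes one very long line, which is exactly where line-length limits in external tools could bite.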

We don't use agrep anymore, because vmatch is really, really good. Only with
many mismatches and short query sequences does agrep seem to be a bit faster.
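To make "searching with mismatches" concrete, here is a brute-force Python sketch that reports every offset where a pattern matches with at most `k` substitutions (Hamming distance only, no insertions or deletions). The name `find_with_mismatches` is hypothetical, and this is only an illustrative baseline: it is not the algorithm used by agrep or vmatch, both of which are far more efficient than this.

```python
"""Sketch: brute-force pattern search allowing up to k mismatches.

Substitutions only (Hamming distance). Illustrative baseline, NOT the
algorithm used by agrep or vmatch.
"""
from typing import List


def find_with_mismatches(text: str, pattern: str, k: int) -> List[int]:
    """Return start offsets where pattern matches text with <= k mismatches."""
    m = len(pattern)
    hits: List[int] = []
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window can no longer match; skip the rest
        if mismatches <= k:
            hits.append(start)
    return hits


print(find_with_mismatches("ACGTACGGTTAA", "ACGA", 1))  # → [0, 4]
```

Each window costs up to `len(pattern)` comparisons, and a larger `k` means fewer windows can be abandoned early, so the cost of approximate search grows with the number of allowed mismatches.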
 
markus

"Aaron J. Mackey" <amackey at pcbi.upenn.edu> wrote:

> Out of curiosity, are your patterns allowed to cross newlines  
> embedded in the FASTA file?  This is the typical problem with using  
> grep/agrep directly with sequence files ...
> 
> -Aaron
> 
> On Aug 8, 2005, at 1:12 PM, <markus.riester at student.uni-tuebingen.de>  
> <markus.riester at student.uni-tuebingen.de> wrote:
> 
> >
> > Hi,
> >
> > I've made some modules for searching for patterns in FASTA files with
> > different (really fast) backends like agrep and vmatch. I don't think
> > you want to include this in standard BioPerl, but we think it is useful
> > code and we'd like to share it on CPAN. The main reason for this email
> > is to start a discussion about the right namespace for this module.
> > What do you think?
> >
> > Markus
> >
> > (I hope the attachment reaches the mailing list; if not, please email
> > me if you are interested in this code.)
> >
> >
> > <Weigel-Search-0.03.tar.gz>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Aaron J. Mackey, Ph.D.
> Project Manager, ApiDB Bioinformatics Resource Center
> Penn Genomics Institute, University of Pennsylvania
> email:  amackey at pcbi.upenn.edu
> office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> fax:    215-746-6697
> postal: Penn Genomics Institute
>          Goddard Labs 212
>          415 S. University Avenue
>          Philadelphia, PA  19104-6017
> 
> 



-- 





More information about the Bioperl-l mailing list