[Bioperl-l] new modules for searching for patterns in fasta-files

markus.riester at student.uni-tuebingen.de markus.riester at student.uni-tuebingen.de
Tue Aug 9 18:40:11 EDT 2005


update:

the tests are written and look good. agrep finds matches at the end of the
longest Arabidopsis cDNA sequence (16 kb).

(but the tests exposed some serious bugs in version 0.03, the one in the first
attachment. they are all fixed in this attachment.)

markus


markus.riester at student.uni-tuebingen.de wrote:

> with a cheap trick, yes: split the fasta file into two files, ids in one file,
> sequences (one per line) in the second.
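
A minimal sketch of the split described above, assuming plain FASTA input with ">" header lines. The function name `split_fasta` and the choice to keep only the first word of each header as the ID are illustrative assumptions, not taken from the attached module.

```python
import io


def split_fasta(fasta_handle, ids_handle, seqs_handle):
    """Write one ID per line and one unwrapped sequence per line.

    Line N of the ID output corresponds to line N of the sequence output.
    Returns the number of records written.
    """
    chunks = []
    have_record = False
    count = 0
    for line in fasta_handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if have_record:
                # Flush the previous record as a single line.
                seqs_handle.write("".join(chunks) + "\n")
            fields = line[1:].split()
            ids_handle.write((fields[0] if fields else "") + "\n")
            chunks = []
            have_record = True
            count += 1
        elif have_record and line.strip():
            chunks.append(line.strip())
    if have_record:
        seqs_handle.write("".join(chunks) + "\n")
    return count


# Small in-memory demonstration with two wrapped records.
fasta = io.StringIO(">seq1 first record\nACGTAC\nGTTTGA\n>seq2\nCCCC\nGGGG\n")
ids, seqs = io.StringIO(), io.StringIO()
n = split_fasta(fasta, ids, seqs)
```

With the sequences stored one per line, a line-oriented tool can report a matching line number, and the same line number in the ID file names the record.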
> 
> this should be ok for cdna/protein fasta files (but I am currently writing
> tests; maybe some serious problems with the chars-per-line limitations will
> show up, but it did look good in some first tests.)
> 
> we don't use agrep anymore, because vmatch is really, really good. only with
> many mismatches and short query sequences, agrep seems to be a bit faster. 
>  
> markus
> 
> "Aaron J. Mackey" <amackey at pcbi.upenn.edu> wrote:
> 
> > Out of curiosity, are your patterns allowed to cross newlines  
> > embedded in the FASTA file?  This is the typical problem with using  
> > grep/agrep directly with sequence files ...
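
The newline problem raised here can be illustrated with a short sketch. The helper names `line_hits` and `record_hits` are hypothetical, and exact substring matching stands in for grep/agrep purely for illustration.

```python
def line_hits(fasta_text, pattern):
    """Grep-style search: test each physical sequence line on its own."""
    return [
        ln for ln in fasta_text.splitlines()
        if not ln.startswith(">") and pattern in ln
    ]


def record_hits(fasta_text, pattern):
    """Join the wrapped lines of each record before testing the pattern."""
    hits = []
    ident, chunks = None, []
    # A trailing ">" acts as a sentinel header that flushes the last record.
    for ln in fasta_text.splitlines() + [">"]:
        if ln.startswith(">"):
            if ident is not None and pattern in "".join(chunks):
                hits.append(ident)
            ident, chunks = ln[1:].strip(), []
        else:
            chunks.append(ln.strip())
    return hits


wrapped = ">seq1\nACGTAC\nGTTTGA\n"
pattern = "TACGTT"  # spans the line break between ACGTAC and GTTTGA

missed = line_hits(wrapped, pattern)    # no single line contains the pattern
found = record_hits(wrapped, pattern)   # the joined record does
```

In this example the line-by-line search returns nothing, while the search over the joined record reports `seq1`.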
> > 
> > -Aaron
> > 
> > On Aug 8, 2005, at 1:12 PM, <markus.riester at student.uni-tuebingen.de>  
> > <markus.riester at student.uni-tuebingen.de> wrote:
> > 
> > >
> > > Hi,
> > >
> > > I've made some modules for searching for patterns in fasta files with
> > > different (really fast) backends like agrep and vmatch.  I don't think you
> > > want to include this in standard bioperl. But we think it is useful code and
> > > we'd like to share it on cpan. The main reason for this email is a discussion
> > > about the right namespace for this module. What do you think?
> > >
> > > Markus
> > >
> > > (hope the attachment reaches the mailing list; if not, please send me a
> > > mail if you are interested in this code)
> > >
> > >
> > > <Weigel-Search-0.03.tar.gz>
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > --
> > Aaron J. Mackey, Ph.D.
> > Project Manager, ApiDB Bioinformatics Resource Center
> > Penn Genomics Institute, University of Pennsylvania
> > email:  amackey at pcbi.upenn.edu
> > office: 215-898-1205 (Goddard) / 215-746-7018 (PCBI)
> > fax:    215-746-6697
> > postal: Penn Genomics Institute
> >          Goddard Labs 212
> >          415 S. University Avenue
> >          Philadelphia, PA  19104-6017
> > 
> > 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/x-gzip
Size: 36342 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/bioperl-l/attachments/20050809/1a48a1f2/attachment-0001.bin

