[BioRuby] RegEx search example fasta file

pjotr at pckassa.com pjotr at pckassa.com
Wed Mar 24 01:24:16 EST 2004


Thanks! I'll have a look and will improve the Wiki to cover that. Pays
off immediately ;-).

Pj.

On Wed, Mar 24, 2004 at 11:58:20AM +0900, Toshiaki Katayama wrote:
> On 2004/03/21, at 22:33, pjotr at pckassa.com wrote:
> >Can this go in the sample directory of bioruby - I have added it to
> >the Wiki. Comments welcome.
> 
> As for the wiki page: in the original BJIA
> (http://www.biojava.org/docs/bj_in_anger/FastaParser.htm),
> this section answers how to parse fasta results.
> 
> Because Bio::FlatFile.auto in BioRuby detects the database format
> automatically and entry.definition is implemented in various DB
> classes, your approach of finding entries by regexp is not limited
> to FastaFormat, as the following shows:
> 
> % re_grep_def.rb 'serine.* kinase' genbank/gb*.seq
> % re_grep_def.rb 'serine.* kinase' kegg/genes/*.ent
> % re_grep_def.rb 'serine.* kinase' kegg/sequences/*.pep
> 
> ----------------------------------------------
> #!/usr/bin/env ruby
> 
> require 'bio'
> 
> re = /#{ARGV.shift}/i
> 
> Bio::FlatFile.auto(ARGF) do |ff|
>   ff.each do |entry|
>     if re.match(entry.definition)
>       puts ff.entry_raw
>     end
>   end
> end
> ----------------------------------------------
> 
> 
> -k
> 
> 
> 
> >
> >Pj.
> >
> >
> >#! /usr/bin/ruby
> >#
> >#   $Id: fastasearch,v 1.1 2004/03/21 13:18:41 wrk Exp $
> >#   $Source: /home/cvs/home/pjotr/lwrk/luw/fasta/fastasearch,v $
> >#
> >
> ># require 'profile'
> >
> >COPYRIGHT = "GPL (c) 2003-2004"
> >
> >usage = <<USAGE
> >
> >    Search fasta file(s) tags using a regular expression (regex)
> >
> >    Usage: fastasearch [-q query] filename(s)
> >
> >    Example:
> >
> >      ruby fastasearch -q '([Hh]uman|[Hh]omo sapiens)' nr.fa
> >
> >    For more information see
> >
> >        http://thebird.nl/bioinformatics/
> >	
> >    Pjotr Prins
> >    Wageningen University and Research Centre
> >    http://www.wur.nl/
> >    http://www.dpw.wageningen-ur.nl/nema/
> >
> >USAGE
> >
> ># --------------------------------------------------------------------
> >
> >srcpath=File.dirname($0)
> >libpath=File.dirname(srcpath)+'/lib'
> >$: << srcpath         # ---- Add start path to search libraries
> >$: << libpath
> >
> >require 'getoptlong'
> >require 'bio'
> >
> ># ---- Parse command line
> >opts = GetoptLong.new(
> > [ "--help", "-h", GetoptLong::NO_ARGUMENT ],
> > [ "--query", "-q", GetoptLong::REQUIRED_ARGUMENT ]
> >)
> >
> >do_help       = false
> >query=nil
> >
> >opts.each do | opt, arg |
> >   do_help   |= (opt == '--help')
> >   query = arg if (opt == '--query')
> >end
> >
> ># ---- Print usage
> >if (do_help || ARGV.size==0)
> >  print usage
> >  exit 1
> >end
> >
> >if !query
> >  print "Give query: "
> >  query = $stdin.gets.chomp
> >end
> >
> ># ---- Compile the query once instead of once per entry
> >re = Regexp.new(query)
> >
> >ARGV.each do | fn |
> >  $stderr.print "Loading #{fn}..."
> >  f = Bio::FlatFile.auto(fn)
> >  $stderr.print " detected: #{f.dbclass}\n"
> >  f.each_entry do | e |
> >    if e.definition =~ re
> >      print '>',e.definition,e.data
> >    end
> >  end
> >end
> >
> >_______________________________________________
> >BioRuby mailing list
> >BioRuby at open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioruby
> 
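
For readers without BioRuby installed, the idea behind both scripts
above (match a regexp against each entry's definition line and print
the matching entries) can be sketched in plain Ruby. This is only an
illustration, not the BioRuby implementation: the helper names
each_fasta_record and grep_definitions and the SAMPLE data are made up
for this example, and it assumes simple FASTA text in which every
record starts with '>' at the beginning of a line. It does none of the
format detection that Bio::FlatFile.auto performs.

```ruby
# Plain-Ruby sketch (no BioRuby): find FASTA records by definition line.
# Assumption: every record starts with '>' at the beginning of a line.

# Yield [definition, sequence_text] for each record in the FASTA text.
def each_fasta_record(text)
  return enum_for(:each_fasta_record, text) unless block_given?
  text.split(/^>/).each do |chunk|
    next if chunk.strip.empty?            # skip the empty piece before the first '>'
    definition, data = chunk.split("\n", 2)
    yield [definition.chomp, data.to_s]
  end
end

# Return the raw text of every record whose definition matches the pattern.
def grep_definitions(text, pattern)
  re = Regexp.new(pattern, Regexp::IGNORECASE)   # compiled once, case-insensitive
  each_fasta_record(text)
    .select { |definition, _data| re.match?(definition) }
    .map    { |definition, data| ">#{definition}\n#{data}" }
end

# Hypothetical sample data, for illustration only.
SAMPLE = <<FASTA
>sp|P1| Serine/threonine protein kinase
MKV
LLA
>sp|P2| Hypothetical protein
MTT
FASTA

puts grep_definitions(SAMPLE, 'serine.* kinase')
```

Note that the pattern is passed as a bare string. Wrapping it in
slashes, e.g. '/serine/', makes the slashes part of the pattern, so it
would only match definitions that contain literal '/' characters around
the word.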

