[BioRuby] RegEx search example fasta file

Toshiaki Katayama ktym at hgc.jp
Tue Mar 23 21:58:20 EST 2004


On 2004/03/21, at 22:33, pjotr at pckassa.com wrote:
> Can this go in the sample directory of bioruby - I have added it to
> the Wiki. Comments welcome.

As for the wiki page: compared with the original BJIA (BioJava in Anger) page
(http://www.biojava.org/docs/bj_in_anger/FastaParser.htm),
this section is meant to answer how to parse FASTA results.

Since Bio::FlatFile.auto in BioRuby detects the file format automatically
and entry.definition is implemented in various DB classes,
your approach of finding entries by regexp
is not limited to FastaFormat, as the following shows:

% re_grep_def.rb 'serine.* kinase' genbank/gb*.seq
% re_grep_def.rb 'serine.* kinase' kegg/genes/*.ent
% re_grep_def.rb 'serine.* kinase' kegg/sequences/*.pep

----------------------------------------------
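#!/usr/bin/env ruby
# Print every entry whose definition matches a regexp.

require 'bio'

# First argument: the pattern (matched case-insensitively).
re = /#{ARGV.shift}/i

# ARGF reads the remaining file arguments; the format is auto-detected.
Bio::FlatFile.auto(ARGF) do |ff|
  ff.each do |entry|
    if re.match(entry.definition)
      puts ff.entry_raw   # raw text of the entry just read
    end
  end
end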
----------------------------------------------
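
For readers who want to try the idea without BioRuby installed, here is a
rough plain-Ruby sketch of the same pattern: filter entries by matching a
regexp against their definition line and print the raw text of the hits.
The names FastaEntry, each_fasta_entry and grep_definitions are made up for
this illustration and are not BioRuby API. The reader handles FASTA only and
is much simpler than what Bio::FlatFile does.

```ruby
require 'stringio'

# Hypothetical stand-in for a BioRuby entry: any object that responds to
# #definition (and keeps its raw text) can be filtered the same way.
FastaEntry = Struct.new(:definition, :raw)

# Simplified FASTA reader (an assumption for this sketch, not BioRuby's parser).
# Yields one FastaEntry per record found in the IO object.
def each_fasta_entry(io)
  return enum_for(:each_fasta_entry, io) unless block_given?

  header = nil
  raw = +''
  io.each_line do |line|
    if line.start_with?('>')
      # A new header closes the previous record, if there was one.
      yield FastaEntry.new(header, raw) if header
      header = line[1..-1].strip
      raw = line.dup
    elsif header
      raw << line
    end
  end
  yield FastaEntry.new(header, raw) if header
end

# Return the raw text of every entry whose definition matches the pattern.
# The pattern string is compiled case-insensitively, like /#{str}/i.
def grep_definitions(entries, pattern)
  re = Regexp.new(pattern, Regexp::IGNORECASE)
  entries.select { |e| re.match?(e.definition.to_s) }.map(&:raw)
end

# Made-up sample data, for illustration only.
SAMPLE = <<~FASTA
  >seq1 Serine/threonine protein kinase
  MKTAYIAKQR
  >seq2 hypothetical protein
  MSLLTEVETP
  >seq3 serine-rich kinase substrate
  MAAAKKKK
FASTA

if __FILE__ == $PROGRAM_NAME
  entries = each_fasta_entry(StringIO.new(SAMPLE))
  puts grep_definitions(entries, 'serine.* kinase')
end
```

Because the filter only calls definition on each entry, any reader that
yields such objects could be swapped in for the FASTA one, which is the
point of the script above working across several database formats.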


-k



>
> Pj.
>
>
> #! /usr/bin/ruby
> #
> #   $Id: fastasearch,v 1.1 2004/03/21 13:18:41 wrk Exp $
> #   $Source: /home/cvs/home/pjotr/lwrk/luw/fasta/fastasearch,v $
> #
>
> # require 'profile'
>
> COPYRIGHT = "GPL (c) 2003-2004"
>
> usage = <<USAGE
>
>     Search fasta file(s) tags using a regular expression (regex)
>
>     Usage: fastasearch [-q query] filename(s)
>
>     Example:
>
>       ruby fastasearch -q '([Hh]uman|[Hh]omo sapiens)' nr.fa
>
>     For more information see
>
>         http://thebird.nl/bioinformatics/
> 	
>     Pjotr Prins
>     Wageningen University and Research Centre
>     http://www.wur.nl/
>     http://www.dpw.wageningen-ur.nl/nema/
>
> USAGE
>
> # --------------------------------------------------------------------
>
> srcpath=File.dirname($0)
> libpath=File.dirname(srcpath)+'/lib'
> $: << srcpath         # ---- Add start path to search libraries
> $: << libpath
>
> require 'getoptlong'
> require 'bio'
>
> # ---- Parse command line
> opts = GetoptLong.new(
>  [ "--help", "-h", GetoptLong::NO_ARGUMENT ],
>  [ "--query", "-q", GetoptLong::REQUIRED_ARGUMENT ]
> )
>
> do_help       = false
> query=nil
>
> opts.each do | opt, arg |
>    do_help   |= (opt == '--help')
>    query = arg if (opt == '--query')
> end
>
> # ---- Print usage
> if (do_help || ARGV.size==0)
>   print usage
>   exit 1
> end
>
> if !query
>   print "Give query: "
>   query = $stdin.gets.chomp
> end
>
> ARGV.each do | fn |
>   $stderr.print "Loading #{fn}..."
>   f = Bio::FlatFile.auto(fn)
>   $stderr.print " detected: #{f.dbclass}\n"
>   f.each_entry do | e |
>     if e.definition =~ /#{query}/
>       print '>',e.definition,e.data
>     end
>   end
> end


