Farm files for databases
simon andrews (BI)
simon.andrews at bbsrc.ac.uk
Thu Dec 20 12:24:18 UTC 2001
After getting some useful info from Peter Rice about how to create a
database farm in EMBOSS I thought I'd share the script I'm now using to do
this.
To use this simply copy and paste the text of the script at the bottom of
this message to a file on your system, then make sure that this file is
readable and executable by everyone (chmod 755 filename). The comments in
the script tell you what changes you need to make to the script itself, and
the format of the entry you need to create in emboss.default.
Because of the bug I previously reported in entret, this script will not
work from an entret query to the farm. It will work with seqret (and will
output any format you like), and can also be used as part of a USA for any
of the standard EMBOSS programs.
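As a rough sketch of what a query against the farm looks like when driven from Perl, the snippet below builds the seqret argument list. The database name 'farm', the accession 'X12345' and the helper farm_query_args are made up for illustration. It assumes seqret is on your path and accepts the same format::stdout output form that the script below uses.

```perl
#!/usr/bin/perl -w
use strict;

# Build the seqret argument list for a query against the farm.
# 'farm' and 'X12345' below are placeholder names, not real entries.
sub farm_query_args {
    my ($db, $accession, $format) = @_;
    # The braces stop Perl from reading "$format::stdout" as a
    # package variable called $format::stdout.
    return ('seqret', "${db}:${accession}", "${format}::stdout");
}

my @args = farm_query_args('farm', 'X12345', 'embl');
print "@args\n";

# To actually run the query, pass the list to system() so that no
# shell is involved:
# system(@args) == 0 or warn "seqret exited with status $?\n";
```

Changing the third argument is all that is needed to ask for a different output format.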
The script requires a Unix-like OS, but could trivially be adapted to run
under Win32 if anyone is running EMBOSS under Windows.
TTFN
Simon.
------ Script Starts Here -- Beware of long lines wrapping ------
#!/usr/bin/perl -w
use strict;
# EMBOSS farm file script
#
# Written by Simon Andrews
# simon.andrews at bbsrc.ac.uk
# Dec 2001
#
# This script allows you to set up a farm
# of EMBOSS databases which can be queried
# by a single instance of seqret. The
# program must be accompanied by an entry
# in emboss.default which looks like this:
#
# DB name_of_database [
# type: N (or P if we're dealing with proteins)
# method: app
# format: fasta
# app: "/path/to/this/script"
# comment: "Whatever text you'd like to see in showdb" ]
#
# First we need to set a few preferences
#
# What is the full path to seqret?
# If you are sure that seqret will always
# be somewhere in your path, then you can
# just leave this as 'seqret'.
my $seqret_path = 'seqret';
# Now we need to know the names of the
# databases you'd like included in the
# search. These must be databases which
# have already been indexed, and installed
# correctly into emboss.default. Simply
# enter the database names between the
# brackets, separated by spaces.
my @databases = qw(dbase1 dbase2 dbase3);
##### End of bits which need to be edited #########
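my ($reference) = @ARGV;
# We expect a single argument of the form database:accession
if (defined $reference and $reference =~ /:(.+)$/){
  $reference = $1;
}
else {
  $reference = '(no argument)' unless defined $reference;
  die "\n*** FARM ERROR *** Couldn't get accession after : from $reference\n\n";
}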
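# Try each database in turn and print the first hit we find.
# The command must stay on one line, otherwise the shell treats
# the redirection as a separate command.
foreach my $database (@databases){
  my $sequence = `$seqret_path $database:$reference fasta::stdout 2>/dev/null`;
  if ($sequence){
    print $sequence;
    exit;
  }
}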
warn "\n*** FARM ERROR *** Couldn't find $reference in any of '@databases'\n\n";
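
To illustrate the step where the script strips the database name from its argument, here is a small self-contained sketch of the same pattern match. The helper name accession_from_usa and the sample values are hypothetical. Only the regular expression is taken from the script above.

```perl
#!/usr/bin/perl -w
use strict;

# Return everything after the first colon, or undef if the argument
# is missing or has no colon followed by at least one character.
sub accession_from_usa {
    my ($usa) = @_;
    return undef unless defined $usa;
    return $usa =~ /:(.+)$/ ? $1 : undef;
}

# One value with a database prefix and one without.
for my $usa ('farm:X12345', 'X12345') {
    my $acc = accession_from_usa($usa);
    print "$usa gives ", (defined $acc ? $acc : '(no accession)'), "\n";
}
```

Because the match starts at the first colon, an argument containing more than one colon keeps everything after the first one as the accession.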