[Bioperl-l] Installing Bioperl on Windows

Barry Moore barry.moore at genetics.utah.edu
Thu Dec 9 11:18:52 EST 2004


Very helpful comments Nathan - Thank you.

I was referring to Bio::Tools::Run::Alignment::Clustalw in bioperl-run.  
While clustalw has a Windows port and runs just fine on Windows, this 
Bioperl module doesn't.  Its documentation says, "However, since 
the module is currently implemented using (unix) system calls, extensive 
modification may be necessary before Clustalw.pm would work under 
non-Unix operating systems (eg Windows, MacOS)."  I'm not sure that the 
modifications would need to be that extensive (e.g. changing a system 
call to backticks) and maybe I should try that out.  My comments seemed 
to suggest that there were other modules that had the same issues, and 
that may not be true.
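
For illustration only, here is a minimal sketch of the two styles of call
(system() versus backticks).  It is not taken from Clustalw.pm.  The helper
names run_and_capture and run_status are made up for the example, and the
running perl itself ($^X) stands in for clustalw so that the snippet runs
on any machine with Perl installed.

```perl
use strict;
use warnings;

# Stand-in for an external program such as clustalw: the running perl itself.
# (Assumption for this sketch; a real wrapper would use the clustalw path.)
my $program = $^X;

# Hypothetical helper: run a command with backticks and capture its output.
sub run_and_capture {
    my ($cmd) = @_;
    my $output = `$cmd`;
    die "Command failed: $cmd (status $?)\n" if $? != 0;
    return $output;
}

# Hypothetical helper: run a command with system() and return its exit code.
sub run_status {
    my (@args) = @_;
    my $rc = system(@args);    # list form avoids most shell quoting problems
    return $rc == -1 ? -1 : $rc >> 8;
}

# Double quotes are understood by both the Windows and the Unix shells.
my $captured = run_and_capture(qq{"$program" -e "print 6*7"});
print "captured: $captured\n";

my $exit = run_status($program, '-e', 'exit 3');
print "exit code: $exit\n";
```

Backticks return whatever the program prints, while system() only reports
how the program exited, so which one fits depends on whether the wrapper
needs to parse the output directly or read it back from a file.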

I have been able to install the Bioperl 1.4 core successfully with just 
the ActiveState, Bioperl, and Winnipeg repositories.

You are right, I should have given more detail for installing 
bioperl-run.  It contains a lot of stuff beginners (and others) might 
want.  Access to the Pise packages alone would be very useful to Windows 
users wanting to run Unix-only software.

Thanks for your help on the ppm info.  I'm not too savvy with ppm and 
wasn't sure that it would automatically install the latest version.  I 
think a ppd for bioperl-run is an excellent idea.

Barry

Nathan Haigh wrote:

> Being a windows user (primarily), I have the following comments about 
> the windows install instructions:
>
> I wasn't sure which wrappers you were referring to that will not work 
> on Windows, when you said:
>
> "Others, such as clustalw, have Windows ports, however the BioPerl 
> developer who wrote the interface used Unix specific system calls to 
> interact with these programs and so these wrappers will not work in 
> the Windows environment"
>
> Are you sure that the http://www.Bribes.org/perl/ppm repository isn't 
> required in addition to the theoryx5 repository 
> (http://theoryx5.uwinnipeg.ca/ppms) for some modules? (Just wondering 
> why I might have it installed unless I needed it for a Bioperl feature.)
>
> I have amended the section regarding ppd files for bioperl-run etc., 
> suggesting that the user try searching for them before jumping into 
> source! I might see about getting a ppd file made up for the Bioperl-run 
> package, as this is often something that beginner/intermediate 
> bioperlers would like to use, e.g. to run batch jobs and parse the 
> output.
>
> I've attached my modified version of the file with changes.
>
> Also, with regards to naming packages in .ppd files:
>
> Short version:
>
> ------------------
>
> Change the two references to Bioperl-1.4 in the PPM install steps to read:
>
>         Install Bioperl
>
> Also, I think Bioperl 1.4 references should be made more general for 
> future releases i.e. just Bioperl
>
> Reasoning:
>
> ---------------
>
> ppd files have both a NAME and a VERSION field, and when installing 
> via PPM you would type
>
>         PPM> install <package name>
>
> NAME should not contain any reference to the version number and should 
> simply be set to Bioperl (not Bioperl-1.4), leaving the version 
> numbering to the VERSION field. This means that when a Bioperl v1.5 is 
> released and you do a search for bioperl:
>
>         PPM> search bioperl
>
> A list of modules is returned, e.g.:
>
> Searching in Active Repositories
>
>    1. Bioperl         [1.5] Bioinformatics Toolkit
>
>    2. Bioperl-1.2     [1.2] Bioperl 1.2 PPM3 Archive
>
>    3. Bioperl-1.2.1 [1.2.1] Bioperl 1.2.1 PPM3 Archive
>
>    4. Bioperl-1.2.3 [1.2.3] Bioperl 1.2.3 PPM3 Archive
>
>    5. Bioperl-1.4     [1.4] Bioperl 1.4 PPM3 Archive
>
> Thus, when the user issues the command:
>
>         PPM> install bioperl
>
> PPM's internals will automatically install the latest version of 
> Bioperl. If the user needs to install an older version, they should 
> issue a command such as:
>
>         PPM> install 4
>
> This would install the Bioperl-1.2.3 package from the above list.
>
> This would also allow a user of BioPerl v1.4 to upgrade to 1.5 by 
> issuing the following command:
>
>         PPM> upgrade Bioperl
>
> And PPM's internals would upgrade BioPerl to the latest version 
> (however, I don't know how/if this would work for people who have 
> installed Bioperl-1.4 (package 5 shown above), as PPM would probably 
> think this a totally different module because of the different NAME).
>
> Nathan
>
>
>> -----Original Message-----
>
>> From: bioperl-l-bounces at portal.open-bio.org 
> [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Barry Moore
>
>> Sent: 08 December 2004 21:30
>
>> To: Jason Stajich; Brian Osborne; bioperl
>
>> Subject: [Bioperl-l] Installing Bioperl on Windows
>
>>
>
>> Of course as soon as I sent my last e-mail I found an error in the file
>
>> I attached. It didn't include the example script that I referred to.
>
>>
>
>> Barry
>
>>
>
>> ==========================================================
>
>>
>
>> Installing Bioperl on Windows
>
>> =============================
>
>>
>
>> 1) Quick Instructions for the Impatient
>
>> 2) Bioperl on Windows
>
>> 3) Perl on Windows
>
>> 4) BioPerl on Windows
>
>> 5) Beyond the Core
>
>> 6) BioPerl and Cygwin
>
>> 7) Cygwin Tips
>
>> 8) Example Script
>
>>
>
>> This installation guide was written by Barry Moore and other Bioperl
>
>> authors based on the
>
>> original work of Paul Boutros. Please report problems and/or fixes to
>
>> the bioperl mailing
>
>> list, bioperl-l at bioperl.org
>
>>
>
>> 1) Quick instructions for the impatient, lucky, or experienced user.
>
>> =====================================================================
>
>>
>
>> Download the ActivePerl MSI from
>
>> http://www.activestate.com/Products/ActivePerl/
>
>> Run the ActivePerl Installer (accepting all defaults is fine).
>
>> Open a command prompt (Menus Start->Run and type cmd) and run the ppm
>
>> shell (C:\>ppm).
>
>> Add two new ppm repositories with the following commands:
>
>> ppm> rep add Bioperl http://bioperl.org/DIST
>
>> ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
>> Install Bioperl-1.4.
>
>> Go to http://www.bioperl.org and start reading documentation or try the
>
>> example script at
>
>> the end of this file.
>
>>
>
>>
>
>> 2) Bioperl on Windows
>
>> ======================
>
>>
>
>> Bioperl is a large collection of Perl modules (extensions to the Perl
>
>> language) that aid
>
>> in the task of writing Perl code to deal with sequence data in a myriad
>
>> of ways. Bioperl
>
>> provides objects for various types of sequence data and their associated
>
>> features and
>
>> annotations. It provides interfaces for analysis of these sequences with
>
>> a wide variety
>
>> of external programs (BLAST, fasta, clustalw and EMBOSS to name just a
>
>> few). It provides
>
>> interfaces to various types of databases both remote (GenBank, EMBL etc)
>
>> and local
>
>> (MySQL, flat files, GFF etc.) for storage and retrieval of sequences.
>
>> And finally with
>
>> its associated documentation and mailing list Bioperl represents a
>
>> community of
>
>> bioinformatics professionals working in Perl who are committed to
>
>> supporting both
>
>> development of Bioperl and the new users who are drawn to the project.
>
>>
>
>> While most bioinformatics and computational biology applications are
>
>> developed in
>
>> Unix/Linux environments, more and more programs are being ported to
>
>> other operating
>
>> systems like Windows, and many users (often biologists with little
>
>> background in
>
>> programming) are looking for ways to automate bioinformatics analyses in
>
>> the Windows
>
>> environment. Perl and Bioperl can be installed natively on Windows
>
>> NT/2000/XP. Most of
>
>> the functionality of Bioperl is available with this type of install.
>
>> Much of the heavy
>
>> lifting in bioinformatics is done by programs originally developed in
>
>> lower level
>
>> languages like C and Pascal (e.g. BLAST, clustalw, Staden etc). Bioperl
>
>> simply acts as a
>
>> wrapper for running and parsing output from these external programs.
>
>> Some of those
>
>> programs (BLAST for example) are ported to Windows. These can be
>
>> installed and work
>
>> quite happily with BioPerl in the native Windows environment. Others,
>
>> such as clustalw,
>
>> have Windows ports, however the BioPerl developer who wrote the
>
>> interface used Unix
>
>> specific system calls to interact with these programs and so these
>
>> wrappers will not work
>
>> in the Windows environment. And finally some external programs such as
>
>> Staden and the
>
>> EMBOSS suite of programs can not be installed on Windows at all, and
>
>> therefore any part
>
>> of Bioperl that interacts with these packages either won't work or can't
>
>> be installed at
>
>> all.
>
>>
>
>> If you have a fairly simple project in mind, want to start using Bioperl
>
>> quickly, only
>
>> have access to a computer running Windows, and/or don't mind bumping up
>
>> against some
>
>> limitations then Bioperl on Windows may be a good place for you to
>
>> start. For example,
>
>> downloading a bunch of sequences from GenBank and sorting out the ones
>
>> that have a
>
>> particular annotation or feature works great. Running a bunch of your
>
>> sequences against
>
>> remote or local BLAST, parsing the output and storing it in a MySQL
>
>> database would be
>
>> fine also. Be aware that most if not all of the Bioperl developers are
>
>> working in some
>
>> type of a Unix environment (Linux, OSX, Cygwin). If you have problems
>
>> with Bioperl that
>
>> are specific to the Windows environment, you may be blazing new ground
>
>> and your pleas for
>
>> help on the Bioperl mailing list may get few responses - simply because
>
>> no one knows the
>
>> answer to your Windows specific problem. If this is or becomes a problem
>
>> for you then
>
>> you are better off working in some type of Unix like environment. One
>
>> solution to this
>
>> problem that will keep you working on a Windows machine is to install
>
>> Cygwin, a Unix
>
>> emulation environment for Windows. A number of Bioperl users are using
>
>> this approach
>
>> successfully and it is discussed more below.
>
>>
>
>> 3) Perl on Windows
>
>> ===================
>
>>
>
>> There are a couple of ways of installing Perl on a Windows machine. The
>
>> most common and
>
>> easiest is to get the most recent build from ActiveState. ActiveState is
>
>> a software
>
>> company (http://www.activestate.com) that provides free builds of Perl
>
>> for Windows
>
>> users. The current (December 2004) build is ActivePerl 5.8.4.810
>
>> (ActivePerl 5.6.1.638
>
>> is also available and should work just fine). To install ActivePerl on
>
>> Windows:
>
>> Download the ActivePerl MSI from
>
>> http://www.activestate.com/Products/ActivePerl/
>
>> Run the ActivePerl Installer (accepting all defaults is fine).
>
>>
>
>> You can also build Perl yourself (which requires a C compiler) or
>
>> download one of the
>
>> other binary distributions. The Perl source for building it yourself is
>
>> available from
>
>> CPAN (http://www.cpan.org), as are a few other binary distributions that
>
>> are alternatives
>
>> to ActiveState. This approach is not recommended unless you have
>
>> specific reasons for
>
>> doing so and know what you're doing. If that's the case you probably
>
>> don't need to be
>
>> reading this guide.
>
>>
>
>> Cygwin is a Unix emulation environment for Windows and comes with its
>
>> own copy of Perl.
>
>> Information on Cygwin and Bioperl is found below.
>
>>
>
>> 4) BioPerl on Windows
>
>> ======================
>
>>
>
>> Perl is a programming language that has been extended a lot by the
>
>> addition of external
>
>> modules. These modules work with the core language to extend the
>
>> functionality of Perl.
>
>> Bioperl is one such extension to Perl. These modular extensions to Perl
>
>> sometimes depend
>
>> on the functionality of other Perl modules and this creates a
>
>> dependency. You can't
>
>> install module X unless you have already installed module Y. Some Perl
>
>> modules are so
>
>> fundamentally useful that the Perl developers have included them in the
>
>> core distribution
>
>> of Perl - if you've installed Perl then these modules are already
>
>> installed. Other
>
>> modules are freely available from CPAN, but you'll have to install them
>
>> yourself if you
>
>> want to use them. BioPerl has such dependencies.
>
>>
>
>> Bioperl is actually a large collection of Perl modules (over 1000
>
>> currently) and these
>
>> modules are split into six groups. These six groups are:
>
>>
>
>> Bioperl Group             Functions
>> -----------------------------------------------------------------
>> bioperl (the core)        Most of the main functionality of Bioperl.
>> bioperl-run               Wrappers to a lot of external programs.
>> bioperl-ext               Interaction with some alignment functions
>>                           and the Staden package.
>> bioperl-db                Using bioperl with BioSQL and local
>>                           relational databases.
>> bioperl-microarray        Microarray specific functions.
>> bioperl-gui               Some preliminary work on a graphical user
>>                           interface to some Bioperl functions.
>
>>
>
>> The Bioperl core is what most new users will want to start with. Bioperl
>
>> 1.4 (the core)
>
>> and the Perl modules that it depends on can be easily installed with
>
>> ppm. PPM
>
>> (Programming Package Manager) is an ActivePerl utility for installing
>
>> Perl modules on
>
>> systems using ActivePerl. PPM will look online (you have to be connected
>
>> to the internet
>
>> of course) for files (these files end with .ppd) that tell it how to
>
>> install the modules
>
>> you want and what other modules your new modules depends on. It will
>
>> then download and
>
>> install your modules and all dependent modules for you. These .ppd files
>
>> are stored
>
>> online in ppm repositories. ActiveState maintains the largest ppm
>
>> repository and when
>
>> you installed ActivePerl ppm was installed with directions for using the
>
>> ActiveState
>
>> repositories. Unfortunately the ActiveState repositories are far from
>
>> complete and other
>
>> ActivePerl users maintain their own ppm repositories to fill in the
>
>> gaps. Installing
>
>> will require you to direct ppm to look in two new repositories. You do
>
>> this by opening a
>
>> Windows command prompt, typing ppm to start the ppm shell and then
>
>> typing the following
>
>> two commands:
>
>> ppm> rep add Bioperl http://bioperl.org/DIST
>
>> ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
>>
>
>> Once ppm knows where to look for Bioperl and its dependencies you
>
>> simply tell ppm to
>
>> install it. This is done with the command:
>
>> ppm> install Bioperl-1.4
>
>>
>
>> 5) Beyond the Core
>
>> ===================
>
>>
>
>> You may find that you want some of the features of other Bioperl groups
>
>> like bioperl-run
>
>> or bioperl-db. There are currently no ppm packages for installing these
>
>> parts of
>
>> Bioperl. You will have to install these manually from source. For this
>
>> you will need a
>
>> Windows version of the program make called nmake
>
>> (http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).
>
>> You will
>
>> also want to have a willingness to experiment. You'll have to read the
>
>> installation
>
>> documents for each component that you want to install, and use nmake
>
>> where the
>
>> instructions call for make. You will have to determine from the
>
>> installation documents
>
>> what dependencies are required and you will have to get them, read their
>
>> documentation
>
>> and install them first. The details of this are beyond the scope of this
>
>> guide. Read
>
>> the documentation. Search Google. Try your best, and if you get stuck
>
>> consult with
>
>> others on the bioperl mailing list.
>
>>
>
>> 6) BioPerl and Cygwin
>
>> =====================
>
>>
>
>> Cygwin is a Unix emulator and shell environment available free at
>
>> www.cygwin.com. BioPerl
>
>> runs well within Cygwin. Some users claim that installation of Bioperl
>
>> is easier within
>
>> Cygwin than within Windows, but these may be users with Unix backgrounds.
>
>>
>
>> One advantage of using Bioperl in Cygwin is that all the external
>
>> modules are available
>
>> through CPAN, most if not all external programs can be installed and run
>
>> so many of the
>
>> limitations of Bioperl on Windows are circumvented.
>
>>
>
>> To get Bioperl running first install the basic Cygwin package as well as
>
>> the Cygwin Perl,
>
>> make, and gcc packages. Clicking the "View" button in the upper right of
>
>> the installer
>
>> enables you to see details on the various packages. Then follow the
>
>> BioPerl installation
>
>> instructions for Unix in BioPerl's INSTALL file.
>
>>
>
>> Note that expat comes with Cygwin (it's used by the module XML::Parser).
>
>>
>
>> One known issue is that DBD::mysql can be tricky to install in
>
>> Cygwin and this module is required for the bioperl-db, Biosql, and
>
>> bioperl-pipeline
>
>> external packages. Fortunately there are some good instructions online:
>
>> http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.
>
>>
>
>> Also, set the environment variable TMPDIR; programs like BLAST and
>
>> clustalw need a
>
>> place to create temporary files. e.g.:
>
>>
>
>> setenv TMPDIR e:/cygwin/tmp # csh, tcsh
>
>> export TMPDIR=e:/cygwin/tmp # sh, bash
>
>>
>
>> Note that this is not a syntax that Cygwin understands, which would be
>
>> something like
>
>> "/cygdrive/e/cygwin/tmp". This is the syntax that a Perl module expects
>
>> on Windows.
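
One way to check the setting before running a wrapper is sketched below.
It assumes the module resolves its scratch directory in roughly the way
File::Spec->tmpdir does, which has not been verified for
Bio::Tools::Run::StandAloneBlast; the file name template is made up for
the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# File::Spec->tmpdir consults TMPDIR (among other variables) and falls back
# to a platform default if that directory is missing or not writable.
my $tmpdir = File::Spec->tmpdir;
print "Temporary directory: $tmpdir\n";

# Try to create a scratch file there; it is removed when the script exits.
my ($fh, $filename) =
    tempfile('bioperl_check_XXXXXX', DIR => $tmpdir, UNLINK => 1);
print {$fh} "test\n";
close $fh or die "Could not write to $filename: $!\n";
print "Wrote scratch file $filename\n";
```

If the directory printed is not the one you set in TMPDIR, the value was
probably rejected because the path does not exist or is not writable.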
>
>>
>
>> If this variable is not set correctly you'll see errors like this when
>
>> you run
>
>> Bio::Tools::Run::StandAloneBlast:
>
>>
>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>
>> MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
>
>> STACK: Error::throw
>
>> ..........
>
>>
>
>> 7) Cygwin Tips
>
>> ===============
>
>>
>
>> The easiest way to install MySQL is to use the Windows binaries
>
>> available at
>
>> www.mysql.com. Note that Windows does not have sockets, so you need to
>
>> force the MySQL
>
>> connections to use TCP/IP instead. Do this by using the "-h" option from
>
>> the command-
>
>> line:
>
>>
>
>>  >mysql -h 127.0.0.1 -u blip -pblop biosql
>
>>
>
>> Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it
>
>> uses a host. For
>
>> example, if your databases are installed locally:
>
>>
>
>> alias mysql 'mysql -h 127.0.0.1'
>
>>
>
>> If you're trying to use some application or resource "outside" of Cygwin
>
>> and you're
>
>> having a problem remember that Cygwin's path syntax may not be the
>
>> correct one. Cygwin
>
>> understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when
>
>> referring to the E:
>
>> drive) but the external resource may want 'E:/cygwin/home/jacky'. So
>
>> your *rc files may
>
>> end up with paths written in these different syntaxes, depending.
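
The mapping between the two syntaxes can be sketched in a few lines of
Perl.  The helper name cygdrive_to_windows is made up for the example, and
it only handles the /cygdrive/<letter>/ form; paths under the Cygwin root
such as /home/jacky are returned unchanged, and Cygwin's own cygpath
utility is the better tool for those.

```perl
use strict;
use warnings;

# Hypothetical helper: turn /cygdrive/<letter>/rest into <LETTER>:/rest.
# Paths that do not start with /cygdrive/<letter> are returned unchanged.
sub cygdrive_to_windows {
    my ($path) = @_;
    if ($path =~ m{^/cygdrive/([A-Za-z])(/.*)?$}) {
        return uc($1) . ':' . (defined $2 ? $2 : '/');
    }
    return $path;
}

print cygdrive_to_windows('/cygdrive/e/cygwin/home/jacky'), "\n";
print cygdrive_to_windows('/home/jacky'), "\n";
```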
>
>>
>
>> If you can, install Cygwin on a drive or partition that's
>
>> NTFS-formatted, not FAT32-
>
>> formatted. When you install Cygwin on a FAT32 partition you will not be
>
>> able to set
>
>> permissions and ownership correctly. In most situations this probably
>
>> won't make any
>
>> difference but there may be occasions where this is a problem.
>
>>
>
>> If you want to use BLAST we recommend that the Windows binary be
>
>> obtained from NCBI
>
>> (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will
>
>> be named something
>
>> like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions
>
>> in README.bls.
>
>>
>
>> Although we've recommended using the BLAST and MySQL binaries you should
>
>> be able to
>
>> compile just about everything else from source code using Cygwin's gcc.
>
>> You'll notice
>
>> when you're installing Cygwin that many different libraries are also
>
>> available (gd, jpeg,
>
>> etc.).
>
>>
>
>> 8) Example Script
>
>> =================
>
>>
>
>> #!/usr/bin/perl
>>
>> # A short script to demonstrate how to download sequences from GenBank and
>> # access the sequence and some associated annotations using Bioperl.
>>
>> use strict;
>> use warnings;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;  # use Bio::DB::GenPept or Bio::DB::RefSeq if needed
>>
>> # Get some sequence IDs either like below, or read in from a file.  Note
>> # that this sample script works with the accession numbers below (at least
>> # at the time it was written).  If you add different accession numbers and
>> # you get errors, you may be calling for something that the sequence
>> # doesn't have.  You'll have to add your own error trapping code to
>> # handle that.
>> my @ids = ('K03160', 'AB039327', 'BC035972');
>>
>> # Create the GenBank database object to read from the database.
>> my $gb = Bio::DB::GenBank->new();
>>
>> # Create a sequence stream to pass the sequences from the database to
>> # the program.
>> my $seqio = $gb->get_Stream_by_id(\@ids);
>>
>> # Loop over all of the sequences that you requested.
>> while (my $seq = $seqio->next_seq) {
>>
>>     # Here is how you call methods directly on the RichSeq object.
>>     # Replace 'display_name' with any other method in Table 2. that can
>>     # be called on either the RichSeq object directly, or the PrimarySeq
>>     # object which it has inherited.
>>     print "Display Name: ", $seq->display_name, "\n";
>>     print "Sequence Date: ", $seq->get_dates, "\n";
>>
>>     # Here is how to access the classification data from the species
>>     # object.
>>     my $species = $seq->species;
>>     print "Species: ", $species->common_name, "\n";
>>     my @class = $species->classification;
>>     print "Classification: @class\n";
>>
>>     # Here is a general way to call things that are stored as a
>>     # Bio::SeqFeature::Generic object.  Replace 'source' with any other
>>     # of the "major" headings in the feature table (e.g. gene, CDS, etc.)
>>     # and replace 'mol_type' with any of the tag values found under that
>>     # heading (organism, locus_tag, gene, etc.)
>>     my @source_feats = grep { $_->primary_tag eq 'source' }
>>                        $seq->get_SeqFeatures();
>>     my $source_feat = shift @source_feats;
>>     my @mol_type = $source_feat->get_tag_values('mol_type');
>>     print "Molecule Type: @mol_type\n";
>>
>>     # Here is a general way to call things that are stored as some type
>>     # of a Bio::Annotation object.  This includes reference information
>>     # and comments.  Replace 'reference' with 'comment' to get the
>>     # comment, and replace $ref->authors with $ref->title (or location,
>>     # medline, etc.) to get other reference categories.
>>     my $ann = $seq->annotation();
>>     my @references = ($ann->get_Annotations('reference'));
>>     my $ref = shift @references;
>>     my ($title, $authors, $location, $pubmed, $reference);
>>     if (defined $ref) {
>>         $authors = $ref->authors;
>>         print "Authors: $authors\n";
>>     }
>>     print "Sequence: \n", $seq->seq, "\n\n";
>> }
>
>>
>
>> --
>
>> Barry Moore
>
>> Dept. of Human Genetics
>
>> University of Utah
>
>> Salt Lake City, UT
>
>>
>
>> <<Installing_Bioperl_on_Windows.txt>>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>------------------------------------------------------------------------
>
>Installing Bioperl on Windows
>=============================
>
>1) Quick Instructions for the Impatient 
>2) Bioperl on Windows
>3) Perl on Windows
>4) BioPerl on Windows
>5) Beyond the Core
>6) BioPerl and Cygwin
>7) Cygwin Tips
>8) Example Script
>
>This installation guide was written by Barry Moore and other Bioperl authors based on the 
>original work of Paul Boutros. Please report problems and/or fixes to the bioperl mailing 
>list, bioperl-l at bioperl.org
>
>1) Quick instructions for the impatient, lucky, or experienced user.
>=====================================================================
>
>Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
>Run the ActivePerl Installer (accepting all defaults is fine).
>Open a command prompt (Menus Start->Run and type cmd) and run the PPM shell (C:\>ppm).
>Add two new PPM repositories with the following commands:
>	PPM> rep add Bioperl http://bioperl.org/DIST
>	PPM> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>Install Bioperl with the following command:
>	PPM> install Bioperl
>Go to http://www.bioperl.org and start reading documentation or try the example script at 
>the end of this file.
>
>
>2) Bioperl on Windows
>======================
>
>Bioperl is a large collection of Perl modules (extensions to the Perl language) that aid 
>in the task of writing Perl code to deal with sequence data in a myriad of ways.  Bioperl 
>provides objects for various types of sequence data and their associated features and 
>annotations.  It provides interfaces for analysis of these sequences with a wide variety 
>of external programs (BLAST, fasta, clustalw and EMBOSS to name just a few).  It provides 
>interfaces to various types of databases both remote (GenBank, EMBL etc) and local 
>(MySQL, flat files, GFF etc.) for storage and retrieval of sequences.  And finally with 
>its associated documentation and mailing list Bioperl represents a community of 
>bioinformatics professionals working in Perl who are committed to supporting both 
>development of Bioperl and the new users who are drawn to the project.
>
>While most bioinformatics and computational biology applications are developed in 
>Unix/Linux environments, more and more programs are being ported to other operating 
>systems like Windows, and many users (often biologists with little background in 
>programming) are looking for ways to automate bioinformatics analyses in the Windows 
>environment.  Perl and Bioperl can be installed natively on Windows NT/2000/XP.  Most of 
>the functionality of Bioperl is available with this type of install.  Much of the heavy 
>lifting in bioinformatics is done by programs originally developed in lower level 
>languages like C and Pascal (e.g. BLAST, clustalw, Staden etc).  Bioperl simply acts as a 
>wrapper for running and parsing output from these external programs.  Some of those 
>programs (BLAST for example) are ported to Windows.  These can be installed and work 
>quite happily with BioPerl in the native Windows environment.  Others, such as clustalw, 
>have Windows ports, however the BioPerl developer who wrote the interface used Unix 
>specific system calls to interact with these programs and so these wrappers will not work 
>in the Windows environment.  And finally some external programs such as Staden and the 
>EMBOSS suite of programs can not be installed on Windows at all, and therefore any part 
>of Bioperl that interacts with these packages either won't work or can't be installed at 
>all.
>
>If you have a fairly simple project in mind, want to start using Bioperl quickly, only 
>have access to a computer running Windows, and/or don't mind bumping up against some 
>limitations then Bioperl on Windows may be a good place for you to start.  For example, 
>downloading a bunch of sequences from GenBank and sorting out the ones that have a 
>particular annotation or feature works great.  Running a bunch of your sequences against 
>remote or local BLAST, parsing the output and storing it in a MySQL database would be 
>fine also.  Be aware that most if not all of the Bioperl developers are working in some 
>type of a Unix environment (Linux, OSX, Cygwin).  If you have problems with Bioperl that 
>are specific to the Windows environment, you may be blazing new ground and your pleas for 
>help on the Bioperl mailing list may get few responses - simply because no one knows the 
>answer to your Windows specific problem.  If this is or becomes a problem for you then 
>you are better off working in some type of Unix like environment.  One solution to this 
>problem that will keep you working on a Windows machine is to install Cygwin, a Unix 
>emulation environment for Windows.  A number of Bioperl users are using this approach 
>successfully and it is discussed more below.
>
>3) Perl on Windows
>===================
>
>There are a couple of ways of installing Perl on a Windows machine.  The most common and 
>easiest is to get the most recent build from ActiveState.  ActiveState is a software 
>company (http://www.activestate.com) that  provides free builds of Perl for Windows 
>users.  The current  (December 2004) build is ActivePerl 5.8.4.810 (ActivePerl 5.6.1.638   
>is also available and should work just fine).  To install ActivePerl on Windows:
>	Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
>	Run the ActivePerl Installer (accepting all defaults is fine).  
>
>You can also build Perl yourself (which requires a C compiler) or download one of the 
>other binary distributions.  The Perl source for building it yourself is available from 
>CPAN (http://www.cpan.org), as are a few other binary distributions that are alternatives 
>to ActiveState.  This approach is not recommended unless you have specific reasons for 
>doing so and know what you're doing.  If that's the case you probably don't need to be 
>reading this guide.
>
>Cygwin is a Unix emulation environment for Windows and comes with its own copy of Perl.  
>Information on Cygwin and Bioperl is found below.
>
>4) BioPerl on Windows
>======================
>
>Perl is a programming language that has been extended a lot by the addition of external 
>modules.  These modules work with the core language to extend the functionality of Perl.  
>Bioperl is one such extension to Perl.  These modular extensions to Perl sometimes depend 
>on the functionality of other Perl modules and this creates a dependency.  You can't 
>install module X unless you have already installed module Y.  Some Perl modules are so 
>fundamentally useful that the Perl developers have included them in the core distribution 
>of Perl - if you've installed Perl then these modules are already installed.  Other 
>modules are freely available from CPAN, but you'll have to install them yourself if you 
>want to use them.  BioPerl has such dependencies.
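
A quick way to see which modules are already present is sketched below.
The helper name module_available is made up for the example.  File::Spec
ships with Perl itself; Bio::SeqIO and XML::Parser are only examples of
modules that may need to be installed separately, so whether they show as
installed depends on your setup.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if a module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report the status of a core module and two separately installed ones.
for my $module (qw(File::Spec Bio::SeqIO XML::Parser)) {
    printf "%-15s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```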
>
>Bioperl is actually a large collection of Perl modules (over 1000 currently) and these 
>modules are split into six groups.  These six groups are:
>
>	Bioperl Group                         Functions
>	-----------------------------------------------------------------
>      bioperl (the core)        Most of the main functionality of Bioperl.
>      bioperl-run               Wrappers to a lot of external programs.
>      bioperl-ext               Interaction with some alignment functions
>                                and the Staden package.
>      bioperl-db                Using bioperl with BioSQL and local
>                                relational databases.
>      bioperl-microarray        Microarray specific functions.
>      bioperl-gui               Some preliminary work on a graphical user
>                                interface to some Bioperl functions.
>
>The Bioperl core is what most new users will want to start with.  Bioperl (the core) 
>and the Perl modules that it depends on can be easily installed with PPM.  PPM 
>(Programmer's Package Manager, formerly known as the Perl Package Manager) is an ActivePerl
>utility for installing Perl modules on systems using ActivePerl.  PPM will look online
>(you have to be connected to the internet of course) for files (these files end with .ppd)
>that tell it how to install the modules you want and what other modules your new modules
>depends on.  It will then download and install your modules and all dependent modules for
>you.  These .ppd files are stored online in PPM repositories.  ActiveState maintains the
>largest PPM repository and when you installed ActivePerl PPM was installed with directions
>for using the ActiveState repositories.  Unfortunately the ActiveState repositories are
>far from complete and other ActivePerl users maintain their own PPM repositories to fill
>in the gaps.  Installing will require you to direct PPM to look in two new repositories.
>You do this by opening a Windows command prompt, typing ppm to start the PPM shell and
>then typing the following two commands:
>      PPM> rep add Bioperl http://bioperl.org/DIST
>      PPM> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
>
>Once PPM knows where to look for Bioperl and its dependencies, you simply tell PPM to 
>install it.  This is done with the command:
>      PPM> install Bioperl
>
>5) Beyond the Core
>===================
>
>You may find that you want some of the features of other Bioperl groups like bioperl-run 
>or bioperl-db.  There are currently no PPM packages for installing these parts of 
>Bioperl (but check this by doing a Bioperl search at the PPM shell):
>      PPM> search bioperl
>
>If they are not present, you will have to install these manually from source.  For this
>you will need a Windows version of the program make called nmake 
>(http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe).  You will 
>also want to have a willingness to experiment.  You'll have to read the installation 
>documents for each component that you want to install, and use nmake where the 
>instructions call for make.  You will have to determine from the installation documents 
>what dependencies are required and you will have to get them, read their documentation 
>and install them first.  The details of this are beyond the scope of this guide.  Read 
>the documentation.  Search Google.  Try your best, and if you get stuck consult with 
>others on the bioperl mailing list.
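
For a component that ships the usual Makefile.PL (an assumption; check the component's own install documents), "use nmake where the instructions call for make" comes down to swapping one command name. A minimal sketch that only prints the steps and runs nothing; build_steps is a hypothetical helper, written for a Unix-style shell purely for illustration:

```shell
# Hypothetical helper: print the usual Makefile.PL build steps,
# substituting the given make program (defaults to "make").
build_steps() {
  make_cmd=${1:-make}
  printf '%s\n' "perl Makefile.PL" "$make_cmd" "$make_cmd test" "$make_cmd install"
}

# The steps as they would be typed at a Windows command prompt with nmake.
build_steps nmake
```

The printed lines are the commands to type, one at a time, in the unpacked source directory of the component.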
>
>6) BioPerl and Cygwin
>=====================
>
>Cygwin is a Unix emulator and shell environment available free at www.cygwin.com. BioPerl 
>runs well within Cygwin. Some users claim that installation of Bioperl is easier within 
>Cygwin than within Windows, but these may be users with Unix backgrounds.
>
>One advantage of using Bioperl in Cygwin is that all the external modules are available 
>through CPAN and most if not all external programs can be installed and run, so many of the 
>limitations of Bioperl on Windows are circumvented.
>
>To get Bioperl running first install the basic Cygwin package as well as the Cygwin Perl, 
>make, and gcc packages. Clicking the "View" button in the upper right of the installer 
>enables you to see details on the various packages. Then follow the BioPerl installation 
>instructions for Unix in BioPerl's INSTALL file.
>
>Note that expat comes with Cygwin (it's used by the module XML::Parser).
>
>One known issue is that DBD::mysql can be tricky to install in
>Cygwin and this module is required for the bioperl-db, Biosql, and bioperl-pipeline 
>external packages. Fortunately there are some good instructions online: 
>http://search.cpan.org/src/JWIED/DBD-mysql-2.1025/INSTALL.html#windows/cygwin.
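
Before and after installing, it can help to check which modules Cygwin's Perl can already load. A minimal sketch, assuming perl is on the PATH; has_perl_module is a hypothetical helper name, and the two module names are simply the ones mentioned above:

```shell
# Hypothetical helper: succeeds if the named Perl module can be loaded.
# "perl -MName -e 1" loads the module and then runs a do-nothing program.
has_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

for mod in XML::Parser DBD::mysql; do
  if has_perl_module "$mod"; then
    echo "$mod: installed"
  else
    echo "$mod: missing"
  fi
done
```

A module reported as missing may be absent or may simply fail to load, for example because a library it needs was not found, so treat the result as a first hint only.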
>
>Also, set the environment variable TMPDIR; programs like BLAST and clustalw need a 
>place to create temporary files, e.g.:
>
>setenv TMPDIR e:/cygwin/tmp     # csh, tcsh
>export TMPDIR=e:/cygwin/tmp     # sh, bash
>
>Note that this is not Cygwin's own path syntax, which would be something like 
>"/cygdrive/e/cygwin/tmp". It is the syntax that a Perl module expects on Windows.
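
If you only know the Cygwin form of the directory, the drive-letter form can be derived from it. A minimal sketch using sed; to_mixed_path is a hypothetical helper, and it only handles paths under /cygdrive (Cygwin's own cygpath utility, with its -m option, covers the general case):

```shell
# Hypothetical helper: rewrite /cygdrive/<letter>/rest as <letter>:/rest.
# Paths that do not start with /cygdrive/<letter>/ are printed unchanged.
to_mixed_path() {
  printf '%s\n' "$1" | sed -e 's|^/cygdrive/\([A-Za-z]\)/|\1:/|'
}

win_tmp=$(to_mixed_path /cygdrive/e/cygwin/tmp)

# Print the line to put in .bashrc; prints: export TMPDIR=e:/cygwin/tmp
echo "export TMPDIR=$win_tmp"
```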
>
>If this variable is not set correctly you'll see errors like this when you run 
>Bio::Tools::Run::StandAloneBlast:
>
>------------- EXCEPTION: Bio::Root::Exception -------------
>MSG: Could not open /tmp/gXkwEbrL0a: No such file or directory
>STACK: Error::throw
>..........
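
A quick check before running the wrappers can catch this early. A rough sketch, assuming the shell can resolve the path as it is written in TMPDIR; check_tmpdir is a hypothetical helper, not part of Bioperl or Cygwin:

```shell
# Hypothetical pre-flight check: prints a diagnosis and returns
# non-zero if the directory is unset, missing, or not writable.
# Checks its first argument if given, otherwise $TMPDIR.
check_tmpdir() {
  dir=${1:-${TMPDIR:-}}
  if [ -z "$dir" ]; then
    echo "TMPDIR is not set"
    return 1
  elif [ ! -d "$dir" ]; then
    echo "TMPDIR=$dir is not a directory"
    return 1
  elif [ ! -w "$dir" ]; then
    echo "TMPDIR=$dir is not writable"
    return 1
  fi
  echo "TMPDIR=$dir looks usable"
}

check_tmpdir || echo "Fix TMPDIR before running the wrappers"
```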
>
>7) Cygwin Tips
>===============
>
>The easiest way to install MySQL is to use the Windows binaries available at 
>www.mysql.com. Note that Windows does not have Unix sockets, so you need to force the MySQL 
>connections to use TCP/IP instead. Do this by using the "-h" option on the command line:
>
>  mysql -h 127.0.0.1 -u blip -pblop biosql
>
>Or, alias the mysql command in your .tcshrc, .cshrc, or .bashrc so it uses a host. For 
>example, if your databases are installed locally:
>
>alias mysql 'mysql -h 127.0.0.1'     # csh, tcsh
>alias mysql='mysql -h 127.0.0.1'     # sh, bash
>
>If you're trying to use some application or resource "outside" of Cygwin and you're 
>having a problem remember that Cygwin's path syntax may not be the correct one. Cygwin 
>understands '/home/jacky' or '/cygdrive/e/cygwin/home/jacky' (when referring to the E: 
>drive) but the external resource may want 'E:/cygwin/home/jacky'. So your *rc files may 
>end up with paths written in these different syntaxes, depending on which program reads them.
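
Going the other way, from the drive-letter form to the /cygdrive form, is just as mechanical. A minimal sketch; to_cygdrive_path is a hypothetical helper that only handles paths of the form X:/..., and Cygwin's own cygpath utility (with -u) is the more general tool:

```shell
# Hypothetical helper: rewrite <letter>:/rest as /cygdrive/<letter>/rest,
# lower-casing the drive letter. Other paths are printed unchanged.
to_cygdrive_path() {
  case $1 in
    [A-Za-z]:/*)
      drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
      rest=${1#??}   # drop the two leading characters, e.g. "E:"
      printf '/cygdrive/%s%s\n' "$drive" "$rest"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# prints: /cygdrive/e/cygwin/home/jacky
to_cygdrive_path E:/cygwin/home/jacky
```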
>
>If you can, install Cygwin on a drive or partition that's NTFS-formatted, not FAT32-
>formatted. When you install Cygwin on a FAT32 partition you will not be able to set 
>permissions and ownership correctly. In most situations this probably won't make any 
>difference but there may be occasions where this is a problem.
>
>If you want to use BLAST we recommend that the Windows binary be obtained from NCBI 
>(ftp://ftp.ncbi.nih.gov/blast/executables/LATEST-BLAST - the file will be named something 
>like blast-2.2.6-ia32-win32.exe). Then follow the Windows instructions in README.bls.
>
>Although we've recommended using the BLAST and MySQL binaries you should be able to 
>compile just about everything else from source code using Cygwin's gcc. You'll notice 
>when you're installing Cygwin that many different libraries are also available (gd, jpeg, 
>etc.).
>
>8) Example Script
>=================
>
>#!/usr/bin/perl
>
>#A short script to demonstrate how to download sequences from GenBank and access
>#the sequence and some associated annotations using Bioperl.
>
>use strict;
>use warnings;
>use Bio::SeqIO;
>use Bio::DB::GenBank; #use Bio::DB::GenPept or Bio::DB::RefSeq if needed
>
>#Get some sequence IDs either like below, or read in from a file.  Note that
>#this sample script works with the accession numbers below (at least at the time
>#it was written).  If you add different accession numbers, and you get errors,
>#you may be calling for something that the sequence doesn't have.  You'll have
>#to add your own error trapping code to handle that.
>my @ids = ('K03160', 'AB039327', 'BC035972');
>
>#Create the GenBank database object to read from the database.
>my $gb = Bio::DB::GenBank->new();
>
>#Create a sequence stream to pass the sequences from the database to the program.
>my $seqio = $gb->get_Stream_by_id(\@ids);
>
>#Loop over all of the sequences that you requested.
>while (my $seq = $seqio->next_seq) {
>
>  #Here is how you get methods directly from the RichSeq object.  Replace
>  #'display_name' with any other method in Table 2. that can be called on
>  #either the RichSeq object directly, or the PrimarySeq object which it has
>  #inherited.
>  print "Display Name:  ", $seq->display_name,"\n";
>  print "Sequence Date:  ",$seq->get_dates,"\n";
>
>  #Here is how to access the classification data from the species object.
>  my $species = $seq->species;
>  print "Species:  ", $species->common_name,"\n";
>  my @class = $species->classification;
>  print "Classification:  @class\n";
>
>  #Here is a general way to call things that are stored as a Bio::SeqFeature::
>  #Generic object.  Replace 'source' with any other of the "major" headings in
>  #the feature table (e.g. gene, CDS, etc.) and replace 'organism' with any of
>  #the tag values found under that heading (mol_type, locus_tag, gene, etc.)
>  my @source_feats = grep { $_->primary_tag eq 'source' } $seq->get_SeqFeatures();
>  my $source_feat = shift @source_feats;
>  my @mol_type = $source_feat->get_tag_values('mol_type');
>  print "Molecule Type:  @mol_type\n";
>  
>  #Here is a general way to call things that are stored as some type of a
>  #Bio::Annotation object.  This includes reference information, and comments.
>  #Replace 'reference' with 'comment' to get the comment, and replace
>  #$ref->authors with $ref->title (or location, medline, etc.) to get other
>  #reference categories.
>  my $ann = $seq->annotation();
>  my @references = ($ann->get_Annotations('reference'));
>  my $ref = shift @references;
>  if (defined $ref) {
>    my $authors = $ref->authors;
>    print "Authors:  $authors\n";
>  }
>  print "Sequence:  \n", $seq->seq, "\n\n";
>}
>  
>

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT


