[Bioperl-l] Bioperl-l Digest, Vol 96, Issue 28
khush ........
bioinfo.khush at gmail.com
Mon May 2 06:02:28 UTC 2011
Dear Florent,
Thanks for your reply. Yes, it is clustalw, but I have clustalw2 and clustalx
installed on my fc13 machine. I am not sure where to set the path for
them. I have some 400 nucleotide sequences to analyze, which is why I
found this script useful.
Please help me.
Thank you
Kamal
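
A minimal sketch for checking which ClustalW executables are actually visible on the PATH. It is not part of pairwise_kaks.PLS, and the helper name find_in_dirs is hypothetical. It assumes the script looks the aligner up by the name 'clustalw' (as its warning suggests) and that a ClustalW 2 install may provide a binary named 'clustalw2' instead.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first executable file called $name found
# in the given directories, or undef if there is none.
sub find_in_dirs {
    my ($name, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# File::Spec->path returns the directories listed in the PATH variable.
my @dirs = File::Spec->path;
for my $name (qw( clustalw clustalw2 )) {
    my $found = find_in_dirs($name, @dirs);
    printf "%-10s %s\n", $name, defined $found ? $found : 'not found in PATH';
}
```

If only clustalw2 is reported, a lookup for 'clustalw' would presumably fail even though a ClustalW binary is installed. In that case the choices are to make the binary reachable under the expected name, or to set the environment variable named in the script's warning to the directory that holds it.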
On Sun, May 1, 2011 at 4:49 AM, Florent Angly <florent.angly at gmail.com> wrote:
> Kamal,
> It looks like you have a typo somewhere: what is 'clustaw'? You probably
> mean 'clustalw'.
> Florent
>
>
>
> On 29/04/11 16:34, khush ........ wrote:
>
>> Dear,
>>
>> I am trying to calculate the Ka/Ks ratio of my sequences, which I aligned
>> with clustalx.
>>
>> For this I am using the script given at
>> https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/pairwise_kaks.PLS
>>
>> When I try to run it, I get an alert from the line
>>
>> "warn("Could not find the executable for $aln_prog, make sure you have
>> installed it and have either set ".uc($aln_prog)."DIR or it is in your
>> PATH");"
>>
>> "Could not find the executable for clustaw, make sure you have installed
>> it
>> and have either set CLUSTAWDIR or it is in your PATH at kaks.pl line 52."
>>
>> I have clustalw2 and clustalx installed on my system. How and where do I
>> set the path for them, and how do I calculate the Ka/Ks ratio for my
>> sequences?
>>
>> Thank you
>> Kamal
>>
>>
>>
>>
>>
>>
>> On Fri, Apr 29, 2011 at 11:16 AM, <bioperl-l-request at lists.open-bio.org> wrote:
>>
>> Send Bioperl-l mailing list submissions to
>>> bioperl-l at lists.open-bio.org
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> or, via email, send a message with subject or body 'help' to
>>> bioperl-l-request at lists.open-bio.org
>>>
>>> You can reach the person managing the list at
>>> bioperl-l-owner at lists.open-bio.org
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Bioperl-l digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>> 1. Re: GSoC/BioPerl Reorganization Project (Sheena Scroggins)
>>> 2. Re: GSoC/BioPerl Reorganization Project (Chris Fields)
>>> 3. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
>>> 4. Re: GSoC/BioPerl Reorganization Project (Siddhartha Basu)
>>> 5. Re: Standalone blast (khush ........)
>>> 6. Re: GSoC/BioPerl Reorganization Project (Robert Buels)
>>> 7. Re: Standalone blast (Florent Angly)
>>> 8. Re: Standalone blast (khush ........)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Thu, 28 Apr 2011 12:53:49 -0700
>>> From: Sheena Scroggins<sheena.scroggins at gmail.com>
>>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>>> To: Chris Fields<cjfields at illinois.edu>
>>> Cc: bioperl-l at lists.open-bio.org
>>> Message-ID:<BANLkTimee8HidYyh6wRY15LdsqdL5KrEuA at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Chris,
>>>
>>> We haven't talked much about the versioning yet, but it will be on the
>>> list
>>> to figure out asap.
>>>
>>> So far, the plan is to split out Bio::Root first, followed by a couple
>>> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
>>> Bio::Event then Bio::Location. Depending on how much time is remaining
>>> for
>>> the GSoC project, the next to split out would be Bio::Factory and
>>> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I
>>> plan
>>> to still help with the reorganization after the internship is over, but I
>>> obviously have to have a stopping point for the GSoC project.
>>>
>>> Rob provided me with a really nice script to list dependencies of the
>>> modules,
>>> so I plan to make a roadmap towards the end of the summer that will help
>>> guide the rest of the reorganization. At that point, we'll have to deal
>>> with
>>> the circular dependencies carefully.
>>>
>>> This is a huge project, much bigger than I can do in one summer. But I
>>> plan
>>> to get it started in a way that makes it easy for others to contribute.
>>>
>>> Sheena
>>>
>>>
>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields<cjfields at illinois.edu
>>>
>>>> wrote:
>>>> Sheena,
>>>>
>>>> Congrats on being accepted! We've talked about doing this over the
>>>> years,
>>>> but it's not an easy task and it needs a dedicated project to get the
>>>>
>>> ball
>>>
>>>> rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with
>>>> a
>>>> few of my questions/thoughts (Rob could probably chime in as well, but I
>>>> think his general thoughts on the project parallel mine):
>>>>
>>>> 1) The current BioPerl CPAN could just be a simple install script,
>>>> acting
>>>> like a 'Task' or 'Bundle' module, installing the actual Bio-specific
>>>> distributions. Doing it this way would allow you to iteratively split
>>>>
>>> off
>>>
>>>> additional code but retain the original Task/Bundle-based approach to
>>>> installation. For instance, the first pass could split out Root, then
>>>>
>>> have
>>>
>>>> a dependency-light and 'extras' distribution, 2nd round split further
>>>>
>>> based
>>>
>>>> on function, and so on:
>>>>
>>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root,
>>>> min-deps, extra-deps
>>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root,
>>>> seq/feature,
>>>> other-min-deps, extra-deps
>>>> ...
>>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools,
>>>> seq,
>>>> tree, align, coord, map, everything-else
>>>> ...
>>>>
>>>> Also, one could potentially install modules in various ways:
>>>>
>>> interactively,
>>>
>>>> in predetermined groups, using a user-defined list, etc (one could
>>>> effectively create custom BioPerl installs for GBrowse or other tools
>>>> for
>>>> instance). Of course I would only pick the easiest route to start, but
>>>> maybe that gives some ideas. Regardless, if the dependency tree is set
>>>>
>>> up
>>>
>>>> correctly any reliance on other Bio* modules would be defined in the
>>>>
>>> various
>>>
>>>> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
>>>>
>>>> 2) The Bio::Root modules are probably the true core modules and are the
>>>> most stable with regards to changes, so those could be moved to
>>>> something
>>>> like BioPerl-Core. Beyond that, what are the proposed splits? (we've
>>>> discussed this on-list before, but it's appropriate to bring this up
>>>>
>>> again)
>>>
>>>> 3) How do we want to handle versioning? We can't (and probably
>>>>
>>> shouldn't)
>>>
>>>> release everything on a synchronized versioning scheme (via
>>>> Bio::Root::Version, for instance), that'll quickly fall apart.
>>>>
>>> Personally I
>>>
>>>> can foresee each split-off dist having its own version, with the
>>>> BioPerl
>>>> network of modules being in effect its own mini-CPAN.
>>>>
>>>> 5) Related to versioning, in my opinion we should maybe aim at
>>>> eventually
>>>> calling this BioPerl v2.0 and starting with a simpler X.Y versioning
>>>>
>>> scheme.
>>>
>>>> Lincoln has already done something like this with Bio::Graphics, which
>>>>
>>> was
>>>
>>>> originally part of BioPerl but split off prior to v 1.6.0.
>>>>
>>>> 6) In some cases I can see particularly thorny problems, such as
>>>> circular
>>>> dependencies. I can think of a few ways to address that (creating a
>>>>
>>> simple
>>>
>>>> lightweight Bio::Species class as a fallback if Bio::Tree code isn't
>>>> present, for instance), but any additional thoughts on this would be
>>>> helpful.
>>>>
>>>> 7) Do we want to set up something like 'git submodule' for the devs to
>>>>
>>> pull
>>>
>>>> down all BioPerl-relevant code?
>>>>
>>>> Other thoughts?
>>>>
>>>> chris
>>>>
>>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>>>
>>>> Hey everyone,
>>>>>
>>>>> I wanted to take a minute to introduce myself as one of the Google
>>>>>
>>>> Summer
>>>
>>>> of
>>>>
>>>>> Code interns. I was the lucky one chosen to work on the BioPerl
>>>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics,
>>>>>
>>>> and
>>>>
>>>>> somewhat new to this level of programming so bear with me as I learn
>>>>>
>>>> the
>>>
>>>> technical jargon. Luckily I have both Rob and Chris to mentor me this
>>>>> summer!
>>>>>
>>>>> Reading through the mailing list archives, I see there have been many
>>>>> discussions and differing opinions about tackling this project. Given
>>>>>
>>>> the
>>>
>>>> time frame for GSoC and my limited experience, there is no way I will
>>>>> complete this project on my own but I will at least be able to start
>>>>>
>>>> it,
>>>
>>>> which will hopefully motivate others to pitch in. So far, the plan for
>>>>>
>>>> the
>>>>
>>>>> GSoC project is to start by breaking out Bio::Root, followed by a
>>>>>
>>>> couple
>>>
>>>> other modules based on their dependencies and the time allowed. Each
>>>>>
>>>> will
>>>
>>>> be
>>>>
>>>>> published to CPAN independently. You can follow the project (once it
>>>>>
>>>> starts)
>>>>
>>>>> on github at https://github.com/sheenams.
>>>>>
>>>>> I look forward to collaborating with many of you on the reorganization
>>>>>
>>>> (hint
>>>>
>>>>> hint)!
>>>>>
>>>>> Sheena
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Thu, 28 Apr 2011 16:04:51 -0500
>>> From: Chris Fields<cjfields at illinois.edu>
>>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>>> To: Sheena Scroggins<sheena.scroggins at gmail.com>
>>> Cc: BioPerl List<bioperl-l at lists.open-bio.org>, Robert Buels
>>> <rmb32 at cornell.edu>
>>> Message-ID:<1FF62DC3-941A-4DCB-8464-89D220E4A9C5 at illinois.edu>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> Sounds fine; I think (as you indicate) we can deal with issues along the
>>> way. Rob, anything to add?
>>>
>>> chris
>>>
>>> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
>>>
>>> Chris,
>>>>
>>>> We haven't talked much about the versioning yet, but it will be on the
>>>>
>>> list to figure out asap.
>>>
>>>> So far, the plan is to split out Bio::Root first, followed by a couple
>>>>
>>> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
>>> Bio::Event then Bio::Location. Depending on how much time is remaining
>>> for
>>> the GSoC project, the next to split out would be Bio::Factory and
>>> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I
>>> plan
>>> to still help with the reorganization after the internship is over, but I
>>> obviously have to have a stopping point for the GSoC project.
>>>
>>>> Rob provided me with a really nice script to list dependencies of the
>>>>
>>> modules, so I plan to make a roadmap towards the end of the summer that
>>> will
>>> help guide the rest of the reorganization. At that point, we'll have to
>>> deal
>>> with the circular dependencies carefully.
>>>
>>>> This is a huge project, much bigger than I can do in one summer. But I
>>>>
>>> plan to get it started in a way that makes it easy for others to
>>> contribute.
>>>
>>>> Sheena
>>>>
>>>>
>>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields<cjfields at illinois.edu>
>>>>
>>> wrote:
>>>
>>>> Sheena,
>>>>
>>>> Congrats on being accepted! We've talked about doing this over the
>>>> years,
>>>>
>>> but it's not an easy task and it needs a dedicated project to get the
>>> ball
>>> rolling, so to speak. Hopefully this isn't tl;dr. I'll start off with a
>>> few of my questions/thoughts (Rob could probably chime in as well, but I
>>> think his general thoughts on the project parallel mine):
>>>
>>>> 1) The current BioPerl CPAN could just be a simple install script,
>>>> acting
>>>>
>>> like a 'Task' or 'Bundle' module, installing the actual Bio-specific
>>> distributions. Doing it this way would allow you to iteratively split
>>> off
>>> additional code but retain the original Task/Bundle-based approach to
>>> installation. For instance, the first pass could split out Root, then
>>> have
>>> a dependency-light and 'extras' distribution, 2nd round split further
>>> based
>>> on function, and so on:
>>>
>>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root,
>>>>
>>> min-deps, extra-deps
>>>
>>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root,
>>>> seq/feature,
>>>>
>>> other-min-deps, extra-deps
>>>
>>>> ...
>>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools,
>>>> seq,
>>>>
>>> tree, align, coord, map, everything-else
>>>
>>>> ...
>>>>
>>>> Also, one could potentially install modules in various ways:
>>>>
>>> interactively, in predetermined groups, using a user-defined list, etc
>>> (one
>>> could effectively create custom BioPerl installs for GBrowse or other
>>> tools
>>> for instance). Of course I would only pick the easiest route to start,
>>> but
>>> maybe that gives some ideas. Regardless, if the dependency tree is set
>>> up
>>> correctly any reliance on other Bio* modules would be defined in the
>>> various
>>> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
>>>
>>>> 2) The Bio::Root modules are probably the true core modules and are the
>>>>
>>> most stable with regards to changes, so those could be moved to something
>>> like BioPerl-Core. Beyond that, what are the proposed splits? (we've
>>> discussed this on-list before, but it's appropriate to bring this up
>>> again)
>>>
>>>> 3) How do we want to handle versioning? We can't (and probably
>>>>
>>> shouldn't) release everything on a synchronized versioning scheme (via
>>> Bio::Root::Version, for instance), that'll quickly fall apart.
>>> Personally I
>>> can foresee each split-off dist having its own version, with the BioPerl
>>> network of modules being in effect its own mini-CPAN.
>>>
>>>> 5) Related to versioning, in my opinion we should maybe aim at
>>>> eventually
>>>>
>>> calling this BioPerl v2.0 and starting with a simpler X.Y versioning
>>> scheme.
>>> Lincoln has already done something like this with Bio::Graphics, which
>>> was
>>> originally part of BioPerl but split off prior to v 1.6.0.
>>>
>>>> 6) In some cases I can see particularly thorny problems, such as
>>>> circular
>>>>
>>> dependencies. I can think of a few ways to address that (creating a
>>> simple
>>> lightweight Bio::Species class as a fallback if Bio::Tree code isn't
>>> present, for instance), but any additional thoughts on this would be
>>> helpful.
>>>
>>>> 7) Do we want to set up something like 'git submodule' for the devs to
>>>>
>>> pull down all BioPerl-relevant code?
>>>
>>>> Other thoughts?
>>>>
>>>> chris
>>>>
>>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>>>
>>>> Hey everyone,
>>>>>
>>>>> I wanted to take a minute to introduce myself as one of the Google
>>>>>
>>>> Summer of
>>>
>>>> Code interns. I was the lucky one chosen to work on the BioPerl
>>>>> Reorganization (*crowd cheers*). I am a grad student in bioinformatics,
>>>>>
>>>> and
>>>
>>>> somewhat new to this level of programming so bear with me as I learn
>>>>>
>>>> the
>>>
>>>> technical jargon. Luckily I have both Rob and Chris to mentor me this
>>>>> summer!
>>>>>
>>>>> Reading through the mailing list archives, I see there have been many
>>>>> discussions and differing opinions about tackling this project. Given
>>>>>
>>>> the
>>>
>>>> time frame for GSoC and my limited experience, there is no way I will
>>>>> complete this project on my own but I will at least be able to start
>>>>>
>>>> it,
>>>
>>>> which will hopefully motivate others to pitch in. So far, the plan for
>>>>>
>>>> the
>>>
>>>> GSoC project is to start by breaking out Bio::Root, followed by a
>>>>>
>>>> couple
>>>
>>>> other modules based on their dependencies and the time allowed. Each
>>>>>
>>>> will be
>>>
>>>> published to CPAN independently. You can follow the project (once it
>>>>>
>>>> starts)
>>>
>>>> on github at https://github.com/sheenams.
>>>>>
>>>>> I look forward to collaborating with many of you on the reorganization
>>>>>
>>>> (hint
>>>
>>>> hint)!
>>>>>
>>>>> Sheena
>>>>>
>>>>
>>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Thu, 28 Apr 2011 16:19:51 -0700
>>> From: Robert Buels<rmb32 at cornell.edu>
>>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>>> To: Chris Fields<cjfields at illinois.edu>
>>> Cc: Sheena Scroggins<sheena.scroggins at gmail.com>, BioPerl List
>>> <bioperl-l at lists.open-bio.org>
>>> Message-ID:<4DB9F617.6070705 at cornell.edu>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> I think you guys are on the right track, here are some slightly more
>>> detailed plans. I'll use Chris's subject numbering.
>>>
>>> 1,2,3,5.) I envision the splitting algorithm going like this:
>>>
>>> no strict; # this is pseudocode!
>>>
>>> my $split_count = 0;
>>> for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {
>>>
>>> - take $subsystem modules and tests out of bioperl-live
>>>
>>> (my $new_dist_name = $subsystem) =~ s/::/-/g;
>>> - extract $subsystem modules into new dist called
>>> $new_dist_name. Make sure all its tests pass, and write
>>> some more tests if necessary.
>>>
>>> - add dep on $subsystem to bioperl-live/Build.PL
>>>
>>> - push $new_dist_name and bioperl-live to CPAN.
>>> $new_dist_name has version '2.000', and bioperl-live has
>>> version "1.7.$split_count".
>>> }
>>>
>>> and then, at the end of this loop, bioperl-live will be
>>> nothing but a Build.PL and a couple of other things
>>> for backcompat, like Bio::Root::Version, Bio::Perl, etc.
>>>
>>> Important things to notice about this algorithm are that, at each
>>> step in the loop:
>>>
>>> a.) For users that install bioperl with CPAN,
>>> doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will
>>> get you the same set of modules as before the split
>>> started, with the split-off modules at 2.000 versions, and
>>> the non-split-off ones at 1.7.x versions.
>>>
>>> b.) For users (not developers) that are git cloning
>>> bioperl-live, even though they are naughty (wink), they
>>> can do 'perl Build.PL; ./Build installdeps' to get the
>>> split-off modules, downloaded like any other CPAN
>>> dependency. There may be some lag before the split-off
>>> thing is downloadable from CPAN.
>>>
>>> c.) For BioPerl developers, unless they are working on a
>>> certain module, they should install the split-off modules
>>> from CPAN like everybody else, and git clone only the piece
>>> they are working on.
>>>
>>> d.) The version of bioperl-live keeps increasing by 0.001 with
>>> each split. The systems that are split off have a 2.x
>>> version number, each slightly different depending on when it
>>> was split off. After this point, their release schedules
>>> and version numbers are independent of each other and of
>>> bioperl-live. For Bio::Perl and Bio::Root::Version, the
>>> things that stay in bioperl-live, installing the latest
>>> version will get you all the split-off modules.
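
A minimal runnable sketch of the naming and version bookkeeping in the pseudocode above. The helper names dist_name_for and plan_splits are hypothetical, and it assumes $split_count is incremented once per split, which the pseudocode leaves implicit. The actual extraction, testing and CPAN upload steps are not modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a subsystem namespace into a dist name,
# e.g. Bio::Root becomes Bio-Root (same substitution as the pseudocode).
sub dist_name_for {
    my ($subsystem) = @_;
    (my $dist = $subsystem) =~ s/::/-/g;
    return $dist;
}

# Hypothetical helper: list the release pair produced by each split.
# Assumption: the split counter goes up by one per split-off subsystem.
sub plan_splits {
    my @subsystems = @_;
    my @plan;
    my $split_count = 0;
    for my $subsystem (@subsystems) {
        $split_count++;
        push @plan, {
            subsystem    => $subsystem,
            dist         => dist_name_for($subsystem),
            dist_version => '2.000',              # new dist starts at 2.000
            live_version => "1.7.$split_count",   # bioperl-live after this split
        };
    }
    return @plan;
}

for my $step (plan_splits(qw( Bio::Root Bio::Das Bio::Event ))) {
    printf "%-12s -> %-10s %s (bioperl-live %s)\n",
        @{$step}{qw(subsystem dist dist_version live_version)};
}
```

Each row pairs one new distribution with the bioperl-live release that would start depending on it.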
>>>
>>>
>>> 6.) (thorny circular dependencies and stuff) Those will become quickly
>>> apparent as this process proceeds. They'll take some finesse and/or
>>> ruthlessness and/or hacking to get around. We'll burn those bridges as
>>> we come to them.
>>>
>>> 7.) (git submodules) Git submodules probably won't be necessary, since
>>> at each step in the process BioPerl devs can use ./Build installdeps or
>>> cpanm --installdeps . to install whatever the dependencies are for the
>>> piece they are working on, whether it's bioperl-live (in the case of a
>>> module that has not yet been split off), or one of the distributions
>>> that has already been split off (in which case their improvements will
>>> probably be releasable to CPAN immediately!).
>>>
>>> Lots of detail there. I tried to make it structured and easy to skim
>>> though. Thoughts?
>>>
>>> Rob
>>>
>>>
>>>
>>> On 04/28/2011 02:04 PM, Chris Fields wrote:
>>>
>>>> Sounds fine; I think (as you indicate) we can deal with issues along the
>>>>
>>> way. Rob, anything to add?
>>>
>>>> chris
>>>>
>>>> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
>>>>
>>>> Chris,
>>>>>
>>>>> We haven't talked much about the versioning yet, but it will be on the
>>>>>
>>>> list to figure out asap.
>>>
>>>> So far, the plan is to split out Bio::Root first, followed by a couple
>>>>>
>>>> modules that depend only on Bio::Root. The plan I proposed was Bio::Das,
>>> Bio::Event then Bio::Location. Depending on how much time is remaining
>>> for
>>> the GSoC project, the next to split out would be Bio::Factory and
>>> Bio::Coordinate, because they depend on Bio::Root and Bio::Location. I
>>> plan
>>> to still help with the reorganization after the internship is over, but I
>>> obviously have to have a stopping point for the GSoC project.
>>>
>>>> Rob provided me with a really nice script to list dependencies of the
>>>>>
>>>> modules, so I plan to make a roadmap towards the end of the summer that
>>> will
>>> help guide the rest of the reorganization. At that point, we'll have to
>>> deal
>>> with the circular dependencies carefully.
>>>
>>>> This is a huge project, much bigger than I can do in one summer. But I
>>>>>
>>>> plan to get it started in a way that makes it easy for others to
>>> contribute.
>>>
>>>> Sheena
>>>>>
>>>>>
>>>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields<cjfields at illinois.edu>
>>>>>
>>>> wrote:
>>>
>>>> Sheena,
>>>>>
>>>>> Congrats on being accepted! We've talked about doing this over the
>>>>>
>>>> years, but it's not an easy task and it needs a dedicated project to get
>>> the
>>> ball rolling, so to speak. Hopefully this isn't tl;dr. I'll start off
>>> with
>>> a few of my questions/thoughts (Rob could probably chime in as well, but
>>> I
>>> think his general thoughts on the project parallel mine):
>>>
>>>> 1) The current BioPerl CPAN could just be a simple install script,
>>>>>
>>>> acting like a 'Task' or 'Bundle' module, installing the actual
>>> Bio-specific
>>> distributions. Doing it this way would allow you to iteratively split
>>> off
>>> additional code but retain the original Task/Bundle-based approach to
>>> installation. For instance, the first pass could split out Root, then
>>> have
>>> a dependency-light and 'extras' distribution, 2nd round split further
>>> based
>>> on function, and so on:
>>>
>>>> 1st round (v 1.9) : BioPerl (just an installer) -> installs root,
>>>>>
>>>> min-deps, extra-deps
>>>
>>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root,
>>>>>
>>>> seq/feature, other-min-deps, extra-deps
>>>
>>>> ...
>>>>> Xth round (v 1.99) : BioPerl (just an installer) -> root, tools,
>>>>>
>>>> seq, tree, align, coord, map, everything-else
>>>
>>>> ...
>>>>>
>>>>> Also, one could potentially install modules in various ways:
>>>>>
>>>> interactively, in predetermined groups, using a user-defined list, etc
>>> (one
>>> could effectively create custom BioPerl installs for GBrowse or other
>>> tools
>>> for instance). Of course I would only pick the easiest route to start,
>>> but
>>> maybe that gives some ideas. Regardless, if the dependency tree is set
>>> up
>>> correctly any reliance on other Bio* modules would be defined in the
>>> various
>>> Build.PL/Makefile.PL and then installed via CPAN (as is any dependency).
>>>
>>>> 2) The Bio::Root modules are probably the true core modules and are the
>>>>>
>>>> most stable with regards to changes, so those could be moved to
>>> something
>>> like BioPerl-Core. Beyond that, what are the proposed splits? (we've
>>> discussed this on-list before, but it's appropriate to bring this up
>>> again)
>>>
>>>> 3) How do we want to handle versioning? We can't (and probably
>>>>>
>>>> shouldn't) release everything on a synchronized versioning scheme (via
>>> Bio::Root::Version, for instance), that'll quickly fall apart.
>>> Personally I
>>> can foresee each split-off dist having its own version, with the BioPerl
>>> network of modules being in effect its own mini-CPAN.
>>>
>>>> 5) Related to versioning, in my opinion we should maybe aim at
>>>>>
>>>> eventually calling this BioPerl v2.0 and starting with a simpler X.Y
>>> versioning scheme. Lincoln has already done something like this with
>>> Bio::Graphics, which was originally part of BioPerl but split off prior
>>> to v
>>> 1.6.0.
>>>
>>>> 6) In some cases I can see particularly thorny problems, such as
>>>>>
>>>> circular dependencies. I can think of a few ways to address that
>>> (creating
>>> a simple lightweight Bio::Species class as a fallback if Bio::Tree code
>>> isn't present, for instance), but any additional thoughts on this would
>>> be
>>> helpful.
>>>
>>>> 7) Do we want to set up something like 'git submodule' for the devs to
>>>>>
>>>> pull down all BioPerl-relevant code?
>>>
>>>> Other thoughts?
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>>>>
>>>>> Hey everyone,
>>>>>>
>>>>>> I wanted to take a minute to introduce myself as one of the Google
>>>>>>
>>>>> Summer of
>>>
>>>> Code interns. I was the lucky one chosen to work on the BioPerl
>>>>>> Reorganization (*crowd cheers*). I am a grad student in
>>>>>> bioinformatics,
>>>>>>
>>>>> and
>>>
>>>> somewhat new to this level of programming so bear with me as I learn
>>>>>>
>>>>> the
>>>
>>>> technical jargon. Luckily I have both Rob and Chris to mentor me this
>>>>>> summer!
>>>>>>
>>>>>> Reading through the mailing list archives, I see there have been many
>>>>>> discussions and differing opinions about tackling this project. Given
>>>>>>
>>>>> the
>>>
>>>> time frame for GSoC and my limited experience, there is no way I will
>>>>>> complete this project on my own but I will at least be able to start
>>>>>>
>>>>> it,
>>>
>>>> which will hopefully motivate others to pitch in. So far, the plan for
>>>>>>
>>>>> the
>>>
>>>> GSoC project is to start by breaking out Bio::Root, followed by a
>>>>>>
>>>>> couple
>>>
>>>> other modules based on their dependencies and the time allowed. Each
>>>>>>
>>>>> will be
>>>
>>>> published to CPAN independently. You can follow the project (once it
>>>>>>
>>>>> starts)
>>>
>>>> on github at https://github.com/sheenams.
>>>>>>
>>>>>> I look forward to collaborating with many of you on the reorganization
>>>>>>
>>>>> (hint
>>>
>>>> hint)!
>>>>>>
>>>>>> Sheena
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> ------------------------------
>>>
>>> Message: 4
>>> Date: Thu, 28 Apr 2011 21:15:01 -0500
>>> From: Siddhartha Basu<sidd.basu at gmail.com>
>>> Subject: [Bioperl-l] Re: GSoC/BioPerl Reorganization Project
>>> To: bioperl-l at lists.open-bio.org
>>> Message-ID:<20110429021457.GA351 at Macintosh-235.local>
>>> Content-Type: text/plain; charset=us-ascii
>>>
>>> Hi Robert,
>>> At what point in the flow will the dependencies between the split modules
>>> be added? Is there any particular order in which the split modules would
>>> be created? And how will those split-off modules be released on CPAN: one
>>> by one as they are generated, or all of them in a batch, after which they
>>> will follow their own release schedules?
>>>
>>> -siddhartha
>>>
>>>
>>>
>>> On Thu, 28 Apr 2011, Robert Buels wrote:
>>>
>>> I think you guys are on the right track, here are some slightly more
>>>> detailed plans. I'll use Chris's subject numbering.
>>>>
>>>> 1,2,3,5.) I envision the splitting algorithm going like this:
>>>>
>>>> no strict; # this is pseudocode!
>>>>
>>>> my $split_count = 0;
>>>> for $subsystem (qw( Bio::Root Bio::Das Bio::Event ... )) {
>>>>
>>>> - take $subsystem modules and tests out of bioperl-live
>>>>
>>>> (my $new_dist_name = $subsystem) =~ s/::/-/g;
>>>> - extract $subsystem modules into new dist called
>>>> $new_dist_name. Make sure all its tests pass, and write
>>>> some more tests if necessary.
>>>>
>>>> - add dep on $subsystem to bioperl-live/Build.PL
>>>>
>>>> - push $new_dist_name and bioperl-live to CPAN.
>>>> $new_dist_name has version '2.000', and bioperl-live has
>>>> version "1.7.$split_count".
>>>> }
>>>>
>>>> and then, at the end of this loop, bioperl-live will be
>>>> nothing but a Build.PL and a couple of other things
>>>> for backcompat, like Bio::Root::Version, Bio::Perl, etc.
>>>>
>>>> Important things to notice about this algorithm are that, at each
>>>> step in the loop:
>>>>
>>>> a.) For users that install bioperl with CPAN,
>>>> doing cpan 'Bio::Perl' or cpan 'Bio::Root::Version' will
>>>> get you the same set of modules as before the split
>>>> started, with the split-off modules at 2.000 versions, and
>>>> the non-split-off ones at 1.7.x versions.
>>>>
>>>> b.) For users (not developers) that are git cloning
>>>> bioperl-live, even though they are naughty (wink), they
>>>> can do 'perl Build.PL; ./Build installdeps' to get the
>>>> split-off modules, downloaded like any other CPAN
>>>> dependency. There may be some lag before the split-off
>>>> thing is downloadable from CPAN.
>>>>
>>>> c.) For BioPerl developers, unless they are working on a
>>>> certain module, they should install the split-off modules
>>>> from CPAN like everybody else, and git clone only the piece
>>>> they are working on.
>>>>
>>>> d.) The version of bioperl-live keeps increasing by 0.001 with
>>>> each split. The systems that are split off have a 2.x
>>>> version number, each slightly different depending on when it
>>>> was split off. After this point, their release schedules
>>>> and version numbers are independent of each other and of
>>>> bioperl-live. For Bio::Perl and Bio::Root::Version, the
>>>> things that stay in bioperl-live, installing the latest
>>>> version will get you all the split-off modules.
>>>>
>>>>
>>>> 6.) (thorny circular dependencies and stuff) Those will become quickly
>>>> apparent as this process proceeds. They'll take some finesse and/or
>>>> ruthlessness and/or hacking to get around. We'll burn those bridges as
>>>>
>>> we
>>>
>>>> come to them.
>>>>
>>>> 7.) (git submodules) Git submodules probably won't be necessary, since
>>>> at
>>>> each step in the process BioPerl devs can use ./Build installdeps or
>>>>
>>> cpanm
>>>
>>>> --installdeps . to install whatever the dependencies are for the piece
>>>> they are working on, whether it's bioperl-live (in the case of a module
>>>> that has not yet been split off), or one of the distributions that has
>>>> already been split off (in which case their improvements will probably
>>>> be
>>>> releasable to CPAN immediately!).
>>>>
>>>> Lots of detail there. I tried to make it structured and easy to skim
>>>> though. Thoughts?
>>>>
>>>> Rob
>>>>
>>>>
>>>>
>>>> On 04/28/2011 02:04 PM, Chris Fields wrote:
>>>>
>>>>> Sounds fine; I think (as you indicate) we can deal with issues along
>>>>>
>>>> the
>>>
>>>> way. Rob, anything to add?
>>>>>
>>>>> chris
>>>>>
>>>>> On Apr 28, 2011, at 2:53 PM, Sheena Scroggins wrote:
>>>>>
>>>>> Chris,
>>>>>>
>>>>>> We haven't talked much about the versioning yet, but it will be on the
>>>>>> list to figure out asap.
>>>>>>
>>>>>> So far, the plan is to split out Bio::Root first, followed by a couple
>>>>>> modules that depend only on Bio::Root. The plan I proposed was
>>>>>> Bio::Das, Bio::Event then Bio::Location. Depending on how much time is
>>>>>> remaining for the GSoC project, the next to split out would be
>>>>>> Bio::Factory and Bio::Coordinate, because they depend on Bio::Root and
>>>>>> Bio::Location. I plan to still help with the reorganization after the
>>>>>> internship is over, but I obviously have to have a stopping point for
>>>>>> the GSoC project.
>>>>>>
>>>>>> Rob provided me with a really nice script to list dependencies of the
>>>>>> modules, so I plan to make a roadmap toward the end of the summer that
>>>>>> will help guide the rest of the reorganization. At that point, we'll
>>>>>> have to deal with the circular dependencies carefully.
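
A rough sketch of what such a dependency listing could look like. This is not Rob's script, which was not posted in this thread; the function name and the regular expressions are assumptions made for illustration. It scans module source text for use/require statements that name Bio:: modules.

```perl
use strict;
use warnings;

# Return the sorted, unique Bio::* modules that a piece of Perl source
# loads via 'use', 'require', 'use base' or 'use parent'.
# Hypothetical helper for illustration; it is a plain text scan and will
# miss modules that are loaded dynamically.
sub bio_dependencies {
    my ($source) = @_;
    my %seen;
    for my $line (split /\n/, $source) {
        next if $line =~ /^\s*#/;                          # skip comment lines
        next unless $line =~ /^\s*(?:use|require)\s+(.*)/; # only load statements
        my $rest = $1;
        $seen{$_} = 1 for $rest =~ /\b(Bio::\w+(?:::\w+)*)/g;
    }
    return sort keys %seen;
}

# Made-up module source used only to exercise the helper.
my $example = <<'END';
package Bio::Location::Simple;
use strict;
use base qw(Bio::Root::Root Bio::LocationI);
require Bio::Location::Atomic;
# use Bio::Ignored::Comment;
END

print "$_\n" for bio_dependencies($example);
```

Running the same scan over every .pm file in a checkout would give the raw material for a dependency table, from which a roadmap could be drawn up.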
>>>>>>
>>>>>> This is a huge project, much bigger than I can do in one summer. But I
>>>>>> plan to get it started in a way that makes it easy for others to
>>>>>> contribute.
>>>>>>
>>>>>> Sheena
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 27, 2011 at 12:35 PM, Chris Fields<cjfields at illinois.edu>
>>>>>> wrote:
>>>>>> Sheena,
>>>>>>
>>>>>> Congrats on being accepted! We've talked about doing this over the
>>>>>> years, but it's not an easy task and it needs a dedicated project to
>>>>>> get the ball rolling, so to speak. Hopefully this isn't tl;dr. I'll
>>>>>> start off with a few of my questions/thoughts (Rob could probably chime
>>>>>> in as well, but I think his general thoughts on the project parallel
>>>>>> mine):
>>>>>>
>>>>>> 1) The current BioPerl CPAN could just be a simple install script,
>>>>>> acting like a 'Task' or 'Bundle' module, installing the actual
>>>>>> Bio-specific distributions. Doing it this way would allow you to
>>>>>> iteratively split off additional code but retain the original
>>>>>> Task/Bundle-based approach to installation. For instance, the first
>>>>>> pass could split out Root, then have a dependency-light and 'extras'
>>>>>> distribution, 2nd round split further based on function, and so on:
>>>>>>
>>>>>> 1st round (v 1.9)   : BioPerl (just an installer) -> installs root,
>>>>>>                       min-deps, extra-deps
>>>>>> 2nd round (v 1.901) : BioPerl (just an installer) -> root,
>>>>>>                       seq/feature, other-min-deps, extra-deps
>>>>>> ...
>>>>>> Xth round (v 1.99)  : BioPerl (just an installer) -> root, tools,
>>>>>>                       seq, tree, align, coord, map, everything-else
>>>>>> ...
>>>>>>
>>>>>> Also, one could potentially install modules in various ways:
>>>>>> interactively, in predetermined groups, using a user-defined list, etc.
>>>>>> (one could effectively create custom BioPerl installs for GBrowse or
>>>>>> other tools for instance). Of course I would only pick the easiest
>>>>>> route to start, but maybe that gives some ideas. Regardless, if the
>>>>>> dependency tree is set up correctly any reliance on other Bio* modules
>>>>>> would be defined in the various Build.PL/Makefile.PL and then installed
>>>>>> via CPAN (as is any dependency).
>>>>>>
>>>>>> 2) The Bio::Root modules are probably the true core modules and are
>>>>>> the most stable with regard to changes, so those could be moved to
>>>>>> something like BioPerl-Core. Beyond that, what are the proposed splits?
>>>>>> (We've discussed this on-list before, but it's appropriate to bring
>>>>>> this up again.)
>>>>>>
>>>>>> 3) How do we want to handle versioning? We can't (and probably
>>>>>> shouldn't) release everything on a synchronized versioning scheme (via
>>>>>> Bio::Root::Version, for instance); that'll quickly fall apart.
>>>>>> Personally I can foresee each split-off dist having its own version,
>>>>>> with the BioPerl network of modules being in effect its own mini-CPAN.
>>>>>>
>>>>>> 5) Related to versioning, in my opinion we should maybe aim at
>>>>>> eventually calling this BioPerl v2.0 and starting with a simpler X.Y
>>>>>> versioning scheme. Lincoln has already done something like this with
>>>>>> Bio::Graphics, which was originally part of BioPerl but split off prior
>>>>>> to v 1.6.0.
>>>>>>
>>>>>> 6) In some cases I can see particularly thorny problems, such as
>>>>>> circular dependencies. I can think of a few ways to address that
>>>>>> (creating a simple lightweight Bio::Species class as a fallback if
>>>>>> Bio::Tree code isn't present, for instance), but any additional
>>>>>> thoughts on this would be helpful.
>>>>>>
>>>>>> 7) Do we want to set up something like 'git submodule' for the devs to
>>>>>> pull down all BioPerl-relevant code?
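
For the fallback idea in 6), one possible shape is to try the full-featured class first and fall back to a lighter one when it cannot be loaded. This is only a sketch: the helper name and the lightweight class name are hypothetical, not existing BioPerl code.

```perl
use strict;
use warnings;

# Return the first class in the list that can be loaded, or undef if none.
# Hypothetical helper; not part of BioPerl.
sub first_loadable_class {
    my @candidates = @_;
    for my $class (@candidates) {
        (my $file = "$class.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        return $class if eval { require $file; 1 };
    }
    return;
}

# Prefer the Bio::Tree-backed class; 'Bio::Species::Lite' is an invented
# name standing in for a lightweight fallback with no Bio::Tree dependency.
my $species_class = first_loadable_class('Bio::Species', 'Bio::Species::Lite');
print defined $species_class
    ? "Using $species_class\n"
    : "No species class available\n";
```

The calling code then only depends on whichever class was found, so a distribution could list the heavier module as an optional rather than a required dependency.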
>>>>>>
>>>>>> Other thoughts?
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Apr 27, 2011, at 12:17 AM, Sheena Scroggins wrote:
>>>>>>
>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I wanted to take a minute to introduce myself as one of the Google
>>>>>>> Summer of Code interns. I was the lucky one chosen to work on the
>>>>>>> BioPerl Reorganization (*crowd cheers*). I am a grad student in
>>>>>>> bioinformatics, and somewhat new to this level of programming so bear
>>>>>>> with me as I learn the technical jargon. Luckily I have both Rob and
>>>>>>> Chris to mentor me this summer!
>>>>>>>
>>>>>>> Reading through the mailing list archives, I see there have been many
>>>>>>> discussions and differing opinions about tackling this project. Given
>>>>>>> the time frame for GSoC and my limited experience, there is no way I
>>>>>>> will complete this project on my own but I will at least be able to
>>>>>>> start it, which will hopefully motivate others to pitch in. So far,
>>>>>>> the plan for the GSoC project is to start by breaking out Bio::Root,
>>>>>>> followed by a couple other modules based on their dependencies and
>>>>>>> the time allowed. Each will be published to CPAN independently. You
>>>>>>> can follow the project (once it starts) on github at
>>>>>>> https://github.com/sheenams.
>>>>>>>
>>>>>>> I look forward to collaborating with many of you on the
>>>>>>> reorganization (hint hint)!
>>>>>>>
>>>>>>> Sheena
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>> ------------------------------
>>>
>>> Message: 5
>>> Date: Fri, 29 Apr 2011 10:23:50 +0530
>>> From: "khush ........"<bioinfo.khush at gmail.com>
>>> Subject: Re: [Bioperl-l] Standalone blast
>>> To: Dave Messina<David.Messina at sbc.su.se>
>>> Cc: bioperl-l at lists.open-bio.org
>>> Message-ID:<BANLkTikjFc-HBBKLMRam1g+Kxoro+WAE_g at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Dear Dave,
>>>
>>> Thank you for your support.
>>>
>>> Do I need to change the following lines, like this?
>>>
>>> $blast_obj = Bio::Tools::Run::StandAloneBlast->new(
>>>     -program  => 'blastx',
>>>     -database => 'nr.fa');
>>>
>>> $seq_obj = Bio::Seq->new(-id => "test query", -seq => "file.fa");
>>>
>>> I have a simple and basic question for you, as I am a beginner in
>>> BioPerl: do I need to download the whole nr database from NCBI to run
>>> the code, or will it fetch the information directly from the NCBI
>>> website? I do not understand this, because downloading the whole nr
>>> database itself takes a long time for me.
>>>
>>> How could I read a whole file (a FASTA file) in -seq instead of a simple
>>> string such as "TTTATAGATAGAGACAG"? Is there a simple way to do this
>>> under my conditions?
>>>
>>> Thank you
>>> Kamal
>>>
>>>
>>> On Thu, Apr 28, 2011 at 12:59 PM, Dave Messina<David.Messina at sbc.su.se>
>>> wrote:
>>>> Hi Kamal,
>>>>
>>>> This is covered in the beginners' HOWTO:
>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#BLAST
>>>>
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Thu, Apr 28, 2011 at 07:22, khush ........<bioinfo.khush at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have some sequences (~250) and wanted to use BLASTX to blast them
>>>>> against the nr database of NCBI, as this is time consuming using the
>>>>> web-based search. Can someone please tell me how to start with BioPerl
>>>>> on such problems? I know that this is possible with BioPerl, but do not
>>>>> know how.
>>>>>
>>>>> Any suggestions will be appreciated.
>>>>>
>>>>> Thanks in advance
>>>>> Kamal
>>>>>
>>>>>
>>>>
>>> ------------------------------
>>>
>>> Message: 6
>>> Date: Thu, 28 Apr 2011 22:15:01 -0700
>>> From: Robert Buels<rmb32 at cornell.edu>
>>> Subject: Re: [Bioperl-l] GSoC/BioPerl Reorganization Project
>>> To: BioPerl List<bioperl-l at lists.open-bio.org>
>>> Message-ID:<4DBA4955.2030003 at cornell.edu>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> On 04/28/2011 07:15 PM, Siddhartha Basu wrote:
>>>
>>>> At what point in the flow will the dependencies between the split
>>>> modules be added? Is there any particular order in which the split
>>>> modules would be created?
>>>>
>>> Dependencies are added and characterized at the time each distribution
>>> is created. That's why the splitting order starts at Bio::Root, so that
>>> you can proceed up the hierarchy of dependencies without having to
>>> modify the dependency lists of the distributions that have already been
>>> extracted.
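
To make the ordering concrete, here is a small sketch that derives a split order from a dependency table. The table contains only the relationships Sheena mentioned earlier in this thread (Bio::Das, Bio::Event and Bio::Location depend only on Bio::Root; Bio::Factory and Bio::Coordinate depend on Bio::Root and Bio::Location). The function name is made up, and the real dependency graph is of course much larger.

```perl
use strict;
use warnings;

# Order distributions so that each one comes after everything it depends on
# (a depth-first topological sort). Dies if it finds a circular dependency.
sub split_order {
    my (%deps) = @_;
    my (%state, @order);
    my $visit;
    $visit = sub {
        my ($name) = @_;
        return if ($state{$name} // '') eq 'done';
        die "circular dependency involving $name\n"
            if ($state{$name} // '') eq 'visiting';
        $state{$name} = 'visiting';
        $visit->($_) for sort @{ $deps{$name} || [] };
        $state{$name} = 'done';
        push @order, $name;
    };
    $visit->($_) for sort keys %deps;
    return @order;
}

# Dependencies as described in this thread; nothing else is assumed.
my %deps = (
    'Bio::Root'       => [],
    'Bio::Das'        => ['Bio::Root'],
    'Bio::Event'      => ['Bio::Root'],
    'Bio::Location'   => ['Bio::Root'],
    'Bio::Factory'    => ['Bio::Root', 'Bio::Location'],
    'Bio::Coordinate' => ['Bio::Root', 'Bio::Location'],
);

print join("\n", split_order(%deps)), "\n";
```

Because a distribution is only emitted after its dependencies, the dependency lists of distributions that are already extracted never need to be revisited. A circular dependency makes the sketch die, which is the case that needs manual untangling.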
>>>
>>>> And how will those split-off modules be released on CPAN: one by one as
>>>> they are generated, or all of them in a batch, after which they will
>>>> follow their own release schedules?
>>>>
>>> One by one, as they are generated. I think it would be a good idea to
>>> re-release bioperl-live with each split as well. This will probably
>>> lead to bioperl-live being released nearly every week as the split is
>>> ongoing. As a consequence, the master branch of bioperl-live will need
>>> to be kept in very good shape. This is easy if you just follow good
>>> practice: develop in branches, run *all* the tests before committing, go
>>> on IRC and send pull requests for code review, etc.
>>>
>>> Rob
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 7
>>> Date: Fri, 29 Apr 2011 15:24:45 +1000
>>> From: Florent Angly<florent.angly at gmail.com>
>>> Subject: Re: [Bioperl-l] Standalone blast
>>> To: bioinfo.khush at gmail.com
>>> Cc: bioperl-l at lists.open-bio.org
>>> Message-ID:<4DBA4B9D.1010400 at gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Hi Kamal,
>>>
>>> To run BLAST the way Dave described, you need to have BLAST installed on
>>> your computer, and you need to download BLAST databases to your computer
>>> (or make them yourself with the formatdb command). There are plenty of
>>> databases available on the NCBI FTP website: ftp://ftp.ncbi.nih.gov/.
>>> And yes, some of these databases are very large and will take a long
>>> time to download. By the way, BLAST may also take a very long time
>>> to execute if you use large databases, so, you'd better run the analysis
>>> on a powerful computer or a server.
>>>
>>> Also read this documentation:
>>>
>>> http://search.cpan.org/~cjfields/BioPerl-1.6.900/Bio/Tools/Run/StandAloneBlast.pm
>>>
>>> It states that you can BLAST an entire FASTA file (not just a
>>> sequence object):
>>>
>>> $inputfilename = 't/testquery.fa';
>>> $blast_report = $factory->blastall($inputfilename);
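
Put together, a fuller version of those two lines might look like the sketch below. It assumes that BioPerl with Bio::Tools::Run::StandAloneBlast, the legacy NCBI blastall program and a locally formatted database are all installed; the function name, the command-line handling and the database name passed in are illustrative assumptions, not something taken from the documentation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run a local BLAST search for every sequence in a FASTA file and return
# the report object. Hypothetical wrapper for illustration only.
sub blast_fasta_file {
    my ($fasta_file, $database, $program) = @_;
    $program ||= 'blastx';
    die "FASTA file not found: $fasta_file\n" unless -e $fasta_file;

    # Loaded at run time so the input check above works without BioPerl.
    require Bio::Tools::Run::StandAloneBlast;
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        -program  => $program,
        -database => $database,   # name of a locally formatted database
    );

    # blastall() accepts a file name, as in the two lines quoted above.
    return $factory->blastall($fasta_file);
}

# Usage: perl blast_file.pl queries.fa nr
if (@ARGV == 2) {
    my ($fasta_file, $database) = @ARGV;
    my $report = blast_fasta_file($fasta_file, $database);
    while (my $result = $report->next_result) {
        while (my $hit = $result->next_hit) {
            # One line per hit: query name, hit name, E-value.
            printf "%s\t%s\t%s\n",
                $result->query_name, $hit->name, $hit->significance;
        }
    }
}
```

Nothing here fetches data from the NCBI website: the search runs entirely against the local database, which is why the database has to be downloaded (or built with formatdb) first.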
>>>
>>>
>>> Regards,
>>>
>>> Florent
>>>
>>>
>>>
>>>
>>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 8
>>> Date: Fri, 29 Apr 2011 11:16:38 +0530
>>> From: "khush ........"<bioinfo.khush at gmail.com>
>>> Subject: Re: [Bioperl-l] Standalone blast
>>> To: Florent Angly<florent.angly at gmail.com>
>>> Cc: bioperl-l at lists.open-bio.org
>>> Message-ID:<BANLkTin_E2-Pq4Hk+W72x78bKpTRoEdy6g at mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Dear Florent,
>>>
>>> Thank you very much for your kind reply and for clearing up the concept
>>> of running BLAST for me. I am working on a simple machine, so I need to
>>> get permission from my administrator to work on a good server that can
>>> hold the whole nr database from NCBI and run blastx.
>>>
>>> Thank you
>>>
>>> Kamal
>>> Bioperl is great.
>>>
>>>
>>>>>
>>>>>
>>>>
>>> ------------------------------
>>>
>>>
>>> End of Bioperl-l Digest, Vol 96, Issue 28
>>> *****************************************
>>>
>>
>
>