From js5@sanger.ac.uk Fri Feb 1 09:15:52 2002
From: js5@sanger.ac.uk (James Smith)
Date: Fri, 1 Feb 2002 09:15:52 +0000 (GMT)
Subject: [Bioperl-l] A second ensembl issue rears its head . . .
In-Reply-To: <4D693F933DD8D311A18C00E018B005760B5681C4@trollope.niehs.nih.gov>
Message-ID:

On Thu, 31 Jan 2002, Tomso.Daniel wrote:

>
> DBD::mysql::st execute failed: Unknown column 't.gene' in 'field list' at
> /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/Virtual/StaticContig.pm line 2172.
> DBD::mysql::st execute failed: Unknown column 't.gene' in 'field list' at
> /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/Virtual/StaticContig.pm line 2172.
>
> ## While trying to get_all_RepeatFeatures:
>
> DBD::mysql::st execute failed: Table 'homo_sapiens_core_130.analysis' doesn't
> exist at /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/DBSQL/Feature_Obj.pm line
> 474.
> DBD::mysql::st execute failed: Table 'homo_sapiens_core_130.analysis' doesn't
> exist at /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/DBSQL/Feature_Obj.pm line
> 474.

Which version of the EnsEMBL API are you using... in the 130 code-base these
calls should be using the gene/feature adaptor model, not the older _obj
model.

James

From b_i_osborne@hotmail.com Fri Feb 1 16:01:05 2002
From: b_i_osborne@hotmail.com (Brian Osborne)
Date: Fri, 1 Feb 2002 11:01:05 -0500
Subject: [Bioperl-l] bioperl-db documentation error
Message-ID:

This is a multi-part message in MIME format.

------=_NextPart_000_0077_01C1AB0F.BEF6F7C0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

bioperl-l,

Can I get cvs access to bioperl-db 0.1? There's a documentation error
I'd like to correct.

Thanks again,

Brian O.

------=_NextPart_000_0077_01C1AB0F.BEF6F7C0
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0077_01C1AB0F.BEF6F7C0--

From dabbott@fhcrc.org Fri Feb 1 17:50:28 2002
From: dabbott@fhcrc.org (Denise Abbott)
Date: Fri, 1 Feb 2002 09:50:28 -0800 (PST)
Subject: [Bioperl-l] Installation problems: solaris
Message-ID:

When installing on this system:
SunOS terra 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-4
This is perl, v5.6.1 built for sun4-solaris

I receive this error while doing make test:

t/DB...................ok 5/33 Batch access test failed.
Error: -------------------- EXCEPTION --------------------
MSG: WebDBSeqI Error - check query sequences!

STACK Bio::DB::WebDBSeqI::get_seq_stream blib/lib/Bio/DB/WebDBSeqI.pm:296
STACK Bio::DB::NCBIHelper::get_Stream_by_batch blib/lib/Bio/DB/NCBIHelper.pm:205
STACK (eval) t/DB.t:73
STACK toplevel t/DB.t:72
-------------------------------------------

t/DB................ok 38/33Warning: Couldn't connect to Genbank with
Bio::DB::GenPept.pm!

t/DB................ok 46/33Warning: Couldn't connect to Genbank with
Bio::DB::GenBank.pm!

Don't know which tests failed: got 46 ok, expected 33

The make test continues on from there and at the end says that it has
errors and won't install unless forced. I looked a bit at the code there
to see if it was some error that I could see, but I don't have time to
look deeper into it, so I was wondering if anyone else could offer some
hints of what to change or whether or not it really IS a problem with the
sequences make test tries to use.

Thanks.
Denise Abbott

From jason@cgt.mc.duke.edu Fri Feb 1 22:43:32 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Fri, 1 Feb 2002 17:43:32 -0500 (EST)
Subject: [Bioperl-l] Installation problems: solaris
In-Reply-To:
Message-ID:

Denise -

You haven't reported which version of bioperl you are having this trouble
with. It has nothing to do with solaris - but rather with that version of
bioperl and the NCBI Entrez server URL changing.
All releases after 0.7.2 contain the appropriate URL in them - if you need
Bio::DB functionality immediately you should install the developer release
0.9.3 - otherwise await the 1.0 release at the end of the month.

-jason

On Fri, 1 Feb 2002, Denise Abbott wrote:

>
> When installing on this system:
> SunOS terra 5.8 Generic_108528-06 sun4u sparc SUNW,Ultra-4
> This is perl, v5.6.1 built for sun4-solaris
>
> I receive this error while doing make test:
>
> t/DB...................ok 5/33 Batch access test failed.
> Error: -------------------- EXCEPTION --------------------
> MSG: WebDBSeqI Error - check query sequences!
>
> STACK Bio::DB::WebDBSeqI::get_seq_stream blib/lib/Bio/DB/WebDBSeqI.pm:296
> STACK Bio::DB::NCBIHelper::get_Stream_by_batch
> blib/lib/Bio/DB/NCBIHelper.pm:205
> STACK (eval) t/DB.t:73
> STACK toplevel t/DB.t:72
> -------------------------------------------
>
> t/DB................ok 38/33Warning: Couldn't connect to Genbank with
> Bio::DB::GenPept.pm!
>
> t/DB................ok 46/33Warning: Couldn't connect to Genbank with
> Bio::DB::GenBank.pm!
>
> Don't know which tests failed: got 46 ok, expected 33
>
> The make test continues on from there and at the end says that it has
> errors and won't install unless forced. I looked a bit at the code there
> to see if it was some error that I could see, but I don't have time to
> look deeper into it, so I was wondering if anyone else could offer some
> hints of what to change or whether or not it really IS a problem with the
> sequences make test tries to use.
>
> Thanks.
> Denise Abbott > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Fri Feb 1 22:46:15 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Fri, 1 Feb 2002 17:46:15 -0500 (EST) Subject: [Bioperl-l] bioperl-db documentation error In-Reply-To: Message-ID: You should already have that access with your current account - note we have branched the code and don't plan to release off the branch that 0.1 came from (branched to work with bioperl 0.7 annotation objects). So you should only need to make changes to the main trunk of the bioperl-db code. you can check it out in the same fashion as bioperl-live - substituting bioperl-db for the CVS module name. -jason On Fri, 1 Feb 2002, Brian Osborne wrote: > bioperl-l, > > Can I get cvs access to bioperl-db 0.1? There's a documentation error I'd like to correct. > > Thanks again, > > Brian O. > > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From dave.ardell@ebc.uu.se Fri Feb 1 23:58:05 2002 From: dave.ardell@ebc.uu.se (Ardell, David) Date: Sat, 02 Feb 2002 00:58:05 +0100 (CET) Subject: [Bioperl-l] Bio::TreeIO In-Reply-To: <200201300200.g0U20Eit015565@pw600a.bioperl.org> References: <200201300200.g0U20Eit015565@pw600a.bioperl.org> Message-ID: <1012607885.3c5b2b8d84d70@clamator.its.uu.se> Dear Bioperl colleagues I have been planning to introduce myself for about a month, right before I started traveling in the states from sweden. I have been using bioperl 0.7.0 for about a year and have written scripts and modified modules for my own use. First, I want to say thanks for your efforts. bioperl is great and just about essentially useful for what I do (I have a population genetics background and am currently doing microbial genomics and molecular evolution).. it has saved my skin in several accounts. 
Secondly, I have suggestions and scripts to contribute but have been bashful about sending something off to you because I am not caught up with the state of your development efforts. Lastly, I am writing now anyway because I saw something about a Bio::TreeIO mentioned on your list. I have developed Tree and TreeIO modules. I'd like to propose integrating them into bioperl, so long as i haven't reduplicated any of your efforts. The state of the modules is that they are currently suited for representing trees and tree formats, but are agnostic as to what kind of objects are at the tips of the trees. They are not explicitly bioperl and do not inherit the bioperl root object or interface. I am writing up a short publication about the modules as they are ready to be used by others (minus some documentation) and I want them to find a good home. I think what I would like to do is package them free-standing for the phylogenetics community to use. But, so long as you don't already have something like this in the pipeline, I would be open to contributing them to your project, and be interested in helping to plan how they could be integrated nicely inside bioperl with the alignment, sequence, and taxon representations. I would also be interested in writing modules to drive phylogenetics software and eat their output. Good luck with the new release! perl on dave Dave.Ardell@EBC.UU.SE From jason@cgt.mc.duke.edu Sat Feb 2 15:06:11 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Sat, 2 Feb 2002 10:06:11 -0500 (EST) Subject: [Bioperl-l] Bio::TreeIO In-Reply-To: <1012607885.3c5b2b8d84d70@clamator.its.uu.se> Message-ID: On Sat, 2 Feb 2002, Ardell, David wrote: > Dear Bioperl colleagues > > I have been planning to introduce myself for about a month, > right before I started traveling in the states from sweden. I have been > using bioperl 0.7.0 for about a year and have written scripts > and modified modules for my own use. 
> -hmm - would be interested in what modules you have modified - it only helps the toolkit if you contribute suggestions, bugfixes, and improvements back to the development effort. > First, I want to say thanks for your efforts. bioperl is great and > just about essentially useful for what I do (I have a population > genetics background and am currently doing microbial genomics and > molecular evolution).. it has saved my skin in several accounts. > We would definitely like to see population genetics type modules make their way in - Heikki and I have been trying to get Population objects in to handle interfacing with the Genetic Maps and work with my initial Family objects that work with Pedigree drawing and analysis objects. So far we have not put any population objects in place and would love to see some proposals here. Basically a definition of what types of operations you would need to be able to implement the analysis/statistics that you are interested in. > Secondly, I have suggestions and scripts to contribute but have been > bashful about sending something off to you because I am not caught up > with the state of your development efforts. > > Lastly, I am writing now anyway because I saw something about a > Bio::TreeIO mentioned on your list. I have developed Tree and TreeIO > modules. I'd like to propose integrating them into bioperl, so long as > i haven't reduplicated any of your efforts. > I've produced TreeIO and Tree objects in the 0.9.3 dev release which you should check out - these were put in place so that I could potentially convert between different tree formats and create random trees along the lines of Richard Hudson's work (see Bio::Tree::RandomFactory for reference). In TreeIO have only implemented reading of newick/newhampshire format but would like to implement read/write of phyloXML and some other formats. 
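[Editorial aside on the newick/newhampshire format mentioned just above. This is a minimal sketch of what reading that format involves; it is plain Python and not the Bio::TreeIO code. The names Node and parse_newick are hypothetical, chosen only for illustration, and the sketch assumes a single tree with unquoted labels, optional branch lengths, and a terminating ';'.]

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One tree node; a node without children is a leaf (tip)."""
    name: str = ""
    length: Optional[float] = None  # branch length, if the input gave one
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the tip labels in left-to-right order."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names


def _parse_subtree(s: str, pos: int) -> Tuple[Node, int]:
    """Parse one subtree starting at s[pos]; return it and the next position."""
    node = Node()
    if s[pos] == "(":
        pos += 1
        while True:
            child, pos = _parse_subtree(s, pos)
            node.children.append(child)
            if s[pos] == ",":
                pos += 1
            elif s[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    # Optional label; the trailing ';' guarantees this scan terminates.
    start = pos
    while s[pos] not in ",():;":
        pos += 1
    node.name = s[start:pos]
    # Optional ':length' suffix.
    if s[pos] == ":":
        start = pos + 1
        pos = start
        while s[pos] not in ",():;":
            pos += 1
        node.length = float(s[start:pos])
    return node, pos


def parse_newick(text: str) -> Node:
    """Parse one unquoted Newick string such as '(A:0.1,(B,C)x:0.3);'."""
    s = "".join(text.split())  # whitespace is not significant in this sketch
    if not s.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    root, pos = _parse_subtree(s, 0)
    if pos != len(s) - 1:
        raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
    return root


if __name__ == "__main__":
    tree = parse_newick("(A:0.1,(B:0.2,C:0.3)inner:0.4)root;")
    print(tree.name, tree.leaves())
```

[The recursive-descent shape mirrors the nesting of the parentheses, which is why the format is easy to read but needs a separate node type if the tips are to carry richer objects such as alleles.]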
I also have implemented a module called Bio::Tree::Statistics that will calculate Fu and Li's D for a given tree that contains AlleleNodes. Plan on adding some other statistics when I have time. > The state of the modules is that they are currently suited for > representing trees and tree formats, but are agnostic as to what kind > of objects are at the tips of the trees. They are not explicitly > bioperl and do not inherit the bioperl root object or interface. > Look at the Bio::Tree::*Node* objects - I have an allele node used for storing alleles. Would be interested in what other operations you would find interesting. > I am writing up a short publication about the modules as they are > ready to be used by others (minus some documentation) and I want them > to find a good home. > > I think what I would like to do is package them free-standing for the > phylogenetics community to use. But, so long as you don't already have > something like this in the pipeline, I would be open to contributing > them to your project, and be interested in helping to plan how they > could be integrated nicely inside bioperl with the alignment, > sequence, and taxon representations. I would also be interested in > writing modules to drive phylogenetics software and eat their output. > These are something I am interested in as well - providing a nice API that would interact our alignment objects and calculate some molecular evolution statistics. Do you think the Bio::Species object is rich enough to handle all the taxon information we need? It would also be nice to interface with NCBI taxon information via taxonid # - and so that one could instantiate a species object based ncbi taxonid. As for analysis applications - we have an interface to EMBOSS which means we have an indirect interface to PHYLIP. We can Read/Write phylip and nexus formats so plugging into other external analysis should be relatively simple. Would really enjoy having your help on any and all of the above. 
Would be things we can start to build in after the 1.0 release. -jason > Good luck with the new release! > > perl on > dave > > > Dave.Ardell@EBC.UU.SE > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From smathias1@qwest.net Sat Feb 2 18:28:46 2002 From: smathias1@qwest.net (Steve Mathias) Date: Sat, 2 Feb 2002 11:28:46 -0700 Subject: [Bioperl-l] Genetic Analysis Modules Message-ID: <0202021128460A.04108@redbo> Hello Bioperlers, I've noted several postings on the list over the last few months on genetic maps, pedigrees, populations, genotypes, phenotypes, etc. - basically, things I consider to fall into the realm of genetic analysis. So I thought it might be of interest to some in the group to know that I have written a series of modules related to genetic analysis. The distribution of Genetics modules, aka GenPerl, is available in the CPAN directory SLMATH, or you can find it by doing a CPAN search on 'Genetics'. There's some documentation with the distribution outlining the functionality and how to use it. Briefly there are a series of Perl classes implementing a full genetic analysis object model. Then there are a set of modules implementing an API for managing persistence of the data in a relational database (currently only MySQL is supported), and for performing analysis on genotype/phenotype data. The former is basically a wrapper around DBI. The latter includes functionality for writing linkage format files and running linkage analysis software (E.g. Genehunter). There is some other analysis functionality, but it is mostly just stuff that I've played around with at one time or another. The Genetics modules are not structured in a very "bioperl" way. However, if there is interest in including this kind of functionality in bioperl, I'd be willing to volunteer to do this. 
If so, I'd be interested in hearing comments on where/how you all think it
would be best to do this. Obviously, this would be a post 1.0 thing, but let
me know what you think.

-Steve

--
Stephen L. Mathias
smathias1@qwest.net

From xgai@iastate.edu Sun Feb 3 21:09:52 2002
From: xgai@iastate.edu (Xiaowu Gai)
Date: Sun, 03 Feb 2002 15:09:52 -0600
Subject: [Bioperl-l] question regarding entrez
Message-ID: <5.0.1.4.2.20020203145925.00a908b8@xgai.mail.iastate.edu>

Hi Everyone:

I browsed through the documentation of BioPerl and could not seem to find
anything about using Entrez with BioPerl, in other words, it appears
impossible to do an Entrez search in your program written in BioPerl? Why
does BioPerl not support Entrez? Is there a way I can work around it? Any
known program for Net Entrez so I call it in my program? (I downloaded the
Network Entrez from NCBI and played with it a little bit, but it has this
GUI and no command line version).

Thank you all so much.

Xiaowu

From schattner@alum.mit.edu Mon Feb 4 11:29:09 2002
From: schattner@alum.mit.edu (Peter Schattner)
Date: Mon, 04 Feb 2002 03:29:09 -0800
Subject: [Bioperl-l] Jason and Ewan's TODO list
References:
Message-ID: <3C5E7084.C68D2305@alum.mit.edu>

Ewan Birney wrote:

> We are enjoying Tucson over here, and mainly to use the mailing list as a
> filing system here is our TODO list.
>
> (a) top level docs
>
> Peter + Brian (any dates on this?)

I plan to have bptutorial.pl (both the text and the tutorial script) updated
to include 1.0 modifications / additions by 2/15.

Peter

From schattner@alum.mit.edu Mon Feb 4 11:31:46 2002
From: schattner@alum.mit.edu (Peter Schattner)
Date: Mon, 04 Feb 2002 03:31:46 -0800
Subject: [Bioperl-l] re: blast parsing
References: <20020130143225.A21651@psychro>
Message-ID: <3C5E7122.58CE710E@alum.mit.edu>

Neil Saunders wrote:

> Also, what are plans for SeqStats.pm in the next release? I like that
> module, but it has limitations (e.g. codon_count() only outputs codons
> with a non zero count).

I have no plans to increase functionality of SeqStats.pm (I'm simply too
busy). But if you'd like to do that, please do.

Peter Schattner

From jason@cgt.mc.duke.edu Mon Feb 4 14:31:57 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Mon, 4 Feb 2002 09:31:57 -0500 (EST)
Subject: [Bioperl-l] (no subject) (fwd)
Message-ID:

That module is not part of the bioperl distribution - you may want to contact
the author of the module for more information.

http://search.cpan.org/search?mode=module&query=WWW%3A%3ASearch%3A%3APubMed

-jason

--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

---------- Forwarded message ----------
Date: 4 Feb 2002 06:22:09 -0000
From: gayatri ganesh
To: Jason Stajich
Subject: [Bioperl-l] (no subject)

Sir,

What is the purpose of the WWW::Search::PubMed module? What does it retrieve?
Are there any requirements to use this package apart from the associated
packages?

Thanking you,
G.Gayatri
C.R.Sathya Krishna

From lstein@cshl.org Mon Feb 4 15:43:12 2002
From: lstein@cshl.org (Lincoln Stein)
Date: Mon, 4 Feb 2002 10:43:12 -0500
Subject: [Bioperl-l] Bio::Graphics
In-Reply-To:
References:
Message-ID: <02020410431202.30045@fontina>

Anyone mind if I add Bio::Graphics (about 6000 lines of code) to the main
Bioperl branch? It's a library that renders Bio::SeqI-compliant objects onto
a canvas to create static PNG or JPEG images.

It was part of the generic genome browser, but since it's library code, it
probably belongs in BioPerl rather than at the application level.

Lincoln

--
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================

From heikki@ebi.ac.uk Mon Feb 4 16:49:39 2002
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Mon, 04 Feb 2002 16:49:39 +0000
Subject: [Bioperl-l] Bio::Graphics
References: <02020410431202.30045@fontina>
Message-ID: <3C5EBBA3.104B58CC@ebi.ac.uk>

Lincoln Stein wrote:
>
> Anyone mind if I add Bio::Graphics (about 6000 lines of code) to the main
> Bioperl branch? It's a library that renders Bio::SeqI-compliant objects onto
> a canvas to create static PNG or JPEG images.

Following the policy of keeping dependencies to a minimum, shouldn't it go
into bioperl-gui?

Is this still a good idea? I am beginning to have doubts. Last summer we
thought basic perl-only modules should be in bioperl-live. Modules with extra
dependencies should go into separate repositories. So far we have e.g.

bioperl-db   for RDM dependencies (mySQL, Postgres)
bioperl-gui  for TK GUI modules, cared for by Mark Wilkinson
bioperl-ext  for C extensions

I'd hate to see this division hinder the addition of new code when
it was meant to make installation easier.

-Heikki

> It was part of the generic genome browser, but since it's library code, it
> probably belongs in BioPerl rather than at the application level.
>
> Lincoln
>
> --
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                                   Cold Spring Harbor, NY
> ========================================================================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

--
     ______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
    ___ _/_/_/_/_/________________________________________________________

From michal@orfeus.bioinfo.pl Mon Feb 4 20:31:15 2002
From: michal@orfeus.bioinfo.pl (Michal Kurowski)
Date: Mon, 4 Feb 2002 21:31:15 +0100
Subject: [Bioperl-l] SeqIO on two files
Message-ID: <20020204213115.A8793@orfeus>

Hi,

After some struggle with perl IO modules I'm still not quite sure how to
make Bio::SeqIO work with two (fasta) files. What I am trying to do is run
'bl2seq' on pairs of sequences from two files (matching on deflines), but
without success.

Any help highly appreciated,

--
Michal Kurowski

From Jonathan_Epstein@nih.gov Mon Feb 4 22:00:43 2002
From: Jonathan_Epstein@nih.gov (Jonathan Epstein)
Date: Mon, 04 Feb 2002 17:00:43 -0500
Subject: [Bioperl-l] question regarding entrez
In-Reply-To: <5.0.1.4.2.20020203145925.00a908b8@xgai.mail.iastate.edu>
Message-ID: <4.2.0.58.20020204164825.00a542f8@helix.nih.gov>

I'm not aware of anything in BioPerl which uses Entrez per se.

Within the NCBI toolkit, you'll find a program called entrcmd.c, which
compiles into a network client called Nentrcmd. I wrote this program when I
was at NCBI, and it was originally used as the engine for the first WWW
Entrez server. Unfortunately, this network client/server interface has grown
less reliable over the years, especially for large queries.
My understanding is that there is a newer Entrez API, but it's not clear to me
whether it's been deployed yet.

E.g., the command
   Nentrcmd -d n -e 'human[ORGN]' -p su >/tmp/human
should dump all the human GIs into the output file, but in practice this
command fails to produce any output, probably because of server problems
and/or the dataset size.

For support with this program, you should contact toolbox@ncbi.nlm.nih.gov.
If you strike out with them I can try to help you, but note that I no longer
have any control over the network services.

It strikes me that it would be cool if Catherine L. and her group were to
create a GUI interface for this program using their automatic GUI-creator
(although note that there is a builtin GUI of sorts already for certain
platforms).

Here's the short online help, followed by the full help:

mgchd1 2% Nentrcmd -

Entrez command-line $Revision: 6.3 $

   arguments:

  -d  Initial database [String]  Optional
    default = m
  -e  Boolean expression [String]  Optional
  -u  Comma-delimited list of UIDs [String]  Optional
  -p  Program of commands [String]
  -s  Display status report [T/F]  Optional
    default = F
  -w  Produce WWW/HTML formatted output (recommended value is /htbin) [String]  Optional
  -h  Detailed help [T/F]  Optional
    default = F
  -f  For WWW output, use Forms [T/F]  Optional
    default = F
  -c  'Check' WWW output Forms [T/F]  Optional
    default = F
  -x  Name of export file for named UID list [String]  Optional
  -i  Comma-delimited list of files to import for named UID list [String]  Optional
  -t  Produce a list of terms (term) [String]  Optional
  -l  Taxonomy lookup [String]  Optional
  -n  On-the-fly neighboring [File In]  Optional
  -o  Output file [File Out]
    default = stdout
  -g  Use WWW-style encoding for special input characters [T/F]  Optional
    default = T
  -r  Get sequences from ID Repository [T/F]  Optional
    default = F
  -y  Complexity (1=bioseq only, 2=bioseq set, 3=nuc-prot set) [Integer]  Optional
    default = 3

---------------------

Entrcmd is a non-interactive command-line interface which allows a user to
perform a series of neighboring and output operations, based upon an initial
set of UIDs or a boolean expression which describes a set of UIDs.
Alternatively, it can be used to display an alphabetically sorted list of
terms near an initial term.

Type 'entrcmd' with no arguments for a brief summary of command-line options.

EXPRESSION SYNTAX (-e option)

The following grammar is based upon Backus-Naur form. Braces ({}) are used to
specify optional fields, and ellipses (...) represent an arbitrary number of
repetitions. In most Backus-Naur forms, the vertical bar (|) and brackets ([])
are used as meta-symbols. However, in the following grammar, the vertical bar
and brackets are terminal symbols, and three stacked vertical bars are used to
represent alternation.

expression ::= diff { - diff ... }
diff       ::= term { | term ... }
term       ::= factor { & factor ... }
                         |
factor     ::= qualtoken | ( expression )
                         |
qualtoken  ::= token { [ fld { ,S } ] }

token is a string of characters which either contains no special characters,
or which is delimited by double-quotes ("). Double-quote marks and
backslashes (\) which appear within a quoted token must be quoted by an
additional backslash.

fld is an appropriate string describing a field. The possible values are
described in the following table. For all databases, an asterisk(*) is a
possible value for fld, signifying the union of all possible fields for that
database. '*' is also the default field, if no field qualifier is specified.

| fld| Databases and descriptions
+----+--------------------------------------------------------------------
|WORD| For MEDLINE, "Abstract or Title"; for Sequences, "Text Terms"
|MESH| MEDLINE only, "MeSH term"
|AUTH| For all databases, "Author Name"
|JOUR| For all databases, "Journal Title"
|GENE| For all databases, "Gene Name"
|KYWD| For MEDLINE, "Substance", for Sequences "Keyword"
|ECNO| For MEDLINE and protein, "E.C. number"
|ORGN| For all databases, "Organism"
|ACCN| For Sequence databases, "Accession"
|PROT| For protein, "Protein Name"

The presence of ",S" after a field specifier implies the same semantics as
"special" in Entrez. Entrez "total" semantics are the default.

PROGRAM OF COMMANDS (-p option)

For the "-e" and "-u" options, the program of commands consists of a sequence
of neighboring operations alternated with optional output commands. All
output commands, except the first, must be preceded by a period (.), and all
neighboring commands must be preceded by a comma (,). The output commands
are:

no None (default)           sg Sequence GenBank/GenPept flat file format
ma MEDLINE ASN.1 format     sa Sequence ASN.1 format
md MEDLINE docsums          sd Sequence docsums
ml MEDLARS format           sf Sequence FASTA format
mr MEDLINE report format    sr Sequence report format
mu MEDLINE UIDs             su Sequence UIDs
                            si Sequence IDs

Each output command may be followed by an optional count indicating how many
articles to display. The default is to display all the articles.

If the "-x" command line option appears ("export to a saved UID list"), then
the first "mu" or "su" command results in those UIDs being written to that
"saved UID list" file, rather than being written to the standard output.

Neighboring commands indicate the database to neighbor "to", and consist of
the first letter of each of the possible databases (medline, protein,
nucleotide), followed by an optional count of how many of the current set of
articles should be included in the neighboring operation.

Example:
  Find the articles written by "Kay LE", but not by "Forman-Kay JD". Find
  their MEDLINE neighbors. Print document summaries for all of these
  neighbors. Of these neighbors, neighbor the first 5 entries to the protein
  database. Print up to 10 of these sequences in Sequence Report format.
    entrcmd -e '"Kay LE" [AUTH] - "Forman-Kay JD" [AUTH]' -p ,m.md,p5.sr10

If the "-t" option is used, then the program of commands is different from
what is described above. Rather, it consists of a seven character string,
optionally followed by the number of terms which should be displayed. The
default number of terms is 40. The string is of the form '123FLDD', where 1,
2, and 3 are as follows, and FLDD is one of the field specifications
described above (AUTH, etc.).

1 - one of 't', 's', or 'o', where 't' means that the total term counts
    should be displayed after the term, 's' means that the special and total
    term counts should be displayed after the term, and 'o' means that only
    the term itself should be displayed
2 - one of 'b', 'c', 'e', or an integer from 3 to 9, where:
    'b' - display terms beginning with the specified term
    'c' - "center" terms; i.e., display half the terms before the specified
          term, and half the terms after the specified term
    'e' - display terms ending with the specified term
    k   - an integer from 3 to 9, indicating that (2/k)ths of the terms
          should be alphabetically before the specified term. Note that '4'
          is the same as 'c'. The value '9' is recommended for scrolled
          displays.
3 - One of 'i' or 'n', indicating for the 'b' and 'e' options above whether
    the specified term is to be included in the output, where 'i' means
    inclusive, and 'n' means non-inclusive. This value is ignored for other
    values of the previous character, but must be present as a place-holder.

[ WARNING: SOME OF THESE TERM SPECIFICATIONS OPTIONS (COMBINATIONS OF 1, 2,
  AND 3 ABOVE) ARE CURRENTLY UNIMPLEMENTED ]

WORLD WIDE WEB STYLE OUTPUT (-w option)

The entrcmd program can also generate output which is appropriate for display
in an HTML document, to be "served" by a WWW server. In particular, some
output text contains HTML hypertext links to other data, as well as HTML
formatting information.
The parameter to the -w option is the directory prefix for the linked
hypertext items; "/htbin" is recommended.

If the "-w" option is selected, then the "-f" option may also be selected.
This indicates that the HTML output should be of a form which is appropriate
for a HTML "FORM". This output can only be processed by advanced WWW clients,
but potentially provides a nicer interface, where each document summary has
an associated checkbox, resulting in a display which is similar to the Entrez
CD-ROM application.

The "-c" option, if used in conjunction with "-f", indicates that these
checkboxes should be "pre-checked", i.e., selected. This potentially provides
the equivalent of the Entrez "select all" operation for neighboring.

Hope this helps,

-Jonathan

At 04:09 PM 2/3/2002 , Xiaowu Gai wrote:
>I browsed through the documentation of BioPerl and could not seem to find anything about using Entrez with BioPerl, in other words, it appears impossible to do an Entrez search in your program written in BioPerl? Why does BioPerl not support Entrez? Is there a way I can work around it? Any known program for Net Entrez so I call it in my program? (I downloaded the Network Entrez from NCBI and played with it a little bit, but it has this GUI and no command line version).

Jonathan Epstein                                Jonathan_Epstein@nih.gov
Head, Unit on Biologic Computation              (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A47
Nat. Inst. of Child Health & Human Development  31 Center Drive
National Institutes of Health                   Bethesda, MD 20892

From lstein@cshl.org Mon Feb 4 23:03:06 2002
From: lstein@cshl.org (Lincoln Stein)
Date: Mon, 4 Feb 2002 18:03:06 -0500
Subject: [Bioperl-l] Bio::Graphics
In-Reply-To: <3C5EBBA3.104B58CC@ebi.ac.uk>
References: <02020410431202.30045@fontina> <3C5EBBA3.104B58CC@ebi.ac.uk>
Message-ID: <0202041803062U.30150@fontina>

Actually Bio::Graphics does introduce a dependency on the GD module, which is
usually found on Linux distributions, but not universal.
I hadn't thought of that. I could add Bio::Graphics to the existing bioperl-gui package, but I'm not so keen to do that since it is hidden on the FTP site. From the bioperl.org main page takes two clicks and a cut-and-paste to get to the bioperl-gui distribution. Yargs. Lincoln On Monday 04 February 2002 11:49, Heikki Lehvaslaiho wrote: > Lincoln Stein wrote: > > Anyone mind if I add Bio::Graphics (about 6000 lines of code) to the main > > Bioperl branch? It's a library that renders Bio::SeqI-compliant objects > > onto a canvas to create static PNG or JPEG images. > > Following the policy of keeping dependiencies to minimum, shouldn't it go > into bioperl-gui? > > It this still a good idea? I am beginning to have doubts. Last summer we > thought basic perl only modules are in bioperl-live. Modules with extra > dependencies should go into reparate repositories. So far we have e.g. > > bioperl-db for RDM dependencies (mySQL, Postgres) > bioperl-gui for TK GUI modules, cared for by Mark Wilkinson > bioperl-ext for C extensions > > > I'd hate to see this devision to hinder the addition of the new code when > it was meant to make installation easier. > > > -Heikki > > > It was part of the generic genome browser, but since it's library code, > > it probably belongs in BioPerl rather than at the application level. > > > > Lincoln > > > > -- > > ======================================================================== > > Lincoln D. Stein Cold Spring Harbor Laboratory > > lstein@cshl.org Cold Spring Harbor, NY > > ======================================================================== > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From lstein@cshl.org Mon Feb 4 23:15:34 2002 From: lstein@cshl.org (Lincoln Stein) Date: Mon, 4 Feb 2002 18:15:34 -0500 Subject: [Bioperl-l] Boulder In-Reply-To: References: Message-ID: <0202041815342W.30150@fontina> I've just released version 1.27 of Boulder, which robustifies NCBI blast parsing (and eliminates some warnings) and restores full functionality for NCBI Entrez fetching. You'll find it on CPAN, or at stein.cshl.org/software/boulder/ I don't feel like supporting Boulder for much longer, so I encourage people to migrate to the equivalent Bioperl tools! Lincoln -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From jason@cgt.mc.duke.edu Mon Feb 4 23:44:05 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Mon, 4 Feb 2002 18:44:05 -0500 (EST) Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <0202041803062U.30150@fontina> Message-ID: On Mon, 4 Feb 2002, Lincoln Stein wrote: > Actually Bio::Graphics does introduce a dependency on the GD module, which is > usually found on Linux distributions, but not universal. I hadn't thought of > that. > > I could add Bio::Graphics to the existing bioperl-gui package, but I'm > not so keen to do that since it is hidden on the FTP site. From the > bioperl.org main page takes two clicks and a cut-and-paste to get to > the bioperl-gui distribution. Yargs. > I want to use a CVS trick to combine elements from different CVS modules into a single checkout - I would also like us to do a better job with the whole distribution set of packages. 
We should do a better job providing links to all the sets of packages that we have regardless of whether or not Bio::Graphics makes it into bioperl-live or bioperl-gui. Its on my list but happy for our webteam or other volunteers to help out here. In the meantime I'd prefer Bio::Graphics to live in bioperl-gui but only if we can make it really easy for people to get it. In bioperl-gui I also checked in some modules called MapView contributed by a third party that have not been bioperl-alized (named init parameters, regression tests, Bio::Root::Root inheritance and exception throwing) -j > Lincoln > > On Monday 04 February 2002 11:49, Heikki Lehvaslaiho wrote: > > Lincoln Stein wrote: > > > Anyone mind if I add Bio::Graphics (about 6000 lines of code) to the main > > > Bioperl branch? It's a library that renders Bio::SeqI-compliant objects > > > onto a canvas to create static PNG or JPEG images. > > > > Following the policy of keeping dependiencies to minimum, shouldn't it go > > into bioperl-gui? > > > > It this still a good idea? I am beginning to have doubts. Last summer we > > thought basic perl only modules are in bioperl-live. Modules with extra > > dependencies should go into reparate repositories. So far we have e.g. > > > > bioperl-db for RDM dependencies (mySQL, Postgres) > > bioperl-gui for TK GUI modules, cared for by Mark Wilkinson > > bioperl-ext for C extensions > > > > > > I'd hate to see this devision to hinder the addition of the new code when > > it was meant to make installation easier. > > > > > > -Heikki > > > > > It was part of the generic genome browser, but since it's library code, > > > it probably belongs in BioPerl rather than at the application level. > > > > > > Lincoln > > > > > > -- > > > ======================================================================== > > > Lincoln D. 
Stein Cold Spring Harbor Laboratory > > > lstein@cshl.org Cold Spring Harbor, NY > > > ======================================================================== > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From birney@ebi.ac.uk Tue Feb 5 09:32:22 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 5 Feb 2002 09:32:22 +0000 (GMT) Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <0202041803062U.30150@fontina> Message-ID: On Mon, 4 Feb 2002, Lincoln Stein wrote: > Actually Bio::Graphics does introduce a dependency on the GD module, which is > usually found on Linux distributions, but not universal. I hadn't thought of > that. > > I could add Bio::Graphics to the existing bioperl-gui package, but I'm not so > keen to do that since it is hidden on the FTP site. From the bioperl.org > main page takes two clicks and a cut-and-paste to get to the bioperl-gui > distribution. Yargs. It is a debatable point. The GD dependency is not as great as the TK dependency that the rest of -gui has, and it is more "server side" (an argument for putting it into bioperl-live). But... splitting things up is good, otherwise we just have a behemoth of a system ---- though it is relatively well structured directory-wise. Of course, we let Lincoln check in DB::GFF because we like him and wanted him to contribute so.... Oy vey. I don't know. I vote marginally for putting it into bioperl-live From marino@tofu.tamu.edu Tue Feb 5 13:04:21 2002 From: marino@tofu.tamu.edu (Leonardo Marino-Ramirez) Date: Tue, 5 Feb 2002 07:04:21 -0600 (CST) Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <0202041803062U.30150@fontina> Message-ID: I agree with Ewan, I think that the place for Bio::Graphics is bioperl-live.
On Mon, 4 Feb 2002, Lincoln Stein wrote: > Actually Bio::Graphics does introduce a dependency on the GD module, which is > usually found on Linux distributions, but not universal. I hadn't thought of > that. > > I could add Bio::Graphics to the existing bioperl-gui package, but I'm not so > keen to do that since it is hidden on the FTP site. From the bioperl.org > main page takes two clicks and a cut-and-paste to get to the bioperl-gui > distribution. Yargs. > > Lincoln > > On Monday 04 February 2002 11:49, Heikki Lehvaslaiho wrote: > > Lincoln Stein wrote: > > > Anyone mind if I add Bio::Graphics (about 6000 lines of code) to the main > > > Bioperl branch? It's a library that renders Bio::SeqI-compliant objects > > > onto a canvas to create static PNG or JPEG images. > > > > Following the policy of keeping dependiencies to minimum, shouldn't it go > > into bioperl-gui? > > > > It this still a good idea? I am beginning to have doubts. Last summer we > > thought basic perl only modules are in bioperl-live. Modules with extra > > dependencies should go into reparate repositories. So far we have e.g. > > > > bioperl-db for RDM dependencies (mySQL, Postgres) > > bioperl-gui for TK GUI modules, cared for by Mark Wilkinson > > bioperl-ext for C extensions > > > > > > I'd hate to see this devision to hinder the addition of the new code when > > it was meant to make installation easier. > > > > > > -Heikki > > > > > It was part of the generic genome browser, but since it's library code, > > > it probably belongs in BioPerl rather than at the application level. > > > > > > Lincoln > > > > > > -- > > > ======================================================================== > > > Lincoln D. 
Stein Cold Spring Harbor Laboratory > > > lstein@cshl.org Cold Spring Harbor, NY > > > ======================================================================== > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > -- ___ _/ ________________________________________________________________ _/ _/ _/ _/_/_/ Leonardo Marino-Ramirez lmarino@tamu.edu _/ _/_/ _/_/ _/ Biochemistry Department, Texas A&M University _/_/_/_/ _/ _/_/_/ 2128 TAMU, College Station, TX 77843-2128, USA _/ _/ _/ Voice: (979) 862-4055 Fax: (979) 845-9274 ___ _/ _/ _/ _________________________________________________ From lstein@cshl.org Tue Feb 5 13:12:52 2002 From: lstein@cshl.org (Lincoln Stein) Date: Tue, 5 Feb 2002 08:12:52 -0500 Subject: [Bioperl-l] Bio::Graphics In-Reply-To: References: <0202041803062U.30150@fontina> Message-ID: <15455.55892.944605.348157@pesto.lsjs.org> I appreciate your letting me check in DB::GFF. I suppose it sticks out a bit from the rest of Bioperl, but I am changing the documentation to expose the Bioperl API and mute the Ace API aspects of the thing. Bio::Graphics shouldn't stick out since it uses Bio::SeqI slavishly. On the subject of the BioSQL database, Chris Mungall's Gadfly API happened to be close enough to DB::GFF so that the Berkeley group was able to get the generic genome browser running on top of it pretty quickly. Since BioSQL is now morphing into a more general purpose annotation database, I would like to explain the parts of the DB::GFF API that are extensions to Bio::SeqIO that the genome browser depends on. If you happen to borrow these API components for BioSQL, then the browser will run on top of BioSQL from day one. 
$db = Bio::DB::GFF->new(...);                   # bioperl compliant
$seq = $db->get_Seq_by_id($id);                 # bioperl compliant
$seq = $db->get_Seq_by_acc($id);                # bioperl compliant
$seq = $db->get_Seq_by_XXX($id);                # bioperl compliant
$stream = $db->get_Stream_by_id([$id,...]);     # "bioperl compliant"
$stream = $db->get_Stream_by_batch([$id,...]);  # "bioperl compliant"
$seq = $stream->next_seq;                       # bioperl compliant

# NOTE: actually, neither of the get_Stream_by_XXX calls is part of
# RandomAccessI; DB::SwissProt uses the batch form and DB::GenBank
# uses the by_id() form

# My extensions

# Title: segment()
# Construct a Bio::SeqI object based on the name of a landmark,
# and optionally the start and/or end of the segment to retrieve.
# Whatever needs to be done to span the assembly happens at this
# point. Think of this as a lightweight make_virtual_contig()
# call.
$segment = $db->segment(-name=>$name,-start=>$start,-end=>$end);

# Title: absolute()
# Toggle on and off relative coordinate addressing. Segments
# start out as relative to the landmark named in the segment()
# call; passing a true flag to absolute will force coordinates
# derived from the segment to be absolute to the highest container.
# (if you don't want to implement relative coordinate addressing,
# then just make everything absolute by default).
$segment->absolute([$flag])

# Title: features()
# This is also called all_SeqFeatures() for Bioperl compatibility
# but it has different calling conventions. It returns all
# features that overlap the segment, optionally filtering
# them by their type. The type is a string, and can be
# a DAML/OIL path once Mike Ashburner, Suzi and I publish
# the DAS feature ontology.
@seq = $segment->features(-type=>['type1','type2','type3'],
                          @other_options_you_dont_care_about);

# Title: get_feature_stream()
# As above, but fetches a sequence stream. This is also called
# get_seq_stream() for Bio::SeqIO compatibility, but it takes
# different arguments, so I thought it best to rename it.
$stream = $segment->get_feature_stream(-type=>['type1','type2','type3'],
                                       @other_options_you_dont_care_about);

# Title: contained_features(), contained_in(),
# get_contained_features_stream(), get_contained_in_stream()
# These retrieve features based on other types of relative location
# information

# Title: $db->features(), $db->get_feature_stream()...
# You can call the features() family directly on the
# database object, to suck out all its features...

# Title: text_search()
# No database inspired by NCBI would be complete without a full-text search...
$stream = $db->get_text_search_stream('text to search')

# Title: attribute searches
# Simple attribute search. Keys of the hash are the attribute
# names, values are desired values to match. All matches are
# exact strings, and multiple attributes are ANDed together.
# (I'm not particularly enthusiastic about this; it's a hack)
$stream = $db->get_feature_stream(-attributes=> \%attribute_hash)

# Pass thru a SQL query to the database. Must be done very
# carefully in order to reconstruct the objects properly...
$stream = $db->get_feature_stream(-query=>'SQL QUERY')

Lincoln

Ewan Birney writes: > On Mon, 4 Feb 2002, Lincoln Stein wrote: > > > Actually Bio::Graphics does introduce a dependency on the GD module, which is > > usually found on Linux distributions, but not universal. I hadn't thought of > > that. > > > > I could add Bio::Graphics to the existing bioperl-gui package, but I'm not so > > keen to do that since it is hidden on the FTP site. From the bioperl.org > > main page takes two clicks and a cut-and-paste to get to the bioperl-gui > > distribution. Yargs. > > > It is an debate point. GD dependancy is not as great as the TK dependency > the rest of -gui has, and is more "server side" (argument for putting it > into bioperl-live). But... splitting things up is good otherwise we just > have a behmouth of a system ---- but it is relatively well structured > directory-wise.
> > > Of course, we let Lincoln check in DB::GFF because we like him and wanted > him to contribute so.... > > > Oh Vey. I don't know. > > > I vote marginally for putting it into bioperl-live > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY Positions available at my lab: see http://stein.cshl.org/#hire ======================================================================== From birney@ebi.ac.uk Tue Feb 5 14:09:42 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 5 Feb 2002 14:09:42 +0000 (GMT) Subject: [Bioperl-l] Bio::Graphics In-Reply-To: <15455.55892.944605.348157@pesto.lsjs.org> Message-ID: What about writing a Bio::DAS::DataSourceI or something similar lincoln which encapsulates that, and then - yes - I think it would be great to make BioSQL inheriet from that.... ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From mwilkinson@gene.pbi.nrc.ca Tue Feb 5 15:10:56 2002 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Tue, 05 Feb 2002 09:10:56 -0600 Subject: [Bioperl-l] Bio::Graphics References: <02020410431202.30045@fontina> <3C5EBBA3.104B58CC@ebi.ac.uk> <0202041803062U.30150@fontina> Message-ID: <3C5FF600.F66D3222@gene.pbi.nrc.ca> Lincoln Stein wrote: > I could add Bio::Graphics to the existing bioperl-gui package, but I'm not so > keen to do that since it is hidden on the FTP site. From the bioperl.org > main page takes two clicks and a cut-and-paste to get to the bioperl-gui > distribution. Yargs. hear hear! :-) I'd love it if we could make the bioperl-gui package a bit more... "visible"... 
M -- -------------------------------- "Speed is subsittute fo accurancy." ________________________________ Dr. Mark Wilkinson Bioinformatics Group National Research Council of Canada Plant Biotechnology Institute 110 Gymnasium Place Saskatoon, SK Canada From jason@cgt.mc.duke.edu Tue Feb 5 17:37:52 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 5 Feb 2002 12:37:52 -0500 (EST) Subject: [Bioperl-l] internal code review of Bio::Search Message-ID: [frac_identity question] When people ask what fraction of an HSP is identical or conserved, do they typically mean the fraction within the HSP or relative to the whole sequence? I assume it is relative to just the portion participating in the HSP, but I wanted to make sure... Does anyone else have code that calculates this so I can validate my implementation? I have been computing it against the entire length of the hit/query rather than the smaller portion in the HSP, which I believe is wrong, and I am currently fixing that. [length question] In the Bio::Search::HSP objects I think there are some confusing parts with respect to length - any help with nomenclature or docs would be appreciated.
(part of Hilmar's original SimilarityPair/FeaturePair which I have kept around)
$hsp->query->seqlength - length of the entire query piece
$hsp->query->length - length of the query participating in the HSP
"" ditto s/query/hit/

(my added method to get at the HSP length)
$hsp->hsp_length - length of the HSP (which includes gaps added from query and hit)

(Steve's HSPI length methods - see Bio::Search::HSP::HSPI for docs)
$hsp->length('total') - length of the HSP
$hsp->length('query') - length of the query in the HSP
$hsp->length('hit') - length of the hit in the HSP

-j -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Tue Feb 5 23:27:53 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 5 Feb 2002 18:27:53 -0500 (EST) Subject: [Bioperl-l] [Bioperl-guts-l] Notification: incoming/1077 (fwd) Message-ID: Lana - Any reason you can't use fasta format as input to clustal? We have basically stopped supporting PIR format because none of our developers use it, and when we sent out a request for interest on the list several months ago, no one responded that they were using it. If there are indeed users out there we would be happy to add the appropriate code if we can get good use cases and example files that do not work with the current code. In any event I would certainly upgrade to bioperl 0.7.2 in the near future, and to the future 1.0 release, as I suspect the PIR handling is not so great in that 0.7.0 release.
-jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu ---------- Forwarded message ---------- Date: Tue, 5 Feb 2002 18:02:18 -0500 From: bioperl-bugs@bioperl.org To: bioperl-guts-l@bioperl.org Subject: [Bioperl-guts-l] Notification: incoming/1077 JitterBug notification new message incoming/1077 Message summary for PR#1077 From: Lana Schaffer Subject: file formats Date: Tue, 05 Feb 2002 15:08:37 -0800 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From lschaffe@agouron.com Tue Feb 5 18:02:17 2002 Received: from tbone.agouron.com (tbone.agouron.com [198.182.177.3]) by pw600a.bioperl.org (8.12.2/8.12.2) with SMTP id g15N2FPX023149 for ; Tue, 5 Feb 2002 18:02:16 -0500 Received: from relay.agouron.com by tbone.agouron.com via smtpd (for pw600a.bioperl.org [199.93.107.70]) with SMTP; 5 Feb 2002 23:08:39 UT Received: from agouron.com (rudy.agouron.com [10.0.76.106]) by hermes.agouron.com (8.10.2+Sun/8.9.3) with ESMTP id g15N8bQ10069 for ; Tue, 5 Feb 2002 15:08:37 -0800 (PST) Sender: lschaffe@agouron.com Message-ID: <3C6065F5.93DD6611@agouron.com> Date: Tue, 05 Feb 2002 15:08:37 -0800 From: Lana Schaffer Organization: Agouron X-Mailer: Mozilla 4.77C-SGI [en] (X11; I; IRIX64 6.5 IP30) X-Accept-Language: en MIME-Version: 1.0 To: bioperl-bugs@bioperl.org Subject: file formats Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I am starting to use Bioperl and presently have Bioperl 0.7 and unix OS IRIX64 6.5. I have used a small script to change the protein sequence format from fasta to PIR. 
The output perl file from Bioperl is: ------------------------------------------------------------------- >P1;CDK2 >P1;CDK2 MENFQKVEKI GEGTYGVVYK ARNKLTGEVV ALKKIRLDTE TEGVPSTAIR EISLLKELNH PNIVKLLDVI HTENKLYLVF EFLHQDLKKF MDASALTGIP LPLIKSYLFQ LLQGLAFCHS HRVLHRDLKP QNLLINTEGA IKLADFGLAR AFGVPVRTYT HEVVTLWYRA PEILLGCKYY STAVDIWSLG CIFAEMVTRR ALFPGDSEID QLFRIFRTLG TPDEVVWPGV TSMPDYKPSF PKWARQDFSK VVPPLDEDGR SLLSQMLHYD PNKRISAKAA LAHPFFQDVT KPVPHLRL ------------------------------------------------------------------- and the output/input perl file for clustalw is: ------------------------------------------------------------------- >P1;CDK2 MENFQKVEKIGEGTYGVVYKARNKLTGEVVALKKIRLDTETEGVPSTAIREISLLKELNH PNIVKLLDVIHTENKLYLVFEFLHQDLKKFMDASALTGIPLPLIKSYLFQLLQGLAFCHS HRVLHRDLKPQNLLINTEGAIKLADFGLARAFGVPVRTYTHEVVTLWYRAPEILLGCKYY STAVDIWSLGCIFAEMVTRRALFPGDSEIDQLFRIFRTLGTPDEVVWPGVTSMPDYKPSF PKWARQDFSKVVPPLDEDGRSLLSQMLHYDPNKRISAKAALAHPFFQDVTKPVPHLRL * ------------------------------------------------------------------- The clustalw program will produce an error when an import is attempted with the Bioperl format. The "*" is missing and the ">P1;CDK2" is repeated twice. I don't know if this format is corrected in more updated releases of Bioperl. However, this format needs to be compatible with other programs. Please update me on this problem. 
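For reference, the layout in the second listing can be sketched in a few lines of plain Perl. This is only a minimal sketch, not Bioperl's own PIR writer: format_pir is a hypothetical helper, and the 60-column wrapping, the empty description line, and the "*" on its own line are assumptions read off the listing above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): build one protein record with
# a single ">P1;ID" line, a description line, the sequence wrapped at 60
# columns (an assumption), and a closing "*" terminator.
sub format_pir {
    my ($id, $desc, $seq) = @_;
    $seq =~ s/\s+//g;                        # drop spaces from blocked input
    my @rows = $seq =~ /(.{1,60})/g;         # split into rows of up to 60 residues
    return join("\n", ">P1;$id", $desc, @rows, "*") . "\n";
}

# Example: the first 30 residues of the CDK2 sequence, as blocked in the
# Bioperl output above. The description line is left empty here.
print format_pir('CDK2', '', 'MENFQKVEKI GEGTYGVVYK ARNKLTGEVV');
```

Running the Bioperl output through something like this would give one ID line and a terminating "*", which are the two differences noted above.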
Thanks, -- Lana Schaffer Computational Chemistry schaffer@pfizer.com Pfizer Global R&D-LJ phone (858) 622-3002 10777 Science Center Drive fax (858) 678-8244 San Diego, CA 92122-1111 _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-guts-l From vijayk@gene.ccmbindia.org Wed Feb 6 19:25:48 2002 From: vijayk@gene.ccmbindia.org (Vijay Kaza) Date: Wed, 6 Feb 2002 11:25:48 -0800 (PST) Subject: [Bioperl-l] Request Message-ID: Dear members, I am in the process of learning perl and I realise that it is going to take a while for me to develop some decent skills in it. Meanwhile, I immediately need a couple of perl scripts that may be in use (actually i am sure somebody would have done it already) to run multiple sequence files through the following Gene identification programs:- 1. Genscan 2. Glimmer 3. Morgan 4. GeneSplicer I'd be extremely thankful to anybody out there who can help me in this matter as I am running out of time. Thanking you, Vijay Centre for Cellular and Molecular Biology, Hyderabad, India. vijayk@gene.ccmbindia.org From yjiang@ucsd.edu Wed Feb 6 18:11:26 2002 From: yjiang@ucsd.edu (Yong Jiang) Date: Wed, 06 Feb 2002 10:11:26 -0800 Subject: [Bioperl-l] help for installation bioperl Reahat linux Message-ID: <3C6171CD.FCF1C982@ucsd.edu> Hello, I am trying to install the bioperl 0.7.2 in my linux system, it reminded me the there is no clustalw installed on my platform, actually the clustalw1.82 is in my computer and it works well. Also I got a message that no string is installed, the BIO::DB won't work. Can someone tell me how this happen and how to solve it? Thanks in advance. 
yong From jason@cgt.mc.duke.edu Wed Feb 6 19:09:17 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Wed, 6 Feb 2002 14:09:17 -0500 (EST) Subject: [Bioperl-l] help for installation bioperl Reahat linux In-Reply-To: <3C6171CD.FCF1C982@ucsd.edu> Message-ID: On Wed, 6 Feb 2002, Yong Jiang wrote: > Hello, I am trying to install the bioperl 0.7.2 in my linux system, it > reminded me the there is no clustalw installed on my platform, actually > the clustalw1.82 is in my computer and it works well. Also I got a See the documentation in Bio::Tools::Run::Alignment::Clustalw. You will need to set the env variable CLUSTALDIR to point to the dir where your clustalw is installed. I have fixed the logic for this in the later 0.9.3 release to be a little bit smarter if it can find the exe in your path but setting CLUSTALDIR env variable will get you what you need in 0.7.2. > message that no string is installed, the BIO::DB won't work. Can someone > tell me how this happen and how to solve it? You need to install the package IO::String from CPAN. > Thanks in advance. > yong > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From taniaoh@yahoo.com Thu Feb 7 01:14:36 2002 From: taniaoh@yahoo.com (Tania Oh) Date: Thu, 7 Feb 2002 09:14:36 +0800 Subject: [Bioperl-l] oracle and bioperl/ensembl In-Reply-To: Message-ID: Hi all, the new group I'm in now is interested in using ensembl only if there is an oracle -> bioperl/ensembl DBAdaptor. I remember reading on the list sometime back that there was some development going on in providing an oracle port? is that development still going on or is there some other list I can get on for more information? any suggestions / comments about using bioperl/ensembl with an oracle DB is appreciated, esp. since it'll boost my argument about the advantages of using bioperl/ensembl for annotation!!! 
thanks in advance, Tania Oh Genomic Institute of Singapore http://www.genomeinstitute.org/
From benb@fruitfly.BDGP.berkeley.edu Thu Feb 7 02:13:12 2002 From: benb@fruitfly.BDGP.berkeley.edu (Benjamin Berman) Date: Wed, 06 Feb 2002 18:13:12 -0800 Subject: [Bioperl-l] Substitution matrix format for Bio::Tools::pSW (Wise2) Message-ID: <5.1.0.14.0.20020206180732.03935008@skittles.lbl.gov> Sorry if this has come up before, I couldn't find any reference to it. Why is the subsitution matrix format for Bio::Tools::pSW (actually from Wise2) different from that used by blast? From what I can tell, the normal blast format includes the alphabet characters as the first element of every row and every column, while the Wise2 format includes them only as the first element of every column. I guess it's not a practical problem because the conversion seems pretty trivial. But I'm curious to know if I've got this straight or if I'm missing something. It would probably be nice to include some explanation of this in the documentation for pSW - to keep newbies like me from getting tripped up. Thanks, ben. ------ Benjamin Berman Rubin Lab, 539 Life Sciences Addition Department of Molecular and Cell Biology University of California, Berkeley benb@fruitfly.org From lstein@cshl.org Thu Feb 7 02:39:15 2002 From: lstein@cshl.org (Lincoln Stein) Date: Wed, 6 Feb 2002 21:39:15 -0500 Subject: [Bioperl-l] oracle and bioperl/ensembl In-Reply-To: References: Message-ID: <0202062139150A.02540@fontina> Yes, my group has done a port of EnsEMBL to oracle, and we're now integrating it back into the main EnsEMBL development track. If you need it now, we can help you out, or you can wait for it to appear in the main track. Lincoln On Wednesday 06 February 2002 20:14, Tania Oh wrote: > Hi all, > > the new group I'm in now is interested in using ensembl only if there is > an oracle -> bioperl/ensembl DBAdaptor. > > I remember reading on the list sometime back that there was some > development going on in providing an oracle port? 
> is that development still going on or is there some other list I can get on > for more information? > > any suggestions / comments about using bioperl/ensembl with an oracle DB is > appreciated, esp. since it'll boost my argument about the advantages of > using bioperl/ensembl for annotation!!! > > thanks in advance, > > Tania Oh > > Genomic Institute of Singapore > http://www.genomeinstitute.org/ > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From lstein@cshl.org Thu Feb 7 02:41:04 2002 From: lstein@cshl.org (Lincoln Stein) Date: Wed, 6 Feb 2002 21:41:04 -0500 Subject: [Bioperl-l] Bio::Graphics In-Reply-To: References: Message-ID: <0202062141040B.02540@fontina> Thanks for the invitation. It'll be there soon! Lincoln On Tuesday 05 February 2002 09:09, Ewan Birney wrote: > What about writing a Bio::DAS::DataSourceI or something similar lincoln > which encapsulates that, and then - yes - I think it would be great to > make BioSQL inheriet from that.... > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ======================================================================== Lincoln D. 
Stein Cold Spring Harbor Laboratory lstein@cshl.org Cold Spring Harbor, NY ======================================================================== From elia@fugu-sg.org Thu Feb 7 03:15:46 2002 From: elia@fugu-sg.org (Elia Stupka) Date: Thu, 7 Feb 2002 11:15:46 +0800 (SGT) Subject: [Bioperl-l] Re: oracle and bioperl/ensembl In-Reply-To: Message-ID: > I remember reading on the list sometime back that there was some development > going on in providing an oracle port? > is that development still going on or is there some other list I can get on > for more information? As far as Ensembl is concerned see: http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/OraclePortSpecifics.html I haven't written Oracle Adaptors in bioperl-db, but it should be very straight-forward. If you read the Wiki above, "porting to Oracle" simply means converting the schema to an acceptable Oracle schema, and changing some of the sql statements in the adaptors with a script. Elia -- ******************************** * http://www.fugu-sg.org/~elia * * tel: +65 874 1467 * * mobile: +65 90307613 * * fax: +65 777 0402 * ******************************** From birney@ebi.ac.uk Thu Feb 7 07:49:45 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 7 Feb 2002 07:49:45 +0000 (GMT) Subject: [Bioperl-l] Substitution matrix format for Bio::Tools::pSW (Wise2) In-Reply-To: <5.1.0.14.0.20020206180732.03935008@skittles.lbl.gov> Message-ID: My feeling is that we should possibly think about dropping the bioperl-ext completely as I don't think it is that useful. how many people use pSW in anger? ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
----------------------------------------------------------------- From tjf84089@glaxowellcome.co.uk Thu Feb 7 10:44:09 2002 From: tjf84089@glaxowellcome.co.uk (Fulton, Tim J) Date: Thu, 7 Feb 2002 10:44:09 -0000 Subject: [Bioperl-l] RE: oracle and bioperl/ensembl Message-ID: <7165E3A0BA56D411BE6700D0B77FC8AE02196441@ukz808.ggr.co.uk> Guys -you may have realised by my total silence recently that I've been moved onto other things away from Oracle/Ensembl in my day to day existence. Given that I'm the perpetrator of said Wiki page, I'll try to keep an eye out for any queries (?is that a pun?) arising and attempt to proffer info if I can but can offer little or no tangible code. I'm sure others around the place are continuing Oracle developments and obviously any updates to the Wiki page would be welcomed by everybody. It was put in place many schema iterations ago and is likely to be incomplete now. Apologies for my early retirement (I wish!), but it's a simple case of "C'est la vie", I'm afraid. Hope all is well with you all... > -----Original Message----- > From: Elia Stupka [SMTP:elia@fugu-sg.org] > Sent: 07 February 2002 03:16 > To: Tania Oh > Cc: Ensembl-Dev; Bioperl > Subject: Re: oracle and bioperl/ensembl > > > I remember reading on the list sometime back that there was some > development > > going on in providing an oracle port? > > is that development still going on or is there some other list I can get > on > > for more information? > > As far as Ensembl is concerned see: > > http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/OraclePortSpecifics.html > > I haven't written Oracle Adaptors in bioperl-db, but it should be very > straight-forward. If you read the Wiki above, "porting to Oracle" simply > means converting the schema to an acceptable Oracle schema, and changing > some of the sql statements in the adaptors with a script. 
>
> Elia
>
> --
> ********************************
> * http://www.fugu-sg.org/~elia *
> * tel: +65 874 1467 *
> * mobile: +65 90307613 *
> * fax: +65 777 0402 *
> ********************************
>

From heikki@ebi.ac.uk Thu Feb 7 11:55:19 2002
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: Thu, 07 Feb 2002 11:55:19 +0000
Subject: [Bioperl-l] [Fwd: [Biohackathon] Re: bibliography stuff in perl]
Message-ID: <3C626B27.1337998B@ebi.ac.uk>

Bioperlers,

The bibliographic classes, java code and a SOAP server were described by Martin Senger at the Tucson Biohackathon. I started writing Bio::Biblio classes according to Martin's schema while at the Hackathon. Next, Martin will write parsers to read Medline XML data into objects.

Learn more about Martin's work on bibliographies at http://industry.ebi.ac.uk/openBQS/

-Heikki

-------- Original Message --------
Subject: [Biohackathon] Re: bibliography stuff in perl
Date: Wed, 6 Feb 2002 15:15:35 +0000 (GMT)
From: Martin Senger
Reply-To: biohackathon@egenetics.com
To: Jason Stajich
CC: biohackathon@egenetics.com

Jason,

I have discussed the plan with Heikki and thanks to him I now seem to understand how it should be done to be compliant with the bioperl styles/policies.

> The Medline XML
> parser would be excellent - I imagined that a Bio::Biblio::IO factory
> would create a Bio::Biblio::IO::medlinexml parser. This would build and
> populate the appropriate Biblio objects just as we do in SeqIO and
> SearchIO.
>

Yes. A Bio::Biblio::IO object will instantiate and use an instance of Bio::Biblio::IO::medlinexml if -format => 'medlinexml' is given. It reads an XML MEDLINE file with one or more citations and converts them into Heikki's objects of type Bio::Biblio::RefI (actually its sub-classes). I do not expect to also implement a 'write' method for converting Heikki's objects back to the XML MEDLINE format. The class Bio::Biblio::IO will be a subclass of Bio::SeqIO. Is this the correct vision?
Additionally to that I plan to write a Bio::Factory::Biblio module which instantiates and uses - again depending on the 'format' parameter (or 'protocol' in this case?) - module Bio::Factory::Biblio::soap. It implements query methods (similar as they are defined now for Java in BibRefQuery and BibRefSupport interfaces) in order to get citations using SOAP protocol. Note that this is _not_ repository-dependent - it can deliver MEDLINE citations as well as citations from other bibliographic repository (the difference is in the contents of the XML file - but this module knows only that it is a string). Does it still fit with your vision? And, finally, I need also to provide the bibligraphy web service based on SOAP, here at EBI, but it is a different question, not related only to bioperl. Regarding the XML parser, I have not decided which one to use. I will play with them a bit. Martin -- Martin Senger EMBL Outstation - Hinxton Senger@EBI.ac.uk European Bioinformatics Institute Phone: (+44) 1223 494636 Wellcome Trust Genome Campus (Switchboard: 494444) Hinxton Fax : (+44) 1223 494468 Cambridge CB10 1SD United Kingdom http://industry.ebi.ac.uk/~senger _______________________________________________ Biohackathon mailing list Biohackathon@egenetics.com http://fling.egenetics.com/mailman/listinfo/biohackathon From Andrew.Hynes@ogs.co.uk Thu Feb 7 14:34:41 2002 From: Andrew.Hynes@ogs.co.uk (Andrew Hynes) Date: Thu, 7 Feb 2002 14:34:41 -0000 Subject: [Bioperl-l] (no subject) Message-ID: <934C1A6D9596D511B59E0002B34BC52AC667A0@selene.ogs.co.uk> Andrew M Hynes PhD Bioinformatician Software Engineering Oxford GlycoSciences The Forum 86 Milton Park Abingdon Oxon OX14 4RY 01235 208065 ********************************************************************** The information transmitted by this email is private and confidential and is intended for the use of the intended recipients specified therein. 
If you are neither an intended recipient nor an employee or agent responsible for delivery to an intended recipient, you should be aware that any dissemination, distribution or copying of this communication is strictly prohibited. If you received this communication in error, please notify us immediately. ********************************************************************** From mongin@ebi.ac.uk Thu Feb 7 17:03:15 2002 From: mongin@ebi.ac.uk (Emmanuel Mongin) Date: Thu, 7 Feb 2002 17:03:15 +0000 (GMT) Subject: [Bioperl-l] ID line parsing in swiss.pm Message-ID: Hi, The STANDARD/PRELIMINARY tag in SWISS-PROT (ID line) entries does not seem to be parsed by swiss.pm. This is quite useful to know if the entry is a SWISS-PROT or and sptrembl entry. This could be stored in the annotation object, something called like entry_tag. $line =~ /^ID\s+([^\s_]+)(_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ || $self->throw("swissprot stream with no ID. Not swissprot in my book"); if( $3 ) { $name = "$1$2"; $seq->division($3); } else { $name = $1; $seq->division('UNK'); } ################## #Get here the entry tag $seq->annotation->add_Annotation('entry_tag',$4); ################## $seq->primary_id($1); $seq->alphabet('protein'); # this is important to have the id for display in e.g. FTHelper, otherwise # you won't know which entry caused an error $seq->display_id($name); Any comments? Emmanuel ----------------------------------------------------- Emmanuel Mongin mongin@ebi.ac.uk Tel: +44 (0)1223 49 46 87 Mobile: +44 (0)7813 32 12 82 ----------------------------------------------------- From gabriele_rearick@hp.com Thu Feb 7 17:13:47 2002 From: gabriele_rearick@hp.com (REARICK,GABRIELE (HP-FtCollins,ex1)) Date: Thu, 7 Feb 2002 09:13:47 -0800 Subject: [Bioperl-l] bioperl available on HP machines (HPUX11 in particular) ? Message-ID: Hello, I am working for HP and my goal is to make bioperl available on HPUX11 for PARISC and IA64. I am a novice to bioperl and associated codes. 
I would appreciate any input: 1) Is anybody out there who had done some work already in this area? 2) Are there mailing list archives describing installation and other problems. Thanks a lot for any input/links/hints in advance. Gabriele Rearick From benb@fruitfly.BDGP.berkeley.edu Thu Feb 7 18:10:49 2002 From: benb@fruitfly.BDGP.berkeley.edu (Benjamin Berman) Date: Thu, 07 Feb 2002 10:10:49 -0800 Subject: [Bioperl-l] Substitution matrix format for Bio::Tools::pSW (Wise2) In-Reply-To: References: <5.1.0.14.0.20020206180732.03935008@skittles.lbl.gov> Message-ID: <5.1.0.14.0.20020207100929.032aefa0@skittles.lbl.gov> Ewan, I thought pSW might be a nice, fast SW implementation. But I guess not? thanks, ben. At 07:49 AM 2/7/2002 +0000, Ewan Birney wrote: >My feeling is that we should possibly think about dropping the bioperl-ext >completely as I don't think it is that useful. > > >how many people use pSW in anger? ------ Benjamin Berman Rubin Lab, 539 Life Sciences Addition Department of Molecular and Cell Biology University of California, Berkeley benb@fruitfly.org From cheng_gong@yahoo.com Thu Feb 7 21:21:55 2002 From: cheng_gong@yahoo.com (Gong Cheng) Date: Thu, 7 Feb 2002 13:21:55 -0800 (PST) Subject: [Bioperl-l] Re: Bioperl-l digest, Vol 1 #611 - 14 msgs In-Reply-To: <200202071703.g17H3DPX010255@pw600a.bioperl.org> Message-ID: <20020207212155.30855.qmail@web13402.mail.yahoo.com> unsubscribe --- bioperl-l-request@bioperl.org wrote: > Send Bioperl-l mailing list submissions to > bioperl-l@bioperl.org > > To subscribe or unsubscribe via the World Wide Web, > visit > http://bioperl.org/mailman/listinfo/bioperl-l > or, via email, send a message with subject or body > 'help' to > bioperl-l-request@bioperl.org > > You can reach the person managing the list at > bioperl-l-admin@bioperl.org > > When replying, please edit your Subject line so it > is more specific > than "Re: Contents of Bioperl-l digest..." > > > Today's Topics: > > 1. 
=== message truncated ===

From dag@sonsorol.org Thu Feb 7 22:11:39 2002
From: dag@sonsorol.org (chris dagdigian)
Date: Thu, 07 Feb 2002 17:11:39 -0500
Subject: [Bioperl-l] spam, viral email & info needed for the next Open Bio Foundation newsletter
Message-ID: <3C62FB9B.5010609@sonsorol.org>

Hi folks,

Sorry for the mass cross-post. I've got a few quick things to mention:

(1) The quality of our mailing lists has been degraded both by more instances of spam getting through our filters and the occasional viral payload sent by Outlook users.

The spam problem has been addressed: we have finally faxed in all of the paperwork necessary (11 pages!) for the O|B|F to become a subscriber to the combined RBL/RSS/DUL blackhole databases maintained by the folks at mail-abuse.net. People interested in what RBL+ is should visit this URL: http://www.mail-abuse.org/rbl+/

We are going to refuse inbound email from (a) known spammers and spam friendly networks, (b) known open relays and (c) IP address blocks used by ISP dialup customers.

Once MAPS processes our info and allows our mail server to query the service I think that 99% of the spam will go away. This was the case back when the RBL service was free and we used it all the time.

The virus problem is going to take longer to fix. Because we are going to transition from Linux-on-Alpha to Sun Solaris systems we are going to delay the process of purchasing or downloading antivirus scanners that hook into sendmail until we are up and running on the new boxes.

Ok. Enough talk about bad stuff...on to the good stuff...

** It's time for another Open Bioinformatics Foundation Newsletter **

The first one we wrote back in October was very well received and it is past time to put out a new issue.
I'm soliciting information from the various project heads-- write up anything you want about your project and get it to me within a week or so for inclusion. As an example of what we are trying to do, see the 1st newsletter online at http://open-bio.org/pipermail/open-bioinformatics-foundation/2001-October/000001.html Regards, Chris -- Chris Dagdigian, Life Science IT & Research Computing Freelancer Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 Yahoo IM: craffi From allenday@ucla.edu Thu Feb 7 23:51:39 2002 From: allenday@ucla.edu (Allen Day) Date: Thu, 7 Feb 2002 15:51:39 -0800 (PST) Subject: [Bioperl-l] ID line parsing in swiss.pm In-Reply-To: Message-ID: There is some ID tag parsing in Bio::Tools::SwissProtParser that you can use. It should merged into swiss.pm, but I haven't done it yet. -Allen > Hi, > > The STANDARD/PRELIMINARY tag in SWISS-PROT (ID line) entries does not seem > to be parsed by swiss.pm. This is quite useful to know if the entry is a > SWISS-PROT or and sptrembl entry. > This could be stored in the annotation object, something called like > entry_tag. > > > > $line =~ /^ID\s+([^\s_]+)(_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ > || $self->throw("swissprot stream with no ID. Not swissprot in my > book"); > if( $3 ) { > $name = "$1$2"; > $seq->division($3); > } else { > $name = $1; > $seq->division('UNK'); > } > > ################## > #Get here the entry tag > $seq->annotation->add_Annotation('entry_tag',$4); > ################## > > $seq->primary_id($1); > $seq->alphabet('protein'); > # this is important to have the id for display in e.g. FTHelper, > otherwise > # you won't know which entry caused an error > $seq->display_id($name); > > > Any comments? 
> > Emmanuel > > > ----------------------------------------------------- > Emmanuel Mongin mongin@ebi.ac.uk > Tel: +44 (0)1223 49 46 87 > Mobile: +44 (0)7813 32 12 82 > ----------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From elia@fugu-sg.org Fri Feb 8 09:33:09 2002 From: elia@fugu-sg.org (Elia Stupka) Date: Fri, 8 Feb 2002 17:33:09 +0800 (SGT) Subject: [Bioperl-l] Re: collaborations In-Reply-To: Message-ID: > Just talking about group of protein sequences, which looks like > protein families :), I plan (for after NCBI28 Ensembl release) to work > on multiple alignments of protein families using clustalw combined with > T-coffee, or T-coffee alone. :) That is exactly where we are heading, and where Jason Stajich is working a lot too. We would draw trees from groups of orthologues and paralogues that we spot during our synteny finding process. We need to get to work together on this, more on Monday, otherwise I'll miss my flight Elia -- ******************************** * http://www.fugu-sg.org/~elia * * tel: +65 874 1467 * * mobile: +65 90307613 * * fax: +65 777 0402 * ******************************** From eae@sanger.ac.uk Fri Feb 8 10:08:21 2002 From: eae@sanger.ac.uk (Eduardo Eyras) Date: Fri, 8 Feb 2002 10:08:21 +0000 (GMT) Subject: [Bioperl-l] Re: collaborations In-Reply-To: Message-ID: On Fri, 8 Feb 2002, Elia Stupka wrote: > > I am very much interested in the protein side of this as we are now > > planning to integrate all species into our data mining system. 
I am just > > not sure if that's can still be called a collaboration or is it just a > > parasitic interaction on my part :-) > > We should aim for symbyosis :) maybe we'll better off with comensalism ;-) Eduardo From haoliu@rci.rutgers.edu Fri Feb 8 15:43:37 2002 From: haoliu@rci.rutgers.edu (Hao Liu) Date: Fri, 8 Feb 2002 10:43:37 -0500 (EST) Subject: [Bioperl-l] problem using BPlite Message-ID: I tried to parse a blast result: foreach $file (@files) { my $report = new Bio::Tools::Blast ( -file => $_, -parse=> 1 ); however, when I run the program, the error message says Can't locate object method "new" via package "Bio::Tools::Blast" at /cluster/bioinfo/ scripts/filterblast line 26. I am not sure if there is a installation problem, or it can't find the perl module... help, please! Thanks Sincerely -Hao Liu From arek@ebi.ac.uk Fri Feb 8 09:52:32 2002 From: arek@ebi.ac.uk (Arek Kasprzyk) Date: Fri, 8 Feb 2002 09:52:32 +0000 (GMT) Subject: [Bioperl-l] Re: collaborations In-Reply-To: Message-ID: On Fri, 8 Feb 2002, Elia Stupka wrote: > > Just talking about group of protein sequences, which looks like > > protein families :), I plan (for after NCBI28 Ensembl release) to work > > on multiple alignments of protein families using clustalw combined with > > T-coffee, or T-coffee alone. :) > > That is exactly where we are heading, and where Jason Stajich is working a > lot too. We would draw trees from groups of orthologues and paralogues > that we spot during our synteny finding process. > > We need to get to work together on this, I am very much interested in the protein side of this as we are now planning to integrate all species into our data mining system. 
I am just not sure if that can still be called a collaboration or is it just a parasitic interaction on my part :-)

arek

>
> more on Monday, otherwise I'll miss my flight
>
> Elia
>
> --
> ********************************
> * http://www.fugu-sg.org/~elia *
> * tel: +65 874 1467 *
> * mobile: +65 90307613 *
> * fax: +65 777 0402 *
> ********************************
>
>
>

-------------------------------------------------------------------------------
Dr Arek Kasprzyk
EMBL-European Bioinformatics Institute.
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
Tel: +44-(0)1223-494606 Fax: +44-(0)1223-494468
-------------------------------------------------------------------------------

From michal@orfeus.bioinfo.pl Fri Feb 8 12:44:46 2002
From: michal@orfeus.bioinfo.pl (Michal Kurowski)
Date: Fri, 8 Feb 2002 13:44:46 +0100
Subject: [Bioperl-l] Re: Substitution matrix format for Bio::Tools::pSW (Wise2)
In-Reply-To: ; from birney@ebi.ac.uk on Thu, Feb 07, 2002 at 07:49:45AM +0000
References: <5.1.0.14.0.20020206180732.03935008@skittles.lbl.gov>
Message-ID: <20020208134446.A15008@orfeus>

Ewan Birney [birney@ebi.ac.uk] wrote:
>
> how many people use pSW in anger?
>

Sorry Ewan, but there are some people who do find it useful. I've heard Perl6 will have no XS, so someone could rewrite it for Parrot ;-).

More seriously, pSW is able to return nice SimpleAlign objects, and bl2seq, for example, can be really tricky when it comes to that.

Cheers,

--
Michal Kurowski

From matthew_pocock@yahoo.co.uk Fri Feb 8 17:12:47 2002
From: matthew_pocock@yahoo.co.uk (Matthew Pocock)
Date: Fri, 08 Feb 2002 17:12:47 +0000
Subject: [Bioperl-l] Restricting annotations
Message-ID: <3C64070F.8020100@yahoo.co.uk>

Hi all. Sorry for the cross-posting.

I was thinking of writing a system to constrain or validate the set of keys in an annotation bundle. I see that BioPerl has one already. Does anybody have views about this?

* Is it useful?
* Is it necessary?
* What does it need to validate?
* Do annotations need an ISA slot (like Object.getClass()), or should you always validate them by hand (like annType.instanceOf(ann))?
* Do slots need any behavior, or are they purely places to put stuff?

My initial plan was to knock something up that validated that an annotation had a given set of keys, and that their values were of an appropriate type. I have no wish to implement an entire frames language, just a bit of validation over our fluffy annotations. With any luck, we can produce an 'srs formatted file' -> 'constrained annotation bundle' parser that will work for lots of use-cases.

Matthew

From jason@cgt.mc.duke.edu Fri Feb 8 17:23:24 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Fri, 8 Feb 2002 12:23:24 -0500 (EST)
Subject: [Bioperl-l] Re: Substitution matrix format for Bio::Tools::pSW (Wise2)
In-Reply-To: <20020208134446.A15008@orfeus>
Message-ID:

I have written an emboss alignment parser so you don't have to use bl2seq to get pairwise alignments.

We have emboss application capabilities in the dev release - this is going to be the way of the future I think.

You can do the following:

#!/usr/bin/perl -w
use strict;
use Bio::Factory::EMBOSS;
use Bio::AlignIO;

my $factory = new Bio::Factory::EMBOSS();
my $water = $factory->program('water');
$water->run({-sequencea => 'a.fa',
             -seqall    => 'b.fa',
             -datafile  => 'EBLOSUM50',
             -outfile   => 'a_b.emboss',
            });
if( -e "a_b.emboss" ) {
    my $alignin = new Bio::AlignIO(-format => 'emboss',
                                   -file   => 'a_b.emboss');
    my $aln = $alignin->next_aln();
    my $alignout = new Bio::AlignIO(-format => 'clustalw',
                                    -fh     => \*STDOUT);
    $alignout->write_aln($aln);
} else {
    print STDERR "unable to run the emboss program\n";
}

We will eventually want to add some wrapper methods so you won't have to explicitly dump your sequence files out or instantiate an AlignIO reader.
We're also working with Martin Senger at EBI and Catherine Letondal at Pasteur to interface with the PISE Web and openBSA CORBA interfaces for executing programs so that you won't necessarily have to install EMBOSS on your machine if you don't want to and can execute these analyses remotely (or on your own farm). I feel that this is a better way for us to go than bioperl-ext and it allows us to integrate with other open source package. Using the emboss pkg also allows people to get SW DNA alignments which have been past requests. -jason On Fri, 8 Feb 2002, Michal Kurowski wrote: > Ewan Birney [birney@ebi.ac.uk] wrote: > > > > > how many people use pSW in anger? > > > > Sorry Ewan, but there are some people who do find it useful. > I've heard Perl6 will have no XS, so someone could rewrite it for Parrot > ;-). > > More seriously, pSW is able to return nice SimpleAlign objects and > bl2seq for example can be really tricky when it comes to it. > > > Cheers, > > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From birney@ebi.ac.uk Fri Feb 8 12:15:16 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 8 Feb 2002 12:15:16 +0000 (GMT) Subject: [Bioperl-l] Re: Substitution matrix format for Bio::Tools::pSW (Wise2) In-Reply-To: <20020208134446.A15008@orfeus> Message-ID: Maybe I should write a clean extension layer to the new exonerate/C4 system from guy slater, which is the natural successor to the Wise2 project. Hmmm. Not this week for sure. From schattner@alum.mit.edu Fri Feb 8 16:25:20 2002 From: schattner@alum.mit.edu (Peter Schattner) Date: Fri, 08 Feb 2002 08:25:20 -0800 Subject: [Bioperl-l] Update of Bioperl tutorial for release 1.0 References: Message-ID: <3C63FBF0.A14FED7@alum.mit.edu> I've committed updates to the main branch of the bioperl-live CVS for the bioperl tutorial-and-script (bptutorial.pl) to include the changes and new features for Bioperl 1.0. 
Sections that have been added or significantly changed include those relating to: Map, Tree and Structure objects, parsing with Search/SearchIO, bioperl functions (ie the Bio::Perl object), running EMBOSS applications and RichSeq and SeqWithQuality objects. I may well have not completely or correctly understood the use of these objects, so I would recommend that the authors of these modules (and any others interested) check that the tutorial is correct and complete, and, if not email corrections and/or additions. I will then incorporate them (alternately you can modify the CVS yourself, but I would be grateful if you let me know what changes you've made - thanks) Also if there are other bioperl objects / features to be included in 1.0 that are still not covered in the tutorial, please let me know soon. Thanks Peter From xgai@iastate.edu Fri Feb 8 21:16:27 2002 From: xgai@iastate.edu (Xiaowu Gai) Date: Fri, 08 Feb 2002 15:16:27 -0600 Subject: [Bioperl-l] (no subject) Message-ID: <5.0.1.4.2.20020208150147.022489c0@xgai.mail.iastate.edu> Hi All: I am using BioPerl in my project and I found something that was really puzzling, I am not very sure it is a bug though: When I tried to use the get_Seq_by_acc method, it failed with the error message "Can not call method "_generic_seqfeature" on an undefined value at /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/genbank.pm line 272". However, it only gave me such error when I tried to get the sequence with the accession number of NT_006204 which is a big big sequence, and it worked just fine if I use another accession number such as AI770588 which is a short EST sequence. Can someone help me here? Thank you so much. Here is the codes that I used to test it: use Bio::DB::GenBank; use Bio::Seq; my $gb = new Bio::DB::GenBank(); my $seq = $gb->get_Seq_by_acc("NT_006204"); # does not work or my $seq = $gb->get_Seq_by_acc("AI770588"); # does work print $seq->desc(); Have a nice day. Xiaowu Xiaowu Gai, PhD Associate Scientist L. H. 
Baker Center for Bioinformatics and Biostatistics
Iowa State University
Ames, IA 50011
Phone: (515) 294-7624
Email: xgai@iastate.edu

From kristen.briggs@genxy.com Sat Feb 9 00:33:10 2002
From: kristen.briggs@genxy.com (kristen briggs)
Date: Fri, 08 Feb 2002 16:33:10 -0800
Subject: [Bioperl-l] problems with perling Makefile.PL
Message-ID: <3C646E46.F53CEAD7@genxy.com>

Hello,

I'm a beginning programmer and I'm having trouble perling the Makefile.PL. I downloaded the binary bioperl-0.7.2.tar.gz for solaris to my pc, extracted the files, and moved the files over to my unix account.

I then typed perl Makefile.PL and got the following message:

"Illegal character \015 (carriage return) at Makefile.PL line 5. (Maybe you didn't strip carriage returns after a network transfer?)"

Please advise me as to how to fix the Makefile.PL. I have no idea how to strip the carriage returns after a network transfer or that I even needed to strip carriage returns.

I will be eternally grateful,

Kris Briggs

From heikki@ebi.ac.uk Sat Feb 9 08:10:03 2002
From: heikki@ebi.ac.uk (Heikki Lehvaslaiho)
Date: 09 Feb 2002 08:10:03 +0000
Subject: [Bioperl-l] (no subject)
In-Reply-To: <5.0.1.4.2.20020208150147.022489c0@xgai.mail.iastate.edu>
References: <5.0.1.4.2.20020208150147.022489c0@xgai.mail.iastate.edu>
Message-ID: <1013242205.16492.4.camel@bala>

Xiaowu,

You've been caught by what I think is the most common newbie problem in sequence retrieval. Entries with NT_ accession numbers are not GenBank entries at all. They just follow the GenBank entry format. In fact, they are RefSeq entries and cannot be found in the same place. You need to use the Bio::DB::RefSeq module to retrieve them.

Also, you could take a look at Bio::Perl::get_sequence() where this logic is built in.

I think these modules are only in CVS at this point, definitely not in the 0.7 releases.

-Heikki

On Fri, 2002-02-08 at 21:16, Xiaowu Gai wrote:
> Hi All:
>
> I am using BioPerl in my project and I found something that was really
> puzzling, I am not very sure it is a bug though:
>
> When I tried to use the get_Seq_by_acc method, it failed with the error
> message "Can not call method "_generic_seqfeature" on an undefined value at
> /usr/local/lib/perl5/site_perl/5.6.1/Bio/SeqIO/genbank.pm line 272".
> However, it only gave me such error when I tried to get the sequence with
> the accession number of NT_006204 which is a big big sequence, and it
> worked just fine if I use another accession number such as AI770588 which
> is a short EST sequence. Can someone help me here? Thank you so much.
>
> Here is the codes that I used to test it:
>
> use Bio::DB::GenBank;
> use Bio::Seq;
>
> my $gb = new Bio::DB::GenBank();
> my $seq = $gb->get_Seq_by_acc("NT_006204"); # does not work
> or
> my $seq = $gb->get_Seq_by_acc("AI770588"); # does work
> print $seq->desc();
>
> Have a nice day.
>
> Xiaowu
>
>
> Xiaowu Gai, PhD
> Associate Scientist
> L. H.
Baker Center for Bioinformatics and Biostatistics > Iowa State University > Ames, IA 50011 > Phone: (515) 294-7624 > Email: xgai@iastate.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki@ebi.ac.uk Sat Feb 9 08:29:12 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: 09 Feb 2002 08:29:12 +0000 Subject: [Bioperl-l] problems with perling Makefile.PL In-Reply-To: <3C646E46.F53CEAD7@genxy.com> References: <3C646E46.F53CEAD7@genxy.com> Message-ID: <1013243354.16492.6.camel@bala> Hmmm... My copy of 0.7 CVS Makefile.Pl does not contain any carriage returns. They should not matter, either.. Anyway, you should be able to get rid of them by typing: perl -pi -e 's/\015//' Makefile.PL or more UNIXy way of doing this is tr -d '\r' < Makefile.PL > newMakefile.PL and then rename the files. I hope this helps. -Heikki On Sat, 2002-02-09 at 00:33, kristen briggs wrote: > Hello, > > I'm a beginning programmer and I'm having trouble perling the > Makefile.pl. I downloaded the binary bioperl-0.7.2.tar.gz for solaris > to my pc, extracted the files, and moved the files over to my unix > account. > > I then typed perl Makefile.PL and got the following message: > > "Illegal character \015 (carriage return) at Makefile.PL line 5. (Maybe > you didn't strip carriage returns after a network transfer?)" > > Please advise me as to how to fix the Makefile.PL. 
I have no idea how > to strip the carriage returns after a network transfer or that I even > needed to strip carriage returns. > > I will be eternally grateful, > > Kris Briggs -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From nzaitlen@hotmail.com Sat Feb 9 12:47:23 2002 From: nzaitlen@hotmail.com (Noah Zaitlen) Date: Sat, 09 Feb 2002 04:47:23 -0800 Subject: [Bioperl-l] requesting NT files from Genbank Message-ID: I am trying to use the GenBank database acess modules to get NT_ files from NCBI. It works fine for other file types (i.e. the ones used in examples/getGenBank.pl). I tried the fix suggested in GenBank.pm $gb-Erequest_format('fasta') However, I get this error when I try to run it: Unrecognized file test: -E at getGenBank.pl line 15. Do you know how to fix this? Thanks, Noah Z. _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. From nzaitlen@hotmail.com Sat Feb 9 12:50:03 2002 From: nzaitlen@hotmail.com (Noah Zaitlen) Date: Sat, 09 Feb 2002 04:50:03 -0800 Subject: [Bioperl-l] Searching by GI Message-ID: Is there a way to search genbank by GI instead of id or acc. Or, is there a way to include the version number in the search? 
Thanks, Noah Z> _________________________________________________________________ Chat with friends online, try MSN Messenger: http://messenger.msn.com From heikki@ebi.ac.uk Sat Feb 9 13:24:33 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: 09 Feb 2002 13:24:33 +0000 Subject: [Bioperl-l] requesting NT files from Genbank In-Reply-To: References: Message-ID: <1013261075.23671.10.camel@bala> Noah, Read my answer to Xiaowu earlier today! -Heikki On Sat, 2002-02-09 at 12:47, Noah Zaitlen wrote: > I am trying to use the GenBank database acess modules to get NT_ files from > NCBI. It works fine for other file types (i.e. the ones used in > examples/getGenBank.pl). I tried the fix suggested in GenBank.pm > > $gb-Erequest_format('fasta') > > However, I get this error when I try to run it: > > Unrecognized file test: -E at getGenBank.pl line 15. > > Do you know how to fix this? > > Thanks, > Noah Z. > > > > > > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki@ebi.ac.uk Sat Feb 9 13:26:44 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: 09 Feb 2002 13:26:44 +0000 Subject: [Bioperl-l] Searching by GI In-Reply-To: References: Message-ID: <1013261206.23671.12.camel@bala> Noah, The synopsis for Bio::DB::Genbank has these examples: $seq = $gb->get_Seq_by_id('MUSIGHBA1'); # Unique ID # or ... 
$seq = $gb->get_Seq_by_acc('J00522'); # Accession Number
$seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version
$seq = $gb->get_Seq_by_gi('405830'); # GI Number

-Heikki

On Sat, 2002-02-09 at 12:50, Noah Zaitlen wrote:
> Is there a way to search genbank by GI instead of id or acc. Or, is there a
> way to include the version number in the search?
>
> Thanks,
> Noah Z>
>
>
>
> _________________________________________________________________
> Chat with friends online, try MSN Messenger: http://messenger.msn.com
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From jason@cgt.mc.duke.edu Sat Feb 9 20:11:19 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Sat, 9 Feb 2002 15:11:19 -0500 (EST)
Subject: [Bioperl-l] [Bioperl-guts-l] Notification: incoming/1080 (fwd)
Message-ID:

Double check that XML::Parser::PerlSAX is installed. We probably aren't
being explicit enough in the Makefile warnings as to what needs to be
installed.

You may find it much easier if you use the CPAN bundle to install the
related modules:

% perl -MCPAN -e shell
CPAN> install Bundle::BioPerl

Double check that XML::Writer is definitely installed - I get seg faults
(on linux) when the tests are run in CPAN for me, but if I force it
through all appears to be fine.
(Note to Chris D - we need to make sure the bundle is definitely up to date for 1.00) -jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu ---------- Forwarded message ---------- Date: Sat, 9 Feb 2002 12:51:07 -0500 From: bioperl-bugs@bioperl.org To: bioperl-guts-l@bioperl.org Subject: [Bioperl-guts-l] Notification: incoming/1080 JitterBug notification new message incoming/1080 Message summary for PR#1080 From: hu@pic.ansci.iastate.edu Subject: bioperl installation Date: Sat, 9 Feb 2002 12:51:06 -0500 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From hu@pic.ansci.iastate.edu Sat Feb 9 12:51:06 2002 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g19Hp6kO004936 for ; Sat, 9 Feb 2002 12:51:06 -0500 Date: Sat, 9 Feb 2002 12:51:06 -0500 Message-Id: <200202091751.g19Hp6kO004936@pw600a.bioperl.org> From: hu@pic.ansci.iastate.edu To: bioperl-bugs@bioperl.org Subject: bioperl installation Full_Name: Zhiliang Hu Module: Version: 0.72 PerlVer: 5.6 OS: RedHat lunix 7.0 Submission from: pic.ansci.iastate.edu (129.186.111.207) I have already installed XML::Node, XML::Writer, and IO::String using standard "perl -MCPAN -e 'modual_names'", and the "make test" on bioperl still complains: "The XML-format conversion requires the CPAN modules XML::Node, XML::Writer, and IO::String to be installed on your system, which they probably aren't. Skipping these tests." I wonder why? 
Zhiliang _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-guts-l From jason@cgt.mc.duke.edu Sat Feb 9 22:24:23 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Sat, 9 Feb 2002 17:24:23 -0500 (EST) Subject: [Bioperl-l] checking out all active bioperl CVS modules Message-ID: I've added a new alias on the CVS server that will allow you to checkout all the active bioperl CVS modules (bioperl-live,bioperl-gui,bioperl-db,bioperl-corba-server,bioperl-corba-client) Simply do % cvs -d YOURNORMALCVSROOT co bioperl_all where YOURNORMALCVSROOT is either -d:pserver:cvs@cvs.bioperl.org:/home/repository/bioperl OR -d:ext:USERNAME@bioperl.org:/home/repository/bioperl depending on whether or not you are checking out a writeable CVS tree. Thanks to James Stalker at Sanger/Ensembl for the CVS pointers. -jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Sat Feb 9 23:13:17 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Sat, 9 Feb 2002 18:13:17 -0500 (EST) Subject: [Bioperl-l] miscellanea Message-ID: A number of off-line things have been going on, just wanted to be sure and keep the community informed. Some day soon we will have an easy to update news system on the bioperl site so you can have your Knewsticker's flashing up bioperl newsupdates ;) - for now I am just sending this in a list email. * XML parsers. We're looking to converge on using 1 perl XML parser in the future. We want to go with SAX2 compliant parsers, but I think that we are still having some debates about what is speediest. My guess is that we'll leave things as they are for 1.00 but expect to retool the internals of any modules that use an XML parser and converge on a single solution where possible. * Bibliographic objects are getting added in Bio::Biblio. 
These are based on Martin Senger's design and we will be adding a Medline parser (and possibly Entrez NCBI XML if someone wants to help write it). We will have access to CORBA and HTTP fetches for remote data as well as hooks into an SQL db layer through the biosql project. * Markers and Maps will be coming up to speed as we test the objects out on real projects. * The Hackathon produced some great ideas - one of which is to formalize some of the sequence database access. We've decided to call the existing system of HTTP requests with an accession or gi and returning sequence data in a standard format (fasta,genbank,embl, bsml,agave) "BioFetch". This probably means that the DB.t tests should get moved to BioFetch.t and we leave only non-Biofetch DB tests in there (hmm and those are...?) * I'm considering proposing an event based parsing model for SeqIO (post 1.0 of course) in the same way the event based parsing was written for SearchIO. This would also be the time and place we could insert some smarter feature location parsing with a grammar (using Parse::RecDescent or equivalent ) rather than the pieced together regexps. * That said - we need an AGAVE SeqIO parser at some point - hopefully once the new framework is in place this will be extremely easy to write. Maybe the DT guys want to write one? * SteveC and I have been musing how we want to deal with the scripts and examples directories. Maybe it makes sense to have a single directory called scripts and have examples be located in there. The notion is that examples should demonstrate bioperl functionality but may not be general purpose (cmdline args, etc) while a script is something that people should be able to use out of the box for real work. There will also be some scripts which don't use bioperl - I have started a dir called scripts/contributed which is where these types of scripts can live. 
I would like to consider breaking this off into a new CVS module so we can grant write accounts to non-bioperl devs without worrying about erroneous commits to the main tree. With CVS alias magic I can actually make these appear the scripts/contributed directory anyways. * The current list of things to do for 1.00 are: - Finish code reviews for those who have agreed to do them. This should include answering the questions: Does the documentation make sense? Is the SYNOPSIS runnable? Does a reasonable test exist for all the pertinent objects? Are there any outstanding issues that need to be re-examined? - Verify Peter's bptutorial changes wrt to new modules, update the README, biodesign.pod, (Brian O has been way on top of this - Thanks!). - Check the bug list to see if there are any other gotchas that we should fix before the release (there is at least one SeqIO feature location parsing bugs that are showing up) (Other things I'm forgetting?) Anyone is welcome and encouraged to help with the above. Especially newcomers - if you can read some of the documentation and tell us what is unclear we can be sure and fix these before the release. If you do take something and get it completed, send a note to list so we can put a tick on the board and move on. -jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Sun Feb 10 15:37:12 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Sun, 10 Feb 2002 10:37:12 -0500 (EST) Subject: [Bioperl-l] miscellanea In-Reply-To: <200202100926.g1A9Qlw284520@electre.pasteur.fr> Message-ID: Brain turned off - forgot to mention that we're working on a spec for remote job execution. I'm comfortable with the proposed method names from the PISE interface that Catherine has put together. I guess I have to propigate these into the appropriate Bio::Factory::ApplicationI object. Any other takers who want to work on the spec? 
-jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu From lynn_m_stevens@hotmail.com Sun Feb 10 18:41:04 2002 From: lynn_m_stevens@hotmail.com (Lynn Stevens) Date: Sun, 10 Feb 2002 10:41:04 -0800 Subject: [Bioperl-l] ORF FInder Message-ID: Is there a module in BioPerl which allows you to take a sequence and get back a list of all the ORFs (or even just the largest ORF) in all six frames (or even just one frame) indexed by sequence position. In other words you would submit a seq object and you would get back a set of numbers which tell you where the ORFs are located in the sequence. I have looked through all the documentation and still can not find this feature even though it seem like an extremely common task. Thanks for any help, Lynn _________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. From jason@cgt.mc.duke.edu Sun Feb 10 20:04:16 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Sun, 10 Feb 2002 15:04:16 -0500 (EST) Subject: [Bioperl-l] ORF FInder In-Reply-To: Message-ID: Not in bioperl directly but you can use emboss's getorf program. On Sun, 10 Feb 2002, Lynn Stevens wrote: > Is there a module in BioPerl which allows you to take a sequence and get > back a list of all the ORFs (or even just the largest ORF) in all six frames > (or even just one frame) indexed by sequence position. > > In other words you would submit a seq object and you would get back a set of > numbers which tell you where the ORFs are located in the sequence. > > I have looked through all the documentation and still can not find this > feature even though it seem like an extremely common task. > > Thanks for any help, > > Lynn > > > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From heikki@ebi.ac.uk Mon Feb 11 09:26:18 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon, 11 Feb 2002 09:26:18 +0000 Subject: [Bioperl-l] ORF FInder References: Message-ID: <3C678E3A.43E9A10C@ebi.ac.uk> Lynn, Closest you get is translate_6frames() which can be called like: @seqs = Bio::SeqUtils->translate_6frames($nucseq); You are more than welcome to eleborate on that (using methods is_start_codon() and is_ter_codon() in Bio::Tools::CodonTable) to determine ORFs to return. I think: @cdss = Bio::SeqUtils->orfs($nucseq, $min_orf_len); would be useful. I'd love to include that into Bio::SeqUtils. -Heikki Lynn Stevens wrote: > > Is there a module in BioPerl which allows you to take a sequence and get > back a list of all the ORFs (or even just the largest ORF) in all six frames > (or even just one frame) indexed by sequence position. > > In other words you would submit a seq object and you would get back a set of > numbers which tell you where the ORFs are located in the sequence. > > I have looked through all the documentation and still can not find this > feature even though it seem like an extremely common task. > > Thanks for any help, > > Lynn > > _________________________________________________________________ > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From heikki@ebi.ac.uk Mon Feb 11 10:00:05 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon, 11 Feb 2002 10:00:05 +0000 Subject: [Bioperl-l] Markers and Maps Message-ID: <3C679625.C51083B7@ebi.ac.uk> I've committed a set of reworked Bio::Map classes. The underlying logic behind existing classes was rewritten. All old tests pass and more. I also put in new classes for managing and comparing cytogenetic locations of type '2p13.1-q12' or 'Xq' or simply 'Y'. This description is in Bio::Map::Marker.pm: ---------------------------------------------------------------------- A Marker is a central object in Bio::Map name space. A Map is a holder class for objects. A Marker has a Position in a Map. A Marker can be compared to an other Markers using boolean methods. Positions can have non-numeric values or other methods to store the locations, so they have a method numeric() which does the conversion. A Marker has a convinience method position() which is able to create Positions of required class from scalars by calling method get_position_object(). For more complex situations, a Marker can have multiple positions in multiple Maps. It is therefore possible to extract Positions (all or belonging to certain Map) and compare Markers to them. It is up to the programmer to make sure position values and Maps they belong to can be sensibly compared. ---------------------------------------------------------------------- A dia UML model of the current setup is in models/bio_map.dia. Enjoy, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

From michal@orfeus.bioinfo.pl Mon Feb 11 16:04:27 2002
From: michal@orfeus.bioinfo.pl (Michal Kurowski)
Date: Mon, 11 Feb 2002 17:04:27 +0100
Subject: [Bioperl-l] SeqIO unwilling to work
Message-ID: <20020211170426.A10731@orfeus>

Hi,
After spending some time with SeqIO interface I've got to say there
are a few things a bit mysterious to me.

Let's say I've got something like this:

$file1 = param("file1");
$file2 = param("file2");
$area1 = param("text1");
$area2 = param("text2");

$stream1 = "";
$stream2 = "";

if ($file1) {
    while (<$file1>) {
        $stream1 .= $_;
    }
} else {
    $stream1 = $area1;
}

if ($file2) {
    while (<$file2>) {
        $stream2 .= $_;
    }
} else {
    $stream2 = $area2;
}

$stream1 =~ s/\r//g;
$stream2 =~ s/\r//g;

tie *IN1, 'IO::Scalar', \$stream1;
tie *IN2, 'IO::Scalar', \$stream2;

my $first = Bio::SeqIO->new(-fh => \*IN1);
my $second = Bio::SeqIO->new(-fh => \*IN2);

...dealing with $first and $second ...

It doesn't work.
The point is "-fh" method (with a tied Symbol::gensym) does not work
for me giving lots of mysterious errors (see below). It does not
matter if I use object interface or "tied" objects.

Operation `ne': no method found,
left argument in overloaded package IO::Scalar,
right argument has no overloaded magic at
/usr/lib/perl5/site_perl/5.6.0/Bio/Root/IO.pm line 242.

And when I try to investigate thing using "Data::Dumper":

Use of uninitialized value in -d at /usr/lib/perl5/5.6.0/CGI.pm line
3355.
fasta.pl: Use of uninitialized value in
join at (eval 13) line 44.
fasta.pl: Use of uninitialized value in
join at (eval 13) line 44.
fasta.pl: Can't locate object method
"FETCH" via package "IO::Scalar" at
/usr/lib/perl5/5.6.0/i386-linux/Data/Dumper.pm line 150,
line 12.

I thought "FETCH" is not necessary when using tied handles...
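
A sketch for anyone hitting the same errors (not part of the original
message): on Perl 5.8 or later, a scalar can be opened directly as a
read-only filehandle with the three-argument open, so no tie to
IO::Scalar is needed. The code below uses core Perl only; read_fasta()
is a hypothetical helper standing in for the parser, and FASTA input is
an assumption. The same $fh could presumably be passed to
Bio::SeqIO->new(-fh => $fh, -format => 'fasta') instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): read FASTA records
# from any filehandle and return a list of [id, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $id, $seq);
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;    # tolerate both LF and CRLF line endings
        next if $line eq '';
        if ($line =~ /^>(\S*)/) {
            # A new header closes the previous record, if there was one.
            push @records, [$id, $seq] if defined $id;
            ($id, $seq) = ($1, '');
        }
        elsif (defined $id) {
            $seq .= $line;
        }
    }
    push @records, [$id, $seq] if defined $id;
    return @records;
}

# Text as it might arrive from a form textarea, with CRLF line endings.
my $stream = ">seq1 test\r\nACGT\r\nTTGA\r\n>seq2\r\nGGCC\r\n";

# In-memory filehandle: requires Perl 5.8+, no IO::Scalar tie involved.
open(my $fh, '<', \$stream) or die "Cannot open in-memory handle: $!";
my @records = read_fasta($fh);
close($fh);

print "$_->[0]\t$_->[1]\n" for @records;
```

Because the handle is an ordinary Perl filehandle rather than a tied
object, string comparisons against it should not trip over IO::Scalar's
operator overloading.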
(Bio)Perl wizards needed,

--
Michal Kurowski

From desbi2@yahoo.fr Mon Feb 11 16:44:18 2002
From: desbi2@yahoo.fr (CAROLINE BARRETTO)
Date: Mon, 11 Feb 2002 17:44:18 +0100 (CET)
Subject: [Bioperl-l] updating the swissprot database
Message-ID: <20020211164418.57241.qmail@web12202.mail.yahoo.com>

Hello,

Does anybody know where I can find a script which
makes an automated updating for the swiss-prot
database ?
I work with GCG and I would like to have these
database updated weekly or every 2 weeks.

Thank you for your help,

Caroline.

___________________________________________________________
Do You Yahoo!? -- A free @yahoo.fr address, in French!
Yahoo! Mail : http://fr.mail.yahoo.fr

From gert.thijs@esat.kuleuven.ac.be Mon Feb 11 16:52:26 2002
From: gert.thijs@esat.kuleuven.ac.be (Gert Thijs)
Date: Mon, 11 Feb 2002 17:52:26 +0100
Subject: [Bioperl-l] SeqIO unwilling to work
References: <20020211170426.A10731@orfeus>
Message-ID: <3C67F6CA.20106@esat.kuleuven.ac.be>

Michal,

Here is some code that I have been using lately and that does the job
for me.

---
my $query = new CGI;
my $file = $query->param('file');
my $stream;
if ( defined($file) ){
    $stream = new Bio::SeqIO( -fh => $file, -format => 'fasta' );
}else{
    print "Error: No sequences found.\n";
    exit -1;
}
while ( my $ss = $stream->next_seq ){
    # do some stuff
}
---

There was no need to first read the stream from $file into a string and
then tie this string to an IO::Scalar. You can pass filehandle $file
immediately to Bio::SeqIO.

Gert

Michal Kurowski wrote:
> Hi,
> After spending some time with SeqIO interface I've got to say there
> are a few things a bit mysterious to me.
> > Let's say I've got something like this: > > > > $file1 = param("file1"); > $file2 = param("file2"); > $area1 = param("text1"); > $area2 = param("text2"); > > $stream1 = ""; > $stream2 = ""; > > if ($file1) { > while (<$file1>) { > $stream1 .= $_; > } > } else { > $stream1 = $area1; > } > > if ($file2) { > while (<$file2>) { > $stream2 .= $_; > } > } else { > $stream2 = $area2; > } > > $stream1 =~ s/\r//g; > $stream2 =~ s/\r//g; > > > tie *IN1, 'IO::Scalar', \$stream1; > tie *IN2, 'IO::Scalar', \$stream2; > > my $first = Bio::SeqIO->new(-fh => \*IN1); > my $second = Bio::SeqIO->new(-fh => \*IN2); > > ...dealing with $first and $second ... > > > It doesn't work. > The point is "-fh" method (with a tied Symbol::gensym) does not work > for me giving lots of mysterious errors (see below). It does not > matter if I use object interface or "tied" objects. > > Operation `ne': no method found, > left argument in overloaded package IO::Scalar, > right argument has no overloaded magic at > /usr/lib/perl5/site_perl/5.6.0/Bio/Root/IO.pm line 242. > > And when I try to investigate thing using "Data::Dumper": > > Use of uninitialized value in -d at /usr/lib/perl5/5.6.0/CGI.pm line > 3355. > fasta.pl: Use of uninitialized value in > join at (eval 13) line 44. > fasta.pl: Use of uninitialized value in > join at (eval 13) line 44. > fasta.pl: Can't locate object method > "FETCH" via package "IO::Scalar" at > /usr/lib/perl5/5.6.0/i386-linux/Data/Dumper.pm line 150, > line 12. > > I thought "FETCH" is not necessary when using tied handles... 
> > (Bio)Perl wizards needed, > > -- + Gert Thijs + + email: gert.thijs@esat.kuleuven.ac.be + homepage: http://www.esat.kuleuven.ac.be/~thijs + + K.U.Leuven + ESAT-SISTA + Kasteelpark Arenberg 10 + B-3001 Leuven-Heverlee + Belgium + Tel : +32 16 32 85 88 (new number) + Fax : +32 16 32 19 70 From dave.ardell@ebc.uu.se Mon Feb 11 17:53:27 2002 From: dave.ardell@ebc.uu.se (David Ardell) Date: Mon, 11 Feb 2002 18:53:27 +0100 Subject: [Bioperl-l] Bio::TreeIO References: Message-ID: <3C680517.165A669A@ebc.uu.se> Hello Jason, Hello Bioperl, Thanks for the response to my email a week ago regarding Tree and TreeIO. Since I've come home to my fast connection I've had time to skim bioperl 0.9.2 and the Tree and TreeIO modules. You asked about what modules I have written, and what suggestions I could make about bioperl. Over the last year, I have made notes about this. I am a bit afraid that these are already out of date, although a quick check of the 0.9.2 Changes file suggests maybe not. So here goes: First I'll illustrate what I have used bioperl to do for my work. Scripts: gapfree -- Remove gap containing sites from alignments (UnivAln). Important for some analyses. subfasta -- Extract sequences from multifasta files corresponding to reg-ex match on IDs or sequences. fasuniq -- Uniquify fasta files xl -- front end to translate, but supports gapped input (via my GapSeq class, see below), alignment of aa seq to coding seq, etc monocomp -- monomer composition with various levels of strictness on type, flexible about alphabets poscomp -- monomer composition in frames codaln -- an updated replacement to protal2dna -- alignment of DNA sequences by their protein translations. using the bioperl CLUSTALW driver I realize this must mostly be standard fare. 
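
As a rough illustration of what a script like gapfree does (a sketch in
core Perl, not David's script and not the UnivAln implementation), the
hypothetical gapfree() below drops every alignment column in which any
sequence has a gap. It assumes the input is a list of equal-length
aligned strings and treats '-', '.' and '~' as gap characters; both of
those choices are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove gap-containing columns from an alignment
# given as a list of equal-length strings.
sub gapfree {
    my @aligned = @_;
    return () unless @aligned;

    my $len = length $aligned[0];
    for my $s (@aligned) {
        die "Sequences differ in length\n" unless length($s) == $len;
    }

    # Collect the column positions where no sequence has a gap character.
    my @keep;
    for my $col (0 .. $len - 1) {
        my $gaps = grep { substr($_, $col, 1) =~ /[-.~]/ } @aligned;
        push @keep, $col if $gaps == 0;
    }

    # Rebuild each sequence from the retained columns only.
    return map {
        my $s = $_;
        join '', map { substr($s, $_, 1) } @keep;
    } @aligned;
}

my @clean = gapfree('ATG-CA', 'A-GTCA', 'ATGTC~');
print "$_\n" for @clean;
```

Working on plain strings keeps the sketch independent of any sequence
class, which also means it applies no alphabet checks of its own.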
A more specialized research application that I wrote in bioperl takes 1) an annotated sequence and 2) an alignment that includes that sequence, to then manipulate the alignment according to feature annotations of the sequence, to produce a reordered (subsequenced, complemented, etc.) alignment. ----------------------- Two packages that I made and the reasons why: GapSeqI -- an extension to PrimarySeqI to allow translation of sequences containing gaps (preserving frame). MySeqStats -- removed type-checking ------------------------ I guess the one major design-choice in bioperl that I find myself working around most is the built-in sequence strictness in parsers, constructors, and object methods. Some examples off the top of my head, where this has presented problems: MSF files can have tildes ('~') as gaps, and if I bring an MSF to a bioperl readable format, the tildes are retained. The '-' character is okay, but tildes choke somewhere between Seq and SeqIO. Another example: I use the sprinzl tRNA database that annotates nonstandard nucleotides in-sequence with 70 non-alphanumeric characters. Of course I can't be expected to translate this kind of sequence meaningfully, but why can't I convert it from fasta to selex within bioperl? I can't get this data past the bioperl-y gates. Suggestion: my wish would be for bioperl to have the same type-philosophy as perl itself -- type-permissive, and user beware. Functions always try to return something as reasonable as possible given the data. Strictness could be optionally enforced at the method- or programmer-level through predefined regexp tests a la if (seq->seq =~ [:DNA-IUPAC:]) {}, etc, so that functions could know what they are dealing with by examining the sequence and act appropriately. As a result of wanting to be able to translate gap-containing sequences (logically well-defined) I wrote GapSeq which is a PrimarySeqI. 
This led me to this question (actually, this note is old and I forget where I was making GapSeq to come to this question): should there be a copy constructor in order to be able to initialize derived objects with data from a file? How are you supposed to use the IO object with a derived sequence object? The following comments are just jotted notes: SeqIO: how do you get no-clobber behavior? ClustalW module: maybe a public list of supported parameter names? Bio::Tools::CodonTable: 1. Genetic code design should have a hash from names to code numbers, which would protect against reordering by NCBI -- ie programmer access should be 'ciliate' rather than '6' 2. Again, an exported hash of the tables supported by the module would be useful. 3. The name: what about Bio::Tools::TranslationTable? As a codon table means something pretty different to me. That's about it. I did have a look at Tree and TreeIO classes. They look like a excellent interface. I like that the abstraction and functions encompass gene genealogies right from the start. I haven't had a chance to play with them yet though. My modules, the ones I wrote about before, are built on top of Graph, Graph::Reader, and Graph::Writer. Their functionality seem to complement the functionality already existing in Bio::Tree and Bio::TreeIO. My modules are really focused on some of the more tedious everyday work of manipulating and publishing with phylogenetic trees. I would certainly like your suggestions of where to publish them (from a namespace perspective). Maybe the two efforts could be integrated. Thanks again for all of your tremendous effort for open source bioinformatics. I'll be passing along some scripts shortly. all the best dave -- Dr. David Ardell NSF Fellow in Bioinformatics Dept. 
of Molecular Evolution Uppsala University Norbyvägen 18C SE-75236 Uppsala, SWEDEN From dblock@gnf.org Mon Feb 11 17:06:19 2002 From: dblock@gnf.org (David Block) Date: Mon, 11 Feb 2002 09:06:19 -0800 Subject: [Bioperl-l] miscellanea In-Reply-To: Message-ID: On Saturday, February 9, 2002, at 03:13 PM, Jason Stajich wrote: > > * XML parsers. We're looking to converge on using 1 perl XML parser > in > the future. We want to go with SAX2 compliant parsers, but I think > that we are still having some debates about what is speediest. My > guess is that we'll leave things as they are for 1.00 but expect to > retool the internals of any modules that use an XML parser and > converge on a single solution where possible. > I'm not exactly the perl-xml expert, but I think the perl-xml community (at least the perl-xml list) is trying to build some standard SAX interfaces so that all the perl SAX parsers will use the same API. Then it wouldn't matter which SAX parser you have installed - any one that follows the spec will do. So converging can likely be to an interface, not to an implementation. I expect there will be some spirited competition in xml parsers for a while yet. Quote from a recent post by Matt Sargeant: Currently you have three options: XML::SAX::PurePerl, XML::SAX::Expat and XML::LibXML::SAX::Parser (oh, and XML::Parser::PerlSAX + XML::Filter::SAX1toSAX2). Your best bet is probably XML::SAX::Expat, because it's lightweight and fast, and we're working on a pure XS version (at the moment it's built on top of XML::Parser), which is nearing readiness. (end quote) There is a description of the interface at http://sax.perl.org specifically: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/perl- xml/libxml-perl/doc/sax-2.0.html?rev=HEAD&content-type=text/html (sorry if my mailer wraps that line) Hope this helps the decision making! 
-Dave > -jason > -- > Jason Stajich > Duke University > jason@cgt.mc.duke.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > > -- David Block (858)812-1513 Bioinformatics http://www.gnf.org dblock@gnf.org Just ridin' the Coaster... From cjm@fruitfly.bdgp.berkeley.edu Mon Feb 11 17:44:17 2002 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Mon, 11 Feb 2002 09:44:17 -0800 (PST) Subject: [Bioperl-l] bioperl-db - changes In-Reply-To: <3C679625.C51083B7@ebi.ac.uk> Message-ID: I have committed some code to bioperl-db * Fuzzy Locations are now handled, using the location_qualifier_value table added to the biosql-schema during the hackathon. * Optimisations - all the features and locations for a sequence entry are now fetched in a few SQL calls rather than a number of calls proportional to the number of features. * Tidying - a lot of mysqlisms removed or pushed up to the BaseAdaptor layer, to allow for easier postgres support. Added a few generic ease of use methods to BaseAdaptor to more clearly expose the logic in the individual adaptor layer. * DBTestHarness now no longer uses the copy of the schema in the bioperl-db directory. Instead it checks ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't necessarily fit with the cvs re-organisation. How should we do this? An env var seems a bit nasty. --- Chris From heikki@ebi.ac.uk Mon Feb 11 18:14:11 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon, 11 Feb 2002 18:14:11 +0000 Subject: [Bioperl-l] updating the swissprot database References: <20020211164418.57241.qmail@web12202.mail.yahoo.com> Message-ID: <3C6809F3.A52C0E3D@ebi.ac.uk> CAROLINE BARRETTO wrote: > > Hello, > > Does anybody know where I can find a script which > makes an automated updating for the swiss-prot > database ? > I work with GCG and I would like to have these > database updated weekly or every 2 weeks. 
> > Thank you for your help, > > Caroline. Caroline, The way to do it used to be SynChron ftp://ftp.ebi.ac.uk/pub/software/unix/SynCron but that was a long time ago. I am out of touch. Send a mail to support@ebi.ac.uk and ask them. Or ask your national EMBnet node, they are doing this all the time. INFOBIOGEN is running the French EMBnet node and have SWISS-PROT at ftp://ftp.infobiogen.fr/pub/db/swissprot/ Yours, -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From jason@cgt.mc.duke.edu Tue Feb 12 02:59:00 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Mon, 11 Feb 2002 21:59:00 -0500 (EST) Subject: [Bioperl-l] [Bioperl-guts-l] Notification: incoming/1082 (fwd) Message-ID: Dan - You're going to need to get the CDS from the gene object not the exon object - genscan doesn't provide the individual exon sequences, just the full CDS. Even though the exon object has a predicted_cds it doesn't necessarily mean it is filled in (but it is for MZEF prediction parsing). I guess one should be able to infer the sequence based on the exon table, but we don't parse it in that way. Perhaps it would be a good shortcut that someone would like to add? Probably incorperating parts from the below would work (double checking I didn't forget something...). In the meantime I think the following code should work (barring any off-by-one errors I might have accidently forgotten to check here). 
use strict;
use warnings;
use Bio::Tools::Genscan;

my $input = shift or die "usage: $0 genscan_output_file\n";
my $genscan = Bio::Tools::Genscan->new(-file => $input);

while (my $gene = $genscan->next_prediction()) {
    my @exon_arr = $gene->exons();
    # the CDS is attached to the gene, not to the individual exons
    my $predicted_cdna = $gene->predicted_cds();
    # skip predictions for which no CDS was parsed
    next unless defined $predicted_cdna;

    # walk along the CDS, cutting out one exon-sized piece at a time
    # (subseq coordinates are 1-based and inclusive)
    my $first = 1;
    foreach my $exon (@exon_arr) {
        my $start = $first;
        my $end   = $first + $exon->length() - 1;
        my $s     = $predicted_cdna->subseq($start, $end);
        $first += $exon->length();
        print "exon seq is $s for $start..$end\n";
    }
}
$genscan->close();

--- Jason Stajich Duke University jason@cgt.mc.duke.edu ---------- Forwarded message ---------- Date: Mon, 11 Feb 2002 20:19:12 -0500 From: bioperl-bugs@bioperl.org To: bioperl-guts-l@bioperl.org Subject: [Bioperl-guts-l] Notification: incoming/1082 JitterBug notification new message incoming/1082 Message summary for PR#1082 From: dli@tularik.com Subject: predicted_cds method doesn't return a promaryseqI obj Date: Mon, 11 Feb 2002 20:19:11 -0500 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From dli@tularik.com Mon Feb 11 20:19:11 2002 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g1C1JBkO000715 for ; Mon, 11 Feb 2002 20:19:11 -0500 Date: Mon, 11 Feb 2002 20:19:11 -0500 Message-Id: <200202120119.g1C1JBkO000715@pw600a.bioperl.org> From: dli@tularik.com To: bioperl-bugs@bioperl.org Subject: predicted_cds method doesn't return a promaryseqI obj Full_Name: Dan Li Module: Bio::Tools::Prediction::Exon Version: 0.7.2 PerlVer: 5.6.1 OS: linux Submission from: host190.tularik.com (216.88.144.190) When I run the following script using a standard genscan output file as input, method predicted_cds() didn't return a Bio::PrimarySeqI object. Error message: Can't call method "seq" on an undefined value.... #!
/usr/bin/perl use lib '/usr/local/lib/bioperl-0.7.2'; use Bio::Tools::Genscan; $input = shift or die $!; $genscan = Bio::Tools::Genscan->new(-file => $input); while ($gene = $genscan->next_prediction()){ @exon_arr = $gene->exons(); foreach $exon (@exon_arr){ $predicted_cdna = $exon->predicted_cds(); $seq = $predicted_cdna->seq(); print "$seq\n"; } } $genscan->close(); _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-guts-l From hilmarl@yahoo.com Tue Feb 12 07:24:00 2002 From: hilmarl@yahoo.com (Hilmar Lapp) Date: Mon, 11 Feb 2002 23:24:00 -0800 Subject: [Bioperl-l] Re: predicted_cds method doesn't return a promaryseqI obj References: Message-ID: <3C68C310.AA181C71@yahoo.com> Jason Stajich wrote: > > Dan - > > You're going to need to get the CDS from the gene object not the exon > object - genscan doesn't provide the individual exon sequences, just the > full CDS. Even though the exon object has a predicted_cds it doesn't > necessarily mean it is filled in (but it is for MZEF prediction parsing). Correct. The reason MZEF is different is because it only predicts (at least used to predict; not sure this is still true) exons, not genes as a composition of exons. > Subject: predicted_cds method doesn't return a promaryseqI obj In your example, it just returned undef. -hilmar -- ----------------------------------------------------------------- Hilmar Lapp email: hilmarl@yahoo.com San Diego, Ca. 92130 phone: +1 858 812 1757 ----------------------------------------------------------------- _________________________________________________________ Do You Yahoo!? 
Get your free @yahoo.com address at http://mail.yahoo.com From heikki@ebi.ac.uk Tue Feb 12 09:34:48 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 12 Feb 2002 09:34:48 +0000 Subject: [Bioperl-l] Markers and Maps References: Message-ID: <3C68E1B8.56B72692@ebi.ac.uk> Zhiliang Hu wrote: > > Dear Heikki, > > I am very interested in your Bio::Map. However I cannot find it on > CPAN nor www.bioperl.org sites. How may I obtain a copy? Zhiliang, And others I confused by not being explicit, When we on the bioperl list use the term "commit" we mean that we have added files into the bioperl CVS (Concurrent Versions System) repository. Occasionally, a snapshot of the repository is taken and placed into CPAN. We are in the process of making a release 1.0 of bioperl, but right now the only way to access these modules is through CVS. You can use it anonymously (download only), too. See http://cvs.bioperl.org/ > By the way, is your module capable of making comparative maps? The schema, as it is now, consists of classes capable of holding map and marker information, only. The idea is that markers can belong to several maps. There is no logic in there to compute anything out of them. Nor is there any drawing code. All that can be added in due time if there is enough interest and willing hands to type in the code. Yours, -Heikki > Best regards, > > Zhiliang > > On Mon, 11 Feb 2002, Heikki Lehvaslaiho wrote: > > > > > I've committed a set of reworked Bio::Map classes. The underlying logic > > behind existing classes was rewritten. All old tests pass and more. > > I also put in new classes for managing and comparing cytogenetic locations > > of type '2p13.1-q12' or 'Xq' or simply 'Y'. > > > > > > This description is in Bio::Map::Marker.pm: > > ---------------------------------------------------------------------- > > A Marker is a central object in Bio::Map name space. A Map is a holder > > class for objects. A Marker has a Position in a Map.
A Marker can be > > compared to an other Markers using boolean methods. Positions can have > > non-numeric values or other methods to store the locations, so they > > have a method numeric() which does the conversion. > > > > A Marker has a convinience method position() which is able to create > > Positions of required class from scalars by calling method > > get_position_object(). > > > > For more complex situations, a Marker can have multiple positions in > > multiple Maps. It is therefore possible to extract Positions (all or > > belonging to certain Map) and compare Markers to them. It is up to the > > programmer to make sure position values and Maps they belong to can be > > sensibly compared. > > ---------------------------------------------------------------------- > > > > A dia UML model of the current setup is in models/bio_map.dia. > > > > Enjoy, > > -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From birney@ebi.ac.uk Tue Feb 12 13:57:17 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 12 Feb 2002 13:57:17 +0000 (GMT) Subject: [Bioperl-l] 1.0alpha this weekend? Message-ID: With Peter's and Brian's documentation fixes in I would like to propose a 1.0alpha release this coming weekend. (a) Could code reviewers (myself included) review code (b) Jason/Mark --- are the issues with SearchIO resolved? (c) I would like to propose removing Bio::Tools::BLAST and replacing it with a module which simply throws an exception on new describing how to use the SearchIO system (d) Lincoln - you said you wanted to run all of genbank through the SeqIO system? any other thoughts out there? 
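To make point (c) concrete, here is a minimal sketch of a module that refuses to be constructed and points the caller at SearchIO instead. The package name My::Retired::BLAST and the wording of the message are made up for illustration, and the sketch uses a plain die so that it runs without Bioperl installed; a real replacement would presumably throw through the usual Bioperl exception mechanism.

```perl
#!/usr/bin/perl
# Hypothetical "tombstone" module. The package name and message text are
# illustrative assumptions, not the actual Bio::Tools::BLAST replacement.
use strict;
use warnings;

package My::Retired::BLAST;

sub new {
    my ($class, @args) = @_;
    # Refuse to build an object and tell the caller what to use instead.
    die "$class has been retired; parse reports with Bio::SearchIO instead, "
      . "e.g. Bio::SearchIO->new(-format => 'blast', -file => \$file)\n";
}

package main;

# Old-style callers now fail immediately with an explicit message.
my $obj   = eval { My::Retired::BLAST->new(-file => 'report.bls') };
my $error = $@;
print "constructor refused: $error" unless defined $obj;
```

Failing in new, rather than at load time, means scripts that merely "use" the old module still compile, and the message only appears where the retired parser is actually instantiated.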
----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From nzaitlen@hotmail.com Tue Feb 12 14:52:33 2002 From: nzaitlen@hotmail.com (Noah Zaitlen) Date: Tue, 12 Feb 2002 06:52:33 -0800 Subject: [Bioperl-l] getting non-seq data from NT_ files Message-ID: I followed this note for GenBank.pm Note that when querying for GenBank accessions starting with 'NT_' you will need to call $gb->request_format('fasta') beforehand, because in GenBank format (the default) the sequence part will be left out (the reason is that NT contigs are rather annotation with references to clones). However, this only gives the data from a fasta file. I would like the other data included in the genbank file. I.E. I would like the genbank file listed at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?cmd=&txt=&save=0&cfm=on&query_key=2&db=nucleotide&view=gb I don't mind if the sequence is missing b/c that data is available using the Note above. I tried $seq = $gb->get_Seq_by_gi('15294447'); and I get the following error message -------------------- WARNING --------------------- MSG: CONTIG found. GenBank get_Stream_by_batch about to run. --------------------------------------------------- Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 288. Use of uninitialized value in string ne at /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 288. Can't call method "subseq" on an undefined value at /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 301. Does anyone know a way around this? Thanks, Noah Z.
_________________________________________________________________ MSN Photos is the easiest way to share and print your photos: http://photos.msn.com/support/worldwide.aspx From jason@cgt.mc.duke.edu Tue Feb 12 15:48:54 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 12 Feb 2002 10:48:54 -0500 (EST) Subject: [Bioperl-l] Re: getting non-seq data from NT_ files In-Reply-To: Message-ID: Since NT are refseqs not normal genbank entries try using the new Bio::DB::RefSeq handle. I *think* it might provide what you need, but I don't use it very often. Documentation should be fixed in the DB::GenBank to reflect the new option. Otherwise I'd suggest looking at the Bio::DB::NCBIHelper/DB::GenBank code and seeing if you can write some better dispatch methods to handle the case where you just want the annotation. Something like a '-noseq' option. -jason On Tue, 12 Feb 2002, Noah Zaitlen wrote: > I followed this note for GenBank.pm > > Note that when querying for GenBank accessions starting with 'NT_' you > will need to call $gb->request_format('fasta') beforehand, because in > GenBank format (the default) the sequence part will be left out (the > reason is that NT contigs are rather annotation with references to > clones). > > However, this only gives the data from a fasta file. I would like the other > data from the included in the genbank file. I.E. I would like the genbank > file listed at > > http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?cmd=&txt=&save=0&cfm=on&query_key=2&db=nucleotide&view=gb > > I don't mind if the sequence is missing b/c that data is available using the > Note above. > > I tried > $seq = $gb->get_Seq_by_gi('15294447'); > > and I get the following error message > -------------------- WARNING --------------------- > MSG: CONTIG found. GenBank get_Stream_by_batch about to run. > --------------------------------------------------- > Use of uninitialized value in string ne at > /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 288.
> Use of uninitialized value in string ne at > /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 288. > Can't call method "subseq" on an undefined value at > /usr/lib/perl5/site_perl/5.6.0/Bio/DB/NCBIHelper.pm line 301. > > Does anyone know a way around this? > > Thanks, > Noah Z. > > > _________________________________________________________________ > MSN Photos is the easiest way to share and print your photos: > http://photos.msn.com/support/worldwide.aspx > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From heikki@ebi.ac.uk Tue Feb 12 17:32:16 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Tue, 12 Feb 2002 17:32:16 +0000 Subject: [Bioperl-l] getting non-seq data from NT_ files References: Message-ID: <3C6951A0.81005B59@ebi.ac.uk> Noah et al, I was confused. RefSeq documentation (in places) still claims that NT_ files are part of the RefSeq database. There is a pointer at the NCBI FTP server to the genomes section where these files are. ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/ There is one NT_ file per human chromosome. That explains why these are not part of any common database distribution. Each file is megabytes long, so there is no simple way of displaying them. If you need them, sequence in various formats or without sequences, they are all there. As an exercise, one could try to download one of them and try the genbank parser on it. Make sure you have a machine with lots of memory! -Heikki From bdesany@bcm.tmc.edu Tue Feb 12 19:10:22 2002 From: bdesany@bcm.tmc.edu (Brian Desany) Date: Tue, 12 Feb 2002 13:10:22 -0600 Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: <200202121705.g1CH5SkO011778@pw600a.bioperl.org> Message-ID: <000001c1b3f8$ec1142a0$e0d1f980@bad2k> I don't see these changes when I do "cvs -n -q update" or "cvs status" or go to WebCVS - is there normally some kind of a delay or is there some other cvs command I need to use to find out which files have been changed?
Do I need to specify a particular branch maybe? Thanks, -Brian. > > I have committed some code to bioperl-db > > * Fuzzy Locations are now handled, using the location_qualifier_value > table added to the biosql-schema during the hackathon. > > * Optimisations - all the features and locations for a > sequence entry are > now fetched in a few SQL calls rather than a number of calls > proportional > to the number of features. > > * Tidying - a lot of mysqlisms removed or pushed up to the BaseAdaptor > layer, to allow for easier postgres support. Added a few > generic ease of > use methods to BaseAdaptor to more clearly expose the logic in the > individual adaptor layer. > > * DBTestHarness now no longer uses the copy of the schema in the > bioperl-db directory. Instead it checks > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > necessarily fit > with the cvs re-organisation. How should we do this? An env > var seems a > bit nasty. > > --- > Chris > From cjm@fruitfly.bdgp.berkeley.edu Tue Feb 12 19:16:32 2002 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Tue, 12 Feb 2002 11:16:32 -0800 (PST) Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: <000001c1b3f8$ec1142a0$e0d1f980@bad2k> Message-ID: Hmm, I just committed on the main branch - this is in the bioperl-db project remember. On Tue, 12 Feb 2002, Brian Desany wrote: > I don't see these changes when I do "cvs -n -q update" or "cvs status" or go > to WebCVS - is there normally some kind of a delay or is there some other > cvs command I need to use to find out which files have been changed? Do I > need to specify a particular branch maybe? > > Thanks, > -Brian. > > > > > I have committed some code to bioperl-db > > > > * Fuzzy Locations are now handled, using the location_qualifier_value > > table added to the biosql-schema during the hackathon. 
> > > > * Optimisations - all the features and locations for a > > sequence entry are > > now fetched in a few SQL calls rather than a number of calls > > proportional > > to the number of features. > > > > * Tidying - a lot of mysqlisms removed or pushed up to the BaseAdaptor > > layer, to allow for easier postgres support. Added a few > > generic ease of > > use methods to BaseAdaptor to more clearly expose the logic in the > > individual adaptor layer. > > > > * DBTestHarness now no longer uses the copy of the schema in the > > bioperl-db directory. Instead it checks > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > necessarily fit > > with the cvs re-organisation. How should we do this? An env > > var seems a > > bit nasty. > > > > --- > > Chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From bdesany@bcm.tmc.edu Tue Feb 12 19:48:40 2002 From: bdesany@bcm.tmc.edu (Brian Desany) Date: Tue, 12 Feb 2002 13:48:40 -0600 Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: Message-ID: <000101c1b3fe$45ffc660$e0d1f980@bad2k> I'm doing a "cvs status" and getting this: >cvs status cvs server: Examining . =================================================================== File: BUGS Status: Up-to-date Working revision: 1.2 Repository revision: 1.2 /home/repository/bioperl/bioperl-db/BUGS,v Sticky Tag: HEAD (revision: 1.2) Sticky Date: (none) Sticky Options: (none) =================================================================== etc..... So it _seems_ like I'm looking in the right spot for the right files (correct me if I'm wrong).
Also, since I'm not a cvs expert, I'll tell you that these two commands bring me the same (old) files (after logging in anonymously): cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r HEAD bioperl-db cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co bioperl-db On the other hand, "cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r MAIN bioperl-db" fails because MAIN isn't a tag (so it tells me). Am I just flat out issuing the wrong checkout command? I don't do this too often... -Brian. > -----Original Message----- > From: Chris Mungall [mailto:cjm@fruitfly.bdgp.berkeley.edu] > Sent: Tuesday, February 12, 2002 1:17 PM > To: Brian Desany > Cc: bioperl-l@bioperl.org > Subject: Re: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) > > > > Hmm, I just committed on the main branch - this is in the bioperl-db > project remember. > > On Tue, 12 Feb 2002, Brian Desany wrote: > > > I don't see these changes when I do "cvs -n -q update" or > "cvs status" or go > > to WebCVS - is there normally some kind of a delay or is > there some other > > cvs command I need to use to find out which files have been > changed? Do I > > need to specify a particular branch maybe? > > > > Thanks, > > -Brian. > > > > > > > > I have committed some code to bioperl-db > > > > > > * Fuzzy Locations are now handled, using the > location_qualifier_value > > > table added to the biosql-schema during the hackathon. > > > > > > * Optimisations - all the features and locations for a > > > sequence entry are > > > now fetched in a few SQL calls rather than a number of calls > > > proportional > > > to the number of features. > > > > > > * Tidying - a lot of mysqlisms removed or pushed up to > the BaseAdaptor > > > layer, to allow for easier postgres support. Added a few > > > generic ease of > > > use methods to BaseAdaptor to more clearly expose the logic in the > > > individual adaptor layer. 
> > > > > > * DBTestHarness now no longer uses the copy of the schema in the > > > bioperl-db directory. Instead it checks > > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > > necessarily fit > > > with the cvs re-organisation. How should we do this? An env > > > var seems a > > > bit nasty. > > > > > > --- > > > Chris > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > From Radim7@atlas.cz Tue Feb 12 17:03:02 2002 From: Radim7@atlas.cz (Radim7@atlas.cz) Date: , 11 2002 15:26:47 Subject: [Bioperl-l] Summer Job Message-ID: <200202122201.g1CM1wkO014044@pw600a.bioperl.org> This is a multipart MIME message. --= Multipart Boundary Feb11021526 Content-Type: text/plain; charset="DEFAULT_CHARSET" Content-Transfer-Encoding: 7bit From Radim7@atlas.cz Mon Feb 11 15:27:32 2002 From: Radim7@atlas.cz (Radim7@atlas.cz) Date: Mon, 11 Feb 2002 15:27:32 Subject: [Bioperl-l] Summer Job Message-ID: <200202122216.g1CMGVkO014383@pw600a.bioperl.org> This is a multipart MIME message. --= Multipart Boundary Feb11021527 Content-Type: text/plain; charset="DEFAULT_CHARSET" Content-Transfer-Encoding: 7bit Dear Sir, I'm a University student from Europe and I'm trying to find myself a summer job (4 months) in the United States and consequently a job I would start with after I finish my studies. I would like to kindly ask whether there might be an opportunity working for you. I do not need any sponsorship. I will be fully eligible to work in the United States. You are welcome to review my resume that is attached as a Word document. Thank you very much for your time. 
Best Regards Radim Kupka --= Multipart Boundary Feb11021527 Content-Type: application/octet-stream; name="RadimKResume.Doc" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="RadimKResume.Doc"
cGFye1wqXHBuIFxwbmx2bGNvbnRcaWx2bDBcbHMwXHBucm5vdDBccG5kZWMg fVxmYWF1dG9ccmluMFxsaW4wXGl0YXAwIHtcYlxpXGZzMjBcbGFuZzEwMzNc bGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIFByb2plY3Rpb24gRGV2aWNlcw0KXHBh ciB9XHBhcmQgXHFsIFxsaTBccmkwXG5vd2lkY3RscGFye1wqXHBuIFxwbmx2 bGNvbnRcaWx2bDBcbHMwXHBucm5vdDBccG5kZWMgfVxmYWF1dG9ccmluMFxs aW4wXGl0YXAwIHtcYlxpXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5n bnAxMDMzIA0KXHBhciB7XHBudGV4dFxwYXJkXHBsYWluXGYzXGZzMjBcbGFu ZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIFxsb2NoXGFmM1xkYmNoXGFm MFxoaWNoXGYzIFwnYjdcdGFifX1ccGFyZCBccWwgXGZpLTM2MFxsaTM2MFxy aTBcbm93aWRjdGxwYXJcdHgzNjB7XCpccG4gXHBubHZsYmx0XGlsdmwwXGxz MVxwbnJub3QwXHBuZjNccG5pbmRlbnQzNjAge1xwbnR4dGIgXCdiN319XGZh YXV0b1xsczFccmluMFxsaW4zNjBcaXRhcDAgew0KXGZzMjBcbGFuZzEwMzNc bGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIEV4cGVyaWVuY2Ugd2l0aCBpbnN0YWxs aW5nLCBjb25uZWN0aW5nLCBhbmQgb3BlcmF0aW5nIHByb2plY3RvcnMgb2Yg dGhlIGZvbGxvd2luZyBtYWtlczoNClxwYXIgfVxwYXJkIFxxbCBcbGkzNjBc cmkwXG5vd2lkY3RscGFye1wqXHBuIFxwbmx2bGNvbnRcaWx2bDBcbHMwXHBu cm5vdDBccG5kZWMgfVxmYWF1dG9ccmluMFxsaW4zNjBcaXRhcDAge1xmczIw XGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAoSU5GT0NVUywgU0FO WU8sIEhJVEFDSEkpDQpccGFyIHtccG50ZXh0XHBhcmRccGxhaW5cZjNcZnMy MFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgXGxvY2hcYWYzXGRi Y2hcYWYwXGhpY2hcZjMgXCdiN1x0YWJ9fVxwYXJkIFxxbCBcZmktMzYwXGxp MzYwXHJpMFxub3dpZGN0bHBhclx0eDM2MHtcKlxwbiBccG5sdmxibHRcaWx2 bDBcbHMxXHBucm5vdDBccG5mM1xwbmluZGVudDM2MCB7XHBudHh0YiBcJ2I3 fX1cZmFhdXRvXGxzMVxyaW4wXGxpbjM2MFxpdGFwMCB7DQpcZnMyMFxsYW5n MTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgUGFydGljaXBhdGlvbiwgb3Bl cmF0aW9uIG9mIHByb2plY3Rpb24gZGV2aWNlcywgdGVjaG5pY2FsIHN1cHBv cnQgYW5kIG9yZ2FuaXphdGlvbiBtYW5hZ2VtZW50IGF0IElOVkVYLCB0aGUg aW50ZXJuYXRpb25hbCBpbmZvcm1hdGlvbiB0ZWNobm9sb2d5IHRyYWRlIHNo b3cgaW4gSHJhZGVjIEtyfXtcZnMyMCBcJ2UxbG92XCdlOS59ew0KXGZzMjBc bGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzICANClxwYXIge1xwbnRl eHRccGFyZFxwbGFpblxmM1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFu Z25wMTAzMyBcbG9jaFxhZjNcZGJjaFxhZjBcaGljaFxmMyBcJ2I3XHRhYn19 
XHBhcmQgXHFsIFxmaS0zNjBcbGkzNjBccmkwXG5vd2lkY3RscGFyXHR4MzYw e1wqXHBuIFxwbmx2bGJsdFxpbHZsMFxsczFccG5ybm90MFxwbmYzXHBuaW5k ZW50MzYwIHtccG50eHRiIFwnYjd9fVxmYWF1dG9cbHMxXHJpbjBcbGluMzYw XGl0YXAwIHsNClxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAz MyBJbnN0YWxsYXRpb25zIG9mIHByb2plY3Rpb24gZGV2aWNlcyBhbmQgd2ly ZWxlc3MgdmlkZW8gdHJhbnNmZXIgdW5pdHMuIE9uZSBvZiB0aGVzZSBpbnN0 YWxhdGlvbnMgaW5jbHVkZWQgYSBwcm9qZWN0IGluIHRoZSBmYWN1bHR5IGhv c3BpdGFsIGluIEhyYWRlYyBLclwnZTFsb3ZcJ2U5IHdoZXINCmUgdGhlIGV5 ZSBzdXJnZXJ5IGhhZCBiZWVuIGNhcHR1cmVkIG9uIHRoZSBjYW1lcmEsIHNl bnQgd2lyZWxlc3MgdG8gYSByZW1vdGUgY2xhc3Nyb29tIGFuZCBwcm9qZWN0 ZWQgYXMgYSBsYXJnZSBtb3ZpZSBwaWN0dXJlIHRvIHRoZSBzdHVkZW50cy4g VGhpcyBhbHNvIGluY2x1ZGVkIGEgd2lyZWxlc3Mgc291bmQgdHJhbnNmZXIg YXMgdGhlIHN1cmdlb24gd2FzIGNvbW1lbnRpbmcgdGhlIG9wZXJhdGlvbi4N ClxwYXIgfVxwYXJkIFxxbCBcbGkwXHJpMFxub3dpZGN0bHBhcntcKlxwbiBc cG5sdmxjb250XGlsdmwwXGxzMFxwbnJub3QwXHBuZGVjIH1cZmFhdXRvXHJp bjBcbGluMFxpdGFwMCB7XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5n bnAxMDMzIA0KXHBhciANClxwYXIgfXtcYlxpXGZzMjBcbGFuZzEwMzNcbGFu Z2ZlMTAyOVxsYW5nbnAxMDMzIFZpZGVvIEVkaXRpbmcNClxwYXIgDQpccGFy IHtccG50ZXh0XHBhcmRccGxhaW5cZjNcZnMyMFxsYW5nMTAzM1xsYW5nZmUx MDI5XGxhbmducDEwMzMgXGxvY2hcYWYzXGRiY2hcYWYwXGhpY2hcZjMgXCdi N1x0YWJ9fVxwYXJkIFxxbCBcZmktMzYwXGxpMzYwXHJpMFxub3dpZGN0bHBh clx0eDM2MHtcKlxwbiBccG5sdmxibHRcaWx2bDBcbHMxXHBucm5vdDBccG5m M1xwbmluZGVudDM2MCB7XHBudHh0YiBcJ2I3fX1cZmFhdXRvXGxzMVxyaW4w XGxpbjM2MFxpdGFwMCB7DQpcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxh bmducDEwMzMgR3JhYmJpbmcgdmlkZW8sIG9mZmxpbmUgdmlkZW8gZWRpdGlu ZywgdHJhbnNpdGlvbnMsIHRpdGxlcywgZWZmZWN0cy4NClxwYXIge1xwbnRl eHRccGFyZFxwbGFpblxmM1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFu Z25wMTAzMyBcbG9jaFxhZjNcZGJjaFxhZjBcaGljaFxmMyBcJ2I3XHRhYn19 XHBhcmQgXHFsIFxmaS0zNjBcbGkzNjBccmkwXG5vd2lkY3RscGFyXHR4MzYw e1wqXHBuIFxwbmx2bGJsdFxpbHZsMFxsczFccG5ybm90MFxwbmYzXHBuaW5k ZW50MzYwIHtccG50eHRiIFwnYjd9fVxmYWF1dG9cbHMxXHJpbjBcbGluMzYw XGl0YXAwIHsNClxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAz 
MyBWaWRlb0NEIGFuZCBEVkQgb3B0aW1pemF0aW9uLCBwcm9jZXNzaW5nIHZp ZGVvICYgc291bmQgZmlsZXM7DQpccGFyIH1ccGFyZCBccWwgXGZpMzAwXGxp MFxyaTBcbm93aWRjdGxwYXJcZmFhdXRvXHJpbjBcbGluMFxpdGFwMCB7XGZz MjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIA0KXHBhciB9XHBh cmQgXHFsIFxsaTBccmkwXG5vd2lkY3RscGFyXGZhYXV0b1xyaW4wXGxpbjBc aXRhcDAge1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAN ClxwYXIgIH17XGJcaVxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25w MTAzMyBQcm9ncmFtbWluZw0KXHBhciB9e1xmczIwXGxhbmcxMDMzXGxhbmdm ZTEwMjlcbGFuZ25wMTAzMyAgDQpccGFyIH1ccGFyZCBccWwgXGxpMFxyaTBc bm93aWRjdGxwYXJcdHg0NTM2XGZhYXV0b1xyaW4wXGxpbjBcaXRhcDAge1xm czIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAgQmFzaWMsIEJv cmxhbmQgUGFzY2FsLCB9e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxs YW5nbnAxMDMzIEJvcmxhbmQgRGVscGhpIDYgLCBCb3JsYW5kICBDKyt9e1xm czIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAgKHdpbjMyKSwg IH17DQpcYlxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyBI VE1MLCBBU1AufXtcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEw MzMgDQpccGFyIH1ccGFyZCBccWwgXGxpMFxyaTBcbm93aWRjdGxwYXJcZmFh dXRvXHJpbjBcbGluMFxpdGFwMCB7XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAy OVxsYW5nbnAxMDMzICANClxwYXIgDQpccGFyICB9e1xiXGlcZnMyMFxsYW5n MTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgV2ViIERlc2lnbn17XGZzMjBc bGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIA0KXHBhciAgDQpccGFy ICB9e1xpXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIEV4 YW1wbGVzIG9mIG15ICB3b3JrICh3ZWIgZGVzaWduaW5nKSBjYW4gYmUgc2Vl bn17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzICBhdCAg ICAgfXtcYlxmczIyXHVsXGNmMlxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmdu cDEwMzMgd3d3LmNvbWV4LmN6fXtcYlxmczIyXGxhbmcxMDMzXGxhbmdmZTEw MjlcbGFuZ25wMTAzMyANClxwYXIgfXtcZnMyMFxsYW5nMTAzM1xsYW5nZmUx MDI5XGxhbmducDEwMzMgDQpccGFyIA0KXHBhciB9e1xiXGlcZnMyMFxsYW5n MTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgQXBsaWNhdGlvbiBFeHBlcmll bmNlDQpccGFyIA0KXHBhciB9XHBhcmQgXHFsIFxmaS0xNDEwXGxpMTQxNlxy aTBcbm93aWRjdGxwYXJcZmFhdXRvXHJpbjBcbGluMTQxNlxpdGFwMCB7XGJc ZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgT2ZmaWNlXHRh 
YiB9e1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyANCk1T IFdPUkQsIE1TIEVYQ0VMLCBNUyBQb3dlclBvaW50LCBNUyBBQ0NFU1MsIFJl Y29nbml0YSwgT3V0bG9vayBFeHBsb3JlciwgTmV0c2NhcGUgLCBJQ1EsIE1T SUU1IGFuZCBtYW55IG90aGVycy4NClxwYXIgDQpccGFyIH1ccGFyZCBccWwg XGxpMFxyaTBcbm93aWRjdGxwYXJcZmFhdXRvXHJpbjBcbGluMFxpdGFwMCB7 XGJcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgTmV0d29y a3NcdGFiIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMz IFdpbnByb3h5LCBBcGFjaGUsIE5ldCBVdGlscywgUEMgQW55d2hlcmUsIFRp bWJ1a3R1LCBvdGhlciBGVFAgYW5kIFdFQiBzZXJ2ZXJzLCBJSVMuDQpccGFy IA0KXHBhciB9e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAx MDMzIEdyYXBoaWMgMkRcdGFiIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAy OVxsYW5nbnAxMDMzIEFkb2JlIFBob3Rvc2hvcCwgVWxlYWQgUGhvdG9JbWFn ZSwgT2x5bXB1cyBDYW1lZGlhIFZpZGVvIEVkaXRpbmcuDQpccGFyIA0KXHBh ciB9e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIEdy YXBoaWMgM0RcdGFiIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5n bnAxMDMzIDNEIEFuaW1hdGlvbnM6IEJyeWNlM0QsIFJoaW5vLCBTb2Z0aW1h Z2UgMy43LCAzLjgsIFRydWVTcGFjZSA0LjIuDQpccGFyIA0KXHBhciB9XHBh cmQgXHFsIFxmaS0xNDEwXGxpMTQxMFxyaTBcbm93aWRjdGxwYXJcZmFhdXRv XHJpbjBcbGluMTQxMFxpdGFwMCB7XGJcZnMyMFxsYW5nMTAzM1xsYW5nZmUx MDI5XGxhbmducDEwMzMgQXVkaW9cdGFiIFx0YWIgfXtcZnMyMFxsYW5nMTAz M1xsYW5nZmUxMDI5XGxhbmducDEwMzMgQ29vbCBFZGl0IFBybywgV2luYW1w LCANClxwYXIgfVxwYXJkIFxxbCBcbGkwXHJpMFxub3dpZGN0bHBhclxmYWF1 dG9ccmluMFxsaW4wXGl0YXAwIHtcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5 XGxhbmducDEwMzMgDQpccGFyIH17XGJcZnMyMFxsYW5nMTAzM1xsYW5nZmUx MDI5XGxhbmducDEwMzMgVmlkZW9cdGFiIFx0YWIgfXtcZnMyMFxsYW5nMTAz M1xsYW5nZmUxMDI5XGxhbmducDEwMzMgVWxlYWQgIE1lZGlhIFN0dWRpbyBQ cm8sIEFkb2JlIFByZW1pZXJlLCBBdmlkIENpbmVtYS4NClxwYXIgDQpccGFy IH17XGJcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgUHJv Z3JhbW1pbmdcdGFiIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5n bnAxMDMzIFRQNywgQm9ybGFuZCBDKysgYnVpbGRlciA1LCBCb3JsYW5kIERl bHBoaSA2LCBIb21lc2l0ZQ0KXHBhciANClxwYXIgfXtcYlxmczIwXGxhbmcx MDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyBGaWxlIG1nbXQuXHRhYiB9e1xm 
czIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyBXaW5kb3dzIENv bW1hbmRlciwgVm9sY292IGNvbW1hbmRlciwgTm9ydG9uIENvbW1hbmRlciwg TWlkbmlnaHQgY29tbWFuZGVyDQpccGFyIH1ccGFyZCBccWwgXGZpNzA4XGxp NzA4XHJpMFxub3dpZGN0bHBhclxmYWF1dG9ccmluMFxsaW43MDhcaXRhcDAg e1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAoTGludXgp LCBTYWxhbWFuZGVyLCBtNjAyOyBDbGVhblN3ZWVwIFBhcnRpdGlvbk1hZ2lj LCBGZGlzaywgRmRpc2sgKExpbnV4KQ0KXHBhciAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgIA0KXHBhciB9XHBhcmQgXHFs IFxsaTBccmkwXG5vd2lkY3RscGFyXGZhYXV0b1xyaW4wXGxpbjBcaXRhcDAg e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIENELUJ1 cm5pbmdcdGFiIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAx MDMzIEVhc3kgQ0QgY3JlYXRvciwgVG9HTzsgIE5lcm8gICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg DQpccGFyICAgICAgICAgICAgICAgICAgICAgICANClxwYXIgIA0KXHBhciB9 e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIFV0aWxp dGllc1x0YWIgXHRhYiB9e1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFu Z25wMTAzMyBOb3J0b24gVXRpbGl0aWVzIChhbGwgdmVyc2lvbnMpIDsgTm9y dG9uIEFudGl2aXJ1cyBhbmQgbWFueSBvdGhlcnMNClxwYXIgDQpccGFyIA0K XHBhciB9XHBhcmQgXHFsIFxsaTBccmkwXGtlZXBuXG5vd2lkY3RscGFyXGZh YXV0b1xyaW4wXGxpbjBcaXRhcDAge1xiXGZzMjJcbGFuZzEwMzNcbGFuZ2Zl MTAyOVxsYW5nbnAxMDMzIFdPUksgRVhQRVJJRU5DRQ0KXHBhciB9XHBhcmQg XHFsIFxsaTBccmkwXG5vd2lkY3RscGFyXGZhYXV0b1xyaW4wXGxpbjBcaXRh cDAge1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyANClxw YXIgfXtcYlxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAzMyAx OTk1IC0gcHJlc2VudCBcdGFiIFBhcnQgdGltZSBqb2IgYXQgQ29tZXggKGZh bWlseSBidXNpbmVzcykgaW4gbWFya2V0aW5nIGFuZCBhcyBJVCB0ZWNobmlj aWFuDQpccGFyIA0KXHBhciB9e1xmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlc bGFuZ25wMTAzMyBDb250YWN0aW5nIGN1c3RvbWVycywgaW50cm9kdWNpbmcg bmV3IHByb2R1Y3RzLCBhZHZlcnRpc2luZywgZG9pbmcgZ3JhcGhpY2FsIGQN CmVzaWduIGZvciBhZHZlcnRpc2luZyBjYW1wYWlnbnMgKGJyb2NodXJlcywg cG9zdGVycywgYmFubmVycywgbW91c2UgcGFkcywgVC1zaGlydHMsIHBlbmNp bHMgYW5kIGV0Yy4pLCBkZWFsaW5nIHdpdGggY3VzdG9tZXJzLCBoZWxwaW5n 
IHdpdGggcHJpY2luZyBwb2xpY3kgYW5kIG1hcmtldGluZyBzdHJhdGVneS4g RXhwZXJpZW5jZSBpbiB0cmFkaW5nIHdpdGggb3ZlcnNlYXMsIGRlYWxpbmcg d2l0aCBvdXIgVS5TLiBwYXJ0bmVycy4NClxwYXIgDQpccGFyIEFzc2VtYmwN CmluZyBkZXNrdG9wIGNvbXB1dGVycywgc2VydmVycywgY29uZmlndXJpbmcg ZGV2aWNlcywgaW5zdGFsbGluZyBvcGVyYXRpbmcgc3lzdGVtKHMpIGFuZCBi b290IG9wdGlvbnMsIGNvbm5lY3RpbmcgcGVyaXBoZXJhbCBkZXZpY2VzLCBp bnN0YWxsaW5nIG9wdGlvbmFsIHNvZnR3YXJlLCBjb25maWd1cmluZyBuZXR3 b3JrIGFuZCBpbnRlcm5ldCBjb25uZWN0aW9ucywgc2V0dGluZyB1cCBJU1Ag YWNjb3VudHMsIHJlZ2lzdGVyaW5nIGRvbWFpbnMsIA0Kc2V0dGluZyBtYWls IG9wdGlvbnMsIG1haWwgZm9yd2FyZGluZywgcG9zc2libGUgdmlydXMgZWxp bWluYXRpb24gYW5kIHRyb3VibGVzaG9vdGluZywgDQpccGFyIA0KXHBhciAN ClxwYXIgfXtcYlxmczIwXGxhbmcxMDMzXGxhbmdmZTEwMjlcbGFuZ25wMTAz MyAxOTk5XHRhYiAgVGhlIEluc3RpdHV0ZSBvZiBDaXZpbCBEZWZlbnNlDQpc cGFyIH17XGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxsYW5nbnAxMDMzIA0K XHBhciBEZXZlbG9wZWQgYSBzcGVjaWFsIGNyYXNoLXBsYW4gc3lzdGVtIHBv d2VyZWQgYnkgTVMgZXhjZWwuIFRoaXMgc3lzdGVtIGlzIHRvIGJlIHVzZWQg YnkgYSBzcGVjaWFsIG1pbGl0YXJ5IHVuaXQgd29ya2luZyB3aXRoIElUIGlu IHRoZSB0ZXJyYWluLiAgVGhpcyBwcm9qZWN0IGlzIGNvbnNpZGVyZWQgVE9Q IFNFQ1JFVCBhbmQgZm9yIHRoaXMgcmVhc29uIEkgY2Fubm90IHJldmVhbCBh bnkgbW9yZSBpbmZvcm1hdGlvbi4NClxwYXIgQXQgdGhlIHNhbWUgdGltZSBJ IGhhdmUgbGVjdHVyZWQgdGhlIEluc3RpdHV0ZSBvZiAgQ2l2aWwgZGVmZW5z ZVxycXVvdGUgcyBlbXBsb3llZXMuIFRoZSBvYmplY3RpdmUgb2YgdGhlc2Ug bGVjdHVyZXMNClxwYXIgd2FzIHRvIGV4cGxhaW4gdGhlIGJhc2ljXHJxdW90 ZSBzIG9mIHRoZSBvcGVyYXRpbmcgc3lzdGVtIGFuZCB0byBsZWFybiBwYXJ0 aWNpcGFudHMgdG8gbWFrZSB0aGVpciBvd24gcHJvZmVzc2lvbmFsIGxvb2tp bmcgcHJlc2VudGF0aW9ucyBpbiBNUyBQb3dlclBvaW50IGFuZCB0byBsZWFy biB0aGVtIGhvdyB0byB1c2UgdGhlIHByZXNlbnRhdGlvbnMuIFBhcnQgb2Yg dGhlIGxlY3R1cmUgd2FzDQpccGFyIGFsc28gYSBwcm9jZXNzIG9mIGFjcXVp cmluZyBhbmQgcHJvY2Vzc2luZyBhcHByb3ByaWF0ZSBkYXRhIGZvciB0aGUg cHJlc2VudGF0aW9uLiBJIGRpZCBnaXZlIGluZGl2aWR1YWwgY29uc3VsdGF0 aW9ucyBhcyB3ZWxsLg0KXHBhciANClxwYXIgDQpccGFyIH17XGJcZnMyMFxs YW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgMTk5OVx0YWIgQXRsYXMg 
KHNlYXJjaCBlbmdpbmUsIFByYWd1ZSksIFNhbGVzIG1hbmFnZXINClxwYXIg fXtcZnMyMFxsYW5nMTAzM1xsYW5nZmUxMDI5XGxhbmducDEwMzMgDQpccGFy IENvbnRhY3RpbmcgY2xpZW50cywgcGFydGljaXBhdGluZyBpbiBtZWV0aW5n cywgaGFuZGxpbmcgcHJpY2luZyBwb2xpY3kgYW5kIGRldmVsb3BpbmcgbmV3 IHNlcnZpY2VzIGFuZCBuZXcgd2F5cyBvZiBhZA0KdmVydGlzaW5nIG9uIHRo ZSBzZWFyY2ggZW5naW5lLiAgT25lIG9mIHRoZSBuZXdlc3QgaGl0cyBvZiB0 aGUgaW50ZXJuZXQgYWR2ZXJ0aXNpbmcgd2FzIGEgc3lzdGVtIHRoYXQgc2hv d2VkIHRoZSBsb2dvIG9mIHRoZSBjb21wYW55IGFuZCBhIHNtYWxsIGJhbm5l ciBjb21wb3NlZCB0byB0aGUgd2ViIHBhZ2Ugd2hlbiBkaXNwbGF5aW5nIHRo ZSBzZWFyY2ggcmVzdWx0cy4gTWFuYWdpbmcgYWR2ZXJ0aXNpbmcgY2FtcGFp Z25zLCBoYW5kbGluZyBhDQpkdmVydGlzaW5nIHNvZnR3YXJlIGluIEFTUCwg Y29udHJvbGxpbmcgdGhlIFNlcnZlciBJbXByZXNzaW9uIFN0YXR1cyBhbmQg ZWNvbm9taWNhbGx5IGZpbGxpbmcgYWxsIGVtcHR5IGdhcHMgYnkgdGVtcG9y YXJ5IGNhbXBhaWducy4gVmVyeSBjbG9zZSBjb29wZXJhdGlvbiB3aXRoIG1h cmtldGluZyBtYW5hZ2VycyBhbmQgcHJvamVjdCBtYW5hZ2Vycy4NClxwYXIg DQpccGFyIA0KXHBhciB9e1xiXGZzMjBcbGFuZzEwMzNcbGFuZ2ZlMTAyOVxs YW5nbnAxMDMzIDE5ODAtMjAwMFx0YWIgVC1TT0ZUIC0gQ29udHJhY3RlZCB0 byBkZXZlbG9wIGFuIEV4cGVydCBTeXN0ZW19e1xmczIwXGxhbmcxMDMzXGxh bmdmZTEwMjlcbGFuZ25wMTAzMyANClxwYXIgDQpccGFyIFRoaXMgc3lzdGVt IGlzIGJhc2VkIG9uIG1hdGhlbWF0aWNhbCBhcHByb3hpbWF0aW9uLCBtYXRy aXggYW5kIGNvbnZlcnNpb25zIGFuZCBiaW5hcnkgb3BlcmF0aW9ucyB0byBw cmVkaWN0IGFuIGFwcHJveGltYXRlIGJlaGF2aW9yIG9mIGNlcnRhaW4gcGFy dGljbGVzIGluIHNwYWNlIGJhc2VkIG9uIHRoZWlyIHByZXZpb3VzIGNvaGVy ZW5jZSBhbmQgYSBwcm9iYWJpbGl0eSBvZiBvY2N1cnJlbmNlLiAgVGhpcyBz eXN0ZW0gY291bGQgcmUNCnZlYWwgbmV3IG1ldGhvZHMgaW4gbWVkaWNhbCBy ZXNlYXJjaCBhbmQgdGhlIGZpZWxkIG9mIGNoZW1pc3RyeSB0byBmaW5kIHRo ZSBpbnRlcmFjdGlvbiBhbmQgZGVwZW5kZW5jaWVzIGFtb25nIGNoZW1pY2Fs cywgdmlydXNlcywgZGlzZWFzZXMgYW5kIG1pY3JvYmVzLiAgSSBhbHNvIGJl bGlldmUgdGhhdCB0aGlzIHN5c3RlbSBjb3VsZCByZWFjaCBhIHZlcnkgaGln aCBwcmVjaXNpb24gcmF0aW8gd2hlbiB1c2VkIGFzIGEgbmV1cmFsIG5ldHdv cg0Kay4gIE1vcmUgZGV0YWlsZWQgaW5mb3JtYXRpb24gY2FuIGJlIHByb3Zp ZGVkIHVwb24gcmVxdWVzdC4gIA0KXHBhciANClxwYXIgDQpccGFyIH17XGJc 
--= Multipart Boundary Feb11021527-- From andrew@anatomy.otago.ac.nz Tue Feb 12 23:09:00 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Wed, 13 Feb 2002 12:09:00 +1300 Subject: [Bioperl-l] Can bioperl parse homologene files? Message-ID: Hello, Can anyone tell me whether bioperl can be used to parse homologene files available from NCBI? Is this the type of thing bioperl can do? I've had a good look around the list archives, the tutorial etc but don't seem to be able to find anything. Am I missing something?
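[Editorial sketch, not part of the original message.] The sample in this message suggests the file is simple enough to read without a dedicated module. The following is a minimal sketch in Python, not a BioPerl API. It assumes one record per line, a ">" line opening each group, ten pipe-delimited fields per pair record, and "TITLE id=description" lines. The field names (species_1, locuslink_1, and so on) are guesses made from the sample for illustration only; they are not taken from NCBI's documentation.

```python
# Hypothetical field names, inferred from the sample records only.
FIELDS = [
    "species_1", "species_2", "match_type",
    "locuslink_1", "cluster_1", "accession_1",
    "locuslink_2", "cluster_2", "accession_2",
    "percent_identity",
]


def parse_homologene(lines):
    """Group hmlg.trip-style lines into blocks of pair records and titles."""
    groups = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A ">" line opens a new homology group.
            current = {"pairs": [], "titles": {}}
            groups.append(current)
            continue
        if current is None:
            # Tolerate input that lacks a leading ">" line.
            current = {"pairs": [], "titles": {}}
            groups.append(current)
        if line.startswith("TITLE "):
            key, _, description = line[len("TITLE "):].partition("=")
            current["titles"][key.strip()] = description.strip()
        else:
            fields = [f.strip() for f in line.split("|")]
            # Pad short records so every field name gets a value.
            fields += [""] * (len(FIELDS) - len(fields))
            record = dict(zip(FIELDS, fields))
            identity = record["percent_identity"]
            record["percent_identity"] = float(identity) if identity else None
            current["pairs"].append(record)
    return groups
```

Passing an open file handle (or any list of lines) to parse_homologene returns a list of groups, each holding its pair records and a mapping from cluster identifier to title. Records with an empty last column, such as the "c" line in the sample, get a percent_identity of None.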
The file is the hmlg.trip.ftp file and looks like this:
>
Hs|Mm|B|LL.23271 |23585 |AL110158 |LL.67886 |144143 |AK018678 |92.51
Hs|Rn|B|LL.23271 |23585 |BC011385 | |51149 |AI454462 |90.26
Rn|Mm|B| |51149 |AI454462 |LL.67886 |144143 |AV233538 |93.47
TITLE Hs.23585=KIAA1078 KIAA1078 protein
TITLE Mm.144143=1600013L13Rik RIKEN cDNA 1600013L13 gene
TITLE Rn.51149=- ESTs
>
Xl|Dm|B| |1091 |AB045628 | |LL.41094 | |68.27
Dr|Xl|B| |2089 |AI588500 | |1091 |BG363776 |82.52
Hs|Xl|B|LL.23369 |6151 |AF315591 | |1091 |AB045628 |77.98
Mm|Xl|B|LL.80913 |20543 |AY027917 | |1091 |AB045628 |77.72
Rn|Xl|B| |44196 |BF417362 | |1091 |AB045628 |83.62
Rn|Dm|B| |44196 |AI408670 | |LL.41094 | |79.39
Hs|Mm|c|LL.23369 |6151| |LL.80913 |20543 | |
Hs|Mm|B|LL.23369 |6151 |AF315591 |LL.80913 |20543 |AY027917 |93.35
Dr|Dm|B| |2089 |AI588500 | |LL.41094 | |73.41
TITLE Dm.LL.41094=pum pumilio
TITLE Dr.2089=- ESTs, Moderately similar to A46221 abdominal segment formation protein pumilio - fruit fly [D.melanogaster]
TITLE Hs.6151=PUM2 pumilio (Drosophila) homolog 2
TITLE Mm.20543=Pum2 pumilio 2 (Drosophila)
TITLE Rn.44196=- ESTs, Moderately similar to A46221 abdominal segment formation protein pumilio - fruit fly [D.melanogaster]
TITLE Xl.1091=- Xenopus laevis mRNA for pumilio, partial cds
...
Cheers, Andrew.
From jason@cgt.mc.duke.edu Wed Feb 13 00:21:03 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 12 Feb 2002 19:21:03 -0500 (EST) Subject: [Bioperl-l] 1.0alpha this weekend? In-Reply-To: Message-ID: On Tue, 12 Feb 2002, Ewan Birney wrote: > > > With Peter's and Brian's documentation fixes in I would like to propose a > 1.0alpha release this coming weekend. > > > (a) Could code reviewers (myself included) review code > > > (b) Jason/Mark --- are the issues with SearchIO resolved? > yep - he just wanted to be able to reset the iterator - (after i fixed the silly blastn parsing bug).
> > (c) I would like to propose removing Bio::Tools::BLAST and replacing it > with a module which simply throws an exception on new describing how to > use the SearchIO system > yeah - and can we agree the BPlite is in the twilight - we'll plan to provide bug fixes on BPlite but development effort will be focused on SearchIO unless someone REALLY wants to be its maintainer. > > (d) Lincoln - you said you wanted to run all of genbank through the SeqIO > system? > > > > > any other thoughts out there? > There are 26 bugs in the queue; some of them are not going to get done in this release, I suspect, but many of them are straightforward. Would be nice to take a look and see what can get fixed. The highlights of what is in the queue are below; volunteers are needed to test and fix these bugs. Your contribution could just be to provide a reproducible script (and datafile where needed) for the bug. Cross-Platform / Execing programs * (966) Tools::Run::Alignment modules * Platform dependent issues (906) Mac and (1052) Windows+Clustalw. Not sure I want to fix 1052 as suggested. * (986) Tempfiles and cleanup - do we want to do a migration to IO::File from File::Temp??? SeqFeatures * (992) Subdividing a sequence (trunc) and remapping the seqfeature coordinates -- what happens to fuzzies.... we should probably support this by making new fuzzies - this was consensus at hackathon. * (1038) - the Bio::SeqFeature::Gene objects may have a bug? SeqIO * (876) decide if we want to do any PIR support - the module had been updated, not sure it is completely compliant. * (987) SCF bug which should be gone with Chad's new implementation * (1000) - EMBL bug that is fixed on main-branch - can we remove (are we ever going to do another 0.7 series release?) * (1043,1062,1068,1069) genbank parsing, (1071) swissprot writing [ I tested 1043 and it is definitely there] Are we really writing in the new GenBank format - can we really parse the new genbank format properly?
I also may have lost Emmanuel's bug wrt to SwissProt unless Allen fixed it in his SwissProt. Misc * (1039) - Misc. Bio::Tools::SeqPattern bug - is it really a bug? * (1014) - anyone use the Restriction Enzyme pkg and want to check this out? Analysis Result parsing /SearchIO * (1034) - possible HMMer parsing bug (are we going to move HMMer parsing into SearchIO for 1.0?) * (1025) - BPlite parsing issues, BPlite out of memory issues (1039) - probably due to tempfile issues with File::Temp * (1063) - SearchIO blast parsing an empty report - may be fixed already - just need a tester? > > > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From cjm@fruitfly.bdgp.berkeley.edu Wed Feb 13 02:44:37 2002 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Tue, 12 Feb 2002 18:44:37 -0800 (PST) Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: <000101c1b3fe$45ffc660$e0d1f980@bad2k> Message-ID: Oh, I'm just using the default branch rather than HEAD Should I be using HEAD? On Tue, 12 Feb 2002, Brian Desany wrote: > I'm doing a "cvs status" and getting this: > > >cvs status > cvs server: Examining . > =================================================================== > File: BUGS Status: Up-to-date > > Working revision: 1.2 > Repository revision: 1.2 /home/repository/bioperl/bioperl-db/BUGS,v > Sticky Tag: HEAD (revision: 1.2) > Sticky Date: (none) > Sticky Options: (none) > > =================================================================== > etc..... > So it _seems_ like I've looking in the right spot for the right files > (correct me if I'm wrong). 
> > Also, since I'm not a cvs expert, I'll tell you that these two commands > bring me the same (old) files (after logging in anonymously): > > cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r HEAD > bioperl-db > cvs -d :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co bioperl-db > > On the other hand, "cvs -d > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r MAIN bioperl-db" > fails because MAIN isn't a tag (so it tells me). Am I just flat out issuing > the wrong checkout command? I don't do this too often... > > -Brian. > > > -----Original Message----- > > From: Chris Mungall [mailto:cjm@fruitfly.bdgp.berkeley.edu] > > Sent: Tuesday, February 12, 2002 1:17 PM > > To: Brian Desany > > Cc: bioperl-l@bioperl.org > > Subject: Re: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) > > > > > > > > Hmm, I just committed on the main branch - this is in the bioperl-db > > project remember. > > > > On Tue, 12 Feb 2002, Brian Desany wrote: > > > > > I don't see these changes when I do "cvs -n -q update" or > > "cvs status" or go > > > to WebCVS - is there normally some kind of a delay or is > > there some other > > > cvs command I need to use to find out which files have been > > changed? Do I > > > need to specify a particular branch maybe? > > > > > > Thanks, > > > -Brian. > > > > > > > > > > > I have committed some code to bioperl-db > > > > > > > > * Fuzzy Locations are now handled, using the > > location_qualifier_value > > > > table added to the biosql-schema during the hackathon. > > > > > > > > * Optimisations - all the features and locations for a > > > > sequence entry are > > > > now fetched in a few SQL calls rather than a number of calls > > > > proportional > > > > to the number of features. > > > > > > > > * Tidying - a lot of mysqlisms removed or pushed up to > > the BaseAdaptor > > > > layer, to allow for easier postgres support. 
Added a few > > > > generic ease of > > > > use methods to BaseAdaptor to more clearly expose the logic in the > > > > individual adaptor layer. > > > > > > > > * DBTestHarness now no longer uses the copy of the schema in the > > > > bioperl-db directory. Instead it checks > > > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > > > necessarily fit > > > > with the cvs re-organisation. How should we do this? An env > > > > var seems a > > > > bit nasty. > > > > > > > > --- > > > > Chris > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From bdesany@houston.rr.com Wed Feb 13 03:41:08 2002 From: bdesany@houston.rr.com (Brian Desany) Date: Tue, 12 Feb 2002 21:41:08 -0600 Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: Message-ID: <000101c1b440$4ab30f40$327ba8c0@desany2k> > Oh, I'm just using the default branch rather than HEAD > > Should I be using HEAD? I don't know. You're the active developer :) I thought the default was HEAD anyway, which is consistent with there being no difference in the files I get whether I use HEAD or the default. I'll try to dig some more on the relationship between branches (eg MAIN) and tags (eg HEAD) and mail if I figure it out. > On Tue, 12 Feb 2002, Brian Desany wrote: > > > I'm doing a "cvs status" and getting this: > > > > >cvs status > > cvs server: Examining . 
> > =================================================================== > > File: BUGS Status: Up-to-date > > > > Working revision: 1.2 > > Repository revision: 1.2 > /home/repository/bioperl/bioperl-db/BUGS,v > > Sticky Tag: HEAD (revision: 1.2) > > Sticky Date: (none) > > Sticky Options: (none) > > > > =================================================================== > > etc..... > > So it _seems_ like I've looking in the right spot for the > right files > > (correct me if I'm wrong). > > > > Also, since I'm not a cvs expert, I'll tell you that these > two commands > > bring me the same (old) files (after logging in anonymously): > > > > cvs -d > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r HEAD > > bioperl-db > > cvs -d > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co bioperl-db > > > > On the other hand, "cvs -d > > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r > MAIN bioperl-db" > > fails because MAIN isn't a tag (so it tells me). Am I just > flat out issuing > > the wrong checkout command? I don't do this too often... > > > > -Brian. > > > > > -----Original Message----- > > > From: Chris Mungall [mailto:cjm@fruitfly.bdgp.berkeley.edu] > > > Sent: Tuesday, February 12, 2002 1:17 PM > > > To: Brian Desany > > > Cc: bioperl-l@bioperl.org > > > Subject: Re: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) > > > > > > > > > > > > Hmm, I just committed on the main branch - this is in the > bioperl-db > > > project remember. > > > > > > On Tue, 12 Feb 2002, Brian Desany wrote: > > > > > > > I don't see these changes when I do "cvs -n -q update" or > > > "cvs status" or go > > > > to WebCVS - is there normally some kind of a delay or is > > > there some other > > > > cvs command I need to use to find out which files have been > > > changed? Do I > > > > need to specify a particular branch maybe? > > > > > > > > Thanks, > > > > -Brian. 
> > > > > > > > > > > > > > I have committed some code to bioperl-db > > > > > > > > > > * Fuzzy Locations are now handled, using the > > > location_qualifier_value > > > > > table added to the biosql-schema during the hackathon. > > > > > > > > > > * Optimisations - all the features and locations for a > > > > > sequence entry are > > > > > now fetched in a few SQL calls rather than a number of calls > > > > > proportional > > > > > to the number of features. > > > > > > > > > > * Tidying - a lot of mysqlisms removed or pushed up to > > > the BaseAdaptor > > > > > layer, to allow for easier postgres support. Added a few > > > > > generic ease of > > > > > use methods to BaseAdaptor to more clearly expose the > logic in the > > > > > individual adaptor layer. > > > > > > > > > > * DBTestHarness now no longer uses the copy of the > schema in the > > > > > bioperl-db directory. Instead it checks > > > > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > > > > necessarily fit > > > > > with the cvs re-organisation. How should we do this? An env > > > > > var seems a > > > > > bit nasty. > > > > > > > > > > --- > > > > > Chris > > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@bioperl.org > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From heikki@ebi.ac.uk Wed Feb 13 11:23:35 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 13 Feb 2002 11:23:35 +0000 Subject: [Bioperl-l] POD fixed again Message-ID: <3C6A4CB7.FD83F5F@ebi.ac.uk> I've again gone through POD documentation for the whole bioperl-live and fixed dozens of minor bugs. 
They have crept in since the last check a bit more than two weeks ago. If you want to help keep the documentation clean, please run podchecker -warnings -warnings *.pm in the directory you've been working in. -Heikki -- Heikki Lehvaslaiho, heikki@ebi.ac.uk, http://www.ebi.ac.uk/mutations/ EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs. CB10 1SD, United Kingdom. Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 From jason@cgt.mc.duke.edu Wed Feb 13 13:26:14 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Wed, 13 Feb 2002 08:26:14 -0500 (EST) Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: <000101c1b440$4ab30f40$327ba8c0@desany2k> Message-ID: There are only 2 branches on the bioperl-db src - branch-bioperl-072: which is to allow bioperl-db dev to continue with the 0.7 branch (for singapore mostly) until bioperl 1.0 is available and bug free. HEAD: which I suspect you are calling main, and which is where all active dev is focused. One issue at times is that the anonymous CVS is synced from the active dev every two hours, so there can be a lag (of up to 2 hr) between when the code is checked in by a dev and when it appears on the anonymous server. That said, I have not seen Chris's commits on the bioperl-db code yet - either as a msg on the guts list or when looking at the log for the files. Chris, are you absolutely sure that you did a commit in the db directory? -jason On Tue, 12 Feb 2002, Brian Desany wrote: > > > > Oh, I'm just using the default branch rather than HEAD > > > > Should I be using HEAD? > > I don't know. You're the active developer :) > > I thought the default was HEAD anyway, which is consistent with there being > no difference in the files I get whether I use HEAD or the default.
> > I'll try to dig some more on the relationship between branches (eg MAIN) and > tags (eg HEAD) and mail if I figure it out. > > > > > On Tue, 12 Feb 2002, Brian Desany wrote: > > > > > I'm doing a "cvs status" and getting this: > > > > > > >cvs status > > > cvs server: Examining . > > > =================================================================== > > > File: BUGS Status: Up-to-date > > > > > > Working revision: 1.2 > > > Repository revision: 1.2 > > /home/repository/bioperl/bioperl-db/BUGS,v > > > Sticky Tag: HEAD (revision: 1.2) > > > Sticky Date: (none) > > > Sticky Options: (none) > > > > > > =================================================================== > > > etc..... > > > So it _seems_ like I've looking in the right spot for the > > right files > > > (correct me if I'm wrong). > > > > > > Also, since I'm not a cvs expert, I'll tell you that these > > two commands > > > bring me the same (old) files (after logging in anonymously): > > > > > > cvs -d > > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r HEAD > > > bioperl-db > > > cvs -d > > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co bioperl-db > > > > > > On the other hand, "cvs -d > > > :pserver:cvs@cvs.bioperl.org:/home/repository/bioperl co -r > > MAIN bioperl-db" > > > fails because MAIN isn't a tag (so it tells me). Am I just > > flat out issuing > > > the wrong checkout command? I don't do this too often... > > > > > > -Brian. > > > > > > > -----Original Message----- > > > > From: Chris Mungall [mailto:cjm@fruitfly.bdgp.berkeley.edu] > > > > Sent: Tuesday, February 12, 2002 1:17 PM > > > > To: Brian Desany > > > > Cc: bioperl-l@bioperl.org > > > > Subject: Re: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) > > > > > > > > > > > > > > > > Hmm, I just committed on the main branch - this is in the > > bioperl-db > > > > project remember. 
> > > > > > > > On Tue, 12 Feb 2002, Brian Desany wrote: > > > > > > > > > I don't see these changes when I do "cvs -n -q update" or > > > > "cvs status" or go > > > > > to WebCVS - is there normally some kind of a delay or is > > > > there some other > > > > > cvs command I need to use to find out which files have been > > > > changed? Do I > > > > > need to specify a particular branch maybe? > > > > > > > > > > Thanks, > > > > > -Brian. > > > > > > > > > > > > > > > > > I have committed some code to bioperl-db > > > > > > > > > > > > * Fuzzy Locations are now handled, using the > > > > location_qualifier_value > > > > > > table added to the biosql-schema during the hackathon. > > > > > > > > > > > > * Optimisations - all the features and locations for a > > > > > > sequence entry are > > > > > > now fetched in a few SQL calls rather than a number of calls > > > > > > proportional > > > > > > to the number of features. > > > > > > > > > > > > * Tidying - a lot of mysqlisms removed or pushed up to > > > > the BaseAdaptor > > > > > > layer, to allow for easier postgres support. Added a few > > > > > > generic ease of > > > > > > use methods to BaseAdaptor to more clearly expose the > > logic in the > > > > > > individual adaptor layer. > > > > > > > > > > > > * DBTestHarness now no longer uses the copy of the > > schema in the > > > > > > bioperl-db directory. Instead it checks > > > > > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > > > > > necessarily fit > > > > > > with the cvs re-organisation. How should we do this? An env > > > > > > var seems a > > > > > > bit nasty. 
> > > > > > > > > > > > --- > > > > > > Chris > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@bioperl.org > > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From seth.redmond@ic.ac.uk Wed Feb 13 14:43:55 2002 From: seth.redmond@ic.ac.uk (Seth Redmond) Date: Wed, 13 Feb 2002 14:43:55 +0000 Subject: [Bioperl-l] re: database() Message-ID: <1ACA964F-2090-11D6-851F-0003936508E8@ic.ac.uk> I am attempting to parse the database names for each hit in a blast report using subjct_object->database() in blast.pm. However, instead of the database name I get '-' returned for each hit. Is there anywhere I might have gone wrong, anything I should consider or any alternatives to using this method. thanks -s -- ______________________________________________ Seth Redmond DNA resource and Database Curator Wellcome Trust Laboratories for Molecular Parasitology Department of Biological Sciences Imperial College London SW7 2AY ______________________________________________ From dblock@gnf.org Wed Feb 13 16:46:38 2002 From: dblock@gnf.org (David Block) Date: Wed, 13 Feb 2002 08:46:38 -0800 Subject: [Bioperl-l] 1.0alpha this weekend? In-Reply-To: Message-ID: <3F97F3AE-20A1-11D6-A4E7-0003931D9C38@gnf.org> > * (1038) - the Bio::SeqFeature::Gene objects may have a bug? > I checked the submitted script and was unable to reproduce the bug. 
This code should not have changed recently, so I have no idea what the problem is. Works for me, and if this didn't work in Genquire, which Mark has been debugging like crazy, Mark would have jumped all over it. I say go to 1.0alpha from my end! -- David Block (858)812-1513 Bioinformatics http://www.gnf.org dblock@gnf.org Just ridin' the Coaster... From mwilkinson@gene.pbi.nrc.ca Wed Feb 13 17:34:35 2002 From: mwilkinson@gene.pbi.nrc.ca (Mark Wilkinson) Date: Wed, 13 Feb 2002 11:34:35 -0600 Subject: [Bioperl-l] 1.0alpha this weekend? References: <3F97F3AE-20A1-11D6-A4E7-0003931D9C38@gnf.org> Message-ID: <3C6AA3AB.4EF09795@gene.pbi.nrc.ca> I haven't noticed a bug in Bio::SeqFeature::Gene, though we must be careful in that Genquire rolls its own routines in many/most cases... it is true to the BioPerl API, but not the code... At the same time, I have tested SeqCanvas quite extensively, and that uses the bona fide BioPerl routines for gene/transcript/feature creation and manipulation and I haven't seen anything amiss. I am unable to pull up bug 1038 on the tracking system, so I can't see what the report was.... If Dave says "go", I have no reason to think otherwise :-) M David Block wrote: > > * (1038) - the Bio::SeqFeature::Gene objects may have a bug? > > > I checked the submitted script and was unable to reproduce the bug. > This code should not have changed recently, so I have no idea what the > problem is. > > Works for me, and if this didn't work in Genquire, which Mark has been > debugging like crazy, Mark would have jumped all over it. > > I say go to 1.0alpha from my end! > > -- > David Block (858)812-1513 > Bioinformatics http://www.gnf.org > dblock@gnf.org Just ridin' the Coaster... 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l From jason@cgt.mc.duke.edu Wed Feb 13 18:04:14 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Wed, 13 Feb 2002 13:04:14 -0500 (EST) Subject: [Bioperl-l] 1.0alpha this weekend? In-Reply-To: <3C6AA3AB.4EF09795@gene.pbi.nrc.ca> Message-ID: good to hear - I'll update the bug report and check it off ... jason wishing for the whiteboard from the hackathon to start adding ticks.... -j On Wed, 13 Feb 2002, Mark Wilkinson wrote: > I haven't noticed a bug in Bio::SeqFeature::Gene, though we must be > careful in that Genquire rolls its own routines in many/most cases... it > is true to the BioPerl API, but not the code... > > At the same time, I have tested SeqCanvas quite extensively, and that uses > the bona fide BioPerl routines for gene/transcript/feature creation and > manipulation and I haven't seen anything amiss. I am unable to pull up > bug 1038 on the tracking system, so I can't see what the report was.... > > If Dave says "go", I have no reason to think otherwise :-) > > M > > > > David Block wrote: > > > > * (1038) - the Bio::SeqFeature::Gene objects may have a bug? > > > > > I checked the submitted script and was unable to reproduce the bug. > > This code should not have changed recently, so I have no idea what the > > problem is. > > > > Works for me, and if this didn't work in Genquire, which Mark has been > > debugging like crazy, Mark would have jumped all over it. > > > > I say go to 1.0alpha from my end! > > > > -- > > David Block (858)812-1513 > > Bioinformatics http://www.gnf.org > > dblock@gnf.org Just ridin' the Coaster... 
> > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From schan@xenongenetics.com Wed Feb 13 18:42:38 2002 From: schan@xenongenetics.com (Simon Chan) Date: Wed, 13 Feb 2002 10:42:38 -0800 Subject: [Bioperl-l] Phred (Flinstone?) Parser Message-ID: Hi All, Could someone point me to a phred file parser written in perl? What if I wanted to align the sequences from 2 phred files and then compare the quality scores at each position? I started writing a program that'll do this, but then it occured that someone out there has already done this. Many thanks, All. Simon ################## From cjm@fruitfly.bdgp.berkeley.edu Wed Feb 13 18:50:04 2002 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Wed, 13 Feb 2002 10:50:04 -0800 (PST) Subject: [Bioperl-l] RE: bioperl-db - changes (Chris Mungall) In-Reply-To: <000001c1b3f8$ec1142a0$e0d1f980@bad2k> Message-ID: OOPS!!! It turns out I was using the correct cvs branch (default == HEAD), I had just neglected to check for any warning messages after saving my commit message! Sure enough, my changes never went through do to a conflict. Ok, they're really in there this time (may take a wee while to propagate to webcvs) On Tue, 12 Feb 2002, Brian Desany wrote: > I don't see these changes when I do "cvs -n -q update" or "cvs status" or go > to WebCVS - is there normally some kind of a delay or is there some other > cvs command I need to use to find out which files have been changed? Do I > need to specify a particular branch maybe? > > Thanks, > -Brian. 
> > > > > I have committed some code to bioperl-db > > > > * Fuzzy Locations are now handled, using the location_qualifier_value > > table added to the biosql-schema during the hackathon. > > > > * Optimisations - all the features and locations for a > > sequence entry are > > now fetched in a few SQL calls rather than a number of calls > > proportional > > to the number of features. > > > > * Tidying - a lot of mysqlisms removed or pushed up to the BaseAdaptor > > layer, to allow for easier postgres support. Added a few > > generic ease of > > use methods to BaseAdaptor to more clearly expose the logic in the > > individual adaptor layer. > > > > * DBTestHarness now no longer uses the copy of the schema in the > > bioperl-db directory. Instead it checks > > ../biosql-schema/sql/biosqldb-mysql.sql - hmmm, this won't > > necessarily fit > > with the cvs re-organisation. How should we do this? An env > > var seems a > > bit nasty. > > > > --- > > Chris > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From dag@sonsorol.org Thu Feb 14 16:53:32 2002 From: dag@sonsorol.org (chris dagdigian) Date: Thu, 14 Feb 2002 11:53:32 -0500 Subject: [Bioperl-l] [Fwd: [Volunteer] gc_content] Message-ID: <3C6BEB8C.8050008@sonsorol.org> This is a multi-part message in MIME format. 
--------------080400050504060400040007 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit --------------080400050504060400040007 Content-Type: message/rfc822; name="[Volunteer] gc_content" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="[Volunteer] gc_content" Return-Path: Received: from pw600a.bioperl.org (pw600a.bioperl.org [199.93.107.70]) by fedayi.sonsorol.org (8.11.0/8.11.0) with ESMTP id g1EGH4125895 for ; Thu, 14 Feb 2002 11:17:04 -0500 (EST) Received: from pw600a.bioperl.org (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g1EGC2kO002746 for ; Thu, 14 Feb 2002 11:12:02 -0500 Received: from harpo.wi.mit.edu (genome.wi.mit.edu [18.157.0.135]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g1EGBmkO002741 for ; Thu, 14 Feb 2002 11:11:48 -0500 Received: from genome.wi.mit.edu (pc14095.wi.mit.edu [18.157.14.95]) by harpo.wi.mit.edu (8.9.2/8.9.2) with ESMTP id LAA03910 for ; Thu, 14 Feb 2002 11:18:12 -0500 (EST) Message-ID: <3C6BE343.53BB12A4@genome.wi.mit.edu> From: Seth Purcell Organization: Whitehead Institute Center for Genome Research X-Mailer: Mozilla 4.78 [en] (WinNT; U) X-Accept-Language: en MIME-Version: 1.0 To: volunteer@open-bio.org Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: [Volunteer] gc_content Sender: volunteer-admin@open-bio.org Errors-To: volunteer-admin@open-bio.org X-BeenThere: volunteer@open-bio.org X-Mailman-Version: 2.0.6 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Open-Bio volunteer coordinator List-Unsubscribe: , List-Archive: Date: Thu, 14 Feb 2002 11:18:11 -0500 Hi - I am very unfamiliar with BioPerl, but it seems like there isn't a built-in method to get a sequence's gc content. 
I am assuming you just don't want to clutter your code with something so trivial, but it is a commonly repeated task, so if it would be useful to you please feel free to incorporate the following small code snippet as a method. I think it would make sense in either Seq or PrimarySeq. I can see how you might not want to clutter PrimarySeq, but if you put it there you could avoid both breaking the abstraction and copying the sequence just to get the gc content. However, it seems like the Seq and PrimarySeq methods copy the sequence all over the place, so you may not care about duplicating the sequence all the time. Sorry to bother you if you already have this functionality, I just didn't see it in the online documentation.

    sub gc_content {
        # calculate the gc content of the chunk of sequence passed as a parameter
        my $seq = shift;
        return ($seq =~ tr/gGcC//)/length($seq);
    }

I don't know if BioPerl users would rather have a percent than a fraction, or if it would be useful to generalize this to be able to calculate the content of letters besides g and c, etc., but these are easy changes. The version I use optionally takes a reference to avoid copying long sequences a lot, but I didn't think this was necessary for a member function.

Seth Purcell
Scientific Programmer
Whitehead Institute/MIT Center for Genome Research

_______________________________________________
Volunteer mailing list
Volunteer@open-bio.org
http://open-bio.org/mailman/listinfo/volunteer

--------------080400050504060400040007--

From jason@cgt.mc.duke.edu Thu Feb 14 17:31:32 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Thu, 14 Feb 2002 12:31:32 -0500 (EST)
Subject: [Bioperl-l] (no subject)
Message-ID:

Seth -

Thanks for volunteering - this functionality already exists in the Bio::Tools::SeqStats module, which lets you calculate GC content, though perhaps it isn't obvious to new users how to get it to do what you want. I'm starting a FAQ; maybe that will make its way into it eventually.
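
For readers who only need the number from a plain sequence string, the snippet under discussion can be sketched as a standalone sub. This is a minimal sketch and not BioPerl code: gc_fraction is a hypothetical name, it takes a string rather than a Seq object, and the empty-string guard is an addition that the posted snippet does not have.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): fraction of G and C
# characters in a plain sequence string, counted case-insensitively.
sub gc_fraction {
    my ($seq) = @_;

    # Assumption: an empty or undefined sequence yields 0 instead of
    # a division-by-zero error.
    return 0 unless defined $seq && length $seq;

    # tr/// with an empty replacement list only counts the matching
    # characters; it does not modify $seq.
    my $gc = ($seq =~ tr/gGcC//);

    return $gc / length($seq);
}

# Usage: multiply by 100 if a percentage is preferred over a fraction.
printf "GC fraction: %.2f\n", gc_fraction("ATGCGC");
```

The sub returns a fraction between 0 and 1, matching the posted snippet; whether a percentage would suit BioPerl users better is the open question raised above.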
-jason Date: Thu, 14 Feb 2002 11:18:11 -0500 From: Seth Purcell To: volunteer@open-bio.org Subject: [Volunteer] gc_content Hi - I am very unfamiliar with BioPerl, but it seems like there isn't a built-in method to get a sequence's gc content. I am assuming you just don't want to clutter your code with something so trivial, but it is a commonly repeated task, so if it would be useful to you please feel free to incorporate the following small code snippet as a method. I think it would make sense in either Seq or PrimarySeq. I can see how you might not want to clutter PrimarySeq, but if you put it there you could avoid both breaking the abstraction and copying the sequence just to get the gc content. However, it seems like the Seq and PrimarySeq methods copy the sequence all over the place, so you may not care about duplicating the sequence all the time. Sorry to bother you if you already have this functionality, I just didn't see it in the online documentation. sub gc_content { # calculate the gc content of the chunk of sequence passed as a parameter my $seq = shift; return ($seq =~ tr/gGcC//)/length($seq); } I don't know if BioPerl users would rather have a percent than a fraction, or if it would be useful to generalize this to be able to calculate the content of letters besides g and c, etc., but these are easy changes. The version I use optionally takes a reference to avoid copying long sequences a lot, but I didn't think this was necessary for a member function. Seth Purcell Scientific Programmer -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Thu Feb 14 17:46:06 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Thu, 14 Feb 2002 12:46:06 -0500 (EST) Subject: [Bioperl-l] FAQ Message-ID: I've started a FAQ and checked it in. Very very basic beginning. This is my first FAQ so if you have a better structure for organising it, feel free to step in and reformat. 
Eventually I'd like to add a link on the bioperl main site to some version (either CVS or static) of this in conjunction with 1.0. I remember that there were some people that wanted to help put questions in the FAQ. I had been in support of doing all the FAQ Q&A on Wiki but I'm just as happy to add questions to the FAQ as we answer them on the list. This means that all you people who help answer newbie questions on the list - or if you are an artist formerly known as a bioperl newbie - you can really help out a whole lot be putting questions down in the FAQ and/or answering them. If you - please Add your Name to the FAQ. If everyone is happy with the CVS way of doing it we'll leave it there. Other suggestions of course welcomed. -jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu From d.navarro@bmb.sdu.dk Thu Feb 14 18:48:59 2002 From: d.navarro@bmb.sdu.dk (Danny Navarro) Date: 14 Feb 2002 19:48:59 +0100 Subject: [Bioperl-l] Bug in remote Blasts scripts Message-ID: <1013712539.19808.41.camel@boli> Hi All, I am a undergraduate student in biochemistry and I am not very good in programming yet. Playing with the remote Blast scripts I found that the retrieve_blast.pl script didn't work. The output was like this: Can't get RID from the input data. I've just changed the regexp for catching the RID: Original: /RID" VALUE="(\S+)"/s Changed: /name="RID" type="hidden" value="(\S+)"/s ...And now it works fine. At least for genbank. Is this a bug? I am doing something wrong? Should I submit to the mailing list if I find more things like this? Danny From jason@cgt.mc.duke.edu Thu Feb 14 19:02:30 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Thu, 14 Feb 2002 14:02:30 -0500 (EST) Subject: [Bioperl-l] Bug in remote Blasts scripts In-Reply-To: <1013712539.19808.41.camel@boli> Message-ID: Thanks Danny - This is in fact a known bug stemming from some unmaintained scripts (which should either be fixed or purged from our future releases). 
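
For anyone patching the old script in the meantime, a more tolerant RID match can be sketched as below. This is a minimal sketch under stated assumptions: the RID is taken to sit in an input tag carrying name and value attributes (as in Danny's fix), extract_rid is a hypothetical helper name that does not exist in the bioperl scripts, and a real HTML parser would still be the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the bioperl scripts): return the RID from
# a BLAST results page, or nothing if no matching input tag is found.
# Attribute order, attribute case and quote style are allowed to vary.
sub extract_rid {
    my ($html) = @_;

    # Walk over every <input ...> tag in the page.
    while ($html =~ /<input\b([^>]*)>/gi) {
        my $attrs = $1;

        # Skip tags whose name attribute is not exactly RID.
        next unless $attrs =~ /\bname\s*=\s*(["']?)RID\1(?![\w-])/i;

        # Return the quoted value attribute of the same tag.
        return $2 if $attrs =~ /\bvalue\s*=\s*(["'])(.*?)\1/i;
    }
    return;
}
```

Matching the tag first and the attributes second keeps each pattern short, at the cost of scanning every input tag on the page.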
In the future I'd suggest using the Bio::Tools::Run::RemoteBlast module which provides an OO interface to remote blast submission and retrieval. -jason On 14 Feb 2002, Danny Navarro wrote: > Hi All, > > I am a undergraduate student in biochemistry and I am not very good in > programming yet. Playing with the remote Blast scripts I found that the > retrieve_blast.pl script didn't work. The output was like this: > > Can't get RID from the input data. > > I've just changed the regexp for catching the RID: > > Original: > /RID" VALUE="(\S+)"/s > > Changed: > /name="RID" type="hidden" value="(\S+)"/s > > ...And now it works fine. At least for genbank. > > Is this a bug? I am doing something wrong? Should I submit to the > mailing list if I find more things like this? > > Danny > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From dag@sonsorol.org Thu Feb 14 20:43:00 2002 From: dag@sonsorol.org (chris dagdigian) Date: Thu, 14 Feb 2002 15:43:00 -0500 Subject: [Bioperl-l] help update dependencies prior to 1.0 release Message-ID: <3C6C2154.60006@sonsorol.org> Folks, I just updated http://bioperl.org/Core/external.shtml to hopefully reflect the changes that are needed in the bioperl 1.0 world. The 2 major changes seem to be new dependencies for XML::Twig, DBD::mysql and DBI::mysqlopt. Please let me know if this page is missing anything or if there are any dependencies that we have forgotten to list. Also- I will be releasing a new version of Bundle::BioPerl soon as well. The only change for our bundle seems to be the addition of XML::Twig. I'm not going to put the MySQL stuff in the bundle because preliminary testing reveals that CPAN.pm does not politely failure cases where MySQL does not actually exist on the system. 
-Chris -- Chris Dagdigian, Life Science IT & Research Computing Freelancer Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 PGP KeyID: 83D4310E Yahoo IM: craffi From andrew@anatomy.otago.ac.nz Thu Feb 14 20:36:39 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Fri, 15 Feb 2002 09:36:39 +1300 Subject: [Bioperl-l] Homologene again... Message-ID: Hello, I haven't had any feedback on whether bioperl can parse homologene files so I'm guessing maybe it can't. Is this the type of thing that you want bioperl to do or is it out of scope? Can anybody point me to perl scripts that do this? If not, I'll be writing something to do the job. Is this something that could/should get put in bioperl somewhere, or in scripts central or is there just not too much interest in doing this? Cheers, Andrew. From kidd_beth@hotmail.com Thu Feb 14 21:10:04 2002 From: kidd_beth@hotmail.com (Beth Kidd) Date: Thu, 14 Feb 2002 16:10:04 -0500 Subject: [Bioperl-l] Bug in remote Blasts scripts References: <1013712539.19808.41.camel@boli> Message-ID: I did the same to get it working. Looks as if NCBI changed the HTML a little. Beth Kidd ----- Original Message ----- From: "Danny Navarro" To: "Bioperl" Sent: Thursday, February 14, 2002 1:48 PM Subject: [Bioperl-l] Bug in remote Blasts scripts > Hi All, > > I am a undergraduate student in biochemistry and I am not very good in > programming yet. Playing with the remote Blast scripts I found that the > retrieve_blast.pl script didn't work. The output was like this: > > Can't get RID from the input data. > > I've just changed the regexp for catching the RID: > > Original: > /RID" VALUE="(\S+)"/s > > Changed: > /name="RID" type="hidden" value="(\S+)"/s > > ...And now it works fine. At least for genbank. > > Is this a bug? I am doing something wrong? Should I submit to the > mailing list if I find more things like this? 
>
> Danny
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

From b_i_osborne@hotmail.com Thu Feb 14 21:16:14 2002
From: b_i_osborne@hotmail.com (Brian Osborne)
Date: Thu, 14 Feb 2002 16:16:14 -0500
Subject: [Bioperl-l] ORF FInder
References: Message-ID:

Lynn,

Jason is also saying that you can use EMBOSS programs from within Bioperl. Here's an example using EMBOSS's getorf program:

    use Bio::SeqIO;
    use Bio::Factory::EMBOSS;

    # Get a wrapper for the getorf program from the EMBOSS factory
    my $factory = Bio::Factory::EMBOSS->new;
    my $app     = $factory->program("getorf");

    # getorf reads input.fasta and writes the ORFs it finds to orfs.fasta
    my %input = ( -sequence => "input.fasta",
                  -minsize  => 22,
                  -outseq   => "orfs.fasta" );
    $app->run(\%input);

    # Read the first ORF back in as a Bioperl sequence object
    my $seqio  = Bio::SeqIO->new(-file => "orfs.fasta");
    my $seqobj = $seqio->next_seq;

Or something....

True, it's not strictly Bioperl-ish but you have a tremendous amount of functionality in the EMBOSS suite, and this makes it all available easily. My understanding is that the EMBOSS modules will return Bioperl objects someday, rather than just create files for you as in the example above. To get the positions of the ORFs you're going to have to parse the header/description line yourself; it's provided by the Seq object's desc() method.

I'll add this to the new FAQ, something about how functionality not found in Bioperl might be found in EMBOSS, which is accessible through Bioperl.

Brian O.

----- Original Message -----
From: "Jason Stajich"
To: "Lynn Stevens"
Cc:
Sent: Sunday, February 10, 2002 3:04 PM
Subject: Re: [Bioperl-l] ORF FInder

> Not in bioperl directly but you can use emboss's getorf program.
>
> On Sun, 10 Feb 2002, Lynn Stevens wrote:
>
> > Is there a module in BioPerl which allows you to take a sequence and get
> > back a list of all the ORFs (or even just the largest ORF) in all six frames
> > (or even just one frame) indexed by sequence position.
> > > > In other words you would submit a seq object and you would get back a set of > > numbers which tell you where the ORFs are located in the sequence. > > > > I have looked through all the documentation and still can not find this > > feature even though it seem like an extremely common task. > > > > Thanks for any help, > > > > Lynn > > > > > > _________________________________________________________________ > > Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason@cgt.mc.duke.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From gordonp@cbr.nrc.ca Thu Feb 14 22:44:57 2002 From: gordonp@cbr.nrc.ca (Paul Gordon) Date: Thu, 14 Feb 2002 18:44:57 -0400 (AST) Subject: [Bioperl-l] Bug in remote Blasts scripts In-Reply-To: Message-ID: Of course, to make it really robust, an HTML parser should be used, but I think that we can get it mostly there (except pathological cases). Didn't somebody mention something about lenient parsers before? :-) /(?:\s(?:name=['"]RID['"]|value=['"]([^>]+?)['"])[^>]*?){2}/is Answer in $+, no matter the order of the attributes in the tag (or intervening data), or the contents of "value". Of course, a good programmmer would use /x and comment :-) > > I've just changed the regexp for catching the RID: > > > > Original: > > /RID" VALUE="(\S+)"/s > > > > Changed: > > /name="RID" type="hidden" value="(\S+)"/s From birney@ebi.ac.uk Thu Feb 14 22:49:44 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 14 Feb 2002 22:49:44 +0000 (GMT) Subject: [Bioperl-l] Homologene again... 
In-Reply-To: Message-ID:

Andrew - what is the basic data type in homologene - if it is an alignment then an AlignIO read/writer would be great. Go for it!

If it is just a sequence cluster, then we don't have a base object for that, so I'm unsure what we should do (make one?). Definitely more hassle.

Contributions are welcome. ;)

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From b_i_osborne@hotmail.com Thu Feb 14 23:42:29 2002
From: b_i_osborne@hotmail.com (Brian Osborne)
Date: Thu, 14 Feb 2002 18:42:29 -0500
Subject: [Bioperl-l] "POD errors" not detected by podchecker
Message-ID:

This is a multi-part message in MIME format.

------=_NextPart_000_003D_01C1B587.5AD97E20
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Colleagues,

I've noticed a couple of mistakes that podchecker doesn't find. The first one looks something like this:

  This method returns a L<Bio::SeqIO::Bizou> object

Which makes sense literally but will be translated into something like:

  This method returns a the Bio::SeqIO::Bizou manpage object

by pod2html. So you have to write it like:

  This method returns a Bio::SeqIO::Bizou object, see L<Bio::SeqIO::Bizou> for details.

Or the equivalent. From perlpod:

  Translators will mostly add wording around a L<> link, so that L<foo(1)> becomes "the foo(1) manpage", for example (see pod2man for details). Thus, you shouldn't write things like the L<foo> manpage, if you want the translated document to read sensibly.

The second error is adding tags to any tab- or space-indented text. So strings like:

       Do B<not> use Bio::SeqIO::Bizou

will not be reformatted to bold as desired, they'll appear as is, because they're indented. From perlpod:

  A verbatim paragraph, distinguished by being indented (that is, it starts with space or tab). It should be reproduced exactly,

Assuming the text is not part of a command paragraph, meaning a paragraph preceded by a "=" command.

Thanks once again,

Brian O.

------=_NextPart_000_003D_01C1B587.5AD97E20-- From jason@cgt.mc.duke.edu Fri Feb 15 02:54:56 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Thu, 14 Feb 2002 21:54:56 -0500 (EST) Subject: [Bioperl-l] Homologene again... In-Reply-To: Message-ID:

No there aren't objects or parsers for this data in bioperl because homologene is just a cluster of LocusLink Ids and accessions. I tried to write a basic parser for my own needs last month and sort of gave up on the data - been happier with the InParanoid Orthologs for what I needed in the end.

Happy to give you what I started writing (note - I was interested in drosophila orthologs to human so this is specific for that).

Hope this helps at all - I realize it is not at all sophisticated - and there are a couple of cases that it fails to parse because for some reason the file doesn't follow the format all the way through - go figure...

-jason

#!/usr/bin/perl -w
use strict;

# This is from the Homologene readme
# The field delimiter is "|".
# -The first two fields indicate the organisms from which the sequences
# originate.
# -The third field indicates the type of similarity.
# -The fourth (LocusLink ID), fifth (UniGene ID), and sixth (Accession
# number) fields correspond to the first organism. One or both of UG ID
# and LL ID may be present. Locus Link and UniGene are in one-to-one
# correspondence in the latter case, so no ambiguity arises through the
# choice of set identifier.
# -The seventh(LL), eighth(UG), and ninth(Accession) fields correspond
# to the second organism.
# -The tenth field is the percent identity of the alignment, or a URL to
# the source of a curated ortholog.
# A similarity between organisms may be a best match of several
# different types, with the type of match indicated by the sixth
# character of the record.
# t indicates best match from the second field to the first. (when
# using the second sequence as query, the first sequence is the best
# match, with percent identity of alignments over 100 nt the score)
# f indicates best match to the second field from the first.
# (when using the first sequence as query, the second sequence is the
# best match)
# b indicates reciprocal best match (cluster pairs identified by f and t
# coincide).
# B indicates reinforced reciprocal best match (reciprocal best matches
# between at least three organisms agree).
# c indicates a curated homology (i.e., one that
# comes from outside NCBI or from a syntenic association,
# rather than one that is produced by an automatic process run at NCBI).
# Nota bene: many curated homologies are between genes rather than
# between accession numbers; consequently, we've chosen not to display
# accessions for all curated homologies, since the gene identifier-
# accession mapping is not always accurately resolvable.

open(HGENE, "hmlg.trip.ftp") or die("cannot open hmlg.trip.ftp: $!");

# read one cluster at a time: each record ends where the next line
# beginning with '>' starts
$/ ="\n>";
while(my $l = <HGENE>) {
    my @data = split(/\n/,$l);
    my ($title,$gene);
    foreach my $line ( @data ) {
        last if( $gene && $title);
        next if( $line =~ /^>/ );
        if( $line =~ /^TITLE/ && $line =~ /Hs\./ ) {
            # keep the second whitespace-separated token of the TITLE line
            (undef,$title) = split(/\s+/,$line);
        } else {
            next unless ( $line =~ /Dm/ );
            my ($speciesa,$speciesb, $matchtype,
                $lla,$uga,$acc_a,undef,
                $llb,$ugb,$acc_b, $pid) = split(/\|/,$line);
            # take the LocusLink ID from whichever side is Drosophila,
            # with surrounding whitespace removed
            if( lc($speciesa) eq 'dm' ) {
                $lla =~ s/^\s+(\S+)/$1/;
                $lla =~ s/(\S+)\s+$/$1/;
                $gene = $lla;
            } elsif( lc($speciesb) eq 'dm' ) {
                $llb =~ s/^\s+(\S+)/$1/;
                $llb =~ s/(\S+)\s+$/$1/;
                $gene = $llb;
            }
        }
    }
    if( $title && $gene ) {
        print "Title: $title Gene:$gene\n";
    }
}

On Fri, 15 Feb 2002, Andrew Macgregor wrote: > Hello, > > I haven't had any feedback on whether bioperl can parse homologene > files so I'm guessing maybe it can't. Is this the type of thing that > you want bioperl to do or is it out of scope? > > Can anybody point me to perl scripts that do this? If not, I'll be > writing something to do the job.
Is this something that could/should > get put in bioperl somewhere, or in scripts central or is there just > not too much interest in doing this? > > Cheers, Andrew. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From andrew@anatomy.otago.ac.nz Fri Feb 15 03:33:42 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Fri, 15 Feb 2002 16:33:42 +1300 Subject: [Bioperl-l] Homologene again... In-Reply-To: References: Message-ID: Jason and Ewan, I'll take a look at this script and post what I come up with eventually in case it is useful for anyone. Thanks, Andrew. >No there aren't objects or parsers for this data in bioperl because >homologene it is just a cluster of LocusLink Ids and accessions. I >tried to write a basic parser for my own needs last month and sort of gave >up on the data - been happier with the InParanoid Orthologs for what I >needed in the end. > >Happy to give you what I started writing (note - I was interested in >drosophila orthologs to human so this is specific for that). > >Hope this helps at all - I realize it is not at all sophisticated - and >there are a couple of cases that it fails to parse because for some reason >the file doesn't follow the format all the way through - go figure... > >-jason From elia@fugu-sg.org Fri Feb 15 10:10:08 2002 From: elia@fugu-sg.org (Elia Stupka) Date: Fri, 15 Feb 2002 18:10:08 +0800 (SGT) Subject: [Bioperl-l] Homologene again... In-Reply-To: Message-ID: > tried to write a basic parser for my own needs last month and sort of gave > up on the data - been happier with the InParanoid Orthologs for what I > needed in the end. Hey Jason, just read your mail, was wondering if you did anything around inparanoid in the meantime, any wrappers, objects, etc.? 
Elia From jason@cgt.mc.duke.edu Fri Feb 15 13:47:46 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Fri, 15 Feb 2002 08:47:46 -0500 (EST) Subject: [Bioperl-l] Homologene again... In-Reply-To: Message-ID: Elia - I wrote a parser for the 5 columns data in the sqltable files that Sonnhammer group provides- so this was a little easier than wading through homologene where some genes don't have accession numbers - and automating retrieval from the locuslink ID was also not apparent to me (I guess one could download all of LocusLink). I wanted to see cDNA and protein alignments for all of the DM vs HS orthologs - so I used Bio::DB::SwissProt and pulled in the proteins. They provide a db of the protein seq (no annotation) so this was only necessary because I wanted to find the EMBL link and get the corresponding cDNA. I did a clustalw alignment of the protein (could have used needle here I guess since I want global alignments and it is pairwise -- probably what clustalw does anyways) - then built a cDNA alignment BASED on the protein align. This is done by inserting gaps in cDNA sequences based on where they are in the protein alignment and building a SimpleAlign object with these sequences. Have to make sure that we handle UTRs okay (annotation helps here if we use it) or else you end up in the wrong place. Additionally did the cDNA align with just clustal to compare (still curious if my approach works in all cases - sometimes I was not starting in the right place and still trying to debug that). For some cases the SwissProt ID had changed from the set they used - but using the web interface allowed the old ids to still work. There were a couple of cases where the SwissProt record did not point to a valid cDNA and so couldn't provide that data either. Happy to provide the script and/or check it in if you think it would be useful to where you're going. I'm not actually running any of the InParanoid analysis as I suspect you'll want to do with the Fugu stuff. 
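The gap-threading step described above (building the cDNA alignment from the protein alignment) can be sketched in plain Perl. This is only an illustration, not the script referred to in this message: the subroutine name thread_cds and the toy sequences are assumptions, no BioPerl classes are used, and the sketch assumes the cDNA has already been trimmed to the coding region so that its first base is the first codon position.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): thread an ungapped coding
# sequence onto one gapped row of a protein alignment.
sub thread_cds {
    my ($aligned_protein, $cds) = @_;
    my $gapped = '';
    my $pos    = 0;    # offset of the next unused codon in $cds
    foreach my $residue (split //, $aligned_protein) {
        if ($residue eq '-') {
            # a gap in the protein row becomes a three-base gap
            $gapped .= '---';
            next;
        }
        die "coding sequence is shorter than the protein row\n"
            if $pos + 3 > length $cds;
        # a residue consumes the next codon of the coding sequence
        $gapped .= substr($cds, $pos, 3);
        $pos += 3;
    }
    return $gapped;
}

# Toy data (assumed for illustration): two protein rows as an aligner
# might return them, plus the matching coding sequences.
my %protein_row = (hs => 'MKAL', dm => 'MK-L');
my %cds         = (hs => 'ATGAAAGCTCTG', dm => 'ATGAAACTG');

foreach my $id (sort keys %protein_row) {
    printf "%-3s %s\n", $id, thread_cds($protein_row{$id}, $cds{$id});
}
```

Both rows come out 12 columns long, with the gap in the dm row falling on a codon boundary. The sketch does no UTR trimming and does not check that each codon actually translates to the residue it is paired with, which is where the "wrong place" problem mentioned above would show up.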
InParanoid also provides Bootstrap values for the orthologs based on the trees they built for each gene - so you get a group of genes where 1 is from one species (HS) and the rest are from the other species (DM) and you only want to see the alignment of the 2 orthologs (the HS gene and the DM with bootstrap value of 100%). Easy to pick these out and do the alignments. Hope that is useful/interesting. -jason On Fri, 15 Feb 2002, Elia Stupka wrote: > > tried to write a basic parser for my own needs last month and sort of gave > > up on the data - been happier with the InParanoid Orthologs for what I > > needed in the end. > > Hey Jason, > > just read your mail, was wondering if you did anything around inparanoid > in the meantime, any wrappers, objects, etc.? > > Elia > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From b_i_osborne@hotmail.com Fri Feb 15 15:09:53 2002 From: b_i_osborne@hotmail.com (Brian Osborne) Date: Fri, 15 Feb 2002 10:09:53 -0500 Subject: [Bioperl-l] ORF FInder Message-ID: Lynn, You must be referring to the Docs Web page, and you're right, Bio::Factory::EMBOSS is not mentioned there since those pages only go as far as version 0.7.2. My guess is that this page is not created automatically, so it's not up-to-date. For more current documentation see http://doc.bioperl.org/bioperl-live/ but it looks like these pages aren't absolutely current either, but they're close and Bio::Factory::EMBOSS is mentioned (I don't see the Bio/Biblio directory, for example). For the most recent code you might want to get a CVS account and download bioperl-live. I can't vouch for every single thing in bioperl-live but the code I've used works. Plus you'll be able to contribute! Brian O. ----- Original Message ----- From: "Lynn Stevens" To: Sent: Thursday, February 14, 2002 8:04 PM Subject: Re: [Bioperl-l] ORF FInder > > Hi Brian, > I don't see Bio::Factory::EMBOSS on the BioPerl documents page. Is this new? 
> > > >From: "Brian Osborne" > >To: "Lynn Stevens" > >CC: > >Subject: Re: [Bioperl-l] ORF FInder > >Date: Thu, 14 Feb 2002 16:16:14 -0500 > > > >Lynn, > > > >Jason is also saying that you can use EMBOSS programs from within Bioperl. > >Here's an example using EMBOSS's getorf program : > > > >use Bio::SeqIO; > >use Bio::Factory::EMBOSS; > > > >$factory = new Bio::Factory::EMBOSS; > >$app = $factory->program("getorf"); > >%input = ( -sequence => "input.fasta", -minsize => 22, -outseq => > >"orfs.fasta" ); > >$app->run(\%input); > >$seqio = Bio::SeqIO->new(-file => "orfs.fasta"); > >$seqobj = $seqio->next; > > > >Or something.... > > > >True, it's not strictly Bioperl-ish but you have tremendous amount of > >functionality in the EMBOSS suite, and this makes it all available easily. > >My understanding is that the EMBOSS modules will return Bioperl objects > >someday, rather than just create files for you as in the example above. To > >get the positions of the ORFs you're going to have parse the > >header/description line yourself, it's provided by the Seq object's desc() > >method. > > > >I'll add this to the new FAQ, something about functionality not found in > >Bioperl might be found in EMBOSS, which is accessible through Bioperl. > > > >Brian O. > > > > > >----- Original Message ----- > >From: "Jason Stajich" > >To: "Lynn Stevens" > >Cc: > >Sent: Sunday, February 10, 2002 3:04 PM > >Subject: Re: [Bioperl-l] ORF FInder > > > > > > > Not in bioperl directly but you can use emboss's getorf program. > > > > > > On Sun, 10 Feb 2002, Lynn Stevens wrote: > > > > > > > Is there a module in BioPerl which allows you to take a sequence and get > > > > back a list of all the ORFs (or even just the largest ORF) in all six > >frames > > > > (or even just one frame) indexed by sequence position. > > > > > > > > In other words you would submit a seq object and you would get back a > >set of > > > > numbers which tell you where the ORFs are located in the sequence. 
> > > > > > > > I have looked through all the documentation and still can not find this > > > > feature even though it seem like an extremely common task. > > > > > > > > Thanks for any help, > > > > > > > > Lynn > > > > > > > > > > > > _________________________________________________________________ > > > > Get your FREE download of MSN Explorer at > >http://explorer.msn.com/intl.asp. > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@bioperl.org > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > > -- > > > Jason Stajich > > > Duke University > > > jason@cgt.mc.duke.edu > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l@bioperl.org > >http://bioperl.org/mailman/listinfo/bioperl-l > > From spillai2@yahoo.com Fri Feb 15 18:12:38 2002 From: spillai2@yahoo.com (Sanjeev Pillai) Date: Fri, 15 Feb 2002 10:12:38 -0800 (PST) Subject: [Bioperl-l] Sigcleave module problem Message-ID: <20020215181238.5662.qmail@web12708.mail.yahoo.com> Hi all, I'm encountering a problem when I use the Sigcleave module in one of my perl scripts (This module helps predict signal peptide cleavage regions). When I run my script that uses Sigcleave and pass on an amino acid sequence file (raw amino acid sequence data with no headers), it gives me the following error messages: Use of uninitialized value in transliteration (tr///) at /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm line 333. Use of uninitialized value in transliteration (tr///) at /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm line 450. Use of uninitialized value in concatenation (.) or string at /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm line 452. 
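These are the warnings Perl prints when tr/// or the concatenation operator is applied to an undefined scalar. A minimal sketch that triggers the same two kinds of warning without Sigcleave follows; the variable names are illustrative only, and the exact wording of the messages varies a little between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $pep;                           # declared but never assigned: undef
my $count = ($pep =~ tr/A-Z//);    # warns: ... in transliteration (tr///)
my $label = "peptide: " . $pep;    # warns: ... in concatenation (.) or string

print scalar(@warnings), " warnings captured\n";
print $_ for @warnings;
```

Seeing both messages together usually means the same variable was empty at every point where it was used, rather than that each line has its own bug.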
When I checked the module and went to the specific lines mentioned in the error messages, I found that they all operate on $pep (a transliteration or a concatenation). I realize that $pep is never initialized, even though earlier in the module, as part of the _Analyze function in the constructor, $pep is initialized as $self->seq. So for some reason, the sequence I feed the program is never being read. I do not want to tinker with anything inside the module. I would greatly appreciate it if any of you could help me out here with suggestions/modifications. I'm thinking some of you may have encountered this problem. Thanks a lot Sanjeev __________________________________________________ Do You Yahoo!? Got something to say? Say it better with Yahoo! Video Mail http://mail.yahoo.com From dag@sonsorol.org Fri Feb 15 18:38:57 2002 From: dag@sonsorol.org (chris dagdigian) Date: Fri, 15 Feb 2002 13:38:57 -0500 Subject: [Bioperl-l] Sigcleave module problem References: <20020215181238.5662.qmail@web12708.mail.yahoo.com> Message-ID: <3C6D55C1.1040301@sonsorol.org> Sanjeev- That module is very old and does not conform to any of the new bioperl standards. It will likely be removed from bioperl in the future unless someone steps up and "modernizes" it. I'm willing to take a look at your problem but it would help me greatly if you could provide me with the sequence you were using that generated the error. Regards, Chris Sanjeev Pillai wrote: > Hi all, > I'm encountering a problem when I use the Sigcleave > module in one of my perl scripts (This module helps > predict signal peptide cleavage regions). When I run > my script that uses Sigcleave and pass on an amino > acid sequence file (raw amino acid sequence data with > no headers), it gives me the following error messages: > > Use of uninitialized value in transliteration (tr///) > at > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > line 333.
> Use of uninitialized value in transliteration (tr///) > at > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > line 450. > Use of uninitialized value in concatenation (.) or > string at > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > line 452. > > When I checked the module and went to the specific > lines mentioned in the error messages, I find that > they all do the perl transliteration on $pep. I > realize that $pep is never initialized eventhough > earlier in the module as part of the _Analyze function > in the constructor $pep is initialized as $self->seq. > So for some reason, the sequence I feed the program is > never being read. I do not want to tinker with > anything inside the module. > > I would greatly appreciate if any of you could help me > out here with suggestions/modifications. I'm thinking > some of you may have encountered this problem. > > Thanks a lot > Sanjeev > > __________________________________________________ > Do You Yahoo!? > Got something to say? Say it better with Yahoo! Video Mail > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Chris Dagdigian, Life Science IT & Research Computing Freelancer Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 Yahoo IM: craffi From jason@cgt.mc.duke.edu Fri Feb 15 18:54:41 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Fri, 15 Feb 2002 13:54:41 -0500 (EST) Subject: [Bioperl-l] ORF FInder In-Reply-To: Message-ID: On Fri, 15 Feb 2002, Brian Osborne wrote: > Lynn, > > You must be referring to the Docs Web page, and you're right, > Bio::Factory::EMBOSS is not mentioned there since those pages only go as far > as version 0.7.2. My guess is that this page is not created automatically, > so it's not up-to-date. 
For more current documentation see > http://doc.bioperl.org/bioperl-live/ but it looks like these pages aren't > absolutely current either, but they're close and Bio::Factory::EMBOSS is > mentioned (I don't see the Bio/Biblio directory, for example). > Hmm - my cronjob must not be working - will take a look. > For the most recent code you might want to get a CVS account and download > bioperl-live. I can't vouch for every single thing in bioperl-live but the > code I've used works. Plus you'll be able to contribute! > NB - You don't have to have an account to checkout the code - see http://cvs.open-bio.org for ways to browse the code online and to check out anonymously. > Brian O. > > ----- Original Message ----- > From: "Lynn Stevens" > To: > Sent: Thursday, February 14, 2002 8:04 PM > Subject: Re: [Bioperl-l] ORF FInder > > > > > > Hi Brian, > > I don't see Bio::Factory::EMBOSS on the BioPerl documents page. Is this > new? > > > > > > >From: "Brian Osborne" > > >To: "Lynn Stevens" > > >CC: > > >Subject: Re: [Bioperl-l] ORF FInder > > >Date: Thu, 14 Feb 2002 16:16:14 -0500 > > > > > >Lynn, > > > > > >Jason is also saying that you can use EMBOSS programs from within > Bioperl. > > >Here's an example using EMBOSS's getorf program : > > > > > >use Bio::SeqIO; > > >use Bio::Factory::EMBOSS; > > > > > >$factory = new Bio::Factory::EMBOSS; > > >$app = $factory->program("getorf"); > > >%input = ( -sequence => "input.fasta", -minsize => 22, -outseq => > > >"orfs.fasta" ); > > >$app->run(\%input); > > >$seqio = Bio::SeqIO->new(-file => "orfs.fasta"); > > >$seqobj = $seqio->next; > > > > > >Or something.... > > > > > >True, it's not strictly Bioperl-ish but you have tremendous amount of > > >functionality in the EMBOSS suite, and this makes it all available > easily. > > >My understanding is that the EMBOSS modules will return Bioperl objects > > >someday, rather than just create files for you as in the example above. 
> To > > >get the positions of the ORFs you're going to have parse the > > >header/description line yourself, it's provided by the Seq object's > desc() > > >method. > > > > > >I'll add this to the new FAQ, something about functionality not found in > > >Bioperl might be found in EMBOSS, which is accessible through Bioperl. > > > > > >Brian O. > > > > > > > > >----- Original Message ----- > > >From: "Jason Stajich" > > >To: "Lynn Stevens" > > >Cc: > > >Sent: Sunday, February 10, 2002 3:04 PM > > >Subject: Re: [Bioperl-l] ORF FInder > > > > > > > > > > Not in bioperl directly but you can use emboss's getorf program. > > > > > > > > On Sun, 10 Feb 2002, Lynn Stevens wrote: > > > > > > > > > Is there a module in BioPerl which allows you to take a sequence and > get > > > > > back a list of all the ORFs (or even just the largest ORF) in all > six > > >frames > > > > > (or even just one frame) indexed by sequence position. > > > > > > > > > > In other words you would submit a seq object and you would get back > a > > >set of > > > > > numbers which tell you where the ORFs are located in the sequence. > > > > > > > > > > I have looked through all the documentation and still can not find > this > > > > > feature even though it seem like an extremely common task. > > > > > > > > > > Thanks for any help, > > > > > > > > > > Lynn > > > > > > > > > > > > > > > _________________________________________________________________ > > > > > Get your FREE download of MSN Explorer at > > >http://explorer.msn.com/intl.asp. 
> > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l@bioperl.org > > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > > > > > > > -- > > > > Jason Stajich > > > > Duke University > > > > jason@cgt.mc.duke.edu > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l@bioperl.org > > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > >_______________________________________________ > > >Bioperl-l mailing list > > >Bioperl-l@bioperl.org > > >http://bioperl.org/mailman/listinfo/bioperl-l > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From sac@bioperl.org Fri Feb 15 19:24:51 2002 From: sac@bioperl.org (Steve Chervitz) Date: Fri, 15 Feb 2002 11:24:51 -0800 (PST) Subject: [Bioperl-l] 1.0alpha this weekend? In-Reply-To: Message-ID: <20020215192451.39394.qmail@web13701.mail.yahoo.com> Jason Stajich wrote: > On Tue, 12 Feb 2002, Ewan Birney wrote: > > > > > > > With Peter's and Brian's documentation fixes in I would like to propose a > > 1.0alpha release this coming weekend. > > > > > > (a) Could code reviewers (myself included) review code > > > > > > (b) Jason/Mark --- are the issues with SearchIO resolved? > > > yep - he just wanted to be able to reset the iterator - (after i fixed the > silly blastn parsing bug). > > > > > (c) I would like to propose removing Bio::Tools::BLAST and replacing it > > with a module which simply throws an exception on new describing how to > > use the SearchIO system > > > yeah - and can we agree the BPlite is in the twilight - we'll plan to > provide bug fixes on BPlite but development effort will be focused on > SearchIO unless someone REALLY wants to be its maintainer. 
I'd prefer if we keep Bio::Tools::Blast and make a deprecation warning appear when anyone calls its new() method. The message can also point people at SearchIO. The reason not to remove it entirely is that there is still some functionality that has not been migrated to SearchIO, primarily the to_html() method. The plan is to replace this with the SearchIO::Writer system. I can make an attempt to get the to_html() stuff migrated over this weekend, but as I'm currently attending the MGED4 meeting in Boston, I can't guarantee. (If I have sufficient battery power, I may be able to get it done on the plane ;). There may be other useful tidbits in Bio::Tools::Blast still lingering. I'll need some time to check against what's in SearchIO. > > > > (d) Lincoln - you said you wanted to run all of genbank through the SeqIO > > system? > > > > > > > > > > any other thoughts out there? > > > > There are 26 bugs in the queue some of them are not going to get done in > this releae I suspect, but many of them are straightforward. Would be nice > to take a look and see what can get fixed. > > > The highlights of what is in the queue, volunteers needed to test and fix > these bugs. Your contribution could just be to provide a reproduceable > script (and datafile where needed) for the bug. > > > Cross-Platform / Execing programs > * (966) Tools::Run::Alignment modules > * Platform dependent issues (906) Mac and (1052) Windows+Clustalw. Not > sure I want to fix 1052 as suggested. > * (986) Tempfiles and cleanup - do we want to do a migratio to IO::File > from File::Temp??? > > SeqFeatures > * (992) Sub dividing a seqeunce (trunc) and remapping the > seqfeature > coordinates -- what happens to fuzzies.... we should probably > support this by making new fuzzies - this was consensus at > hackathon. > * (1038) - the Bio::SeqFeature::Gene objects may have a bug? > > SeqIO > * (876) decide if we want to do any PIR support - the module had been > updated, not sure it is completely compliant. 
> * (987) SCF bug which should be gone with Chad's new implementation > * (1000) - EMBL bug that is fixed on main-branch - can we remove (are we > ever going to do another 0.7 series release?) > * (1043,1062,1068,1069) genbank parsing, (1071) swissprot writing > [ I tested 1043 and it is definitely there] > Are we really writing in the new GenBank format - can we really parse > the new genbank format properly? > I also may have lost Emmanuel's bug wrt to SwissProt unless Allen fixed > it in his SwissProt. > > Misc > > * (1039) - Misc. Bio::Tools::SeqPattern bug - is it really a bug? > * (1014) - anyone use the Restriction Enzyme pkg and want to check this > out? As these were my babies, I can have a look see. I recently made some fixes and modifications to RestrictionEnzyme, so I may have this bug covered. Steve -- Steve Chervitz sac@bioperl.org > Analysis Result parsing /SearchIO > * (1034) - possible HMMer parsing bug (are we going to move HMMer parsing > into SearchIO for 1.0?) > * (1025) - BPlite parsing issues, BPlite out of memory issues (1039) - > probably due to tempfile issues with File::Temp > * (1063) - SearchIO blast parsing an empty report - may be fixed already > - just need a tester? > > > > > > > > > > > > ----------------------------------------------------------------- > > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > > . > > ----------------------------------------------------------------- > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > -- > Jason Stajich > Duke University > jason@cgt.mc.duke.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l __________________________________________________ Do You Yahoo!? Got something to say? Say it better with Yahoo! 
Video Mail http://mail.yahoo.com From spillai2@yahoo.com Fri Feb 15 19:29:16 2002 From: spillai2@yahoo.com (Sanjeev Pillai) Date: Fri, 15 Feb 2002 11:29:16 -0800 (PST) Subject: [Bioperl-l] Sigcleave module problem In-Reply-To: <3C6D55C1.1040301@sonsorol.org> Message-ID: <20020215192916.2176.qmail@web12705.mail.yahoo.com> Hi Chris, Thanks for responding. The amino acid sequence that I'm using is a simple raw sequence file by name seq.aa which is as follows: MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATSD The perl script that I wrote is also very simple, (straight from the docs site) as I was wanting to test it first. #!/usr/local/bin/perl -w use Bio::Tools::Sigcleave; $sigcleave_object = Bio::Tools::Sigcleave->new('-file'=>'seq.aa', '-threshold'=>'3.5', '-desc'=>'test sigcleave protein seq', '-type'=>'AMINO'); %raw_results = $sigcleave_object->signals; $formatted_output = $sigcleave_object->pretty_print; Thanks again for your time and attention. Regards Sanjeev --- chris dagdigian wrote: > > Sanjeev- > > That module is very old and does not conform to any > of the new bioperl > standards. It will likely be removed from bioperl in > the future unless > someone steps up and "modernizes" it. I'm willing to > take a look at your > problem but it would help me greatly if you could > provide me with the > sequence you were using that generated the error. > > Regards, > Chris > > > Sanjeev Pillai wrote: > > Hi all, > > I'm encountering a problem when I use the > Sigcleave > > module in one of my perl scripts (This module > helps > > predict signal peptide cleavage regions). 
When I > run > > my script that uses Sigcleave and pass on an amino > > acid sequence file (raw amino acid sequence data > with > > no headers), it gives me the following error > messages: > > > > Use of uninitialized value in transliteration > (tr///) > > at > > > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > > line 333. > > Use of uninitialized value in transliteration > (tr///) > > at > > > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > > line 450. > > Use of uninitialized value in concatenation (.) or > > string at > > > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > > line 452. > > > > When I checked the module and went to the specific > > lines mentioned in the error messages, I find that > > they all do the perl transliteration on $pep. I > > realize that $pep is never initialized eventhough > > earlier in the module as part of the _Analyze > function > > in the constructor $pep is initialized as > $self->seq. > > So for some reason, the sequence I feed the > program is > > never being read. I do not want to tinker with > > anything inside the module. > > > > I would greatly appreciate if any of you could > help me > > out here with suggestions/modifications. I'm > thinking > > some of you may have encountered this problem. > > > > Thanks a lot > > Sanjeev > > > > __________________________________________________ > > Do You Yahoo!? > > Got something to say? Say it better with Yahoo! > Video Mail > > http://mail.yahoo.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > -- > Chris Dagdigian, > Life Science IT & Research Computing Freelancer > Office: 617-666-6454, Mobile: 617-877-5498, Fax: > 425-699-0193 > Yahoo IM: craffi > __________________________________________________ Do You Yahoo!? Got something to say? Say it better with Yahoo! 
Video Mail http://mail.yahoo.com From birney@ebi.ac.uk Fri Feb 15 19:44:19 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Fri, 15 Feb 2002 19:44:19 +0000 (GMT) Subject: [Bioperl-l] 1.0alpha this weekend? In-Reply-To: <20020215192451.39394.qmail@web13701.mail.yahoo.com> Message-ID: On Fri, 15 Feb 2002, Steve Chervitz wrote: > > > I'd prefer if we keep Bio::Tools::Blast and make a deprecation warning > appear when anyone calls its new() method. The message can also point > people at SearchIO. > > The reason not to remove it entirely is that there is still some > functionality that has not been migrated to SearchIO, primarily the > to_html() method. The plan is to replace this with the > SearchIO::Writer system. > I am happy to force/wait for SearchIO::Writer to remove Blast. The Blast tools are our biggest maintenance headache ... > I can make an attempt to get the to_html() stuff migrated over this weekend, but as > I'm currently attending the MGED4 meeting in Boston, I can't guarantee. (If I have > sufficient battery power, I may be able to get it done on the plane ;). > I'll poke around -- commit if you have anything done and I'll pick it up. Jason and I will certainy get it done in Cape Town (right Jason?) > There may be other useful tidbits in Bio::Tools::Blast still lingering. I'll need > some time to check against what's in SearchIO. > > > > > > > > (d) Lincoln - you said you wanted to run all of genbank through the SeqIO > > > system? > > > > > > > > > > > > > > > any other thoughts out there? > > > > > > > There are 26 bugs in the queue some of them are not going to get done in > > this releae I suspect, but many of them are straightforward. Would be nice > > to take a look and see what can get fixed. > > > > > > The highlights of what is in the queue, volunteers needed to test and fix > > these bugs. Your contribution could just be to provide a reproduceable > > script (and datafile where needed) for the bug. 
> > > > > > Cross-Platform / Execing programs > > * (966) Tools::Run::Alignment modules > > * Platform dependent issues (906) Mac and (1052) Windows+Clustalw. Not > > sure I want to fix 1052 as suggested. > > * (986) Tempfiles and cleanup - do we want to do a migratio to IO::File > > from File::Temp??? > > > > SeqFeatures > > * (992) Sub dividing a seqeunce (trunc) and remapping the > > seqfeature > > coordinates -- what happens to fuzzies.... we should probably > > support this by making new fuzzies - this was consensus at > > hackathon. > > * (1038) - the Bio::SeqFeature::Gene objects may have a bug? > > > > SeqIO > > * (876) decide if we want to do any PIR support - the module had been > > updated, not sure it is completely compliant. > > * (987) SCF bug which should be gone with Chad's new implementation > > * (1000) - EMBL bug that is fixed on main-branch - can we remove (are we > > ever going to do another 0.7 series release?) > > * (1043,1062,1068,1069) genbank parsing, (1071) swissprot writing > > [ I tested 1043 and it is definitely there] > > Are we really writing in the new GenBank format - can we really parse > > the new genbank format properly? > > I also may have lost Emmanuel's bug wrt to SwissProt unless Allen fixed > > it in his SwissProt. > > > > Misc > > > > * (1039) - Misc. Bio::Tools::SeqPattern bug - is it really a bug? > > * (1014) - anyone use the Restriction Enzyme pkg and want to check this > > out? > > As these were my babies, I can have a look see. I recently made some fixes and > modifications to RestrictionEnzyme, so I may have this bug covered. > > Steve > -- > Steve Chervitz > sac@bioperl.org > > > Analysis Result parsing /SearchIO > > * (1034) - possible HMMer parsing bug (are we going to move HMMer parsing > > into SearchIO for 1.0?) 
> > * (1025) - BPlite parsing issues, BPlite out of memory issues (1039) - > > probably due to tempfile issues with File::Temp > > * (1063) - SearchIO blast parsing an empty report - may be fixed already > > - just need a tester? > > > > > > > > > > > > > > > > > > > ----------------------------------------------------------------- > > > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > > > . > > > ----------------------------------------------------------------- > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l@bioperl.org > > > http://bioperl.org/mailman/listinfo/bioperl-l > > > > > > > -- > > Jason Stajich > > Duke University > > jason@cgt.mc.duke.edu > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l@bioperl.org > > http://bioperl.org/mailman/listinfo/bioperl-l > > > __________________________________________________ > Do You Yahoo!? > Got something to say? Say it better with Yahoo! Video Mail > http://mail.yahoo.com > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From elia@fugu-sg.org Fri Feb 15 19:45:22 2002 From: elia@fugu-sg.org (Elia Stupka) Date: Sat, 16 Feb 2002 03:45:22 +0800 (SGT) Subject: [Bioperl-l] Homologene again... In-Reply-To: Message-ID: Thanks Jason, sometimes I really wish we were working in the same place, I am sure we would have done tons together by now.. Elia -- ******************************** * http://www.fugu-sg.org/~elia * * tel: +65 874 1467 * * mobile: +65 90307613 * * fax: +65 777 0402 * ******************************** From jason@cgt.mc.duke.edu Fri Feb 15 19:55:06 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Fri, 15 Feb 2002 14:55:06 -0500 (EST) Subject: [Bioperl-l] 1.0alpha this weekend? 
In-Reply-To: Message-ID: > > I'll poke around -- commit if you have anything done and I'll pick it > up. Jason and I will certainy get it done in Cape Town (right Jason?) > Aye-aye! Will give us a chance to get the bug count down some more too. Let's still do the alpha release this weekend to give people a chance to try it out who want to give things a whirl. -- Jason Stajich Duke University jason@cgt.mc.duke.edu From schan@xenongenetics.com Fri Feb 15 20:43:34 2002 From: schan@xenongenetics.com (Simon Chan) Date: Fri, 15 Feb 2002 12:43:34 -0800 Subject: [Bioperl-l] modifying bioperl module Message-ID: Hi All, I'm planning on modifying one of the modules for my own use. Besides making sure I have a strong background in object oriented perl, what other skills / know-how should I have. ? Never "cracked" open a module and played with its insides before, so my apologies if this quesiton seems painfully obvioius! ;-) Thanks, All. Simon ############################ From cjm@fruitfly.bdgp.berkeley.edu Fri Feb 15 23:18:58 2002 From: cjm@fruitfly.bdgp.berkeley.edu (Chris Mungall) Date: Fri, 15 Feb 2002 15:18:58 -0800 (PST) Subject: [Bioperl-l] fetching SP by protein ID Message-ID: How hard would it be to get the current Bio::DB::SwissProt to fetch by protein ID (ie the ids for proteins that are shared by genbank/embl/ddbj)? for instance, AAK68636, also retrievable like this: http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-e+[{SWALL_SP_REMTREMBL}-prd:AAK68636] If someone promises me it's not hard I'll take it on. Right now I know nothing about the expasy interface. From Wang.Kai@mayo.edu Sat Feb 16 23:30:05 2002 From: Wang.Kai@mayo.edu (Wang, Kai) Date: Sat, 16 Feb 2002 17:30:05 -0600 Subject: [Bioperl-l] bug in genbank.pm Message-ID: <37F6069F8626D4119F22009027E409AB02003F3D@excsrv37.mayo.edu> I pointed out this problem about two months ago, but nobody changed it. The new GenBank file format add a "molecular shape" in the LOCUS line so current genbank.pm cannot process it. 
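To make the LOCUS-line problem concrete, here is a minimal sketch in plain Perl. It is not the genbank.pm code: `parse_locus_line` and the hash keys it returns are hypothetical names chosen for illustration. It assumes only that the date, when present, is the last field and that the topology field is either 'linear' or 'circular'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of genbank.pm): parse a LOCUS line in which
# the topology field ('linear' or 'circular') may or may not be present.
sub parse_locus_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f && shift(@f) eq 'LOCUS';

    my %rec;
    $rec{name}   = shift @f;
    $rec{length} = shift @f;
    $rec{unit}   = shift @f;    # 'bp' or 'aa'

    # Assumption: the date, when present, is the last field (dd-MON-yyyy).
    $rec{date} = pop @f if @f && $f[-1] =~ /^\d\d-[A-Z]{3}-\d{4}$/;

    # Protein records carry no molecule type of their own.
    $rec{molecule} = 'PRT' if defined $rec{unit} && $rec{unit} eq 'aa';

    for my $field (@f) {
        if    ($field =~ /^(?:linear|circular)$/) { $rec{topology} = $field }
        elsif (!defined $rec{molecule})           { $rec{molecule} = $field }
        else                                      { $rec{division} = $field }
    }
    return \%rec;
}

my $rec = parse_locus_line('LOCUS NM_003748 3134 bp mRNA linear PRI 01-NOV-2000');
print join(' ', map { "$_=$rec->{$_}" } sort keys %$rec), "\n";
```

Splitting on whitespace and classifying fields by content avoids depending on which numbered capture group the date lands in, which is where fixed-position matching goes wrong once a new field appears.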
in the file:

# $Id: genbank.pm,v 1.46 2002/02/14 16:41:22 jason Exp $

if (($2 eq 'bp') || defined($5)) {
    if ($4 eq 'circular') {
        $seq->molecule($3);
        $seq->is_circular($4);
        $seq->division($5);
        ($date) = $line =~ /.*(\d\d-\w\w\w-\d\d\d\d)/;
    } else {
        $seq->molecule($3);
        $seq->division($4);
        $date = $5;
    }
} else {
    $seq->molecule('PRT') if($2 eq 'aa');
    $seq->division($3);
    $date = $4;
}

The above code was based on the wrong assumption that NCBI will not add a 'linear' tag to a record. One example is accession number 'NM_003748'. The first line is:

LOCUS NM_003748 3134 bp mRNA linear PRI 01-NOV-2000

The current genbank.pm cannot recognize 01-NOV-2000. I think the best way is to use:

$line =~ /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(\S+)?\s+(\w\w\w)?\s+(\d\d-\w\w\w-\d\d\d\d)?/

From copley@embl-heidelberg.de Mon Feb 18 15:29:40 2002
From: copley@embl-heidelberg.de (Richard Copley)
Date: Mon, 18 Feb 2002 16:29:40 +0100
Subject: [Bioperl-l] Sigcleave module problem
References: <20020215192916.2176.qmail@web12705.mail.yahoo.com>
Message-ID: <3C711DE4.9060306@embl-heidelberg.de>

I think if you instantiate it with the sequence as a string, not a file, it works.

$sigcleave = Bio::Tools::Sigcleave->new( -seq => 'MLELLPTAVEGVS etc',
                                         -id  => 'blah' );
%sig_hash = $sigcleave->signals;

Richard.

Sanjeev Pillai wrote:

> Hi Chris,
> Thanks for responding.
>
> The amino acid sequence that I'm using is a simple raw
> sequence file by name seq.aa which is as follows:
>
> MLELLPTAVEGVSQAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPFGGEQNPIYWARYADWLFTTPLLLLDLALLVDADQGTILALVGADGIMIGTGLVGALTKVYSYRFVWWAISTAAMLYILYVLFFGFTSKAESMRPEVASTFKVLRNVTVVLWSAYPVVWLIGSEGAGIVPLNIETLLFMVLDVSAKVGFGLILLRSRAIFGEAEAPEPSAGDGAAATSD
>
> The perl script that I wrote is also very simple,
> (straight from the docs site) as I was wanting to test
> it first.
> > #!/usr/local/bin/perl -w > > use Bio::Tools::Sigcleave; > > $sigcleave_object = > Bio::Tools::Sigcleave->new('-file'=>'seq.aa', > > '-threshold'=>'3.5', > '-desc'=>'test > sigcleave protein seq', > '-type'=>'AMINO'); > > %raw_results = $sigcleave_object->signals; > $formatted_output = $sigcleave_object->pretty_print; > > > Thanks again for your time and attention. > Regards > Sanjeev > > --- chris dagdigian wrote: > >>Sanjeev- >> >>That module is very old and does not conform to any >>of the new bioperl >>standards. It will likely be removed from bioperl in >>the future unless >>someone steps up and "modernizes" it. I'm willing to >>take a look at your >>problem but it would help me greatly if you could >>provide me with the >>sequence you were using that generated the error. >> >>Regards, >>Chris >> >> >>Sanjeev Pillai wrote: >> >>>Hi all, >>>I'm encountering a problem when I use the >>> >>Sigcleave >> >>>module in one of my perl scripts (This module >>> >>helps >> >>>predict signal peptide cleavage regions). When I >>> >>run >> >>>my script that uses Sigcleave and pass on an amino >>>acid sequence file (raw amino acid sequence data >>> >>with >> >>>no headers), it gives me the following error >>> >>messages: >> >>>Use of uninitialized value in transliteration >>> >>(tr///) >> >>>at >>> >>> > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > >>>line 333. >>>Use of uninitialized value in transliteration >>> >>(tr///) >> >>>at >>> >>> > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > >>>line 450. >>>Use of uninitialized value in concatenation (.) or >>>string at >>> >>> > /usr/local/lib/perl5/site_perl/5.6.1/Bio/Tools/Sigcleave.pm > >>>line 452. >>> >>>When I checked the module and went to the specific >>>lines mentioned in the error messages, I find that >>>they all do the perl transliteration on $pep. 
I >>>realize that $pep is never initialized eventhough >>>earlier in the module as part of the _Analyze >>> >>function >> >>>in the constructor $pep is initialized as >>> >>$self->seq. >> >>>So for some reason, the sequence I feed the >>> >>program is >> >>>never being read. I do not want to tinker with >>>anything inside the module. >>> >>>I would greatly appreciate if any of you could >>> >>help me >> >>>out here with suggestions/modifications. I'm >>> >>thinking >> >>>some of you may have encountered this problem. >>> >>>Thanks a lot >>>Sanjeev >>> >>>__________________________________________________ >>>Do You Yahoo!? >>>Got something to say? Say it better with Yahoo! >>> >>Video Mail >> >>>http://mail.yahoo.com >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l@bioperl.org >>>http://bioperl.org/mailman/listinfo/bioperl-l >>> >>> >> >>-- >>Chris Dagdigian, >>Life Science IT & Research Computing Freelancer >>Office: 617-666-6454, Mobile: 617-877-5498, Fax: >>425-699-0193 >>Yahoo IM: craffi >> >> > > > __________________________________________________ > Do You Yahoo!? > Got something to say? Say it better with Yahoo! Video Mail > http://mail.yahoo.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > > -- Richard Copley EMBL Meyerhofstr.1 69012 Heidelberg Germany Tel: +49 6221 387 534 FAX: +49 6221 387 517 From birney@ebi.ac.uk Mon Feb 18 16:56:44 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Mon, 18 Feb 2002 16:56:44 +0000 (GMT) Subject: [Bioperl-l] Bioperl 1.0alpha Message-ID: As mentioned earlier I have tar balled up Bioperl 1.0 alpha. It is available at ftp://bio.perl.org/pub/DIST/ as bioperl-1.0.alpha.tar.gz The main reason for the version numbering change is just to emphasise to everyone that we are *serious* about releasing 1.0. 
At this point our main focus should be documentation --- there are still some corners of the code that need cleaning out --- I suspect a lot of this will be sorted out in the second hackathon (Cape Town) where Jason, Heikki and I can bash on it (or probably more correctly on the plane to Cape Town, assuming we don't get knee-capped on the plane).

Many thanks to Brian Osborne and Peter Schattner for updating/reviewing the documentation, especially from the perspective of the newcomer. I feel the first introduction documentation looks really good. Can other people check this out?

So --- outstanding jobs in my view

(a) write Bio::SearchIO::Writer::html so we can shoot Bio::Tools::Blast (sorry Steve).

(b) Review by people who should be reviewing.. (!) Doh!

Then - branch

- on release, list of modules to prune - I think Bio::Tools::SwissProtParser should go, as it is really waiting for an event based SeqIO framework (now... that is going to be a real shakeup - *definitely* not 1.0 stuff).

Should we be releasing just bioperl-live as the 1.0 release or bioperl-all? If so, how much do we need to cross check bioperl-all? Yadda, yadda. This could get complex!

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
.
-----------------------------------------------------------------

From jason@cgt.mc.duke.edu Mon Feb 18 18:29:59 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Mon, 18 Feb 2002 13:29:59 -0500 (EST)
Subject: [Bioperl-l] Bioperl 1.0alpha
In-Reply-To:
Message-ID:

On Mon, 18 Feb 2002, Ewan Birney wrote:

> As mentioned earlier I have tar balled up Bioperl 1.0 alpha. It is
> available at
>
> ftp://bio.perl.org/pub/DIST/
>
> as
>
> bioperl-1.0.alpha.tar.gz
>
> The main reason for the version numbering change is just to emphasise to
> everyone that we are *serious* about releasing 1.0.
At this point our main > focus should be documentation --- there are still some cornors to the code > that need cleaning out --- I suspect alot of this will be sorted out in > the second hackathon (Cape Town) where Jason, Heikki and I can bash on it > (or probably more correctly on the plane to Cape Town, assumming we don't > get knee-capped on the plane. > > > > > Many thanks to Brian Osborne and Peter Schattner for updating/reviewing > the documentation, especially from the perspective of the new comer. I > feel the first introduction documentation looks really good. Can other > people check this out? > Ditto! I also started the FAQ to address some typical questions that come up. Please add more to it if you have any good suggestions. > > > So --- outstanding jobs in my view > > (a) write Bio::SearchIO::Writer::html so we can shoot Bio::Tools::Blast > (sorry Steve). > > (b) Review by people who should be reviewing.. (!) Doh! > uh, heh. Yah. Here is the basic gist of what I am concerned about: I'm still a little concerned about seqfeatures that should be expanded versus the Split Location model. This is basically when someone calls: $feat->add_sub_SeqFeature($subfeat,'EXPAND'); So with the invention of the Bio::Location::Split system for handling multiple locations, so I'm not sure we do the right thing for locations versus sub features. Gotta dig some more and come up with appropriate tests. This also relates to working with locations and when we call trim on a sequence and need to renumber the coordinates for the features on the subsequence. Need to develop the tests for this so we can be sure we're really acting properly (pretty sure we're not). > > > Then - branch > > - on release, list of modules to prune - I think > Bio::Tools::SwissProtParser should go, as it is really waiting for an > event based SeqIO framework (now... that is going to be a real shakeup - > *definitely* not 1.0 stuff). 
> Okay - let's also remove Bio::Tools::Fasta (there is no test for this and it is replaced by Bio::SearchIO) Not sure if we want to release with the Biblio objects included? Should we remove from a 1.0 branch? > > > Should we be releasing just bioperl-live as the 1.0 release or > bioperl-all? If so, how much do we need to cross check bioperl-all? Yadda, > yadda. This could get complex! > db has certainly been cross-checked against 1.0 (although it is probably going to be in flux till at least the hackathon if not later). The gui was tested with Mark's stuff (ignoring the MapViewer code I dumped in there from a contributor). ext is not part of the bioperl_all since we have been talking about phasing it out - but perhaps it should be? The CORBA code *should* be in working order by the end of the hackathon ... sooo I say let's try for the whole thing. > > > ----------------------------------------------------------------- > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 > . > ----------------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From heikki@ebi.ac.uk Mon Feb 18 19:11:38 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Mon, 18 Feb 2002 19:11:38 +0000 Subject: [Bioperl-l] Bioperl 1.0alpha References: Message-ID: <3C7151EA.B9AAF493@ebi.ac.uk> > > (b) Review by people who should be reviewing.. (!) Doh! > > > uh, heh. Yah. > Here is the basic gist of what I am concerned about: > I'm still a little concerned about seqfeatures that should > be expanded versus the Split Location model. This is basically when > someone calls: > $feat->add_sub_SeqFeature($subfeat,'EXPAND'); > > So with the invention of the Bio::Location::Split system for handling > multiple locations, so I'm not sure we do the right thing for locations > versus sub features. 
Gotta dig some more and come up with appropriate > tests. > > This also relates to working with locations and when we call trim on a > sequence and need to renumber the coordinates for the features on the > subsequence. Need to develop the tests for this so we can be sure we're > really acting properly (pretty sure we're not). I'd like to get refseq/NT/genbank logic into GEnBank.pm, but I can not spare any time before the hackathon. > > > > > > Then - branch > > > > - on release, list of modules to prune - I think > > Bio::Tools::SwissProtParser should go, as it is really waiting for an > > event based SeqIO framework (now... that is going to be a real shakeup - > > *definitely* not 1.0 stuff). > > > Okay - let's also remove > Bio::Tools::Fasta (there is no test for this and it is replaced by > Bio::SearchIO) > > Not sure if we want to release with the Biblio objects included? > Should we remove from a 1.0 branch? Martin has been working on it. Let's see at the hackathon what is its status then. If we can have XML parser in there, I can easily add Bio::DB::Medline and then it might be worth letting it stay. Aw, I promised to remove the hydrophobicity special meaning from B & Z characters in Bio::Tools::SeqPattern, I have not done it. I'll doit at the hackathon unless someone else beats me at it. -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk _/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. 
CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________

From jason@cgt.mc.duke.edu Mon Feb 18 22:55:09 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Mon, 18 Feb 2002 17:55:09 -0500 (EST)
Subject: [Bioperl-l] local emboss
Message-ID:

I've added some code to Bio::Tools::Run::EMBOSSApplication so that one doesn't have to explicitly write out a sequence to a file in order to run an emboss app. This works wherever you pass in a Bio::PrimarySeqI or array of Bio::PrimarySeqI to an EMBOSS application handle. Here is an example.

So the following

use Bio::Factory::EMBOSS;
use Bio::SeqIO;
use Bio::AlignIO;

my $factory = new Bio::Factory::EMBOSS();
my $water = $factory->program('water');
my @seqs_to_check;               # assume this is a list of seqs
my $seq_to_test;                 # assume this is a single seq to eval
my $wateroutfile = 'out.water';

# all of this --
my $outseq = new Bio::SeqIO(-file => ">seqfile1");
$outseq->write_seq($seq_to_test);
$outseq = new Bio::SeqIO(-file => ">seqdb");
foreach my $seq ( @seqs_to_check ) {
    $outseq->write_seq($seq);
}
$outseq->close();
$water->run({ '-sequencea' => 'seqfile1',
              '-seqall'    => 'seqdb',
              '-gapopen'   => '10.0',
              '-gapextend' => '0.5',
              '-outfile'   => $wateroutfile});
#--

# can be replaced with this
$water->run({ '-sequencea' => $seq_to_test,
              '-seqall'    => \@seqs_to_check,
              '-gapopen'   => '10.0',
              '-gapextend' => '0.5',
              '-outfile'   => $wateroutfile});
#--

Additionally you can read in the alignment output

my $alnin = new Bio::AlignIO(-format => 'emboss',
                             -file   => $wateroutfile);
while( my $aln = $alnin->next_aln ) {
    # process the alignment
}

This is not in the alpha release but will make it into the next push out.
-jason -- Jason Stajich Duke University jason@cgt.mc.duke.edu From pel@bioanalyte.com Mon Feb 18 23:38:14 2002 From: pel@bioanalyte.com (Peter Leopold) Date: Mon, 18 Feb 2002 18:38:14 -0500 Subject: [Bioperl-l] testable XML::Writer whereabouts Message-ID: <3C719066.42680ACF@mediaone.net> I'm trying to build 1.0alpha on RedHat 7.1, perl 5.6.0, but I can't find a usable version of XML::Writer. In bioperl-1.0alpha build directory: # perl Makefile.PL Generated sub tests. go make show_tests to see available subtests External Module XML::Writer, Parsing + writing of XML documents, is not installed on this computer. The Bio::SeqIO::game,Bio::Variation::* in Bioperl needs it for Bio::Variation code, GAME parser A CPAN search of "XML::Writer" gives XML-Writer-0.4 by David Megginson Released April 2000 XML::Writer Perl extension for writing XML documents. While it builds without error it dumps core during 'make test'. Any thoughts? Peter Leopold From jason@cgt.mc.duke.edu Tue Feb 19 02:26:01 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Mon, 18 Feb 2002 21:26:01 -0500 (EST) Subject: [Bioperl-l] testable XML::Writer whereabouts In-Reply-To: <3C719066.42680ACF@mediaone.net> Message-ID: I had the same problem - just do a make install anyways. We should contact the XML::Writer author and see what is up. -jason On Mon, 18 Feb 2002, Peter Leopold wrote: > I'm trying to build 1.0alpha on RedHat 7.1, perl 5.6.0, > but I can't find a usable version of XML::Writer. > > In bioperl-1.0alpha build directory: > # perl Makefile.PL > Generated sub tests. go make show_tests to see available subtests > External Module XML::Writer, Parsing + writing of XML documents, > is not installed on this computer. > The Bio::SeqIO::game,Bio::Variation::* in Bioperl needs it for > Bio::Variation code, GAME parser > > A CPAN search of "XML::Writer" gives > XML-Writer-0.4 by David Megginson Released April 2000 > XML::Writer Perl extension for writing XML documents. 
> While it builds without error it dumps core during 'make test'. > Any thoughts? > > Peter Leopold > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From hilmarl@yahoo.com Tue Feb 19 06:45:39 2002 From: hilmarl@yahoo.com (Hilmar Lapp) Date: Mon, 18 Feb 2002 22:45:39 -0800 Subject: [Bioperl-l] Bioperl 1.0alpha References: Message-ID: <3C71F493.86851FED@yahoo.com> Jason Stajich wrote: > > > > > > > Should we be releasing just bioperl-live as the 1.0 release or > > bioperl-all? If so, how much do we need to cross check bioperl-all? Yadda, > > yadda. This could get complex! > > > db has certainly been cross-checked against 1.0 (although it is probably > going to be in flux till at least the hackathon if not later). > > The gui was tested with Mark's stuff (ignoring the MapViewer code I dumped > in there from a contributor). > > ext is not part of the bioperl_all since we have > been talking about phasing it out - but perhaps it should be? > > The CORBA code *should* be in working order by the end of the hackathon > ... sooo I say let's try for the whole thing. > I wouldn't do that. First, the other modules are beautiful and useful enough on their own, second, the same holds for bioperl, especially for the 1.0 celebration, and third, why give things more chances to be broken than you need to. -hilmar (finishing his Feb posting to Bioperl ;) -- ----------------------------------------------------------------- Hilmar Lapp email: hilmarl@yahoo.com San Diego, Ca. 92130 phone: +1 858 812 1757 ----------------------------------------------------------------- _________________________________________________________ Do You Yahoo!? 
Get your free @yahoo.com address at http://mail.yahoo.com From Guoneng.Zhong@med.nyu.edu Tue Feb 19 14:25:03 2002 From: Guoneng.Zhong@med.nyu.edu (Guoneng Zhong) Date: Tue, 19 Feb 2002 09:25:03 -0500 Subject: [Bioperl-l] alignment and assembly Message-ID: <76A1D86A-2544-11D6-9F6F-0050E41E5C1B@med.nyu.edu> Hi, I am trying to write an alignment function and an assembly function for dna sequences. I looked at the the Bio::Align modules and am not sure how they can help me. I am used to seeing two sequences align with bars indicating that nucleotide on the top sequence matches with that in the lower sequence. Is there a tutorial or example like that on the site? As for assembly, is there something like the TIGR Assembler that allows me to input two sequences and get hopefully one sequence out? (Obviously, I an make command line calls to TIGR Assembler, but perhaps a Bioperl interface?). Thanks, Guoneng From Guoneng.Zhong@med.nyu.edu Tue Feb 19 14:34:37 2002 From: Guoneng.Zhong@med.nyu.edu (Guoneng Zhong) Date: Tue, 19 Feb 2002 09:34:37 -0500 Subject: [Bioperl-l] how not to use Files when file handlers are required Message-ID: --Apple-Mail-1--685143998 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; format=flowed Hi, I am starting to use Bioperl and I notice that often files are used as keepers of sequences and reports. Either a file name is supplied as input or some file handler is required for input or output. Case in point is the following example I copied from the tutorial: use Bio::Tools::pSW; $factory = new Bio::Tools::pSW( '-matrix' => 'blosum62.bla', '-gap' => 12, '-ext' => 2, ); $factory->align_and_show($seq1, $seq2, STDOUT); $aln = $factory->pairwise_alignment($seq1, $seq2); My problem is that I don't work with files very much; I get my sequence either from someone's website or from a database. So in the above example, how would I supply the matrix as a string if I don't have a file? 
Do I have to write the string to a file first and then tell it? Same for the STDOUT; what if I want to use a String instead? I come from a java background, and it has a String IO Stream reader/writer. I wonder in Perl there is something like that or if Bioperl accommodates for this. Thanks, Guoneng --Apple-Mail-1--685143998-- From AKarger@CuraGen.com Tue Feb 19 14:49:08 2002 From: AKarger@CuraGen.com (Karger, Amir) Date: Tue, 19 Feb 2002 09:49:08 -0500 Subject: [Bioperl-l] HomoloGene Message-ID: <715118539AC7D311A0DF009027DE6E1904CBF97E@mli8b5065ffbrad.curagen.com> Just FYI for people who are trying to use HomoloGene: as Jason suggested in his message on the 14th, the HomoloGene file is confused. In particular, I was looking at the hmlg.ftp file (not the triplet file Jason parsed).
It turns out that -- at least in the version from January 11 -- there are a bunch of lines where the LocusLink ID is in the eighth column instead of the seventh. (The majority of lines have it in the seventh column where the README says it should be. I can't tell if this is better or worse.) The eighth column is supposed to be Unigene identifiers, which are just numbers, (since the species is in the second column) so "LL" should never be found in that column. >perl -wan -F'\|' -e 'next unless $#F; print "$. $F[7]\n" if $F[7]=~/LL/;' hmlg.ftp 23 LL.42489 27 LL.41094 40 LL.42919 57 LL.38254 69 LL.42058 83 LL.31837 158 LL.37503 (etc.) I wrote to the folks at NCBI but haven't gotten a response. Amir Karger CuraGen Corporation LEGAL NOTICE - Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this e-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. From Guoneng.Zhong@med.nyu.edu Tue Feb 19 15:45:21 2002 From: Guoneng.Zhong@med.nyu.edu (Guoneng Zhong) Date: Tue, 19 Feb 2002 10:45:21 -0500 Subject: [Bioperl-l] alignment Message-ID: Hi, Has anyone done a simple pairwise alignment on DNA strands? The examples in the docs and tutorial are about protein alignment using the Smith-Waterman algorithm. I tried to assign a dna matrix to the "-matrix" parameter and it kept telling me that my sequences are not proteins. 
Here is the code snippet: $seq1 = Bio::Seq->new (-id=>"seq1",-seq=>"AATTATATAATATATCTCTCCTCTTGCTCTC"); $seq2 = Bio::Seq->new (-id=>"seq2",-seq=>"AATTATATAATATATGCCTCCCCCTTACTCTC"); #$factory = new Bio::Tools::pSW('-matrix'=>'NUC4X4HB.MNT'); $factory = new Bio::Tools::pSW('-matrix'=>'nu.bla'); $aln = $factory->pairwise_alignment($seq1,$seq2); foreach $seq ($aln->eachSeq()){ print "$seq\n"; } Any idea why? Or does this module work only for proteins? Thanks, G From dag@sonsorol.org Tue Feb 19 15:50:10 2002 From: dag@sonsorol.org (chris dagdigian) Date: Tue, 19 Feb 2002 10:50:10 -0500 Subject: [Bioperl-l] [housekeeping note] experimental changes to bioperl-l and biojava-l list configuration Message-ID: <3C727432.5050507@sonsorol.org> Hi folks, The Open Bioinformatics Foudation's recent subscription to the RBL+ list (http://mail-abuse.org/rbl+/) has done a great job at seriously cutting down the amount of spam that leaks onto our mailing lists. It does not however, protect us from virus-laden email messages as the members of biojava-l have found out on multiple ocasions. We are eventually going to deploy antivirus scanning on all of our inbound and outbound email but until that happens we need an interim solution. Typically the way that most large mailing lists handle this is to employ a "no attachments" policy. All attachments are either stripped at the MTA level or converted into plaintext by external helper applications. The side effect of this is that it also removes HTML-email which 90% of the time is spam anyway. The feedback from people I asked about doing this on our server was that it could be "too drastic". Instead of stripping anything MIME-encoded I've made some experimental changes to 2 of our largest lists: biojava-l and bioperl-l. What I've done is configured the lists to "hold" messages that contain suspect "content-type:" fields. What this mean is that messages won't be "stripped" but they will be blocked and held for moderator attention. 
Anything that is spam or suspicious will get blown away by an OBF mailteam volunteer. Messages that look OK will get passed on to the list. This is the best compromise I can come up with between "stripping everything" and converting the lists to 100% moderated forums. One side effect is that our "hold" patterns are going to block HTML messages wich is probably a good thing. Another side effect is that innocent messages may get held up or delayed as they wait for moderation. This is mostly unavoidable. For those that care, here are the patterns we are trying to use to hold suspect message: Content-Type: .*multipart Content-Type: .*mixed Content-Type: .*rich As a general rule to avoid having your emails held for approval people may wish to keep the following in mind: (1) Be polite to text-only email readers; don't send HTML messages to the list. (2) Don't send file attachments; post URLs or links within your message Feedback directly to me or to mailteam@open-bio.org is welcome. I'll let people know how this experiment goes - most likely in our next newsletter scheduled for mid-March. Regards, Chris -- Chris Dagdigian, Life Science IT & Research Computing Freelancer Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193 PGP KeyID: 83D4310E Yahoo IM: craffi From jason@cgt.mc.duke.edu Tue Feb 19 16:25:13 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 19 Feb 2002 11:25:13 -0500 (EST) Subject: [Bioperl-l] alignment In-Reply-To: Message-ID: You can't really use the pSW for this - we've covered this on the list recently. We'll try and move these nuggets of info out to the tutorial and FAQ as time permits. I'd suggest moving toward the emboss 'water' tool for smith-waterman alignments rather than Ewan's pSW. See my recent post: http://bioperl.org/pipermail/bioperl-l/2002-February/007272.html On Tue, 19 Feb 2002, Guoneng Zhong wrote: > Hi, > Has anyone done a simple pairwise alignment on DNA strands? 
The > examples in the docs and tutorial are about protein alignment using the > Smith-Waterman algorithm. I tried to assign a dna matrix to the > "-matrix" parameter and it kept telling me that my sequences are not > proteins. Here is the code snippet: > > $seq1 = Bio::Seq->new > (-id=>"seq1",-seq=>"AATTATATAATATATCTCTCCTCTTGCTCTC"); > $seq2 = Bio::Seq->new > (-id=>"seq2",-seq=>"AATTATATAATATATGCCTCCCCCTTACTCTC"); > #$factory = new Bio::Tools::pSW('-matrix'=>'NUC4X4HB.MNT'); > $factory = new Bio::Tools::pSW('-matrix'=>'nu.bla'); > $aln = $factory->pairwise_alignment($seq1,$seq2); > foreach $seq ($aln->eachSeq()){ > print "$seq\n"; > } > > Any idea why? Or does this module work only for proteins? > yep. > Thanks, > G > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Tue Feb 19 16:27:21 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 19 Feb 2002 11:27:21 -0500 (EST) Subject: [Bioperl-l] alignment and assembly In-Reply-To: <76A1D86A-2544-11D6-9F6F-0050E41E5C1B@med.nyu.edu> Message-ID: On Tue, 19 Feb 2002, Guoneng Zhong wrote: > Hi, > I am trying to write an alignment function and an assembly function for > dna sequences. I looked at the the Bio::Align modules and am not sure > how they can help me. I am used to seeing two sequences align with bars > indicating that nucleotide on the top sequence matches with that in the > lower sequence. Is there a tutorial or example like that on the site? > See the Bio::AlignIO for how to read in alignments from files. We don't interface with phrap or the tigr assembler at this point. Happy to see someone design the appropriate objects that extend the Bio::SimpleAlign object (via the Bio::Align::AlignI interface) to handle assemblies. We're happy to help with the object design if you lay out your plan. 
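For the display described in the question (bars between matching nucleotides), a minimal sketch in plain Perl may help. It assumes the two strings are already aligned, are the same length, and use '-' for gaps; `match_line` is a hypothetical helper written for this example, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: build the '|' match line for two already-aligned,
# equal-length strings in which '-' marks a gap.
sub match_line {
    my ($top, $bottom) = @_;
    die "aligned strings must have equal length\n"
        unless length($top) == length($bottom);

    my $bars = '';
    for my $i (0 .. length($top) - 1) {
        my $x = substr($top,    $i, 1);
        my $y = substr($bottom, $i, 1);
        # A bar only where both columns hold the same residue (not a gap).
        $bars .= ($x ne '-' && uc($x) eq uc($y)) ? '|' : ' ';
    }
    return $bars;
}

my ($top, $bottom) = ('AATTATAT-CTC', 'AATTAGATGCTC');
print join("\n", $top, match_line($top, $bottom), $bottom), "\n";
```

The gapped strings themselves would come from whatever produced the alignment, for example the sequences held in an alignment object read through Bio::AlignIO.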
> As for assembly, is there something like the TIGR Assembler that allows > me to input two sequences and get hopefully one sequence out? > (Obviously, I an make command line calls to TIGR Assembler, but perhaps > a Bioperl interface?). > > Thanks, > Guoneng > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Tue Feb 19 16:32:29 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 19 Feb 2002 11:32:29 -0500 (EST) Subject: [Bioperl-l] how not to use Files when file handlers are required In-Reply-To: Message-ID: On Tue, 19 Feb 2002, Guoneng Zhong wrote: > Hi, > I am starting to use Bioperl and I notice that often files are used as > keepers of sequences and reports. Either a file name is supplied as As is the case in most of bioinformatics right now. > input or some file handler is required for input or output. Case in > point is the following example I copied from the tutorial: > > use Bio::Tools::pSW; > $factory = new Bio::Tools::pSW( '-matrix' => 'blosum62.bla', > '-gap' => 12, > '-ext' => 2, ); > $factory->align_and_show($seq1, $seq2, STDOUT); > $aln = $factory->pairwise_alignment($seq1, $seq2); > All the alignment algorithms are implemented in a more efficient language than perl so we don't actually do any alignments in perl other than with the pSW module which is through XS. So you need to have a logical transfer of data - files are the primary mechanism for this. You can do what you want with the EMBOSS interface and the 'water' program much better and I've just written the helper function which doesn't require you to explicitly dump the seq to a file (it is done behind the scenes). Please see the previous messages in the list archive. Note that the sequence data still has to be dumped to a file because that is where emboss programs read it from. 
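The file-based hand-off described above can be sketched in plain Perl. This is only an illustration under stated assumptions: seq_to_tempfile and capture_to_string are hypothetical helper names, not Bioperl functions, and nothing here confirms that Bio::Tools::pSW or the EMBOSS wrapper will accept an in-memory handle (XS code that needs a real file descriptor generally will not). The first helper dumps a sequence string to a temporary FASTA file; the second collects printed output in a string rather than on STDOUT.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Bioperl): write one sequence to a
# temporary FASTA file and return the file name. The file is removed
# automatically when the program exits (UNLINK => 1).
sub seq_to_tempfile {
    my ($id, $seq) = @_;
    my ($fh, $filename) = tempfile(SUFFIX => '.fa', UNLINK => 1);
    print {$fh} ">$id\n$seq\n";
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

# Hypothetical helper: run a code reference that prints to a filehandle
# and return everything it printed as a string. Opening a handle on a
# scalar reference is core Perl from 5.8 onwards.
sub capture_to_string {
    my ($writer) = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "Cannot open in-memory handle: $!";
    $writer->($out);
    close $out or die "Cannot close in-memory handle: $!";
    return $buffer;
}

my $file = seq_to_tempfile('seq1', 'AATTATATAATATATCTCTCCTCTTGCTCTC');

my $report = capture_to_string(sub {
    my ($fh) = @_;
    print {$fh} "seq1 written to a temporary file\n";
});
print $report;
```

The temporary file name in $file is what would be handed to an external program; the string in $report plays the role that a Java StringWriter would.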
> My problem is that I don't work with files very much; I get my sequence > either from someone's website or from a database. So in the above > example, how would I supply the matrix as a string if I don't have a > file? Do I have to write the string to a file first and then tell it? > Same for the STDOUT; what if I want to use a String instead? I come > from a java background, and it has a String IO Stream reader/writer. I > wonder in Perl there is something like that or if Bioperl accommodates > for this. > > Thanks, > Guoneng > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From birney@ebi.ac.uk Tue Feb 19 16:59:02 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Tue, 19 Feb 2002 16:59:02 +0000 (GMT) Subject: [Bioperl-l] alignment In-Reply-To: Message-ID: On Tue, 19 Feb 2002, Guoneng Zhong wrote: > Hi, > Has anyone done a simple pairwise alignment on DNA strands? The > examples in the docs and tutorial are about protein alignment using the > Smith-Waterman algorithm. I tried to assign a dna matrix to the > "-matrix" parameter and it kept telling me that my sequences are not > proteins. Here is the code snippet: > > $seq1 = Bio::Seq->new > (-id=>"seq1",-seq=>"AATTATATAATATATCTCTCCTCTTGCTCTC"); > $seq2 = Bio::Seq->new > (-id=>"seq2",-seq=>"AATTATATAATATATGCCTCCCCCTTACTCTC"); > #$factory = new Bio::Tools::pSW('-matrix'=>'NUC4X4HB.MNT'); > $factory = new Bio::Tools::pSW('-matrix'=>'nu.bla'); > $aln = $factory->pairwise_alignment($seq1,$seq2); > foreach $seq ($aln->eachSeq()){ > print "$seq\n"; > } > > Any idea why? Or does this module work only for proteins? it (stupidly) only works for proteins. This is fixable, but requires someone to get into the extension layer and build the right hooks. 
Try setting the type explicitly to "protein" to fox it ;) > > Thanks, > G > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . ----------------------------------------------------------------- From rfsouza@citri.iq.usp.br Tue Feb 19 19:13:16 2002 From: rfsouza@citri.iq.usp.br (Robson Francisco de Souza) Date: Tue, 19 Feb 2002 17:13:16 -0200 (BRST) Subject: [Bioperl-l] alignment and assembly In-Reply-To: Message-ID: Hi everyone, Concerning Jason's message on extending Bio::SimpleAlign to handle assemblies... I have not yet started coding bioperl modules, but I'm very interested in designing and writing an assembly object/interface. The problem is I don't know exactly where to start. I started reading bioperl's tutorial and biodesign documentation yesterday, but I'm afraid I do not know enough about Perl's OOP, although I've written a few modules of my own (without inheritance, which I do not understand fully yet :/). Anyway, as I wrote a few months ago to the list and to Chad Matsalla, I implemented a module that loads phrap output (both ACE and phrap.out files) into a hierarchical hash structure inside a separate module and namespace (which I called Assembly). Every time users want to access phrap data, they create an Assembly object, load the file and access the data through the interface I defined (which, by the way, is awful). Now, where do you guys think I should start from? One concern I have is how to store assembly data (which is often huge) in the module's memory.
I'm not sure a hierarchy like this one, which I used in my module, is the most appropriate data structure: assembly -> assembly data (# of contigs, etc) -> clone data (inferred from read locations and name or loaded from phrap.out) -> contig -> contig data (sequence, quality, # of reads, etc) -> read or sequence -> read data (sequence, quality, etc). Many times a user may ask which contig was assembled with read XXX or, conversely, which reads or sequences fall between positions S and E in contig M. There is also a problem concerning where to store the huge number of features an assembly may have (lists of poor quality regions, a description (scaffold) of how different contigs must be positioned in relation to one another to build a greater contig). Maybe I should split such data among several classes, like Bio::Assembly::Assembly (to hold assembly data), Bio::Assembly::Contig, Bio::Assembly::Analysis (mainly methods used to get information out of a loaded assembly, like Consed's low quality regions list or high quality discrepancies). Another concern I have, since I've only experience with the phred/phrap/consed package, would be to keep an Assembly object (whether built on top of Bio::SimpleAlign or not) free from the particularities of the phrap program. Does anyone know of a general implementation of sequence assembly objects, independent of the assembly program? In biojava maybe? Anyway, I'm starting to design my implementation and would appreciate any help/comments/suggestions from you. Best regards, Robson > See the Bio::AlignIO for how to read in alignments from files. We don't > interface with phrap or the tigr assembler at this point. Happy to see > someone design the appropriate objects that extend the Bio::SimpleAlign > object (via the Bio::Align::AlignI interface) to handle assemblies. We're > happy to help with the object design if you lay out your plan.
> > -- > Jason Stajich > Duke University > jason@cgt.mc.duke.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From andrew@anatomy.otago.ac.nz Tue Feb 19 20:26:33 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Wed, 20 Feb 2002 09:26:33 +1300 Subject: [Bioperl-l] HomoloGene In-Reply-To: <715118539AC7D311A0DF009027DE6E1904CBF97E@mli8b5065ffbrad.curagen.com> References: <715118539AC7D311A0DF009027DE6E1904CBF97E@mli8b5065ffbrad.curagen.com> Message-ID: Hi Amir, This is because the eighth column is meant to have a unigene as you say but Dm doesn't have unigene numbers assigned to it so they use the LocusLink ID instead. So it is most definitely possible to find LL in the eighth column. I'll be posting my parsing effort soon. Cheers, Andrew. >Just FYI for people who are trying to use HomoloGene: as Jason suggested in >his message on the 14th, the HomoloGene file is confused. In particular, I >was looking at the hmlg.ftp file (not the triplet file Jason parsed). It >turns out that -- at least in the version from January 11 -- there are a >bunch of lines where the LocusLink ID is in the eighth column instead of the >seventh. (The majority of lines have it in the seventh column where the >README says it should be. I can't tell if this is better or worse.) The >eighth column is supposed to be Unigene identifiers, which are just numbers, >(since the species is in the second column) so "LL" should never be found in >that column. > >>perl -wan -F'\|' -e 'next unless $#F; print "$. $F[7]\n" if $F[7]=~/LL/;' >hmlg.ftp >23 LL.42489 >27 LL.41094 >40 LL.42919 >57 LL.38254 >69 LL.42058 >83 LL.31837 >158 LL.37503 > >(etc.) > >I wrote to the folks at NCBI but haven't gotten a response. 
> >Amir Karger >CuraGen Corporation From AKarger@CuraGen.com Tue Feb 19 21:40:09 2002 From: AKarger@CuraGen.com (Karger, Amir) Date: Tue, 19 Feb 2002 16:40:09 -0500 Subject: [Bioperl-l] HomoloGene Message-ID: <715118539AC7D311A0DF009027DE6E1904CBF985@mli8b5065ffbrad.curagen.com> > -----Original Message----- > From: Andrew Macgregor [mailto:andrew@anatomy.otago.ac.nz] > > This is because the eighth column is meant to have a unigene as you > say but Dm doesn't have unigene numbers assigned to it so they use > the LocusLink ID instead. So it is most definitely possible to find > LL in the eighth column. I'm not satisfied with this answer. (1) There are a whole lot of records for which they don't include a Unigene ID. For example, a whole bunch of curated links, like: Hs|Mm|c|LL.1387 |23598| |LL.12914| | | http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1098280 They still put the LL in the seventh column in this example,and left the Unigene column blank. (2 and much more important) Whichever way it's done, why doesn't the README say so?! The README's "-The seventh(LL), eighth(UG), and ninth(Accession) fields correspond to the second organism." is totally misleading, even if they're putting the LL in column 8 on purpose. > > Cheers, Andrew. -Amir LEGAL NOTICE - Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this e-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. From andrew@anatomy.otago.ac.nz Tue Feb 19 21:40:50 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Wed, 20 Feb 2002 10:40:50 +1300 Subject: [Bioperl-l] Homologene parser... 
Message-ID:

In case this is useful to anyone on the list, this is what I have come up with to parse the homologene file hmlg.trip.ftp. Having attended Damian Conway's one day tutorial at the O'Reilly Bioinformatics conference I was pretty keen to try out Parse::RecDescent, so it uses that - I wasn't disappointed.

I'm pretty sure this works; it seems to parse the entire file without any problems. At the moment the script below simply prints out what it parses. To be useful you need to replace the action parts of the grammar with whatever you want to do with the data. I don't think the "...end_of_record" is needed now - I was originally passing the parser the entire file. The script only works on the triplet file but it is pretty easy to adapt the grammar to work on the hmlg.ftp file as well (i.e. remove delimiter and title, and change how text is fed to the parser).

I'm pretty sure this works fine, but I haven't had time to really check it so use with care. I'm keen to get any feedback at all, including perceived merits and demerits of using Parse::RecDescent for this sort of thing.

I'm not sure I can see where this would fit into bioperl apart perhaps from scripts central, but if someone does (Jason, Ewan?) and wants to point me in the right direction I could work on something.

Cheers, Andrew.

#!/usr/bin/perl -W

# Homologene (hmlg.trip.ftp) parser
# Andrew Macgregor andrew.macgregor@stonebow.otago.ac.nz
# parser only tested against hmlg.trip.ftp
# provided "as is" without any warranty of any kind

use strict;
use Parse::RecDescent;

my $grammar = q {

    record : delimiter ortholog(s) title(s) ...end_of_record
           |

    delimiter : /^>/ {print ">\n"; }

    ortholog : organism1 "|" organism2 "|" similarity_type "|"
               locuslink_id_org1(?) "|"
               unigene_id_org1(?) "|" accession_org1(?) "|"
               locuslink_id_org2(?) "|"
               unigene_id_org2(?) "|" accession_org2(?) "|"
               percentage(?)

               {
                   print "$item{organism1}|";
                   print "$item{organism2}|";
                   print "$item{similarity_type}|";
                   print "@{$item{locuslink_id_org1}}|";
                   print "@{$item{unigene_id_org1}}|";
                   print "@{$item{accession_org1}}|";
                   print "@{$item{locuslink_id_org2}}|";
                   print "@{$item{unigene_id_org2}}|";
                   print "@{$item{accession_org2}}|";
                   print "@{$item{percentage}}\n";
               }

    title : "TITLE" unigene "=" gene_symbol description(?)
            {
                print "TITLE $item{unigene}=$item{gene_symbol}\t@{$item{description}}\n";
            }

    end_of_record : /\Z/

    organism1 : organism

    organism2 : organism

    similarity_type : /t|f|b|B|c/

    locuslink_id_org1 : locuslink_id

    unigene_id_org1 : unigene_id

    accession_org1 : accession

    locuslink_id_org2 : locuslink_id

    unigene_id_org2 : unigene_id

    accession_org2 : accession

    percentage : /.+/

    unigene : organism "." unigene_id { $return = "$item{organism}.$item{unigene_id}" }
            | "Dm." locuslink_id | locuslink_id

    gene_symbol : /[\w-]+/

    description : /.+/

    organism : /At|Bt|Dm|Dr|Hs|Hv|Mm|Os|Rn|Ta|Xl|Zm/

    locuslink_id : /LL.[0-9]+/

    unigene_id : /[0-9]+/
               | locuslink_id

    accession : /\w+/

};

my $parser = new Parse::RecDescent ($grammar);
open (HOMOLOGENE, "hmlg.trip.ftp") or die "Can't open hmlg.trip.ftp: $!";

# read from the homologene file building up a record then passing it to the parser
my ($record, $complete);

while (my $text = <HOMOLOGENE>) {

    if ($text =~ /^>/) {
        $parser->record($record) if defined $complete;
        $complete = 1;
        $record = "";
        $record .= $text;
    }
    else {
        $record .= $text
    }
}
$parser->record($record) if defined $complete; # takes care of the last record

From andrew@anatomy.otago.ac.nz Tue Feb 19 22:39:06 2002
From: andrew@anatomy.otago.ac.nz (Andrew Macgregor)
Date: Wed, 20 Feb 2002 11:39:06 +1300
Subject: [Bioperl-l] HomoloGene
In-Reply-To: <715118539AC7D311A0DF009027DE6E1904CBF985@mli8b5065ffbrad.curagen.com>
References: <715118539AC7D311A0DF009027DE6E1904CBF985@mli8b5065ffbrad.curagen.com>
Message-ID:

Hi Amir,

I'm just going by the readme.
It looks to me that the unigene field in either the first or the second organism can be blank. >-The fourth (LocusLink ID), fifth (UniGene ID), and sixth (Accession >number) fields correspond to the first organism. One or both of UG ID >and LL ID may be present. Locus Link and UniGene are in one-to-one >correspondence in the latter case, so no ambiguity arises through the >choice of set identifier. >-The seventh(LL), eighth(UG), and ninth(Accession) fields correspond >to the second organism. This is the case in the example you give. It has LL and unigene for org 1 but no accession number then LL but no unigene number or accession number for org 2. Cheers, Andrew. Amir wrote: >I'm not satisfied with this answer. > >(1) There are a whole lot of records for which they don't include a Unigene >ID. For example, a whole bunch of curated links, like: > >Hs|Mm|c|LL.1387 |23598| |LL.12914| | | >http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:1098280 > >They still put the LL in the seventh column in this example,and left the >Unigene column blank. > From desmond@imcb.nus.edu.sg Wed Feb 20 04:49:18 2002 From: desmond@imcb.nus.edu.sg (Desmond Lim) Date: Wed, 20 Feb 2002 12:49:18 +0800 Subject: [Bioperl-l] Motif finding Message-ID: <3C739B4E.4555.1A0C1B66@localhost> Is there a way of finding motifs in a strand of DNA? The thing is, I want to find exact matches and some fuzzy ones (i.e 80% exact). Is there a perl module to do it? Thanks. Desmond From wang@cshl.org Wed Feb 20 05:45:22 2002 From: wang@cshl.org (Jinhua Wang) Date: Wed, 20 Feb 2002 00:45:22 -0500 Subject: [Bioperl-l] problem in access Ensembl. Message-ID: <3C7337F2.15C15A36@cshl.org> when I ran the following test script to access EnsEMBL database, I got the error message: DBD::mysql::st execute failed: Unknown column 'meta_value' in 'field list' at /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/DBSQL/MetaContainer.pm line 134. 
DBD::mysql::st execute failed: Unknown column 'meta_value' in 'field list' at /usr/lib/perl5/site_perl/5.6.0/Bio/EnsEMBL/DBSQL/MetaContainer.pm line 134. ------------------------------------------- use DBI; use Bio::EnsEMBL::DBSQL::DBAdaptor; my $host = 'kaka.sanger.ac.uk'; my $user = 'anonymous'; #my $dbname = 'current'; my $dbname = 'ensembl100'; my $db = new Bio::EnsEMBL::DBSQL::DBAdaptor(-host =>$host, -user =>$user, -dbname =>$dbname); my @clones = $db->get_all_Clone_id; foreach my $clone (@clones){ print $clone->id."\n";} what's the problem? Thanks, JH From heikki@ebi.ac.uk Wed Feb 20 08:15:26 2002 From: heikki@ebi.ac.uk (Heikki Lehvaslaiho) Date: Wed, 20 Feb 2002 08:15:26 +0000 Subject: [Bioperl-l] Homologene parser... References: Message-ID: <3C735B1E.742F078B@ebi.ac.uk> Andrew Macgregor wrote: > > In case this is useful to anyone on the list, this is what I have > come up with to parse the homologene file hmlg.trip.ftp. Having > attended Damian Conway's one day tutorial at the O'Reilly > Bioinformatics conference I was pretty keen to try out > Parse::RecDescent, so it uses that - I wasn't disappointed. > > > I'm pretty sure this works, it seems to parse the entire file without > any problems. At the moment the script below simply prints out what > it parses. To be useful you need to replace the action parts of the > grammar with whatever you want to do with the data. I don't think the > "...end_of_record" is needed now - I was originally passing the > parser the entire file. The script only works on the triplet file but > it is pretty easy to adapt the grammar to work on the hmlg.ftp file > as well (i.e remove delimiter, title, change how text is feed to > parser). > > I'm pretty sure this works fine, but I haven't had time to really > check it so use with care. I'm keen to get any feedback at all, > including perceived merits, demerits of using Parse::RecDescent for > this sort of thing. Now this is the way to parse text databases! ... 
says someone with a few years' experience in parsing files using the icarus language in SRS. Recursive parsing is the cleanest, most robust way of parsing semi-structured biological databases. For some time now I've been wanting to start using Parse::RecDescent, but have not had time. I am not sure but there should be a homologene parser written in icarus somewhere at the EBI SRS server. Can not find it though... Generally, if you want to parse a biological database flat file with Parse::RecDescent, have a look at icarus parsers. They are visible in every database's info page. > I'm not sure I can see where this would fit into bioperl apart > perhaps from scripts central, but if someone does (Jason, Ewan?) and > wants to point me in the right direction I could work on something. Have a look at Bio::SeqIO/* parsers. They have a read_seq() method. The code below would go in there (grammar into the BEGIN block?) and instead of printing values out, the code should create relevant objects (In case of homologene, you'd have to write them first.). I am not quite sure how the flow of code fits into that model, but revision is being considered right now, so this came at the right time. -Heikki > Cheers, Andrew. > > #!/usr/bin/perl -W > > # Homologene (hmlg.trip.ftp) parser > # Andrew Macgregor andrew.macgregor@stonebow.otago.ac.nz > # parser only tested against hmlg.trip.ftp > # provided "as is" without any warranty of any kind > > use strict; > use Parse::RecDescent; > > my $grammar = q { > > record : delimiter ortholog(s) title(s) ...end_of_record > | > > delimiter : /^>/ {print ">\n"; } > > ortholog : organism1 "|" organism2 "|" similarity_type "|" > locuslink_id_org1(?) "|" > unigene_id_org1(?) "|" accession_org1(?) "|" > locuslink_id_org2(?) "|" > unigene_id_org2(?) "|" accession_org2(?) "|" > percentage(?)
> > { > print "$item{organism1}|"; > print "$item{organism2}|"; > print "$item{similarity_type}|"; > print "@{$item{locuslink_id_org1}}|"; > print "@{$item{unigene_id_org1}}|"; > print "@{$item{accession_org1}}|"; > print "@{$item{locuslink_id_org2}}|"; > print "@{$item{unigene_id_org2}}|"; > print "@{$item{accession_org2}}|"; > print "@{$item{percentage}}\n"; > } > > title : "TITLE" unigene "=" gene_symbol description(?) > { > print "TITLE > $item{unigene}=$item{gene_symbol}\t@{$item{description}}\n"; > } > > end_of_record : /\Z/ > > organism1 : organism > > organism2 : organism > > similarity_type : /t|f|b|B|c/ > > locuslink_id_org1 : locuslink_id > > unigene_id_org1 : unigene_id > > accession_org1 : accession > > locuslink_id_org2 : locuslink_id > > unigene_id_org2 : unigene_id > > accession_org2 : accession > > percentage : /.+/ > > unigene : organism "." unigene_id { $return = > "$item{organism}.$item{unigene_id}" } > | "Dm." locuslink_id | locuslink_id > > gene_symbol : /[\w-]+/ > > description : /.+/ > > organism : /At|Bt|Dm|Dr|Hs|Hv|Mm|Os|Rn|Ta|Xl|Zm/ > > locuslink_id : /LL.[0-9]+/ > > unigene_id : /[0-9]+/ > | locuslink_id > > accession : /\w+/ > > }; > > my $parser = new Parse::RecDescent ($grammar); > open (HOMOLOGENE, "hmlg.trip.ftp") or die "Can't open hmlg.trip.ftp: $!"; > > # read from the homologene file building up a record then passing it > to the parser > my ($record, $complete); > > while (my $text = <HOMOLOGENE>) { > > if ($text =~ /^>/) { > $parser->record($record) if defined $complete; > $complete = 1; > $record = ""; > $record .= $text; > } > else { > $record .= $text > } > } > $parser->record($record) if defined $complete; # takes care of > the last record > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ http://www.ebi.ac.uk/mutations/ _/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute _/ _/ _/ Wellcome Trust Genome Campus, Hinxton _/ _/ _/ Cambs. CB10 1SD, United Kingdom _/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468 ___ _/_/_/_/_/________________________________________________________ From birney@ebi.ac.uk Wed Feb 20 09:01:53 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Wed, 20 Feb 2002 09:01:53 +0000 (GMT) Subject: [Bioperl-l] problem in access Ensembl. In-Reply-To: <3C7337F2.15C15A36@cshl.org> Message-ID: On Wed, 20 Feb 2002, Jinhua Wang wrote: > when I ran the following test script to access EnsEMBL database, I got > the > error message: Jinhua - first off, probably should be posting this sort of comment to ensembl-dev, not bioperl-l, and secondly try setting dbname to current, not to ensembl100. Ensembl100 is an old release and has a (subtly different) schema. You need a different branch of the code base to work with it. From seth.redmond@ic.ac.uk Wed Feb 20 10:15:29 2002 From: seth.redmond@ic.ac.uk (Seth Redmond) Date: Wed, 20 Feb 2002 10:15:29 +0000 Subject: [Bioperl-l] re: blast database() Message-ID: I'm having some trouble getting the blast::hits... database() method to work. (i.e. to find the exact fasta sequence I'm matching against in my database. $database = @hits[$j]->database(); returns a dash instead of the database name. I've tried a number of different databases with this. Are there any relevant examples which I might have a look at? Anyone have any advice? 
thanks -s -- ______________________________________________ Seth Redmond DNA resource and Database Curator Wellcome Trust Laboratories for Molecular Parasitology Department of Biological Sciences Imperial College London SW7 2AY ______________________________________________ From AKarger@CuraGen.com Wed Feb 20 13:51:48 2002 From: AKarger@CuraGen.com (Karger, Amir) Date: Wed, 20 Feb 2002 08:51:48 -0500 Subject: [Bioperl-l] HomoloGene Message-ID: <715118539AC7D311A0DF009027DE6E1904CBF988@mli8b5065ffbrad.curagen.com> > -----Original Message----- > From: Andrew Macgregor [mailto:andrew@anatomy.otago.ac.nz] > > I'm just going by the readme. It looks to me that the unigene field > in either the first or the second organism can be blank. > > >-The fourth (LocusLink ID), fifth (UniGene ID), and sixth (Accession > >number) fields correspond to the first organism. One or > both of UG ID > >and LL ID may be present. Locus Link and UniGene are in one-to-one > >correspondence in the latter case, so no ambiguity arises through the > >choice of set identifier. > >-The seventh(LL), eighth(UG), and ninth(Accession) fields correspond > >to the second organism. > > This is the case in the example you give. It has LL and unigene for > org 1 but no accession number then LL but no unigene number or > accession number for org 2. Oh, absolutely. I'm 100% fine with leaving a UG out. The example I gave was supposed to demonstrate how most of the time when they left out an ID they did exactly the right thing. It exactly follows the README, which says LL is in 7, UG is in 8, and one or both may be empty. But let's put two rows from the file together, with a few extra spaces added to make it more obvious. I line up the '|' characters, and... Xl|Hs|t| |1091 |AB045628 |LL.9698 |153834 |D43951 |75.35 Xl|Dm|B| |1091 |AB045628 | |LL.41094 | |68.27 The first line follows the rules just fine. The second, though, puts the LL ID into the column that the UG should be in. 
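As a rough illustration of the parser warning implied by the two rows above, here is a small Perl sketch that looks for the second organism's LocusLink ID in the documented seventh column and falls back to the eighth. The column meanings are taken from the README text quoted in this thread; the function name second_org_locuslink is made up for the example, and the fallback is an assumption about how one might cope with the file, not something NCBI documents.

```perl
use strict;
use warnings;

# Hypothetical helper: split one hmlg.ftp line on "|" and trim each field,
# then report which column (1-based) holds the second organism's
# LocusLink ID. Returns an empty list if neither column has one.
sub second_org_locuslink {
    my ($line) = @_;
    chomp $line;
    my @f = map { my $v = $_; $v =~ s/^\s+|\s+$//g; $v } split /\|/, $line, -1;
    return (7, $f[6]) if defined $f[6] && $f[6] =~ /^LL\.\d+$/;  # column the README documents
    return (8, $f[7]) if defined $f[7] && $f[7] =~ /^LL\.\d+$/;  # exception seen in the file
    return;
}

for my $line (
    'Xl|Hs|t| |1091 |AB045628 |LL.9698 |153834 |D43951 |75.35',
    'Xl|Dm|B| |1091 |AB045628 | |LL.41094 | |68.27',
) {
    my ($col, $id) = second_org_locuslink($line);
    print defined $col ? "column $col: $id\n" : "no LocusLink ID found\n";
}
```

Run on the two rows quoted above, this prints "column 7: LL.9698" and then "column 8: LL.41094".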
Why don't they put the LL into the seventh column, and leave the eighth UG column blank? Or -- if there's some reason that absolutely requires putting LL in column 8 -- can't they at least tell us in the README that they'll do so? Anyway, I guess this isn't terribly Bioperlish, except in terms of a warning for writing parsers. -Amir CuraGen Corporation LEGAL NOTICE - Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this e-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately. From jason@cgt.mc.duke.edu Wed Feb 20 14:01:05 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Wed, 20 Feb 2002 09:01:05 -0500 (EST) Subject: [Bioperl-l] Motif finding In-Reply-To: <3C739B4E.4555.1A0C1B66@localhost> Message-ID: I would consider using EMBOSS's fuzzynuc. We do have a module that does a similar thing called Bio::Tools::SeqPattern. -jason On Wed, 20 Feb 2002, Desmond Lim wrote: > Is there a way of finding motifs in a strand of DNA? > The thing is, I want to find exact matches and some fuzzy ones (i.e 80% exact). Is > there a perl module to do it? > > Thanks. > > Desmond > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From jason@cgt.mc.duke.edu Wed Feb 20 14:06:43 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Wed, 20 Feb 2002 09:06:43 -0500 (EST) Subject: [Bioperl-l] re: blast database() In-Reply-To: Message-ID: Seth - As has been mentioned on the list and as is making its way into the new FAQ - bugs or problems with Bio::Tools::Blast are likely not going to get fixed. 
Consider switching to Bio::SearchIO available in the 1.0alpha or Bio::Tools::BPlite available in 0.7.X and on. -jason On Wed, 20 Feb 2002, Seth Redmond wrote: > I'm having some trouble getting the blast::hits... database() method to > work. (i.e. to find the exact fasta sequence I'm matching against in my > database. > > $database = @hits[$j]->database(); > > returns a dash instead of the database name. I've tried a number of > different databases with this. Are there any relevant examples which I > might have a look at? Anyone have any advice? > > thanks > > -s > > -- > ______________________________________________ > Seth Redmond > > DNA resource and Database Curator > Wellcome Trust Laboratories for Molecular Parasitology > Department of Biological Sciences > Imperial College > London > SW7 2AY > ______________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From b_i_osborne@hotmail.com Wed Feb 20 15:20:23 2002 From: b_i_osborne@hotmail.com (Brian Osborne) Date: Wed, 20 Feb 2002 10:20:23 -0500 Subject: [Bioperl-l] Motif finding References: <3C739B4E.4555.1A0C1B66@localhost> Message-ID: Desmond, Jason mentioned the SeqPattern module and tools from EMBOSS. There's also a CPAN module called String::Approx that will do the "find 80% exact" (http://search.cpan.org/doc/JHI/String-Approx-3.18/Approx.pm). Brian O. ----- Original Message ----- From: "Desmond Lim" To: Sent: Tuesday, February 19, 2002 11:49 PM Subject: [Bioperl-l] Motif finding > Is there a way of finding motifs in a strand of DNA? > The thing is, I want to find exact matches and some fuzzy ones (i.e 80% exact). Is > there a perl module to do it? > > Thanks. 
>
> Desmond
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

From Boris.Lenhard@cgb.ki.se Wed Feb 20 18:41:19 2002
From: Boris.Lenhard@cgb.ki.se (Boris Lenhard)
Date: 20 Feb 2002 19:41:19 +0100
Subject: [Bioperl-l] Re: Motif finding
In-Reply-To: <200202201348.g1KDmckO028264@pw600a.bioperl.org>
References: <200202201348.g1KDmckO028264@pw600a.bioperl.org>
Message-ID: <1014230480.2643.96.camel@lorien.cgb.ki.se>

>
> Is there a way of finding motifs in a strand of DNA?
> The thing is, I want to find exact matches and some fuzzy ones (i.e 80% exact). Is
> there a perl module to do it?
>
> Thanks.
>
> Desmond
>

Try TFBS at http://forkhead.cgb.ki.se/TFBS/ . But please wait a few hours, I am uploading the 0.3 release tonight. It represents DNA patterns using matrices, but has modules for converting a set of DNA motifs to matrix representation. In your case, if you have e.g. motif "ACATTAGATTT", you would do

my $patterngen =
    TFBS::PatternGen::SimplePFM->new(-sequences=>["ACATTAGATTT"]);
my $frequency_matrix = $patterngen->pattern;
my $weight_matrix = $frequency_matrix->to_PWM;

# suppose you want to scan a sequence in a Bio::Seq object
# called $seqobj, with 80% score threshold
my $binding_site_set =
    $weight_matrix->search_seq(-seqobj=>$seqobj, -threshold=>"80%");

# to loop through the $binding_site_set, do
my $iterator = $binding_site_set->iterator;
while (my $binding_site = $iterator->next) {
    # do whatever you want with $binding_site;
    # $binding_site is a TFBS::Site object,
    # which is a subclass of Bio::SeqFeature::Generic
    # and has all its functionality
}

There are other ways to go, too.

Cheers,
Boris

#####################################
Boris Lenhard, Ph.D.
Center for Genomics and Bioinformatics Karolinska Institutet Berzelius väg 35, B322 171 77 Stockholm, SWEDEN Phone: +46 (0)8 728 6142 FAX: +46 (0)8 32 48 26 E-mail: Boris.Lenhard@cgb.ki.se ##################################### From b_i_osborne@hotmail.com Wed Feb 20 18:01:20 2002 From: b_i_osborne@hotmail.com (Brian Osborne) Date: Wed, 20 Feb 2002 13:01:20 -0500 Subject: [Bioperl-l] PPSEARCH, PRINTS, PRFSCAN troubles Message-ID: This is a multi-part message in MIME format. ------=_NextPart_000_0114_01C1BA0E.B143BEC0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable bioperl-l, The Parse.pm modules in these three Tools directories all contain the statement "use SeqFeatureSet", which doesn't exist in bioperl-live, as you know. What these modules are attempting to do is parse the results files from these programs (fingerPRINTScan in the case of PRINTS) and make matches into SeqFeature::Generic objects, then use add_Feature to add the SeqFeatures to the "set". Definitely a nice use of the SeqFeature feature, in principle. I could "mess" around in this code but there appear to be no test data files in t/, though I could be mistaken of course. Could the authors step in and help out? Fix the Parse.pm modules and/or provide some results files? Thank you, Brian O. ------=_NextPart_000_0114_01C1BA0E.B143BEC0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0114_01C1BA0E.B143BEC0-- From andrew@anatomy.otago.ac.nz Wed Feb 20 21:12:56 2002 From: andrew@anatomy.otago.ac.nz (Andrew Macgregor) Date: Thu, 21 Feb 2002 10:12:56 +1300 Subject: [Bioperl-l] Homologene parser... In-Reply-To: <3C735B1E.742F078B@ebi.ac.uk> References: <3C735B1E.742F078B@ebi.ac.uk> Message-ID: Heikki Lehvaslaiho wrote: > >Have a look at Bio::SeqIO/* parsers. They have a read_seq() method. The >code below would go in there (grammar into the BEGIN block?) and instead of >printing values out, the code should create relevant objects (In case of >homologene, you'd have to write them first.). > >I am not quite sure how the flow of code fits into that model, but revision >is being considered right now, so this came at the right time. Hi Heikki, Thanks for the feedback and suggestions. I'll take a look at Bio::SeqIO/* etc and see if I can work out where I can go with this. Cheers, Andrew. -- ___________________________________________ Andrew Macgregor Bioinformatics Programmer & Database Administrator Molecular Embryology Group Department of Anatomy & Structural Biology University of Otago, Dunedin, New Zealand andrew.macgregor@stonebow.otago.ac.nz Telephone: +64 3 479 7873 http://anatomy.otago.ac.nz/meg ___________________________________________ From b_i_osborne@hotmail.com Wed Feb 20 21:26:04 2002 From: b_i_osborne@hotmail.com (Brian Osborne) Date: Wed, 20 Feb 2002 16:26:04 -0500 Subject: [Bioperl-l] Ppsearch, Prints, Prfscan troubles Message-ID: This is a multi-part message in MIME format. ------=_NextPart_000_0037_01C1BA2B.4AF33F20 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable bioperl-l,=20 The Parse.pm modules in these three Tools directories (PPSEARCH, PRINTS, = PRFSCAN) all contain the statement "use SeqFeatureSet", which doesn't = exist in 1.0. 
What these modules are doing is parsing the results files = from these programs (fingerPRINTScan in the case of PRINTS) and making = matches into SeqFeature::Generic objects, then using add_Feature to add = the SeqFeatures to the "set". Definitely a nice use of the SeqFeature = feature, in principle. My guess is that there's no longer any such thing as a set of = SeqFeatures separate from sequence objects, is that correct? Should = these modules simply return arrays of SeqFeature::Generic objects? I could "mess" around in this code but there appears to be no test data = files in t/. Could the authors step in and help out? Fix the Parse.pm = modules and/or provide some results files that I could use? Thank you, Brian O. ------=_NextPart_000_0037_01C1BA2B.4AF33F20 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0037_01C1BA2B.4AF33F20--

From andreas.matern@lbri.lionbioscience.com Wed Feb 20 22:18:23 2002
From: andreas.matern@lbri.lionbioscience.com (Andreas Matern)
Date: Wed, 20 Feb 2002 17:18:23 -0500
Subject: [Bioperl-l] bug in genbank.pm
References: <37F6069F8626D4119F22009027E409AB02003F3D@excsrv37.mayo.edu>
Message-ID: <3C7420AF.42DB1836@lbri.lionbioscience.com>

Has this been fixed? Just wondering....

"Wang, Kai" wrote:
>
> I pointed out this problem about two months ago, but nobody changed it. The
> new GenBank file format add a "molecular shape" in the LOCUS line so current
> genbank.pm cannot process it.
>
> in the file:
>
> # $Id: genbank.pm,v 1.46 2002/02/14 16:41:22 jason Exp $
> if (($2 eq 'bp') || defined($5)) {
>     if ($4 eq 'circular') {
>         $seq->molecule($3);
>         $seq->is_circular($4);
>         $seq->division($5);
>         ($date) = $line =~ /.*(\d\d-\w\w\w-\d\d\d\d)/;
>     } else {
>         $seq->molecule($3);
>         $seq->division($4);
>         $date = $5;
>     }
> } else {
>     $seq->molecule('PRT') if($2 eq 'aa');
>     $seq->division($3);
>     $date = $4;
> }
>
> The above code was based on the wrong assumption that NCBI will not add
> 'linear' tag to a record.
> One example is accession number 'NM_003748'. The first line is:
>
> LOCUS NM_003748 3134 bp mRNA linear PRI 01-NOV-2000
>
> The current genbank.pm cannot recognize 01-NOV-2000.
>
> I think the best way is to use: $line =~
> /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(\S+)?\s+(\w\w\w)?\s+(\d\d-\w\w\w-\d\d\d\d)?/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

--
------------------
Andreas Matern
Bioinformatician
LION Bioscience Research, Inc.
141 Portland Street, 10th Floor
Cambridge, MA 02139
andreas.matern@lbri.lionbioscience.com
phone: (617) 245-5483
fax: (617) 245-5499

From birney@ebi.ac.uk Wed Feb 20 22:38:05 2002
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 20 Feb 2002 22:38:05 +0000 (GMT)
Subject: [Bioperl-l] PPSEARCH, PRINTS, PRFSCAN troubles
In-Reply-To:
Message-ID:

They are modules and will be pruned (or should be pruned) before release.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 .
-----------------------------------------------------------------

From fabien.coutellier@info-sciences.univ-orleans.fr Thu Feb 21 10:42:01 2002
From: fabien.coutellier@info-sciences.univ-orleans.fr ( Fabien.COUTELLIER)
Date: Thu, 21 Feb 2002 11:42:01 +0100
Subject: [Bioperl-l] bioperl on windows
Message-ID: <3C74CEF9.540B6C3C@info-sciences.univ-orleans.fr>

I'm not a bioperl member, but I'm designing an application using bioperl.
I work on the Sun Solaris operating system, but I would be grateful if it
could also run on Windows 98 or 2000.
My question is: can I expect full compatibility with Windows and, if
not, is there any way to make it compatible?

From jason@cgt.mc.duke.edu Thu Feb 21 13:18:46 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Thu, 21 Feb 2002 08:18:46 -0500 (EST)
Subject: [Bioperl-l] bioperl on windows
In-Reply-To: <3C74CEF9.540B6C3C@info-sciences.univ-orleans.fr>
Message-ID:

Entirely dependent on how you write your perl. But yes, you should expect
full compatibility if you write vanilla perl code or are careful about
the components that use system resources.

Places to watch out:
- Files (use File::Spec->catfile to construct file paths, not '/')
- External programs (can't be expected to work unless the programs are
  identical on both platforms, a la grep)
- Indexes: Berkeley DB (Bio::Index:: modules) has some trouble if the
  latest DB_File.pm module is not installed on the Windows machine.
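
As a rough illustration of the File::Spec advice above, here is a minimal sketch. It assumes only core Perl; the directory and file names are hypothetical and not taken from any bioperl code.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical path components, chosen only for illustration.
my @parts = ('data', 'sequences', 'test.fasta');

# catfile joins the components with the separator of the current
# platform instead of a hard-coded '/'.
my $path = File::Spec->catfile(@parts);

# catdir does the same for directories; tmpdir avoids hard-coding /tmp.
my $index_dir = File::Spec->catdir(File::Spec->tmpdir, 'bioperl_index');

print "sequence file: $path\n";
print "index dir:     $index_dir\n";
```

The same script should then build sensible paths on both Solaris and Windows, because the separator is chosen by File::Spec at run time rather than written into the code.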
On Thu, 21 Feb 2002, Fabien.COUTELLIER wrote: > I m no bioperl member but I m designing an application using bioperl. > I work on Sun Solaris operating system but it would be gratefull if it > could run on Windows 98 or 2000. > My question is : can I expect a full compatibility with Windows and if > not, is there any issue to make it compatible. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From herename@hotmail.com Thu Feb 21 14:34:41 2002 From: herename@hotmail.com (Name Here) Date: Thu, 21 Feb 2002 09:34:41 -0500 Subject: [Bioperl-l] Running FastA remotely? Message-ID: Hi! Any advice on running FastA remotely -- ie. is there a service like that for Blast? Thanks! -kyle _________________________________________________________________ Send and receive Hotmail on your mobile device: http://mobile.msn.com From birney@ebi.ac.uk Thu Feb 21 17:13:23 2002 From: birney@ebi.ac.uk (Ewan Birney) Date: Thu, 21 Feb 2002 17:13:23 +0000 (GMT) Subject: [Bioperl-l] Hinxton Genome Informatics Meeting [apologies for cross posting] Message-ID: Just to announce that a repeat of last year's very successful Hinxton Genome Informatics, jointly held by CSHL and the Wellcome Trust is being run again this year, Sept 4th-8th. Web site with registration details can be found at http://meetings.cshl.org/hinxtonhall.htm (close of registration is a long way off) Next year the venue is likely to switch to Cold Spring Harbor. We expect a lively, engaged group of scientists talking about Genome Informatics problems and solutions. Ewan ----------------------------------------------------------------- Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420 . 
----------------------------------------------------------------- From chapmanb@arches.uga.edu Fri Feb 22 12:21:35 2002 From: chapmanb@arches.uga.edu (Brad Chapman) Date: Fri, 22 Feb 2002 07:21:35 -0500 Subject: [Bioperl-l] Following the Biohackathon Message-ID: <20020222072135.A25305@ci350185-a.athen1.ga.home.com> [Apologies for the cross-posting. I know, I get 10 copies of this message too.] Hello all; As many of you may know, quite a few people involved with the open-bio projects (http://www.open-bio.org/) are participating in a two-part hackathon, sponsored by the nice folks at O'Reilly and Electric Genetics. This hackathon gives the motley crew involved with various open source bioinformatics projects (ie. BioPerl, BioJava, Biopython, DAS, Ensembl, BioRuby, GO, MOBY, OmniGene...) a chance to get together and get some serious hacking done. The first part took place at the O'Reilly Bioinformatics Conference in Tucson during the end of January. Due to the fact that all of us involved are hard working and smarter than your average bear, we accomplished quite a bit of good stuff and were very proud of ourselves. Big pats on the back all around. This mail is to let you know that the second part of the hackathon will be taking place next week in Cape Town, South Africa. You can find tons of detail about the schedule and plans on the Electric Genetics website: http://www.egenetics.com/?Section=Biohackathon_details&Parent=open_source If you're keen to follow the exciting world of open-source bioinformatics hacking play-by-play, we'll be fully on-line at: http://www.technophage.com This page will feature technical details of what we're working on, amazing pictures of South Africa, and color commentary featuring all of those nasty tidbits you've always wanted to hear about your favorite bioinformatics programmer. 
Although-there-certainly-won't-be-anything-juicy-about-me-ly yr's,
Brad
--
PGP public key available from http://pgp.mit.edu/

From jason@cgt.mc.duke.edu Fri Feb 22 13:26:37 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Fri, 22 Feb 2002 08:26:37 -0500 (EST)
Subject: [Bioperl-l] [Bioperl-guts-l] Notification: incoming/1099 (fwd)
Message-ID:

You are not using the Bio::SeqIO module correctly. Please read the
documentation about Bio::SeqIO available in the POD (at
http://docs.bioperl.org or on your machine - % perldoc Bio::SeqIO).
Additionally the bioperl tutorial will help you as well - available linked
from our website.

Bio::SeqIO is a stream of data - not a single sequence. Your code is
corrected below with >>>> in front of lines that have been changed or
added.

use Bio::SeqIO;
$in = Bio::SeqIO->new ('-file' => "test.txt",
                       '-format' => "Fasta");
require Bio::Tools::RestrictionEnzyme;
$re1 = new Bio::Tools::RestrictionEnzyme(-NAME =>'EcoRI');
>>>>>>>>>>>>my $seq = $in->next_seq;
>>>>>>>>>>>>$locations = $re1->cut_locations($seq);
$first = ${$locations}[1];
print " the first location is $first";

-jason

--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

---------- Forwarded message ----------
Date: Fri, 22 Feb 2002 00:17:03 -0500
From: bioperl-bugs@bioperl.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] Notification: incoming/1099

JitterBug notification

new message incoming/1099

Message summary for PR#1099
From: Guojun Yang
Subject: Is this a bug?
Date: Thu, 21 Feb 2002 23:22:39 -0600 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From guojun@idmb.tamu.edu Fri Feb 22 00:17:03 2002 Received: from nobel.idmb.tamu.edu ([165.91.108.110]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g1M5H2kO013633 for ; Fri, 22 Feb 2002 00:17:02 -0500 Received: by nobel.idmb.tamu.edu with Internet Mail Service (5.5.2653.19) id <1LLZCHPC>; Thu, 21 Feb 2002 23:22:41 -0600 Message-ID: <9DB60E0436608046813742ED5E9989EF01D1E1@nobel.idmb.tamu.edu> From: Guojun Yang To: "'bioperl-bugs@bio.perl.org'" Subject: Is this a bug? Date: Thu, 21 Feb 2002 23:22:39 -0600 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C1BB60.F21A02D0" This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible. ------_=_NextPart_001_01C1BB60.F21A02D0 Content-Type: text/plain; charset="iso-8859-1" Dear Bioperl, Thank you for maintain the bipperl, I tried to write a short script to get the locaiton of enzymatic cuts, it gave me information like this: Can't locate object method "seq" via package "Bio::SeqIO::fasta" (perhaps you fo rgot to load "Bio::SeqIO::fasta"?) at C:/Perl/site/lib/Bio/Tools/RestrictionEnzy me.pm line 670. my script is: use Bio::SeqIO; $in = Bio::SeqIO->new ('-file' => "test.txt", '-format' => "Fasta"); require Bio::Tools::RestrictionEnzyme; $re1 = new Bio::Tools::RestrictionEnzyme(-NAME =>'EcoRI'); $locations = $re1->cut_locations($in); $first = ${$locations}[1]; print " the first location is $first"; Is it becaused of a bug in .../RestrictionEnzyme.pm? I am using a Windows 2000 OS? Or could you give me a hint on how to do it? Thank you very much, Guojun ------_=_NextPart_001_01C1BB60.F21A02D0 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Is this a bug?

------_=_NextPart_001_01C1BB60.F21A02D0-- _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-guts-l From chad@sausage.usask.ca Fri Feb 22 16:43:15 2002 From: chad@sausage.usask.ca (Chad Matsalla) Date: Fri, 22 Feb 2002 10:43:15 -0600 (CST) Subject: [Bioperl-l] alignment and assembly Message-ID: Jason wrote: > See the Bio::AlignIO for how to read in alignments from files. We don't > interface with phrap or the tigr assembler at this point. Yes we do. See Bio::Tools::Alignment::Consed.pm to work with phrap alignments. It really doesn't do pair-by-pair comparisons but it will help when using phrap to cluster sequences and things. If anybody would like to help I would like to create more bioperl-compliant alignment objects out of consed.pm but at the time I created it this wasn't an issue. Chad Matsalla From chad@sausage.usask.ca Fri Feb 22 17:48:45 2002 From: chad@sausage.usask.ca (Chad Matsalla) Date: Fri, 22 Feb 2002 11:48:45 -0600 (CST) Subject: [Bioperl-l] Creating and retrieving seqfeatures by name Message-ID: Hi All, I would like to create SeqFeatures and retrieve them by name. Here is some example code: my $seq = new Bio::Seq( -seq => 'atatatatatatatatatatatatatatatatatatatataaatatatatatatatatatata', -primary_id => 'Chad1'); my $feature = new Bio::SeqFeature::Generic( -start => '10', -end => '20'); $feature->attach_seq( new Bio::PrimarySeq(-seq => $sequence, -display_id => "Chads_kewl_feature") ); $seq->add_SeqFeature($feature); So, now how can I get the sequence for "Chads_kewl_feature", by name, from $seq? 
This is what I want:

my $chads_sequence = $seq->get_feature_sequence(-feature_name =>
                                                'Chads_kewl_feature');

_or even better_

my $feature = $seq->get_feature_by_name(-name=>'Chads_kewl_feature');

This is what I was doing:

my @features = $seq->all_SeqFeatures();
foreach (@features) {
    if ($_->seqname() eq "Chads_kewl_feature") {
        print("Chads_kewl_feature's sequence is:\n");
        print("\t ".$_->entire_seq()->seq()."\n");
    }
}

Thanks for your help,

Chad Matsalla

--
"The POP3 server service depends on the SMTP server service, which failed
to start because of the following error: The operation completed
successfully." - Windows NT Server v3.51

From jason@cgt.mc.duke.edu Fri Feb 22 18:03:10 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Fri, 22 Feb 2002 13:03:10 -0500 (EST)
Subject: [Bioperl-l] Creating and retrieving seqfeatures by name
In-Reply-To:
Message-ID:

Are you interested in the getting-a-feature-by-name part, or are you
interested in the getting-the-sequence-for-a-feature part?

# get sequences for a feature
$seq->subseq($feature->location()) ; # to get back a string
$seq->trunc($feature->location()) ;  # to get back a Bio::SeqI

# get features by name
my @feats;
foreach my $f ( $seq->top_SeqFeatures() ) {
    if( $f->primary_id eq $name ) {
        push @feats,$f;
    }
}

I guess it would be nice to have a shortcut method so that one could do
this:

# remember that we don't require features to be unique!
my @feats = $seq->get_features_by_id($name);
foreach my $f ( @feats ) {
    print $seq->subseq($f->location);
}

I'm pretty sure that entire_seq is indeed the entire sequence the feature
is attached to, but not necessarily the subseq that the feature is on.
Hmm, I need to check this in my code audit... Happening on the plane
tomorrow, I hope.

-j

On Fri, 22 Feb 2002, Chad Matsalla wrote:

>
> Hi All,
>
> I would like to create SeqFeatures and retrieve them by name.
> > Here is some example code: > > my $seq = new Bio::Seq( -seq => > 'atatatatatatatatatatatatatatatatatatatataaatatatatatatatatatata', > -primary_id => 'Chad1'); > my $feature = new Bio::SeqFeature::Generic( -start => '10', > -end => '20'); > $feature->attach_seq( > new Bio::PrimarySeq(-seq => $sequence, > -display_id => "Chads_kewl_feature") > ); > $seq->add_SeqFeature($feature); > > > So, now how can I get the sequence for "Chads_kewl_feature", by name, > from $seq? > > This is what I want: > > my $chads_sequence = $seq->get_feature_sequence(-feature_name => > 'Chads_kewl_feature"); > > _or even better_ > > my $feature = $seq->get_feature_by_name(-name=>'Chads_kewl_feature'); > > This is what I was doing: > my @features = $seq->all_SeqFeatures(); > foreach (@features) { > if ($_->seqname() eq "Chads_kewl_feature") { > print("Chads_kewl_feature's sequence is:\n"); > print("\t ".$_->entire_seq()->seq()."\n"); > } > } > > Thanks for your help, > > Chad Matsalla > > > > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From ydzhang@iastate.edu Fri Feb 22 18:12:20 2002 From: ydzhang@iastate.edu (Yuandan Zhang) Date: Fri, 22 Feb 2002 12:12:20 -0600 Subject: [Bioperl-l] blast parsing for multiple blast output in one file In-Reply-To: <200202221704.g1MH4WkO020158@pw600a.bioperl.org> Message-ID: <4.2.0.58.20020222120503.00a32f00@ydzhang.mail.iastate.edu> Hi, I have a multiple blast output stored in one file. This file was generated by a NCBI blast batch run. I tried to parse it using Bio::Tools::Blast. However, this module parses one blast result from one file or multiple blast results from multiple files. I am reluctant to split the multiple blast results into a number of files, each file contains one blast output, because this will generate a few thousands of files. Any advice on parsing multiple blast output stored in one file? Thanks, Yuandan -- Yuandan Zhang, Ph.D. 
Animal Science, Iowa State University
2255 Kildee Hall, Ames IA 50011-3150 USA
E-mail: ydzhang@iastate.edu
Phone: (515) 294 6114 (office)
Fax: (515) 294 2401
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
System Support for:
ANGENMAP Maillist angenmap@db.genome.iastate.edu
U.S. Pig Genome Project http://www.genome.iastate.edu/
Pig EST Project http://pigest.genome.iastate.edu

.***. .***. .***. .***. .***.
* | | | * | | | * * | | | * | | | * * | | |
* | | | * * | | | * * | | | * * | | | * * | | | *
* | | | * * | | | * | | | * * | | | * | | | *
'***' '***' '***' '***' '***'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From jason@cgt.mc.duke.edu Fri Feb 22 18:23:35 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Fri, 22 Feb 2002 13:23:35 -0500 (EST)
Subject: [Bioperl-l] blast parsing for multiple blast output in one file
In-Reply-To: <4.2.0.58.20020222120503.00a32f00@ydzhang.mail.iastate.edu>
Message-ID:

use Bio::Tools::BPlite or Bio::SearchIO.

to use Bio::Tools::BPlite with multiple reports do:

use Bio::Tools::BPlite;
my $report = new Bio::Tools::BPlite(-file=>$filename);
{
    $report->query;
    $report->database;
    while(my $sbjct = $report->nextSbjct) {
        $sbjct->name;
        while (my $hsp = $sbjct->nextHSP) {
            $hsp->score;
            $hsp->bits;
            $hsp->percent;
            $hsp->P;
            $hsp->match;
            $hsp->positive;
            $hsp->length;
            $hsp->querySeq;
            $hsp->sbjctSeq;
            $hsp->homologySeq;
            $hsp->query->start;
            $hsp->query->end;
            $hsp->hit->start;
            $hsp->hit->end;
            $hsp->hit->seqname;
            $hsp->hit->overlaps($exon);
        }
    }
    # the following line takes you to the next report in the stream/file
    # it will return 0 if that report is empty,
    # but that is valid for an empty blast report.
    # Returns -1 for EOF.
    last if ($report->_parseHeader == -1);
    redo;
}

to use Bio::SearchIO with multiple reports do:

use Bio::SearchIO;
my $stream = new Bio::SearchIO(-format => 'blast',
                               -file => $filename);
# iterate through all the reports in a single file
while( my $result = $stream->next_result ) {
    # iterate through all the hits in a result
    while( my $hit = $result->next_hit ) {
        # see Bio::Search::Hit::HitI for available methods
        # iterate through all the hsps for a hit
        while( my $hsp = $hit->next_hsp ) {
            # see Bio::Search::HSP::HSPI for available methods
        }
    }
}

-jason

On Fri, 22 Feb 2002, Yuandan Zhang wrote:

> Hi,
>
> I have a multiple blast output stored in one file. This file was generated by a NCBI blast batch run. I tried to parse it using Bio::Tools::Blast. However, this module parses one blast result from one file or multiple blast results from multiple files. I am reluctant to split the multiple blast results into a number of files, each file contains one blast output, because this will generate a few thousands of files. Any advice on parsing multiple blast output stored in one file?
>
> Thanks,
>
> Yuandan
>
> --
> Yuandan Zhang, Ph.D.
> Animal Science, Iowa State University
> 2255 Kildee Hall, Ames IA 50011-3150 USA
> E-mail: ydzhang@iastate.edu
> Phone: (515) 294 6114 (office)
> Fax: (515) 294 2401
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> System Support for:
> ANGENMAP Maillist angenmap@db.genome.iastate.edu
> U.S. Pig Genome Project http://www.genome.iastate.edu/
> Pig EST Project http://pigest.genome.iastate.edu
>
> .***. .***. .***. .***. .***.
> * | | | * | | | * * | | | * | | | * * | | | > * | | | * * | | | * * | | | * * | | | * * | | | * > * | | | * * | | | * | | | * * | | | * | | | * > '***' '***' '***' '***' '***' > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From sac@bioperl.org Sat Feb 23 08:38:38 2002 From: sac@bioperl.org (Steve Chervitz) Date: Sat, 23 Feb 2002 00:38:38 -0800 (PST) Subject: [Bioperl-l] re: blast database() In-Reply-To: Message-ID: <20020223083838.70118.qmail@web13706.mail.yahoo.com> The Bio::Tools::Blast::Sbjct::database() method no longer provides any useful information. It's been unofficially deprecated for some time. The reason is that extracting database info from the sequence identifier in a BLAST hit was highly error-prone, so this is no longer done. The Bio::Tools::Blast::database() probably gets you what you need. Or, better, switch over to the new Bio::SearchIO facility as Jason mentioned. The Result object has a database_name() method. (Again, note that there is no hit-specific database method in the new SearchIO system). Steve -- Steve Chervitz sac@bioperl.org --- Seth Redmond wrote: > I'm having some trouble getting the blast::hits... database() method to > work. (i.e. to find the exact fasta sequence I'm matching against in my > database. > > $database = @hits[$j]->database(); > > returns a dash instead of the database name. I've tried a number of > different databases with this. Are there any relevant examples which I > might have a look at? Anyone have any advice? 
>
> thanks
>
> -s
>
> --
> ______________________________________________
> Seth Redmond
>
> DNA resource and Database Curator
> Wellcome Trust Laboratories for Molecular Parasitology
> Department of Biological Sciences
> Imperial College
> London
> SW7 2AY
> ______________________________________________
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l

__________________________________________________
Do You Yahoo!?
Yahoo! Sports - Coverage of the 2002 Olympic Games
http://sports.yahoo.com

From andreas.matern@lbri.lionbioscience.com Mon Feb 25 19:50:12 2002
From: andreas.matern@lbri.lionbioscience.com (Andreas Matern)
Date: Mon, 25 Feb 2002 14:50:12 -0500
Subject: [Bioperl-l] validating a sequence
Message-ID: <3C7A9574.88648266@lbri.lionbioscience.com>

Forgive me if this answer occurs somewhere else, but. . .

I need to validate FASTA sequences. The web interface (another
developer, can't touch his code) allows users to cut and paste, and many
of them cut and paste sequences with numbers in them (i.e.

>mysequence
1ACACGATCGACTGACATCGTCAGTACGTCGATACGATCGACTGACTAGCTC
51AACTCGTCGTCGTCGTCGCTGCTCGTCGCTGCTCGTCTGCTCGTCGTC
etc.)

The FASTA file is turned into a Bio::Index::Fasta by a cron job

And then I (normally) run

@ids = $inx->get_all_primary_ids();
foreach $id (@ids) {
    my $seq = $inx->getch($id);
    ....do stuff with seq....
    ....connect to database...
    ....etc....
} This of course dies when the $seq is screwey ( MSG: Attempting to set the sequence to [1ACA....] which does not look healthy I see the $seq->validate_seq, but I'm not sure how to use it in my context Any suggestions, especially for stripping out non-IUPAC characters from a FASTA string, would be greatly appreciated... -Andreas -- ------------------ Andreas Matern Bioinformatician LION Bioscience Research, Inc. 141 Portland Street, 10th Floor Cambridge, MA 02139 andreas.matern@lbri.lionbioscience.com phone: (617) 245-5483 fax: (617) 245-5499 From kosth@hotmail.com Mon Feb 25 21:18:51 2002 From: kosth@hotmail.com (Kostas Thalassinos) Date: Mon, 25 Feb 2002 21:18:51 +0000 Subject: [Bioperl-l] Is there a module for reading in ClustalX files Message-ID: Could somebody please tell me how i could read in my program a CLustalX file (*.aln)? Thank you Kostas Thalassinos _________________________________________________________________ Chat with friends online, try MSN Messenger: http://messenger.msn.com From Guoneng.Zhong@med.nyu.edu Mon Feb 25 21:21:20 2002 From: Guoneng.Zhong@med.nyu.edu (Guoneng Zhong) Date: Mon, 25 Feb 2002 16:21:20 -0500 Subject: [Bioperl-l] oh, the problem Message-ID: <9CFA27A8-2A35-11D6-84A8-0050E41E5C1B@med.nyu.edu> Hi, Regarding previous posting, I forgot to mention that when I tried to force the output file from sim4 program to AlignIO, I get something like: Illegal division by zero at /Library/Perl/Bio/SimpleAlign.pm line 728... G From Guoneng.Zhong@med.nyu.edu Mon Feb 25 21:18:20 2002 From: Guoneng.Zhong@med.nyu.edu (Guoneng Zhong) Date: Mon, 25 Feb 2002 16:18:20 -0500 Subject: [Bioperl-l] what format is this and how I can use AlignIO Message-ID: <31A933A7-2A35-11D6-84A8-0050E41E5C1B@med.nyu.edu> Hi, Sorry for the ignorance on this matter. 
I am trying to use the output of this standalone program called sim4 to create some sort of Alignment object. The output looks like the following. I thought perhaps BioPerl can give me an object that tells me exactly which nucleotide matches with which across the two aligned strands (that's what AlignIO does, right?). seq1 = sim4seq4HyyX8K.seq, 2460 bp seq2 = sim4seqOD34cJI.seq ((no header)), 876 bp (complement) 0 . : . : . : . : . : 1567 TGACAAGAGCACTGGCAAGGAGAACAAAATCACTATCACTAATGATAAGG ||||||||||||||||| |||||-|| |||||||||| | | | | 1 TGACAAGAGCACTGGCATGGAGA CAnAATCACTATCmcTanyGayAmGs 50 . : . : . : . : . : 1617 GTCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG | |||||||||||||||||||||||||||||||||||||||||||||||| 50 GtCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG 100 . : . : . : . : . : 1667 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG |||||||||||||||||||||||||||||||||||||||||||||||||| 100 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG Any help? Thanks, G 
From b_i_osborne@hotmail.com Mon Feb 25 22:16:36 2002 From: b_i_osborne@hotmail.com (Brian Osborne) Date: Mon, 25 Feb 2002 17:16:36 -0500 Subject: [Bioperl-l] Is there a module for reading in ClustalX files References: Message-ID: Kostas, You should take a look at the bptutorial, there's a section on reading alignment files, and clustal is one of the supported formats. Did you look at this section? Brian O. ----- Original Message ----- From: "Kostas Thalassinos" To: Sent: Monday, February 25, 2002 4:18 PM Subject: [Bioperl-l] Is there a module for reading in ClustalX files > Could somebody please tell me how i could read in my program a CLustalX file > (*.aln)? > > Thank you > Kostas Thalassinos > From jason@cgt.mc.duke.edu Tue Feb 26 07:28:37 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 26 Feb 2002 02:28:37 -0500 (EST) Subject: [Bioperl-l] what format is this and how I can use AlignIO In-Reply-To: <31A933A7-2A35-11D6-84A8-0050E41E5C1B@med.nyu.edu> Message-ID: We currently don't have a parser for sim4 as alignments - we handle sim4 as an exon identifier see Bio::Tools::Sim4::Results for more info. I bet it wouldn't be terribly hard to make an alignio module using our existing parser. -jason On Mon, 25 Feb 2002, Guoneng Zhong wrote: > Hi, > > Sorry for the ignorance on this matter. I am trying to use the output > of this standalone program called sim4 to create some sort of Alignment > object. The output looks like the following. 
I thought perhaps BioPerl > can give me an object that tells me exactly which nucleotide matches > with which across the two aligned strands (that's what AlignIO does, > right?). > > > seq1 = sim4seq4HyyX8K.seq, 2460 bp > seq2 = sim4seqOD34cJI.seq ((no header)), 876 bp > > > (complement) > > 0 . : . : . : . : . : > 1567 TGACAAGAGCACTGGCAAGGAGAACAAAATCACTATCACTAATGATAAGG > ||||||||||||||||| |||||-|| |||||||||| | | | | > 1 TGACAAGAGCACTGGCATGGAGA CAnAATCACTATCmcTanyGayAmGs > > 50 . : . : . : . : . : > 1617 GTCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG > | |||||||||||||||||||||||||||||||||||||||||||||||| > 50 GtCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG > > 100 . : . : . : . : . : > 1667 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG > |||||||||||||||||||||||||||||||||||||||||||||||||| > 100 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG > > Any help? > > Thanks, > G > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From schan@xenongenetics.com Tue Feb 26 16:38:22 2002 From: schan@xenongenetics.com (Simon Chan) Date: Tue, 26 Feb 2002 08:38:22 -0800 Subject: [Bioperl-l] alignment of 2 sequences/ FASTA Message-ID: Hi, I would like to align 2 sequences. Kind of hard to explain what I need so I'll use an example: seq 1: abab seq 2: nnnnababnnnnn How can I get the script to output that the match between seq 1 and seq2 starts at position 5 and and ends at position 8? I was told to run the FASTA program on the seq1 and seq2 and that should get what I want, however, there doesn't seem to be a module that will perform FASTA comparisons...? Thanks, Everybody. simon ################################### From Wiepert.Mathieu@mayo.edu Tue Feb 26 18:22:57 2002 From: Wiepert.Mathieu@mayo.edu (Wiepert, Mathieu) Date: Tue, 26 Feb 2002 12:22:57 -0600 Subject: [Bioperl-l] deprecated blast parse? 
Message-ID: <2F41CC6C9777D311ACBD009027B108EA0292018F@excsrv32.mayo.edu>

Hi,

I have code (surreptitiously copied from Jason's examples) that is giving me a deprecation error. I have a refresh from this morning of bioperl-live.

Method subject deprecated: use hit() instead
STACK Bio::SeqFeature::SimilarityPair::subject /home/mxw02/bioperl_latest/lib/site_perl/5.6.0/Bio/SeqFeature/SimilarityPair.pm:342
STACK toplevel bpblast.pl:32
453 dbj|D53816.1|D53816

Code snippet is

my @params = ('outfile' => 'seq.out',
              'program' => 'tblastn',
              'database' => 'est_human');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
my $str = Bio::SeqIO->new(-file=>'seq.fa' , '-format' => 'Fasta' );
my $input = $str->next_seq();
my $blast_report = $factory->blastall($input);
my $searchio = new Bio::SearchIO(-format => 'blast',
#                                -result_factory => $blast_report);
                                 -file => 'seq.out');
while( my $result = $searchio->next_result ) {
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            print $hsp->score . " " . $hsp->subject->seqname . "\n";
        }
    }
}

I am sure i did something wrong...

-Mat

From Wiepert.Mathieu@mayo.edu Tue Feb 26 18:26:50 2002
From: Wiepert.Mathieu@mayo.edu (Wiepert, Mathieu)
Date: Tue, 26 Feb 2002 12:26:50 -0600
Subject: [Bioperl-l] RE: deprecated blast parse?
Message-ID: <2F41CC6C9777D311ACBD009027B108EA02920190@excsrv32.mayo.edu>

Please ignore my previous post. Sorry for the chatter.

-Mat

From jon@compbio.dundee.ac.uk Tue Feb 26 19:40:09 2002
From: jon@compbio.dundee.ac.uk (Jonathan Barber)
Date: Tue, 26 Feb 2002 19:40:09 +0000
Subject: [Bioperl-l] alignment of 2 sequences/ FASTA
In-Reply-To: ; from schan@xenongenetics.com on Tue, Feb 26, 2002 at 08:38:22AM -0800
References:
Message-ID: <20020226194009.M25839@weevil.dundee.ac.uk>

On Tue, Feb 26, 2002 at 08:38:22AM -0800, Simon Chan wrote:
> Hi,
>
> I would like to align 2 sequences.
Kind of hard to explain what I need > so I'll use an example: > > > seq 1: abab > seq 2: nnnnababnnnnn > > How can I get the script to output that the match between > seq 1 and seq2 starts at position 5 and and ends at position 8? > > I was told to run the FASTA program on the seq1 and seq2 and that > should get what I want, however, there doesn't seem to be a module > that will perform FASTA comparisons...? I don't think there are objects for manipulating FASTA data (I may be wrong as I'm new to Bioperl), so it may be easier to use the BLAST packages which will provide more or less the same results (the differences lying in the heuristics that the BLAST and FASTA programs use). > > Thanks, Everybody. > > simon > > ################################### > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l -- Jon From Wiepert.Mathieu@mayo.edu Tue Feb 26 20:03:51 2002 From: Wiepert.Mathieu@mayo.edu (Wiepert, Mathieu) Date: Tue, 26 Feb 2002 14:03:51 -0600 Subject: [Bioperl-l] Parse Blast -m=1 Message-ID: <2F41CC6C9777D311ACBD009027B108EA02920194@excsrv32.mayo.edu> Hi, Can SearchIO handle different formats of blast output, or does it need the default? I tried a blast with 'm' => '1', the blast output was good, the parser didn't work. Got [blastall] ERROR: ncbiapi [000.000] NM_000754: SeqPortNew: gi|14171611 stop(870) >= len(840) -Mat From jason@cgt.mc.duke.edu Tue Feb 26 20:17:01 2002 From: jason@cgt.mc.duke.edu (Jason Stajich) Date: Tue, 26 Feb 2002 15:17:01 -0500 (EST) Subject: [Bioperl-l] alignment of 2 sequences/ FASTA In-Reply-To: Message-ID: Hmm there are lots of ways to do this. It depends on a) how many sequences you plan to do this for b) whether you always know the 2 sequences you want to align c) are you doing nucleotide to nucleotide (DNA vs DNA or cDNA vs DNA) d) how accurate do you need your alignments to be? 
Per a) - you should use a heuristic algorithm like BLAST [1] or FASTA [2] if you need to search thousands or hundreds of thousands.
       - if not, use a Smith-Waterman implementation - ssearch [2], EMBOSS water [3]
b) you can use bl2seq, FASTA, SSEARCH, or water to do this
c) sim4 or est2genome or genewise or exonerate here
d) Don't use a heuristic alg (like BLAST or FASTA)

1. BLAST - ftp://ftp.ncbi.nih.gov/blast/executables/ (Altschul et al)
2. FASTA - http://fasta.bioch.virginia.edu/ (Pearson)
3. EMBOSS - http://www.emboss.org (Rice et al)

We support parsing of FASTA (including ssearch I believe) and BLAST with Bio::SearchIO, water with Bio::AlignIO (format "emboss"), bl2seq with Bio::Tools::BPlite, and sim4 (as a gene prediction means) with Bio::Tools::Sim4::Results. We don't have an est2genome parser right now. Some of those modules are not in the 0.7 series but will be in 1.0 and are in the 0.9.x dev series.

-jason

On Tue, 26 Feb 2002, Simon Chan wrote:

> Hi,
>
> I would like to align 2 sequences. Kind of hard to explain what I need
> so I'll use an example:
>
>
> seq 1: abab
> seq 2: nnnnababnnnnn
>
> How can I get the script to output that the match between
> seq 1 and seq2 starts at position 5 and and ends at position 8?
>
> I was told to run the FASTA program on the seq1 and seq2 and that
> should get what I want, however, there doesn't seem to be a module
> that will perform FASTA comparisons...?
>
> Thanks, Everybody.
>
> simon
>
> ###################################
>
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

From jason@cgt.mc.duke.edu Tue Feb 26 20:19:36 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Feb 2002 15:19:36 -0500 (EST)
Subject: [Bioperl-l] Parse Blast -m=1
In-Reply-To: <2F41CC6C9777D311ACBD009027B108EA02920194@excsrv32.mayo.edu>
Message-ID:

nope - happy to have someone who needs it, write it... We do parse blast xml (-m 7).

-j

On Tue, 26 Feb 2002, Wiepert, Mathieu wrote:

> Hi,
>
> Can SearchIO handle different formats of blast output, or does it need the
> default? I tried a blast with 'm' => '1', the blast output was good, the
> parser didn't work. Got
>
> [blastall] ERROR: ncbiapi [000.000] NM_000754: SeqPortNew: gi|14171611
> stop(870) >= len(840)
>
> -Mat
>
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

From jason@cgt.mc.duke.edu Tue Feb 26 20:57:18 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Tue, 26 Feb 2002 15:57:18 -0500 (EST)
Subject: [Bioperl-l] validating a sequence
In-Reply-To: <3C7A9574.88648266@lbri.lionbioscience.com>
Message-ID:

On Mon, 25 Feb 2002, Andreas Matern wrote:

> Forgive me if this answer occurs somewhere else, but. . .
>
> I need to validate FASTA sequences. The web interface (another
> developer, can't touch his code) allows users to cut and paste, and many
> of them cut and paste sequences with numbers in them
>
> (i.e.
> >mysequence
> 1ACACGATCGACTGACATCGTCAGTACGTCGATACGATCGACTGACTAGCTC
> 51AACTCGTCGTCGTCGTCGCTGCTCGTCGCTGCTCGTCTGCTCGTCGTC
>
> etc.)
>
> The FASTA file is turned into a Bio::Index::Fasta by a cron job
> And then I (normally) run
>
> @ids = $inx->get_all_primary_ids();
> foreach $id (@ids) {
> my $seq = $inx->getch($id);
> ....do stuff with seq....
> ....connect to database...
> ....etc....
> }
>
> This of course dies when the $seq is screwey (
>
> MSG: Attempting to set the sequence to [1ACA....] which does not look
> healthy
>
> I see the $seq->validate_seq, but I'm not sure how to use it in my
> context
>

You can protect these in an eval { } block - but I'm not sure when you want to evaluate - do you want to kick things out of the db before they are indexed or just handle bad entries semi-nicely?

As for checking things before they are indexed - the only way I can think of off the top of my head is to pre-process the file with Bio::SeqIO and protect each next_seq call with its own eval {} so that a bad record is skipped and the loop carries on, like this (still not sure what the workflow is so not sure if this works in your scheme). Note: up till now we haven't done a whole lot of trying to handle badly formatted data files very well.

# you're going to build a new "CLEAN" db
use Bio::SeqIO;
my $in    = new Bio::SeqIO(-file => 'webdump.fa');
my $newin = new Bio::SeqIO(-file => '>newwebdump.fa');
while( 1 ) {
    # eval each read separately so one bad record doesn't end the loop
    # (this assumes the parser has already consumed the bad record)
    my $seq = eval { $in->next_seq };
    if( $@ ) {
        print STDERR "skipping a sequence with error \n$@";
        next;
    }
    last unless defined $seq;
    $newin->write_seq($seq);
}
$newin->close();
# index webdump again

Now - I'm not 100% sure that our throws end up getting caught in the eval so we may need to catch other signals - let me know if this doesn't work.

> Any suggestions, especially for stripping out non-IUPAC characters from
> a FASTA string, would be greatly appreciated...
>
> -Andreas
>
>
--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

From alastair.kerr@cereon.com Tue Feb 26 22:22:48 2002
From: alastair.kerr@cereon.com (KERR, ALASTAIR [AG/2165])
Date: Tue, 26 Feb 2002 16:22:48 -0600
Subject: [Bioperl-l] alignment of 2 sequences/ FASTA
Message-ID: <8D7A3D2453C7D2119CD800A0C9EAF09704F23C1D@ems2165-01.cereon.com>

Hi Simon,

If the sequences are as you describe the simplest way may be to use the inbuilt 'index' function. i.e.

$seq1= "abab";
$seq2 = "nnnnababnnnnn";
$start = index($seq2, $seq1) + 1; #index count starts at 0
#(in this case $start == 5)

- Alastair

-----Original Message-----
From: Simon Chan [mailto:schan@xenongenetics.com]
Sent: Tuesday, February 26, 2002 11:38 AM
To: bioperl-l@bioperl.org
Subject: [Bioperl-l] alignment of 2 sequences/ FASTA

Hi,

I would like to align 2 sequences. Kind of hard to explain what I need so I'll use an example:

seq 1: abab
seq 2: nnnnababnnnnn

How can I get the script to output that the match between seq 1 and seq2 starts at position 5 and and ends at position 8?

I was told to run the FASTA program on the seq1 and seq2 and that should get what I want, however, there doesn't seem to be a module that will perform FASTA comparisons...?

Thanks, Everybody.

simon

###################################

From jjly25@hotmail.com Wed Feb 27 01:15:44 2002
From: jjly25@hotmail.com (Jason Li Ying)
Date: Tue, 26 Feb 2002 20:15:44 -0500
Subject: [Bioperl-l] Mailing List
Message-ID:

I would like to join this mailing list.
My address is jjly25@hotmail.com Jason _________________________________________________________________ Chat with friends online, try MSN Messenger: http://messenger.msn.com From jchuang@ucsf-104-53.ucsf.edu Wed Feb 27 03:17:08 2002 From: jchuang@ucsf-104-53.ucsf.edu (jchuang@ucsf-104-53.ucsf.edu) Date: Tue, 26 Feb 2002 19:17:08 -0800 (PST) Subject: [Bioperl-l] Sequences for joined features (in a Genbank file) Message-ID: Hi, I'm trying to extract the DNA sequence from features of genbank files (e.g. the CDS). The seq function does this great whenever I have a contiguous sequence, but it doesn't appear to work properly when the feature is made up of joined sequences. Instead of a patching together of all the joined subsequences, I get the sequence starting with the first start site and ending with the last end site. Is there a way to extract the patched together sequence instead? I couldn't tell from the docs if this feature had been implemented. The Location modules look relevant, but I didn't see a way to use them on sequence objects. Thanks for any help. Jeff -- Jeffrey Chuang UC San Francisco - Dept. 
of Biochemistry and Biophysics
513 Parnassus Avenue
Box 0448
San Francisco, CA 94143-0001
W: 415-514-2616
jchuang@ucsf-104-53.ucsf.edu

From birney@ebi.ac.uk Wed Feb 27 06:46:03 2002
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 27 Feb 2002 01:46:03 -0500 (EST)
Subject: [Bioperl-l] validating a sequence
In-Reply-To:
Message-ID:

Alternatively you will have to process things yourself, eg

# note: this builds one sequence from everything on the input
while( <> ) {
    next if /^>/;                 # skip the FASTA header line
    $_ =~ s/[^atgcATGCNn]//g;     # drop digits, whitespace, anything not ACGTN
    $seq_string .= $_;
}
$new_seq = Bio::Seq->new( -seq => $seq_string, -id => 'mysequence');

From birney@ebi.ac.uk Wed Feb 27 06:56:33 2002
From: birney@ebi.ac.uk (Ewan Birney)
Date: Wed, 27 Feb 2002 01:56:33 -0500 (EST)
Subject: [Bioperl-l] Sequences for joined features (in a Genbank file)
In-Reply-To:
Message-ID:

On Tue, 26 Feb 2002 jchuang@ucsf-104-53.ucsf.edu wrote:

> Hi,
>
> I'm trying to extract the DNA sequence from features of genbank files
> (e.g. the CDS). The seq function does this great whenever I have a
> contiguous sequence, but it doesn't appear to work properly when the
> feature is made up of joined sequences. Instead of a patching together of
> all the joined subsequences, I get the sequence starting with the first
> start site and ending with the last end site.
>
> Is there a way to extract the patched together sequence instead? I
> couldn't tell from the docs if this feature had been implemented. The
> Location modules look relevant, but I didn't see a way to use them on
> sequence objects.

This is a common request and ... no... we don't do this. We should. I'll talk to Jason today about the best way to handle this.

>
> Thanks for any help.
>
> Jeff
>
> --
> Jeffrey Chuang
> UC San Francisco - Dept.
of Biochemistry and Biophysics > 513 Parnassus Avenue > Box 0448 > San Francisco, CA 94143-0001 > W: 415-514-2616 > jchuang@ucsf-104-53.ucsf.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > From mwilkinson@gene.pbi.nrc.ca Wed Feb 27 08:52:20 2002 From: mwilkinson@gene.pbi.nrc.ca (mwilkinson) Date: Wed, 27 Feb 2002 02:52:20 -0600 Subject: [Bioperl-l] what format is this and how I can use AlignIO References: Message-ID: <3C7C9E44.19202A76@gene.pbi.nrc.ca> I don't know if this will be useful or not... Genquire used to have a Sim4 GUI, which parsed the output and displayed the alignment against the genome sequence. It's pretty archaic, and we haven't included it in the final release of Genquire, but the code is still hanging around on my hard drive. No promises of functionality, and the Genquire "overhead" will have to be pulled out to make it functional, but if you think it would be remotely useful to you I can send you the module as soon as I get back from the hackathon. Mark Jason Stajich wrote: > We currently don't have a parser for sim4 as alignments - we handle sim4 > as an exon identifier see Bio::Tools::Sim4::Results for more info. > > I bet it wouldn't be terribly hard to make an alignio module using our > existing parser. > > -jason > On Mon, 25 Feb 2002, Guoneng Zhong wrote: > > > Hi, > > > > Sorry for the ignorance on this matter. I am trying to use the output > > of this standalone program called sim4 to create some sort of Alignment > > object. The output looks like the following. I thought perhaps BioPerl > > can give me an object that tells me exactly which nucleotide matches > > with which across the two aligned strands (that's what AlignIO does, > > right?). > > > > > > seq1 = sim4seq4HyyX8K.seq, 2460 bp > > seq2 = sim4seqOD34cJI.seq ((no header)), 876 bp > > > > > > (complement) > > > > 0 . : . : . : . : . 
:
> > 1567 TGACAAGAGCACTGGCAAGGAGAACAAAATCACTATCACTAATGATAAGG
> > ||||||||||||||||| |||||-|| |||||||||| | | | |
> > 1 TGACAAGAGCACTGGCATGGAGA CAnAATCACTATCmcTanyGayAmGs
> >
> > 50 . : . : . : . : . :
> > 1617 GTCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG
> > | ||||||||||||||||||||||||||||||||||||||||||||||||
> > 50 GtCGTCTCAGCAAGGAGGACATTGAGCGCATGGTGCAGGAAGCTGAGAAG
> >
> > 100 . : . : . : . : . :
> > 1667 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG
> > ||||||||||||||||||||||||||||||||||||||||||||||||||
> > 100 TACAAGGCTGAGGATGATGTGCAGCGTGACAAGGTTTCTGCCAAGAACGG
> >
> > Any help?
> >
> > Thanks,
> > G
> >
>
> --
> Jason Stajich
> Duke University
> jason@cgt.mc.duke.edu

From jason@cgt.mc.duke.edu Wed Feb 27 08:51:34 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 27 Feb 2002 03:51:34 -0500 (EST)
Subject: [Bioperl-l] [Bioperl-guts-l] Notification: incoming/1109 (fwd)
Message-ID:

Do you only have 4 queries in the file? You're only printing the query name. You're also calling next_result twice in the same loop - so you're going to skip an entry each loop iteration.

Are you getting any warnings or errors? We have tested this on many multi-sequence reports - what version of blast and which blast (t)blast(n|p|x) are you using?

#This code will only print the queries name and length for each of the
#results.
use Bio::SearchIO;
my($filename) = "blast.cnone.cnone.txt";
my($searchio) = new Bio::SearchIO(-format => 'blast', -file => $filename);
while ($result = $searchio->next_result)
{
    print " Query \"", $result->query_name, "\" (", $result->query_length, " pb)\n";
    # DELETE THIS LINE!
    $result = $searchio->next_result();
}

--
Jason Stajich
Duke University
jason@cgt.mc.duke.edu

---------- Forwarded message ----------
Date: Wed, 27 Feb 2002 03:03:42 -0500
From: bioperl-bugs@bioperl.org
To: bioperl-guts-l@bioperl.org
Subject: [Bioperl-guts-l] Notification: incoming/1109

JitterBug notification

new message incoming/1109

Message summary for PR#1109
From: aurelien.mazurie@free.fr
Subject: A big bug (i think) for the BLAST parser !
Date: Wed, 27 Feb 2002 03:03:41 -0500
0 replies 0 followups

====> ORIGINAL MESSAGE FOLLOWS <====

>From aurelien.mazurie@free.fr Wed Feb 27 03:03:41 2002
Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.2/8.12.2) with ESMTP id g1R83fkO005874 for ; Wed, 27 Feb 2002 03:03:41 -0500
Date: Wed, 27 Feb 2002 03:03:41 -0500
Message-Id: <200202270803.g1R83fkO005874@pw600a.bioperl.org>
From: aurelien.mazurie@free.fr
To: bioperl-bugs@bioperl.org
Subject: A big bug (i think) for the BLAST parser !

Full_Name: Aurelien Mazurie
Module: Bio::SearchIO
Version: 1.0-alpha
PerlVer: v5.6.1 (ActivePerl)
OS: Win2000 pro / Linux
Submission from: (NULL) (134.157.194.52)

First, apologizes for my english =)

Maybe it's me, but when i try to use this piece of code:

use Bio::SearchIO;
my($filename) = "blast.cnone.cnone.txt";
my($searchio) = new Bio::SearchIO(-format => 'blast', -file => $filename);
while ($result = $searchio->next_result)
{
    print " Query \"", $result->query_name, "\" (", $result->query_length, " pb)\n";
    $result = $searchio->next_result();
}

The result is that the script print on screen only an entry on two of the original BLAST report file. Strange, isn't it ? So, what is the problem ? The result is the same over different BLAST files, and i DO use BioPerl to do the job, since i have 700 entries in different files !

PLEASE, help me ! This job is urgent, so if we can found and fix this problem...
quickly, i will be happy =) (note: the problem is the same under Linux) Aurelien Mazurie (from France =) _______________________________________________ Bioperl-guts-l mailing list Bioperl-guts-l@bioperl.org http://bioperl.org/mailman/listinfo/bioperl-guts-l From holford.5@osu.edu Wed Feb 27 17:38:35 2002 From: holford.5@osu.edu (holford.5@osu.edu) Date: Wed, 27 Feb 2002 09:38:35 -0800 Subject: [Bioperl-l] New to BioPerl Message-ID: <5.0.0.25.2.20020227093318.027e2010@pop.service.ohio-state.edu> Does a Perl script exist for such a job or would I need to develop one? I am in the process of learning Perl and still very wet behind the ears. I am a C/C++ programmer and I have been asked to develop a software program to automate a process. This process deals with parsing blast output. Since my background in not in Biology I will not attempt to paraphrase the request. So here it is in its entirety. "Do you know of a program that would allow or facilitate in silico > Northern (gene expression) analysis? What I would like to do is to take > a specific gene sequence and blast it against the soybean EST database > and then extract the hits below a certain E value into *categories* > (e.g., "roots", "inoculated", etc) based on the cDNA library > information for the particular EST in the text that accompanies the > blast output. It could also be done based on the cDNA library # > associated with the EST hits. This allows a quick determination of > whether a particular gene, for instance, is root specific in its > expression, etc. It would be extremely valuable to us in deciding what > genes to focus on (from a gene family, for instance) for expression > analysis with real life Northerns. > I think that such a program would be very valuable for alot of us!" 
Thanking you in advance,

Ian

From jason@cgt.mc.duke.edu Wed Feb 27 15:27:17 2002
From: jason@cgt.mc.duke.edu (Jason Stajich)
Date: Wed, 27 Feb 2002 10:27:17 -0500 (EST)
Subject: [Bioperl-l] New to BioPerl
In-Reply-To: <5.0.0.25.2.20020227093318.027e2010@pop.service.ohio-state.edu>
Message-ID:

Use Bio::SearchIO and read the bptutorial on how to extract the information you need -- all the tools you need to do this are in the Bio::SearchIO parser system and Bio::Search objects.

I have a script that extracts tissue information for an EST blast; see scripts/est_tissue_query.pl in the bioperl repository. That script is trying to do a lot so it might be a little confusing.

It's run like

% perl scripts/est_tissue_query.pl -r genbank -p 0.00001 -f blast -b MYFILE
# you can add -c cache if you want to build a temporary cache so multiple

This script has been fixed some just now to use Bio::SearchIO so you probably want the latest CVS version which you can see how to get from http://cvs.bioperl.org

I'd suggest that you spend a little time learning bioperl by running through the tutorial and building a simple script that prints out the list of hits from a blast report using our objects. Then I suspect the path will be more obvious. Lots of good discussions of this type on our mailing list archives.

-jason

On Wed, 27 Feb 2002 holford.5@osu.edu wrote:

> Does a Perl script exist for such a job or would I need to develop one?
>
> I am in the process of learning Perl and still very wet behind the ears. I
> am a C/C++ programmer and I have been asked to develop a software program
> to automate a process. This process deals with parsing blast
> output. Since my background in not in Biology I will not attempt to
> paraphrase the request. So here it is in its entirety.
>
> "Do you know of a program that would allow or facilitate in silico
> > Northern (gene expression) analysis?
What I would like to do is to take > > a specific gene sequence and blast it against the soybean EST database > > and then extract the hits below a certain E value into *categories* > > (e.g., "roots", "inoculated", etc) based on the cDNA library > > information for the particular EST in the text that accompanies the > > blast output. It could also be done based on the cDNA library # > > associated with the EST hits. This allows a quick determination of > > whether a particular gene, for instance, is root specific in its > > expression, etc. It would be extremely valuable to us in deciding what > > genes to focus on (from a gene family, for instance) for expression > > analysis with real life Northerns. > > I think that such a program would be very valuable for alot of us!" > > Thanking you in advance, > > Ian > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l@bioperl.org > http://bioperl.org/mailman/listinfo/bioperl-l > -- Jason Stajich Duke University jason@cgt.mc.duke.edu From federaiko@yahoo.it Wed Feb 27 21:42:53 2002 From: federaiko@yahoo.it (Federico Malusa) Date: Wed, 27 Feb 2002 16:42:53 -0500 Subject: [Bioperl-l] subscribe federaiko@yahoo.it Message-ID: <3C7D52DD.2070206@yahoo.it> _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com From Wiepert.Mathieu@mayo.edu Wed Feb 27 16:14:03 2002 From: Wiepert.Mathieu@mayo.edu (Wiepert, Mathieu) Date: Wed, 27 Feb 2002 10:14:03 -0600 Subject: [Bioperl-l] blastall output error (not in bioperl) Message-ID: <2F41CC6C9777D311ACBD009027B108EA029201A6@excsrv32.mayo.edu> Not sure if anyone cares, but I kept getting an error from a script I was running. The error pointed to a line in my blast output file. I went to the file, and the file was malformed. I reran the blast outside of bioperl, and got the same output file. The error was in the last line of this hsp. 

>gb|T53734.1|T53734 ya91d12.r3 Stratagene placenta (#937225) Homo sapiens cDNA clone IMAGE:69047 5' similar to similar to gb:M58525 CATECHOL O-METHYLTRANSFERASE, MEMBRANE-BOUND FORM (HUMAN)
          Length = 217

 Score = 120 bits (300), Expect = 2e-26
 Identities = 57/69 (82%), Positives = 58/69 (83%)
 Frame = +1

Query: 27 RHWGWGLCLIGWNEFILQPIHNLLMGDTKEQRILNHVLQHAEPGNAQSVLEAIDTYCEQK 86
          RH    LCLIGWNEFILQPIHNLLMGDTKEQRILNHVLQH  P N QSVLEAI+TY EQ 
Sbjct: 10 RHXX*XLCLIGWNEFILQPIHNLLMGDTKEQRILNHVLQHXXPXNXQSVLEAINTYXEQX 189

Query: 87 EWAMNVGDK 95
          EW MNVGDK
Sbjct: 190EWXMNVGDK 216
          ^

Adding a space after 190 removed the error. For those interested the error was

Use of uninitialized value in concatenation (.) at /home/mxw02/bioperl_latest/lib/site_perl/5.6.0/Bio/SearchIO/blast.pm line 629, line 9401.

Maybe this is known, maybe not, must be rare? -Mat

From jodoc@ucla.edu Thu Feb 28 02:07:40 2002 From: jodoc@ucla.edu (joe) Date: Wed, 27 Feb 2002 18:07:40 -0800 Subject: [Bioperl-l] RE: Bioperl-l digest, Vol 1 #636 - 3 msgs In-Reply-To: <200202271702.g1RH2ukO012550@pw600a.bioperl.org> Message-ID: <001a01c1bffc$b3813de0$de518e95@medsch.ucla.edu> Hi, FYI, in the bplite.pm module, the example that is given for getting hsp start and stops is incorrect. 
should be $hsp->subject->end; rather than $hsp->sbjct->end; joe
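For anyone who runs into the malformed `Sbjct: 190EWXMNVGDK 216` line from the blastall report in this thread, here is a minimal sketch of a tolerant match for BLAST alignment lines. It is plain Perl and does not use or reproduce the Bio::SearchIO parser; `parse_alignment_line` is a hypothetical helper name made up for illustration. The only point is that `\s*` (rather than `\s+`) between the start coordinate and the sequence still matches when blastall leaves out the space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split one BLAST alignment line
# into (label, start, sequence, end). The \s* between the start coordinate
# and the sequence tolerates the missing space in "Sbjct: 190EWXMNVGDK 216".
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):\s*(\d+)\s*(\S+)\s+(\d+)\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;    # empty list: not an alignment line
}

for my $line ('Query: 87 EWAMNVGDK 95', 'Sbjct: 190EWXMNVGDK 216') {
    my ($label, $start, $seq, $end) = parse_alignment_line($line)
        or next;
    print "$label start=$start end=$end seq=$seq\n";
}
# prints:
# Query start=87 end=95 seq=EWAMNVGDK
# Sbjct start=190 end=216 seq=EWXMNVGDK
```

The match-line rows (the ones with no `Query:` or `Sbjct:` label) return an empty list and are skipped, so the sketch only recovers coordinates and sequence, nothing else from the report.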