[Bioperl-l] Re: Bioperl-l Digest, Vol 1, Issue 1 (Eric E. Snyder: incommunicado)

Wed May 7 09:22:08 EDT 2003

Dr. Snyder will be unavailable for comment until May 26.

>>> bioperl-l 05/07/03 08:03 >>>

Send Bioperl-l mailing list submissions to
	bioperl-l at bioperl.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://pw600a.bioperl.org/mailman/listinfo/bioperl-l
or, via email, send a message with subject or body 'help' to
	bioperl-l-request at bioperl.org

You can reach the person managing the list at
	bioperl-l-owner at bioperl.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioperl-l digest..."

Today's Topics:

   1. Re: A pattern problem in Perl (Peter Wilkinson)
   2. Bio::DB::GenBank->get_Stream_by_query (Josh Lauricha)
   3. An unsolved problem regarding RemoteBlast? (Joao Magalhaes)
   4. locuslink.pm questions (for Keith Ching/Hilmar ?)
      (Andrew Macgregor)
   5. Taxonomy  and get_taxaid (anthony.underwood at hpa.org.uk)
   6. problem with Bio::SeqIO->write_seq (Prachi Shah)
   7. Re: Miller-Myers Algorithm (Yee Man)
   8. module  (mlobo)
   9. Bioperl and ClustalW/Clustal (Hrishi)
  10. Scale with Bio::Graphics for upstream regions (Joao Magalhaes)
  11. Re: Error running BPbl2seq (BHurwitz at twt.com)

----------------------------------------------------------------------

Message: 1
Date: Thu, 01 May 2003 10:39:25 -0400
From: Peter Wilkinson <pwilk at videotron.ca>
Subject: Re: [Bioperl-l] A pattern problem in Perl
To: bioperl-l at bioperl.org
Message-ID: <5.2.0.9.0.20030501103850.00b8d488 at pop.videotron.ca>
Content-Type: text/plain; charset=us-ascii; format=flowed

Here ya go,

(?:^.*?=\s)(\d+)(?:\/.*?\()(\d+)

This will do the following:

(?:^.*?=\s) matches (but does not capture) 'Identities = '
(\d+) matches the numerator of the fraction and puts it into $1
(?:\/.*?\() matches (but does not capture) '/135 ('
(\d+) matches the percentage and puts it into $2

Here is another,

(\d+)(?:\s\()(\d+)

I like the first one because it handles the line in its entirety. It's
stricter and easy to read. The non-capturing parts I don't want are
clear, and the parts I want are clear.
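
To see how the two patterns above behave on both kinds of report line, here
is a minimal sketch. The pattern labels, the captures() helper and the exact
spacing of the 'greedy' pattern (copied from the question quoted further
down) are assumptions made for illustration. Note that the short pattern
puts the denominator (135), not the numerator, into $1, and that the greedy
pattern returns the Gaps percentage on type A lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two kinds of line found in the bl2seq reports, as quoted in the question.
my @lines = (
    'Identities = 124/135 (91%), Gaps = 2/135 (1%)',   # type A
    'Identities = 124/135 (91%)',                      # type B
);

# The labels are for this sketch only; they are not from the thread.
my %pattern = (
    anchored => qr/(?:^.*?=\s)(\d+)(?:\/.*?\()(\d+)/,       # first suggestion
    short    => qr/(\d+)(?:\s\()(\d+)/,                     # second suggestion
    greedy   => qr/Identities = (.*)\/.*? \((.*)%\)/,       # pattern from the question
);

# Return the captures of the named pattern, or an empty list on no match.
sub captures {
    my ($name, $line) = @_;
    my @found = $line =~ $pattern{$name};
    return @found;
}

for my $line (@lines) {
    for my $name (sort keys %pattern) {
        my @found = captures($name, $line);
        printf "%-8s %-46s => %s\n", $name, $line, join(' | ', @found);
    }
}
```

Only the anchored pattern yields 124 and 91 for both line types, which is
what the original question asked for.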

If you wanted to have some fun and get the rest of the data, then

(?:^.*?=\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)(?:.*?\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)

you get:

$1 = 124 # num
$2 = 135 # denom
$3 = 91  # percent
$4 = 2   # num
$5 = 135 # denom
$6 = 1   # percent
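
As a usage sketch for the long pattern, the six captures can be collected
into a small structure. The sub name parse_identities_and_gaps and the hash
keys are made up for this example. One thing to be aware of: this pattern
requires the Gaps part, so it does not match type B lines at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The long pattern from the message above, unchanged.
my $full = qr/(?:^.*?=\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)(?:.*?\s)(\d+)(?:\/)(\d+)(?:.*?\()(\d+)/;

# Return a hash reference with the six numbers, or nothing if the line
# does not contain both the Identities and the Gaps fields.
sub parse_identities_and_gaps {
    my ($line) = @_;
    my @f = $line =~ $full;
    return unless @f;
    my %stats = (
        identities => { num => $f[0], denom => $f[1], percent => $f[2] },
        gaps       => { num => $f[3], denom => $f[4], percent => $f[5] },
    );
    return \%stats;
}

my $stats = parse_identities_and_gaps('Identities = 124/135 (91%), Gaps = 2/135 (1%)');
printf "identities %d/%d (%d%%), gaps %d/%d (%d%%)\n",
    @{ $stats->{identities} }{qw(num denom percent)},
    @{ $stats->{gaps} }{qw(num denom percent)};
```

For reports that mix both line types, the shorter anchored pattern is the
safer choice, with this one used only when the Gaps values are needed.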

Peter W

At 11:12 AM 01/05/2003 +0800, you wrote:
>Hello,
>      I encountered a problem. I am trying to grab numbers from text
>files (bl2seq reports).
>The lines read either as A:"Identities = 124/135 (91%), Gaps = 2/135 (1%)"
>or just B:"Identities = 124/135 (91%)". Both types coexist.
>I wrote the pattern match as
>"$judgecont=~m/Identities = (.*)\/.*? \((.*)%\)/;"
>My purpose is to grab $1 (the numerator of the fraction) and $2 (the
>percentage of Identities, not Gaps).
>However, because of greedy matching, I always get "1" (the percentage
>of gaps) in situation A.
>Any suggestion or guide?  Thank you very much!
>                       Regards
>                                                Darson 2003/05/01
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at bioperl.org
>http://bioperl.org/mailman/listinfo/bioperl-l

-------------------------------------
Peter Wilkinson
Bioinformatics Consultant

-------------------------------------  

------------------------------

Message: 2
Date: Thu, 1 May 2003 14:10:00 -0700
From: Josh Lauricha <laurichj at bioinfo.ucr.edu>
Subject: [Bioperl-l] Bio::DB::GenBank->get_Stream_by_query
To: bioperl-l at bioperl.org
Message-ID: <20030501211000.GB20466 at bioinfo.ucr.edu>
Content-Type: text/plain; charset=us-ascii

I am trying to fetch seqs from GenBank using the get_Stream_by_query(),
however the following code gives me an error:

#!/usr/bin/perl -w
use Bio::DB::GenBank;
use Bio::SeqIO;
use strict;

my $gb     = new Bio::DB::GenBank;
my $seqin  = new Bio::SeqIO(-format => 'efa');
my $seqout = new Bio::SeqIO(-format => 'efa');

my $seqio = $gb->get_Stream_by_query('Oryza sativa[Organism] AND EST');

while ( my $seq = $seqio->next_seq ) {
    print "seq length is ", $seq->length, "\n";
}

The error is:

Warning(s) from GenBank: 
                <FieldNotFound>Organism</FieldNotFound>

Any thoughts? Typing the query into www.ncbi.nlm.nih.gov gives the
correct results.

Just in case there is another, better way to do this: I am trying to
find the GI for a sequence. However, I am limited to basically the TIGR
accession number and a display name similar to the swissprot name
(something like AtFUT1, where FUT1_ARATH is the swissprot name).

My plan was to do a query supplying all the info I have. This returns
numerous results, then I'd compare the sequences and look for a match.

Is there a better way to do this? There are quite a few seqs so I'd
rather not do it by hand.

Thanks,
Josh Lauricha

------------------------------

Message: 3
Date: Fri, 02 May 2003 02:47:01 +0200
From: Joao Magalhaes <joao.magalhaes at fundp.ac.be>
Subject: [Bioperl-l] An unsolved problem regarding RemoteBlast?
To: bioperl-l at bioperl.org
Message-ID: <5.1.0.14.2.20030502023553.00abc100 at pop.skynet.be>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Hi!

I've been using RemoteBlast with a fasta file which has several
sequences. My problem is how to distinguish the sequences once I
retrieve the results. An obvious answer would be query_name, but I'm
having problems making it work. Basically, it doesn't return anything
and the blast file doesn't have a query name either. I tried to use
different fasta headers, but the problem persists. Can anyone help?

I searched the list's archives and found a similar problem dating to
June 2002:
http://bioperl.org/pipermail/bioperl-l/2002-June/008208.html

And I found the exact same problem in November of 2002:
http://bioperl.org/pipermail/bioperl-l/2002-November/010255.html

Only no solution was presented at the time. Has anyone thought of a
solution?

Alternatively, I can search one sequence at a time, but I'm sure
there's a smarter way of BLASTing multiple sequences.

Thanking you.

Joao Pedro de Magalhaes

The University of Namur (FUNDP)
Research Unit on Cellular Biology (URBC)
Rue de Bruxelles, 61. B-5000 Namur. Belgium.

Fax: + 32 81 724135
Phone: + 32 81 724133
Website on Aging: http://www.senescence.info
Reason's Triumph: http://www.jpreason.com

------------------------------

Message: 4
Date: Fri, 2 May 2003 15:20:59 +1200
From: Andrew Macgregor <andrew at anatomy.otago.ac.nz>
Subject: [Bioperl-l] locuslink.pm questions (for Keith Ching/Hilmar ?)
To: bioperl-l at bioperl.org
Message-ID: <1861B686-7C4D-11D7-A8DB-00039399CEDC at anatomy.otago.ac.nz>
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hi,

I am using the locuslink.pm with SeqIO to read the LL_tmpl file and was 
just wondering about data that are left out.

1. Looking at locusid 1 and 2. The first record begins:

 >>1
LOCUSID: 1
LOCUS_CONFIRMED: yes
LOCUS_TYPE: gene with protein product, function known or inferred
ORGANISM: Homo sapiens
STATUS: REVIEWED
NM: NM_130786|21071029|na
NP: NP_570602|21071030
CDD: Immunoglobulin C-2 Type|smart00408|104|na|4.466900e+01
PRODUCT: alpha 1B-glycoprotein
ASSEMBLY: AF414429,AK055885,AK056201

The second record begins:

 >>2
LOCUSID: 2
LOCUS_CONFIRMED: yes
LOCUS_TYPE: gene with protein product, function known or inferred
ORGANISM: Homo sapiens
STATUS: REVIEWED
NM: NM_000014|6226959|na
NP: NP_000005|4557225
CDD: Ependymins|EPEND|86|na|3.773540e+01
CDD: Alpha-2-macroglobulin family|pfam00207|2501|na|9.679920e+02
CDD: Alpha-2-macroglobulin family N-terminal region|pfam01835|1889|na|7.322500e+02
PRODUCT: alpha 2 macroglobulin precursor
ASSEMBLY: M11313

In the first record the CDD line doesn't appear to be stored at all. In
the second record the first CDD line is stored, i.e. the one beginning
with "Ependymins", but the other two are not. I was wondering about the
logic behind this.

2. Also wondering about things like LOCUS_CONFIRMED and STATUS and why 
these are left out.

3. Lastly, and this is very nit-picking ;) but thought you might want 
it pointed out for consistency...

In Bio::Annotation::OntologyTerm the as_text() method does something
different from what its documentation says, and different from other
as_text() methods, i.e. it doesn't print "Value: ".

=head2 as_text

  Title   : as_text
  Usage   : my $text = $obj->as_text
  Function: return the string "Name: $v" where $v is the name of the term
  Returns : string
  Args    : none

=cut

sub as_text{
    my ($self) = @_;

    return $self->tagname()."|".$self->name()."|".$self->identifier();
}
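
To make the mismatch concrete, here is a small self-contained sketch. It
does not load BioPerl: My::MockTerm is a hypothetical stand-in with the
three accessors used above, and the tag, name and identifier values are
invented. The second method shows what the POD text describes, as one
possible reading rather than a proposed patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::MockTerm;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub tagname    { return $_[0]{tagname} }
sub name       { return $_[0]{name} }
sub identifier { return $_[0]{identifier} }

# Same body as the as_text() quoted above.
sub as_text_current {
    my ($self) = @_;
    return $self->tagname() . "|" . $self->name() . "|" . $self->identifier();
}

# What the POD describes: the string "Name: $v".
sub as_text_documented {
    my ($self) = @_;
    return "Name: " . $self->name();
}

package main;

# Invented example values.
my $term = My::MockTerm->new(
    tagname    => 'ExampleTag',
    name       => 'example term',
    identifier => 'EX:0000001',
);
print $term->as_text_current(),    "\n";
print $term->as_text_documented(), "\n";
```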

-- Andrew.

------------------------------

Message: 5
Date: Fri, 2 May 2003 18:31:37 +0100
From: anthony.underwood at hpa.org.uk
Subject: [Bioperl-l] Taxonomy  and get_taxaid
To: bioperl-l at bioperl.org
Message-ID: <TFSAAFTE at hpa.org.uk>
Content-Type: text/plain; charset=ISO-8859-1

Hi All,

I'm getting a problem with the taxonomy module when I write this script:

use Bio::DB::Taxonomy;
my $db=new Bio::DB::Taxonomy(-source=>'entrez');
my $taxaid=$db->get_taxaid("Salmonella");
my $species=$db->get_Taxonomy_Node(-taxaid=>$taxaid);
print $species->species->genus;

The error I get is: Can't call method "children" on an undefined value at
/usr/lib/perl5/site_perl/5.6.1/Bio/DB/Taxonomy/entrez.pm line 167.
I think this happens when the get_taxaid routine is called.

Can anyone advise on this? Thanks in advance,

Anthony

Dr Anthony Underwood
Bioinformatics Unit
Central Public Health Laboratory
Health Protection Agency
61 Colindale Avenue
London
NW9 5HT
t:    0208 2004400 ext. 3618
f:    0208 3583138
e: anthony.underwood at hpa.org.uk


------------------------------

Message: 6
Date: Fri, 2 May 2003 11:28:06 -0700 (PDT)
From: Prachi Shah <prachi_shroff at yahoo.com>
Subject: [Bioperl-l] problem with Bio::SeqIO->write_seq
To: bioperl <bioperl-l at bioperl.org>
Message-ID: <20030502182806.11554.qmail at web41102.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii

Hi,

I was playing around with the Bio::Index::Fasta module, but got the
following error. I index a Fasta formatted sequence file and then try
to search for sequences with the fetch function. I copied the code from
the documentation of Bio::Index::Fasta. The only difference is that the
example code in the documentation prints to STDOUT and I wanted it to
write to a file. I have probably made a very silly mistake. I would
really appreciate any help.

thanks,
Prachi.

------------- EXCEPTION  -------------
MSG: Did not provide a valid Bio::PrimarySeqI object
STACK Bio::SeqIO::fasta::write_seq C:/Perl/site/lib/Bio\SeqIO\fasta.pm:166
STACK toplevel intersectionsequence-cmd.pl:367

--------------------------------------

### snippet of code ###
my $inx1 = Bio::Index::Fasta->new(
    '-filename'   => $Index_File_Name1,
    '-write_flag' => 1);
my $out = Bio::SeqIO->new('-format' => 'Fasta',
                          '-file'   => ">$clusterfile");
my $seq = $inx1->fetch($r);
$out->write_seq($seq);
################


------------------------------

Message: 7
Date: Fri, 2 May 2003 16:01:15 -0700 (PDT)
From: Yee Man <ymc at paxil.stanford.edu>
Subject: [Bioperl-l] Re: Miller-Myers Algorithm
To: wrp at virgina.edu, amackey at virginia.edu
Cc: bioperl-l at bioperl.org
Message-ID:
	<Pine.GSO.3.96.1030502154214.20346S-100000 at halogen.stanford.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi, Dr Pearson and Aaron,

>This report is unfortunately misleading.  When the "ssearch34" program
>compares two sequences and produces the optimal Smith-Waterman
>alignment, it does the comparison twice, once to get the score and a
>second time to do the alignment.
>The score calculation uses the Phil Green optimization, while the
>alignment calculation uses Hirschberg/Miller/Myers.

I have implemented another version of the Miller-Myers algorithm. It
takes almost two passes to find the end points (find one pair of end
points, then go back to find the starting points; is there a better
way?) and then does a Miller-Myers global alignment of the subsequences
bounded by the end points. This should be equivalent to ssearch in
terms of purpose.
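
The end-point search described above can be sketched in linear space as
follows. This is not the code from dsw.tgz: it assumes a simple linear gap
penalty with made-up scores (match 2, mismatch -1, gap -2) rather than the
affine penalties a real ssearch comparison would use, and the sub names are
invented. The forward pass finds the best local score and where it ends;
the second pass runs an origin-anchored DP over the reversed prefixes to
find where it starts. A Miller-Myers global alignment would then be run on
the bounded subsequences (not shown).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative scores only; not the parameters used in the thread.
my ($MATCH, $MISMATCH, $GAP) = (2, -1, -2);

# One DP pass over $s and $t that keeps only two rows in memory.
# With $local true, cells are floored at zero (Smith-Waterman);
# otherwise every alignment is anchored at the start of both strings.
# Returns (best score, row, column) of the best-scoring cell.
sub best_cell {
    my ($s, $t, $local) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($n, $m) = (scalar @s, scalar @t);
    my @prev = map { $local ? 0 : $_ * $GAP } 0 .. $m;
    my ($best, $bi, $bj) = (0, 0, 0);
    for my $i (1 .. $n) {
        my @cur = ($local ? 0 : $i * $GAP);
        for my $j (1 .. $m) {
            my $sub = $s[$i - 1] eq $t[$j - 1] ? $MATCH : $MISMATCH;
            my $h = $prev[$j - 1] + $sub;                              # diagonal
            $h = $prev[$j] + $GAP    if $prev[$j] + $GAP > $h;        # gap in $t
            $h = $cur[$j - 1] + $GAP if $cur[$j - 1] + $GAP > $h;     # gap in $s
            $h = 0 if $local && $h < 0;
            $cur[$j] = $h;
            ($best, $bi, $bj) = ($h, $i, $j) if $h > $best;
        }
        @prev = @cur;
    }
    return ($best, $bi, $bj);
}

# Returns (score, s_start, s_end, t_start, t_end) as 0-based, half-open
# coordinates of the best local alignment.
sub local_endpoints {
    my ($s, $t) = @_;
    my ($score, $ei, $ej) = best_cell($s, $t, 1);
    my $rs = scalar reverse substr($s, 0, $ei);
    my $rt = scalar reverse substr($t, 0, $ej);
    my (undef, $li, $lj) = best_cell($rs, $rt, 0);
    return ($score, $ei - $li, $ei, $ej - $lj, $ej);
}

my ($score, $s0, $s1, $t0, $t1) = local_endpoints('GGGACGTTT', 'CCACGTAA');
print "score $score, seq1 [$s0,$s1), seq2 [$t0,$t1)\n";
```

Both passes use memory proportional to the shorter dimension of one row,
which is the point of the approach; the cost is that the score matrix is
computed more than once.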

I believe my statement was misleading as far as space consumption is
concerned, because I didn't state that ssearch uses memory to do other
things. I am sorry if that caused any inconvenience. However, I still
stand by the statements related to speed, because I did mention that it
took four passes to calculate a local alignment and I stated clearly
that your time should be halved before any comparison to my program.

>Thus, the correct timing comparison would be 13.5 sec for ssearch (Phil
>Green score) vs 38 sec for Miller/Myers.  This is exactly what is
>expected - a non-Phil Green Smith-Waterman would be about 50% slower
> (~20 sec) and Miller/Myers/Hirschberg does two of these full
> Smith-Waterman's to produce an alignment (~40 sec expected).  ssearch
> may take a bit longer than this because of the other things it is
> doing - statistics, alignment summary, etc.

>Bill Pearson

You can download my code at 
http://www.stanford.edu/~yeeman/dsw.tgz

This is a perl XS implementation. It has both my code (in linspc.c) and
Phil Green's code (inside the pgreen directory), which I extracted from
the ssearch source code. The memory usage is now 6.5MB vs 8.5MB and the
run time 28 sec vs 56 sec on my machine. I removed lots of "junk" from
the Phil Green code but it still underperforms my code. In fact, Aaron
checked my Phil Green library before. He thinks my library is correct
but he said it was slower than he thought. He told me he would look
into that, but I have not heard from him about this since January.

I believe Phil Green's code should be theoretically faster (although I
couldn't find his paper/documentation). There may be something wrong in
my Phil Green library that I haven't noticed. It would be great if
Aaron could point that out.

Thanks a lot.
Yee Man

------------------------------

Message: 8
Date: Sat, 03 May 2003 15:56:18 -0400
From: mlobo <mlobo at Princeton.EDU>
Subject: [Bioperl-l] module 
To: bioperl-l at bioperl.org
Message-ID: <3EB41EE2.439A243E at princeton.edu>
Content-Type: text/plain; charset=us-ascii

Is there any module to characterize all intergenic regions of S.
cerevisiae ORFs into tandem, convergent, and divergent regions? What
might be an efficient approach to do this?
                       Thanks,
                        Mark
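
One possible approach, sketched in plain Perl under stated assumptions:
sort the ORFs of one chromosome by start coordinate and classify each gap
between neighbours from the two strands. Same strand is tandem, + followed
by - is convergent, - followed by + is divergent. The hash layout, the sub
names and the four example genes are invented for illustration; real input
could come from sequence features that provide start, end and strand.
Nested ORFs are not handled here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify the region between two neighbouring genes from their strands
# (+1 forward, -1 reverse). The first argument is the left-hand gene.
sub classify_pair {
    my ($left_strand, $right_strand) = @_;
    return 'tandem'     if $left_strand == $right_strand;   # -> ->  or  <- <-
    return 'convergent' if $left_strand == 1;               # -> <-  (3' ends meet)
    return 'divergent';                                     # <- ->  (5' ends meet)
}

# Takes gene records for ONE chromosome and returns the intergenic regions
# between neighbours. Assumes no ORF is nested inside another.
sub intergenic_regions {
    my @genes = sort { $a->{start} <=> $b->{start} } @_;
    my @regions;
    for my $k (0 .. $#genes - 1) {
        my ($left, $right) = @genes[$k, $k + 1];
        next if $right->{start} <= $left->{end} + 1;   # overlapping or abutting
        push @regions, {
            start => $left->{end} + 1,
            end   => $right->{start} - 1,
            left  => $left->{name},
            right => $right->{name},
            type  => classify_pair($left->{strand}, $right->{strand}),
        };
    }
    return @regions;
}

# Invented example genes, not real S. cerevisiae coordinates.
my @genes = (
    { name => 'geneA', start => 1,   end => 100, strand => 1 },
    { name => 'geneB', start => 201, end => 300, strand => 1 },
    { name => 'geneC', start => 401, end => 500, strand => -1 },
    { name => 'geneD', start => 601, end => 700, strand => 1 },
);
for my $region (intergenic_regions(@genes)) {
    printf "%s..%s %d-%d %s\n",
        @{$region}{qw(left right start end type)};
}
```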

------------------------------

Message: 9
Date: Sat, 03 May 2003 18:03:07 -0400
From: Hrishi <hdeshmuk at gmu.edu>
Subject: [Bioperl-l] Bioperl and ClustalW/Clustal
To: bioperl-l at bioperl.org
Message-ID: <0535E2E8-7DB3-11D7-81DF-000393640F7A at gmu.edu>
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hi All,

Is there any way to use bioperl to color the conserved sites found by
doing a multiple sequence alignment using Clustalw/Clustal?

Thanks in advance.

Hrishi

------------------------------

Message: 10
Date: Sun, 04 May 2003 00:20:12 +0200
From: Joao Magalhaes <joao.magalhaes at fundp.ac.be>
Subject: [Bioperl-l] Scale with Bio::Graphics for upstream regions
To: bioperl-l at bioperl.org
Message-ID: <5.1.0.14.2.20030504001545.02aef4c0 at pop.skynet.be>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Hi!

I'm using Bio::Graphics to display putative transcription factor
binding sites upstream of the ORF. My problem is that I need to show
negative values in regard to the distance to the ORF. How can I draw a
scale with negative values? For the moment I draw a scale like this:

my $full_length = Bio::SeqFeature::Generic->new(-start=>1,-end=>1000, 
-seq_id=>"$query");
$panel->add_track($full_length,
                 -glyph   => 'arrow',
                 -tick    => 1,
                 -fgcolor => 'black',
                 );

But I want start to be "-1000" and end to be "-1". However, if I use
negative values, the scale disappears. Any suggestions?

Thanks.

Joao Magalhaes (joao.magalhaes at fundp.ac.be)

Website on Aging: http://www.senescence.info
Reason's Triumph: http://www.jpreason.com

------------------------------

Message: 11
Date: Mon, 5 May 2003 14:06:33 -0500
From: BHurwitz at twt.com
Subject: Re: [Bioperl-l] Error running BPbl2seq
To: mikhail at ibioinformatics.org
Cc: bioperl-l at bioperl.org
Message-ID: <OFC6C26D12.C7972745-ON86256D1D.0068911E at twt.com>
Content-Type: text/plain; charset=us-ascii

Hi Mikhail,

Did you ever figure this out?  I just got the same message.  I am
running RedHat Linux 7.0, so I don't think it is caused by Debian.  I
seem to get this error on a sequence where no hits are returned from
bl2seq.  The weird part is that it seemed to work fine on a sequence
just before it that also did not have any HSPs.  Probably related to
the other errors that folks have reported with "no hits" in
BPbl2seq.pm?

-Bonnie

"Mikhail Esteves" <mikhail at ibioinformatics.org>
Sent by: bioperl-l-bounces@bioperl.org
04/10/2003 12:52 AM
Please respond to mikhail

To:      <bioperl-l at bioperl.org>
cc:
Subject: [Bioperl-l] Error running BPbl2seq

Hello,

I just installed BioPerl and, when trying a sample BPbl2seq query, I
got the following message:

Can't call method "nextHSP" on unblessed reference at
/usr/share/perl5/Bio/Tools/BPbl2seq.pm line 236

I am using Debian and have installed Bioperl through apt. Is there a
problem with the Debian release? Is there anything I can do to avoid
this error?

Thanks in advance.

Regards,
Mikhail


------------------------------

_______________________________________________
Bioperl-l mailing list
Bioperl-l at bioperl.org
http://pw600a.bioperl.org/mailman/listinfo/bioperl-l

End of Bioperl-l Digest, Vol 1, Issue 1
***************************************