[Bioperl-l] search for only C-terminal degenerate motifs
Aaron J Mackey
ajm6q at virginia.edu
Sun Oct 12 09:58:51 EDT 2003
Since you're looking for exact matches (in a defined location, no less),
why do you need BLAST, or any bioinformatics tool? Doesn't simple string
comparison, or regexp matching get you what you need with minimal fuss?
-Aaron
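
A minimal sketch of the regexp approach suggested above, in Python for illustration. It anchors a pattern at the end of each sequence, so only the C-terminal residues can match. The function names (read_fasta, c_terminal_matches) and the sample records are made up for this example and are not part of BioPerl or any other library. Note that inside a character class "|" is a literal character, so "R or K followed by D or E" is written [RK][DE] rather than [R|K][D|E].

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def c_terminal_matches(records, pattern):
    """Return headers of sequences whose C-terminus matches the pattern.

    The pattern is wrapped in a non-capturing group and anchored with "$",
    so alternatives such as "AGDK|[RK][DE]" are all tied to the last residues.
    A trailing "*" (stop symbol in some protein files) is ignored.
    """
    regex = re.compile("(?:" + pattern + ")$")
    return [header for header, seq in records if regex.search(seq.rstrip("*"))]


# Hypothetical records, invented purely to exercise the function.
SAMPLE = [
    ">seq1",
    "MSTAAGDK",    # ends in AGDK
    ">seq2",
    "MAGDKSTLL",   # AGDK present, but not at the C-terminus
    ">seq3",
    "MKKLLRD",     # ends in a basic residue followed by an acidic one
]

if __name__ == "__main__":
    print(c_terminal_matches(read_fasta(SAMPLE), "AGDK"))
    print(c_terminal_matches(read_fasta(SAMPLE), "[RK][DE]"))
```

For the ~7 four-residue signals mentioned in the thread, the pattern would simply be the alternatives joined with "|"; since every sequence is scanned once with an anchored regexp, no alignment statistics are involved.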
On Sun, 12 Oct 2003, Lucas Carey wrote:
> On Fri, Oct 10, 2003 at 04:08:26PM -0400, Aaron J. Mackey wrote:
> > Yeah, the problem is you'd really like to *not* search all of the
> > database (for both speed and statistical reasons), only the first n
> > C-terminal residues of each sequence in the database. Both BLAST and
> Search time isn't an issue, I'll be using mpiBLAST on a 128cpu cluster. I don't think the statistics will matter, because I'm just looking for genes to test biologically, not to draw any conclusions from the db search alone.
>
> > But you said "motif" - are you trying to find:
> >
> > a) exact matches to a given short sequence
> exact matches to one of ~7 sequences located at : (C-3) - (C-2) - (C-1) - C
> The query file would just be a FASTA file with 7 4aa query sequences.
>
> > b) exact matches to a consensus regular expression (e.g.: CX[S|T]C)
> Is this possible? Can I search for [R|K][D|E] if I wanted to search for a negatively charged aa that follows a positively charged one?
>
> My thought, because I'm fairly competent with perl but have never used bioperl before, was to do this:
>
> >gi|17647257|ref|NP_523617.1| Chitinase 1 [Drosophila melanogaster]
> gi|37078008|sp|Q9W5U3|CHI1_DROME Probable chitinase 1
> Length = 508
>
> Score = 14.6 bits (27), Expect = 9048
> Identities = 4/4 (100%), Positives = 4/4 (100%)
>
> Query: 1 AGDK 4
> AGDK
> Sbjct: 226 AGDK 229
>
> If my signal sequence is AGDK. I want to look for matches where XXX in
> 'Length = XXX' is equal to XXX in 'Sbjct: nnn AGDK XXX'.
>
> I would not mind using this as an excuse to learn bioperl, assuming that there is a reasonably straightforward way to go about doing this in bioperl. In the Bio::SearchIO HOWTO I see end('hit') and length('query') but no length('db_sequence') or something like that. Does this method exist?
> I could do
> if( (end('hit') == length('db_sequence')) && (matches('query')[0] == 4)) { print query_description ;}
> -Lucas
> thank you for your assistance
>
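
The filter described in the quoted message (keep a hit only when the end coordinate on the "Sbjct:" line equals the value on the "Length =" line) can also be applied to the plain-text report directly. Below is a rough Python sketch, assuming the classic NCBI BLAST text layout shown in the quoted example. The function name c_terminal_hits is hypothetical, and the second record in the sample is invented so that there is one hit that passes the filter.

```python
import re

# Patterns for the three line types we care about in a text BLAST report.
HEADER = re.compile(r"^>(\S+)")
LENGTH = re.compile(r"^\s*Length\s*=\s*(\d+)")
SBJCT = re.compile(r"^Sbjct:\s*(\d+)\s+(\S+)\s+(\d+)")


def c_terminal_hits(report_text, motif):
    """Return IDs of hits whose aligned segment equals motif and ends at
    the last residue of the database sequence (Sbjct end == Length)."""
    hits = []
    hit_id, hit_length = None, None
    for line in report_text.splitlines():
        m = HEADER.match(line)
        if m:
            # New database sequence: remember its ID, forget the old length.
            hit_id, hit_length = m.group(1), None
            continue
        m = LENGTH.match(line)
        if m and hit_id is not None:
            hit_length = int(m.group(1))
            continue
        m = SBJCT.match(line)
        if m and hit_length is not None:
            aligned, end = m.group(2), int(m.group(3))
            if aligned == motif and end == hit_length and hit_id not in hits:
                hits.append(hit_id)
    return hits


# First record is the example from the thread (match at 226-229 of 508, so
# not C-terminal). The second record is made up for illustration.
SAMPLE_REPORT = """\
>gi|17647257|ref|NP_523617.1| Chitinase 1 [Drosophila melanogaster]
          Length = 508

 Score = 14.6 bits (27), Expect = 9048

Query: 1   AGDK 4
           AGDK
Sbjct: 226 AGDK 229

>hypothetical_1 made-up record whose match ends at the C-terminus
          Length = 229

Query: 1   AGDK 4
           AGDK
Sbjct: 226 AGDK 229
"""

if __name__ == "__main__":
    print(c_terminal_hits(SAMPLE_REPORT, "AGDK"))
```

This only handles single-block alignments, which is enough for 4-residue queries; longer alignments that wrap across several "Sbjct:" lines would need the last block's end coordinate instead.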
--
Aaron J Mackey
Pearson Laboratory
University of Virginia
(434) 924-2821
amackey at virginia.edu