[Bioperl-l] question about positioning peptide in a full protein sequence
Frank Schwach
fs5 at sanger.ac.uk
Mon Feb 21 09:26:21 UTC 2011
Hi Mingwei,
I guess this is MS data for phosphorylation sites? We are doing the same
here. I don't know what software you are using in your MS pipeline, but
it may already map the peptides to the full-length protein for you. If
not, you probably get peptide sequences with the probability of each site
carrying a phosphate (or whatever post-translational modification)
encoded in the string, e.g. the data I'm working with shows me
something like "..LKS[0.99]S[0.01]..." to indicate probabilities of 99%
and 1% of those two serines being modified. You then have to extract
that data from the peptide string using a regex. Then you can identify
the most probable site within the string and map the peptide string to
the full-length protein sequence using index (or a regex), as Chris
suggested. The position of the modified site in the protein is then the
match position of the peptide plus the position of the site within the
peptide (remembering that index counts from 0). I don't think there is
any ready-made solution for this, as it is basically just string
matching, but please do let me know if you get stuck and I can help you
further.
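Frank's steps can be sketched in Perl roughly as follows. This is a minimal sketch, assuming the "S[0.99]" annotation style from the example above (a probability in square brackets after each candidate residue, uppercase one-letter residue codes); parse_peptide and site_position are hypothetical helper names, not BioPerl functions, and the sequences are made up.

```perl
use strict;
use warnings;

# Parse a peptide annotated like "LKS[0.99]S[0.01]P" (assumed format:
# a probability in brackets after each candidate residue). Returns the
# bare peptide plus a list of [offset, residue, probability] entries,
# where offset is the 0-based position of the residue in the peptide.
# Anything that is not an uppercase letter (e.g. ".." flanks) is skipped.
sub parse_peptide {
    my ($annotated) = @_;
    my $bare = '';
    my @sites;
    while ($annotated =~ /([A-Z])(?:\[([0-9.]+)\])?/g) {
        my ($residue, $prob) = ($1, $2);
        push @sites, [ length($bare), $residue, $prob ] if defined $prob;
        $bare .= $residue;
    }
    return ($bare, @sites);
}

# Return the 1-based position in $protein of the most probable modified
# site, or undef if the peptide has no annotated site or is not found.
sub site_position {
    my ($protein, $annotated) = @_;
    my ($bare, @sites) = parse_peptide($annotated);
    return unless @sites;
    my ($best) = sort { $b->[2] <=> $a->[2] } @sites;
    my $start = index($protein, $bare);    # 0-based; -1 if not found
    return if $start < 0;
    return $start + $best->[0] + 1;        # shift from 0-based to 1-based
}

# Made-up example: peptide LKSSP starts at residue 4 of the protein.
my $pos = site_position('MAGLKSSPRT', 'LKS[0.99]S[0.01]P');
print "$pos\n";    # prints 6
```

Only the first match from index is used here; a peptide that occurs more than once in the protein would need every hit checked.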
Cheers,
Frank
On Sun, 2011-02-20 at 20:57 -0600, Chris Fields wrote:
> If this is a direct string match (no ambiguity), just use perl's index function:
>
> index STR,SUBSTR,POSITION
> index STR,SUBSTR
> The index function searches for one string within another, but
> without the wildcard-like behavior of a full regular-expression
> pattern match. It returns the position of the first occurrence
> of SUBSTR in STR at or after POSITION. If POSITION is omitted,
> starts searching from the beginning of the string. POSITION
> before the beginning of the string or after its end is treated
> as if it were the beginning or the end, respectively. POSITION
> and the return value are based at 0 (or whatever you've set the
> $[ variable to--but don't do that). If the substring is not
> found, "index" returns one less than the base, ordinarily "-1".
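Because index takes a starting POSITION and returns -1 when nothing further is found, it can also list every place a peptide occurs, which matters when a peptide matches more than once in a protein. A minimal sketch; all_matches is a hypothetical helper name and the sequence is made up:

```perl
use strict;
use warnings;

# Return every 0-based offset at which $peptide occurs in $protein.
# Restarting the search one residue after each hit (the POSITION
# argument) also finds overlapping occurrences.
sub all_matches {
    my ($protein, $peptide) = @_;
    # Guard: an empty peptide matches at every offset and would loop forever.
    return () unless length $peptide;
    my @hits;
    my $pos = index($protein, $peptide);
    while ($pos != -1) {
        push @hits, $pos;
        $pos = index($protein, $peptide, $pos + 1);
    }
    return @hits;
}

my @hits = all_matches('GASPKGASPK', 'GASP');
print join(',', map { $_ + 1 } @hits), "\n";    # prints 1,6 (1-based starts)
```

Adding 1 to each offset converts Perl's 0-based positions to the 1-based residue numbering normally used for protein sequences.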
>
> Also see here:
>
> http://perlmeme.org/howtos/perlfunc/index_function.html
>
> chris
>
> On Feb 20, 2011, at 4:28 PM, Mingwei Min wrote:
>
> > Hi Dave,
> >
> > Thank you for your suggestion. When I said "too complex for this simple
> > job", I just thought that there might be some particular module that
> > could do this straightforwardly. I'll give BLAST a try anyway.
> > Thank you.
> >
> > Mingwei
> >
> > 2011/2/20 Dave Messina <David.Messina at sbc.su.se>:
> >> Hi Mingwei,
> >> Please remember to "reply all" so others on the mailing list can follow the
> >> conversation.
> >> Unless you have some other way of mapping the coordinates of the
> >> sequence with the post-translational sites to the coordinates of the full
> >> sequence, I think you'll have to do a similarity search of some form.
> >> BLAST may not be best for this, given that it's sloppy with the ends of an
> >> alignment, but there are plenty of options for BLAST that may improve your
> >> results. Again, you'll need to be specific about your problem for us to
> >> help. I don't know what "too complex for this simple job" means. Is it too slow?
> >> Are you getting too many hits?
> >>
> >>
> >> Dave
> >>
> >>
> >> On Sun, Feb 20, 2011 at 22:35, Mingwei Min <mm809 at cam.ac.uk> wrote:
> >>>
> >>> Hi Dave,
> >>>
> >>> Sorry for not making that clear. Yes, I just want to get the coordinates
> >>> of the post-translational sites out of a protein sequence. What I
> >>> have now is the peptide sequence with a marker on the modified
> >>> residue... What should I do to map them to the whole protein sequence
> >>> and get the coordinates? The only way I could come up with is BLAST,
> >>> but it seems to be too complex for this simple job...
> >>>
> >>> Many thanks,
> >>>
> >>> Mingwei
> >>>
> >>> 2011/2/20 Dave Messina <David.Messina at sbc.su.se>:
> >>>> Hi Mingwei,
> >>>> I'm not sure what you mean by "positioning" here. Do you want to get the
> >>>> coordinates of the post-translational sites out of a protein sequence
> >>>> database record? Or do you want to draw the post-translational sites on a
> >>>> picture of the protein sequence? Or something else entirely?
> >>>>
> >>>> Dave
> >>>>
> >>>>
> >>>>
> >>>> On Sat, Feb 19, 2011 at 15:53, Mingwei Min <mm809 at cam.ac.uk> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I am trying to position some post-translational modification sites,
> >>>>> which are marked in peptides, in a full-length protein sequence. Would
> >>>>> anyone be kind enough to tell me which module I could use for this?
> >>>>>
> >>>>> Many thanks
> >>>>>
> >>>>> Mingwei
> >>>>> _______________________________________________
> >>>>> Bioperl-l mailing list
> >>>>> Bioperl-l at lists.open-bio.org
> >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>
> >>>>
> >>
> >>
> >
> >
> >
> > --
> > Mingwei Min PhD student
> > University of Cambridge
> > Department of Genetics
> > Downing St
> > CB2 3EH
> > UK
> >
>
>
--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.