[Bioperl-l] not all sequence is created equal (base quality data)
Paul Gordon
gordonp@niji.imb.nrc.ca
Wed, 27 Jun 2001 20:22:17 -0300 (ADT)
> > This came up because I started playing with PIR data and we can easily make
> > it work except for the fact that some PIR files have quality information
> > about their bases, embedded in the sequence (probably not the best way
> > to do this...)
> >
> > >P1;CCDG
> > cytochrome c - dog (tentative sequence)
> > GDVEKGKKIFVQK(C.A.Q.C.H.T.V.E)KGGKHKTGPNLHGLFGRKTGQAPGFSYTDANKNKGITWGEETLMEYLENP
> > KKYIPGTKMIFAGIKKTGERADLIAYLKKATKE*
> >
> > Looking at their coding table (+), this is oh so much fun to try to code
> > for... I can at least strip out this quality data for now to allow us to
> > read in PIR files, but it would be very interesting if we COULD integrate
> > quality data into the sequence object, in case we want to be able to read
> > in the sequence read quality values.
> >
> >
> > (+)
> > Table II: Punctuation Description in Protein Sequences
<snip />
As near as I can tell, PIR uses the IUPAC notation. It would probably
not be a bad idea to be able to parse IUPAC peptide sequences in general
(though I'm not volunteering right now :-)). The specification can be
found at:
http://www.chem.qmw.ac.uk/iupac/AminoAcid/A2021.html#AA215
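
For the interim approach mentioned in the quoted message (strip the
punctuation so the file can at least be read), a minimal Python sketch might
look like the following. It assumes that parentheses delimit a region of
uncertain sequence and that any other non-letter character is punctuation to
drop; the PIR table and the IUPAC specification should be checked for the
exact meaning of each symbol. The function name strip_pir_punctuation is
hypothetical and is not part of BioPerl.

```python
def strip_pir_punctuation(raw):
    """Return (sequence, flagged) for a PIR-style sequence body.

    sequence: the residues with all punctuation removed.
    flagged:  0-based indices of residues that sat inside parentheses
              (assumed here to mark an uncertain region).
    """
    residues = []
    flagged = []
    depth = 0
    for ch in raw:
        if ch == "*":            # '*' terminates the sequence
            break
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif ch.isalpha():
            if depth > 0:
                flagged.append(len(residues))
            residues.append(ch.upper())
        # anything else ('.', ',', '=', '/', whitespace) is dropped
    return "".join(residues), flagged


# The CCDG entry from the quoted message
raw = (
    "GDVEKGKKIFVQK(C.A.Q.C.H.T.V.E)KGGKHKTGPNLHGLFGRKTGQAPGFSYTDANKNKGITWGEETLMEYLENP\n"
    "KKYIPGTKMIFAGIKKTGERADLIAYLKKATKE*"
)
sequence, flagged = strip_pir_punctuation(raw)
print(sequence[13:21])  # CAQCHTVE
print(flagged)          # [13, 14, 15, 16, 17, 18, 19, 20]
```

Keeping the flagged indices alongside the plain sequence means the
information is not lost if the sequence object later grows a place to store
per-residue quality.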
________________________________________________________________________
Paul Gordon Paul.Gordon@nrc.ca
Genomic Technologies http://maggie.cbr.nrc.ca
Institute for Marine Biosciences
National Research Council Canada