[Bioperl-l] Bio::Seq weight ?

Heikki Lehvaslaiho heikki at nildram.co.uk
Thu May 13 18:21:16 EDT 2004


Stephan and list,

I got the C++ program used in the SWISS-PROT generation (thanks to Maria Jesus 
Martin!). The differences from the Bio::Tools::SeqStats values are tiny. Here 
are the current values, with the SWISS-PROT value in a comment wherever it 
differs:

    my $amino_A_wt = 89.09;
    my $amino_C_wt = 121.15;
    my $amino_D_wt = 133.1;
    my $amino_E_wt = 147.13;
    my $amino_F_wt = 165.19;
    my $amino_G_wt = 75.07;
    my $amino_H_wt = 155.16;
    my $amino_I_wt = 131.18; #131.17 + 0.01
    my $amino_K_wt = 146.19;
    my $amino_L_wt = 131.18; #131.17 + 0.01
    my $amino_M_wt = 149.22; #149.21 + 0.01
    my $amino_N_wt = 132.12;
    my $amino_P_wt = 115.13;
    my $amino_Q_wt = 146.15;
    my $amino_R_wt = 174.21; #174.20 + 0.01
    my $amino_S_wt = 105.09;
    my $amino_T_wt = 119.12;
    my $amino_U_wt = 168.06;
    my $amino_V_wt = 117.15;
    my $amino_W_wt = 204.22; #204.23 - 0.01
    my $amino_Y_wt = 181.19;

When I apply the new values, output and input molecular weights are identical, 
at least in the test file t/data/swiss.dat.
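
For anyone who wants to see how per-residue values like these turn into a 
single figure, here is a minimal sketch in plain Perl. It is not the 
Bio::Tools::SeqStats code. It assumes the weight is the sum of the free amino 
acid weights minus one water per peptide bond, with water taken as 18.015 (an 
assumed value, not one quoted in this thread). The helper names 
protein_mol_wt and sq_line_mol_wt are made up for illustration, and the table 
uses the SWISS-PROT figures listed above.

```perl
use strict;
use warnings;

# Free amino acid weights, using the SWISS-PROT figures from the table above.
my %AA_WT = (
    A => 89.09,  C => 121.15, D => 133.10, E => 147.13, F => 165.19,
    G => 75.07,  H => 155.16, I => 131.17, K => 146.19, L => 131.17,
    M => 149.21, N => 132.12, P => 115.13, Q => 146.15, R => 174.20,
    S => 105.09, T => 119.12, U => 168.06, V => 117.15, W => 204.23,
    Y => 181.19,
);

# ASSUMPTION: weight of one water molecule, lost per peptide bond.
my $WATER_WT = 18.015;

# Hypothetical helper: weight of a protein sequence without ambiguity codes.
sub protein_mol_wt {
    my ($seq) = @_;
    $seq =~ s/\s+//g;          # tolerate the spaced 10-residue blocks
    $seq = uc $seq;
    return 0 unless length $seq;

    my $total = 0;
    for my $res (split //, $seq) {
        die "no weight for residue '$res'\n" unless exists $AA_WT{$res};
        $total += $AA_WT{$res};
    }
    # One water is released for each of the (length - 1) peptide bonds.
    return $total - $WATER_WT * (length($seq) - 1);
}

# Hypothetical helper: pull the stored weight out of a SWISS-PROT SQ line.
# Returns undef if the line does not look like an SQ line.
sub sq_line_mol_wt {
    my ($line) = @_;
    return $line =~ /^SQ\s+SEQUENCE\s+\d+\s+AA;\s+(\d+)\s+MW;/ ? $1 : undef;
}

my $sq = 'SQ   SEQUENCE   403 AA;  45542 MW;  BC433B2D29587383 CRC64;';
printf "stored MW:  %d\n",   sq_line_mol_wt($sq);
printf "Gly-Gly MW: %.2f\n", protein_mol_wt('GG');
```

To check a round trip by hand, pass the sequence lines of an entry to 
protein_mol_wt and compare the rounded result with the number that 
sq_line_mol_wt reads from the same entry's SQ line.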

I think the benefit of being able to round-trip SWISS-PROT entries more 
faithfully makes this change worthwhile. Applied to bioperl-live and the 1.4 
branch.

	-Heikki


On Wednesday 12 May 2004 17:09, Heikki Lehvaslaiho wrote:
> The current parsers do not read it in at all. The value is recreated using
> the Bio::Tools::SeqStats::get_mol_wt() for output.
>
> Is this crucial? What do you need it for? There are so many value sets that
> differ from each other very slightly that the last digit is hardly
> meaningful. If you need to verify the identity, use the checksum.
>
> I guess I could find out what are the amino acid weights used by SWISS-PROT
> programs and replace them in the SeqStats class. Is anyone relying heavily
> on current values?
>
> 	-Heikki
>
> On Wednesday 12 May 2004 07:08, stephan rosecker wrote:
> > thx,
> > but i use it already.
> > I want the weight from the original entry for example in swissprot the
> >   	Molecular weight entry.
> >
> > SQ   SEQUENCE   403 AA;  45542 MW;  BC433B2D29587383 CRC64;
> >       MEELGLATAK VTVTKEASHH READLYQKMK SLESKLDFFN IQEEYIKYEY KNLKRELLHA
> >       QEEVKRIRSV PLLIGQLLEM VDSNTGIVQS TSGSTLCVRI LSTIDRELLK PSASVALQRH
> >       SNALVDTLPP ESDSSIHLLG ADEKPSESYS DIGGGDIQKQ EMREAVELPL THHNLYKQIG
> >       IDPPRGVLLY GPPGTGKTML AKAVAHHTSA AFIRVVGSEF VQKYLGEGPR LVRDVFRLAR
> >       ENSPAIIFID EIDAIATKRF DAQTGADREV QRILMELLNQ MDGFDVSVNV KVIMATNRQD
> >       TLDPALLRPG RLDRKIEFPL PDRRQKRLIF QVITSKMNLS DEVDLEDYVS RPDKLSGAEI
> >       QSICQEAGMH AIRKNRYVIL PKDFEKGYKA SIKKNTHEFN FYN
> >
> >
> > =>(45542 MW) i want this.
> > get_mol_wt() differs.
> >
> >
> >
> > regards,
> > stephan
> >
> > Heikki Lehvaslaiho wrote:
> > > On Tuesday 11 May 2004 19:14, stephan rosecker wrote:
> > >>Hi,
> > >>
> > >>is it possible to get the weight from a Bio::Seq object ?
> > >
> > > This is from the bptutorial script:
> > >
> > > III.3.2 Obtaining basic sequence statistics (SeqStats,SeqWord)
> > >
> > > In addition to the methods directly available in the Seq object,
> > > bioperl provides various helper objects to determine additional
> > > information about a sequence.  For example, SeqStats object provides
> > > methods for obtaining the molecular weight of the sequence as well as the
> > > number of occurrences of each of the component residues (bases for a
> > > nucleic acid or amino acids for a protein.)  For nucleic acids,
> > > SeqStats also returns counts of the number of codons used.  For
> > > example:
> > >
> > >   use Bio::Tools::SeqStats;
> > >   $seq_stats  = Bio::Tools::SeqStats->new($seqobj);
> > >   $weight = $seq_stats->get_mol_wt();  # array ref: [lower, upper] bound
> > >   $monomer_ref = $seq_stats->count_monomers();
> > >   $codon_ref = $seq_stats->count_codons();  # for nucleic acid sequence
> > >
> > > Note: sometimes sequences will contain ambiguous codes.  For this
> > > reason, get_mol_wt() returns a reference to a two element array
> > > containing a greatest lower bound and a least upper bound of the
> > > molecular weight.
> > >
> > >
> > > 	-Heikki
> > >
> > >>regards,
> > >>stephan
> > >>_______________________________________________
> > >>Bioperl-l mailing list
> > >>Bioperl-l at portal.open-bio.org
> > >>http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

