[Bioperl-l] Bio::Seq weight ?
Heikki Lehvaslaiho
heikki at nildram.co.uk
Thu May 13 18:21:16 EDT 2004
Stephan and list,
I got the C++ program used to generate SWISS-PROT (thanks to Maria Jesus
Martin!). The differences from the Bio::Tools::SeqStats values are tiny. Here
are the current SeqStats values; where SWISS-PROT differs, the comment gives
the SWISS-PROT value and the difference:
my $amino_A_wt = 89.09;
my $amino_C_wt = 121.15;
my $amino_D_wt = 133.1;
my $amino_E_wt = 147.13;
my $amino_F_wt = 165.19;
my $amino_G_wt = 75.07;
my $amino_H_wt = 155.16;
my $amino_I_wt = 131.18; #131.17 + 0.01
my $amino_K_wt = 146.19;
my $amino_L_wt = 131.18; #131.17 + 0.01
my $amino_M_wt = 149.22; #149.21 + 0.01
my $amino_N_wt = 132.12;
my $amino_P_wt = 115.13;
my $amino_Q_wt = 146.15;
my $amino_R_wt = 174.21; #174.20 + 0.01
my $amino_S_wt = 105.09;
my $amino_T_wt = 119.12;
my $amino_U_wt = 168.06;
my $amino_V_wt = 117.15;
my $amino_W_wt = 204.22; #204.23 - 0.01
my $amino_Y_wt = 181.19;
When I apply the new values, output and input molecular weights are identical,
at least in the test file t/data/swiss.dat.
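For anyone who wants to reproduce the number outside BioPerl, here is a minimal sketch of the usual calculation: sum the free amino acid weights, then subtract one water per peptide bond. It is not the SeqStats source. It uses the SWISS-PROT values from the comments above, and it assumes a water mass of 18.015 Da and plain round-to-nearest for the integer MW in the SQ line. The sub names protein_mol_wt and sq_style_mw are hypothetical.

```perl
use strict;
use warnings;

# Free amino acid weights (Da): the SWISS-PROT values from the table above.
my %aa_wt = (
    A => 89.09,  C => 121.15, D => 133.1,  E => 147.13, F => 165.19,
    G => 75.07,  H => 155.16, I => 131.17, K => 146.19, L => 131.17,
    M => 149.21, N => 132.12, P => 115.13, Q => 146.15, R => 174.20,
    S => 105.09, T => 119.12, U => 168.06, V => 117.15, W => 204.23,
    Y => 181.19,
);

# Average mass of the water lost per peptide bond (Da); an assumed value.
my $WATER = 18.015;

# Sum residue weights, then subtract one water for each of the len-1 bonds.
sub protein_mol_wt {
    my ($seq) = @_;
    $seq = uc $seq;
    $seq =~ s/\s+//g;    # tolerate the spaced 10-residue blocks of an SQ section
    my $sum = 0;
    for my $aa (split //, $seq) {
        die "no weight for residue '$aa'\n" unless exists $aa_wt{$aa};
        $sum += $aa_wt{$aa};
    }
    my $len = length $seq;
    return $len ? $sum - $WATER * ($len - 1) : 0;
}

# The SQ line reports MW as a whole number; round-to-nearest is an assumption.
sub sq_style_mw { return int(protein_mol_wt($_[0]) + 0.5) }

my $peptide = 'MEELGLATAK';
printf "%s: %.2f Da (SQ style: %d MW)\n",
    $peptide, protein_mol_wt($peptide), sq_style_mw($peptide);
```

Because a one-residue difference of 0.01 Da is multiplied by every occurrence of that residue, a long protein can drift by a unit in the rounded MW, which fits the round-trip mismatch discussed in this thread.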
I guess the benefit of being able to round-trip SWISS-PROT entries more
faithfully makes it worth making this change in the repositories. Applied to
bioperl-live and the 1.4 branch.
-Heikki
On Wednesday 12 May 2004 17:09, Heikki Lehvaslaiho wrote:
> The current parsers do not read it in at all. The value is recreated using
> Bio::Tools::SeqStats::get_mol_wt() for output.
>
> Is this crucial? What do you need it for? There are so many value sets that
> differ from each other very slightly that the last digit is hardly
> meaningful. If you need to verify identity, use the checksum.
>
> I guess I could find out which amino acid weights the SWISS-PROT programs
> use and put them into the SeqStats class. Is anyone relying heavily on the
> current values?
>
> -Heikki
>
> On Wednesday 12 May 2004 07:08, stephan rosecker wrote:
> > Thanks,
> > but I already use it.
> > I want the weight from the original entry, for example the molecular
> > weight given in the SWISS-PROT SQ line:
> >
> > SQ SEQUENCE 403 AA; 45542 MW; BC433B2D29587383 CRC64;
> > MEELGLATAK VTVTKEASHH READLYQKMK SLESKLDFFN IQEEYIKYEY KNLKRELLHA
> > QEEVKRIRSV PLLIGQLLEM VDSNTGIVQS TSGSTLCVRI LSTIDRELLK PSASVALQRH
> > SNALVDTLPP ESDSSIHLLG ADEKPSESYS DIGGGDIQKQ EMREAVELPL THHNLYKQIG
> > IDPPRGVLLY GPPGTGKTML AKAVAHHTSA AFIRVVGSEF VQKYLGEGPR LVRDVFRLAR
> > ENSPAIIFID EIDAIATKRF DAQTGADREV QRILMELLNQ MDGFDVSVNV KVIMATNRQD
> > TLDPALLRPG RLDRKIEFPL PDRRQKRLIF QVITSKMNLS DEVDLEDYVS RPDKLSGAEI
> > QSICQEAGMH AIRKNRYVIL PKDFEKGYKA SIKKNTHEFN FYN
> >
> >
> > => (45542 MW) I want this;
> > get_mol_wt() gives a different value.
> >
> >
> >
> > regards,
> > stephan
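Since the parsers do not keep the MW value, one workaround is to read it straight from the raw SQ line of the flat file. A minimal sketch, using a plain regex rather than any BioPerl API; the sub name parse_sq_line is hypothetical, and the pattern tolerates varying whitespace between fields.

```perl
use strict;
use warnings;

# Extract sequence length, molecular weight and CRC64 from a SWISS-PROT
# SQ line. Returns an empty list if the line does not look like one.
sub parse_sq_line {
    my ($line) = @_;
    return unless $line =~
        /^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-F]{16})\s+CRC64;/;
    return (length => $1, mw => $2, crc64 => $3);
}

my %sq = parse_sq_line('SQ   SEQUENCE   403 AA;  45542 MW;  BC433B2D29587383 CRC64;');
print "MW stored in the entry: $sq{mw}\n";
```

The CRC64 comes out of the same match, which is handy if, as suggested above, identity is checked by checksum rather than by weight.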
> >
> > Heikki Lehvaslaiho wrote:
> > > On Tuesday 11 May 2004 19:14, stephan rosecker wrote:
> > >>Hi,
> > >>
> > >>Is it possible to get the weight from a Bio::Seq object?
> > >
> > > This is from the bptutorial script:
> > >
> > > III.3.2 Obtaining basic sequence statistics (SeqStats,SeqWord)
> > >
> > > In addition to the methods directly available in the Seq object,
> > > bioperl provides various helper objects to determine additional
> > > information about a sequence. For example, the SeqStats object provides
> > > methods for obtaining the molecular weight of the sequence as well as the
> > > number of occurrences of each of the component residues (bases for a
> > > nucleic acid or amino acids for a protein). For nucleic acids,
> > > SeqStats also returns counts of the number of codons used. For
> > > example:
> > >
> > > use Bio::Tools::SeqStats;
> > > $seq_stats = Bio::Tools::SeqStats->new($seqobj);
> > > $weight = $seq_stats->get_mol_wt();
> > > $monomer_ref = $seq_stats->count_monomers();
> > > $codon_ref = $seq_stats->count_codons(); # for nucleic acid sequence
> > >
> > > Note: sometimes sequences will contain ambiguous codes. For this
> > > reason, get_mol_wt() returns a reference to a two-element array
> > > containing a greatest lower bound and a least upper bound of the
> > > molecular weight.
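To illustrate what the lower and upper bounds mean, here is a small sketch that is not the SeqStats code: each ambiguous residue contributes the lightest of its possible residues to the lower bound and the heaviest to the upper bound. Only the IUPAC codes B (Asp or Asn) and Z (Glu or Gln) and a handful of weights from the table above are included; the sub name mol_wt_bounds and the 18.015 Da water mass are assumptions. Like get_mol_wt(), it returns a reference to a two-element array.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# A subset of the free amino acid weights from the table above (Da).
my %aa_wt = (A => 89.09, D => 133.1, E => 147.13, G => 75.07,
             N => 132.12, Q => 146.15);

# IUPAC ambiguity codes handled here: B = D or N, Z = E or Q.
my %ambig = (B => [qw(D N)], Z => [qw(E Q)]);

my $WATER = 18.015;    # assumed water mass lost per peptide bond (Da)

sub mol_wt_bounds {
    my ($seq) = @_;
    my ($lo, $hi) = (0, 0);
    for my $aa (split //, uc $seq) {
        # An unambiguous residue has exactly one choice.
        my @choices = exists $ambig{$aa} ? @{ $ambig{$aa} } : ($aa);
        my @w = map { $aa_wt{$_} // die "no weight for '$_'\n" } @choices;
        $lo += min(@w);
        $hi += max(@w);
    }
    my $bonds = length($seq) ? length($seq) - 1 : 0;
    return [ $lo - $WATER * $bonds, $hi - $WATER * $bonds ];
}

my ($lower, $upper) = @{ mol_wt_bounds('GBZA') };
printf "between %.2f and %.2f Da\n", $lower, $upper;
```

For an unambiguous sequence the two bounds are equal, so callers can always dereference the pair the same way.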
> > >
> > >
> > > -Heikki
> > >
> > >>regards,
> > >>stephan
> > >>_______________________________________________
> > >>Bioperl-l mailing list
> > >>Bioperl-l at portal.open-bio.org
> > >>http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________