[EMBOSS] about pepstats
Qiang Tu
qtu at sibs.ac.cn
Tue Nov 11 09:31:57 UTC 2003
---Original Message---
From: Peter Rice
Subject: Re: [EMBOSS] about pepstats
>Qiang Tu wrote:
>> I want to calculate the molecular weight and isoelectric point of all sequences in GenBank.
>
>Those are protein properties, so I assume you mean some other database?
Sorry, I meant GenBank's protein nr database.
>> Scripts are too slow and I want to use the programs in EMBOSS directly.
>> There are two questions:
>> 1. iep supports multiple sequences in one file, but pepstats only outputs the result for the first sequence.
>> Is that true?
>
>Yes - some programs in EMBOSS can work over all sequences in a database,
>or all sequences in a file or in a list of sequences. It depends on how
>useful such output is (for example, whether any EMBOSS user has asked
>for such an extension).
>
>It also depends on how easily a GUI or Web interface can cope with
>multiple outputs - they need to automatically match the outputs to each
>input sequence.
I hope pepstats can be made to handle multiple sequences in one file, so that I do not have to call the program from a script, which is slower than the program itself. :-)
>> 2. How can I make iep and pepstats output a simple result? I just want the MW and pI. :-)
>
>That is something we are working on for a future release. Best to wait
>for a general solution.
>
>Meanwhile, you can use perl or some other scripts to extract the numbers
>you need from the output, and to run over all sequences.
>
>But, if you are programming - you could change the programs (we can
>help) to produce the output you want or to read sets of sequences.
Parsing the output with Perl is relatively fast, but doing the calculation itself in a script is much slower than pepstats. I read the source code, but it was difficult for me to understand. :-)
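For the parsing step, here is a minimal sketch in Python rather than Perl. It assumes the report text contains one "PEPSTATS of <name> ..." header line per sequence, followed by lines of the form "Molecular weight = ..." and "Isoelectric Point = ...". Those line formats are an assumption about the report layout and should be checked against the real output. The function name parse_reports is made up for this illustration.

```python
import re

# Assumed line formats; verify against the actual report before relying on them.
HEADER = re.compile(r"^PEPSTATS of (\S+)")
MW = re.compile(r"Molecular weight\s*=\s*([0-9.]+)")
PI = re.compile(r"Isoelectric Point\s*=\s*([0-9.]+)")


def parse_reports(text):
    """Return a list of {"name", "mw", "pi"} dicts, one per report header found."""
    results = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new report starts; values stay None until their lines are seen.
            current = {"name": header.group(1), "mw": None, "pi": None}
            results.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        mw = MW.search(line)
        if mw:
            current["mw"] = float(mw.group(1))
        pi = PI.search(line)
        if pi:
            current["pi"] = float(pi.group(1))
    return results


if __name__ == "__main__":
    import sys

    # Read concatenated reports from standard input, print tab-separated values.
    for rec in parse_reports(sys.stdin.read()):
        print(rec["name"], rec["mw"], rec["pi"], sep="\t")
```

This only extracts the two numbers from reports that already exist; it does not remove the cost of starting the program once per sequence.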
Thank you for your help.
Best regards,
Qiang Tu