[EMBOSS] about pepstats

Qiang Tu qtu at sibs.ac.cn
Tue Nov 11 09:31:57 UTC 2003


---Original Message--- 
From: Peter Rice 
Subject: Re: [EMBOSS] about pepstats 
  
>Qiang Tu wrote: 
>> I want to calculate the molecular weight and isoelectric point of all sequences in GenBank.
>Those are protein properties, so I assume you mean some other database?  

Sorry, I meant GenBank's nr protein database.
 
>> Scripts are too slow and I want to use the programs in EMBOSS directly.  
>> There are two questions: 
>> 1. iep supports multiple sequences in one file, but pepstats only outputs the result for the first sequence.
>> Is that true?
> 
>Yes - some programs in EMBOSS can work over all sequences in a database, 
>or all sequences in a file or in a list of sequences. It depends on how 
>useful such output is (for example, whether any EMBOSS user has asked 
>for such an extension). 
> 
>It also depends on how easily a GUI or Web interface can cope with 
>multiple outputs - they need to automatically match the outputs to each 
>input sequence.  

I hope pepstats can handle multiple sequences in one file, so that I do not have to call the program from a script, which is slower than the program itself. :-)
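As a stopgap for pepstats reporting only the first sequence, a script can split the multi-sequence file and call pepstats once per record. The Python sketch below assumes FASTA input and the `-sequence`, `-outfile` and `-auto` qualifiers of current EMBOSS releases (check `pepstats -help` on your installation); the helper names `split_fasta` and `run_pepstats_each` are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into (header, sequence) pairs."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def run_pepstats_each(fasta_path, out_dir):
    """Run pepstats once per record (assumes EMBOSS pepstats is on PATH)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, (header, seq) in enumerate(split_fasta(Path(fasta_path).read_text())):
        # Write the single record to a temporary FASTA file for pepstats.
        with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
            tmp.write(f">{header}\n{seq}\n")
        out_file = out_dir / f"seq{i}.pepstats"
        try:
            # Assumed qualifiers: -sequence (input), -outfile (report), -auto (no prompts).
            subprocess.run(
                ["pepstats", "-sequence", tmp.name, "-outfile", str(out_file), "-auto"],
                check=True,
            )
        finally:
            Path(tmp.name).unlink()
        outputs.append(out_file)
    return outputs
```

For example, `run_pepstats_each("nr_subset.fasta", "pepstats_out")` would leave one report per sequence in `pepstats_out/`. Because it starts one pepstats process per record, it still carries the per-call overhead described above; it is a workaround, not a speed-up.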

 
>> 2. How can I make iep and pepstats output a simple result? I just want the MW and pI. :-)
> 
>That is something we are working on for a future release. Best to wait 
>for a general solution. 
> 
>Meanwhile, you can use perl or some other scripts to extract the numbers 
>you need from the output, and to run over all sequences. 
> 
>But, if you are programming - you could change the programs (we can 
>help) to produce the output you want or to read sets of sequences. 

Parsing the output with Perl is relatively fast, while doing the calculation in a script is much slower than pepstats. I read the source code, but found it difficult to understand. :-)
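Since parsing is the cheap part, the two numbers can be pulled out of a pepstats report with a few regular expressions. This sketch is in Python rather than Perl and assumes report lines beginning with `PEPSTATS of <name>`, `Molecular weight = ...` and `Isoelectric Point = ...`; verify these labels against your own pepstats output. `parse_pepstats` is a hypothetical helper name.

```python
import re

# Assumed pepstats report line formats, e.g.
#   "PEPSTATS of SEQ1 from 1 to 141"
#   "Molecular weight = 15257.57      Residues = 141"
#   "Isoelectric Point = 8.7275"
NAME_RE = re.compile(r"^PEPSTATS of (\S+)", re.MULTILINE)
MW_RE = re.compile(r"^Molecular weight\s*=\s*([0-9.]+)", re.MULTILINE)
PI_RE = re.compile(r"^Isoelectric Point\s*=\s*([0-9.]+)", re.MULTILINE)


def parse_pepstats(report):
    """Return (name, mw, pI) tuples, one per 'PEPSTATS of' block in the report."""
    results = []
    # Split before each block header so concatenated reports are handled too.
    for block in re.split(r"(?m)^(?=PEPSTATS of )", report):
        name = NAME_RE.search(block)
        mw = MW_RE.search(block)
        pi = PI_RE.search(block)
        # Skip fragments (e.g. leading text) that lack any of the three fields.
        if name and mw and pi:
            results.append((name.group(1), float(mw.group(1)), float(pi.group(1))))
    return results
```

A tab-separated summary could then be printed with `for name, mw, pi in parse_pepstats(text): print(f"{name}\t{mw}\t{pi}")`.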

Thank you for your help.

Best regards,

Qiang Tu



