[EMBOSS] EMBOSS for protein alignment stats

Iddo Friedberg idoerg at gmail.com
Thu May 30 18:10:43 UTC 2019


infoalign should give you what you want. It does not compute the summary
statistics, but for each sequence it reports the alignment length and the
%ID (note that %ID can mean several things!). You can then programmatically
parse those numbers to calculate the mean and standard deviation.
http://emboss.sourceforge.net/apps/cvs/emboss/apps/infoalign.html#output.8
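
The parsing step could look roughly like the sketch below. It is only an
illustration under stated assumptions: it expects a tab-delimited table with
a single header row, and the column names "AlignLen" and "PctID" in the
sample data are placeholders, so check them against the header your
infoalign run actually prints. It uses Welford's online algorithm, so only
one row is held in memory at a time, which should help with the very large
alignments. Also keep in mind that, as far as I understand it, infoalign
compares each sequence to a reference (by default the consensus), so this is
not an all-against-all pairwise %ID.

```python
import csv
import io
import math


class RunningStats:
    """Welford's online algorithm for mean and sample standard deviation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation (n - 1); undefined for fewer than 2 values.
        if self.n < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.n - 1))


def summarize(handle, columns):
    """Stream a tab-delimited table with a header row.

    Returns {column name: RunningStats} for the requested columns.
    The column names are whatever the caller passes in (assumption:
    they match the header of the table exactly, ignoring a leading '#').
    """
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip().lstrip("#").strip() for h in next(reader)]
    index = {name: header.index(name) for name in columns}
    stats = {name: RunningStats() for name in columns}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        for name, i in index.items():
            stats[name].add(float(row[i]))
    return stats


# Made-up sample table; the column names here are hypothetical.
SAMPLE = (
    "Name\tAlignLen\tPctID\n"
    "seq1\t100\t50.0\n"
    "seq2\t110\t60.0\n"
    "seq3\t120\t70.0\n"
)

result = summarize(io.StringIO(SAMPLE), ["AlignLen", "PctID"])
for name, s in result.items():
    print(f"{name}: n={s.n} mean={s.mean:.2f} stdev={s.stdev:.2f}")
```

For real data you would pass an open file handle (or sys.stdin) instead of
the StringIO object, one alignment at a time.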

Iddo

On Thu, May 30, 2019 at 12:48 PM Anandkumar Surendrarao <aksrao at ucdavis.edu>
wrote:

> Greetings EMBOSS users!
>
> I have ~ 18000 files, each with clustal formatted protein alignments
> derived from Pfam-A.full.
> Some of these files are large > 500MB in size, the largest alignment is
> 3GB!
>
> I need to calculate the following alignment statistics
> A. average aligned length
> B. std. dev. of aligned length
> C. average of pairwise sequence ID %
> D. std. dev. of pairwise sequence ID %
>
> Here are my 2 problems that I seek help with:
> 1. I can calculate A and C using alistat that comes with UBUNTU, but not B
> or D.
> 2. For the really large alignments, there is no option due to RAM
> requirements, and so I've used alistat's -f (fast) option, which estimates
> average %id by "sampling".
>
> If EMBOSS has tools / tricks to report A - D, while having reasonable RAM
> and disk-usage footprints, and quick processing times, please let me know.
>
> I am open to suggestions regarding other tools as well.
> I look forward to your replies. Thanks, in advance.
>
> Sincerely,
> Anand



-- 
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.