[Bioperl-l] Pfam_Scan
Dave Messina
David.Messina at sbc.su.se
Sat May 1 22:28:48 UTC 2010
Hi Rad,
As far as I can tell, the Pfam_Scan output is simply tab-delimited text (see details below), so you should be able to group sequences that share a domain by sorting on the sixth column (<hmm acc>). I suspect that a sequence with multiple domain hits will have multiple lines in the output, one per hit, so if you want to identify sequences that share the same _set_ of domains, you will have to do the bookkeeping yourself.
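A rough sketch of that bookkeeping, in Python rather than Perl, in case it helps. It assumes whitespace-separated fields with the sequence id in column one and the HMM accession in column six, as in the format description below. The function name group_by_domain_set is my own, not part of Pfam_Scan or BioPerl, and skipping lines that start with "#" is a precaution I'm assuming is harmless rather than something the documentation specifies.

```python
from collections import defaultdict


def group_by_domain_set(lines):
    """Group sequence ids by the exact set of HMM accessions they hit.

    `lines` is any iterable of Pfam_Scan output lines (hypothetical helper;
    assumes column 1 is the sequence id and column 6 is the HMM accession).
    """
    # First pass: collect every accession seen for each sequence.
    domains_per_seq = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no hits.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # splits on tabs or spaces
        seq_id, hmm_acc = fields[0], fields[5]
        domains_per_seq[seq_id].add(hmm_acc)

    # Second pass: invert the mapping so the domain set is the key.
    groups = defaultdict(list)
    for seq_id, domains in domains_per_seq.items():
        groups[frozenset(domains)].append(seq_id)
    return dict(groups)
```

Each key in the returned dict is a frozenset of accessions, so sequences listed under the same key share exactly the same set of domains, regardless of the order in which the hits were reported.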
That said, Pfam_Scan is not part of BioPerl (it's distributed by the Pfam team), so you may want to contact them directly for help (pfam-help at sanger.ac.uk).
Dave
[from the Pfam_Scan documentation]
The output format is:
<seq id> <alignment start> <alignment end> <envelope start> <envelope end> <hmm acc> <hmm name> <type> <hmm start> <hmm end> <hmm length> <bit score> <E-value> <significance> <clan> <predicted_active_site_residues>
Example output (with -pfamB, -as options):
Q5NEL3.1 2 224 2 227 PB013481 Pfam-B_13481 Pfam-B 1 184 226 358.5 1.4e-107 NA NA
O65039.1 38 93 38 93 PF08246 Inhibitor_I29 Domain 1 58 58 45.9 2.8e-12 1 No_clan
O65039.1 126 342 126 342 PF00112 Peptidase_C1 Domain 1 216 216 296.0 1.1e-88 1 CL0125 predicted_active_site[150,285,307]
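The column layout above can be turned into named fields with a few lines of code. This is a minimal Python sketch, not part of Pfam_Scan or BioPerl: the PfamHit record and the parse_hit function are hypothetical names, and it assumes that fields are separated by whitespace and that the final <predicted_active_site_residues> column is simply absent when there is nothing to report, as in the first two example lines.

```python
from typing import NamedTuple, Optional


class PfamHit(NamedTuple):
    """One line of Pfam_Scan output (hypothetical record type)."""
    seq_id: str
    alignment_start: int
    alignment_end: int
    envelope_start: int
    envelope_end: int
    hmm_acc: str
    hmm_name: str
    type: str
    hmm_start: int
    hmm_end: int
    hmm_length: int
    bit_score: float
    e_value: float
    significance: str   # kept as text because it can be "NA"
    clan: str           # e.g. "CL0125", "No_clan" or "NA"
    active_site: Optional[str]  # raw last column, or None if absent


def parse_hit(line: str) -> PfamHit:
    """Parse one whitespace-separated output line into a PfamHit."""
    fields = line.split()
    # Assumption: 15 columns, plus an optional 16th for active-site residues.
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    return PfamHit(
        seq_id=fields[0],
        alignment_start=int(fields[1]),
        alignment_end=int(fields[2]),
        envelope_start=int(fields[3]),
        envelope_end=int(fields[4]),
        hmm_acc=fields[5],
        hmm_name=fields[6],
        type=fields[7],
        hmm_start=int(fields[8]),
        hmm_end=int(fields[9]),
        hmm_length=int(fields[10]),
        bit_score=float(fields[11]),
        e_value=float(fields[12]),
        significance=fields[13],
        clan=fields[14],
        active_site=fields[15] if len(fields) == 16 else None,
    )
```

Keeping significance and clan as strings avoids guessing how values such as "NA" and "No_clan" should be interpreted; callers can convert them once the meaning is confirmed with the Pfam documentation.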