[Biopython-dev] COMPASS parsing code

James Casbon j.a.casbon at qmul.ac.uk
Tue Apr 27 07:45:20 EDT 2004


Hi,

I have written some code for parsing COMPASS results.  COMPASS implements
profile/profile alignment and is available by ftp.  See:

http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12547212
http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14500884

for more details.

I have attached the code, which you might like to include in the Biopython
distribution.

There are probably a few issues with the code that could make it better:

* The unit tests use some sample input, the files comtest1 and comtest2.  These
are just read using open.  I have seen someone use test.locate or something
like that, but I'm not sure how that works.  If you want to enlighten me, I'll
change it.

* I have used regular expressions inefficiently, as I'm not sure how you're
supposed to cache them using the _Scanner/_Consumer framework.  At the moment
each subroutine compiles a regular expression every time it is called, which
can't be good.  Again, please point me to a better way and I will change it.
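
To make the regular expression point concrete, here is a minimal sketch of one
possible fix: compile each pattern once at module level so that every call
reuses it.  The names _RE_NAMES, _RE_SCORE, parse_names and parse_score are
hypothetical; they are not taken from the attached Compass.py or from the
_Scanner/_Consumer framework, and the patterns only assume the line layout
seen in the sample output attached to this message.

```python
import re

# Hypothetical pattern names.  Each pattern is compiled once, when the
# module is imported, instead of on every call.
_RE_NAMES = re.compile(r"Ali1:\s+(\S+)\s+Ali2:\s+(\S+)")
_RE_SCORE = re.compile(r"Smith-Waterman score = (\d+)\s+Evalue = (\S+)")


def parse_names(line):
    """Return (query, hit) file names from an 'Ali1: ... Ali2: ...' line."""
    match = _RE_NAMES.search(line)
    if match is None:
        raise ValueError("not an Ali1/Ali2 line: %r" % line)
    return match.group(1), match.group(2)


def parse_score(line):
    """Return (sw_score, evalue) from a 'Smith-Waterman score = ...' line."""
    match = _RE_SCORE.search(line)
    if match is None:
        raise ValueError("not a score line: %r" % line)
    return int(match.group(1)), float(match.group(2))
```

Whether the compiled patterns would be better placed as class attributes on
the consumer is a design choice I have not settled; module level is simply the
smallest change from compiling inside each subroutine.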


regards,
James

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Compass.py
Type: application/x-python
Size: 12778 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20040427/96a80e10/Compass.bin
-------------- next part --------------
Ali1: 60456.blo.gz.aln  Ali2: allscop//14982.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=116     filtered_length2=115
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=11.313
Smith-Waterman score = 35        Evalue = 1.01e+03

QUERY   178    KKDLEEIAD
               ++ ++++++
QUERY   9      QAAVQAVTA

Ali1: 60456.blo.gz.aln  Ali2: allscop//14983.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=121     filtered_length2=119
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=11.168
Smith-Waterman score = 35        Evalue = 1.01e+03

QUERY   178    KKDLEEIAD
               ++ ++++++
QUERY   9      REAVEAAVD

Ali1: 60456.blo.gz.aln  Ali2: allscop//14984.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=145     filtered_length2=137
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=5.869
Smith-Waterman score = 37        Evalue = 5.75e+02

QUERY   371    LEEAMDRMER~~~V
               + ++++ + +   +
QUERY   76     LQNFIDQLDNpddL

Ali1: 60456.blo.gz.aln  Ali2: allscop//15010.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=141     filtered_length2=141
Nseqs1=399      Neff1=12.972    Nseqs2=1        Neff2=6.099
Smith-Waterman score = 37        Evalue = 5.75e+02

QUERY   163    LIINSP
               ++++++
QUERY   32     LFDAHD


-------------- next part --------------
....Ali1: 60456.blo.gz.aln      Ali2: 60456.blo.gz.aln
Threshold of effective gap content in columns: 0.5
length1=388     filtered_length1=386    length2=388     filtered_length2=386
Nseqs1=399      Neff1=12.972    Nseqs2=399      Neff2=12.972
Smith-Waterman score = 2759      Evalue = 0.00e+00

QUERY   2      LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY   2      LSDRLELVSASEIRKLFDIAAGMKDVISLGIGEPDFDTPQHIKEYAKEALDKGLTHYGPN


QUERY          IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          IGLLELREAIAEKLKKQNGIEADPKTEIMVLLGANQAFLMGLSAFLKDGEEVLIPTPAFV


QUERY          SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          SYAPAVILAGGKPVEVPTYEEDEFRLNVDELKKYVTDKTRALIINSPCNPTGAVLTKKDL


QUERY          EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          EEIADFVVEHDLIVISDEVYEHFIYDDARHYSIASLDGMFERTITVNGFSKTFAMTGWRL


QUERY          GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          GFVAAPSWIIERMVKFQMYNATCPVTFIQYAAAKALKDERSWKAVEEMRKEYDRRRKLVW


QUERY          KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR
               ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
QUERY          KRLNEMGLPTVKPKGAFYIFPRIRDTGLTSKKFSELMLKEARVAVVPGSAFGKAGEGYVR


QUERY          ISYATAYEKLEEAMDRMERVLKERKL
               ++++++++++++++++++++++++++
QUERY          ISYATAYEKLEEAMDRMERVLKERKL


