[Bioperl-l] Reciprocal best hits using Bioperl?

Tristan Lefebure tristan.lefebure at gmail.com
Mon Jan 18 01:36:38 UTC 2010


On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
> that it doesn't correct for inparalogs, but for
> incomplete genomes this is probably okay.

Interestingly, my experience with not too divergent 
bacterial genomes (same genus) does not support the 
normalization used in orthoMCL (which, as far as I 
understand, is a standardization of the -log10(evalue) per 
taxon combination, including a taxon with itself). MCL, which 
does not do any normalization (just -log10(evalue)), gives 
about the same number of false negatives (i.e. missed 
orthologs), but far fewer false positives (false orthologs). 
In other words, you get many fake singletons. I don't know 
exactly whether the problem lies in the normalization process 
or in the fact that orthoMCL v1.x uses a very old version of 
MCL. What I do know is that many false positives are made of 
short or incomplete proteins, which are very common in draft 
genomes and automatic annotations... Things might be 
completely different with more divergent and globally longer 
proteins. Testing orthoMCL v2 on the same data set would 
probably give the answer.
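
To make the two weighting schemes concrete, here is a minimal Python sketch of what I mean. It is not orthoMCL's code: the hit list, the "taxon|gene" ID format, the cap of 200 for e-values of 0, and the choice of "divide by the mean weight of the taxon pair" as the standardization are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical BLAST hits: (query, subject, e-value).
# Sequence IDs are assumed to look like "taxon|gene".
HITS = [
    ("A|g1", "B|g1", 1e-50),
    ("A|g2", "B|g2", 1e-10),
    ("A|g1", "A|g3", 1e-100),
    ("A|g3", "A|g1", 1e-20),
]

MAX_WEIGHT = 200.0  # assumed cap, used when BLAST reports an e-value of 0


def weight(evalue):
    """Raw edge weight: -log10(evalue), clamped to [0, MAX_WEIGHT]."""
    if evalue <= 0.0:
        return MAX_WEIGHT
    return max(0.0, min(-math.log10(evalue), MAX_WEIGHT))


def taxon(seq_id):
    """Taxon prefix of a 'taxon|gene' identifier."""
    return seq_id.split("|", 1)[0]


def taxon_pair(query, subject):
    """Unordered taxon pair; a taxon paired with itself is a valid key."""
    return tuple(sorted((taxon(query), taxon(subject))))


def raw_weights(hits):
    """Plain MCL-style input: one -log10(evalue) weight per hit."""
    return {(q, s): weight(e) for q, s, e in hits}


def normalized_weights(hits):
    """Assumed standardization: divide each weight by the mean weight
    of all hits sharing the same taxon pair."""
    raw = raw_weights(hits)
    groups = defaultdict(list)
    for (q, s), w in raw.items():
        groups[taxon_pair(q, s)].append(w)
    means = {pair: sum(ws) / len(ws) for pair, ws in groups.items()}
    return {(q, s): w / means[taxon_pair(q, s)] for (q, s), w in raw.items()}


if __name__ == "__main__":
    norm = normalized_weights(HITS)
    for (q, s), w in sorted(raw_weights(HITS).items()):
        print(q, s, round(w, 2), round(norm[(q, s)], 3))
```

After the division, a within-taxon hit and a between-taxon hit with very different e-values can end up with the same weight, which is the kind of rescaling I suspect matters for short or incomplete proteins.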

--Tristan


