Many genes in microbial genomes remain functionally uncharacterised. Understanding the signalling and metabolic pathways that involve such genes is essential for a deeper understanding of microbial biology and of the mechanisms of infectious disease. It is well known that genes which co-occur across genomes are more likely to share similar biological functions than random pairs of genes.
Here we propose a novel algorithm for assessing co-occurrence. It addresses a serious limitation of existing approaches: the over-representation of some "popular" species, such as Escherichia coli. Our algorithm allows co-occurrence relationships to be collapsed at different taxonomic levels. Below we provide protein co-occurrence analyses for two organisms, Escherichia coli and Legionella pneumophila.
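To illustrate why collapsing at a taxonomic level matters, the following is a minimal sketch and not the algorithm proposed here. It assumes that each gene has a presence profile (the set of genomes in which it occurs), that a taxon counts as containing a gene if any of its genomes does, and that co-occurrence is scored with a Jaccard index. The function names, genome identifiers and toy data are all hypothetical.

```python
def collapse_by_taxon(presence, genome_to_taxon):
    """Map each gene's set of genomes to the set of taxa containing it.

    Assumption: a taxon "has" a gene if at least one of its genomes does.
    """
    return {
        gene: {genome_to_taxon[genome] for genome in genomes}
        for gene, genomes in presence.items()
    }


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: three E. coli genomes and one L. pneumophila genome.
presence = {
    "geneA": {"ecoli_1", "ecoli_2", "ecoli_3", "lpn_1"},
    "geneB": {"ecoli_1", "ecoli_2", "ecoli_3"},
}
genome_to_species = {
    "ecoli_1": "Escherichia coli",
    "ecoli_2": "Escherichia coli",
    "ecoli_3": "Escherichia coli",
    "lpn_1": "Legionella pneumophila",
}

# Genome-level score: the three E. coli genomes dominate the comparison.
raw_score = jaccard(presence["geneA"], presence["geneB"])

# Species-level score: each species contributes at most once.
collapsed = collapse_by_taxon(presence, genome_to_species)
species_score = jaccard(collapsed["geneA"], collapsed["geneB"])

print(raw_score, species_score)
```

In this toy case the genome-level score is inflated by the repeated Escherichia coli genomes, whereas the species-level score counts each species once. Passing a genome-to-genus or genome-to-family mapping instead would collapse the same profiles at a higher taxonomic level.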
Legionella pneumophila