Protein Family Neighborhood Analysis - PFAM Domain (Family)



In prokaryotic genomes, functionally coupled genes are often organized into conserved clusters of neighboring genes, called operons, which allows their coordinated regulation. It is therefore possible to predict the function of uncharacterized genes by analyzing the functional annotations within their genomic neighborhoods. Here, we present an algorithm that gives insight into the genomic neighborhoods of a query protein family by calculating the statistical significance of the overrepresentation of functional domains in those neighborhoods.
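The text above does not say which statistical test the server uses. One common choice for domain overrepresentation is a one-sided hypergeometric test, and the sketch below assumes that choice. The counts are hypothetical: `N` background genes, `K` of which carry the domain being tested, and `n` genes found in the query family's neighborhoods, `k` of which carry that domain. The p-value is the probability of seeing at least `k` such genes by chance.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    Assumed meaning of the counts (not taken from the server):
      N: total genes in the background set
      K: background genes carrying the tested domain
      n: genes found in the query family's neighborhoods
      k: neighborhood genes carrying the tested domain
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)  # X cannot exceed K or n
    # Exact integer arithmetic for the tail sum, divided once at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_sf(k=8, N=50000, K=40, n=1000)` gives the probability that 8 or more of 1000 neighborhood genes carry a domain found on only 40 of 50000 background genes. Using exact integer binomials avoids rounding problems in the tail for small counts; SciPy's `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same quantity.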

» As the query, the user can choose a protein family from the Pfam database (September 2018, 17929 entries). The query domain identifier must be in the form "pfNNNNN" or "PFNNNNN", e.g. PF02696.
» The size of the neighborhood (e.g. 5000 bp) is given as the number of base pairs (bp) upstream and downstream of the gene whose protein product contains the query Pfam domain. Typically, a neighborhood of ±5000 bp contains roughly 10 genes in total.
» Because performing many tests at once (e.g. evaluating many Pfam domains for overrepresentation) inflates the number of false positives, the user can choose the multiple test correction: the Bonferroni correction, which controls the family-wise error rate, or the Benjamini-Hochberg procedure, which controls the false discovery rate.
» The protein families available for neighborhood analysis were selected because a high number of representative genomes is available for them. The genomic data used by this server were downloaded from JGI IMG (Integrated Microbial Genomes & Microbiomes) and include complete prokaryotic genomes as well as incomplete genomic data from environmental sequencing.
» Currently, genes encoded on both strands are considered during neighborhood analysis.
» The p-value cut-off limits which results are reported in the output. Naming the analysis is optional.
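The identifier rule for the query can be checked before submitting an analysis. The sketch below accepts exactly "pf" or "PF" followed by five digits and normalizes the result to upper case. The function name `normalize_pfam_id` is hypothetical, and rejecting mixed case such as "Pf02696" is an assumption, since the server's behavior for that case is not stated.

```python
import re

# "pf" or "PF" followed by exactly five ASCII digits, per the stated input rule.
# [0-9] is used instead of \d so that non-ASCII Unicode digits are rejected.
_PFAM_ID = re.compile(r"(?:pf|PF)[0-9]{5}")


def normalize_pfam_id(query: str) -> str:
    """Validate a query Pfam family ID and return it upper-cased, e.g. 'PF02696'."""
    query = query.strip()
    if not _PFAM_ID.fullmatch(query):
        raise ValueError(f"not a valid Pfam family ID: {query!r}")
    return query.upper()
```

For example, `normalize_pfam_id("pf02696")` returns `"PF02696"`, while `"PF2696"` (four digits) raises `ValueError`.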
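The neighborhood definition (a fixed number of base pairs on each side of the query gene, genes on both strands included) can be sketched as a simple window filter. Everything here is hypothetical: the `Gene` record, the `neighborhood` function and the `flank_bp` parameter are illustrative names, and three details are assumptions not stated by the server: a gene counts as a neighbor if it merely overlaps the window, the query gene itself is excluded, and only genes on the same contig are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; real IMG annotations carry many more fields.
    gene_id: str
    contig: str
    start: int   # 1-based, inclusive
    end: int     # inclusive; start <= end regardless of strand
    strand: str  # "+" or "-"
    domains: frozenset


def neighborhood(query: Gene, genes: list, flank_bp: int = 5000) -> list:
    """Genes overlapping [query.start - flank_bp, query.end + flank_bp] on the query's contig.

    Strand is deliberately ignored, so genes on both strands are kept.
    Overlap (not full containment) and exclusion of the query gene are assumptions.
    """
    lo = query.start - flank_bp
    hi = query.end + flank_bp
    return [
        g for g in genes
        if g.contig == query.contig
        and g.gene_id != query.gene_id
        and g.start <= hi and g.end >= lo  # interval overlap test
    ]
```

With `flank_bp=5000` and prokaryotic genes averaging around 1 kb, such a window typically yields on the order of ten genes, consistent with the figure given above. Counting the domains of these genes across all query-family members gives the neighborhood counts used in the overrepresentation test.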
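The two multiple test corrections and the p-value cut-off can be combined into one reporting step. The sketch below uses the standard Bonferroni adjustment (p times the number of tests, capped at 1) and the standard Benjamini-Hochberg step-up adjustment. The function names, the `"bonferroni"`/`"bh"` method labels, and the choice to apply the cut-off to the corrected rather than the raw p-values are assumptions, not the server's documented behavior.

```python
def bonferroni(pvalues: list) -> list:
    """Bonferroni-adjusted p-values: p * m, capped at 1 (controls the family-wise error rate)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def report(domain_pvalues: dict, method: str = "bh", cutoff: float = 0.05) -> dict:
    """Correct raw per-domain p-values and keep domains whose corrected value is <= cutoff."""
    names = list(domain_pvalues)
    raw = [domain_pvalues[name] for name in names]
    if method == "bonferroni":
        adjusted = bonferroni(raw)
    elif method == "bh":
        adjusted = benjamini_hochberg(raw)
    else:
        raise ValueError(f"unknown correction method: {method!r}")
    return {name: q for name, q in zip(names, adjusted) if q <= cutoff}
```

Bonferroni is the more conservative option: it multiplies every p-value by the full number of tests. Benjamini-Hochberg scales each p-value by `m / rank`, so it typically reports more domains at the same cut-off while tolerating a controlled fraction of false discoveries.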