Protein Family Neighborhood Analysis - Sequence (All database)
In prokaryotic genomes, functionally coupled genes can be organized in conserved clusters of neighboring genes, called operons, which enables their coordinated regulation. Thus, it is possible to predict the function of uncharacterised genes by analysing the functional annotations within their neighborhoods. Here, we present an algorithm that gives insight into the genomic neighborhoods of a query protein family by calculating the statistical significance of the overrepresentation of functional domains in those neighborhoods.
» As the query, the user inputs a protein sequence in FASTA format, as in the example.
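The text does not say which statistical test the server applies. As a minimal sketch of how overrepresentation of a domain in neighborhoods could be scored, the example below assumes a one-sided hypergeometric test; the function name and all parameters are hypothetical and are not taken from the server.

```python
from math import comb


def overrepresentation_p_value(total, total_with_domain, sampled, sampled_with_domain):
    """Upper-tail hypergeometric probability P(X >= sampled_with_domain).

    Assumed meaning of the arguments (illustrative only):
      total               -- number of genes in the background set
      total_with_domain   -- background genes carrying the domain of interest
      sampled             -- genes found in the collected neighborhoods
      sampled_with_domain -- neighborhood genes carrying the domain of interest
    """
    max_hits = min(total_with_domain, sampled)
    denominator = comb(total, sampled)
    # Sum the probabilities of observing at least `sampled_with_domain` hits.
    tail = sum(
        comb(total_with_domain, k) * comb(total - total_with_domain, sampled - k)
        for k in range(sampled_with_domain, max_hits + 1)
    )
    return tail / denominator
```

A small p-value would indicate that the domain occurs in the neighborhoods more often than expected by chance under this assumed model.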
» The size of the neighborhood (e.g. 5000 bp) is given as the number of base pairs (bp) before and after the gene whose protein product
possesses the query Pfam domain. Typically, a neighborhood size of ±5000 bp corresponds to about 10 genes in the whole neighborhood.
» Because performing multiple tests (e.g. evaluating many Pfam domains for overrepresentation) can lead to a high false discovery rate,
the user can choose which method of multiple test correction to use:
Bonferroni correction or
Benjamini-Hochberg procedure
» The genomic data used by this server are downloaded from
JGI IMG Integrated Microbial Genomes & Microbiomes and
contain complete prokaryotic genomes as well as incomplete genomic data from environmental sequencing.
» Currently, both strands are used during neighborhood analysis to collect information about protein
families. In the future, it will also be possible to choose only the strand on which the query Pfam
domain is present in the analysed genome.
» The cut-off value is used to narrow the output based on p-value; the name of the analysis is only for the user's convenience.
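To illustrate the two multiple test correction options and the cut-off, here is a minimal sketch in Python. It uses the textbook definitions of Bonferroni-adjusted and Benjamini-Hochberg-adjusted p-values; the function names, the example p-values, and the assumption that the cut-off is applied to the adjusted values are illustrative and not taken from the server.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_by_cutoff(domains, adjusted, cutoff):
    """Keep only the domains whose adjusted p-value is at or below the cut-off."""
    return [d for d, p in zip(domains, adjusted) if p <= cutoff]


# Hypothetical domain labels and p-values, for illustration only.
domains = ["domain_A", "domain_B", "domain_C", "domain_D"]
raw = [0.01, 0.04, 0.03, 0.20]
significant = filter_by_cutoff(domains, benjamini_hochberg(raw), cutoff=0.05)
```

Bonferroni is the more conservative choice, since it multiplies every p-value by the full number of tests, whereas Benjamini-Hochberg scales each p-value by its rank and so usually retains more domains at the same cut-off.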