Protein Family Neighborhood Analysis - Sequence
(All database)

In prokaryotic genomes, functionally coupled genes can be organized in conserved clusters of neighboring genes, called operons, enabling their coordinated regulation. Thus, it is possible to predict the function of uncharacterised genes by analysing functional annotations within their neighborhoods. Here, we present an algorithm that gives insight into the genomic neighborhoods of a query protein family by calculating the statistical significance of overrepresentation of functional domains in those neighborhoods.
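To make the idea of overrepresentation concrete, here is a minimal sketch of one common way to score it: a one-sided hypergeometric test. This is an assumption for illustration only, since the page does not state which statistical test the server applies. The function name and all parameter names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, carriers, drawn):
    """Return P(X >= k) for a hypergeometric variable X.

    Hypothetical mapping to the neighborhood setting (an assumption):
      population - total number of genes in the database
      carriers   - genes in the database that carry a given Pfam domain
      drawn      - genes found in all neighborhoods of the query family
      k          - neighborhood genes that carry the domain
    """
    upper = min(carriers, drawn)
    total = comb(population, drawn)
    # Sum the probability mass of observing k or more carriers.
    tail = sum(
        comb(carriers, i) * comb(population - carriers, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 3 carry the domain, 4 lie in neighborhoods,
# and all 3 carriers are among those 4.
p_value = hypergeom_sf(3, 10, 3, 4)
```

A small p-value here would suggest that the domain occurs in the neighborhoods more often than expected by chance. With many domains tested, such p-values still need multiple test correction.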

» As the query, the user inputs a protein sequence in FASTA format, as in the example.
» The size of the neighborhood (e.g. 5000 bp) is given as the number of base pairs (bp) before and after the gene whose protein product possesses the query Pfam domain. Typically, a neighborhood size of ±5000 bp corresponds to 10 genes in the whole neighborhood.
» Because the false discovery rate can become a serious problem when multiple tests are performed (e.g. when many Pfam domains are evaluated for overrepresentation), the user can choose which multiple test correction method to use: Bonferroni correction or the Benjamini-Hochberg procedure.
» The genomic data used by this server are downloaded from JGI IMG (Integrated Microbial Genomes & Microbiomes) and contain complete prokaryotic genomes as well as incomplete genomic data from environmental sequencing.
» Currently, both strands are used during neighborhood analysis to collect information about protein families. In the future, it will also be possible to choose only the strand on which the query Pfam domain is present in the analysed genome.
» The cut-off value narrows the output based on p-value, and the name of the analysis is only for the user's convenience.
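The two correction methods named above can be sketched in a few lines of plain Python. This is a textbook illustration rather than the server's own code, and the function names and example p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four Pfam domains.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

# Keep only the domains whose adjusted p-value passes a chosen cut-off.
cutoff = 0.05
significant_bh = [i for i, p in enumerate(bh) if p <= cutoff]
```

Bonferroni controls the chance of any false positive and is the stricter of the two. Benjamini-Hochberg controls the expected fraction of false positives among the reported domains, so it usually retains more results at the same cut-off.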

A calculation is estimated to take around 1 hour. Once your calculation is complete, we will send you an email notification.

BlastP input

Your sequence

Select database:

All genomes

Reference genomes

Set cut-off value for output from BlastP algorithm

Neighborhood analysis parameters

Set the size of the neighborhood you want to investigate (bp upstream and downstream)

Choose which multiple test correction method you want to use

Choose which DNA strand you are interested in

Set cut-off value for output from the statistical test

Give a name to your analysis
