DTU Health Tech
Department of Health Technology
If you need help with the bioinformatics programs, see the "Getting Help" section below the program.
For large data sets, you are encouraged to download a stand-alone version of the program, with full functionality and no parameter limitations.
Job name:
This prefix is prepended to the names of all files generated by the current run. If left
empty, a system-generated number is used as the prefix.
Number of clusters:
You may provide a specific number of clusters (e.g. 3), or a range of cluster numbers (e.g. 2-5).
In the second case, the method will suggest the optimal number of clusters it found in the
data, given the parameter configuration of the job. Maximum number of clusters: 15.
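As an illustration of the two accepted forms, the following minimal sketch expands a value such as "3" or "2-5" into the list of cluster numbers to test and enforces the stated maximum of 15. The function name parse_cluster_spec is hypothetical and is not part of the server or the stand-alone program.

```python
def parse_cluster_spec(spec, max_clusters=15):
    """Expand '3' or '2-5' into the list of cluster numbers to test.

    Hypothetical helper for illustration only; the limit of 15 is the
    maximum stated in the parameter description.
    """
    parts = spec.strip().split("-")
    if len(parts) == 1:
        low = high = int(parts[0])
    elif len(parts) == 2:
        low, high = int(parts[0]), int(parts[1])
    else:
        raise ValueError(f"not a number or range: {spec!r}")
    if not 1 <= low <= high <= max_clusters:
        raise ValueError(f"cluster numbers must satisfy 1 <= low <= high <= {max_clusters}")
    return list(range(low, high + 1))


print(parse_cluster_spec("3"))
print(parse_cluster_spec("2-5"))
```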
Make clustering moves at each iteration:
By default, simple shift moves are performed at each iteration, indel moves every 10
iterations, single peptide moves every 20 iterations, phase shift moves every 100 iterations.
You can alter this behavior by ticking this option: simple shift and phase shift moves
are then disabled, and single peptide moves are made at each iteration. This set-up is
recommended for "nearly-aligned" data, where clustering and indels should be sampled more
regularly than extensions at the termini. This is the case, for example, for sets of MHC
class I ligands of different lengths, which in most cases require central indels
to model the bulging of long ligands.
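The schedule described above can be sketched as a small lookup. This is only an illustration: it assumes that a move with interval n is attempted whenever the iteration number is divisible by n, and that the indel interval is left unchanged when the option is ticked. Neither detail is stated in the description, and the names below are hypothetical.

```python
# Default intervals taken from the description of this option.
DEFAULT_INTERVALS = {
    "simple_shift": 1,
    "indel": 10,
    "single_peptide": 20,
    "phase_shift": 100,
}


def moves_at(iteration, cluster_every_iteration=False, intervals=DEFAULT_INTERVALS):
    """Return the names of the moves attempted at a given iteration (1-based)."""
    schedule = dict(intervals)
    if cluster_every_iteration:
        # Option ticked: shift moves are disabled and single peptide
        # moves are made at each iteration.
        schedule.pop("simple_shift")
        schedule.pop("phase_shift")
        schedule["single_peptide"] = 1
    return [name for name, every in schedule.items() if iteration % every == 0]


print(moves_at(7))
print(moves_at(100))
print(moves_at(10, cluster_every_iteration=True))
```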
Number of iterations per sequence per temperature step:
This parameter ("I") specifies how long your clustering schedule should be. Note that the total
number of iterations is the result of "I" multiplied by the number of sequences and by the
number of temperature steps, and that the execution time increases linearly with this total.
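To get a feeling for how quickly the workload grows, the product described above can be computed directly. The input values in this sketch are arbitrary examples, not server defaults.

```python
def total_iterations(iters_per_seq_per_step, n_sequences, n_temperature_steps):
    """Total iterations = I x number of sequences x number of temperature steps."""
    return iters_per_seq_per_step * n_sequences * n_temperature_steps


# Example values chosen for illustration only.
print(total_iterations(10, 1000, 20))
# Doubling the number of sequences doubles the total, and hence the run time.
print(total_iterations(10, 2000, 20))
```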
Initial Monte Carlo temperature:
The temperature is a scalar, lowered in discrete steps as the iterations progress. The
temperature influences the probability of accepting or rejecting the moves of the algorithm.
In the initial iterations (high temperature) the program is free to explore the landscape
of solutions, and as the system cools off only moves that increase the energy will be
accepted.
Number of temperature steps:
The number of steps in the cooling schedule (starting from the initial temperature specified
above).
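The interplay between temperature and move acceptance can be sketched as follows. This is a generic simulated-annealing illustration rather than the server's code: it assumes a linear cooling schedule, a small positive final temperature, and a Metropolis-style rule in which a move that changes the energy by dE is accepted with probability min(1, exp(dE/T)). All names and numeric values are hypothetical.

```python
import math
import random


def temperature_schedule(t_initial, n_steps, t_final=0.0001):
    """Temperatures lowered in n_steps equal decrements (assumed linear cooling)."""
    if n_steps == 1:
        return [t_initial]
    step = (t_initial - t_final) / (n_steps - 1)
    return [t_initial - k * step for k in range(n_steps)]


def accept_move(delta_energy, temperature, rng=random):
    """Metropolis-style acceptance for an energy that is being maximized.

    Moves that do not lower the energy are always accepted; moves that
    lower it are accepted with probability exp(delta_energy / temperature),
    which shrinks towards zero as the temperature drops.
    """
    if delta_energy >= 0:
        return True
    return rng.random() < math.exp(delta_energy / temperature)


temperatures = temperature_schedule(1.5, 4, t_final=0.0)
print(temperatures)
print(accept_move(0.2, temperatures[0]))
```

At high temperature an unfavorable move with a small negative dE is still accepted fairly often, while near the end of the schedule essentially only moves with a non-negative dE get through.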
Interval between Indel moves:
Specifies how often to attempt introducing insertions and deletions (see glossary).
Interval between Single peptide moves:
Specifies how often to attempt moving a sequence between clusters (see glossary).
Interval between Phase shift moves:
Specifies how often to attempt shifting the alignment window of a single cluster.
Background amino acid frequencies:
Construction of PSSMs relies on calculating the frequency of a given residue at a given position,
compared to the expected background frequency of that amino acid. You may use a flat background model
identical for all amino acids (Flat), a pre-calculated distribution reflecting the relative
frequency of each residue in naturally occurring proteins (Pre-calculated Uniprot), or determine
the background model directly from the dataset you submitted (From data).
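The role of the background model can be illustrated with a log-odds score, where an observed frequency is divided by the background frequency of the same amino acid. The sketch below covers the Flat and From data options only. The base-2 logarithm, the function names, and the assumption that peptides contain only the 20 standard amino acids are illustrative choices, not details taken from the server.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def flat_background():
    """Flat model: the same frequency (1/20) for every amino acid."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def background_from_data(peptides):
    """Background frequencies counted over all residues in the submitted peptides."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for peptide in peptides:
        for aa in peptide:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}


def log_odds(observed_freq, background_freq):
    """Log-odds of an observed frequency relative to the background (base 2 assumed)."""
    return math.log2(observed_freq / background_freq)


background = flat_background()
# A residue seen at 20% of a position is enriched 4-fold over a flat background.
print(log_odds(0.2, background["L"]))
```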
Preference for hydrophobic AAs at P1:
In the special case of MHC class II data, we have previously found it helpful to guide the
alignment by expressing a preference for hydrophobic residues at position P1 of the alignment.
Sequence weighting type:
Data redundancy may affect the quality of the clustering, so redundant sequences can be
downweighted. You may use an explicit clustering of the sequences in a given group (Clustering),
or a faster heuristic that calculates the degree of variability at each column in the
alignment (Heuristic, recommended); you may also disable sequence weighting altogether (None).
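One well-known heuristic of this kind is position-based weighting, in which each sequence receives, for every alignment column, a contribution of 1/(r*s), where r is the number of distinct residues in the column and s is the number of sequences sharing its residue. The sketch below shows that scheme as an assumption about what a column-variability heuristic can look like; it is not confirmed to be the exact formula used by the server, and the function name is hypothetical.

```python
from collections import Counter


def position_based_weights(aligned):
    """Weight each sequence by summing 1 / (r * s) over the alignment columns.

    r = number of distinct residues in the column,
    s = number of sequences carrying the same residue as this sequence.
    All sequences must have the same (aligned) length.
    """
    length = len(aligned[0])
    weights = [0.0] * len(aligned)
    for col in range(length):
        column = [seq[col] for seq in aligned]
        counts = Counter(column)
        n_distinct = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (n_distinct * counts[residue])
    return weights


# Two identical sequences share their weight; the unique one counts more.
print(position_based_weights(["AAA", "AAA", "CCC"]))
```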
GibbsCluster: unsupervised clustering and alignment of peptide sequences
Massimo Andreatta, Bruno Alvarez, Morten Nielsen
Nucleic Acids Research, 2017 Apr 12. doi: 10.1093/nar/gkx248
Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertion and deletions accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups, and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0.
If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).
If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
Correspondence:
Technical Support: