Inform manual




Inform
NAME
Inform - computes the regular and mutual information content of a sequence alignment.
SYNOPSIS
Inform [options] [alignment_file]
DESCRIPTION
Inform computes the information content of each position in a sequence alignment, as well as the mutual information shared between any two positions in the alignment. Allowed data formats are a simple "align" format, the FASTA format, and the MSF format. The program is written in gawk, and options are given as name=value pairs using their full variable names, e.g., Inform matrix=test.mtr alignfile. Data can also be piped to Inform. The output goes to stdout and is given in the "information" version of the mp format. The sequences in the alignment file can be over any alphabet, which the user must supply to the program with the alfile option.
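Both quantities are computed column by column, so the first step is to turn the alignment into columns. The following is a minimal Python sketch for the FASTA case only; read_fasta and alignment_columns are hypothetical helper names, not part of Inform, and the "align" and MSF formats are not handled.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def alignment_columns(records):
    """Return the alignment as a list of columns, one string per position."""
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return ["".join(col) for col in zip(*seqs)]
```

For example, alignment_columns(read_fasta(">s1\nACGU\n>s2\nAC-U\n")) yields the columns "AA", "CC", "G-" and "UU".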
OPTIONS
alfile=<alfile>
Alphabet file listing the alphabet of which the sequences are composed. The alphabet should be given on a single line, with the symbols separated by spaces. Letters that can substitute for one another (e.g., T and U in nucleotide sequences) can be grouped together. The typical alphabet for nucleotide sequences is "A C G UT -", where U and T substitute arbitrarily for one another. Note that the gap symbol (-) must be included as well. The alphabet file must be given.
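The grouping rule above can be sketched as a mapping from each letter to the space-separated field (symbol class) it belongs to. This is an illustrative Python sketch; parse_alphabet is a hypothetical name, and whether Inform treats letters case-sensitively is not stated here, so the sketch leaves case untouched.

```python
def parse_alphabet(line):
    """Split an alfile line into symbol classes and map each letter to its class.

    Each whitespace-separated field is one class; letters grouped in the same
    field (e.g. "UT") are treated as interchangeable.
    """
    classes = line.split()
    symbol_class = {}
    for cls in classes:
        for letter in cls:
            if letter in symbol_class:
                raise ValueError(f"letter {letter!r} appears in more than one class")
            symbol_class[letter] = cls
    if "-" not in symbol_class:
        raise ValueError("the gap symbol '-' must be part of the alphabet")
    return classes, symbol_class
```

With the nucleotide alphabet, parse_alphabet("A C G UT -") returns the classes ["A", "C", "G", "UT", "-"] and maps both "U" and "T" to "UT".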
backdist=<backdistfile>
File containing the background probability distribution of the alphabet. It is used in calculating the position-wise information content. The first line should list the alphabet in the same order as in the alphabet file. Each following line should have the same number of fields as the first line and give the background probabilities for one position in the alignment (one line per position). If only one line of probabilities is given, all positions in the alignment are assumed to have that background distribution. The nucleotide example with a uniform background distribution looks like:
                     A    C    G    UT  -
                   0.25 0.25 0.25 0.25  1
Note that the gap background probability is set to one. The choice of the gap "background probability" is discussed in the introduction.
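The file layout above, including the rule that a single probability line applies to every position, can be sketched in Python together with a position-wise information content. The sketch assumes the relative-entropy (Kullback-Leibler) form I = sum over classes of q log2(q/p), with q the observed class frequency in the column and p the background probability; the exact definition Inform uses, including its treatment of gaps and any small-sample correction, is the one given in the introduction. parse_backdist and information_content are hypothetical names.

```python
import math


def parse_backdist(text, classes, n_positions):
    """Parse a backdist file into one {symbol_class: probability} dict per position.

    classes: the symbol classes in alfile order, e.g. ["A", "C", "G", "UT", "-"].
    A single probability line is reused for every position.
    """
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    if header != classes:
        raise ValueError("backdist header must list the alphabet in alfile order")
    probs = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError("each probability line needs one value per symbol class")
        probs.append(dict(zip(header, map(float, row))))
    if len(probs) == 1:
        probs = probs * n_positions  # same distribution at every position
    if len(probs) != n_positions:
        raise ValueError("give one probability line, or one line per position")
    return probs


def information_content(column, symbol_class, background):
    """Relative entropy (bits) of one column's class frequencies vs. background.

    symbol_class maps each letter to its class, e.g. {"U": "UT", "T": "UT", ...}.
    No small-sample correction is applied.
    """
    counts = {}
    for letter in column:
        cls = symbol_class[letter]
        counts[cls] = counts.get(cls, 0) + 1
    n = len(column)
    info = 0.0
    for cls, count in counts.items():
        q = count / n
        info += q * math.log2(q / background[cls])
    return info
```

With the uniform nucleotide background, a fully conserved column such as "AAAA" gives 2.0 bits under this form, while "AA--" gives 0.0: setting the gap background to one makes the gap term q log2(q) negative, so gaps lower the information content.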
matrix=<matrixfile>
File containing a matrix that defines which symbols of the alphabet are complementary. It is only used when mtype=2. For a discussion of the "complementarity matrix", see the introduction. With the nucleotide alphabet listed above, the complementarity matrix has the form:
                  A  C  G  UT -
                A 0  0  0  1  0
                C 0  0  1  0  0
                G 0  1  0  1  0
               UT 1  0  1  0  0
                - 0  0  0  0  0
The gap symbol must be included, and the alphabet should be listed in the same order as in the alphabet file.
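The matrix layout above (a header row of symbol classes, then one labelled row per class) can be read into a set of complementary class pairs. This Python sketch is illustrative only; parse_complementarity is a hypothetical name, and it assumes entries are the integers 0 and 1 as in the example.

```python
def parse_complementarity(text, classes):
    """Return the set of (row_class, column_class) pairs marked 1 in the matrix."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    if header != classes:
        raise ValueError("matrix header must list the alphabet in alfile order")
    if [row[0] for row in rows] != classes:
        raise ValueError("row labels must list the alphabet in alfile order")
    pairs = set()
    for row in rows:
        label, values = row[0], row[1:]
        if len(values) != len(header):
            raise ValueError(f"row {label!r} needs one entry per symbol class")
        for col_label, value in zip(header, values):
            if int(value) == 1:
                pairs.add((label, col_label))
    return pairs
```

Applied to the nucleotide matrix above, this yields the Watson-Crick pairs A/UT and C/G plus the G/UT wobble pair, each in both orders, since the example matrix is symmetric.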
mtype=1|2
Compute the mutual information by the standard form (1) or by the form introduced for structure logos (2). For details, see the discussion in the introduction. The default is 2 for the nucleotide alphabet and 1 otherwise.
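The standard form (mtype=1) is the usual mutual information between two columns i and j, M_ij = sum over class pairs (a, b) of f_ij(a,b) log2[f_ij(a,b) / (f_i(a) f_j(b))], where the f are observed frequencies. A minimal Python sketch follows; mutual_information is a hypothetical name, gaps are treated here as an ordinary class (an assumption), and the structure-logo form (mtype=2), which uses the complementarity matrix, is not reproduced.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j, symbol_class):
    """Standard mutual information (bits) between two alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same alignment")
    n = len(col_i)
    xi = [symbol_class[a] for a in col_i]  # map letters to symbol classes
    xj = [symbol_class[b] for b in col_j]
    fi, fj = Counter(xi), Counter(xj)
    fij = Counter(zip(xi, xj))  # joint counts of class pairs
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi
```

Two columns that covary perfectly over four classes, such as "ACGU" and "UGCA", give 2.0 bits, while a conserved column such as "AAAA" shares no information with any other column.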
diagout=y|n
Include the diagonal "zeros" in the output. Default is y.
bp=y|n
Include complementary matrix elements in the output, rather than just the upper triangle of the matrix. Default is n.
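The effect of diagout and bp on which position pairs are written can be sketched as below. This is an interpretation, not Inform's code: output_pairs is a hypothetical name, positions are assumed to be numbered from 1, and reading bp=y as also emitting the mirrored lower-triangle elements (j, i) is an assumption, since the description above does not define "complementary matrix elements" precisely.

```python
def output_pairs(n_positions, diagout=True, bp=False):
    """Yield the (i, j) position pairs to write, 1-based, row by row.

    Assumption: bp=True adds the lower-triangle elements (i > j).
    """
    for i in range(1, n_positions + 1):
        for j in range(1, n_positions + 1):
            if i < j:
                yield (i, j)  # upper triangle: always written
            elif i == j and diagout:
                yield (i, j)  # diagonal "zeros"
            elif i > j and bp:
                yield (i, j)  # lower triangle (assumed meaning of bp=y)
```

With the defaults (diagout=y, bp=n) and three positions, this yields (1,1), (1,2), (1,3), (2,2), (2,3), (3,3).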
EXAMPLES
A basic example of how to execute the program:
Inform alfile=ntalfile backdist=ntdist matrix=ntmat mtype=2 alignfile > data.mp
This generates the mp file data.mp using the alphabet listed in ntalfile, the background probabilities listed in ntdist, and the complementarity matrix read from ntmat. The mutual information is computed as type 2, and the alignment is read from alignfile.

Data can also be piped into Inform:

cat alignfile | Inform alfile=ntalfile backdist=ntdist mtype=1 > data.mp
AUTHORS
Original version by Jan Gorodkin, gorodkin@cbs.dtu.dk.
Program optimized by Hans Henrik Stærfeldt, hhs@cbs.dtu.dk.
Man pages by Jan Gorodkin, April 1999.
REFERENCE
MatrixPlot: visualizing sequence constraints. J. Gorodkin, H. H. Stærfeldt, O. Lund, and S. Brunak. Bioinformatics. 15:769-770, 1999. (http://www.cbs.dtu.dk/services/MatrixPlot/)