1. Specify the input sequences
All sequence headers/names MUST be different.
All the input sequences must be in one-letter amino acid or nucleotide
code. The suggested alphabet (not case sensitive) is as follows: A C D E F G H I K L M N P Q R S T U V W Y -
Gaps should be represented only by "-".
Other symbols (e.g. B, J, X) are treated as ordinary nucleotides or amino acids, not as gaps.
The sequences can be input in either of the following two ways:
- Paste an alignment in FASTA format into the upper window of the main server page.
- Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.
We have set a limit of 3000 sequences and a length limit of 5000 nucleotides or residues.
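The input rules above can be checked locally before submitting. The following is a minimal sketch, not the server's own validation: the function names `parse_fasta` and `check_input` are hypothetical, the equal-length check is an assumption (the input is expected to be an alignment), and rejecting every non-letter symbol other than "-" is also an assumption.

```python
import re

MAX_SEQUENCES = 3000  # sequence limit stated on this page
MAX_LENGTH = 5000     # length limit stated on this page


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_input(records):
    """Return a list of problems; an empty list means the input looks fine."""
    problems = []
    headers = [h for h, _ in records]
    if len(set(headers)) != len(headers):
        problems.append("duplicate headers")
    if len(records) > MAX_SEQUENCES:
        problems.append("too many sequences")
    # Assumption: aligned sequences all have the same length.
    if len({len(s) for _, s in records}) > 1:
        problems.append("sequences differ in length")
    if any(len(s) > MAX_LENGTH for _, s in records):
        problems.append("sequence too long")
    # Assumption: only letters and '-' are accepted; '.', '~' etc. are flagged.
    if any(re.search(r"[^A-Za-z-]", s) for _, s in records):
        problems.append("unexpected symbol")
    return problems
```

For example, `check_input(parse_fasta(open("alignment.fasta").read()))` would return an empty list for a file that satisfies all of the rules (the file name here is only a placeholder).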
2. Advanced options
Ensure optimal solution (costly)
MaxAlign has two algorithms: a heuristic one and a branch-and-bound one. The branch-and-bound
algorithm is guaranteed to find the optimal solution because it considers all possible solutions,
and it is therefore slower. The heuristic algorithm finds the optimal solution in 96% of the test
cases, and in the remaining 4% it finds a solution within 99% of the optimal solution (see the paper).
Note: The heuristic algorithm has since been improved and now finds the optimal solution in 98% of
the test cases.
By default, only the heuristic algorithm is used. If, however, you want to be absolutely sure that you
have the optimal alignment, check the "Ensure optimal solution" checkbox, and the
branch-and-bound algorithm will be used as well. Note that the time spent searching for the optimum
is limited to 5 minutes.
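To illustrate why an exact search is costly, here is a toy sketch that simply enumerates every subset of sequences. It is plain brute force, not MaxAlign's branch-and-bound or its heuristic. It assumes the quantity being maximised is the "alignment area" (number of retained sequences times the number of gap-free columns), which this page does not state; the function names and the toy alignment are made up for illustration.

```python
from itertools import combinations


def gap_free_columns(seqs):
    """Count columns in which none of the given sequences has a gap."""
    return sum(1 for column in zip(*seqs) if "-" not in column)


def alignment_area(seqs):
    """Assumed objective: retained sequences x gap-free columns."""
    return len(seqs) * gap_free_columns(seqs)


def best_subset_exhaustive(alignment):
    """Try every non-empty subset of sequences; return (area, names)."""
    names = sorted(alignment)
    best = (0, ())
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            area = alignment_area([alignment[n] for n in subset])
            if area > best[0]:
                best = (area, subset)
    return best


# Hypothetical toy alignment: s3 is mostly gaps.
toy = {
    "s1": "ACGTACGT",
    "s2": "ACGTAC-T",
    "s3": "A------T",
}
print(best_subset_exhaustive(toy))  # (14, ('s1', 's2'))
```

With n sequences there are 2^n - 1 subsets to examine, so this naive enumeration is only feasible for very small inputs.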
Detailed output
You might also be interested in knowing which sequences were discarded first (the heuristic algorithm is iterative).
These are probably the worst sequences. You might even want to use an intermediate alignment that is not as
good but keeps more sequences. Or you might have checked "Ensure optimal solution" and be
interested in comparing the algorithms. In any of these cases, check the "Detailed output" checkbox.
You will get the same output as before, plus information on the progression of the MaxAlign run.
Preserving selected sequences
You might want to keep some sequences in your alignment, even at the cost of excluding some sites. You can do that by marking those sequences with a plus sign, "+", before their name, as in the example below:
>+Sequence 1
>Sequence 2
Sequence 1 above will always be included in the output of MaxAlign, while the inclusion of Sequence 2 will be evaluated.
Please make sure that your sequence names do not start with a plus sign, "+", unless you want them to be marked.
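The marking convention can be checked on your own headers with a few lines of code. This is a minimal sketch: `split_protected` is a hypothetical helper, and stripping the "+" from the returned names is a choice made here for readability, not a statement about how the server reports names.

```python
def split_protected(headers):
    """Split FASTA headers into (protected, evaluated) name lists.

    A name that starts with '+' marks a sequence that must be kept.
    """
    protected, evaluated = [], []
    for header in headers:
        name = header.lstrip(">")  # accept headers with or without '>'
        if name.startswith("+"):
            protected.append(name[1:])
        else:
            evaluated.append(name)
    return protected, evaluated


print(split_protected([">+Sequence 1", ">Sequence 2"]))
# (['Sequence 1'], ['Sequence 2'])
```

Running this on the headers of your file before submission shows at a glance which sequences would be treated as marked.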
Remove gap-only columns
This option will remove all the columns that only have gaps from the alignment. Please note that this is the only time when MaxAlign will meddle with the columns in the alignment. Otherwise it will only remove sequences. But gap-only columns are useless, and therefore you have the option of removing them. If the input alignment was in frame, removing gap-only columns will not cause any frameshift.
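The effect of this option can be sketched as follows. The helper name `remove_gap_only_columns` is hypothetical and the code only illustrates the operation described above; it is not the server's implementation.

```python
def remove_gap_only_columns(seqs):
    """Drop every column in which all sequences have a gap ('-')."""
    # zip(*seqs) walks the alignment column by column.
    kept = [col for col in zip(*seqs) if any(ch != "-" for ch in col)]
    if not kept:
        return ["" for _ in seqs]
    # Transpose the kept columns back into rows (sequences).
    return ["".join(row) for row in zip(*kept)]


print(remove_gap_only_columns(["A-C-", "A-G-"]))  # ['AC', 'AG']
```

Columns that contain a gap in only some of the sequences are left untouched.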
Run MaxAlign on protein sequences and obtain the result in nucleotide sequences (or vice versa)
You might be interested in running a codon-based analysis after MaxAlign, in which case it is convenient to get the output as an alignment of nucleotides. For such an analysis, however, MaxAlign has to be run on the protein sequences. You can therefore provide both alignments to MaxAlign. Provide the alignment to be analysed (in this case, the protein one) first, in the box at the top of the page, and the nucleotide one in the second box, at the end of the page. MaxAlign will process the first alignment and then use the second one to construct the output. It does this by comparing the sequence identifiers, so please make sure that your sequences have the same identifiers in both alignments (in FASTA format, the identifier is the header up to the first space character).
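The identifier matching can be sketched as below. The function names and the example records are hypothetical, and the code only illustrates the rule stated above (the identifier is the header up to the first space); how the server handles identifiers that are missing from the second alignment is not covered.

```python
def identifier(header):
    """Identifier = header text up to the first space character."""
    return header.lstrip(">").split(" ", 1)[0]


def select_by_identifier(kept_headers, nucleotide_records):
    """Return the nucleotide records whose identifier was kept.

    kept_headers: headers retained from the first (processed) alignment.
    nucleotide_records: (header, sequence) tuples from the second alignment.
    """
    wanted = {identifier(h) for h in kept_headers}
    return [(h, s) for h, s in nucleotide_records if identifier(h) in wanted]


# Hypothetical example: only 'seqA' survived in the protein alignment.
kept = ["seqA some protein description"]
nucleotides = [("seqA cds", "ATGGCC"), ("seqB cds", "ATG---")]
print(select_by_identifier(kept, nucleotides))  # [('seqA cds', 'ATGGCC')]
```

Because only the part before the first space is compared, the descriptions after the identifier may differ between the two files.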
3. Submit the job
Click on the "Submit" button. The status of your job (either 'queued'
or 'running') will be displayed and constantly updated until it terminates and
the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.