VDJSolver - 1.0

Analysis of human immunoglobulin VDJ recombination

The VDJsolver 1.0 server is a program that analyses human immunoglobulin VDJ recombination. The identification of V and J genes is performed using standard sequence alignment against databases of functional VH and JH genes from the IMGT database.

Sequences are aligned to the following model:

VH-P_VH-N1-P_Dup-D-P_Ddown-N2-P_JH-JH,

where P_xx designates palindromic nucleotide segments, and Nx designates N nucleotides upstream (N1) or downstream (N2) of D. The optimal alignment is obtained using maximum likelihood to select the best fit of the sequence to the model. In the fit, all segments except VH and JH may be omitted. The model includes all conventional germline D segments in the IMGT database (D gene list) in normal and inverted reading direction. For details on the model please see the Model description.
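To make "palindromic" concrete, the following minimal sketch builds the P nucleotides that could follow the 3' end of a germline segment: they are the reverse complement of the last few germline bases, so the junction reads as a palindrome. The function names and the example germline end are hypothetical and are not taken from VDJsolver.

```python
# Illustrative sketch only; not VDJsolver code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def p_nucleotides_3prime(germline_end: str, length: int) -> str:
    """P nucleotides appended to the 3' end of a germline segment:
    the reverse complement of its last `length` bases."""
    if length <= 0:
        return ""
    return reverse_complement(germline_end[-length:])


# Hypothetical germline 3' end, chosen only for the example.
germline_end = "GCGAGAGA"
p = p_nucleotides_3prime(germline_end, 2)
print(germline_end + p)  # GCGAGAGATC (the junction "GATC" is palindromic)
```

The same idea applies at a 5' end, with the reverse complement of the first bases placed in front of the segment.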

The project is a collaboration between CBS and Clinical Immunology, University of Southern Denmark.

Submission

CITATIONS

For publication of results, please cite:

Current version:
No evidence for the use of DIR, D-D fusions, chromosome 15 open reading frames or VH replacement in the peripheral repertoire was found on application of an improved algorithm, JointML, to 6329 human immunoglobulin H rearrangements.
Ohm-Laursen L, Nielsen M, Larsen SR, and Barington T.
Immunology. 119(2):265-77. 2006

Instructions

In order to use the VDJsolver server for prediction on nucleotide sequences:

Enter the sequence in the sequence window, or give a file name.
The sequence must be written using the one-letter code: 'acgt' or 'ACGT'.
Other letters are ignored and treated as unknown.
Other characters, such as whitespace and numbers, will simply be ignored.
Press the "Submit sequence" button.
A web page will display the results when the prediction is ready. Response time depends on system load.
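The input rules above can be approximated locally before submission. The sketch below is an assumption about how such cleaning might look, not the server's own code: it keeps a, c, g and t in either case, replaces any other letter with the placeholder 'N' (one possible reading of "treated as unknown"), and drops whitespace, digits and other characters. The function name normalize_input is hypothetical.

```python
def normalize_input(raw: str) -> str:
    """Keep a/c/g/t (any case), map other letters to 'N', drop everything else."""
    cleaned = []
    for ch in raw:
        if ch in "acgtACGT":
            cleaned.append(ch.upper())
        elif ch.isalpha():
            # Assumption: unknown letters are kept as an 'N' placeholder.
            cleaned.append("N")
        # Whitespace, digits and punctuation are skipped.
    return "".join(cleaned)


print(normalize_input("1 gtgc atta\n11 ctgx"))  # GTGCATTACTGN
```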

Model description

VDJsolver was developed using Yabasic (www.yabasic.de). The program uses the maximum likelihood method to obtain the best fit to the following model:

VH-P_VH-N1-P_Dup-D-P_Ddown-N2-P_JH-JH,

where Nx designates N nucleotides and P_xx palindromic (P) nucleotides, upstream or downstream of the D gene as indicated. Any segment may be omitted except VH and JH.

VH was compared with the IGHV3-23*01 germline gene (GenBank accession number M99660) while JH was compared with the germline JH gene with the highest identity score from codon 114 through the splice site among all JH genes in the IMGT database. The D segments were compared with any germline D segment available in the IMGT database. P segments were defined as extensions of 2-8 nucleotides from the VH, Dx or JH genes that are reverse complementary to the corresponding germline sequence.

Maximum likelihood was determined by running through all possible combinations of segments for a given rearrangement and finding the combination maximizing the likelihood score. The score was defined as the product of estimated probabilities for any event deviating from the germline sequences in question. Probabilities for transitions and transversions in VH, Dx and JH segments were calculated from the number of substitutions found in the VH region from codon 1 through 100 (assuming a 5/4 ratio of transitions to transversions). For un-mutated sequences, the estimated Taq error rate was used. A given N nucleotide was attributed a probability equal to its frequency in all N segments (determined by iteration of the model on all sequences).

To reduce stochastic assignment of D segments, D segments shorter than 4 nucleotides were not accepted, and D segments with more mutations than the 95th percentile of that expected from the assumed mutation rate and the length of the D segment (Poisson distribution) were not accepted either. A dynamic probability for including a D segment was introduced, dependent on the length of the joint region (codons 101 through the downstream splice site) and the mutation rate of the VH region.
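The two D-segment acceptance rules (minimum length and the Poisson cut-off on mutations) can be illustrated with a short sketch. It rests on assumptions: the expected number of mutations is taken as length times a per-base mutation rate, the "95th percentile" is read as the smallest count whose cumulative Poisson probability reaches 0.95, and the rate of 0.05 used in the example is purely illustrative. The function names are hypothetical and this is not the VDJsolver implementation.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def max_mutations_allowed(d_length: int, mutation_rate: float,
                          quantile: float = 0.95) -> int:
    """Smallest k with P(X <= k) >= quantile, where the expected count is
    d_length * mutation_rate (assumed reading of the 95th percentile rule)."""
    lam = d_length * mutation_rate
    k = 0
    while poisson_cdf(k, lam) < quantile:
        k += 1
    return k


def accept_d_segment(d_length: int, n_mutations: int, mutation_rate: float,
                     min_length: int = 4) -> bool:
    """Apply the minimum-length rule, then the Poisson mutation cut-off."""
    if d_length < min_length:
        return False
    return n_mutations <= max_mutations_allowed(d_length, mutation_rate)


# Illustrative values: an 11-nucleotide D segment, 5% per-base mutation rate.
print(max_mutations_allowed(11, 0.05))   # 2
print(accept_d_segment(11, 2, 0.05))     # True
print(accept_d_segment(11, 3, 0.05))     # False
print(accept_d_segment(3, 0, 0.05))      # False (shorter than 4 nucleotides)
```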
The parameters were fine-tuned to find a D gene in 5% of the sequences from a set of artificial rearrangements made by random permutation of the bases between the VH and JH segments of real rearrangements. D segments were generally at least eight nucleotides long.
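An artificial rearrangement of the kind described above can be sketched as follows: the bases between the VH and JH segments are shuffled while the flanking segments are left untouched, which preserves the base composition and length of the joint. The function name, the seeded random generator and the short example sequences are assumptions made for illustration only.

```python
import random


def permute_joint(vh: str, joint: str, jh: str, rng: random.Random) -> str:
    """Return vh + shuffled(joint) + jh; only the joint bases are permuted."""
    bases = list(joint)
    rng.shuffle(bases)
    return vh + "".join(bases) + jh


# Hypothetical short segments, chosen only for the example.
rng = random.Random(0)  # seeded so the sketch is reproducible
vh, joint, jh = "GTGCGAA", "GGGGAGGCTAGAGG", "TACTAC"
artificial = permute_joint(vh, joint, jh, rng)
print(artificial)
```

Running a D-assignment method on many such sequences gives an estimate of how often a D gene is assigned by chance.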

Format of VDJsolver output

EXAMPLE OUTPUT

VDJsolver 1.0 using the JointMLc algorithm for IgH joint composition (version 060505)
Result for sequence no: 1
>seq1
Rearrangement
GTGCATTACTGTGCGAAGGGGAGGCTAGAGGATCCCGGGGAGCTACTAAAACTACCAAAACAACCATACTACCACTACCACGGCATGGACGTCGGGCGCCAAGGGACCACGGTCACCGTCTCCTCACGT
..at.............aga <- V-gene: IGHV3-23*01
GTGCATTACTGTGCGAA <- VH-segment
GGGGAGGCTAGAGGATCCCG <- N-addition (1)
GGGAGCTACTA <- D-segment
..tatagt...........c <- D-gene: IGHD1-26*01
AAACTACCAAAACAACCA <- N-addition (2)
JH-segment -> TACTACCACTACCACGGCATGGACGTCGGGCGCCAAGGGACCACGGTCACCGTCTCCTCACGT
JH-gene: IGHJ6*02 -> at......t.....t....t.........t..g.............................g..
Rearrangement conserves reading frame
Number of stopcodons in joint at time of rearrangement= 0
Rearrangement is productive
CDR3 length (bp)= 81
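As a reading aid for the example output, the sketch below checks that the reported segments (VH-segment, N-addition 1, D-segment, N-addition 2, JH-segment), concatenated in that order, reproduce the full rearrangement, and prints where each segment starts. The sequences are copied from the example; the variable names are hypothetical.

```python
# Sequences copied from the example output above.
rearrangement = "GTGCATTACTGTGCGAAGGGGAGGCTAGAGGATCCCGGGGAGCTACTAAAACTACCAAAACAACCATACTACCACTACCACGGCATGGACGTCGGGCGCCAAGGGACCACGGTCACCGTCTCCTCACGT"

segments = {
    "VH-segment": "GTGCATTACTGTGCGAA",
    "N-addition (1)": "GGGGAGGCTAGAGGATCCCG",
    "D-segment": "GGGAGCTACTA",
    "N-addition (2)": "AAACTACCAAAACAACCA",
    "JH-segment": "TACTACCACTACCACGGCATGGACGTCGGGCGCCAAGGGACCACGGTCACCGTCTCCTCACGT",
}

# The segments, in model order, must tile the rearrangement exactly.
assert "".join(segments.values()) == rearrangement

# Report the 0-based start position and length of each segment.
offset = 0
for name, seq in segments.items():
    print(f"{name:15s} start={offset:3d} length={len(seq):3d}")
    offset += len(seq)
```

No P segments were assigned in this example, so the joint consists only of the two N additions and the D segment.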

References

No Evidence for the use of DIR, D-D Fusions, Chromosome 15 Open Reading Frames or VH Replacement in the Peripheral Repertoire Was Found when Applying an Improved Algorithm, JointML, to 6329 Human IgH Rearrangement.
Line Ohm-Laursen¹, Morten Nielsen², Stine R Larsen¹, and Torben Barington¹*
Immunology, 119(2):265-77, 2006.

¹Department of Clinical Immunology, Odense University Hospital, Denmark.
²Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, Lyngby, Denmark.
*Corresponding author.

Abstract

Antibody diversity is created by imprecise joining of the V-, (D-) and J-gene segments of the heavy and light chain loci. Analysis of rearrangements is complicated by somatic hypermutations and the uncertainty of the sources of gene segments and the precise way they recombine. It has been suggested that DIR and chromosome 15 open reading frames (OR15) can replace conventional D genes, that two or inverted D genes may be used and that the repertoire can be further diversified by VH replacement. Safe conclusions require large, well-defined sequence samples and algorithms minimizing stochastic assignment of segments. Two computer programs were developed for analysis of heavy chain joints. JointHMM is a profile hidden Markov model while JointML is a maximum likelihood based method taking the lengths of the joint and the mutational status of the VH gene into account. The programs were applied to a set of 6329 clonally unrelated rearrangements. A conventional D gene was found in 80% of un-mutated sequences and 64% of mutated sequences while D gene assignment was kept below 5% in artificial (randomly permutated) rearrangements. No evidence for the use of DIR, OR15, multiple D genes or VH replacements was found while inverted D genes were used in less than 0.1% of the sequences. JointML was shown to have a higher predictive performance when it comes to D-gene assignment in mutated and un-mutated sequences than four other publicly available programs. An online version 1.0 of JointML is available at http://services.healthtech.dtu.dk/service.php?VDJsolver-1.0. The VDJsolver 1.0 implements the JointMLc method described in the article.

PMID: 17005006

Software Downloads

Version 1.0b

Linux
IRIX64

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: