Information & Background on the HLA Peptide Motif Searches

Table of Contents

Coefficient viewing page
Reference list
Related web sites
Best test sequence

Return to HLA peptide motif search page


Purpose

This Web site allows users to locate and rank 8-mer, 9-mer, or 10-mer peptides that contain peptide-binding motifs for HLA class I molecules. The rankings use amino acid/position coefficient tables deduced from the literature by Dr. Kenneth Parker of the National Institute of Allergy and Infectious Diseases (NIAID) at the National Institutes of Health (NIH) in Bethesda, Maryland. The Web site was created by Ronald Taylor of the Bioinformatics and Molecular Analysis Section (BIMAS), Computational Bioscience and Engineering Laboratory (CBEL), Division of Computer Research & Technology (CIT), National Institutes of Health, in collaboration with Dr. Parker.


Instructions

The instructions below follow the order of the buttons and entry boxes on the page, top to bottom and, within that order, left to right:
  1. Choose which HLA molecule is of interest via the "Molecule" menu button. The default value is "A_0201". (The choice of the molecule determines which coefficient table the scoring program uses on your sequence.)
  2. Choose the length of the subsequences the program extracts from your input sequence and then scores and ranks. Use the "mers" menu button for this. Currently three possibilities are available:
    • 8
    • 9
    • 10
    The default value is nine. If this default value is chosen, then the appropriate 20-by-9 coefficient matrix for the selected HLA molecule will be used for the scoring. If the subsequence length is chosen to be ten residues, the same matrix will be used, but the fifth residue will be ignored in scoring. If eight residues are chosen, a different 20-by-8 matrix will be employed.
  3. Choose the method you wish to use to limit the number of results returned. Do this via the "Results limited by" set of two toggle buttons. There are two possibilities:
    • "Explicit Number" (the default). If you choose this method, use the menu directly below this button to select how many results to return. The menu default is 20.
    • "Predicted T(1/2) >=". If you choose this method, only those subsequences whose score (predicted half-life) is at or above a given value are returned. Use the menu directly below this button to select the cutoff score. The menu default cutoff is 100.
  4. Enter the sequence of the protein to be searched. This is required input. Either type in the sequence or paste it in from another window. Many formats can be used: Raw/Plain, EMBL, Pearson/Fasta, etc.
  5. Choose whether to display your input sequence on the output page using the "Echo input sequence" button. Echoing is recommended, and the default value is set to "echo". If echoed, your sequence will be shown on the output page as numbered lines of 50 residues each. (The numbering makes it easy to locate, within the input sequence, the subsequences returned in the results table.)
  6. Submit the job by clicking on the "submit" button.
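
The echo layout described in step 5 can be illustrated with a short Python sketch. The page states only that the sequence is shown as numbered lines of 50 residues each; labeling each line with the 1-based position of its first residue, and the function name `echo_sequence`, are assumptions made for illustration.

```python
def echo_sequence(sequence: str, width: int = 50) -> list[str]:
    """Split a sequence into lines of `width` residues.

    Assumption: each line is prefixed with the 1-based position of its
    first residue. The site's actual numbering layout is not documented.
    """
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        lines.append(f"{start + 1:>5} {chunk}")
    return lines


if __name__ == "__main__":
    for line in echo_sequence("MKTAYIAKQR" * 12):
        print(line)
```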

The current maximum size of the input sequence is arbitrarily set to 5000 residues. However long the entered sequence is, the program truncates it at that point and works only with the first 5000 residues. This is a safeguard to prevent the Web site from being overwhelmed.
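
As a rough illustration of the truncation and of how the subsequences are taken from the input, here is a minimal Python sketch. The 5000-residue limit and the allowed lengths come from the text above; the function name `extract_subsequences` and the choice to report 1-based start positions are hypothetical.

```python
MAX_RESIDUES = 5000  # limit stated on this page


def extract_subsequences(sequence: str, length: int) -> list[tuple[int, str]]:
    """Return (start_position, subsequence) for every window of `length`.

    Only the first MAX_RESIDUES residues are considered. Start positions
    are 1-based (an assumption, chosen to match a numbered echo).
    """
    if length not in (8, 9, 10):
        raise ValueError("length must be 8, 9, or 10")
    truncated = sequence[:MAX_RESIDUES]
    return [
        (i + 1, truncated[i:i + length])
        for i in range(len(truncated) - length + 1)
    ]
```

A sequence shorter than the chosen length yields no subsequences, and a 6000-residue input yields the same windows as its first 5000 residues.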

Output Page Returned to the User

After you submit your job, a set of scores will be calculated for all 8-mer, 9-mer, or 10-mer subsequences contained in your input sequence, depending on which option you selected. Based on the scores, the subsequences will be ranked. This task should be completed within a few seconds, although an extremely large input sequence will require somewhat more time. A display page will then be returned that shows

Each row of the scoring output table will consist of four columns. The values for these items represent

The number of rows (entries) in the scoring output table will be limited by whatever method you chose (cutoff score or explicit number).
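
The two limiting methods can be sketched in Python as follows. The defaults (20 results, cutoff of 100) are taken from the instructions; the function name `limit_results`, the `(subsequence, score)` tuple layout, and ranking from highest to lowest score are assumptions for illustration.

```python
def limit_results(scored, method="explicit", count=20, cutoff=100.0):
    """Rank (subsequence, score) pairs and limit the rows returned.

    method="explicit": keep the top `count` rows.
    method="cutoff":   keep rows whose score is >= `cutoff`.
    Assumption: ranking is from highest to lowest score.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if method == "explicit":
        return ranked[:count]
    if method == "cutoff":
        return [item for item in ranked if item[1] >= cutoff]
    raise ValueError("method must be 'explicit' or 'cutoff'")
```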

A link back to the HLA motif search page, along with a date/time stamp, is placed at the bottom of the output page.

Scoring

The algorithm used to score each 8-mer, 9-mer, or 10-mer peptide subsequence is simple. It runs as follows:

Coefficient Files

For each HLA molecule, the coefficient values discussed above are stored in a file in our separate directory of coefficient files. When the user selects the HLA molecule type (or, on our restricted Web page for advanced users, selects the actual filename from the coefficient filename menu) and selects "9-mer" or "10-mer" as the subsequence length, the program reads the "standard" file containing the 20-by-9 coefficient matrix for that molecule on the fly. Its 181 values (180 coefficient values plus one final constant) are loaded into an internal array for use by the scoring program. If the user selects "8-mer" as the subsequence length, the program proceeds in the same fashion but reads a different coefficient file, appropriate for 8-mer searches on the selected HLA molecule type.

A given HLA molecule can have multiple coefficient files constructed for it in our coefficient file directory, but only the two "standard" files for the molecule are available from this site for scoring. For example, if the user selects "A3" in the HLA molecule scrollable menu and "9" or "10" as the subsequence length, then the "A3_standard" coefficient file would be used. If the user selects "A3" in the HLA molecule scrollable menu and "8" as the subsequence length, then the "A3_8mer_standard" coefficient file would be used. (The other files are available for use/modification on the restricted advanced site.) To view the coefficient values in a file for a selected HLA molecule, go to our coefficient viewing page.
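
The file selection rule in the A3 example can be written as a small Python sketch. Only the names "A3_standard" and "A3_8mer_standard" appear in the text; extending the same naming pattern to other molecules, and the function name `standard_coefficient_file`, are assumptions.

```python
def standard_coefficient_file(molecule: str, length: int) -> str:
    """Pick the "standard" coefficient file name for a molecule and length.

    Assumption: every molecule follows the naming pattern shown for A3.
    """
    if length in (9, 10):
        # 9-mers and 10-mers share the same 20-by-9 matrix file.
        return f"{molecule}_standard"
    if length == 8:
        return f"{molecule}_8mer_standard"
    raise ValueError("length must be 8, 9, or 10")
```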

Handling of Ambiguous Characters

All non-alphabetic characters are filtered out from the input sequence. That is, they are completely removed before any subsequence extraction and scoring are performed.

The 26 alphabetic characters are handled as follows:

Regarding display in the output: the 20 non-ambiguous alphabetic characters representing the 20 amino acid types will be displayed in uppercase in the subsequence matches in the scoring table. Ambiguous alphabetic characters will be shown in the subsequence matches as periods (dots). As stated above, all non-alphabetic characters will be stripped out, not used in scoring, and hence not displayed.
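
The filtering and display rules can be sketched in Python. This covers only what the text states: non-alphabetic characters are removed, the 20 amino acid letters are shown in uppercase, and the remaining letters are shown as dots. It does not show how ambiguous characters are scored, and it does not parse format-specific header lines. The names `clean_sequence` and `display_subsequence` are hypothetical.

```python
# The 20 standard one-letter amino acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Remove every non-alphabetic character and convert to uppercase."""
    return "".join(ch.upper() for ch in raw if ch.isascii() and ch.isalpha())


def display_subsequence(subsequence: str) -> str:
    """Show amino acid letters as-is and any other letter as a dot."""
    return "".join(ch if ch in AMINO_ACIDS else "." for ch in subsequence)
```

For example, `clean_sequence("ac-d 12\nxE*")` gives `"ACDXE"`, which `display_subsequence` renders as `"ACD.E"`.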

Background

Note that the coefficient tables used at this Web site have not been published elsewhere (except for HLA-A2). We intend to update the values in the tables as we deem appropriate, based on new information, correspondence, or judgment calls. To cite these tables appropriately, note the date the program at this Web site was used, and obtain a printout of the table that was employed. Because the tables necessarily contain a large number of judgment calls, in the future we may explore allowing the user to modify a table, or to employ a table entirely of the user's own devising. Users are encouraged to check the tables against the references cited, so as to understand the basis for the numbers in the tables.

Depending on correspondence received, especially from other scientists in the field who have helped determine anchor residue preferences, we are likely to extend these tables to include additional class I molecules as well as class II molecules.

Principle of the calculations: The idea behind these tables is the assumption that, to a first approximation, each amino acid in the peptide contributes independently to binding to the class I molecule. Dominant anchor residues, which are critical for binding, have coefficients in the tables that are significantly different from 1. Highly favorable amino acids have coefficients substantially greater than 1, and unfavorable amino acids have positive coefficients that are less than 1. Auxiliary anchor residues have coefficients that differ from 1, but by less than those of dominant anchor residues. When any amino acid has been found to be enriched or depleted in the endogenous peptides associated with a class I molecule, the coefficients have been adjusted to take that fact into account, whether or not the enrichment or depletion has an obvious structural basis. In some cases, such coefficients may reflect the peptide-binding properties of other proteins in the peptide / class I complex pathway. In all cases, many amino acids have coefficients that have the default value of exactly 1.0. That means that the amino acid at that position is not known to make either a favorable or unfavorable contribution to the binding of the peptide.

There are several reported instances where combinations of amino acids appear to make contributions to peptide binding that are greater than or less than expected from each amino acid considered separately. These complications are to be expected, and are best dealt with using this program by re-sorting the initial output manually. Perhaps later versions of this Web site may be able to account for such considerations. I (Ken Parker) believe that at present a more serious problem is that many of the coefficients have not been determined accurately enough. There must be many instances where auxiliary anchor effects have not been determined yet, and there are probably even more instances where unfavorable amino acid preferences have not been elucidated.
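
The independence assumption, together with a neutral coefficient of exactly 1.0, suggests a multiplicative score. The Python sketch below illustrates that reading; it is not a statement of the site's exact implementation. It assumes that the score is the product of the per-position coefficients multiplied by the final constant, and that for a 10-mer the nine residues remaining after the fifth is dropped map onto matrix positions 1 to 9 in order. The function name `score_peptide`, the dictionary layout, and the coefficient values in the example are all made up for illustration and are not taken from any table on this site.

```python
def score_peptide(peptide: str, coefficients: dict, final_constant: float) -> float:
    """Multiply per-position coefficients, then scale by the final constant.

    `coefficients` maps (amino_acid, position) to a coefficient, with
    positions starting at 1. Missing entries default to 1.0, the neutral
    value described in the text.
    """
    if len(peptide) == 10:
        # The fifth residue of a 10-mer is ignored in scoring.
        residues = peptide[:4] + peptide[5:]
    else:
        residues = peptide
    score = final_constant
    for position, amino_acid in enumerate(residues, start=1):
        score *= coefficients.get((amino_acid, position), 1.0)
    return score


if __name__ == "__main__":
    # Illustrative values only: a favorable residue at position 2 and at
    # position 9, with every other coefficient left at 1.0.
    toy = {("L", 2): 10.0, ("V", 9): 5.0}
    print(score_peptide("ALAAAAAAV", toy, final_constant=2.0))
```

With these toy values, the 9-mer `ALAAAAAAV` and the 10-mer `ALAAXAAAAV` receive the same score, because the fifth residue of the 10-mer is skipped.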


References

Links to references are provided on a separate References Page. Links to useful Web sites can be found on our Associated Web Site Page. Lastly, we provide a listing of concatenated peptide sequences that are known to bind to MHC class I molecules. This is the best test sequence for the coefficient tables. You can find it on our page for Concatenated MHC class I peptide sequences from Rammensee et al., Immunogenetics 41:178.


Contact Information

For comments or feedback regarding the implementation of this Web site, contact Ronald Taylor at rtaylor@helix.nih.gov

For comments or questions regarding the research underlying the motif search algorithm and the coefficient values for the different HLA molecules used by this Web site, contact Dr. Kenneth Parker at KPARKER@atlas.niaid.nih.gov

