WebPSSM is a bioinformatic tool for predicting HIV-1 coreceptor usage from the amino acid sequence of the third variable loop (V3) of the envelope gene. A description and comprehensive analysis of the method is located here.
No user sequences are stored. If you would like to "donate" sequences that have known associated phenotypes, for the improvement of this method and the development of multiple subtype matrices, please contact Mark Jensen.
You may enter as many as 100 V3 sequences in FASTA format. Scores and prediction data are returned in the same window after submission. The user can obtain results in tab-delimited format for use in Excel or other programs.
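For users preparing input or post-processing results locally, the following is a minimal sketch of reading FASTA text, enforcing the 100-sequence limit, and writing tab-delimited output. The function names and the example column names are illustrative assumptions; they are not part of WebPSSM.

```python
import csv
import io

MAX_SEQUENCES = 100  # submission limit stated above


def parse_fasta(text, limit=MAX_SEQUENCES):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    if len(records) > limit:
        raise ValueError(f"{len(records)} sequences supplied; the limit is {limit}")
    return records


def to_tab_delimited(rows, header):
    """Serialize rows as tab-delimited text that spreadsheet programs can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, parse_fasta(">a\nCTRP\nNNNT\n") yields a single record named "a" whose sequence is the two lines joined together.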
The typical V3 in subtype B is 35 amino acids long, but length
differences are frequent. The matrix in the current implementation is
designed to score a 35 aa fragment. To obtain a correct score for
length variants, it is important that homologous residues be in the
correct position. Before sequences are scored, they are aligned
against an HIV-1 subtype B consensus sequence using the Needleman-Wunsch
algorithm and an amino acid distance matrix. Gaps and insertions
relative to the consensus are ignored in the scoring (this does not
affect the predictions much, in general; see Jensen et al. 2003). If
multiple best alignments are calculated, all these alignments are
scored, and the actual scored sequences are displayed in the output.
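The alignment step described above can be sketched as follows. This is an illustration under stated assumptions, not the WebPSSM implementation: it uses unit match/mismatch scores and a linear gap penalty in place of the amino acid distance matrix (whose values are not given here), and it returns a single best alignment rather than every co-optimal one. The function names are hypothetical.

```python
def needleman_wunsch(query, reference, match=1, mismatch=-1, gap=-2):
    """Globally align query to reference; return (aligned_query, aligned_reference).

    The scoring parameters are placeholders, not the tool's distance matrix.
    """
    rows, cols = len(query) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == reference[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # insertion in the query
                score[i][j - 1] + gap,      # deletion in the query
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    aligned_q, aligned_r = [], []
    i, j = len(query), len(reference)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if query[i - 1] == reference[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_q.append(query[i - 1])
                aligned_r.append(reference[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_r.append(reference[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_r))


def project_onto_reference(aligned_query, aligned_reference):
    """Keep one query character per reference position.

    Insertions relative to the reference are dropped; deletions stay as '-'.
    """
    return "".join(q for q, r in zip(aligned_query, aligned_reference) if r != "-")
```

Projecting the aligned query onto the reference columns yields a string of the same length as the reference, so that each residue lines up with its homologous matrix position.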
Two matrices are available for determining scores in subtype B: X4R5, calculated
using sequences of known coreceptor phenotype, as assayed on indicator
cells expressing exogenous CD4 and either CCR5 or CXCR4; and SINSI,
calculated using sequences of known syncytium-inducing phenotype on
the MT2 cell line. We have found that these matrices can give
different phenotype predictions depending on sequence (see Jensen et
al., 2003), and that correlations with disease progression (in
preparation) and prognosis on HAART (Brumme et al., 2004) are better
using SINSI scores. Preliminary bioinformatic analyses suggest that
the matrices differ because there are significant amino acid
differences between X4 sequences and SI sequences. For subtype C,
only a SINSI matrix is available.
Questions regarding matrices can be sent to
Mark Jensen.
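To illustrate how a position-specific scoring matrix assigns a score, and how two matrices can rank the same sequence differently, here is a small sketch. The three-position matrices and their values are invented for the example and bear no relation to the real X4R5 or SINSI matrices; treating gap positions and unlisted residues as contributing zero is likewise an assumption.

```python
def pssm_score(projected, matrix):
    """Sum per-position scores for a sequence already placed on matrix columns.

    `matrix` is a list with one dict per position, mapping residue to score.
    Gap positions ('-') are skipped; unlisted residues count as 0 (assumption).
    """
    if len(projected) != len(matrix):
        raise ValueError("sequence must have one character per matrix position")
    return sum(
        column.get(residue, 0.0)
        for residue, column in zip(projected, matrix)
        if residue != "-"
    )


def score_with_all(projected, matrices):
    """Score one sequence under several named matrices."""
    return {name: pssm_score(projected, m) for name, m in matrices.items()}


# Toy matrices with made-up values, for illustration only.
TOY_MATRICES = {
    "toy_A": [{"S": -0.5, "R": 1.0}, {"I": 0.25}, {"D": -0.75, "K": 1.5}],
    "toy_B": [{"S": 0.5, "R": 0.5}, {"I": 0.0}, {"D": -1.0, "K": 0.25}],
}

if __name__ == "__main__":
    for seq in ("RIK", "SID", "S-D"):
        print(seq, score_with_all(seq, TOY_MATRICES))
```

Because each matrix is estimated from a different set of phenotyped sequences, the same residue at the same position can carry a different weight in each, which is how the two predictions can diverge.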
The current implementation uses matrices derived from either subtype B
or C sequences only, which have been tested only on phenotyped subtype B or
C sequences, respectively. Predictions for other subtypes should be treated with
extreme skepticism. Matrices are planned for subtypes A and
D. Contributions of V3 sequences with associated known phenotypes in
these or any subtype would be greatly appreciated: please contact
Mark Jensen.
Sequences which align poorly to the V3 consensus are flagged in the
output. These sequences may not be actual V3 loops, or may be from a
highly divergent subtype.
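The criterion WebPSSM uses to flag poor alignments is not stated here. One plausible approach, sketched below purely as an assumption, is to compute the fraction of consensus positions that the aligned sequence matches and flag anything below a cutoff. The function names, the 0.5 cutoff, and the short reference fragment are all hypothetical.

```python
def identity_to_reference(projected, reference):
    """Fraction of reference positions matched by the aligned sequence.

    `projected` must have one character per reference position; gaps ('-')
    count as non-matching.
    """
    if len(projected) != len(reference):
        raise ValueError("sequence must have one character per reference position")
    matches = sum(1 for q, r in zip(projected, reference) if q == r)
    return matches / len(reference)


def is_poor_alignment(projected, reference, min_identity=0.5):
    """Flag sequences whose identity falls below a hypothetical cutoff."""
    return identity_to_reference(projected, reference) < min_identity


if __name__ == "__main__":
    toy_reference = "CTRPNNNTRK"  # short illustrative fragment, not a full V3
    for candidate in ("CTRP-NNTRK", "AAAAAAAAAK"):
        print(candidate, is_poor_alignment(candidate, toy_reference))
```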