Stanford University HIV Drug Resistance Database - A curated public database designed to represent, store, and analyze the divergent forms of data underlying HIV drug resistance.

Release Notes for the Calibrated Population Resistance (CPR) Tool

Table of Contents

      1. Overview

      2. Input

      3. Processing

      4. Output: The CPR Report File

      Appendix 1. Mutation lists

      Appendix 2. STAR Genotyping

      References

1. Overview

    The CPR tool is a program for the routine analysis of human immunodeficiency virus type 1 (HIV-1) sequences. It provides a standard approach to estimating the prevalence of transmitted HIV drug resistance from population-sampled sequence data, and it is also suitable for general batch analysis of HIV-1 pol gene sequences.

2. Input

2.1 Query Data Set

    CPR accepts one or more FASTA-formatted HIV-1 protease (PR) and reverse transcriptase (RT) gene sequences as input, in the form of nucleotide sequences. (Analysis of the integrase (IN) gene is optional.) Nucleotide ambiguities and missing data (i.e. sequences that are not complete across the PR and RT region) are acceptable and are handled in a consistent way. There is no limit to the number of sequences that may be submitted at once, but if more than 1,000 sequences are submitted simultaneously, the session may time out because of the time required for processing.
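
    For readers who prepare input programmatically, the following is a minimal Python sketch of reading FASTA text into a dictionary of sequences. It is not CPR's own parser, and the function name `read_fasta` is hypothetical. It uppercases each sequence and leaves ambiguity codes (e.g. R, Y, N) untouched, because CPR accepts them.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, preserving input order.

    IUPAC ambiguity codes (R, Y, N, ...) are kept as-is; only case is normalised.
    A repeated header overwrites the earlier record.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


example = ">seq1\nCCTCAGATCACTCTT\nTGGCAACGACCC\n>seq2\ncctcarrtcact\n"
print(read_fasta(example))
# {'seq1': 'CCTCAGATCACTCTTTGGCAACGACCC', 'seq2': 'CCTCARRTCACT'}
```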

3. Processing
3.1 Sequence Alignment

    A profile alignment is created by aligning each nucleotide sequence in the query data set to the consensus subtype B sequence used as a reference sequence throughout Stanford HIVdb. Mutations, deletions, and insertions (defined as changes relative to the consensus) are recorded for each query sequence.
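
    After a query has been aligned, recording its mutations amounts to a position-by-position comparison with the reference. The Python sketch below assumes that the alignment step has already produced an amino acid string of the same length as the reference, with `-` marking a deleted codon and `X` an unresolved one. Insertions are ignored. The function name and the `del` notation are illustrative choices, not CPR's output format. The ten-residue reference is only the start of the protease consensus and serves as a toy example.

```python
def call_mutations(reference: str, aligned_query: str, start: int = 1) -> list[str]:
    """List differences between an aligned query and the reference.

    Assumes both strings are amino acid sequences of equal length, where '-' in
    the query marks a deleted codon and 'X' a codon that could not be resolved.
    """
    if len(reference) != len(aligned_query):
        raise ValueError("query must already be aligned to the reference")
    mutations = []
    for offset, (ref_aa, query_aa) in enumerate(zip(reference, aligned_query)):
        position = start + offset
        if query_aa == "X":
            continue  # unresolved codon: no call is made
        if query_aa == "-":
            mutations.append(f"{ref_aa}{position}del")
        elif query_aa != ref_aa:
            mutations.append(f"{ref_aa}{position}{query_aa}")
    return mutations


toy_reference = "PQITLWQRPL"  # first ten protease residues, used as a toy reference
print(call_mutations(toy_reference, "PQXTLWKR-L"))  # ['Q7K', 'P9del']
```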

3.2 Estimation of population prevalence of resistance

    CPR estimates the prevalence of drug resistance within the query sequence set using lists of well-characterized drug resistance mutations (DRMs). Users can select a DRM list from the pull-down menu on the CPR input page. The selected list is used to compute the 'prevalence' (i.e. frequency) of resistance to each of the three main antiretroviral drug classes: protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs), and non-nucleoside reverse transcriptase inhibitors (NNRTIs). A sequence is classified as resistant to a drug class if it contains one or more DRMs for that class.
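
    The classification rule can be sketched in Python as follows. The mutations in `ILLUSTRATIVE_DRMS` are a small subset chosen for illustration and do not make up a complete DRM list. Gene-prefixed names such as `RT:M184V` are a convention assumed here to keep PR and RT positions apart. For simplicity, every sequence is assumed to cover both genes, whereas the report counts sequences per gene (see Section 4.2).

```python
# Illustrative subset only; not a complete DRM list.
ILLUSTRATIVE_DRMS = {
    "PI": {"PR:D30N", "PR:L90M"},
    "NRTI": {"RT:K65R", "RT:M184V"},
    "NNRTI": {"RT:K103N", "RT:Y181C"},
}


def class_prevalence(
    sequence_mutations: dict[str, set[str]],
    drm_lists: dict[str, set[str]],
) -> dict[str, float]:
    """Fraction of sequences carrying at least one DRM for each drug class."""
    total = len(sequence_mutations)
    if total == 0:
        raise ValueError("no sequences supplied")
    prevalence = {}
    for drug_class, drms in drm_lists.items():
        # One or more DRMs for the class makes the sequence count as resistant.
        resistant = sum(1 for mutations in sequence_mutations.values() if mutations & drms)
        prevalence[drug_class] = resistant / total
    return prevalence


query = {
    "s1": {"PR:L10I", "RT:M184V"},
    "s2": {"RT:K103N", "RT:M184V"},
    "s3": set(),
    "s4": {"PR:D30N"},
}
print(class_prevalence(query, ILLUSTRATIVE_DRMS))
# {'PI': 0.25, 'NRTI': 0.5, 'NNRTI': 0.25}
```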

3.3 Genotypic estimation of resistance to specific drugs

    The option to perform genotypic estimation of resistance to specific PR and RT inhibitors is provided on the CPR input page. If this option is selected, resistance to 8 protease inhibitors (ATV, DRV, FPV, IDV, LPV, NFV, SQV, TPV) and 11 RT inhibitors (3TC, ABC, AZT, D4T, DDI, FTC, TDF, DLV, EFV, ETR, NVP) is estimated algorithmically for each sequence using the HIVdb algorithm.

3.4 Genotyping (subtyping)

    There are several approaches by which viral sequences can be assigned to genotypic 'groups'. CPR uses a version of the STAR program described by Myers et al. (2005). See Appendix 2 for details of the STAR subtyping process.

4. Output: The CPR report file

4.1 Section 1: Report header

    The 'report header' table shows the unique ID associated with the report and summarizes which of the standard input options and settings were used in the analysis.

4.2 Section 2: Input data set summary

    A table shows summary statistics for the input data set. The number shown for each gene counts only sequences in which at least 20% of that gene is covered by sequence (i.e. gene fragments shorter than 20% of the total gene length are not counted). The table also gives the number of hypermutated sequences (i.e. sequences presumed to have been lethally edited by APOBEC enzymes) identified in the data set.
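
    The 20% inclusion rule can be expressed as a simple filter. In this Python sketch, `covered_codons` is a hypothetical input that holds the number of codons of one gene covered by each sequence. The gene length is passed in explicitly; the example uses 99 codons for PR.

```python
def counted_sequences(
    covered_codons: dict[str, int],
    gene_length: int,
    min_fraction: float = 0.20,
) -> list[str]:
    """IDs of sequences covering at least `min_fraction` of a gene."""
    return [
        seq_id
        for seq_id, covered in covered_codons.items()
        if covered / gene_length >= min_fraction
    ]


pr_coverage = {"s1": 99, "s2": 19, "s3": 20}
print(counted_sequences(pr_coverage, gene_length=99))  # ['s1', 's3']
```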

4.3 Section 3: Drug Resistance Summary

    This section reports the prevalence of resistance in the data set as determined using the selected drug resistance mutation (DRM) list. Resistance to each of the three drug classes is given as the proportion of gene sequences in the data set that contain at least one mutation on the DRM list for that class. In a population-sampled sequence set obtained from untreated individuals, this provides an estimate of the prevalence of transmitted drug resistance.

4.4 Section 4: Graphical Overview

    A schematic representation of the PR and RT genes shows the location of primary (i.e. summary) and secondary drug resistance mutations in the submitted data set. The RT gene is shown split into two sections (comprising amino acids 1-120 and 121-240 respectively). Primary and secondary drug resistance mutations are indicated by red and blue markers respectively. Hover the cursor over the markers to display the prevalence of mutations at that position.

4.5 Section 5: Drug resistance mutation prevalences by list

    The prevalences of drug resistance mutations identified in the query data set are listed. Each mutation's prevalence in the query data set is shown alongside its prevalence among sequences in the Stanford HIV Drug Resistance Database that were obtained from untreated individuals. Prevalences are shown for each of the three major HIV-1 subtypes (A, B, C).

    Table footnote: the percentage prevalence of a mutation within the data set is calculated as the number of times the mutation occurred relative to the number of times that codon position was represented in the data set. Codons with a high degree of ambiguity (more than 4 possible amino acids) caused by undetermined nucleotides or nucleotide mixtures are treated as missing data. Where mixtures are identified, each mutation in the mixture is listed separately, and an occurrence of a mutation in a mixture is scored the same as an occurrence of that mutation alone.
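
    The counting rules in this footnote can be sketched in Python. The sketch assumes that each sequence has already been translated into a hypothetical mapping from codon position to the set of amino acids its codon could encode, with absent positions treated as missing data. Codons with more than four possible amino acids are skipped, and each amino acid in a mixture counts as a full occurrence.

```python
from collections import Counter

MAX_AMINO_ACIDS = 4  # codons more ambiguous than this are treated as missing


def mutation_prevalence(
    sequences: list[dict[int, set[str]]],
    reference: dict[int, str],
) -> dict[str, float]:
    """Percentage prevalence of each mutation among sequences representing its codon."""
    represented = Counter()  # position -> number of sequences with usable data there
    occurrences = Counter()  # (position, amino acid) -> number of sequences carrying it
    for codons in sequences:
        for position, amino_acids in codons.items():
            if not amino_acids or len(amino_acids) > MAX_AMINO_ACIDS:
                continue  # missing or highly ambiguous codon
            represented[position] += 1
            # Every non-reference amino acid in a mixture scores as a full occurrence.
            for aa in amino_acids - {reference[position]}:
                occurrences[(position, aa)] += 1
    return {
        f"{reference[position]}{position}{aa}": 100.0 * count / represented[position]
        for (position, aa), count in sorted(occurrences.items())
    }


reference = {184: "M", 215: "T"}
sequences = [
    {184: {"V"}, 215: {"T"}},
    {184: {"M", "V"}, 215: {"N", "T", "Y", "S"}},  # mixtures at both codons
    {184: {"M"}},                                    # codon 215 missing
    {184: {"M"}, 215: set("ACDEFG")},                # too ambiguous: treated as missing
]
print(mutation_prevalence(sequences, reference))
# {'M184V': 50.0, 'T215N': 50.0, 'T215S': 50.0, 'T215Y': 50.0}
```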

4.6 Section 6: Drug resistance mutations by list, sequence and drug class

    For each sequence, tables show the drug resistance mutations that are on the selected 'summary list' (i.e. SDRM) and were identified in the data set. Mutations are grouped into columns according to the main drug class to which they confer resistance.

    Table footnote: in these tables, mutations that occur as part of a mixture are listed together with all the other inferred mutations in the mixture. For example, for the codon WMC at position 215, T215NTYS will be shown.
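
    The expansion of an ambiguous codon into its possible amino acids can be sketched in Python using the IUPAC nucleotide codes and the standard genetic code. The function names are hypothetical. Because the letters are sorted alphabetically here, the example prints `T215NSTY` rather than the `T215NTYS` ordering given in the footnote.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_amino_acids(codon: str) -> set[str]:
    """All amino acids a (possibly ambiguous) codon could encode."""
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three nucleotides")
    choices = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(bases)] for bases in product(*choices)}


def mixture_label(ref_aa: str, position: int, codon: str) -> str:
    """Report-style label listing every inferred amino acid at a position."""
    return f"{ref_aa}{position}{''.join(sorted(codon_amino_acids(codon)))}"


print(mixture_label("T", 215, "WMC"))  # T215NSTY (from AAC, ACC, TAC, TCC)
```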

4.7 Section 7: Genetic diversity by sequence

    An overall summary of genotypes (if genotyping was selected) and mutations in the query sequence set.

    Table footnote: Sequence IDs of hypermutated sequences are highlighted in red; primary mutations (i.e. SDRMs) are also highlighted in red, and unusual mutations in green. Mutations indicative of (potentially) lethal APOBEC3G-mediated editing are shown in purple. ND = not done; U = unclassifiable.

4.8 Section 8: Genotypic estimation of drug resistance

    If the option to perform genotypic estimation of resistance to specific drugs is selected, the inferred resistance of each sequence to PR inhibitors and RT inhibitors is shown in separate tables. The following levels of inferred drug resistance are shown: susceptible (1), potential low-level resistance (2), low-level resistance (3), intermediate resistance (4), and high-level resistance (5).
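
    The HIVdb algorithm scores each drug by summing penalty scores for the mutations present, and the total is then mapped onto the five levels above. The Python sketch below assumes the cutoffs documented for the HIVdb program: below 10 susceptible, 10 to 14 potential low-level, 15 to 29 low-level, 30 to 59 intermediate, and 60 or more high-level resistance. These cutoffs should be checked against the HIVdb release in use. The penalty scores in the example are made up.

```python
# Assumed HIVdb cutoffs: (minimum total score, level, label), highest first.
LEVEL_CUTOFFS = [
    (60, 5, "High-level resistance"),
    (30, 4, "Intermediate resistance"),
    (15, 3, "Low-level resistance"),
    (10, 2, "Potential low-level resistance"),
]


def resistance_level(penalty_scores: list[int]) -> tuple[int, str]:
    """Map the summed penalty scores for one drug onto levels 1-5."""
    total = sum(penalty_scores)
    for minimum, level, label in LEVEL_CUTOFFS:
        if total >= minimum:
            return level, label
    return 1, "Susceptible"


print(resistance_level([15, 10]))  # (3, 'Low-level resistance') for hypothetical scores
```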

4.9 Section 9: Quality assessment

    The quality assessment section provides an overview of the data set in terms of gene coverage and sequence quality. A plot shows the representation at each codon position in the region analyzed (codons 1-99 of PR and codons 1-240 of RT). Codons that are highly degenerate (due to mixtures or sequencing problems) are treated in the same way as missing data.
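
    The per-position representation plot can be sketched as the fraction of sequences that have usable data at each codon. This Python sketch assumes that each sequence is given as a hypothetical mapping from codon position to the set of amino acids its codon could encode. Following the rule in the Section 4.5 footnote, codons with more than four possible amino acids count as missing. For PR, the function would be called with `first=1` and `last=99`.

```python
def representation(
    sequences: list[dict[int, set[str]]],
    first: int,
    last: int,
    max_amino_acids: int = 4,
) -> dict[int, float]:
    """Fraction of sequences with a usable codon at each position first..last."""
    if not sequences:
        raise ValueError("no sequences supplied")
    profile = {}
    for position in range(first, last + 1):
        usable = sum(
            1
            for codons in sequences
            if 0 < len(codons.get(position, set())) <= max_amino_acids
        )
        profile[position] = usable / len(sequences)
    return profile


sequences = [
    {1: {"P"}, 2: {"Q"}},
    {1: {"P"}, 2: set("ACDEFG")},  # codon 2 too degenerate: counted as missing
    {2: {"Q"}},                    # codon 1 not sequenced
]
print(representation(sequences, first=1, last=3))
# {1: 0.6666666666666666, 2: 0.6666666666666666, 3: 0.0}
```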

Appendix 1: Mutation Lists

Surveillance drug resistance mutation (SDRM) list

    The surveillance drug resistance mutation (SDRM) list is intended to provide a simple, unambiguous and stable measure of transmitted drug resistance in HIV-1 (Shafer et al.). When used to assess resistance in a population-sampled set of HIV-1 sequences obtained from untreated individuals, the SDRM list provides an estimate of transmitted drug resistance in accordance with WHO guidelines. Mutations on the SDRM list have been selected for their suitability as indicators of transmitted resistance and conform to the following criteria: (i) they are commonly recognized as causing or contributing to resistance; (ii) they are nonpolymorphic in untreated persons; and (iii) they are applicable to all HIV-1 subtypes.

APOBEC3G-mediated defective (A3GD) mutation list

    HIV-1 sequences occasionally contain an excess of guanine (G) to adenine (A) substitutions introduced by the sequence editing activity of host enzymes belonging to the APOBEC family of cytidine deaminases, most notably APOBEC3G. Although it has been suggested that some degree of sub-lethal editing by APOBEC enzymes may contribute to HIV-1 evolution, extensive G-to-A editing generally leads to mutational impairment of viruses.

    Sequence variation in lethally edited viruses reflects biological processes that are qualitatively different from those shaping variation in viable viral genomes (i.e. sequence editing as opposed to purifying selection). It is therefore useful to identify lethally edited sequences in analyses that assume the data represent viable genetic material under selection, such as genotypic estimation of drug resistance. The 'A3GD' mutations are rare substitutions that are commonly found in sequences that have been extensively edited by APOBEC3G but are uncommon in other sequences. The occurrence of three or more A3GD mutations within a single PR-RT sequence is taken as indicating a >99% probability of a background of lethal APOBEC-mediated editing.
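
    The three-mutation rule can be sketched in Python as a set intersection. `HYPOTHETICAL_A3GD` holds placeholder entries of the kind that G-to-A editing produces (e.g. glycine to glutamic acid, tryptophan to stop) purely for illustration. It is not the actual A3GD list, which should be taken from HIVdb.

```python
HYPOTHETICAL_A3GD = {"PR:G16E", "PR:G40R", "RT:G99E", "RT:W153*"}  # placeholders only
HYPERMUTATION_THRESHOLD = 3  # three or more A3GD mutations in one PR-RT sequence


def flag_hypermutation(
    mutations: set[str],
    a3gd_list: set[str],
    threshold: int = HYPERMUTATION_THRESHOLD,
) -> tuple[bool, list[str]]:
    """Return whether a sequence looks lethally edited, plus the A3GD hits found."""
    hits = sorted(mutations & a3gd_list)
    return len(hits) >= threshold, hits


print(flag_hypermutation({"PR:G16E", "RT:G99E", "RT:W153*", "RT:M184V"}, HYPOTHETICAL_A3GD))
# (True, ['PR:G16E', 'RT:G99E', 'RT:W153*'])
```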

Atypical mutation list

    The Stanford HIV Drug Resistance Database is updated regularly with new sequence data and maintains a list of 'typical' mutations in the protease-RT region of HIV-1 group M viruses. Mutations that are not on this list (i.e. atypical mutations) may represent rare polymorphisms, novel drug resistance mutations, or artefacts introduced during sequencing or conceptual translation (e.g. when attempting to infer codons from sequences containing nucleotide mixtures).

Appendix 2: STAR Genotyping

    STAR analysis involves the use of position-specific scoring matrices (PSSMs) to assign sequences to subtypes/CRFs. A normalised P-distance score (z-score) is derived (Myers et al, 2005) and an empirically determined z-score cut-off of 2.5 is used as the threshold of statistical confidence for assignment. Sequences that score below this threshold are left unassigned (U), indicating that they are potentially divergent and/or recombinant.
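
    Only the final assignment step is sketched here in Python. The sketch takes per-subtype z-scores as given; computing them from the PSSMs as described by Myers et al. (2005) is not reproduced. It assigns the highest-scoring subtype or CRF when that z-score reaches the 2.5 cut-off, and otherwise returns U. The function name, the dictionary input and the example scores are hypothetical.

```python
Z_CUTOFF = 2.5  # empirically determined threshold quoted above


def assign_subtype(z_scores: dict[str, float], cutoff: float = Z_CUTOFF) -> str:
    """Best-scoring subtype/CRF if its z-score reaches the cut-off, else 'U'."""
    if not z_scores:
        return "U"
    best = max(z_scores, key=z_scores.get)
    return best if z_scores[best] >= cutoff else "U"


print(assign_subtype({"B": 4.1, "C": 0.7, "CRF01_AE": 0.2}))  # B
print(assign_subtype({"B": 1.9, "C": 1.6}))                    # U
```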

5. References
