Stanford University HIV Drug Resistance Database - A curated public database designed to represent, store, and analyze the divergent forms of data underlying HIV drug resistance.


Release Notes for the Calibrated Population Resistance (CPR) Tool

Table of Contents

      1. Overview

      2. CPR Analysis: Components

      3. CPR Analysis: Processes

      4. The CPR Report File

      Appendix 1. Reference sequences

      Appendix 2. Mutation lists

      Appendix 3. Genotyping and phylogenetic analysis

      References

1. Overview

    CPR is a tool for routine analysis of human immunodeficiency virus type 1 (HIV-1) sequences. CPR aims to standardize analysis of HIV-1 sequence population sets through a modular framework of version-tracked components and processes. Although designed primarily for studies of primary HIV drug resistance, CPR also provides a suitable approach to general batch analysis of HIV-1 pol gene sequence sets.

2. CPR Analysis: Components

2.1 Query Data Set

    CPR accepts one or more FASTA-formatted HIV-1 pol gene nucleotide sequences as input. Sequences do not need to be pre-aligned. Nucleotide ambiguities and missing data (i.e. sequences that are not complete across the pol gene) are acceptable and are handled in a consistent way. Currently, there is no limit to the number of sequences that may be submitted to CPR for processing. However, users should be aware that sessions may time out because of the processing time required if more than 5000 sequences are submitted at once.

2.2 Reference Sequence(s)

    Standard reference sequence sets are a key component of CPR analysis. The reference sequences and sequence sets utilised within CPR are described in Appendix 1. Reference sequences are utilised within the following analytical processes:

    Sequence alignment and mutation scoring

    In CPR analysis, query nucleotide sequences are aligned to an amino acid reference sequence. Drug resistance mutations are defined as differences in the amino acid sequence of the query (inferred by conceptual translation) relative to that of the reference. Conventionally, mutations are described relative to a sequence derived from the HXB2 strain of HIV-1 subtype B, and this notation is used by default in CPR.

    Genotyping

    Reference sequence sets constitute the basis of all HIV-1 genotyping protocols. The choice of reference sequences used in genotyping can influence the results. CPR aims to promote consistency across genotypic analyses of HIV-1 through the use of standardized, version-tracked HIV-1 reference sequence sets for genotypic analysis.

2.3 Mutation Lists

    Mutation 'lists' provide the basis for a range of analytical procedures performed in CPR. Mutations are defined as amino acid changes relative to a reference sequence, as described in section 2.2 above. A brief description of the lists is provided below; more comprehensive information is provided in Appendix 2.

    Drug resistance mutation (DRM) lists

    Drug resistance mutations are identified in CPR analysis and highlighted in the CPR report. The prevalence of resistance to antiretroviral drugs within the query data set is summarized using standard lists of mutations suitable for this purpose. These include the surveillance drug resistance mutation (SDRM) list, and the IAS-USA (major) list. Drug resistance mutation lists are likely to undergo moderate change over time as new information becomes available. As lists are updated they will be version-tracked with previous versions remaining available for use.

    Other lists

    A list of typical mutations (defined by analysis of the Stanford HIV Drug Resistance Database), and a list of mutations that are indicators of APOBEC-mediated sequence editing, form the basis of certain quality analyses performed in CPR. Mutations belonging to these lists are highlighted in the CPR report.

3. CPR Analysis: Processes

3.1 Sequence Alignment

    A profile alignment is created by aligning each individual query nucleotide sequence to an amino acid 'reference' sequence (by default the HXB2 reference sequence). As HIV-1 pol genes are relatively highly conserved, optimal alignments can generally be obtained by this approach. Mutations, deletions, and insertions (defined as changes relative to the reference sequence) are recorded for each sequence.
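    The recording of changes relative to the reference can be illustrated with a minimal sketch. It assumes the query has already been translated and aligned, so that both inputs are equal-length amino acid strings with '-' marking gaps. The function name `call_mutations` and the output notation are hypothetical and are not taken from CPR. For simplicity, every gap in the query is treated as a deletion, whereas a real pipeline would need to distinguish deletions from regions that were not sequenced.

```python
from typing import List


def call_mutations(ref_aln: str, query_aln: str) -> List[str]:
    """List differences between two aligned amino acid strings.

    Both strings must have the same length; '-' marks a gap and 'X' an
    undetermined residue. The output notation is ad hoc, not CPR's.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")
    mutations = []
    pos = 0  # 1-based position in the (ungapped) reference sequence
    for ref_aa, query_aa in zip(ref_aln, query_aln):
        if ref_aa == "-":
            # Residue present in the query but absent from the reference.
            if query_aa != "-":
                mutations.append(f"ins{pos}{query_aa}")
            continue
        pos += 1
        if query_aa == "-":
            mutations.append(f"{ref_aa}{pos}del")
        elif query_aa not in (ref_aa, "X"):
            mutations.append(f"{ref_aa}{pos}{query_aa}")
    return mutations


print(call_mutations("PQIT-LW", "PRITA-W"))  # ['Q2R', 'ins4A', 'L5del']
```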

3.2 Estimation of Resistance

    CPR estimates the prevalence of drug resistance within the query sequence set using lists of well-characterized drug resistance mutations (see section 2.3). Users can select from a choice of 'summary lists' for this purpose using the pull-down menu on the CPR input page. The selected list is used to compute the 'prevalence' of drug resistance (defined as the proportion of sequences within the query data set with one or more resistance mutations on the summary list) to each of the three main antiretroviral drug classes (protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs), and non-nucleoside reverse transcriptase inhibitors (NNRTIs)).
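    The prevalence definition above can be sketched as follows. This is an illustration under stated assumptions, not CPR's implementation: the function name `class_prevalence`, the "GENE:mutation" labels, and the abbreviated `SUMMARY_LIST` are hypothetical, and the list entries are placeholders rather than a real SDRM or IAS-USA list. The sketch also uses the total number of sequences as the denominator, ignoring gene coverage.

```python
from typing import Dict, Set

# Hypothetical, abbreviated summary list: drug class -> listed mutations.
# Placeholder entries for illustration only, not a real summary list.
SUMMARY_LIST: Dict[str, Set[str]] = {
    "PI": {"PR:M46I", "PR:L90M"},
    "NRTI": {"RT:M184V", "RT:T215Y"},
    "NNRTI": {"RT:K103N", "RT:Y181C"},
}


def class_prevalence(seq_mutations: Dict[str, Set[str]],
                     summary_list: Dict[str, Set[str]]) -> Dict[str, float]:
    """Proportion of sequences with one or more listed mutations, per class."""
    total = len(seq_mutations)
    prevalence = {}
    for drug_class, listed in summary_list.items():
        # A sequence counts once per class, however many listed mutations it has.
        hits = sum(1 for muts in seq_mutations.values() if muts & listed)
        prevalence[drug_class] = hits / total if total else 0.0
    return prevalence


example = {
    "seq1": {"PR:L90M"},
    "seq2": {"RT:M184V", "RT:K103N"},
    "seq3": set(),
    "seq4": {"RT:K103N"},
}
print(class_prevalence(example, SUMMARY_LIST))
```

    A sequence carrying several listed mutations for the same class contributes only once to that class, which matches the "one or more" wording of the definition.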

3.3 Genotyping

    There are several approaches by which viral sequences can be assigned to genotypic 'groups'. CPR uses a version of the STAR (SubType AnalyseR) program described by Myers et al. (2005). See Appendix 3 for details of the STAR subtyping process.

4. The CPR Report File

4.1 Section 1: Report header

    The 'report header' table shows the unique ID associated with the report, and summarizes which of the standard components and settings were used in analysis.

4.2 Section 2: Input data set summary

    A table showing summary statistics for the input data set. The numbers shown for each gene are calculated by counting only sequences in which a minimum of 20% of the gene in question is covered (i.e. fragments constituting less than 20% of the total gene length are not counted). The number of hypermutated sequences (i.e. sequences presumed to be lethally edited by APOBEC enzymes) identified in the data set is also indicated.
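    The 20% counting rule can be expressed as a short sketch. The function name `count_gene_sequences` and its inputs are hypothetical. The example assumes a gene length of 99 codons, as for PR, and that coverage is measured in codons; how CPR measures coverage internally is not stated here.

```python
from typing import Iterable


def count_gene_sequences(covered_codons: Iterable[int], gene_length: int,
                         min_percent: int = 20) -> int:
    """Count sequences that cover at least min_percent of the gene.

    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    return sum(1 for n in covered_codons
               if n * 100 >= min_percent * gene_length)


# Codons covered per sequence for a 99-codon gene: only the sequences
# covering 99 and 20 codons reach the 20% minimum.
print(count_gene_sequences([99, 19, 20, 0], gene_length=99))  # 2
```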

4.3 Section 3: Drug Resistance Summary

    This section reports the prevalence of resistance in the data set as determined using the selected drug resistance mutation (DRM) list. Resistance to each of three drug classes is given as the proportion of gene sequences in the data set containing at least one mutation on the DRM list. In a population-sampled sequence set obtained from untreated individuals, this provides an estimate of the prevalence of transmitted drug resistance.

4.4 Section 4: Graphical Overview

    If the option is selected, a graphical overview of drug resistance mutations in the input data set is shown. A schematic representation of the PR and RT genes shows the location of primary (i.e. summary) and secondary drug resistance mutations in the submitted data set. Primary and secondary drug resistance mutations are indicated by red and blue markers respectively. Hover the cursor over the markers to display the prevalence of mutations at that position. The RT gene is shown split into two sections (comprising amino acids 1-120 and 121-240 respectively).

4.5 Section 5: Drug resistance mutation prevalences by list

    The prevalences of drug resistance mutations identified in the query data set are listed. If you use the SDRM list to summarize drug resistance, you will also see output for the Borderline-Suspicious mutations, whereas if you use the IAS-USA (major) list, you will also see output for the IAS-USA (minor) mutations. Prevalence in the query data set is shown along with the prevalence of the same mutation in (i) untreated and (ii) treated persons in the Stanford HIV Drug Resistance Database.

    Table footnote: the percentage prevalence of a mutation within the data set is calculated as the number of times the mutation occurred relative to the number of times that codon position was represented in the data set. Codons with a high degree of ambiguity (>4 possible amino acids) due to the presence of undetermined nucleotides or nucleotide mixtures are treated as missing data. Where mixtures of mutations are identified, each mutation in the mixture is listed separately, and each occurrence of a mutation in a mixture is scored the same as if it had occurred alone.
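    The footnote's rules can be sketched for a single codon position. This is an illustration only: the function name `position_prevalence` and the input representation (one set of inferred amino acids per sequence, or None where the position is not covered) are assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Set


def position_prevalence(observed: Sequence[Optional[Set[str]]],
                        reference_aa: str,
                        max_ambiguity: int = 4) -> Dict[str, float]:
    """Percentage prevalence of each mutation at one codon position."""
    represented = 0
    counts: Counter = Counter()
    for amino_acids in observed:
        # Uncovered or highly ambiguous codons are treated as missing data.
        if amino_acids is None or len(amino_acids) > max_ambiguity:
            continue
        represented += 1
        # Each mutation in a mixture scores the same as if it occurred alone.
        for aa in amino_acids:
            if aa != reference_aa:
                counts[aa] += 1
    if represented == 0:
        return {}
    return {aa: 100.0 * n / represented for aa, n in counts.items()}


observed_215 = [{"T"}, {"Y"}, {"N", "T", "Y", "S"}, None,
                {"A", "C", "D", "E", "F"}, {"T"}]
print(position_prevalence(observed_215, reference_aa="T"))
```

    In this example four of the six sequences are counted as representing the position, so Y (seen once alone and once in a mixture) has a prevalence of 50%, and N and S each have 25%.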

4.6 Section 6: Drug resistance mutations by list, sequence and drug class

    Tables showing drug resistance mutations on the selected 'summary list' (i.e. SDRM or IAS-USA (major)) and identified in the data set are shown for each sequence, with mutations grouped into columns according to the main drug class to which they confer resistance.

    Table footnote: in these tables, mutations that occur as part of a mixture are listed along with all the other inferred mutations in the mixture. For example, for the codon WMC at position 215, T215NTYS will be shown.
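    The WMC example can be reproduced by expanding the IUPAC ambiguity codes and translating each resulting codon with the standard genetic code. The sketch below is an illustration, with hypothetical function names `translate_ambiguous` and `format_mutation`. The ordering of amino acids in the output simply follows the expansion order and is an assumption about presentation.

```python
from itertools import product
from typing import List

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_ambiguous(codon: str) -> List[str]:
    """Return every amino acid encoded by a possibly ambiguous codon."""
    amino_acids: List[str] = []
    for bases in product(*(IUPAC[nt] for nt in codon.upper())):
        aa = CODON_TABLE["".join(bases)]
        if aa not in amino_acids:  # keep first-seen order, drop duplicates
            amino_acids.append(aa)
    return amino_acids


def format_mutation(reference_aa: str, position: int, codon: str) -> str:
    """Format a mixture as reference, position, then all inferred residues."""
    return f"{reference_aa}{position}{''.join(translate_ambiguous(codon))}"


print(format_mutation("T", 215, "WMC"))  # T215NTYS
```

    Here W expands to A or T and M to A or C, giving the codons AAC (N), ACC (T), TAC (Y) and TCC (S).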

4.7 Section 7: Genetic diversity by sequence

    An overall summary of genotypes (if genotyping was selected) and mutations in the query sequence set.

    Table footnote: Sequence IDs of hypermutated sequences are highlighted in red. Primary mutations (i.e. 'SDRM' or 'IAS-USA major') are highlighted in red, secondary resistance mutations ('borderline/suspicious' or 'IAS-USA minor') are shown in blue, and unusual mutations are shown in green. Mutations indicative of (potentially) lethal APOBEC3G-mediated editing are shown in purple. ND = not done; U = unclassifiable.

4.8 Section 8: Quality assessment

    The quality assessment section provides an overview of the data set in terms of gene coverage and sequence quality. A plot shows the representation at each codon position in the region analyzed (codons 1-99 of PR and codons 1-240 of RT). Codons that are highly degenerate (due to mixtures or sequencing problems) are treated in the same way as missing data.

Appendix 1: Reference Sequences

Reference sequence for alignment and mutation definition.

    By default, CPR uses the conventional HXB2 sequence (accession no. NC_001802) as a reference for alignment and scoring mutations.

Reference sequences for genotyping

    Reference sequences used to create scoring matrices for STAR genotyping can be accessed here. For details of STAR genotyping, see Appendix 3.

Appendix 2: Mutation Lists

Surveillance drug resistance mutation (SDRM) list

    The surveillance drug resistance mutation (SDRM) list is intended to provide a simple, unambiguous and stable measure of transmitted drug resistance (TDR) in HIV-1 (Shafer et al). When used to assess resistance in a population-sampled set of HIV-1 sequences obtained from untreated individuals, the SDRM list provides an estimate of transmitted drug resistance in accordance with WHO guidelines. Mutations on the SDRM list have been selected for their suitability as indicators of TDR and conform to the following criteria: (i) they are commonly recognized as causing or contributing to resistance; (ii) they are nonpolymorphic in untreated persons; and (iii) they are applicable to all HIV-1 subtypes. The SDRM list is associated with a secondary set of 'borderline/suspicious' mutations (see below). These mutations are not used in summary calculations of TDR, but are highlighted along with SDRMs to provide additional information that may be useful in the evaluation of results.

Borderline/Suspicious mutation list

    Mutations on the borderline/suspicious list are those that either (i) have demonstrable associations with drug treatment and do not occur as natural polymorphisms at a level above 0.5% in untreated individuals (based on current data), but have been excluded from the SDRM list because they do not fulfil the criterion of being widely recognised as treatment-associated mutations, or (ii) have some demonstrable association with treatment but also occur naturally, either as rare polymorphisms close to the cut-off point for consideration as natural polymorphisms (e.g. between 0.5 and 1.0%), or as polymorphisms that are generally uncommon but that occur at elevated frequencies in certain subtypes.

IAS-USA drug resistance mutation list

    The International AIDS Society-USA (IAS-USA) maintains a list of HIV-1 mutations known to contribute to drug resistance (Johnson et al (2007)). CPR supports the use of the IAS-USA list to summarize drug resistance within a query data set. The IAS-USA list distinguishes between major and minor mutations. Mutations in the minor group include natural polymorphisms, and therefore only the IAS-USA major mutations are suitable for surveillance of transmitted drug resistance.

APOBEC3G-mediated defective (A3GD) mutation list

    HIV-1 sequences occasionally contain an excess of guanine (G) to adenine (A) substitutions introduced by the sequence editing activity of host enzymes belonging to the APOBEC family of cytidine deaminases, most notably APOBEC3G. Although it has been suggested that some degree of sub-lethal editing by APOBEC enzymes may contribute to HIV-1 evolution, extensive G-to-A editing generally leads to mutational impairment of viruses.

    Sequence variation in lethally edited viruses reflects qualitatively different biological processes to variation in viable viral genomes (i.e. sequence editing as opposed to purifying selection). It is therefore useful to identify lethally edited sequences in analyses that assume data to represent viable genetic material under selection, such as genotypic estimation of drug resistance. The 'A3GD' mutations are rare substitutions that are commonly found in sequences that have been extensively edited by APOBEC3G, but are uncommon in other sequences. The occurrence of three or more A3GD mutations within a single PR-RT sequence is taken as indicating a >99% probability of a background of lethal, APOBEC-mediated editing.
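    The three-mutation rule can be sketched as a simple set intersection. The function name `is_hypermutated` is hypothetical, and the entries in `EXAMPLE_A3GD` are made-up placeholders used only to make the example runnable; they are not taken from the actual A3GD list.

```python
from typing import Iterable


def is_hypermutated(mutations: Iterable[str], a3gd_list: Iterable[str],
                    threshold: int = 3) -> bool:
    """Flag a PR-RT sequence carrying `threshold` or more distinct A3GD mutations."""
    return len(set(mutations) & set(a3gd_list)) >= threshold


# Placeholder entries for illustration only (not the real A3GD list).
EXAMPLE_A3GD = {"PR:X1", "PR:X2", "RT:X3", "RT:X4"}

print(is_hypermutated({"PR:X1", "RT:X3", "RT:X4", "RT:M184V"}, EXAMPLE_A3GD))  # True
print(is_hypermutated({"PR:X1", "RT:M184V"}, EXAMPLE_A3GD))  # False
```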

Unusual mutation list

    The Stanford HIV Drug Resistance Database is updated regularly with new sequence data and maintains a list of 'typical' mutations in the protease-RT region of HIV-1 group M viruses. Mutations that are not on this list (i.e. unusual mutations) may represent rare polymorphisms or novel drug resistance mutations, but are also likely to represent sequence errors or artefacts introduced during conceptual translation (when attempting to infer codons from sequences containing mixtures). Unusual mutations are highlighted in sections 7 and 8 of the CPR report.

Appendix 3: Genotyping and phylogenetic analysis

Genotyping

    STAR genotyping

    STAR analysis utilizes position-specific scoring matrices (PSSMs) representing nucleotide and amino acid frequencies at each sequence position in aligned sets of sequences representing the principal genetic groups (i.e. genotypes) of the virus under analysis. Query sequences are analysed using PSSMs and a normalised P-distance score (z-score) is derived (Myers et al, 2005). An empirically determined z-score cut-off of 2.5 is used as the threshold of statistical confidence for assignment of query sequences to reference lineages. Sequences that score below this threshold are left unassigned, indicating that they are potentially divergent and/or recombinant.
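    The final assignment step can be sketched as follows, assuming the z-scores against each reference lineage have already been computed. The PSSM scoring itself is not reproduced here, and the function name `assign_genotype` and the example scores are hypothetical.

```python
from typing import Dict, Optional


def assign_genotype(z_scores: Dict[str, float],
                    cutoff: float = 2.5) -> Optional[str]:
    """Return the best-scoring lineage, or None if it scores below the cut-off."""
    if not z_scores:
        return None
    best = max(z_scores, key=z_scores.get)
    # Scores below the cut-off leave the sequence unassigned.
    return best if z_scores[best] >= cutoff else None


print(assign_genotype({"A": 0.4, "B": 3.1, "C": 1.2}))  # B
print(assign_genotype({"A": 1.9, "B": 2.1, "C": 1.2}))  # None
```

    A result of None corresponds to a sequence that is left unassigned, i.e. one that is potentially divergent and/or recombinant.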

5. References
