1. Overview
  2. Input
    1. Query dataset
    2. Surveillance drug-resistance mutation (SDRM) list
  3. Processing
    1. Sequence alignment and amino acid translation
    2. Sequence and SDRM quality assessment
    3. Proportions of sequences with SDRMs by drug class
    4. Phylogenetic analysis
  4. Output
    1. Summary
    2. Methods
    3. QA details
    4. Complete mutation list
    5. Genetic Diversity
  5. Appendices
    1. Appendix 1: SDRM lists
    2. Appendix 2: APOBEC3G/F hypermutations
    3. Appendix 3: Highly unusual amino acid mutations
    4. Appendix 4: Highly complex and rare mixtures
    5. Appendix 5: Sequence & SDRM quality assessment criteria
  6. References

1. Overview

The CPR (Calibrated Population Resistance) tool is a program for analyzing populations of human immunodeficiency virus type 1 (HIV-1) sequences. CPR provides a standard approach for determining the proportion of submitted sequences containing a mutation suggestive of transmitted HIV-1 drug resistance.

CPR currently provides two standard mutation lists (SDRMs: surveillance drug-resistance mutations) as indicators of transmitted resistance. CPR ensures consistency in the analysis of molecular epidemiologic studies by providing investigators access to a standard protocol for handling missing data and identifying sequence artifacts. Unlike the HIVdb interpretation program, CPR does not provide a drug resistance interpretation.

 

2. Input

2.1 Query dataset

CPR accepts a set of FASTA-formatted HIV-1 reverse transcriptase (RT) or protease (PR) gene nucleotide sequences as input. Sequences can be pasted into the textbox or uploaded as a file containing up to 500 non-interleaved FASTA sequences. Consistent with the FASTA format, each sequence should be preceded by a line containing ">" followed by a sequence name, optionally followed by additional descriptors separated by pipes ("|").
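The input format described above can be illustrated with a minimal Python sketch. This is not CPR's own parser; the function name `parse_fasta` and the constant `MAX_SEQUENCES` are hypothetical, and the sketch assumes the first pipe-separated field of the header is the sequence name.

```python
from typing import List, Tuple

MAX_SEQUENCES = 500  # limit stated above for an uploaded file


def parse_fasta(text: str) -> List[Tuple[str, List[str], str]]:
    """Return (name, descriptors, sequence) for each FASTA record in text."""
    records = []
    name, descriptors, chunks = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, descriptors, "".join(chunks)))
            fields = [field.strip() for field in line[1:].split("|")]
            name, descriptors, chunks = fields[0], fields[1:], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, descriptors, "".join(chunks)))
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than %d sequences submitted" % MAX_SEQUENCES)
    return records
```

For example, a header line such as ">s1|2005|US" yields the name "s1" and the descriptors "2005" and "US".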

2.2 Surveillance drug-resistance mutation (SDRM) list

The surveillance drug-resistance mutations (SDRMs) were designed to be sensitive and specific indicators of antiretroviral drug selection pressure. A standard list of SDRMs makes it possible to compare the results of sequencing studies performed in different regions or at different times. Mutations on the current SDRM list were selected for their suitability as indicators of transmitted resistance: (a) they are commonly recognized as causing or contributing to resistance; (b) they are nonpolymorphic in untreated persons; and (c) they are applicable to all HIV-1 subtypes.

The WHO 2009 SDRM list [appendix 1] is currently the default list. However, users can also analyze their sequence data using the older WHO 2007 SDRM list. If additional changes to the SDRM list are made, users will continue to have the option of analyzing their data using previous lists.

 

3. Processing

3.1 Sequence alignment and amino acid translation

Each submitted nucleotide sequence is aligned to the consensus subtype B amino acid reference sequence using a nucleotide to amino acid sequence local alignment program ("Lap.c" by X Huang, Genomics 1996). The aligned nucleotide sequence is translated in the correct reading frame. Nucleotide triplets containing IUPAC ambiguities (e.g. R indicates a mixture of A and G) are translated into each of the possible amino acids they encode. For example, ATR indicates a mixture of ATA (Isoleucine; I) and ATG (Methionine; M). Nucleotide triplets encoding >4 possible different amino acids or containing an 'N' (a highly ambiguous nucleotide indicating 'G', 'A', 'T' or 'C') at the first or second nucleotide in a codon are translated to "X". Amino acid differences from the reference sequence are referred to as mutations.
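The translation rule for ambiguous triplets can be sketched in Python as follows. This is an illustration of the rule as stated above, not CPR's source code; the function name `translate_codon` is hypothetical, and the sketch assumes the standard genetic code and an input made up only of IUPAC nucleotide letters.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one triplet; mixtures give every encoded amino acid."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three nucleotides")
    # An 'N' at the first or second position is translated to X.
    if "N" in codon[:2]:
        return "X"
    amino_acids = {
        CODON_TABLE[a + b + c]
        for a in IUPAC[codon[0]]
        for b in IUPAC[codon[1]]
        for c in IUPAC[codon[2]]
    }
    # More than four possible amino acids is also translated to X.
    if len(amino_acids) > 4:
        return "X"
    return "".join(sorted(amino_acids))
```

With this sketch, ATR translates to the mixture "IM", GGN translates to "G" because all four expansions encode glycine, and NTG translates to "X".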

RT sequences with fewer than 100 amino acid positions, protease sequences with fewer than 50 amino acid positions, or sequences of either gene with <50% identity to the consensus reference sequences will not be analyzed further.
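A minimal Python sketch of this pre-filter is shown below. The document does not say how identity is computed, so the sketch assumes it is the fraction of aligned positions at which the reference amino acid appears in the query (mixtures count as a match if they include it). The names `passes_prefilter`, `MIN_POSITIONS`, and `MIN_IDENTITY` are hypothetical.

```python
from typing import Sequence

MIN_POSITIONS = {"RT": 100, "PR": 50}  # minimum amino acid positions per gene
MIN_IDENTITY = 0.50                    # minimum identity to the reference


def passes_prefilter(gene: str,
                     query_aas: Sequence[str],
                     reference_aas: Sequence[str]) -> bool:
    """Return True if an aligned sequence is long and similar enough.

    query_aas holds one string per aligned position, e.g. "I" or the
    mixture "IM"; reference_aas holds the reference amino acid at the
    same positions.
    """
    if len(query_aas) != len(reference_aas):
        raise ValueError("query and reference must cover the same positions")
    if len(query_aas) < MIN_POSITIONS[gene]:
        return False
    # Assumption: a position matches if the reference residue is present.
    matches = sum(1 for q, r in zip(query_aas, reference_aas) if r in q)
    return matches / len(query_aas) >= MIN_IDENTITY
```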

3.2 Sequence and SDRM quality assessment

Sequences are assessed for their completeness and quality. Indicators of decreased quality include stop codons, frame-shifts, highly ambiguous nucleotides (B, D, H, V and N), evidence for APOBEC3G/F hypermutation [appendix 2], previously unpublished insertions and deletions, and a marked excess of highly unusual amino acid mutations [appendix 3].

Each SDRM is also examined for its reliability by assessing whether the mutation could be a result of a regional sequence artifact. To this end, we determine whether (a) the SDRM resulted from a GG → AG (APOBEC3G) or GA → AA (APOBEC3F) mutation in a sequence that has evidence for APOBEC-mediated hypermutation; (b) the SDRM is a possible artifact of a gap present within the alignment; (c) the SDRM is adjacent to a stop codon, an amino acid translated to an 'X', or ≥2 highly unusual mutations; and (d) the SDRM is part of a complex mixture that cannot be explained as a transition from wild-type to mutant or vice versa [appendix 4].

The QA criteria for sequence inclusion and SDRM exclusion are listed in appendix 5. Subtle changes to the QA criteria are likely to be made in the future. These changes will result in a new version number for the program.

3.3 Proportions of sequences with SDRMs by drug class

The following proportions of sequences containing an SDRM are calculated:

(a) proportion of sequences containing at least one NRTI, NNRTI, or PI SDRM;
(b) proportion of sequences containing at least one NRTI SDRM (whether or not NNRTI or PI SDRMs are also present);
(c) proportion of sequences containing at least one NNRTI SDRM (whether or not NRTI or PI SDRMs are also present);
(d) proportion of sequences containing at least one PI SDRM (whether or not NRTI or NNRTI SDRMs are also present);
(e) proportion of sequences containing both NRTI and NNRTI SDRMs;
(f) proportion of sequences containing NRTI, NNRTI, and PI SDRMs.
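
The six proportions listed above can be sketched in Python as follows. This is an illustration only: the function name `sdrm_proportions` and the result keys are hypothetical, and the sketch assumes a single denominator (all sequences passed in), whereas the tool may use gene-specific or position-specific denominators.

```python
from typing import Dict, Mapping, Set


def sdrm_proportions(classes_by_sequence: Mapping[str, Set[str]]) -> Dict[str, float]:
    """Compute proportions (a) to (f) from the drug classes hit per sequence.

    classes_by_sequence maps a sequence name to the set of drug classes
    ("NRTI", "NNRTI", "PI") for which it carries at least one SDRM.
    """
    total = len(classes_by_sequence)
    if total == 0:
        raise ValueError("no sequences to analyze")

    def fraction(predicate) -> float:
        hits = sum(1 for classes in classes_by_sequence.values() if predicate(classes))
        return hits / total

    return {
        "any": fraction(lambda c: bool(c & {"NRTI", "NNRTI", "PI"})),  # (a)
        "NRTI": fraction(lambda c: "NRTI" in c),                        # (b)
        "NNRTI": fraction(lambda c: "NNRTI" in c),                      # (c)
        "PI": fraction(lambda c: "PI" in c),                            # (d)
        "NRTI+NNRTI": fraction(lambda c: {"NRTI", "NNRTI"} <= c),       # (e)
        "NRTI+NNRTI+PI": fraction(lambda c: {"NRTI", "NNRTI", "PI"} <= c),  # (f)
    }
```

Note that categories (b) through (f) overlap: a sequence with NRTI, NNRTI, and PI SDRMs is counted in every one of them.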

3.4 Phylogenetic analysis

A phylogenetic tree is created from the submitted sequences and a set of reference sequences belonging to subtypes A1, A2, B, C, D, F1, F2, G, H, J, and K and to the recombinant forms CRF01_AE and CRF02_AG. The tree is created by the PAUP program using the neighbor-joining method applied to a matrix of genetic distances calculated using the HKY85 substitution model and a gamma distribution at variable sites. The tree is rooted with a group N sequence.

The median pairwise distance of submitted sequences is also calculated using PAUP.
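
To make the median pairwise distance concrete, the following Python sketch computes it from aligned sequences. It deliberately swaps in a simple uncorrected p-distance (the proportion of differing sites) in place of the HKY85 plus gamma distances that PAUP calculates, so its values will not match CPR's output. The function names are hypothetical.

```python
from itertools import combinations
from statistics import median
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, comparing unambiguous bases only."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip gaps and ambiguity codes
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def median_pairwise_distance(sequences: Sequence[str]) -> float:
    """Median of the distances over all unordered pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    return median(p_distance(a, b) for a, b in combinations(sequences, 2))
```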

 

4. Output

The CPR report consists of a Summary page and four additional detailed reports accessible using the tabs located at the top of the page: Summary, Methods, QA Details, Complete Mutation List, and Genetic Diversity.

4.1 Summary

4.1.1 Date of CPR report generation, with options for downloading the summary page as a PDF and the CPR analysis details as an Excel file.

4.1.2 Number of input sequences

A table summarizing the numbers of sequences by gene in the input data set. This table also notes whether sequences were filtered because they were too short or were considered to have too many errors to be analyzable.

The table contains five rows: (a) sequences: number of sequences submitted regardless of gene or sequence length; (b) sequences containing either RT or PR: sequences of sufficient length containing RT and/or PR; (c) sequences containing RT (±PR); (d) sequences containing PR (±RT); (e) sequences containing both RT and PR. If sequences are filtered, the table will have a footnote listing these sequences and the criteria used for filtering.

4.1.3 SDRM position coverage

For each ARV drug class, a table lists for each SDRM position (a) the number of sequences not encompassing an SDRM position, (b) the number of sequences for which the SDRM position was sequenced but considered not evaluable, and (c) the number of sequences that were evaluable for the presence of the SDRM.
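
The three counts in this table can be sketched in Python as below. The representation of a sequence as a (first position, last position, excluded positions) tuple and the name `position_coverage` are assumptions made for illustration; they mirror the "first pos", "last pos", and "SDRM position excluded" fields of the QA Details table but are not CPR's internal data model.

```python
from typing import Dict, Iterable, Set, Tuple

# (first amino acid position, last amino acid position, excluded SDRM positions)
SequenceQA = Tuple[int, int, Set[int]]


def position_coverage(position: int, sequences: Iterable[SequenceQA]) -> Dict[str, int]:
    """Count how many sequences fall in each coverage category for one position."""
    counts = {"not_covered": 0, "not_evaluable": 0, "evaluable": 0}
    for first_pos, last_pos, excluded in sequences:
        if not first_pos <= position <= last_pos:
            counts["not_covered"] += 1      # (a) position not sequenced
        elif position in excluded:
            counts["not_evaluable"] += 1    # (b) sequenced but excluded by QA
        else:
            counts["evaluable"] += 1        # (c) usable for SDRM counting
    return counts
```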

4.1.4 Proportion of sequences with SDRMs

(a) Proportion of sequences containing at least one NRTI, NNRTI, or PI SDRM; (b) proportion of sequences containing at least one NRTI SDRM (whether or not NNRTI or PI SDRMs are also present); (c) proportion of sequences containing at least one NNRTI SDRM (whether or not NRTI or PI SDRMs are also present); (d) proportion of sequences containing at least one PI SDRM (whether or not NRTI or NNRTI SDRMs are also present); (e) proportion of sequences containing both NRTI and NNRTI SDRMs; (f) proportion of sequences containing NRTI, NNRTI, and PI SDRMs.

4.1.5 Sequences with SDRMs

A table listing each of the SDRMs for those sequences containing one or more SDRMs.

4.2 Methods

This page summarizes (a) the list of SDRMs, (b) sequence inclusion criteria and (c) SDRM position exclusion criteria used for CPR analysis.

4.3 QA Details

4.3.1 Number of input sequences

The same table as the first table on the Summary page [4.1.2].

4.3.2 QA analysis results of each submitted sequence

Table listing the QA analysis results for each sequence and for each gene:

(a) gene - gene name (PR or RT). RT and PR sequences from the same individual will follow one another;
(b) first pos - the first amino acid position sequenced;
(c) last pos - the last amino acid position sequenced;
(d) No. of stops, insertions/deletions and Xs - the total number of stop codons + previously unpublished insertions/deletions + codons translated to X;
(e) No. A3GF - number of APOBEC3G/F hypermutations;
(f) No. highly unusual - number of highly unusual mutations;
(g) Sequence filtered - a check mark if the sequence does not meet a sequence inclusion criterion;
(h) SDRM position excluded - a list of SDRM positions excluded from counting as SDRMs if any;
(i) Mutation list - the complete list of amino acid differences from the consensus reference sequence. Stop codons, insertions/deletions, and Xs are shown in orange, APOBEC3G/F hypermutations in yellow, highly unusual mutations in gray, and SDRMs in red.

4.4 Complete mutation list

4.4.1 Proportion of sequences with SDRMs

The same table as the table [4.1.4] on the Summary page.

4.4.2 Sequences with SDRMs

The same table as the table [4.1.5] on the Summary page.

4.4.3 Number of sequences according to number of SDRMs

A table listing the number of sequences according to the number of SDRMs for each drug class.

4.4.4 SDRM analysis results of each sequence

A table listing for each sequence: (a) a complete PR mutation list; (b) a complete RT mutation list; (c) number of PI SDRMs; (d) number of NRTI SDRMs; (e) number of NNRTI SDRMs. SDRMs are indicated in red.

4.5 Genetic Diversity

A simple representation of the phylogenetic tree of the submitted sequences. A number is assigned to each sequence name so that every name in the tree is unique. For sequences containing SDRMs, the list of SDRMs is included in the sequence name, which is colored red. The names of the subtype reference sequences and of the group N sequence used for rooting are shown in blue. The median pairwise distance of the submitted sequences is shown at the top of the tree.

The NEXUS tree file can be downloaded for viewing the tree with tree-viewing tools. The distance matrix created by PAUP can also be downloaded.

 

5. Appendices

Appendix 1: SDRM lists

SDRM 2007 (Shafer, et al, AIDS 2007)

SDRM 2009 (Bennett, et al, PLoS One 2009)
 

Appendix 2: APOBEC3G/F hypermutations

HIV-1 sequences occasionally contain an excess of guanine (G) to adenine (A) substitutions introduced by the sequence editing activity of host enzymes belonging to the APOBEC family of cytidine deaminases, most notably APOBEC3G (GG → AG) and APOBEC3F (GA → AA). Although it has been suggested that some degree of sub-lethal editing by APOBEC enzymes may contribute to HIV-1 evolution, extensive G-to-A editing generally leads to mutational impairment of viruses. Sequence variation in lethally edited viruses reflects qualitatively different biological processes to variation in viable viral genomes (i.e. sequence editing as opposed to purifying selection). It is therefore useful to identify lethally edited sequences in analyses that assume data to represent viable genetic material under selection, such as genotypic estimation of drug resistance.

The APOBEC3G/F hypermutations (Gifford, et al. AIDS 2008) are rare substitutions that are commonly found in sequences that have been extensively edited by APOBEC3G/F, but are uncommon in other sequences. The occurrence of three or more APOBEC3G/F mutations within a single PR-RT sequence is taken as indicating a >99% probability of a background of lethal, APOBEC-mediated editing.

APOBEC3G/F v.1
 

Appendix 3: Highly unusual amino acid mutations

The Stanford HIV Drug Resistance Database is updated regularly with new sequence data and maintains a list of 'typical' mutations in the protease-RT region of HIV-1 group M viruses. Mutations that are not on this list (i.e. unusual or atypical mutations) may represent rare polymorphisms, novel drug resistance mutations, or artifacts introduced during sequencing or conceptual translation (e.g. when attempting to infer codons from sequences containing nucleotide mixtures).

Typical/usual mutations v.1: protease, RT
 

Appendix 4: Highly complex and rare mixtures

Mixtures of three or more amino acids that are not included in the list of allowable complex mixtures.

Allowable complex mixtures v.1
 

Appendix 5: Sequence & SDRM quality assessment criteria

Version 1: July, 2011

Table 1. Sequence inclusion criteria
Criterion | RT | PR
Minimum amino acid (AA) coverage | Positions 65 to 215 | Positions 30 to 90
Maximum number of stop codons + frame-shifts + unpublished AA insertions or deletions + highly ambiguous nucleotides (B, D, H, V, N) | 4 | 2
Maximum number of APOBEC3G/F hypermutated AAs (v.1) | 3 | 2
Maximum number of highly unusual AA mutations (v.1) | 15 | 8
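
The thresholds in Table 1 can be applied as in the following Python sketch. It is an illustration rather than CPR's code: the names `InclusionCriteria`, `CRITERIA`, and `is_included` are hypothetical, and the sketch assumes that "minimum coverage" means the sequence must span the whole listed range and that a count equal to the maximum still passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InclusionCriteria:
    first_required: int  # first AA position that must be covered
    last_required: int   # last AA position that must be covered
    max_defects: int     # stops + frame-shifts + unpublished indels + B/D/H/V/N
    max_apobec: int      # APOBEC3G/F hypermutated AAs
    max_unusual: int     # highly unusual AA mutations


# Values taken from Table 1 (version 1, July 2011).
CRITERIA = {
    "RT": InclusionCriteria(65, 215, 4, 3, 15),
    "PR": InclusionCriteria(30, 90, 2, 2, 8),
}


def is_included(gene: str, first_pos: int, last_pos: int,
                defects: int, apobec: int, unusual: int) -> bool:
    """Return True if a sequence meets every inclusion criterion for its gene."""
    c = CRITERIA[gene]
    return (
        first_pos <= c.first_required
        and last_pos >= c.last_required
        and defects <= c.max_defects
        and apobec <= c.max_apobec
        and unusual <= c.max_unusual
    )
```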

Table 2. SDRM exclusion criteria
PR SDRM D30N, M46I, G73S and RT SDRM D67N, M184I, G190SE in sequences containing ≥2 APOBEC3G/F hypermutated AAs
SDRMs adjacent to an insertion or deletion or frame-shift
SDRMs adjacent to ≥2 highly unusual AA mutations, AAs with highly ambiguous nucleotides, or stop codons
SDRMs as a part of a highly complex mixture (v.1)
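
The first criterion in Table 2 is specific enough to sketch in Python. The sketch below covers only that criterion, since the other three depend on definitions (adjacency, the list of allowable mixtures) that are not spelled out here. The names are hypothetical, and "G190SE" is read as the two mutations G190S and G190E.

```python
# SDRMs named in the first row of Table 2.
APOBEC_SENSITIVE_SDRMS = {
    "PR": {"D30N", "M46I", "G73S"},
    "RT": {"D67N", "M184I", "G190S", "G190E"},
}
MIN_APOBEC_AAS = 2  # ">=2 APOBEC3G/F hypermutated AAs"


def excluded_for_apobec(gene: str, sdrm: str, apobec_aa_count: int) -> bool:
    """Return True if the SDRM should not be counted because of APOBEC editing."""
    return (
        apobec_aa_count >= MIN_APOBEC_AAS
        and sdrm in APOBEC_SENSITIVE_SDRMS[gene]
    )
```

Under this reading, M184I in an RT sequence with two APOBEC3G/F hypermutated amino acids is excluded, while M184V in the same sequence is still counted.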

 

6. References

Bennett DE, Camacho RJ, Otelea D, Kuritzkes DR, Fleury H, Kiuchi M, Heneine W, Kantor R, Jordan MR, Schapiro JM, Vandamme AM, Sandstrom P, Boucher CA, van de Vijver D, Rhee SY, Liu TF, Pillay D, Shafer RW (2009). Drug resistance mutations for surveillance of transmitted HIV-1 drug-resistance: 2009 update. PLoS One 4(3):e4724.

Gifford RJ, Rhee SY, Eriksson N, Liu TF, Kiuchi M, Das AK, Shafer RW (2008). Sequence editing by Apolipoprotein B RNA-editing catalytic component and epidemiological surveillance of transmitted HIV-1 drug resistance. AIDS 22(6):717-25.

Shafer RW, Rhee SY, Pillay D, Miller D, Sandstrom P, Schapiro JM, Kuritzkes DR, Bennett D (2007). HIV-1 protease and reverse transcriptase mutations for drug resistance surveillance. AIDS 21:215-23.
