Consensus Integrase of a New HIV-1 Genetic Variant CRF63_02A1.

The high genetic variability of the human immunodeficiency virus (HIV-1) leads to a constant emergence of new genetic variants, including the recombinant virus CRF63_02A1, which is widespread in the Siberian Federal District of Russia. We studied HIV-1 CRF63_02A1 integrase (IN_CRF) catalyzing the incorporation of viral DNA into the genome of an infected cell. The consensus sequence was designed, recombinant integrase was obtained, and its DNA-binding and catalytic activities were characterized. The stability of the IN_CRF complex with the DNA substrate did not differ from the complex stability for subtype A and B integrases; however, the rate of complex formation was significantly higher. The rates and efficiencies of 3'-processing and strand transfer reactions catalyzed by IN_CRF were found to be higher, too. Apparently, all these distinctive features of IN_CRF may result from specific amino acid substitutions in its N-terminal domain, which plays an important role in enzyme multimerization and binding to the DNA substrate. It was also found that the drug resistance mutations Q148K/G140S and G118R/E138K significantly reduce the catalytic activity of IN_CRF and its sensitivity to the strand transfer inhibitor raltegravir. Reduction in sensitivity to raltegravir was found to be much stronger in the case of double-mutation Q148K/G140S.


INTRODUCTION
Human immunodeficiency virus type 1 (HIV-1) has high genetic variability, resulting in the occurrence of various subtypes, circulating recombinant forms (CRFs), and unique recombinant forms (URFs) [1]. The wide variety of HIV-1 genetic variants results from the high replication rate, the tendency toward recombination and reverse transcriptase errors [2][3][4]. Different HIV-1 subtypes have different geographical distributions. Subtype A prevails within the former USSR [5,6], but there are also various CRFs [7][8][9]. Since 2010-2012, a new HIV-1 genetic variant, CRF63_02A1, has been found to dominate in regions of the Russian Federation characterized by the highest HIV epidemic rates, such as the Kemerovo, Novosibirsk, Tomsk, and the Altai regions [10][11][12]. Because of its rapid spread, this genetic variant requires in-depth research.
An important step in studying a new form of HIV-1 is the characterization of its enzymes, the integrase (IN) that catalyzes the integration of viral DNA into the genome of the infected cell being one of them [13]. Three IN inhibitors -raltegravir, elvitegravir and dolutegravir -are currently used as components of antiretroviral therapy [14]. However, the emergence of viral resistance to these inhibitors has been identified [15,16]. It is known that both mutations conferring drug resistance and the mechanisms of their occurrence in viruses of different subtypes may vary [17][18][19][20][21][22]. In this regard, it is important to study the effect of the natural polymorphism of IN on its properties.
In this study, we have characterized IN of a new HIV-1 genetic variant, CRF63_02A1 (IN_CRF), and compared it to that of HIV-1 subtype A (IN_A), which is also widespread in Russia. In particular, we have studied the influence of the structural differences between the enzymes on their DNA-binding and catalytic activities. The influence of drug resistance mutations on the catalytic and DNA-binding activity of IN_CRF, as well as its sensitivity to the strand transfer inhibitor raltegravir, has also been analyzed.

Designing the consensus sequence of IN_CRF
The HIV-1 subtype was determined using a phylogenetic and recombination analysis according to the procedure described earlier in [10,11,23]. The IN_CRF consensus sequence was created using the BioEdit software (Ibis Biosciences, Carlsbad, CA).
HIV-1 RNAs were isolated using a commercial Realbest DeltaMag HBV/HCV/HIV kit (Vector-Best JSC, Russia) from clinical blood plasma samples (250 µl) from two treatment-naïve patients infected with HIV-1 variants that carried the IN genes most similar to the calculated IN_CRF consensus sequence. DNA fragments (878 bps) encoding IN_CRF were prepared by RT-PCR from the isolated RNA samples using a commercial LongRange 2Step RT-PCR kit (Qiagen, USA) and primers containing restriction sites for subsequent cloning.

Preparation of the vector encoding IN_CRF
DNA fragments encoding IN_CRF were ligated in the plasmid pCR_2.1Topo using the commercial TOPO® TA Cloning® Kit (pCR™2.1-TOPO®, Thermo Fisher Scientific Inc., USA). Plasmid DNA was isolated from 60 pCR_2.1Topo_IN clones (30 clones for each HIV-1 variant) using a commercial Plasmid Purification Mini Kit (Qiagen, USA); all the DNA samples were sequenced. Plasmid pCR_2.1Topo_IN_CRF* containing an IN sequence differing from the consensus one by two amino acid substitutions was selected for further sub-cloning in expressing vector pET_15b in frame with the codons for the N-terminal His 6 -tag (His-tag) (Novagen, USA).
The vector pET_15b_IN_CRF with the consensus IN_CRF sequence was obtained from the vector pET-15b_IN_CRF* by sequential site-directed mutagenesis, resulting in the amino acid substitutions I32V and I259V. The vectors encoding IN_CRF with substitutions Q148K/G140S and G118R/E138K were prepared by site-directed mutagenesis of the plasmid pET_15b_ IN_CRF using a QuikChange II Site-Directed mutagenesis kit (Agilent Technologies, USA).
The prokaryotic expression vector pET_15b carrying the gene of IN_A was a kind gift from M.G. Belikova-Isaguliants (Ivanovsky Institute of Virology, Russia).

Preparation of recombinant proteins
Consensus IN_CRF and IN_A proteins and those with the mutations Q148K/G140S and G118R/E138K were expressed in Escherichia coli strain Rosetta (DE3) (Novagen) and purified according to [24,25]. The proteins were analyzed by 12% Laemmli PAGE, followed by staining with SimplyBlue TM SafeStain (Invitrogen, USA).
The radioactive 32 P-label was inserted at the 5'-end of the U5B and U5B-2 oligonucleotides, and DNA substrates were formed as described in [25].

Integrase DNA binding activity assays
The kinetics of DNA binding by IN was studied using a fluorescence polarization assay on a Cary Eclipse Fluorescence Spectrophotometer (Varian, USA) according to [26]. The duplex 5'-Fl-U5B/U5A (10 nM) was incubated with 100 nM IN in 200 µl of buffer A (20 mm HEPES (pH 7.2), 10 mm DTT, 7.5 mm MgCl 2 ) at 25°C, and the values of fluorescence polarization of fluorescein (λ ex = 492 nm, λ em = 520 nm) were recorded at certain time points. A curve corresponding to the time dependence of changes in fluorescence polarization was constructed, and the binding rate constant (k on ) was calculated using the equation [IN/DNA] = [DNA] 0 × (1-e -kon*t ) [27].
The dissociation constant (K d ) of the IN/DNA complex was determined using the DRaCALA method (Differential Radial Capillary Action of Ligand Assay) [28]. The U5B/U5A duplex (5 nM) with the 32 P-labeled U5B-chain was incubated with IN at different concentrations (0-500 nM) in 10 µl of buffer A for 20 min at 25°C. Then, 5 µl aliquots of the mixture were applied on the Amersham TM Hybond TM -ECL nitrocellulose membrane. A Typhoon FLA9500 Phosphorimager (GE Healthcare, USA) was used for membrane analysis and quantification.

Integrase catalytic activity assays
For the 3'-end processing reaction, 5 nM U5B/U5A duplex (with 32 P-labeled U5B-chain) was incubated with 100 nM IN in buffer A as described in [25]. The reaction products were precipitated and analyzed by electrophoresis in 20% polyacrylamide/7 M urea gel in TBE buffer. Autoradiographic data analysis was performed using a GE Typhoon FLA 9500 scanner (GE Healthcare, USA). The efficiency of 3'-processing was determined using the ImageQuantTM 5.0 software as the ratio between the intensities of the bands corresponding to the U5B substrate and the reaction product U5B-2.
When analyzing the accumulation kinetics of the 3'-processing product, the reaction mixture was incubated at 37°C. The incubation time was varied from 5 min to 7 h; the reaction efficiency versus time was plotted. The initial reaction rate was determined from the angle of inclination of the initial section of the kinetic curve (~60 min).
The dependence between the 3'-processing efficiency and substrate concentration was determined by varying the DNA concentration (0; 2.5; 4; 10; 20; 50; 100 nM). The graphs showing the reaction efficiency versus substrate concentration were plotted; the maximum reaction rate (V max ) and the Michaelis constant (K M ) were determined.
For the strand transfer reaction, 10 nM U5B-2/ U5A duplex (with 32 P-labeled U5B-2) was incubated with 100 nM IN in buffer A for 2 and 4 h at 37°C. The reaction products were separated and analyzed as described above.

Inhibition of the strand transfer reaction
The strand transfer reaction was carried out as described above for 2 h in the presence of increasing inhibitor concentrations (raltegravir, Santa Cruz Biotechnology Inc., USA). The IC 50 value was determined based on the results of three independent experiments.

RESULTS AND DISCUSSION
Designing the consensus IN_CRF sequence and protein purification IN genes from 324 HIV-1 isolates from HIV-infected treatment-naïve patients in the Siberian (n = 250) and Ural (n = 74) federal districts of Russia were sequenced. Phylogenetic analysis showed that genetic variants of subtypes A (24.3%) and B (3.3%) were present, as well as recombinant forms CRF63_02A1 (55.3%) and various URFs formed as a result of secondary recombination of HIV-1 CRF63_02A1 and subtype A (6.7%).
Multiple alignment of the identified nucleotide sequences of CRF63_02A1 IN was performed, followed by translation, construction of the consensus amino acid IN sequence and its alignment with the  Among all the studied HIV-1 IN sequences, we selected one variant that was the closest to the consensus IN_CRF. Preparation of the cDNA encoding IN_CRF and its consecutive cloning allowed us to obtain the pET-15b_IN_CRF* vector, which was then subjected to site-directed mutagenesis to introduce the I32V and I259V substitutions and to obtain the pET-15b_IN_CRF expression vector coding for a consensus IN sequence. Genetic constructions with the drug resistance mutations G118R/E138K and Q148K/G140S were obtained by site-directed mutagenesis of pET-15b_IN_CRF. The constructed vectors were used for prokaryotic expression of recombinant IN proteins and their subsequent purification on Ni-NTA-agarose to a purity ≥ 90% (Fig. 2).

Characterization of DNA-binding activity
During integration, retroviral integrases bind to the ends of viral DNA and then interact with cellular DNA; the latter interaction is not sequence-specific [29]. Recombinant HIV-1 IN protein usually has the same affinity for DNA duplexes of different structures [30]. We evaluated the capacity of IN_CRF to bind the 21-mer DNA substrate representing the terminal sequence of Stability of the IN/DNA complex was determined using the DRaCALA method (Differential Radial Capillary Action of Ligand Assay) [28] that had been used earlier to study the complexes formed between IN_B and DNA [31]. We found that the K d values were similar for the complexes of DNA with both IN_CRF and IN_А ( Fig. 3A and Table 1), as well as IN_В [32].
The fluorescence polarization method was applied to study the kinetics of IN binding to the fluoresceinlabelled duplex 5'-Fl-U5B/U5A used as the DNA substrate. Having compared DNA binding by IN_CRF and IN_A, we found that the rate of DNA binding was higher for IN_CRF; the binding rate constants (k on ) for these enzymes differed 2.8-fold (Fig. 3B and Table 1). The binding rate constant k on for IN_A (0.24 min -1 ) was close to the value determined earlier for IN_B (0.18 min -1 ) [27].
The sequences of INs of HIV-1 subtypes A and B differ by 16 amino acid substitutions, 11 of which reside in the catalytic core; two reside in the C-terminal; and three, in the N-terminal domain. The latter three substitutions are synonymous: D3E, R20K, and V31I (Fig. 1). IN_CRF differs from IN_A and IN_B by four unique amino acid substitutions in the N-terminal domain: E11D, K14R, S24N, and M50I (Fig. 1). Keeping in mind that the rates of DNA substrate binding to IN_A and IN_B were comparable and differed significantly from that for IN_CRF, we could assume that this rate is mainly affected by the structure of the IN N-terminal domain. It is responsible for the IN multimeric state, which is crucial for its catalytic activity [33], and participates in binding of IN to the DNA substrate (viral DNA) [34][35][36]. There are two substitutions, S24N and M50I, in the structure of IN_CRF, which seem to be of greatest interest. The presence of an amide group in Asn and a branched chain in Ile can affect the intermolecular interactions upon formation of the catalytically active state of IN. In addition, Lys14 is in direct contact with viral DNA and plays an important role in IN multimerization [35,37]. In IN_CRF, Lys14 is substituted with Arg. Although both these amino acids are positively charged, the Arg residue is more bulky, less hydrophobic and has a higher pKa value than Lys [38,39]. Besides, Arg is characterized by positively charged delocalization within the guanidine group and can form multiple hydrogen bonds with different orientations [40,41], which can contribute to DNA substrate binding.
Thus, amino acid substitutions that are inherent to IN_CRF due to natural polymorphism did not affect the stability of its complex with the DNA substrate but significantly influenced the complex formation rate.

Characterization of the catalytic activity of IN_CRF
IN is involved in two successive reactions during viral replication: 3'-end processing (in which it catalyzes the cleavage of the GT dinucleotide from both 3'-ends of viral DNA) and strand transfer (insertion of the processed viral DNA into the cell DNA). Both of these reactions can be studied in vitro using standard protocols [25]. We used a standard synthetic DNA duplex U5B/ U5A mimicking the U5 region of HIV-1 DNA LTR for the 3'-processing reaction. The duplex contained a [5'-32 P]-labelled U5B strand, which was turned into a product shortened by two nucleotides as a result of the reaction. In the strand transfer reaction, the [5'-32 P]-U5B-2/U5A duplex with the already processed U5B-2 strand was used both as a substrate and a target.
Studying the kinetics of 3'-processing showed that IN_CRF processes its substrate more efficiently and faster than IN_A does (Fig. 4, Table 1). In order to thoroughly clarify the reasons for the increased reaction efficiency, we determined the kinetic parameters of 3'-processing (K M and V max ). It turned out that the K M is 1.8 times lower and V max is 1.6 times higher for IN_CRF than for IN_A (Table 1). Therefore, IN_CRF is characterized by a higher 3'-processing rate, which is achieved at lower substrate concentrations. Accordingly, the catalytic efficiency (V max /K M ) of IN_CRF was almost three times higher than that of IN_A (Table 1). Obviously, such a high activity of IN_CRF cannot be attributed only to the higher rate of the DNA substrate binding (Fig. 3), especially taking into account that the dissociation constants for the complexes of both enzymes with DNA were similar. As mentioned above, the N-terminal domain of IN is responsible for its multimeric state, which changes when IN binds to its DNA substrate [42,43] to form the catalytically active enzyme-substrate complex. It is possible that amino acid substitutions located on the N-terminal domain of IN_CRF contribute to the formation of such a complex, thereby stimulating the more efficient reaction.
When studying the strand transfer reaction, we determined the reaction efficiency and rate, as well as the pattern of the reaction products, which demonstrates on which site the substrate is inserted into the target DNA. Strand transfer efficiency and rate were again higher for IN_CRF, whereas the pattern of products was the same (Fig. 5). Of note, the profiles of the strand  ity of IN_A in the 3'-processing reaction resulting from the Q148K/G140S and G118R/E138K mutations, but this decline was the same for both double mutations (3.8-fold) [25].
Similarly to IN_A [25], the Q148K/G140S and G118R/E138K substitutions significantly reduced the efficiency of strand transfer catalyzed by IN_CRF ( Table 2). The decrease was slightly stronger in the case of G118R/E138K than for the Q148K/G140S mutation: 4.6-fold versus 3.7-fold, respectively. Interestingly, G118R/E138K substitutions in the case of IN_B affected the strand transfer efficiency very slightly [25]. The strong negative effect of these substitutions in both IN_CRF and IN_A is obviously related to the natural polymorphism S119P (Fig. 1), resulting in  (Fig. 1). Therefore, location of the target DNA in its complexes with IN_CRF and IN_A is similar and differs from its location in the complex with IN_B.

The influence of drug resistance mutations on IN_ CRF activity and its sensitivity to raltegravir
Since no data are available about drug resistance mutations in the genetic variant of the virus under study, we introduced mutations known to confer resistance to strand transfer inhibitors in other HIV-1 subtypes in the IN_CRF gene. We chose the primary mutation Q148K and the secondary compensatory mutation G140S causing IN resistance to raltegravir and elvitegravir [44,45]. The G118R and E138K substitutions resulting in reduced IN sensitivity to dolutegravir were also selected [46,47]. Therefore, IN_CRF protein variants containing Q148K/G140S and G118R/E138K double substitutions were prepared (Fig. 2). We investigated their DNA binding activity and the dependence of their 3'-end processing efficiency on the IN concentration and reaction time. It turned out that the introduced mutations did not affect the stability of the enzyme-substrate complex but significantly reduced the IN catalytic activity (  Fig. 6 and [25]). The Q148K/G140S substitutions did not change the pattern of the reaction products when compared to the initial IN_CRF (Fig. 6).
We also studied the sensitivity of IN_CRF and its Q148K/G140S and G118R/E138K mutants to inhibition by raltegravir, the drug used for treatment of HIV-infected patients in the Russian Federation. IN_CRF was efficiently inhibited by raltegravir ( Table  2); the IC 50 value was close to those obtained earlier for IN_A and IN_B [25]. Introduction of Q148K/G140S resistance mutations detected in other HIV-1 subtypes also led to the emergence of IN_CRF resistance. We observed a 70-fold increase in the IC 50 value, which was consistent with the data previously obtained for IN_A [25]. The G118R/E138K mutations also reduced the sensitivity of IN_CRF to raltegravir but not substantially (FC = 7, Table 2). It should be noted that IN_A bearing G118R/E138K mutations exhibited almost no drop in sensitivity to raltegravir [25].

CONCLUSIONS
The recombinant IN protein from a new HIV-1 genetic variant, CRF63_02A1, that is rapidly spreading across Siberia has been identified and characterized for the first time. IN_CRF was found to catalyze both 3'-processing and strand transfer reactions faster and more efficiently than IN of HIV-1 subtype A does. The high rates of these reactions are likely to be ensured  Note. The average values of at least three independent measurements (± standard deviation) are presented. ND -the values were not determined. *According to [25].
DNA positioning in the active site of the enzyme-substrate complex was similar for both INs.
The results obtained allowed us to suggest that the resistance of the HIV-1 genetic variant CRF63_02A1 to raltegravir may develop primarily due to the emergence and fixation of Q148K/G140S mutations as it was described for other HIV-1 subtypes [48]. The introduction of these mutations in IN_CRF led to a 70-fold increase in resistance to this inhibitor as compared to the initial parental IN_CRF. The G118R/E138K mutations resulted in only a seven-fold increase of the resistance of IN_CRF to raltegravir.

This work was supported by the Russian Science
Foundation (grant no. 16-15-10238).