Characterizing key nucleotide polymorphisms of hepatitis C virus-disease associations via mass-spectrometric genotyping

As more than 80% of hepatocellular carcinoma patients in Japan have had a hepatitis C virus infection at some point in their medical history, identifying genetic aberrations associated with hepatitis C virulence in these individuals remains a high priority in the diagnosis and treatment of hepatocellular carcinoma. From the BioBank Japan Project, we acquired samples from 480 subjects with hepatocellular carcinoma, chronic hepatitis or liver cirrhosis, and genotyped 131 clinically relevant host single nucleotide polymorphisms to survey the potential association of certain risk alleles and genes with a patient's predisposition to hepatitis C and liver cancer. Among those polymorphisms, we found 12 candidates with statistically significant support for association with hepatitis C virus susceptibility and genetic predisposition to hepatocellular carcinoma. SNPs in genes such as XPC, FANCA, KDR and BRCA2 also suggested likely connections between hepatitis C virus susceptibility and the contraction of liver diseases. The single nucleotide polymorphisms reported here suggest candidate genes as biomarkers and offer insights into the linkage of hepatitis C virulence to the alteration of the healthy liver genomic landscape as well as to liver disease progression.


Introduction
The seriousness of hepatitis C virus (HCV) infections lies in their elevation of risks pertaining to various life-threatening liver conditions such as liver cirrhosis (LC), chronic hepatitis (CH) and hepatocellular carcinoma (HCC). For instance, the estimated risk of hepatocellular carcinoma has been reported to be 15 to 20 times higher in subjects infected with HCV (1). Statistics also suggest that 75-85% of people infected will become chronic carriers, and 10-15% of HCV infection cases will advance to cirrhosis within the first 20 years (2), along with increased risks of developing HCC. In Japan, ~80% of HCC cases are attributed to prior or concurrent HCV infection (3), suggesting potential links between possible genetic predisposition and infection incidence. While genetic aberrations can direct and drive carcinogenesis, the exact details linking HCV virulence to HCC susceptibility (and to a lesser extent LC and CH), especially in the ethnic Japanese population, remain opaque. Preventative treatment options for HCV are similarly lacking, although a meta-analysis by Colombo and Iavarone suggested the possibility of reducing HCC risks in a small fraction of cases with interferon administration (4). To date, while various studies have sought to characterize the effect of genotypic alterations in liver diseases, the role of SNPs in liver diseases remains poorly understood (5)(6)(7)(8)(9)(10)(11). Additionally, while genes such as IFNγ and IL-28B have been reported to harbor potential associations with HCV infections leading to conditions such as fibrosis and jaundice (12)(13)(14), no clinical screening and characterization studies have been conducted between HCV and severe liver conditions such as CH, LC and HCC.
With an estimated 143 million people infected with hepatitis C worldwide (15), early genetic screening can play a beneficial role in greatly improving the quality of life for those afflicted with liver diseases. With the establishment of the BioBank Japan Project, which stores and maintains a number of annotated liver disease cases, we decided to explore the possibility, if any, that certain genetic risk factors are associated with the progression of HCV infections to hepatocellular carcinoma in the Japanese population. To analyze a set of specific polymorphisms, we used MassARRAY genotyping, a MALDI-TOF mass spectrometry-based approach well demonstrated in medium- to large-scale cohort studies (16). We herein present one of the earliest studies attempting to assess several clinically relevant marker gene and variant candidates linking HCV to liver cancer, as well as to CH and LC for the purpose of comparison, in the Japanese population using DNA extracted from blood specimens. For each candidate, we then assessed its significance by comparing genotypic frequencies against records from the Japan Single Nucleotide Polymorphisms databank (JSNP), NCBI dbSNP and the Tohoku Medical Megabank (TMM) datasets (17)(18)(19)(20) to elucidate their potential roles as risk factors in HCV-induced liver diseases and their pertinence in other ethnic groups.

Material and methods
SNP selection. From published SNPs and polymorphisms in cancer-relevant genes in the recent literature, we generated a preliminary list of SNP candidates for screening. To broaden the scope, this list was later expanded to include a small number of SNPs that had few reports of clinical significance to date, but were well-conserved, disease-associated and believed to impact protein structure according to SNPs3D searches (21). In all, 131 SNPs over 4 primer sets were found to meet the MassARRAY prerequisites and were selected for analysis (tabulated data of SNPs and corresponding primers available upon request).
Specimen collection and subject demographics. Blood DNA samples from a collection of CH (200), LC (80) and HCC (200) subjects were obtained from the BioBank Japan Project (22). Hepatitis subjects had a gender makeup of 103/97 male/female, with 102 over the age of 50 (mean 54±15). Samples were confirmed to be HCV-negative and cancer-free. The cirrhosis test group consisted of 45 males and 35 females, with 71 people over the age of 50 (mean 65±11), and was similarly confirmed to be HCV-negative and cancer-free. Among HCC patients of 103 men and 97 women, 194 were over 50 (mean 69±8) and all cases were confirmed to be HCV-positive. No other medical history (including history of hepatitis B virus infection) or personal information was obtained. For the purpose of statistical assessment, a simulated control ('healthy') set based on JSNP (last updated May, 2014) and NCBI data (build 142, October, 2014) was used. In cases where multiple genotypes existed from both JSNP and NCBI, prevailing JSNP results were used for the control set. The TMM 2KJPN database (release June, 2016) was also consulted for comparative analysis of variant allele frequencies for the Japanese population.
SNP detection by single-base extension. DNA specimens (5 µg) were diluted to the stock concentration of 10 ng/µl prior to PCR using the iPLEX Gold reagent kit (Sequenom, San Diego, CA, USA). Each specimen (1 µl) was mixed with the primer mix (500 nM) prior to SNP isolation, and 6/96 samples were then randomly selected to confirm the extent of PCR by gel electrophoresis. Four primer mixes, each containing 33 SNPs, were prepared to the same final composition per Sequenom's instructions. PCR products underwent shrimp alkaline phosphatase (0.5 unit) treatment to dephosphorylate unincorporated deoxynucleotides at 37°C for 40 min followed by 85°C for 5 min. Extension reactions were performed following the iPLEX-Extend protocol by preparing each sample in a solution of 0.62 µl water, 0.2 µl 10X iPLEX buffer plus, 0.2 µl iPLEX termination mix, 0.94 µl extend primer mix and 0.04 µl iPLEX enzyme for reaction as follows: 94°C (step 1, 30 sec), 94°C (step 2, 5 sec), 52°C (step 3, 5 sec), 80°C (step 4, 5 sec), in which steps 3 and 4 were repeated for 5 cycles followed by 40 cycles of steps 2-4 before the final step at 72°C for 3 min. Following instructions from the SpectroChip Chip and Resin kit (Sequenom), samples were characterized by mass spectrometric analysis on a MassARRAY compact MALDI-TOF mass spectrometric analyzer. Oligonucleotides were binned with replicates per the manufacturer's recommendations to minimize calling failures. Data were acquired and processed using EpiTYPER 1.0 in the Sequenom MassARRAY Workstation suite. For illustrative purposes, some data and cluster plot/spectrum snapshots were exported to XML and tab-delimited text files, respectively, for replotting and annotation with R 3.2 (available at R-project.org).

Statistical analysis. Statistical analyses were performed using SPSS 15.0 (IBM Corporation, Armonk, NY, USA) and R. Allele and genotypic differences between healthy (see Specimen collection) and diseased subjects, as well as across ethnicities, were evaluated by the χ² test of significance. Ancestral/reference and alternative alleles were reordered for presentation according to dbSNP conventions; in certain cases where the ancestral allele was ambiguous, e.g. PMS2 (rs1805321), the statistically dominant allele in dbSNP was selected as the ancestral allele (e.g. C for rs1805321). Comparisons to the TMM dataset were determined in R, but not used to determine significance as only allele frequencies were available. A predefined α-level of 0.05 was used for statistical assessment. Given the exploratory and targeted nature of this study, P-values were not corrected for multiple comparisons; instead, we advise readers to use corrected significance levels, for instance the Dunn-Šidák corrected α' = 0.00039 (m = 131), when cross-examining results here with other non-targeted genotyping association studies to account for potential consequences of multiple hypothesis testing.
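The corrected level quoted above follows from the Dunn-Šidák formula α' = 1 - (1 - α)^(1/m). A minimal Python sketch is given here for illustration only; the function name `sidak_alpha` is hypothetical and the analyses in this study were run in SPSS and R.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise
    error rate at `alpha` across `m` independent tests."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# alpha = 0.05 and m = 131 SNPs, as stated in the text
alpha_corrected = sidak_alpha(0.05, 131)
print(f"{alpha_corrected:.5f}")  # 0.00039
```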
Changes to local protein structure. As specimens obtained for this study came from a Japanese population, we consulted annotated records from TMM to determine potential changes to local protein structures. For non-synonymous SNPs meeting the significance level, such changes were predicted by evaluating the impact of amino acid substitutions by a composite probability from HumVar PolyPhen-2 (2.2.2r398) and Grantham scores (23,24). PolyPhen-2 FPR thresholds were appraised at 10/20%. For the purpose of calculation, the Grantham values of all non-identical amino acid replacements were transformed to a kernel density function in R and re-evaluated as cumulative probabilities. Scores were averaged by the geometric mean (square root of the product of the two probabilities).
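The final averaging step can be sketched as follows. This is a minimal illustration that assumes the PolyPhen-2 probability and the Grantham cumulative probability have already been computed; the function name and the input values are hypothetical and are not taken from Table I.

```python
import math


def geometric_mean_impact(polyphen_prob: float, grantham_prob: float) -> float:
    """Geometric mean of two probabilities: sqrt(p1 * p2)."""
    for p in (polyphen_prob, grantham_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return math.sqrt(polyphen_prob * grantham_prob)


# Made-up inputs, for illustration only
print(round(geometric_mean_impact(0.81, 0.64), 2))  # 0.72
```

The geometric mean penalizes disagreement between the two predictors: if either probability is close to zero, the composite score is also close to zero.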
Interaction and gene enrichment analysis. Potential interactions among the list of significant candidate genes were analyzed by GeneMANIA to identify the top 25 related genes. Interactions were selected to allow at most 25 genes with 25 attributes, and filtered by the presence of possible associations via co-expression, co-localization, genetic and physical interactions, pathway information as well as predicted interactions. Networks were analyzed and visualized in Cytoscape. Statistically enriched Reactome pathways (version 58) were identified by the Panther overrepresentation test with a predefined α-level of 0.05 to assess individual pathway significance.
Linkage disequilibrium analysis. Potential linkage disequilibria were examined using LDproxy in the LDlink suite (25) to identify possible linkages among the significant SNPs, using the Japanese population genotype data from Phase 3 of the 1000 Genomes Project as input. Proxy locus pairs with correlation coefficients (R²) >0.80 and RegulomeDB scores of 1-5 (including subgroups of 1-3) were retained for further analysis. Variant pair associations, as a function of the correlation coefficient between locus pairs, were visually examined in Cytoscape as network diagrams to identify common functionally active proxy variants.
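The retention rule described above (R² >0.80 and a RegulomeDB score of 1-5, subgroups included) can be sketched as a simple filter. The record layout, field names and proxy identifiers below are hypothetical and do not reproduce the actual LDproxy output format; missing scores are assumed to be non-numeric placeholders.

```python
def regulome_rank(score: str):
    """Leading integer rank of a RegulomeDB score such as '2b';
    returns None when the score has no leading digit."""
    digits = ""
    for ch in score.strip():
        if ch.isdigit():
            digits += ch
        else:
            break
    return int(digits) if digits else None


def filter_proxies(records, r2_min=0.80, max_rank=5):
    """Keep proxies with R2 strictly above r2_min and rank in 1..max_rank."""
    kept = []
    for rec in records:
        rank = regulome_rank(rec["regulome"])
        if rec["r2"] > r2_min and rank is not None and 1 <= rank <= max_rank:
            kept.append(rec)
    return kept


# Made-up proxy records, for illustration only
records = [
    {"proxy": "proxy_a", "r2": 0.95, "regulome": "2b"},
    {"proxy": "proxy_b", "r2": 0.80, "regulome": "1f"},  # R2 not above 0.80
    {"proxy": "proxy_c", "r2": 0.99, "regulome": "6"},   # rank outside 1-5
    {"proxy": "proxy_d", "r2": 0.88, "regulome": "."},   # no score available
    {"proxy": "proxy_e", "r2": 0.84, "regulome": "5"},
]
print([r["proxy"] for r in filter_proxies(records)])  # ['proxy_a', 'proxy_e']
```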
Cohort allele frequency dilution. VCF files containing individual variant data from the Phase 3 1000 Genomes Project (G1000) (26) were acquired via the Data Slicer (GRCh37 release 89, last accessed August, 2017) for each of the 12 candidate polymorphisms. To test how candidate allele frequencies would change if samples from this population were mixed with the cohort, n=0-480 randomly selected genotypes from G1000 were mixed with randomly selected cohort genotypes to make up a final sample size of 480, and the resultant reference (A) and alternative (B) allele counts as well as the mean allele frequencies (%B) were examined as a function of n. In a separate validation, both the cohort and G1000 populations were randomly sampled (100 individuals/set) to evaluate the likelihood that the two populations had different mean allele frequencies. One hundred random trials were performed for both validations in R, with statistical significance assessed by two-sample t-tests.
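A minimal Python sketch of the dilution procedure is shown below, assuming each genotype is coded as the number of B alleles carried (0, 1 or 2). The function names and the toy genotype lists are hypothetical; the actual simulations were performed in R on cohort and G1000 data.

```python
import random


def diluted_b_frequency(cohort, reference, n, total=480, rng=random):
    """%B after mixing n reference genotypes with (total - n) cohort
    genotypes, both drawn at random without replacement."""
    mixed = rng.sample(cohort, total - n) + rng.sample(reference, n)
    return 100.0 * sum(mixed) / (2 * len(mixed))


def mean_b_frequency(cohort, reference, n, trials=100, seed=0):
    """Mean %B over repeated random mixing trials."""
    rng = random.Random(seed)
    values = [diluted_b_frequency(cohort, reference, n, rng=rng)
              for _ in range(trials)]
    return sum(values) / trials


# Toy data: a 480-subject cohort with %B = 50 and a reference
# population that carries no B allele at all
cohort = [2] * 120 + [1] * 240 + [0] * 120
reference = [0] * 500

for n in (0, 240, 480):
    print(n, mean_b_frequency(cohort, reference, n))
```

With these toy inputs the mean %B falls from 50 at n=0 to 0 at n=480, which is the kind of frequency shift the dilution test is designed to expose.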
We also evaluated the likelihood that these variants differed from corresponding records in the TMM databank by their maximum χ² statistics between the JSNP and dbSNP datasets as a potential screen to confirm the ethnic specificity of those variants to the Japanese population, but did not assign significance based on these statistics as the TMM databank contained considerably less publicly available information than JSNP or dbSNP at the time of analysis. Most of the variants appeared to have allele frequencies that varied across different phenotypes, some with only one subgroup that strongly deviated from the other two, e.g. candidate 3 in HCC only, and others in multiple subgroups, e.g. candidates 4 and 8. Generally, the HCC subgroup frequently exhibited allele frequencies that deviated from the other two phenotypes, for example candidates 5 and 6. For SNPs with significant differences in genotype and allele frequency, all but 1 (AMACR) showed differential compositions across different ethnic groups (Table II), suggesting that they may be specific to the Japanese population.
We also explored whether potential linkage disequilibria existed among candidates in Table I to ascertain the possibility of non-random association with other alleles as a means to confirm their true phenotypic implications for the liver diseases. While some of those SNPs did have proxy pairs that would suggest some extent of linkage disequilibrium (available upon request), we observed no common proxy variants among them (Fig. 1B and C). Annotations of those proxy variants also suggested either synonymous mutations or the lack of a clear biological function. As not all variants in Table I were on the same chromosome, and most linked proxy pairs were outside the immediate proximity (only 12/170 were within 1,000 bp; data available upon request), any influence of those proxy variants on the allele frequencies of SNPs in Table I most likely had no effect on disease outcomes.

Discussion
Roughly 10% of the SNPs genotyped in this study were in genes with critical roles in liver functions and various cancer phenotypes. For instance, AMACR is a ubiquitously expressed enzyme responsible for bile branched-chain fatty acid metabolism (27) whose mRNA and protein expression increase in colorectal and prostate cancers as well as in pre-cancerous hyperplastic polyps in the large intestine (28,29). Co-increases of COX-2 and AMACR expression could also potentially lead to immune suppression, neoplastic changes and tumor invasion (30). As polymorphisms in AMACR have been implicated in prostate and colorectal tumorigenesis (28), its possible contribution to HCV infection-induced disease phenotypes should not be overlooked. Similarly, despite the lack of prior evidence suggesting a direct association of ARHGAP8 with HCV-induced HCC, the loss of Rho-suppressing ARHGAP7 has been implicated in multiple cancers including HCC (31). Regulation by RhoGAP via Rho has been reported to contribute to reductions in hepatic fibrosis, suggesting that a loss-of-function variant may induce liver cirrhosis in a manner independent of HCV infection. Together with the ability of statins to attenuate liver fibrosis in chronic HCV infections (32), inter-regulatory feedback of Rho acylation by members of the ARHGAP family, for instance ARHGAP8, may well contribute to the dysregulation, via HCV infections, during HCC development (33). While mutations in DNA mismatch or strand-break repair genes have long been intimately associated with tumor development, the presence of risk alleles in XPC, a gene critical in nucleotide excision repair (34), further highlights the importance of exogenous genomic editing by HCV in HCC. XPC has been reported to elevate the risk of skin cancer 1,000-fold and that of organ neoplasms as much as 10-fold (35) through its role in xeroderma pigmentosum, a pre-cancerous condition. From this, it is likely that defects in XPC may also increase one's susceptibility to HCC. Additionally, aberrations in FANCA, a known tumor suppressor (36), as well as in BRCA2, a DNA repair gene, can have serious repercussions leading to oncogenesis, most notably breast cancer, both in Japanese and other ethnic cohorts (37)(38)(39)(40). Among these reports, candidate 8 also belongs to the group of 25 potential breast cancer risk alleles, justifying the need for further investigation into its mechanistic role in HCV-induced HCC. Likewise, mutations in PMS2 may be similarly implicated in the response to HCV virulence, as the gene is best known for its role in DNA mismatch repair and its links to microsatellite instability, as in hereditary nonpolyposis colorectal cancer (41)(42)(43).
Our genotyping results also revealed some curiosities on the role of functional surrogates in HCV-induced oncogenesis, particularly through potential risk alleles in ADAMTS16. ADAMTS16 belongs to a family of secreted metalloproteases that tend to be fairly tissue-limited, and the discovery of its association with HCV was unexpected as ADAMTS16 is primarily expressed only in the lung and brain; that said, recent literature did suggest the possibility of high-quantity shuttling of ADAMTS13 and ADAMTS19 to unexpected destinations, such as tumors and nearby tissues in cases of osteosarcoma, melanoma and colon cancer (44). The surfacing of ADAMTS16 in HCC samples could also imply the occurrence of a similar phenomenon by mutations or other mechanisms. Interestingly, while some members of the ADAMTS family have been implicated in conditions such as angio-inhibition (45) and Ehlers-Danlos syndrome (46), the roles of most remain relatively unknown and inconclusive (44,47,48). As such, risk alleles in ADAMTS16, together with the observation that these metalloproteases are typically downregulated in cancer as a consequence of promoter hypermethylation (49), would suggest that polymorphisms in ADAMTS16 may potentially push the gene to displace and redistribute itself and other members of the family, possibly via extracellular matrix remodeling (50), and become oncogenic. Additionally, not all conditions were genetically predisposed to the same polymorphisms (Table III) or reacted identically to the virulence of HCV. For example, a mutation in candidate 3 would more likely drive onset in HCC than in LC or CH, while subjects with a mutation in candidate 6 would perhaps present HCC phenotypes differently. Odds ratios (OR >1, P<0.05) also implied differential susceptibility for subjects with mutations in XPC and BRCA2, as XPC showed elevated odds for CH and BRCA2 for all subgroups. Just as the BRCA genes have helped physicians to assess breast cancer susceptibility, BRCA2 in combination with several of the aforementioned SNPs could also help elucidate risks of HCC by the process of elimination.
In some situations nearby variants may act in concert and contribute jointly to particular phenotypes; to examine the possibility of these nonrandom associations, we next explored possible interactions and the presence of common functionally active proxy variants associated with the polymorphisms identified in this study. While a small interaction network revealed common interacting partners of the 10 candidate genes, for instance SMARCA4, PPP2CA and XPO1 (Fig. 1A), linkage disequilibrium analysis painted a slightly different picture, suggesting instead that these candidates were genotypically pairwise independent and contributed cumulatively to the outcome of each disease.
Additionally, multivariate analysis across genders and ages revealed no strong covariances among the candidates (data not shown). Interestingly, enrichment analysis highlighted strong connections to DNA repair and post-translational modifications, two biologically orthogonal functions, for the candidate genes and their interacting partners (table available upon request). Enrichment in the two nonparallel pathways would thus posit an overall additive effect that reflected the candidates' genotypic independence. While defects in DNA repair were commonplace in cancer as a consequence of unregulated division, no mechanistically direct explanation could account for the functional enrichment in protein modification; as such, the hypothesis that both pathways, via functional alteration by the candidates genotyped, acted in concert to achieve the additive effect reflected in the association of HCV with HCC appeared to be a plausible one that merits further consideration. We also analyzed changes in local protein secondary and quaternary structures to relate the impact of non-synonymous mutations to the connection between HCV and HCC and to ascertain the underlying routes to disease progression. Although DNA repair mechanisms may be biochemically similar across different types of cancer, BRCA2 and FANCA remain relatively unexplored in HCV-induced HCC. Especially since hepatic tissues are constantly under oxidative stress, genome repair during acute and chronic HCV infection and the oncogenesis of HCC are critical in understanding disease progression. We observed that ADAMTS16, SELE and KDR all carried polymorphisms with potentially damaging mutations; in fact, both ADAMTS16 and SELE were associated with HCC and CH, suggesting a possible path of disease progression upon HCV infection from CH to HCC via these mutations (Table I). Nonetheless, the fact that KDR also manifested a damaging mutation and was associated with HCC would also suggest a possible role for the gene, as KDR-mediated mechanisms could support…
Table II. Stratified comparisons of significant SNPs across different ethnic groups.
A difficulty in establishing genetic connections between diseases and HCV infections is that the virus can remain dormant in the liver, leaving the patient with normal liver functions and no symptoms. As such, disease progression in the early stages is slowed. These associated SNPs can, in theory, display no phenotypic effect during early onset, which can delay diagnosis and treatment altogether. As both acute and chronic HCV infections could transform healthy livers, the full and exact extent of the infection would be difficult to gauge. Because HCV carriers may present normal ALT levels in routine liver panel screenings and the amount of fat deposits in the tissue may also mask the presence of HCV, liver biopsies are likely to be inconclusive in these cases for determining the stage of infection. As such, extended genotyping studies beyond the one described herein, alongside functional characterizations of BRCA2 and FANCA, may be a practical clinical option for diagnosis.
Among the different genotyping technologies, MassARRAY proved to be a capable choice, having demonstrated high sensitivities in various applications such as human papillomavirus genotyping for cervical cancer (52) or classification of ectopic crypts in colorectal cancers (53) via KRAS and BRAF mutations. Although we were unable to call every SNP on the initial test, we nonetheless obtained call rates surpassing 90% in duplicate typing of most SNPs, suggesting reasonably minimal genotyping errors (Fig. 2A and B). Mass spectrometry-based genotyping offers excellent sensitivity in analyte detection and is thus an ideal approach for characterizing specimens from tissue depositories, where amounts of DNA may be very limited. Extension (1-bp) coupled with mass…
The MassARRAY method allowed us to maintain a reasonable subject size of 480, although the relatively lower number of LC cases might have marginally reduced the statistical power of this study. While increasing the number of test subjects could certainly be beneficial in improving the statistical power, such an approach is not always possible in practice. In consideration of the potential loss in statistical power, we performed two validation simulations to see whether these differences in allele frequencies were still preserved if subsets of our cohort were 'diluted' with a different population sample. By taking genotypes from randomly selected G1000 individuals and mixing them with (also randomly selected) subjects in the cohort, we assembled subsets of 480 individuals and checked the frequencies of the candidate risk allele. Plotting such changes as a function of different numbers of G1000 individuals (Fig. 2C) revealed moderate frequency shifts for all of the candidate polymorphisms. Frequencies of these risk alleles thus appeared to be condition-specific, as they were clearly susceptible to dilution with subjects from a different and considerably more heterogeneous population of various ethnicities and health states. Additionally, bootstrapping our cohort to smaller subsets of 100 individuals for direct comparison to sets of equal sizes in the G1000 population also confirmed differences in allele frequencies (Fig. 2D). While such tests could not truly substitute for the benefits of increasing cohort sizes, the results nonetheless demonstrated a reasonable level of statistical robustness considering the size of our cohort.
Genotyping more candidates, e.g. Tolloid-like 1 protein (54), from overlooked biochemical pathways (for instance, stress responses, inflammation and cell cycles) could allow us to further understand the manner in which HCV infections evolve phenotypically. Characterizing changes in known and nearby polymorphisms in genes implicated in insulin biosynthesis, inflammation and adipose tissue remodeling, etc., would be highly useful in deciphering how obesity acts as a risk factor for HCV-induced HCC; in a similar vein, polymorphisms in alcohol metabolism and various other pathways related to opioid metabolism would also be highly useful in assessing the risk that substance abuse poses for HCC. Additionally, this type of candidate expansion would be particularly beneficial in analyzing ethnically more heterogeneous populations. As most candidate SNPs, with the exception of AMACR (rs34677), exhibited pairwise independence across non-ethnically Japanese datasets, our results hinted that HCV-induced liver disease progressed differently from other infection-driven oncogenesis, such as the case of Helicobacter pylori and stomach adenocarcinoma. Based on the differences across different datasets, we would recommend increasing the number of candidates for genotyping to validate this hypothesis. Nonetheless, our approach here could still serve as a useful tool for early diagnosis based on liver disease examination. An ongoing discussion in liver cancer is the possibility of oncogenic addiction, in which cancer progression is often controlled by a handful of driver genes, subsequently leading to uncontrollable growth and eventual transformation to tumor. Horizontally integrating SNP-based oncogenesis information from one cancer type to another can be a useful approach in deciphering the puzzle. For liver diseases that may ultimately lead to cancer, identifying function-altering SNPs is a useful tool for facilitating earlier diagnosis and expediting treatment. Furthermore, for cancers to which the theory of oncogenic addiction applies well, the ability to identify driver mutations quickly also has significant therapeutic implications. Based on our previous knowledge of SNPs in liver diseases and oncogenes, we utilized a targeted SNP-screening approach using a MassARRAY-based protocol on a set of 131 SNPs, and were able to identify 12 positions with implications ranging from disease onset to survival rates and other metrics. Additionally, we were also able to highlight the significance of BRCA2 and FANCA germ-line mutations and associate them with liver cancers. Most of these HCV-dependent mutation candidates were non-synonymous, and their association with other risk factors further suggested their potential roles as biomarkers for the liver conditions described in this study.
It is, however, important to note that several factors such as age have not been well explored in this study, as illustrated by the relatively smaller numbers of subjects under 50 years old in all three disease groups; additionally, at 80 samples, the statistical power for SNPs associated with cirrhosis may also be lower than that of the other two groups. Other factors such as hepatitis B infection, substance use or obesity could also influence disease progression. It is well known that sharing syringe needles is a common route of HCV transmission between persons, and alcohol abuse has been reported to worsen chronic HCV progression (55). Obesity, mostly through nonalcoholic fatty liver disease and type 2 diabetes, can also have an effect on the outcome of liver cancer (56). In the Japanese population, hepatitis B infection accounted for ~16% of HCC cases (2), roughly 1/5 of the fraction attributed to HCV infections; while relatively minor compared to HCV, hepatitis B infections still present a great public health concern, and future genotyping studies should also consider the inclusion of genes potentially implicated in these infections. In this regard, the status of HCV infections, such as virus titers, genotypes and history of prior treatments, may also affect the progression of liver cancer and is thus critical to take into consideration in cohort selection. Nonetheless, the incorporation of in silico screening and whole-exome sequencing data to identify and refine the list of SNP candidates for future MassARRAY studies should provide definitive improvements for more reliable characterization of disease associations and medium-size clinical cohort studies.

Figure 1. Single nucleotide polymorphisms (SNPs) potentially associated with hepatitis C virus-induced liver conditions in surveyed Japanese subjects, as identified by MassARRAY genotyping. (A) Interaction network of genes with candidate SNPs and their top 25 common interacting partners. Blue ovals, candidate genes; pink, interacting partners, with gray edges indicating possible associations. (B) Scatterplot of proxy variants to candidate SNPs and the linkage disequilibrium correlation coefficient. Dist, distance between a proxy variant and a candidate SNP, expressed as log10 of the absolute value in bp; R², the correlation coefficient of linkage disequilibrium. (C) Association networks of proxy variants (green) to candidate SNPs (black) as numbered in Table I; edges between nodes are illustrative only, with lengths not reflective of association strengths. A fully annotated network diagram as vector graphics is available upon request.

Figure 2. Genotyping hepatitis C virus (HCV)-associated single nucleotide polymorphisms (SNPs) by MassARRAY technology and validation by G1000 cross-sampling. (A) Sample call high/low mass weight cluster plot. (B) Sample mass spectrum, m/z 7000-8000. Mutation highlighted here is rs1051624 (CDH17, forward primer, ACGTTGGATGACTCATGCCCGACTGTCTAC). UEP, unextended primer (green); ions containing its corresponding genotypes are indicated by red and blue vertical lines, and UEPs of adjacent SNPs are illustrated in gray. (C) Allele frequencies of the cohort exhibited deviations evident from dilution with a different population. Line plot of alternative allele frequencies (%B) of the 12 candidate polymorphisms in a set of mixed cohort/G1000 genotypes, visualized as a function of the number of G1000 subjects intermixed with the Japanese cohort totaling 480 samples. Indices 1-12 correspond to candidate SNPs listed in Table I. (D) Barplot of candidate risk allele frequencies (%B) between random subsets of 100 cohort subjects (red) and 100 G1000 subjects (blue). Error bars indicate SEM. Significance evaluated by Student's t-test of frequencies from 100 random trials; ** P<0.01.

Table I. Genotypes, alleles and impact on local protein structure of statistically significant SNPs and corresponding JSNP and dbSNP entries.
curated JSNP or dbSNP genotype or allele counts and corresponding P-values. ID, SNP index to be used throughout this article. ST, significance type (statistical significance against control genotype 'GT' or allele 'AL' frequencies, respectively); A/B, A and B alleles; DP, disease phenotype; Mut, mutation type (Int, intron; Syn, synonymous); PD, PolyPhen2 HumDiv score; PV, PolyPhen2 HumVar score; GV, Grantham value; PG, Grantham probability of impact from amino acid substitution; GMI, geometric mean impact on protein structure.

Table III. Stratified genotypic comparisons of significant SNPs among disease phenotypes.