Identification of copy number variation-driven genes for liver cancer via bioinformatics analysis

To screen out copy number variation (CNV)-driven differentially expressed genes (DEGs) in liver cancer and advance our understanding of the pathogenesis, an integrated analysis of liver cancer-related CNV data from The Cancer Genome Atlas (TCGA) and gene expression data from EBI Array Express database were performed. The DEGs were identified by package limma based on the cut-off of |log2 (fold-change)| >0.585 and adjusted p-value <0.05. Using hg19 annotation information provided by UCSC, liver cancer-related CNVs were then screened out. TF-target gene interactions were also predicted with information from UCSC using DAVID online tools. As a result, 25 CNV-driven genes were obtained, including tripartite motif containing 28 (TRIM28) and RanBP-type and C3HC4-type zinc finger containing 1 (RBCK1). In the transcriptional regulatory network, 8 known cancer-related transcription factors (TFs) interacted with 21 CNV-driven genes, suggesting that the other 8 TFs may be involved in liver cancer. These genes may be potential biomarkers for early detection and prevention of liver cancer. These findings may improve our knowledge of the pathogenesis of liver cancer. Nevertheless, further experiments are still needed to confirm our findings.


Introduction
Primary liver cancer is the fifth most frequently diagnosed cancer globally and the second leading cause of cancer-related mortality.In developing countries, incidence rates are 2-to 3-fold higher than in developed countries (1) and it currently results in 360,000 cases and 350,000 deaths a year in China.The clinical prognosis is very poor with the medium survival time approaching 6 months (2).Hepatocellular carcinoma (HCC) is the most common type of liver cancer.Most cases of HCC are induced by either a viral hepatitis infection (hepatitis B or C) or cirrhosis.Despite recent discoveries in screening and early detection, HCC exhibits a rapid clinical course with an average survival of 6 months and an overall 5-year survival rate of 5% (3).Therefore, there is an urgent demand for biomarkers of early detection and targeted therapy.
Copy number variations (CNVs) are alterations of the DNA and they are being identified with different genome analysis platforms, such as array comparative genomic hybridization (aCGH), single nucleotide polymorphism (SNP) genotyping platforms, and next-generation sequencing.CNVs are involved in human health and disease (4,5) and are currently being applied for the diagnosis of various diseases (6,7).
CNVs also play important roles in the pathogenesis of various types of cancer, such as CNVs of epidermal growth factor receptor (EGFR), which have been associated with head and neck squamous (8), non-small cell lung (9), colorectal (10) and prostate cancer (11).Previous studies have indicated that decrease in the copy number of mitochondrial DNA may be a critical event during the early phase of liver carcinogenesis (12,13).Guichard et al conducted an integrated analysis of somatic mutations and focal copy-number changes and subsequently identified several key genes and pathways in HCC (14).
In the present study, we carried out an integrated analysis of liver cancer CNV data from The Cancer Genome Atlas (TCGA) and liver cancer expression profile data from the EBI Array Express database using bioinformatic tools, aiming to identify CNV-driven genes.These CNV-related differentially expressed genes (DEGs) may be potential biomarkers for early diagnosis or treatment.In addition, they may aid in identifying underlying mechanisms of liver cancer.

Materials and methods
Data sources.The CNV data set was obtained from TCGA database.Genome-Wide SNP array 6.0 chip was used to detect CNV information in 323 pairs of cases and controls with hg19 as the reference genome.Level 3 data were adopted in the following analysis.CNV sites and mean segment information were acquired in each sample.Gene expression data set E-MTAB-950 in original CEL format were downloaded from EBI Array Express.A total of 30 samples were selected out, including 10 normal liver tissue samples and 20 liver cancer samples.
Pretreatment of gene expression data.CEL format was converted into expression matrix using the rma function from package affy of R. Probes were then mapped into genes using Bioconductor with annotation files of Affymetrix Human Genome U133 Plus 2.0 Array.Expression values were averaged when multiple probes were mapped into a single gene.Box plots for gene expression data before and after normalization were plotted using R.
Pretreatment of CNV data.The case and the control group were pretreated separately.The distribution of CNVs on the 22 chromosomes was analyzed in three intervals, 1-10, 10-50 and >50 kb, respectively.P-values of difference in CNV distribution between the case and the control group were calculated using permutation test.Circos circular diagram was plotted to display CNV distribution.DEGs were also marked in the diagram.
Screening of DEGs.Differential analysis was performed with package limma to screen out DEGs. |log2FC| >0.585 (i.e.absolute fold-change >1.5) and adjusted p-value <0.05 were set as the cut-offs.
Screening of potential liver cancer-related CNVs.Using hg19 annotation information provided by UCSC (15), genes in CNV regions and values of CNVs were obtained.Liver cancerrelated CNVs were then screened out according to the criterion that it is not observed in controls but is detected in >80% of cases.The gene-CNV matrix was constructed and missing value was filled up with 0 (i.e.log2 (segment_mean) = 0, copy number 1).Screening of CNV-driven genes.Matrix of CNVs and expression values were constructed and correlation analysis was performed on genes with both values.Genes showing same trends in significant differential expression and CNV were termed as CNV-driven genes.
Functional enrichment analysis.Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed on DEGs and CNV-driven genes using Database for Annotation, Visualization, and Integrated Discovery (DAVID) (16) online tools.P-value <0.05 was set as the threshold to filter out significant terms.TF-target gene interactions were also predicted with information from UCSC using DAVID online tools.The transcriptional regulatory network was then visualized with Cytoscape (17).

Results
DEGs.A total of 19,944 gene expression values were obtained in normal liver tissue samples and liver cancer samples.Box plots for gene expression values before and after normalization are shown in Fig. 1.A total of 1,675 DEGs were identified in liver cancer, of which 1,090 were upregulated.
CNV data analysis results.CNV data were analyzed and distribution of CNVs in chromosomes is shown in Tables I and II, and Figs. 2 and 3.

Screening of potential liver cancer-related CNVs.
A total of 735 liver cancer-related CNVs were obtained.Matrix of genes and CNVs was then constructed.A total of 251 genes with CNVs gene expression values were selected out, of which 46 genes showed significant differential expression between liver cancer and normal liver tissue.Given CNVs in X and Y chromosome from controls were not included, 11 genes located in X and Y chromosome were excluded from subsequent analysis and 35 genes were retained for subsequent analysis (Fig. 4 and Table V).
CNV-driven genes and transcriptional regulatory network.A total of 25 CNV-driven genes were identified.Functional enrichment analysis results of these genes are shown in Table VI.

Duplications only
In the transcriptional regulatory network, 8 TFs have been linked to cancers and the other 8 TFs (AHRARNT, MAZR, NRF2, ROAZ, RORA1, SREBP1, TAXCREB and ZIC1) are implicated in regulation of the 21 CNV-driven genes and may play roles in the pathogenesis of liver cancer.The CD4 vs. CD8 lineage specification of thymocytes is linked to co-receptor expression.The transcription factor POZ (BTB) and AT hook containing zinc finger 1 (PATZ1, MAZR) has been identified as an important regulator of Cd8 expression (25).Transcription factor nuclear factor erythroid-2-related factor 2 (NRF2) is essential for the antioxidant responsive element (ARE)mediated induction of phase II detoxifying and oxidative stress enzyme genes (26).Shibata et al reported that mutations in NRF2 impair its recognition by Keap1-Cul3 E3 ligase and promote malignancy (27).Zinc finger protein 423 (ZFP423, ROAZ), a rat C2H2 zinc finger protein, plays a role in the regulation of olfactory neuronal differentiation through its interaction with the Olf-1/EBF transcription factor family (28).Sterol regulatory element-binding protein 1 (SREBP-1), a member of the basic-helix-loop-helix-leucine zipper (bHLH-ZIP) family of transcription factors, is synthesized as a 125 kd precursor that is attached to the nuclear envelope and endoplasmic reticulum (29).Human T-lymphotropic virus type 1 Tax interacts specifically with the cellular transcription factor CREB and the viral 21-bp repeat element to form a Tax-CREB-DNA ternary complex which mediates activation of viral mRNA transcription (30).These TFs merit further study to delineate their roles in liver cancer.
Collectively, the present study identified DEGs in liver cancer and disclosed a range of CNV-driven genes.Their biological functions and regulatory network were also discussed.These findings may improve our understanding of liver cancer and advance therapy development.

Figure 1 .
Figure 1.Box plots for gene expression data before (upper) and after normalization (lower).The medians (black lines) are almost at the same level, suggesting a good performance of normalization.

Figure 2 .
Figure 2. The distribution of copy number variations (CNVs) in the chromosomes.CNVs were divided into 3 groups according to the length.The horizontal axis represents different chromosomes and the vertical axis represents case/control ratio.The significance of case/control ratio was examined by permutation test and p-value was obtained.

Figure 3 .
Figure 3. Circos plots for the distribution of copy number variations (CNVs) in the chromosomes.CNVs were divided into 3 groups: (a) 1-10, (b) 10-50 and (c) >50 kb.From outside to inside, each circle represents 22 chromosomes, as well as X and Y chromosome; differentially expressed genes (green for the downregulated genes and red for upregulated genes); distribution of CNVs (length of line represents copy number; outward line represents duplicated CNVs, inward line represents depleted CNVs; CNVs in Y chromosome of control group were not included).

Figure 4 .
Figure 4. Scatter diagram of copy number and gene expression value.The horizontal axis represents copy number and the vertical axis differential expression.

Figure 5 .
Figure 5.The transcriptional regulatory network for the potential copy number variation (CNV)-driven genes.Red triangle represents transcription factor and yellow circle potential CNV-driven genes.

Table III .
Top 5 significant GO terms and KEGG pathways in upregulated genes.
GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.

Table IV .
Significant GO terms and KEGG pathways in downregulated genes.
GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.

Table V .
Result of correlation analysis between copy number and differential expression for the 35 genes.

Table VI .
Functional enrichment analysis results for potential CNV-driven genes.