Human cancer databases (Review)

Cancer is one of the four major non-communicable diseases (NCDs), responsible for ~14.6% of all human deaths. Currently, there are >100 known types of cancer and >500 genes involved in cancer. Ongoing research efforts have focused on cancer etiology and therapy. As a result, there has been exponential growth of cancer-associated data from diverse sources, such as scientific publications, genome-wide association studies, gene expression experiments, gene-gene or protein-protein interaction data, enzymatic assays, epigenomics, immunomics and cytogenetics, stored in relevant repositories. These data are complex and heterogeneous, ranging from unprocessed, unstructured data in the form of raw sequences and polymorphisms to well-annotated, structured data. Consequently, storing, mining, retrieving and analyzing these data in an efficient and meaningful manner poses a major challenge to biomedical investigators. In the current review, we present the central, publicly accessible databases that contain data pertinent to cancer, the resources available for delivering and analyzing information from these databases, and databases dedicated to specific types of cancer. Examples of this wealth of cancer-related information and of bioinformatic tools are also provided.


Comprehensive cancer projects
Large-scale collaborative cancer projects generate large amounts of cancer data. The International Cancer Genome Consortium (ICGC) (1) and The Cancer Genome Atlas (TCGA) (2) are the most prominent examples of such efforts (Table Ⅰ). The ICGC aims to obtain a comprehensive description of the genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes that are of clinical and social significance (1). The curated data types are sample donor IDs, cancer projects, simple somatic mutations (SSMs) and genes with SSMs. These data are complemented by associated attributes, including the primary site of the tumor at diagnosis (e.g., brain, skin, blood, bone and prostate), gender, tumor stage at diagnosis (e.g., 4, M0, M3, 1, M1) and available data types (e.g., copy number and structural somatic mutations, miRNA expression, gene expression, DNA methylation and exon junctions). The ICGC Data Portal (3) provides tools for querying, visualizing and downloading the data released quarterly by the consortium's member projects. The portal also contains data from other large-scale cancer genome projects, including TCGA, Johns Hopkins University (Baltimore, MD, USA) (4,5) and the Tumor Sequencing Project (TSP) (6). The ICGC Data Portal is based on the BioMart data management platform (7,8), which uses a seamless federated data model to enable cross-querying of diverse biological databases in a unified manner. To maintain the uniformity of ICGC datasets, the same data models, ontologies, controlled vocabularies and references are applied across all ICGC member databases. Three interfaces are available: Cancer Projects, Advanced Search and Data Repository. The Cancer Projects interface contains the data available in the 49 ICGC member projects, together with additional filters and a selection of attributes. The Advanced Search interface contains the complete set of filters and attributes.
The database can be queried interactively using three main options: donors, genes and mutations. Selecting any of these options returns the results in tabulated form, which can then be filtered by several search criteria. The Data Repository provides access to all ICGC cancer project data, including uniformly processed and annotated data files. The results can be downloaded and exported for further analysis.
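The donor, gene and mutation views described above amount to filtering a mutation table and then aggregating it. The following is a minimal sketch in Python, assuming a small in-memory list of simple somatic mutation records; the field names and values are hypothetical and do not reproduce the ICGC Data Portal schema or its API:

```python
import csv
import io
from collections import defaultdict

# Hypothetical simple somatic mutation (SSM) records; field names are
# illustrative only and do not reproduce the ICGC Data Portal schema.
SSMS = [
    {"donor": "DO1", "project": "BRCA-US", "site": "Breast", "gene": "TP53", "mutation": "MU1"},
    {"donor": "DO2", "project": "BRCA-US", "site": "Breast", "gene": "PIK3CA", "mutation": "MU2"},
    {"donor": "DO3", "project": "PACA-CA", "site": "Pancreas", "gene": "KRAS", "mutation": "MU3"},
    {"donor": "DO4", "project": "PACA-CA", "site": "Pancreas", "gene": "TP53", "mutation": "MU4"},
    {"donor": "DO3", "project": "PACA-CA", "site": "Pancreas", "gene": "TP53", "mutation": "MU5"},
]


def filter_records(records, **criteria):
    """Keep records whose fields match every criterion given as field=value."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def genes_view(records):
    """Gene-centred table: number of distinct donors with an SSM in each gene."""
    donors = defaultdict(set)
    for r in records:
        donors[r["gene"]].add(r["donor"])
    # Most frequently affected genes first, ties broken alphabetically.
    return sorted(((g, len(d)) for g, d in donors.items()), key=lambda row: (-row[1], row[0]))


def to_tsv(rows, header):
    """Export a tabulated result as tab-separated text for downstream analysis."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


pancreas = filter_records(SSMS, site="Pancreas")
table = genes_view(pancreas)
print(to_tsv(table, ["gene", "affected_donors"]))
```

Counting distinct donors rather than mutation records keeps a donor with two TP53 mutations from being counted twice.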
TCGA is a joint project of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) (both in Bethesda, MD, USA) that provides a comprehensive map of the key genomic changes that occur in the major types and subtypes of cancer (2). It contains clinical information, genomic characterization data and high-level sequence analyses of tumor genomes. The TCGA Data Portal enables investigators to explore, download and analyze datasets generated by TCGA. The data types stored in TCGA include gene expression, copy number, somatic mutations, single-nucleotide polymorphisms (SNPs), microRNAs, clinical outcomes and tissue slide images. Four main methods for downloading data are available: ⅰ) Data Matrix enables users to select and download a subset of data for a particular cancer type, but does not allow searching and downloading data across multiple cancer types simultaneously; ⅱ) Bulk Download facilitates the bulk download of data archives as uploaded by the TCGA centers; ⅲ) File Search allows users to filter and download data files in a more accessible manner; and ⅳ) Access HTTP Directories enables users to browse the HTTP directories where the data archives are stored. The TCGA Roadmap engine (9) was developed to index and annotate TCGA files and capture file metadata in the TCGA open-access HTTP directories by applying third-generation web technologies (Web 3.0) (10). An example of searching for and downloading processed data on expressed genes and microRNAs from next-generation sequencing (NGS) experiments in matched tumors is shown in Fig. 1.
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) (11,12), launched by the NCI, aims to elucidate the molecular basis of cancer through the application of proteomic techniques. In particular, CPTAC analyzes cancer samples by mass spectrometry in order to identify and quantify their constituent proteins and to localize post-translational modifications, such as phosphorylation. The CPTAC Data Portal is the central repository for the distribution of proteomic data collected by the Proteome Characterization Centers (PCCs), and it uses an Aspera Connect transfer server to transport large data files.
The Cancer Genome Project (CGP) (13) at the Sanger Institute (Cambridge, UK) seeks to identify somatic variants/mutations critical to the etiology and pathogenesis of human cancers, using the sequenced human genome and high-throughput mutation detection technologies.

Resources
The large volume of data emerging from these large-scale programs has driven the concomitant development of new databases for accessing and analyzing cancer data (Table Ⅰ).
Tools. Specialized web-based tools enable investigators to query, retrieve and analyze cancer-related data in a rapid, reliable and efficient manner. The Cancer Genome Anatomy Project (CGAP) (14) of the NCI includes a number of bioinformatic analysis tools and interconnected modules that give users access to CGAP data. These data include cancer-relevant genes and SNPs, malignant tissues and chromosomal aberrations in cancer patients. Moreover, CGAP provides information on the differential expression of a given gene in normal, precancerous and cancerous tissues based on Serial Analysis of Gene Expression (SAGE), as well as RNA interference (RNAi) constructs that target cancer-related genes, and diagrams of biochemical pathways and protein complexes. The UCSC Cancer Genomics Browser (15,16) is a suite of web-based tools used to integrate, display and analyze cancer genomics and clinical data. The browser allows whole-genome views of several different types of genomic and associated clinical data. Various datasets can be viewed together as coordinated 'heatmap tracks', enabling the user to make comparisons across studies and cancer types. Genomic and clinical information can be sorted, filtered, aggregated, classified and viewed interactively based on any given feature set, including clinical features, annotated biological pathways and user-contributed collections of genes. The Cancer Genome Workbench (CGWB) (17) includes copy number, mutation, expression and methylation data from various projects, including TCGA, the Catalogue of Somatic Mutations in Cancer (COSMIC) (18,19), Johns Hopkins University and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. CGWB provides a series of tools for visualizing genomic and transcriptional alterations across different cancer samples.
The data in CGWB can be viewed in three different ways: ⅰ) Integrated Track, which provides a sample-level view of genomic alterations from multiple data sources; ⅱ) Heatmap View, an interactive graphical view of gene expression and copy number data and their associated clinical features; and ⅲ) Bambino, an alignment viewer for NGS data.
Cancer driver genes. Several repositories of driver genes or gene families that play a causal role in carcinogenesis have been developed. The Tumor Gene Family Databases (TGDBs) (20) contain a broad range of information on genes involved in cancer. In addition to TGDB itself, the data of two component databases, the Oral Cancer Gene Database (ORCGDB) (21) and the Breast Cancer Gene Database (BCGD) (22), have been merged into the TGDBs. Gene information includes gene aliases, cellular location, biochemical function, frequency in various tumors, oncogenicity, chromosomal location, tumor gene type (proto-oncogene or tumor suppressor gene) and the signal transduction pathways in which the gene of interest is involved.
The DriverDB database (23) compiles a large amount of exome-sequencing (exome-seq) data (>6,000 cases), annotation databases such as dbSNP (24), the 1000 Genomes Project (25) and COSMIC, as well as various bioinformatics algorithms for the identification of driver genes or mutations. The database can be queried either by cancer type, in which case the driver genes/mutations for that cancer type are estimated, or by gene, in which case the mutation information of a driver gene is presented from five different aspects. Meta-Analysis, another option offered in DriverDB, enables users to identify driver genes in custom-defined samples according to clinical criteria. The RAS Oncogene Database (RASOnD) (20) integrates large amounts of genomics and proteomics data derived from publicly available databases such as NCBI's GenBank (26), Online Mendelian Inheritance in Man (OMIM) (27), the Universal Protein Resource (UniProt) (28), the Protein Data Bank (PDB) (29), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (30) and PubMed (31). The RASOnD database contains 199,046 entries from 101 species, allowing investigators to retrieve …
Genetic variations. Cancer is characterized by abundant genetic abnormalities in the form of mutations, SNPs, copy number alterations (CNAs), genomic rearrangements and gene fusions. To manage the increasing amount of information, public resources have been implemented to collect, curate, annotate and analyze data on cancer genetic variations. COSMIC (18,19) is the largest public database that stores and displays information on somatically acquired mutations involved in cancer, together with associated clinical and phenotypic data. Currently, COSMIC contains information on 28,735 genes, 2,002,811 coding mutations and 10,435 fusion gene mutations reported in 1,029,547 cancer samples.
The data are primarily extracted from the published scientific literature and from whole-genome sequencing screens of the CGP. To provide a uniform representation of the data, a histology and tissue ontology has been created. COSMIC uses the BioMart (32) data mining software, which enables users to filter the available data according to cancer sample, gene, mutation, tumor site, histology and tumor. The results are presented in tabulated format (Fig. 2). The Cancer Gene Census (CGC) (33) lists the genes (>1% of all human genes) that bear mutations causally contributing to carcinogenesis. The census data include the gene symbol according to HGNC (34), a short description of the gene, its chromosomal location, the type of mutation (i.e., somatic or germline), the type of tumor and the cancer syndrome in which the mutated gene is involved. The data are provided in a table that can be downloaded and exported in several formats, and the catalogue is updated at regular intervals. BioMuta (35) is a curated database of cancer-related non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. Its datasets are derived from the TCGA, COSMIC, ClinVar (36) and UniProt Knowledgebase (UniProtKB) (37) databases. Owing to the large amount of data in the primary NGS repositories, the High-performance Integrated Virtual Environment (HIVE) platform (35) has been implemented in BioMuta to store, analyze, compute and curate NGS data and associated metadata. CaSNP (38) is a comprehensive collection of CNAs from 11,485 Affymetrix SNP arrays, with raw data from NCBI's Gene Expression Omnibus (GEO) (39), additional arrays from the TCGA consortium and a few individual publications, covering 34 different cancer types in 105 studies. The user can query CaSNP by gene, region or cancer type and retrieve the frequencies of copy number aberrations for each study.
CaSNP also provides a heatmap showing the CNAs estimated at each SNP marker around the query region across all studies. CanProVar (40) has been developed to store and present germline and somatic amino acid variations in the human proteome associated with human tumorigenesis, based on the published literature. The CanGEM (41) database stores clinical information on tumor samples and array comparative genomic hybridization (aCGH) data used to detect gene CNAs in cancer. Users can create custom datasets for specific clinical sample characteristics or for the CNAs of individual genes. The Integrative Cancer Profiler System (ICPS) (10) database integrates genomic alterations, such as CNAs and loss of heterozygosity (LOH), with transcriptional signatures (SAGE, microarray) in order to study gene profiles in one or more types/subtypes of cancer. Currently, ICPS contains five different data types and 23,375 experiments covering 11 major cancer types. In addition to public data, ICPS also supports users' in-house data.
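The per-study frequencies of copy number aberrations that CaSNP reports can be illustrated by counting gains and losses separately for each study. This is a minimal sketch, not CaSNP's method: it assumes hypothetical per-sample log2 copy number ratios for a single gene and hypothetical ±0.3 log2 thresholds for calling a gain or a loss:

```python
from collections import defaultdict

# Hypothetical calling thresholds on log2(tumor/reference) copy number ratios.
GAIN, LOSS = 0.3, -0.3

# Hypothetical per-sample values: (study, sample, gene, log2 ratio).
CALLS = [
    ("study_A", "s1", "MYC", 0.8),
    ("study_A", "s2", "MYC", 0.1),
    ("study_A", "s3", "MYC", -0.5),
    ("study_A", "s4", "MYC", 0.4),
    ("study_B", "s5", "MYC", 0.05),
    ("study_B", "s6", "MYC", 0.9),
]


def cna_frequencies(calls, gene):
    """Return {study: (gain_fraction, loss_fraction)} for one gene."""
    counts = defaultdict(lambda: [0, 0, 0])  # [gains, losses, samples]
    for study, _sample, g, log2r in calls:
        if g != gene:
            continue
        c = counts[study]
        c[2] += 1
        if log2r >= GAIN:
            c[0] += 1
        elif log2r <= LOSS:
            c[1] += 1
    return {s: (c[0] / c[2], c[1] / c[2]) for s, c in counts.items()}


for study, (gain, loss) in sorted(cna_frequencies(CALLS, "MYC").items()):
    print(f"{study}\tgain={gain:.2f}\tloss={loss:.2f}")
```

Reporting frequencies per study, rather than pooling all samples, keeps one large study from dominating the picture for a gene.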
Given that ~50% of human tumors harbor TP53 gene mutations (42), the UMD TP53 database (43) was created to provide detailed information on TP53 mutants, such as the molecular and cellular properties of each TP53 mutant, its localization and its various gain-of-function activities. The UMD TP53 database contains >110,000 entries. The International Agency for Research on Cancer (IARC) TP53 Database (44,45) is a comprehensive resource that compiles all TP53 gene variations in human cancers reported in scientific publications. The datasets available in the resource are: TP53 somatic and germline mutations; validated common TP53 polymorphisms identified in human populations and their functional and clinical impact; TP53 gene status (i.e., wild-type, mutant, null) in various human cell lines; mouse models with engineered TP53 constructs; and experimentally induced TP53 mutations.
Epigenetic modifications. Epigenetic modifications, such as DNA methylation and chromatin-modifying factors, play a critical role in carcinogenesis by regulating tumor-suppressor gene silencing, proto-oncogene activation and chromosomal instability (46). MethyCancer (47), the database of human DNA methylation and cancer, was developed to study the association between DNA methylation, gene expression and cancer. It contains data on DNA methylation, cancer-relevant genes and CpG island (CGI) clones derived from high-throughput sequencing. The MethyView option allows the graphical presentation of CGI information for >30,000 genes. PubMeth (48) is a cancer methylation database that includes information on genes reported in the literature to be methylated in various cancer types. The information is extracted from PubMed abstracts using a text-mining approach, GoldMine, followed by manual annotation. There are two options for searching the database: 'gene-centric' (the cancer types/subtypes in which the genes of interest are reported to be methylated) and 'cancer-centric' (the genes reported to be methylated in a particular cancer type/subtype). In ChromoHub v2 (49), chemical, structural and biological data extracted from public repositories, such as TCGA and ICGC, are mapped onto phylogenetic trees of protein families involved in chromatin-mediated signaling.
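The two PubMeth search modes are, in effect, two directions of lookup over the same curated gene-cancer associations. A minimal sketch, assuming hypothetical (gene, cancer type) pairs; it illustrates the idea only and is not PubMeth's implementation:

```python
from collections import defaultdict

# Hypothetical curated reports: (gene reported as methylated, cancer type).
REPORTS = [
    ("MLH1", "colorectal"),
    ("MLH1", "endometrial"),
    ("CDKN2A", "colorectal"),
    ("CDKN2A", "lung"),
    ("GSTP1", "prostate"),
]


def build_indexes(pairs):
    """Build both lookup directions from the same (gene, cancer) pairs."""
    by_gene, by_cancer = defaultdict(set), defaultdict(set)
    for gene, cancer in pairs:
        by_gene[gene].add(cancer)    # gene-centric: gene -> cancer types
        by_cancer[cancer].add(gene)  # cancer-centric: cancer type -> genes
    return by_gene, by_cancer


by_gene, by_cancer = build_indexes(REPORTS)
print(sorted(by_gene["MLH1"]))          # gene-centric query
print(sorted(by_cancer["colorectal"]))  # cancer-centric query
```

Building both indexes from one list of curated pairs keeps the two search modes consistent with each other.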
OncomiRs. OncomiRs, microRNAs associated with diverse cancer-related processes, play a significant role in the epigenetic regulation of cancer. The miRCancer database (50) provides a comprehensive collection of microRNA expression profiles in various human malignancies, automatically extracted from publications in PubMed. OncomiRDB (51) is a database developed for annotating experimentally validated oncomiRs from the literature. The database includes 2,259 entries of oncomiR regulations, covering 328 miRNAs and 829 target genes in 25 cancer tissues, extracted from the published literature. The user can search by miRNA, tissue, tumor, target gene and function (e.g., proliferation, apoptosis, migration).
Transcriptomics. Databases have been designed to extract, store and interpret data from large-scale and genome-wide expression studies. Oncomine (52) is a cancer microarray database that collects and curates 715 gene expression datasets, comprising 86,733 samples and associated clinical data, from most major types of cancer. Oncomine allows a user to conduct a gene-centric search, which retrieves the differential expression analyses of a gene of interest across all available datasets, or a study-centric search, which returns the genes that are differentially expressed in the selected study. To facilitate data mining, the current version of Oncomine enables multi-gene search, Gene Ontology-based filtering and integration of Oncomine concepts. Integrated Tumor Transcriptome Array and Clinical data Analysis (ITTACA) (53) is a central repository of transcriptome microarray and associated clinical data from breast carcinoma, bladder carcinoma and uveal melanoma. Its web interface offers different options for class comparison analyses, such as the comparison of expression distribution profiles and patient survival analyses. The user can analyze the differential expression of one or more genes between two groups of samples with different phenotypes and, conversely, identify the genes differentially expressed between two groups of samples (Fig. 3).
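The gene- and study-centric comparisons above rest on a two-group differential expression test. As an illustration only, the sketch below uses Welch's t statistic on hypothetical log2 expression values; the choice of statistic, the gene names and the values are assumptions, not the exact procedure used by Oncomine or ITTACA:

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1 denominator)
    na, nb = len(group_a), len(group_b)
    se2 = va / na + vb / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical log2 expression of each gene in phenotype group 1 vs group 2.
EXPR = {
    "GENE_X": ([8.1, 8.4, 7.9, 8.6], [6.2, 6.8, 6.5, 6.1]),
    "GENE_Y": ([5.0, 5.2, 4.9, 5.1], [5.1, 4.8, 5.0, 5.2]),
}

# Study-centric view: rank all genes by |t| between the two groups.
ranked = sorted(EXPR, key=lambda g: -abs(welch_t(*EXPR[g])[0]))
for gene in ranked:
    t, df = welch_t(*EXPR[gene])
    print(f"{gene}\tt={t:.2f}\tdf={df:.1f}")
```

Welch's form is used here because it does not assume equal variances in the two phenotype groups; a real analysis would also convert t to a p-value and correct for multiple testing across genes.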
The Cancer Gene Expression Database (CGED) (54) contains cancer gene expression profiles and related clinical information. The expression data are obtained by adaptor-tagged competitive PCR from breast, colorectal, esophageal, gastric, hepatocellular, lung and thyroid cancers and glioma. The database can be queried using either gene identifiers or functional categories. Mosaic plots are used to visualize gene expression data and to compare the expression patterns of various genes. CancerMA (55) is an integrated bioinformatic pipeline for the automated identification of novel candidate cancer biomarkers, which analyzes the expression profiles of a user-defined gene list across experimentally verified public cancer microarray datasets [GEO, ArrayExpress (56)]. A total of 80 microarray datasets covering 13 types of cancer are available.
Proteins. Differentially expressed proteins (DEPs) that contribute to the onset and progression of cancer have been identified. The first database of DEPs in human cancers, dbDEPC (57,58), currently contains 4,029 DEPs curated from 331 mass spectrometry experiments across 20 types of human cancer. This resource enables users to investigate whether a protein of interest is altered in particular cancers and to create an association network of query proteins. Moreover, dbDEPC displays a heatmap representing the expression profiles of a given protein across various cancer types. An example of how to query dbDEPC is shown in Fig. 4.
Phosphorylation. The genes encoding protein kinases, enzymes that phosphorylate proteins, are among the most commonly mutated genes in human cancers. The MoKCa database (59) provides a collection of the mutations present in protein kinases involved in cancer, along with structural and functional annotation and, wherever possible, a prediction of the impact of these mutations on the structure and function of the kinases. The user selects from a pull-down list the gene that codes for a protein kinase; information is then available on the types of mutations (e.g., missense, silent) found in tumor cell lines, and the mutated amino acid residues are mapped onto the tertiary structures of the affected protein kinase domains.
Cell lines. Cancers are thought to be initiated and maintained by a subpopulation of stem or stem-like cells with tumorigenic potential (60). SCDE (61,62) is an integrated repository of curated tissue and cancer stem cell data from blood, brain and intestine. The datasets are homogenized with regard to structure, formatting and annotation and are stored in the Investigation/Study/Assay-tab (ISA-Tab) format. SCDE is linked to the Galaxy framework, which provides a series of analytical tools for comparing these data with genes, molecular signatures and pathways. The CellLineNavigator (63) database contains uniformly generated gene expression profiles of >300 human cancer cell lines. On the basis of their phenotypic attributes, these cell lines were further classified into 28 tissues of origin and 57 different disease states. The database is also linked to advanced bioinformatic analysis tools and can be searched for ⅰ) differentially expressed genes; ⅱ) pathological or physiological states; and ⅲ) gene names or functional characteristics, such as Gene Ontology (GO) terms (64) and KEGG pathway maps. Any combination of these query options is also possible.
Immunomics. Tumor-associated antigens (TAAs) have been applied extensively in the clinical diagnosis and treatment of human cancers. The publicly available Human Potential Tumor Associated Antigen (HPtaa) (66) database contains potential TAAs identified by in silico computation. HPtaa incorporates publicly available microarray expression data, GEO's SAGE data and UniGene (67) expression data, as well as other relevant knowledge bases such as CGAP. Currently, a total of 3,518 potential targets are included in the database. A web query interface enables users to search for potential TAAs overexpressed in several cancer types with particular gene features, including chromosome (X and Y or euchromosome), coding capacity (protein-coding genes or otherwise) and subcellular location (membrane or secretory proteins).
The human immune response (humoral and cellular) to an increasing number of TAAs has also been well documented. The Academy of Cancer Immunology, supported by the Ludwig Institute for Cancer Research (both in New York, NY, USA), has established the Cancer Immunome Database (CID) (68), which provides information on all the gene products against which an immune response has been reported in cancer patients. The user can access information on the genes that encode the cancer antigens. Given that a gene can yield multiple antigenic epitopes, the frequency with which each epitope is recognized by sera or cells from healthy and diseased individuals is reported. Wherever appropriate, access is provided to experimental evidence (serological results, microscope images and cytotoxicity assays) or to patient information (the type of cancer, the disease stage and the time at which the samples were obtained). CTdatabase (69) is a curated repository of annotated and computationally predicted cancer-testis (CT) antigens. The CT antigens are broadly classified according to their expression pattern in healthy human tissues. CTdatabase also provides information on genes, verified splice variants, genomic locations, gene duplications and bibliographic references.
Anticancer agents. Knowledge bases dedicated to cancer translational research and to the identification of drugs and compounds that inhibit cancer-related target genes are also available. CanSAR (70) is a public resource that supports cancer translational research and drug discovery through the integration of biological, chemical, pharmacological and disease data, structural biology and cellular networks. Through a single portal, the user can access information on genes, protein families, cell lines and compounds, as well as approved drugs and clinical candidates associated with cancer. CancerResource (71) is a comprehensive knowledge base that integrates cancer-relevant relationships between compounds/drugs and targets, deduced from text mining of >19 million PubMed abstracts and from external resources such as the Therapeutic Target Database (TTD) (72), the Comparative Toxicogenomics Database (CTD) (73), the Pharmacogenomics Knowledge Base (PharmGKB) (74) and DrugBank (75). CancerResource can be queried by Cancer (the user can view the genes expressed in specific cancer tissues and browse cancer-related KEGG pathways), Drug (the user can search by compound or drug and obtain information on the cancer relevance of the query drug/compound and its interactions with targets) and Target (drugs interacting with targets). The Anticancer Agent Mechanism Database (76,77) contains a list of 122 compounds with anticancer activity, classified by their mechanism of action into alkylating agents, topoisomerase Ⅰ/Ⅱ inhibitors, RNA/DNA antimetabolites and antimitotic agents. This set was generated by neural networks able to predict the mechanism of action of a drug from its pattern of activity against a diverse panel of human tumor cell lines in the NCI drug screening program.
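The idea of assigning a mechanism of action from a compound's activity pattern across a cell-line panel can be sketched with a much simpler technique than the neural networks used to build the database: nearest-centroid classification by Pearson correlation. The class profiles and the five-cell-line panel below are hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Hypothetical mean activity profiles (e.g. -log GI50 in 5 cell lines)
# for each mechanism class; not data from the NCI screen.
CENTROIDS = {
    "alkylating agent": [5.1, 6.0, 4.2, 5.5, 4.8],
    "topoisomerase I inhibitor": [7.2, 5.0, 6.8, 4.1, 6.0],
    "antimitotic agent": [8.0, 8.3, 5.2, 5.0, 7.9],
}


def predict_mechanism(profile):
    """Assign the class whose centroid pattern correlates best with the profile."""
    return max(CENTROIDS, key=lambda cls: pearson(profile, CENTROIDS[cls]))


print(predict_mechanism([7.8, 8.1, 5.5, 5.3, 7.6]))
```

Correlation compares the shape of the activity pattern rather than overall potency, which is why a compound that is uniformly weaker than a class centroid can still be assigned to that class.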
Drug resistance. A major obstacle in cancer therapy is the development of drug resistance caused by mutations in drug targets. It is therefore important to identify the mutations in drug targets that are responsible for drug resistance. CancerDR (78) provides information on 148 anticancer drugs and their pharmacological profiles across ~1,000 cancer cell lines. The pharmacological profiling information for these drugs was collected from the Cancer Cell Line Encyclopedia (CCLE) (79) and COSMIC databases. CancerDR provides information on each drug target (cancer gene) corresponding to these anticancer drugs, such as the gene sequences in the respective cancer cell lines, mutations, function and structure. The database allows users to search for drug targets, drugs, cell lines and structures.
Clustering cell lines on the basis of their sensitivity to drugs acting on a given target allows users to identify groups of cell lines that are resistant to a particular anticancer drug, as well as multipotent drugs effective against a wide range of cancer cell lines. Clustering the sequences of a drug target helps to identify the mutants/variants of that target.
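Grouping cell lines by drug sensitivity can be illustrated with single-linkage clustering cut at a distance threshold. This is a minimal sketch, assuming hypothetical log10 IC50 profiles and an arbitrary threshold; CancerDR's actual clustering method may differ:

```python
import math
from itertools import combinations

# Hypothetical log10 IC50 values (µM) of three drugs in each cell line.
IC50 = {
    "line_1": [0.1, 1.2, -0.5],
    "line_2": [0.2, 1.1, -0.4],
    "line_3": [2.0, 2.5, 1.8],
    "line_4": [2.1, 2.4, 1.9],
    "line_5": [0.9, -1.0, 0.3],
}


def single_linkage(profiles, threshold):
    """Group lines linked by chains of Euclidean distances <= threshold.

    Equivalent to cutting a single-linkage dendrogram at `threshold`,
    implemented with a small union-find structure.
    """
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if math.dist(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


print(single_linkage(IC50, threshold=0.5))
```

Here line_3 and line_4 share uniformly high IC50 values, so they form a candidate resistant group, while line_5 stays on its own because its profile differs from every other line.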
Integrative resources. IntOGen (80) is an integrative resource of high-throughput data on genomic, transcriptional and mutational alterations and on the modules (e.g., GO terms, KEGG pathways) involved in carcinogenesis. IntOGen collects data from various resources such as COSMIC, GEO, ArrayExpress, Progenetix (81), TCGA and CGP. Tumor samples in IntOGen are annotated with terms from the International Classification of Diseases for Oncology (ICD-O) (82), in which tumors are classified by topography (location in the human body) and histology (morphology). IntOGen can be queried by genes, projects, cancer sites and pathways. The BioMart portal (83) enables more complex queries and the bulk download of all analysis results, and its interface offers a number of filters and attributes. IntOGen BioMart can be queried based on ⅰ) IntOGen Experiments, where the user can query gene or module (e.g., GO terms, KEGG pathways) information; ⅱ) IntOGen Combinations, where the user can query a combination of experiments annotated with the same ICD-O term; and ⅲ) IntOGen Oncomodules, which enables the user to search for combinations and experiments (Fig. 5). NCG 4.0 (84,85) is the current version of the Network of Cancer Genes, a repository of systems-level properties of cancer genes and oncomiRs (cancer-related microRNAs). It compiles information on 2,000 cancer genes reported in the literature to be mutated in 23 different types of cancer, collected from 3,460 whole-exome and whole-genome screenings of cancer samples. NCG 4.0 reports the duplicability, functional annotation and evolutionary origin of these genes, as well as their interactions with other human proteins and microRNAs.

Cancer type-specific databases
Databases that focus on certain types or subtypes of cancer are also available (Table Ⅰ).
The Cervical Cancer gene DataBase (CCDB) (86) contains a manually curated list of experimentally validated genes reported to be involved in different aspects of cervical carcinogenesis. Each record includes information on the gene of interest, such as gene structure, chromosomal location, homology, ontology and the mRNA/CDS/protein sequences of each isoform encoded by the gene, as well as links to the original PubMed references and to external databases [e.g., HGNC, Human Protein Reference Database (HPRD) (87), HomoloGene (67), PharmGKB, PDB]. The database can be queried by ⅰ) Gene name (the user obtains information pertinent to the query gene); ⅱ) Category, where the genes are grouped into categories; and ⅲ) Chromosome number, to view all cervical cancer-related genes present on a particular chromosome (Fig. 6).
The Dragon Database of Genes associated with Prostate Cancer (DDPC) (88) is an integrated resource of genes that have been experimentally confirmed to be involved in prostate cancer. DDPC provides information on each gene, such as experimental evidence, associated pathways, orthologous genes, gene ontologies and related proteins. The user can select a gene from a pull-down list or search the database for genes using a combination of one or more options, including anatomical system, cell line, KEGG pathways and gene ontology. DDPC also contains a list of predicted transcription factor-binding sites in the promoters of the genes included in the database. Moreover, the database lists DrugBank drugs reported to be associated with prostate cancer.
The curatedOvarianData (89) resource provides gene expression data and documented clinical annotations for 2,970 ovarian cancer patients from 23 studies across 11 microarray platforms. The data are made available as ExpressionSet objects for R/Bioconductor (90). The gene expression datasets are obtained from public databases, processed in a uniform manner and mapped to standard HGNC gene symbols (34).
Figure 6. Results of a gene-centered search of the Cervical Cancer gene DataBase (CCDB) using BCL2 as the query. Information is provided on the gene ID, gene description, synonyms, chromosomal location and the molecules with which the gene interacts. There are also links to mRNA/CDS/protein sequence entries, to homologous genes from various species and to Gene Ontology information.
The Genes-to-Systems Breast Cancer (G2SBC) Database (91) is an integrated resource of genes, transcripts and proteins reported in the literature to be dysregulated in breast cancer. In G2SBC, the analysis is performed at different levels: the molecular components level, where the analysis is performed on genes, transcripts and proteins; the molecular systems level, where the analysis is based on biological processes and protein-protein interaction networks; and the cellular systems level, where the user can browse and simulate mathematical models of carcinogenesis, tumor growth and response to treatment. An ontology-based query system is also available for annotations associated with particular ontologies.
HLungDB (92) is an integrated resource of lung cancer-related genes, proteins and microRNAs, together with pertinent clinical information extracted manually from the scientific literature. Each entry in the database describes the relationship between a gene and lung cancer, containing detailed information on the gene, its expression pattern (up- or downregulated) in patients, experimentally verified information (e.g., transcription factor-binding sites in the gene promoter) and protein-protein interaction networks. The database includes miRNAs that are differentially expressed in lung cancer or reported to be associated with it, along with their experimentally verified targets. HLungDB is cross-linked to relevant external resources, including PubMed, HPRD, HUGO, IPI, EBI and KEGG. The lung cancer-related genes can be viewed either from a pull-down list sorted alphabetically or by chromosome, where the user can view all cancer-related genes located on the selected chromosome.
The Osteosarcoma Database (93) is a repository of osteosarcoma (OS)-relevant genes and microRNAs. The data stored in the database are extracted from PubMed using an automated dictionary-based gene and microRNA recognition procedure, followed by manual review and annotation. Currently, the database contains 911 protein-coding genes and 81 microRNAs deduced from 1,331 abstracts. The user can search by gene or microRNA, and each entry is linked to PubMed.
The Pancreatic Expression Database (PED) (94), powered by the BioMart software, is a comprehensive resource of pancreatic cancer data from the literature, obtained with a range of technologies including genomics, transcriptomics, proteomics and miRNA profiling. PED includes tools for mining data with combinations of queries (e.g., gene expression and CNAs). The use of BioMart facilitates interoperability with other BioMart-compliant cancer resources, allowing users to extend their investigations to relevant resources such as Reactome, PRIDE and COSMIC.
The Renal Cancer Gene Database (RCDB) (95) is a manually curated repository of protein-coding genes and miRNAs associated with various forms of renal cell carcinoma (RCC). The protein-coding genes are classified into six categories according to the type of alteration observed in RCC: ⅰ) methylation; ⅱ) overexpression; ⅲ) downregulation; ⅳ) mutation; ⅴ) translocation; and ⅵ) unclassified. RCDB also includes the miRNAs dysregulated in RCC. Users can query the protein-coding genes and miRNAs by keyword, category or, in the case of genes, chromosome. The ViroBLAST (96) tool is used to search a user-defined sequence against the sequences available in RCDB.
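Queries by keyword, category or chromosome combine naturally as optional filters over gene records. This is a minimal sketch, assuming a hypothetical record layout; the four entries are illustrative examples of RCC-associated genes and are not records copied from RCDB:

```python
from dataclasses import dataclass

# The six alteration categories described for RCDB protein-coding genes.
CATEGORIES = ("methylation", "overexpression", "downregulation",
              "mutation", "translocation", "unclassified")


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical record layout for one curated gene entry."""
    symbol: str
    chromosome: str
    category: str
    description: str


# Illustrative entries only, not taken from RCDB.
RECORDS = [
    GeneRecord("VHL", "3", "mutation", "von Hippel-Lindau tumor suppressor"),
    GeneRecord("TFE3", "X", "translocation", "transcription factor binding to IGHM enhancer 3"),
    GeneRecord("RASSF1", "3", "methylation", "Ras association domain family member 1"),
    GeneRecord("CA9", "9", "overexpression", "carbonic anhydrase 9"),
]


def query(records, keyword=None, category=None, chromosome=None):
    """Return symbols matching all supplied filters (None means 'any')."""
    if category is not None and category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    hits = []
    for r in records:
        text = f"{r.symbol} {r.description}".lower()
        if keyword and keyword.lower() not in text:
            continue
        if category and r.category != category:
            continue
        if chromosome and r.chromosome != chromosome:
            continue
        hits.append(r.symbol)
    return hits


print(query(RECORDS, chromosome="3"))
print(query(RECORDS, keyword="tumor suppressor"))
```

Treating every filter as optional lets the same function serve the keyword, category and chromosome searches, alone or in combination.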