The nuclear receptor (NR) superfamily comprises transcription factors, most of which are ligand-activated, that play a pivotal role in biological functions essential for life, such as metabolism and homeostasis. Following activation, they regulate the transcription of their target genes with the help of co-regulator proteins, which renders them very promising pharmacological targets. In total, 59 NRs have been discovered in several species of the Animalia kingdom, 20 of which are still orphan receptors. The present study aimed to further elucidate the evolutionary relationships among members of the NR superfamily. An updated evolutionary analysis of the NR protein superfamily was performed with the aim of clustering all the NRs and identifying conserved regions and motifs that play major roles in their signaling pathways; the mechanisms of action were also investigated. The findings of the present study demonstrate a clear separation of the NR family into three major monophyletic branches, the steroid hormone-related, the thyroid hormone-related and the retinoid X receptor-related clusters, which may correspond to three ancestral NRs that diverged from a common ancestral NR.
The transcription factors that form the nuclear receptor (NR) superfamily are either ligand-activated or orphan proteins that regulate numerous physiological processes, such as metabolism, reproduction and homeostasis in humans, as well as in other organisms of the animal kingdom (
In order to further elucidate the function of NRs, it is crucial to understand their structure (
The phylogenetic analysis of the NR superfamily has revealed that there is no connection between the chemical nature of a ligand and the evolutionary origin of the corresponding receptor (
NR ligands are lipophilic hormones that can enter the cell by passive transport. Once inside the cell, the hormone binds to its cognate receptor, which is located in the cytoplasm or nucleus, usually bound to other proteins. Once these proteins are released and the hormone is bound, the receptor is activated; it then binds to DNA and regulates the transcription of target genes. In the case of cytoplasmic receptors, the binding of the hormone induces their entry into the nucleus, where the hormone-receptor complex acts. NRs bind to DNA at specific sequences known as hormone response elements (HREs). NR targets are genes regulated by promoters that contain HREs. The regulation of the transcription of these genes by NRs is usually accomplished with the help of proteins known as co-regulators. These proteins fall into two broad categories: co-activators, which interact with NRs in the AF-2 region via an LXXLL motif (where L denotes leucine and X any amino acid) and help activate gene transcription, and co-repressors, which bind to the same region via conserved (L/I)XX(I/V)I or LXXX(I/L)XXX(I/L) motifs (where L denotes leucine, I isoleucine, V valine and X any amino acid) and suppress the transcription of target genes (
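The co-regulator motifs described above can be written as simple patterns and searched for in a protein sequence. The following is a minimal sketch in Python, assuming the three motifs exactly as stated in the text; the function name `find_motifs` and the toy peptide are hypothetical and are not taken from the present study. A pattern match only flags a candidate site and does not by itself demonstrate co-regulator binding.

```python
import re

# Motifs as stated in the text, written as regular expressions
# ("." stands for X, i.e. any amino acid).
MOTIFS = {
    "co-activator LXXLL": r"L..LL",
    "co-repressor (L/I)XX(I/V)I": r"[LI]..[IV]I",
    "co-repressor LXXX(I/L)XXX(I/L)": r"L...[IL]...[IL]",
}

def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched text), ...]}."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # The lookahead allows overlapping occurrences to be reported.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits

# Hypothetical toy peptide, not a real co-regulator sequence.
toy_peptide = "AAKLLQLLGGLEAIIRAA"
for motif, found in find_motifs(toy_peptide).items():
    print(motif, found)
```

Reporting 1-based positions keeps the output comparable with residue numbering in sequence databases.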
NRs can be categorized based on their mechanisms of action. Category 1 includes homodimeric steroid hormone receptors (SRs), which are activated by cholesterol-derived steroid hormones, such as estrogens, androgens, corticosteroids and progestogens. In the absence of the ligand, these receptors are located in the cytoplasm bound to chaperone proteins; following their activation by the ligand, they are released from the chaperone proteins and transported to the nucleus. In the nucleus, SRs form homodimers and bind to specific DNA sequences (HREs), which consist of two inverted repeats. Category 2 includes RXR-containing heterodimeric receptors, such as RAR and LXR, which often remain in the nucleus regardless of the presence of a ligand. Following the binding of the ligand, they form heterodimers with RXR receptors and bind to a DNA element with direct repeats of HREs. Category 3 includes homodimeric 'orphan receptors', so named because their cognate hormones are unknown. Receptors of this category bind as homodimers to recognition sequences (HREs) that are arranged as direct repeats. Finally, category 4 comprises the 'orphan receptor' monomers, which bind as monomers to asymmetric recognition sequences (HREs) (
An NR that has been of considerable interest to the scientific community is the glucocorticoid receptor (GR). This receptor is a ligand-dependent transcription factor, which is activated by glucocorticoid binding and then binds to glucocorticoid response elements (GREs) in the promoters of target genes (
Mutations in GR disrupt glucocorticoid signal transduction, leading to generalized resistance or hypersensitivity to glucocorticoids. A widely studied pathological condition is primary generalized glucocorticoid resistance (PGGR), a rare familial or sporadic disease characterized by partial or general resistance of specific tissues to cortisol. This resistance results in the activation of the hypothalamic-pituitary-adrenal (HPA) axis to compensate for the reduced activity of glucocorticoids in the target tissues, and in the increased secretion of adrenocorticotropic hormone (ACTH) into the systemic circulation. This excessive secretion of ACTH leads to adrenal hyperfunction and to the increased secretion of cortisol and other steroid hormones, such as androgens and mineralocorticoids (
The molecular basis of Chrousos syndrome has been attributed mainly to mutations in exons 5-9 of the human NR3C1 gene, which encode the LBD; these mutations affect the mechanisms of action of hGRα (
Among the 26 loss-of-function mutations studied, six are located in the DBD and 20 in the LBD (
Protein sequences of all members of the NR superfamily were extracted from the NCBI protein database (
The phylogenetic analysis of the NR superfamily was performed using the MATLAB Bioinformatics Toolbox (
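The tree-building step can be illustrated with a short sketch. The code below is not the MATLAB code used in the present study; it is a plain Python illustration, under stated assumptions, of the neighbor-joining method named in the phylogenetic tree figure legend. The p-distance (fraction of differing residues over ungapped columns) is an assumed distance measure chosen for simplicity, and all function names and toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aligned):
    """Pairwise p-distances for a list of equal-length aligned sequences."""
    return [[p_distance(a, b) for b in aligned] for a in aligned]

def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) from a symmetric distance matrix.

    Assumes at least two taxa with unique names.
    """
    nodes = list(names)
    D = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * D[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = D[a][b] - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        D[new] = {}
        for c in nodes:
            if c not in (a, b):
                d = 0.5 * (D[a][c] + D[b][c] - D[a][b])
                D[new][c] = d
                D[c][new] = d
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{D[a][b]:.3f},{b});"

# Hypothetical toy alignment, not taken from the study dataset.
toy_names = ["seq1", "seq2", "seq3", "seq4"]
toy_alignment = ["CEGCKAFF", "CEGCKGFF", "CGSCKVFF", "CGSCKVFL"]
print(neighbor_joining(toy_names, distance_matrix(toy_alignment)))
```

For a dataset of several hundred sequences, an established phylogenetics package would be the more practical choice; the sketch only makes the joining criterion and the branch-length formulas explicit.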
The primary dataset extracted from NCBI included 110,000 protein sequences related to NRs. Irrelevant, hypothetical, partial, low-quality and predicted proteins were eliminated from the dataset. Due to the large amount of remaining data and the presence of duplicate sequences, further filtering and selection of representative sequences were conducted. The representative protein sequences were selected manually for each class of every NR following multiple alignments and analyses within the subgroups of family members. Where possible, the same species were selected as representatives for each class; for example,
In the DBD, two highly conserved cysteine-rich zinc finger motifs have already been identified: a highly conserved pattern, termed the P-box (located at amino acid positions 908-921 of the MSA), is found in the first zinc finger, whereas a less conserved pattern, known as the D-box, is located in the second zinc finger motif (
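Conserved regions such as the P-box can be located in a multiple sequence alignment (MSA) by scoring each column and reporting runs of high-scoring columns. The following minimal Python sketch assumes a simple score (the fraction of sequences that share the most common residue in a column, with gaps counted as mismatches); the threshold, the minimum run length, the function names and the toy alignment are hypothetical and do not reproduce the procedure of the present study.

```python
from collections import Counter

def column_conservation(msa):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps ("-") never count as the shared residue, so they lower the score.
    """
    n = len(msa)
    scores = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n)
    return scores

def conserved_windows(scores, threshold=0.9, min_len=3):
    """Return 1-based inclusive (start, end) runs of columns scoring >= threshold."""
    windows, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                windows.append((start + 1, i))
            start = None
    if start is not None and len(scores) - start >= min_len:
        windows.append((start + 1, len(scores)))
    return windows

# Hypothetical toy alignment, not taken from the study dataset.
toy_msa = [
    "ACEGCKAF",
    "TCEGCKGF",
    "-CEGCKSF",
    "ACEGCKAL",
]
scores = column_conservation(toy_msa)
print(conserved_windows(scores))
```

The reported positions refer to alignment columns, in the same way that the P-box coordinates above refer to MSA positions rather than to residues of any single protein.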
The phylogenetic analysis revealed a separation of the NR superfamily members into three major monophyletic branches. Each branch includes one or more NR subfamilies, and as shown in
By observing the phylogenetic tree (
NRs are a group of proteins that regulate a large number of biological processes that are important for life. The majority of NRs are activated by the binding of small lipophilic molecules, whereas the ligands of others are not yet known. The primary function of NRs is the cell type- and promoter-specific transcriptional regulation of target genes under their control, through the recruitment of negative or positive regulatory proteins, known as co-repressors and co-activators, respectively (
The first NR to be sequenced was the human GR followed by the estrogen receptor (ER). Overall, 48 NR members from the superfamily have been found in humans to date; however, in other organisms of the animal kingdom, >900 have been identified (
The evolutionary study of the NR superfamily is crucial, since its members play a pivotal role in the regulation of numerous physiological and pathophysiological processes in all organisms of the animal kingdom. In the present study, six conserved motifs were identified and are considered to be key target regions for the development of novel pharmaceutical agents. A major challenge in this case is achieving the desired selectivity within each subfamily of related NRs that bind very similar ligands (
A structural analysis of the NR members is required, since the full three-dimensional structure of an NR remains unknown. Difficulties have been encountered in the crystallization of the DBD of the NRs. A structural analysis of the NRs may provide useful insight into NR evolution in the future. However, the structural analysis of the LBD of the NRs by Mitsis
In conclusion, in the present study, an updated comprehensive sequence analysis of the NR superfamily was performed. A significant amount of the sequence data available for the NR superfamily was used by selecting representative protein sequences for every phylum and every class of each member of the superfamily. Thus, through different filtering techniques, a final dataset of 333 unique, non-duplicate, representative protein sequences was formed and used for further research. Considering the important role NRs play in 'switching on and off' genes, they have great potential as innovative drug targets for a variety of diseases, including cancer. In the present study, an updated phylogenetic tree of the NR superfamily was created that provides useful information on the groups formed within the superfamily and their evolution. This knowledge may provide the basis for associating NR members in several respects, including signaling pathways and biological activities.
Not applicable.
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
LP, LS, KP, EP, SM, KD, NCN, AG, FB, GPC, EE and DV contributed to the conceptualization and design of the study, as well as to the writing, drafting, revising, editing and reviewing of the manuscript. All authors confirm the authenticity of all the raw data. All authors have read and approved the final manuscript.
Not applicable.
Not applicable.
GPC is an Editorial Advisor of the journal, but had no personal involvement in the reviewing process, or any influence in terms of adjudicating on the final decision, for this article. The other authors declare that they have no competing interests.
Structural domains of the NR superfamily: N-terminal in orange, DNA binding domain (DBD) in red, hinge region in blue, ligand binding domain (LBD) in green and C-terminal domain in yellow. NR, nuclear receptor.
Phylogenetic tree of the 333 representative protein sequences of the nuclear receptor superfamily. The tree was calculated with the neighbor-joining method and processed with the MEGA and iTOL programs. The three main groups into which the tree is divided have been colored in orange, purple and orange-red. The receptors that belong to each group are shown on the side with colored stripes.
Nuclear receptor superfamily members used in the present study and the subfamilies they belong to.
| Nuclear receptor | Abbreviation | Subfamily | Group |
|---|---|---|---|
| Thyroid hormone receptor | TRα | NR1A1 | 1 |
| | TRβ | NR1A2 | |
| Retinoic acid receptor | RARα | NR1B1 | 1 |
| | RARβ | NR1B2 | |
| | RARγ | NR1B3 | |
| Peroxisome proliferator activated receptor | PPARα | NR1C1 | 1 |
| | PPARβ | NR1C2 | |
| | PPARγ | NR1C3 | |
| V-ErbA-related protein | Rev-Erbα | NR1D1 | 1 |
| | Rev-Erbβ | NR1D2 | |
| Ecdysone-induced protein 78C | Eip78C | NR1E1 | 1 |
| RAR related orphan receptor | RORα | NR1F1 | 1 |
| | RORβ | NR1F2 | |
| | RORγ | NR1F3 | |
| Liver X receptor | LXRα | NR1H2 | 1 |
| | LXRβ | NR1H3 | |
| Farnesoid X receptor | FXRα | NR1H4 | 1 |
| | FXRβ | NR1H5 | |
| Vitamin D receptor | VDR | NR1I1 | 1 |
| Pregnane X receptor | PXR | NR1I2 | 1 |
| Constitutive androstane receptor | CAR | NR1I3 | 1 |
| Nuclear receptor HR96 | HR96 | NR1J1 | 1 |
| Nuclear receptor HR8 | HR8 | NR1J2 | 1 |
| Nuclear receptor HR48 | HR48 | NR1J3 | 1 |
| Nuclear receptor HR1 | HR1 | NR1K1 | 1 |
| V-erbA-related protein 2 | EAR-2 | NR2F3 | 2 |
| Steroid hormone receptor cnr14 | Cnr14 | NR1G1 | 2 |
| Estrogen receptor | ERα | NR3A1 | 2 |
| | ERβ | NR3A2 | |
| Estrogen related receptor | ERRα | NR3B1 | 2 |
| | ERRβ | NR3B2 | |
| | ERRγ | NR3B3 | |
| Glucocorticoid receptor | GR | NR3C1 | 2 |
| Mineralocorticoid receptor | MR | NR3C2 | 2 |
| Progesterone receptor | PR | NR3C3 | 2 |
| Androgen receptor | AR | NR3C4 | 2 |
| Nerve growth factor IB | NGFIB | NR4A1 | 2 |
| Nuclear receptor related 1 | NURR1 | NR4A2 | 2 |
| Neuron-derived orphan receptor 1 | NOR-1 | NR4A3 | 2 |
| Steroidogenic factor 1 | SF-1 | NR5A1 | 2 |
| Liver receptor homolog 1 | LRH-1 | NR5A2 | 2 |
| Nuclear hormone receptor FTZ-F1 beta | FTZ-F1β | NR5B1 | 2 |
| Germ cell nuclear factor | GCNF | NR6A1 | 3 |
| Zygotic gap protein knirps | kni | NR0A1 | 3 |
| Dosage-sensitive sex reversal | DSS | NR0B1 | 3 |
| Small heterodimer partner | SHP | NR0B2 | 3 |
| Ecdysone receptor | EcR | NR1H1 | 3 |
| Hepatocyte nuclear factor 4 | HNF4α | NR2A1 | 3 |
| | HNF4γ | NR2A2 | |
| Retinoid X receptor | RXRα | NR2B1 | 3 |
| | RXRβ | NR2B2 | |
| | RXRγ | NR2B3 | |
| Ultraspiracle | USP | NR2B4 | 3 |
| Testicular receptor | TR2 | NR2C1 | 3 |
| | TR4 | NR2C2 | |
| Tailless-related receptor | TLX | NR2E1 | 3 |
| Photoreceptor specific nuclear receptor | PNR | NR2E2 | 3 |
| COUP transcription factor | COUP-TF1 | NR2F1 | 3 |
| | COUP-TF2 | NR2F2 | |
Taxa of the Animalia kingdom in which NR family members have been identified.
| Domain | Kingdom | Phylum | Class |
|---|---|---|---|
| Eukaryotes | Animals | Chordates | Mammals |
| | | | Birds |
| | | | Fish |
| | | | Turtles |
| | | | Amphibians |
| | | | Lizards |
| | | Arthropods | Insects |
| | | | Arachnids |
| | | | Crustaceans |
| | | | Horseshoe crabs |
| | | Nematodes | |
| | | Molluscs | |
| | | Flatworms | |