- Source: C16orf86
Uncharacterized protein C16orf86 is a protein that in humans is encoded by the C16orf86 gene. It is predicted to consist mostly of alpha helices and is expressed most highly in the testes, with additional expression in tissues such as the kidney, colon, brain, fat, spleen, and liver. The function of C16orf86 is not well understood. It could be a transcription factor in the nucleus that regulates G0/G1 in the cell cycle in tissues such as the kidney, brain, and skeletal muscle, as suggested by the DNA microarray data discussed below.
Function
The function of the C16orf86 protein is still not well understood. Based on the DNA microarray data and the post-translational modification data below, this protein could be a transcription factor in the nucleus that regulates G0/G1 in the cell cycle in tissues such as the kidney, brain, and skeletal muscle.
Localization
= Tissue
=C16orf86 is highly expressed in the testes and is also expressed in the kidney, colon, brain, fat, spleen, and liver.
C16orf86 microarray data were retrieved through NCBI UniGene and the corresponding GEO Profiles entries for C16orf86. The data below show C16orf86 tissue expression patterns in experiments on cell cycle regulation in kidney cells, colon cancer cells, and adipose tissue.
The DNA microarray data below come from an experiment that compared MIF-deficient cells with control cells using cDNA. The results showed that the cytoplasmic protein MIF promotes cell proliferation and cell cycle progression in kidney cells such as HEK293. When MIF is inhibited, p53 blocks progression through the G1/S transition. In addition, inhibition of E2F and AP1 together with activation of p53 results in cell cycle arrest at the G0/G1 phase in MIF-deficient cells. E2F and AP1 are transcription factors with binding sites in the C16orf86 promoter. E2F is important for cell cycle progression together with AP1; when MIF is inhibited, both are blocked and p53 takes over. C16orf86 could therefore be important in cell cycle progression in the kidney, a tissue in which it is expressed.
The DNA microarray data below come from purified T98G glioblastoma cells that were cycling, arrested in G0, or released into S phase for 10 to 16 hours. The researchers examined how pRB, p107, and p130 repress E2F target genes and how the p130 complex interacts with DP, RB-like, and other E2F transcription factors as part of the DREAM complex during cell cycle arrest. The results showed that E2F4, along with p130 and other transcription factors, mediates repression of the cell cycle as cells move from G1 to G0. On activation and entry into S phase, E2F1/2/3 bind with other transcription factors to activate transcription of cell cycle genes. C16orf86 could be important in cell cycle progression in the brain because binding sites for E2F4 and E2F1/2/3 are located in its promoter sequence.
The DNA microarray experiment below used Infinium HumanMethylation450 BeadChip arrays together with GWAS to determine the DNA methylation profiles of skeletal myoblasts at day 3, day 8, and day 15. These profiles were used to study myogenic cell differentiation, and the results showed that methylation patterns do affect it. One of the transcription factors examined in this experiment, MYF6, has a binding site in the C16orf86 promoter. This transcription factor is expected to be down-regulated during muscle cell differentiation. This can be seen in the data: expression responds when the stimulus is first introduced but never reaches its peak. This could mean that C16orf86 is involved in muscle cell differentiation in skeletal myoblasts.
= Subcellular
=The C16orf86 protein is predicted to localize mainly to the nucleus, with additional localization to the cytoplasm, mitochondria, and endoplasmic reticulum. These results were obtained with PSORT II, a protein tool available through Expasy. The human sequence was submitted to the tool, and the results were compared with those for its distant orthologs in the Weddell seal and red fox.
Gene
= Location
=C16orf86 (Chromosome 16 Open Reading Frame 86) is a gene found on the long arm of chromosome 16 at position q22.11. Its genomic sequence starts at base pair 67,667,030 and ends at base pair 67,668,590, and it is read in the forward direction on the positive strand.
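As a worked illustration of the coordinates quoted above, the following minimal sketch computes the genomic span of the gene. It assumes the two positions are 1-based and inclusive, which the text does not state; the function name `span_length` is hypothetical and chosen only for this example.

```python
# Coordinates quoted in the text; assumed here to be 1-based and inclusive.
GENE_START = 67_667_030
GENE_END = 67_668_590


def span_length(start: int, end: int) -> int:
    """Return the length in base pairs of a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


gene_length = span_length(GENE_START, GENE_END)
print(f"Genomic span: {gene_length} bp")
```

Under that assumption the span is 1,561 base pairs; with half-open coordinates it would be one base pair shorter.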
C16orf86 is part of the ENKD1 region. This region contains 3 genes, including the gene for the ENKD1 protein and its isoforms ENKD1 isoform X1 and ENKD1 isoform X2. Other genes located near C16orf86 are GFOD2 to the right, and ACD and PARD6A to the left.
= Exons and introns
=C16orf86 has a total of 4 exons. The first exon boundary falls between amino acids 34 and 35, at nucleotides G and T. The second exon boundary falls between amino acids 111 and 112, at nucleotides A and G. The third exon boundary falls between amino acids 184 and 186, at nucleotides C and G.
C16orf86 has a total of 3 introns.
= Length of coding gene
=The C16orf86 protein is 317 amino acids long. Translation starts at the methionine at amino acid 1 and continues to amino acid 317, where the stop codon ends the sequence.
= Isoforms
=There are 2 isoforms of C16orf86: uncharacterized protein C16orf86 isoform X1 and uncharacterized protein C16orf86 isoform X2.
Uncharacterized protein C16orf86 isoform X1 is 332 amino acids long and has a total of 2 exons and 1 intron.
Uncharacterized protein C16orf86 isoform X2 is 326 amino acids long and has a total of 4 exons and 3 introns.
Gene regulation
= Promoter
=There are three different promoter sequences for C16orf86. These promoter sequences were found using the Genomatix tool Gene2Promoter. Each was compared with the promoters of distant C16orf86 orthologs, together with the human C16orf86 sequence, in a Clustal Omega multiple sequence alignment. The promoter GXP_107609 matched the ortholog sequences more closely than the GXP_7544221 and GXP_6033384 promoters did.
= Transcription factor binding sites
=Transcription factor binding sites in the C16orf86 promoter (GXP_107609) were found using the Genomatix tool Gene2Promoter and its binding site analysis option. Binding sites were chosen based on a high matrix score and a high number of occurrences within the promoter. The transcription factors found in the conserved regions of the promoter sequence for C16orf86 (GXP_107609) were MYF3, MYF4, E2F, and CCCTC-binding factor. These transcription factors are all involved in cell cycle regulation.
Transcript level regulation
= 5'UTR region
=For C16orf86, a multiple sequence alignment (MSA) of the 5'UTRs of orangutans, gorillas, chimpanzees, macaques, and humans was done with Clustal Omega. The results of the MSA were compared with figures of the predicted structure of the 5'UTR, which were created using the bioinformatics tool mfold. The region of the 5'UTR that stood out in the MSA is base pairs 105 to 113. This region could form a stem-loop with a specific function or a role in protein interactions.
= 3'UTR region
=For C16orf86, a multiple sequence alignment (MSA) of the 3'UTRs of orangutans, gorillas, chimpanzees, macaques, and humans was done with Clustal Omega. The results of the MSA were compared with figures of the predicted structure of the 3'UTR, which were created using the bioinformatics tool mfold. The region of the 3'UTR that stood out in the MSA is base pairs 1294 to 1300. This region could form a stem-loop with a specific function or a role in protein interactions.
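The idea of picking out conserved stretches from an alignment of UTR sequences can be sketched as follows. This is only an illustration under stated assumptions: the function `conserved_runs` and the three short aligned sequences are hypothetical placeholders, not the real C16orf86 UTRs and not the method used by Clustal Omega or mfold. It reports 1-based column ranges in which every aligned sequence carries the same base and no gap.

```python
def conserved_runs(aligned, min_length=3):
    """Return 1-based (start, end) column ranges that are identical in all rows.

    `aligned` is a list of equal-length strings taken from an alignment,
    with '-' marking gaps. Runs shorter than `min_length` are ignored.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    run_start = None  # 0-based start of the run currently being extended
    for col in range(length):
        column = {row[col] for row in aligned}
        identical = len(column) == 1 and "-" not in column
        if identical and run_start is None:
            run_start = col
        elif not identical and run_start is not None:
            if col - run_start >= min_length:
                runs.append((run_start + 1, col))
            run_start = None
    # Close a run that extends to the last column.
    if run_start is not None and length - run_start >= min_length:
        runs.append((run_start + 1, length))
    return runs


# Hypothetical toy alignment, used only to exercise the function.
toy_alignment = [
    "ACGUACGGA",
    "ACGUUCGGA",
    "ACGU-CGGA",
]
print(conserved_runs(toy_alignment))
```

Conserved runs found this way are only candidates; whether a run forms a stem-loop still has to be checked against a structure prediction.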
Structure
C16orf86 has been found to have a molecular weight of 33.5 kilodaltons and an isoelectric point (pI) of 5.30.
The C16orf86 protein sequence is rich in proline and glutamate, with a total of 39 prolines (P) and 39 glutamates (E). In addition, C16orf86 is low in asparagine (N), threonine (T), isoleucine (I), and phenylalanine (F), with 3 asparagines, 9 threonines, 2 isoleucines, and 1 phenylalanine. The abundance of glutamate makes the protein acidic, consistent with its low pI.
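The residue counts above can be turned into percentages of the whole protein with a short calculation. The sketch below assumes the 317-amino-acid length given elsewhere in this article; the function names are hypothetical, and `count_residues` is included only to show how such counts could be derived from a sequence string.

```python
from collections import Counter

PROTEIN_LENGTH = 317  # protein length quoted in the article (assumed here)

# Residue counts quoted in the text, keyed by one-letter code.
REPORTED_COUNTS = {"P": 39, "E": 39, "N": 3, "T": 9, "I": 2, "F": 1}


def percent_composition(counts, length):
    """Convert residue counts into percentages of the full sequence length."""
    return {aa: round(100 * n / length, 1) for aa, n in counts.items()}


def count_residues(sequence):
    """Count each one-letter residue code in a protein sequence string."""
    return Counter(sequence.upper())


reported_percent = percent_composition(REPORTED_COUNTS, PROTEIN_LENGTH)
for aa, pct in reported_percent.items():
    print(f"{aa}: {pct}%")
```

With these numbers, proline and glutamate each make up about 12.3% of the sequence, while phenylalanine accounts for about 0.3%.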
C16orf86 contains a Domain of Unknown Function (DUF4691) from amino acids 1 to 184 and a nuclear localization signal at amino acids 105–109. This figure was created using the Expasy PROSITE tool.
The nuclear localization signal of the C16orf86 protein spans amino acids 105 to 109 and has the sequence PKRKP in the forward direction. This pattern is conserved in humans and in distant orthologs such as the red fox and Weddell seal.
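Locating a short motif such as PKRKP in a protein sequence can be sketched with a plain string search. The sequence below is a hypothetical placeholder padded with alanines so that the motif lands at positions 105 to 109; it is not the real C16orf86 sequence, and `find_motif` is a name chosen for this example.

```python
def find_motif(sequence, motif):
    """Return 1-based (start, end) positions of every occurrence of `motif`.

    Overlapping occurrences are reported as separate hits.
    """
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits


# Hypothetical placeholder sequence: 104 alanines, the motif, 20 alanines.
toy_sequence = "A" * 104 + "PKRKP" + "A" * 20
print(find_motif(toy_sequence, "PKRKP"))
```

Running the same search on ortholog sequences would be one simple way to check whether the motif is conserved, although alignment-based comparison is more robust to insertions and deletions.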
= Secondary
=Overall, C16orf86 is predicted to contain more alpha helices than beta sheets. Phyre2 was used to predict the locations of alpha helices and beta sheets. Alpha helices are predicted with high confidence at amino acids 187–199, 231–244, 265–270, and 294–307. A beta strand is predicted with high confidence at amino acids 96–97.
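To put the predicted segments above in proportion, the following sketch counts the residues they cover. It assumes the ranges are inclusive and uses the 317-amino-acid length given elsewhere in the article; the helper name `residues_covered` is hypothetical.

```python
PROTEIN_LENGTH = 317  # protein length quoted in the article (assumed here)

# High-confidence segments quoted in the text, as inclusive residue ranges.
HELICES = [(187, 199), (231, 244), (265, 270), (294, 307)]
STRANDS = [(96, 97)]


def residues_covered(segments):
    """Total number of residues in a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in segments)


helix_residues = residues_covered(HELICES)
strand_residues = residues_covered(STRANDS)
helix_fraction = helix_residues / PROTEIN_LENGTH

print(f"Helix residues: {helix_residues}")
print(f"Strand residues: {strand_residues}")
print(f"Helix fraction: {helix_fraction:.1%}")
```

The listed helices cover 47 residues against 2 for the beta strand. These figures include only the high-confidence segments named in the text, so they are a lower bound on the helical content rather than a full secondary structure summary.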
= Tertiary
=PDB files for the predicted tertiary structure of C16orf86 were taken from Phyre2 and I-TASSER. The PDB files were loaded into the EzMol bioinformatics tool to render the tertiary structure. The figure labels amino acids at sites related to phosphorylation, nuclear localization signaling, and nuclear export signaling.
= Post-translational modifications
=C16orf86 post-translational modifications were predicted using protein modification tools from Expasy. The most notable sites for this protein were its nuclear export signals (leucine-rich regions), nuclear localization signals, and phosphorylation sites. The nuclear localization and export signals allow this protein to localize to the cell's nucleus. In addition, the protein sequence has phosphorylation sites for CDK5, GSK3, p38 MAPK, PKA, PKC, CDC2, ATM, CKII, and DNA-PK. These kinases all play roles in cell cycle regulation. There is also a conceptual translation for C16orf86 below showing the rest of the post-translational modifications.
Evolution
The orthologs were sorted by increasing date of divergence and by sequence similarity.
= Paralogs
=Searches with NCBI BLAST and BLAT found no paralogous sequences similar to C16orf86, which indicates that C16orf86 does not have any paralogs. The searches returned only isoforms of the sequence, not distinct full-length sequences.
= Orthologs
=C16orf86 orthologs are found in dogs, cows, rats, mice, and chimpanzees.
Ortholog space: C16orf86 orthologs are found only in placental mammals. No orthologs were found in other mammal groups, birds, reptiles, fungi, archaea, protists, plants, or invertebrates. The most distant ortholog among placental mammals was in Macroscelidea, the most diverged group, which split from the human lineage 102 million years ago.
= Homologs
=The most distant homologs, which match C16orf86 only in partial sequences, are found in marsupial mammals, reptiles, and fish. The most distant homolog of C16orf86 was found in the whale shark, which diverged from humans 465 million years ago.