Structural variation mutagenesis of the human genome: Impact on disease and evolution. Lupski James R Environmental and molecular mutagenesis Watson-Crick base-pair changes, or single-nucleotide variants (SNV), have long been known as a source of mutations. However, the extent to which DNA structural variation, including duplication and deletion copy number variants (CNV) and copy number neutral inversions and translocations, contribute to human genome variation and disease has been appreciated only recently. Moreover, the potential complexity of structural variants (SV) was not envisioned; thus, the frequency of complex genomic rearrangements and how such events form remained a mystery. The concept of genomic disorders, diseases due to genomic rearrangements and not sequence-based changes for which genomic architecture incite genomic instability, delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. Environ. Mol. Mutagen. 56:419-436, 2015. © 2015 Wiley Periodicals, Inc. 10.1002/em.21943
Whole-genome single nucleotide variant distribution on genomic regions and its relationship to major depression. Psychiatry research Recent advances in DNA technologies have provided unprecedented opportunities for biological and medical research. In contrast to current popular genotyping platforms which identify specific variations, whole-genome sequencing (WGS) allows for the detection of all private mutations within an individual. Major depressive disorder (MDD) is a chronic condition with enormous medical, social and economic impacts. Genetic analysis, by identifying risk variants and thereby increasing our understanding of how MDD arises, could lead to improved prevention and the development of new and more effective treatments. Here we investigated the distributions of whole-genome single nucleotide variants (SNVs) on 12 different genomic regions for 25 human subjects using the symmetrised Kullback-Leibler divergence to measure the similarity between their SNV distributions. We performed cluster analysis for MDD patients and ethnically matched healthy controls. The results showed that Mexican-American controls grouped closer; in contrast depressed Mexican-American participants grouped away from their ethnically matched controls. This implies that whole-genome SNV distribution on the genomic regions may be related to major depression. 10.1016/j.psychres.2017.02.041
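The symmetrised Kullback-Leibler comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each subject is summarised as a vector of SNV counts per genomic region, that the symmetrised form is the plain sum D(P||Q) + D(Q||P) (some authors halve it), and that a small pseudocount is added to avoid division by zero. The function name, the pseudocount, and the toy counts are illustrative assumptions, not details taken from the paper.

```python
import math

def symmetrised_kl(p_counts, q_counts, pseudocount=1.0):
    """Symmetrised KL divergence D(P||Q) + D(Q||P) between two count vectors.

    The pseudocount is an assumption of this sketch; it keeps every
    probability strictly positive so the logarithms are defined.
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("count vectors must cover the same regions")
    n = len(p_counts)
    p_total = sum(p_counts) + pseudocount * n
    q_total = sum(q_counts) + pseudocount * n
    # Convert smoothed counts into probability distributions over regions.
    p = [(c + pseudocount) / p_total for c in p_counts]
    q = [(c + pseudocount) / q_total for c in q_counts]
    kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))
    return kl_pq + kl_qp

# Hypothetical SNV counts over four regions for two subjects.
subject_a = [120, 30, 45, 5]
subject_b = [100, 50, 40, 10]
print(symmetrised_kl(subject_a, subject_b))
```

A matrix of such pairwise values could then be passed to any standard clustering routine; which clustering method the authors used is not stated in the abstract.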
Disruption of Transcriptional Coactivator Sub1 Leads to Genome-Wide Re-distribution of Clustered Mutations Induced by APOBEC in Active Yeast Genes. Lada Artem G,Kliver Sergei F,Dhar Alok,Polev Dmitrii E,Masharsky Alexey E,Rogozin Igor B,Pavlov Youri I PLoS genetics Mutations in genomes of species are frequently distributed non-randomly, resulting in mutation clusters, including recently discovered kataegis in tumors. DNA editing deaminases play the prominent role in the etiology of these mutations. To gain insight into the enigmatic mechanisms of localized hypermutagenesis that lead to cluster formation, we analyzed the mutational single nucleotide variations (SNV) data obtained by whole-genome sequencing of drug-resistant mutants induced in yeast diploids by AID/APOBEC deaminase and base analog 6-HAP. Deaminase from sea lamprey, PmCDA1, induced robust clusters, while 6-HAP induced a few weak ones. We found that PmCDA1, AID, and APOBEC1 deaminases preferentially mutate the beginning of the actively transcribed genes. Inactivation of transcription initiation factor Sub1 strongly reduced deaminase-induced can1 mutation frequency, but, surprisingly, did not decrease the total SNV load in genomes. However, the SNVs in the genomes of the sub1 clones were re-distributed, and the effect of mutation clustering in the regions of transcription initiation was even more pronounced. At the same time, the mutation density in the protein-coding regions was reduced, resulting in the decrease of phenotypically detected mutants. We propose that the induction of clustered mutations by deaminases involves: a) the exposure of ssDNA strands during transcription and loss of protection of ssDNA due to the depletion of ssDNA-binding proteins, such as Sub1, and b) attainment of conditions favorable for APOBEC action in subpopulation of cells, leading to enzymatic deamination within the currently expressed genes. 
This model is applicable to both the initial and the later stages of oncogenic transformation and explains variations in the distribution of mutations and kataegis events in different tumor cells. 10.1371/journal.pgen.1005217
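As a rough illustration of what "mutation clusters" means in the abstract above, the sketch below groups SNV positions on one chromosome into clusters whenever consecutive mutations lie within a maximum gap. This is a generic gap-based heuristic, not the clustering procedure used by the authors; the function name and the `max_gap` and `min_size` values are assumptions chosen for the example.

```python
def find_mutation_clusters(positions, max_gap=1000, min_size=3):
    """Group positions into clusters of at least min_size mutations
    whose consecutive members are no more than max_gap bases apart."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the cluster being built.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    # Flush the final run of positions.
    if len(current) >= min_size:
        clusters.append(current)
    return clusters

# Hypothetical SNV coordinates on a single chromosome.
snv_positions = [100, 350, 900, 50000, 50200, 120000]
print(find_mutation_clusters(snv_positions))  # → [[100, 350, 900]]
```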
Cancer classification based on multiple dimensions: SNV patterns. Computers in biology and medicine BACKGROUND: The occurrence of cancer is closely related to single nucleotide variants (SNVs). However, in DNA samples collected from patients with distinct cancers, SNVs are detected in different patterns. Therefore, it is an important task to select the appropriate method by which to classify cancer to the greatest extent of SNV patterns, which will aid in cancer diagnosis and treatment. In traditional studies, researchers combined each SNV with its neighboring nucleotides to form a trinucleotide. Mutation signatures for cancer classification were extracted from the patterns of the trinucleotides, but the SNV feature extraction in a single dimension may result in partial information loss and poor model performance. RESULTS: In this study, we defined multidimensional SNV (M-SNV) features to classify cancer. M-SNV features considered first- and second-order neighboring nucleotides of one-dimensional SNVs and included six types of features. We validated the feasibility of M-SNV features using a dataset obtained from The Cancer Genome Atlas (TCGA) consisting of 2761 samples from 12 cancers. We performed preliminary screening of 562,321 DNA mutation sites in these samples. The remaining mutation sites were characterized by cancer type in six signatures. We found that the extracted features showed a similar distribution in the cluster center of the cancer type of the samples. After the preprocessing of raw data, samples were more focused on the cancer subtype distributions at the SNV level. We used KNN (k-nearest neighbors) to classify the extracted features and employed leave-one-out cross-validation to verify them. The classification accuracy is stable at approximately 97% and can reach 97.43% in the most optimal case. Furthermore, we found that the validated oncogenes in the loci of the features had the highest importance among the 8 cancers. 
CONCLUSIONS: It is feasible to classify cancers by the distribution of features we defined. Moreover, our methodology has potential implications for the discovery of oncogenes. 10.1016/j.compbiomed.2022.106270
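The evaluation scheme named in the abstract above, k-nearest neighbors checked by leave-one-out validation, can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance, majority voting, k = 3, and a toy two-dimensional feature set standing in for the M-SNV features. None of these choices, nor the function names, come from the paper.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(features, labels, k=3):
    """Hold out each sample in turn, train on the rest, and score the hit rate."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, features[i], k) == labels[i]:
            correct += 1
    return correct / len(features)

# Hypothetical feature vectors for two well-separated cancer types.
features = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (0.1, 0.1),
            (1.0, 1.1), (1.1, 1.0), (1.0, 1.0), (1.1, 1.1)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(leave_one_out_accuracy(features, labels))  # → 1.0
```

Leave-one-out uses every sample as a test case exactly once, which suits small cohorts but scales quadratically with sample count for a brute-force neighbour search like this one.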
Similarities and differences between exome sequences found in a variety of tissues from the same individual. Gómez-Ramos Alberto,Sanchez-Sanchez Rafael,Muhaisen Ashraf,Rábano Alberto,Soriano Eduardo,Avila Jesús PloS one DNA is the most stable nucleic acid and most important store of genetic information. DNA sequences are conserved in virtually all the cells of a multicellular organism. To analyze the sequences of various individuals with distinct pathological disorders, DNA is routinely isolated from blood, independently of the tissue that is the target of the disease. This approach has proven useful for the identification of familial diseases where mutations are present in parental germinal cells. With the capacity to compare DNA sequences from distinct tissues or cells, present technology can be used to study whether DNA sequences in tissues are invariant. Here we explored the presence of specific SNVs (Single Nucleotide Variations) in various tissues of the same individual. We tested for the presence of tissue-specific exonic SNVs, taking blood exome as a control. We analyzed the chromosomal location of these SNVs. The number of SNVs per chromosome was found not to depend on chromosome length, but mainly on the number of protein-coding genes per chromosome. Although similar but not identical patterns of chromosomal distribution of tissue-specific SNVs were found, clear differences were detected. This observation supports the notion that each tissue has a specific SNV exome signature. 10.1371/journal.pone.0101412
Comparative genome-wide survey of single nucleotide variation uncovers the genetic diversity and potential biomedical applications among six Macaca species. Li Jing,Fan Zhenxin,Sun Tianlin,Peng Changjun,Yue Bisong,Li Jing International journal of molecular sciences Macaca is of great importance in evolutionary and biomedical research. Aiming at elucidating genetic diversity patterns and potential biomedical applications of macaques, we characterized single nucleotide variations (SNVs) of six Macaca species based on the reference genome of Macaca mulatta. Using eight whole-genome sequences, representing the most comprehensive genomic SNV study in Macaca to date, we focused on discovery and comparison of nonsynonymous SNVs (nsSNVs) with bioinformatic tools. We observed that SNV distribution patterns were generally congruent among the eight individuals. Outlier tests of nsSNV distribution patterns detected 319 bins with significantly distinct genetic divergence among macaques, including differences in genes associated with taste transduction, homologous recombination, and fat and protein digestion. Genes with specific nsSNVs in various macaques were differentially enriched for metabolism pathways, such as glycolysis, protein digestion and absorption. On average, 24.95% and 11.67% specific nsSNVs were putatively deleterious according to PolyPhen2 and SIFT4G, respectively, among which the shared deleterious SNVs were located in 564-1981 genes. These genes displayed enrichment signals in the 'obesity-related traits' disease category for all surveyed macaques, confirming that they were suitable models for obesity related studies. Additional enriched disease categories were observed in some macaques, exhibiting promising potential for biomedical application. Positively selected genes identified by PAML in most tested Macaca species played roles in immune and nervous system, growth and development, and fat metabolism. 
We propose that metabolism and body size play important roles in the evolutionary adaptation of macaques. 10.3390/ijms19103123
A Regional Burden of Sequence-Level Variation in the 22q11.2 Region Influences Schizophrenia Risk and Educational Attainment. Biological psychiatry BACKGROUND: Genomic loci where recurrent pathogenic copy number variants are associated with psychiatric phenotypes in the population may also be sensitive to the collective impact of multiple functional low-frequency single nucleotide variants (SNVs). METHODS: We examined the cumulative impact of low-frequency, functional SNVs within the 22q11.2 region on schizophrenia risk in a discovery cohort and an independent replication cohort (N = 1933 and N = 11,128, respectively), as well as the impact on educational attainment (EA) in a third, independent, general population cohort (N = 2081). In the discovery and EA cohorts, SNVs were identified using genotyping arrays; in the replication cohort, whole-exome sequencing was available. For verification, we compared the regional SNV count for schizophrenia cases in the discovery cohort with a normative count distribution derived from a large population dataset (N = 26,500) using bootstrap procedures. RESULTS: In both schizophrenia cohorts, an increased regional SNV burden (≥4 low-frequency SNVs) in the 22q11.2 region was associated with schizophrenia (discovery cohort: odds ratio = 7.48, p = .039; replication cohort: odds ratio = 1.92, p = .004). In the EA cohort, an increased regional SNV burden at 22q11.2 was associated with decreased EA (odds ratio = 4.65, p = .049). Comparing the SNV count for schizophrenia cases with a normative distribution confirmed the unique nature of the distribution for schizophrenia cases (p = .002). CONCLUSIONS: In the general population, an increased burden of low-frequency, functional SNVs in the 22q11.2 region is associated with schizophrenia risk and a decrease in EA. 
These findings suggest that in addition to structural variation, a cumulative regional burden of low-frequency, functional SNVs in the 22q11.2 region can also have a relevant phenotypic impact. 10.1016/j.biopsych.2021.11.019
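The burden comparison described in the abstract above (individuals carrying ≥4 low-frequency SNVs versus fewer) can be illustrated with a crude odds ratio from a 2x2 table. This is only a sketch: the abstract does not say how the reported odds ratios were estimated, and a covariate-adjusted regression model is entirely possible. The function name, the continuity correction for empty cells, and the toy data are assumptions.

```python
def burden_odds_ratio(snv_counts, is_case, threshold=4):
    """Crude odds ratio for case status given a regional SNV count >= threshold."""
    a = b = c = d = 0  # a: high/case, b: high/control, c: low/case, d: low/control
    for count, case in zip(snv_counts, is_case):
        high = count >= threshold
        if high and case:
            a += 1
        elif high:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    # Assumed Haldane-Anscombe correction: add 0.5 to every cell if any is empty.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)

# Hypothetical per-individual regional SNV counts and case labels.
counts = [5, 4, 6, 1, 0, 2, 3, 1]
cases = [True, True, False, True, False, False, False, False]
print(burden_odds_ratio(counts, cases))  # → 8.0
```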