386 - Unique Signatures of Highly Constrained Genes Across Publicly Available Genomic Databases
Friday, April 25, 2025
5:30pm – 7:45pm HST
Publication Number: 386.6772
Klaus Schmitz-Abe, University of Miami Leonard M. Miller School of Medicine, Miami, FL, United States; Qifei Li, University of Miami Miller School of Medicine and Holtz Children's Hospital, Miami, FL, United States; Sunny Greene, University of Miami Leonard M. Miller School of Medicine, Miami, FL, United States; Michela R. Borrelli, University of Miami Leonard M. Miller School of Medicine, Fort Lauderdale, FL, United States; Shiyu Luo, University of Miami Leonard M. Miller School of Medicine, Miami, FL, United States; Madesh Ramesh, University of Miami Leonard M. Miller School of Medicine, Miami, FL, United States; Pankaj B. Agrawal, University of Miami Leonard M. Miller School of Medicine, Miami, FL, United States
Associate professor University of Miami Leonard M. Miller School of Medicine Miami, Florida, United States
Background: The advent of large-scale population databases has revolutionized the field of genetics by providing a rich resource for gene- and variant-level data. Such databases catalog the frequency and distribution of genetic variations across diverse populations, offering a powerful tool for interpreting human biology and for linking genes and their variants with human diseases.
Objective: We aimed to identify and characterize highly constrained genes, examining their chromosomal distribution, size, level and type of tissue expression, specific cellular functions, and molecular pathways. Highly constrained genes already known to be associated with human diseases show informative patterns of inheritance and protein function, among others, while those not yet linked to a disease should be prioritized for human disease gene discovery.
Design/Methods: We used gnomAD, one of the largest publicly available databases, to identify genes that are highly constrained for loss-of-function (LoF) variants only (LoF-C), for missense (Ms) variants only (Ms-C), or for both LoF and Ms variants (LoF/Ms-C) (Figure 1). We identified their unique signatures and explored their causal relationship with human diseases.
Results: We identified unique patterns of inheritance, protein size, and enrichment in distinct molecular pathways for those constrained genes associated with human disease. A majority of constrained genes showed dominant inheritance (p < 0.0001), while the non-constrained (N-C) group contained the highest percentage of recessive genes (p < 0.001) (Figure 2). In addition, our analysis identified 75 highly constrained genes (32 LoF/Ms-C, 29 Ms-C, and 14 LoF-C) that are not yet linked to a human disease in the OMIM, HGMD, or ClinVar databases (Table 1). These genes are promising candidates for novel human disease genes and for future studies to determine their role in genetic disorders.
Conclusion(s): This study presents a constraint-based approach that uses public, large-scale population databases to identify biologically important proteins and molecular pathways and to prioritize candidate genes for rare genetic disorders. Integrating constraint metrics with gene ontology and population databases can provide a better understanding of the pathways that are disrupted in disease and essential for cellular physiology. As relevant datasets grow, constraint-based approaches will become increasingly helpful and powerful for revealing connections between biological pathways and genetic diseases. Together, these tools and approaches will further gene discovery and expand our understanding of human physiology.
Selection of constrained genes Figure 1: Distribution of constrained and non-constrained genes after grouping of highly constrained genes into three categories (LoF-C, Ms-C, and LoF/Ms-C) based on the missense Z-score to the LoF Z-score provided by gnomAD.
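The grouping described for Figure 1 can be illustrated with a minimal Python sketch. The abstract does not state the cutoffs or the exact rule used, so the thresholds below (`LOF_Z_CUTOFF`, `MIS_Z_CUTOFF`), the function name `classify_gene`, and the example gene names and scores are all hypothetical and serve only to show the general idea of sorting genes by two constraint Z-scores.

```python
from collections import Counter

# Hypothetical cutoffs for illustration only; the abstract does not
# report which Z-score thresholds define "highly constrained".
LOF_Z_CUTOFF = 3.0
MIS_Z_CUTOFF = 3.0


def classify_gene(lof_z, mis_z, lof_cutoff=LOF_Z_CUTOFF, mis_cutoff=MIS_Z_CUTOFF):
    """Assign a gene to LoF/Ms-C, LoF-C, Ms-C, or N-C from two Z-scores."""
    lof_constrained = lof_z >= lof_cutoff
    mis_constrained = mis_z >= mis_cutoff
    if lof_constrained and mis_constrained:
        return "LoF/Ms-C"
    if lof_constrained:
        return "LoF-C"
    if mis_constrained:
        return "Ms-C"
    return "N-C"


# Made-up scores for placeholder genes (not real gnomAD values).
example_scores = {
    "GENE_A": (5.2, 4.1),
    "GENE_B": (4.0, 0.7),
    "GENE_C": (0.3, 3.8),
    "GENE_D": (-0.5, 0.2),
}

groups = {gene: classify_gene(lof_z, mis_z)
          for gene, (lof_z, mis_z) in example_scores.items()}
group_counts = Counter(groups.values())

for gene, group in groups.items():
    print(f"{gene}: {group}")
```

In practice the two scores would be read per gene from the gnomAD constraint table, and the cutoffs would be replaced by whatever criteria the study actually applied.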
Distribution of constrained and non-constrained genes in relation to inheritance using OMIM Figure 2: High-constraint genes were more likely to be dominantly inherited compared to N-C genes, and similarly N-C genes were more likely to be recessively inherited compared to constrained genes. **** p < 0.0001, *** p < 0.001, ** p < 0.01, * p < 0.05.
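A comparison like the one in Figure 2 can be framed as a 2x2 contingency table (constrained vs. N-C genes, dominant vs. recessive inheritance). The abstract does not name the statistical test used, so the sketch below assumes a Pearson chi-square test without continuity correction as one common choice. The function name `chi_square_2x2` and the counts are hypothetical and are not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout:
                    dominant  recessive
      constrained       a         b
      N-C               c         d
    Returns (chi-square statistic, p-value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must contain at least one gene")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts for illustration only.
chi2, p_value = chi_square_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

With small cell counts, an exact test would usually be preferred over the chi-square approximation.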
Potential novel disease genes Table 1: Identification of potential novel genes in high-constraint groups using OMIM, HGMD, and ClinVar.