Application of beta regression to the CD44 alternative splicing problem


Aberrant alternative splicing of the CD44 gene drives colorectal cancer progression and facilitates the emergence of cancer stem cells. Although biomedical research recognizes this transmembrane glycoprotein as a major catalyst of malignancy, deciphering its multi-isoform regulatory networks remains a complex analytical challenge. To address this knowledge gap, this study presents a machine learning framework designed to decode these biological mechanisms. The author constructed a neural network regressor based on beta regression to model bounded isoform proportions. This computational architecture jointly estimates both the mean and the precision parameters of the underlying probability distribution. Furthermore, the system employs elastic net regularization to perform quantitative feature selection from high-dimensional molecular expression data.
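The combination of a beta likelihood with an elastic net penalty can be illustrated by a minimal sketch. It assumes the common mean-precision parameterization Beta(mu * phi, (1 - mu) * phi), a logit link for the mean, and a log link for the precision. The single linear layer per head, the function names, the penalty weights, and the choice to penalize both heads are hypothetical and are not taken from the paper.

```python
import math

def beta_nll(y, mu, phi):
    """Negative log-likelihood of y in (0, 1) under Beta(mu*phi, (1-mu)*phi)."""
    a = mu * phi
    b = (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return -(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def elastic_net(weights, l1, l2):
    """Elastic net penalty: l1 * |w| + l2 * w^2, summed over the weights."""
    return sum(l1 * abs(w) + l2 * w * w for w in weights)

def predict(x, w_mu, b_mu, w_phi, b_phi):
    """Two linear heads (an assumption): logit link for mu, log link for phi."""
    eta_mu = sum(w * xi for w, xi in zip(w_mu, x)) + b_mu
    eta_phi = sum(w * xi for w, xi in zip(w_phi, x)) + b_phi
    mu = 1.0 / (1.0 + math.exp(-eta_mu))   # mean in (0, 1)
    phi = math.exp(eta_phi)                # precision > 0
    return mu, phi

def penalized_loss(X, Y, w_mu, b_mu, w_phi, b_phi, l1=0.01, l2=0.01):
    """Mean beta NLL over the samples plus the penalty on both weight vectors."""
    nll = 0.0
    for x, y in zip(X, Y):
        mu, phi = predict(x, w_mu, b_mu, w_phi, b_phi)
        nll += beta_nll(y, mu, phi)
    return nll / len(Y) + elastic_net(list(w_mu) + list(w_phi), l1, l2)
```

In such a setup the L1 term pushes the weights of uninformative features toward zero, which is what allows the fitted weights to be read as a feature ranking.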

The investigation evaluates the proposed framework using gene expression profiles from colorectal cancer patients. The primary objective involves identifying specific RNA-binding proteins acting as regulatory splicing factors. The experimental design contrasts two distinct mathematical modeling strategies. The first configuration incorporates an independent “one-vs-all” approach that treats each transcript variant as an isolated regression target. The second formulation utilizes a structured “isoform tree” method that directly mirrors hierarchical exon inclusion relationships. Validation experiments on synthetically generated datasets confirmed the mathematical integrity of the network. The model recovered the true distribution parameters accurately and exhibited no systematic bias. Comprehensive empirical comparisons subsequently demonstrated that the independent “one-vs-all” layout consistently outperforms the hierarchical tree configuration in predictive stability and accuracy.
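The idea behind validation on synthetic data can be shown with a small sketch: draw proportions from a beta distribution with known mean and precision, then check that an estimator returns those values. The sketch below substitutes a simple method-of-moments estimator for the neural network, so it illustrates only the recovery check, not the paper's model. The parameter values, sample size, and function names are hypothetical.

```python
import random
import statistics

def sample_beta(mu, phi, n, seed=0):
    """Draw n values from Beta(mu*phi, (1-mu)*phi) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.betavariate(mu * phi, (1.0 - mu) * phi) for _ in range(n)]

def moments_estimate(ys):
    """Method-of-moments estimates of the mean and the precision.

    For Beta(mu*phi, (1-mu)*phi): Var(y) = mu * (1 - mu) / (1 + phi),
    hence phi = mu * (1 - mu) / Var(y) - 1.
    """
    mu_hat = statistics.fmean(ys)
    var_hat = statistics.pvariance(ys, mu_hat)
    phi_hat = mu_hat * (1.0 - mu_hat) / var_hat - 1.0
    return mu_hat, phi_hat

# Known "true" parameters (illustrative values only).
true_mu, true_phi = 0.3, 10.0
ys = sample_beta(true_mu, true_phi, n=20000)
mu_hat, phi_hat = moments_estimate(ys)
print(f"mu: true={true_mu}, estimated={mu_hat:.3f}")
print(f"phi: true={true_phi}, estimated={phi_hat:.2f}")
```

Repeating the draw over many seeds and averaging the estimation error is one way to look for systematic bias.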

The computational analysis maps the regulatory landscape of the CD44 gene. The framework validates several established splicing factors while uncovering new candidate proteins, including ACO1, NUDT21, and AGO2. Based on these statistical associations, the paper introduces a biological hypothesis. This concept functionally connects intracellular iron metabolism via the ACO1 protein with the shifting balance of CD44 variants. These discoveries provide deeper insights into oncogenic splicing regulation. Ultimately, they highlight molecular targets for future therapeutic interventions aimed at suppressing the cancer stem cell phenotype.

Keywords: beta regression, machine learning, splicing, CD44
Citation in English: Pirogov A.A. Application of beta regression to the CD44 alternative splicing problem // Computer Research and Modeling, 2026, vol. 18, no. 3, pp. 697-714
DOI: 10.20537/2076-7633-2026-18-3-697-714

Copyright © 2026 Pirogov A.A.
