Causal representation learning for disease risk stratification in multi-ethnic populations using real-world biobank cohorts
Triella Consults, Agbaoku Street, Ikeja, Lagos State, Nigeria.
Review Article
World Journal of Advanced Research and Reviews, 2022, 16(03), 1339-1357
Publication history:
Received on 25 October 2022; revised on 21 December 2022; accepted on 28 December 2022
Abstract:
Health disparities across racial and ethnic groups remain a persistent challenge in modern healthcare systems, particularly in disease diagnosis, prognosis, and risk stratification. Traditional predictive models often fail to generalize across diverse populations because of biases in training data, confounding variables, and the lack of robust causal inference mechanisms. Recent advances in causal representation learning (CRL) offer a framework for disentangling spurious correlations from underlying causal factors, enabling more equitable and interpretable disease risk prediction. This study proposes a novel CRL pipeline that integrates real-world biobank data from multi-ethnic cohorts to enhance disease risk stratification. By leveraging structured electronic health records (EHRs), genetic variants, social determinants of health, and longitudinal outcomes, we model latent causal structures that remain invariant across subpopulations. We apply domain-invariant learning and counterfactual reasoning to correct for population-specific confounding, enhancing the generalizability of disease risk scores. Experiments conducted on the UK Biobank and All of Us datasets demonstrate that our CRL approach outperforms standard machine learning models in identifying high-risk individuals across African, Asian, Hispanic, and European ancestry groups. Furthermore, our method improves calibration, reduces disparities in false-positive rates, and provides interpretable insights into population-specific risk drivers. This work bridges methodological innovation in causal machine learning with the urgent need for equity in biomedical research and clinical decision-making. Our findings advocate for the deployment of causally aware, population-adaptive algorithms in real-world health systems to enable more personalized and fair healthcare interventions for all ethnic groups.
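To make the phrase "disparities in false-positive rates" concrete, the following is a minimal sketch of how such a disparity could be measured per ancestry group. It is not the evaluation code of the study: the function names `false_positive_rates` and `fpr_gap`, the group labels "A" and "B", and the toy labels and predictions are all assumptions introduced here for illustration.

```python
from collections import defaultdict


def false_positive_rates(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = FP / (FP + TN) among true negatives.

    Hypothetical helper: y_true and y_pred hold binary labels (0 or 1),
    groups holds one group label per individual.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only true negatives can become false positives
            negatives[group] += 1
            if pred == 1:
                false_positives[group] += 1
    # Groups with no true negatives are skipped to avoid division by zero.
    return {g: false_positives[g] / n for g, n in negatives.items() if n > 0}


def fpr_gap(rates):
    """Largest difference in FPR between any two groups (0 means parity)."""
    return max(rates.values()) - min(rates.values())


# Toy data (made up): five individuals in each of two placeholder groups.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

rates = false_positive_rates(y_true, y_pred, groups)
print(rates)           # {'A': 0.25, 'B': 0.5}
print(fpr_gap(rates))  # 0.25
```

Under this reading, a model that reduces disparity is one whose `fpr_gap` is smaller than that of a baseline evaluated on the same cohort; other disparity summaries (ratios, pairwise differences) are equally plausible choices.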
Keywords:
Causal Representation Learning; Disease Risk Stratification; Multi-Ethnic Populations; Biobank Cohorts; Health Equity; Real-World Evidence
Copyright information:
Copyright © 2022. The author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
