selected publications
- [DL] Supervised multi-specialist topic model with applications on large-scale electronic health record data. Song, Ziyang, Sumba, Xavier, Xu, Yixin, Liu, Aihua, Guo, Liming, Powell, Guido, Verma, Aman, Buckeridge, David, Marelli, Ariane, and Li, Yue. ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2021.
Electronic health record (EHR) data provides a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit its potential, a realistic generative process for the EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from the EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as the latent topics based on the patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) congenital heart disease (CHD) diagnostic prediction among 154,775 patients; (2) chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes as a means to assess disease exacerbation. In all three applications, MixEHR-S conferred clinically meaningful latent topics among the most predictive latent topics and achieved superior target prediction accuracy compared to existing methods, providing opportunities for prioritizing high-risk patients for healthcare services.
- [ML] Clustering count data with stochastic expectation propagation. Sumba, Xavier, Zamzami, Nuha, and Bouguila, Nizar. In Asian Conference on Intelligent Information and Database Systems, 2021.
- [ML] Improving classification using topic correlation and expectation propagation. Sumba, Xavier, and Bouguila, Nizar. In Canadian Conference on Artificial Intelligence, 2020.
- [DL] Between the interaction of graph neural networks and semantic web. Sumba, Xavier, and Ortiz, José. In Proceedings of the 2019 NeurIPS Workshop on Graph Representation Learning, 2019.
- [SW] REDI: a linked data-powered research networking platform. Sumba, Xavier, Segarra, José, Ortiz, José, Villazón-Terrazas, Boris, Espinoza, Mauricio, and Saquicela, Víctor. In European Semantic Web Conference, 2018.
Research networking is a difficult part of academic life in spite of the multiple benefits that the Web has brought to this field in recent years. Even though scientific and business social networks provide a medium to discover peers worldwide, their usefulness meets its limits when real-world requirements come in. The broad audience of those tools and other bibliographic databases leads them to ignore cultural and geographical aspects such as regional indexes and organizational structures, among others. In this poster, we introduce REDI, a Linked Data-powered research networking platform which combines both local (institutional/regional) and external (Web) scholarly sources in a consolidated knowledge base. Moreover, REDI leverages its knowledge base to cluster authors within similar research areas, easing networking and unveiling a variety of new information from data for multiple purposes.
- [ML & SW] Detecting similar areas of knowledge using semantic and data mining technologies. Sumba, Xavier, Sumba, Freddy, Tello, Andres, Baculima, Fernando, Espinoza, Mauricio, and Saquicela, Victor. Electronic Notes in Theoretical Computer Science, 2016.
Searching for scientific publications online is an essential task for researchers working on a certain topic. However, the extremely large number of scientific publications on the web makes finding a publication very difficult, whereas locating peers interested in collaborating on a specific topic or reviewing literature is even more challenging. In this paper, we propose a novel architecture to join multiple bibliographic sources, with the aim of identifying common research areas and potential collaboration networks, through a combination of ontologies, vocabularies, and Linked Data technologies for enriching a base data model. Furthermore, we implement a prototype to provide a centralized repository of bibliographic sources and to find similar knowledge areas using data mining techniques in the domain of the Ecuadorian research community.
@article{sumba2016detecting,
  abbr = {ML & SW},
  bibtex_show = {true},
  selected = {true},
  title = {Detecting similar areas of knowledge using semantic and data mining technologies},
  author = {Sumba, Xavier and Sumba, Freddy and Tello, Andres and Baculima, Fernando and Espinoza, Mauricio and Saquicela, Victor},
  journal = {Electronic Notes in Theoretical Computer Science},
  volume = {329},
  pages = {149--167},
  year = {2016},
  publisher = {Elsevier},
  pdf = {https://reader.elsevier.com/reader/sd/pii/S1571066116301165?token=3B6D9A1ABD2D632CFAEA0268C4EAF16B2155FDA7793A23363E65AA5148EF3254F9C93CA821565365FC91E3B78E77BBFE&originRegion=us-east-1&originCreation=20211207070120},
  html = {https://www.sciencedirect.com/science/article/pii/S1571066116301165}
}
services
| Role | Activity |
|---|---|
| Google Summer of Code | Port Apache Marmotta to Eclipse RDF4J, 2017 |
| Volunteer | NeurIPS 2018, ICML 2019, EMNLP 2020, ACL 2020, NeurIPS 2020, NeurIPS 2021 |
| LatinX in AI Chair | ICML 2019, ICML 2025 |
| LatinX in AI Program Committee | ICML 2019, NeurIPS 2019, ICML 2020, ICML 2025 |
| Reviewer | ML Reproducibility Challenge 2020, ML4H: Machine Learning for Health 2021, LatinX in AI workshop at CVPR 2021, ICBINB@NeurIPS 2021, LoG 2023, LoG 2024, LXAI@NeurIPS 2024, ICLR 2025, LoG 2025, ICML 2026 |
| Mentor/Advisor | LatinX in AI at ICML 2021 |