| Oral Presentation Accepted Papers (no poster board) | First Name of Presenting Author | Last Name of Presenting Author | Session | Paper Acceptance Status | Last Name of First Author | Paper Title | Authors (in paper order) | Author Affiliations (in author order) | Abstract | Poster DOI | |
| Oral Presentation-No Poster | Michael | Burkhart | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Burkhart | Quantifying surprise in clinical care: Detecting highly informative events in electronic health records with foundation models | Michael C. Burkhart, Bashar Ramadan, Luke Solo, William F. Parker, Brett K. Beaulieu-Jones | University of Chicago, University of Chicago, University of Chicago, University of Chicago, University of Chicago | We present a foundation model-derived method to identify highly informative tokens and events in electronic health records. Our approach considers incoming data for the entire context of a patient's hospitalization to find surprising events. Context enables flagging anomalous events that rule-based approaches would consider within a normal range. We demonstrate that the events our model flags are significantly more useful than average events for predicting downstream patient outcomes and show that a fraction of events we identify as unsurprising can be safely dropped without an adverse impact on performance. Finally, we show how informativeness can help interpret the predictions of prognostic models trained on foundation model-derived representations. | https://doi.org/10.48550/arXiv.2507.22798 | |
| Oral Presentation-No Poster | Emma | Chen | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Chen, Emma | Evaluation of Large Language Models as Emergency Department Revisit Predictors | Emma Chen, Luyang Luo, Fatma Gunturkun, Sraavya Sambara, Rushil Arora, Boyang Tom Jin, Pranav Rajpurkar, David A Kim | Harvard Medical School; Harvard John A. Paulson School of Engineering and Applied Sciences, Harvard Medical School, Stanford University, Harvard Medical School, Stanford University, Stanford University, Harvard Medical School, Stanford University | Large Language Models (LLMs) have shown promise in clinical reasoning and question answering, yet their effectiveness for real-world clinical prediction remains an open question. We present the first large-scale study evaluating LLMs for predicting 30-day emergency department (ED) revisits using 138,010 visits from the Adult Emergency Department at Stanford. We assessed two modeling paradigms: (1) direct prediction, where the LLM generates revisit risk assessments in natural language, and (2) embedding-based approaches that leverage LLM-derived vector representations (LLM2Vec) of patient data for downstream modeling. Retrieval augmentation improved direct prediction performance (e.g., Claude 3.7 F1 from 0.3755 with 95% CI 0.3647-0.3864 to 0.4160 with 95% CI 0.4024-0.4294), and embedding-based methods consistently outperformed direct approaches, with LLM2Vec achieving F1 of 0.4505 with 95% CI 0.4345-0.4666. Despite having access to comprehensive structured and unstructured clinical data, all LLM approaches (F1 range 0.3022-0.4505) failed to exceed a traditional LightGBM model using only structured data (F1 of 0.4614 with 95% CI 0.4496-0.4789). Through systematic analysis of the reasoning chains in 17,488 predictions, we suggest potential failure patterns: reasoning may systematically degrade performance through overweighting medical histories and similar visits, neglecting protective factors, and risk aversion. Our work establishes essential baseline performance while revealing fundamental limitations in current-generation LLMs for clinical prediction tasks. | https://doi.org/10.7490/f1000research.1120322.1 | |
| Oral Presentation-No Poster | Wenyuan | Chen | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Chen, W | Retrieval-Augmented Guardrails for AI-Drafted Patient-Portal Messages: Error Taxonomy Construction and Large-Scale Evaluation | Wenyuan Chen, Fateme Nateghi Haredasht, Kameron C Black, François Grolleau, Emily Alsentzer, Jonathan H Chen, Stephen P Ma | Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University, Stanford University | Asynchronous patient–clinician messaging via EHR portals is a growing source of clinician workload, prompting interest in large language models (LLMs) to assist with draft responses. However, LLM outputs may contain clinical inaccuracies, omissions, or tone mismatches, making robust evaluation essential. Our contributions are threefold: (1) we introduce a clinically grounded error ontology comprising 5 domains and 59 granular error codes, developed through inductive coding and expert adjudication; (2) we develop a Retrieval-Augmented Error Checking (RAEC) pipeline that leverages semantically similar historical message–response pairs to improve judgment quality; and (3) we provide a two-stage prompting architecture using DSPy to enable scalable, interpretable, and hierarchical error detection. Our approach assesses the quality of drafts both in isolation and with reference to similar past message–response pairs retrieved from institutional archives. Using a two-stage DSPy pipeline, we compared baseline and reference-enhanced evaluations on over 1,500 patient messages. Retrieval context improved error identification in domains such as clinical completeness and workflow appropriateness. Human validation on 100 messages demonstrated superior agreement (concordance = 50% vs. 33%) and performance (F1 = 0.500 vs. 0.256) of context-enhanced labels vs. baseline, supporting the use of our RAEC pipeline as AI guardrails for patient messaging. | https://doi.org/10.48550/arXiv.2509.22565 | |
| Oral Presentation-No Poster | Sumon Kanti | Dey | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Dey | Inference Gap in Domain Expertise and Machine Intelligence in Named Entity Recognition: Creation of and Insights from a Substance Use-related Dataset | Sumon Kanti Dey, Jeanne M. Powell, Azra Ismail, Jeanmarie Perrone, Abeed Sarker | Emory University, Emory University, Emory University, University of Pennsylvania, Emory University | Nonmedical opioid use is an urgent public health challenge, with far-reaching clinical and social consequences that are often underreported in traditional healthcare settings. Social media platforms, where individuals candidly share first-person experiences, offer a valuable yet underutilized source of insight into these impacts. In this study, we present a named entity recognition (NER) framework to extract two categories of self-reported consequences from social media narratives related to opioid use: ClinicalImpacts (e.g., withdrawal, depression) and SocialImpacts (e.g., job loss). To support this task, we introduce RedditImpacts 2.0, a high-quality dataset with refined annotation guidelines and a focus on first-person disclosures, addressing key limitations of prior work. We evaluate both fine-tuned encoder-based models and state-of-the-art large language models (LLMs) under zero- and few-shot in-context learning settings. Our fine-tuned DeBERTa-large model achieves a relaxed token-level F1 of 0.61 [95% CI: 0.43–0.62], consistently outperforming LLMs in precision, span accuracy, and adherence to task-specific guidelines. Furthermore, we show that strong NER performance can be achieved with substantially less labeled data, emphasizing the feasibility of deploying robust models in resource-limited settings. Our findings underscore the value of domain-specific fine-tuning for clinical NLP tasks and contribute to the responsible development of AI tools that may enhance addiction surveillance, improve interpretability, and support real-world healthcare decision-making. The best performing model, however, still significantly underperforms compared to inter-expert agreement (Cohen's kappa: 0.81), demonstrating that a gap persists between expert intelligence and current state-of-the-art NER/AI capabilities for tasks requiring deep domain knowledge. The dataset, annotation guidelines, appendix, and training scripts are publicly available to support future research. | https://doi.org/10.48550/arXiv.2508.19467 | |
| Oral Presentation-No Poster | Tianning | Feng | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Feng | SeizureFormer: A Multi-Scale Transformer for Seizure Risk Forecasting from RNS-Derived Biomarkers | Tianning Feng, Juntong Ni, Wei Jin, Ezequiel Gleichgerrcht | University of Pennsylvania, Emory University, Emory University, Emory University | We present SeizureFormer, a Transformer-based model for long-horizon seizure risk forecasting (1–14 days) using structured biomarkers, namely interictal epileptiform activity (IEA) and long episodes (LE), extracted from responsive neurostimulation (RNS) systems. Unlike prior models based on raw scalp EEG, SeizureFormer leverages stable RNS biomarkers and integrates multi-scale CNN patch embedding, cross-variable temporal convolution, and squeeze-and-excitation attention to capture both short-term fluctuations and long-term seizure cycles. Across five patients and four prediction windows spanning 1–14 days, SeizureFormer achieved state-of-the-art performance, with a mean ROC AUC of 79.44% and a mean PR AUC of 76.29%. Compared to statistical, classical ML, and deep learning baselines, it demonstrates superior generalizability under class imbalance. Clinically, forecasting seizure-related events 1 to 14 days ahead supports personalized and proactive intervention in epilepsy care. | https://doi.org/10.7490/f1000research.1120388.1 | |
| Oral Presentation-No Poster | Romain | Hardy | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Hardy | ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors | Romain Hardy, Tyler M. Berzin, Pranav Rajpurkar | Harvard Medical School, Harvard Medical School, Beth Israel Deaconess Medical Center | Three-dimensional (3D) scene understanding in colonoscopy presents significant challenges that necessitate automated methods for accurate depth estimation. However, existing depth estimation models for endoscopy struggle with temporal consistency across video sequences, limiting their applicability for 3D reconstruction. We present ColonCrafter, a diffusion-based depth estimation model that generates temporally consistent depth maps from monocular colonoscopy videos. Our approach learns robust geometric priors from synthetic colonoscopy sequences, enabling reliable depth estimation across frames. We also introduce a style transfer technique that preserves geometric structure while adapting realistic clinical videos to match our synthetic training domain. ColonCrafter achieves state-of-the-art zero-shot performance on the C3VD dataset, outperforming both general-purpose and endoscopy-specific approaches. Although full trajectory 3D reconstruction remains a challenge, we demonstrate clinically relevant applications of ColonCrafter, including 3D point cloud generation and surface coverage assessment. Our code will be made publicly available at https://github.com/rajpurkarlab/ColonCrafter. | https://doi.org/10.7490/f1000research.1120306.1 | |
| Oral Presentation-No Poster | Shreya | Johri | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Johri | A Clinician-Guided Framework for Endoscopic AI: Developing PanEndoAtlas and Benchmarking Foundation Models Across the Full GI Spectrum | Shreya Johri, Luyang Luo, Hong-Yu Zhou, Todd Brenner, Sami Elamin, Mark Enrik Geissler, Tyler M. Berzin, Pranav Rajpurkar | Department of Biomedical Informatics, Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School | Endoscopic procedures play a central role in the diagnosis and management of gastrointestinal (GI) diseases, yet the field lacks large-scale, clinically diverse benchmarks and unified datasets to evaluate vision foundation models. We introduce PanEndoSuite, the first unified ecosystem for endoscopic AI, developed through systematic collaboration between AI researchers and practicing gastroenterologists. PanEndoSuite consists of three complementary components: PanEndoAtlas, PanEndoX, and PanEndoFM. PanEndoAtlas is a harmonized dataset of over 420,000 labeled images from 30 public endoscopy datasets across 13 countries and 26 hospitals, creating a clinically-grounded hierarchical taxonomy that mirrors diagnostic reasoning patterns across 111 GI diseases. PanEndoX is a benchmark of 10 clinically grounded tasks, including hierarchical GI-tree classification, Barrett’s esophagus grading, ulcerative colitis scoring, polyp subtyping, Boston Bowel Preparation Scale assessment, multi-organ disease classification, and anatomical landmark identification; the tasks are designed to probe generalization across anatomical regions, disease presentations, and annotation granularities. PanEndoFM is a foundation model pretrained on a 10 million–image corpus curated from public data sources, spanning the entire GI tract. We benchmark PanEndoFM against two endoscopy-specific foundation models (EndoFM-LV, EndoSSL) and two general-purpose vision models (ViT-B/16, ResNet-50). PanEndoFM achieves the highest macro-AUC on 6 of 10 tasks, demonstrating broad clinical generalization; EndoFM-LV performs best on colon-focused tasks, EndoSSL excels in polyp subtyping, and ViT-B/16 shows strengths on small-intestine conditions. Together, PanEndoSuite establishes a foundation for building robust, generalist AI systems in gastrointestinal endoscopy that bridge current AI capabilities and clinical practice. | https://doi.org/10.7490/f1000research.1120379.1 | |
| Oral Presentation-No Poster | Dennis | Wall | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Ko | Abstention and Threshold Identification for Uncertainty Management in Clinical Decision Tools: A Case Study using Human-In-The-Loop Pediatric Autism Classifiers | Aiden Ko, Aaron Kline, Kaitlyn Dunlap, SaiMourya Surabhi, Parnian Azizian, Peter Y. Washington, Dennis P. Wall | Aiden Ko, Aaron Kline, Kaitlyn Dunlap, SaiMourya Surabhi, Parnian Azizian: Department of Pediatrics (Clinical Informatics), Stanford University, Stanford, CA 94305, USA; Peter Y. Washington: Division of Clinical Informatics and Digital Transformation, Department of Medicine, University of California, San Francisco (UCSF), San Francisco, CA 94143, USA; Dennis P. Wall: Departments of Pediatrics (Clinical Informatics), Biomedical Data Science, and Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA | Uncertainty quantification remains an underdeveloped aspect of AI-based clinical decision tools. As AI systems become increasingly prevalent in healthcare, it is essential not only to measure uncertainty but also to manage it in ways that support clinical decision-making. In this study, we investigate abstention as a practical mechanism for managing uncertainty in diagnostic classifiers. To stress-test this approach, we evaluate abstention performance on a purposefully noisy dataset of pediatric autism video assessments comprising heterogeneous video sources and a diverse range of human raters. We apply abstention strategies to existing autism classifiers trained on diagnostic assessment data, comparing baseline performance to a range of thresholding configurations that trade off retained sample coverage against key clinical metrics. We compare performance gains from prioritizing sensitivity or specificity to targeting a balanced increase in Youden’s J to demonstrate a wide variety of use cases that abstention can enable. This work demonstrates a concrete use case of introducing abstention into the output range of clinical decision models, enabling both uncertainty quantification and management in diagnostic classifiers. | https://doi.org/10.7490/f1000research.1120389.1 | |
| Oral Presentation-No Poster | Wei | Jin | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Liu | Higher-order Interaction Matters: Modeling Epidemics via Dynamic Hypergraph Neural Networks | Songyuan Liu, Shengbo Gong, Tianning Feng, Zewen Liu, Max S.Y. Lau, Wei Jin | Department of Computer Science, Emory University; Department of Computer Science, Emory University; Department of Computer Science, Emory University; Department of Computer Science, Emory University; Department of Biostatistics and Bioinformatics, Emory University; Department of Computer Science, Emory University | The ongoing need for effective epidemic modeling has driven advancements in capturing the complex dynamics of infectious diseases. Traditional models, such as Susceptible-Infected-Recovered, and graph-based approaches often fail to account for higher-order interactions and the nuanced structural patterns inherent in human contact networks. Higher-order interactions, such as those in schools, workplaces, or public transit, involve simultaneous contact among more than two individuals. This study introduces EpiDHGNN, a novel Human Contact-Tracing Hypergraph Neural Network framework tailored for epidemic modeling, which leverages hypergraphs to model intricate, higher-order relationships at both the location and individual levels. Both real-world and synthetic epidemic data are used to train and evaluate the model. Results demonstrate that EpiDHGNN consistently outperforms baseline models across various epidemic modeling tasks, such as source detection and forecasting, by approximately 12.1%, by effectively capturing higher-order interactions and preserving the complex structure of human interactions. This work underscores the potential of representing human contact data as hypergraphs and employing hypergraph-based methods to improve epidemic modeling, providing reliable insights for public health decision-making. | | |
| Oral Presentation-No Poster | Guillermo | Lopez-Garcia | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Lopez-Garcia | Scoring Physician Risk Communication in Prostate Cancer Using Large Language Models | Guillermo Lopez-Garcia, Dongfang Xu, Michael Luu, Renning Zheng, Timothy J. Daskivich, Graciela Gonzalez-Hernandez | Department of Computational Biomedicine (Cedars-Sinai Medical Center), Department of Computational Biomedicine (Cedars-Sinai Medical Center), Department of Biostatistics (Cedars-Sinai Medical Center), Department of Urology (Cedars-Sinai Medical Center), Department of Urology (Cedars-Sinai Medical Center), Department of Computational Biomedicine (Cedars-Sinai Medical Center) | Effective risk communication is essential to shared decision-making in prostate cancer care. However, the quality of physician communication of key concepts varies widely in real-world consultations. Manual evaluation of communication is labor-intensive and not scalable. We present a structured, rubric-based framework that uses large language models (LLMs) to automatically score the quality of risk communication in prostate cancer consultations. Using transcripts from 20 clinical visits, we curated and annotated 487 physician-spoken sentences that referenced five key concepts for shared decision-making: cancer prognosis, life expectancy, and three treatment side effects (erectile dysfunction, incontinence, and irritative urinary symptoms). Each sentence was assigned a score from 0 to 5 based on the precision and patient-specificity of communicated risk, using a validated scoring rubric. We modeled this task as five multiclass classification problems and evaluated both fine-tuned transformer baselines and GPT-4o with rubric-based and chain-of-thought (CoT) prompting. Our best performing approach, which combined rubric-based CoT prompting with few-shot learning, achieved micro-averaged F1 scores between 85.0 and 92.0 across domains, outperforming supervised baselines and matching inter-annotator agreement. These findings establish a scalable foundation for AI-driven evaluation of physician–patient communication in oncology and beyond. | https://doi.org/10.7490/f1000research.1120370.1 | |
| Oral Presentation-No Poster | Howard | Prioleau | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Prioleau | Leveraging Large Language Models for Adverse Drug Event Detection: A Comparative Study of Token and Span-Based Named Entity Recognition | Howard Prioleau, Saurav Aryal, Jeremy Blackstone | Howard University, Howard University, Howard University | Adverse Drug Events (ADEs) pose a persistent threat to patient safety and public health. This study investigates the use of large language models (LLMs) fine-tuned for both token classification and span-based named entity recognition (NER) to improve ADE detection in clinical text. Using the 2018 n2c2 Track 2 dataset, we evaluate models under both predefined (gold label) and end-to-end settings. RoBERTa Large consistently outperforms other models, particularly in identifying ADEs, which remain more challenging due to their contextual ambiguity. Token-based models generally deliver stronger performance than span-based approaches, and ensemble methods, especially majority voting and XGBoost-based aggregation, further enhance end-to-end relation extraction by mitigating individual model weaknesses. These findings highlight the potential of fine-tuned LLMs, augmented by strategic ensembling, to advance clinical NLP pipelines and support safer, more personalized healthcare. | https://doi.org/10.7490/f1000research.1120310.1 | |
| Oral Presentation-No Poster | Sraavya | Sambara | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Sambara | 3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models | Sraavya Sambara, Sung Eun Kim, Xiaoman Zhang, Luyang Luo, Shreya Johri, Mohammed Baharoon, Du Hyun Ro, Pranav Rajpurkar | Harvard University, Seoul National University Hospital | Current Vision-Language Models (VLMs) struggle to ground anatomical regions in 3D medical images and reason about them in a step-by-step manner, a key requirement of real-world diagnostic assessment. This ability is essential for aligning model outputs with the diagnostic workflows clinicians use in practice, enabling trustworthy clinician-AI collaboration. Existing 3D datasets provide localization labels, but none support this “grounded reasoning” ability. To address this gap, we introduce 3DReasonKnee, the first 3D grounded reasoning dataset for medical images, which provides 494k high-quality quintuples derived from 7,970 3D knee MRI volumes. Each quintuple includes: (1) the 3D MRI volume, (2) a diagnostic question targeting a specific anatomical region, (3) a 3D bounding box localizing the relevant anatomical structures, (4) clinician-generated diagnostic reasoning steps that explicitly detail the 3D reasoning process, and (5) structured severity assessments for the relevant anatomical region. The meticulous creation and validation of 3DReasonKnee, involving over 450 hours of expert clinician time for manually segmenting MRIs and generating reasoning chains, ensures its superior quality and clinical relevance. We establish ReasonKnee-Bench to evaluate localization and diagnostic accuracy, providing novel insight into VLM ability to perform grounding and severity assessment across diverse anatomical regions and diagnostic inquiries. We benchmark five state-of-the-art VLMs, providing baseline performance for ReasonKnee-Bench. By providing this unique resource of expert-annotated 3D reasoning pathways, 3DReasonKnee serves as a repository of orthopedic surgeons’ diagnostic expertise and offers a vital testbed for advancing multimodal medical AI systems towards 3D, clinically aligned, localized decision-making capabilities. The dataset can be found on HuggingFace: rajpurkarlab/3DReasonKnee. | https://doi.org/10.7490/f1000research.1120304.1 | |
| Oral Presentation-No Poster | Eric | Strobl | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Strobl | Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data | Eric V. Strobl | University of Pittsburgh | Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia. R code is available at github.com/ericstrobl/DEBIAS. | https://doi.org/10.7490/f1000research.1120300.1 | |
| Oral Presentation-No Poster | Andrew | Zolensky | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Oral presentation | Zolensky | Speaker Role Identification in Clinical Conversations | Andrew Zolensky, Kuk Jin Jang, Janice Sabin, Andrea Hartzler, Basam Alasaly, Sriharsha Mopidevi, Mark Liberman, Kevin Johnson | University of Pennsylvania, Hongik University, University of Washington, University of Washington, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania | Patient-clinician communication research is crucial for understanding interaction dynamics and for predicting outcomes that are associated with clinical discourse. Traditionally, interaction analysis is conducted manually because of challenges such as Speaker Role Identification (SRI), which must reliably differentiate between doctors, medical assistants, patients, and other caregivers in the same room. Although automatic speech recognition with diarization can efficiently create a transcript with separate labels for each speaker, these systems are not able to assign roles to each person in the interaction. Previous SRI studies in task-oriented scenarios have directly predicted roles using linguistic features, bypassing diarization. However, to our knowledge, nobody has investigated SRI in clinical settings. We explored whether Large Language Models (LLMs) such as BERT could accurately identify speaker roles in clinical transcripts, with and without diarization. We used veridical turn segmentation and diarization identifiers, fine-tuning each model at varying levels of identifier corruption to assess impact on performance. Our results demonstrate that BERT achieves high performance with linguistic signals alone (82% accuracy/82% F1-score), while incorporating accurate diarization identifiers further enhances accuracy (95%/95%). We conclude that fine-tuned LLMs are effective tools for SRI in clinical settings. | https://doi.org/10.7490/f1000research.1120293.1 | |
| Oral Presentation-No Poster | Shafayat | Ahmed | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Oral presentation | Ahmed | HALO: Hybrid Attention Model for Subcellular Localization | Shafayat Ahmed, Nazifa Ahmed Moumi, Liqing Zhang | Virginia Tech, Virginia Tech, Virginia Tech | Subcellular localization prediction is critical for understanding protein functions and interactions, providing insights into cellular mechanisms and potential therapeutic targets. We propose HALO (Hybrid Attention model for subcellular LOcalization), a framework that integrates semantic embeddings from fine-tuned protein language models (e.g., ESM) with structural information derived from AlphaFold. HALO uses a graph attention network (GAT) to incorporate biochemical, structural, and sequence-derived features into a unified representation, while dynamically balancing their contributions. Crucially, the design allows HALO to operate in two modes: (i) a sequence-only mode, where predictions are made from the fine-tuned protein language model (PLM) when structural data are unavailable, and (ii) a hybrid mode, where structural adjacency and biochemical features complement PLM predictions, especially in low-confidence regions. We evaluate HALO on multiple datasets with minimal homology between training and test sets, where it achieves competitive performance across key metrics. By flexibly combining sequence-based and structure-informed predictions, HALO addresses the limitations of relying on a single modality and offers an adaptable framework for accurate and generalizable subcellular localization. | https://doi.org/10.7490/f1000research.1120299.1 | |
| Oral Presentation-No Poster | Seowon | Chang | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Oral presentation | Chang | PertSpectra: Interpretable Matrix Factorization for Predicting Functional Impact of Genetic Perturbation Experiments | Seowon Chang, Anna Shcherbina, Tal Ashuach, Shahin Mohammadi, Stephanie See, Ninad Ranadive, Emily Fox, Navpreet Ranu | Brown University, Insitro, Insitro, Insitro, Insitro, Insitro, Insitro, Insitro | In drug discovery, measuring the effects of genetic perturbations is a powerful tool for studying unknown disease mechanisms, but biological interpretation of these effects, especially with the advent of screens involving combinatorial perturbations, remains challenging. To address limitations in current methodology, we introduce PertSpectra, a guided triple matrix factorization that incorporates perturbation information and regularizes the model using a known gene-gene interaction graph prior to generate sparse, biologically relevant latent factors that capture perturbational effects. We evaluate PertSpectra on three single-cell RNA-seq datasets with both single and combinatorial genetic perturbations, measuring latent space interpretability, predictive ability on unseen combinations of observed perturbations, and stratification of functionally similar perturbations. We show that PertSpectra provides an integrated modeling approach to understanding combinatorial perturbation data in the context of drug discovery. | https://doi.org/10.7490/f1000research.1120291.1 | |
| Oral Presentation-No Poster | Cory | Scott | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Oral presentation | Scott | Implicitly and Differentiably Representing Protein Surfaces and Interfaces | Cory B. Scott, Charlie Rothschild, Benjamin E. Nye | Colorado College | We introduce a pipeline for implicitly representing a protein, or protein complex, as the union of signed distance functions (SDFs) by representing each atom as a sphere with the appropriate van der Waals radius. While this idea has been used previously as a way to render images of proteins, it has not, to our knowledge, been widely adopted in a machine learning setting. Mirroring recent successful work applying SDFs to represent 3D geometry, we present a proof of concept that this representation of proteins could be useful in several biologically relevant applications. We also propose further experiments that are necessary to validate the proposed approach. | https://doi.org/10.7490/f1000research.1120372.1 | |
| Oral Presentation-No Poster | Ben | Viggiano | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Oral presentation | Viggiano | Steering Protein Generative Models at Test-Time for Guided AAV2 Capsid Design | Ben Viggiano, Wenhui Sophia Lu, Xiaowei Zhang, Luis S. Mille-Fragoso, Xiaojing J. Gao, Euan Ashley, Wing Hung Wong | Stanford University Department of Biomedical Data Science, Stanford University Department of Statistics, Stanford University Department of Bioengineering, Stanford University Sarafan ChEM-H, Stanford University Bio-X, Stanford University Department of Medicine, Stanford University Center for Undiagnosed Diseases, Stanford University Department of Chemical Engineering | Recent advances in protein generative models have created new opportunities for protein engineering. However, a significant challenge remains in effectively steering these models to generate sequences with specific, desired functionalities, especially when these properties are defined by "black-box" or non-differentiable fitness functions. To address this, we present ProVADA+, a model-agnostic framework that guides pretrained generative models at test-time without costly retraining. Our approach introduces a reinforcement learning-based adaptive masking technique (MADA-DUCB) that significantly accelerates convergence. We demonstrate this framework on the challenging task of designing novel Adeno-Associated Virus 2 (AAV2) capsids. By coupling a ProteinMPNN generative prior with a fine-tuned AAV viability oracle, our method successfully navigates the rugged fitness landscape where unguided random mutagenesis is ineffective, with prior experiments showing that as few as 0.3% of variants with six or more mutations are viable. In its final iterations, ProVADA generated a pool of novel candidates with a mean viral selection score of 2.72, consistently scoring highly viable variants while maintaining a diverse range of sequence similarity to the wild-type sequence. Our results show that ProVADA provides a powerful and efficient framework for accelerating the design of proteins with complex, user-defined properties. | https://doi.org/10.7490/f1000research.1120297.1 | |
| Oral Presentation-No Poster | Zara | Ansari | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Oral presentation | Ansari | Using Large Language Models to Audit Model Healthcare Biases | Zara N. Ansari, Aaron Fanous, Jesutofunmi A. Omiye, Ank Agarwal, Roxana Daneshjou | Department of Biomedical Data Science Stanford University, Department of Biomedical Data Science Stanford University, [Department of Biomedical Data Science Stanford University, Department of Dermatology Stanford School of Medicine], Department of Biomedical Data Science Stanford University, [Department of Biomedical Data Science Stanford University, Department of Dermatology Stanford School of Medicine] | Large language models (LLMs) can potentially mitigate pain points in healthcare tasks such as decision support, text summarization, and question-answering. However, LLMs exhibit bias related to race, gender identity, sexual orientation, and other demographics, posing a major concern. Although human review helps reduce bias, the sheer data volume renders thorough evaluation impractical and onerous at scale. This motivates the use of LLMs in auditing models for bias. This study uses the Stanford Healthcare red-teaming dataset, which contains prompts, outputs, and expert-level bias labels, to examine how model size and prompting techniques affect bias detection with GPT-3.5-turbo, GPT-4o, llama3.3, and o1-mini. Our results show that the best model for bias detection depends on the chosen metric. Smaller, cost-effective models like o1-mini outperformed GPT-4o in precision and F1 scores, with up to 53.11% higher precision and 10.32% higher F1. This suggests that smaller models may be preferable when precision or F1 is a priority. Additionally, self-critiquing capabilities in larger models do not significantly improve bias detection over smaller models (χ2, p = 0.597). Moreover, the use of prompting techniques, particularly Thread of Thought, significantly enhanced bias detection across all models (χ2, p < 0.001). Our findings suggest that, depending on the metric of concern for the auditor, smaller models can offer a cost-effective alternative to larger models. | https://doi.org/10.7490/f1000research.1120367.1 | |
| Oral Presentation-No Poster | Helena | Coggan | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Oral presentation | Coggan | Deciphering the influence of demographic factors on the treatment of pediatric patients in the emergency department | Helena Coggan (1,2), Anne Bischops (1,2), Pradip Chaudhari (3), Yuval Barak-Corren (4), Andrew M. Fine (2,5), Ben Y. Reis (1,2,6), Jaya Aysola (7) and William G. La Cava (1,2,*) | 1. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA 2. Harvard Medical School, Boston, MA, USA 3. Division of Emergency and Transport Medicine, Children's Hospital Los Angeles and Department of Pediatrics, Keck School of Medicine of the University of Southern California, LA, CA, USA 4. Department of Pediatric Cardiology, Schneider Children's Medical Center, Affiliated to Tel Aviv University Faculty of Medical and Health Sciences, Petach Tikvah, Israel 5. Division of Emergency Medicine, Boston Children's Hospital, Boston, MA, USA 6. Ivan and Francesca Berkowitz Living Laboratory, Harvard Medical School and Clalit Research Institute, Boston, MA, USA and Ramat-Gan, Israel 7. Leonard Davis Institute of Health Economics, University of Pennsylvania; Department of Medicine, Perelman School of Medicine, University of Pennsylvania; and Penn Medicine Center for Health Equity Advancement, Philadelphia, Pennsylvania * corresponding author | Persistent demographic disparities have been identified in the treatment of patients seeking care in the emergency department (ED). These may be driven in part by subconscious biases, which providers themselves may struggle to identify. To better understand the operation of these biases, we performed a retrospective cross-sectional analysis using electronic health records describing 339,400 visits to the ED of a single US pediatric medical center between 2019 and 2024. Odds ratios were calculated using propensity-score matching. Analyses were adjusted for confounding variables, including chief complaint, insurance type, socio-economic deprivation, and patient comorbidities. We also trained a machine learning [ML] model on this dataset to identify predictors of admission. We found significant demographic disparities in admission (Non-Hispanic Black [NHB] relative to Non-Hispanic White [NHW]: OR 0.77, 95% CI 0.73-0.81; Hispanic relative to NHW: OR 0.80, 95% CI 0.76-0.83). We also identified disparities in individual decisions taken during the ED stay. For example, NHB patients were significantly less likely than NHW patients to be assigned an ‘emergent’ triage acuity score (OR 0.70, 95% CI 0.67-0.72), but emergent NHB patients were also significantly less likely to be admitted than NHW patients with the same triage acuity (OR 0.86, 95% CI 0.80-0.93). Demographic disparities were particularly acute wherever patients had normal vital signs, public insurance, moderate socio-economic deprivation, or a home address distant from the hospital. An ML model assigned higher importance to triage score for NHB than NHW patients when predicting admission, reflecting these disparities in assignment. We conclude that many visit characteristics, clinical and otherwise, may influence the operation of subconscious biases and affect ML-driven decision support tools. | https://doi.org/10.48550/arXiv.2510.0284 | |
| Oral Presentation-No Poster | Nicole | Foti | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Oral presentation | Foti | Building Fair and Trustworthy Biomedical AI: A Tool for Identifying Key Decision Points | Nicole Foti, Janet K. Shim, Caitlin McMahon, Sandra Soo-Jin Lee | Stanford University, University of California San Francisco, Columbia University, Columbia University | Recent advancements in artificial intelligence (AI) have transformed biomedicine, offering tools for improved diagnostics, drug discovery, and patient care. Yet these innovations raise pressing ethical concerns, including bias, inequitable outcomes, and privacy risks, which highlight the need for deliberate attention to fairness, trust, and trustworthiness in AI development. In this paper, we argue that ethical responsibility should be embedded at both institutional and individual levels, and that multi-stakeholder engagement, especially with underrepresented groups, is essential to ensure AI tools meet diverse needs. Building on a framework originally developed for precision medicine research, we present an adapted decision-mapping tool—the Trustworthy AI Decision Map—that can anchor and structure dialogue about the ethical implications of specific AI tools. The map identifies key decision points across the AI life cycle that impact fairness and trustworthiness and facilitates dialogue among stakeholders. In making these decisions visible, the map seeks to enable teams to anticipate downstream consequences, integrate multiple perspectives, and support institutional accountability. We illustrate its potential through a case involving the deployment of AI in rural healthcare settings. Moving forward, we suggest that empirical testing with stakeholders is needed to validate and refine the map’s utility in biomedical AI contexts to promote fair and trustworthy AI practices. | ||
| Oral Presentation-No Poster | Chris | German | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | German | Integrating Polygenic Risk Improves Generative Forecasting of Disease Trajectories | Chris German, Suyash Shringarpure, Payam Dibaeinia, James Ashenhurst, Bertram L. Koelsch, Adam Auton, and Aly A. Khan | 23andMe Inc, University of Chicago | Predicting the longitudinal sequence of diseases an individual will develop over their lifetime is a central challenge in medicine. While recent AI models can process health histories, they have been limited by cohort size and the omission of genetic data. Here we introduce the Next Health Event (NHE) model, a generative transformer trained on the health trajectories of 7.1 million research participants. By using a transformer architecture to integrate demographic data, longitudinal BMI, and polygenic risk scores (PRS) for 297 traits with sequential health history, NHE significantly outperforms baseline models, including XGBoost with the same inputs, in predicting the next diagnosis across 129 conditions (Top-1 accuracy 25.5% vs. 22.3%). Systematic ablation studies reveal that both PRS and longitudinal BMI provide substantial, non-redundant predictive power, whereas self-reported lifestyle information offers limited additional value. The model’s predictive accuracy is the same when forecasting prospectively reported incident outcomes vs. combined prospectively and retrospectively reported outcomes (AUROC 0.917), demonstrating its utility for real-world risk assessment. By uniting large-scale health histories with genetics, our work establishes a new framework for predictive health and demonstrates that generative models can effectively forecast individual disease pathways. | ||
| Oral Presentation-No Poster | Jici | Jiang | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | Jiang | Literature-driven extraction and computational prediction of causal statements linking genetic variants to biological processes, pathways and phenotypes | Jici Jiang, Predrag Radivojac, Benjamin M. Gyori | Northeastern University | Understanding the mechanistic basis of pathogenic genetic variants requires reconstructing the molecular pathways connecting the variant, via a chain of molecular intermediates, to a disease-causing biological process and phenotype. However, a literature-wide assembly of causal networks connecting variants, molecular pathways, biological processes and phenotypes has not been previously available. To create such a resource, we developed an automated pathway reconstruction approach building on the Integrated Network and Dynamical Reasoning Assembler (INDRA) system, which extracts causal mechanistic statements (positive regulation, phosphorylation, complex formation, etc.) by combining structured databases and literature mining. We traversed INDRA statements extracted from publications to identify those describing a genetic variant resulting in a protein point mutation. We then reconstructed directed paths (consisting of one or more linked INDRA statements) connecting this variant to a term representing a biological process, phenotype or disease within the same publication. This resulted in a directed multigraph obtained from 25,862 paths for variants in 2,561 proteins. Each node in this graph corresponds to an ontology-grounded molecular or process term and each edge is explicitly linked to supporting literature evidence, enabling full auditability of inferred mechanisms. To leverage the assembled networks, we trained a classification model to predict likely downstream biological processes or specific disease associations for protein variants. As features to the model, we integrated molecular annotations (including protein sequence features, ClinVar pathogenicity labels, and UniProt domain mappings) in combination with representations from the ESM2 transformer-based protein language model. The performance achieved by this model shows promise for reconstructing causal mechanistic statements associated with function of genetic variants, a framing of the variant effect prediction task that goes significantly beyond simple assessment of pathogenicity. This integrative framework enables the mechanistic interpretation of known variants and prediction of functional relevance for variants lacking prior phenotypic annotation. | https://doi.org/10.7490/f1000research.1120375.1 | |
| Oral Presentation-No Poster | Jubair Ibn Malik | Rifat | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | Rifat | BioLM-NET: an interpretable deep learning model combining prior biological knowledge and contextual LLM gene embeddings on multi-omics data to predict disease | Jubair Ibn Malik Rifat, Thasina Tabashum, Md Marufi Rahman, Md Farhad Mokter, Sarthak Engala, Serdar Bozdag | Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Mathematics, University of North Texas, Center for Computational Life Sciences, University of North Texas | Biologically informed deep neural networks, which connect the input layer to hidden layers based on gene-pathway relationships, have gained popularity in recent years. However, most existing methods do not incorporate protein-protein interactions (PPI) and protein-DNA interactions (PDI) in their designs. In this study, we introduce BioLM-NET, a deep learning-based framework that fuses single-cell or bulk gene expression data and DNA methylation data with prior biological knowledge, including PPI and PDI. BioLM-NET also aggregates latent representations of omics signals at the pathway level through an attention-based pathway layer, in which a pre-trained large language model (LLM) is incorporated to generate context-specific gene embeddings. We evaluated BioLM-NET on single-cell colorectal cancer data from the scTrioseq2 platform to predict primary and metastatic cancer cells, on TCGA-BRCA, TCGA-GBM, and TCGA-COAD to predict cancer subtypes, and on ROSMAP data to identify Alzheimer’s disease patients. Our results showed that BioLM-NET outperformed baseline and state-of-the-art (SOTA) methods (P-NET and PASNet) with statistical significance on scTrioseq2, TCGA-COAD, and ROSMAP data, and tied with an SVM and a dense neural network on TCGA-BRCA data. Our ablation studies demonstrated the importance of incorporating PPI and PDI data and the attention-based pathway layer. We also interpreted our models and found that the important input features are significantly enriched in GO terms and KEGG pathways and can serve as potential biomarkers or therapeutic targets for the corresponding disease. | https://doi.org/10.7490/f1000research.1120294.1 | |
| Oral Presentation-No Poster | Sahil | Sethi | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | Sethi | Prototype Learning to Create Refined Interpretable Digital Phenotypes from ECGs | Sahil Sethi, David Chen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones | University of Chicago, University of Chicago, University of Chicago, University of California Davis, University of Chicago, University of Chicago | Prototype-based neural networks offer interpretable predictions by comparing inputs to learned, representative signal patterns anchored in training data. While such models have shown promise in the classification of physiological data, it remains unclear whether their prototypes capture an underlying structure that aligns with broader clinical phenotypes. We use a prototype-based deep learning model trained for multi-label ECG classification using the PTB-XL dataset. Then, without modification, we performed inference on the MIMIC-IV clinical database. We assess whether individual prototypes, trained solely for classification, are associated with hospital discharge diagnoses in the form of phecodes in this external population. Individual prototypes demonstrate significantly stronger and more specific associations with clinical outcomes compared to the classifier’s class predictions, NLP-extracted concepts, or broader prototype classes across all phecode categories. Prototype classes with mixed significance patterns exhibit significantly greater intra-class distances (p < 0.0001), indicating the model learned to differentiate clinically meaningful variations within diagnostic categories. The prototypes achieve strong predictive performance across diverse conditions, with AUCs as high as 0.89 for atrial fibrillation and 0.91 for heart failure, while also showing substantial signal for non-cardiac conditions such as sepsis and renal disease. These findings suggest that prototype-based models can support interpretable digital phenotyping from physiologic time-series data, providing transferable intermediate phenotypes that capture clinically meaningful physiologic signatures beyond their original training objectives. | https://doi.org/10.48550/arXiv.2508.01521 | |
| Oral Presentation-No Poster | Aditya | Sriram | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | Sriram | DeepDiff-SHAP: Interpretable deep learning for subgroup-specific causal hypothesis generation using conditional SHAP | Aditya Sriram, Soyeon Kim, Joseph A Carcillo, Hyun Jung Park | University of Pittsburgh Department of Human Genetics, University of Pittsburgh Department of Pediatrics, University of Pittsburgh Department of Pediatrics, University of Pittsburgh Department of Human Genetics | Precision medicine aims to tailor healthcare strategies to individual differences in genetic, clinical, and environmental factors. However, identifying subgroup-specific causal relationships in complex biomedical data remains a major challenge, especially when standard causal inference methods average over population heterogeneity. We introduce DeepDiff-SHAP, a novel framework that combines regression-based and deep learning-based differential causal inference to detect changes in causal relationships across patient subgroups. DeepDiff-SHAP integrates conditional SHapley Additive exPlanations (SHAP) to estimate conditional dependencies and perform nonlinear differential causal inference in a principled, interpretable manner. Applying DeepDiff-SHAP to two population-scale datasets, the CDC Diabetes Health Indicators Dataset and a UK Biobank sepsis cohort stratified by hypertension status, we identified clinically meaningful and subgroup-specific causal changes in relationships between features across the datasets including age, general health, alkaline phosphatase, and cholesterol. Our results reinforce the idea that deep learning enhances sensitivity to complex interaction patterns overlooked by linear models, providing new biological insights into disease progression and comorbidity-specific risk mechanisms. DeepDiff-SHAP offers a scalable and interpretable solution to uncover individualized causal pathways, advancing the goal of truly personalized medicine. | https://doi.org/10.7490/f1000research.1120321.1 | |
| Oral Presentation-No Poster | Rasika | Venkatesh | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Oral presentation | Venkatesh | Integrating Imaging-Derived Clinical Endotypes with Plasma Proteomics and External Polygenic Risk Scores Enhances Coronary Microvascular Disease Risk Prediction | Rasika Venkatesh, Tess Cherlin, Penn Medicine BioBank, Marylyn D. Ritchie, Marie A. Guerraty, Shefali S. Verma | University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania | Coronary microvascular disease (CMVD) is an underdiagnosed but significant contributor to the burden of ischemic heart disease, characterized by angina and myocardial infarction. The development of risk prediction models such as polygenic risk scores (PRS) for CMVD has been limited by a lack of large-scale genome-wide association studies (GWAS). However, there is significant overlap between CMVD and enrollment criteria for coronary artery disease (CAD) GWAS. In this study, we developed CMVD PRS models by selecting variants identified in a CMVD GWAS and applying weights from an external CAD GWAS, using CMVD-associated loci as proxies for the genetic risk. We integrated plasma proteomics, clinical measures from perfusion PET imaging, and PRS to evaluate their contributions to CMVD risk prediction in comprehensive machine and deep learning models. We then developed a novel unsupervised endotyping framework for CMVD from perfusion PET-derived myocardial blood flow data, revealing distinct patient subgroups beyond traditional case-control definitions. This imaging-based stratification substantially improved classification performance alongside plasma proteomics and PRS, achieving AUROCs between 0.65 and 0.73 per class, significantly outperforming binary classifiers and existing clinical models, highlighting the potential of this stratification approach to enable more precise and personalized diagnosis by capturing the underlying heterogeneity of CMVD. This work represents the first application of imaging-based endotyping and the integration of genetic and proteomic data for CMVD risk prediction, establishing a framework for multimodal modeling in complex diseases. | https://doi.org/10.1101/2025.08.18.25333844 | |
| Oral Presentation-No Poster | Haohan | Wang | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Oral presentation | Chen | Discovery of Disease Relationships via Transcriptomic Signature Analysis Powered by Agentic AI | Ke Chen, Haohan Wang | University of Illinois Urbana-Champaign, University of Illinois Urbana-Champaign | Modern disease classification often overlooks molecular commonalities hidden beneath divergent clinical presentations. This study introduces a transcriptomics-driven framework for discovering disease relationships by analyzing over 1,300 disease–condition pairs using GenoMAS, a fully automated agentic AI system. Beyond identifying robust gene-level overlaps, we develop a novel pathway-based similarity framework that integrates multi-database enrichment analysis to quantify functional convergence across diseases. The resulting disease similarity network reveals both known comorbidities and previously undocumented cross-category links. By examining shared biological pathways, we explore potential molecular mechanisms underlying these connections—offering functional hypotheses that go beyond symptom-based taxonomies. We further show how background conditions such as obesity and hypertension modulate transcriptomic similarity, and identify therapeutic repurposing opportunities for rare diseases like autism spectrum disorder based on their molecular proximity to better-characterized conditions. In addition, this work demonstrates how biologically grounded agentic AI can scale transcriptomic analysis while enabling mechanistic interpretation across complex disease landscapes. All results are publicly accessible at github.com/KeeeeChen/Pathway_Similarity_Network. | https://doi.org/10.7490/f1000research.1120292.1 | |
| Oral Presentation-No Poster | Rachael | Blair | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Oral presentation | Krishnan | Network optimal retrieval of sparse perturbations for steady-state control | Krithika Krishnan, Tiange Shi, Satyam Kumar, Han Yu, Rachael Hageman Blair | Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo NY, Department of Biostatistics, University at Buffalo, Buffalo NY, Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo NY, Roswell Park Comprehensive Cancer Center, Buffalo NY, Institute for Artificial Intelligence and Data Science, University at Buffalo, Buffalo NY, + Department of Biostatistics, University at Buffalo, Buffalo NY | Prioritizing targeted perturbation experiments remains a central challenge in systems biology, where experimental constraints limit network manipulation. We introduce NORSP (Network Optimal Retrieval of Sparse Perturbations). This novel computational framework integrates network propagation with supervised subset selection to identify minimal perturbation sets that can shift a system from its initial to a desired steady state. NORSP leverages a sensitivity matrix derived solely from network topology, enabling control prediction without requiring full knowledge of system dynamics. Applicable to undirected, directed, and signed networks, NORSP accommodates a broad range of biological models and experimental scenarios. We validate its effectiveness using YBX1 knockdown transcriptomics data and 61 curated metabolic networks from the BioModels repository, demonstrating NORSP’s robustness, scalability, and experimental relevance. Even under constraints that obscure true perturbations, the algorithm reliably infers alternative targets that achieve comparable control. Control is confirmed both in graphical approximations and through full dynamical model simulations. Overall, NORSP offers a practical and generalizable solution for steady-state control in complex biological systems, providing a foundation for multi-omics hypothesis generation and systems-level experimental design. | ||
| Oral Presentation-No Poster | Alena | Orlenko | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Oral presentation | Orlenko | A random-walk-based learning framework to uncover novel gene candidates for Alzheimer’s disease therapy | Alena Orlenko, Binglan Li, Neda Khanjani, Mythreye Venkatesan, Li Shen, Marylyn D. Ritchie, Zhiping Paul Wang, Tayo Obafemi-Ajayi, Jason H. Moore | Cedars-Sinai Medical Center, Cedars-Sinai Medical Center, Cedars-Sinai Medical Center, Cedars-Sinai Medical Center, University of Pennsylvania, University of Pennsylvania, Cedars-Sinai Medical Center, Missouri State University, Cedars-Sinai Medical Center | Identifying repurposable therapeutic targets for Alzheimer's disease (AD) remains challenging due to various clinical and biological factors. This study aimed to identify candidate genes for AD therapy. We hypothesize that gene and disease-specific network properties – learnable from these large-scale biomedical knowledge graphs – can inform implicit gene-AD connections and prioritize repurposable AD drug targets. To evaluate the hypothesis, we focused on druggable genes curated from the Drug-Gene Interaction Database and the Alzheimer’s Knowledge Base (AlzKB). We applied scalable random walk methods to Hetionet to learn unbiased gene and disease embeddings, representative of their topological and semantic network properties. The embeddings were then used to compute gene-AD similarity and derive network-based scores for each gene. To validate the scores, using Alzheimer’s Disease Sequencing Project (ADSP) data, we constructed AD classifier models with Tree-based pipeline optimizer 2 (TPOT2), an automated machine learning framework. Models were optimized for performance, model complexity, and high aggregate network-based scores. Network-based scores successfully prioritized diverse feature sets – many not previously associated with AD – that are enriched in biologically meaningful body parts such as brain, and pathways including neuronal signaling, potassium channels, and creatine metabolism. The results suggested that knowledge graphs and network-informed embeddings can capture both known and novel insights into AD mechanisms. Additionally, integrating network-based scores with feature-set-guided TPOT2 offers a scalable and biologically interpretable framework for AD drug repurposing and discovery. | ||
| Oral Presentation-No Poster | Nure | Tasnina | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Oral presentation | Tasnina | Provenance Tracing in Network Diffusion Algorithms | Nure Tasnina, Mark Crovella, Simon Kasif, T. M. Murali | Virginia Tech, Boston University, Boston University, Virginia Tech | We propose a novel strategy for provenance tracing in random walk-based network diffusion algorithms, a problem that has been surprisingly overlooked in spite of the widespread use of diffusion algorithms in biological applications. Our path-based approach enables ranking paths by the magnitude of their contribution to each node’s score, offering insight into how information propagates through a network. Building on this capability, we introduce two quantitative measures: (i) path-based effective diffusion, which evaluates how well a diffusion algorithm leverages the full topology of a network, and (ii) diffusion betweenness, which quantifies a node’s importance in propagating scores. We applied our framework to SARS-CoV-2 protein interactors and human PPI networks. Provenance tracing of the Regularized Laplacian and Random Walk with Restart algorithms revealed that a substantial amount of a node’s score is contributed via multi-edge paths, demonstrating that diffusion algorithms exploit the non-local structure of the network. Analysis of diffusion betweenness identified proteins playing a critical role in score propagation; proteins with high diffusion betweenness are enriched with essential human genes and interactors of other viruses, supporting the biological interpretability of the metric. Finally, in a signaling network composed of causal interactions between human proteins, the top contributing paths showed strong overlap with COVID-19-related pathways. These results suggest that our path-based framework offers valuable insight into diffusion algorithms and can serve as a powerful tool for interpreting diffusion scores in a biologically meaningful context, complementing existing module- or node-centric approaches in systems biology. | https://doi.org/10.7490/f1000research.1120298.1 | |
| Oral Presentation-No Poster | Di | Zhou | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Oral presentation | Zhou | REPEL - Random Embedding Perturbation for Enhanced Learning of Protein Function | Di Zhou, Lenore Cowen, Kaiyi Wu, Xiaozhe Hu, Donna Slonim | Tufts University, Tufts University, Tufts University, Tufts University, Tufts University | Protein function prediction from multiplex protein-protein association networks is a crucial approach to extending functional annotation. Current methods use embeddings of the heterogeneous network data that aim to place related proteins near each other in embedding space. However, such embeddings suffer from spurious protein proximity as well, reducing function prediction accuracy. Because heterogeneous input networks often have very different structures, it is hard to confidently declare proteins to be dissimilar using the network structure or the resulting embeddings. Here we address this problem with REPEL, a function prediction tool using a random graph augmentation method that applies a uniform weak force to push nodes apart. We assess this method on simulated networks with planted overlapping communities, as well as on real multiplex yeast and E. coli protein association networks. Surprisingly, we find that this method consistently improves protein function prediction over competing methods Mashup, deepNF, and BIONIC. The random repelling nature of the augmented graphs has a denoising effect on the learning process, distancing node pairs with spurious proximity while preserving true functional connections, thus increasing robustness. This graph augmentation principle may generalize to denoising and improving robustness in other graph-based learning algorithms. | https://doi.org/10.7490/f1000research.1120378.1 | |
| Poster Board # | First Name of Presenting Author | Last Name of Presenting Author | Please indicate the session that you are submitting your paper to: | Paper acceptance status: | Last name of first author. | Paper Title | List all authors (first name first with names separated by commas) in the order they appear on the paper. | Author affiliations (in order of the list of authors). Please separate affiliations with commas. | Submit your abstract (300 words or less) for inclusion in the abstract book. If your paper was accepted for oral presentation and you would also like space for a poster in the general poster session, please submit a separate abstract via the <a href="http://psb.stanford.edu/abstract.html">abstract submission form</a>. | Provide your poster's DOI. | |
| 1 | Erick | Scott | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Ahsen | Leveraging Generative AI for Interpretable Clinical Decision Making Through Causal Graphs | Mehmet Eren Ahsen PhD, Rand Kittani, Travis Gerke ScD, Laya Krishnan, Sean Rogan, Erick R. Scott MD MHS | Gies College of Business University of Illinois Urbana-Champaign, Carle Illinois College of Medicine University of Illinois Urbana-Champaign, cStructure, Carle Illinois College of Medicine University of Illinois Urbana-Champaign, cStructure, cStructure | Clinical AI systems' lack of interpretability limits their adoption in evidence-based medicine. To address this challenge, we propose a computational framework that harnesses generative AI's medical knowledge to create interpretable structural causal models (SCMs) for clinical decision support, quality improvement evaluation, and population health management. We evaluated our approach through a case study using data from the Midwest Healthcare Conference Causal Diagram Challenge, where we compared transformer-based large language models against human performance on a complex causal reasoning task: estimating COVID-19 treatment effects through target trial emulation. Both groups designed SCMs to evaluate glucocorticoid treatment effects on 28-day mortality using real-world data from more than 2,000 hospitalized patients, benchmarked against published RECOVERY randomized controlled trial results. The best performing SCMs achieved bootstrap coverage rates exceeding 90% for two of three severity strata. Both human and AI models demonstrated equivalent clinical plausibility (n=3 expert reviewers) and similar statistical performance, though both struggled with critical disease severity. Ablation experiments comparing SCM-based approaches against traditional potential outcomes methods revealed SCMs achieved 76-98% coverage versus 1-37% for traditional methods. These results suggest that structural causal models can effectively bridge the interpretability gap in clinical AI by providing essential scaffolding for reliable causal inference and enabling meaningful human-AI collaboration while preserving methodological rigor essential for evidence-based medicine. | https://doi.org/10.7490/f1000research.1120384.1 | |
| 2 | Oishi | Banerjee | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Banerjee | The Intention-Execution Disconnect in Medical AI: The ReXecution Framework for Evaluating Real-World Clinical Performance | Oishi Banerjee, Lucas Bijnens, Subathra Adithan and Pranav Rajpurkar | Harvard Medical School, KU Leuven and Harvard Medical School, Jawaharlal Institute of Postgraduate Medical Education and Research, Harvard Medical School | We present the ReXecution framework for conducting clinician-centered assessments of medical AI assistants, providing detailed insights into their reliability in realistic clinical settings. Using this framework, we assessed AI assistants for chest X-ray (CXR) interpretation, exploring the gap between current model capabilities and real-world radiological needs. Unlike prior benchmarks that rely on automatically generated questions with limited clinical relevance, our dataset consists of 100 expert-curated tasks that radiologists might realistically present to an AI assistant in their day-to-day workflow. Through detailed manual review by a radiologist, we evaluated two leading foundation models, ChatGPT-o3 and MedGemma, on our tasks. While both models demonstrated considerable medical knowledge and reasoning capabilities on our tasks, they frequently struggled to interpret images and execute tasks accurately, producing correct outputs in only 5-10% of cases. Our detailed manual evaluation highlights a critical mismatch: models often abstractly understand radiology concepts but cannot reliably execute their plans when interpreting specific medical images. This work identifies key gaps in current models' ability to serve as comprehensive radiology assistants and provides insights into how the development and evaluation of models can better align with real-world clinician needs, enabling seamless clinician-AI collaboration. | https://doi.org/10.7490/f1000research.1120371.1 | |
| 3 | Eric | Chen | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Chen, Eric | MedAgentBench v2: Improving Medical LLM Agent Design | Eric Chen, Sam Postelnik, Kameron Black, Yixing Jiang, Jonathan Chen | Massachusetts Institute of Technology, Binghamton University, Stanford University, Stanford University, Stanford University | MedAgentBench is the first benchmark for evaluating LLM agents on clinical tasks in a FHIR-compliant EHR. In this paper, we present significant prompt engineering and tool design improvements over the original agent implementation and introduce a memory component that enables the agent to learn from prior failures. We added new tools for the agent to properly format its output for tasks, interact with an EHR without constructing explicit HTTP requests, which were prone to syntax errors, and make math calculations. We also wrote a new system prompt that asked the agent to outline its plan before making any tool calls and think step by step using chain of thought reasoning, and provided few shot examples of good vs. bad outputs. Using GPT-4.1 as the base model, our agent achieved a success rate of 91.0% without memory and 98.0% with memory. A surprising consequence is that the agent performed better on a different task that had no associated memory entry, possibly demonstrating that LLMs can adapt to the style of tasks presented by users. To contribute to the benchmark and evaluate the generalization of our agent, we developed 300 new multi-step clinically-driven tasks in collaboration with a physician. Lastly, we show the current limitations of these benchmarks and highlight the necessary next steps and challenges for the responsible deployment of AI agents in real-world healthcare settings. We hope that this paper leads to further development of EHR agents and benchmarks. | https://doi.org/10.7490/f1000research.1120393.1 | |
| 4 | Feng | Chen | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Chen, F | Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models | Feng Chen, Dror Ben-Zeev, Gillian Sparks, Arya Kadakia, Trevor Cohen | University of Washington | Post-Traumatic Stress Disorder (PTSD) remains under-detected in clinical settings, presenting opportunities for automated detection to identify at-risk patients. This study evaluates natural language processing approaches for binary PTSD classification from clinical interview transcripts using the DAIC-WOZ dataset, which contains semi-structured interviews with standardized psychological assessments. We compared embedding-based methods (SentenceBERT/LLaMA with logistic regression), general and mental health-specific transformer models (BERT/RoBERTa), and large language model prompting strategies (zero-shot/few-shot/chain-of-thought). SentenceBERT embeddings with logistic regression achieved the highest overall performance (AUPRC=0.758±0.128), outperforming domain-specific end-to-end fine-tuning models like Mental-RoBERTa (AUPRC=0.675±0.084 vs. RoBERTa-base 0.599±0.145). Few-shot prompting using DSM-5 criteria and two examples yielded competitive results (AUPRC=0.737). Performance varied significantly across symptom severity and comorbidity status with depression, with higher accuracy for severe PTSD cases and patients with comorbid depression. Our findings highlight the potential of embedding-based methods and LLMs for scalable screening while underscoring the need for improved detection of nuanced presentations. | https://doi.org/10.7490/f1000research.1120285.1 | |
| 5 | Fateme | Haredasht | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Grolleau | MedFactEval and MedAgentBrief: A Framework and Workflow for Generating and Evaluating Factual Clinical Summaries | François Grolleau, Emily Alsentzer, Timothy Keyes, Philip Chung, Akshay Swaminathan, Asad Aali, Jason Hom, Tridu Huynh, Thomas Lew, April Liang, Weihan Chu, Natasha Steele, Christina Lin, Jingkun Yang, Kameron Black, Stephen Ma, Fateme N. Haredasht, Nigam H. Shah, Kevin Schulman, Jonathan H. Chen | Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA, Department of Biomedical Data Science, Stanford University, Stanford, CA, USA, Stanford Health Care, Palo Alto, CA, USA, Department of Anesthesiology and Pain Medicine, Stanford Medicine, Stanford, CA, USA, Department of Radiology, Stanford University, Stanford, CA, USA, Stanford Clinical Excellence Research Center, Stanford University, Stanford, CA, USA, Department of Medicine, Stanford University, Stanford, CA, USA | Evaluating factual accuracy in Large Language Model (LLM)-generated clinical text is a critical barrier to adoption, as expert review is unscalable for the continuous quality assurance these systems require. We address this challenge with two complementary contributions. First, we introduce MedFactEval, a framework for scalable, fact-grounded evaluation where clinicians define high-salience key facts and an "LLM Jury" (a multi-LLM majority vote) assesses their inclusion in generated summaries. Second, we present MedAgentBrief, a model-agnostic, multi-step workflow designed to generate high-quality, factual discharge summaries. To validate our evaluation framework, we established a gold-standard reference using a seven-physician majority vote on clinician-defined key facts from inpatient cases. The MedFactEval LLM Jury achieved almost perfect agreement with this panel (Cohen's kappa=81%), a performance statistically non-inferior to that of a single human expert (kappa=67%, P < 0.001). Our work provides both a robust evaluation framework (MedFactEval) and a high-performing generation workflow (MedAgentBrief), offering a comprehensive approach to advance the responsible deployment of generative AI in clinical workflows. | https://f1000research.com/posters/14-1391 | |
| 6 | Sy | Hwang | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Hwang | Leveraging Large Language Models to Derive Multiple Sclerosis Progression Assessments from Clinical Notes: A Feasibility Study | Sy Hwang, Sunil Thomas, Heather Williams, Tom Hutchinson, Emily Schriver, Ashley Batugo, Amit Bar-Or, Vishakha Sharma, Frederik Buijs, Christopher Perrone, Danielle Mowery | University of Pennsylvania, F. Hoffman-La Roche, Roche Diagnostics | Ascertainment of multiple sclerosis (MS) progression is important for informing clinical care decisions and supporting biomedical research, yet Expanded Disability Status Scale (EDSS) and Functional System (FS) scores are inconsistently structured and often embedded in free-text notes. We present a single-site feasibility evaluation of a transparent, instruction-guided large language model (LLM) pipeline that infers EDSS and FS directly from routine neurology documentation. The system applies task-specific prompts to extract FS subscores and an EDSS consistent with exam and ambulation descriptions, followed by light post-processing checks for internal consistency. Targeted error analyses highlight common failure modes, including underspecified ambulation, historical versus current exam leakage, and ambiguous severity descriptors, and show that disagreements are predominantly adjacent and preserve the same clinical category, suggesting limited practical impact. Results support the feasibility of using instruction-guided LLMs to recover clinically interpretable EDSS and FS signals from narrative notes and offer a pragmatic reference point for scalable, low burden MS disability phenotyping in real-world settings. | https://doi.org/10.7490/f1000research.1120423.1 | |
| 7 | Luyang | Luo | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Luo | ED-Explain: Personalized Video Instructions for Patients Discharged from the Emergency Department | Luyang Luo, Emma Chen, Xiaoman Zhang, Julian Nicolas Acosta, Boyang Tom Jin, Fatma Gunturkun, Christian Rose, Carl Preiksaitis, Brian Suffoletto, Pranav Rajpurkar, David A Kim | Harvard Medical School, Harvard Medical School, Harvard Medical School, Harvard Medical School, Stanford University, Stanford University School of Medicine, Stanford University School of Medicine, Stanford University School of Medicine, Stanford University School of Medicine, Harvard Medical School, Stanford University School of Medicine | This paper presents ED-Explain, an integrated, AI-driven system that transforms emergency department (ED) discharge instructions and electronic health records into personalized video presentations featuring a virtual healthcare provider. By leveraging multimodal ED data, large language models, and video generation, we aimed to produce accessible discharge instructions tailored to patients' ED visits. Four board-certified Emergency Medicine physicians reviewed 39 pairs of original and ED-Explain-produced discharge instructions. AI video summaries received significantly higher (p<0.001) average ratings (1-5) of completeness (4.1 vs. 3.1), correctness (3.9 vs. 3.5) and patient accessibility (3.4 vs. 2.9). Physicians expressed reservations about 13.3% of ED-Explain's discharge instructions for patient viewing, and only 4.6% of ED-Explain's instructions were found inappropriate for use with patients. Physician feedback suggests that AI-enhanced video discharge summaries have potential to improve communication of discharge information to ED patients, though patient-centered evaluation is needed. This work contributes to the growing field of AI-assisted healthcare communication and offers insights into the potential for AI to improve physician-patient communication and patient self-efficacy. | https://doi.org/10.7490/f1000research.1120318.1 | |
| 8 | Liam | McCoy | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | McCoy | Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates | Liam G. McCoy, Fateme Nateghi Haredasht, Kanav Chopra, David Wu, David JH Wu, Abass Conteh, Sarita Khemani, Saloni Kumar Maharaj, Vishnu Ravi, Arth Pahwa, Yingjie Weng, Leah Rosengaus, Lena Giang, Kelvin Zhenghao Li, Olivia Jee, Daniel Shirvani, Ethan Goh, Jonathan H. Chen | Division of Neurology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA, Stanford Center for Biomedical Informatics Research, Stanford, CA, USA, Stanford Center for Biomedical Informatics Research, Stanford, CA, USA, Department of Dermatology, Mass General Brigham, Harvard Medical School, Boston, MA, USA, Department of Radiation Oncology, Stanford Cancer Center, Palo Alto, CA, USA, Department of Radiation Oncology, Stanford Cancer Center, Palo Alto, CA, USA, Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA, Division of Hospital Medicine, Stanford University School of Medicine, Stanford, CA, USA, Stanford Mussallem Center for Biodesign, Stanford University; Stanford University School of Medicine, Stanford, CA, USA, Division of Neurology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada, Quantitative Sciences Unit, Stanford University School of Medicine, Stanford, CA, USA, Stanford Health Care, Stanford Medicine, Palo Alto, CA, USA, Stanford Health Care, Stanford Medicine, Palo Alto, CA, USA, Department of Ophthalmology, Stanford Byers Eye Institute; Department of Ophthalmology, Tan Tock Seng Hospital; Centre of AI in Medicine, Lee Kong Chian School of Medicine, Nanyang Technological University, Palo Alto, CA, USA; Singapore, Singapore, Division of Primary Care and Population Health, Stanford University School of Medicine, Palo Alto, CA, USA, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada, Stanford Center for Biomedical Informatics Research; Clinical Excellence Research Center, Stanford School of Medicine, Stanford, CA, USA, Stanford Center for Biomedical Informatics Research; Division of Hospital Medicine, Stanford School of Medicine; Clinical Excellence Research Center, Stanford School of Medicine; Department of Medicine, Stanford University, Stanford, CA, USA | This study evaluates the capacity of large language models (LLMs) to generate structured clinical consultation templates for electronic consultation. Using 145 expert-crafted templates developed and routinely used by Stanford’s eConsult team, we assess frontier models (including o3, GPT-4o, Kimi K2, Claude 4 Sonnet, Llama 3 70B, and Gemini 2.5 Pro) for their ability to produce clinically coherent, concise, and prioritized clinical question schemas. Through a multi-agent pipeline combining prompt optimization, semantic autograding, and prioritization analysis, we show that while models like o3 achieve high comprehensiveness (up to 92.2%), they consistently generate excessively long templates and fail to correctly prioritize the most clinically important questions under length constraints. Performance varies across specialties, with significant degradation in narrative-driven fields such as psychiatry and pain medicine. Our findings demonstrate that LLMs can enhance structured clinical information exchange between physicians, while highlighting the need for more robust evaluation methods that capture a model’s ability to prioritize clinically salient information within the time constraints of real-world physician communication. Limitations include reliance on Stanford-specific templates and concordance-based grading, which may not capture all clinically reasonable outputs. | https://f1000research.com/posters/14-1249 | |
| 9 | Xiaoman | Zhang | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Pal | ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Ankit Pal, Jung-Oh Lee, Xiaoman Zhang, Malaikannan Sankarasubbu, Seunghyeon Roh, Won Jung Kim, Meesun Lee, Pranav Rajpurkar | Saama AI Research, Saama Technologies, India, Seoul National University, Seoul, South Korea, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA, Saama AI Research, Saama Technologies, India, Seoul National University, Seoul, South Korea, Seoul National University, Seoul, South Korea, Seoul National University, Seoul, South Korea, Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA | We present ReXVQA, the largest and most comprehensive benchmark for visual question answering (VQA) in chest radiology, comprising 694,841 questions paired with 160,000 chest X-ray studies across training, validation, and test sets. Unlike prior efforts that rely heavily on template-based queries, ReXVQA introduces a diverse and clinically authentic task suite reflecting five core radiological reasoning skills: presence assessment, location analysis, negation detection, differential diagnosis, and geometric reasoning. We evaluate eight state-of-the-art multimodal large language models, including MedGemma-4Bit, Qwen2.5-VL, Janus-Pro-7B, and Eagle2-9B. The best-performing model (MedGemma) achieves 83.24% overall accuracy. To bridge the gap between AI performance and clinical expertise, we conducted a comprehensive human reader study involving 3 senior radiology residents on 200 randomly sampled cases. Our evaluation demonstrates that MedGemma achieved superior performance (83.84% accuracy) compared to human readers (best radiology resident: 77.27%), representing a significant milestone where AI performance exceeds human evaluation on chest X-ray interpretation. The reader study reveals distinct performance patterns between AI models and radiology residents, with strong inter-reader agreement among the human readers while showing more variable agreement patterns between human readers and AI models. ReXVQA establishes a new standard for evaluating generalist radiological AI systems, offering public leaderboards, fine-grained evaluation splits, structured explanations, and category-level breakdowns. This benchmark lays the foundation for next-generation AI systems capable of mimicking expert-level clinical reasoning beyond narrow pathology classification. | https://doi.org/10.7490/f1000research.1120303.1 | |
| 10 | Sydney | Pugh | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Pugh | WATCH-SS: Developing a Trustworthy and Explainable Modular Framework for Detecting Cognitive Impairment from Spontaneous Speech | Sydney Pugh, Matthew Hill, Sy Hwang, Rachel Wu, Kuk Jang, Stacy Iannone, Karen O'Connor, Kyra O'Brien, Eric Eaton, Kevin Johnson | University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, Hongik University, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania | Early detection of cognitive impairment (CI) is critical for timely intervention in Alzheimer's disease and AD-related dementias. To address this, we propose the Warning Assessment and Alerting Tool for Cognitive Health from Spontaneous Speech (WATCH-SS), a modular and explainable three-stage framework for detecting CI from a patient's speech sample. The framework uses detectors for five linguistic and acoustic indicators of CI, aggregates their outputs into a set of clinically interpretable summary features, and uses a predictive model for CI classification. We consider multiple approaches to implementing these detectors that range from simple, computationally efficient methods suitable for real-time analysis to strong, resource-intensive methods, better for high accuracy offline analysis. On the DementiaBank ADReSS dataset, WATCH-SS achieved strong predictive performance (AUC = 80% on the test set). This work demonstrates that a modular, feature-based approach can achieve strong performance while providing a transparent diagnostic profile, representing a significant step towards a trustworthy and clinically-usable screening tool for primary care. | https://doi.org/10.7490/f1000research.1120305.1 | |
| 11 | Lio | Schmitz | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Schmitz | Towards Automated Analysis of Gaze Behavior from Consumer VR Devices for Neurological Diagnosis | Lio Schmitz, Markus Plack, Berkan Koyak, Muhammad Ehsan Ullah, Ahmad Aziz, Reinhard Klein, Zorah Lähner, Hannah Dröge | Visual Computing Group, University of Bonn, Visual Computing Group, University of Bonn, Department of Neurology, University Hospital Bonn, Department of Neurology, University Hospital Bonn, Department of Neurology, University Hospital Bonn, Visual Computing Group, University of Bonn, Visual Computing Group, University of Bonn, Visual Computing Group, University of Bonn | Recent studies have demonstrated that eye tracking is a valuable tool in the detection, classification and staging of neurodegenerative diseases such as Parkinson’s Disease (PD). However, traditional methods for capturing gaze data often rely on expensive and non-engaging clinical equipment such as video-oculography, limiting their accessibility and scalability. In this work, we investigate the feasibility of using eye tracking data collected via consumer-grade virtual reality (VR) headsets to support neurological diagnostics in a more accessible and user-friendly manner. This approach enables large-scale, low-cost, and remote assessments, which are particularly valuable in early detection and monitoring of neurodegenerative conditions. We show that relevant oculomotor features extracted from VR-based eye tracking can be used for predictive assessment. Despite the inherent noise and lower precision of consumer devices, careful preprocessing and robust feature engineering, including deep learning embeddings, mitigate these limitations. Our results demonstrate that both handcrafted and learned features from gaze behavior enable promising levels of classification performance. This research represents an important step towards scalable, automated, and accessible diagnostic tools for neurodegenerative diseases using ubiquitous VR technology. | https://doi.org/10.7490/f1000research.1120287.1 | |
| 12 | David | Wu | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Wu | Automated Evaluation of Large Language Model Response Concordance with Human Specialist Responses on Physician-to-Physician eConsult Cases | David JH Wu, Fateme Nateghi Haredasht, David Wu, Vishnu Ravi, Liam G. McCoy, Yingjie Weng, Kanav Chopra, Selin S. Everett, George Nageeb, Wenyuan Chen, Stephen P. Ma, Saloni Kumar Maharaj, Jessica Tran, Leah Rosengaus, Lena Giang, Olivia Jee, Ethan Goh, Jonathan H Chen | Department of Radiation Oncology, Stanford Cancer Center, Stanford Center for Biomedical Informatics Research, Department of Dermatology, Mass General Brigham, Harvard Medical School, Stanford Mussallem Center for Biodesign, Stanford University, School of Medicine, Division of Neurology, Faculty of Medicine and Dentistry, University of Alberta, Department of Medicine, Beth Israel Deaconess Medical Center, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Quantitative Sciences Unit, Division of Hospital Medicine, Stanford Health Care, Division of Primary Care and Population Health, Clinical Excellence Research Center, Department of Medicine | Specialist consults in primary care and inpatient settings typically address complex clinical questions beyond standard guidelines. eConsults have been developed as a way for specialist physicians to review cases asynchronously and provide clinical answers without a formal patient encounter. Meanwhile, large language models (LLMs) have approached human-level performance on structured clinical tasks, but their real-world effectiveness requires evaluation, which is bottlenecked by time-intensive manual physician review. To address this, we evaluate two automated methods: LLM-as-judge and a decompose-then-verify framework that breaks down AI answers into verifiable claims against human eConsult responses. Using 40 real-world physician-to-physician eConsults, we compared AI-generated responses to human answers using both physician raters and automated tools. LLM-as-judge outperformed decompose-then-verify, achieving human-level concordance assessment with F1-score of 0.89 (95% CI: 0.750, 0.960) and Cohen's kappa of 0.75 (95% CI: 0.47, 0.90), comparable to physician inter-rater agreement κ = 0.69-0.90 (95% CI: 0.43-1.0). | https://doi.org/10.7490/f1000research.1120313.1 | |
| 13 | Xiaoman | Zhang | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise? | Poster presentation | Zhang | Automated Chest X-ray Report Generation Remains Unsolved | Xiaoman Zhang, Julian Nicolas Acosta, Xiaoli Yang, Subathra Adithan, Luyang Luo, Hong-Yu Zhou, Joshua Miller, Ouwen Huang, Zongwei Zhou, Ibrahim Ethem Hamamci, Shruthi Bannur, Kenza Bouzid, Xi Zhang, Zaiqiao Meng, Aaron Nicolson, Bevan Koopman, Inhyeok Baek, Hanbin Ko, Mercy Prasanna Ranjit, Shaury Srivastav, Sriram Gnana Sambanthan, Pranav Rajpurkar | Harvard Medical School, Harvard Medical School, Harvard Medical School, Harvard Medical School, Harvard Medical School, Harvard Medical School, Gradient Health, Gradient Health; Duke University; Durham, Johns Hopkins University, University of Zurich, Microsoft Research Health Futures, Microsoft Research Health Futures, University of Glasgow, University of Glasgow, CSIRO Health and Biosecurity, CSIRO Health and Biosecurity, Seoul National University, Seoul National University Graduate School, Microsoft Research India, Microsoft Research India, Indian Institute of Technology Madras, Harvard Medical School | Accurate interpretation of chest radiograph images and generation of narrative reports is essential for patient care but places a heavy burden on radiologists and clinical experts. While AI models for automated report generation show promise, standardized evaluation frameworks remain limited. Here we present the ReXrank Challenge V1.0, a competition in the generation of chest radiograph reports utilizing ReXGradient, the largest test dataset consisting of 10,000 studies across 67 sites. The challenge attracted diverse participants from academic institutions, industry, and independent research teams, resulting in 8 new submissions alongside 16 state-of-the-art models previously benchmarked. Through comprehensive evaluation using multiple metrics, we analyzed model performance across various dimensions: differences between normal and abnormal studies, generalization capabilities across healthcare sites, and error rates in identifying clinical findings. This benchmark reveals that automated chest X-ray report generation remains fundamentally unsolved, with significant performance gaps between normal and abnormal studies, where even top-performing models achieve less than 45% error-free reporting on abnormal cases, and substantial variability across healthcare institutions, indicating that robust, clinically-ready systems require continued development before widespread deployment. | https://doi.org/10.7490/f1000research.1120296.1 | |
| 14 | Suyash | Shringarpure | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster presentation | Shringarpure | Large language models identify causal genes in complex trait GWAS | Suyash S. Shringarpure, Wei Wang, Sotiris Karagounis, Xin Wang, Anna C. Reisetter, Adam Auton, and Aly A. Khan | 23andMe
Inc., Departments of Family Medicine, and Pathology, and Institute for
Population and Precision Health, University of Chicago |
Pinpointing
causal genes at genome-wide association study (GWAS) loci remains a major
bottleneck. Existing literature-mining approaches are often limited in
accuracy and scalability. We show that large language models (LLMs) can
accurately prioritize likely causal genes at GWAS loci. We systematically
evaluated several widely available general-purpose LLMs against benchmark
datasets of high-confidence causal genes, including a unique set from 23 unpublished GWAS. Our results demonstrate that LLMs outperform or match current state-of-the-art methods and, crucially, exhibit robust performance on novel loci not previously linked to traits, underscoring their generalizability. Moreover, when integrated with existing methods, LLMs substantially enhance overall performance. This work establishes LLMs as an accurate, scalable, and broadly generalizable approach to accelerate causal gene identification in complex traits. |
https://f1000research.com/posters/14-1259 | |
| 15 | Zhiyong | Lu | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster presentation | Wang | Gene-R1: Reasoning with Data-Augmented Lightweight LLMs for Gene Set Analysis | Zhizheng Wang, Yifan Yang, Qiao Jin, Zhiyong Lu | Division
of Intramural Research (DIR)/National Library of Medicine (NLM)/National
Institutes of Health (NIH) |
Gene set analysis (GSA) is a foundational approach for uncovering the molecular functions associated with a group of genes. Recently, LLM-powered methods have emerged to annotate gene sets with biological functions together with coherent explanatory insights. However, existing studies primarily focus on proprietary models, which have been shown to outperform their open-source counterparts despite concerns over cost and data privacy. Furthermore, no research has investigated the application of advanced reasoning strategies to the GSA task. To address this gap, we introduce Gene-R1, a data-augmented learning framework that equips lightweight and open-source LLMs with step-by-step reasoning capabilities tailored to GSA. Experiments on 1,508 in-distribution gene sets demonstrate that Gene-R1 achieves substantial performance gains, matching commercial LLMs. On 106 out-of-distribution gene sets, Gene-R1 performs comparably to both commercial and large-scale LLMs, exhibiting robust generalizability across diverse gene sources. | https://doi.org/10.7490/f1000research.1120383.1 | |
| 16 | Maxat | Kulmanov | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster presentation | Zhapa-Camacho | LLM Agent Based Protein Function Prediction | Fernando Zhapa-Camacho, Olga Mashkova, Robert Hoehndorf, Maxat Kulmanov | King Abdullah University of Science and Technology | Protein function prediction remains a fundamental challenge in computational biology. Here, we present a Large Language Model (LLM) agent-based system that improves protein function prediction performance using knowledge-augmented reasoning and multi-source evidence synthesis. Our approach integrates computational predictions with structured protein metadata, scientific literature, and ontological knowledge through a multi-stage reasoning process. An LLM agent equipped with specialized tools progressively refines functional predictions by querying constraints, cross-referencing evidence, and ensuring biological plausibility. Furthermore, the system provides detailed explanations for each prediction update, documenting the reasoning process and evidence sources. We evaluate our approach against established baseline methods across three Gene Ontology sub-ontologies using four complementary metrics, achieving superior performance in threshold-dependent measures, attaining the lowest Smin scores across all ontologies and the best Fmax for Molecular Function and Cellular Component ontologies. We make our code publicly available at https://github.com/bio-ontology-research-group/go-agent. | https://doi.org/10.7490/f1000research.1120380.1 | |
| 17 | Clemence | Mottez | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Poster presentation | Mottez | From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis | Clemence Mottez, Louisa Fay, Maya Varma, Sophie Ostmeier, Curtis Langlotz | Stanford University, University Hospital of Tübingen, Stanford University, Stanford University, Stanford University | Deep learning models have shown promise in improving diagnostic accuracy from chest X-rays, but they also risk perpetuating healthcare disparities when performance varies across demographic groups. In this work, we present a comprehensive bias detection and mitigation framework targeting sex, age, and race-based disparities when performing diagnostic tasks with chest X-rays. We extend a recent CNN–XGBoost pipeline to support multi-label classification and evaluate its performance across four medical conditions. We show that replacing the final layer of the CNN with an eXtreme Gradient Boosting classifier improves subgroup fairness while maintaining or improving the overall predictive performance. To validate its generalizability, we apply the method to different backbones, namely DenseNet-121 and ResNet-50, and achieve similarly strong performance and fairness outcomes, confirming its model-agnostic design. We further compare this lightweight adapter training method with traditional full-model training bias mitigation techniques, including adversarial training, reweighting, data augmentation, and active learning, and find that our approach offers competitive or superior bias reduction at a fraction of the computational cost.
Finally, we show that combining eXtreme Gradient Boosting retraining with active learning yields the largest reduction in bias across all demographic subgroups, both in and out of distribution on the CheXpert and MIMIC datasets, establishing a practical and effective path toward equitable deep learning deployment in clinical radiology. | https://doi.org/10.7490/f1000research.1120295.1 | |
| 18 | Yinan | Sun | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Poster presentation | Sun | Barriers to Designing Inclusive Ecological Momentary Assessment and Wearable Data Collection Protocols for AI-Driven Substance Use Monitoring in Hawai‘i | Yinan Sun, Aditi Jaiswal, Ali Kargarandehkordi, Christopher Slade, Roberto M Benzo, Kristina T Phillips, Peter Washington | Ohio State University, University of Hawaii at Manoa, Kaiser Permanente Hawaii, University of California - San Francisco (UCSF) | Ecological momentary assessment (EMA) and wearable sensors offer unprecedented opportunities to capture the dynamics of substance use through real-time, high-resolution behavioral and physiological data. These data streams are increasingly used to train AI/ML models for digital phenotyping and predictive intervention, raising critical questions about fairness, bias, and inclusivity in model development. However, the adoption of these technologies, or the lack thereof, among diverse and historically marginalized groups raises questions and challenges of equity, cultural relevance, and participant trust. In this study, we conducted a four-week observational study with adults in Hawaiʻi where we combined continuous Fitbit monitoring and daily EMA surveys to document substance use patterns and cravings. Through semi-structured interviews and grounded theory analysis, we identified six primary barriers to study participation and adherence: (1) disruptions to daily routines, (2) physical and psychosocial discomfort associated with wearing the Fitbit device, (3) concerns about aesthetic compatibility and professional appearance, (4) phone-related issues, (5) challenges related to substance use and cravings, and (6) socially sensitive contexts. We also highlight participant-identified facilitators, such as the value of participant-driven scheduling, motivational feedback, and contextually adaptive protocols. 
Drawing on these collective findings, we propose a set of design guidelines aimed at advancing the inclusivity, engagement, and fairness of wearable-based EMA research. | https://f1000research.com/posters/14-993 | |
| 19 | Katie | Cardone | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster presentation | Cardone | Integrating Polygenic Scores with Clinical, Lifestyle, and Social Risk Factors to Improve Heart Failure Risk Prediction | Katie Cardone, Dokyoon Kim, Marylyn D. Ritchie | Department
of Genetics University of Pennsylvania Perelman School of Medicine,
Philadelphia, PA, USA, Institute for Biomedical Informatics University of
Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA, Division of
Informatics, Department of Biostatistics, Epidemiology, and Informatics
University of Pennsylvania Perelman School of Medicine, Philadelphia, PA,
USA |
Heart failure (HF) is a highly prevalent, high-burden disorder. Early detection of HF
can reduce morbidity and mortality; therefore, novel early detection methods
are needed. Polygenic scores (PGS) can combine common variants across the
genome and provide phenotype-specific risk scores. However, there are also
many well-known, non-genomic risk factors of HF, in the clinical, lifestyle,
and social determinants of health (SDOH) domains, and it is unclear how
genetic and non-genetic risk factors collectively contribute to HF risk. To
address this question, we assessed whether combining HF PGS with clinical,
lifestyle, and SDOH risk factors improves risk prediction. Leveraging data
from the All of Us Research Program, clinical risk factors were aggregated
into a clinical risk score (CRS) while lifestyle and SDOH risk factors were
aggregated into a polyexposure score (PXS). Feature selection was conducted
with LASSO and logistic regressions. Features were included in the model if
they were statistically significant and important in ≥ 95% of 1000
iterations. To assess model performance, logistic regressions with HF
case/control status were conducted with each risk score individually, as well
as integrated models. The integrated model (PGS + CRS + PXS) performed better
than individual risk scores. To assess the validity of the CRS and PXS, an
integrated model with the PGS and clinical and exposure risk factors as
independent features was also evaluated. Based on AUPRC and F1 score, this
integrated model (PGS + CRS risk factors + PXS risk factors) performed better
than combining the PGS with the CRS and PXS. These findings demonstrate
that integration of risk factors across multiple domains can improve HF
prediction. Knowing that PGS combined with clinical, lifestyle, and SDOH risk
factors is predictive of HF risk provides greater opportunity for the
identification of individuals at risk of HF prior to disease onset. |
https://doi.org/10.7490/f1000research.1120396.1 | |
| 20 | Hyunjun | Choi | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster presentation | Choi | Deep Learning-based Classification of Patients with Postural Orthostatic Tachycardia Syndrome using Wearable ECG and Accelerometer Data | Hyunjun Choi, Nicholas Matsumoto, Xi Li, Debbie Teodorescu, Anxhela Kote, Min-Jing Yang, Xiao Liu, Miguel E. Hernandez, Jason H. Moore, Graciela Gonzalez Hernandez, Peng-Sheng Chen | Department
of Computational Biomedicine, Center for Artificial Intelligence Research and
Education, Cedars-Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars-Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars-Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars-Sinai Medical Center, Department of Computational Biomedicine, Center for Artificial Intelligence Research and Education, Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center |
Postural Orthostatic Tachycardia Syndrome (POTS) is a chronic autonomic disorder characterized by chronic (> 3 months) orthostatic intolerance and an increase in heart rate (HR) of ≥ 30 beats per minute (bpm) without orthostatic hypotension. Traditional diagnostic approaches, such as the active standing or tilt-table test, are typically conducted under controlled clinical conditions, limiting their ability to capture the natural variability of symptoms and the intricate physiological responses occurring in daily life. These tests may cause patient discomfort, dizziness, nausea, or syncope. Furthermore, they are time-consuming and cannot be used as a screening tool for POTS. To address these limitations, this study explored wearable devices that continuously collect physiological data, specifically electrocardiogram (ECG) and accelerometer (ACC)-derived metrics, from POTS patients and healthy controls during routine daily activities. Physiological features around posture-change events identified in the data were processed and used to train and test a baseline deep learning model. The model demonstrated promising performance in accurately differentiating POTS patients from healthy controls in a relatively small cohort (66 from POTS patients and 20 from controls), indicating its potential as a feasibility study for clinical decision support. Future studies involving larger and more diverse samples under varying clinical conditions would be necessary to enhance the robustness and viability of our diagnostic model. | https://doi.org/10.7490/f1000research.1120312.1 | |
| 21 | Hannah | Seagle | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster presentation | Seagle | Impact of using PRS-CSx and pruning and thresholding for polygenic partitioning of apparent treatment resistant hypertension | Hannah M. Seagle(1-4,9), Jeewoo Kim(1-3,5-7), Alexis T. Akerele(1,5,6,10), VA Million Veteran Program, Adriana Hung(3,8,11), Jacklyn N. Hellwege(1,2,8,11)* and Todd L. Edwards(3,4,11)* | 1.
Vanderbilt Genetics Institute, 2. Division of Genetic Medicine, 3. Department of Medicine, 4. Division of Epidemiology, 5. Division of Quantitative Science, 6. Department of Obstetrics and Gynecology, 7. Vanderbilt Medical Scientist Training Program, 8. Division of Nephrology and Hypertension, Vanderbilt University Medical Center, Nashville, TN 37203, USA, 9. Joseph Maxwell Cleland Atlanta VA Medical Center, Atlanta, GA 37203, USA, 10. School of Graduate Studies, Meharry Medical College, Nashville, TN 37208, USA, 11. VA Tennessee Valley Healthcare System (626), Nashville, TN 37203, USA |
Apparent treatment-resistant hypertension (aTRH) is a clinically challenging condition with heterogeneous etiologies. Understanding the biological pathways underlying resistance to antihypertensive treatment could inform targeted therapeutic strategies. To evaluate how methodological choices in SNP selection influence biological inference, we applied two approaches to select aTRH-associated variants for clustering: PRS-CSx and pruning and thresholding (P&T). Using k-means clustering, we grouped aTRH-associated variants based on their association profiles across 91 cardiometabolic-related phenotypes. We then performed pathway and tissue enrichment analyses to evaluate the biological processes represented by each cluster. Both methods identified multiple genetic clusters, but the distribution of variants and biological signals differed. Clustering based on PRS-CSx produced unequally distributed clusters of SNPs and yielded limited tissue enrichment, while P&T-based clustering captured more uniform trends across cardiometabolic traits and broader tissue and pathway enrichment. These results demonstrate that methodological choices in SNP selection influence downstream clustering and biological interpretation. Despite some overlap in identified pathways and tissue enrichment, each approach identified unique biological signals, highlighting the potential of pairing polygenic methods and k-means clustering to elucidate the biological heterogeneity of aTRH and guide future mechanistic studies. | https://doi.org/10.7490/f1000research.1120311.1 | |
| 22 | Keita | Tamura | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster presentation | Tamura | Patch-level phenotype identification via weakly supervised neuron selection in sparse autoencoders for CLIP-derived pathology embeddings | Keita Tamura, Yao-zhong Zhang, Yohei Okubo, Seiya Imoto | School
of Medicine, Hiroshima University, Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo |
Computer-aided analysis of whole slide images (WSIs) has advanced rapidly with the emergence of multi-modal pathology foundation models. In this study, we propose a weakly supervised neuron selection approach to extract disentangled representations from CLIP-derived pathology foundation models, leveraging the interpretability of sparse autoencoders. Specifically, neurons are ordered and selected using whole-slide level labels within a multiple instance learning (MIL) framework. We investigate the impact of different pre-trained image embeddings derived from general and pathology images and demonstrate that a selected single neuron can effectively enable patch-level phenotype identification. Experiments on the Camelyon16 and PANDA datasets demonstrate both the effectiveness and explainability of the proposed method, as well as its generalization ability for tumor patch identification. | https://doi.org/10.7490/f1000research.1120390.1 | |
| 23 | Ananya | Rajagopalan | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster presentation | Rajagopalan | DRIVE-KG: Enhancing variant-phenotype association discovery in understudied complex diseases using heterogeneous knowledge graphs | Ananya Rajagopalan, Tram Anh Nguyen, Lindsay A. Guare, Andre Luis Garao Rico, Rasika Venkatesh, Lannawill Caruth, Regeneron Genetics Center, Penn Medicine BioBank, Anurag Verma, Marylyn D. Ritchie, Molly A. Hall, Joseph D. Romano, Shefali Setia-Verma | University
of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, Regeneron Genetics Center LLC, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA, University of Pennsylvania, Philadelphia, PA, USA |
Multi-omics
data are instrumental in obtaining a comprehensive picture of complex
biological systems. This is particularly useful for women’s health conditions
such as endometriosis, which has been historically understudied despite
having a high prevalence (around 10% of women of reproductive age).
Consequently, endometriosis has limited genetic characterization: current
genome-wide association studies explain only 11% of its 47% total estimated
heritability, underscoring the need for integrative approaches. Graph
representations provide an intuitive way to harmonize biological data, using
nodes to represent biological concepts and edges to represent their
relationships. We present DRIVE-KG (Disease Risk Inference and Variant
Exploration Knowledge Graph), which uses a heterogeneous graph representation
to integrate data from diverse multi-omics datasets. We trained two models
using DRIVE-KG: a link prediction model to suggest associations between SNPs
and two pilot phenotypes (endometriosis and obesity), and a graph
convolutional network (GCN) for patient-level classification of
endometriosis/adenomyosis. We conducted patient-level classification using
data from 1,441 Penn Medicine BioBank participants with gold standard
chart-reviewed endometriosis/adenomyosis status. The link prediction model
uncovered 66 high-confidence candidate SNP-endometriosis associations,
representing largely distinct genetic signals (R² < 0.1). These variants
were enriched for obesity/body mass index traits (24.2%), lipid metabolism
(6%), and depressive disorders (4.5%), aligning with emerging hypotheses
about endometriosis etiology. In contrast, 38.22% of the high-confidence
candidate SNP–obesity associations were in high linkage disequilibrium (R² ≥
0.8) with known obesity or comorbidity associations. The GCN to classify
patient endometriosis/adenomyosis status had an F1 score of 0.752 compared to
0.698 for a genetic risk score, and learned meaningful stratification of
underlying adenomyosis signal and severe endometriosis grades. Together,
these results demonstrate that heterogeneous integration of multi-omics data
is valuable for diverse downstream tasks, particularly for understudied
diseases where traditional genomic approaches are insufficient. |
https://doi.org/10.1101/2025.08.19.25333942 | |
| Presenting Poster Author Name | Last | Session/Workshop Area | Abstract Type | Last Name of First Author | Abstract Title | List all authors (first name first with names separated by commas and proper capitalization) in the order they appear on the abstract. Please do NOT list affiliations or addresses in this field. | Author affiliations (in order of the list of authors). Please separate affiliations with commas. | Abstract (300 words or less) | Poster DOI or URL (if you are uploading a PDF, type "N/A" in this field) | Upload a PDF of your poster. | |
| 24 | Hee Young | Cho | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Cho | Daily self-administered transcranial direct current stimulation (tDCS) and objective behavioral changes in perinatal women: A pilot observational study | Hee Young Cho1, Bo Seong Yun2, Sehwan Park3 | 1Department
of Obstetrics and Gynecology, Seoul National University College of Medicine,
Seoul, Republic of Korea 2Department of Obstetrics and Gynecology, CHA Ilsan Medical Center, CHA University Goyang-si, Republic of Korea 3Medical Research Team, Digital Medic co., Ltd., Seoul, Republic of Korea |
Objective: This pilot observational study examined whether a four-week course of daily self-administered transcranial direct current stimulation (tDCS) was associated with changes in depressive symptoms and objective behavioral activity in perinatal women. Materials and Methods: Thirty-six perinatal participants (mean age = 34.8 ± 4.3 years, all female) independently administered 20-minute sessions of anodal tDCS targeting the left dorsolateral prefrontal cortex each day for four consecutive weeks. Participants continuously wore wearable devices (Fitbit Inspire 2 wristbands), which passively collected minute-level data on step count, walking distance, calories, and heart rate. Daily metrics were calculated by summing (steps, distance, calories) or averaging (heart rate) the 1-minute data. Missing values were not imputed. Depressive symptoms were measured at baseline and week 4 using the Montgomery–Åsberg Depression Rating Scale (MADRS). Statistical analysis included repeated-measures nonparametric tests and linear mixed-effects models to evaluate both overall change and time-dependent patterns while accounting for within-subject variability and missingness. Results: Participants demonstrated a significant reduction in MADRS scores, with mean scores decreasing from 17.5 (SD = 8.1) at baseline to 12.3 (SD = 7.7) at week 4 (p < .001, d = 0.73). Behavioral indicators also showed improvement: step count (p < .001, d = 0.90), walking distance (p < .001, d = 0.89), and calories burned (p < .001, d = 0.86) all increased significantly, while resting heart rate decreased (p < .01, d = 0.56). Conclusion: These findings suggest the potential utility of combining daily self-administered tDCS with passive behavioral monitoring via wearable devices in perinatal populations.
While limited by its single-arm design, relatively small sample size, and lack of imputation for missing data, this study provides preliminary evidence supporting the feasibility of scalable digital neuromodulation approaches. Future randomized trials with larger samples are warranted to confirm efficacy and clinical relevance. |
N/A | 2026psb1127.pdf |
| 25 | Julie | Lynch | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Cogill | A Novel Machine Learning Model to Predict Elevated Lipoprotein(a) in US Veterans | Steven
Cogill, PhD1,2,3; Shriram Nallamshetty, MD1,2,4; Kent Heberer, PhD1,2,3; Mei-Chung Shih, PhD1,2,3,5; Ana Maldanado,
PhD1,2,3; Ying Q. Chen, PhD1,2,3,6; Adam Bress, PhD2,7,8; Julie Lynch,
PhD2,7,9; Jennifer S. Lee, MD, PhD, MBA1,2 |
1. VA Palo Alto Healthcare System, Palo Alto, CA, 2. Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative, 3. VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA, 4. Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA, 5. Dept. of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, 6. Stanford Prevention Research Center, Stanford, CA, 7. VA Salt Lake City Healthcare System, Salt Lake City, UT, 8. Dept. of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, Utah, 9. Division of Epidemiology, University of Utah School of Medicine, Salt Lake City, UT |
Background:
Elevated lipoprotein(a) [Lp(a)] is a prevalent, genetically determined
cardiovascular (CV) risk factor that can double the risk of myocardial
infarction (MI) and stroke. Elevated Lp(a) is present in 20% of the
population globally. Despite the broad availability of an accurate and
relatively inexpensive serological test, elevated Lp(a) remains
underdiagnosed; currently, less than 1% of the general population has been
tested for elevated Lp(a). Objective: Develop an automated machine learning (ML) model to predict the presence of elevated Lp(a). Methods: A cohort of 44,992 Veterans with Lp(a) assessments was built from a database of approximately 12.3 million Veterans in the Corporate Data Warehouse (CDW), a data repository for all patients receiving care through Veterans Healthcare Administration (VHA) facilities. Elevated Lp(a) is defined as a serum Lp(a) of > 125 nmol/L, per clinical guidelines. An ML tool that predicts elevated Lp(a) was built using a generalized linear modeling approach that employed a total of 81 clinical variables. The cohort was divided into training and testing datasets in a ratio of 80:20. Results: The median age in the Veteran cohort was 59 years. The majority were men (89.0%) and White (64.9%). Approximately 22% of Veterans had elevated Lp(a). The predictive ML model had a C-statistic of 0.684 (95% CI [0.672, 0.697]), sensitivity of 0.647 (95% CI [0.628, 0.666]), and specificity of 0.631 (95% CI [0.620, 0.642]). Based on the model’s discriminatory capacity, we project a 64.8% decrease in the number of VHA patients needed to test to identify those with elevated Lp(a), compared to a random screening approach. Conclusion and Relevance: An ML model that accurately predicts elevated Lp(a) holds the potential to address barriers to broad Lp(a) screening by identifying individuals who should be prioritized for testing. |
N/A | lpa_prediction_model_psb26_poster.pdf |
| 26 | Tomoko | Ishibashi | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Ishibashi | Developing a Machine Learning Model to Predict Fresh Frozen Plasma Requirements during Stanford Type A Acute Aortic Dissection Surgery Using Early ROTEM FIBTEM Data | Tomoko Ishibashi, Kota Ishibashi, Takeshi Fukatsu | Tokyo Bay Urayasu Ichikawa Medical Center, Kojimachi Junior Highschool, Tokyo Bay Urayasu Ichikawa Medical Center | Background:
In emergency surgery for Stanford Type A Acute Aortic Dissection (TAAAD),
severe hypofibrinogenemia frequently develops, yet fibrinogen concentrate and
cryoprecipitate—commonly recommended elsewhere—are not available in Japan.
Fresh frozen plasma (FFP) is therefore the only option for fibrinogen
replacement and often requires large transfusion volumes. Because the
cardiopulmonary bypass (CPB) circuit incorporates extracorporeal
ultrafiltration (ECUM), excess volume can be removed intraoperatively, making
it preferable to administer FFP during CPB. To support this strategy, we
sought to predict the FFP volume required to achieve a FIBTEM A5 of
approximately 9 mm at protamine administration. Methods: A three-phase strategy was used. Phase 1: We generated 1,000 synthetic cases using published distributions and institutional averages for body size, aortic morphology, CPB parameters, and early FIBTEM A5. Six features were included: body weight, early FIBTEM A5, predicted CPB time, dissection extent, false lumen status, and true lumen size. Target FFP dose was computed to reach FIBTEM A5 = 9 mm. Three-fold cross-validation was performed and the best hyperparameters were fixed. Phase 2: Twenty-eight clinical TAAAD cases were used to refine the synthetic-generation rules without providing clinical outcomes to the learning algorithm. The calibrated rules were applied to regenerate the synthetic dataset, the clinical cases were up-weighted and merged, and the combined dataset underwent 3-fold cross-validation for retraining and internal validation. Phase 3: External validation will be performed once additional clinical cases become available. Results: Phase 2 internal validation showed excellent performance (severe under-transfusion 0.15%, MAE 0.84 units), with almost no severe over-transfusion. Conclusion: This model provides the first method to estimate FFP requirements in emergency Stanford Type A aortic dissection surgery using only FIBTEM A5 and clinically accessible perioperative variables. It may support transfusion planning in settings where the supply of blood products is limited. |
https://f1000research.com/posters/14-1310 | |
| 27 | Seongho | Jang | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Jang | Deep Convolutional Neural Network Analysis of Biomechanical Gait Improvements Following Ankle-Foot Orthosis use in Stroke Patients | Seongho
Jang, Sibog Park, Shi-Uk Lee, Yeo Joon Yun |
Department of Physical Medicine and Rehabilitation Hanyang University College of Medicine, Department of Physical Medicine and Rehabilitation Hanyang University College of Medicine, Department of Rehabilitation Seoul Metropolitan Government Boramae Medical Center, Department of Physical Medicine and Rehabilitation Hanyang University College of Medicine |
Advanced computational approaches, such as deep convolutional neural networks (DCNN), provide new opportunities for objectively classifying and interpreting complex biomechanical gait improvements following Ankle-Foot Orthosis (AFO) use in stroke rehabilitation. This study aimed to evaluate the efficacy of DCNN models in distinguishing affected versus control gait patterns and identifying subtle biomechanical improvements after AFO use, utilizing Gradient-weighted Class Activation Mapping (Grad-CAM) for interpretability. | https://f1000research.com/posters/14-1144 | p_deep_convolutional_neural_network_analysis_of_biomechanical_gait.pdf |
| 28 | Shreya | Johri | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Accepted proceedings paper with oral presentation | Johri | A Clinician-Guided Framework for Endoscopic AI: Developing PanEndoAtlas and Benchmarking Foundation Models Across the Full GI Spectrum | Shreya
Johri, Luyang Luo, Hong-Yu Zhou, Todd Brenner, Sami Elamin, Mark Enrik
Geissler, Tyler M. Berzin, Pranav Rajpurkar |
Department of Biomedical Informatics, Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School; Department of Biomedical Informatics, Harvard Medical School | Endoscopic procedures play a central role in the diagnosis and management of gastrointestinal (GI) diseases, yet the field lacks large‐scale, clinically diverse benchmarks and unified datasets to evaluate vision foundation models. We introduce PanEndoSuite, the first unified ecosystem for endoscopic AI, developed through systematic collaboration between AI researchers and practicing gastroenterologists. PanEndoSuite consists of three complementary components: PanEndoAtlas, PanEndoX, and PanEndoFM. PanEndoAtlas is a harmonized dataset of over 420,000 labeled images from 30 public endoscopy datasets across 13 countries and 26 hospitals, creating a clinically-grounded hierarchical taxonomy that mirrors diagnostic reasoning patterns across 111 GI diseases. PanEndoX is a benchmark of 10 clinically grounded tasks, including hierarchical GI-tree classification, Barrett’s esophagus grading, ulcerative colitis scoring, polyp subtyping, Boston Bowel Preparation Scale assessment, multi-organ disease classification, and anatomical landmark identification—designed to probe generalization across anatomical regions, disease presentations, and annotation granularities. PanEndoFM is a foundation model pretrained on a 10 million–image corpus curated from public data sources, spanning the entire GI tract. 
We benchmark PanEndoFM against two endoscopy-specific foundation models (EndoFM-LV, EndoSSL) and two general-purpose vision models (ViT-B/16, ResNet-50). PanEndoFM achieves the highest macro-AUC on 6 of 10 tasks, demonstrating broad clinical generalization; EndoFM-LV performs best on colon-focused tasks, EndoSSL excels in polyp subtyping, and ViT-B/16 shows strengths on small-intestine conditions. Together, PanEndoSuite establishes a foundation for building robust, generalist AI systems in gastrointestinal endoscopy that bridge current AI capabilities and clinical practice. | https://doi.org/10.7490/f1000research.1120379.1 | |
| 29 | Junghwa | Hong | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Kim | Hydrodynamic Analysis of the Cerebral Perivascular Space Using a Hybrid PINN–CFD Method | Jaemin Kim¹, Soonmoon Jung¹, Youngho Lee¹, Hyeyeong Song¹, Jiwoo Jang¹, Inyeop Na¹, Seungyun Oh¹, Joo Hyun Kim², Junghwa Hong¹ | ¹Department
of Control and Instrumentation Engineering, Korea University, Sejong,
Republic of Korea, ²Department of Mechanical and Aerospace Engineering, New
York University, New York, NY, USA |
The clearance of protein aggregates such as amyloid-β and tau, key pathological drivers of Alzheimer’s disease, occurs through the glymphatic system, including perivascular spaces (PVSs). However, the microscopic scale and deep penetration of PVSs from the subarachnoid space (SAS) into the brain parenchyma limit in vivo quantification of fluid-dynamic variables. Therefore, the primary driving mechanism of PVS flow—whether dominated by arterial pulsations or static pressure gradients—remains controversial, as distinguishing these factors via direct observation is challenging. In this study, deep PVS flow characteristics were estimated via physics-informed neural networks (PINNs) integrated with a 3D computational fluid dynamics (CFD) model of the SAS-PVS-parenchyma structure simulating tracer diffusion, leveraging limited data from the observable penetrating arterial PVS segment near the SAS. Cross-sectional average longitudinal concentration data from the superficial PVS were used to train the PINN, which inversely estimates velocity (u_est) and diffusion coefficient (D_est) while satisfying the advection–diffusion equation. The CFD results obtained using the estimated parameters were validated against prior literature on arterial pulsation-only and combined mechanism models. PINN-CFD analysis revealed that under arterial pulsation alone, net flow was negligible (0.46 ± 0.17 μm/s), consistent with simulations by Daversin-Catty et al. (< 0.5 μm/s). In contrast, introducing a small static pressure gradient markedly increased net flow velocity to 19.6 ± 5.8 μm/s, in close agreement with in vivo measurements by Mestre et al. (18.7 μm/s) and combined mechanism predictions by Daversin-Catty et al. (20–30 μm/s). This study presents the hybrid PINN-CFD method for quantifying deep-cerebral PVS fluid velocity from limited datasets. 
Consequently, this PINN-CFD approach serves as a potent tool for estimating parameters in various biofluidic systems within observationally inaccessible regions. | N/A | psb_2026_poster.pdf |
| 30 | Seung Mi | Lee | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Poster only | Kim | Machine learning-based prediction model for fetal acidemia at cesarean deliveries using preoperative and intraoperative variables | Sun Min Kim, Seoyoung Moon, Byoung Jae Kim, Ji Hoi Kim, So Hee Kim, Chan-Wook Par, Joong Shin Park, Seung-Bo Lee, Seung Mi Lee | Department
of Obstetrics & Gynecology, Seoul National University College of
Medicine, Seoul, Korea, Department of Obstetrics & Gynecology, Seoul
Metropolitan Government-Seoul National University Boramae Medical Center,
Seoul, Korea, Department of Medical Informatics, Keimyung University School
of Medicine, Daegu, Korea, Department of Obstetrics and Gynecology, Seoul
National University Hospital, Seoul, Korea, Healthcare AI Research Institute,
Seoul National University Hospital, Seoul, Korea, Medical Big Data Research
Center & Institute of Reproductive Medicine and Population, Medical
Research Center, Seoul National University, Seoul, Korea, Interdisciplinary
Program in Artificial Intelligence, Seoul National University, Seoul, Korea |
Objective:
To develop and validate a machine learning-based prediction model for fetal
acidemia at cesarean deliveries using preoperative clinical data and
intraoperative hemodynamic monitoring variables. Study design: We retrospectively analyzed data of 1,319 patients with vital signs during Cesarean delivery at Seoul National University Hospital between 2016 and 2023. Fetal acidemia was defined as umbilical arterial pH < 7.2, and occurred in 27 (2.0%) cases. Preoperative variables were extracted from electronic medical records and included maternal demographics, comorbidities, and laboratory findings. Intraoperative data including maternal blood pressure, heart rate, and photoplethysmogram (PPG) were obtained from high-resolution vital sign recordings. After preprocessing, machine learning models (LightGBM, XGBoost, CatBoost) were trained and validated using 5-fold cross-validation. Class imbalance was addressed using SMOTE to improve model performance. Feature selection was performed using sequential forward floating selection (SFFS), and SHapley Additive exPlanations (SHAP) was used for interpretability. Results: The final CatBoost model using selected features achieved the highest predictive performance (AUROC 0.913). Key features associated with fetal acidemia included lower gestational age, elevated systolic blood pressure (SBP), reduced minimum PPG values, and maternal comorbidities. Combining preoperative and intraoperative features significantly improved model performance compared to using either alone. SHAP analysis identified SBP mean, PPG minimum, and gestational age as the most influential features. Conclusion: This study demonstrates the feasibility of predicting fetal acidemia at cesarean delivery using machine learning applied to intraoperative hemodynamic and preoperative clinical data. The findings show the importance of maintaining maternal hemodynamic stability during cesarean delivery to prevent adverse neonatal acid-base outcomes. |
N/A | abstract_psbsubmission.pdf |
| 31 | Sraavya | Sambara | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Accepted proceedings paper with oral presentation | Sambara | 3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models | Sraavya
Sambara, Sung Eun Kim, Xiaoman Zhang, Luyang Luo, Shreya Johri, Mohammed
Baharoon, Du Hyun Ro, Pranav Rajpurkar |
Harvard
University, Seoul National University Hospital |
Current Vision-Language Models (VLMs) struggle to ground anatomical regions in 3D medical images and reason about them in a step-by-step manner, a key requirement of real-world diagnostic assessment. This ability is essential for aligning model outputs with the diagnostic workflows clinicians use in practice, enabling trustworthy clinician-AI collaboration. Existing 3D datasets provide localization labels, but none support this “grounded reasoning” ability. To address this gap, we introduce 3DReasonKnee, the first 3D grounded reasoning dataset for medical images, which provides 494k high-quality quintuples derived from 7,970 3D knee MRI volumes. Each quintuple includes: (1) the 3D MRI volume, (2) a diagnostic question targeting a specific anatomical region, (3) a 3D bounding box localizing the relevant anatomical structures, (4) clinician-generated diagnostic reasoning steps that explicitly detail the 3D reasoning process, and (5) structured severity assessments for the relevant anatomical region. The meticulous creation and validation of 3DReasonKnee, involving over 450 hours of expert clinician time for manually segmenting MRIs and generating reasoning chains, ensures its superior quality and clinical relevance. We establish ReasonKnee-Bench to evaluate localization and diagnostic accuracy, providing novel insight into VLM ability to perform grounding and severity assessment across diverse anatomical regions and diagnostic inquiries. We benchmark five state-of-the-art VLMs, providing baseline performance for ReasonKnee-Bench. By providing this unique resource of expert-annotated 3D reasoning pathways, 3DReasonKnee serves as a repository of orthopedic surgeons’ diagnostic expertise and offers a vital testbed for advancing multimodal medical AI systems towards 3D, clinically aligned, localized decision-making capabilities. The dataset can be found on HuggingFace: rajpurkarlab/3DReasonKnee. |
https://doi.org/10.7490/f1000research.1120304.1 | |
| 32 | Eric | Strobl | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Accepted proceedings paper with oral presentation | Strobl | Learning Causally Predictable Outcomes from Psychiatric Longitudinal Data | Eric V. Strobl | University of Pittsburgh | Causal inference in longitudinal biomedical data remains a central challenge, especially in psychiatry, where symptom heterogeneity and latent confounding frequently undermine classical estimators. Most existing methods for treatment effect estimation presuppose a fixed outcome variable and address confounding through observed covariate adjustment. However, the assumption of unconfoundedness may not hold for a fixed outcome in practice. To address this foundational limitation, we directly optimize the outcome definition to maximize causal identifiability. Our DEBIAS (Durable Effects with Backdoor-Invariant Aggregated Symptoms) algorithm learns non-negative, clinically interpretable weights for outcome aggregation, maximizing durable treatment effects and empirically minimizing both observed and latent confounding by leveraging the time-limited direct effects of prior treatments in psychiatric longitudinal data. The algorithm also furnishes an empirically verifiable test for outcome unconfoundedness. DEBIAS consistently outperforms state-of-the-art methods in recovering causal effects for clinically interpretable composite outcomes across comprehensive experiments in depression and schizophrenia. R code is available at github.com/ericstrobl/DEBIAS. | https://doi.org/10.7490/f1000research.1120300.1 | |
| 33 | Andrew | Zolensky | AI and Machine Learning in Clinical Medicine: Bridging or Separating Model Intelligence and Human Expertise | Accepted proceedings paper with oral presentation | Zolensky | Speaker Role Identification in Clinical Conversations | Andrew Zolensky, Kuk Jin Jang, Janice Sabin, Andrea Hartzler, Basam Alasaly, Sriharsha Mopidevi, Mark Liberman, Kevin Johnson | University of Pennsylvania, Hongik University, University of Washington, University of Washington, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania, University of Pennsylvania | Patient-clinician communication research is crucial for understanding interaction dynamics and for predicting outcomes that are associated with clinical discourse. Traditionally, interaction analysis is conducted manually because of challenges such as Speaker Role Identification (SRI), which must reliably differentiate between doctors, medical assistants, patients, and other caregivers in the same room. Although automatic speech recognition with diarization can efficiently create a transcript with separate labels for each speaker, these systems are not able to assign roles to each person in the interaction. Previous SRI studies in task-oriented scenarios have directly predicted roles using linguistic features, bypassing diarization. However, to our knowledge nobody has investigated SRI in clinical settings. We explored whether Large Language Models (LLMs) such as BERT could accurately identify speaker roles in clinical transcripts, with and without diarization. We used veridical turn segmentation and diarization identifiers, fine-tuning each model at varying levels of identifier corruption to assess impact on performance. Our results demonstrate that BERT achieves high performance with linguistic signals alone (82% accuracy/82% F1-score), while incorporating accurate diarization identifiers further enhances accuracy (95%/95%). 
We conclude that fine-tuned LLMs are effective tools for SRI in clinical settings. | https://doi.org/10.7490/f1000research.1120293.1 | |
| 34 | In-Jung | Kim | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster only | Batkhuu | Genome-wide analysis of genetic variation in a new red-peel citrus R2 mutant line | Enkhtuya Batkhuu, MinSeop Kim, Kyuchang Chang, In-Jung Kim | Department
of Biomaterials Science and Technology, Faculty of Biotechnology, Department
of Artificial Intelligence, Department of Biomaterials Science and Technology
& Faculty of Biotechnology & Subtropical Horticulture Research
Institute & Bio-Resources Computing Research Center & Research
Institute for Subtropical Agriculture and Biotechnology |
‘Miyagawa-wase’ mandarin (Citrus unshiu Marc. cv. Miyagawa-wase, Miyagawa-wase early) is one
of the most widely cultivated varieties in Korea. Mutation breeding is a useful
tool for inducing genetic diversity and creating new variation in a
short time. Previously, we conducted mutation breeding with
gamma irradiation to develop new citrus varieties. Among the new citrus
mutants, we found the R2 mutant line, which has a unique peel color and fruit shape, high
sugar content, and a hard peel. In this study, gamma irradiation-induced
mutation breeding was applied to generate genetic variation from
Miyagawa-wase early (wild type, WT). A promising mutant line, designated R2,
was identified, exhibiting a red color, firm peel, greater fruit weight, and
elevated sugar content compared with WT. The genome sequence data of WT fruit
used in this study were from our previously published sources (NCBI:
PRJNA745525). The Miyagawa-wase early genome (CUMW-v1.0), with a length of 359.7 Mb,
was used as the reference. After pre-processing the raw sequence data from the
R2 line and the WT control, we obtained 89,593,396 and 80,693,250 total trimmed
reads, respectively. The mapped region rates were 84.29% and 86.75% for R2
and WT plants, respectively. Future studies should focus on identifying
genetic variation linked to key horticultural traits such as peel
pigmentation, texture, fruit size, and sweetness. Funding: This research was funded by the Basic Science Research Program through the National Research Foundation of Korea [grant numbers: 2025-RISE-17-001] and by the Basic Science Research Program through the National Research Foundation of Korea [grant numbers: 2017R1D1A1B06034883]. |
https://f1000research.com/posters/14-1335 | r2_abstract_for_psb20262.pdf |
| 35 | SUJIN | KIM | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster only | Byun | Genome-wide polymorphism and Transcriptome analysis of the citrus mutant ‘Yein-early’ | 1.
Ye-Nan Byun 2. Su-jin Kim 3. Jong-Eun Park 4. In-Jung Kim |
1.
Ye-Nan Byun, Department of Biomaterials Science and Technology, Graduate
School, Jeju National University, Jeju 63243, Korea 2. Su-Jin Kim, Department of Horticulture and Environment, Faculty of Biological Industry, Jeju National University, Jeju 63243, Korea 3. Jong-Eun Park, Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju 63243, Korea *4. In-Jung Kim, Bio‐Resources Computing Research Center, Research Institute for Subtropical Agriculture and Biotechnology, SARI, Jeju National University, Jeju 63243, Korea |
Yein-early is a gamma-irradiated mutant of Citrus unshiu that exhibits two stable traits distinct from the wild type (WT): a deeper red peel color and narrow, curled leaves. Fruit quality analysis showed that Yein-early has higher firmness and sugar content, while fruit size, acidity, and peel thickness were comparable to the WT. To uncover the genetic basis of these traits, we conducted whole-genome sequencing. The WT sequence data were sourced from our previous work (NCBI: PRJNA745525). Reads were mapped to the C. unshiu Marc. Miyagawa-wase reference genome (CUMW_v1.0; 359.7 Mb), producing 55,897 Mbp and 56,418 Mbp of clean reads for Yein-early and WT, with mapping rates of 84.8% and 86.9%, respectively. Comparative genomic analysis revealed extensive polymorphisms in Yein-early: 650,257 SNPs and 105,817 InDels, including 72,155 homozygous SNPs and 4,683 homozygous InDels. Most variants occurred in gene regions (26,339 genes for SNPs; 14,889 for InDels), indicating broad genomic perturbation. SNP polymorphisms were markedly more abundant than InDels, and Yein-early displayed a large number of fixed mutations, suggesting radiation-induced genomic restructuring. Transcriptome profiling of immature fruit identified differentially expressed genes. Up-regulated genes were associated with metabolism, transport, and photosynthetic processes, while genes linked to ribosomal function and redox balance were down-regulated. These multi-omic data connect phenotype, genomic variation, and transcriptional regulation, providing insight into the mechanisms underlying the peel pigmentation and altered leaf morphology of Yein-early. | https://f1000research.com/posters/14-1336 | yein_2026_psb_abstract2.pdf |
| 36 | Jong-Eun | Park | Biological molecular function: methods and benchmarks for finding function in biological dark matter | Poster only | Liyanage | A Systems-Level Analysis of the Temporal Transcriptomic Response to Sustained Heat Stress in Swine | D.S. Liyanage, Md Mortuza Hossain, Yeonhee Park, Yujin Ko, Sanghoon Lee, Jong-Eun Park | Department of Animal Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju Self-Governing Province, 63243, Republic of Korea | Sustained
heat stress poses major welfare and economic challenges to global swine
production, yet the systems-level mechanisms driving the transition from
acute to chronic stress remain poorly understood. In this study, we generated
a comprehensive temporal map of the regulatory response to 14 days of
continuous heat stress in pigs. Whole-blood RNA-seq collected at Day 0, Day
7, and Day 14 was analyzed through an integrative framework combining
differential mRNA expression, allele-specific expression (ASE), long
non-coding RNA (lncRNA) analysis, and weighted gene co-expression network
analysis (WGCNA). The transcriptome showed pronounced temporal reprogramming. By Day 7, pigs showed a response centered on cellular maintenance and damage mitigation, including protein folding and DNA repair. By Day 14, this profile shifted to a chronic stress state dominated by immune activation, particularly Toll-like receptor and inflammatory signaling pathways. ASE revealed that this shift was driven overwhelmingly by trans-acting factors at both time points (74% at Day 7; 78% at Day 14), indicating a coordinated systemic response rather than local cis-regulatory changes. WGCNA identified a key co-expression module (ME13) consistently activated during heat exposure, functionally linking innate immune responses with cellular energy metabolism. Notably, genes under combined cis+trans regulation during the transition to chronic stress were 4.5 times more likely to reside within QTLs associated with economically important health and production traits. Together, this multi-layer systems analysis reveals a fundamental reprogramming of immune and metabolic networks during chronic heat exposure. The regulatory hubs identified here provide strong candidates for targeted mitigation strategies and genetic selection programs aimed at improving livestock resilience to a warming climate. |
N/A | psb_ds_liyanage_je_park_poster_vf.pdf |
| 37 | Peter | Hoover | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Poster only | Hoover | Evaluating Fairness in Machine Learning-Based Fall Prediction Across Veteran Demographic Groups | Peter J. Hoover(1); Terri L. Blumke(1); Anna D. Ware(1); David M. Arreola (1); & Jennifer S. Lee(1,2) | 1National
Center for Collaborative Healthcare Innovation, VA Palo Alto Healthcare
System, Palo Alto, CA, USA 2Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA |
Objective:
To evaluate demographic differences in fall risk and assess fairness in a
newly developed fall risk prediction model among inpatients within the
Veterans Health Administration. Materials and Methods: This national assessment included Veterans admitted to a Veterans Health Administration acute care setting from July 1, 2020, to June 30, 2022, with a length of stay between 1 and 7 days. Demographic and clinical data were obtained through electronic health records. A classification model was developed utilizing a ROCKET transformer along with a Light Gradient-Boosting Machine. Model performance was evaluated overall and across demographic subgroups defined by age, sex, race, and ethnicity. Accuracy and Brier scores were used to quantify subgroup performance and assess potential fairness and bias concerns. Results: Among 242,844 Veterans assessed, 5,965 (2.5%) experienced a fall during their clinical stay. Veterans with a documented fall were generally older, more likely to be male, and predominantly White compared to those without a documented fall. The model achieved an overall accuracy of 76.3%, specificity of 76.2%, and sensitivity of 77.3%. Subgroup performance analysis indicated age-related disparities, revealing a decrease in model accuracy as patient age increased. This trend was further supported by rising Brier scores with increasing age, suggesting reduced performance in older groups. No statistically significant differences in model performance were observed across sex, race, or ethnicity. Discussion: While the model improved upon previously utilized fall-risk tools, it showed age-related performance disparities, suggesting that older Veterans’ EHR patterns may be less consistently represented or may reflect unmodeled clinical differences. Future efforts should include prospective subgroup-specific validation to ensure fair and accurate outcomes for diverse VHA populations. Conclusion: This model improved overall fall-risk prediction but highlights important fairness concerns, particularly reduced performance in older Veterans. |
N/A | fall_bias_psb_poster.pdf |
| 38 | Younghee | Lee | Fairness and Bias in Biomedical AI/ML: Defining Goals and Putting Them Into Practice | Poster only | Lee | AIChatVet: Implementation and validation of a multi-agent LLM architecture for veterinary clinical decision assistance | Hyeeongjin
Ju, Byungwook Oh, Minkyung Choi, Wongyung Choi, Chansik Kim, Taehoon Ko, Arok Choi, and Younghee Lee* |
College
of Veterinary Medicine Seoul National University Seoul Korea, Department of Medical Informatics College of Medicine The Catholic University of Korea Seoul Korea |
While Large Language Models (LLMs) have become key Artificial Intelligence (AI) models in a broad range of biomedical informatics research, including medical imaging and EMR data for humans, their application to animal data in veterinary medicine is still at an early stage. In particular, veterinary data have been intensively accumulated in pet hospitals (i.e., EMR charts) and used for clinical decision support development and triage prediction with traditional AI methods (i.e., machine learning or deep learning), but not yet with LLMs. Therefore, we conducted a pilot study called “AIChatVet”, which demonstrates the ability of LLMs to assist a veterinarian in a veterinary consultation for a pet. We used veterinary consultation records (n = 280) composed of seven major symptom categories. Two systems were evaluated: a single-agent system and a multi-agent system. The multi-agent system has four agents: Triage Specialist, Veterinary Diagnostician, AI Veterinary Specialist, and Pet Health Educator. The evaluation rubric followed eleven clinical criteria derived from the PACES (Practical Assessment of Clinical Examination Skills) scale. Our LLM architectures employed GPT-5 and GPT-5-mini as state-of-the-art models and also included a Retrieval-Augmented Generation (RAG) approach with a SNOMED CT knowledge graph for clinical reasoning. LLM-generated responses and veterinarians' original answers were evaluated by an automated pipeline with GPT-5 as the evaluator, based on that rubric. The GPT-5-based multi-agent system showed the best performance across all evaluation criteria and produced responses comparable to those of practicing veterinarians. Although the multi-agent system outperformed the single-agent system overall, the single-agent system was slightly better in differential diagnosis. In conclusion, this pilot study demonstrated the feasibility of using LLMs to assist veterinarians in clinical decisions. We will also discuss the benefits of the SNOMED CT knowledge graph in the RAG architecture and the practical functionality of the multi-agent specialists in this system, and propose further studies. |
N/A | |
| 39 | Sanghoon | Lee | General | Poster only | Lee | High wind conditions modulate gut microbial composition in broilers under high temperature stress | Md Mortuza Hossain, Yeonhee Park, Yujin Ko, In-Jung Kim, Jong-Eun Park, Sanghoon Lee | Faculty of Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju, Republic of Korea | High
ambient temperatures impose considerable stress on poultry, affecting their
physiology, metabolism, and performance. The intestinal microbiome serves as
a crucial mediator of these stress responses, influencing host resilience and
adaptation. This study used 16S rRNA gene sequencing to assess how wind speed
modulates the gut microbiota under heat stress conditions. 25-day-old ROSS 308 broilers were exposed for 14 days to three treatments differing in wind speed: LWH (33°C, 60% RH, 0 m/s), MWH (33°C, 60% RH, 1 m/s), and HWH (33°C, 60% RH, 2 m/s). The V3–V4 region of the 16S rRNA gene was amplified with primers 341F/806R and sequenced on an Illumina platform. Reads were processed in QIIME2 (v2024.10.1) using DADA2 for denoising and SILVA 138 for taxonomy assignment. Functional profiles were inferred with PICRUSt2, and differential taxa were identified using LEfSe (Kruskal–Wallis p < 0.05 and LDA ≥ 4.0). Alpha diversity (Shannon, Simpson, Chao1, ACE) and beta diversity (Bray–Curtis, weighted/unweighted UniFrac) were analyzed in R (qiime2R, vegan). Broilers under higher wind speeds exhibited greater body weight and higher richness (Chao1, ACE; p < 0.01) compared with those in low or medium wind groups. Distinct beta-diversity clustering was observed among treatments, and Venn analysis revealed both shared and unique ASVs. LEfSe identified Lactobacillus and Weissella as enriched under low/medium wind, whereas Faecalibacterium, Lachnospiraceae, and Ruminococcaceae were predominant under high wind. PICRUSt2 predicted enhanced carbohydrate fermentation, short-chain fatty acid production, and amino acid biosynthesis pathways in the HWH group. These findings indicate that increased airflow mitigates heat-stress effects by maintaining microbial diversity and metabolic function, thereby improving broiler performance under high temperatures. |
N/A | psb2026_poster_sanghoon_lee.pdf |
| 40 | Andres | Cardenas | General | Poster only | Cardenas | DNA Methylation Biomarkers of Smoking in Leukocytes: Development and Association with Cardiovascular Disease Risk | Andres Cardenas, Dennis Khodasevich, Aladdin H. Shadyab, Adam X. Maihofer, Caroline M. Nievergelt, Robert Wallace, Lisa W. Martin, Anne K. Bozack, Rosemarie de la Rosa, Marcia L. Stefanick and Nora Franceschini | 1.
Department of Epidemiology and Population Health, Stanford University School
of Medicine, Stanford, CA, USA 2. Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA 3. Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, La Jolla, CA, USA 4. Department of Psychiatry, School of Medicine, University of California San Diego, La Jolla, CA, USA 5. Department of Psychiatry, School of Medicine, University of California San Diego, La Jolla, CA, USA 6. Epidemiology and Internal Medicine, College of Public Health, University of Iowa, IA, USA 7. Division of Cardiology, Department of Medicine, School of Medicine and Health Sciences, George Washington University, DC, USA 8. Department of Population Health, Grossman School of Medicine, New York University, NY, USA 9. Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA, USA 10. Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA 11. Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA |
Background: Smoking is one of the strongest predictors of cardiovascular disease (CVD), with widespread DNA methylation (DNAm) changes in leukocytes consistently observed among smokers. However, evidence on the impact of smoking cessation and on biomarkers associated with smoking intensity and behavior remains limited. Methods: Leukocyte DNAm was measured in whole blood with the EPICv2 array, and smoking activity, intensity, and cessation were ascertained via self-report using standardized questionnaires in 6,005 women from the Women’s Health Initiative Memory Study (WHIMS). We performed epigenome-wide association studies in leukocytes of each smoking phenotype in multivariable regression models adjusted for demographic, health, and genetic variables. We developed epigenetic predictors of each smoking phenotype using elastic net regression and validated the predictors in two independent datasets. We used Cox proportional hazards models to assess the ability of DNAm and self-reported pack-years to predict CVD-related and all-cause mortality. Results: We identified 20,572 FDR-significant differentially methylated CpGs for current smoking, 2,199 for former smoking, 14,725 for smoking pack-years, and 744 for years since smoking cessation. The epigenetic predictors of pack-years and years since cessation were all highly correlated with their respective smoking phenotypes across the training, testing, and validation datasets (R range = 0.549–0.801). The WHI DNAm pack-years predictor was more strongly associated with CVD-related and all-cause mortality than self-reported pack-years. Conclusions: Smoking is strongly related to DNAm in women, and we report both robust validation of previous findings and novel associations. We further introduce epigenetic predictors of three-category smoking, pack-years, and years since cessation, which are highly predictive of their respective phenotypes, and of CVD-related and all-cause mortality. |
https://github.com/cardenasca/PSB_2026/blob/main/PSB%202026%20-%20Poster.pdf | psb_2026__poster.pdf |
| 41 | Kaho | Hitomi | General | Poster only | Hitomi | Gene Expression Prediction in Single Cells via Data-Driven Discrete Dynamical Systems | Kaho Hitomi, Yuki Kato | The University of Osaka, The University of Osaka | Single-cell RNA-sequencing (scRNA-seq) technologies have revealed cell identities from the perspective of gene expression. Time-series scRNA-seq datasets, which capture temporal changes in cell populations, provide new opportunities to investigate cellular dynamics. Dynamical systems offer a promising framework for understanding cellular behavior by characterizing each cell’s state as a gene expression profile. However, developing methods to discover dynamical systems that describe cellular dynamics directly from data remains challenging. Here, we present a data-driven approach for predicting gene expression in future cellular states. Our method does not require governing equations and instead models the system purely from measurement data (e.g., gene expression count data). Assuming linearity of the system as an initial step, we analyzed a time-series single-cell dataset of mouse embryonic fibroblasts in which reprogramming was induced (Schiebinger et al., Cell 2019). Based on dynamic mode decomposition (DMD), we constructed snapshot-pair matrices using optimal transport and computed a linear operator A such that the one-step-ahead gene expression matrix X' is well approximated by X'=AX where X denotes the gene expression matrix. Singular value decomposition of X was used to obtain the low-dimensional representation needed to determine A. The results showed reasonable prediction accuracy in terms of the coefficient of determination for one-step-ahead expression; however, repeated application of the linear operator to the initial values to forecast gene expression in later time points resulted in suboptimal accuracy, indicating substantial room for improvement. 
We are currently working to enhance prediction accuracy by incorporating not only forward but also backward prediction. In the future, we plan to introduce nonlinearity to further improve model performance using extended DMD or Koopman operators. | https://doi.org/10.7490/f1000research.1120401.1 | |
| 42 | EunSol | Hyun | General | Poster only | Hyun | Phenotype–Genotype Integration Using Jedae-unshiu Reference SNPs for Citrus Line Characterization | Eun-sol Hyun, Kyu-chang Chang, In-Jung Kim | Department of Artificial Intelligence – Jeju National University, Department of Artificial Intelligence – Jeju National University, Faculty of Biotechnology - College of Applied Life Sciences – Jeju National University |
Citrus breeding is challenged by seedlessness, polyembryony, and high heterozygosity, making it difficult to evaluate genetic variation and phenotypic diversity across lines. Jedae-unshiu is a γ-irradiation–derived mutant known for its unique peel morphology and fruit-quality traits. Although previous genomic studies have characterized its SNP and InDel variation, an integrated phenotype–genotype comparison across multiple citrus lines has not yet been performed. In this study, we analyzed phenotypic traits and whole-genome SNP variation from seven citrus lines, including Jedae-unshiu, to investigate line-level relationships. Phenotypic measurements collected from 2022 to 2024 (covering size, color, peel thickness, sugar content, acidity, and hardness) were normalized to remove annual effects. Principal component analysis and clustering (hierarchical and k-means) revealed three major phenotype-driven clusters. PCA loadings indicated that color traits (L, a, b), acidity, and hardness were the primary factors driving the cluster containing Jedae-unshiu, Araunshiu, and 6b4-16, with sugar content contributing more weakly. Genotype analysis was performed using a pre-filtered SNP set (~660k variants) and IUPAC-encoded genotypes. IBS distances calculated with Jedae-unshiu as the reference showed that Araunshiu, 6b4-16, and Satsuma were relatively close genetically, reflecting a pattern broadly consistent with phenotype-based clustering. These findings demonstrate that phenotypic similarity and genomic proximity converge to reveal underlying structure among citrus lines. The agreement between trait-driven clusters and IBS-based distances suggests that some of the 160 Jedae-unshiu–specific SNP markers may be associated with key fruit-quality traits such as acidity, sugar content, or color characteristics. This integrated framework provides a foundation for identifying candidate markers and supporting data-driven citrus breeding strategies. |
https://f1000research.com/posters/14-1337 | |
| 43 | Jeff | Jaureguy | General | Poster only | Jaureguy | Variant-aware deep learning for chromatin accessibility: QTL detection and rare variant effects in iPSCORE | Jeff Jaureguy, Aaron Ho, Ko-Han Lee, David Laub, Tim Arthur, Jennifer Nguyen, Kelly Frazer, Graham McVicker | UC San Diego, Salk Institute for Biological Studies, UC San Diego, UC San Diego, UC San Diego, UC San Diego, UC San Diego, Salk Institute for Biological Studies | Determining the functional impact of trait-associated variants from genome-wide association studies (GWAS) remains challenging because most lie in noncoding regions and are in linkage disequilibrium. Chromatin accessibility QTLs (caQTLs) provide functional evidence for regulatory variants but have limited power for rare variants and do not directly generalize to new individuals. In parallel, sequence-to-function deep learning models trained only on the reference genome ignore personal genetic variation, constraining their ability to predict inter-individual regulatory differences. To address these gaps, we developed a variant-aware training framework that conditions sequence-based deep learning models on phased whole-genome haplotypes to predict chromatin accessibility (CA). Using ATAC-seq and genotype data from 133 iPSCORE induced pluripotent stem cell lines, we constructed donor-specific haplotype sequences and adapted two state-of-the-art sequence models, FlashZoi and ChromBPNet, to accept personalized haplotypes for training, benchmarking them alongside AlphaGenome. We evaluated generalization in donor-only and donor-plus-chromosome holdouts and assessed performance using multiple complementary metrics. Across within-peak lead caQTLs, both reference and variant-aware models achieved high Spearman correlations between predicted and observed allelic effects (up to ~0.76), and Tn5 coverage correlations were similar across training regimes. We further quantify allelic imbalance using WASP2 to compare model-based predictions of allele-specific accessibility to observed allelic effects at heterozygous sites. Finally, we benchmark AlphaGenome, FlashZoi, and ChromBPNet on their ability to predict donor-specific chromatin accessibility, caQTL effect sizes, and allelic imbalance on held-out chromosomes and donors. Ongoing work extends the framework to rare variants (minor allele frequency <1%) and cross-cohort validation in iPSCORE and I2QTL. Together, these analyses characterize how different model architectures and training regimes perform for both common and rare variants and help define when incorporating phased personal genomes improves regulatory effect prediction. |
N/A | psb_2026_jaureguy.pdf |
| 44 | Rong | Jiang | General | Poster only | Kai Xia | Cross-Omics Block-missing Imputation via Transfer Learning | Kai Xia, Rong Jiang, Laura Raffield, Yun Li, Kent D. Taylor, Peter Durda, Yongmei Liu, Craig Johnson, Francois Aguet, Kristin Ardlie, Ani Manichaikul, Xiuqing Guo, Rob Gerszten, Clary Clish, Usman Tahir, Jia Wen, Hazel Milla, Anthony Zannas, Alexander Reiner, Bingxin Zhao, Jerry Rotter, Steve S. Rich, Fei Zou | Department of Psychiatry University of North Carolina at Chapel Hill, Department of Head and Neck Surgery & Communication Sciences Duke University School of Medicine, Department of Genetics University of North Carolina at Chapel Hill, Department of Biostatistics Department of Genetics University of North Carolina at Chapel Hill, TOPMed Multi-Omics Working Group, Department of Biostatistics Department of Genetics University of North Carolina at Chapel Hill |
Integrative multi-omics analysis, encompassing genomic, transcriptomic, epigenomic, proteomic, and metabolomic data, provides unprecedented insight into the molecular basis of complex traits. However, a persistent challenge in cohort-based studies is block-wise missingness, where different omics layers are available only for subsets of samples. This fragmented data structure significantly reduces statistical power and hinders comprehensive biological interpretation. We present a machine learning (ML)-based framework for imputing block-wise missingness, leveraging large-scale, heterogeneous datasets from resources such as TOPMed, GTEx, and UK Biobank. We use Elastic Net as an example; the framework can be generalized to other ML algorithms. Going beyond TWAS or genetic risk prediction methods that focus on genotype-based prediction, our approach proposes three integrated steps utilizing both local and transferred information: (1) constructing cross-omics predictive models between omics layers in observed data; (2) building predictive models of omics outcomes as functions of cross-omics features predicted from pre-trained genetic scores (e.g., genetically predicted gene expression as a predictor of DNA methylation); and (3) applying an efficient fine-tuning layer to the predictive functions from the two previous steps, along with covariates, to refine parameters and improve imputation in fully observed subjects. Our results indicate that models with transferred knowledge substantially outperform locally trained models relying solely on information in the target dataset. The imputation performance, measured by the correlation coefficient between true and imputed values, shows a median improvement of 10% and up to a 10-fold increase for individual omics features. Our findings further suggest that pre-trained models derived from a single omicsQTL context can be repurposed to impute features across different omics modalities, enabling cross-omics prediction. Overall, our findings underscore the potential of transfer learning to overcome limitations in cohort-based multi-omics research by leveraging knowledge from external reference datasets. This approach not only enhances imputation accuracy but also promotes broader data integration, enabling more comprehensive multi-omics analyses across populations and study designs. |
N/A | cross_omics_block_imputation.pdf |
| 45 | Stephen | Kocsis | General | Poster only | Kocsis | Biologically Informed WGCNA Modules Combined with Clinical Questionnaire Data Provide Multimodal Insights Into Fundamental Cellular Dysfunction in hEDS and Mitochondrial Dysregulation in HSD | Kocsis, Stephen P.C., MSc, Dias, Raquel, PhD, Fairweather, DeLisa, PhD | Mayo Clinic/University of Florida, University of Florida, Mayo Clinic | Hypermobile Ehlers-Danlos Syndrome (hEDS) and Hypermobility Spectrum Disorder (HSD) present overlapping clinical features but remain difficult to distinguish mechanistically. To identify underlying biological differences, we integrated whole-blood transcriptomics with clinical questionnaire data in a multimodal framework. RNA-seq data from Controls (n=10), HSD (n=14), and hEDS (n=12) were VST-normalized and analyzed with weighted gene co-expression network analysis (WGCNA) to derive biologically informed co-expression modules. Modules significantly associated with diagnoses were extracted and refined using SHAP to identify high-impact molecular features. Questionnaire features were selected using one-way ANOVA-mutual information overlap and SHAP filtering. Logistic regression, random forest, and XGBoost models were trained on questionnaire data alone, module eigengenes (MEs) alone, and the combined multimodal dataset, with leave-one-out cross-validation (LOOCV) used for evaluation. The multimodal model improved interpretability and revealed distinct mechanistic patterns: hEDS was characterized by fundamental cellular dysfunction, while HSD exhibited signatures of mitochondrial impairment. Over-representation analysis (ORA) using GSEApy supported these pathway-level differences in the modules contributing most strongly to the final random forest model. Together, these findings demonstrate that integrating biologically informed transcriptomic modules with clinical data can uncover subtle cellular mechanisms differentiating hEDS and HSD, offering a promising framework for mechanistic stratification in hypermobility disorders. |
N/A | kocsis_poster_psb_2026.pdf |
| 46 | Alex | Abbas | General | Poster only | Koytiger | Generative genomics accurately predicts future experimental results | Gregory Koytiger, Alice M. Walsh, Vaishali Marar, Kayla Johnson, Max Highsmith, Alex Abbas, Andrew Stirn, Ariel Brumbaugh, Alex David, Darren Hui, Jeffrey Kahn, Sheng-Yong Niu, Liza Ray, Candace Savonen, Stein Setvik, Jeffrey Leek, Robert K. Bradley | Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Variational AI, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio, Synthesize Bio | AI models capable of predicting experimental outcomes could accelerate biomedical research by circumventing fundamental constraints of laboratory experimentation and clinical trials. We developed GEM-1 (Generate Expression Model-1), a generative genomics framework that models the diversity of real-world gene expression experiments and accurately predicts future experimental results. We trained GEM-1 using 470,691 bulk RNA-seq samples from 24,715 datasets in the NCBI Sequence Read Archive, spanning diverse tissues, diseases, and over 18,000 distinct perturbations. An automated metadata agent harmonized fragmented experimental descriptions using large language models. GEM-1 employs a deep latent variable model that partitions experimental metadata into biological, technical, and perturbational components, using pretrained foundation model embeddings to enable generalization to novel perturbations. Testing on holdout data deposited after training, GEM-1 achieved pseudoreplicate-level accuracy (Pearson correlation of gene rank across samples, r_gene, of 0.65-0.75) for previously observed contexts and maintained strong performance for completely novel genetic (r_gene = 0.58-0.63) and chemical perturbations (r_gene = 0.52-0.68). We extended GEM-1 to single-cell data using 41.5 million cells, achieving comparable performance to established models for cell type annotation while enabling interpretable biological feature inference. We demonstrated clinical utility by generating synthetic cohorts that accurately recapitulated key biological phenomena in SLE (200 samples correctly modeling lupus interferon dysregulation) and cancer (10,523 samples exhibiting known molecular features of cancer). This approach represents a significant advance toward AI systems that can predict experimental outcomes before physical experiments are conducted, shortcutting fundamental limitations in experimental speed and clinical trial recruitment and potentially revolutionizing drug development and personalized medicine. |
https://doi.org/10.7490/f1000research.1120377.1 | psb_poster.pdf |
| 47 | Michael | Larsen | General | Poster only | Larsen | Extraction of Human Phenotype Ontology (HPO) Concepts from Clinical Notes Utilizing Large Language Models (LLM) with Model Context Protocol (MCP) | Michael Larsen, Nephi Walton | University of Utah School of Medicine, University of Utah School of Medicine | Background: Accurate extraction of Human Phenotype Ontology (HPO) terms from clinical notes is essential for variant prioritization and subsequent genetic diagnosis. Large language models (LLMs) often struggle to balance precision, hallucination avoidance, and fidelity to ontology mappings. We hypothesized that grounding LLMs through the Model Context Protocol (MCP), a standardized framework for integrating external tools, would simultaneously enhance all three metrics without requiring model fine-tuning. Methods: We evaluated four frontier LLMs with reasoning capabilities (Claude Sonnet 4.5, Gemini Pro 2.5, GPT-5.1, and Grok 4.1) on the task of extracting HPO terms from simulated clinical notes. We compared two experimental conditions: a baseline "No Tools" approach versus a "With Tools" approach. In the tool-enabled condition, models were granted access to the HPO database via an MCP server (or equivalent function calling), allowing them to search for and verify term IDs in real time. We performed 50 iterations per model per condition (N=400 total runs). Results: The integration of ontology tools yielded significant improvements across all models. For instance, Grok demonstrated a dramatic reduction in mapping errors from 40% to 0% (p < 10^-37) while improving recall accuracy from 46% to 72% (p < 10^-44). Crucially, tool use effectively eliminated hallucinations across the board, most notably reducing Claude’s rate from 24% to <1% (p < 10^-19). Statistical analysis confirmed that tool access significantly reduced mapping errors and increased accuracy for every model tested (p < 0.05). Conclusions: These findings demonstrate that "knowledge grounding" via MCP is superior to relying solely on an LLM's internal training weights for clinical extraction. Even without task-specific fine-tuning, allowing models to query the ontology significantly reduces reliability issues. We propose that MCP-based retrieval should be adopted as a standard architectural requirement for clinical LLM pipelines to ensure data integrity in genomic medicine. |
N/A | larsen_michael_pbs_poster.pdf |
| 48 | Steven | Brenner | General | Poster only | Lin | Variant Impact Predictor database (VIPdb) version 3 will be enabled by automated curation assistance | Yu-Jen Lin, Anjali Sujithan, Steven E. Brenner | University of California, Berkeley | Variant interpretation is essential for identifying patients’ disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed, spanning diverse variant types, modeling strategies, and use cases. To facilitate the exploration of VIP options, we previously released VIPdb version 2, in which we manually curated 407 VIPs with standardized metadata on variant classes, methodological features, availability, and CAGI assessments. In VIPdb version 3, we expand the metadata to include further features such as training data, capacity to predict gain of function, ability to evaluate nonsense variants, and prediction objective (e.g., clinical pathogenicity, stability, enzyme activity, splicing regulation). VIPdb version 3 is being constructed with an automated curation-assistance framework that enables systematic updates. First, a paper selection module proposes articles to be included in VIPdb. The module retrieves PubMed abstracts of candidate papers, converts each abstract into a feature vector, and uses a linear SVM classifier to identify candidate VIP publications. Second, a curation module proposes values for each VIPdb categorical field (e.g., training data, gain-of-function, nonsense, authors, licenses). The curation module retrieves full-text articles, segments and embeds documents in a vector store, and performs question-conditioned semantic search to identify passages relevant to each categorical field. These passages are then passed to a Llama-based large language model in a retrieval-augmented generation setting, which returns proposed field assignments, confidence scores, and concise rationales for review by curators. We will use this framework to curate newly published VIPs and new fields into VIPdb version 3, which will be made available via the VIPdb website at https://genomeinterpretation.org/vipdb |
N/A | 260103_brennerse_psb_vipdb_poster_vipdb_v2.pdf |
| 49 | Siru | Liu | General | Poster only | Liu | Flawed Questions, Flawed Logic: A Taxonomy and Correction of Reasoning Errors in Medical LLMs Using Sparse Autoencoders | Siru Liu, Jialin Liu, Adam Wright | Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA Department of Computer Science, Vanderbilt University, Nashville, TN, USA Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China |
Objective: Investigate failure modes of reasoning-based Large Language Models (LLMs) in medicine by auditing benchmark quality, building a clinically informed taxonomy of reasoning errors, and testing a mechanistic intervention. Materials and Methods: We evaluated OpenAI o1 on MedQA (n=1,273) and cross-referenced incorrect answers against source materials to flag benchmark flaws. For 37 confirmed model failures, we developed a reasoning error taxonomy through inductive coding and validated it on three additional LLMs (GPT-4.5-preview, o3-mini, and DeepSeek-R1). We developed a sparse autoencoder (SAE) to isolate reasoning-specific features, steered these features with a positive bias, and measured effects on accuracy and reasoning length across MedQA, MedMCQA, and PubMedQA. Hallucination in reasoning traces was evaluated with an LLM-as-a-judge. Results: Forty-one percent of initial errors reflected benchmark problems, including missing figures (22%) and ambiguities (19%). Our taxonomy classified failures into four categories: Information Synthesis, Therapeutic Decision, Diagnostic Reasoning, and Foundational Principle Errors, revealing distinct failure profiles across models. Steering reasoning-specific features significantly improved accuracy across all benchmarks but increased reasoning length. We identified five functional categories whose roles aligned with the taxonomy. Discussion: Benchmark flaws account for some LLM “errors,” motivating dynamic, multimodal, version-controlled medical benchmarks. Linking a clinically grounded error taxonomy to SAE-based steering improves accuracy, pointing toward process-aware training. Conclusion: The reliability of medical LLMs is limited by flawed evaluations and recurrent reasoning failures. Isolating and steering reasoning-specific features via SAEs could mechanistically correct these patterns, advancing interpretable, “glass-box” clinical AI. |
N/A | taxonomy_psb_pdf.pdf |
| 50 | Xueying | Liu | General | Poster only | Liu | Integrating Generative AI Models into Computational Pipelines for Scalable Interpretation of Single-Cell Gene Expression Programs | Xueying Liu, Alessandro Davini, Paul Geeleher | Department of Computational Biology, St. Jude Children's Research Hospital |
The analysis of large-scale single-cell RNA sequencing (scRNA-seq) data often yields tens or hundreds of gene expression programs (GEPs), each capturing a complex biological pattern. While our scalable, NMF-based framework, CSI-GEP (Consensus and Scalable Inference of Gene Expression Programs), enables the discovery of GEPs across millions of cells, the interpretation and annotation of these programs remain a major bottleneck, requiring a breadth of biological expertise that is often intractable for a human. To address this bottleneck, we developed a generative AI-assisted annotation system that integrates large language models (LLMs) into the CSI-GEP workflow. Through structured API calls, the system automatically summarizes each GEP by its top-loading genes, pathway enrichments, and the characteristics of top-expressing cells, including Numbat-inferred malignancy probabilities. It then prompts the LLM with context-specific instructions to infer the potential biological identities and malignancy status of cells expressing each GEP. The design emphasizes automation, robustness, and transparency, with all prompts and outputs recorded for traceable validation. We benchmarked the approach by re-analyzing single-cell RNA-seq data generated from a genetically engineered mouse model (GEMM) of neuroblastoma previously annotated by human experts. The LLM produced annotations largely consistent with those of the human experts but reclassified several GEPs from tumor-specific to epithelial non-malignant. We validated the LLM's annotations by probing for the expression of firefly luciferase, a cancer-specific reporter in the GEMM design, which confirmed that these programs were indeed non-cancerous and that the human-expert annotations had been erroneous. Beyond scalability and speed, our approach demonstrates that LLMs can reveal overlooked biological insights by leveraging their broad, cross-domain knowledge and human-like reasoning patterns. This hybrid computational framework highlights a path toward more interpretable, automated, and insight-driven single-cell analysis. |
N/A | psb_2026_poster.pdf |
| 51 | Onur | Mutlu | General | Poster only | Mutlu | Storage-Centric Systems for Genomics and Metagenomics | Nika Mansouri Ghiasi, Onur Mutlu | ETH Zurich, ETH Zurich | Analyzing and storing the exponentially growing volumes of genomic and metagenomic data pose unprecedented challenges in terms of performance, energy, cost, and sustainability. In our research, we focus on designing novel computing systems (e.g., storage-centric designs and algorithm-architecture co-designs) to fundamentally address these challenges. First, we demonstrate how to reduce both data movement and computational overheads of genomics and metagenomics applications by analyzing large amounts of low-reuse genomic data within the storage system, where the data originally resides. To this end, we introduce (1) GenStore, an in-storage processing system designed for genome sequence analysis, with new low-cost and accurate in-storage filters, (2) MegIS, an in-storage processing system designed for metagenomic analysis, which effectively leverages and orchestrates processing inside and outside the storage system, (3) GRAINS, a storage-aware algorithm-architecture co-design to accelerate graph-based genome analysis in the storage system, and (4) MARS, which performs processing-in-memory within the storage system to accelerate raw signal genome analysis. These storage-centric designs significantly improve the performance and reduce the energy consumption of genomics and metagenomics applications. They also reduce the hardware needed to process such workloads, making genomics and metagenomics more accessible for wider adoption and helping sustainability. Second, we demonstrate how to address a major bottleneck that greatly limits the benefits of genomics accelerators: the data preparation bottleneck, where genomic data is stored in compressed form and needs to be decompressed and formatted before an accelerator can operate on it. To this end, we introduce SAGe, an algorithm-architecture co-design for highly compressed storage and high-performance access of large-scale genomic data. We leverage properties of genomic data to co-design SAGe’s algorithm and architecture, such that the highly compressed data can be interpreted by lightweight hardware and rapidly prepared for analysis. |
https://f1000research.com/posters/14-1334 | |
| 52 | Sarah | Nace | General | Poster only | Nace | Host cytokine and chemokine profiles linked to survival in mice challenged with Burkholderia pseudomallei | Sarah Nace, Christopher Cote, Sergei Biryukov, Kristen Wilding, Jessica Kubicek-Sutherland, Katy Martinez | Physical Chemistry & Applied Spectroscopy Group- Chemistry Division- Los Alamos National Laboratory, United States Army Medical Research Institute of Infectious Diseases (USAMRIID)- Bacteriology Division, United States Army Medical Research Institute of Infectious Diseases (USAMRIID)- Bacteriology Division, Theoretical Biology and Biophysics Group- Theoretical Division- Los Alamos National Laboratory, Physical Chemistry & Applied Spectroscopy Group- Chemistry Division- Los Alamos National Laboratory, Information Systems and Modeling Group- Analytics, Intelligence and Technology Division- Los Alamos National Laboratory |
Burkholderia pseudomallei is a Gram-negative environmental bacterium that causes melioidosis and is classified as a Tier 1 select agent and biosafety level 3 pathogen. There are currently no licensed vaccines available for B. pseudomallei. In collaboration with the United States Army Medical Research Institute of Infectious Diseases (USAMRIID), two experiments were performed to directly compare several vaccine platforms for immunization against B. pseudomallei using a C57BL/6 mouse model in which mice were challenged with B. pseudomallei by a whole-body aerosol route. Here, we present an analysis of these experimental datasets to examine the reproducibility and immunological outcomes of the two experiments. The first dataset presented analytical challenges due to low mouse survival rates (≤40%) following a high challenge dose of bacteria (8 LD50) and lost ear tags used for tracking individual mice. The second dataset used a lower challenge dose (5 LD50), yielding higher survival rates (60-70%), and had more closely tracked ear tags, supporting a more comprehensive analysis. The chemokines and cytokines that may be responsible for these differences are IFN-γ, IL-15, CCL7, GM-CSF, CXCL-10, IL-17A, IL-18, IL-4, CCL-2, and LIF. With additional analysis, these chemokines and cytokines may prove to be markers of survival in these challenge studies. | N/A | burkposteredited.snace.pdf |
| 53 | Matteo | Pellegrini | General | Poster only | Pellegrini | Inferring protein domain functional similarities from phylogenetic distributions | Matteo Pellegrini, Lukasz Salwinski, Thomas Holton, Stephanie Trinh, Sagi Snir | Institute of Genomics and Proteomics, University of California Los Angeles, CA 90095, USA, Department of Evolutionary and Environmental Biology and The Institute of Evolution, University of Haifa, Haifa 31905, Israel |
Protein domains are the functional building blocks of proteins and often act in concert to carry out complex biological roles. To identify domains with related functions, we measured the counts of over 20,000 Pfam domains across 10,000 species, constructing domain-wise phylogenetic profiles. By comparing the counts vectors for pairs of domains, we quantified domain similarity and used Uniform Manifold Approximation and Projection (UMAP) to embed the resulting similarity relationships in two dimensions. The resulting UMAP visualization reveals clusters of domains that are co-distributed across similar subsets of species. Many of these clusters correspond to known biological complexes, such as the bacterial flagellum and the ribosome, demonstrating the utility of this approach for recovering functionally related domain groups. Our analysis represents a large-scale extension of the classical phylogenetic profiling method and provides a promising framework for identifying novel domain associations as more species genomes become available. | N/A | pellegrini_psb.pdf |
| 54 | Ruijiang | Li | General | Poster only | Ruijiang | A generalizable multiple instance learning framework for computational pathology | Xiangde Luo, Jinxi Xiang, Yuanfeng Ji, Ruijiang Li | Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA, USA | Computational pathology holds substantial promise for improving diagnosis and guiding treatment decisions. Recent pathology foundation models enable the extraction of rich patch-level representations from large-scale whole-slide images (WSIs), but current approaches for aggregating these features into slide-level predictions remain constrained by design limitations that hinder generalizability and reliability. Here we present nnMIL, a simple yet broadly applicable multiple-instance learning framework that connects patch-level foundation models to robust slide-level clinical inference. nnMIL introduces random sampling at both the patch and feature levels, enabling large-batch optimization, task-aware sampling strategies, and efficient and scalable training across datasets and model architectures. A lightweight aggregator performs sliding-window inference to generate ensemble slide-level predictions and supports principled uncertainty estimation. Across 40,000 WSIs encompassing 35 clinical tasks and four pathology foundation models, nnMIL consistently outperformed existing MIL methods for disease diagnosis, histologic subtyping, molecular biomarker detection, and pan-cancer prognosis prediction. It further demonstrated strong cross-model generalization, reliable uncertainty quantification, and robust survival stratification in multiple external cohorts. In conclusion, nnMIL offers a practical and generalizable solution for translating pathology foundation models into clinically meaningful predictions, advancing the development and deployment of reliable AI systems in real-world settings. | N/A | poster.pdf |
| 55 | Jaini | Shah | General | Poster only | Shah | Asking the right QUESTIONS: A framework for modeling temporal signal in omics networks | Carly Bobak, Jaini Shah, James O'Malley | Research Computing and Data Services & Biomedical Data Science, Information, Technology & Consulting, Dartmouth College, Hanover, NH 03784, USA, Geisel School of Medicine, Dartmouth College, Lebanon, NH 03755, USA, The Dartmouth Institute for Health Policy and Clinical Practice & Department of Biomedical Data Science, Lebanon, NH 03755, USA | Biological systems change continuously, but many network-based analyses treat each timepoint as an isolated snapshot. Insights about how early gene activity structures later transcriptional programs can be lost without a proper way to connect networks across time. To address this gap, we adapted a spillover framework originally used in social network analysis and applied it to temporal genomics data. This approach, called QUESTIONS, models how early gene activity diffuses through evolving co-expression networks. The core of QUESTIONS is a transition matrix built from each timepoint’s co-expression structure and influence. This matrix represents local diffusion of “influence”, and a parameter, alpha, determines the balance between signals from previous stages and new activity that emerges at each timepoint. Iterative propagation creates spillover scores that quantify temporal continuity at the gene and module level. An open-source R package implements all steps, from network pruning to normalization and propagation, making this framework more accessible for downstream biological analyses. When applied to real omics time-series, QUESTIONS recovered temporal signal even when module membership changed greatly. Early hub genes showed high downstream spillover, and module-level spillover trends aligned with changes in eigengene expression and pathway activity. By integrating temporal relationships, QUESTIONS extends static co-expression analysis into a dynamic modeling framework, giving a new lens for understanding how regulatory programs unfold over time. | N/A | psb_poster.pdf |
| 56 | Moshe | Steyn | General | Poster only | Steyn | Addiction by Design: Detection and Role of Phage Toxin-Antitoxin Systems in Microbial Ecosystems | M.D. Steyn, Ruonan Wu, David Baltrus, Jason McDermott | University of Arizona and Pacific Northwest National Laboratory, Pacific Northwest National Laboratory, University of Arizona, Pacific Northwest National Laboratory | Prokaryotic toxin–antitoxin (TA) systems consist of a stable toxin neutralized by a short-lived antitoxin, which degrades upon cassette loss to release the toxin and kill the cell. These cassettes are best studied in plasmids, where they commonly contribute to antibiotic resistance, but they are also widespread in temperate bacteriophages, where their functions remain poorly understood. We developed TAfinder3D, an extension of TAfinder2, to broaden TA system annotation through improved remote-homology detection. Using several curated phage databases, we show that the vast majority of complete TA systems occur in temperate phages, making them a robust marker of lifestyle. We then integrate KEGG and Pfam annotations from more than 15,000 high-quality phage genomes with TA system presence/absence to characterize the ecological roles of TA systems in bacteriophage. | N/A | |
| 57 | Peter | Washington | General | Poster only | Sun | Engineering and Secure Implementation of a Two-Player Video Game Diagnostic for ADHD and Autism | Yinan Sun, Aditi Jaiswal, Aayush Nandkeolyar, Kaitlyn Dunlap, Katy Tarrit, Dennis P Wall, Peter Washington | Ohio State University, University of Hawaii at Manoa, University of California - San Francisco (UCSF), Stanford University | We describe how a two-player, browser-based assessment platform (SocialPlai) that records short video and audio clips for ADHD/autism research was engineered to run inside UCSF’s HIPAA-aligned cloud environment. The deployment of the website, SocialPlai.UCSF.edu, relies on a segregated AWS account governed by UCSF security controls, inbound HTTPS via enterprise load-balancing and network translation, strict perimeter rules, institution-issued TLS, and service integrations to institutional middleware. We distill these choices into a reusable blueprint that other academic centers can adapt to ship similar data-collecting digital diagnostics safely. | https://doi.org/10.7490/f1000research.1120302.1 | poster_peter.pdf |
| 58 | Peter | Washington | General | Poster only | Sun | Engineering and Secure Implementation of a Two-Player Video Game Diagnostic for ADHD and Autism | Yinan Sun, Aditi Jaiswal, Aayush Nandkeolyar, Kaitlyn Dunlap, Katy Tarrit, Dennis P Wall, Peter Washington | The Ohio State Comprehensive Cancer Center, University of Hawaii at Manoa, University of California San Francisco (UCSF), Stanford University | We describe how a two-player, browser-based assessment platform (SocialPlai) that records short video and audio clips for ADHD/autism research was engineered to run inside UCSF’s HIPAA-aligned cloud environment. The deployment of the website, SocialPlai.UCSF.edu, relies on a segregated AWS account governed by UCSF security controls, inbound HTTPS via enterprise load-balancing and network translation, strict perimeter rules, institution-issued TLS, and service integrations to institutional middleware. We distill these choices into a reusable blueprint that other academic centers can adapt to ship similar data-collecting digital diagnostics safely. | https://f1000research.com/posters/14-994 | poster_peter.pdf |
| 59 | Shosuke | Suzuki | General | Poster only | Suzuki | Predicting alternative protein conformations by perturbing pair representations | Shosuke Suzuki, Toshiyuki Amagasa | Graduate School of Science and Technology and Center for Computational Sciences, University of Tsukuba | Deep learning models such as AlphaFold2 achieve accurate protein structure predictions but usually return only one static conformation, even though many proteins are dynamic and can adopt several states. In this poster, we present a simple and lightweight way to sample alternative conformations within the Boltz 2 framework by globally rescaling an internal pair representation that encodes residue-pair couplings. We multiply this representation by a single scalar parameter beta, which changes how strongly the model enforces sequence-structure couplings without retraining the network or changing the input multiple sequence alignment. We show that this pair representation perturbation can improve the coverage of functionally relevant states for hinge-like proteins and transporters, and can recover both folds for several fold-switching proteins, while some difficult cases remain. These results suggest that small and interpretable changes to internal representations can be a practical way to explore conformational diversity with current deep learning protein structure predictors. | https://doi.org/10.7490/f1000research.1120397.1 | |
| 60 | Craig | Teerlink | General | Poster only | Teerlink | Identity-by-descent mapping for rare variants in biobank-scale datasets | Craig C. Teerlink, Julie A. Lynch, Josephine P. Johnson, Marijana Vujkovic, Kyong-Mi Chang, Scott Damrauer, Phil Tsao, Alun Thomas | VA Informatics and Computing Infrastructure (VINCI), VA Salt Lake City Health Care System, Salt Lake City, UT, USA, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA, Corporal Michael J. Crescenz VA Medical Center, Philadelphia, PA, USA, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA, Department of Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA, Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA, Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto, CA, USA, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA | Background: We have implemented an identity-by-descent (IBD) approach to identify rare, causal variants for common diseases by creating an approximate genealogy of the Million Veteran Program (MVP) biobank. This IBD method is based on the observation that long-range haplotype sharing between study subjects implies inheritance from a recent common ancestor. Biobank genomic datasets contain hundreds of thousands of multi-generational extended pedigrees. Furthermore, IBD inference offers a more statistically powerful approach to identify rare variants contributing to disease than case-control association. Methods: Our IBD mapping strategy specifies all pairwise long, shared haplotypes (3 cM+) observed from genetic marker data (~650K MVP subjects) and clusters them via a graphical modeling algorithm. Then, phased whole genome sequences (WGS) (~102K MVP subjects) belonging to haplotype carriers are consulted to identify rare coding variants. These variants are then tested in external biobank datasets (Penn Medicine Biobank (PMBB) and UK Biobank (UKB)) for validation using gene-level burden tests (SAIGE software). Results: We present the results from this approach using two phenotypes: amyloidosis (proof of concept) and cirrhosis (exploratory). 1008 MVP subjects had a diagnosis of amyloidosis, and the nonsynonymous variant chr18:31595157:A:G in the TTR gene (CADD score 20.5) had RVsharing p=1.3e-3. Amyloidosis is a useful benchmark of our method because a subset of amyloidosis subjects in MVP is explained by rare variants in the TTR gene. 12,983 MVP subjects had a diagnosis of cirrhosis. Out of 161 candidate variants emerging from our analysis of MVP, the nonsynonymous variant chr17:29121027:G:A in MYO18A (CADD score 26.2) had RVsharing p=3.6e-4 in MVP, gene-level association p=3.1e-2 in PMBB and p=3.2e-2 in UKB, indicating variants in MYO18A may confer risk for cirrhosis. Conclusion: IBD-based gene discovery methods offer a statistically powerful approach to identify rare variants in biobank datasets and strongly complement other approaches being applied to such datasets now. | N/A | psb_poster_ibd_mapping_20251201.pdf |
| 61 | Stephen | Parker | General | Poster only | Vu | PanKbase: an integrated knowledgebase platform for human pancreatic and diabetes research | Ha T.H. Vu1, Han Sun2, Seth A. Sharp2, Parul Kudtarkar3, Liza Brusman3, Yiqun Wang1, Yuanhao Huang1, Runbo Mao1, Fan Feng4, Amanda K. Huber1, Sierra Corban3, Ying Sun3, Sara Narayanaswamy3, Alex Shilin5, Julie Jurgens5, Dongkeun Jang5, Cassie C. Robertson1, Shristi Shrestha4, Thomas Bate4, Trang Nguyen5, Patrick Smadbeck5, Mackenzie Brandes5, The PanKbase Consortium, Jason Flannick5, Noel Burtt5, Shuibing Chen6, Jie Liu1, Jean-Philippe Cartailler7, Benjamin F. Voight8, Michael L. Stitzel9, Marcela Brissova4, Anna L. Gloyn2, Kyle Gaulton3, Stephen C.J. Parker1 | 1University of Michigan, Ann Arbor, 2Stanford University, 3University of California, San Diego, 4Vanderbilt University Medical Center, 5Broad Institute of MIT and Harvard, 6Weill Cornell Medicine, 7Vanderbilt University, 8University of Pennsylvania, 9The Jackson Laboratory for Genomic Medicine |
Single-cell RNA sequencing (scRNA-seq) from pancreatic islets can help unravel how diabetes develops, but limited sample availability and inconsistencies in metadata, experimental methods, and computational analyses present challenges for combining data across studies. Our single-cell atlas integrates data from 191 samples collected from 140 donors (59 female, 81 male), representing the most extensive islet single-cell map to date. The atlas features five disease phenotypes: 69 non-diabetic, 12 autoantibody-positive, 11 pre-diabetic, 12 with T1D, and 36 with T2D, plus samples subjected to treatments including SARS-CoV-2 exposure and cytokine stimulation. Rigorous quality control and integration steps accounted for variables including age, sex, BMI, and sequencing chemistry. Our map comprises 448,935 cells grouped into 13 populations, with alpha and beta cells representing 43.3% and 26.8%, respectively. Using latent variable analysis to disentangle biological variation from technical noise, we identified 1,805 unique genes differentially expressed (FDR<5%) between individuals with or without T1D across six cell populations. Beta cells showed the greatest transcriptomic changes with 970 differentially expressed genes, enriched in pathways associated with "Type I diabetes mellitus" and "oxidative phosphorylation." Notably, 697 of these genes represent novel findings not previously reported. One particularly noteworthy novel gene is OPLAH, involved in glutathione metabolism, though its specific role in T1D pathogenesis remains to be elucidated. Network analysis of T1D up-regulated beta cell genes identified seven distinct sub-networks enriched for biological processes including immune response-regulating signaling pathways, antigen processing via MHC class I, and WNT signaling. Hub genes including STAT1, SIN3A, and SYT1 may represent pivotal drivers in T1D pathogenesis. This comprehensive, publicly available atlas (www.pankbase.org) serves as a powerful platform to advance hypothesis-driven research into diabetes pathophysiology. | https://doi.org/10.5281/zenodo.15596314 | abstractpsb.pdf |
| 62 | Clayton | Wandishin | General | Poster only | Wandishin | A Multi-‘Omics Integration Workflow for BioProcess Modeling and Analysis | Clayton Wandishin, Logan Running, Patrick Perkins, Dana Motabar, Eric Zinn, Amr Ali, and Delia Lyons | PDS&T, AbbVie Bioresearch Center, Worcester, MA, United States |
As bioprocess analysis continues to advance, incorporation of next-generation data modalities such as lipidomics, proteomics, and transcriptomics, alongside traditional measures, has garnered increased attention from the field. In the age of A.I., we are poised to realize the advantages of such rich data packages, which have previously often proved too cumbersome to implement effectively. However, integrating these diverse datasets presents significant challenges due to their differences in format, size, annotation depth, scale, missing data patterns, and temporal resolution. Moreover, effective multi-‘omics integration requires robust data preprocessing steps including normalization, batch correction, alignment of timepoints, handling of missing values, and duplicate uncertainty. To enable meaningful biological interpretation, data should also, as a best practice, be carefully quality controlled and standardized prior to integration, minimizing technical variability. Here we present a comprehensive technical workflow that addresses these items in the multi-‘omics space and merges the final results with traditional process data to enable correlative analysis between molecular changes and process performance. This holistic approach aims to support predictive modeling, pathway analysis, and identification of critical control points in bioprocess optimization. | N/A | psb_multiomics_workflow_poster_v3.pdf |
| 63 | Shriram | Nallamshetty | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Cogill | Large Language Model-enabled Natural Language Processing Pipelines for Resource-Efficient and Scalable Classification of Coronary Disease Severity in Unstructured Cardiac Catheterization Patterns. | Steven Cogill, PhD, Shriram Nallamshetty, MD, Kent Heberer, PhD, Mei-Chung Shih, PhD, Ana Maldanado, PhD, Ying Q. Chen, PhD, Adam Bress, PhD, Julie A. Lynch, PhD, Thomas Meredith, MD, Celina Yong, MD, Jennifer S. Lee, MD, PhD, MBA | Steven Cogill: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA. Shriram Nallamshetty: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA. Kent Heberer: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA. Mei-Chung Shih: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA; Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA. Ana Maldanado: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA. Ying Q. Chen: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA; Stanford Prevention Research Center, Stanford, CA. Adam Bress: Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Salt Lake City Healthcare System, Salt Lake City, UT; Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, Utah. Julie A. Lynch: Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Salt Lake City Healthcare System, Salt Lake City, UT; Division of Epidemiology, University of Utah School of Medicine, Salt Lake City, UT. Thomas Meredith: Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA. Celina Yong: VA Palo Alto Healthcare System, Palo Alto, CA; Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA. Jennifer S. Lee: VA Palo Alto Healthcare System, Palo Alto, CA; Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA; Division of Endocrinology, Gerontology, and Metabolism, Stanford School of Medicine. |
BACKGROUND. Coronary artery disease (CAD) is a heterogeneous disorder. Differentiating obstructive CAD from non-obstructive coronary artery disease (NOCAD), which is present in over one third of patients undergoing cardiac catheterization, has important implications for long-term management and prevention. Currently, distinguishing NOCAD from obstructive CAD requires manual review of unstructured text data in cardiac catheterization reports. OBJECTIVE. Develop computationally efficient and scalable large language model (LLM)-enabled natural language processing (NLP) pipelines for automated classification of CAD severity on cardiac catheterization reports in the Veterans Health Administration (VHA). METHODS. Annotated training and test sets (80:20 ratio) were developed using cardiac catheterization reports (1/1/2024 to 12/31/2024) from a single VA medical center. The reports were independently reviewed by three cardiologists and classified as one of three patterns: (1) Normal, (2) NOCAD, or (3) Obstructive CAD. We investigated three resource-efficient approaches: (1) A zero-shot small LLM classification open-source model (Llama 3.2-3b-Instruct), (2) An automated large commercial LLM-generated NLP pipeline with no human input, and (3) An LLM-generated NLP algorithm with NLP researchers and cardiologists in the loop to optimize the pipeline. RESULTS. The LLM-generated NLP with researcher input demonstrated the best performance with 99.2% and 100% classification accuracy on training (123/124) and test sets (25/25), respectively. This approach was time-efficient and computationally flexible without the need for intensive GPU resources. For our first iteration training set (n=38), the NLP pipeline without human intervention achieved an accuracy of 76.3%, and the zero-shot approach was not capable of detecting nuance and only achieved an accuracy of 28.9%. CONCLUSION AND RELEVANCE. LLM-enabled NLP pipelines with human input can classify CAD severity in unstructured catheterization reports with high accuracy and fidelity in a resource-efficient manner. These automated classifier tools can enable key advances by facilitating the identification of complex clinical phenotypes from unstructured data. | N/A | llmnlpcath_poster_1212025_final.pdf |
| 64 | Timothy | O'Connor | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Dindu | Predicting Risk of Emotional Hunger Using Machine Learning-Assisted Gene Risk Scores | Srikanth Dindu, Elle Kolkin, Diego Anazco, Andres Acosta, Timothy O’Connor | Phenomix Sciences, Menlo Park, CA; Phenomix Sciences, Menlo Park, CA; Mayo Clinic, Rochester, MN; Mayo Clinic, Rochester, MN; Phenomix Sciences, Menlo Park, CA | The Mayo Clinic Obesity Deep Phenotyping studies have shown intermediary phenotypes useful for selecting effective obesity therapies: Hungry Gut, Hungry Brain, Emotional Hunger, and Slow Burn. In prior work, trait-based Gene Risk Scores coupled with Machine Learning (ML-GRS) were effective at predicting Hungry Gut and Hungry Brain, forming the basis of the MyPhenome Test by Phenomix Sciences. Presented here is a similar approach for Emotional Hunger, which is defined by administration of the Hospital Anxiety and Depression Scale (HADS), requiring significant patient engagement. Because Emotional-Hunger-targeted lifestyle and medication interventions lead to improved outcomes, an Emotional Hunger GRS (EH-GRS) was developed. GRS values were calculated using variants within 500 kbp of each of the following genes: HTR2A, TPH2, DRD2, ANKK1 on 483 participants. The final model was trained using these GRS and height. Predictions were validated on further GRS data from independent testing (n=57) and validation (n=48) data sets used in prior work. An AUC of 0.69 was observed in the training set, followed by AUCs of 0.72 and 0.85 on the test and validation sets, respectively. Individual questions from the Three Factor Eating Questionnaire (TFEQ) were further tested to determine which most improved the model. Using a subset of 137 samples with responses to the TFEQ, two questions yielded a notable improvement in accuracy (+0.04 AUC), both of which relate to loneliness. Overall, a simple ML-GRS predictor using a small number of genes accurately predicts the risk of having the EH intermediate phenotype for obesity. By using the EH-GRS, the relationship between EH and therapies can be tested in biobanks, retrospective studies, and other cases where patient responses to HADS cannot be obtained. Finally, for patients who do not have obesity (defined as BMI>30), EH-GRS could assess future vulnerability and risk of developing obesity. | https://phenomixsciences.box.com/s/ihro5in90qbk0zsjo9mt8o7ssr4d6xok | |
| 65 | Ece | Eksi | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Eksi | UniFORM: Towards universal immunofluorescence normalization for multiplex tissue imaging | Kunlun Wang 1, Kaoutar Ait-Ahmad 1, Sam Kupp 1, Zachary Sims 2, Eric Cramer 2, Zeynep Sayar 1, Jessica Yu 1, Melissa H Wong 3, Gordon B Mills 4, S Ece Eksi 5, Young Hwan Chang 6 | 1 Cancer Early Detection Advanced Research (CEDAR), Knight Cancer Institute, Oregon Health & Science University (OHSU), Portland, OR, USA. 2 Department of Biomedical Engineering and Computational Biology Program, OHSU, Portland, OR, USA. 3 Department of Cell, Developmental and Cancer Biology, OHSU, Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA. 4 Knight Cancer Institute, OHSU, Portland, OR, USA. 5 Cancer Early Detection Advanced Research (CEDAR), Knight Cancer Institute, Oregon Health & Science University (OHSU), Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA. Electronic address: eksi@ohsu.edu. 6 Department of Biomedical Engineering and Computational Biology Program, OHSU, Portland, OR, USA; Knight Cancer Institute, OHSU, Portland, OR, USA. Electronic address: chanyo@ohsu.edu. |
We present UniFORM, a non-parametric, Python-based pipeline for normalizing multiplex tissue imaging (MTI) data at both the feature and pixel levels. UniFORM employs an automated rigid landmark registration method tailored to the distributional characteristics of MTI, with UniFORM operating without prior distributional assumptions and handling both unimodal and bimodal patterns. By aligning the biologically invariant negative populations, UniFORM removes technical variation while preserving tissue-specific expression patterns in positive populations. Benchmarked on three MTI platforms, UniFORM consistently outperforms existing methods in mitigating batch effects while maintaining biological signal fidelity. This is evidenced by improved marker distribution alignment and positive population preservation, enhanced k-nearest neighbor batch effect test (kBET) and silhouette scores, and more coherent downstream analyses, such as uniform manifold approximation and projection (UMAP) visualizations and Leiden clustering. UniFORM also offers an optional guided fine-tuning mode for complex or heterogeneous datasets. While optimized for fluorescence-based MTI, its scalable design supports broad applications for MTI data normalization, enabling accurate and biologically meaningful interpretations. | https://doi.org/10.7490/f1000research.1120376.1 | |
| 66 | Margarita | Geleta | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Geleta | PM1: a foundation model fusing genotype, phenotype, and image for precision medicine | Margarita Geleta, Christophe Thomassin, Marçal Comajoan Cara, David Bonet, Benet Oriol Sabat, Daniel Mas Montserrat, Alexander G. Ioannidis | Department of Biomedical Data Science Stanford University and Department of Computer Science University of California Berkeley, Department of Biomedical Data Science Stanford University, Department of Biomedical Data Science Stanford University and Department of Computer Science University of California Berkeley, Genomics Institute University of California Santa Cruz, Department of Computer Science University of California Los Angeles, Department of Biomedical Data Science Stanford University, Department of Biomedical Data Science Stanford University and Genomics Institute University of California Santa Cruz and Department of Genetics Stanford University | Precision medicine aims to personalize disease prevention, prediction, and diagnosis by leveraging genomic patient data. Although patient genomes provide valuable predictive insight, they cannot capture the full complexity of an individual’s health. Integrating genomics with additional patient data modalities, such as clinical phenotypes and medical imaging, enables more accurate and comprehensive disease modeling. We introduce PM1, a multimodal foundation model trained on genomic data from 438,668 individuals linked to 3,421 clinical and lifestyle traits and 211,416 retinal fundus photographs drawn from the UK Biobank and EyePACS cohorts. PM1 couples modality-specific encoders with a transformer encoder trained with an information noise-contrastive estimation objective that fuses modalities into a joint latent space, plus generative modality decoders for cross-modal reconstruction and synthesis. A token-level masking schedule lets PM1 use participants with any subset of modalities (in UK Biobank only ≈6% have all three), substantially expanding effective training data. Joint modeling of retinal images, clinical traits, and genomic data surpasses single-modality and multimodal baselines. PM1 enables cross-modal genotype inference, raises predictive performance for retinal diseases and systemic conditions, and supports conditioned single nucleotide polymorphism sequence and retinal image generation. As a group-level validation, a GWAS on PM1’s image-conditioned fusion embeddings recovers genome-wide significant HERC2 pigmentation variants. | N/A | pm1_a_foundation_model_fusing_genotype_phenotype_and_image_for_precision_medicine_final.pdf |
| 67 | Jici | Jiang | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Accepted proceedings paper with oral presentation | Jiang | Literature-driven extraction and computational prediction of causal statements linking genetic variants to biological processes, pathways and phenotypes | Jici Jiang, Predrag Radivojac*, Benjamin M. Gyori* | Northeastern University | Understanding the mechanistic basis of pathogenic genetic variants requires reconstructing the molecular pathways connecting the variant, via a chain of molecular intermediates, to a disease-causing biological process and phenotype. However, a literature-wide assembly of causal networks connecting variants, molecular pathways, biological processes and phenotypes has not been previously available. To create such a resource, we developed an automated pathway reconstruction approach building on the Integrated Network and Dynamical Reasoning Assembler (INDRA) system which extracts causal mechanistic statements (positive regulation, phosphorylation, complex formation, etc.) by combining structured databases and literature mining. We traversed INDRA statements extracted from publications to identify those describing a genetic variant resulting in a protein point mutation. We then reconstructed directed paths (consisting of one or more linked INDRA statements) connecting this variant to a term representing a biological process, phenotype or disease within the same publication. This resulted in a directed multigraph obtained from 25,862 paths for variants in 2,561 proteins. Each node in this graph corresponds to an ontology-grounded molecular or process term and each edge is explicitly linked to supporting literature evidence, enabling full auditability of inferred mechanisms. To leverage the assembled networks, we trained a classification model to predict likely downstream biological processes or specific disease associations for protein variants. As features to the model, we integrated molecular annotations (including protein sequence features, ClinVar pathogenicity labels, and UniProt domain mappings) in combination with representations from the ESM2 transformer-based protein language model. The performance achieved by this model shows promise for reconstructing causal mechanistic statements associated with function of genetic variants, a framing of the variant effect prediction task that goes significantly beyond simple assessment of pathogenicity. This integrative framework enables the mechanistic interpretation of known variants and prediction of functional relevance for variants lacking prior phenotypic annotation. |
https://doi.org/10.7490/f1000research.1120375.1 | |
| 68 | Jennifer | Lee | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Maldonado | Incident Suicidality after initiating Semaglutide vs. SGLT2is in U.S. Veterans with Type 2 Diabetes | Ana
Maldonado, PhD1,2,3*, Kent Heberer, PhD1,2,3*, Adam Bress, PhD4,5, Steven
Cogill, PhD1,2,3, Shriram Nallamshetty, MD1,2,6, Mei-Chung Shih, PhD1,2,3,7,
Ying Q. Chen, PhD1,2,3,8, Julie Lynch, PhD2,4,9, Jennifer S. Lee, MD, PhD,
MBA1,2 |
1VA Palo Alto Healthcare System, Palo Alto, CA; 2Veterans Affairs Cooperative Studies Program Leveraging Electronic Health Information to Advance Precision Medicine (VA LEAP) Initiative; 3VA Palo Alto Cooperative Studies Coordinating Center, Palo Alto, CA; 4VA Salt Lake City Healthcare System, Salt Lake City, UT; 5Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, Utah; 6Division of Cardiovascular Medicine, Stanford School of Medicine, Stanford, CA; 7Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA; 8Stanford Prevention Research Center, Stanford, CA; 9Division of Epidemiology, University of Utah School of Medicine, Salt Lake City, UT. *co-first authorship |
Importance:
Glucagon-like peptide-1 receptor agonists (GLP-1RAs) are effective
medications for type 2 diabetes (T2D). Earlier studies found an increased
risk of suicidality with GLP-1RA use, while more recent studies have found no
link. For U.S. veterans with T2D, robust research is crucial given that
veterans have an increased risk of suicidality. Objective: To emulate a target trial evaluating the risk of suicidality (ideation and attempts/completions) associated with initiation of semaglutide, a GLP-1RA, compared with a sodium-glucose cotransporter-2 inhibitor (SGLT2i) as second-line therapy for T2D in a nationwide cohort of U.S. Veterans. Design: Active-comparator, new-user, target trial emulation. Marginal cause-specific hazard ratios (HRs) were estimated using overlap weighting to account for confounding. Setting: The Veterans Health Administration (VHA) nationwide health care system between March 1, 2018 and March 1, 2025. Participants: U.S. Veterans with T2D using metformin and no prior exposure to GLP-1RAs or SGLT2is. Exposure: Initiation of semaglutide or SGLT2is. Outcomes: 1) incident suicidal ideation, identified from ICD-10 codes or from clinician-administered or self-report questionnaires (e.g., CSSRS, PHQ9); and 2) incident suicide attempts/completions. Results: A total of 102,361 Veterans met inclusion criteria (semaglutide: 11,478; SGLT2i: 90,883). Baseline characteristics were well balanced between treatment groups after overlap weighting (mean [SD] age, 60.1 [11.7] years; BMI, 37.8 [6.8] kg/m2; A1c, 7.0% [1.4]; 85.5% male; 61.9% non-Hispanic White; 20.7% Black; 8.1% Hispanic). Over a median follow-up of 2.1 years, patients who initiated semaglutide were not at a higher risk than patients who initiated SGLT2i for suicidal ideation (HR, 1.02; 95% CI, 0.96–1.08; P=0.6) or suicide attempts/completions (HR, 1.08; 95% CI, 0.86–1.36; P=0.5). Results were similar when stratified by history of suicidality. Conclusion: U.S. Veterans with T2D who initiated semaglutide were not observed to have a higher risk of suicidality compared to those who initiated SGLT2i, regardless of their history of suicidality. |
N/A | psb_glp1_si.pdf |
| 69 | Samantha | Piekos | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Piekos | Deep Phenotyping and Multimodal Data Integration for Early Prediction of Pregnancy Complications | Samantha Piekos, Oren Barak, Nathan Price, Leroy Hood, Yoel Sadovsky | Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA USA; Department of Obstetrics and Gynecology, Kaplan Medical Center, Rehovot, Israel, affiliated with the Hebrew University and Hadassah School of Medicine, Jerusalem, Israel; Buck Institute for Research on Aging, 8001 Redwood Blvd, Novato, CA USA; Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA USA; Pediatrics, Stanford University, 453 Quarry Rd, Stanford, CA USA | Pregnancy
complications, including hypertensive disorders of pregnancy (HDP), fetal
growth restriction (FGR), and spontaneous preterm birth (sPTB), are leading
causes of maternal and neonatal morbidity and mortality. Current clinical
diagnosis commonly occurs after opportunities for early intervention have
passed. A systems biology approach integrating longitudinal, multimodal data
is needed to detect the earliest deviations from healthy pregnancy
trajectories for timely personalized interventions. We assembled comprehensive deep phenotyping datasets tracking 347 women from their first prenatal appointment through delivery: 105 (30.3%) developed pregnancy complications. Approximately 20% were high-risk (chronic hypertension, gestational diabetes, or prior pregnancy complications). All study data are linked to electronic health records, providing extensive clinical features and medical history. Vital signs and weight were recorded at each visit. Questionnaires evaluated stress, depression, nausea, and diet each trimester. Environmental exposures were captured via home air pollution sensors, and Fitbits tracked activity, heart rate, and sleep. Blood and urine samples were collected 8-10 weeks and placenta upon delivery for multiomics (lipidomics, metabolomics, proteomics, and transcriptomics). Fecal and vaginal samples were collected every four weeks for 16S microbiome sequencing. We performed multiomic profiling on 133 participants with the most complete temporally paired samples (40 HDP, 20 FGR, 4 FGR+HDP, 13 sPTB, 56 controls). Of those with complications, only 32 (41.6%) were classified as high-risk due to preconception risk factors. This unique dataset enables data-driven methods to model normal system-level pregnancy dynamics, identify complication-associated deviations, and pinpoint the earliest individualized signs of pathology. We are constructing interomic networks to characterize temporal and cross-sectional changes between healthy and complicated pregnancies. Using late-fusion multimodal machine learning, we will identify the earliest points of deviation from healthy trajectories for individual patients. This work lays the foundation for early, multimodal diagnostic and prognostic tools in obstetrics, and the approach is broadly applicable to other complex diseases. |
N/A | 260103_psb_poster_dp3_study.pdf |
| 70 | Jubair Ibn Malik | Rifat | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Accepted proceedings paper with oral presentation | Rifat | BioLM-NET: an interpretable deep learning model combining prior biological knowledge and contextual LLM gene embeddings on multi-omics data to predict disease | Jubair Ibn Malik Rifat, Thasina Tabashum, Md Marufi Rahman, Md Farhad Mokter, Sarthak Engala, Serdar Bozdag | Department
of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Computer Science & Engineering, University of North Texas, Department of Mathematics, University of North Texas Center for Computational Life Sciences, University of North Texas |
Biologically informed deep neural networks, which connect the input layer to hidden layers based on gene-pathway relationships, have gained popularity in recent years. However, most existing methods do not incorporate protein-protein interactions (PPI) and protein-DNA interactions (PDI) in their designs. In this study, we introduce BioLM-NET, a deep learning-based framework that fuses single-cell or bulk gene expression data and DNA methylation data with prior biological knowledge, including PPI and PDI. BioLM-NET also aggregates latent representations of omics signals at the pathway level through an attention-based pathway layer, in which a pre-trained large language model (LLM) is incorporated to generate context-specific gene embeddings. We evaluated BioLM-NET on single-cell colorectal cancer data from the scTrioseq2 platform to predict primary and metastatic cancer cells, on TCGA-BRCA, TCGA-GBM, and TCGA-COAD to predict cancer subtypes, and on ROSMAP data to predict Alzheimer’s disease patients. Our results showed that BioLM-NET outperformed baseline and state-of-the-art (SOTA) methods, P-NET and PASNet, with statistical significance on scTrioseq2, TCGA-COAD, and ROSMAP data, and tied with SVM and a dense neural network on TCGA-BRCA data. Our ablation studies demonstrated the importance of incorporating PPI and PDI data and the attention-based pathway layer. We also interpreted our models and found that the important input features are significantly enriched in GO terms and KEGG pathways and can serve as potential biomarkers or therapeutic targets for the corresponding disease. | https://doi.org/10.7490/f1000research.1120294.1 | |
| 71 | Yangxi | Yu | Precision Medicine: Integrating large scale data and intermediate phenotypes for understanding health and treating disease | Poster only | Yu | Construction of the Largest Healthy Human PBMC Atlas | Yangxi Yu, Wanjun Gu | Yangxi
Yu, State Key Lab of Bioelectronics-Southeast University Wanjun Gu, School of Artificial Intelligence and Information Technology-Nanjing University of Chinese Medicine |
Introduction: With the rapid expansion of single-cell RNA-seq data, an atlas for rapid reference and annotation is in demand, especially for peripheral blood, which is easy to acquire and relatively stable to analyze. Several atlases have been published, but either their cell count or their heterogeneity is insufficient for a comprehensive reference. Results: Here we present a newly constructed PBMC atlas for healthy humans, built by integrating single-cell RNA-seq data from 171 datasets and 3,076 samples. Corresponding phenotype metadata, including age, gender, race, and region, were also collected; each was available for over 2,000 samples. The upstream data QC and integration process is largely based on the scanpy framework, with some adjustments. In the core step, we manually annotated 68 cell subtypes from 1,417,158 cells at the highest annotation level. In the extension step, we mapped the dataset through the scArches framework and identified additional cell types that were not discovered in the core step, expanding the atlas to over 70 cell types. The final atlas includes 10,247,764 PBMCs annotated at three annotation levels. Notably, the label-transfer process applies a new artificial neural network annotation method that we developed, which matches the annotation quality of the wKNN method in the current scArches framework while running much faster. Another innovation is in the core annotation: we split the dataset by the expression level of ribosomal genes, temporarily excluded high-expression cells, and applied an extra transfer-learning step to bring them back. Conclusion: This work constructs the largest comprehensive atlas to date for healthy humans, which also covers the widest data variation currently available. In addition, the newly developed classifier would be a useful supplement to the current scArches framework. |
https://doi.org/10.7490/f1000research.1120374.1 | construction_of_the_largest_healthy_human_pbmc_atlas.pdf |
| 72 | Brendan | Ball | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Ball | Mouse lymph node colonization models predict human tumor metastasis in melanoma | Brendan K. Ball, Andrew J. Gentles | 1.
Department of Pathology, Stanford University School of Medicine, Stanford
CA 2. Department of Biomedical Data Science, Stanford University, Stanford CA |
Aggressive tumor progression and distant metastases of cancers such as skin cutaneous melanoma are thought to be initiated by lymph node (LN) involvement. Previous mouse studies have selectively enriched for tumor cells with enhanced LN migratory capacity but have not been directly translated to human outcomes. As a result, the involvement of LN colonization in distant metastasis remains unclear. To translate findings from mouse models to humans, we used a computational framework termed Translatable Components Regression (TransComp-R) to synthesize mouse tumor samples from the LN with human primary and metastatic tumors. Using TransComp-R, we identified a principal component encoding transcriptomic variation of early- and late-stage mouse tumor generations that stratified primary and metastatic tumors from human melanoma patients. We identified enriched biological pathways associated with the cell cycle and G2/M checkpoint. To reveal potential cell types involved in tumor LN colonization, we applied CIBERSORTx, a deconvolution approach to infer cell type proportions. Notably, we identified differentially abundant CD4 memory activated and follicular helper T cells across human metastatic and primary tumor samples. In both cell types, we identified a panel of genes driving LN colonization that were also differentially expressed across metastatic and primary tumors in humans. These cells and genes may contribute to the tumor-immune microenvironment that facilitates LN colonization in cases of melanoma. Overall, our synthesis of pre-clinical models with human data reveals potential biological pathways and transcriptomic signatures for future investigation. | N/A | psb_poster_brendanball.pdf |
| 73 | Jungyeon | Kim | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Kim | Multi-Omics Analysis and Metabolic Engineering to Overcome Metabolic Bottlenecks of Escherichia coli Nissle 1917 in the Gut | Jungyeon Kim | Graduate
School of International Agricultural Technology, Seoul National University,
Gangwon-do, Pyeongchang-gun 25354, Republic of Korea. Institute of Food Industrialization, Institutes of Green Bioscience and Technology, Seoul National University, Gangwon-do 25354, Republic of Korea. |
Escherichia
coli Nissle 1917 (EcN) is a clinically established probiotic used for
gastrointestinal disorders and has recently been engineered to deliver
therapeutic proteins for targeted intestinal treatments. However, its complex
metabolic behavior in the gut environment complicates predictions of
colonization dynamics and therapeutic efficacy. Using in silico metabolic
modeling and controlled fermentation profiling, we identified EcN’s ability
to metabolize mucin, highlighting its active interaction with host intestinal
mucus. Comprehensive multi-omics analyses revealed that fucose utilization
enhances intestinal colonization by promoting flagellar biosynthesis and
nutrient uptake pathways. Additionally, excessive intracellular trehalose
accumulation, linked to galactose metabolism via the otsAB pathway, was
identified as a critical metabolic bottleneck impairing bacterial growth and
metabolic efficiency. To alleviate this bottleneck, we constructed a ΔotsAB
mutant strain lacking trehalose synthesis. This engineered strain
demonstrated a 1.47-fold improvement in galactose metabolism and exhibited
enhanced growth under gut-mimicking conditions. These findings provide a
systems-level understanding of EcN’s intestinal metabolic adaptations and
establish a rational framework for improving its colonization and therapeutic
performance through targeted metabolic engineering. |
N/A | 2026_psb_jungyeonkim.pdf |
| 74 | Kord | Kober | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Kober | Alternative Splicing And Global Transcriptome Changes Associated With LPS Stimulation In Human Peripheral Blood Mononuclear Cells | Esther Chavez-Iglesias, Anatol Sucher, Nidhi Thati, Julia Trudeau, Sue Yom, Marina Sirota, Adam Olshen, Nam Woo Cho, Kord Kober | UCSF School of Nursing, UCSF School of Nursing, UCSF School of Nursing, UC Irvine School of Pharmacy and Pharmaceutical Sciences, UCSF School of Medicine, UCSF School of Medicine, UCSF School of Medicine, UCSF School of Medicine, UCSF School of Nursing | Lipopolysaccharide (LPS), a key component of gram-negative bacterial cell walls, is a potent activator of the innate immune system and widely used as a model to study inflammatory responses. While the transcriptional response to LPS stimulation has been characterized, the role of alternative splicing (AS) in modulating this response remains largely unexplored. This study aims to comprehensively characterize the transcriptomic effects of LPS by utilizing deep RNA sequencing to evaluate both differential gene expression and alternative splicing associated with LPS treatment. Additionally, it seeks to uncover underlying mechanisms of immune regulation to identify potential therapeutic targets. | https://doi.org/10.7490/f1000research.1120403.1 | |
| 75 | Hanbi | Lee | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Lee | Comprehensive transcriptomic analysis identifies Lrg1 as a potential therapeutic target for preventing muscle atrophy in cancer cachexia | Hanbi Lee, Aeyung Kim, Kyuwon Son, Ahyoung Choi, Seongwon Cha, Hyunjin Shin, No Soo Kim, and Haeseung Lee | MOGAM
Institute for Biomedical Research in Seoul 06730 Republic of Korea, Korean Medicine Application Center of the Korea Institute of Oriental Medicine in Daegu 41062 Republic of Korea, College of Pharmacy and the Research Institute for Drug Development of Pusan National University in Busan 46241 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, KM Data Division of the Korea Institute of Oriental Medicine in Daejeon 34054 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, KM Convergence Research Division of the Korea Institute of Oriental Medicine in Daejeon 34054 Republic of Korea, College of Pharmacy and the Research Institute for Drug Development of Pusan National University in Busan 46241 Republic of Korea |
Cancer cachexia is a debilitating syndrome characterized by progressive skeletal muscle wasting and systemic inflammation, primarily observed in patients with advanced-stage cancer. Cachexia severely impairs patients' quality of life and increases mortality; however, effective therapeutic interventions remain elusive. To identify key mediators of muscle atrophy, we integrated more than one hundred bulk and single-cell transcriptomic datasets from diverse murine cachexia models, including colorectal, lung, and pancreatic cancer. This analysis identified leucine-rich alpha-2-glycoprotein 1 (Lrg1) as consistently upregulated in skeletal muscle endothelial cells across cachexia models, with expression increasing as disease progressed. Functional studies demonstrated that recombinant Lrg1 induced myotube atrophy in vitro, accompanied by reduced fusion index, shortened myotube length, and increased expression of the atrogenes MAFbx and MuRF1. Neutralization of Lrg1 or pharmacological inhibition of Stat3 prevented these effects. Our findings nominate Lrg1 as a candidate biomarker and potential therapeutic target for preventing skeletal muscle wasting in cancer cachexia. | N/A | |
| 76 | Robert | Modlin | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Modlin | Decoding early human TB lesions with multimodal biocomputing identifies a permissive alveolar niche for Mycobacterium tuberculosis | Robert L. Modlin | UCLA | The
earliest stages of tuberculosis (TB) pathogenesis remain poorly
characterized, in part due to the difficulty of resolving heterogeneous human
lung lesions. To address this, we developed a multimodal biocomputing
framework that integrates spatial transcriptomics, single-cell RNA
sequencing, and high-resolution tissue imaging from human lung biopsies.
Using advanced computational pipelines for cell state alignment, spatial
mapping, and transcript–pathogen co-localization, we decoded the organization
of early pulmonary lesions. These analyses revealed alveolar regions filled
with lipid-associated macrophages, harboring abundant Mycobacterium
tuberculosis antigen and mRNA but lacking robust antimicrobial or lymphocyte
responses. By contrast, granuloma cores exhibited enriched immune activation
and lymphocyte recruitment. Biocomputational reconstruction of disease
trajectories highlighted a transition from permissive alveolar pneumonia to
organized granulomas, identifying an underappreciated early niche that
supports bacterial persistence. This study demonstrates how integrative
biocomputing can resolve hidden compartments of infectious disease and inform
opportunities for early intervention and prevention of transmission. |
https://doi.org/10.7490/f1000research.1120414.1 | modlin_poster_psb_reduced_suze.pdf |
| 77 | YounChul | Ryu | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Lee | Network-based Multi-omics Integration Workflow Linking Transcriptomics and Phenotypic Traits in Hanwoo cattle under methane-mitigating feeding strategies | Sanghoon Lee, KyoungBo Ko, GwangHeun Kim, Jong-Eun Park, YounChul Ryu | Department of Animal Biotechnology, College of Applied Life Sciences, Jeju National University, Jeju, Republic of Korea. | Linking
high-dimensional omics data to measurable traits remains a major challenge in
livestock biology. We introduce a network-based multi-omics integration
workflow that connects muscle transcriptome profiles with carcass and
meat-quality traits in a modular and reproducible manner. The workflow
proceeds through four stages: (i) RNA-seq preprocessing and normalization;
(ii) differential expression and clustering; (iii) pathway-level functional
enrichment combined with network topology analysis; and (iv) a trait-guided
layer that integrates machine-learning–based feature prioritization with
rank-based correlation analysis. Instead of focusing on single-gene effects,
the framework emphasizes network-level interpretation and trait context,
improving biological interpretability and reducing noise from marginal
statistics. When applied to a dataset generated under sustainable feeding strategies, transcriptomic variation alone showed limited separation, yet network-level modeling revealed coherent gene communities associated with metabolic and regulatory adaptation. Incorporating trait information enabled prioritization of modules linked to measurable phenotypes, providing insight into how nutritional intervention reshapes regulatory architecture. Rather than reporting a list of differentially expressed genes, this framework offers a practical route to convert omics observations into biological hypotheses. By revealing regulatory modules associated with phenotypic variation, it supports downstream functional validation and decision-making. The conceptual design can be adapted to other omics contexts where molecular data need to be anchored to traits, making the approach applicable across agricultural, biomedical, and environmental research domains. |
N/A | psb_2026_poster.pdf |
| 78 | Sun | Park | Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms | Poster only | Park | Computational reanalysis identifies intestinal and metabolic axes linking gut microbiota to Chronic Spontaneous Urticaria | Sun Park, Jiseon Yang, Jin-Young Yang | Mucosal Immunology Laboratory, Department of Integrated Biological Science, Pusan National University, Busan 46241, Republic of Korea; Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, 85281, USA; Department of Biological Science, Pusan National University, Busan, 46241, Republic of Korea | Chronic spontaneous urticaria (CSU) is a persistent inflammatory skin disease characterized by immune dysregulation and altered metabolic signaling. Growing evidence suggests that the gut microbiota contributes to CSU through immune modulation and production of bioactive metabolites. However, previous microbiome studies have reported inconsistent findings regarding which bacterial taxa and metabolic pathways are involved, mainly due to variations in sequencing pipelines, analytical tools, and reference databases. To address these discrepancies, we reanalyzed raw 16S rRNA gene sequencing data from three independent CSU cohorts using a standardized pipeline. We conducted differential abundance analyses with MaAsLin2, Limma-voom, and Wilcoxon tests, which helped identify microbial taxa that were consistently different between CSU and control groups. Functional potential was predicted with PICRUSt2, followed by contribution and correlation analyses linking taxa to KEGG orthologs (KOs). Across all datasets, we found 82 taxa with significant differences between groups. Among those frequently described in earlier reports, Pseudomonadota, Enterobacteriaceae, and Dialister were enriched in CSU, while Megamonas was consistently reduced. The functional analysis showed 212 KOs that varied between groups, with 121 of these linked to the four taxa. 
Important features included K03399, which appeared in several taxa, and pathways related to membrane transport, cofactor production, stress response, and DNA repair. These functions reflect how the microbes adapt to inflammatory conditions. In contrast, functions associated with Megamonas suggested a loss of metabolic balance and a decrease in anti-inflammatory capability. This unified computational framework outlines two main microbial pathways: one that supports intestinal environmental adaptation and another that influences inflammatory metabolism. Together, these pathways offer a clear model linking microbial function to ongoing immune activation in CSU. | https://doi.org/10.7490/f1000research.1120366.1 | psb_poster_f.pdf |
| 79 | Audrey | Lacy | Workshop: Advances of AI methods in single cell spatial omics | Accepted workshop abstract | Lacy | CELLestial: A scalable end-to-end spatial proteomics analysis framework | Audrey Lacy, Andressa Dias Costa, Carla Rujana, Emma Coleman, Muhammad Shaban, Faisal Mahmood, Brian Wolpin, Jonathan Nowak, Simona Cristea | Department of Data Science Dana-Farber Cancer Institute, Department of Medical Oncology Dana-Farber Cancer Institute and Harvard Medical School, Department of Pathology Brigham and Women’s Hospital and Harvard Medical School, Department of Pathology Massachusetts General Hospital and Harvard Medical School, Broad Institute of Harvard and MIT, Harvard Data Science Initiative, Harvard T.H. Chan School of Public Health | Spatial proteomics provides a rich source of data generation allowing scientists to visualize and quantify high-plex protein assays, yet analytical bottlenecks such as cellular phenotyping remain. Currently, cells are phenotyped via subjective visual gating, user-guided machine learning (ML) algorithms, or manual cluster validation and annotation. These approaches are labor-intensive and struggle to generalize to new cohorts where the data is out of distribution. Here, we propose CELLestial, an efficient, generalizable spatial proteomics workflow that uses the probability distributions of the pixel intensities within the spatial proteomics images to generate per-sample, per-marker pixel intensity thresholds. CELLestial further phenotypes cells, produces an assortment of autogenerated marker-level and phenotype-level QC figures, and allows for the data to be projected onto the spatial proteomics image in a custom Napari user-interface for visualization, verification, and further annotation of cell identities. 
As a zero-shot method, CELLestial consistently achieves the highest precision when considering biologically significant cell types and either outperforms or performs comparably to state-of-the-art methods in benchmarking, with the additional benefit of being fully automated and processing approximately a quarter-million cells into a ready-to-analyze AnnData object every 30 minutes. As a companion tool for vision transformer spatial proteomics foundation models, fine-tuning with CELLestial annotations improves cell type assignment compared to zero-shot model performance, thus demonstrating CELLestial's utility for generating a subset of cell type labels to be used in the supervised fine-tuning of foundation models. | https://drive.google.com/file/d/1Ggesunr5mX9Uk2dUIvet5U73ZtEc76wv/view?usp=sharing | cellestial_psb2026_poster.pdf |
| 80 | Hyeonggyu | Choi | Workshop: AI for Health: Leveraging Artificial Intelligence to Revolutionize Healthcare | Poster only | Choi | Sequence-Based PK Prediction Models for Monoclonal Antibody Formulation Optimization: Leveraging FDA/EMA-Approved Therapeutic Data and Protein Language Models | Hyeonggyu Choi, Hyoyoung Kim, Gyuseong Lee, Daekeun Park | CHA University, Tech University of Korea, CHA University, CHA University | Converting monoclonal antibodies (mAbs) to subcutaneous (SC) formulations is challenging due to the difficulty and risk in predicting key pharmacokinetic (PK) parameters such as bioavailability (F), clearance (CL), and volume of distribution (Vd). Existing computational models typically predict only a single PK parameter, most commonly the absorption rate constant (ka) or F, thereby limiting their usefulness for comprehensive SC formulation risk assessment. In this study, we developed a single-task, sequence-based machine learning framework capable of predicting all three regulatory PK metrics (F, CL, and Vd) independently from an antibody’s amino acid sequence. We curated a dataset of 482 FDA/EMA-approved therapeutic mAbs and retrieved their heavy/light chain sequences from the Thera-SAbDab database. Each sequence was embedded using the ESM-2 protein language model, followed by SelectKBest feature selection, principal component analysis (PCA), and Random Forest classification. To further improve robustness, we incorporated a voting ensemble combining multiple independently trained models. The final framework achieved strong predictive performance, with AUROC values of 0.818 for F, 0.833 for CL, and 0.869 for Vd, consistently outperforming baseline models across validation folds. These findings demonstrate that regulatory-grade clinical PK parameters can be meaningfully inferred from sequence-derived representations alone, providing an effective in silico tool to de-risk SC formulation development. Keywords: Monoclonal antibodies, Pharmacokinetic prediction, Subcutaneous formulation, Protein language models (PLMs), ESM-2, Bioavailability (F), Clearance (CL), Volume of distribution (Vd), Voting ensemble, Machine learning (Random Forest) | N/A | poster__sequencebased_pk_prediction_models_for_monoclonal_antibody_formulation_optimization_leveraging_fda_emaapproved_therapeutic_data_and_protein_language_models.pdf |
| 81 | Ahyoung | Choi | Workshop: AI for Health: Leveraging Artificial Intelligence to Revolutionize Healthcare | Poster only | Choi | Mechanistic Roles of FoxO and Circadian Regulation in Temporal Transcriptional Dynamics during Dexamethasone-Induced Muscle Atrophy | Ahyoung Choi, Hanbi Lee, Bum Suk Kim, No Soo Kim, Aeyung Kim, Yoomi Baek, Haeseung Lee, Seongwon Cha, Hyunjin Shin | MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, KM Convergence Research Division of the Korea Institute of Oriental Medicine in Daejeon 34054 Republic of Korea, Korean Medicine Application Center of the Korea Institute of Oriental Medicine in Daegu 41062 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea, College of Pharmacy and the Research Institute for Drug Development of Pusan National University in Busan 46241 Republic of Korea, KM Data Division of the Korea Institute of Oriental Medicine in Daejeon 34054 Republic of Korea, MOGAM Institute for Biomedical Research in Seoul 06730 Republic of Korea | Muscle atrophy induced by glucocorticoids represents a major clinical challenge. However, the temporal and molecular mechanisms underlying dexamethasone (DEX)-driven muscle wasting remain incompletely understood. Here, we investigated the dynamic transcriptional landscape of DEX-induced muscle atrophy in the tibialis anterior by integrating bulk and single-nucleus RNA-sequencing (snRNA-seq) data. Weighted gene co-expression network analysis (WGCNA) identified distinct gene modules that delineate early and late molecular responses to DEX. The early-activated module was enriched for components of the FoxO signaling pathway, highlighting accelerated protein degradation as an initiating event in the muscle wasting process. In contrast, the late-activated module exhibited strong enrichment for pathways related to circadian rhythm, suggesting a progressive disruption of molecular clock function during sustained glucocorticoid exposure. Among core circadian rhythm genes, we experimentally confirmed a reduction in phosphorylation of Bmal1 at serine 42, a key regulatory modification required for protein synthesis. Collectively, our integrative analysis uncovers early activation of the FoxO pathway and subsequent circadian dysfunction as central mechanisms in dexamethasone-induced muscle degradation, thereby providing mechanistic insights into the development of effective therapeutic strategies. | N/A | psb_poster__final.pdf |
| 82 | Gyuseong | Lee | Workshop: Applications of AI & ML in Biomanufacturing of Cell and Gene Therapies | Poster only | Lee | Construction and Pattern Analysis of mAb Formulation Dataset Based on EMA and FDA Public Documents | Gyuseong Lee, Daekeun Park, Hyeonggyu Choi, Hyoyoung Kim | CHA University, CHA University, CHA University, Tech University of Korea | Subcutaneous (SC) high-concentration monoclonal antibody (mAb) formulations suffer from an excessively large formulation design space and an inflated number of Design of Experiments (DoE) scenarios. Previous reviews have described trends in pH and excipients for marketed mAb formulations, but a machine-readable, quantitative database covering all EMA- and FDA-approved products is lacking. In this study, we used a generative AI-based pipeline to extract formulation and concentration information from 483 EMA/FDA mAb products and 6,844 mAb study-related ClinicalTrials records, and integrated them into a standardized schema. Using this dataset, we quantified concentration, pH, and the use patterns of buffers, stabilizers, and surfactants in intravenous (IV) and SC formulations. SC formulations exhibited approximately five-fold higher protein concentrations, lower pH, and smaller dose volumes than IV formulations. We further delineated practical design ranges, including pH 5.5-6.5, protein concentration 50-150 mg/mL, and excipient combinations centered on histidine/acetate buffers with polyols and polysorbates. This mAb formulation dataset provides a practical design reference that can substantially reduce the number of in-silico formulation experiment scenarios required for IV to SC transitions and for narrowing DoE design spaces. Keywords: Monoclonal antibody, mAb, Subcutaneous, Intravenous, Formulation, Design of Experiments, Generative AI, EMA, FDA, ClinicalTrials, in-silico | N/A | poster__construction_and_pattern_analysis_of_mab_formulation_dataset_based_on_ema_and_fda_public_documents.pdf |
| 83 | Florencia | Martino | Workshop: Trust, Reproducibility, and Progress: The Role of Benchmarking in Computational Biology | Poster only | Martino | Benchmarking Rao’s Q as a Reproducible, Quantitative, Evolution-Aware Metric of Viral α-diversity for Metagenomic Data | Florencia Martino, Kakhangchung Panmei, Dylan Duchen, David L. Thomas, Abraham J. Kandathil, Steven J. Clipman | 1. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America 2. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America 3. Department of Pathology, Yale School of Medicine, New Haven, Connecticut, United States of America 4. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America 5. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America 6. Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America | Within-sample viral diversity in metagenomic viromes can serve as a bioindicator of host immune state and infection risk. Traditional indices (Shannon, Simpson, Hill numbers) capture heterogeneity and abundance but ignore evolutionary distance, while phylogenetic indices such as Faith’s PD disregard relative abundance, masking differences between near-clonal expansions and distinct lineages. We present a reproducible benchmarking framework based on Rao’s quadratic diversity (Rao’s Q) as a phylogenetically informed α-diversity metric for viromes, characterizing behavior, reproducibility, and interpretability in clinical metagenomic samples. Using controlled plasma viromes with a constant background and serial dilutions of bacteriophage M13mp18 (1.4 × 10² – 4.3 × 10⁶ copies/mL), Rao’s Q increased linearly with lineage depletion (slope: 0.003 per log₁₀ copies/mL; 95% CI: -0.0003 – 0.0056). In silico perturbations confirmed that ΔRao’s Q scales with phylogenetic distance (R² = 0.98) and is robust to read cross-mapping and tree-collapsing operations. Across seven Oxford Nanopore runs of Anelloviridae, Rao’s Q showed high reproducibility (within-run CV = 0.026; between-run CV = 0.118; median ρ = 0.70), defining empirical thresholds serving as quantitative performance benchmarks for evolutionary diversity metrics. Two negative-control dilution series, spanning a 10⁴-fold range in input, confirmed bounded variability (CV ≤ 0.05). In patient plasma viromes (n = 92), Rao’s Q ranged from 0.00 – 0.30 (median = 0.17), exceeding technical variability and revealing evolutionary dispersion undetected by abundance-only indices. By integrating phylogeny and abundance, Rao’s Q quantifies α-diversity as evolutionary dispersion. In viral metagenomes, where closely related genomes generate extensive read cross-mapping and inflate apparent homogeneity, this framework distinguishes evolutionary heterogeneity from mapping artifacts. Rao’s Q provides a quantitative and reproducible measure of within-sample diversity that remains stable across sequencing runs and is sensitive to genuine lineage turnover. Beyond viromes, this distance-based formalism generalizes to other genomic systems, enabling standardized diversity comparisons across microbiomes, plasmidomes, and pangenomes. | https://github.com/MERIDAIN-Lab/raoQ-viral-diversity-PSB2026/blob/main/poster/Poster_RaoQ_PSB_Hawaii_2026.pdf |