

DCU Researchers Advance Medical Image Segmentation with New Vision-Language Model Ensemble
Researchers at Dublin City University’s Faculty of Engineering and Computing (Julia Dietlmeier, Oluwabukola Grace Adegboro, Vayangi Ganepola, Dr Claudia Mazo, and Prof Noel E. O’Connor) have developed a new approach to improving medical image segmentation.
Their work, a collaboration between the Insight SFI Research Centre for Data Analytics, the Centre for Research Training in Machine Learning, and the School of Computing, has been accepted for presentation at the Medical Imaging with Deep Learning (MIDL) 2025 conference.
The team introduced an ensemble-based approach that combines CLIP-derived vision-language segmentation models (BiomedCLIPSeg and CLIPSeg) with a lightweight convolutional neural network (UNet).
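CLIPSeg and BiomedCLIPSeg are text-promptable: given an image and a short natural-language prompt, they produce a segmentation map for the described structure. As a rough illustration of how a single base model in such an ensemble is queried, the sketch below uses the publicly released CLIPSeg checkpoint on Hugging Face; the file name and prompt text are illustrative, not taken from the paper.

```python
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Load the public CLIPSeg checkpoint (one of the ensemble's base models).
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("endoscopy_frame.png").convert("RGB")  # illustrative input
inputs = processor(text=["a polyp"], images=[image], return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits   # low-resolution per-pixel mask logits

mask = torch.sigmoid(logits) > 0.5    # threshold into a binary mask
```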
Instead of focusing on text prompt engineering, the researchers applied stacking techniques to fuse the models’ outputs, yielding significant accuracy gains on several benchmark datasets. Notably, their BiomedCLIPSeg ensemble achieved a 6.3 percent improvement on the BKAI polyp dataset.
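The paper’s exact fusion design is not reproduced here, but the stacking idea can be sketched in a few lines: per-pixel logits from the base models are concatenated channel-wise and passed to a small learned meta-model. The class name, layer widths, and three-model configuration below are illustrative assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn

class StackingFusion(nn.Module):
    """Hypothetical stacking head: learns to fuse per-pixel logits
    from several base segmentation models into one mask."""
    def __init__(self, n_models: int):
        super().__init__()
        # Small convolutional meta-learner over the stacked logit maps.
        self.fuse = nn.Sequential(
            nn.Conv2d(n_models, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, base_logits: torch.Tensor) -> torch.Tensor:
        # base_logits: (batch, n_models, H, W), one channel per base model,
        # e.g. BiomedCLIPSeg, CLIPSeg, and a UNet resized to a common grid.
        return self.fuse(base_logits)   # fused (batch, 1, H, W) logit map

fusion = StackingFusion(n_models=3)
stacked = torch.randn(2, 3, 256, 256)     # placeholder base-model logits
probs = torch.sigmoid(fusion(stacked))    # per-pixel foreground probability
```

In standard stacking, such a meta-learner is trained on held-out base-model predictions, so it learns where each model tends to be right rather than simply averaging their outputs.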
Across five public datasets spanning radiology and non-radiology imaging, the ensembles outperformed the individual vision-language models in most cases.
However, performance varied across datasets, with some combinations underperforming leading models such as CRIS, suggesting new directions for future research. In key examples, the approach showed clear benefits in reducing both false positives and false negatives.
This work highlights the potential of collaborative model design to enhance medical diagnostics. With further refinement, the DCU team’s method could contribute significantly to clinical applications where segmentation precision is critical.
Their code and models are openly available, enabling continued exploration and development within the research community.