An approach to align biological datasets across modalities and biological contexts
Date : 2026/07/08 (Wed.) 11:00~12:00
Location : Room 639, Institute of Mathematics (NTU Campus)
Speaker : Lani Wu
Abstract : Data describing the same biology are often not directly comparable. Different experiments, instruments, and analysis pipelines produce measurements with different features and scales, even when studying the same underlying process. As a result, many valuable datasets sit isolated, unable to be combined for greater insight.
Twenty years ago, we introduced high-content image-based phenotypic screening (HCS) (Perlman et al., Science, 2004), showing that compounds with similar mechanisms produce similar cellular profiles, which allows the function of an unknown compound to be predicted by comparison to known "reference" compounds. HCS is now widely used, but each screen is typically custom-built, so datasets from different labs cannot be directly compared, even when they test the same reference compounds.
We developed CLIPn, a deep-learning method that aligns diverse datasets into a shared space using a small set of shared reference points, then uses that alignment to predict the identity of unknown samples by comparison across datasets. We validated CLIPn on 14 HCS datasets spanning 20 years, correctly predicting and experimentally confirming compound functions that could not be resolved in the original, isolated screens — establishing a general strategy for uniting fragmented biological data.
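As a rough illustration of the general idea of aligning datasets through shared reference points, the toy sketch below may help. It is not CLIPn and does not use deep learning: it swaps in a much simpler anchor-based technique, re-expressing each sample as its cosine similarity to the reference centroids measured in its own dataset, so that datasets with different features and scales land in one space with one coordinate per shared reference. All names (`ref1`, `mechanism_1`, `build_anchors`, `anchor_profile`, `predict`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def centroid(rows):
    """Mean feature vector of a list of equal-length samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_anchors(references):
    """Map each reference name to its centroid within one dataset."""
    return {name: centroid(rows) for name, rows in references.items()}


def anchor_profile(sample, anchors):
    """Re-express a sample as similarities to the shared references.

    Sorting the names gives every dataset the same coordinate order.
    """
    return [cosine(sample, anchors[name]) for name in sorted(anchors)]


def predict(unknown_profile, labeled_profiles):
    """Return the label whose profile is most similar to the unknown."""
    best = max(labeled_profiles, key=lambda lp: cosine(unknown_profile, lp[1]))
    return best[0]


# Hypothetical dataset A: 3 features, two shared reference compounds.
refs_a = {
    "ref1": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "ref2": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
# Hypothetical dataset B: 2 features on a different scale, same references.
refs_b = {
    "ref1": [[10.0, 2.0], [12.0, 2.0]],
    "ref2": [[2.0, 10.0], [2.0, 12.0]],
}

anchors_a = build_anchors(refs_a)
anchors_b = build_anchors(refs_b)

# Compounds with known (hypothetical) mechanisms, measured only in dataset A.
labeled_a = [
    ("mechanism_1", anchor_profile([1.0, 0.1, 0.0], anchors_a)),
    ("mechanism_2", anchor_profile([0.1, 1.0, 0.0], anchors_a)),
]

# An unknown compound measured only in dataset B.
unknown_profile = anchor_profile([9.0, 1.0], anchors_b)

prediction = predict(unknown_profile, labeled_a)
print(prediction)
```

In this toy data the unknown sample from dataset B resembles `ref1` far more than `ref2`, so it is matched to `mechanism_1`, which was only ever measured in dataset A. A learned, nonlinear alignment such as the one described in the abstract would presumably capture relationships that this fixed similarity profile cannot.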