Abstract
High content imaging (HCI) plays a pivotal role in target-directed drug discovery (TDD) by identifying compound activities across tests (or assays) designed for specific therapeutic targets. However, real-world assays often exhibit extreme label sparsity over large compound libraries, making accurate predictions challenging. Recent studies following multi-label learning (MLL) struggle in such scenarios when optimizing a single objective across multiple assays without assay-specific adaptations. To address this, we propose Mixture of Multi-Instance Learners (MoMIL), a multi-task learning (MTL) framework integrating hard-parameter sharing with assay-specific Multiple Instance Learners (MILs), enabling knowledge sharing and task-specific adaptations. Furthermore, we introduce complementary enhancements: HCI-specific foundation models (FMs), an assay selection algorithm, and a label imputation method to boost MoMIL’s learning capabilities. We benchmark MoMIL on two extensive HCI datasets, achieving up to ∼6% and ∼8% improvement over state-of-the-art MLL and MTL methods. Moreover, MoMIL shows strong generalization to unseen assays, outperforming assay-specific single-task learning (STL) methods in 11 out of 12 assays.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1746_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{PatPus_MoMIL_MICCAI2025,
author = { Pati, Pushpak and Cheng, Hsiu-Chi and Jaensch, Steffen and Abdelmoula, Walid M. and Chaitanya, Krishna and Van Dyck, Michiel and Albuquerque, Tomé and Allen, Samantha and Zhang, Litao and Mansi, Tommaso and Liao, Rui and Xu, Zhoubing},
title = { { MoMIL: Mixture of Multi-Instance Learners for Modeling Multiple Compound Activities in High Content Imaging } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {370 -- 380}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose MoMIL (Mixture of Multi-Instance Learners), a novel multi-task learning (MTL) framework designed to address the challenge of extreme label sparsity in high content imaging (HCI) data for drug discovery. MoMIL combines hard-parameter sharing for global knowledge transfer with assay-specific Multiple Instance Learning (MIL) heads to retain task-specific adaptation. It further introduces three complementary enhancements: (1) foundation models (FMs) pre-trained on HCI for better feature extraction, (2) an assay selection algorithm to improve transfer learning by identifying relevant auxiliary tasks, and (3) a label imputation method using conformal prediction for sparse label augmentation. Evaluated on large-scale datasets (U2OS and iNeuron) with over 200 assays, MoMIL outperforms several state-of-the-art MLL and MTL baselines, achieving up to 8% improvement in AUC and demonstrating generalization to unseen assays by comparison of MoMIL(STL) and the full MoMIL architecture.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Tailored Framework for Extreme Label Sparsity: By incorporating assay selection and label imputation with conformal prediction, MoMIL addresses the realistic challenge of extreme label sparsity (2–5%) in HCI-based drug discovery tasks.
- Biologically-driven Assay Selection Algorithm: The paper introduces a selection algorithm that integrates biological similarity (via STRING), assay performance, and transferability.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Lack of Biological Case Studies: Despite promising assay predictions, the paper does not demonstrate why MoMIL works biologically (e.g., identifying important FOVs for prediction).
- Uncertain Explanation of D* and D: Although the authors state that compounds in D* (used for pretraining) and D do not overlap, they do not mention whether similar compounds exist between D* and D. Did the authors investigate compound similarity between D* and D? Or consider strategies like scaffold splitting to ensure meaningful separation?
- Request for Standard Deviation in Table 2: It is commendable that the authors conducted statistical significance tests in Table 2. However, it is unclear why these tests were only performed against a specific model. Typically, comparisons are made between the best and the second-best models. To better assess the stability of the model’s performance, the authors should also report the standard deviation for each result.
- On the Construction of Unseen Tasks: The authors state that, for evaluating unseen tasks, “we selected unseen T_P^{relt} sharing the same target but different assay protocols.” However, wouldn’t it be more appropriate to construct unseen tasks based on similar targets rather than identical targets? If the target is exactly the same, differing only in assay protocols, this may not represent a truly novel task. In real-world scenarios, accurately predicting the activity of compounds on completely novel targets—which are not present during training—is a more meaningful and challenging test of generalization.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper proposes a well-structured framework with strong empirical results, it lacks essential biological validation and leaves key experimental choices underexplained. Clarifications on dataset separation, statistical reporting, and the construction of unseen tasks are necessary to fully assess the robustness and generalizability of the method.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper introduces MoMIL (Mixture of Multi-Instance Learners), a model that takes in data from high content imaging (HCI) single-concentration assays and predicts the likely outcome of multiple different assays, while ground-truth labels only exist for a small number of assays per sample. Their methodological contributions are (1) training DINO and DINOv2 on two HCI datasets and evaluating these models as feature extractors, (2) an assay selection algorithm to ensure positive knowledge transfer, and (3) an adaptive conformal-based label imputation method to generate pseudo-labels, which are used to impute missing labels in the training data. They benchmark their approach on two in-house HCI datasets (U2OS, iPSC) against four baseline methods.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is written in a clear and technical manner and well understandable.
- The pipeline is well assembled from many SOTA components (DINO/DINOv2, DSMIL) from the natural image domain.
- The assay selection algorithm is well motivated, principled, and well described in the paper.
- Generalization to unseen assays seems to work based on their few-shot performance results.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Code and data availability is not mentioned in the paper. Will the data and / or the code be made publicly available?
- More MIL methods could have been tested. That DSMIL beats mean pooling is clear, but does it beat ABMIL for this task?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The authors use a form of hard pseudo-labelling for the data imputation step (i.e. assigning positive and negative labels to wells when their predictions are above / below a certain probability threshold). This might have a negative impact on the calibration of the method. Overall, I think a calibration study of their method could have been interesting.
- Some information on the computational costs could be added. How long did you train for on what hardware at which batch size?
- The paper uses many abbreviations, and I needed to write a glossary to refresh my working memory. Fewer abbreviations would improve readability.
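The hard pseudo-labelling scheme described in the first comment above can be sketched minimally as follows. This is an illustration only: the threshold values are hypothetical, and the paper derives its cut-offs adaptively via conformal prediction rather than using fixed constants.

```python
def impute_labels(probs, pos_th=0.9, neg_th=0.1):
    """Hard pseudo-labelling: assign a label only where the model is confident.

    probs: predicted activity probabilities for unlabeled wells.
    pos_th / neg_th: illustrative confidence thresholds (hypothetical values).
    Returns 1 (active), 0 (inactive), or None (left unlabeled) per well.
    """
    labels = []
    for p in probs:
        if p >= pos_th:
            labels.append(1)
        elif p <= neg_th:
            labels.append(0)
        else:
            labels.append(None)  # uncertain prediction: not imputed
    return labels

# e.g. impute_labels([0.95, 0.5, 0.02]) yields [1, None, 0]
```

As the reviewer notes, such hard thresholding discards the model's intermediate confidence, which is why a calibration study of the resulting labels would be informative.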
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a novel pipeline for tackling multi-assay prediction with sparse labels. The proposed work seems to be a clear advancement over prior work and their generalization study yields promising results. It’s only a weak accept because of concerns regarding the availability of code and data and the therefore hindered reproducibility of their results and utility of their work to the community.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper proposes MoMIL, a novel multi-task learning (MTL) framework for predicting compound activities across multiple assays using high content imaging (HCI), especially under conditions of extreme label sparsity commonly seen in drug discovery. The key contributions are:
- Hybrid Architecture with Assay-Specific MILs: MoMIL combines hard-parameter sharing with assay-specific Multiple Instance Learning (MIL) heads, allowing shared representation learning while capturing assay-specific patterns, outperforming conventional multi-label approaches.
- HCI-Pretrained Vision Transformers: The model uses self-supervised ViT-based foundation models trained on large-scale HCI data, resulting in better morphological representations than generic ImageNet features.
- Assay Selection Algorithm: A novel assay selection method is introduced to choose relevant auxiliary assays based on biological similarity, assay performance, and knowledge transferability, improving generalization while reducing model complexity.
- Adaptive Label Imputation: The paper proposes a confidence-based label imputation strategy using conformal prediction, enabling label expansion under extreme sparsity without introducing noise.
- Extensive Validation and Generalization: MoMIL is validated on large U2OS and iNeuron datasets (>40K compounds, 200 assays), achieving up to 9.7% AUC improvement over existing methods and demonstrating strong few-shot generalization to unseen assays.
In summary, MoMIL effectively addresses the limitations of prior methods in sparse-label, multi-assay environments, offering a robust and scalable solution for HCI-based drug discovery.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper presents several major strengths. First, the novel architecture of MoMIL, combining hard-parameter sharing with assay-specific MIL heads, enables both robust knowledge sharing and fine-grained assay-level modeling—particularly valuable under extreme label sparsity. Second, the use of HCI-specific foundation models pretrained with self-supervision on large Cell Painting datasets leads to superior feature representations compared to conventional ImageNet-based models. Third, the paper proposes an interpretable assay selection algorithm leveraging biological similarity and transfer influence, which improves learning efficiency and generalization. Fourth, the adaptive label imputation based on conformal prediction addresses missing labels in a statistically grounded manner. Finally, the extensive benchmarking on large, realistic datasets with over 40,000 compounds and 200 assays, including generalization to unseen tasks, demonstrates strong empirical rigor and practical value for real-world drug discovery.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
One potential weakness of the paper is that while the overall architecture of MoMIL is effective, the individual components—such as MIL, hard-parameter sharing, and label imputation—are adaptations of existing techniques rather than fundamentally novel. For example, the use of MIL with attention mechanisms is well-established in prior works such as DSMIL [Li et al., CVPR 2021], and conformal prediction for label imputation has been previously explored [Angelopoulos & Bates, 2021]. The novelty lies more in the thoughtful integration rather than in inventing new algorithms. Additionally, although the assay selection algorithm is biologically informed and interpretable, it relies heavily on external databases (e.g., STRING) and heuristic thresholds, which could limit its adaptability to new domains or noisy biological annotations. Lastly, the method was only evaluated on in-house Cell Painting datasets, limiting external reproducibility or assessment on public benchmarks. Broader validation would further strengthen the work.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My recommendation to accept the paper was based on several compelling strengths. The authors present a well-motivated and technically sound framework, MoMIL, that effectively addresses a critical challenge in HCI-based drug discovery: extreme label sparsity across multiple assays. The integration of assay-specific MIL heads with hard-parameter sharing in a multi-task learning context is both elegant and impactful. The paper goes beyond method development by incorporating biologically grounded assay selection and a principled label imputation strategy, which together significantly enhance performance and generalization. The empirical evaluation is particularly strong, using large-scale, realistic datasets and demonstrating superior performance over multiple strong baselines, including in few-shot and unseen assay settings. These contributions, coupled with careful ablation studies and clear exposition, make the paper a valuable and timely addition to the field.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
R1: Q1: Lack of biological case studies
- MoMIL employs MIL to identify assay-relevant regions within the biological heterogeneity of HCI and aggregates them optimally via attention. The improved performance of MoMIL over traditional pooling methods, e.g., mean pooling, indicates that traditional methods fail to capture this nuanced information.
- Due to the nuanced complexity of assay prediction, driven by subtle compound-target interactions, it is not straightforward to draw definitive biological conclusions from highly attended regions. Consequently, such analyses were not included in the manuscript. These investigations are in progress and the findings will be shared in future work.
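The contrast the authors draw between attention-based MIL aggregation and mean pooling can be illustrated with a minimal sketch. The function below is a simplified, hypothetical pooling step: in practice (e.g., ABMIL/DSMIL) the attention logits come from a small learned network rather than being supplied directly.

```python
import math

def attention_pool(instances, scores):
    """Attention-weighted MIL pooling over per-FOV feature vectors.

    instances: list of feature vectors (one per field of view in a well).
    scores: unnormalized attention logits (here given directly for
            illustration; normally produced by a learned scorer).
    Returns the attention-weighted bag representation.
    """
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]  # softmax over FOVs
    dim = len(instances[0])
    return [sum(w * x[i] for w, x in zip(weights, instances))
            for i in range(dim)]
```

Mean pooling is the special case of uniform attention weights; learned, non-uniform weights let the model emphasize assay-relevant FOVs.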
Q2: Uncertain explanation of D* and D: We thank the reviewer for the observation. While D* and D share compounds with chemical similarity, this is common in practice, as FMs are typically pretrained on domain-relevant datasets to ensure generalization for related tasks. Importantly, D* and D have no direct overlap, and scaffold-level splitting was applied to D during downstream evaluation to ensure fairness. To highlight, scaffold-level splitting between D* and D while balancing sufficient class-wise annotations across multiple assays in D was not feasible due to extreme annotation sparsity.
Q3: Std and significance test: Significance tests were conducted between the best method (MoMIL + AS + LI) and the second-best public method (FM → Multi-label-MIL) based on Average AUC. Standard deviations (computed over 3 runs) were omitted due to space limits but will be included in the final version.
Q4: On the construction of unseen tasks: Thank you for this great suggestion. We evaluated generalization to unseen assays from similar targets. Specifically, for each target of interest in Table 3, we identified similar targets using target similarity scores from STRINGDB. Then we evaluated the generalizability via linear probing on all the assays for the identified targets. For brevity, we report the mean and std of “All_MTL” vs “All_STL*” AUC gains (in %) in the format: Key target: [# similar targets, # related assays, # assays with positive gain, mean, std]
- Target1: [6, 16, 11, 2.0, 3.3]
- Target2: [12, 26, 19, 1.7, 2.6]
- Target3: [2, 8, 4, 1.7, 4.0]
- Target4: [5, 12, 5, 2.3, 3.3]
- Target5: [1, 3, 2, 0.4, 0.1]
- Target6: [3, 10, 8, 1.5, 1.4]
In summary, 49/75 (65%) assays resulted in positive gains, signifying the generalizability of our framework.
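As a sanity check, the aggregate figure quoted in the rebuttal can be recomputed from the per-target tuples listed above; this is a plain tally of the reported numbers, not part of the authors' code.

```python
# Per-target tuples reported in the rebuttal, in the stated format:
# [# similar targets, # related assays, # assays with positive gain, mean, std]
results = {
    "Target1": (6, 16, 11, 2.0, 3.3),
    "Target2": (12, 26, 19, 1.7, 2.6),
    "Target3": (2, 8, 4, 1.7, 4.0),
    "Target4": (5, 12, 5, 2.3, 3.3),
    "Target5": (1, 3, 2, 0.4, 0.1),
    "Target6": (3, 10, 8, 1.5, 1.4),
}
total_assays = sum(r[1] for r in results.values())  # related assays overall
positive = sum(r[2] for r in results.values())      # assays with positive gain
print(f"{positive}/{total_assays} ({100 * positive / total_assays:.0f}%)")
# prints "49/75 (65%)", matching the rebuttal's summary
```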
R2: Q1: Code and data availability Code will be made publicly available after publication. As noted by Reviewer 1, the paper provides a clear and detailed description of the algorithm to ensure reproducibility. Unfortunately, the data cannot be shared due to proprietary constraints.
Q2: More MIL methods: We appreciate the reviewer’s suggestion. Our main contribution is the Mixture of MILs framework for advancing MTL under extreme label sparsity. The choice of MIL was a hyperparameter, and DSMIL consistently outperformed ABMIL, TransMIL, and Additive ABMIL in extensive evaluations.
R3: Q1: Novelty: We thank the reviewer for recognizing the novelty of MoMIL, which integrates established techniques to address multi-task assay modelling under extreme label sparsity. This approach achieves synergistic improvements and consistently outperforms competing baselines.
Q2: Assay selection algorithm:
- We proposed a flexible algorithm that can integrate target relation priors from any knowledge base.
- The assay selection algorithm uses a single threshold (th_perf=70%) based on uni-assay validation-set performance. It can be easily adjusted to suit the assays and datasets being analyzed, ensuring adaptability across domains.
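The thresholding step described above can be sketched as follows. This illustrates only the performance filter (th_perf); the full algorithm also weighs biological similarity (via STRING) and transferability, and the assay names and AUC values here are hypothetical.

```python
def select_assays(val_auc, th_perf=0.70):
    """Filter auxiliary assays by uni-assay validation performance.

    val_auc: mapping from assay name to validation AUC of a single-assay model.
    th_perf: performance threshold (the rebuttal cites th_perf = 70%).
    Returns the assays that pass the filter, in input order.
    """
    return [assay for assay, auc in val_auc.items() if auc >= th_perf]

# e.g. select_assays({"assay_A": 0.82, "assay_B": 0.61, "assay_C": 0.74})
# keeps assay_A and assay_C
```

Because the filter is a single tunable threshold, it can be re-tuned per dataset, which supports the authors' adaptability claim.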
Q3: In-house datasets: We acknowledge the need for broader validation. However, this field is emerging, and no public dataset currently provides both large compound diversity and multi-assay annotations. Validation will be prioritized as such datasets become available.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A