Abstract

Drug-target interaction (DTI) prediction is crucial for drug discovery, as it accelerates candidate screening and reduces development costs. However, existing computational methods are often limited to a single perspective and cannot simultaneously consider the biological information and complex associations of drugs and targets. Although multimodal data have been introduced, the complementarity and interaction of multi-source information remain underutilized, making efficient multi-view feature fusion a key challenge. In this paper, we propose a DTI prediction framework based on multi-view feature fusion and contrastive learning, named MFCL-DTI. It integrates sequence features with the structural and semantic information of a heterogeneous graph. A multi-view adaptive fusion module facilitates cross-view feature fusion, while multi-view contrastive learning enhances feature representation. Experimental results demonstrate that MFCL-DTI outperforms existing methods, validating its effectiveness in DTI prediction.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2714_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZenXia_Multiview_MICCAI2025,
        author = { Zeng, Xiaoting and Li, Li and Liang, Yu and Chen, Weilin and Lei, Baiying},
        title = { { Multiview Feature Fusion and Contrastive Learning for Drug-Target Interaction Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {390 -- 399}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper is about drug-target interaction prediction, which is an interesting topic. The authors propose a DTI prediction framework based on multiview feature fusion and contrastive learning, named MFCL-DTI. The paper is well written and well organized. However, the current version raises several concerns, and addressing them would improve the quality of the paper.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1 The research directions in this paper are popular and of interest to readers and can provide new ideas.

    2 The paper is written in a logical and easy to understand manner.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1 In the abstract section, the authors can further describe the challenges of the current research, e.g., what are all the sources of multiview data and what are the practical problems associated with not considering multiview data.

    2 More captions are recommended in figures.

    3 The computational complexity needs to be discussed or be analyzed experimentally.

    4 Are there more publicly available datasets in this research area that could be used for validation? There might be an overfitting effect of performance improvement on a single dataset.

    5 The experimental component is inadequate, especially the in-depth analysis of the results and phenomena.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results and analysis of the experiment were inadequate.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ response addressed my concerns, and I think the article can be accepted.



Review #2

  • Please describe the contribution of the paper

    This work proposes a multi-view feature fusion framework to combine multi-modal data and better exploit the relationships among them. The authors also use contrastive learning techniques to enhance the feature representation. Although the results improve, I still have many concerns regarding clarity, novelty, and implementation.

    Weakness

    1. The paper lacks a clear explanation of drug-target interaction, and the introduction of this concept is too brief. It is hard for a reader coming from another field to understand the concept.
    2. The novelty of this work is limited. The techniques used, such as adaptive fusion and contrastive learning, are no longer new.
    3. The ablation study shows that the benefit of the proposed contrastive learning module is small.
    4. The fusion between drug and target is still a simple concatenation, even though the authors claim a better-designed fusion of multi-modal data. The fusion enhancement mainly focuses on the heterogeneous graph and on each modality of the drug and the target. Can the authors explain why they designed it this way?
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    as above

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    as above

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    as above

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The authors did not convince me on the novelty and model design questions. I still think the innovation is limited. I suggest rejection.



Review #3

  • Please describe the contribution of the paper

    Proposal of an integration framework for drug-target interaction prediction that leverages both sequence and graph views, where the graph view includes both the node neighborhood view and the meta-path view. The fusion of multi-view and multi-modal data is a novel approach, especially with the use of an adaptive fusion module that integrates dynamic weighting mechanisms for different views, along with cross-attention to improve feature alignment. This aspect of the paper contributes to the novelty and innovation of the methodology.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) consideration of multiple aspects of data, including the sequence data of drugs and targets, as well as graph representation learning. The approach captures structural features from direct node relationships and semantic features through associations in the graph representation. 2) proposal of dynamic weighting mechanism which assigns adaptive fusion weights to different views (meta-path and neighborhood views) based on their contributions. 3) cross modal attention to merge meta-path, neighborhood and sequence views.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The model seems computationally expensive, especially with the use of 6-7 loss functions. 2) The methodology is evaluated using only a single dataset, which limits the ability to assess the generalizability of the approach across different datasets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Fig. 2 is not readable, as the font size is too small.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the methodology is well explained, and the fusion of the views, with consideration of their contributions and importance, is novel. The approach also has strong potential for application in other multimodal data fusion tasks, making it adaptable to a wide range of problems. However, as mentioned in the weakness section above, there are some concerns about generalizability and computational complexity.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Overall, the paper is interesting and has adequate novelty, although two issues remain: 1) the figure captions are not very informative; 2) the computational complexity needs to be discussed or analyzed experimentally. Still, I think the paper is acceptable.



Review #4

  • Please describe the contribution of the paper

    In this paper, the authors propose a drug–target interaction (DTI) prediction framework based on multi-view feature fusion and contrastive learning. The method effectively integrates drug sequence information, target (protein) sequence information, and a heterogeneous graph comprising drugs, diseases, targets, and side effects.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a well-structured design for feature extraction across multiple data types, including drug and target sequences, as well as a heterogeneous graph that links drugs, diseases, targets, and side effects.

    Within the heterogeneous graph, the authors extract neighborhood view features for nodes and meta-path view features to capture interactions among nodes.

    An adaptive fusion mechanism with learnable attention weights is introduced to effectively integrate information from different feature types.

    Contrastive learning is employed to enhance the feature fusion process, and the resulting fused features are fed into a multilayer perceptron (MLP) for final classification.

    The experimental results demonstrate strong performance compared to seven existing methods, and an ablation study is conducted to further validate the effectiveness of the proposed components.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The main innovation of this paper lies in its overall framework and motivation—applying multi-view feature fusion for drug–target interaction (DTI) prediction. However, the specific implementations, including how features are extracted from different views, how feature fusion is performed, and the use of contrastive learning, largely rely on established methods and do not introduce significant novel techniques.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although it has some weaknesses, this paper offers valuable insights into drug–target interaction (DTI) prediction through the use of multi-view features.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Overall, I believe this paper offers a novel perspective on drug–target interaction (DTI) prediction, which is its main strength. However, its weakness lies in the implementation, as the methods used, such as contrastive learning, are well established and not particularly innovative. Whether to accept this paper depends on how we weigh the strength against the weakness.




Author Feedback

Thanks for the rebuttal invitation. We itemize our responses to the significant points as follows:

1. Computational complexity (R1,R2): The computational complexity of MFCL-DTI is well controlled despite the multiple loss functions and components. First, contrastive learning uses batch-parallel computation with negative sampling to reduce memory overhead. Second, sequence and graph features are reused across losses to avoid repeated calculation, while a lightweight 1D CNN minimizes the parameter count. Third, attention mechanisms in graphs eliminate redundant computation. In addition, precomputing meta-path adjacency matrices shifts online computation to offline preprocessing. Experiments confirm that the model converges in 40 epochs on a standard GPU without memory or computation bottlenecks.

2. Dataset (R1,R2): We use a single dataset because of the current data status of multimodal DTI research: the selected dataset is the only publicly available dataset that contains multimodal information covering drug sequences, target sequences, and heterogeneous graphs. Although a theoretical risk of overfitting exists, the model’s performance gains are shown to stem from architectural improvement rather than dataset specificity, as evidenced by the regularization effect of contrastive learning and by the ablation study, which indicates independent contributions from each module. These findings show the intrinsic robustness of our model rather than bias from the dataset itself.

3. Experiment results and analysis (R2): Within the space limitations, we have analyzed the experimental results as much as possible. Given the scarcity of multimodal methods in DTI prediction, our model is compared with the main existing methods. The results include comparisons against 7 state-of-the-art methods, ablation studies, and case studies, validating the method’s effectiveness.

4. Model innovation (R3,R4): The innovation of our model lies in addressing the insufficient multiview fusion in DTI prediction. Although adaptive fusion and contrastive learning techniques are not original, we make the following advances. First, a multiview adaptive fusion module is designed to integrate semantic and structural features from the meta-path and neighborhood views, while using the sequence and graph views for cross-modal feature complementarity. Second, a contrastive learning strategy is developed that works both across modalities (sequence-graph) and within a modality. Experiments show that these innovations improve performance, and the ideas hold potential for generalization to other multimodal tasks.

5. Contrastive learning effect (R3): Ablation studies confirm that contrastive learning enhances feature representation and improves performance. The full model achieves 2% higher AUC (0.9472 vs 0.9274) and 2.7% higher AUPR (0.9512 vs 0.9239) compared to the variant without contrastive learning. Notably, removing only the sequence-graph or only the meta-path-neighborhood contrastive learning causes AUC drops of 1.5% and 1.8% respectively, with larger declines in ACC and F1. In DTI prediction, a 2% AUC improvement is considered significant, and our results not only meet this benchmark but also show the critical role of contrastive learning through the ablation study.

6. Model design (R3): In feature extraction, the MVAF module fully captures the intra-modal complementarity and the unique information of each modality, generating comprehensive drug and target features. We use concatenation for DTI feature extraction because the fused features already contain rich modal information. Direct concatenation preserves feature space consistency, avoids disrupting learned features, and allows the classifier to learn the interaction through non-linear transformation. With minimal computational cost, it reduces parameter redundancy while retaining interaction modeling capacity. This design ensures DTI feature extraction and balances efficiency with prediction accuracy.

7. Issues such as the abstract, concept explanation, and figure titles (R2,R3): The final version will be revised accordingly.
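
For readers unfamiliar with the two mechanisms discussed in the rebuttal, the following is a minimal illustrative sketch, not the authors' implementation. The rebuttal does not state the loss formula, so an InfoNCE-style contrastive loss between a sequence view and a graph view is assumed here. The function names `cosine`, `info_nce`, and `concat_pair`, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE-style loss (assumed form, not taken from the paper).

    Row i of view_a and row i of view_b are treated as a positive pair;
    every other row of view_b acts as a negative for row i of view_a.
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


def concat_pair(drug_vec, target_vec):
    """Join fused drug and target features into one classifier input."""
    return list(drug_vec) + list(target_vec)


# Toy embeddings for two entities under two views (hypothetical values).
seq_view = [[1.0, 0.0], [0.0, 1.0]]
graph_view_aligned = [[0.9, 0.1], [0.1, 0.9]]
graph_view_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(seq_view, graph_view_aligned)
loss_swapped = info_nce(seq_view, graph_view_swapped)
pair_features = concat_pair([0.2, 0.8], [0.5])
```

In this toy setup the loss is lower when the two views of the same entity agree than when they are swapped, which is the behavior a cross-view contrastive objective rewards. The concatenated vector would then be passed to a classifier such as the MLP mentioned in the reviews.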




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors propose a drug–target interaction prediction framework using multi-modal and multi-view feature fusion, integrating sequence and heterogeneous graphs. The use of meta-path and neighborhood views, adaptive fusion, and cross-modal attention is the main innovation. While concerns remain about implementation and computational cost, the method is well-justified. Post-rebuttal responses addressed key issues. I recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


