Abstract
Digital pathology slides can serve medical practitioners or aid in computer-assisted diagnosis and treatment. Collection personnel typically employ hyperspectral microscopes to scan pathology slides into Whole Slide Images (WSIs) whose pixel counts reach the million level. However, this process incurs significant acquisition time and data storage costs. Applying super-resolution techniques to enhance low-resolution pathological images enables downstream analysis of pathological tissue slice data under low-resource, cost-effective medical conditions. Nevertheless, existing super-resolution methods lack both attention mechanisms with variable receptive fields and effective means of handling distortions and artifacts in the output, which leads to discrepancies in cell contours and tissue morphology between super-resolved and authentic images. We propose MiHATP, a Multi(Mi)-Hybrid(H) Attention(A) network based on Transformation(T)-Pool(P) contrastive learning, to address these challenges. By constructing contrastive losses from reversible image transformations and irreversible low-quality image transformations, MiHATP effectively reduces distortion in super-resolved pathological images. In addition, MiHATP employs a Multi-Hybrid Attention structure that provides strong modeling capability for both long-range and short-range information, allowing the super-resolution network to capture richer image information. Experimental results demonstrate superior performance compared to existing methods. Furthermore, we test the super-resolved images on downstream cell segmentation and phenotyping tasks, achieving performance close to that obtained with the original high-resolution images.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1260_paper.pdf
SharedIt Link: https://rdcu.be/dV5Eo
SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_47
Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1260_supp.pdf
Link to the Code Repository
https://github.com/rabberk/MiHATP.git
Link to the Dataset(s)
N/A
BibTex
@InProceedings{Xu_MiHATPA_MICCAI2024,
author = { Xu, Zhufeng and Qin, Jiaxin and Li, Chenhao and Bu, Dechao and Zhao, Yi},
title = { { MiHATP: A Multi-Hybrid Attention Super-Resolution Network for Pathological Image Based on Transformation Pool Contrastive Learning } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15007},
month = {October},
pages = {488 -- 497}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents MiHATP, a novel super-resolution method for pathological images, which includes a contrastive learning framework that constructs contrastive losses through reversible and irreversible image transformations in both image and feature space. This aims to reduce distortion in super-resolved pathological images. The proposed Multi-Hybrid Attention mechanism combines multiple attention strategies, allowing the network to adaptively acquire suitable receptive-field information.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strengths:
- The use of reversible and irreversible transformation pools for constructing positive and negative samples, respectively, for contrastive learning is a novel formulation. This aims to preserve quality and avoid introducing artifacts in super-resolution pathological images, which is an important consideration in medical imaging applications.
- The Multi-Hybrid Attention mechanism is a smart way to integrate different attention strategies, enabling the network to capture both short and long-range dependencies effectively.
- The paper demonstrates the clinical feasibility of the proposed method by evaluating its performance on downstream tasks such as cell segmentation and phenotype classification, in addition to standard super-resolution metrics.
- The evaluation is comprehensive, including comparisons with several state-of-the-art methods on multiple datasets and magnification factors, as well as ablation studies to analyze the impact of different components and hyper-parameters.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Weaknesses:
- While the authors claim that their method outperforms existing super-resolution methods, they do not provide specific references or comparisons to other contrastive learning-based super-resolution approaches.
- The paper lacks details on the computational complexity and inference time of the proposed method, which could be important considerations for practical deployment in clinical settings.
- The paper does not discuss the potential limitations or failure cases of the proposed method, which could provide valuable insights for future research.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- It would be beneficial to include a discussion on the potential limitations or failure cases of the proposed method, such as specific types of pathological images or patterns where it might not perform well.
- Explore the possibility of extending the proposed method to other medical imaging modalities or applications beyond pathological images, such as CT, MRI, or microscopy data. This could further demonstrate the generalizability and potential impact of the method.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Paper presents a well-motivated and promising approach to super-resolution for pathological images, with a strong evaluation and potential clinical impact. Addressing the weaknesses and incorporating the additional comments could further strengthen the paper and provide valuable insights for both researchers and practitioners in the field.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes a contrastive learning based method for single-image super-resolution of pathological images. It uses a novel multi-hybrid attention based network architecture to capture long-range dependencies in the input images. The network has two branches: the base branch generates the super-resolution image from the low-resolution image, and the contrastive branch generates positive samples for contrastive learning. To generate these positive samples, reversible transformations are first applied to the low-resolution image, which is then passed through the super-resolution network, and the respective inverse transforms are applied. To generate low-quality negative samples, irreversible transformations are applied to the ground-truth high-resolution image. Finally, contrastive learning is applied to these positive and negative samples in both the image space and the feature space, by passing them through a feature extraction network.
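To make this pipeline concrete, here is a minimal PyTorch sketch of the sample construction summarized above. The names `sr_model` and `feature_extractor`, and the specific transforms chosen (a 90-degree rotation as the reversible transform, average-pooling blur as a stand-in irreversible degradation), are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the sample-construction pipeline summarized above.
# `sr_model` and `feature_extractor` are assumed, hypothetical modules;
# the actual MiHATP components may differ.

def rot90(x, k):
    """Reversible transform: 90-degree rotation of an NCHW tensor."""
    return torch.rot90(x, k, dims=(-2, -1))

def make_positive(sr_model, lr, k=1):
    """Reversible transform -> super-resolve -> exact inverse transform."""
    sr_t = sr_model(rot90(lr, k))   # SR output of the transformed LR input
    return rot90(sr_t, -k)          # undo the transform in SR space

def make_negative(hr):
    """Irreversible low-quality transform of the HR ground truth
    (here a simple blur via average pooling as a stand-in degradation)."""
    return F.avg_pool2d(hr, kernel_size=3, stride=1, padding=1)

def contrastive_terms(anchor, pos, neg, feature_extractor):
    """Compare anchor/positive/negative in image space and feature space."""
    # image-space distances
    d_pos = F.l1_loss(anchor, pos)
    d_neg = F.l1_loss(anchor, neg)
    # feature-space distances through a (hypothetical) embedding network
    fa, fp, fn = map(feature_extractor, (anchor, pos, neg))
    d_pos = d_pos + F.l1_loss(fa, fp)
    d_neg = d_neg + F.l1_loss(fa, fn)
    # minimizing the ratio pulls the anchor toward positives
    # and pushes it away from negatives
    return d_pos / (d_neg + 1e-8)
```

In training, the anchor would be the base branch's super-resolved output, pulled toward the positives and pushed away from the degraded negatives.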
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper combines two strong approaches to super-resolution, attention mechanisms from vision transformers and contrastive learning, to propose a new method.
- The method proposed in the paper for generating positive and negative samples for contrastive learning is novel and well suited for pathological images.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The proposed method is not easy to follow. While individual components such as reversible and irreversible transformations, the multi-hybrid architecture, etc. have been explained, an overall picture of the method seems missing. Besides, several things are left unsaid that the reader is expected to know a priori. For instance, what are the desirable properties of positive and negative samples? What motivates the choice of transforms used in the reversible and irreversible transformation pools?
- Several concepts stated in the paper, such as RHAG, OCAB, pixel shuffle, CAB, and DAT, have not been defined and/or referenced (an illustrative sketch of two of the simpler ones follows this list).
- The method mainly derives from two preceding papers on super-resolution: HAT [5] for the hybrid attention modules, and PCL-SISR [24] for contrastive learning based super-resolution. However, neither of these methods is compared with in the experimental study in Table 1.
- The proposed method modifies the HAB module of [5] by adding a DAT block to it. However, the motivation behind this is unclear. Further, the ablation study does not discuss the impact of the additional DAT block.
- In Section 2.3, the authors claim they propose a novel contrastive loss function that works in both image and feature space. However, it is unclear how the proposed loss differs from that of [24], which also performs a summation over the layers of the embedding network.
- The details of the feature extraction network used for contrastive loss calculation are not mentioned.
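For readers unfamiliar with the undefined terms above, the sketch below shows simplified, generic versions of two of the simpler components, a channel attention block (CAB) and a pixel-shuffle upsampler, as they commonly appear in SR networks such as HAT. Class names and hyperparameters are illustrative assumptions, not the authors' code; RHAG, OCAB, and DAT are more involved attention modules defined in the cited HAT and DAT papers.

```python
import torch
import torch.nn as nn

# Hypothetical, simplified versions of two of the undefined building blocks.

class CAB(nn.Module):
    """Conv -> GELU -> Conv followed by squeeze-and-excite channel attention."""
    def __init__(self, channels, squeeze=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                   # global channel statistics
            nn.Conv2d(channels, channels // squeeze, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // squeeze, channels, 1),
            nn.Sigmoid(),                              # per-channel weights
        )

    def forward(self, x):
        y = self.body(x)
        return x + y * self.attn(y)                    # residual, channel-weighted

class PixelShuffleUpsampler(nn.Module):
    """Conv to scale^2 * channels, then rearrange channels into spatial pixels."""
    def __init__(self, channels, scale=4, out_channels=3):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),                    # (B, C*r^2, H, W) -> (B, C, rH, rW)
            nn.Conv2d(channels, out_channels, 3, padding=1),
        )

    def forward(self, x):
        return self.up(x)
```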
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Please refer to the weaknesses.
A few additional minor comments:
- Please provide an intuition behind opting for a contrastive learning based super-resolution method over others.
- Section 2.2: “However, this approach …. long-range attention information.” Provide reference here.
- Section 2.2: “alpha and beta are weights set to prevent and avoid the possible conflict”. What conflict?
- Please explain what is the baseline in the ablation study. Also, what does it mean to not use M-HAB?
- What do the numbers below the images in Fig 2 depict?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, the method seems interesting, but the paper does not provide the motivation and intuition behind several of the components of the proposed method. It also lacks an experimental comparison with the two papers it is primarily based on.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
This work introduces MiHATP (Multi-Hybrid Attention Network Based on Transformation Pool Contrastive Learning), which uses contrastive losses built from reversible and irreversible low-resolution image transformations, together with multi-hybrid attention, for super-resolution in digital pathology. The authors validate the super-resolution results (at multiple scales) on cell segmentation and phenotyping, showcasing improved image quality and model performance compared with several super-resolution methods over two datasets.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well structured and written.
- Super-resolution in medical imaging is a relevant topic with extensions beyond digital pathology. Findings and experiments in one modality can greatly benefit further applications.
- The evaluation of the suggested method includes comparisons with several other baselines, two datasets, an ablation study, and an application showcase in a downstream task.
- The gains in super-resolution image quality and similarity metrics (compared to other methods) when the network is trained on the lowest-resolution patches.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No extensive mention of the reversible and irreversible transformation pools. Which transformations were tested and selected, and was any ablation conducted for them?
- Driven by the bicubic performance on the cell segmentation and phenotyping tasks, it would be interesting to see bicubic as a method in Table 1, to assess its image quality and similarity metrics compared to the state-of-the-art super-resolution methods.
- Per class accuracies or AUC would be more informative than accuracy for the phenotype classification task.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Dear authors, thank you for your work. Few points to consider, in addition to the above:
- Please revise the text for clear sentence formulation and typos. E.g., sec. 2, par. 1: “The entire pipeline of the MiHAPT work (Fig. 1) consists of a Dual-Branch Structure and a Contrastive Process, the framework operates as follows: “; sec 2.2, par. 3: “α and β are the weights set to prevent the avoid the possible conflict”; sec. 2.3, par. 1: “In MiHATP, The total loss…”.
- You mention that for all experiments, 2000 and 100 patches were randomly selected for the training and test sets, respectively. What applies in the cell segmentation and phenotyping tasks (for all explored methods)? This task is tested on the CoNSeP test set, applying the super-resolution network trained on COAD with no further training. Is this test set selected patch-based? If yes, a patient-based split would be more clinically relevant (it is clear that this is not the paper's objective, but it would give a better sense of the method's applicability in reality).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Accept — should be accepted, independent of rebuttal (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This is a well-studied and presented work, combining some of the latest advances in attention mechanisms and contrastive learning to perform super-resolution in digital pathology images.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Author Feedback
We thank all reviewers for their constructive comments. In our approach, we propose a transformation-pool-based contrastive learning super-resolution framework to effectively improve the model's super-resolution performance. Within the scope permitted by the response rules, we clarify the reviewers' main questions as follows.
Computational complexity [r1]: Although both branch models are required during training, at inference time we only need the base branch that has been strengthened by contrastive learning. Under our experimental conditions, the pre-processing of pathological data reaches an average speed on the order of 10^-2 seconds per patch (with some variation across devices), which is entirely feasible in real application settings.
About the transformation pool [r3]: We apologize for the insufficient explanation in the previous version. In our settings, we selected rotations at different angles and image flipping for the reversible transformation pool, and Gaussian filtering, salt-and-pepper noise, median filtering, and arithmetic mean filtering for the irreversible transformation pool. These transformations were randomly sampled during training. As for the evaluation of the bicubic method, our original intention was to show the results of a non-parametric interpolation method applied to low-resolution pathological patches for the downstream verification tasks, i.e., to indicate the performance floor of super-resolved images on those tasks. Due to space constraints, and because bicubic interpolation and models with trainable parameters are not directly comparable as super-resolution methods, we ultimately did not include it as a comparison method.
Method clarity [r4]: We thank the reviewer for the questions about the method framework, which will be elaborated in more detail in the camera-ready version. The selection of reversible and irreversible transformations is explained in the answer above. For the irreversible transformations, we aim to reproduce degradations such as noise and blur that may appear in super-resolution outputs; we artificially generate such data and adopt a contrastive learning scheme so that our method avoids producing the same artifacts (by pushing the output away from the negative example set) as much as possible. For the reversible transformations, we believe any invertible image transformation can be included in the sampling pool. Regarding the definition of some modules: due to space limitations, we did not expand them in detail in the main body, but their definitions are consistent with the original sources. For the HAB block, see Chen, X., Wang, X., Zhou, J., Qiao, Y., Dong, C.: Activating more pixels in image super-resolution transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22367–22377 (2023). For the DAT block, see Xia, Z., Pan, X., Song, S., Li, L.E., Huang, G.: Vision transformer with deformable attention. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4794–4803 (2022).
On the difference between contrastive learning loss functions [r4]: Layer 0 is included in the calculation of our contrastive loss, i.e., the comparison is also computed on the information before it enters the feature extraction network. In other words, the image-space distribution is also included in the contrastive learning calculation. To keep the formula concise, we did not define the image-space and feature-space contrasts separately.
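As a concrete reading of the transformation pools described in the rebuttal, the sketch below shows how the reversible pool (rotations and flips, each with an exact inverse) and the irreversible pool (Gaussian blur, salt-and-pepper noise, median filter, arithmetic mean filter) might be randomly sampled during training. It is assembled from the rebuttal's description only; `sr_fn` is a hypothetical super-resolution callable and all parameter values are placeholders, not the released implementation's settings.

```python
import random
import numpy as np
from scipy import ndimage

# Illustrative sketch of the transformation pools described in the rebuttal.
# Images are assumed to be HWC float arrays; parameter values are placeholders.

# Reversible pool: each entry is (transform, exact inverse transform).
REVERSIBLE_POOL = [
    (lambda x: np.rot90(x, 1, axes=(0, 1)), lambda x: np.rot90(x, -1, axes=(0, 1))),
    (lambda x: np.rot90(x, 2, axes=(0, 1)), lambda x: np.rot90(x, -2, axes=(0, 1))),
    (lambda x: np.flip(x, axis=0),          lambda x: np.flip(x, axis=0)),
    (lambda x: np.flip(x, axis=1),          lambda x: np.flip(x, axis=1)),
]

def salt_and_pepper(x, amount=0.02):
    """Set a small random fraction of pixels to the min/max intensity."""
    out = x.copy()
    mask = np.random.rand(*x.shape[:2])
    out[mask < amount / 2] = x.min()
    out[mask > 1 - amount / 2] = x.max()
    return out

# Irreversible pool: degradations applied to the HR ground truth (negatives).
IRREVERSIBLE_POOL = [
    lambda x: ndimage.gaussian_filter(x, sigma=(1.5, 1.5, 0)),  # Gaussian blur
    salt_and_pepper,                                            # impulse noise
    lambda x: ndimage.median_filter(x, size=(3, 3, 1)),         # median filter
    lambda x: ndimage.uniform_filter(x, size=(3, 3, 1)),        # arithmetic mean
]

def sample_positive(sr_fn, lr_patch):
    """Reversible transform -> super-resolve -> exact inverse transform."""
    t, t_inv = random.choice(REVERSIBLE_POOL)
    return t_inv(sr_fn(t(lr_patch)))

def sample_negative(hr_patch):
    """Randomly degraded ground truth serves as a low-quality negative."""
    return random.choice(IRREVERSIBLE_POOL)(hr_patch)
```

Under this reading, the rebuttal's "layer 0" simply means the raw images themselves are treated as the first layer of the embedding network in the summed contrastive loss, so the image-space term needs no separate definition.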
Meta-Review
Meta-review not available, early accepted paper.