Abstract

Microscopic hyperspectral image segmentation faces the dual challenges of limited labeled data and insufficient utilization of unlabeled data. Existing semi-supervised methods, however, often isolate the training processes for labeled and unlabeled data, neglecting their potential synergistic effects. To address this, we propose a semi-supervised method based on Virtual Domain Collaborative Learning (VDCL) to enhance collaborative learning between labeled and unlabeled data and improve the quality of pseudo-labels. Specifically, by combining unlabeled background with labeled foreground, and labeled background with unlabeled foreground, to construct virtual domain data pairs, we establish a collaborative learning bridge between labeled and unlabeled samples. Furthermore, we build a repository of optimal models and employ an alternating co-training strategy: the current model and the historically optimal model jointly guide training, and this dynamic framework significantly improves pseudo-label quality. We verify the proposed semi-supervised segmentation method on the widely used public microscopic hyperspectral choledoch dataset from Kaggle and on the oral squamous cell carcinoma dataset, where it achieves state-of-the-art performance. The code is available at https://github.com/Qugeryolo/Virual-Domain.
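For concreteness, below is a minimal sketch of the mask-based mixing implied by the virtual domain construction described in the abstract. It assumes 4-D hyperspectral patch tensors, binary ground-truth masks for the labeled patches, and model-generated pseudo-label masks for the unlabeled patches; the function and variable names (and the band/patch sizes in the usage example) are illustrative assumptions, not details taken from the paper.

```python
import torch


def build_virtual_domains(x_l, y_l, x_u, y_u_pseudo):
    """Mix labeled and unlabeled patches into virtual domain pairs.

    x_l, x_u   : labeled / unlabeled hyperspectral patches, shape (B, C, H, W)
    y_l        : binary ground-truth masks for x_l, shape (B, 1, H, W)
    y_u_pseudo : binary pseudo-label masks for x_u, shape (B, 1, H, W)
    """
    # Labeled foreground pasted onto an unlabeled background
    x_lf_ub = y_l * x_l + (1 - y_l) * x_u
    y_lf_ub = y_l  # the mixed label keeps the labeled foreground

    # Unlabeled (pseudo-labeled) foreground pasted onto a labeled background
    x_uf_lb = y_u_pseudo * x_u + (1 - y_u_pseudo) * x_l
    y_uf_lb = y_u_pseudo  # the mixed label keeps the pseudo-labeled foreground

    return (x_lf_ub, y_lf_ub), (x_uf_lb, y_uf_lb)


# Example usage with random tensors (band count and patch size are illustrative)
x_l = torch.rand(2, 60, 256, 256)
x_u = torch.rand(2, 60, 256, 256)
y_l = (torch.rand(2, 1, 256, 256) > 0.5).float()
y_u_pseudo = (torch.rand(2, 1, 256, 256) > 0.5).float()
(virt_a, lab_a), (virt_b, lab_b) = build_virtual_domains(x_l, y_l, x_u, y_u_pseudo)
```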

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1142_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Qugeryolo/Virual-Domain

Link to the Dataset(s)

https://www.kaggle.com/datasets/ethelzq/multidimensional-choledoch-database/

BibTex

@InProceedings{QinGen_AVirtual_MICCAI2025,
        author = { Qin, Geng and Liu, Huan and Li, Wei and Zhang, Haihao and Guo, Yuxing},
        title = { { A Virtual Domain Collaborative Learning Framework for Semi-supervised Microscopic Hyperspectral Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {23 -- 32}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a technique that generates a virtual image by combining labeled and unlabeled data, so that training can also benefit from the unlabeled data.

    • creates virtual foreground and virtual background images by combining the labeled and unlabeled data
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • consistently better performance than compared methods
    • maintains the Dice coefficient even when the labeled/unlabeled ratio reaches 5% / 96%
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • the formulation of the “model repository” would benefit from added detail
    • are only two models kept?
    • is one of the models always trained from scratch?
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please update the figure captions and explain what the symbols (e.g., the fire icon) and colors mean in the caption rather than only in the text. In Fig. 1, both the top and bottom rows seem to be identical; is this expected? Please also run a spellchecker (e.g., "virual", "currunt"…).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an interesting approach to using unlabeled data. Fusing labeled images with unlabeled ones would be sound as a standalone data augmentation technique. However, the paper proposes an additional tool, the model repository, and it is not obvious whether the improvements come from the first or the second part. A short ablation study and an improvement in paper clarity would be beneficial. Therefore, I recommend weak rejection of the paper in its current state.

  • Reviewer confidence

    Not confident (1)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a semi-supervised method called Virtual Domain Collaborative Learning (VDCL) to enhance collaborative learning between labeled and unlabeled data. They combine unlabeled background with labeled foreground data to construct virtual domain data pairs for collaborative learning. The paper, for the most part, is well written; however, the motivation for evaluating performance only on MHSI data is unclear. I am also concerned about the size and the quality of the dataset used in the experiments.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Following are the main strengths of this paper:

    1. The literature survey of semi-supervised approaches for natural images is quite extensive and up to date. However, I cannot say the same for medical imaging data.

    2. The figures and illustrations are detailed and provide a good overview of the proposed approach.

    3. The results seem quite detailed and extensive, and performance has been compared against a number of baseline semi-supervised algorithms.

    4. The idea of creating bridges between labeled and unlabeled data is interesting, although I feel that the authors have not done a good job of motivating the need for this approach.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are a number of major weaknesses which make me somewhat skeptical of the efficacy of the proposed approach.

    1. The paper does not clearly establish why the approach is tested only on HSI data. The datasets used seem small in size; e.g., why is only a small subset of the HSI dataset (containing 325 scenes, available on Kaggle) used when the full dataset (880 scenes, available on the original dataset portal) is also accessible to researchers? Furthermore, the authors have not elaborated on whether they extracted any patches or spectral bands from each scene. Without this information, a reader with no substantial background in HSI data is inclined to think that a dataset of 325 scenes (1280×1024 pixels each) is too small for effectively testing a semi-supervised approach.

    2. The authors state (in the first paragraph of page 2) that labels for HSI data are scarce and challenging to generate; however, without concrete numbers this statement loses impact. Furthermore, all papers listed in the second paragraph of page 2, [12-20], with one exception (ref. [18]), are tested on natural image datasets, which tend to have thousands of images. This raises the question: wouldn't it be better to test performance on natural image datasets first and then later extend the approach to HSI data?

    3. I don’t agree with the use of the term “optimal models”; optimal in what sense? Unless this is a term widely accepted by the community, I would recommend replacing it with “best-performing models” or a similar term. The authors claim that their approach exploits the historical memory of optimal models; I wouldn’t use the term “optimal” unless there is theoretical evidence to support this.

    4. The acronym ACTL suddenly appears in Section 3.3 without any definition or prior context, leaving the reader confused about what it represents. Similarly, on page 4, the symbols M1 and M2 are added after Eqn. (4) without clearly stating what they denote; the reader needs to move to the next section (2.3) to infer that these denote model 1 and model 2.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is interesting to read and proposes a novel technique to tackle a relevant problem. However, I am concerned about the size and quality of the dataset used during the experiments. I also feel that the authors need to review Sections 2 and 3 to ensure that all terms/acronyms are clearly defined. They should also provide a clearer description of the dataset used and clearly justify that a sufficient amount of data was used to ensure good generalization performance and that overfitting was avoided.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel framework for microscopic hyperspectral imaging (MHSI) called Virtual Domain Collaborative Learning (VDCL). The main idea is that it constructs virtual domains that mix labeled and unlabeled data to enable collaborative learning for better identification of tumors.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is well drafted. 2) It introduces the concept of virtual domain construction, where labeled foregrounds are combined with unlabeled backgrounds (and vice versa), and also introduces a dynamic co-training framework that exploits the historical memory of optimal models. 3) It shows comparisons with other methods on the two datasets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) In Fig. 2 (first set of images), it’s unclear how the ground truth (image b) was obtained. Was it labeled by a trained pathologist? The ground truth seems to offer a rough boundary guide rather than a precise segmentation. 2) There is no Fig. 3 in the manuscript (I believe it is Fig. 2 itself). 3) While the paper gives a qualitative understanding (which is appreciated) of the metrics false positives, false negatives, true positives, and true negatives, the authors could also quantify these. Including these metrics, possibly based on area or pixel counts, would provide a more comprehensive understanding of the model’s strengths and/or weaknesses.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Well written, and it has an impact in the field of HSI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Response to Reviewer 1

Thank you very much for your useful and detailed comments. Below are our responses.

1. The original dataset available on the official portal indeed contains 880 scenes, but this includes RGB images and classification labels. For the segmentation task with corresponding pixel-level annotations required in our study, only 325 hyperspectral scenes are available. Additionally, to facilitate training with high-dimensional and complex hyperspectral data, we partition each scene of size 1280×1024 pixels into non-overlapping patches of 256×256 pixels. This patch-based strategy ensures manageable input sizes and better computational efficiency without compromising data integrity.

2. Hyperspectral imaging (HSI) data indeed present unique challenges compared to natural images. Specifically, hyperspectral data typically have lower spatial resolution than RGB images, which makes manual annotation significantly more time-consuming and labor-intensive. Annotators must carefully inspect the spatial information to accurately delineate lesion areas, increasing the difficulty and cost of obtaining reliable pixel-level labels.

3. In our manuscript, the term “optimal models” was intended to refer to models achieving the highest Dice Similarity Coefficient (DSC) scores during training, which we use as the primary evaluation metric for segmentation performance. We agree that without theoretical guarantees, “optimal” may be misleading and is not a standard term in this context.

4. We acknowledge that the acronym ACTL was introduced in Section 3.3 without a prior definition, which could cause confusion. Similarly, the symbols M1 and M2 were added after Equation (4) without explicit explanation at that point in the text. To address this, we will revise the manuscript by introducing ACTL (Alternate Co-training Learning) with a clear definition when it first appears and explicitly defining M1 and M2 as model 1 and model 2 immediately after their introduction in Equation (4).

Response to Reviewer 2

Thank you very much for your useful and detailed comments. Below are our responses.

1. We construct a model repository consisting of historically best-performing models (M2_hist) selected based on their DSC scores during training. At each stage, models achieving superior DSC performance are stored in this repository and subsequently used to generate more reliable pseudo-labels, thereby improving supervision for unlabeled data. In our approach, M2 alternates between the current model (M2_curr) and the best-performing historical model (M2_hist), enabling a balance between current learning and stable guidance from past optimal states. We will revise the manuscript to incorporate this explanation in Section 3.3 to improve clarity.

2. In our current implementation, the model repository dynamically retains the single best-performing historical model (M2_hist) based on its DSC score during training, alongside the current model (M2_curr).

3. In our framework, neither of the models is trained from scratch at each iteration. Both M2_curr and M2_hist are initialized from the same initial state at the start of training. As training progresses, M2_curr is continuously updated via backpropagation, while M2_hist retains the parameters of the historically best-performing model based on its DSC score.

Response to Reviewer 3

Thank you very much for your useful and detailed comments. Below are our responses.

1. The ground truth shown in Fig. 2(b) was indeed annotated by experienced pathologists. Due to the inherent difficulty of precisely delineating lesion boundaries in hyperspectral images, which often have lower spatial resolution and complex tissue structures, the annotations may appear as rough boundary guides rather than pixel-perfect delineations.

2. We will correct this in the revised manuscript to avoid any confusion.

3. In our study, we primarily use quantitative metrics such as the Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) to comprehensively evaluate segmentation performance.
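As a rough illustration of the model repository and alternation described in the responses above, here is a minimal PyTorch-style sketch. The class name, the even/odd rule for alternating between M2_curr and M2_hist, and the freezing of the stored copy are assumptions for illustration rather than details confirmed in the paper.

```python
import copy


class BestModelRepository:
    """Keeps the single best-performing historical model (by validation DSC),
    alongside which the current model continues training (cf. M2_curr / M2_hist
    in the rebuttal)."""

    def __init__(self, model):
        self.best_dsc = -1.0
        self.hist_model = copy.deepcopy(model)  # M2_hist starts from the same initial state

    def update(self, curr_model, val_dsc):
        # Keep a frozen snapshot whenever the current model sets a new best DSC
        if val_dsc > self.best_dsc:
            self.best_dsc = val_dsc
            self.hist_model = copy.deepcopy(curr_model)
            for p in self.hist_model.parameters():
                p.requires_grad_(False)

    def teacher(self, curr_model, iteration):
        # Alternate pseudo-label guidance between M2_curr and M2_hist
        # (the even/odd schedule here is an assumption, not from the paper)
        return curr_model if iteration % 2 == 0 else self.hist_model
```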




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


