Abstract

Training deep neural networks for 3D segmentation tasks can be challenging and often requires efficient and effective strategies to improve model performance. In this study, we introduce a novel approach, DeCode, that utilizes label-derived features for model conditioning to dynamically support the decoder in the reconstruction process, aiming to enhance the efficiency of the training process. DeCode focuses on improving 3D segmentation performance by incorporating a conditioning embedding with a learned numerical representation of 3D-label shape features. Specifically, we develop an approach in which conditioning is applied during the training phase to guide the network toward robust segmentation. When labels are not available during inference, our model infers the necessary conditioning embedding directly from the input data, thanks to a feed-forward network learned during the training phase. This approach is tested using synthetic data and cone-beam computed tomography (CBCT) images of teeth. For CBCT, three datasets are used: one publicly available and two in-house. Our results show that DeCode significantly outperforms traditional, unconditioned models in terms of generalization to unseen data, achieving higher accuracy at a reduced computational cost. This work is the first of its kind to explore conditioning strategies in 3D data segmentation, offering a novel and more efficient method for leveraging annotated data. Our code and pre-trained models are publicly available at https://github.com/SanoScience/DeCode.
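
For illustration, the sketch below shows one way such decoder conditioning can be wired up in PyTorch: a conditioning vector (a stand-in for the learned shape-feature embedding) predicts a per-channel scale and shift applied to decoder features, with an unconditioned residual path preserved. The class and tensor names are hypothetical; this is a minimal sketch of the general technique under those assumptions, not the authors' exact implementation (see the repository above for that).

```python
# Minimal, illustrative sketch of conditioning a 3D decoder block with a learned
# embedding (FiLM-style scale/shift plus an unconditioned residual bypass).
# Names and architecture details are assumptions, not the DeCode source code.
import torch
import torch.nn as nn

class ConditionedDecoderBlock(nn.Module):
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.ReLU(inplace=True),
        )
        # Maps the conditioning vector to a per-channel scale (gamma) and shift (beta).
        self.film = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        h = self.conv(x)
        gamma, beta = self.film(cond).chunk(2, dim=1)      # each of shape (B, C)
        gamma = gamma[:, :, None, None, None]               # broadcast over D, H, W
        beta = beta[:, :, None, None, None]
        # Unconditioned residual bypass: if the conditioning is uninformative,
        # the block can still fall back to plain image features.
        return h + gamma * h + beta

# During training the conditioning vector could come from label-derived shape
# features; at inference a small feed-forward head would predict it from the image.
block = ConditionedDecoderBlock(channels=32, cond_dim=64)
features = torch.randn(2, 32, 16, 16, 16)
cond_vec = torch.randn(2, 64)
print(block(features, cond_vec).shape)  # torch.Size([2, 32, 16, 16, 16])
```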

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3398_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3398_supp.pdf

Link to the Code Repository

https://github.com/SanoScience/DeCode

Link to the Dataset(s)

https://www.nature.com/articles/s41467-022-29637-2#data-availability

BibTex

@InProceedings{Szc_Let_MICCAI2024,
        author = { Szczepański, Tomasz and Grzeszczyk, Michal K. and Płotka, Szymon and Adamowicz, Arleta and Fudalej, Piotr and Korzeniowski, Przemysław and Trzciński, Tomasz and Sitek, Arkadiusz},
        title = { { Let Me DeCode You: Decoder Conditioning with Tabular Data } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript introduces DeCode, an approach for improving the performance of 3D segmentation tasks. The approach utilizes label-derived features for model conditioning to enhance the efficiency of the training process. DeCode learns a conditioning numerical vector representation of 3D label shape features. During training, conditioning is applied to guide the network towards robust segmentation. At prediction time, the model infers the conditioning embedding directly from the input data using a feed-forward network learned during training. The approach is tested using synthetic data and cone-beam computed tomography (CBCT) images of teeth. The results show that DeCode performs on par with existing models at a reduced computational cost.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Using tabular data as conditioning to train a deep segmentation model with fewer parameters is a novel idea.
    2. The ablation study is comprehensive and complete.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The assessment on the CBCT teeth dataset cannot differentiate the performance of the different models.
    2. The proposed approach may have limitations, e.g., when the structures to segment lack a consistent relationship.
    3. Model training may be more vulnerable to issues in label quality.
    4. Overall, the model is on the simpler side.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Source code is provided which adds to the reproducibility of the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The training of the proposed model would depend on the segmentation results of the targeted structures, and the shape features would need to be extracted. To a certain degree, this adds additional processing steps and more vulnerability to label-quality issues. Additional experiments may be needed to demonstrate robustness to noise in the features used to generate the conditioning, and the relationship between model training, label quality, and feature robustness should be discussed.
    2. The CBCT teeth segmentation problem was solved with a decent level of accuracy by a set of existing segmentation models, and the difference between them and the proposed approach was not significant. To further assess the proposed approach and understand its capabilities, the authors should consider more challenging segmentation problems, e.g., 3D vertebrae segmentation.
    3. It is suspected that the approach could be limited to segmenting structures with a consistent spatial relationship. The authors are welcome to provide evidence against this assumption.
    4. For the teeth segmentation problem, please cite the following publications: Polizzi, A., Quinzi, V., Ronsivalle, V., Venezia, P., Santonocito, S., Lo Giudice, A., … & Isola, G. (2023). Tooth automatic segmentation from CBCT images: a systematic review. Clinical Oral Investigations, 27(7), 3363-3378. Zheng, Q., Gao, Y., Zhou, M., Li, H., Lin, J., Zhang, W., & Chen, X. (2024). Semi or fully automatic tooth segmentation in CBCT images: a review. PeerJ Computer Science, 10, e1994.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach is effective in segmenting structures with consistent spatial relationships. The quantitative analysis is complete. The level of clarity is good. However, the experiment on the CBCT teeth dataset showed that the proposed approach did not significantly outperform several existing models, which are not considered strong baselines by today's standards. This may require more challenging experiments to be carried out to fully evaluate the proposed approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have done a diligent job responding to the questions raised by the reviewers. Their answers show a clear understanding of the inherent strengths and weaknesses of the proposed framework, which is valuable. I can adjust the rating to a slightly higher level based on the responses.



Review #2

  • Please describe the contribution of the paper

    The paper shows a novel approach for decoder conditioning in image segmentation using (learned) radiomic feature embeddings. A synthetic dataset 3DeCode is introduced which needs conditioning to correctly segment the given shapes. A comparison between a baseline and different ablations is given and results are discussed thoroughly.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Incorporation of radiomics features encoded into a vector and automatically retrieved during inference; deep fusion of features.
    • Introduction of a new simulated dataset which needs conditioning based on shape or size.
    • The authors clearly describe their contribution and give detailed insights into the background of the topic.
    • It is a nice feature that the embedding of shape features can be learned quite well and leads to a good segmentation performance.
    • The presentation is clear and thorough.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It would be nice to see some visual comparison / qualitative results for the individual datasets and methods (e.g. the 3DeCode data samples.)
    • For the synthetic dataset: it is obvious that when only some of the shapes are segmented an unconditioned model doesn’t know which of the shapes are relevant.
    • Is there more information available about the private datasets, such as the measurement settings, etc.?

    Minor:

    • Fig. 2 referenced as Fig. 1 (page 4)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The synthetic dataset is clear (but only with the supplementary material). However, the performance difference between the baseline and the conditioned U-Net is a logical consequence of how the synthetic dataset is built. It is stated that the dataset needs conditioning to be segmented accurately. The dataset is good for a proof of concept. However, it would be more interesting to see more evaluation on the real CBCT dataset and a visual comparison. An explanation of the shape features would be interesting as well.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A really interesting approach to include radiomics features into a segmentation task and condition a U-Net during each decoding step. A nice property as well is that the shape feature embedding can be learned during the training phase. A minor drawback is the lack of qualitative evaluation on both datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors responded well to the reviewers concerns and plan to address them in the camera-ready version.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method that uses a shape-feature embedding to condition segmentation and enhance the performance of segmentation models. This provides an interesting way to add an additional loss that can make better use of annotated data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This makes a clever use of additional image properties that can be calculated to provide conditioning information to a segmentation process. The creation of an artificial dataset that highlights the utility of their contribution is a nice touch.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The presentation of the results in Table 2 is a bit confusing and possibly misleading. CSF in this table is an oracle method: the perfect segmentation was available at runtime. LESF (row 8) is what could actually be done. Oracle methods should be clearly set apart. The conclusions to be drawn from Datasets A, B, and VAL might then differ.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Spelling: "mall" should be "small" (p. 6). The number for center B for DeCode in Table 3 doesn't match the one in Table 2 (minor, doesn't change the conclusion).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This approach can generalize beyond the specific application presented, and shows an interesting way to squeeze some more value out of annotated data. Strong reproducibility.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I maintain my accept rating for this paper




Author Feedback

We appreciate the Reviewers’ feedback and address their comments here; we will apply them in the camera-ready version if accepted. We are pleased that each Reviewer recognized the novelty of our method.

R1 (Performance matches SOTA, 3D CBCT not challenging enough): Although our performance matches existing methods, DeCode has a lower generalization error and is statistically significantly better than unconditioned models. Tooth delineation in CBCT scans is challenging due to variability in tooth shape, artifacts, and noise, with differences between scans from various centers hindering generalization. While current methods achieve high DSC results, root segmentation is crucial for clinical applicability, which makes any improvement valuable. Secondly, compared to the similarly performing VNet, DeCode is 10x lighter and trains 4x faster. We also plan to apply DeCode to 3D vertebrae segmentation.

R1 (DeCode robustness): In DeCode, an unconditioned residual connection bypasses the conditioning layer, making it more robust to annotation errors and noisy shape features (Fig. 1-Di). It helps the model prioritize image features if the conditioning is uninformative with respect to the segmentation loss, ensuring it performs no worse than a model without conditioning in worst-case scenarios. Secondly, DeCode learns shape feature embeddings from many samples, reducing the impact of individual annotation errors. Our study improves the utilization of existing labels and aims to inspire further research in this field.

R1 (Method limited to structures with consistent relationships): We appreciate this point and agree it is a limitation; however, physiological structures such as teeth, the spine, and organs are consistent in shape. This consistency, as seen for teeth in Fig. 2, is used for conditioning in DeCode. For structures such as tumors, characterized by unpredictable shapes, our approach may face constraints, which we will address in the discussion.

R1 (Simple model): Our findings highlight the strength of simplicity. DeCode excels in decoding learned shape representations, statistically significantly outperforming unconditioned models on unseen data. Our model has only 4M parameters and requires 3h of training to achieve a DSC of 93.83 on an external test set. At the same time, ResUNet34 (DSC of 93.71) barely fits 3D data patches on an 80GB GPU (impossible for SwinUNETR), requires 11h of training, has 70M (17x more) parameters, and yet falls short of DeCode.

R1 (Missing citations): We will add citations to the related works.

R3: Thank you for pointing out the typos and for strongly supporting the acceptance of our work. We will mark oracle configurations more clearly.

R4 (3DeCode and CBCT qualitative results): Due to space limitations, we presented quantitative results only, because they clearly demonstrated better generalization with DeCode. We will add 3D segmentation results to our git repository.

R4 (Synthetic dataset needs conditioning): The Reviewer’s point about the need for conditioning in the 3DeCode dataset is valid; this dataset was designed for that purpose. Having confirmed the feasibility of conditioning, we show through extensive quantitative analysis that conditioning with learned radiomics shape features improves generalization to real CBCT data. Similarly, Jacenkow et al. (MICCAI 2020) showed that without conditioning they could not segment the desired image quadrant in synthetic 2D data, and demonstrated how conditioning with the cardiac cycle phase improves segmentation performance.

R4 (Shape features explanation): We use PyRadiomics to calculate shape features for each tooth: sphericity, volume, elongation, etc. Such morphometric descriptors analyze size, form, and shape and are thus closely linked to the morphology of the segmented objects. Incorporating shape features aims to decode morphologically accurate masks. We will provide details on the shape features in the camera-ready version.

R4 (Private CBCT measurement settings): The camera-ready version will include more details about the CBCT datasets.
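
As a rough illustration of this step, the snippet below computes per-tooth shape descriptors with PyRadiomics. The file names are placeholders and the exact feature set and extractor settings used in the paper are assumptions, so treat this as a minimal sketch rather than the authors' pipeline.

```python
# Illustrative sketch (assumed setup, not the authors' exact pipeline) of computing
# per-tooth 3D shape descriptors such as sphericity, volume, and elongation.
import SimpleITK as sitk
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName('shape')        # enable only the 3D shape class

image = sitk.ReadImage('cbct_scan.nii.gz')          # placeholder file names
mask = sitk.ReadImage('tooth_mask.nii.gz')          # binary mask of a single tooth

features = extractor.execute(image, mask)
# Keep only the shape descriptors, e.g. 'original_shape_Sphericity',
# 'original_shape_MeshVolume', 'original_shape_Elongation'; such values could
# form the tabular conditioning vector described above.
shape = {k: v for k, v in features.items() if k.startswith('original_shape_')}
print(shape)
```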




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper is interesting and novel. The rebuttal effectively addressed the reviewer’s concerns. Therefore, I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposed DeCode, an approach that uses label-derived features for model conditioning to enhance 3D segmentation and training efficiency. DeCode applies conditioning during training and infers the necessary conditioning embedding from the input data during inference. It was tested on synthetic data and CBCT images of teeth. All reviewers acknowledged the technical contributions of the proposed method. They also raised questions about the performance comparison, potential weaknesses, and some technical details. The authors responded well to these concerns, pointing out that the proposed model generalizes better to unseen data and is much lighter than VNet. Reviewers were convinced to accept this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



