Abstract
This study addresses the challenge of detecting bone marrow edema (BME) using dual-energy CT (DECT), a task complicated by the lower contrast DECT offers compared to MRI and the presence of artifacts inherent in the image formation process. Despite the advancements in AI-based solutions for image enhancement, achieving an artifact-free outcome in DECT remains difficult due to the impracticality of obtaining paired ground-truth and artifact-containing images for supervised learning. To overcome this, we explore unsupervised techniques such as CycleGAN and AttGAN for artifact removal, which, while effective in other domains, face challenges in DECT due to the similarity between artifact and pathological patterns. Our contribution, the Conditional Attribute Preservation through Unveiling Realistic GAN (CAPTURE-GAN), innovatively combines a generative model with conditional constraints through masking and classification models to not only minimize artifacts but also preserve the pathology of BME and the anatomical integrity of bone. By incorporating bone priors into CycleGAN and adding a disease classification network, CAPTURE-GAN significantly improves the specificity and sensitivity of BME detection in DECT imaging. Our approach demonstrates a substantial enhancement in generating artifact-free images, ensuring that critical diagnostic patterns are not obscured, thereby advancing the potential for DECT in diagnosing and localizing lesions accurately.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3097_paper.pdf
SharedIt Link: https://rdcu.be/dV5BJ
SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_15
Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3097_supp.pdf
Link to the Code Repository
https://github.com/pnu-amilab/CAPTURE-GAN
Link to the Dataset(s)
N/A
BibTex
@InProceedings{Par_CAPTUREGAN_MICCAI2024,
author = { Park, Chunsu and Kim, Seonho and Lee, DongEon and Lee, SiYeoul and Kambaluru, Ashok and Park, Chankue and Kim, MinWoo},
title = { { CAPTURE-GAN: Conditional Attribute Preservation through Unveiling Realistic GAN for artifact removal in dual-energy CT imaging } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15007},
month = {October},
pages = {150--160}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper introduces CAPTURE-GAN, a novel framework for artifact removal in dual-energy CT (DECT) images. This GAN-based model leverages unsupervised learning techniques to address the challenge of DECT artifact removal without requiring paired images. It innovatively combines a generative model with a pre-trained classifier and automatically generated masks to selectively modify local regions, thus preserving the critical structural and pathological details necessary for medical diagnosis.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The CAPTURE-GAN article highlights several significant advantages in the field of medical imaging.
- it introduces an innovative approach by integrating CycleGAN with automatically generated masks and a disease classification constraint, which is specifically designed to preserve bone structures and pathological patterns while effectively removing imaging artifacts.
- it makes efficient use of original data by utilizing unpaired DECT and MRI images, allowing the model to train on more realistic datasets without the need for matched pairs.
- the model’s effectiveness is confirmed through comprehensive qualitative and quantitative evaluations, showing that CAPTURE-GAN outperforms existing models in maintaining important clinical features while removing artifacts.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Lack of comparisons with recent methods: While the paper compares the proposed model with several existing methods, it could strengthen its argument by including more recent and directly comparable methods in artifact removal specifically for DECT images.
- The paper could improve the explanation of how the masks are generated and applied within the model. Details about the thresholding and the specific steps involved in creating masks from the different images could be better illustrated or described to enhance reproducibility.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- Needs to be compared with more recent GAN methods, such as U-Net GAN and Res-UNet GAN.
- Needs to give more detailed reasons for the choice of loss weights.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Reject — could be rejected, dependent on rebuttal (3)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper did not redo the experiments properly, cutting corners in the ablation study, and most importantly, it did not compare with the latest methods, which may mean it is not truly state-of-the-art.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Reject — should be rejected, independent of rebuttal (2)
- [Post rebuttal] Please justify your decision
The authors acknowledge that there are some recent methods but argue that these methods differ substantially from their philosophy; however, the rebuttal does not state what the essential difference is. Our doubts have therefore not been fully answered.
Review #2
- Please describe the contribution of the paper
The authors propose a clever idea to remove artifacts from DECT images while preserving pathological disease areas. The main idea is based on a CycleGAN framework, augmented with disease and artifact classifiers to preserve image patterns that relate to the disease. Automatic masking is also used to further focus the artifact detection on the bone region.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The main idea of distinguishing between artifacts and disease is creative and clever.
- The paper is written well and easy to follow.
- The results show excellent classifier performance and regeneration of artifact-free images, with the ablation highlighting the contribution of several components. The visual results, including those in the supplement, support the idea that some areas are corrected (red boxes relating to artifacts) while others are maintained (blue arrows relating to disease), better than with other methods.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- It is not clear from the paper which artifacts the DECT images suffer from. The visualizations also do not tell me what sort of artifacts we are looking at, or whether the changes in the red boxes are truly artifacts being removed or simply “a difference”. This then remains an abstract concept throughout the paper.
- For the loss weights, “empirical” values were chosen (section 2.3), while section 3.1 reports the use of only a training and a test set, but no validation set. This sounds to me like these weights, and presumably other hyper-parameters as well, were tuned on the test set.
- If I understand correctly, the method assumes artifacts and disease are located in different areas. Is this correct? What happens when artifacts overlap disease areas? Should conclusions then be sharpened to consider this?
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Do you have any additional comments regarding the paper’s reproducibility?
no
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- It is not clear to me why artifacts can occur on bones only. This seems unrealistic. Please comment.
- The masks in the paper are roundish in shape, while common streaking artifacts in CT are linear. Ringing, motion, and beam-hardening effects are also much thinner and not necessarily local. Can the method deal with those as well? Should the conclusions be sharpened to state that only certain artifacts are removed while others are not addressed by this method?
- Please comment on the following: If I understand the method correctly, the classifiers can retain certain patterns (very nice!), but only those that you model for, in this case the disease label. Could the method still be at risk of cleaning up too much, since there may be unknown patterns you have not built a classifier for? Or are there mechanisms in the CycleGAN that would prevent excessive clean-up?
- What was the MRI used for exactly? The term “precise annotation” is not clear enough to me.
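On the question of whether CycleGAN has a mechanism that limits over-cleaning: in the standard CycleGAN formulation, an identity term penalizes the artifact-removal generator for altering images that are already artifact-free. The minimal NumPy sketch below assumes an L1 identity loss and uses hypothetical stand-in generators (`gentle`, `aggressive`); it is not the paper's implementation.

```python
import numpy as np

def identity_loss(g_f, x_free):
    """Mean absolute (L1) change that generator g_f makes to artifact-free images.

    A perfect identity mapping scores 0; larger values mean the generator
    alters content even though there was nothing to remove.
    """
    return float(np.mean(np.abs(g_f(x_free) - x_free)))

# Hypothetical stand-in generators, for illustration only.
def gentle(x):
    return x  # leaves clean input untouched

def aggressive(x):
    return np.clip(x - 0.2, 0.0, None)  # darkens everything ("over-cleaning")

x_free = np.full((4, 4), 0.5)
print(identity_loss(gentle, x_free))      # 0.0
print(identity_loss(aggressive, x_free))  # approximately 0.2
```

Minimizing such a term alongside the adversarial and cycle losses pushes G_f toward leaving unmodeled patterns intact, although it does not guarantee it.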
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The main strength of the paper is the technical novelty of the method, with a clever core idea. The loose concept of what an artifact is, and in particular a (potential) tuning on the test set, limits the value of the paper somewhat.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Weak Accept — could be accepted, dependent on rebuttal (4)
- [Post rebuttal] Please justify your decision
The rebuttal addressed part of my concerns but, importantly, did not address the potential tuning on the test set.
Review #3
- Please describe the contribution of the paper
The paper proposes a method for artifact removal in dual-energy CT images. The method uses CycleGAN as a baseline and adds masks as input to the artifact-generating generator to specify the location of synthesized artifacts, as well as an edema classifier to help preserve pathological details in the artifact-removed image. Experiments show that images produced by the method lead to higher accuracy of the edema classification model than other approaches.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-written and well-structured. The idea of using mask-conditioning to introduce domain knowledge is interesting. By constraining the mask foreground inside the bone, it helps the generator synthesize artifacts within bone regions.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The method evaluation does not seem convincing enough. Most of the comparisons are done visually or via performance on downstream tasks such as edema classification or artifact detection. There is no direct comparison of the artifact-removal results with a ground-truth image. Although ground truth may be hard to obtain in this setting, would it be possible to simulate artifact-corrupted images from artifact-free images and evaluate the methods on the simulated results?
Furthermore, the artifact detection accuracy metric is quite confusing. This metric appears in Fig. 3 and Table 1. Intuitively, it is a metric for artifact detection, not artifact removal. Based on the reviewer’s understanding, the authors’ rationale might be that if the detector cannot detect artifacts in the output image, then artifact removal is effective. In that case, the metric should probably be called “artifact detection rate”, and lower numbers would be better. However, in the paper, artifact detection accuracy is higher for the proposed method. It is unclear how to interpret these numbers.
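One way to make the direction of this metric explicit, consistent with the authors' later renaming to "artifact-free detection", is to report the fraction of processed images that an artifact detector classifies as artifact-free, so that higher is better. The detector probabilities and the 0.5 cut-off below are hypothetical.

```python
import numpy as np

def artifact_free_rate(artifact_probs, threshold=0.5):
    """Fraction of outputs the detector does NOT flag as containing artifacts.

    artifact_probs: the detector's probability of "artifact present" for each
    processed image. Higher return values indicate more effective removal.
    """
    probs = np.asarray(artifact_probs, dtype=float)
    return float(np.mean(probs < threshold))

def artifact_detection_rate(artifact_probs, threshold=0.5):
    """Complementary view: fraction still flagged as artifact (lower is better)."""
    return 1.0 - artifact_free_rate(artifact_probs, threshold)

probs_after_removal = [0.1, 0.2, 0.7, 0.05]
print(artifact_free_rate(probs_after_removal))       # 0.75
print(artifact_detection_rate(probs_after_removal))  # 0.25
```

Reporting either quantity under an unambiguous name avoids the "accuracy goes up" versus "detections go down" confusion.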
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Questions and comments:
How is the generator G_a conditioned on mask? Is the mask an extra input channel?
“This mask is derived during the forward cycle from the difference between the input and the output of the artifact-removal generator $G_f$, calculated as $U(\max(X_a - \hat{X}_{af}, 0), \theta)$, with $U(\cdot)$ representing the unit (binary) step function and $\theta$ a predefined threshold value.” Why is the difference defined as $\max(X_a - \hat{X}_{af}, 0)$ rather than the absolute value $|X_a - \hat{X}_{af}|$?
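The quoted mask definition can be sketched in NumPy as below, alongside the absolute-difference variant raised in the question. With the one-sided version, only pixels that $G_f$ darkened (bright patterns it removed) enter the mask; the absolute version would also flag pixels that $G_f$ brightened. The toy arrays and threshold are illustrative assumptions, and the step is taken as "1 where the value is at least $\theta$".

```python
import numpy as np

def artifact_mask(x_a, x_af_hat, theta):
    """U(max(x_a - x_af_hat, 0), theta): 1 where G_f removed a bright pattern."""
    positive_diff = np.maximum(x_a - x_af_hat, 0.0)   # zero-fill negative parts
    return (positive_diff >= theta).astype(np.uint8)  # unit step at theta

def abs_diff_mask(x_a, x_af_hat, theta):
    """Alternative raised by the reviewer: any change, in either direction."""
    return (np.abs(x_a - x_af_hat) >= theta).astype(np.uint8)

x_a = np.array([[0.9, 0.5], [0.2, 0.5]])       # input with a bright spot at (0, 0)
x_af_hat = np.array([[0.4, 0.5], [0.7, 0.5]])  # output: spot removed, (1, 0) brightened
print(artifact_mask(x_a, x_af_hat, theta=0.3).tolist())  # [[1, 0], [0, 0]]
print(abs_diff_mask(x_a, x_af_hat, theta=0.3).tolist())  # [[1, 0], [1, 0]]
```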
Section 2.2 says the artifact mask uses a predefined threshold $\theta$, but Fig. 1 shows Otsu thresholding, which is adaptive thresholding. This is a bit confusing.
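Since the rebuttal states that θ was chosen adaptively by Otsu's method, here is a minimal from-scratch NumPy sketch of Otsu thresholding, which picks the histogram split that maximizes between-class variance. In practice a library routine such as `skimage.filters.threshold_otsu` would typically be used; the bin count and the synthetic difference values are assumptions.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Return the histogram split that maximizes between-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    counts = hist.astype(float)

    # Class 0 = bins 0..k, class 1 = bins k+1..end, for every candidate split k.
    w0 = np.cumsum(counts)
    w1 = w0[-1] - w0
    sum0 = np.cumsum(counts * centers)
    sum1 = sum0[-1] - sum0

    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0
        mu1 = sum1 / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
    between_var = np.nan_to_num(between_var)  # an empty class contributes nothing
    k = int(np.argmax(between_var))
    return float(edges[k + 1])  # upper edge of the last bin in class 0

rng = np.random.default_rng(0)
diff = np.clip(np.concatenate([
    rng.normal(0.05, 0.02, 900),  # small changes, not considered artifacts
    rng.normal(0.60, 0.05, 100),  # large positive discrepancies
]), 0.0, None)
theta = otsu_threshold(diff)
mask = diff >= theta
print(theta, int(mask.sum()))
```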
Eq. 5 seems to have a slight error. The classifier takes the images produced by artifact-removal generator as input. However, in Eq. 5, the classifier input is real artifact-free/corrupted images, not related to the generator.
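The corrected form y = C_e(G_f(X)) means the edema classifier scores the generator's output, so the preservation term penalizes G_f when it erases edema. A minimal sketch, assuming a binary cross-entropy loss and hypothetical stand-ins for G_f and C_e (`keeps_edema`, `erases_edema`, `c_e`); the paper's exact Eq. 5 may differ.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def edema_preservation_loss(g_f, c_e, x, edema_labels):
    """Score G_f's output, i.e. y = C_e(G_f(X)), rather than the raw input C_e(X)."""
    return bce(c_e(g_f(x)), edema_labels)

# Hypothetical stand-ins: C_e reads "edema" from each image's mean brightness;
# one generator preserves the pattern, the other wipes it out.
def c_e(batch):
    return batch.mean(axis=(1, 2))

def keeps_edema(batch):
    return batch

def erases_edema(batch):
    return np.zeros_like(batch)

x = np.stack([np.full((8, 8), 0.9), np.full((8, 8), 0.1)])  # edema, no edema
labels = np.array([1.0, 0.0])
print(edema_preservation_loss(keeps_edema, c_e, x, labels))   # small
print(edema_preservation_loss(erases_edema, c_e, x, labels))  # large
```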
Some sentences read incorrectly. For example, “The preservation of edema was assessed using a pre-trained disease classifier obtained by processing artifact-corrupted images through each model, were inputted into the classifier to evaluate diagnoistic scores.”
“It is important to note that the filtering resulted in the edema classifier, which evaluated the fitered images, displaying significantly improved diagnostic scores over the classifier that evaluated unfiltered images.”
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although the paper presents an interesting approach clearly, the validation is not convincing enough and somewhat confusing.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Weak Accept — could be accepted, dependent on rebuttal (4)
- [Post rebuttal] Please justify your decision
The authors have cleared most of my concerns, provided they implement the clarifications and modifications promised in the rebuttal. I am still concerned by the lack of evaluation against ground-truth artifact-free images, which explains my score of 4.
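The ground-truth evaluation suggested here could be sketched as follows: corrupt an artifact-free image with a synthetic artifact, then score each method's output against the clean original with a reference metric such as PSNR. The disk-shaped artifact, its amplitude, and the helper names are illustrative assumptions; as the authors point out, synthetic artifacts may not match real DECT patterns.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

def add_synthetic_blob(image, center, radius, amplitude):
    """Add a bright disk, a crude stand-in for a bright artifact inside bone."""
    yy, xx = np.mgrid[: image.shape[0], : image.shape[1]]
    disk = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    corrupted = image.copy()
    corrupted[disk] += amplitude
    return np.clip(corrupted, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.6, size=(64, 64))
corrupted = add_synthetic_blob(clean, center=(32, 32), radius=6, amplitude=0.3)

# A method would be scored as psnr(clean, method(corrupted)); here we only
# compare "no correction" with a perfect restoration.
print(psnr(clean, corrupted))
print(psnr(clean, clean))  # inf
```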
Author Feedback
We appreciate the thoughtful analysis provided by the reviewers (R1, R3, R4). Below, we address their main concerns regarding our paper point by point.
Mask Generation Description (R1, R4): We acknowledge that our description of the mask generation process was condensed. We will provide a more detailed explanation of each step and add appropriate references in the revised version. Specifically, the artifact mask was generated by computing the difference between the input and output of G_f, zero-filling negative parts to highlight positive discrepancies as artifacts, and applying Otsu thresholding [N. Otsu et al., 1975] to exclude minor changes that are not considered artifacts. The bone mask was produced using a standard graph-cut technique [Y. Boykov et al., 2006] to delineate the bone area, and the intersection of both masks was used as an additional input channel for G_a.

Comparisons with Recent Methods (R1): We carefully selected models based on the unsupervised GAN framework for attribute editing tasks. Although suggested models such as U-Net GAN and Res-UNet GAN are more recent, they differ fundamentally from our study’s approach of manipulating images based on user-specified attributes.

Loss Weight Determination (R1, R3): The preservation of the outline and internal structure of the bones is crucial for our generated images. We evaluated 100 randomly selected images from the training set, choosing final weights that produced the most realistic images while maintaining bone structure and edema patterns. We provided quantitative evaluations for various weight configurations in Supplementary Table 1. Further details will be given in the revised manuscript.

Target Artifacts and the Purpose of MRI (R3): Our focus is on unique artifacts that appear in DECT but not in standard CT, resulting from errors in extracting spectral information. These artifacts often resemble edema patterns, hence the use of MRI taken concurrently with DECT to meticulously annotate these artifacts. Our primary interest is not in streaking artifacts, which are irrelevant for bone marrow edema (BME) detection. The red box in Fig. 2 highlights the artifacts we aim to remove. These artifacts can also occur in soft tissue, but we focused on those within bone to aid BME detection.

Artifacts Overlapping Disease Areas (R3): Preserving the edema pattern is our top priority. Where artifacts and edema patterns overlap, we treat those regions as edema areas.

Risk of Deleting Unknown Patterns (R3): Our model incorporates identity mappings, highlighted in yellow boxes in Fig. 1, to prevent excessive alteration or removal of important patterns, particularly edema.

Evaluation with Simulated Results (R4): Thank you for the suggestion. We considered using simulated artifacts for evaluation. However, if the artifacts are artificially generated and deviate significantly from real patterns, the evaluation may not be reliable for real-world applications. Simulation could nevertheless be beneficial for comparative analysis with other deep learning models.

Artifact Detection Accuracy (R4): The metric “artifact elimination” is graphed in Fig. 3 alongside the edema detection rate. This name was chosen to emphasize that a higher value indicates better performance. We will replace “artifact elimination” with “artifact-free detection” in the revision to avoid confusion.

Generator G_a Conditioned on the Mask (R4): Indeed, G_a received the mask together with X_f as a two-channel input.

U(·) Function (R4): The target artifacts are bright patterns within the bone that may resemble pathological changes. Thus, we regarded only positive discrepancies (X_a - X_af) as artifacts.

Otsu Thresholding (R4): We neglected to specify that the threshold θ was determined adaptively by Otsu’s method. This will be clarified in the revision.

Eq. 5 Error (R4): We will correct the expression ‘y=C_e(X)’ to ‘y=C_e(G_f(X))’ in the revised manuscript.

Incorrect Sentences (R4): We will make the necessary corrections. Thanks.
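Following the rebuttal's description, the conditioning input for G_a can be sketched as: intersect the artifact mask with the bone mask, then stack the result with X_f as a second channel. Both masks are taken as given here (the graph-cut bone segmentation is not reimplemented), and the channel-first layout and channel order are assumptions.

```python
import numpy as np

def g_a_input(x_f, artifact_mask, bone_mask):
    """Build a 2-channel (C, H, W) input for G_a: [artifact-free image, mask].

    The conditioning mask keeps only artifact pixels that lie inside bone,
    so G_a is asked to synthesize artifacts within bone regions.
    """
    condition = np.logical_and(artifact_mask, bone_mask).astype(x_f.dtype)
    return np.stack([x_f, condition], axis=0)

x_f = np.random.default_rng(0).uniform(size=(4, 4)).astype(np.float32)
artifact = np.zeros((4, 4), dtype=bool)
artifact[0:2, 0:3] = True  # hypothetical artifact region
bone = np.zeros((4, 4), dtype=bool)
bone[:, 1:] = True         # hypothetical bone region
inp = g_a_input(x_f, artifact, bone)
print(inp.shape)          # (2, 4, 4)
print(int(inp[1].sum()))  # 4
```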
Meta-Review
Meta-review #1
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
N/A