Abstract
Chest X-ray diagnosis models face domain generalization challenges due to cross-institutional variations in imaging protocols and scanner specifications, which degrade diagnostic accuracy on unseen domains. To address this, we propose a domain-invariant learning framework that leverages the inherent anatomical consistency of medical imaging. Our method first applies a Neighborhood-Consistent Binary Transformation (NCBT) to convert grayscale images into topology-preserving high-dimensional binary tensors, encoding pixel intensity relationships within local neighborhoods to strip device-specific textures while retaining anatomical structures. These tensors are then reconstructed into an intermediate domain via an Intermediate Domain Style-preserving Autoencoder (IDSP-AE), decoupling structural information from domain-specific features. Crucially, our framework aligns domains without requiring any target domain data during training. Experiments on four public datasets show superior generalization and improved diagnostic accuracy compared to state-of-the-art methods. The source code is available at https://github.com/LZL501/NCBT.
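For readers skimming the pipeline, here is a minimal inference-time sketch of the plug-and-play flow described above. All function and model names are placeholders rather than the authors' released API; the exact NCBT and IDSP-AE implementations are in the linked repository.

```python
import torch

def diagnose(x_gray, ncbt, idsp_ae, classifier, threshold=0.5):
    """Plug-and-play inference flow sketched from the abstract (placeholder names).

    x_gray:     a grayscale chest X-ray from any (possibly unseen) domain
    ncbt:       callable mapping the image to a high-dimensional binary tensor
    idsp_ae:    autoencoder frozen after training on the intermediate domain
    classifier: the unchanged downstream diagnosis model
    """
    with torch.no_grad():
        binary = ncbt(x_gray)        # strip scanner-specific style, keep ordinal anatomy
        neutral = idsp_ae(binary)    # re-render in the intermediate-domain appearance
        logits = classifier(neutral) # diagnose on the harmonized image
    return torch.sigmoid(logits) > threshold
```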
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0720_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{LiuZel_NeighborhoodConsistent_MICCAI2025,
author = { Liu, Zelong and Zhu, Huachao and Sun, Zhichao and Zou, Yuda and Gu, Yuliang and Du, Bo and Xu, Yongchao},
title = { { Neighborhood-Consistent Binary Transformation for Domain-Invariant Chest X-ray Diagnosis } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15964},
month = {September},
pages = {467--477}
}
Reviews
Review #1
- Please describe the contribution of the paper
A two-part plug-and-play (PNP) module for running ML models across domains on planar X-ray images is proposed. The first part transforms planar X-ray images into a binarised high-dimensional representation intended to suppress device-specific features. The second part is an autoencoder trained to reproduce images of a specific source domain from this new representation. The authors compare their approach to six similar PNP modules from the literature on four public datasets from unseen test domains, covering two different tasks. Overall, the authors demonstrate competitive performance, generally surpassing the other approaches. The complete approach is validated through ablation studies on the individual components of the proposed methodology.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- To the best of my knowledge, the way the authors re-encode X-ray images into high-dimensional binary tensors is novel.
- The authors appear to compare their proposed technique to recently published methods that are intended for the same purpose. They generally surpass these other approaches.
- The paper is well organised, with detailed methodology and well-designed experiments. This, in addition to the use of public datasets and the promise to release source code, makes the paper highly reproducible.
- The simplicity and plug-and-play nature of their approach make it relatively easy to integrate into new frameworks and applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors introduce a major hyperparameter, N (neighbourhood size), that is key to their methodology. They demonstrate that performance increases as N increases. However, it is only evaluated up to N=15, and this value is used in the other experiments without exploring N>15. It is understood that increasing N comes at a computational cost; however, the current choice appears to be an arbitrary cutoff.
- The authors overstate claims such as “Our method first applies a […] to strip device-specific textures while retaining anatomical structures” in the abstract and “This approach eliminates device-specific stylistic variations” in the conclusions. It is understood that this is the intended effect, but it is not known whether the method actually achieves it. In fact, Fig. 1 shows that device and anatomical labels are retained.
- In Table 2, +MixStyle achieves the best accuracy on the CheXpert dataset, not the proposed method, which the authors indicated as best with bold formatting. I would suggest adapting the claim “show superior generalization and improved diagnostic accuracy compared to state-of-the-art methods” in response.
- In Table 1, the numerical results are very close; the authors should consider displaying them at a higher level of precision. In particular, the F1 scores of +MixStyle and the proposed method appear identical at the displayed precision.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- I suggest the authors include practical runtime or memory demands, or even theoretical measures of these, alongside the performance gains (Table 3) to better contextualise the practical implications of increasing N.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method is a novel and simple plug-and-play module that has demonstrated compelling performance against similar methods. My main reservation concerns the limited experimentation with N, the main hyperparameter introduced by the proposed method.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes a lightweight, plug-in pipeline to make chest X-ray classifiers more robust against cross-hospital distribution shifts in unseen target data. First, it introduces the Neighborhood-Consistent Binary Transformation (NCBT): every pixel is compared with its N×N neighbours and the outcomes are written to separate binary channels, so only ordinal intensity patterns (stable across scanners) are preserved while device-specific brightness/contrast is discarded. Second, a style-neutral autoencoder (IDSP-AE), trained once on a large independent public dataset, reconstructs these binary tensors into images that share one standard appearance. Plugged into the CheXzero model, this two-step pre-processing improves AUC, accuracy and F1 on four public test datasets and beats six recent domain-generalisation baselines.
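As a reading aid, here is a minimal NumPy sketch of the neighbourhood comparison described above. The comparison direction, padding scheme, and function name are assumptions; the authors' actual implementation may differ.

```python
import numpy as np

def ncbt(image: np.ndarray, n: int = 15) -> np.ndarray:
    """Neighborhood-consistent binary transformation (illustrative sketch).

    For every pixel, compare it with each of the N*N - 1 other positions in
    its N x N neighborhood and store the binary outcome in a separate
    channel, so only the local intensity ordering is kept.
    """
    assert n % 2 == 1, "neighborhood size is assumed to be odd"
    h, w = image.shape
    r = n // 2
    padded = np.pad(image, r, mode="edge")  # padding choice is an assumption
    channels = []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = padded[r + dy : r + dy + h, r + dx : r + dx + w]
            channels.append((image > neighbor).astype(np.uint8))
    return np.stack(channels, axis=0)  # shape: (n*n - 1, H, W)

# toy usage: N = 15 yields a 224-channel binary tensor
x = np.random.rand(224, 224).astype(np.float32)
binary_tensor = ncbt(x, n=15)
print(binary_tensor.shape)  # (224, 224, 224)
```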
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Key strengths: 1) Introduction of NCBT: by storing only each pixel’s binary order relative to its N×N neighbours, NCBT keeps anatomy-stable ordinal relationships and deletes scanner-specific gray-level style.
2) Introduction of IDSP-AE to exploit large amounts of external data: a U-Net trained once on a separate ChestX-ray set learns a neutral CXR style and is then frozen to map all images, source or unseen target, into the same appearance, aligning domains without adversarial losses or target images.
3) Meticulous experimental setup: across four public datasets and both binary and multi-label tasks, the method beats six recent DG baselines.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Key limitations: 1) Limited methodological novelty: NCBT is a dense local binary pattern; similar local intensity order encodings already appear in Shi et al., IEEE Transactions on Image Processing 2022 (reference 14 in the paper). The paper does not sufficiently explain how the proposed method differs from that of Shi et al. and what its advantages are.
2) Auto-encoder normalisation already explored: style-neutral U-Nets for cross-domain X-ray transfer were used by Zhang, Y., Miao, S., Mansi, T., & Liao, R. “Task Driven Generative Modeling for Unsupervised Domain Adaptation: Application to X-ray Image Segmentation.” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science, vol. 11070, pp. 599-607, Springer, 2018 and by Sanchez, K., Hinojosa, C., Arguello, H., Kouamé, D., Meyrignac, O., & Basarab, A. “CX-DaGAN: Domain Adaptation for Pneumonia Diagnosis on a Small Chest X-ray Dataset.” IEEE Transactions on Medical Imaging, vol. 41, no. 11, pp. 3278-3288, Nov. 2022. All three approaches learn a generative module that rewrites the input image into a common style domain to reduce domain shift.
3) Image quality is untested: binarising intensities may erase faint lesions, yet no visual examples, lesion-level metrics, or reader study confirms diagnostic fidelity.
4) Unknown computational cost: N=15 yields 224 channels plus a full U-Net pass, but runtime/memory figures are not provided.
5) Dependence on a single “neutral” dataset: IDSP-AE is trained only on ChestX-ray8; potential bias or failure when that style differs from vendor-specific images is not analysed.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I would like to see the concerns expressed in the weaknesses section addressed.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper introduces a novel, plug-and-play domain generalization framework for chest X-ray diagnosis that operates without requiring target domain data. Its core contribution is the Neighborhood-Consistent Binary Transformation (NCBT), which converts grayscale images into multi-channel binary tensors by encoding local ordinal intensity relationships. This transformation discards domain-specific features such as scanner-dependent brightness and contrast while preserving anatomical structure. To restore image interpretability, the authors propose an Intermediate Domain Style-Preserving Autoencoder (IDSP-AE) that reconstructs NCBT representations into a harmonized, style-neutral visual space. Together, these components enable improved generalization across unseen domains without modifying downstream diagnostic models.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The illustrations in Fig. 1 and Fig. 2 are helpful. The idea of such an image transformation is clever, and it is intuitive that it could effectively discard many domain-specific features from the images.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Missing Ablation: In Table 4, the combination with NCBT and without IDSP-AE is not included. This setting is important to isolate the individual contribution of IDSP-AE given that NCBT is enabled. Its omission makes it difficult to fully assess the additive effect of IDSP-AE in the presence of NCBT. Please include this ablation for completeness.
- Table 3 shows performance consistently improving as the neighborhood size N increases up to 15, with no sign of saturation. Yet the authors stop at N=15 without exploring larger sizes. Although computational cost is cited as a limitation, no quantitative analysis (e.g., runtime or memory vs. performance) is provided to justify this choice. A trade-off analysis would better support N=15 as a practical optimum; otherwise, the selection appears arbitrary.
- NCBT removes domain-specific features by encoding local ordinal intensity, while IDSP-AE reconstructs images using an MSE loss against the original input, which still contains style artifacts. This creates a contradiction: minimizing the MSE may reintroduce the very domain-specific features NCBT aims to eliminate, potentially undermining domain invariance. See further explanation in the additional comments.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The proposed NCBT aims to remove domain-specific features (e.g., scanner-induced contrast/brightness artifacts) by converting input images into domain-invariant binary tensors based on local ordinal intensity comparisons. However, the reconstruction step via IDSP-AE is trained using a mean squared error (MSE) loss between the original input image (which contains domain-sensitive style) and the reconstructed image (intended to be style-neutral).
This introduces a conceptual inconsistency: minimizing the MSE loss directly encourages the autoencoder to reproduce domain-specific visual characteristics, precisely what NCBT tries to remove. If the reconstruction target retains scanner-specific textures, the model may learn to reintroduce those features, potentially undermining the goal of domain invariance.
- NCBT discards absolute pixel intensities and retains only local ordinal relationships to enhance domain invariance. However, this may remove clinically important cues. Certain diagnoses—such as pleural effusion, pneumothorax, or subtle infiltrates—depend on absolute attenuation values rather than local contrast. This raises an important question: which disease types are most affected by this loss? The paper reports only average performance across multiple conditions; a per-disease analysis would provide valuable insight into the clinical trade-offs of NCBT.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed NCBT is a clever and well-motivated transformation for improving domain generalization, with helpful illustrations and promising results. However, key ablations are missing, the reconstruction loss introduces conceptual inconsistency, and potential loss of clinically important intensity cues is not addressed. Despite these concerns, the idea is novel and impactful enough to warrant a weak accept.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
All three reviewers gave positive scores (4, 4, 4). We thank them for their constructive comments and suggestions. Per the rebuttal policy, we cannot provide additional experimental results. We respond to their concerns below.
The choice of N (R1, R4, R5) As stated in the paper, increasing N raises both the parameter count and the computational cost due to the O(N²) complexity of NCBT. We observed a 33% (resp. 29%) increase in reconstruction runtime (resp. memory) from N=3 to N=15, and a further 25% (12%) increase from N=15 to N=17. Larger N may yield better results; we simply set N=15 as a trade-off between performance and computational cost. It is noteworthy that the added reconstruction U-Net introduces negligible overhead compared to the CheXzero baseline.
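For context on this trade-off, the channel count grows quadratically with N. A back-of-the-envelope sketch follows; it assumes a 224×224 input and one byte per binary value, which are assumptions rather than figures reported in the paper (a bit-packed implementation would be about 8× smaller).

```python
# Rough growth of the NCBT representation with neighborhood size N.
H = W = 224
for n in (3, 7, 11, 15, 17):
    channels = n * n - 1            # one binary channel per non-center offset
    mib = channels * H * W / 2**20  # per-image tensor size in MiB at 1 byte/value
    print(f"N={n:2d}: {channels:3d} channels, ~{mib:5.1f} MiB per image")
```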
Preservation of device/anatomy labels in Fig. 1 (R1) The proposed NCBT aims to alleviate domain-specific style features (e.g., scanner-induced contrast/brightness). Some domain-specific content such as device anatomical labels may be retained. We shall revise “eliminate” to “alleviate” in the final version for academic rigor.
Bolding error in Table 2 (R1) We thank R1 for pointing this out and will correct the error in the final version.
Numerical precision in Table 1 (R1) Good suggestion! We will increase the numerical precision in the final version.
Comparison with LIOT (R4) While LIOT also encodes local intensity order, our proposed NCBT differs in key aspects. NCBT captures intensity order within an N×N neighborhood, whereas LIOT is designed for line-like structures using four directional encodings, making it less suitable for chest X-ray diagnosis. Moreover, we introduce IDSP-AE to reconstruct an RGB image in the intermediate domain from the high-dimensional NCBT representation. In contrast, LIOT feeds the transformed features directly into the segmentation network, which may lead to information loss and reduced interpretability.
Comparison with existing autoencoder-based normalization methods (R4) Thank you for pointing out these related works. While both works also use generative models for style normalization, our approach differs fundamentally: (1) We use a binarized invariant anatomical encoding (NCBT) as input instead of raw images; (2) Our IDSP-AE is trained only once on a disjoint intermediate dataset, which remains unseen during diagnosis model training and evaluation. In contrast, the suggested domain adaptation methods require re-training for each unseen target domain.
Potential loss of clinical cues due to NCBT (R4, R5) Excellent suggestion! Since NCBT captures local intensity order changes, lesion characteristics are still well preserved, even for faint lesions. Lesion-level metrics or a reader study confirming diagnostic fidelity would strengthen the paper. However, since the involved datasets only provide image-level labels without lesion annotations, we did not provide lesion-level metrics. Due to the limited space in the submission, we did not analyze each disease category in detail; this will be included in the final version.
Limited diversity of the intermediate domain dataset (R4) Great question! This indeed may have an impact on performance. Using a larger and more diverse ‘neutral’ dataset should mitigate this issue.
Missing ablation (R5) Our goal is to suppress domain-specific styles using NCBT while maintaining image interpretability. Using NCBT alone may be a good alternative. Yet, it lacks visual-level interpretability. Besides, directly applying the high-dimensional NCBT features to downstream tasks may also cause detail loss and suboptimal performance.
Confusion about MSE loss (R5) It is true that minimizing MSE loss encourages the autoencoder to reproduce domain-specific visual features. In our case, the reproduced features are from the intermediate domain. NCBT removes source/target-specific styles. MSE loss guides IDSP-AE to generate images with intermediate-domain characteristics, thereby reducing the domain gap.
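To make the intended role of the reconstruction loss concrete, here is a hedged PyTorch sketch of the training step this response describes. The autoencoder, optimizer, and ncbt_fn are placeholders; only the loss wiring follows the rebuttal. Because IDSP-AE sees only intermediate-domain images during training, the MSE target always carries the intermediate-domain style rather than source- or target-scanner style.

```python
import torch
import torch.nn.functional as F

def idsp_ae_training_step(autoencoder, optimizer, x_intermediate, ncbt_fn):
    """One training step of the style-preserving autoencoder (sketch).

    x_intermediate: a batch of images drawn only from the intermediate
    ('neutral') dataset. The MSE target is therefore always an
    intermediate-domain image, so reproducing its style is the goal,
    not a leak of source- or target-scanner style.
    """
    binary = ncbt_fn(x_intermediate)          # style-free ordinal encoding
    recon = autoencoder(binary)               # reconstruction in the intermediate style
    loss = F.mse_loss(recon, x_intermediate)  # intermediate-domain MSE target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At diagnosis time the autoencoder is frozen: any source/target image is
# first mapped through ncbt_fn and then reconstructed, so its appearance is
# pulled toward the intermediate domain before the classifier sees it.
```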
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A