Abstract

While the availability of open 3D medical shape datasets is increasing, offering substantial benefits to the research community, we have found that many of these datasets are, unfortunately, disorganized and contain artifacts. These issues limit the development and training of robust models, particularly for accurate 3D reconstruction tasks.
In this paper, we examine the current state of available 3D liver shape datasets and propose a solution using diffusion models combined with implicit neural representations (INRs) to augment and expand existing datasets.
Our approach utilizes the generative capabilities of diffusion models to create realistic, diverse 3D liver shapes, capturing a wide range of anatomical variations and addressing the problem of data scarcity.
Experimental results indicate that our method enhances dataset diversity, providing a scalable solution to improve the accuracy and reliability of 3D liver reconstruction and generation in medical applications. Finally, we suggest that diffusion models can also be applied to other downstream tasks in 3D medical imaging. Our code is available at \url{https://github.com/Khoa-NT/hyperdiffusion_liver}.
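
For concreteness, the sketch below illustrates the kind of pipeline the abstract describes, assuming a HyperDiffusion-style setup in which a diffusion model generates flattened MLP weights for a per-shape occupancy INR; the weights are loaded into the MLP, the field is evaluated on a grid, and a mesh is extracted with marching cubes. The class and function names are illustrative and do not correspond to the released repository; the MLP dimensions follow the author feedback further down this page.

```python
import math

import torch
import torch.nn as nn
from skimage.measure import marching_cubes


class OccupancyMLP(nn.Module):
    """Per-instance occupancy network: 3D point -> occupancy logit.

    Dimensions follow the author feedback below: 4-band positional encoding
    (input dim 3 + 2 * 4 * 3 = 27), three hidden layers of width 128, and a
    scalar output, i.e. 36,737 parameters in total.
    """

    def __init__(self, num_bands: int = 4, hidden: int = 128):
        super().__init__()
        self.num_bands = num_bands
        in_dim = 3 + 2 * num_bands * 3
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3) in [-1, 1]; append sin/cos features at 4 frequencies.
        freqs = (2.0 ** torch.arange(self.num_bands, device=x.device)) * math.pi
        angles = x.unsqueeze(-1) * freqs                        # (N, 3, 4)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)   # (N, 3, 8)
        return torch.cat([x, enc.flatten(start_dim=1)], dim=-1) # (N, 27)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(self.encode(x))


def mesh_from_weights(flat_weights: torch.Tensor, resolution: int = 128):
    """Load a flattened weight vector into the MLP and extract a surface mesh."""
    mlp = OccupancyMLP()
    torch.nn.utils.vector_to_parameters(flat_weights, mlp.parameters())
    lin = torch.linspace(-1.0, 1.0, resolution)
    grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
    with torch.no_grad():
        occ = torch.sigmoid(mlp(grid.reshape(-1, 3)))
    occ = occ.reshape(resolution, resolution, resolution).numpy()
    # Marching cubes at the 0.5 iso-level turns the implicit field into a mesh.
    verts, faces, _, _ = marching_cubes(occ, level=0.5)
    return verts, faces


if __name__ == "__main__":
    # In the actual pipeline, `flat` would be a sample drawn from the trained
    # diffusion model; a freshly initialised MLP is used here as a stand-in.
    flat = torch.nn.utils.parameters_to_vector(OccupancyMLP().parameters())
    print("parameter count:", flat.numel())  # 36737
    # mesh_from_weights(flat) would then convert such a sample into a mesh.
```
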

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2124_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/2124_supp.zip

Link to the Code Repository

https://github.com/Khoa-NT/hyperdiffusion_liver

Link to the Dataset(s)

https://github.com/Khoa-NT/hyperdiffusion_liver

BibTex

@InProceedings{NguKho_Boosting_MICCAI2025,
        author = { Nguyen, Khoa Tuan and Tozzi, Francesca and Willaert, Wouter and Vankerschaver, Joris and Rashidian, Niki and De Neve, Wesley},
        title = { { Boosting 3D Liver Shape Datasets with Diffusion Models and Implicit Neural Representations } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {67 -- 77}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors highlight the problem of unusable data in 3D shape datasets of the liver. To address the issue, the authors propose a diffusion model that synthesises new shapes by generating the neural network parameters of an INR, in the style of a hypernetwork. They validate the synthesised shapes with the help of medical experts.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • potentially useful solution to a real problem in machine learning research for 3D models of the liver
    • expert data analysis and cleaning on the TotalSeg dataset
    • the proposed generative model is potentially useful
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors do not mention whether they plan to publish the analysed and cleaned dataset. The proposed generative model is potentially useful, but the manuscript reads more like a project report than a scientific paper. The authors are encouraged to add proper motivation, beyond the (very relatable) statement of their practical problem, and to place the work in the broader context of biomedical research, e.g., foundation models or structural mechanics. More evaluation of the assessment by the surgical team is necessary, e.g., were there disagreements between individuals, and what was the number of false positives and negatives? Furthermore, was the method able to faithfully recover the unusable samples, or is it simply generating plausible new samples? The former seems a lot more valuable to me in the given context.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Introduction: You state that “INRs lack generalizability because MLPs tend to suffer from overfitting to individual objects” when representing individual objects is the whole purpose of an INR. Is this in contrast to conditional INRs?
    • 2 Dataset analysis: What does it mean that “these datasets generate 3D objects based on 3D organ segmentations from CT scans, with the most common source being the TotalSegmentator dataset”? What is the function of the source here?
    • 3.1 Instance liver MLP:
      • What is the difference between $\mathbf{x}$ and $x$?
      • Since neural fields are typically smooth, does the minimiser to the loss function resemble a signed distance field?
      • How many parameters do the MLPs have?
    • Table 1: This table does not provide much information. Are the shown values the average among all fitted objects? Consider adding the standard deviation across objects.
    • 4 Experiments:
      • Please clarify what it means if “synthesized 3D objects match the existing real 3D liver objects”. Which synthesised object matches which real object?
      • How many experts did the study include? What was their level of experience? Did they all agree on every sample or were there differences? Please give more background.
      • Were “all [liver objects] classified as real” or were “some cases […] labeled ‘Fake’”? These sentences seem to contradict each other.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • the manuscript needs some more attention to be up to standard (see above)
    • more evaluation of the assessment of the surgical team is necessary
    • with a bit more polishing, the work could be quite impactful
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper analyses existing datasets of 3D liver shapes and identifies key limitations: some of the data are poorly annotated and some of the shapes are incomplete. To extend the dataset the authors propose using a HyperDiffusion model trained on implicit neural representations of 3D liver shapes from the existing dataset to generate synthetic samples.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Presentation: The paper is well written and clearly structured, making the contributions easy to follow. Figures are of high quality and effectively support the narrative.
    • Insightful Analysis: The authors highlight an important issue: the inadequacy of existing datasets for training deep learning models on shape-related tasks. They provide a useful analysis of current 3D liver shape datasets, which helps to contextualize the need for improved or augmented data in this domain.
    • Method: While the method itself is not novel, it builds on state-of-the-art techniques in 3D shape synthesis using implicit neural representations. The design choices are sound and appropriate for the problem at hand, making this a solid and well-executed application study.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Motivation
    a. The paper’s main motivation is that the existing datasets are too small. The authors mention 3D shape reconstruction as an example task. A more detailed analysis of the tasks for which the dataset is insufficient would strengthen the argument.
    b. The dataset analysis reveals that only about 3% of the data are unusable. While the “no full shape” category may also be unsuitable for certain tasks, this point is not clearly explained in the paper and should be addressed explicitly.

    Evaluation
    a. While the paper emphasizes data scarcity and presents dataset augmentation as a key contribution, it lacks empirical validation that the synthesized data are useful. For instance, demonstrating improvements in a downstream task such as shape completion or 3D reconstruction would make the case more convincing.
    b. The primary validation relies on qualitative analysis (visual realism) of the generated shapes. However, the paper also claims increased data variability in the abstract, which is not supported by quantitative analysis. A statistical evaluation of the diversity and distribution of the synthesized data would strengthen the contribution.

    Miscellaneous (Minor): The authors state that INRs lack generalizability “because MLPs tend to suffer from overfitting to individual objects.” While this is a valid concern, prior work—such as DeepSDF [1]—has shown that this limitation can be mitigated by learning a latent space of shapes. It would be helpful for the authors to acknowledge such approaches and clarify how their method relates to or differs from them.

    [1] Park, J.J., Florence, P., Straub, J., Newcombe, R. and Lovegrove, S., 2019. DeepSDF: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 165-174).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    It is not clear if the analyzed data are going to be published. Could you please clarify?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a solid analysis of existing 3D liver shape datasets and proposes a method to mitigate data scarcity using a generative approach. Since limited data is a key barrier to the development and application of large neural networks in the medical domain, this work represents a valuable step toward advancing research in this area. The approach may be of particular interest to the medical imaging and shape reconstruction communities. While the paper is well written and the dataset analysis is insightful, it would benefit from a more thorough evaluation of the synthesized data. In particular, the usefulness of the generated samples is not sufficiently justified—empirical validation on downstream tasks is missing. Additionally, a more rigorous statistical analysis of the synthetic data would strengthen the claims regarding increased data variability and quality.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    Summary of Paper:

    • The authors propose a method to augment liver datasets using newly generated livers.
    • For this, they propose using a diffusion process to produce the weights (hyper-network) of an MLP that represents livers implicitly.
    • The final liver model can then be obtained from the implicit representation.
    • The results indicate that the generated livers look realistic.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths:

    • The authors make it very clear what the goal of the paper is (generate liver models).
    • I am a pretty big fan of Fig. 2. It captures the essence of the method very nicely!
    • The paper was very easy to follow and I felt the language was very clear. Overall, I felt it was very digestible, well done.
    • I recognise the importance of generating realistic organ models for many downstream tasks and appreciate that this is getting attention in the medical field.
    • The evaluation using real medical staff is valuable.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Weaknesses:

    • I find the contribution limited. Ultimately, it feels like the authors applied existing methods [4, 24] to generate liver models (instead of other structures). This is also evident in the “Our contributions” part, which feels a bit limited.
    • I understand that finding baselines is difficult in the field. I also understand that we may not ask for major changes, but I would like to take the opportunity to provide ideas which may be helpful beyond this MICCAI submission. The authors could implement baselines for implicit representations, for example using an autodecoder (like DeepSDF) and simply searching the latent space (e.g., by wiggling the principal components) to generate more livers (a minimal sketch follows this list).
    • The authors missed the opportunity to compare different MLPs in Table 1. It would have been good to see that differently sized MLPs, position encoders and losses were considered.
    • I believe the authors should discuss their limitations further; currently, they only propose what the next steps are. For example: the training is based on a single dataset.
    • The authors do not evaluate how dissimilar the generated livers are from the training data. For example, the authors should consider the average difference between meshes in the training dataset and then the differences to the generated meshes. Right now, the model could simply be outputting the training dataset and we are not able to tell.
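
    For the autodecoder baseline suggested above, a minimal sketch of “wiggling the principal components” of a learned latent space could look as follows. The latent codes here are random stand-ins; in practice they would be the per-shape codes optimised by a DeepSDF-style autodecoder, and each proposed code would be decoded into a new liver shape.

```python
import torch

# Stand-in for per-shape latent codes learned by a DeepSDF-style autodecoder
# (one 256-D code per training liver); random values here, for illustration only.
codes = torch.randn(100, 256)

# PCA of the latent space via low-rank SVD (codes are centred internally).
U, S, V = torch.pca_lowrank(codes, q=8)
mean = codes.mean(dim=0)
std = S / (codes.shape[0] - 1) ** 0.5  # approx. std. dev. along each component

# Perturb ("wiggle") the mean code along the leading principal components;
# a trained decoder would then map each proposed code to a new liver shape.
proposals = torch.stack([mean + alpha * std[i] * V[:, i]
                         for i in range(3) for alpha in (-2.0, -1.0, 1.0, 2.0)])
print(proposals.shape)  # torch.Size([12, 256])
```
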
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Detailed:

    • The conclusion should be in the present tense
    • In the introduction-INRs section, I don’t think it is fair to say that MLPs lack generalizability and suffer from overfitting to individual objects. In many works they are used to fit specific objects on purpose. I am also not sure why such properties would make it infeasible to train an MLP for new objects. The authors should either provide a citation for such a bold claim or revise this sentence.
    • Please check the spelling of the title in citation [24] (Sinha et al.)
    • In Section 2 the authors state that they asked medical professionals to categorise liver models. I find the word “Usable” not very suitable, as “Usable” implies a specific use case (which has not been explained in the paper yet). Why not call it “Complete” or something similar?
    • At the end of Section 2 the authors state “… propose generating synthetic datasets… rather than investing time in analyzing additional datasets.” This does not feel like a valid argument, as the submission is about generating models. Saying this makes it feel like the authors are artificially creating a problem they solve. Why not just leave this part out?
    • In Fig. 3 (and “MLP Training”), why reconstruct the mesh during training? The authors only compute the loss on the occupancy values, correct? I believe the authors should make it clear that this is only done during inference to obtain the final object from the implicit representation (see the sketch after this list).
    • What was the reason for using occupancy over signed distances? I am a bit surprised by the small size of the used MLP.
    • I appreciate that the authors included real livers in the evaluation of the generated livers. The authors should state the overall (Real, Fake, and Not sure) values for the types of livers (real, generated) separately.
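
    Regarding the Fig. 3 question above, a minimal fitting loop of the kind described in the paper (assuming pre-sampled query points and binary occupancy labels for a single liver, and omitting the positional encoding for brevity) would compute the loss on occupancy values only; the mesh is extracted only at inference, as in the sketch after the abstract. The data here are random stand-ins.

```python
import torch
import torch.nn as nn

# Stand-ins for (points, occupancy labels) sampled around one liver mesh.
points = torch.rand(4096, 3) * 2.0 - 1.0         # query points in [-1, 1]^3
labels = torch.randint(0, 2, (4096, 1)).float()  # 1 inside the shape, 0 outside

mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1000):
    opt.zero_grad()
    loss = loss_fn(mlp(points), labels)  # loss on occupancy values only
    loss.backward()
    opt.step()

# No mesh is built during this loop; the surface is obtained only at inference
# by sampling the fitted field on a grid and running marching cubes.
```
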
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is clearly written, easy to follow, and approaches a relevant problem in medical data generation. While the main idea applies existing methods with limited novelty, the practical evaluation with medical professionals and the clear presentation add value. The work would benefit from stronger baselines and diversity evaluation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank the reviewers for their meaningful and constructive feedback. We appreciate your interest in our paper. Below, we provide clarifications and responses to your comments.

1) Code and Dataset Release: We will release the cleaned and annotated dataset along with the corresponding code in the camera-ready version.

2) Clarifications in the Paper: We have revised the manuscript to address the following issues:
• Corrected inconsistent notation between $\mathbf{x}$ and $x$.
• Clarified the data source.
• Added missing standard deviation values.
• Cited the DeepSDF paper appropriately.
• Rephrased the conclusion in the present tense.
• Corrected the citation title for reference [24].
• Added the missing inference explanation to Figure 3.

3) Data Categorization: We used the term “Usable” as a general label to distinguish it clearly from other categories. All liver objects initially labeled in red were considered “Unusable.” To provide finer detail, we introduced the following subcategories:
• No Full Shape
• Not Usable
• Not Sure
• Requires Editing
These subcategories offer a more granular interpretation of the data quality.

4) Motivation: We aimed to keep the motivation section grounded in realistic constraints. Our goal was to raise awareness about the quality issues present in existing datasets. Due to limited human resources (only three authors are medical experts affiliated with Ghent University Hospital, which limited our annotation capacity), we were unable to conduct a full analysis of all publicly available datasets. This limitation motivated our shift toward the generation and analysis of synthetic data. We appreciate the suggestion to discuss broader work, although doing so would expand beyond the scope of this paper. We will take it into consideration for future work.

5) MLP Parameters: Given the input $\mathbf{x} \in \mathbb{R}^3$ and positional encoding (PE) with 4 frequency bands (yielding 4 sine and 4 cosine terms per coordinate), the total input dimension is $3 \times 4 + 3 \times 4 + 3 = 27$. The MLP architecture comprises three hidden layers and one output layer, each with its respective weights and biases:
• Layer 1: weights (27, 128), bias (128)
• Layer 2: weights (128, 128), bias (128)
• Layer 3: weights (128, 128), bias (128)
• Output layer: weights (128, 1), bias (1)
When flattened, the parameter tensors have sizes [3456, 128, 16384, 128, 16384, 128, 128, 1], for a total of 36,737 parameters.
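
For reference, the listed shapes and the total can be checked with a few lines of Python:

```python
import math

# Weight/bias shapes listed above: three hidden layers of width 128 plus output.
shapes = [(27, 128), (128,), (128, 128), (128,), (128, 128), (128,), (128, 1), (1,)]
sizes = [math.prod(s) for s in shapes]
print(sizes)       # [3456, 128, 16384, 128, 16384, 128, 128, 1]
print(sum(sizes))  # 36737
```
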

6) Recommended Experiments: We thank the reviewers for the suggestions regarding additional experiments, including:
• Ablation study on MLPs
• Further evaluation of synthesized 3D liver models
• Downstream tasks such as shape completion and 3D reconstruction
While we agree these experiments are valuable, they may exceed the scope of a conference paper. We will incorporate these directions into our future work.

7) Survey Experiments: Thank you for your suggestion regarding clarity in the survey results. Below is a breakdown of participant responses based on the ground-truth label of each liver object:

Liver Type      | Predicted Real | Predicted Fake | Not Sure
----------------|----------------|----------------|---------
Real Liver      | 72             | 2              | 1
Generated Liver | 67             | 2              | 6
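
In other words, 72/75 = 96.0% of the real livers and 67/75 ≈ 89.3% of the generated livers were judged “Real” by the participants.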




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


