Abstract

Vestibular schwannomas (VS) are benign tumors that are generally managed by active surveillance with MRI examination. To further assist clinical decision-making and avoid overtreatment, an accurate prediction of tumor growth based on longitudinal imaging is highly desirable. In this paper, we introduce DeepGrowth, a deep learning method that incorporates neural fields and recurrent neural networks for prospective tumor growth prediction. In the proposed method, each tumor is represented as a signed distance function (SDF) conditioned on a low-dimensional latent code. Unlike previous studies, we predict the latent codes of the future tumor and generate the tumor shapes from them using a multilayer perceptron (MLP). To deal with irregular time intervals, we introduce a time-conditioned recurrent module based on a ConvLSTM and a novel temporal encoding strategy, which enables the proposed model to output varying tumor shapes over time. The experiments on an in-house longitudinal VS dataset showed that the proposed model significantly improved the performance (>=1.6% Dice score and >=0.20 mm 95% Hausdorff distance), in particular for the top 20% of tumors that grow or shrink the most (>=4.6% Dice score and >=0.73 mm 95% Hausdorff distance). Our code is available at https://github.com/cyjdswx/DeepGrowth.
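
As a reading aid for the terms used in the abstract, the following minimal sketch shows how a signed distance function can describe a shape and how a Dice score compares two shapes. It is not taken from the DeepGrowth code: the analytic sphere stands in for a learned SDF, the sign convention (negative inside) is an assumption, and the grid resolution and radii are hypothetical.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

def dice_score(mask_a, mask_b):
    """Dice overlap between two boolean masks (1.0 means identical)."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Regular grid of query points covering [-1, 1]^3 (hypothetical resolution).
axis = np.linspace(-1.0, 1.0, 48)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

# Toy "ground truth" and "prediction": spheres standing in for tumor shapes.
center = np.zeros(3)
true_mask = sphere_sdf(grid, center, 0.50) <= 0.0  # shape = non-positive level set
pred_mask = sphere_sdf(grid, center, 0.45) <= 0.0

print(f"Dice = {dice_score(true_mask, pred_mask):.3f}")
```

Because the shape is recovered by thresholding the SDF at zero, the same function can be evaluated on any grid resolution, which is the property the reviews and rebuttal discuss.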

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2068_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/cyjdswx/DeepGrowth

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_Vestibular_MICCAI2024,
        author = { Chen, Yunjie and Wolterink, Jelmer M. and Neve, Olaf M. and Romeijn, Stephan R. and Verbist, Berit M. and Hensen, Erik F. and Tao, Qian and Staring, Marius},
        title = { { Vestibular schwannoma growth prediction from longitudinal MRI by time-conditioned neural fields } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a method to predict the spatial tumor evolution from a set of (>1) prior visits. The method consists of an encoder that provides embeddings for each prior scan, a ConvLSTM that predicts the image embeddings of the requested visit, and an MLP that converts the embedding of a single sampled location into the signed distance to the border of the tumor.

    • the authors proposed a method that models a time-dependent signed distance function of the tumor;
    • the authors introduce a temporal encoding strategy;
    • the method is compared against the baselines in a set of experiments.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the paper is easy to follow and describes the method well
    • the idea is straightforward and easy to implement and translate to other tasks
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Aspects of novelty:
       a. From the abstract: “Unlike previous studies that perform tumor shape prediction directly in the image space, we predict the latent codes instead and then reconstruct future shapes”. I would argue that in the cited [23] (see Fig. 3 of [23]) the authors first create latent codes, then apply a ConvLSTM to the generated latent codes, and finally decode the latent code of the last visit. In [23] the latent codes have a shape of Cx8x8, whereas in the reviewed paper they are Cx16x16, and the decoder is replaced with an MLP.
       b. What was the motivation behind using an MLP? It is equivalent to a CNN decoder with 1x1x1 convolutions. Does the sampling from Section 2.1 improve the score over trilinear upsampling to the original resolution and running a decoder over the resulting feature map? What is the sampling method: pseudo-random uniform numbers, a Sobol sequence, importance sampling (of the tumor border, for instance), or something else?
       c. The validation in its current form is limited. The method is compared with two types of ConvLSTM. I expected to see at least non-DL models (e.g. level sets) or PDE-based models (e.g. [3]). There is a wide range of physics-based models of tumor growth.
       d. In addition to c, Dice, HD95 and RVD do not represent the predictive power of the method well. The growth volume (MSE, R squared, r) and the Dice of the growth should also be reported. For the growth volume and RVD, the method should be compared with a simple linear regression. Table 1 suggests that the proposed method is worse than constant prediction in quantifying the volume.

    2. The paper does not acknowledge the existing prior art on disease progression modelling, e.g. [1-3]. This intersects with the validation comment above.

    [1] Petersen, Jens et al. “Continuous-Time Deep Glioma Growth Models.” International Conference on Medical Image Computing and Computer-Assisted Intervention (2021).
    [2] Petersen, Jens et al. “Deep Probabilistic Modeling of Glioma Growth.” International Conference on Medical Image Computing and Computer-Assisted Intervention (2019).
    [3] Meghdadi, N. et al. “Personalized image-based tumor growth prediction in a convection–diffusion–reaction model.” Acta Neurologica Belgica 120 (2018): 49-57.
    [23] Zhang, L., Lu, L., Wang, X., Zhu, R.M., Bagheri, M., Summers, R.M., Yao, J.: Spatio-temporal convolutional LSTMs for tumor growth prediction by learning 4D longitudinal patient data.
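
The volume baselines requested in point 1d can be made concrete with a short sketch. The code below is an illustration only, not part of the paper or the review: the function names are hypothetical, the patient values are invented for the example, and RVD is assumed here to be the signed relative volume difference (predicted minus actual, divided by actual), which may differ from the definition used in the paper.

```python
def constant_prediction(times, volumes, target_time):
    """Carry the most recent measured volume forward unchanged."""
    return volumes[-1]

def linear_extrapolation(times, volumes, target_time):
    """Least-squares line through (time, volume) pairs, evaluated at target_time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(volumes) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0.0:
        return mean_v  # all scans at the same time: no slope can be estimated
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, volumes)) / var_t
    return mean_v + slope * (target_time - mean_t)

def relative_volume_difference(predicted, actual):
    """Signed RVD (assumed definition): positive when the volume is overestimated."""
    return (predicted - actual) / actual

# Hypothetical patient: scan times in years, volumes in mm^3 (illustrative only).
times = [0.0, 1.0]
volumes = [400.0, 440.0]
target_time, actual = 2.5, 500.0

for name, model in [("constant", constant_prediction), ("linear", linear_extrapolation)]:
    predicted = model(times, volumes, target_time)
    rvd = relative_volume_difference(predicted, actual)
    print(f"{name}: predicted={predicted:.1f} mm^3, RVD={rvd:+.3f}")
```

Both baselines accept irregular scan intervals, since the linear fit uses the actual scan times rather than visit indices.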

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please address the comments in the weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper lacks strong validation; the method is a bit worse than constant prediction, and can potentially be outperformed by linear regression and reaction-diffusion models.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I’m keeping the score the same. Provided the clarifications about the method motivation and sampling scheme, I think the method has a certain merit. However, I still see the validation part as a major weakness of the paper. Considering the rebuttal, the constant prediction is still better than all of the baselines, which raises questions about the validity of the results.



Review #2

  • Please describe the contribution of the paper

    The paper presents a process by which to predict tumor growth via contours produced by neural fields. This is done by taking a series of N scans along a time interval, encoding a spatial grid of latent codes, and using a ConvLSTM to predict how the latent grid will develop at time N+1.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper provides a mechanism for temporal modeling that surpasses existing work. Unlike other work, for which the predictions of later time points do not need to happen by sampling at regular intervals, the authors show the model is capable of predicting at arbitrarily-spaced time intervals. The overall work shows promising predictive performance for an overall ill-posed modelling problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors do not provide reasoning for their choice to use neural fields over a traditional grid approach. Previous work has shown that signed distance functions (SDF) can be modeled within a grid. Their latent space is already modeled as a grid of latent vectors and is processed by a convolutional LSTM, and their labels are said to be derived from a grid-like segmentation. Yet the authors provide little reasoning for their choice to model the SDF with a point-wise function (MLP) instead of a traditional convolutional decoder.

    It is also not clear why the architecture uses a sequence modelling module (LSTM) as opposed to a more modern attention-based architecture. There is no reason to restrict the model to seeing only a single time point at a time, presented in a sequence. Part of the reason transformers have taken over the field of sequence modelling is that their input is a set. Similar to how a clinician can look at all previous time points at the same time, an optimal model should be able to predict D_n given (D_(n-1), …, D_(1)), as opposed to the proposed approach: D_n given D_(n-1) given … given D_(1). If anything, given that their dataset’s sequences are made up of only 3 images (with the 3rd one being the target), it is not clear why they choose an LSTM, a model designed to encode temporal information across long time sequences, or in fact a sequence modeller at all, as opposed to a simple concatenation-based approach.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While the paper structure is solidly written, my main criticism has to do with the choice of modules that make up the model architecture. A simpler, minimalistic architecture would have served an equal purpose, or at the very least a good baseline with which to ablate the temporal-prediction component and the SDF decoder.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While certain aspects of the architecture are dated approaches by today’s machine learning standards and naively put together, the overall paper’s structure and content are both well-rounded. The authors also display strong expertise of the medical component of the work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision
    1. Motivation for neural fields (MLP decoder): The authors give “output continuous 3D shapes” as motivation for an MLP decoder, yet they mention “being able to get a shape on a much finer GRID during inference”. It is not clear why they would want to have a point-wise learned interpolation function, as opposed to a locally-aware architecture like a CNN, which would be able to locally integrate multiple latent vectors (with kernels larger than 1x1x1), allowing for more locally consistent outputs. It is true that CNNs are confined to a given resolution, but their output can be interpolated using naive interpolation (e.g. bilinear) from pixel-level resolution to sub-pixel resolution with negligible inaccuracies. In practice, the MLP decoder is also trained within the original image resolution (as it has to be supervised on existing image pixels), so it is not obvious why the authors expect the MLP to generalize in between pixels more accurately than naive interpolation, especially when the input resolution of the latent vectors is much lower than the resolution of the images it was trained on.

    2. The reason for using ConvLSTM: The authors seem to imply they applied Transformers to the entire set of latent vectors, which would indeed explain their quadratically-growing complexity claim. However, there is no reason for the attention mechanism to span the entire spatio-temporal domain. Local attention mechanisms are common in the literature, showing significantly reduced complexities [1, 2] with little-to-no loss in accuracy. Such approaches provide equivalent receptive fields to the authors’ ConvLSTM approach without imposing a strictly linear temporal structure.

    [1] Bulat, Adrian, et al. “Space-time mixing attention for video transformer.” Advances in Neural Information Processing Systems 34 (2021): 19594-19607.
    [2] Hassani, Ali, et al. “Neighborhood attention transformer.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023. arXiv:2204.07143
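
To make the object of this discussion concrete, here is a minimal sketch of how a point-wise decoder could be queried between grid nodes: the latent grid is interpolated trilinearly at a continuous coordinate and the result is passed through a small MLP. This is an assumption about one plausible querying scheme, not the authors' implementation; the function names, grid size, channel count and random weights are all hypothetical.

```python
import numpy as np

def trilinear_sample(latent_grid, point):
    """Interpolate a (D, H, W, C) latent grid at a continuous voxel coordinate.

    `point` is (z, y, x) in grid units; it is clipped to the grid extent.
    """
    sizes = np.array(latent_grid.shape[:3])
    point = np.clip(np.asarray(point, dtype=float), 0.0, sizes - 1)
    lower = np.minimum(np.floor(point).astype(int), sizes - 2)  # keep lower + 1 in range
    frac = point - lower
    result = np.zeros(latent_grid.shape[3])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                weight = (
                    (frac[0] if dz else 1.0 - frac[0])
                    * (frac[1] if dy else 1.0 - frac[1])
                    * (frac[2] if dx else 1.0 - frac[2])
                )
                result += weight * latent_grid[lower[0] + dz, lower[1] + dy, lower[2] + dx]
    return result

def mlp_decoder(latent, weights1, bias1, weights2, bias2):
    """Two-layer point-wise MLP mapping one latent vector to one scalar value."""
    hidden = np.maximum(0.0, latent @ weights1 + bias1)  # ReLU
    return float(hidden @ weights2 + bias2)

# Hypothetical sizes and untrained random weights, for illustration only.
rng = np.random.default_rng(0)
latent_grid = rng.normal(size=(4, 16, 16, 8))
w1, b1 = rng.normal(size=(8, 32)), np.zeros(32)
w2, b2 = rng.normal(size=32), 0.0

query = (1.25, 7.5, 3.75)  # continuous coordinate between grid nodes
sdf_value = mlp_decoder(trilinear_sample(latent_grid, query), w1, b1, w2, b2)
print(f"decoded value at {query}: {sdf_value:.3f}")
```

In this scheme the interpolation happens in latent space before decoding, whereas the alternative raised in the review decodes on the grid first and interpolates the output afterwards.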



Review #3

  • Please describe the contribution of the paper

    The authors present DeepGrowth, a novel method for tumor growth modeling, which combines the strengths of neural implicit representations (i.e., neural fields) and a time-conditioned recurrent neural network to learn the growth dynamics in a latent space. The DeepGrowth model was trained and evaluated on an in-house dataset comprising longitudinal data of vestibular schwannomas. The method shows both quantitatively and qualitatively improved results over the baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written, well-structured, it provides a concise but complete introduction to the topic, and the authors have taken care to properly motivate and explain every component of the proposed architecture. The graphics are of high-quality and overall the paper is very easy to read and accessible.

    The presented method is highly interesting and the use of a time-conditioned ConvLSTM to model the dynamics in the latent space seems novel compared to other implicit methods.

    Despite the low amount of data samples, the quantitative and qualitative results are convincing and the ablation study complements the findings.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Even though the proposed method showed improvements over the baselines, it is not clear from the quantitative and qualitative results whether these differences would be significant for the clinical decision making (in particular, when looking at the results of row 2 within Fig. 2).

    The experiments compare the proposed method against two “standard” ConvLSTM baselines, but there are already several alternative solutions (see e.g., Wiesner et al. - citation 21). While I understand that it may not be possible to re-implement every method, I strongly believe that it would strengthen the paper if the ablation study also investigates the influence of major general changes to the existing methods: (1) the impact of choosing (d x h x w) conditioning vectors over a single vector; and (2) the impact of using the ConvLSTM conditioning over adding time as an input to the MLP (a spatio-temporal implicit neural field).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Overall, the paper provides all information to reproduce the results and the authors intend to share the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper has been very interesting to read! In addition to the comments above, I have just very few questions:

    • How was the scan date Dt normalized?
    • Apart from tumor growth, does the tumor size play a role in the decision making? Would it be possible to get a histogram or some other diagram that shows the relation between prediction error, the tumor growth, and the tumor size?
    • Can the network handle sudden changes in tumor growth?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an interesting and novel method with a solid evaluation and encouraging results. The very few concerns and questions can be easily addressed in the rebuttal and would not impact the contribution of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have adequately addressed all of my questions. I still stand by my initial rating: The well-written paper with a novel and interesting method for tumor growth modeling should be accepted to MICCAI.




Author Feedback

We thank all reviewers for their feedback and constructive comments. The main concerns are addressed as follows:

  1. Motivation for neural fields (MLP decoder) (R3 & R4) The proposed method is mainly motivated by the successful application of neural fields in 3D modeling [15], in which MLP decoders outperform traditional CNN approaches. Unlike CNN decoders, an MLP does not discretize 3D space, which helps to reduce topology restrictions and output continuous 3D shapes [16]. Evaluation of the MLP on a regular grid could indeed be implemented as stacked 1x1x1 convolutions. However, by using the MLP, we can sample points that are independent of the grid for training and also query arbitrary points to get a shape on a much finer grid during inference. Replacing the MLP with 1x1x1 convolutions would limit the applicability of the model.
  2. Suggestions for additional validation (R3) R3 questions the performance of the model compared to constant prediction in terms of RVD (Table 1). We want to clarify that this was caused by one extreme outlier with cystic components, displayed in the last row of Fig. 2. The sudden disappearance of the cystic component is difficult to predict, but also less important for clinical decision-making. We report the results without this outlier in the second paragraph of Sec. 3.3, which shows that the proposed model is clearly better in all metrics. R3 also pointed out that additional benchmarks and metrics could strengthen the evaluation. Given the page limitations of a conference submission, we only include a number of the most important results. More comparison experiments will be addressed in an extended journal paper.
  3. The reason for using ConvLSTM (R4) We proposed a time-conditioned recurrent module in which ConvLSTM is indeed not the only solution. In our preliminary experiments, we found that ConvLSTM works well with the temporal encoding while training with a Transformer was unstable. One possible reason is that in our model, each scan yields thousands of latent vectors (tokens), while the complexity of the self-attention grows quadratically with the number of tokens. However, we acknowledge that the proposed framework theoretically supports any sequence models (e.g. Transformer, MAMBA) and will explore the possibility of using other architectures in future work.
  4. Unclear novelty statement in the abstract (R3) This will be revised in the final version: “Unlike previous studies, we predict the latent codes of the future tumor and generate the tumor shapes from these using an MLP.”
  5. Sampling method clarification (R3) We use importance sampling, in which 80% of the points are sampled near the contour, and the rest are sampled from the entire space. This helps to capture a more detailed SDF near the contour [15]. More clarification will be included in the final version.
  6. Ablation studies (R1 & R4) R1 suggested ablating our model with a single latent vector and spatio-temporal neural fields [21]. In preliminary experiments, we found that multiple latent vectors significantly improved performance, which is consistent with the conclusion in [16]. R4 suggested ablation studies about the individual modules. The reasons for the choice of the modules have been addressed above in (1) and (3); however, we agree it is worth comparing different architectures (e.g. CNN) with neural fields. Due to the page limitations, we focus on the impact of the downsampling factor and temporal encoding. More comprehensive ablation studies will be done in our extended journal paper.
  7. Prior art (R3) We thank the reviewer for the valuable references. More related work will be included in the final version.
  8. Other comments (R1) Both the diameter and size of the tumor are used for clinical decision-making. In this paper, we focused on the methodological study and considered the clinical impact for a clinical venue. Dt is normalized within the original range of 0 to 10 years. Discussion on sudden changes is addressed above (2).
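
The sampling scheme described in point 5 (80% of the points near the contour, the rest from the entire space) could look roughly like the sketch below. Only the 80/20 split comes from the rebuttal; the contour definition (foreground voxels with a background 6-neighbor), the Gaussian jitter and its standard deviation, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def contour_voxels(mask):
    """Coordinates of foreground voxels with at least one background 6-neighbor."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = np.ones_like(mask, dtype=bool)
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbor lookup; the padding keeps wrapped values outside the crop.
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    return np.argwhere(mask & ~interior)

def sample_training_points(mask, n_points, near_fraction=0.8, jitter_std=1.0, rng=None):
    """Return (near-contour points, uniform points) in continuous voxel coordinates."""
    rng = np.random.default_rng() if rng is None else rng
    upper = np.array(mask.shape, dtype=float) - 1.0
    contour = contour_voxels(mask)
    if len(contour) == 0:
        n_near = 0  # empty mask: fall back to uniform sampling only
        near = np.empty((0, 3))
    else:
        n_near = int(round(near_fraction * n_points))
        picks = contour[rng.integers(0, len(contour), size=n_near)]
        # Jitter contour voxels so samples fall on both sides of the boundary.
        near = np.clip(picks + rng.normal(0.0, jitter_std, size=(n_near, 3)), 0.0, upper)
    uniform = rng.uniform(0.0, upper, size=(n_points - n_near, 3))
    return near, uniform

# Toy binary mask: a sphere of radius 8 voxels in a 32^3 volume.
axis = np.arange(32)
zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
mask = (zz - 16) ** 2 + (yy - 16) ** 2 + (xx - 16) ** 2 <= 8 ** 2

near, uniform = sample_training_points(mask, n_points=1000, rng=np.random.default_rng(0))
print(near.shape, uniform.shape)
```

Concentrating samples near the boundary spends most of the supervision where the sign of the SDF changes, which matches the motivation given in the rebuttal.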




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The proposed method has certain merits. However, the authors may lack sufficient in-depth study on the problem of tumor growth prediction and the utilized experimental data. For example, even after the rebuttal (excluding the case of the cystic tumor), it is still difficult to explain why the constant prediction outperforms all the baselines - which raises questions about the validity of the results.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This is a well-written paper. Although there are limitations regarding the methods and experiments as pointed out by the reviewers, I think the tumor growth researchers may find this paper interesting.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



