Abstract
Rapid advances in medical imaging technology underscore the critical need for precise and automated image quality assessment (IQA) to ensure diagnostic accuracy. Existing medical IQA methods, however, struggle to generalize across diverse modalities and clinical scenarios. In response, we introduce MedIQA, the first comprehensive foundation model for medical IQA, designed to handle variability in image dimensions, modalities, anatomical regions, and types. To support this, we developed a large-scale multi-modality dataset with extensive manually annotated quality scores. Our model integrates a salient slice assessment module that focuses feature retrieval on diagnostically relevant regions and employs an automatic prompt strategy that aligns upstream physical-parameter pre-training with downstream expert-annotation fine-tuning. Extensive experiments demonstrate that MedIQA significantly outperforms baselines in multiple downstream tasks, establishing a scalable framework for medical IQA and advancing diagnostic workflows and clinical decision-making. Our code is available at https://github.com/siyi-xun/MedIQA.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2487_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/siyi-xun/MedIQA
Link to the Dataset(s)
NYU fastMRI Initiative Database: https://fastmri.med.nyu.edu/
Duke Breast Cancer MRI Dataset: https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70226903
LDCTIQAC2023 Dataset: https://ldctiqac2023.grand-challenge.org/
ADNI MRI Quality Control Dataset: https://adni.loni.usc.edu/data-samples/adni-data/neuroimaging/mri/mri-quality-control/
Kaggle DR Image Quality Dataset: https://www.kaggle.com/c/diabetic-retinopathy-detection/data
BibTex
@InProceedings{XunSiy_MedIQA_MICCAI2025,
author = { Xun, Siyi and Sun, Yue and Chen, Jingkun and Yu, Zitong and Tong, Tong and Liu, Xiaohong and Wu, Mingxiang and Tan, Tao},
title = { { MedIQA: A Scalable Foundation Model for Prompt-Driven Medical Image Quality Assessment } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15972},
month = {September},
pages = {338 -- 348}
}
Reviews
Review #1
- Please describe the contribution of the paper
This manuscript proposes a framework for medical image quality assessment (IQA) leveraging prompt-driven foundation models and multi-modal data. While the problem addressed—medical IQA—is critically important, the current submission does not meet the publication standards of MICCAI due to significant methodological, experimental, and theoretical shortcomings.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This manuscript proposes a framework for medical image quality assessment (IQA) leveraging prompt-driven foundation models and multi-modal data. While the problem addressed—medical IQA—is critically important, the current submission does not meet the publication standards of MICCAI due to significant methodological, experimental, and theoretical shortcomings.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The proposed prompt-driven approach (e.g., anatomical region prompts, modality-specific prompts) is insufficiently novel. The authors claim that their framework dynamically adapts to cross-modality tasks, but the theoretical justification for how these prompts resolve feature redundancy in medical images is weak. For instance:
- The “salient slice assessment module” (Section 2.2) is presented as a key innovation, but its design mirrors existing slice selection strategies in 3D medical imaging. The authors do not provide a clear rationale for why this approach outperforms simpler alternatives.
- The two-stage training strategy (upstream pre-training on physical parameters and downstream fine-tuning) is conceptually similar to prior work. The authors fail to demonstrate how their framework uniquely bridges physical parameters (e.g., dose, magnetic field strength) with subjective quality scores compared to existing methods.
- The claim that the model “generalizes across diverse modalities” is overstated. The MedIQA dataset (Section 2.1) combines CT, MRI, and fundus images, but the experimental validation lacks rigor:
- The pre-training dataset (2,500 cases) is small compared to large-scale medical imaging benchmarks (e.g., RadImageNet). The inclusion of public datasets like ADNI MRI and Kaggle DR introduces domain mismatches (e.g., brain MRI vs. retinal fundus images), which the authors do not address.
- The authors claim their model is “interpretable” due to the link between physical parameters and quality scores. However:
- The “explicit association” between dose and image features (Section 2.2) is not rigorously validated. The MSE loss (Equation 1) does not account for non-linear relationships between physical parameters and quality.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(1) Strong Reject — must be rejected due to major flaws
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The contribution of the work is limited.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
1) This paper introduces a foundation model for medical IQA to handle variability in image dimensions, modalities, anatomical regions, and types. They developed a large-scale multi-modality dataset with manually annotated quality scores.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Developed a large foundation model for image quality assessment. The model integrates a salient slice assessment module that focuses feature retrieval on diagnostically relevant regions and employs an automatic prompt strategy that aligns upstream physical-parameter pre-training with downstream expert-annotation fine-tuning.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The presentation of the paper needs much improvement. The details of the training are not clearly described.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
2) Have you compared with some up-to-date image quality assessment methods for medical images? Consider citing the recent image quality assessment work: H. Fu, et al., “Evaluation of retinal image quality assessment networks in different color-spaces,” in Proc. of 22nd Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019, vol. 11764 of Lecture Notes in Computer Science, pp. 48–56, Springer; H. Yang, A. S. Coyner, et al., “A minimally supervised approach for medical image quality assessment in domain shift settings,” ICASSP 2022, pp. 1286–1290.
Instead of using cross-validation to evaluate performance, can you include a discussion of performance/evaluation using training and test data from different sources, to estimate the model's adaptation/transferability in practical quality assessment tasks?
3) Grammar checking. The sentence in the first paragraph of page 4, “Prompt strategy matches upstream physical parameters-driven foundation model learning with downstream expert annotation-driven domain-specific knowledge learning to achieve dual supervision of expert annotation and physical characteristics.”, is too long; consider revising, and please check other similar sentences as well. 4) Some details are needed on how the multiple datasets with different modalities are used for training in Section 3.2.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
It is a good attempt to develop a large foundation for medical image quality assessment that can be applied to different downstream quality assessment task.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The MedIQA model addresses key challenges in medical image quality assessment (IQA) by introducing a comprehensive framework that includes an extensive annotated dataset, a salient slice assessment module, a two-stage training strategy, and an automated prompt design mechanism. These innovations collectively enhance the model’s capacity to focus on diagnostically relevant regions, improve interpretability by linking image quality to physical imaging parameters, and support dynamic adaptation across diverse imaging modalities such as fundus images, chest X-rays, and CT scans. By leveraging a prompt-driven approach and integrating image-text interactions, MedIQA generalizes well across multiple IQA tasks and offers scalability for real-world clinical deployment. The model demonstrates significant improvements over existing methods, positioning it as a robust and effective solution for modern clinical IQA applications.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Comprehensive Dataset: The MedIQA dataset is a large-scale, multimodal, and richly annotated resource that addresses the critical shortage of labelled data in medical image quality assessment. Its diversity across imaging modalities and quality attributes enhances the robustness and generalizability of trained models.
- Innovative Model Design: Incorporating a salient slice assessment module enables the model to identify and focus on diagnostically significant regions within medical images. This targeted approach improves both the efficiency and accuracy of quality assessments.
- Effective Training Strategy: The two-stage training framework bridges the gap between raw imaging parameters and expert-derived quality annotations. This strategy enhances the interpretability of model predictions and contributes to improved performance under varying clinical and imaging conditions.
- Cross-Modality Adaptability: The model employs an automated prompt generation mechanism to adapt dynamically to various imaging modalities and clinical quality assessment tasks. This flexibility makes MedIQA suitable for diverse real-world healthcare environments.
- Strong Experimental Results: Extensive evaluations show that MedIQA consistently outperforms existing state-of-the-art models across multiple IQA benchmarks. These results highlight its scalability and practical value in clinical image quality assurance workflows.
- Future Directions: The paper outlines promising avenues for future research, including refining the prompt mechanism, extending to additional modalities, and incorporating user feedback. These directions reflect the model’s strong foundation and the authors’ forward-looking approach to enhancing clinical utility and adoption.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Data Variability Limitations: Although MedIQA utilizes a large-scale pretraining dataset, it may not encompass the full range of variability across all imaging modalities and clinical scenarios. This limitation could hinder the model’s ability to generalize effectively to unseen or rare conditions encountered in real-world settings.
- Prompt Strategy Dependence: The model’s performance is closely tied to the quality and appropriateness of the automated prompt generation strategy. If prompts are not well-optimized or fail to capture task-specific nuances, the model may exhibit reduced effectiveness, especially in less common modalities or clinical tasks.
- Constraints of the Salient Slice Assessment Module: While the salient slice mechanism effectively filters out redundant data, it may overlook subtle but clinically significant quality degradations, particularly in long image sequences affected by excessive noise, motion artefacts, or missing frames.
- Interpretability Challenges: Despite efforts to improve interpretability by correlating quality scores with physical imaging parameters, the inherently opaque nature of deep learning models may still pose barriers to clinical trust and adoption, especially among practitioners seeking transparent and explainable systems.
- Limited Robustness Under Extreme Conditions: The model’s robustness in handling extreme scenarios, such as severe artefacts, highly degraded images, or anomalous acquisition settings, remains uncertain. Further validation across diverse and challenging clinical environments is necessary to ensure reliability.
- Scalability and Resource Constraints: While the paper proposes future extensions involving larger datasets and unsupervised learning approaches, these enhancements may be constrained by the availability of computational resources and high-quality labelled data, potentially impacting the model’s scalability and broader applicability.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The paper presents a substantial and innovative contribution to the medical image quality assessment field by introducing the MedIQA framework. Below are some constructive suggestions to further enhance the clarity, reproducibility, and overall impact of the paper:
- Methodological Clarity: While the methodology is promising, specific components, particularly the Cross-View Encoder (CVE) and Graph Adapter (GA) modules, would benefit from more detailed descriptions. Including schematic diagrams or flowcharts could significantly improve the reader’s understanding. A step-by-step methodology breakdown would also enhance clarity and facilitate comprehension, especially for readers less familiar with the architectural innovations introduced.
- Reproducibility: The inclusion of anonymized code is commendable. We recommend adding a clear set of usage instructions to support further reproducibility, including setup details, dependency management, and example commands. A section describing common pitfalls and suggested troubleshooting steps would be helpful for replicators. This guidance can bridge potential gaps between code availability and practical usability.
- Dataset Transparency: Expanding the dataset description to include specific information on the data acquisition process, preprocessing steps, and inclusion/exclusion criteria would help readers evaluate the model’s generalizability. Clarifying the dataset’s size, diversity across modalities, and demographic balance would also help. If data access is restricted, stating this explicitly will inform potential users and set expectations.
- Evaluation Metrics Justification: While standard metrics such as Sensitivity, ROC-AUC, PR-AUC, and F1-score are used, a more detailed explanation of their relevance to clinical image quality assessment would strengthen the paper. Discussing why these metrics were selected over others and how they align with clinical requirements would add depth. Supplementary visualizations (e.g., ROC curves or confusion matrices) would enrich the evaluation section.
- Clinical Relevance and Artifact Distillation: The manuscript would benefit from a more explicit connection between model outputs and their clinical implications. Specifically, discussing how the model supports artefact detection or guides radiologists in quality assurance tasks could enhance its translational value. Outlining how these insights could be distilled into guidelines or tools for clinical use further solidifies the paper’s impact.
- Transparency in Reporting Results: Comprehensive reporting, including both strong and weaker results, adds credibility. We encourage the authors to include all relevant findings and discuss statistical significance. Providing raw results or additional supplementary materials would further support transparency and allow readers to validate and interpret the findings independently, if possible.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. Strong Experimental Results: The paper reports superior performance compared to existing methods across multiple benchmarks, showcasing the model’s effectiveness and robustness in various medical imaging scenarios.
2. Clarity and Reproducibility: While the paper demonstrates substantial methodological and experimental contributions, some areas need improvement in terms of clarity, particularly in the description of the Cross-View Encoder (CVE) and Graph Adapter (GA) modules. The methodology could benefit from additional details and diagrams to enhance reader understanding. Additionally, providing more comprehensive reproducibility guidelines and more explicit dataset information would strengthen the paper.
3. Future Directions and Impact: The discussion on future work is encouraging and demonstrates the authors’ commitment to refining the approach, addressing limitations, and improving clinical relevance. The potential for scalability and application in diverse clinical environments adds to the paper’s overall value.
4. Minor Limitations: Despite its strengths, the paper does have some limitations, including potential issues with generalizability to extreme conditions and dependency on prompt strategy optimization. Additionally, some aspects of interpretability and robustness need further exploration.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers (R) for their valuable feedback and have responded with the necessary information:
Q1: Novelty, Robustness and Explainability (R1, R3)
Salient Slice Selection: Integrating anatomical priors via geometric-center constraints focuses on local quality features from anchor slices; fusing these with 3D IQA features optimizes quality feature extraction while reducing cross-slice feature redundancy and improving computational efficiency.
Prompt Strategy: Our proposed prompts (dimension/modality/organ/sequence) comprehensively cover diverse imaging scenarios. Aligning these prompts between upstream training with physics priors and downstream-task training with annotations ensures task relevance during cross-domain knowledge transfer.
Training Strategy: Building on the linear relationship between physical parameters and image quality features (refs. [22, 23]), upstream tasks employ an MSE loss to quantify parametric impacts. Downstream models are then adapted to quantify visual quality with expert annotations, with domain-specific prompts serving as bridges between upstream and downstream training and enhancing feature explainability. MedIQA outperforms SOTA methods (e.g., Liu et al. ICCV 2021, Dosovitskiy ICLR 2021, Yang et al. CVPR 2022) (Table 1) thanks to each proposed module (Table 2), and its predictions show strong correlation with expert scores in downstream tasks. Detailed explainability comparisons will be expanded in the journal version.
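For readers unfamiliar with this kind of two-stage setup, the following is a minimal sketch of the idea described above: an upstream stage that regresses physical acquisition parameters with an MSE loss, followed by a downstream stage that fine-tunes on expert quality scores conditioned on a task prompt. All names (QualityBackbone, param_head, etc.) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a two-stage IQA training scheme; names are illustrative,
# not the paper's API.
import torch
import torch.nn as nn

class QualityBackbone(nn.Module):
    """Shared feature extractor producing one quality embedding per image/scan."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.param_head = nn.Linear(feat_dim, 2)   # e.g. dose, field strength
        self.score_head = nn.Linear(feat_dim, 1)   # expert quality score

    def forward(self, x, prompt_emb=None):
        feat = self.encoder(x)
        if prompt_emb is not None:                 # prompt conditions the features
            feat = feat + prompt_emb
        return feat

def upstream_step(model, images, phys_params, optimizer):
    """Stage 1: regress physical acquisition parameters with an MSE loss."""
    feat = model(images)
    loss = nn.functional.mse_loss(model.param_head(feat), phys_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def downstream_step(model, images, prompt_emb, expert_scores, optimizer):
    """Stage 2: fine-tune on expert annotations, conditioned on a task prompt."""
    feat = model(images, prompt_emb)
    loss = nn.functional.mse_loss(model.score_head(feat).squeeze(-1), expert_scores)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the prompt embedding is the only piece that changes between upstream and downstream stages, which is one plausible way the same backbone could be reused across tasks.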
Q2: Dataset Considerations (R1, R3)
Scale: We observe that increasing the sample size (2500→3500 cases) only brings marginal improvements (SRCC: 0.777→0.772, PLCC: 0.811→0.816) in this study. Therefore, future efforts will also focus on incorporating diversified organ types and imaging sequences.
Domain Mismatches: Dynamic prompt matching achieves upstream-downstream feature alignment, as there is substantial overlap in dimension/modality/organ/sequence. Prompt matching can leverage corresponding features from upstream learning, ensuring transfer and reuse across cross-domain tasks.
Scalability: MedIQA's lightweight design (127M parameters, 108 GFLOPs, 1.45 GB GPU memory) ensures efficient deployment. Leveraging physics-based general feature banks from pre-training and prompt-driven activation, task-specific features can be elicited without exhaustive data training, ensuring adaptability to rare clinical scenarios.
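To make the "dynamic prompt matching" argument above concrete, here is a hypothetical sketch of how a downstream task's prompt (dimension/modality/organ/sequence) might be matched against an upstream feature bank by attribute overlap; the field names, the IQAPrompt structure, and the lookup scheme are assumptions for illustration only.

```python
# Hypothetical prompt-matching sketch; structure and names are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class IQAPrompt:
    dimension: str   # e.g. "2D" or "3D"
    modality: str    # e.g. "MRI", "CT", "fundus"
    organ: str       # e.g. "brain", "breast", "retina"
    sequence: str    # e.g. "T1", "FLAIR", "n/a"

def match_prompt(query: IQAPrompt, bank: dict):
    """Return the upstream feature entry whose prompt overlaps most with the query."""
    def overlap(p: IQAPrompt) -> int:
        return sum(getattr(p, f) == getattr(query, f)
                   for f in ("dimension", "modality", "organ", "sequence"))
    best = max(bank, key=overlap)
    return bank[best]

# Example: a downstream fundus IQA task reuses the closest upstream feature entry.
bank = {
    IQAPrompt("3D", "MRI", "brain", "FLAIR"): "mri_brain_features",
    IQAPrompt("2D", "fundus", "retina", "n/a"): "fundus_features",
}
print(match_prompt(IQAPrompt("2D", "fundus", "retina", "n/a"), bank))  # -> fundus_features
```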
Q3: Details and Repeatability (R1, R2, R3) Section 3.1 details the hyperparameters and procedures. We’ll update training and dataset details in the revision. Most data are from public repositories (fastMRI, Duke, LDCTIQAC2023, ADNI, Kaggle DRD). For the in-house data, we are working with collaborating hospitals to open-source the access, with full code release upon acceptance.
Q4: Comparative/Cross-Dataset Validation (R2) We appreciate reviewers’ valuable references, which will be discussed and cited in the revision. Our work establishes a scalable general assessment framework. Given the scarcity of expert-annotated IQA datasets, we utilize all available data to construct multi-domain downstream tasks. Disparities between pre-training and downstream data (e.g., FLAIR/fundus) simulate cross-dataset validation scenarios. Experimental results demonstrate 7.79% SRCC (76.43%→84.22%) and 7.88% PLCC (78.80%→86.68%) improvements over baselines. More comprehensive external validation will be added in the journal extension.
Q5: Presentation (R2) We appreciate the suggestions for enhancing our writing and will fix all typos and grammatical errors in the revision.
Q6: Clinical Value (R3) MedIQA can be integrated into medical quality control systems to flag suboptimal images after image acquisition. Its lightweight design enables on-device deployment for various medical sequence quality control. Metal and motion artifacts for CT imaging are already included in expert-annotated protocols.
Meta-Review
Meta-review #1
- Your recommendation
Provisional Reject
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
The submission introduces an interesting concept of a prompt-driven foundation model for medical image quality assessment. However, it does not establish clear innovation over recent IQA literature, and the experimental evaluation lacks strong comparisons with advanced methods. The first reviewer raises serious concerns about the model’s claims of cross-modality scalability, arguing that the dataset is too limited and the improvements are marginal. The other two reviewers acknowledge the potential benefits of the framework but remain unconvinced about the clarity of the training strategy, completeness of experimental validation, and reproducibility details. Although the paper addresses an important topic, it does not meet the standards of clarity and methodological rigor required for acceptance. The recommendation is therefore to reject at this time. The authors are encouraged to refine their comparisons with current IQA approaches, expand or better characterize their dataset to validate cross-modality generalization, and provide clearer documentation of training procedures and code.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes a novel method for medical image quality assessment. The methodology contribution is clear, and the experimental evaluation is comprehensive, demonstrating superior performance over existing methods. Certain parts of the framework are not very clear, such as how the pre-trained models are leveraged in the downstream domain-specific training. The authors are encouraged to improve the writing clarity in the final version.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper introduces MedIQA, a foundation model for medical image quality assessment that leverages a new multi-modal dataset, a salient slice selection module, and a two-stage, prompt-driven training strategy. The work aims to create a generalizable framework for IQA across diverse imaging modalities. While one reviewer found merit in the approach, another raised significant concerns about its novelty and scientific rigor. A third review was disregarded due to significant factual inaccuracies, indicating it was written for a different paper. The recommendation for this paper hinges on the conflicting views of Reviewer #1 (Strong Reject) and Reviewer #2 (Weak Accept), and the effectiveness of the authors’ rebuttal in addressing their concerns.
- Reviewer #1 raised fundamental criticisms regarding the work’s novelty, questioning whether the proposed techniques (salient slice selection, two-stage training) are sufficiently distinct from prior art. This reviewer also challenged the paper’s rigor, citing the small pre-training dataset (2,500 cases) as inadequate for a “foundation model” and noting that the problem of domain mismatch was not sufficiently addressed. These are critical scientific concerns that challenge the core contributions of the paper.
- Reviewer #2’s concerns were largely focused on presentation and clarity. The suggestions to improve writing, provide clearer training details, and add citations are constructive but do not contest the paper’s underlying scientific validity.
- The authors’ rebuttal failed to substantively resolve the critical issues raised by Reviewer #1. (1) In response to novelty concerns, the rebuttal primarily re-described the methodology without providing a compelling argument or evidence for its distinction from existing work. (2) The defense of the small dataset was unconvincing. The authors claimed that increasing the sample size “only brings marginal improvements”, an assertion that sidesteps the criticism and appears to contradict the rationale for large-scale pre-training. (3) The rebuttal simply asserted that prompt matching handles domain mismatches, offering no new evidence to support this claim.
Recommendation: Reject. This decision is based on the significant and unresolved scientific concerns raised by Reviewer #1. The authors’ rebuttal did not adequately address the fundamental questions regarding the novelty of the proposed methods and the methodological rigor of the study, particularly concerning the scale of the dataset. While the paper addresses an important problem, the work in its current state does not sufficiently demonstrate a novel contribution to the field.