Elle Adams Project Review
1 Overview
Title of project: Identifying Key Metadata Predictors of Salmonella AMR Genotypes through Machine Learning
Name of project author(s): Marco Reina
Name of project reviewer: Elle Adams
1.1 Background, Context and Motivation
How well is the context of the project described? Is a comprehensive background, including summary of previous/related work given? Is the project well placed into the context of existing work (including proper referencing of existing work). Is it clear why the project was undertaken and what new information it hopes to provide?
1.1.1 Feedback and Comments
Very clear on the problem of antimicrobial resistance and on the limited access to whole-genome sequencing tools compared with metadata. This sets up your question/hypothesis really well.
1.1.2 Summary assessment
- strong contextualization and motivation
1.2 Question description
How well and clear are the question(s)/hypotheses the project aims to address described? Is it clear how the questions relate to the data?
1.2.1 Feedback and Comments
The question is clear: can metadata alone be used to predict AMR in Salmonella spp.? The approach of designing a machine learning model follows directly from it.
1.2.2 Summary assessment
- question/hypotheses fully clear
1.3 Data description
How well is the data overall described? Is the source provided? Is a codebook or other meta-information available that makes it clear what the data is?
1.3.1 Feedback and Comments
Well explained; no notes.
1.3.2 Summary assessment
- source and overall structure of data well explained
1.4 Data wrangling and exploratory analysis
How well is the data cleaned/processed and explored? Are all steps reasonable and well explained? Are alternatives discussed and considered? Are meaningful exploratory results shown (e.g. in the supplementary materials)?
1.4.1 Feedback and Comments
Very thorough. Looking at yours makes me feel like I should go back, add to mine, and explain my process in more detail.
1.4.2 Summary assessment
- essentially no weaknesses in wrangling and exploratory component
1.5 Appropriateness of Analysis
Were the analysis methods appropriate for the data? Was the analysis done properly? Were different components of the analysis (e.g. performance measure, variable selection, data pre-processing, model evaluation) done in the best way possible and explained well?
1.5.1 Feedback and Comments
Your random forest models are well done and well defended in your manuscript, and you do mention toward the end why you chose random forest. I wonder whether you tested any other models anywhere else, only because Prof. Handel has talked before about trying multiple models and justifying the final choice. Not a deal breaker, just wanted to mention it.
1.5.2 Summary assessment
- strong and reasonable analysis
1.6 Presentation
How well are results presented? Are tables and figures easy to read and understand? Are the main figures/tables publication level quality?
1.6.1 Feedback and Comments
Very nice; the tables and figures are clear and easy to read.
1.6.2 Summary assessment
- results are very well presented
1.7 Discussion/Conclusions
Are the study findings properly discussed? Are strengths and limitations acknowledged? Are findings interpreted properly?
1.7.1 Feedback and Comments
Really well done
1.7.2 Summary assessment
- strong, complete and clear discussion
1.8 Further comments
Side note: when running your code, I got this warning: "Warning message: Since gt v0.6.0, fmt_missing() is deprecated and will soon be removed. ℹ Use sub_missing() instead." You might want to make this substitution to maintain reproducibility in the future.
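For reference, the substitution is a one-line change. A minimal sketch (the table and column names here are hypothetical, not from your project):

```r
library(gt)

# Hypothetical table with a missing value
tbl <- data.frame(strain = c("A", "B"), resistance = c(0.42, NA))

# Old, deprecated since gt v0.6.0:
# gt(tbl) |> fmt_missing(columns = resistance, missing_text = "---")

# New equivalent:
gt(tbl) |>
  sub_missing(columns = resistance, missing_text = "---")
```

The arguments carry over directly, so the swap should not change the rendered table.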
2 Overall project content evaluation
Evaluate overall features of the project by filling in the sections below.
2.1 Structure
Is the project well structured? Are files in well labeled folders? Do files have reasonable names? Are all “junk” files not needed for analysis/reproduction removed? By just looking at files and folders, can you get an idea of how things fit together?
2.1.1 Feedback and Comments
Initially I was confused about where processing-code.qmd was, but I see you've included that work in eda.qmd. So no problem.
2.1.2 Summary assessment
- well structured
2.2 Documentation
How well is the project documented? Are you able to understand each step of the whole analysis, each decision that was made, and each line of code? Is enough information provided as comments in code or as part of Rmd files?
2.2.1 Feedback and Comments
Lots of descriptions of what the code does throughout.
2.2.2 Summary assessment
- fully and well documented
2.3 Reproducibility
Are all results fully reproducible? Is documentation provided which clearly explains how to reproduce things, and does it work without the need for any manual intervention? Are you able to re-run the whole analysis without having to do manual interventions/edits?
2.3.1 Feedback and Comments
I was able to rerun everything perfectly; the only thing I had to do was install some packages, which is nothing.
2.3.2 Summary assessment
- fully reproducible without issues
2.4 Thoroughness
How thorough was the overall study? Were alternatives (e.g. different ways of processing the data or different models) considered? Were alternatives discussed? Were the questions/hypotheses fully and thoroughly addressed?
2.4.1 Feedback and Comments
Extremely thorough; again, it makes me doubt my own project by comparison.
2.4.2 Summary assessment
- strong level of thoroughness
2.5 Further comments
N/A