Machine Learning Model Training, Evaluation and Deployment

Jack Lee
Assignment3ModelTrainingquestionsheet.docx

Phase III Model Training (Data File Phase 3 HTML)

Click on the Phase 3 Model Training Scenario file (copy from HTML data) to read the scenario. After reading through the case, please review the following questions.

Select your choice using bold, size 14, and explain your choice in a few short sentences. I have indicated my first choice in bold, size 12.

Keep the HTML scenario file open so that it is easier for you to look for the information questioned in the quiz/exercise.

Q 1

Part 1.

What learning phenomenon is the team observing now?

Convergence

Generalization

Overfitting

Underfitting

Part 2. (a)

What are some techniques that can be applied in order to improve generalization performance? Check all that apply.

Weight decay (L2 regularization)

Increasing the number of model parameters

Dropout

Stronger data augmentation

Early stopping

Part 2 (b)

Sometimes, overfitting is attributed to the task being “too hard.” Given what we know about model behavior during overfitting, how can this explanation be justified?

Answer in 5 sentences or fewer.

Part 3.

What is weight decay?

An added penalty in the loss function that encourages semantic clustering in feature space

An added penalty in the loss function that mitigates class imbalance

An added penalty in the loss function that ensures the model is well calibrated

An added penalty in the loss function that discourages models from becoming overly complex
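For intuition, here is a minimal NumPy sketch of the L2 penalty behind weight decay (the function name and the lam value are illustrative, not from the scenario):

```python
import numpy as np

def l2_penalized_loss(base_loss, weights, lam=0.1):
    """Loss plus lam * ||w||^2: large weights are penalized, which
    discourages the model from becoming overly complex (weight decay)."""
    return base_loss + lam * np.sum(weights ** 2)

w = np.array([3.0, -4.0])            # sum of squared weights = 25
print(l2_penalized_loss(1.0, w))     # 1.0 + 0.1 * 25 = 3.5
```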

Part 4.

What does dropout do?

Dropout randomly removes layers in the network during training in order to improve the rate of convergence

Dropout randomly removes neurons in the network during training in order to prevent overreliance on any one neuron

Dropout randomly removes layers in the network during training in order to prevent overreliance on any one layer

Dropout randomly removes neurons in the network during training in order to improve the rate of convergence
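As a reference point, a minimal sketch of (inverted) dropout in NumPy — the function name and fixed seed are my own assumptions:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Zero each neuron's activation with probability p during training;
    scale survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time (training=False) activations pass through untouched."""
    if not training or p == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng(0)
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

acts = np.ones(8)
print(dropout(acts, p=0.5))   # each entry is either 0.0 or 2.0
```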

Part 5. Which of the following are tunable hyperparameters? Check all that apply.

Weight decay strength

Dropout probability

Learning rate

Model weights

Part 6.

The team is noticing counterintuitive results regarding the performance of the model when measured with accuracy and AUROC. What is likely occurring? NOTE: There are 27,000 COVID-negative exams and 3,000 COVID-positive exams, a breakdown of 90% negative cases and 10% positive cases.

Accuracy is a poor metric for performance because of the small number of samples in the test set

Accuracy is a poor metric for performance because of the high class imbalance

AUROC is a poor metric for performance because they have a predetermined threshold in mind

AUROC is a poor metric for performance because it can only be used in multi-class settings
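The arithmetic behind the imbalance is worth writing out (numbers taken from the note above):

```python
# With 27,000 COVID-negative and 3,000 COVID-positive exams, a degenerate
# model that predicts "negative" for every patient still scores 90% accuracy
# while detecting no positives at all.
n_neg, n_pos = 27_000, 3_000
accuracy = n_neg / (n_neg + n_pos)   # all-negative predictor
print(accuracy)                      # 0.9
```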

Part 7.

Further analysis shows that the model is predicting that every patient is COVID-negative. What can be done to mitigate this effect? Check all that apply.

Using dropout during training to improve performance on the test set

Undersampling COVID-positive exams during training

Upweighting COVID-positive exams loss during training

Lowering the learning rate to improve convergence
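For one of the options above, a hedged sketch of upweighting the COVID-positive loss (pos_weight = 9 simply mirrors the 27,000/3,000 ratio; the names are my own):

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=9.0, eps=1e-7):
    """Binary cross-entropy with positive examples upweighted, so missing
    a COVID-positive exam costs more than missing a negative one."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())

y = np.array([1.0])   # one positive exam the model is unsure about
p = np.array([0.5])
print(weighted_bce(y, p))   # ~6.24, versus ~0.69 unweighted
```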

Q 2

Part 1.

What is a pro of using k-fold cross-validation instead of a hold-out validation set for hyperparameter tuning?

Improves model convergence rates because many hyperparameters can be tested at the same time

Regularizes the model by randomly selecting training examples automatically

Requires less overall time to train a model, due to the reduced number of training samples

Produces a more reliable estimate of the generalization performance of the model

Part 2.

What is a con of using k-fold cross-validation instead of a hold-out validation set for hyperparameter tuning?

Increases the number of parameters in the overall model, which leads to overfitting

Decreases model generalization performance because the model is able to learn on the test set

Requires more overall time to train a model, due to the repeated training runs associated with each experiment

Increases the overall memory requirements of the model during training, due to the higher number of samples seen during training
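A minimal sketch of the k-fold split itself may make both the pro (every sample is validated on exactly once) and the con (k full training runs per experiment) concrete; the helper name is my own:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle n sample indices and return k (train, val) index pairs.
    Each fold serves as the validation set exactly once, so tuning one
    hyperparameter setting costs k full training runs."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(k)]

for train, val in kfold_indices(10, 5):
    print(len(train), len(val))   # 8 2, five times
```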

Part 3.

What are common criteria used for early stopping? Check all that apply.

Training AUROC

Test loss

Training loss

Validation loss

Validation AUROC

Test AUROC
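As a reference for how a validation criterion drives stopping, a small patience-based sketch (the loss values and names are invented for illustration):

```python
def stop_epoch(val_losses, patience=3):
    """Return the epoch at which patience-based early stopping fires:
    training halts once the monitored validation loss has failed to
    improve for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1

print(stop_epoch([1.0, 0.8, 0.7, 0.75, 0.76, 0.77, 0.78]))  # stops at epoch 5
```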

Part 4.

Which of the following hyperparameters are exclusive to deep learning models? Check all that apply.

Number of layers

Dropout probability

Weight decay strength

Learning rate

Class weights (loss function)

Parts 5 to 10: short answers in 5 sentences or fewer.

Part 5

In the sensitivity analysis, you identify that a prediction within 2 hours gives a much higher AUC and PPV. Does this make for a better model to deploy? Why or why not?

Part 6

What considerations must be made when applying k-fold cross-validation?

Part 7

You recall that you have another EHR dataset composed of patients with varying respiratory illnesses. One promising direction of research could be to use this dataset as the training set for this project and the COVID dataset as the evaluation set. What conditions must be met for it to be useful?

Part 8

The team is employing random zoom for the data augmentation task. In general, how should data augmentation transforms be selected?
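For concreteness, random zoom is often realized as a randomly placed, randomly scaled crop. This sketch makes that explicit (the scale range and shape handling are assumptions; resizing the crop back to the original shape is omitted to stay dependency-free):

```python
import numpy as np

def random_zoom_crop(img, rng, min_scale=0.8):
    """Take a randomly placed crop covering min_scale..1.0 of each side,
    simulating a zoom-in. The resize back to the original shape is left out."""
    h, w = img.shape[:2]
    s = rng.uniform(min_scale, 1.0)
    ch, cw = max(1, int(h * s)), max(1, int(w * s))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

rng = np.random.default_rng(0)
print(random_zoom_crop(np.zeros((100, 100)), rng).shape)
```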

Part 9

A colleague approaches you and suggests that it would be better if you created a model that relied only on observable features and exam metadata (patient age, gender, ethnicity, etc.). What tradeoffs must be considered when using lab values as features?

Part 10

Before using the new public COVID dataset, you want to verify that there is no PHI in the data. What are some privacy issues that could come into play with imaging data?
