Data Collection for Machine Learning


Phase 1 Data Collection

· Describe the opportunities and challenges for utilization of clinical data.

· Apply the framework for conceptualizing data usage in healthcare.

Q 1 (Select the correct answer and explain it in 2 or 3 sentences)

Part 1.

How are images commonly represented when given to a deep learning model?

As a 1-dimensional vector, where each number is a hand-picked feature

As layers of number grids, where each number is pixel intensity

As a sequence of 1-dimensional vectors, where each number is pixel intensity

As a 1-dimensional vector, where each number is pixel intensity
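For reference, the representations named in the options above can be sketched in plain Python. This is only an illustration under stated assumptions: the 2x3 image, its pixel values, and the helper names (as_grid_layers, as_flat_vector, as_row_sequence) are hypothetical and do not come from the course material.

```python
# Hypothetical 2x3 RGB image: one grid of pixel intensities (0-255) per color channel.
red = [[255, 0, 0], [10, 20, 30]]
green = [[0, 255, 0], [40, 50, 60]]
blue = [[0, 0, 255], [70, 80, 90]]


def as_grid_layers(channels):
    """Layers of number grids, indexed as [channel][row][column]."""
    return [[row[:] for row in grid] for grid in channels]


def as_flat_vector(channels):
    """A single 1-dimensional vector of pixel intensities (spatial layout is lost)."""
    return [pixel for grid in channels for row in grid for pixel in row]


def as_row_sequence(grid):
    """One channel as a sequence of 1-dimensional row vectors."""
    return [row[:] for row in grid]


layers = as_grid_layers([red, green, blue])
flat = as_flat_vector([red, green, blue])
rows = as_row_sequence(red)

print(len(layers), len(layers[0]), len(layers[0][0]))  # 3 2 3 (channels, rows, columns)
print(len(flat))                                       # 18 (3 * 2 * 3 values)
print(len(rows), len(rows[0]))                         # 2 3 (two row vectors of length 3)
```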

Part 2.

What deep neural network architecture is most commonly used for image classification?

Recurrent Neural Network (RNN)

Multi-layer Perceptron (MLP)

Generative Adversarial Network (GAN)

Convolutional Neural Network (CNN)

Part 3.

What kind of question does the COVID detector answer?

Multi-label classification

Binary classification

Sequence-to-sequence translation

Linear regression

Part 4.

You are interested in further leveraging hospital resources to boost the performance of your COVID detector. Which of the following actions would improve the likelihood of a high-performing model? Check all that apply.

Giving the machine learning team segmentation labels for a small subset of the COVID chest x-ray dataset.

Giving the machine learning team access to an existing COVID detector.

Giving the machine learning team a large dataset of chest x-rays, even if they do not originate from COVID-positive patients.

Giving the machine learning team the text reports associated with each of the COVID chest x-ray examinations.

Q 2 Data Collection (Check the correct answer)

Part 1.

How can we represent a patient’s electronic health record, a form of structured data, to a machine learning model?

As layers of number grids, where each number is pixel intensity

As a 1-dimensional vector, where each number is a hand-picked feature

As a 1-dimensional vector, where each number is pixel intensity

As a sequence of 1-dimensional vectors, where each number is pixel intensity
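To make the idea of a vector of hand-picked features concrete, here is a minimal sketch of turning one structured record into a fixed-order list of numbers. The record, its field names, its values, and the FEATURE_ORDER list are all hypothetical and chosen only for illustration; a real pipeline would also need to handle missing values and units.

```python
# Hypothetical structured record; field names and values are illustrative only.
record = {
    "age_years": 59,
    "ferritin": 410.0,
    "d_dimer": 1.2,
    "wbc_count": 7.8,
}

# Assumed, hand-picked feature order. Every patient must use the same order
# so that position i always means the same measurement.
FEATURE_ORDER = ["age_years", "ferritin", "d_dimer", "wbc_count"]


def to_feature_vector(rec, feature_order=FEATURE_ORDER):
    """Return the chosen fields of one record as a 1-dimensional list of floats."""
    return [float(rec[name]) for name in feature_order]


vector = to_feature_vector(record)
print(vector)  # [59.0, 410.0, 1.2, 7.8]
```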

Part 2.

Given the type of data available, which of the following are reasonable alternative framings of the task at hand, from a machine learning perspective? Check all that apply.

A regression model that predicts the patient’s date of death.

A model that predicts the number of days before a patient requires invasive mechanical ventilation. This model would be trained only on patients who required invasive mechanical ventilation.

A binary classification model that predicts whether or not the patient will require hospitalization.

A model that predicts the range of days before a patient requires invasive mechanical ventilation. The 4 categories are: [“0-4 days”, “5-9 days”, “10-14 days”, “14+ days or will not need one”]
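The four-category framing in the last option amounts to binning a day count into labels. A minimal sketch follows, with two assumptions that are not stated in the question: a patient who never needs ventilation is represented as None, and exactly 14 days is assigned to "10-14 days" because the listed ranges overlap at 14. The function name ventilation_category is hypothetical.

```python
from typing import Optional

CATEGORIES = ["0-4 days", "5-9 days", "10-14 days", "14+ days or will not need one"]


def ventilation_category(days: Optional[int]) -> str:
    """Map days until invasive mechanical ventilation to one of the 4 labels.

    Assumptions: None means the patient never required ventilation, and
    day 14 falls in "10-14 days" (the source ranges overlap at 14).
    """
    if days is None:
        return CATEGORIES[3]
    if days < 0:
        raise ValueError("days must be non-negative")
    if days <= 4:
        return CATEGORIES[0]
    if days <= 9:
        return CATEGORIES[1]
    if days <= 14:
        return CATEGORIES[2]
    return CATEGORIES[3]


print(ventilation_category(3))     # 0-4 days
print(ventilation_category(14))    # 10-14 days
print(ventilation_category(None))  # 14+ days or will not need one
```

Unlike a model trained only on ventilated patients, this labeling keeps patients who never needed ventilation in the dataset by giving them the last category.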

Part 3.

Given that we are training a model to predict whether or not the patient requires invasive mechanical ventilation, which of these values should NOT be passed into the model as a feature? Check all that apply.

Ferritin

Invasive mechanical ventilation date

Patient birth date

D-DIMER

White Blood Cell count

Ventilator setting

Patient inpatient arrival date

Part 4.

Imagine the path that the patient data took through the healthcare system. What are some possible errors that might have been introduced into the data before it was published? Check all that apply.

The patient was a recent transfer from another system

The patient arrives at the ED and is immediately intubated, so no labs are provided

Labs are logged AFTER the invasive mechanical ventilation

The patient had been to the hospital multiple times
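Timing scenarios like the ones listed above can be checked directly in the data when timestamps are available. The sketch below flags lab results whose timestamp is later than the ventilation start time. The function name, the dictionary layout, and the example timestamps are hypothetical; real records may store these fields differently.

```python
from datetime import datetime


def labs_after_ventilation(lab_times, ventilation_time):
    """Return the names of labs timestamped after ventilation started.

    lab_times maps a lab name to its datetime (hypothetical layout);
    ventilation_time is None for patients who were never ventilated.
    """
    if ventilation_time is None:
        return []
    return sorted(name for name, t in lab_times.items() if t > ventilation_time)


# Illustrative values only.
labs = {
    "ferritin": datetime(2020, 4, 2, 9, 30),
    "d_dimer": datetime(2020, 4, 3, 14, 0),
}
intubated_at = datetime(2020, 4, 3, 8, 0)

print(labs_after_ventilation(labs, intubated_at))  # ['d_dimer']
```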
