Applied Machine Learning
{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# MIT552 Midterm Exam" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# 1. Run the Lab03A linear regression exercise using the following procedure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Import numpy, pandas, and statsmodels.api." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Read the “Magnitude.csv” data set into a data frame df and display its head." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Import seaborn and matplotlib.pyplot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Draw a regplot of Magnitude (y-axis) against Radian (x-axis)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# a) Run OLS (Ordinary Least Squares) with Radian as the predictor and \n", "# Magnitude as the response. Print the summary of the linear regression result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# b) Run OLS (Ordinary Least Squares) with Magnitude as the response and Radian and the square of Radian\n", "# as predictors.\n", "# Print the summary of the linear regression result." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Explain why the OLS fit improves when the square term of Radian is added\n", "# to Radian in the OLS formula." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# 2. Use the KNN method to classify the response, 'AHD', from the Heart.csv data. \n", "# Import the necessary modules and read the Heart.csv data into a data frame, df. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# From df, set the first 200 rows of 'Age', 'RestBP', 'Chol', 'MaxHR', 'Oldpeak', and 'Slope' as \n", "# training data, X_train, and the rest as test data, X_test. Set the first 200 'AHD' values as \n", "# training responses, y_train, and the rest as test responses, y_test. \n", "# (y_train and y_test need to be converted into arrays with values.ravel().)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Run the KNN method for k = 1 to 5: fit on X_train and y_train, then predict for X_test.\n", "# Print confusion_matrix and classification_report for each iteration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Which value of k produces the best prediction performance? \n", "# Why does k=1 not produce the best prediction performance?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# 3. Use Logistic Regression to classify the response, 'AHD', in the Heart.csv data. \n", "# Import the necessary modules and read the Heart.csv data into a data frame, df. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# From the data frame df of #2, set the first 200 rows of 'Age', 'RestBP', 'Chol', 'MaxHR', 'Oldpeak', 'Slope', \n", "# and 'AHD' as training data, X_train1, and the rest as test data, X_test1. (Note that 'AHD'\n", "# is added this time.) \n", "# Set the first 200 'AHD' values as training responses, y_train1, and the rest as test responses, \n", "# y_test1. (y_train1 and y_test1 need to be converted into arrays with values.ravel().)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Using the Logistic Regression method, fit the training data.\n", "# Print confusion_matrix and classification_report for y_test1 against the predictions from X_test1. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# If we compare KNN and Logistic Regression on the same Heart.csv data, which method is better?\n", "# Why do you think it is better?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# 4. Use the LDA and QDA methods to classify the response, 'AHD', from the Heart.csv data. \n", "# Import the necessary packages for LDA and QDA analysis." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Using the LDA method, fit a model with the X_train and y_train data and predict for the X_test data that were used \n", "# for the KNN method.\n", "# Print confusion_matrix and classification_report for y_test against the predictions from X_test." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Using the QDA method, fit a model with the X_train and y_train data and predict for the X_test data that were used \n", "# for the KNN method.\n", "# Print confusion_matrix and classification_report for y_test against the predictions from X_test." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# For the same data set with the same predictor variables, which analysis tool works best\n", "# among KNN, Logistic Regression, LDA, and QDA? What might be the reason for the best performance\n", "# by this method?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# For the same data set with the same predictor variables, which analysis tool works worst\n", "# among KNN, Logistic Regression, LDA, and QDA? What might be the reason for the worst performance\n", "# by this method?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }