Matlab Project

ECE 203: Introduction to MATLAB

Spring 2022: Final Project: Digit Recognition

Introduction

In this project, you will perform handwritten digit recognition using a simple distance-based classifier. The dataset is posted on Isidore as part of this project (“digit_data.mat”). It comprises 5000 distinct handwritten images of the digits 0 – 9. The dataset in the MAT file is organized so that the first 500 images belong to ‘0’, the second 500 images belong to ‘1’, and so on. Please have a look at the starter code provided in your resources.

Algorithmic Steps

1. Create a live script file called “run_digit_recognition.mlx”. Clear the workspace and then load the data. When you load the data, you will see two variables named images and labels created in the workspace.
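A minimal sketch of this step. The fallback branch that generates random stand-in data is an assumption added only so the snippet runs when digit_data.mat is not on the path; it is not part of the assignment. The sketch also assumes images is 20 × 20 × 5000 and labels holds 5000 digit values from 0 to 9, which the MAT file may encode differently.

```matlab
clear; clc;                          % start from an empty workspace

dataFile = 'digit_data.mat';         % file name given in the handout
if isfile(dataFile)
    load(dataFile);                  % expected to create images and labels
else
    % ASSUMPTION: stand-in data with the shapes implied by the handout,
    % used only so this sketch runs without the real file.
    images = rand(20, 20, 5000);
    labels = repelem((0:9)', 500);   % 500 consecutive images per digit
end

labels = labels(:);                  % force a column vector
disp(size(images));                  % dimensions of the image stack
disp(size(labels));                  % dimensions of the label vector
```

Running whos after loading is a quick way to confirm the variable names and sizes that the real file provides.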

2. Visualize every 500th image (1, 501, 1001, etc.) to see one example of each digit, using imagesc() and the gray colormap. Check the labels for those cases.
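One possible way to lay out this check is sketched below. The random stand-in data is an assumption so the snippet runs on its own; with the real data, the images and labels variables loaded from the MAT file would be used instead. The 2 × 5 subplot grid is also just a layout choice.

```matlab
% ASSUMPTION: stand-in data; replace with the variables loaded from the MAT file.
images = rand(20, 20, 5000);
labels = repelem((0:9)', 500);

idx = 1:500:size(images, 3);         % 1, 501, 1001, ..., 4501

figure;
for k = 1:numel(idx)
    subplot(2, 5, k);                % one panel per digit
    imagesc(images(:, :, idx(k)));
    colormap(gray);
    axis image off;
    title(sprintf('Image %d, label %d', idx(k), labels(idx(k))));
end
```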

3. Now, split the dataset into training and testing data. Use every 60th image for testing (1, 61, 121, and so on) and use the rest of the dataset for training. Make sure the sizes of the testing images and training images are 20 × 20 × 84 and 20 × 20 × 4916 respectively.

4. Make sure to record the labels of the training and testing data as well. Note that the train labels and test labels are of size 4916 × 1 and 84 × 1 respectively.
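The split in steps 3 and 4 could be sketched as follows. The stand-in data and the variable names (testIdx, trainIdx, testImages, and so on) are assumptions chosen for illustration, not names required by the assignment.

```matlab
% ASSUMPTION: stand-in data; replace with the variables loaded from the MAT file.
images = rand(20, 20, 5000);
labels = repelem((0:9)', 500);
labels = labels(:);                          % force a column vector

numImages = size(images, 3);
testIdx   = 1:60:numImages;                  % 1, 61, 121, ..., 4981
trainIdx  = setdiff(1:numImages, testIdx);   % every remaining index

testImages  = images(:, :, testIdx);
trainImages = images(:, :, trainIdx);
testLabels  = labels(testIdx);
trainLabels = labels(trainIdx);

disp(size(testImages));                      % 20 20 84
disp(size(trainImages));                     % 20 20 4916
```

Indexing images and labels with the same index vectors keeps each image paired with its label.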

5. Now, write a nested loop in which you compute the Euclidean distance of each test image from all training images. The distance is therefore a vector of size 4916 × 1 for each test image.
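For reference, the Euclidean distance between two 20 × 20 images can be written as below. The symbols T (a test image), R_j (the j-th training image), and the row and column indices r and c are notation assumed here for illustration; the handout does not fix any.

```latex
d(T, R_j) = \sqrt{\sum_{r=1}^{20} \sum_{c=1}^{20} \bigl( T(r,c) - R_j(r,c) \bigr)^2},
\qquad j = 1, \dots, 4916
```

Evaluating this for every j gives the 4916 × 1 distance vector for one test image.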

6. Next, determine the training image with the minimum Euclidean distance for each test image and note the corresponding index.

7. Determine the label of the training image with the minimum Euclidean distance; that label is your prediction. Repeat this process for all test images.
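Steps 5 to 7 amount to a nearest-neighbor search, which might be sketched as below. To keep the snippet runnable without the MAT file, it builds synthetic stand-in data (one random prototype per digit plus small noise); that data, the variable names, and the 0 to 9 label encoding are all assumptions. The double() conversion is a precaution in case the real images are stored as integers, where subtraction would saturate.

```matlab
% ASSUMPTION: synthetic stand-in data, one noisy prototype per digit class.
rng(0);
labels     = repelem((0:9)', 500);
prototypes = rand(20, 20, 10);
images     = prototypes(:, :, labels + 1) + 0.05 * randn(20, 20, 5000);

% Every 60th image is a test image; the rest are training images.
testIdx     = 1:60:5000;
trainIdx    = setdiff(1:5000, testIdx);
testImages  = double(images(:, :, testIdx));
trainImages = double(images(:, :, trainIdx));
testLabels  = labels(testIdx);
trainLabels = labels(trainIdx);

numTest  = size(testImages, 3);
numTrain = size(trainImages, 3);
nearestIdx      = zeros(numTest, 1);   % index of the closest training image
predictedLabels = zeros(numTest, 1);

for i = 1:numTest
    distances = zeros(numTrain, 1);    % one distance per training image
    for j = 1:numTrain
        delta = testImages(:, :, i) - trainImages(:, :, j);
        distances(j) = sqrt(sum(delta(:) .^ 2));
    end
    [~, nearestIdx(i)] = min(distances);               % step 6
    predictedLabels(i) = trainLabels(nearestIdx(i));   % step 7
end
```

Because the stand-in classes are well separated, this sketch says nothing about how the method behaves on real handwriting.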

8. Create another ‘for’ loop to display each test image and its closest match image using subplot() and imagesc() with the gray colormap. Look up the help for these commands if necessary. Make sure to assign a title to each subplot. Use a pause of 0.3 seconds. You can check the video uploaded as part of this project for reference (“Desired results.avi”). You may also notice that the algorithm fails for a total of 6 cases in that video.
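A sketch of such a display loop follows. It uses a deliberately small synthetic stand-in set (200 images, so only 4 test images) to keep the animation short; that data, the variable names, and the title text are assumptions. The nearest-match search here is a compact vectorized variant that relies on implicit expansion (MATLAB R2016b or later), not the nested loop the assignment asks for.

```matlab
% ASSUMPTION: small synthetic stand-in set, 20 noisy images per digit class.
rng(0);
labels     = repelem((0:9)', 20);
prototypes = rand(20, 20, 10);
images     = prototypes(:, :, labels + 1) + 0.05 * randn(20, 20, 200);

testIdx     = 1:60:200;                      % 1, 61, 121, 181
trainIdx    = setdiff(1:200, testIdx);
testImages  = images(:, :, testIdx);
trainImages = images(:, :, trainIdx);
testLabels  = labels(testIdx);
trainLabels = labels(trainIdx);

% Compact nearest-match search: one column per flattened training image.
numTest    = size(testImages, 3);
trainMat   = reshape(trainImages, [], size(trainImages, 3));
nearestIdx = zeros(numTest, 1);
for i = 1:numTest
    testVec = reshape(testImages(:, :, i), [], 1);
    [~, nearestIdx(i)] = min(sum((trainMat - testVec) .^ 2, 1));
end

% Show each test image next to its closest training image.
figure;
for i = 1:numTest
    subplot(1, 2, 1);
    imagesc(testImages(:, :, i));
    colormap(gray);
    axis image off;
    title(sprintf('Test image %d (label %d)', i, testLabels(i)));

    subplot(1, 2, 2);
    imagesc(trainImages(:, :, nearestIdx(i)));
    colormap(gray);
    axis image off;
    title(sprintf('Closest match (label %d)', trainLabels(nearestIdx(i))));

    drawnow;
    pause(0.3);                              % pause length from the handout
end
```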

9. Now, determine the accuracy of your distance-based classification algorithm using the formula below. (Expected performance is around 92.86%.)

$$\text{Accuracy} = \frac{\text{Number of Correctly Identified Test Samples}}{\text{Total Number of Test Samples}} \times 100$$
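As a worked instance of the formula, the sketch below uses made-up label vectors with 84 test samples and 6 deliberate mismatches, matching the failure count mentioned for the reference video. The vectors themselves are assumptions for illustration; in the project they would come from your classifier.

```matlab
% ASSUMPTION: illustrative labels, not real classifier output.
testLabels      = zeros(84, 1);
predictedLabels = testLabels;
predictedLabels(1:6) = 1;            % six deliberate misclassifications

numCorrect = sum(predictedLabels == testLabels);        % 78
accuracy   = numCorrect / numel(testLabels) * 100;

fprintf('Accuracy: %.2f%%\n', accuracy);                % prints 92.86%
```

With 78 of 84 samples correct, 78 / 84 × 100 ≈ 92.86, which agrees with the expected performance stated in step 9.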

10. Write a detailed formal report. Include equations as necessary (especially the Euclidean distance). Include example cases where the prediction is correct and cases where it is incorrect. Suggest methods that could help improve the accuracy.

Things to be submitted:

- run_digit_recognition.mlx

- run_digit_recognition.pdf (live script exported to PDF)

- Report (in PDF format)