Engineering
Image Processing and Deep Learning (EEEM063)
DIGITS Introductory Deep Learning Lab
Dr John Collomosse - Autumn 2017
Introduction
In this lab you will use a web-based interface called “DIGITS” to run some simple classification experiments using a convolutional neural network (CNN). This lab assumes use of DIGITS v3.0.0 or higher.
CNNs are state-of-the-art deep neural networks that perform well at machine perception tasks such as image classification. You will start learning about these in detail in Week 6.
To do this work you will need to connect to a teaching server called aineko (see footnote 1) on the campus network. The server is not directly visible off campus, so you are encouraged to use the lab or library PCs on campus to do this work. If you want to use your own laptop, you will be able to connect via the campus-wide “eduroam” wifi network. The campus-wide “The Cloud” network will not work. It is possible to connect from off campus using the https://anywhere.surrey.ac.uk facility, by entering the web interface address (including http://) into the text box in the upper-right of the portal.
The address of the web interface is http://aineko.eps.surrey.ac.uk:34448
Verify now that you can connect to the web interface via your browser, or you will not be able to progress any further.
It is possible to install your own version of DIGITS on your local lab PC (e.g. in the Penguin lab). Since we have a large class this year, this may be useful if the aineko server becomes very busy. You can refer to the supplementary instructions on SurreyLearn if you find it desirable to do this. If you want to install DIGITS on your own machine, you might find those instructions a useful starting point. Note, however, that we do not have the resources to assist 50+ students installing DIGITS on their own variants and configurations of Linux, so if you try this you are on your own.
The aineko server (pictured right) is an Intel i7 PC with 4 Nvidia Titan-X GPUs which power our deep learning experiments.
1. Getting Started
We will be working with a dataset of hand-written digits 0-9. The dataset is called MNIST; it was assembled from handwriting samples collected by the US National Institute of Standards and Technology (NIST) and contains 70k images, each only 28x28 pixels in size.
Footnote 1: https://en.wikipedia.org/wiki/Accelerando
You must first create your own work area on the server and download the database.
Remotely log in to the teaching server using ssh. In the Linux labs you can open a terminal window using Ctrl-Alt-T and then enter the following:
ssh aineko.eps.surrey.ac.uk -l your_username
Note that the character after the hyphen is a lower case L, not a 1! Please substitute your_username with your actual username. The password is your URN. If you are asked “are you sure…”, just type the word yes. If you can’t log in then we will have to create an account for you.
Alternatively you may issue the same command in the Mac OS “Terminal” (usually found under /Applications/Utilities) or use a Windows ssh client such as PuTTY.
When you have logged in, please create a folder in the ‘scratch’ area of the server to work in:
mkdir /scratch/Teaching/your_username
Please note that anyone will be able to access anyone else’s work on this part of the teaching server, so take care not to trample over each other’s files or leave anything sensitive, such as coursework submissions, in this space.
Now download the MNIST dataset in a format that can be used with DIGITS, using the built-in tool:
cd /opt/DIGITS
python -m digits.download_data mnist /scratch/Teaching/your_username/mnist
Change into the workspace you created:
cd /scratch/Teaching/your_username
If you type ls to list the files in your workspace you will see that a folder mnist has been created containing folders train and test. Both folders contain subfolders 0,1,2..,9 which contain the images.
We will be using train as our training image set (contains around 60k images) which we will show to the CNN during training. We will use test as our test image set (contains around 10k images) which we won’t show to the CNN during training, but will use after training to measure how well the CNN has learned to recognise the ten kinds of digit.
In addition to images, each of the folders train and test contains a pair of text files:
• train.txt or test.txt, which lists every file in the image set, each followed by a space and then a number identifying its class (there are 10 classes, numbered 0-9). One line in the file corresponds to one image file.
• labels.txt, which contains 10 lines, each providing a descriptive name for one of the 10 classes. In this case the names coincidentally are also 0, 1, 2, ..., 9.
Take a look inside the files using the Linux cat command to see how they are formatted, e.g.
cat train/train.txt
Remember you can use Ctrl-C to stop if it is scrolling for a long time.
cat train/labels.txt
Imagine we were working on a different image classification problem using the ImageNet challenge dataset (ILSVRC), which contains roughly 1.2 million training images of 1000 classes of object. We would see numbers in train.txt from 0-999, and then 1000 lines in labels.txt containing the actual names of each class, e.g. dog, cat, tree, etc.
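The relationship between the list file and labels.txt can be illustrated with a short Python sketch. It assumes only the format described above (an image path, a space, then a class number, and one class name per line in labels.txt). The three sample entries and their file names are made up for illustration and are not taken from the real train.txt.

```python
from collections import Counter


def parse_list_file(text):
    """Parse lines of the form '<image path> <class number>' into (path, number) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the LAST space, so the class number is always the final field.
        path, number = line.rsplit(" ", 1)
        entries.append((path, int(number)))
    return entries


def parse_labels(text):
    """Return the class names, one per non-empty line; line i names class number i."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def class_histogram(entries, labels):
    """Count how many images belong to each named class."""
    counts = Counter(number for _, number in entries)
    return {labels[number]: counts[number] for number in sorted(counts)}


# Hypothetical sample content (illustrative only).
sample_list = """./mnist/train/5/00000.png 5
./mnist/train/0/00001.png 0
./mnist/train/5/00002.png 5
"""
sample_labels = "\n".join(str(digit) for digit in range(10))

entries = parse_list_file(sample_list)
labels = parse_labels(sample_labels)
histogram = class_histogram(entries, labels)
print(histogram)
```

For ImageNet the same code would apply unchanged; only labels.txt would hold 1000 names rather than 10.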
2. Import the dataset into DIGITS
Go to http://aineko.eps.surrey.ac.uk:34448 and look at the “Datasets” tab – there may or may not be datasets already listed in there from other users. In any case you will be creating your own by following these steps. Click on the blue “Images” button by “New Dataset” and select “Classification” as the dataset type from the dropdown menu.
Now fill in the form you are presented with as per the following page. On the server everyone has access to everything – there are no private work areas. So, it is very important that you name everything using a standard convention. We will create a dataset called ‘yourusername_mnist_dataset’.
Make sure that everything you create in DIGITS starts with the prefix yourusername_
You need to click on the “Use Text Files” tab which will use the train.txt etc. files we just inspected as lists from which to build.
Note that the dataset name starts myusername_ i.e. it is jc0028_mnist_dataset. Ensure you follow this naming convention to prevent problems with other users. Note that:
• images are greyscale and of size 28x28.
• we have unchecked “validation” and checked “test”.
• we are going to use files already on the teaching server (in your area) rather than uploading them via the browser, so check “Use local paths on server”.
• the locations of the training, test and labels text files are entered as follows:
/scratch/Teaching/yourusername/mnist/train/train.txt
/scratch/Teaching/yourusername/mnist/test/test.txt
/scratch/Teaching/yourusername/mnist/train/labels.txt
• finally, note that “image folder (optional)” is filled in with /scratch/Teaching/yourusername/
Click create and you will see some progress bars in blue on the right hand side of the screen. It will take about 60 seconds to create the dataset from the 70k images in MNIST.
If you get errors, check that you didn’t leave off the trailing / on that last field, and check all spelling.
If you click on the word “DIGITS” on the top-left to go home, or go to the original URL, you will see your dataset in the active datasets listed with “Done” (or in progress if you didn’t wait for the blue bars to go to 100%, in which case wait for the job to complete on the main screen)
3. Training a CNN
Now we will train a standard CNN called “LeNet” to recognise the numbers 0-9 in the MNIST dataset. This popular yet simple CNN architecture is included as a preset within DIGITS, so it is easy to try out. On the Models tab/box click on the blue Images button by New Model, and pick Classification.
Then fill in the form that appears to tell DIGITS how to run the training.
First you need to select the dataset you prepared (which will be easy to find, because you named it using yourusername_ as a prefix).
Next you need to select the LeNet CNN.
Finally you need to name this training job. Again we keep to a careful convention. We will use:
yourusername_dataset_network_anythingyouwant
So, I have used for example:
jc0028_mnist_lenet_exp1
where exp1 means experiment 1, but you can use anything you like for this. It makes sense to keep a notepad beside the PC to record what settings you used for each experiment, for ease of reference later.
For now, leave all the other settings as they were originally and click Create.
You will see a blue progress bar again on the right, and a graph in the centre of the page which will update itself as training proceeds. After about 60 seconds of training (30 passes over the training data or, to use the terminology, 30 “epochs”), the job will be complete.
If you click out of this screen back to the DIGITS home page (click on DIGITS on the top-left) you will see a list of all experiments. You can get back to these results by clicking on the old experiment. Later, when you run longer experiments, you can do this to leave a job running on the server and return later to analyse the results.
Please do not use more than 1 GPU out of the 4 available for any single job.
Your end result should look something like this: a blue graph spanning 30 epochs that converges towards zero after about 5 epochs.
The first graph is a plot of something called the “training loss”. The second is a plot of the “learning rate”. Both are plotted against the epoch number 1-30. We will discuss the meaning of the graphs shortly.
NOTE: Choosing a GPU
The server you are using has 4 GPUs. You can either let DIGITS choose which GPU to use (the default option) or select GPUs at the bottom of the screen in DIGITS by highlighting them in a list.
We advise you let DIGITS choose the GPU for you by not adjusting anything in this section.
However, sometimes DIGITS will get confused and allocate all jobs to a single GPU, and you may see “out of memory” or “cuDNN 0=4” or similarly phrased errors. In this case the GPU is fully loaded and you should manually select a different GPU. If you want to see which GPUs are heavily loaded you can run the nvidia-smi command within the ssh window you created in step 1.
Last login: Fri Nov 4 16:28:04 2016 from penguin33.eps.surrey.ac.uk
$ nvidia-smi
Fri Nov 4 16:55:42 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.45.18     Driver Version: 361.45.18                          |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:05:00.0      On |                  N/A |
| 28%   61C    P8    18W / 250W |    240MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  Off  | 0000:06:00.0     Off |                  N/A |
| 27%   62C    P8    18W / 250W |     24MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX TIT...  Off  | 0000:09:00.0     Off |                  N/A |
| 28%   61C    P8    15W / 250W |     24MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX TIT...  Off  | 0000:0A:00.0     Off |                  N/A |
| 50%   81C    P2    90W / 250W |  10200MiB / 12287MiB |     53%      Default |
+-------------------------------+----------------------+----------------------+
Here you can see the first three GPUs are idle, while the 4th GPU (number 3 in the listing) has a job using 53% of its compute capacity and about 10 GB of its 12 GB of memory.
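If you do need to choose a GPU by hand, reading the listing can be reduced to a few lines of Python. The sketch below is only an illustration and not part of DIGITS. It assumes the per-GPU figures are available as comma-separated rows, which nvidia-smi can print with its --query-gpu and --format=csv,noheader,nounits options. The function names are hypothetical, and the sample rows are transcribed from the listing above.

```python
def parse_gpu_status(csv_text):
    """Parse rows of 'index, memory used (MiB), memory total (MiB), utilisation (%)'."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total, util = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "mem_used_mib": used,
            "mem_total_mib": total,
            "util_pct": util,
        })
    return gpus


def least_loaded(gpus):
    """Pick the GPU with the least memory in use, breaking ties by utilisation."""
    return min(gpus, key=lambda gpu: (gpu["mem_used_mib"], gpu["util_pct"]))


# Figures copied from the nvidia-smi listing above. On the server, similar rows
# could be produced with (assumed invocation):
#   nvidia-smi --query-gpu=index,memory.used,memory.total,utilization.gpu \
#              --format=csv,noheader,nounits
sample = """0, 240, 12287, 0
1, 24, 12287, 0
2, 24, 12287, 0
3, 10200, 12287, 53
"""

gpus = parse_gpu_status(sample)
best = least_loaded(gpus)
print("Least loaded GPU:", best["index"])
```

With the figures above this selects GPU 1. GPU 2 is equally idle, and the tie is resolved in favour of the lower index.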
4. Testing the trained model
When you clicked Create to train the LeNet CNN, a process called “supervised training” took place. In supervised training, a classifier (here, a CNN) is shown many examples of images and their correct labels, e.g. a picture of a 2 and the label 2. Eventually the classifier learns a model that can be applied to new, unseen data, which we call the “test” data.
The supervised training of a CNN is performed iteratively. Labelled training data is shown to the network in small batches, and one complete pass over the training set is called an “epoch”. The CNN should get better at classifying data each epoch. If you followed the steps above correctly, your LeNet CNN was trained for 30 epochs using batches of training data sampled from the 60k images in the MNIST training set.
In LeNet there are hundreds of thousands of weights within the neural network that are configured during training. We call this learned configuration the “model”. The trained model can now be applied to some new data (some or all of the “test” image set) to check how well it is performing. This tests the CNN’s ability to generalise over unseen data, i.e. how well it learned.
Recall that MNIST contains a test set of 10k images, entirely separate from the training data. We will use some of those images now to test the network. The local path to a particular image in the test set is as follows:
/scratch/Teaching/yourusername/mnist/test/n/mmmmm.png
where you can substitute n for any digit 0-9, and mmmmm.png for some 5 digit number, e.g. 01570.png, to get different test images. Note that not all numbers are used.
Let’s download a single image to our local workstation from the teaching server. In your terminal window (hit Ctrl-Alt-T in Linux) type the following:
scp yourusername@aineko.eps.surrey.ac.uk:/scratch/Teaching/yourusername/mnist/test/0/01570.png .
Remember your password is your URN unless you changed it (use passwd to change it). If you have any problems, ensure you haven’t omitted the <space> <dot> at the end. Make sure you are using a fresh local terminal window, and aren’t typing in the original window you used to download MNIST, since commands typed there will run on the server and not on your local machine.
This will download the specified test image from your area on the teaching server to your home folder. Now you can upload it to DIGITS to see if your trained CNN can correctly recognise the image. Recall your model from the DIGITS home screen and scroll to just below the blue graphs.
Above the “Classify One” button, use “choose file” to select the test image you just downloaded. Click on Classify One and upload the image back to the teaching server. You will get a result like:
We can see here the image on the left, and the top 5 labels the CNN believes should be associated with that image. In this case (01570.png) the CNN is saying with 98.66% probability that the digit is a zero (correct).
Now let’s test a more substantial set of test images. Download the entire 10k image test.txt list:
scp yourusername@aineko.eps.surrey.ac.uk:/scratch/Teaching/yourusername/mnist/test/test.txt .
Now, we need to edit the test.txt file. It will look something like this:
./mnist/test/7/00000.png 7
./mnist/test/2/00001.png 2
./mnist/test/1/00002.png 1
./mnist/test/0/00003.png 0
./mnist/test/4/00004.png 4
./mnist/test/1/00005.png 1
etc…
Use a text editor to modify each line to point to the absolute path of each file, i.e.
/scratch/Teaching/yourusername/mnist/test/7/00000.png 7
/scratch/Teaching/yourusername/mnist/test/2/00001.png 2
/scratch/Teaching/yourusername/mnist/test/1/00002.png 1
/scratch/Teaching/yourusername/mnist/test/0/00003.png 0
/scratch/Teaching/yourusername/mnist/test/4/00004.png 4
etc.
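If you would rather not make this edit by hand, it can be scripted. The following Python sketch is one possible approach, not a tool supplied with the lab: it replaces the leading ./ of each entry with an absolute root folder and optionally keeps only the first few lines. The function name to_absolute and its limit parameter are hypothetical, and yourusername must be replaced with your own username.

```python
def to_absolute(lines, root, limit=None):
    """Rewrite './relative/path label' entries so that they start with `root`.

    lines: iterable of list-file lines, e.g. './mnist/test/7/00000.png 7'
    root:  absolute folder the relative paths are based on
    limit: if given, keep only this many entries
    """
    root = root.rstrip("/")
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("./"):
            line = root + "/" + line[2:]
        result.append(line)
        if limit is not None and len(result) >= limit:
            break
    return result


# The first lines of test.txt, as shown above.
original = [
    "./mnist/test/7/00000.png 7",
    "./mnist/test/2/00001.png 2",
    "./mnist/test/1/00002.png 1",
    "./mnist/test/0/00003.png 0",
    "./mnist/test/4/00004.png 4",
    "./mnist/test/1/00005.png 1",
]

edited = to_absolute(original, "/scratch/Teaching/yourusername", limit=5)
for entry in edited:
    print(entry)
```

To process the real file you could read test.txt with open("test.txt").readlines() and write the returned lines back out, one per line.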
You can delete the rest of the file; there is no need to test all 10k images! Now select the edited test.txt file on your local machine using the “Choose file” button above the “Classify Many” button. Once chosen, hit Classify Many. This will sample 100 test images at random from the file and show the output.
In this output, the column “Ground Truth” tells you what the test image actually is, and then the top 5 classes are shown in successive columns. Here we can see the CNN performs very well, as the correct class is identified in the first column with a probability near 100% every time.
Congratulations, you have now trained and tested your first CNN!
5. Analysing the training of the CNN model
Testing your trained model takes time, and we can get some indication as to how well the network learned, without even looking at test data. From the DIGITS home screen, click on your trained model to pull up the training graph again. You should have ended up with a graph similar to:
The blue line on the graph is the “training loss”. When the CNN is trained, the weights in the network are adjusted to minimise a mathematical expression called a “loss function”. These come in various forms, but a popular one for classification is the “SoftMax Loss” (discussed in lectures). Training occurs iteratively over “epochs”. Here, we have used 30 epochs of training, but it is clear this was excessive as the loss bottomed out at around 5 epochs.
During an epoch of training, batches of data are sampled at random from the training data. Each image in a batch is fed through the CNN with its current weights to get a classification decision. The loss function measures how “wrong” that decision is; for example, an image of a 4 might be classified as a 5. These losses are combined to produce a score, the loss, which is shown here in blue. The loss is also used internally during training to update the weights of the CNN via a process called back propagation, so that it performs better on the next batch.
Initially, in the first epoch, the weights in the CNN are totally random so the loss is very high, but after just a few epochs of training the loss is much lower (better). If the loss does not tend toward zero quickly, and hovers around the same high value despite many epochs of training, we say the network has “not converged”. There are several reasons a network might not converge, and non-convergence is the main obstacle to overcome when applying deep learning to your classification problem. Practical reasons for non-convergence include:
• Problems with the training data, e.g. not diverse enough, not enough of it (solution: more data)
• CNN architecture is the wrong design (solution: try other designs)
• The learning rate is wrong (solution: try other learning rates)
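To make the idea of a loss concrete, here is a minimal Python sketch of a softmax loss for a single image and its average over a batch. It is an illustration written for this labsheet under the usual textbook definition (the negative log of the softmax probability assigned to the correct class). It is not the code DIGITS runs, and the scores below are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def softmax_loss(scores, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(scores)[true_class])


def batch_loss(batch):
    """Average the per-image losses over a batch of (scores, true_class) pairs."""
    return sum(softmax_loss(scores, label) for scores, label in batch) / len(batch)


# Invented scores for the ten digit classes; the image really shows a 4.
confident_right = [0, 0, 0, 0, 8, 0, 0, 0, 0, 0]  # network favours class 4
confident_wrong = [0, 0, 0, 0, 0, 8, 0, 0, 0, 0]  # network favours class 5
undecided = [0] * 10                              # all classes equally likely

loss_right = softmax_loss(confident_right, 4)
loss_wrong = softmax_loss(confident_wrong, 4)
loss_undecided = softmax_loss(undecided, 4)
print(loss_right, loss_wrong, loss_undecided)
```

A correct, confident decision gives a loss close to zero, and a confident mistake gives a large loss. The undecided case gives ln(10), about 2.3, which is roughly the loss to expect from a ten-class network that has not yet learned anything.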
6. Learning Rate
The other graph we saw on the model page plotted the learning rate over time. In the default DIGITS configuration, the learning rate starts high and automatically reduces as the epoch count rises. The initial learning rate was set in the box “learning rate” on the left of the screen when you created the model; its value was 0.01.
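The stepped shape of the learning rate graph can be reproduced with a simple “step down” schedule, sketched below in Python. The step size (a drop after every 33% of the epochs) and the drop factor (0.1) are assumptions about the form defaults rather than values stated in this labsheet, so check them against the learning rate options in your own DIGITS form. The function name is hypothetical.

```python
import math


def step_down_lr(base_lr, epoch, total_epochs, step_percent=33.0, gamma=0.1):
    """Learning rate in force at a given epoch (counted from 0).

    The rate is multiplied by `gamma` each time another `step_percent`
    of the total epochs has been completed (assumed default values).
    """
    step_epochs = total_epochs * step_percent / 100.0
    drops = int(epoch // step_epochs)  # number of completed steps so far
    return base_lr * (gamma ** drops)


# Initial rate 0.01 and 30 epochs, as used for the LeNet experiment.
schedule = [step_down_lr(0.01, epoch, 30) for epoch in range(30)]
for epoch in (0, 9, 10, 19, 20, 29):
    print(epoch, schedule[epoch])
```

Under these assumptions the rate stays at 0.01 for the first ten epochs, then falls by a factor of ten twice more during training.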
The learning rate is a critical factor in getting the CNN to train correctly. For every machine learning problem there is a “sweet spot”. Too low, and the CNN will take a very long time to train and may not converge at all. Too high, and the network will converge to a high loss value, i.e. not train.
Go back to your model, e.g. yourusername_mnist_lenet_exp1, and hit the “Clone Job” button. You are now setting up another model for training, so modify the name of the model to end in exp2. Now, try changing the learning rate to 0.001, i.e. an order of magnitude lower. Similarly try changing it to 0.1. What happens?
When hunting for a good learning rate, it is normal to vary it in orders of magnitude like this, rather than in small steps such as 0.01, 0.02 etc. As you can see, there are also “advanced learning rate” options which control the stepping down behaviour observed in the graph.
7. Working with a Validation Set
Monitoring the training loss graph is an important debugging tool when trying to get your CNN to train properly. However, it is often insufficient to predict how well the trained CNN will perform over unseen test data. This is because the network might be learning to classify the training data very well, but be hopeless at classifying unseen test data. We call this “overfitting” and it is a common problem in training any machine learning system. Overfitting usually occurs if your training data is not sufficiently diverse to capture likely test data scenarios, or if you have run the training for too many epochs.
To counter this problem, we hold back a small part of the training set, called the “validation” set, and we calculate the loss over this validation set too. Whilst it does not impact the training of the network directly, it allows us to see, at each epoch, how the trained network would perform against some unseen data. Normally we would expect both the validation and training loss to converge, i.e. go low, but after further epochs the validation loss might rise again.
That combination, i.e. a low training loss and a high validation loss, shows us that the network has overfitted. It means we should have stopped the training at an earlier epoch.
We will now work with a more challenging dataset that is split 3 ways into train, validation and test data.
The dataset is called “iCub” and was created by waving 4 different objects in front of an iCub robot’s webcam and saving the resulting 3029 video frames as separate images, sorted into 4 folders: ball, cube, cup, tractor. Such a dataset could be created using any video camera and free software to break a video file up into individual frames.
Step 1 – Create the dataset in DIGITS
Create a new image classification dataset from the DIGITS homepage by clicking on the blue button in the Datasets box, and selecting “Classification”. Name the dataset yourusername_icub_dataset. For this dataset you can simply use the “Use Image Folder” tab.
In the “Training images” box enter /scratch/Teaching/robotobjects
In the % for validation box, enter 25%. In the % for testing box, enter 10%. Leave everything else as initially set.
Hit Create to build the dataset.
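To get a feel for what these percentages mean, the Python sketch below works out how many of the 3029 images land in each subset and performs a random three-way split of a list of file names. It is only an illustration. DIGITS performs its own split internally (its rounding, and whether it splits each class folder separately, are not covered here), so the exact counts it reports may differ slightly. The function names and the frame file names are hypothetical.

```python
import random


def split_counts(n_images, val_percent, test_percent):
    """Return (train, validation, test) image counts for the given percentages."""
    n_val = round(n_images * val_percent / 100)
    n_test = round(n_images * test_percent / 100)
    return n_images - n_val - n_test, n_val, n_test


def split_files(files, val_percent, test_percent, seed=0):
    """Shuffle the file list and cut it into train, validation and test subsets."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train, n_val, _ = split_counts(len(shuffled), val_percent, test_percent)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


counts = split_counts(3029, 25, 10)
print("train, validation, test:", counts)

# Hypothetical file names standing in for the 3029 video frames.
frames = ["frame_%04d.png" % i for i in range(3029)]
train, validation, test = split_files(frames, 25, 10)
print(len(train), len(validation), len(test))
```

With 25% for validation and 10% for testing, about 65% of the frames (1969 of 3029 under this rounding) remain for training, and no frame appears in more than one subset.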
Step 2 – Train the CNN
Create a new CNN training model from the DIGITS home screen, as you did before. Instead of using LeNet we will select AlexNet, a deeper CNN (it has more layers) that you will learn about in lectures. Change the number of training epochs from 30 to 15, and enter a model name under the usual naming convention, e.g. yourusername_icub_alexnet_exp1
Training will proceed as before (it will take a few minutes) and your graph will contain not only the training loss (blue) but a further two traces, based on the validation data.
The green trace is a loss calculated over the validation data. It does not influence the training process but is a useful monitor to check we are not overfitting. At 15 epochs we can see both losses nicely converge towards zero, so we can be confident this is not the case.
The yellow trace is the accuracy, which should move roughly inversely to the green (validation loss) trace. It is the percentage of the validation data that was classified correctly using the CNN at that epoch. In this case we get around 100% at 15 epochs. We can see this training has been a success.
In the simple datasets in this labsheet, it is difficult to cause the CNN to overfit. However, this is something you should watch for (i.e. blue drops but green starts to rise or yellow starts to drop) in other more substantial classification problems.
8. Resuming training in DIGITS if a job aborts
Sometimes a job will (seemingly inexplicably) abort itself in DIGITS. Or, it may be interesting to continue training beyond the epoch count initially specified. In that case, you can use Clone Job on the model and resume training by specifying a new job name and selecting the “Previous Networks” tab. Just pick the model corresponding to the job that stopped, and the epoch you wish to resume from (in practice, if the job stopped at epoch n, restart it from n-1 or n-2).
Congratulations, you have now trained and tested a couple of CNNs for the task of image classification. In the lectures you will learn a lot more about the internals of CNNs and how they work. This concludes the DIGITS Introductory lab for EEEM063.