TensorFlow: The first project

We will build a deep learning model that differentiates between images of two kinds of objects. For this we will use the Fashion-MNIST dataset, which is readily available through the Keras API included with TensorFlow.

First import the necessary packages.

Python





import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras

from sklearn.model_selection import train_test_split

#%% shuffle
from sklearn.utils import shuffle

Read data included in the keras library.

Python





fmnist = keras.datasets.fashion_mnist
(train_data, train_labels), (test_data, test_labels) = fmnist.load_data()




Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step

Let’s first see the size of the data and different classes present.

Python





print('Size of train_data: ', train_data.shape)
print('Size of test_data: ', test_data.shape)

print('Unique classes: \n', np.unique(train_labels))




Size of train_data:  (60000, 28, 28)
Size of test_data:  (10000, 28, 28)
Unique classes: 
 [0 1 2 3 4 5 6 7 8 9]

We can see that there are 60 thousand samples in the train data and 10 thousand samples in the test data.
Each sample is a 28×28 image.
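
For reference, the ten integer labels correspond to the following item categories in the Fashion-MNIST documentation. A small, optional snippet to keep the mapping handy:

Python

# Fashion-MNIST class names, indexed by the integer label (per the dataset documentation)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[1], class_names[8])  # the two classes we will select later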

Let’s see what range of values the first sample occupies.

Python





s1 = train_data[0]
print('Min value: ', s1.min())
print('Max value: ', s1.max())




Min value:  0
Max value:  255

We will divide all the values by 255 to normalize the data, scaling every pixel into the range [0, 1]. Machine learning algorithms generally perform better on normalized data.

Python





train_data = train_data / 255
test_data = test_data / 255

We will also reshape the data so that each sample has shape (28, 28, 1); the trailing 1 is the single grayscale channel expected by the Conv2D layers we will use later.

Python





train_data = train_data.reshape(train_data.shape[0], train_data.shape[1], train_data.shape[2], 1)
test_data = test_data.reshape(test_data.shape[0], test_data.shape[1], test_data.shape[2], 1)

# see the shape of one of the samples
print(train_data[0].shape)




(28, 28, 1)

As a preprocessing step, we normalized and reshaped the raw data. We can combine these two steps into a function and run the process again.

Wrapping a set of processing steps in a function makes our code much cleaner and easier to reuse when we want to perform exactly the same preprocessing on multiple sets of data and their labels.

Python





def preprocess(X, y):

    num_samples = X.shape[0]
    num_rows = X.shape[1]
    num_cols = X.shape[2]

    # labels to binary: original label 1 -> 0, anything else (here, 8) -> 1
    y_rv = np.where(y == 1, 0, 1)
    # reshape labels into a column vector
    y_rv = y_rv.reshape(num_samples, 1)


    # divide X by 255
    X_rv = X / 255
    # reshape
    X_rv = X_rv.reshape(num_samples, num_rows, num_cols, 1)

    return X_rv, y_rv

Now, we can use this preprocessing function on our raw data.
We will start by reading the data again.

Python





fmnist = keras.datasets.fashion_mnist
(train_data, train_labels), (test_data, test_labels) = fmnist.load_data()

As we are going to build a classification model for just two classes, we will select the data from two classes only: here, class 1 (trouser) and class 8 (bag).

Python





# select two classes
# from train data
train_inds = np.where((train_labels == 1) | (train_labels == 8))
sel_X_train, sel_y_train = train_data[train_inds], train_labels[train_inds]
# from test data
test_inds = np.where((test_labels == 1) | (test_labels == 8))
sel_X_test, sel_y_test = test_data[test_inds], test_labels[test_inds]

Now, perform the preprocessing on the selected data.

Python





X_train, y_train = preprocess(sel_X_train, sel_y_train)
X_test, y_test = preprocess(sel_X_test, sel_y_test)

The preprocess function also converts the labels to either 0 or 1.
The labels that were 1 in the original data are converted to 0.
The labels that were 8 in the original data are converted to 1.
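
A quick, optional sanity check confirms this conversion: the selected labels contained only 1 and 8 before preprocessing, and only 0 and 1 after.

Python

# Sanity check of the label mapping done inside preprocess
print('Labels before preprocessing: ', np.unique(sel_y_train))
print('Labels after preprocessing:  ', np.unique(y_train))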

It is also good practice to shuffle the data so that our results do not depend on the order of the samples.

Python





X_train, y_train = shuffle(X_train, y_train)
X_test, y_test = shuffle(X_test, y_test)

The data is now ready to be fed to the neural network but, out of curiosity, let’s visualize some of the samples first.

Python





fig, ax = plt.subplots(2,3) # (num_rows, num_columns)
c = 0 # index counter
for i in range(2): # iterate over rows
  for j in range(3): # iterate over columns
    ax[i,j].imshow(X_train[c]) # the c-th image will be plotted
    c += 1 # increase c

plt.show() # show plot

So, we are trying to make a model that tells us whether an image shows a trouser or a bag.
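
If you want the visualization to be more self-explanatory, the binary labels can be mapped back to these names. A minimal sketch (the mapping follows from our preprocess function, where original label 1 became 0 and 8 became 1):

Python

binary_names = {0: 'Trouser', 1: 'Bag'}  # labels after the 1 -> 0, 8 -> 1 mapping
# names of the first six samples, i.e. the ones plotted above
print([binary_names[int(l)] for l in y_train[:6, 0]])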

Let’s write a function for constructing our model. We will name it get_model. This function will generate all the layers of the neural network and compile the model.

After the layers are assigned, the model is compiled. The loss function, the optimizer, and the metrics to be tracked are specified in the compilation statement.

We will be using the functional API of Keras to construct the layers of the neural network. This is not essential for our particular example: a simple Sequential model would also work, since the layers we are going to use form a plain stack, one on top of the other.

The functional API, however, allows us to construct more complex model architectures, so learning it even on a simple architecture will help later.
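
To make the comparison concrete, here is a rough sketch of how the same stack of layers we are about to build could be written with the Sequential API instead (shown only for illustration; we will not use this version):

Python

# Sequential sketch of the same architecture, for comparison only
seq_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid'),
])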

Before writing the code for generating the model architecture, here is how it looks in pseudocode. The input of each subsequent layer is the output of the previous layer.

inp = input layer (input shape = shape of one sample)

x = conv2D layer 1 (takes inp from previous statement)
x = maxpooling layer 1 (takes x from previous statement)

x = conv2D layer 2 (takes x from previous statement)
x = maxpooling layer 2 (takes x from previous statement)

x = conv2D layer 3 (takes x from previous statement)

x = flatten (takes x from previous statement)

x = Dense layer (takes x from previous statement)

out = Dense layer (takes x from previous statement)

(1 neuron as it is a binary classification problem)

Construct model from the layers

Compile model

Note that we are not using a max-pooling layer after the third Conv2D layer.

The output layer has just 1 neuron because we are solving a binary classification problem. The output of such a model is a single number between 0 and 1 for each sample, which can be interpreted as the probability that the sample belongs to class 1. We will come back to this when we evaluate the model.
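
For intuition, the sigmoid activation used in the output layer squashes any real number into the open interval (0, 1), which is what allows us to read the single output as a probability. A tiny sketch:

Python

# Sigmoid maps any real-valued input z to a number between 0 and 1
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(-3), sigmoid(0), sigmoid(3))  # roughly 0.047, 0.5, 0.953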

Following is the code for these same steps written in one function.

Python





# model
def get_model():
    # input layer
    inp = keras.layers.Input(shape=X_train.shape[1:])

    # first Conv2D and MaxPooling layer
    x = keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu')(inp)
    x = keras.layers.MaxPooling2D((2,2))(x)

    # second Conv2D and MaxPooling layer
    x = keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu')(x)
    x = keras.layers.MaxPooling2D((2,2))(x)

    # third Conv2D and MaxPooling layer
    x = keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu')(x)

    # flatten
    x = keras.layers.Flatten()(x)

    # dense layer
    x = keras.layers.Dense(64, activation='relu')(x)

    # output layer
    out = keras.layers.Dense(1, activation='sigmoid')(x)

    # construct model
    model = keras.Model(inputs=inp, outputs=out)

    # compile model
    model.compile(loss='binary_crossentropy',
                  optimizer = keras.optimizers.Adam(),
                  metrics=['accuracy'])

    model.summary()

    return model

We will train the model now.
First we call the get_model function and then train the model using its fit method.

The fit method trains the model on the data given to it. It takes a few arguments, some of which we will use here.

First, obviously, the training data and the labels.

The epochs argument defines the number of complete passes the model makes over the training data. Choosing an appropriate number of epochs helps avoid overfitting and underfitting.

The batch_size argument divides the data into smaller chunks. We will use a batch size of 32 here, which means the model is shown 32 samples at a time; once one batch of 32 samples has been processed, the model weights are updated.

Dividing the data into smaller batches uses less memory, which is useful when the training data contains a large number of samples. Batch sizes that are powers of 2 are also a common convention, partly because they tend to map well onto the underlying hardware.

The number of times the model weights are updated depends on both epochs and batch_size. For example, if our data is divided into 5 batches and we train for 50 epochs, the model weights are updated 50 × 5 = 250 times.

The validation_split argument takes a value between 0 and 1 and splits the training data into training and validation sets. If validation_split is 0.1, the model is trained on 90% of the training data and the remaining 10% is used as validation data.

As the model trains, its performance should also improve on the validation data. Results on the training and validation sets give us an idea of how well the model might do on the test set: if the accuracy on the validation set stops improving or gets worse with increasing epochs, the model is unlikely to perform well on the test data.
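
As a rough back-of-the-envelope check of this arithmetic, using the values we will pass to fit below (batch size 32, 50 epochs, validation split 0.2, and our 12,000 selected training samples):

Python

import math

n_train = X_train.shape[0]               # 12000 samples from the two selected classes
n_fit = int(n_train * (1 - 0.2))         # 9600 samples actually used for fitting
steps_per_epoch = math.ceil(n_fit / 32)  # 300 batches per epoch (the "300/300" in the log below)
total_updates = steps_per_epoch * 50     # 15000 weight updates over the whole training run

print(steps_per_epoch, total_updates)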

While training, fit also outputs some key information that gives us more insight into the training process. The history object returned by fit is used here to store this information.

Python





model = get_model()
history = model.fit(X_train, y_train,
                        epochs = 50,
                        batch_size = 32,
                        validation_split=0.2)

You will see output similar to the following. I have included a screenshot of the model summary printed in Jupyter, so that the table appears on this post the same way it is printed in the notebook.





Total params: 166,529 (650.50 KB)

Trainable params: 166,529 (650.50 KB)

Non-trainable params: 0 (0.00 B)

Epoch 1/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 14s 39ms/step - accuracy: 0.9733 - loss: 0.1015 - val_accuracy: 0.9967 - val_loss: 0.0120
Epoch 2/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 37ms/step - accuracy: 0.9963 - loss: 0.0124 - val_accuracy: 0.9967 - val_loss: 0.0123
Epoch 3/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 34ms/step - accuracy: 0.9967 - loss: 0.0104 - val_accuracy: 0.9983 - val_loss: 0.0056
Epoch 4/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 22s 39ms/step - accuracy: 0.9964 - loss: 0.0113 - val_accuracy: 0.9983 - val_loss: 0.0071
Epoch 5/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 42ms/step - accuracy: 0.9976 - loss: 0.0090 - val_accuracy: 0.9971 - val_loss: 0.0094
Epoch 6/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 19s 37ms/step - accuracy: 0.9973 - loss: 0.0057 - val_accuracy: 0.9979 - val_loss: 0.0055
Epoch 7/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 38ms/step - accuracy: 0.9995 - loss: 0.0028 - val_accuracy: 0.9975 - val_loss: 0.0087
Epoch 8/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - accuracy: 0.9991 - loss: 0.0028 - val_accuracy: 0.9975 - val_loss: 0.0088
Epoch 9/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 36ms/step - accuracy: 0.9996 - loss: 0.0016 - val_accuracy: 0.9962 - val_loss: 0.0183
Epoch 10/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 0.9983 - loss: 0.0073 - val_accuracy: 0.9967 - val_loss: 0.0117
Epoch 11/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - accuracy: 0.9990 - loss: 0.0032 - val_accuracy: 0.9975 - val_loss: 0.0069
Epoch 12/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 36ms/step - accuracy: 0.9996 - loss: 0.0017 - val_accuracy: 0.9983 - val_loss: 0.0084
Epoch 13/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 42ms/step - accuracy: 0.9998 - loss: 7.9157e-04 - val_accuracy: 0.9975 - val_loss: 0.0126
Epoch 14/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 0.9977 - loss: 0.0085 - val_accuracy: 0.9962 - val_loss: 0.0133
Epoch 15/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 34ms/step - accuracy: 0.9989 - loss: 0.0037 - val_accuracy: 0.9979 - val_loss: 0.0101
Epoch 16/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 23s 44ms/step - accuracy: 0.9998 - loss: 7.4363e-04 - val_accuracy: 0.9983 - val_loss: 0.0083
Epoch 17/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 0.9995 - loss: 0.0014 - val_accuracy: 0.9979 - val_loss: 0.0116
Epoch 18/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 38ms/step - accuracy: 0.9995 - loss: 0.0010 - val_accuracy: 0.9983 - val_loss: 0.0085
Epoch 19/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 0.9999 - loss: 1.3536e-04 - val_accuracy: 0.9983 - val_loss: 0.0118
Epoch 20/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 40ms/step - accuracy: 1.0000 - loss: 4.9823e-05 - val_accuracy: 0.9983 - val_loss: 0.0131
Epoch 21/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 9.3231e-06 - val_accuracy: 0.9983 - val_loss: 0.0138
Epoch 22/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 41ms/step - accuracy: 1.0000 - loss: 6.2949e-06 - val_accuracy: 0.9983 - val_loss: 0.0143
Epoch 23/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 38ms/step - accuracy: 1.0000 - loss: 3.4584e-06 - val_accuracy: 0.9983 - val_loss: 0.0146
Epoch 24/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 6.8804e-06 - val_accuracy: 0.9983 - val_loss: 0.0149
Epoch 25/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 37ms/step - accuracy: 1.0000 - loss: 3.6177e-06 - val_accuracy: 0.9983 - val_loss: 0.0152
Epoch 26/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 3.6047e-06 - val_accuracy: 0.9983 - val_loss: 0.0155
Epoch 27/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 35ms/step - accuracy: 1.0000 - loss: 3.5686e-06 - val_accuracy: 0.9983 - val_loss: 0.0158
Epoch 28/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 35ms/step - accuracy: 1.0000 - loss: 2.3804e-06 - val_accuracy: 0.9983 - val_loss: 0.0160
Epoch 29/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 1.0369e-06 - val_accuracy: 0.9983 - val_loss: 0.0164
Epoch 30/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 9.7236e-07 - val_accuracy: 0.9983 - val_loss: 0.0166
Epoch 31/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 39ms/step - accuracy: 1.0000 - loss: 1.0159e-06 - val_accuracy: 0.9983 - val_loss: 0.0169
Epoch 32/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 37ms/step - accuracy: 1.0000 - loss: 7.0859e-07 - val_accuracy: 0.9983 - val_loss: 0.0172
Epoch 33/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 5.7727e-07 - val_accuracy: 0.9983 - val_loss: 0.0176
Epoch 34/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 39ms/step - accuracy: 1.0000 - loss: 5.2776e-07 - val_accuracy: 0.9983 - val_loss: 0.0179
Epoch 35/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 19s 34ms/step - accuracy: 1.0000 - loss: 3.8161e-07 - val_accuracy: 0.9983 - val_loss: 0.0179
Epoch 36/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 37ms/step - accuracy: 1.0000 - loss: 5.1309e-07 - val_accuracy: 0.9983 - val_loss: 0.0183
Epoch 37/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 2.9122e-07 - val_accuracy: 0.9983 - val_loss: 0.0187
Epoch 38/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 36ms/step - accuracy: 1.0000 - loss: 3.4506e-07 - val_accuracy: 0.9983 - val_loss: 0.0190
Epoch 39/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 38ms/step - accuracy: 1.0000 - loss: 2.4684e-07 - val_accuracy: 0.9983 - val_loss: 0.0193
Epoch 40/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 2.3865e-07 - val_accuracy: 0.9983 - val_loss: 0.0196
Epoch 41/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 2.4476e-07 - val_accuracy: 0.9983 - val_loss: 0.0199
Epoch 42/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 1.0000 - loss: 1.6218e-07 - val_accuracy: 0.9983 - val_loss: 0.0201
Epoch 43/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 9.0598e-08 - val_accuracy: 0.9983 - val_loss: 0.0204
Epoch 44/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 38ms/step - accuracy: 1.0000 - loss: 6.1100e-08 - val_accuracy: 0.9983 - val_loss: 0.0208
Epoch 45/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 39ms/step - accuracy: 1.0000 - loss: 1.2490e-07 - val_accuracy: 0.9983 - val_loss: 0.0212
Epoch 46/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 36ms/step - accuracy: 1.0000 - loss: 1.0555e-07 - val_accuracy: 0.9983 - val_loss: 0.0214
Epoch 47/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 20s 34ms/step - accuracy: 1.0000 - loss: 8.2517e-08 - val_accuracy: 0.9983 - val_loss: 0.0218
Epoch 48/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 36ms/step - accuracy: 1.0000 - loss: 4.0571e-08 - val_accuracy: 0.9983 - val_loss: 0.0220
Epoch 49/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 21s 38ms/step - accuracy: 1.0000 - loss: 4.9392e-08 - val_accuracy: 0.9983 - val_loss: 0.0222
Epoch 50/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 19s 34ms/step - accuracy: 1.0000 - loss: 2.6817e-08 - val_accuracy: 0.9983 - val_loss: 0.0225

From the output of the training process, we can see that the training accuracy (accuracy) and the validation accuracy (val_accuracy) increase over the epochs, while the loss and val_loss decrease.

At the end of 50 epochs the val_accuracy is above 0.99. This suggests the model is likely to perform well on unseen test data.
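
To see these trends at a glance, we can optionally plot the curves stored in the history object returned by fit:

Python

# Optional: plot training vs validation accuracy stored in the history object
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()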

Let’s test it. To get the predictions of the trained model on the test data, we call the model’s predict method with the test data as the input argument.

Python





# predict
scores = model.predict(X_test)
print(scores)




63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 16ms/step
[[1.7804445e-36]
 [1.0000000e+00]
 [2.9992475e-31]
 ...
 [2.6165179e-24]
 [1.0000000e+00]
 [1.0000000e+00]]

In our case, the scores are an array of numbers, one per sample, representing the probability that the sample belongs to class label 1.

From these probabilities, we have to infer how our model has performed on the test data.

We can select a probability cutoff. If the probability score for a sample is greater than or equal to this cutoff, its predicted label is taken as 1; otherwise the predicted label is taken as 0.
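
As a one-line sketch with a cutoff of 0.5 (the same thresholding is done inline in the evaluation code further below):

Python

# Turn probability scores into 0/1 predicted labels at a cutoff of 0.5
y_pred = (scores >= 0.5).astype(int)
print(y_pred[:5])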

Once we have the predicted labels, we can compute the confusion matrix. From the confusion matrix we can calculate the sensitivity (the fraction of actual positives predicted as positive, TP / (TP + FN)) and the specificity (the fraction of actual negatives predicted as negative, TN / (TN + FP)) of our model.

The cutoff score is chosen so as to get the best sensitivity and specificity. Although a good model should have high sensitivity and specificity (e.g. above 90%), the desired balance varies with the problem at hand: in some cases a higher specificity is essential, and in others a higher sensitivity. For example, if we are building a model for fraud detection, high sensitivity is desired. It will produce some falsely flagged frauds, but those can be verified manually, while most of the real frauds will be caught.
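
If we wanted to explore other cutoffs, one of several ways is to sweep them with scikit-learn's roc_curve; this is only a sketch of the idea, not something we rely on below:

Python

from sklearn.metrics import roc_curve

# tpr is the sensitivity and 1 - fpr is the specificity at each candidate cutoff
fpr, tpr, thresholds = roc_curve(y_test, scores)
best = np.argmax(tpr - fpr)  # e.g. Youden's J statistic picks one reasonable balance
print('Candidate cutoff: ', thresholds[best])
print('Sensitivity: ', tpr[best], ' Specificity: ', 1 - fpr[best])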

The code for getting the sensitivity and specificity with a cutoff of 0.5 is as follows:

Python





from sklearn import metrics

#confusion matrix
confmat = metrics.confusion_matrix(y_test, scores > 0.5)
disp = metrics.ConfusionMatrixDisplay(confusion_matrix=confmat, display_labels=[0, 1])
disp.plot(cmap=plt.cm.Blues)
plt.title("Confusion Matrix")
plt.show()

# sensitivity and specificity
sens = confmat[1, 1]/(confmat[1, 1]+confmat[1, 0])
spec = confmat[0, 0]/(confmat[0, 0]+confmat[0, 1])
print('Sensitivity: ', sens)
print('Specificity: ', spec)




Sensitivity:  0.996
Specificity:  0.997

We see that the model performs quite well at this cutoff: only 7 samples out of the 2000 in our two-class test set are classified wrongly.
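
As a quick cross-check, Keras' evaluate method reports the loss and accuracy of the trained model on the test set directly; the accuracy it prints should agree with the confusion-matrix numbers above:

Python

# Cross-check: loss and accuracy of the trained model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy: ', test_acc)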

We have seen how to build a simple neural network with TensorFlow and Keras for binary classification, and how to test the trained model on test data and evaluate the results.

Once we get the hang of this, we can construct more complex models for different tasks, optimize model hyperparameters, and much more.