TensorFlow: Basic deep learning workflow for classification tasks

In our first TensorFlow project, we used the Fashion-MNIST data to build a simple deep learning model that identifies whether a given image shows a trouser or a bag. We labelled the trouser images as class 0 and the bag images as class 1.

In that network, the output layer was defined as follows:

Python

# output layer
out = keras.layers.Dense(1, activation='sigmoid')(x)

As discussed earlier, this layer, with the sigmoid activation function, outputs one number between 0 and 1 for each sample. This number can be interpreted as the predicted probability that the sample belongs to class 1.
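To make this concrete, here is a minimal NumPy sketch of what the single sigmoid neuron computes. The logits are made-up numbers standing in for the neuron's input, not values from the trained model, and 0.5 is only an example cutoff. The probability of class 0 is simply the complement of the class-1 probability.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values (logits) of the single output neuron
# for three samples; in the real model these come from the previous layer.
logits = np.array([-3.0, 0.0, 2.5])

p_class1 = sigmoid(logits)   # predicted probability of class 1 (bag)
p_class0 = 1.0 - p_class1    # probability of class 0 (trouser) is the complement

# A probability cutoff turns the probabilities into labels
# (here a probability of exactly 0.5 is assigned to class 1).
cutoff = 0.5
labels = (p_class1 >= cutoff).astype(int)

print(labels)  # [0 1 1]
```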

A more general way of constructing the same neural network is to use a softmax activation function in the output layer and set the number of neurons to the number of classes in our data. In our example, we chose samples from only two classes, so we set the number of neurons in the output layer to 2. The corresponding line of code is:

Python

# output layer
out = keras.layers.Dense(2, activation='softmax')(x)

This outputs two numbers for each sample, each between 0 and 1. The first number is the probability that the sample belongs to class 0, the second is the probability that it belongs to class 1, and the two numbers sum to 1.
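The following NumPy sketch shows what the two-neuron softmax layer computes for a few made-up logits (hypothetical values, not model outputs). It also checks a useful relationship: with two classes, the class-1 softmax probability equals the sigmoid of the difference between the two logits, which is why both output layers can solve the same binary problem.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting each row's maximum avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logits of the two output neurons for three samples
# (column 0 -> class 0, column 1 -> class 1).
logits = np.array([[2.0, -1.0],
                   [0.3, 0.3],
                   [-0.5, 1.5]])

probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1 (up to floating-point rounding)

# With two classes, P(class 1) = sigmoid(logit_1 - logit_0)
print(np.allclose(probs[:, 1], sigmoid(logits[:, 1] - logits[:, 0])))  # True
```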

Constructing the network in this format has the advantage that we can use the same architecture for both binary and multi-class classification. For example, to build a classification model for all 10 classes in the Fashion-MNIST data, we only need to change the number of neurons in the output layer to 10:

Python

# output layer
out = keras.layers.Dense(10, activation='softmax')(x)
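What a Dense layer with softmax activation computes can be sketched in NumPy as softmax(x @ W + b). In this rough sketch the weights are random stand-ins for trained ones, and the 64 input features mirror the Dense(64) layer used later in get_model. The point is that going from 2 to 10 classes only changes the number of columns of W and b.

```python
import numpy as np

def dense_softmax(x, W, b):
    """Forward pass of a Dense layer with softmax activation:
    softmax(x @ W + b), applied row by row."""
    z = x @ W + b
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_samples, n_features = 4, 64  # 64 mirrors the Dense(64) layer before the output
x = rng.normal(size=(n_samples, n_features))

for num_classes in (2, 10):
    # Random weights stand in for trained ones; only their shape matters here
    W = rng.normal(size=(n_features, num_classes))
    b = np.zeros(num_classes)
    out = dense_softmax(x, W, b)
    # One row per sample, one column per class, each row summing to 1
    print(num_classes, out.shape, np.allclose(out.sum(axis=1), 1.0))
```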

This approach can therefore be considered a general way of building deep learning models for classification tasks with TensorFlow.

However, there are a few other things to consider when training this kind of network. We will go through them as we solve our binary classification problem again, this time with a softmax activation function in the output layer.

First, we import all the necessary packages.

Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow import keras

from sklearn.model_selection import train_test_split

#%% shuffle
from sklearn.utils import shuffle

We will now load and preprocess the data in the same way as we did in the previous post.

First, we write a function, preprocess, and pass it the data selected from the classes Trouser (1) and Bag (8). I have simply copied the code from the previous post here.

Python

def preprocess(X, y):
    num_samples = X.shape[0]
    num_rows = X.shape[1]
    num_cols = X.shape[2]

    # labels to binary
    y_rv = np.where(y == 1, 0, 1)

    #reshape
    y_rv = y_rv.reshape(num_samples, 1)

    # divide X by 255
    X_rv = X / 255
    # reshape
    X_rv = X_rv.reshape(num_samples, num_rows, num_cols, 1)

    return X_rv, y_rv

# load data
fmnist = keras.datasets.fashion_mnist
(train_data, train_labels), (test_data, test_labels) = fmnist.load_data()

# select two classes
# from train data
train_inds = np.where((train_labels == 1) | (train_labels == 8))
sel_X_train, sel_y_train = train_data[train_inds], train_labels[train_inds]
# from test data
test_inds = np.where((test_labels == 1) | (test_labels == 8))
sel_X_test, sel_y_test = test_data[test_inds], test_labels[test_inds]

# perform preprocessing
X_train, y_train = preprocess(sel_X_train, sel_y_train)
X_test, y_test = preprocess(sel_X_test, sel_y_test)

# shuffle
X_train, y_train = shuffle(X_train, y_train)
X_test, y_test = shuffle(X_test, y_test)




Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
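To see what preprocess does to the labels and array shapes without downloading anything, the sketch below applies the same steps to a tiny synthetic batch. The random pixel values and the hand-picked labels are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for Fashion-MNIST: five 28x28 grayscale images with
# labels from the two selected classes (1 = Trouser, 8 = Bag)
X = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
y = np.array([1, 8, 8, 1, 8])

# Same label step as preprocess(): Trouser -> 0, Bag -> 1, as a column vector
y_rv = np.where(y == 1, 0, 1).reshape(-1, 1)

# Same image step: scale pixels to [0, 1] and add a channel axis for Conv2D
X_rv = (X / 255).reshape(X.shape[0], X.shape[1], X.shape[2], 1)

print(y_rv.ravel())  # [0 1 1 0 1]
print(X_rv.shape)    # (5, 28, 28, 1)
```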

Now we write the get_model function, which constructs the model architecture and compiles it by defining the loss function, the optimizer, and the metrics. As before, we use binary_crossentropy as the loss function and Adam as the optimizer, and we track the accuracy metric.

Note that get_model uses the new output layer with two neurons and the softmax activation function.

Python

#%% build model
def get_model():
    # input layer
    inp = keras.layers.Input(shape=X_train.shape[1:])

    # first Conv2D and MaxPooling layer
    x = keras.layers.Conv2D(32, kernel_size=(3,3), activation='relu')(inp)
    x = keras.layers.MaxPooling2D((2,2))(x)

    # second Conv2D and MaxPooling layer
    x = keras.layers.Conv2D(64, kernel_size=(3,3), activation='relu')(x)
    x = keras.layers.MaxPooling2D((2,2))(x)

    # third Conv2D layer (no pooling after this one)
    x = keras.layers.Conv2D(128, kernel_size=(3,3), activation='relu')(x)

    # flatten
    x = keras.layers.Flatten()(x)

    # dense layer
    x = keras.layers.Dense(64, activation='relu')(x)

    # output layer
    out = keras.layers.Dense(2, activation='softmax')(x)

    # construct model
    model = keras.Model(inputs=inp, outputs=out)

    # compile model
    model.compile(loss='binary_crossentropy',
                  optimizer = keras.optimizers.Adam(),
                  metrics=['accuracy'])

    print(model.summary())

    return model

During training, the output layer produces two values for each sample, so for n samples its output has shape (n, 2). This output is compared with the corresponding values of y_train to calculate the loss. Therefore, y_train also needs to have shape (n, 2), where n is the number of samples in y_train.

Currently, y_train has the following shape:

Python

print(y_train.shape)




(12000, 1)

OneHot encoding of labels with OneHotEncoder

y_train has 12000 samples (rows) and 1 column, which holds the label of each sample. We need to convert it to shape (12000, 2). In the converted array, if, for example, the sample at index 10 belongs to class 1, then the value at row index 10 and column index 1 is 1, and the other value in that row is 0.

This is called OneHot encoding.

Scikit-learn provides the OneHotEncoder class to perform OneHot encoding. By default, OneHotEncoder returns a sparse matrix, which needs to be converted to a dense array with the toarray method of the sparse matrix.

Let's look at two small examples to see how to use it:

Python

# example to see OneHot encoding
from sklearn.preprocessing import OneHotEncoder

# two class array
array_1 = np.array([1,0, 0, 1, 0]).reshape(-1,1)

# three class array
array_2 = np.array([0, 1, 2, 2, 1, 1, 0, 0]).reshape(-1,1)

# create OneHotEncoder instance
encoder = OneHotEncoder()

# encode array_1
encoded_array_1 = encoder.fit_transform(array_1).toarray()
print(f'\narray_1:\n {array_1}')
print(f'\nOneHot array_1:\n {encoded_array_1}')

# encode array_2
encoded_array_2 = encoder.fit_transform(array_2).toarray()
print(f'\narray_2:\n {array_2}')
print(f'\nOneHot array_2:\n {encoded_array_2}')




array_1:
 [[1]
 [0]
 [0]
 [1]
 [0]]

OneHot array_1:
 [[0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]]

array_2:
 [[0]
 [1]
 [2]
 [2]
 [1]
 [1]
 [0]
 [0]]

OneHot array_2:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
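If you prefer not to depend on scikit-learn for this step, a NumPy-only alternative (not the approach used in the rest of this post) is to index rows of an identity matrix. The helper name one_hot is hypothetical, and argmax along each row reverses the encoding.

```python
import numpy as np

def one_hot(labels, num_classes):
    """OneHot encode an (n, 1) or (n,) integer label array by picking
    the matching rows of a num_classes x num_classes identity matrix."""
    return np.eye(num_classes)[np.asarray(labels).ravel()]

array_2 = np.array([0, 1, 2, 2, 1, 1, 0, 0]).reshape(-1, 1)
encoded = one_hot(array_2, num_classes=3)
print(encoded[:3])

# argmax over each row recovers the original integer labels
recovered = encoded.argmax(axis=1)
print(np.array_equal(recovered, array_2.ravel()))  # True
```

One difference worth knowing: OneHotEncoder learns its categories from the data passed to fit, so a class that never appears there gets no column, whereas this sketch always produces num_classes columns.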

Now that we have seen how to convert arrays to OneHot encoded form, we will apply this to our training labels, y_train.

Python

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
# OneHot transform
y_train_one_hot = encoder.fit_transform(y_train).toarray()

Let’s print the first five rows of the original training labels and the OneHot encoded labels.

Python

print('y_train: \n', y_train[:5], '\n\n')
print('y_train_one_hot: \n', y_train_one_hot[:5])




y_train: 
 [[0]
 [1]
 [0]
 [0]
 [1]] 


y_train_one_hot: 
 [[1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]
 [0. 1.]]

We are now ready to train our model. We call the get_model function to get the model object, and then call the fit method of the model with the training data and the OneHot encoded labels.

We also set the number of epochs, the batch_size, and the validation_split.
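The arithmetic behind these settings can be sketched in a few lines of plain Python, assuming the 12000 training samples shown by y_train.shape earlier. The Keras documentation states that validation_split takes the validation samples from the end of the arrays, before any shuffling, which is one reason the earlier shuffle of X_train and y_train is helpful. The exact rounding Keras uses internally may differ from this sketch, but here it gives the 300 steps per epoch seen in the training log.

```python
import math

n_train = 12000          # rows in X_train / y_train, as printed by y_train.shape
validation_split = 0.2
batch_size = 32

# The last 20% of the rows are held out for validation
n_val = int(n_train * validation_split)
n_fit = n_train - n_val

# Gradient steps per epoch; a final partial batch would count as one step
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)  # 9600 2400 300
```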

Python

# get model
model = get_model()
# train
history = model.fit(X_train, y_train_one_hot,
                        epochs = 50,
                        batch_size = 32,
                        validation_split=0.2)




Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer)        │ (None, 28, 28, 1)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d (Conv2D)                 │ (None, 26, 26, 32)     │           320 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d (MaxPooling2D)    │ (None, 13, 13, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_1 (Conv2D)               │ (None, 11, 11, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ max_pooling2d_1 (MaxPooling2D)  │ (None, 5, 5, 64)       │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2d_2 (Conv2D)               │ (None, 3, 3, 128)      │        73,856 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 1152)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 64)             │        73,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 2)              │           130 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 166,594 (650.76 KB)
 Trainable params: 166,594 (650.76 KB)
 Non-trainable params: 0 (0.00 B)
None
Epoch 1/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 33ms/step - accuracy: 0.9729 - loss: 0.1144 - val_accuracy: 0.9992 - val_loss: 0.0046
Epoch 2/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 0.9947 - loss: 0.0156 - val_accuracy: 0.9992 - val_loss: 0.0048
Epoch 3/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9957 - loss: 0.0159 - val_accuracy: 0.9996 - val_loss: 0.0031
Epoch 4/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 0.9975 - loss: 0.0102 - val_accuracy: 0.9979 - val_loss: 0.0064
Epoch 5/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 0.9961 - loss: 0.0085 - val_accuracy: 0.9992 - val_loss: 0.0027
Epoch 6/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9982 - loss: 0.0048 - val_accuracy: 0.9971 - val_loss: 0.0063
Epoch 7/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 31ms/step - accuracy: 0.9980 - loss: 0.0055 - val_accuracy: 0.9979 - val_loss: 0.0035
Epoch 8/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 28ms/step - accuracy: 0.9987 - loss: 0.0040 - val_accuracy: 0.9996 - val_loss: 0.0016
Epoch 9/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9992 - loss: 0.0030 - val_accuracy: 0.9975 - val_loss: 0.0041
Epoch 10/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9988 - loss: 0.0029 - val_accuracy: 0.9996 - val_loss: 0.0021
Epoch 11/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9990 - loss: 0.0032 - val_accuracy: 0.9992 - val_loss: 0.0025
Epoch 12/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 28ms/step - accuracy: 0.9997 - loss: 0.0010 - val_accuracy: 0.9992 - val_loss: 0.0021
Epoch 13/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 31ms/step - accuracy: 1.0000 - loss: 2.8163e-04 - val_accuracy: 0.9996 - val_loss: 0.0018
Epoch 14/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 0.9997 - loss: 0.0014 - val_accuracy: 0.9942 - val_loss: 0.0164
Epoch 15/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9964 - loss: 0.0099 - val_accuracy: 0.9996 - val_loss: 0.0012
Epoch 16/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9987 - loss: 0.0024 - val_accuracy: 0.9996 - val_loss: 0.0024
Epoch 17/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 30ms/step - accuracy: 0.9999 - loss: 2.0541e-04 - val_accuracy: 0.9996 - val_loss: 0.0014
Epoch 18/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 9.6221e-05 - val_accuracy: 1.0000 - val_loss: 9.4788e-04
Epoch 19/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.5097e-05 - val_accuracy: 0.9992 - val_loss: 8.9025e-04
Epoch 20/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 1.5368e-05 - val_accuracy: 1.0000 - val_loss: 7.1126e-04
Epoch 21/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 1.0000 - loss: 1.1375e-05 - val_accuracy: 1.0000 - val_loss: 6.5248e-04
Epoch 22/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 8.2308e-06 - val_accuracy: 1.0000 - val_loss: 6.1792e-04
Epoch 23/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 4.6975e-06 - val_accuracy: 1.0000 - val_loss: 5.9025e-04
Epoch 24/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 35ms/step - accuracy: 1.0000 - loss: 5.7720e-06 - val_accuracy: 1.0000 - val_loss: 5.4205e-04
Epoch 25/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 19s 30ms/step - accuracy: 1.0000 - loss: 4.5928e-06 - val_accuracy: 1.0000 - val_loss: 5.3501e-04
Epoch 26/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 4.0486e-06 - val_accuracy: 1.0000 - val_loss: 5.0117e-04
Epoch 27/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.5694e-06 - val_accuracy: 1.0000 - val_loss: 4.8131e-04
Epoch 28/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 31ms/step - accuracy: 1.0000 - loss: 2.2886e-06 - val_accuracy: 1.0000 - val_loss: 4.6500e-04
Epoch 29/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 30ms/step - accuracy: 1.0000 - loss: 1.5078e-06 - val_accuracy: 1.0000 - val_loss: 4.4656e-04
Epoch 30/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 1.4132e-06 - val_accuracy: 1.0000 - val_loss: 4.2736e-04
Epoch 31/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 9.8134e-07 - val_accuracy: 1.0000 - val_loss: 4.4054e-04
Epoch 32/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 31ms/step - accuracy: 1.0000 - loss: 6.5820e-07 - val_accuracy: 1.0000 - val_loss: 3.8726e-04
Epoch 33/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 30ms/step - accuracy: 1.0000 - loss: 6.5179e-07 - val_accuracy: 1.0000 - val_loss: 3.7921e-04
Epoch 34/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 8.0048e-07 - val_accuracy: 1.0000 - val_loss: 3.8659e-04
Epoch 35/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 6.4321e-07 - val_accuracy: 1.0000 - val_loss: 3.7006e-04
Epoch 36/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 11s 34ms/step - accuracy: 1.0000 - loss: 4.4455e-07 - val_accuracy: 1.0000 - val_loss: 3.4372e-04
Epoch 37/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 1.0000 - loss: 4.0127e-07 - val_accuracy: 1.0000 - val_loss: 3.3365e-04
Epoch 38/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.5293e-07 - val_accuracy: 1.0000 - val_loss: 3.0402e-04
Epoch 39/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.6755e-07 - val_accuracy: 1.0000 - val_loss: 3.0674e-04
Epoch 40/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.2200e-07 - val_accuracy: 1.0000 - val_loss: 2.9634e-04
Epoch 41/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 1.0000 - loss: 2.4601e-07 - val_accuracy: 1.0000 - val_loss: 2.9610e-04
Epoch 42/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 1.9997e-07 - val_accuracy: 1.0000 - val_loss: 2.7435e-04
Epoch 43/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 2.1821e-07 - val_accuracy: 1.0000 - val_loss: 2.7438e-04
Epoch 44/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 1.6378e-07 - val_accuracy: 1.0000 - val_loss: 2.5651e-04
Epoch 45/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 1.0000 - loss: 1.2592e-07 - val_accuracy: 1.0000 - val_loss: 2.3906e-04
Epoch 46/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 8.1358e-08 - val_accuracy: 1.0000 - val_loss: 2.2761e-04
Epoch 47/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 34ms/step - accuracy: 1.0000 - loss: 8.6346e-08 - val_accuracy: 1.0000 - val_loss: 2.2342e-04
Epoch 48/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 5.4994e-08 - val_accuracy: 1.0000 - val_loss: 1.9421e-04
Epoch 49/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 9s 29ms/step - accuracy: 1.0000 - loss: 5.0482e-08 - val_accuracy: 1.0000 - val_loss: 1.8151e-04
Epoch 50/50
300/300 ━━━━━━━━━━━━━━━━━━━━ 10s 33ms/step - accuracy: 1.0000 - loss: 3.9493e-08 - val_accuracy: 1.0000 - val_loss: 1.6079e-04

We can see that both the training loss and the validation loss decrease over the epochs, apart from a few fluctuations such as the jump in validation loss at epoch 14.

Prediction

We will now see how the model performs on the test data. The `predict` method of the `model` takes the test data and returns the output of the final layer for each sample.

Because the output layer has two softmax neurons, these outputs are not labels but predicted scores with the same layout as the OneHot encoded labels.

As with the training labels, each row of the predicted scores represents one sample. The first number in a row is the probability that the sample belongs to class 0, and the second number is the probability that the sample belongs to class 1.

Python

# predict
scores = model.predict(X_test)

# print first 5 scores
print(scores[:5])




63/63 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step
[[6.45815274e-33 9.99999940e-01]
 [0.00000000e+00 9.99999940e-01]
 [1.07938124e-10 9.99999940e-01]
 [0.00000000e+00 9.99999940e-01]
 [9.99999940e-01 0.00000000e+00]]

To evaluate the performance of the model, we convert the predicted scores into labels with the same shape as y_test.

Python

print(y_test.shape)




(2000, 1)

y_test has shape (2000, 1), i.e., 2000 rows and 1 column.

To get the predicted labels, we use NumPy's argmax function with axis=1. For each row of the predicted scores, it returns the column index of the maximum value in that row.

For example, if the maximum value in the 10th row is in the column with index 0 (the first column), then the 10th value of the argmax result is 0.

The result of argmax is one-dimensional, i.e., its shape is (n,), where n is the number of samples. We need to reshape it to the form (n, 1) to match y_test.

Python

# argmax
y_pred = np.argmax(scores, axis=1)

# reshape to form (n,1)
y_pred = y_pred.reshape(-1,1)
y_pred.shape




(2000, 1)
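As a small check of the argmax step, here is the same operation applied to the first five rows of scores, copied from the printed output above. It recovers the class with the highest probability in each row.

```python
import numpy as np

# First five rows of the predicted scores, copied from the printed output
scores_head = np.array([[6.45815274e-33, 9.99999940e-01],
                        [0.00000000e+00, 9.99999940e-01],
                        [1.07938124e-10, 9.99999940e-01],
                        [0.00000000e+00, 9.99999940e-01],
                        [9.99999940e-01, 0.00000000e+00]])

y_pred_head = np.argmax(scores_head, axis=1)  # column index of each row's maximum
y_pred_col = y_pred_head.reshape(-1, 1)       # shape (n, 1), like y_test

print(y_pred_head)       # [1 1 1 1 0]
print(y_pred_col.shape)  # (5, 1)
```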

Now we can calculate the confusion matrix, sensitivity and specificity of the predictions, using the same code we used previously.

Python

from sklearn import metrics

#confusion matrix
confmat = metrics.confusion_matrix(y_test, y_pred)
disp = metrics.ConfusionMatrixDisplay(confusion_matrix=confmat, display_labels=[0, 1])
disp.plot(cmap=plt.cm.Blues)
plt.title("Confusion Matrix")
plt.show()

# sensitivity and specificity
sens = confmat[1, 1]/(confmat[1, 1]+confmat[1, 0])
spec = confmat[0, 0]/(confmat[0, 0]+confmat[0, 1])
print('Sensitivity', sens)
print('Specificity: ', spec)




Sensitivity 0.998
Specificity:  0.998
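For readers who want to see the arithmetic without sklearn, the following NumPy sketch computes sensitivity and specificity directly from the four counts that the confusion matrix holds. The helper name and the toy labels are hypothetical. On the real test set, which has 1000 images per class, a sensitivity of 0.998 means that 998 of the 1000 bag images (class 1) were predicted correctly.

```python
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Binary sensitivity (recall of class 1) and specificity (recall of class 0).
    Matches the confmat-based formulas, where rows are true labels and
    columns are predicted labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = np.sum((y_true == 1) & (y_pred == 1))  # confmat[1, 1]
    fn = np.sum((y_true == 1) & (y_pred == 0))  # confmat[1, 0]
    tn = np.sum((y_true == 0) & (y_pred == 0))  # confmat[0, 0]
    fp = np.sum((y_true == 0) & (y_pred == 1))  # confmat[0, 1]
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels, not the real test set: four bags (1) and four trousers (0)
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens_toy, spec_toy = sensitivity_specificity(toy_true, toy_pred)
print(sens_toy, spec_toy)  # 0.75 0.75
```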

Discussion

The advantage of this method is that the same output format can be used for multi-class classification: we set the number of neurons in the output layer to the number of classes and change the loss function from binary_crossentropy to categorical_crossentropy.
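This also explains why training with binary_crossentropy worked in this two-class example. When the labels are OneHot encoded and each row of predictions sums to 1, as softmax guarantees, binary cross-entropy averaged over the two columns equals categorical cross-entropy; with more than two classes the two losses no longer coincide. The NumPy check below uses made-up probabilities and simplified hand-written losses (Keras additionally clips the probabilities to avoid taking the log of 0), so treat it as an illustration rather than the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob):
    """Element-wise binary cross-entropy, averaged over the last axis."""
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob), axis=-1)

def categorical_crossentropy(y_true, y_prob):
    """Categorical cross-entropy per sample: -sum(y * log(p))."""
    return -np.sum(y_true * np.log(y_prob), axis=-1)

# OneHot labels and hypothetical softmax outputs (rows sum to 1) for three samples
y_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])

bce = binary_crossentropy(y_true, y_prob)
cce = categorical_crossentropy(y_true, y_prob)
print(np.allclose(bce, cce))  # True
```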

Because we use argmax to get the predicted labels, the probability cutoffs we used in the previous model, with the sigmoid activation function in the output layer, are not applicable here.

This can be useful in cases where we simply want the class with the highest predicted probability, a clear winner, in a binary classification problem.
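For two classes, argmax amounts to a fixed cutoff of 0.5 on the class-1 probability. The small NumPy check below, with made-up scores, illustrates this.

```python
import numpy as np

# Hypothetical two-column softmax scores (each row sums to 1)
example_scores = np.array([[0.90, 0.10],
                           [0.30, 0.70],
                           [0.55, 0.45],
                           [0.02, 0.98]])

labels_argmax = np.argmax(example_scores, axis=1)
# argmax picks class 1 exactly when its probability exceeds 0.5
# (a tie at exactly 0.5 goes to class 0 in both cases)
labels_cutoff = (example_scores[:, 1] > 0.5).astype(int)

print(labels_argmax)                                # [0 1 0 1]
print(np.array_equal(labels_argmax, labels_cutoff))  # True
```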

A single sigmoid neuron in the output layer, as in the previous model, can be used only for binary classification. Its advantage is that we can fine-tune the probability cutoff to reach the sensitivity and specificity our objective requires.