This is a lightweight, professional demo focused on clarity rather than flash. It loads your trained weights and performs inference entirely in the browser. No external ML libraries, no tracking, and no unnecessary visual noise.
The model achieves approximately 90% accuracy on the MNIST test dataset. Note that real-world performance may vary due to factors such as drawing style and brush size settings.
import numpy as np
import pandas as pd

data = pd.read_csv("mnist_train_small.csv")
data = np.array(data)
m = data.shape[0]  # number of examples
np.random.shuffle(data)
train_data = data[0:int(0.8*m), :]  # 80% for training
val_data = data[int(0.8*m):m, :]  # 20% for validation
X_train = train_data[:, 1:].T / 255.0  # normalize pixel values to [0, 1]
Y_train = train_data[:, 0]  # extract labels (first column)
The code above loads the MNIST dataset, splits it into training and validation sets, and normalizes the pixel values to be between 0 and 1.
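As a sanity check, the split and normalization can be exercised on a small synthetic stand-in for the CSV (the random data below is hypothetical, not real MNIST — only the shapes match):

```python
import numpy as np

# Hypothetical stand-in: 100 rows, label in column 0, 784 pixels in [0, 255].
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=(100, 1))
pixels = rng.integers(0, 256, size=(100, 784))
data = np.hstack([labels, pixels]).astype(float)

m = data.shape[0]
rng.shuffle(data)  # shuffle rows in place before splitting
train_data = data[:int(0.8 * m)]
val_data = data[int(0.8 * m):]

X_train = train_data[:, 1:].T / 255.0  # features: one column per example
Y_train = train_data[:, 0]

print(X_train.shape)  # (784, 80)
print(X_train.min() >= 0.0 and X_train.max() <= 1.0)  # True
```

Note the transpose: examples end up as columns, which is what the matrix shapes in the network below expect.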
def initialize_parameters():
    W1 = np.random.randn(10, 784) * 0.01  # input -> hidden weights
    b1 = np.zeros((10, 1))                # hidden biases
    W2 = np.random.randn(10, 10) * 0.01   # hidden -> output weights
    b2 = np.zeros((10, 1))                # output biases
    return W1, b1, W2, b2
This initializes a 2-layer neural network with 784 input neurons (28x28 image), 10 hidden neurons, and 10 output neurons (one for each digit 0-9).
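A quick shape check confirms the architecture and its total parameter count (784×10 + 10 for the hidden layer, 10×10 + 10 for the output layer):

```python
import numpy as np

def initialize_parameters():
    W1 = np.random.randn(10, 784) * 0.01
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * 0.01
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2

W1, b1, W2, b2 = initialize_parameters()
total = W1.size + b1.size + W2.size + b2.size
print(total)  # 7960
```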
def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    # Subtract the column-wise max before exponentiating to avoid overflow
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0)
ReLU (Rectified Linear Unit) is the hidden-layer activation function, while softmax in the output layer turns the raw scores into a probability distribution over the ten digit classes.
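Both activations are easy to verify on a single column of scores (softmax written in the numerically stable max-subtracted form, which gives the same result):

```python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0)

Z = np.array([[-1.0], [0.0], [2.0]])
print(ReLU(Z).ravel())  # [0. 0. 2.]  -- negatives clipped to zero
probs = softmax(Z)
print(np.isclose(probs.sum(), 1.0))  # True -- a valid probability distribution
```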
def forward_propagation(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
Forward propagation computes the network's predictions by applying weights and activation functions in sequence.
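Tracing the shapes through a hypothetical batch of 32 inputs makes the matrix dimensions concrete: each column of `A2` is one example's probability distribution.

```python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # stable softmax
    return expZ / np.sum(expZ, axis=0)

def forward_propagation(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1  # (10, 784) @ (784, batch) -> (10, batch)
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2  # (10, 10) @ (10, batch) -> (10, batch)
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

X = np.random.rand(784, 32)  # 32 hypothetical images as columns
W1 = np.random.randn(10, 784) * 0.01; b1 = np.zeros((10, 1))
W2 = np.random.randn(10, 10) * 0.01; b2 = np.zeros((10, 1))
Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X)
print(A2.shape)                          # (10, 32)
print(np.allclose(A2.sum(axis=0), 1.0))  # True -- each column sums to 1
```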
def one_hot_converter(Y):
    one_hot_Y = np.zeros((10, Y.size))
    one_hot_Y[Y.astype(int), np.arange(Y.size)] = 1
    return one_hot_Y

def ReLU_derivative(Z):
    return (Z > 0).astype(float)

def backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X, Y):
    m = Y.size
    one_hot_Y = one_hot_converter(Y)
    dZ2 = A2 - one_hot_Y  # gradient of cross-entropy w.r.t. output scores
    dW2 = 1/m * np.dot(dZ2, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * ReLU_derivative(Z1)
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
Backward propagation computes gradients for all parameters using the chain rule of calculus, enabling the network to learn from its mistakes.
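A standard way to convince yourself the analytic gradients are right is a finite-difference check: perturb one weight by ±ε and compare the resulting loss change against the backprop gradient. The sketch below does this for one entry of `dW2` on a tiny hypothetical 4-input, 3-class network (only the output weights are perturbed, so the ReLU kink is never crossed):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))   # 4 features, 6 examples (hypothetical data)
Y = rng.integers(0, 3, size=6)    # 3 classes
m = X.shape[1]
onehot = np.zeros((3, m)); onehot[Y, np.arange(m)] = 1
W1 = rng.standard_normal((3, 4)) * 0.1; b1 = np.zeros((3, 1))
W2 = rng.standard_normal((3, 3)) * 0.1; b2 = np.zeros((3, 1))

def loss_and_cache(W2):
    """Forward pass returning hidden activations, outputs, and mean cross-entropy."""
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    expZ = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    A2 = expZ / expZ.sum(axis=0)
    loss = -np.mean(np.log(A2[Y, np.arange(m)]))
    return A1, A2, loss

A1, A2, _ = loss_and_cache(W2)
dW2 = (A2 - onehot) @ A1.T / m    # analytic gradient, same form as backprop above

eps, i, j = 1e-5, 1, 2
Wp, Wm = W2.copy(), W2.copy()
Wp[i, j] += eps; Wm[i, j] -= eps
numeric = (loss_and_cache(Wp)[2] - loss_and_cache(Wm)[2]) / (2 * eps)
print(abs(dW2[i, j] - numeric) < 1e-6)  # True -- analytic matches numeric
```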
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = initialize_parameters()
    for i in range(1, iterations + 1):
        Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X, Y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
    return W1, b1, W2, b2
The training process uses full-batch gradient descent to iteratively update the network's parameters, reducing the prediction error over many iterations.
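The whole loop can be exercised end to end on a toy problem. The sketch below (hypothetical synthetic data, not MNIST) trains a small version of the same architecture to predict which of three input coordinates is largest, using exactly the update rule above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 300))      # 3 features, 300 examples
Y = np.argmax(X, axis=0)               # label = index of the largest coordinate
m = X.shape[1]
onehot = np.zeros((3, m)); onehot[Y, np.arange(m)] = 1

W1 = rng.standard_normal((8, 3)) * 0.1; b1 = np.zeros((8, 1))
W2 = rng.standard_normal((3, 8)) * 0.1; b2 = np.zeros((3, 1))

alpha = 0.3
for _ in range(1000):
    # Forward pass
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    expZ = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    A2 = expZ / expZ.sum(axis=0)
    # Backward pass
    dZ2 = A2 - onehot
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Parameter update: theta := theta - alpha * dtheta
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

acc = np.mean(np.argmax(A2, axis=0) == Y)
print(acc > 0.5)  # expect well above the 1/3 chance baseline
```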
def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.sum(predictions == Y) / Y.size
These functions help evaluate the model's performance by converting output probabilities into predictions and calculating accuracy.
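For example, on a hand-made batch of four softmax-style output columns, the predicted class is the row with the largest probability in each column:

```python
import numpy as np

def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.sum(predictions == Y) / Y.size

# Toy outputs for 4 examples over 3 classes (one column per example).
A2 = np.array([[0.7, 0.1, 0.2, 0.1],
               [0.2, 0.8, 0.3, 0.2],
               [0.1, 0.1, 0.5, 0.7]])
Y = np.array([0, 1, 2, 0])
preds = get_predictions(A2)
print(preds)                   # [0 1 2 2]
print(get_accuracy(preds, Y))  # 0.75 -- the last example is misclassified
```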