This is a lightweight, professional demo focused on clarity rather than flash. It loads your trained weights and performs inference entirely in the browser. No external ML libraries, no tracking, and no unnecessary visual noise.
The model achieves approximately 90% accuracy on the MNIST test dataset. Note that real-world performance may vary due to factors such as drawing style and brush size settings.
import numpy as np
import pandas as pd

data = pd.read_csv("mnist_train_small.csv")
data = np.array(data)
m = data.shape[0]  # number of examples
np.random.shuffle(data)
train_data = data[0:int(0.8*m), :]  # 80% for training
val_data = data[int(0.8*m):m, :]  # 20% for validation
X_train = train_data[:, 1:].T / 255.0  # normalize pixel values to [0, 1]
Y_train = train_data[:, 0]  # extract labels (first column)
The code above loads the MNIST dataset, splits it into training and validation sets, and normalizes the pixel values to be between 0 and 1.
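As a sanity check, the split and normalization can be exercised on a small synthetic stand-in for the CSV (the random data below is hypothetical, not real MNIST — only the shapes match):

```python
import numpy as np

# Hypothetical stand-in: 100 rows, label in column 0, 784 pixels in [0, 255].
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=(100, 1))
pixels = rng.integers(0, 256, size=(100, 784))
data = np.hstack([labels, pixels]).astype(float)

m = data.shape[0]
rng.shuffle(data)  # shuffle rows in place before splitting
train_data = data[:int(0.8 * m)]
val_data = data[int(0.8 * m):]

X_train = train_data[:, 1:].T / 255.0  # features: one column per example
Y_train = train_data[:, 0]

print(X_train.shape)  # (784, 80)
print(X_train.min() >= 0.0 and X_train.max() <= 1.0)  # True
```

Note the transpose: examples end up as columns, which is what the matrix shapes in the network below expect.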
def initialize_parameters():
    W1 = np.random.randn(10, 784) * 0.01  # input -> hidden weights
    b1 = np.zeros((10, 1))                # hidden biases
    W2 = np.random.randn(10, 10) * 0.01   # hidden -> output weights
    b2 = np.zeros((10, 1))                # output biases
    return W1, b1, W2, b2
This initializes a 2-layer neural network with 784 input neurons (28x28 image), 10 hidden neurons, and 10 output neurons (one for each digit 0-9).
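A quick shape check confirms the architecture and its total parameter count (784×10 + 10 for the hidden layer, 10×10 + 10 for the output layer):

```python
import numpy as np

def initialize_parameters():
    W1 = np.random.randn(10, 784) * 0.01
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * 0.01
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2

W1, b1, W2, b2 = initialize_parameters()
total = W1.size + b1.size + W2.size + b2.size
print(total)  # 7960
```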
def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    # Subtract the column-wise max before exponentiating to avoid overflow
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0)
ReLU (Rectified Linear Unit) is the hidden-layer activation function, while softmax in the output layer turns the raw scores into a probability distribution over the ten digit classes.
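Both activations are easy to verify on a single column of scores (softmax written in the numerically stable max-subtracted form, which gives the same result):

```python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0)

Z = np.array([[-1.0], [0.0], [2.0]])
print(ReLU(Z).ravel())  # [0. 0. 2.]  -- negatives clipped to zero
probs = softmax(Z)
print(np.isclose(probs.sum(), 1.0))  # True -- a valid probability distribution
```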
def forward_propagation(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
Forward propagation computes the network's predictions by applying weights and activation functions in sequence.
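Tracing the shapes through a hypothetical batch of 32 inputs makes the matrix dimensions concrete: each column of `A2` is one example's probability distribution.

```python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # stable softmax
    return expZ / np.sum(expZ, axis=0)

def forward_propagation(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1  # (10, 784) @ (784, batch) -> (10, batch)
    A1 = ReLU(Z1)
    Z2 = np.dot(W2, A1) + b2  # (10, 10) @ (10, batch) -> (10, batch)
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

X = np.random.rand(784, 32)  # 32 hypothetical images as columns
W1 = np.random.randn(10, 784) * 0.01; b1 = np.zeros((10, 1))
W2 = np.random.randn(10, 10) * 0.01; b2 = np.zeros((10, 1))
Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X)
print(A2.shape)                          # (10, 32)
print(np.allclose(A2.sum(axis=0), 1.0))  # True -- each column sums to 1
```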
def one_hot_converter(Y):
    one_hot_Y = np.zeros((10, Y.size))
    one_hot_Y[Y.astype(int), np.arange(Y.size)] = 1
    return one_hot_Y

def ReLU_derivative(Z):
    return (Z > 0).astype(float)

def backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X, Y):
    m = Y.size
    one_hot_Y = one_hot_converter(Y)
    dZ2 = A2 - one_hot_Y  # gradient of cross-entropy w.r.t. output scores
    dW2 = 1/m * np.dot(dZ2, A1.T)
    db2 = 1/m * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * ReLU_derivative(Z1)
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
Backward propagation computes gradients for all parameters using the chain rule of calculus, enabling the network to learn from its mistakes.
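A standard way to convince yourself the analytic gradients are right is a finite-difference check: perturb one weight by ±ε and compare the resulting loss change against the backprop gradient. The sketch below does this for one entry of `dW2` on a tiny hypothetical 4-input, 3-class network (only the output weights are perturbed, so the ReLU kink is never crossed):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))   # 4 features, 6 examples (hypothetical data)
Y = rng.integers(0, 3, size=6)    # 3 classes
m = X.shape[1]
onehot = np.zeros((3, m)); onehot[Y, np.arange(m)] = 1
W1 = rng.standard_normal((3, 4)) * 0.1; b1 = np.zeros((3, 1))
W2 = rng.standard_normal((3, 3)) * 0.1; b2 = np.zeros((3, 1))

def loss_and_cache(W2):
    """Forward pass returning hidden activations, outputs, and mean cross-entropy."""
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    expZ = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    A2 = expZ / expZ.sum(axis=0)
    loss = -np.mean(np.log(A2[Y, np.arange(m)]))
    return A1, A2, loss

A1, A2, _ = loss_and_cache(W2)
dW2 = (A2 - onehot) @ A1.T / m    # analytic gradient, same form as backprop above

eps, i, j = 1e-5, 1, 2
Wp, Wm = W2.copy(), W2.copy()
Wp[i, j] += eps; Wm[i, j] -= eps
numeric = (loss_and_cache(Wp)[2] - loss_and_cache(Wm)[2]) / (2 * eps)
print(abs(dW2[i, j] - numeric) < 1e-6)  # True -- analytic matches numeric
```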
def update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

def gradient_descent(X, Y, alpha, iterations):
    W1, b1, W2, b2 = initialize_parameters()
    for i in range(1, iterations + 1):
        Z1, A1, Z2, A2 = forward_propagation(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = backward_propagation(W1, b1, W2, b2, Z1, A1, Z2, A2, X, Y)
        W1, b1, W2, b2 = update_parameters(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
    return W1, b1, W2, b2
The training process uses full-batch gradient descent to iteratively update the network's parameters, reducing the prediction error over many iterations.
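The whole loop can be exercised end to end on a toy problem. The sketch below (hypothetical synthetic data, not MNIST) trains a small version of the same architecture to predict which of three input coordinates is largest, using exactly the update rule above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 300))      # 3 features, 300 examples
Y = np.argmax(X, axis=0)               # label = index of the largest coordinate
m = X.shape[1]
onehot = np.zeros((3, m)); onehot[Y, np.arange(m)] = 1

W1 = rng.standard_normal((8, 3)) * 0.1; b1 = np.zeros((8, 1))
W2 = rng.standard_normal((3, 8)) * 0.1; b2 = np.zeros((3, 1))

alpha = 0.3
for _ in range(1000):
    # Forward pass
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    Z2 = W2 @ A1 + b2
    expZ = np.exp(Z2 - Z2.max(axis=0, keepdims=True))
    A2 = expZ / expZ.sum(axis=0)
    # Backward pass
    dZ2 = A2 - onehot
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (Z1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Parameter update: theta := theta - alpha * dtheta
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

acc = np.mean(np.argmax(A2, axis=0) == Y)
print(acc > 0.5)  # expect well above the 1/3 chance baseline
```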
def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.sum(predictions == Y) / Y.size
These functions help evaluate the model's performance by converting output probabilities into predictions and calculating accuracy.
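For example, on a hand-made batch of four softmax-style output columns, the predicted class is the row with the largest probability in each column:

```python
import numpy as np

def get_predictions(A2):
    return np.argmax(A2, axis=0)

def get_accuracy(predictions, Y):
    return np.sum(predictions == Y) / Y.size

# Toy outputs for 4 examples over 3 classes (one column per example).
A2 = np.array([[0.7, 0.1, 0.2, 0.1],
               [0.2, 0.8, 0.3, 0.2],
               [0.1, 0.1, 0.5, 0.7]])
Y = np.array([0, 1, 2, 0])
preds = get_predictions(A2)
print(preds)                   # [0 1 2 2]
print(get_accuracy(preds, Y))  # 0.75 -- the last example is misclassified
```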