Matrices: The Grid That Holds Your Entire Dataset

Open any spreadsheet you have ever worked with.

Rows of data. Columns of features. Every cell a number.

That spreadsheet is a matrix.

Not metaphorically. Not approximately. When you load that spreadsheet into Python for machine learning, it becomes a two-dimensional NumPy array, number for number, row for row. The thing you have been looking at in Excel your whole life is the exact data structure that powers AI.


From Vector to Matrix

A vector is one row of numbers.

import numpy as np

student = np.array([85, 92, 78, 88])   # four exam scores

A matrix is multiple rows stacked together.

students = np.array([
    [85, 92, 78, 88],   # student 1
    [91, 76, 83, 95],   # student 2
    [67, 88, 72, 79],   # student 3
    [94, 85, 90, 91]    # student 4
])

print(students)
print(students.shape)

Output:

[[85 92 78 88]
 [91 76 83 95]
 [67 88 72 79]
 [94 85 90 91]]

(4, 4)

Shape (4, 4) means 4 rows and 4 columns. Four students. Four exams. Every data point in its place.

This is a dataset. This is how your machine learning model sees your data before it ever touches an algorithm.


Shape Is Everything

In AI code, the first thing you check about any matrix is its shape.

data = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(data.shape)    # (2, 3)
print(data.ndim)     # 2 dimensions
print(data.size)     # 6 total elements

Output:

(2, 3)
2
6

The convention is always (rows, columns). A shape of (1000, 28) means 1000 rows of data with 28 features each. In deep learning you will check .shape constantly; shape mismatches cause most of the errors you will debug. Get comfortable reading them.


Accessing Elements

Getting a single element needs both row and column index.

students = np.array([
    [85, 92, 78, 88],
    [91, 76, 83, 95],
    [67, 88, 72, 79],
    [94, 85, 90, 91]
])

print(students[0, 0])    # first row, first column: 85
print(students[2, 1])    # third row, second column: 88
print(students[1, :])    # entire second row
print(students[:, 2])    # entire third column

Output:

85
88
[91 76 83 95]
[78 83 72 90]

students[1, :] means: row 1, all columns. The : means everything.

students[:, 2] means: all rows, column 2. That is every student's score on the third exam.

This slicing syntax is how you pull out specific features or specific records from your entire dataset. You will do this hundreds of times.


Creating Matrices You Actually Need

Three matrix creation patterns come up constantly in AI.

zeros = np.zeros((3, 4))
print(zeros)

Output:

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Start with a matrix of zeros and fill it in. Common for preallocating result arrays and for initializing bias vectors in neural networks.

identity = np.eye(4)
print(identity)

Output:

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Ones on the diagonal, zeros everywhere else. The identity matrix. Multiplying any matrix by the identity gives you the original matrix back. The matrix equivalent of multiplying by 1. Comes up constantly in linear algebra operations.
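You can check that claim in two lines (@ is NumPy's matrix multiplication operator, covered more later):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# Multiplying by the identity returns the original matrix
result = a @ np.eye(2)
print(result)
print(np.array_equal(result, a))   # True
```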

random = np.random.randn(3, 4)
print(random)

Output:

[[ 0.4832 -1.2341  0.7823  0.1029]
 [-0.5541  1.3021 -0.2109  0.8834]
 [ 0.2341 -0.9821  0.4521 -1.1023]]

Random values from a normal distribution. Used to initialize neural network weights. Why random? Because if all weights started at the same value, every neuron in a layer would learn the same thing. Random initialization breaks that symmetry.


Transpose: Flipping Rows and Columns

Transposing a matrix turns its rows into columns and its columns into rows.

a = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print("Original shape:", a.shape)
print(a)

print("\nTransposed shape:", a.T.shape)
print(a.T)

Output:

Original shape: (2, 3)
[[1 2 3]
 [4 5 6]]

Transposed shape: (3, 2)
[[1 4]
 [2 5]
 [3 6]]

A (2, 3) matrix becomes a (3, 2) matrix. Row 1 [1, 2, 3] became column 1. Row 2 [4, 5, 6] became column 2.

Transpose shows up constantly in neural network math. When shapes do not align for matrix multiplication, transposing one of them often fixes it. You will develop an instinct for when to use .T very quickly.
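Here is a sketch of that situation, with made-up `weights` and `inputs` arrays just for illustration. Matrix multiplication requires the inner dimensions to match, and a transpose makes them line up:

```python
import numpy as np

weights = np.random.randn(3, 4)   # shape (3, 4)
inputs  = np.random.randn(3, 4)   # shape (3, 4)

# weights @ inputs would fail: inner dimensions (4 and 3) do not match.
# Transposing one operand aligns them:
out = weights @ inputs.T          # (3, 4) @ (4, 3) -> (3, 3)
print(out.shape)                  # (3, 3)
```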


Math Operations on Matrices

Element-wise operations work exactly like vectors.

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a + b)
print(a * b)
print(a * 2)

Output:

[[ 6  8]
 [10 12]]

[[ 5 12]
 [21 32]]

[[2 4]
 [6 8]]

Each element matched with its counterpart. Added, multiplied, scaled. No loops. NumPy applies the operation to every element at once, whether the matrix holds four values or ten million.


Reshaping: Same Data, Different Shape

One of the most common operations in deep learning is reshaping. Same numbers, different arrangement.

flat = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

matrix = flat.reshape(3, 4)
print(matrix)
print(matrix.shape)

Output:

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

(3, 4)

12 numbers arranged as a single row became 3 rows of 4. Total elements unchanged. Shape changed.

Use -1 when you want NumPy to figure out one dimension automatically.

matrix = flat.reshape(-1, 4)    # "however many rows needed, 4 columns"
print(matrix.shape)             # (3, 4)

matrix = flat.reshape(2, -1)    # "2 rows, however many columns needed"
print(matrix.shape)             # (2, 6)

Images use this constantly. A 28x28 pixel image stored as a flat vector of 784 numbers needs to be reshaped to (28, 28) for convolutional networks or kept flat for fully connected ones. Reshape is the tool.
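For a concrete round trip, here is a stand-in "image" (np.arange instead of real pixel data) going from flat to square and back:

```python
import numpy as np

flat_image = np.arange(784)            # 784 pixel values in a flat vector
square = flat_image.reshape(28, 28)    # 2D layout for a convolutional network
back = square.reshape(-1)              # flat again for a fully connected layer

print(square.shape)                    # (28, 28)
print(back.shape)                      # (784,)
print(np.array_equal(flat_image, back))  # True: same data the whole way
```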


Aggregations Across Rows and Columns

Getting summaries across different axes.

scores = np.array([
    [85, 92, 78],    # student 1
    [91, 76, 83],    # student 2
    [67, 88, 72]     # student 3
])

print(np.mean(scores))          # mean of all values
print(np.mean(scores, axis=0))  # mean of each column (each exam)
print(np.mean(scores, axis=1))  # mean of each row (each student)

Output:

81.33333333333333
[81.         85.33333333 77.66666667]
[85.         83.33333333 75.66666667]

axis=0 collapses the rows, giving you one value per column.
axis=1 collapses the columns, giving you one value per row.

This axis concept trips people up. Think of it as: axis=0 moves down the rows. axis=1 moves across the columns. In neural networks you will use axis operations to normalize data, calculate losses, and aggregate predictions.
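As one example of that, here is a sketch of standardizing each column (each exam) with axis=0; this is the basic idea behind feature normalization, not a full preprocessing pipeline:

```python
import numpy as np

scores = np.array([
    [85, 92, 78],
    [91, 76, 83],
    [67, 88, 72]
])

# Standardize each column: subtract its mean, divide by its std
col_means = scores.mean(axis=0)              # one mean per exam
col_stds = scores.std(axis=0)                # one std per exam
standardized = (scores - col_means) / col_stds

print(standardized.mean(axis=0))             # each column now has mean ~0
```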


A Real Dataset as a Matrix

Putting it all together with something concrete.

import numpy as np

data = np.array([
    [23, 50000, 1, 0],
    [35, 80000, 0, 1],
    [28, 62000, 1, 1],
    [45, 95000, 0, 1],
    [31, 71000, 1, 0]
])

print(f"Dataset shape: {data.shape}")
print(f"Number of samples: {data.shape[0]}")
print(f"Number of features: {data.shape[1]}")

ages   = data[:, 0]
income = data[:, 1]

print(f"\nAge range: {ages.min()} to {ages.max()}")
print(f"Average income: {income.mean():.0f}")
print(f"\nFirst 3 records:\n{data[:3]}")

Output:

Dataset shape: (5, 4)
Number of samples: 5
Number of features: 4

Age range: 23 to 45
Average income: 71600

First 3 records:
[[23 50000      1      0]
 [35 80000      0      1]
 [28 62000      1      1]]

Five people. Four features each. Select any column as a feature vector. Select any row as a data point. Compute statistics across the whole thing. This is exactly what happens when you load a CSV and pass it to a machine learning model.
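To make that concrete, here is a sketch that loads CSV text straight into such a matrix. It uses an in-memory string so it runs anywhere; a real file path passed to np.loadtxt works the same way:

```python
import io
import numpy as np

# Three of the records above, as CSV text
csv_text = "23,50000,1,0\n35,80000,0,1\n28,62000,1,1"

data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(data.shape)     # (3, 4): three samples, four features
print(data[:, 0])     # the age column
```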


Try This

Create matrices_practice.py.

Build a matrix representing a small image. Create a (5, 5) NumPy array filled with values between 0 and 255 representing pixel brightness. Use np.random.randint(0, 256, size=(5, 5)).

Then do all of this:

Print the shape and total number of pixels.

Get the pixel value at row 2, column 3.

Get the entire middle row (row 2).

Calculate the average brightness of the whole image.

Calculate the average brightness of each row.

Flatten it to a 1D vector using .reshape(-1) and print its shape.

Transpose it and print the result.

Normalize the pixel values to be between 0 and 1 by dividing the entire matrix by 255. This is called normalization and you will do it to every image before feeding it to a neural network.


What's Next

You know how to organize data into matrices. Now the question is what do you actually do with them.

The next post is the dot product. A specific way of multiplying vectors together that produces one number representing similarity. It is the single most important operation in all of deep learning and you are two posts away from fully understanding it.

Source: dev.to
