Introduction to Linear Regression Using TensorFlow

A beginner-friendly guide to building linear regression with TensorFlow and Keras — from the math to a working model that learns y = 2x - 1 in minutes.

August 11, 2021 · 3 min read · By Kshitiz Regmi

Linear regression is one of the foundations of machine learning. Before exploring complex neural architectures, it is essential to understand how to fit a linear relationship with gradient descent, and TensorFlow's Keras API makes that straightforward.

The Math

The basic linear regression model:

$y_i = m \cdot x_i + c$

Where:

$x_i$ = input (independent variable)
$y_i$ = output (dependent variable)
$m$ = slope (weight)
$c$ = intercept (bias)

The goal: find $m$ and $c$ that minimize the error between predictions and true values — the mean squared error:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$
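To make the formula concrete, here is a minimal NumPy sketch of the MSE computation. The helper name `mse` and the three toy points are illustrative assumptions, not part of the tutorial's model code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true values and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy data (illustrative) lying exactly on the line y = 2x - 1
x = np.array([0.0, 1.0, 2.0])
y = np.array([-1.0, 1.0, 3.0])

# A poor guess (m=1, c=0) versus the true parameters (m=2, c=-1)
loss_bad = mse(y, 1.0 * x + 0.0)
loss_good = mse(y, 2.0 * x - 1.0)
print(loss_bad, loss_good)
```

The poor guess leaves residuals of -1, 0 and 1, so its MSE is 2/3, while the true parameters give an MSE of exactly 0. Training amounts to moving $m$ and $c$ from the first situation toward the second.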

Why TensorFlow for Linear Regression?

A linear regressor is simply a 1-neuron neural network with no activation function. Using TensorFlow:

The training loop, backpropagation, and weight updates are handled automatically
The exact same code structure scales to deep networks
You learn the TensorFlow/Keras API on the simplest possible problem

Dataset

We'll use a tiny dataset with a hidden pattern: $y = 2x - 1$

import numpy as np

X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

# Verify the pattern
for xi, yi in zip(X, y):
    print(f"x={xi}, y={yi}, 2x-1={2*xi - 1}")
# x=-1.0, y=-3.0, 2x-1=-3.0 ✓
# x=0.0,  y=-1.0, 2x-1=-1.0 ✓
# ...

Building the Model

A Dense layer with 1 unit and no activation is mathematically identical to linear regression:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(units=1, input_shape=[1])
])

model.compile(
    optimizer='sgd',            # Stochastic Gradient Descent
    loss='mean_squared_error'
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)      Output Shape     Param #
=================================================================
 dense (Dense)     (None, 1)        2
=================================================================
Total params: 2
Trainable params: 2 (1 weight = slope m, 1 bias = intercept c)

Just 2 parameters: the weight (slope $m$ ) and the bias (intercept $c$ ).
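The parameter count and the `(None, 1)` output shape follow from what a Dense layer without activation computes: a matrix product plus a bias. The NumPy sketch below mimics that computation by hand. The function `dense_forward` and the hand-picked parameter values are hypothetical stand-ins for illustration and are not Keras internals.

```python
import numpy as np

def dense_forward(x, kernel, bias):
    """What a Dense layer with no activation computes: x @ kernel + bias."""
    return x @ kernel + bias

# Hypothetical parameter values, set by hand to m = 2 and c = -1
kernel = np.array([[2.0]])   # shape (input_dim=1, units=1): the slope m
bias = np.array([-1.0])      # shape (units=1,): the intercept c

# A batch of 6 inputs with 1 feature each, shape (6, 1)
x_batch = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y_hat = dense_forward(x_batch, kernel, bias)

print(y_hat.shape)              # (6, 1)
print(kernel.size + bias.size)  # 2
```

The leading `None` in the Keras summary stands for the batch size, which is 6 in this sketch.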

Training

history = model.fit(X, y, epochs=500, verbose=0)
print(f"Final MSE loss: {history.history['loss'][-1]:.8f}")
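# Prints a small value, typically on the order of 1e-5 with the default SGD settings;
# the exact number varies with the random weight initialization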

During training, SGD iteratively adjusts $m$ and $c$ to minimize MSE:

Forward pass: compute $\hat{y} = m \cdot x + c$
Compute loss: $\text{MSE} = \text{mean}((y - \hat{y})^2)$
Backward pass: compute gradients $\frac{\partial L}{\partial m}$ , $\frac{\partial L}{\partial c}$
Update: $m \leftarrow m - \text{lr} \cdot \frac{\partial L}{\partial m}$ and $c \leftarrow c - \text{lr} \cdot \frac{\partial L}{\partial c}$ , then repeat
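The four steps above can be sketched in plain NumPy, without TensorFlow. This is a simplified illustration under stated assumptions, not what Keras runs internally: it starts from $m = 0$, $c = 0$ (Keras draws the initial weight at random), uses a learning rate of 0.01 (which matches the Keras SGD default), and uses all six points in every step. The gradients come from differentiating the MSE: $\frac{\partial L}{\partial m} = -2 \cdot \text{mean}(x (y - \hat{y}))$ and $\frac{\partial L}{\partial c} = -2 \cdot \text{mean}(y - \hat{y})$.

```python
import numpy as np

X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

m, c = 0.0, 0.0   # assumed starting point; Keras initializes the weight randomly
lr = 0.01         # assumed learning rate
losses = []

for epoch in range(500):
    y_hat = m * X + c                      # 1. forward pass
    error = y - y_hat
    losses.append(np.mean(error ** 2))     # 2. MSE loss
    grad_m = -2.0 * np.mean(X * error)     # 3. gradient dL/dm
    grad_c = -2.0 * np.mean(error)         #    gradient dL/dc
    m -= lr * grad_m                       # 4. update both parameters
    c -= lr * grad_c

print(f"m = {m:.3f}, c = {c:.3f}, final loss = {losses[-1]:.2e}")
```

After 500 steps the parameters land close to, but not exactly on, $m = 2$ and $c = -1$. Since the dataset has only 6 points and the default Keras batch size is 32, each Keras epoch here is also a single full-batch step, so this loop is a fair approximation of what `model.fit` does on this problem.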

Making Predictions

# Should predict ≈ 2*10 - 1 = 19
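# Pass a NumPy array of shape (1, 1); newer Keras versions reject a plain Python list
pred = model.predict(np.array([[10.0]]), verbose=0)
print(f"Prediction for x=10: {pred[0][0]:.4f}")
# Prints a value just below 19, typically around 18.98 after 500 epochs
# (the exact number varies with the random weight initialization)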

The model learned $y = 2x - 1$ from just 6 data points!

Inspecting Learned Weights

weights, bias = model.layers[0].get_weights()
print(f"Learned slope (m): {weights[0][0]:.4f}")  # ≈ 2.0
print(f"Learned bias  (c): {bias[0]:.4f}")         # ≈ -1.0

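The model recovered the underlying pattern to within a small numerical error. The slope is close to 2 and the intercept close to -1, but neither is exact, because gradient descent was stopped after a finite number of epochs.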
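As a cross-check that does not depend on TensorFlow, the same line can be fitted with the closed-form least-squares solution. The sketch below uses `np.polyfit`, which is a different technique from the gradient descent used in this tutorial. It is shown only as a reference value to compare the learned weights against.

```python
import numpy as np

X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

# Degree-1 polynomial fit; coefficients are returned highest power first
slope, intercept = np.polyfit(X, y, deg=1)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

Because the six points lie exactly on a line, least squares returns a slope of 2 and an intercept of -1 up to floating-point rounding. The weights learned by SGD should be close to these values.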

Visualizing the Fit

import matplotlib.pyplot as plt

x_range = np.linspace(-2, 6, 100)
y_pred_line = model.predict(x_range, verbose=0).flatten()

plt.scatter(X, y, color='blue', s=80, zorder=5, label='Training data')
plt.plot(x_range, y_pred_line, color='red', linewidth=2, label='Learned fit')
plt.plot(x_range, 2*x_range - 1, color='green', linestyle='--', label='True: y=2x-1')
plt.legend()
plt.title("Linear Regression with TensorFlow")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

The learned fit (red) overlaps almost exactly with the true function (green).

Loss Curve

plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Training Loss — Convergence')
plt.yscale('log')
plt.show()

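Loss drops sharply in the first ~50 epochs and then keeps shrinking toward zero. Because the y-axis is logarithmic, the later phase shows up as a steady downward slope rather than a flat line, which is the expected behavior of gradient descent on a convex loss surface.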

From Regression to Deep Learning

This 2-parameter model is the simplest possible neural network. The Keras API scales identically to much larger models:

Model	Parameters	Description
This tutorial	2	Linear regression: 1 Dense(1)
MNIST classifier	535K	Dense(512) → Dense(256) → Dense(10)
ResNet-50	25M	Deep CNN for ImageNet
GPT-2	117M–1.5B	Transformer language model

The training loop (forward pass → loss → backward pass → weight update) is identical across all of them. TensorFlow and Keras handle the mechanics — you focus on architecture and data.
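The first two parameter counts in the table can be reproduced by hand. The sketch below assumes the MNIST classifier takes flattened 28x28 images, i.e. 784 input features, which the table does not state explicitly. The helper `dense_params` is a hypothetical name used for illustration.

```python
def dense_params(n_inputs, n_units):
    """Parameters in a fully connected layer:
    one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

# This tutorial's model: 1 input feature, 1 unit
linear_regression = dense_params(1, 1)

# Assumes flattened 28x28 MNIST images, i.e. 784 input features
layer_sizes = [784, 512, 256, 10]
mnist_total = sum(
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)

print(linear_regression)  # 2
print(mnist_total)        # 535818
```

Under that assumption the three Dense layers contribute 401,920 + 131,328 + 2,570 = 535,818 parameters, which matches the 535K figure in the table.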

Next Steps

Multiple features: widen the input (for example input_shape=[n_features]); the single Dense(1) unit then learns one weight per feature
Non-linearity: add hidden Dense layers with ReLU activations to fit non-linear relationships
Regression on real data: try the Boston Housing or California Housing datasets
Feature scaling: standardize inputs (for example with scikit-learn's StandardScaler) for faster convergence
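As an illustration of the feature scaling step, here is a minimal NumPy sketch of standardization: subtract each column's mean and divide by its standard deviation. The function name `standardize` and the two-feature sample values are made up for this example. In practice scikit-learn's StandardScaler performs the same computation.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance (population std)."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)   # avoid dividing by zero for constant columns
    return (features - mean) / std, mean, std

# Hypothetical data: two features on very different scales
raw = np.array([
    [1.0, 1200.0],
    [2.0, 1500.0],
    [3.0,  900.0],
    [4.0, 2100.0],
])

scaled, mean, std = standardize(raw)
print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

The returned `mean` and `std` should be reused to scale any new inputs at prediction time, so that the model sees data on the same scale it was trained on.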