Introduction to Linear Regression Using TensorFlow
A beginner-friendly guide to building linear regression with TensorFlow and Keras — from the math to a working model that learns y = 2x - 1 in minutes.
August 11, 2021 · 3 min read · By Kshitiz Regmi
Linear regression is the foundation of machine learning. Before exploring complex neural architectures, understanding how to model a linear relationship using gradient descent is essential — and TensorFlow's Keras API makes it elegantly simple.
The Math
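The basic linear regression model:
$$y = mx + c$$
Where:
- $x$ = input (independent variable)
- $y$ = output (dependent variable)
- $m$ = slope (weight)
- $c$ = intercept (bias)
The goal: find $m$ and $c$ that minimize the error between predictions and true values, measured by the mean squared error over the $n$ training points:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - (m x_i + c)\bigr)^2$$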
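To make the error measure concrete, here is a small NumPy sketch that scores two candidate lines on a few points. The helper name `mse` and the three sample points are illustrative assumptions, not part of the tutorial's own code.

```python
import numpy as np

def mse(m, c, x, y):
    """Mean squared error of the line y_hat = m*x + c on the points (x, y)."""
    y_hat = m * x + c
    return float(np.mean((y - y_hat) ** 2))

# Illustrative points that lie exactly on y = 2x - 1
x = np.array([0.0, 1.0, 2.0])
y = np.array([-1.0, 1.0, 3.0])

print(mse(2.0, -1.0, x, y))  # the true line: error is 0.0
print(mse(1.0, 0.0, x, y))   # a worse line: error is larger
```

Training amounts to searching for the slope and intercept that drive this number as low as possible.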
Why TensorFlow for Linear Regression?
A linear regressor is simply a 1-neuron neural network with no activation function. Using TensorFlow:
- The training loop, backpropagation, and weight updates are handled automatically
- The exact same code structure scales to deep networks
- You learn the TensorFlow/Keras API on the simplest possible problem
Dataset
We'll use a tiny dataset with a hidden pattern:
import numpy as np
X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)
# Verify the pattern
for xi, yi in zip(X, y):
    print(f"x={xi}, y={yi}, 2x-1={2*xi - 1}")
# x=-1.0, y=-3.0, 2x-1=-3.0 ✓
# x=0.0, y=-1.0, 2x-1=-1.0 ✓
# ...
Building the Model
A Dense layer with 1 unit and no activation is mathematically identical to linear regression:
import tensorflow as tf
from tensorflow import keras
model = keras.Sequential([
    keras.layers.Dense(units=1, input_shape=[1])
])
model.compile(
    optimizer='sgd',  # Stochastic Gradient Descent
    loss='mean_squared_error'
)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 2
=================================================================
Total params: 2
Trainable params: 2 (1 weight = slope m, 1 bias = intercept c)
Just 2 parameters: the weight (slope $m$) and the bias (intercept $c$).
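To see why a single unit with no activation is just a line, the layer's computation can be sketched in plain NumPy. The function name `dense_forward` and the parameter values below are hypothetical; only the shapes are meant to mirror a layer with 1 input and 1 unit.

```python
import numpy as np

def dense_forward(x, kernel, bias):
    """Affine map x @ kernel + bias, i.e. a dense layer with no activation."""
    return x @ kernel + bias

# Hypothetical parameter values for a layer with 1 input and 1 unit
kernel = np.array([[2.0]])  # shape (1, 1): plays the role of the slope m
bias = np.array([-1.0])     # shape (1,):   plays the role of the intercept c

x = np.array([[0.0], [1.0], [10.0]])  # batch of 3 samples, 1 feature each
print(dense_forward(x, kernel, bias).ravel())
```

With one input and one unit, the matrix product collapses to an ordinary multiplication, so the output is simply m*x + c for each sample.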
Training
history = model.fit(X, y, epochs=500, verbose=0)
print(f"Final MSE loss: {history.history['loss'][-1]:.8f}")
# Final MSE loss: 0.00000124 (example run; the exact value depends on the random initialization)
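During training, SGD iteratively adjusts $m$ and $c$ to minimize MSE:
- Forward pass: compute $\hat{y}_i = m x_i + c$
- Compute loss: $L = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Backward pass: compute gradients $\frac{\partial L}{\partial m}$, $\frac{\partial L}{\partial c}$
- Update: $m \leftarrow m - \eta \frac{\partial L}{\partial m}$, $c \leftarrow c - \eta \frac{\partial L}{\partial c}$ (where $\eta$ is the learning rate), repeat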
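The four steps above can be written out by hand in NumPy. This is a minimal sketch of full-batch gradient descent, not what Keras runs internally: the zero starting point and the step size of 0.01 are assumptions chosen for illustration, and the gradients are the analytic derivatives of the MSE with respect to the slope and intercept.

```python
import numpy as np

X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

m, c = 0.0, 0.0        # assumed starting point (Keras draws the weight randomly)
learning_rate = 0.01   # assumed step size

for step in range(500):
    y_hat = m * X + c                  # forward pass
    error = y_hat - y
    loss = np.mean(error ** 2)         # MSE loss
    grad_m = 2.0 * np.mean(error * X)  # dL/dm
    grad_c = 2.0 * np.mean(error)      # dL/dc
    m -= learning_rate * grad_m        # update step
    c -= learning_rate * grad_c

print(f"m={m:.3f}, c={c:.3f}, loss={loss:.6f}")
```

After 500 steps the slope and intercept should land close to 2 and -1 without matching them exactly; more steps or a larger step size would tighten the fit.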
Making Predictions
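# Should predict ≈ 2*10 - 1 = 19
# Pass a NumPy array; newer Keras versions may not accept a plain Python list
pred = model.predict(np.array([10.0]), verbose=0)
print(f"Prediction for x=10: {pred[0][0]:.4f}")
# Prediction for x=10: 18.9998 (exact digits vary with the random initialization)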
The model learned from just 6 data points!
Inspecting Learned Weights
weights, bias = model.layers[0].get_weights()
print(f"Learned slope (m): {weights[0][0]:.4f}") # ≈ 2.0
print(f"Learned bias (c): {bias[0]:.4f}") # ≈ -1.0
The model recovered the underlying pattern, y = 2x - 1, to within a small numerical error.
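As a cross-check on the learned values, the same line can be obtained without any training by solving the least-squares problem directly. The sketch below assumes NumPy's `polyfit` with degree 1, which returns the coefficients with the highest degree first.

```python
import numpy as np

X = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0])

# Degree-1 least-squares fit; coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(X, y, deg=1)
print(f"slope={slope:.4f}, intercept={intercept:.4f}")
```

Because the six points lie exactly on a line, the direct solution gives a slope of 2 and an intercept of -1 up to floating-point error, which is the target the gradient-descent model is approaching.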
Visualizing the Fit
import matplotlib.pyplot as plt
x_range = np.linspace(-2, 6, 100)
y_pred_line = model.predict(x_range, verbose=0).flatten()
plt.scatter(X, y, color='blue', s=80, zorder=5, label='Training data')
plt.plot(x_range, y_pred_line, color='red', linewidth=2, label='Learned fit')
plt.plot(x_range, 2*x_range - 1, color='green', linestyle='--', label='True: y=2x-1')
plt.legend()
plt.title("Linear Regression with TensorFlow")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
The learned fit (red) overlaps almost exactly with the true function (green).
Loss Curve
plt.figure(figsize=(8, 4))
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.title('Training Loss — Convergence')
plt.yscale('log')
plt.show()
Loss drops sharply in the first ~50 epochs and then keeps shrinking toward zero, with the log-scaled y-axis making the later, slower progress visible. This is classic gradient descent convergence on a convex loss surface.
From Regression to Deep Learning
This 2-parameter model is the simplest possible neural network. The same Keras workflow scales to much larger models:
| Model | Parameters | Description |
|---|---|---|
| This tutorial | 2 | Linear regression: 1 Dense(1) |
| MNIST classifier | 535K | Dense(512) → Dense(256) → Dense(10) |
| ResNet-50 | 25M | Deep CNN for ImageNet |
| GPT-2 | 117M–1.5B | Transformer language model |
The training loop (forward pass → loss → backward pass → weight update) is identical across all of them. TensorFlow and Keras handle the mechanics — you focus on architecture and data.
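The first two parameter counts in the table can be reproduced with a little arithmetic. The helper `dense_params` is a hypothetical name, and the MNIST figure assumes the 28×28 images are flattened to 784 inputs, which the table does not state explicitly.

```python
def dense_params(n_inputs, n_units):
    """Parameters in a fully connected layer: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

# This tutorial: 1 input feeding Dense(1)
print(dense_params(1, 1))  # 2

# Assumed MNIST setup: 784 flattened pixels -> Dense(512) -> Dense(256) -> Dense(10)
sizes = [784, 512, 256, 10]
total = sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # 535818
```

Under that assumption the three layers contribute 401,920, 131,328, and 2,570 parameters, which rounds to the 535K shown in the table.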
Next Steps
- Multiple features: widen the input (for example input_shape=[3]) so the Dense layer learns one weight per feature
- Non-linearity: add hidden layers with ReLU activation to model non-linear relationships
- Regression on real data: try the Boston Housing or California Housing datasets
- Feature scaling: normalize inputs with StandardScaler for faster convergence
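For the feature-scaling step, here is a rough NumPy sketch of standardization: subtract each column's mean and divide by its standard deviation. The function name `standardize` and the two-feature sample data are hypothetical; in practice a library scaler would do the same job.

```python
import numpy as np

def standardize(features):
    """Rescale each column to zero mean and unit variance (population standard deviation)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / std, mean, std

# Hypothetical two-feature inputs on very different scales
features = np.array([[1.0, 1000.0],
                     [2.0, 3000.0],
                     [3.0, 5000.0]])

scaled, mean, std = standardize(features)
print(scaled)
```

The returned mean and standard deviation should be reused to scale any new inputs at prediction time, so that the model sees data on the same scale it was trained on.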