Single Neuron and Loss

The perceptron is one of the earliest neural models for binary classification. It takes an input vector \(x\), combines it linearly with weights \(w\), adds a bias \(b\), then applies an activation function:

\[ z = w^T x + b, \qquad a = g(z) \]
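
As a concrete sketch of this computation (assuming NumPy; the step function below is one classical choice of \(g\) for the perceptron, and the input values are purely illustrative):

```python
import numpy as np

def neuron_forward(x, w, b, g):
    """Compute a single neuron's output a = g(w^T x + b)."""
    z = np.dot(w, x) + b  # linear combination of inputs plus bias
    return g(z)           # activation

# The classical perceptron uses a step function as its activation.
step = lambda z: 1.0 if z >= 0 else 0.0

x = np.array([0.5, -1.2, 3.0])  # input vector (illustrative values)
w = np.array([0.4, 0.1, -0.2])  # weights (illustrative values)
b = 0.1                         # bias

print(neuron_forward(x, w, b, step))  # prints 0.0 for this input
```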

This simple structure is the building block of much larger networks.

A useful bridge from classical machine learning is logistic regression. In a neural-network view, logistic regression can be seen as a single neuron whose activation is the sigmoid function.

The sigmoid maps any real number into a probability-like value:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

The related logit transform goes in the other direction:

\[ \mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right) \]
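
The two functions are inverses of one another. A minimal sketch, assuming NumPy:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Inverse of the sigmoid: map a probability back to the real line."""
    return np.log(p / (1.0 - p))

z = np.array([-2.0, 0.0, 3.5])
p = sigmoid(z)
print(p)         # values strictly between 0 and 1
print(logit(p))  # recovers [-2.0, 0.0, 3.5] up to floating-point error
```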

This view of logistic regression as a single sigmoid neuron is a helpful bridge: many neural-network ideas start by extending what we already know from logistic regression.

To train a model, we need a way to measure how wrong it is.

  • A loss measures the error for one example.
  • A cost aggregates that error over the dataset.

For classification, one of the most important choices is cross-entropy, which matches the probabilistic interpretation of logistic outputs.

For binary targets \(y \in \{0,1\}\), with \(a^{(i)}\) the model's predicted probability for example \(i\) and \(m\) the number of examples, the average logistic cost is:

\[ J(w,b) = -\frac{1}{m}\sum_{i=1}^{m} \left[ y^{(i)} \log a^{(i)} + (1-y^{(i)}) \log(1-a^{(i)}) \right] \]

This cost is equivalent to the negative log-likelihood under a Bernoulli model.
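
A minimal sketch of this cost, assuming NumPy; the clipping by a small `eps` is a numerical safeguard against \(\log 0\), not part of the mathematical definition:

```python
import numpy as np

def binary_cross_entropy(y, a, eps=1e-12):
    """Average logistic cost over m examples.

    y: binary labels in {0, 1}
    a: predicted probabilities in (0, 1)
    """
    a = np.clip(a, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

y = np.array([1, 0, 1, 1])
a = np.array([0.9, 0.2, 0.7, 0.6])  # illustrative predictions
print(binary_cross_entropy(y, a))   # ≈ 0.30
```

Note how the cost behaves: a confident correct prediction (like 0.9 for a positive label) contributes little, while a confident wrong prediction would be penalized heavily.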

Once the cost is defined, we minimize it by updating parameters in the opposite direction of the gradient:

\[ w \leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b} \]

where \(\alpha\) is the learning rate.
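
For this particular model (sigmoid activation with cross-entropy cost), stacking the examples as rows of a matrix \(X\), the gradients work out to \(\frac{\partial J}{\partial w} = \frac{1}{m} X^T (a - y)\) and \(\frac{\partial J}{\partial b} = \frac{1}{m} \sum_i (a^{(i)} - y^{(i)})\). A minimal training loop, assuming NumPy and illustrative data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative data: m = 4 examples with 2 features each.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)  # initial weights
b = 0.0          # initial bias
alpha = 0.5      # learning rate
m = X.shape[0]

for _ in range(1000):
    a = sigmoid(X @ w + b)   # predictions for all examples
    dw = X.T @ (a - y) / m   # dJ/dw for sigmoid + cross-entropy
    db = np.sum(a - y) / m   # dJ/db
    w -= alpha * dw          # step against the gradient
    b -= alpha * db

print(sigmoid(X @ w + b))  # predictions move toward [1, 0, 1, 0]
```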

The learning rate controls the step size:

  • too small, and learning is slow,
  • too large, and training can become unstable.
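
A tiny worked example makes both failure modes concrete. Here gradient descent runs on the one-dimensional cost \(J(w) = w^2\) (an illustrative stand-in chosen for simplicity, not the logistic cost), whose gradient is \(2w\):

```python
def descend(alpha, w=1.0, steps=5):
    """Run a few gradient-descent steps on J(w) = w^2."""
    trace = [w]
    for _ in range(steps):
        w = w - alpha * 2 * w  # gradient of w^2 is 2w
        trace.append(w)
    return trace

print(descend(alpha=0.01))  # too small: w crawls toward 0
print(descend(alpha=0.4))   # moderate: converges quickly
print(descend(alpha=1.1))   # too large: |w| grows each step, diverging
```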

In this lesson we covered:

  1. The perceptron as the simplest neural building block
  2. The linear combination \(z = w^T x + b\)
  3. The sigmoid neuron as a neural view of logistic regression
  4. The difference between loss and cost
  5. Cross-entropy for binary classification
  6. Gradient descent as the main optimization mechanism

Next: We will connect many neurons together and see how forward propagation and backpropagation make learning possible.