FL Basics

Before we discuss Federated Learning (FL) in detail, let’s recall how classic centralized machine learning (ML) works.

Classic Machine Learning

The classic centralized ML training process can be represented as follows: initially, each client holds its own data (D), while the centralized server holds an untrained (gray) ML model (M).
( Figure: Client 1 … Client N each hold local data (D); the server holds the untrained model (M). The clients transfer their data (D) to the server. )

Model training requires data, which the clients must send to the server.
During and after training, the clients' data remains on the server, where it is exposed to potential exploitation and privacy breaches.

( The pink/purple model color symbolizes different data sources used during training. )



( Figure: the server trains its model (M) on the collected client data (D); the data remains stored on the server afterwards. )
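
To make the contrast with FL concrete, here is a minimal sketch of centralized training in PyTorch with two hypothetical clients (dataset shapes and the tiny architecture are made up for illustration): the raw data is pooled on the server and a single model is trained there.

import torch
from torch import nn

# Hypothetical raw datasets of two clients (features X, labels y).
client_data = [
    (torch.randn(100, 5), torch.randint(0, 2, (100,))),
    (torch.randn(80, 5), torch.randint(0, 2, (80,))),
]

# Centralized ML: the clients ship their raw data to the server,
# where it is pooled ...
X_server = torch.cat([X for X, _ in client_data])
y_server = torch.cat([y for _, y in client_data])

# ... and a single model (M) is trained on it. The data then remains
# stored on the server.
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(5):
    optimizer.zero_grad()
    loss_fn(model(X_server), y_server).backward()
    optimizer.step()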

Federated Learning Overview

In FL, the server/aggregator coordinates the FL process but does not train the model itself. Each client/learner trains locally, which requires setting up a local training environment. All FL components must know and hold a copy of the ML model, which is initially untrained.

( Figure: the aggregator and each learner, Learner 1 … Learner N, hold a copy of the untrained model (M); each learner also holds its local data (D). )

The aggregator starts the first FL training cycle. Each selected learner begins training its model copy using only its local data.
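
A minimal sketch of one learner's side of this step, assuming a small PyTorch model and a hypothetical local dataset; unlike in the centralized sketch above, the data never leaves the learner.

import torch
from torch import nn

# Hypothetical local dataset of one learner; it never leaves the learner.
X_local = torch.randn(100, 5)
y_local = torch.randint(0, 2, (100,))

# Every FL component starts from the same untrained model architecture.
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One FL cycle on the learner side: optimize the local model copy
# exclusively on the local data for a few epochs.
for _ in range(5):
    optimizer.zero_grad()
    loss_fn(model(X_local), y_local).backward()
    optimizer.step()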

Naming Conventions

In FL, the server is frequently called an aggregator, and the clients are called learners. The terms server and client are still common in FL, but since our setup involves various components, including non-FL servers and clients, we prefer the terms aggregator and learner to highlight the FL components.

After the local training concludes, each learner possesses a uniquely trained local model.
( Figure: each learner now holds a locally trained model (M) next to its data (D); the aggregator's model copy is still untrained. )

Did you know?

Most ML models (especially DNNs) consist of two parts.

The first part is the model architecture: the layer specification, training configuration, and hyperparameters such as the learning rate (step size), loss function, and activation functions. In classic ML and FL, the architecture is typically static and lightweight.

The second, dynamic part consists of the model's weights and biases, which are changed/optimized during training. They allow the model to fulfill its intended use, such as prediction, inference, or generation tasks. These weights and biases contribute significantly to a trained model's overall size (space utilization).
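
This split can be made concrete with a small PyTorch sketch (the architecture and shapes are placeholders): the module definition captures the static architecture, while state_dict() holds the weights and biases that change during training and can be stored or shared separately.

import torch
from torch import nn

# Part 1: the (static, lightweight) architecture - layer specification
# plus training configuration such as learning rate and loss function.
model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
learning_rate = 0.01

# Part 2: the (dynamic) weights and biases, optimized during training.
# They dominate the size of a trained model and can be handled
# separately from the architecture.
state = model.state_dict()
print({name: tuple(tensor.shape) for name, tensor in state.items()})

# A second model with the same architecture can adopt these values.
clone = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))
clone.load_state_dict(state)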

( Figure: each learner extracts its model parameters (P) and sends them to the aggregator. )

FL components can therefore transmit and share the weights and biases instead of the entire trained model. We call the model-relevant data exchanged between learners and aggregators (model) parameters and depict them with (P).

The learners extract their model parameters and send them to the aggregator. The aggregator now has access to these parameters but not to the sensitive data used to train them. This is how FL can benefit from sensitive data while preserving its privacy.
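
One possible way to implement this on the learner side, sketched with PyTorch and NumPy (the helper names get_parameters and set_parameters are ours, not a fixed API): the parameters (P) are exported as plain arrays that can be serialized and sent, while the training data stays local.

import numpy as np
import torch
from torch import nn

model = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))

def get_parameters(model: nn.Module) -> list[np.ndarray]:
    # Extract the model parameters (P) as plain arrays for transmission.
    return [t.detach().cpu().numpy() for t in model.state_dict().values()]

def set_parameters(model: nn.Module, parameters: list[np.ndarray]) -> None:
    # Apply received parameters (P) to a local model instance.
    keys = model.state_dict().keys()
    model.load_state_dict({k: torch.tensor(v) for k, v in zip(keys, parameters)})

# The learner would serialize and send this list to the aggregator;
# the raw training data stays local.
parameters = get_parameters(model)
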
The aggregator combines the collected parameters into new global parameters and applies them to its own model instance. In classic FL aggregation, the mean of the learners' parameters is used for the global model. The result is a global model that has been trained for one FL cycle.
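
The classic aggregation step can be sketched in a few lines of NumPy: the global parameters are the element-wise mean of the learners' parameter lists (the shapes below are hypothetical).

import numpy as np

def aggregate(learner_parameters: list[list[np.ndarray]]) -> list[np.ndarray]:
    # Classic FL aggregation: element-wise mean over the learners' parameters.
    return [np.mean(layer_stack, axis=0) for layer_stack in zip(*learner_parameters)]

# Parameters (P) collected from two learners: one weight matrix and one bias each.
p1 = [np.ones((2, 3)), np.zeros(3)]
p2 = [3 * np.ones((2, 3)), np.ones(3)]
global_parameters = aggregate([p1, p2])  # -> matrix of 2.0s, vector of 0.5s
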
( Figure: the aggregator combines the collected parameters (P) into its model (M), producing the global model. )

The aggregator sends its global parameters back to the learners. The learners apply these parameters to their local model instances, making them identical to the aggregator's global model.

At this point, the FL training loop could terminate, and the learners or the aggregator could use their copy of the global model for inference. Otherwise, another FL training cycle begins. There can be arbitrarily many FL cycles, similar to conventional training rounds in classic ML. FL training eventually terminates due to time/resource constraints or because it fails to reach a satisfying performance.
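
Putting the pieces together, here is a compact, self-contained sketch of the whole FL loop under the same assumptions as the snippets above (hypothetical data, tiny model, plain mean aggregation, helper names of our choosing); a real deployment would run the learners on separate devices and exchange the parameters over a network.

import numpy as np
import torch
from torch import nn

def new_model() -> nn.Module:
    # The shared architecture known to the aggregator and every learner.
    return nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 2))

def get_parameters(model):
    return [t.detach().cpu().numpy() for t in model.state_dict().values()]

def set_parameters(model, parameters):
    keys = model.state_dict().keys()
    model.load_state_dict({k: torch.tensor(v) for k, v in zip(keys, parameters)})

def train_locally(model, X, y, epochs=5):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(X), y).backward()
        optimizer.step()

# Hypothetical local datasets; in a real deployment they never leave the learners.
learner_data = [(torch.randn(100, 5), torch.randint(0, 2, (100,))) for _ in range(3)]

global_model = new_model()
for cycle in range(10):  # arbitrarily many FL cycles
    global_parameters = get_parameters(global_model)
    collected = []
    for X, y in learner_data:
        # 1. Each learner adopts the current global parameters ...
        local_model = new_model()
        set_parameters(local_model, global_parameters)
        # 2. ... trains on its private data only ...
        train_locally(local_model, X, y)
        # 3. ... and sends its parameters (P) back to the aggregator.
        collected.append(get_parameters(local_model))
    # 4. Aggregation: the mean of the collected parameters becomes the new global model.
    set_parameters(global_model, [np.mean(g, axis=0) for g in zip(*collected)])

# Afterwards, the global model (or any learner's synchronized copy) can be used for inference.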