FL Basics
Before we discuss Federated Learning (FL) in detail, let’s recall how classic centralized machine learning (ML) works.
Classic Machine Learning
(Figure: the pink/purple model colors symbolize the different data sources used during training.)
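To make the later contrast with FL concrete, here is a minimal, hypothetical sketch of centralized training: data from all sources is pooled in one place, and a single model is trained on it. Linear least squares stands in for the model; the data and shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from two hypothetical sources, pooled in one central location
# (the "pink/purple" sources from the figure above).
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(2.0, 1.0, (100, 3))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 0.1, 200)

# Train a single model on all pooled data at once (least squares here).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("centrally trained weights:", w)
```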
Federated Learning Overview
Naming Conventions
In FL, the server is frequently called an aggregator, and the clients are called learners. Using the terms server and client in FL is still common, but because our setup involves various components, including non-FL servers and clients, we prefer aggregator and learner to clearly mark the FL-specific components.
Did you know?
Most ML models (especially DNNs) consist of two parts.
The first part is the model architecture: the layer specification, training configuration, and hyperparameters such as the learning rate, loss function, and activation functions. In both classic ML and FL, the architecture is typically static and lightweight.
The second part is dynamic: the model weights and biases, which get changed/optimized during training. They allow the model to fulfill its intended use, such as prediction, inference, or generation tasks. These weights and biases account for most of a trained model's overall size (space utilization).
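A short PyTorch sketch of this split (the layer sizes and file name are illustrative): the architecture is just a static specification in code, while the weights and biases are the dynamic state that gets serialized and dominates the model's size on disk.

```python
import torch
import torch.nn as nn

# Part 1: the architecture -- a lightweight, static specification.
model = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# Part 2: the weights and biases -- the dynamic state optimized during training.
state = model.state_dict()  # maps parameter names to tensors
for name, tensor in state.items():
    print(name, tuple(tensor.shape))

# The weights dominate the model's size on disk; the architecture is just code.
torch.save(state, "weights.pt")                   # only the dynamic part is saved
model.load_state_dict(torch.load("weights.pt"))   # architecture + weights = usable model
```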
The learners extract their model parameters and send them to the aggregator. The aggregator now has access to these parameters but not to the sensitive data used to train them. This is how FL can benefit from sensitive data while preserving its privacy.
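As a minimal sketch of what the aggregator could do with the received parameters, the snippet below applies FedAvg-style weighted averaging; the parameter shapes, sample counts, and function name are hypothetical.

```python
import numpy as np

def aggregate(learner_params, num_samples):
    """Average learners' parameters, weighted by their local dataset sizes."""
    total = sum(num_samples)
    return [
        sum(n / total * p[i] for p, n in zip(learner_params, num_samples))
        for i in range(len(learner_params[0]))
    ]

# Each learner sends only its parameters (here: one weight matrix, one bias);
# the sensitive training data never leaves the learner.
learner_params = [
    [np.ones((2, 2)), np.zeros(2)],      # learner A
    [3 * np.ones((2, 2)), np.ones(2)],   # learner B
]
global_params = aggregate(learner_params, num_samples=[100, 300])
print(global_params[0])  # all entries 2.5 -- closer to B, which has more data
```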
At this point, the FL training loop can terminate, and the learners or the aggregator can use their copy of the global model for inference. Otherwise, another FL training cycle begins. There can be arbitrarily many FL cycles, similar to conventional training rounds in classic ML. FL training eventually terminates, either due to time/resource constraints or because it fails to reach satisfying performance.
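To tie the pieces together, here is a self-contained toy simulation of this loop: each learner trains on its private data, only parameters travel to the aggregator, and the loop ends once the round budget is exhausted or performance is satisfying. Linear regression stands in for the model; all names, constants, and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

# Each learner holds private data that never leaves its site.
learners = []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    y = X @ true_w + rng.normal(0.0, 0.1, 200)
    learners.append((X, y))

def train_locally(w, X, y, lr=0.1, steps=10):
    """One learner's local training: a few gradient steps on private data."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w  # only parameters are sent back, never the data

def evaluate(w):
    """Toy global validation: distance to the true weights."""
    return float(np.linalg.norm(w - true_w))

global_w = np.zeros(3)
MAX_ROUNDS, TOLERANCE = 50, 0.01  # resource budget and target performance
for rnd in range(MAX_ROUNDS):
    updates = [train_locally(global_w, X, y) for X, y in learners]
    global_w = np.mean(updates, axis=0)   # simple (unweighted) averaging
    if evaluate(global_w) < TOLERANCE:    # satisfying performance reached
        break
print(f"terminated after {rnd + 1} rounds, error {evaluate(global_w):.4f}")
```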