Stage 3: FL-Actors Image Build

In stage 3, the deployed image-builder service builds the requested images for the learner and aggregator services.

Critical

Building (multi-platform) images for ML/FL dynamically based on flexible user-provided repositories is a delicate and error-prone process.

Building images can take up a significant part of the entire project duration - especially if the training configuration is lightweight (fast / few rounds).


The only current way for FLOps to let you know that something went wrong during building is to send an error message to your project observer.

Watch out for these observer logs - especially when working with new ML repositories for the first time.

The image-builder service does the following:

  • Clones the requested ML Git repository
  • Checks the cloned repository if it satisfies the mandatory structural requirements
  • Checks for potential dependency issues and tries to resolve them if possible
  • Builds the FL-Actor images
  • Pushes the build images to the FLOps image registry
  • Notifies the project observer and the FLOps manager about its success or possible errors

The FLOps manager undeploys and removes the image builder service.

Showcase

This demo shows this build process from the perspective of a priviledged CLI user.

Time Lapse

Building images can easily take 5-30+ minutes.

This depends on:

  • Underlying worker node resources
  • Complexity of the provided ML repo dependencies
  • Number and kind of target platforms that should be supported

This demo cuts out long waiting periods for your viewing pleasure.

Want to know more about FLOps’ image building process?

Explore why and how container images are build in FLOps here.