FedML - The Federated Machine Learning Libraries f...

Sangeetha_K · ‎06-08-2022

According to Gartner, “more than 75% of midsize and large organizations use two or more public cloud providers today and have plans to expand.”

A multi-cloud strategy helps companies address cost, security, and regulatory concerns while still providing consumption flexibility and helping enterprises avoid vendor lock-in.

The Multi-Cloud Challenge

One side effect of a multi-cloud strategy is that businesses must extract their business data to low-cost cloud storage to lay the foundation for analytics and data science experiments on each cloud platform.

This forced data replication also stems from the fact that predictive modelling and the building of machine learning models work seamlessly when the data resides in the respective platform’s native cloud storage, and cross-cloud data access is currently lacking.

This in turn creates a need for expensive ETL and data pipelines to move data across systems, which leads to data inconsistency issues. It also takes time and focus away from data scientists, who usually end up tackling the data sourcing issues themselves.

Why FedML?

GitHub: https://github.com/SAP-samples/datasphere-fedml

SAP FedML, the Federated Machine Learning libraries, helps avoid extracting and migrating training data from business systems to hyperscaler ML platforms in order to build and train machine learning models.

The library applies the data federation architecture of SAP Datasphere and provides functions that enable businesses and data scientists to build, train, and deploy machine learning models on hyperscalers, eliminating the need to replicate or migrate data out of its original source.

FedML Solution Diagram

The FedML 2.0 library is now available free of charge for use with the AWS SageMaker, Azure Machine Learning, and Google Vertex AI platforms.

In version 1.0, FedML supported automating data sourcing and model training on the respective hyperscalers.

What’s NEW in 2.0?

FedML 2.0 adds the following features:

Support for installing the library with pip from the PyPI repository.

Support for deploying models on the native hyperscaler platform.

Support for deploying models on the SAP BTP Kyma platform.

Support for inferencing (predicting) from both the native hyperscaler deployment and the Kyma deployment.

Support for writing inference results back to SAP Datasphere.

In a nutshell, FedML 2.0 allows data scientists to automate the complete end-to-end flow, from data sourcing through model training, deployment, and prediction to persisting the results back in SAP Datasphere, all with just a few lines of code.
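To make the shape of that flow concrete, here is a minimal, self-contained sketch in plain Python. It does not use the FedML API: an in-memory SQLite database stands in for a federated SAP Datasphere view, a tiny least-squares fit stands in for hyperscaler training, and every name in it (sales_view, predictions, source_data, train, predict, write_back) is hypothetical and chosen only for illustration. The deployment step is left out. For the actual library calls, refer to the sample notebooks in the GitHub repository.

```python
import sqlite3


def make_source() -> sqlite3.Connection:
    """Build an in-memory stand-in for the federated data source (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_view (ad_spend REAL, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales_view VALUES (?, ?)",
        [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    )
    conn.execute(
        "CREATE TABLE predictions (ad_spend REAL, predicted_revenue REAL)"
    )
    conn.commit()
    return conn


def source_data(conn):
    """Step 1: query the data where it lives instead of exporting a copy."""
    return conn.execute("SELECT ad_spend, revenue FROM sales_view").fetchall()


def train(rows):
    """Step 2: fit revenue = slope * ad_spend + intercept by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Step 3: score new inputs with the trained model."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def write_back(conn, xs, preds):
    """Step 4: persist the predictions next to the source data."""
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?)", list(zip(xs, preds))
    )
    conn.commit()


conn = make_source()
model = train(source_data(conn))
new_inputs = [5.0, 6.0]
write_back(conn, new_inputs, predict(model, new_inputs))
stored = conn.execute(
    "SELECT ad_spend, predicted_revenue FROM predictions ORDER BY ad_spend"
).fetchall()
print(stored)
```

The point of the sketch is the ordering of the steps: the training data is read through a query against the source rather than copied into separate storage first, and the predictions end up back in the same system that holds the business data.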

How do I install & use FedML?

Sample notebooks and documentation for using FedML with each hyperscaler are available in the GitHub repository linked above.

Follow the blogs below to try out FedML on the respective cloud platforms.

Federated Machine Learning using SAP Datasphere and Amazon SageMaker 2.0

Federated Machine Learning using SAP Datasphere and Google Vertex AI 2.0

Federated Machine Learning using SAP Datasphere and Azure Machine Learning 2.0

FedML has also recently been released for the Databricks ML platform:

Using FedML library with SAP Datasphere and Databricks

What’s FedML’s Value Proposition?

FedML helps organizations realize value by eliminating the need to duplicate data for the purpose of machine learning. This saves costs and lets data scientists focus solely on training machine learning models, while giving them instant access to multiple data sources.

It also helps organizations avoid vendor lock-in, reduce their hyperscaler storage costs, and adhere to GDPR policies, since data migration is eliminated. In addition, it enables instant access to cross-cloud data sources, combined with SAP business data managed through SAP Datasphere’s unified semantic models.

For more information about this topic or to ask a question, please contact us at paa@sap.com.