FedML – The Federated Machine Learning Libraries for Hyperscalers 2.0
According to Gartner, “more than 75% of midsize and large organizations use two or more public cloud providers today and have plans to expand.”
A multi-cloud strategy helps companies address cost, security, and regulatory requirements, while still providing consumption flexibility and helping enterprises avoid vendor lock-in.
The Multi Cloud Challenge
One side effect of a multi-cloud strategy is that businesses need to extract their business data to low-cost cloud storage, to lay the foundation for analytics and data science experiments on the respective cloud platforms.
This forced data replication happens because predictive modelling and machine learning model building work seamlessly only when the data resides in the respective platform’s native cloud storage, and because cross-cloud data access is currently lacking.
This in turn creates the need for expensive ETL and data pipelines to move data across systems, which leads to data inconsistency issues. It also takes time and focus away from data scientists, who are the ones who end up tackling data sourcing issues.
SAP FedML, the Federated Machine Learning libraries, helps avoid extracting and migrating training data from business systems to hyperscaler ML platforms in order to build and train machine learning models.
The library applies the data federation architecture of SAP Datasphere and provides functions that enable businesses and data scientists to build, train and deploy machine learning models on hyperscalers, thereby eliminating the need for replicating or migrating data out from its original source.
The FedML 2.0 library is now available free of charge for use with the Amazon SageMaker, Azure Machine Learning, and Google Vertex AI platforms.
In version 1.0, FedML supported automating data sourcing and model training on the respective hyperscalers.
What’s NEW in 2.0?
With FedML 2.0, here are the updated features:
- Support for installing the library with pip from the PyPI repository.
- Support for deploying models on the native hyperscaler platform.
- Support for deploying models on the SAP BTP Kyma platform.
- Support for inferencing / predicting from both the native hyperscaler deployment and the Kyma deployment.
- Support for writing inferenced results back to SAP Datasphere.
In a nutshell, FedML 2.0 now allows data scientists to automate the complete end-to-end flow, from data sourcing through model training, deployment, and prediction to persisting the results back in SAP Datasphere, all with just a few lines of code.
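The shape of that flow can be illustrated with a minimal, self-contained sketch. It does not use the FedML API: an in-memory SQLite table stands in for a federated SAP Datasphere view, a hand-written least-squares fit stands in for training on a hyperscaler, and the platform-specific deployment step is left out. Every table, column, and function name below is hypothetical and chosen only for illustration.

```python
import sqlite3


def open_source():
    """Create a stand-in for a federated view (hypothetical names, not FedML)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_view (units REAL, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales_view VALUES (?, ?)",
        [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)],
    )
    conn.execute(
        "CREATE TABLE predictions (units REAL, predicted_revenue REAL)"
    )
    return conn


def fetch_training_data(conn):
    """Step 1, data sourcing: query the view in place, with no export job."""
    return conn.execute("SELECT units, revenue FROM sales_view").fetchall()


def train(rows):
    """Step 2, training: ordinary least squares for revenue = a * units + b."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, units):
    """Step 3, prediction: apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * u + intercept for u in units]


def write_back(conn, units, predictions):
    """Step 4, persistence: store the results next to the source data."""
    conn.executemany(
        "INSERT INTO predictions VALUES (?, ?)",
        list(zip(units, predictions)),
    )
    conn.commit()


def run_pipeline():
    """Run all steps and return the persisted predictions."""
    conn = open_source()
    model = train(fetch_training_data(conn))
    new_units = [5.0, 6.0]
    write_back(conn, new_units, predict(model, new_units))
    return conn.execute(
        "SELECT units, predicted_revenue FROM predictions ORDER BY units"
    ).fetchall()


if __name__ == "__main__":
    for units, revenue in run_pipeline():
        print(units, revenue)
```

The point of the sketch is only the ordering of the steps: the training data is read where it lives, and the predictions are written back to the same system instead of into a separate copy.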
How do I install & use FedML?
Sample notebooks and documentation for using FedML with each hyperscaler are available in the SAP-samples/data-warehouse-cloud-fedml repository on GitHub.
To try out FedML on a specific cloud platform, follow the blog posts below:
- Federated Machine Learning using SAP Datasphere and Amazon SageMaker 2.0
- Federated Machine Learning using SAP Datasphere and Google Vertex AI 2.0
- Federated Machine Learning using SAP Datasphere and Azure Machine Learning 2.0
What’s FedML’s Value Proposition?
FedML helps organizations realize value by eliminating the need to duplicate data for machine learning. This saves costs, gives data scientists instant access to multiple data sources, and lets them focus solely on training machine learning models.
This also helps organizations avoid vendor lock-in, reduce their hyperscaler storage costs, and adhere to GDPR policies, because data migration is eliminated. In addition, it enables instant access to cross-cloud data sources, combined with SAP business data managed through SAP Datasphere’s unified semantic models.
For more information about this topic or to ask a question, please contact us at email@example.com
Thank you. It is very insightful. May I please also know if FedML can be tried out in SAP AI Core?
Thank you for reaching out. FedML is designed to be used directly on hyperscaler machine learning platforms (e.g. AWS SageMaker, GCP Vertex AI, etc.), for use cases where model training and deployment happen in the native hyperscaler environments.
SAP AI Core is closely coupled with SAP BTP and, together with SAP AI Launchpad, helps integrate AI capabilities into SAP solutions. FedML and SAP AI Core serve different purposes. Hope that helps.
Thanks for the blog post, quite interesting.
My question is: how can one bring in additional Python runtime dependencies at training and serving time? For instance, I need a particular Python library to pre-process my data before feeding it to scikit-learn training. Basically, how can I prepare my runtime environment with all my required dependencies?
Hi Suresh Kumar Raju,
Thanks for your question. Yes, the FedML libraries provide the flexibility to bring in additional runtime dependencies. Please refer to the documentation of the individual libraries for the specifics.
As an example, FedML-Azure lets you create an environment that includes any Python library dependency; the same environment, with its installed dependencies, is then used for both training and serving. The FedML-Azure documentation shows how to create such an environment: https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#create_environment
For further details or to discuss this in more depth, please reach out to us at firstname.lastname@example.org.
So, if I have understood correctly, the "federated ML" SAP is referring to is not the same as the one we find in the literature, right?
Hello Elisa, Yes, our library helps with ML on hyperscalers with "federated data" (both SAP and non-SAP) via SAP Datasphere. Thanks.