AI is poised to trigger a transformation as large as the one the world went through during the electrical revolution. AI is a technology that will be everywhere and in everything. Once more it is all about automation, but this time fueled by data. In a recent study, IDC predicts that the AI market will reach a stunning $200 billion by 2025, dominated by tech giants such as Google, Amazon, and Microsoft, which offer cloud-based AI solutions and APIs. The analogy between AI and electricity is used repeatedly by Andrew Ng, Baidu's former Chief Scientist, who emphasizes the scarcity of data as a major roadblock slowing down AI, because traditional AI methods need a tremendous amount of data to train models.
The problem turns out to run much deeper than we all expected, and consumers are saying loud and clear that they do not feel comfortable sharing their data. To address those concerns and facilitate the free flow of data, laws, frameworks, and advisory boards are springing up like mushrooms. The best known is the 119-page regulation of the European Parliament (aka GDPR), a masterpiece of complex rules that not only caused panic in engineering teams around the world but also left consumers still wondering about their newly gained right to be forgotten (plus some insufferable confirmation dialogs when you browse the web). The right to be forgotten is celebrated as a major victory that protects an individual's privacy through the erasure of their information on the internet. However, that data was likely already used to train a model, so how do you teach a machine to unlearn?
How to fix the privacy dilemma?
There are no simple answers when it comes to data privacy, and the truth is, as usual, probably a mix of different methods that need to be applied in the right context. One promising piece of technology was quietly announced in a short side note at Google I/O 2019, where Sundar Pichai introduced TensorFlow Federated, a distributed, privacy-preserving technology for training AI on small data sets. Simplified: federated learning flips the way models are trained. Instead of centralizing data on servers, the model is sent to the data, where it extracts the knowledge, and only the underlying mathematical model weights are sent back to the cloud. There, the models from multiple contributors are aggregated into a global model that is shared back with the community.
Because the raw data never leaves your device, this lifts privacy standards to a new level. Google has applied the technique to Gboard, Google's smart digital keyboard, for example to improve next-word prediction by learning what you want to type next. Besides guaranteed data privacy and higher data security, this also enables models to learn from small data.
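To make that round trip more concrete, here is a minimal sketch of the idea in plain Python: the model goes out to each device, is trained locally, and only the weights come back to be averaged. It is a toy illustration and not the TensorFlow Federated API or Google's implementation. The function names (`local_update`, `federated_average`, `train`), the tiny linear model, and the three example "devices" are all assumptions made up for this example.

```python
from typing import List, Tuple

Weights = List[float]                # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay on the device


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one device's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighted by how many examples each client holds."""
    total = sum(n for _, n in updates)
    size = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(size)]


def train(clients: List[Dataset], rounds: int = 200) -> Weights:
    """Simulate the server loop: broadcast, train locally, aggregate, repeat."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only weights and example counts travel back, never the raw data.
        updates = [(local_update(global_weights, data), len(data))
                   for data in clients]
        global_weights = federated_average(updates)
    return global_weights


# Three hypothetical devices, each holding a few points from y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0), (0.5, 2.0)],
]
slope, intercept = train(clients)
print(round(slope, 2), round(intercept, 2))
```

No single device has more than three data points, yet the shared model recovers the underlying line because every round combines what each device learned. Weighting the average by example count is one common design choice; a real system would add things this sketch leaves out, such as client sampling and secure aggregation.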
Why does it matter?
Distributed learning is a powerful instrument that gives tech firms a competitive edge in transforming distributed data into knowledge and, ultimately, in owning superior AI models without having to ask consumers for their data. One example is the Google Ventures-funded OWKIN, which is pioneering federated learning in healthcare to overcome the data-sharing problem, building collective intelligence from distributed data at scale while preserving data privacy and security.
What are the challenges?
Federated learning is changing the game of how AI learns, but there are challenges too. Model training and evaluation on an unseen dataset will be puzzling at first. Today, federated learning is limited to a narrow set of applications in which the necessary labels can be derived directly, without requiring the user to do additional work. Even tougher are cases where features are not static, such as in an enterprise context. Then there is the human factor: the established mindset of centrally aggregating data to build customer-specific models for competitive advantage is another challenge slowing down the adoption of federated learning.
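Next-word prediction is a good illustration of labels that come for free: the word the user typed next is the label, so nobody has to annotate anything. The sketch below shows that idea only. It is not how Gboard actually builds its training data, and the function name `next_word_examples` and the `context_size` parameter are hypothetical.

```python
from typing import List, Tuple


def next_word_examples(typed_text: str,
                       context_size: int = 2) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn text the user already typed into (context, next word) training pairs."""
    words = typed_text.lower().split()
    examples = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        # The label is simply the word that followed the context.
        examples.append((context, words[i]))
    return examples


pairs = next_word_examples("see you at the office")
for context, label in pairs:
    print(context, "->", label)
```

Tasks without such a built-in signal, for example classifying a photo or a document, would need the user to supply the label, which is exactly the extra work that limits where federated learning applies today.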
At first glance, federated learning addresses the pain of data privacy and security, which isn't wrong. But its true value is that it can dramatically reduce the amount of data needed to train a model. That makes intelligence accessible to consumers without big data by leveraging intelligence-as-a-service, which may have a game-changing impact, especially for SMEs. Ultimately, it allows us to train models that couldn't be trained before due to a lack of data.