Retrain your own Customisable Similarity Search in SAP Leonardo MLF-Part 1: An Overview
This is the first part of my blog series about retraining your own Customisable Similarity Search in SAP Leonardo Machine Learning Foundation:
- Part 1: An overview introduction to Customisable Similarity Search (this blog)
- Part 2: How to retrain a Customisable Similarity Search?
- Part 3: How to inference your Customisable Similarity Search?
- Part 4: Put the pieces together to create a FaceID solution for customer
In this first part, I will give an overview of customisable similarity search by addressing the questions below.
- What is similarity scoring?
- When do we need to retrain the similarity search?
- What is the Training Service for Customisable Similarity Search?
- What are the differences between the pretrained similarity scoring and customisable similarity search?
What is similarity scoring?
The Similarity Scoring inference service in SAP Leonardo Machine Learning Foundation compares vectors using cosine similarity. These vectors can be face, image or document feature vectors, extracted respectively through the Face Feature Extraction, Image Feature Extraction or Document Feature Extraction inference services. Put into a business context, face/image/document feature extraction and similarity scoring can be used for:
- Face recognition, as a face ID for customers in a retail shop
- Visual search for products/parts through images
- Legal document change tracking, etc.
You may check out this online demo app to explore the functional services of SAP Leonardo Machine Learning Foundation.
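To make the cosine similarity mentioned above concrete, here is a minimal sketch in plain Python. It is not the service's implementation; the function name `cosine_similarity` and the sample vectors are my own illustrative choices.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Two vectors pointing in the same direction score close to 1,
# two perpendicular vectors score close to 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

In practice the inputs would be the feature vectors returned by one of the feature extraction services rather than hand-written lists.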
When do we need a Customisable Similarity Search?
The similarity scoring inference service requires the vectors to be compared to be sent in the request body. That is fine for a small number of vectors, but it becomes impractical for a large number of high-dimensional vectors. For example, consider finding the best-matching product among 10,000 product images, each with a feature vector of 1,000+ dimensions. The overhead of each HTTP request is enormous, not to mention the pairwise comparison of the vectors one by one in a linear manner.
Fortunately, SAP Leonardo Machine Learning Foundation also provides a training service for customisable similarity search to complement the pretrained similarity scoring. It allows you to train on a list of vectors, build an index of those vectors, and package it as a model that is accessible for inference. Only the target feature vector(s) to be searched for then need to be passed to the inference service.
What is the Training Service for Customisable Similarity Search?
The training service for customisable similarity search builds a forest of lookup trees over the given high-dimensional vectors for efficient Nearest Neighbour Search (NNS). It supports lookup with both an exact method and an approximate method using Annoy. The exact method compares the query against the vectors pair by pair to find the nearest neighbour, so its processing time grows linearly and it becomes inefficient on a large number of vectors. In that case, the approximate method with Annoy is recommended for faster lookup.
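The exact method described above can be sketched as a brute-force scan. This is only an illustration under my own assumptions: the function name `exact_nearest_neighbours` is hypothetical, and Euclidean distance is used as a stand-in because the distance metric of the service is not stated here.

```python
import heapq
import math

def exact_nearest_neighbours(vectors, query, k):
    """Return (indices, distances) of the k stored vectors closest to query.

    Every stored vector is compared with the query, so the cost grows
    linearly with the number of vectors.
    """
    # Assumption: Euclidean distance stands in for the service's metric.
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    top = heapq.nsmallest(k, scored)
    indices = [i for _, i in top]
    distances = [d for d, _ in top]
    return indices, distances

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
indices, distances = exact_nearest_neighbours(vectors, [0.9, 0.1], k=2)
```

An approximate index such as Annoy avoids visiting every vector by searching a forest of trees instead, trading a little accuracy for speed.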
Training a customisable similarity search requires a text file of vectors, with one comma-separated vector per line, and every vector must have the same number of dimensions. After a successful training job, a deployable artifact is generated that encapsulates the lookup functionality over the training vectors. At inference time, it returns the indices and distances of the top k neighbours for a given input vector.
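As a rough illustration of the input format just described, the sketch below serialises a list of vectors as one comma-separated line per vector and rejects vectors with differing dimensions. The helper name `vectors_to_training_text` is hypothetical, and any further requirements of the training service (file name, number formatting, and so on) are not covered here.

```python
def vectors_to_training_text(vectors):
    """Serialise vectors as one comma-separated line per vector (illustrative)."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dims = len(vectors[0])
    lines = []
    for row, vector in enumerate(vectors):
        # All vectors must share the same number of dimensions.
        if len(vector) != dims:
            raise ValueError(
                f"vector {row} has {len(vector)} dimensions, expected {dims}"
            )
        lines.append(",".join(repr(float(x)) for x in vector))
    return "\n".join(lines) + "\n"

training_text = vectors_to_training_text([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

The resulting string can then be written to a text file with an ordinary `open(..., "w")` call before it is handed to the training job.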
What are the differences between Similarity Scoring and Customisable Similarity Search?
The pretrained similarity scoring service returns a similarity score (0~1.0): the higher the score, the more similar the vectors. The retrained customisable similarity search, by contrast, outputs a distance and an index: the shorter the distance, the closer the neighbour. However, the distance itself is not easy for a human to interpret, so similarity scoring is often used to calculate a human-readable score after the nearest neighbour has been found with the customisable similarity search.
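To illustrate why a raw distance is hard to read, here is a small sketch that maps a distance back to a cosine value. It rests on an assumption that the source does not confirm: that the returned distance is Annoy's angular distance, which Annoy defines as sqrt(2 * (1 - cos(u, v))). The function name is hypothetical, and the result is not guaranteed to match the score returned by the similarity scoring service.

```python
def angular_distance_to_cosine(distance):
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity.

    Assumption: the distance is Annoy's angular distance, which lies in [0, 2].
    """
    if not 0.0 <= distance <= 2.0:
        raise ValueError("angular distance must lie in [0, 2]")
    return 1.0 - (distance * distance) / 2.0

# Distance 0 means the same direction; sqrt(2) means perpendicular vectors.
identical = angular_distance_to_cosine(0.0)
orthogonal = angular_distance_to_cosine(2 ** 0.5)
```

If the metric is different, the safer route is the one described above: take the index of the nearest neighbour and pass the two vectors to the similarity scoring service.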
In the next blog (part 2), I will show you how to retrain a Customisable Similarity Search.