Retrain a Text Classification with Postman and SAP Leonardo Machine Learning Foundation on SAP Cloud Platform Trial – Sentiment Analysis
In my previous blog post about image retraining, a question came up about the data structure for text retraining. I answered it there, but I think it also makes sense to write a blog post about it, so I decided to write a small series with at least one more post.
For the text retraining I will use Twitter Sentiment Analysis data, which labels sentences as positive or negative. SAP Leonardo trains the machine learning model, which can then be deployed and used by the text classification service. Here is an example:
Most of the procedure is similar to the image retraining, so I will refer to that post and describe only the differences in more detail.
What do you need?
- A trial account on SAP Cloud Platform.
- Postman
- TEXT Retraining Postman Collection
- Minio
- Twitter Sentiment Analysis Training Corpus (Dataset)
- A script to create the training data: SAP Leonardo Machine Learning Twitter Sentiment Analysis Dataset
Step 1 – SAP Leonardo Machine Learning instance
Take a look at Step 1 of the Image Retraining to create an SCP trial account and a Service Key:
{
  "clientid": "sb-42mn3-3z7p-96r-3c79-0x1pm0l!p216|klgco-lag-vitas!d66",
  …
  "clientsecret": "5ghsdYM/Z567N5LoQ7nrXBkZ0BV=",
  "serviceurls": {
    …
    "TEXT_LINEAR_RETRAIN_API_URL": "https://mlftrial-retrain-text-linear-api.cfapps.eu10.hana.ondemand.com/api/v2/text/retraining",
    …
    "TEXT_CLASSIFIER_URL": "https://mlftrial-text-classifier.cfapps.eu10.hana.ondemand.com/api/v2/text/classification",
    …
  },
  "url": "https://p2000894545trial.authentication.eu10.hana.ondemand.com"
}
Note: For text retraining, the relevant entries are TEXT_LINEAR_RETRAIN_API_URL and TEXT_CLASSIFIER_URL, not IMAGE_RETRAIN_API_URL and IMAGE_CLASSIFICATION_URL.
Don’t mess it up like I did. :confounded face:
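The Postman requests in the following steps need a bearer token derived from this service key. Here is a minimal Python sketch of how such a token request could be built, assuming the usual OAuth client-credentials flow against the "url" entry. The /oauth/token path and the helper name build_token_request are my assumptions and are not taken from the Postman collection. The sketch only builds the request and does not send it.

```python
import base64
import urllib.request


def build_token_request(service_key: dict) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumption: the token endpoint is <url>/oauth/token.
    token_url = (
        service_key["url"].rstrip("/")
        + "/oauth/token?grant_type=client_credentials"
    )
    # clientid and clientsecret are sent as HTTP Basic credentials.
    credentials = f'{service_key["clientid"]}:{service_key["clientsecret"]}'
    basic = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        token_url,
        headers={"Authorization": f"Basic {basic}"},
        method="GET",
    )


# Placeholder values; replace them with the entries of your own service key.
example_key = {
    "clientid": "my-client-id",
    "clientsecret": "my-client-secret",
    "url": "https://example.authentication.eu10.hana.ondemand.com",
}
request = build_token_request(example_key)
print(request.full_url)
```

If the assumption holds, sending the request with urllib.request.urlopen(request) should return a JSON document whose access_token value goes into the Authorization header as "Bearer <access_token>".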
Step 2 – Storage
Same procedure as Step 2 of the image retraining.
If you already have a storage, just run the POST again to get the endpoint, accessKey and secretKey.
Step 3 – Training data
The following steps are necessary to create the training data:
- download Twitter Sentiment Analysis Training Corpus (Dataset)
- run the script (needs Node.js): SAP Leonardo Machine Learning Twitter Sentiment Analysis Dataset
The script creates a sentiment_100.zip file with the following structure (see also SAP Help – Uploading Data):
sentiment
├── test
│   ├── negative
│   └── positive
├── training
│   ├── negative
│   └── positive
└── validation
    ├── negative
    └── positive
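To make the folder layout more concrete, here is a small Python sketch that writes labeled sentences into the same split/label structure and zips it. It is not the Node.js script mentioned above. The helper name write_dataset, the numbered file names and the choice of one .txt file per sentence are assumptions for illustration, so check SAP Help – Uploading Data for the exact file format the service expects.

```python
import tempfile
import zipfile
from pathlib import Path

SPLITS = ("training", "validation", "test")


def write_dataset(samples, root: Path, name: str = "sentiment") -> Path:
    """Write (split, label, text) samples to <name>/<split>/<label>/ and zip them."""
    dataset_dir = root / name
    counters = {}
    for split, label, text in samples:
        if split not in SPLITS:
            raise ValueError(f"unknown split: {split}")
        target = dataset_dir / split / label
        target.mkdir(parents=True, exist_ok=True)
        index = counters.get((split, label), 0)
        counters[(split, label)] = index + 1
        # Assumption: one UTF-8 text file per sentence, numbered per folder.
        (target / f"{index}.txt").write_text(text, encoding="utf-8")
    zip_path = root / f"{name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(dataset_dir.rglob("*.txt")):
            # Store paths relative to root so the zip starts with <name>/.
            archive.write(file, file.relative_to(root).as_posix())
    return zip_path


# Tiny made-up sample set, one sentence per split and label.
samples = [
    ("training", "positive", "I love this"),
    ("training", "negative", "I hate this"),
    ("validation", "positive", "so good"),
    ("validation", "negative", "so bad"),
    ("test", "positive", "great day"),
    ("test", "negative", "awful day"),
]
with tempfile.TemporaryDirectory() as tmp:
    archive_path = write_dataset(samples, Path(tmp))
    with zipfile.ZipFile(archive_path) as archive:
        names = sorted(archive.namelist())
print(names)
```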
Step 4 – Upload
Take a look at Step 4 of the image retraining and run this to upload the sentiment data set:
mc cp sentiment_100.zip saps3/data/sentiment
Step 5 – Training
After uploading the training data, start the training with Postman.
Training
URL: TEXT_LINEAR_RETRAIN_API_URL/jobs
Doc: https://api.sap.com/api/text_linear_retrain_api/resource

POST https://mlftrial-retrain-text-linear-api.cfapps.eu10.hana.ondemand.com/api/v2/text/retraining/jobs

Headers:
Authorization: {{Bearer Token}}
Content-Type: application/json

Body:
{
  "dataset": "sentiment",
  "modelName": "sentiment",
  "preprocessingLanguage": "en",
  "completionTime": 24,
  "memory": 8192
}

Result:
{
  "id": "sentiment-2018-12-01t2235z745432"
}
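If you prefer a script over Postman, the same training request could be put together roughly like this. This is only a sketch: the URL, headers and body values are copied from the request above, while the function name build_training_request and the "Bearer <token>" header format are my assumptions. The request is built but not sent.

```python
import json
import urllib.request

# Value of TEXT_LINEAR_RETRAIN_API_URL from the service key.
RETRAIN_URL = (
    "https://mlftrial-retrain-text-linear-api.cfapps.eu10.hana.ondemand.com"
    "/api/v2/text/retraining"
)


def build_training_request(token: str, dataset: str, model_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a retraining job."""
    body = {
        "dataset": dataset,
        "modelName": model_name,
        "preprocessingLanguage": "en",
        "completionTime": 24,
        "memory": 8192,
    }
    return urllib.request.Request(
        RETRAIN_URL + "/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the Postman variable expands to "Bearer <token>".
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


training_request = build_training_request("<access token>", "sentiment", "sentiment")
print(training_request.get_method(), training_request.full_url)
```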
Jobs
The job has finished successfully when its status is SUCCEEDED.
URL: TEXT_LINEAR_RETRAIN_API_URL/jobs
Doc: https://api.sap.com/api/text_linear_retrain_api/resource

GET https://mlftrial-retrain-text-linear-api.cfapps.eu10.hana.ondemand.com/api/v2/text/retraining/jobs

Result:
{
  "finishTime": "2018-12-01T23:31:02+00:00",
  "message": "",
  "startTime": "2018-12-01T22:35:53+00:00",
  "id": "sentiment-2018-12-01t2235z745432",
  "status": "SUCCEEDED",
  "submissionTime": "2018-12-01T22:35:51+00:00"
}
This took nearly one hour, but I used over 300,000 sentiments (sentiment_5) for the training.
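The status check and the training time can also be read from the job entry with a few lines of Python. The job dictionary below is copied from the response above. The helper names is_succeeded and training_minutes are made up for this sketch.

```python
from datetime import datetime

# Job entry as returned by the GET request above.
job = {
    "finishTime": "2018-12-01T23:31:02+00:00",
    "message": "",
    "startTime": "2018-12-01T22:35:53+00:00",
    "id": "sentiment-2018-12-01t2235z745432",
    "status": "SUCCEEDED",
    "submissionTime": "2018-12-01T22:35:51+00:00",
}


def is_succeeded(job: dict) -> bool:
    """Return True once the job reports the status SUCCEEDED."""
    return job.get("status") == "SUCCEEDED"


def training_minutes(job: dict) -> float:
    """Return the minutes between startTime and finishTime."""
    start = datetime.fromisoformat(job["startTime"])
    finish = datetime.fromisoformat(job["finishTime"])
    return (finish - start).total_seconds() / 60


print(is_succeeded(job), training_minutes(job))
```

For this job the difference between startTime and finishTime is 3309 seconds, so about 55 minutes.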
Logs
Whether the job fails or succeeds, you can download the job logs:
mc cp --recursive saps3/data/<JOB ID>/ logs
example:
mc cp --recursive saps3/data/sentiment-2018-12-01t2235z745432/ logs
Step 6 – Deploy
The model must be deployed after a successful training.
Deploy Model
URL: TEXT_LINEAR_RETRAIN_API_URL/deployments
Doc: https://api.sap.com/api/text_linear_retrain_api/resource

POST https://mlftrial-retrain-text-linear-api.cfapps.eu10.hana.ondemand.com/api/v2/text/retraining/deployments

Headers:
Authorization: {{Bearer Token}}
Content-Type: application/json

Body:
{
  "modelName": "sentiment",
  "modelVersion": "1"
}

Result:
{
  "id": "f6b34f68-6bf0-4fe8-98f5-9f9a4310a9b8"
}

After some time the model is available for text classification.
Step 7 – Test
For my first test I’ve used this tweet from Witalij Rudnicki:
https://twitter.com/Sygyzmundovych/status/1061608300490440704
Text Classification
URL: TEXT_CLASSIFIER_URL/models/{model}/versions/{version}
Doc: Inference Service for Customizable Text Classification

POST https://mlftrial-text-classifier.cfapps.eu10.hana.ondemand.com/api/v2/text/classification/models/sentiment/versions/1

Headers:
Authorization: {{Bearer Token}}
Content-Type: application/json

Body:
texts=Starting sampling of a next batch of dark beers. This one has nice velvety taste, but way too sweet. ? - Drinking a Świderskie by Cerkom @ Oporów —
Here is the result for this tweet, which is classified as positive with a score of about 88.5%:
{
  "id": "6288fb40-1671-4c09-7cec-0baa12950d82",
  "predictions": [
    {
      "results": [
        {
          "label": "positive",
          "score": 0.8846777437444767
        },
        {
          "label": "negative",
          "score": 0.11532225625552328
        }
      ]
    }
  ],
  "processedTime": "2018-12-02T14:24:14.242227+00:00",
  "status": "DONE"
}
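To pick the winning label out of such a response in a script, a few lines of Python are enough. The response dictionary is copied from the result above. The helper name top_prediction is made up for this sketch.

```python
# Classification response as returned by the service above.
response = {
    "id": "6288fb40-1671-4c09-7cec-0baa12950d82",
    "predictions": [
        {
            "results": [
                {"label": "positive", "score": 0.8846777437444767},
                {"label": "negative", "score": 0.11532225625552328},
            ]
        }
    ],
    "processedTime": "2018-12-02T14:24:14.242227+00:00",
    "status": "DONE",
}


def top_prediction(response: dict, index: int = 0):
    """Return (label, score) of the highest-scoring result for one input text."""
    results = response["predictions"][index]["results"]
    best = max(results, key=lambda result: result["score"])
    return best["label"], best["score"]


label, score = top_prediction(response)
print(f"{label}: {score:.1%}")  # prints "positive: 88.5%"
```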
I don’t want to end this blog post with a negative sentence, but you can find one in my Postman collection.
Have fun :goofy face: