Balancing between model performance and automation...

former_member16511 · ‎08-05-2020

Very few real-world datasets have the quality needed to train a Machine Learning (ML) model that can accurately classify hundreds of categories at a granular level. In this blog post, find out how you can use Service Ticket Intelligence's model performance simulation feature to find the right balance between category granularity and model performance, so that your business still benefits from automated ticket triage.

Real world ticketing data

In most scenarios, the distribution of tickets across categories tends to be imbalanced, with dominant categories taking up 70-80% of the dataset.

The typical result of training a classification model on real-world service tickets is that dominant categories have better prediction accuracy. In contrast, the model almost always misclassifies minority classes as one of the dominant categories, resulting in false positives for those dominant categories.

Category granularity for business to benefit from automation

If we can get ML to predict and classify the dominant categories well enough, it would already have made a significant impact on automating business workflows. As such, the decision lies in balancing between category granularity and model performance, such that the business still benefits from automation in triaging tickets.

Deciding on confidence thresholds

To help decide which confidence threshold to set, Service Ticket Intelligence introduced a model performance simulation feature in its 2007 release. The feature uses a subset of the training data to simulate the prediction accuracy and automation rate at various confidence thresholds.

Execute GET /model/accuracy with confidence threshold query

Results of the simulation can be retrieved by making a call to Service Ticket Intelligence's GET /model/accuracy endpoint and by providing a confidence threshold query option from 0.0 to 1.0 as follows.

For a given confidence threshold, the response includes details such as accuracy, precision, recall, f1 scores, the probability of exceeding the threshold (automation rate), the confusion matrix, etc.
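As a rough illustration of the call described above, the sketch below only builds the request URL. The host name and the name of the query option (`threshold`) are placeholders assumed for this example, not values taken from the Service Ticket Intelligence documentation, so check the API reference for the real ones.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical values: the real service URL and the exact name of the
# query option are not given in this post.
BASE_URL = "https://sti.example.com/"
THRESHOLD_PARAM = "threshold"


def build_accuracy_url(threshold: float, base_url: str = BASE_URL) -> str:
    """Return a GET /model/accuracy URL for the given confidence threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("confidence threshold must be between 0.0 and 1.0")
    query = urlencode({THRESHOLD_PARAM: threshold})
    return urljoin(base_url, "model/accuracy") + "?" + query


print(build_accuracy_url(0.7))
```

The URL can then be passed to whichever HTTP client and authentication scheme your landscape uses.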

Given results from the simulation, decide on the following:

Which categories are important for ML to pick out accurately, at the expense of having some false positives? (prioritize recall rate over precision)

Which categories are important for ML to be very precise, at the expense of not being able to always recall these tickets? (prioritize precision rate over recall)

Which categories need a balance of the two? The f1 score is the harmonic mean of precision and recall and represents a balanced score between them. (balancing between recalling categories and being precise)

Based on the outcome of the above questions, identify a confidence threshold value that aligns with a satisfactory model performance and automation rate.
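To make the three metrics concrete, here is a minimal sketch that derives them for one category from its confusion matrix counts. The function name and the counts are made up for illustration; the service returns these scores for you.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute (precision, recall, f1) for a single category.

    tp: tickets correctly predicted as this category
    fp: tickets wrongly predicted as this category
    fn: tickets of this category predicted as something else
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: a precise category that misses many tickets.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=80)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A category with high precision but low recall, as in this example, is a candidate for the second question above. The opposite pattern fits the first.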

Example

In the example below, we used 4000 test tickets to generate the confusion matrix with no threshold and with a threshold setting of 0.7. Without a threshold setting, the overall accuracy of the model is 64%. With a threshold setting of 0.7, prediction accuracy increases to 83%, with a trade-off in automation rate: only 54% of tickets would have automated predictions.

No confidence threshold setting

64% accuracy

100% automation rate

0.7 confidence threshold setting

83% accuracy

54% automation rate
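The trade-off between the two numbers can be reproduced on a small scale. The sketch below is not how Service Ticket Intelligence implements its simulation. It assumes accuracy is measured only over predictions whose confidence meets the threshold, and automation rate is the share of all tickets that meet it. The eight records are invented for illustration.

```python
from typing import List, Tuple

# Each record: (confidence score, predicted category, actual category).
Record = Tuple[float, str, str]


def simulate(records: List[Record], threshold: float) -> Tuple[float, float]:
    """Return (accuracy, automation rate) at the given confidence threshold."""
    accepted = [r for r in records if r[0] >= threshold]
    if not records or not accepted:
        return 0.0, 0.0
    correct = sum(1 for _, predicted, actual in accepted if predicted == actual)
    return correct / len(accepted), len(accepted) / len(records)


# Invented sample data, not taken from the 4000-ticket example.
records = [
    (0.95, "billing", "billing"),
    (0.85, "billing", "billing"),
    (0.75, "login", "login"),
    (0.72, "billing", "login"),
    (0.55, "billing", "shipping"),
    (0.40, "login", "billing"),
    (0.35, "login", "login"),
    (0.30, "billing", "shipping"),
]

for threshold in (0.0, 0.7):
    accuracy, automation = simulate(records, threshold)
    print(f"threshold={threshold}: accuracy={accuracy:.0%}, automation={automation:.0%}")
```

On this toy data, raising the threshold from 0.0 to 0.7 lifts accuracy from 50% to 75% while automation drops from 100% to 50%, the same pattern as in the example above.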

Making use of the identified confidence thresholds during predictions

Service Ticket Intelligence provides a confidence score (the probability that the predicted category is correct) with each category prediction. An application/client retrieving predictions can use this confidence score to accept or reject each prediction against the confidence threshold it has adopted.

Using the example below, if the decision has been made to adopt a confidence threshold of 0.5, the application will accept any prediction returned by Service Ticket Intelligence with a confidence score >= 0.5 and populate the category field automatically. Predictions with a confidence score < 0.5 will be rejected and the category field will not be populated.

In the following C/4 Service Cloud example, this setting would ensure that predictions would be populated into the ticket service category field only when the confidence score is higher than or equal to 0.6.
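On the client side, the accept-or-reject rule comes down to a single comparison. The sketch below uses a hypothetical helper name and hypothetical category values. It is not part of the Service Ticket Intelligence or C/4 Service Cloud APIs.

```python
from typing import Optional


def category_to_populate(predicted: str, confidence: float,
                         threshold: float) -> Optional[str]:
    """Return the category to write to the ticket, or None to leave it blank.

    A prediction is accepted when its confidence score is higher than or
    equal to the adopted threshold.
    """
    return predicted if confidence >= threshold else None


print(category_to_populate("billing", 0.62, threshold=0.6))  # accepted
print(category_to_populate("billing", 0.59, threshold=0.6))  # rejected
```

Returning None rather than a default category keeps low-confidence tickets visibly unclassified, so they can be routed to manual triage.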

Conclusion

Classifying categories at the finest level of granularity does not always yield the best business outcome. You need to strike a balance between model performance and level of automation to get the maximum business benefit from Service Ticket Intelligence.

Service Ticket Intelligence is also available on SAP Cloud Platform as a free trial. Find out how you can set up your own trial account with Service Ticket Intelligence here and give the model performance simulation feature a try today!

Additional details on usage

Get Model Accuracy input parameters

Get Model Accuracy object

Get Model Accuracy validation results