Automated service ticket categorization with machine learning: Lessons learned from real-world ticket datasets
Service Ticket Intelligence first debuted at SAPPHIRE 2017 in Hasso Plattner’s keynote. Through working with our first customers over the last 3 years, we have learned a great deal about the practical challenges in getting machine learning classification to work on real-world ticket datasets.
In this blog post, we would like to share the top 3 lessons learned when working with real-world service ticket data. Service Ticket Intelligence is an SAP Cloud Platform AI Business Service that is available out of the box for SAP Service Cloud enterprise customers.
If you would like to try out Service Ticket Intelligence on your datasets, this blog should help you with getting started on your trial account.
Lesson #1: Real-world ticket datasets have categorical imbalances
In all the cases we have seen, real-world ticket datasets have categorical imbalances. Perhaps this is related to how human brains categorize and organize information since ticket catalog structures are also set up by humans to organize the incoming streams of customer support tickets.
In the most straightforward setup, one customer had a flat categorization structure with 16 different categories. In most cases, customers organize their tickets into 50-300 different categories, mostly with the aim of ticket triage and reporting. Regardless of the catalog setup, we have seen that the ticket distribution across categories tends to be imbalanced, with a few dominant categories accounting for 70-80% of the dataset. For that reason, one of the charts we always use at the start of ticket data analysis is the ticket training data distribution, plotted together with the cumulative percentage.
Figure 1 Training data distribution count with a cumulative percentage
In the above example, the top 6 categories already account for almost 85% of the tickets. If the machine learning algorithm can classify just these 6 categories well enough, it would already have a significant impact on automatic ticket categorization, and thus on the routing of tickets to the right teams.
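A distribution like the one in Figure 1 can be computed in a few lines before any training. The snippet below is a minimal sketch in plain Python; the function names and the sample labels are made up for illustration and are not part of Service Ticket Intelligence.

```python
from collections import Counter


def category_distribution(labels):
    """Return (category, count, cumulative_percent) rows, largest category first."""
    counts = Counter(labels)
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100.0 * running / total, 1)))
    return rows


def categories_covering(rows, target_percent):
    """Smallest number of top categories whose cumulative share reaches target_percent."""
    for index, (_, _, cumulative) in enumerate(rows, start=1):
        if cumulative >= target_percent:
            return index
    return len(rows)


# Made-up labels for illustration only; use the category column of your own tickets.
labels = ["billing"] * 50 + ["login"] * 30 + ["shipping"] * 15 + ["other"] * 5
rows = category_distribution(labels)
for category, count, cumulative in rows:
    print(f"{category:10s} {count:4d} {cumulative:6.1f}%")
print("Categories needed for 80% coverage:", categories_covering(rows, 80))
```

Running this on your own category column shows how many categories the model needs to handle well to cover most of the ticket volume.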
Also read: Not enough training data to classify support tickets? Try classifying tickets by languages or sentiment.
Lesson #2: Use automation thresholds to improve prediction accuracies
Given that real-world training data tends to have categorical imbalances, the typical result after training a classification model on real-world service tickets is that dominant classes tend to have better prediction accuracies. In contrast, the model almost always misclassifies minority classes as one of the dominant categories, resulting in false positives in the dominant categories.
So how do we overcome this issue?
The Service Ticket Intelligence classification service provides a confidence score with every custom category prediction. This confidence score is the probability that the predicted category (amongst all the other possible choices) is the correct category. An application consuming the API (e.g., SAP Service Cloud or S/4HANA Customer Management) can use this confidence score to set automation thresholds.
Figure 2 Screenshot of threshold setting in Service Cloud Administrator > Prediction Services
In the above example, this setting would ensure that predictions would be populated into the ticket service category field only when the confidence score is higher than or equal to 0.7.
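For custom-built applications that consume the API directly, the same rule can be applied in code. The sketch below assumes each prediction has already been reduced to a dictionary with `category` and `confidence` keys; that shape is a simplification for illustration, and the actual API response format may differ.

```python
def category_to_populate(prediction, threshold=0.7):
    """Return the category to write into the ticket, or None to leave it for an agent."""
    if prediction["confidence"] >= threshold:
        return prediction["category"]
    return None


# Hypothetical predictions; the real API response format may differ.
predictions = [
    {"ticket_id": 1, "category": "billing", "confidence": 0.91},
    {"ticket_id": 2, "category": "login", "confidence": 0.70},
    {"ticket_id": 3, "category": "billing", "confidence": 0.42},
]
for prediction in predictions:
    print(prediction["ticket_id"], category_to_populate(prediction))
```

Ticket 2 is still populated because the comparison is "higher than or equal to", matching the setting described above. Ticket 3 is left empty for manual categorization.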
But how would a user decide what threshold to set?
In the example below, we used 4000 test tickets to generate the confusion matrices at no threshold and at a 0.7 threshold setting. Without a threshold setting, the overall accuracy of the model is 64%. With a threshold setting of 0.7, prediction accuracy increases to 83%, at the cost of a lower automation rate: only 54% of tickets would have automated predictions.
Figure 3 Model accuracy 64%, Automation rate 100% (no threshold setting)
Figure 4 Model accuracy 83%, Automation rate 54% (0.7 threshold setting)
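To explore this trade-off on your own test set, both numbers can be computed for a range of candidate thresholds. The following is a minimal sketch. It assumes that accuracy is measured only over the tickets whose confidence clears the threshold, and that test results are available as (true category, predicted category, confidence) tuples. The sample records are made up.

```python
def evaluate_threshold(records, threshold):
    """Return (accuracy, automation_rate) for one threshold.

    records: list of (true_label, predicted_label, confidence) tuples.
    Accuracy is computed over automated tickets only (confidence >= threshold)
    and is None when no ticket clears the threshold.
    """
    if not records:
        return None, 0.0
    automated = [(true, pred) for true, pred, conf in records if conf >= threshold]
    automation_rate = len(automated) / len(records)
    if not automated:
        return None, automation_rate
    correct = sum(1 for true, pred in automated if true == pred)
    return correct / len(automated), automation_rate


# Made-up test results for illustration only.
records = [
    ("billing", "billing", 0.90),
    ("billing", "billing", 0.80),
    ("login", "billing", 0.60),
    ("login", "login", 0.75),
    ("shipping", "billing", 0.40),
]
for threshold in (0.0, 0.5, 0.7, 0.9):
    accuracy, rate = evaluate_threshold(records, threshold)
    print(threshold, accuracy, rate)
```

A table of these pairs gives the business a concrete basis for choosing a threshold, since it shows how much automation is given up for each gain in accuracy.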
At this point, it is often up to the business to decide whether the remaining errors are acceptable based on the context of the categories. In the future, it would be interesting to see whether automated suggestions could be made to the business based on past decisions or other boundary conditions.
Lesson #3: Real-world data is messy
One of the most frequently asked questions is, “How should I pre-process my data to get better model accuracies?” We always ask: “Would you pre-process your inference data in the same way when making classification calls?”
Some customers also choose to overcome the categorical imbalance problem by dropping minority classes from their training data. Similarly, we would ask: “Would you be okay with false positives in your majority classes, or that the model would never predict a minority class?”
The reality is that customer support tickets in an omnichannel call center tend to be messy. It is a mix of emails (with headers and footers), manually created tickets (short text with little description), webforms (looking like emails but with field labels and descriptions), and even social media messages (short text with acronyms and emojis).
The problem with investing too much in data pre-processing (to achieve high model training accuracy for a particular training dataset) is that the resulting model may not perform well on tickets that arrive through other channels or in a different format.
In cases where customers integrate the API directly into custom-built applications, and where the use case involves tickets that come from a single channel, it makes sense to invest in data pre-processing, provided it is applied in the same way to training and inference data. For example:
- For emails, identifying actual email text body by removing headers and footers
- For webforms, identifying actual long text description from a text body
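As an illustration of the email case, the sketch below strips header lines and everything from the sign-off onward, and routes both training and inference text through the same function. The header and sign-off patterns are assumptions chosen for this example. Real email formats vary widely, and a body line that happens to start with one of these patterns would be dropped too.

```python
import re

# Assumed patterns for illustration; adapt them to your own email formats.
HEADER_LINE = re.compile(r"^(from|to|cc|sent|date|subject):", re.IGNORECASE)
FOOTER_START = re.compile(r"^(--\s*$|best regards|kind regards|regards)", re.IGNORECASE)


def extract_email_body(raw_text):
    """Drop header lines and cut the text at the first sign-off or signature marker."""
    body_lines = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        if HEADER_LINE.match(stripped):
            continue  # skip header fields such as "From:" or "Subject:"
        if FOOTER_START.match(stripped):
            break  # ignore the sign-off and everything after it
        if stripped:
            body_lines.append(stripped)
    return " ".join(body_lines)


raw_email = (
    "From: alex@example.com\n"
    "Subject: Printer broken\n"
    "\n"
    "The printer on floor 2 does not start.\n"
    "Please help.\n"
    "\n"
    "Best regards,\n"
    "Alex\n"
)

# Use the same function for training data and for text sent to the classifier.
training_texts = [extract_email_body(raw_email)]
inference_text = extract_email_body(raw_email)
print(inference_text)
```

Keeping the cleaning logic in one shared function is the simplest way to make sure the model sees the same kind of text at inference time as it did during training.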
We hope that sharing these lessons learned helps to demystify machine learning for customers who are interested in using this technology to categorise service tickets. Service Cloud Enterprise customers can already start exploring the other machine learning scenarios besides ticket categorisation.
Service Ticket Intelligence can also be consumed directly as a business service on the SAP Cloud Platform via the consumption model.
For on-premise customers, check out our CRM Integration Guide and this blog post with an overview of the solution setup.
Very informative blog. I just wanted to know whether the Service Ticket Intelligence service has any kind of "incremental learning" feature. If yes, can you provide some information about it?
Hi Samarth, you can refer to our documentation where there's information on incremental training for both the classification and recommendation API.