Problem

Handling support tickets in a CRM setup requires the shortest possible turnaround time to achieve the highest possible customer satisfaction. This involves routing and assigning incoming tickets to personnel with the expertise required to address them. However, reading each ticket and identifying its category is a manual and time-consuming process, which ultimately hurts customer satisfaction. What is needed is a system that extracts this information automatically, and a platform that provides insights into these tickets to improve overall operational efficiency.

Challenges

  • Multiple CRM handles, each with a different data format
  • Ticket content varies with each company's domain
  • The keywords/tokens of importance vary from company to company
  • The sentiment expressed in tickets is often not strongly polarised
  • Users tend to paste error logs into their mails, which adds noise to the text and makes entity detection harder

Solution

We have built a natural language processing (NLP) based solution that can be customized, with minimal setup, to handle tickets from any new company. We have also built pipelines to retrain the models on the entities of importance to each company and to tweak them as the vocabulary changes.
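
The text does not say how vocabulary changes are detected. One simple way to decide when the models need tweaking, sketched below under that assumption, is to track the out-of-vocabulary (OOV) rate of incoming tickets against the training vocabulary. The function names, the 0.2 threshold and the min_count of 2 are hypothetical choices, not the deployed pipeline.

```python
from collections import Counter
from typing import List, Sequence, Set


def oov_rate(tokens: Sequence[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens (case-insensitive) missing from the lower-cased training vocabulary."""
    if not tokens:
        return 0.0
    unseen = sum(1 for token in tokens if token.lower() not in vocabulary)
    return unseen / len(tokens)


def frequent_new_terms(tickets: List[List[str]], vocabulary: Set[str], min_count: int = 2) -> List[str]:
    """Out-of-vocabulary terms seen at least `min_count` times across a batch of tickets."""
    counts = Counter(t.lower() for ticket in tickets for t in ticket if t.lower() not in vocabulary)
    return sorted(term for term, count in counts.items() if count >= min_count)


def needs_retraining(tickets: List[List[str]], vocabulary: Set[str], threshold: float = 0.2) -> bool:
    """Flag a retraining run when the batch OOV rate exceeds the (assumed) threshold."""
    all_tokens = [t for ticket in tickets for t in ticket]
    return oov_rate(all_tokens, vocabulary) > threshold


if __name__ == "__main__":
    vocabulary = {"docker", "ucp", "login", "fails", "on", "ubuntu"}  # toy training vocabulary
    batch = [["Kubernetes", "pod", "fails"], ["kubernetes", "ingress", "down"]]
    print(needs_retraining(batch, vocabulary))    # True: 5 of 6 tokens are unseen
    print(frequent_new_terms(batch, vocabulary))  # ['kubernetes']
```

The recurring new terms are natural candidates to review as entities of importance before retraining.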

The following modules were built as part of the pipeline:

Text cleanup

  • Preprocessing is an essential part of any NLP task: the text is cleaned up to bring it into the format required by the information extraction models. This includes normalizing different tenses of words, normalizing synonyms, spell correction, etc.
  • In the context of this challenge, the mails must also be segmented to identify whether a block of text is the header, the signature, or the body of the mail.
  • Spell correction and abbreviation normalization are done using regular expressions.
  • The processed text is then tokenized, i.e. split into a list of words, using popular open-source libraries such as NLTK and spaCy.
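
A minimal, dependency-free sketch of these cleanup steps follows. The header and signature patterns, the abbreviation table and the sample mail are hypothetical, and the regex tokenizer only stands in for NLTK's word_tokenize or a spaCy pipeline, which a production system would use instead.

```python
import re
from typing import Dict, List

# Hypothetical per-company abbreviation table: regex pattern -> expansion.
ABBREVIATIONS = {
    r"\bpls\b": "please",
    r"\bthx\b": "thanks",
    r"\bcant\b": "cannot",
    r"\bpwd\b": "password",
}

# Assumed heuristics for header fields and for the first line of a signature.
HEADER_FIELD = re.compile(r"^(from|to|cc|subject|date|sent):", re.IGNORECASE)
SIGNATURE_START = re.compile(
    r"^(--|(best |kind )?regards,?|thanks,?|thank you,?|sent from my .*)$", re.IGNORECASE
)


def segment_mail(raw: str) -> Dict[str, List[str]]:
    """Split a mail into header, body and signature lines (non-empty lines only)."""
    parts: Dict[str, List[str]] = {"header": [], "body": [], "signature": []}
    section = "header"
    for line in raw.splitlines():
        stripped = line.strip()
        # The header ends at the first line that is not a "Field:" line.
        if section == "header" and not HEADER_FIELD.match(stripped):
            section = "body"
        # Everything from the first sign-off line onward is treated as signature.
        if section == "body" and SIGNATURE_START.match(stripped):
            section = "signature"
        if stripped:
            parts[section].append(stripped)
    return parts


def normalize(text: str) -> str:
    """Lower-case, expand known abbreviations and collapse whitespace."""
    text = text.lower()
    for pattern, expansion in ABBREVIATIONS.items():
        text = re.sub(pattern, expansion, text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Stand-in for nltk.word_tokenize / spaCy: expects lower-cased text, drops punctuation."""
    return re.findall(r"[a-z0-9]+(?:['._-][a-z0-9]+)*", text)


if __name__ == "__main__":
    mail = (
        "From: alice@example.com\n"
        "Subject: UCP login\n"
        "\n"
        "Hi team, pls help. I cant log in to UCP.\n"
        "Thx\n"
        "\n"
        "Regards,\n"
        "Alice"
    )
    parts = segment_mail(mail)
    print(tokenize(normalize(" ".join(parts["body"]))))
    # ['hi', 'team', 'please', 'help', 'i', 'cannot', 'log', 'in', 'to', 'ucp', 'thanks']
```

Cutting the body at the first sign-off line is crude (a body line consisting only of "Thanks," would end the body early), so a learned block classifier may be preferable once labelled mails are available.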

Log Detection

  • For technology companies in particular, users tend to paste error log messages into their mails to explain the problem. These logs need to be separated from the rest of the mail so that support engineers can easily identify the error and provide an appropriate solution.
  • A machine-learning-based classifier is built to classify each sentence in the text as either part of an error log or not.
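
The text does not describe the classifier's features or model. The sketch below assumes line-level classification (logs are line-oriented, so each line stands in for a sentence), a handful of hand-crafted features (timestamps, log levels, stack-frame patterns, symbol density) and a plain logistic regression trained with full-batch gradient descent. The training lines are toy examples, not real ticket data.

```python
import math
import re
from typing import List, Sequence

# Assumed hand-crafted signals that tend to distinguish log lines from prose.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}|\d{2}:\d{2}:\d{2}")
LOG_LEVEL = re.compile(r"\b(?:ERROR|WARN(?:ING)?|INFO|DEBUG|FATAL)\b|Exception|Traceback")
STACK_FRAME = re.compile(r'^\s*at [\w.$]+\(|File "|0x[0-9a-fA-F]+')


def features(line: str) -> List[float]:
    """Feature vector for one line; the first entry is the bias term."""
    symbols = sum(1 for c in line if not c.isalpha() and not c.isspace())
    return [
        1.0,
        1.0 if TIMESTAMP.search(line) else 0.0,
        1.0 if LOG_LEVEL.search(line) else 0.0,
        1.0 if STACK_FRAME.search(line) else 0.0,
        symbols / max(len(line), 1),  # digits and punctuation are dense in logs
    ]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train(lines: Sequence[str], labels: Sequence[int], lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Fit logistic-regression weights by gradient descent on the mean log loss."""
    xs = [features(line) for line in lines]
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, labels):
            err = _sigmoid(_score(w, x)) - y  # d(log loss)/d(score)
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def is_log_line(line: str, w: Sequence[float]) -> bool:
    return _sigmoid(_score(w, features(line))) >= 0.5


# Toy, illustrative examples only.
LOG_EXAMPLES = [
    "2023-04-01 10:15:32 ERROR Failed to connect to ucp-controller:443",
    "java.lang.NullPointerException: null",
    "    at com.example.Auth.login(Auth.java:42)",
    "Traceback (most recent call last):",
    "10:15:33 WARN retrying in 5s",
]
PROSE_EXAMPLES = [
    "Hi team, I am unable to log in to UCP since this morning.",
    "Could you please look into this urgently?",
    "The error below appears every time I click the login button.",
    "Thanks for your help.",
]

if __name__ == "__main__":
    labels = [1] * len(LOG_EXAMPLES) + [0] * len(PROSE_EXAMPLES)
    weights = train(LOG_EXAMPLES + PROSE_EXAMPLES, labels)
    for line in ["Please let me know if you need more details.", "Traceback (most recent call last):"]:
        print(is_log_line(line, weights), "|", line)
```

A real deployment would likely learn from richer text features (for example with scikit-learn), but the decision rule is the same: lines scoring at or above 0.5 would be moved into a separate log section before entity extraction.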

Named Entity Recognition

  • As part of information extraction, identifying the terms of interest in the text helps assign appropriate tags to each ticket.
  • From the above example, we can infer that the ticket is related to Docker's UCP, and the ticket would then be assigned to an engineer working on Ubuntu.
  • This tagging is done using statistical models such as Conditional Random Fields (CRFs), as well as recent deep learning approaches for NLP such as Long Short-Term Memory (LSTM) networks.
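
The text names CRF and LSTM taggers but not their features or label set. The sketch below assumes a BIO tagging scheme with hypothetical labels PRODUCT and OS. token_features builds the per-token feature dicts that CRF toolkits such as sklearn-crfsuite commonly consume (model training is not shown), and bio_to_entities turns a tagger's output into ticket tags. The tag sequence in the example is hand-written to stand in for model predictions.

```python
from typing import Dict, List, Optional, Tuple


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Hand-crafted features for token i, one dict per token as CRF toolkits typically expect."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Context window of one token on each side.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags from a sequence labeller into (label, text) spans."""
    entities: List[Tuple[str, str]] = []
    label: Optional[str] = None
    span: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            span.append(token)  # continue the current entity
            continue
        if label is not None:
            entities.append((label, " ".join(span)))  # close the previous entity
        if tag == "O":
            label, span = None, []
        else:
            # "B-X", or a stray "I-X" with no matching open entity, starts a new span.
            label, span = tag[2:], [token]
    if label is not None:
        entities.append((label, " ".join(span)))
    return entities


def ticket_tags(entities: List[Tuple[str, str]]) -> List[str]:
    """De-duplicated "LABEL:value" tags that a routing step could match on."""
    return sorted({f"{label}:{text.lower()}" for label, text in entities})


if __name__ == "__main__":
    tokens = ["Docker", "UCP", "login", "fails", "on", "Ubuntu", "16.04"]
    tags = ["B-PRODUCT", "I-PRODUCT", "O", "O", "O", "B-OS", "I-OS"]  # stand-in predictions
    entities = bio_to_entities(tokens, tags)
    print(entities)               # [('PRODUCT', 'Docker UCP'), ('OS', 'Ubuntu 16.04')]
    print(ticket_tags(entities))  # ['OS:ubuntu 16.04', 'PRODUCT:docker ucp']
```

An LSTM tagger would replace the hand-crafted dicts with learned embeddings, but it emits the same kind of BIO sequence, so the span-grouping step stays unchanged.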

Sentiment grading

  • Issues raised by users with support teams vary in severity, and the most dissatisfied customers should be addressed first and with the utmost priority to keep overall customer satisfaction intact.
  • It is therefore important to identify the degree of sentiment expressed in a mail and to assign the ticket to appropriately experienced personnel.
  • The classification model built for this purpose outputs a score for the sentiment expressed. In addition to the score from the machine learning model, it takes into account the context of words in the text, the presence of expletives, and a few other rules.
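
The text does not list the rules or how they are combined with the model score. The sketch below assumes the machine learning model (stubbed here as the model_score argument) returns a value in [-1, 1] where negative means dissatisfied. It adjusts that value with three illustrative rules: a negation window for context ("not happy"), an expletive count, and an escalation-phrase check standing in for the "other rules". Every lexicon, weight and grade threshold here is hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative lexicons and weights; all are assumptions, not the deployed rule set.
EXPLETIVES = {"damn", "crap", "wtf"}
NEGATIONS = {"not", "never", "no", "cannot"}
POSITIVE_WORDS = {"happy", "good", "great", "works", "satisfied"}
ESCALATION_PHRASES = ("third time", "unacceptable", "cancel my subscription")
WEIGHTS = {"expletive": 0.2, "negated_positive": 0.15, "escalation": 0.25}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def count_negated_positives(tokens: List[str], window: int = 2) -> int:
    """Positive words preceded by a negation within `window` tokens, e.g. "not happy"."""
    return sum(
        1
        for i, token in enumerate(tokens)
        if token in POSITIVE_WORDS and any(t in NEGATIONS for t in tokens[max(0, i - window):i])
    )


def grade_sentiment(text: str, model_score: float) -> Tuple[float, str]:
    """Adjust a model score in [-1, 1] with rule-based penalties, then map it to a grade."""
    tokens = tokenize(text)
    score = model_score
    score -= WEIGHTS["expletive"] * sum(1 for t in tokens if t in EXPLETIVES)
    score -= WEIGHTS["negated_positive"] * count_negated_positives(tokens)
    if any(phrase in text.lower() for phrase in ESCALATION_PHRASES):
        score -= WEIGHTS["escalation"]
    score = max(-1.0, min(1.0, score))  # keep the combined score in the model's range
    if score <= -0.5:
        grade = "high"    # most dissatisfied: handle first, by experienced staff
    elif score <= 0.0:
        grade = "medium"
    else:
        grade = "low"
    return score, grade


if __name__ == "__main__":
    text = "This is the third time UCP is down, damn. I am not happy with this."
    print(grade_sentiment(text, model_score=-0.1))  # score of about -0.7, grade 'high'
```

Here a mildly negative model score (-0.1) becomes a high-priority ticket once the expletive, the negated "happy" and the escalation phrase are counted, which illustrates why rules can help when sentiment in tickets is not strongly polarised.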

Business Impact

This pipeline has been customized for and deployed at multiple technology companies, and their efficiency in serving customers has improved drastically.