Intelligent automation & its machine learning algorithms
The goal of every intelligent automation use case is ultimately to automate certain business decisions, such as paying and/or booking an invoice, settling an insurance claim, or validating a credit request.
As introduced in an earlier article, there are 3 main components of intelligent automation:
- Read – Extracting data from documents
- Reason – Defining the next steps to process these documents (decisions such as: pay the invoice, forward request to correct department, flag fraud, etc.)
- Rely on – Trusting the AI-based decisions to automate the document handling process and free up time for human operators
The first component, ‘Read’, was covered in another article, which concluded with the advantages of AI-based data capture (or AI-based OCR) over traditional techniques.
The reasoning part boils down to interpreting information (aka “reading between the lines”) and connecting the dots in order to make a decision.
Achieving these business goals takes various types of reasoning, which in turn calls for a diverse range of AI techniques.
In what follows, we elaborate on the typical business challenges we address and some of the AI techniques that underpin them.
Splitting and classification
To achieve true intelligent automation, you need a solution that does more than just capture data from documents. That’s where classification and splitting come in.
Classification technology recognizes different types of documents. This is useful for workflows where multiple document types have to be processed.
You can also use this kind of technology to check whether your users uploaded the correct document (e.g. did they upload a copy of their ID card, or did they upload a driver’s license instead?).
Frequently asked questions about this business challenge:
- Can an email and its various attachments be separated and classified?
- Can you split a large pile of invoices scanned into a single file? Can you detect credit notes or credit card statements?
- Which documents from a larger bundle contain relevant information and which ones should we ignore?
- When we ask a customer to upload a document like a salary slip or ID card, can you check whether the document uploaded is indeed what we expect?
- Can you distinguish images that contain receipts versus images that merely contain an email footer?
- Is it possible to detect and split multiple expense notes that have been collated on one page?
Technology – typical components used to solve these questions:
Splitting and classification cover a broad range of business challenges, and we apply an equally broad range of techniques to address them. These range from straightforward, imperative interpretation of the structure of an email (“various attachments”), through careful feature engineering that feeds decision trees (“split invoices”), to transfer learning for object detection using Region-based Convolutional Neural Networks (“split collated expense notes”).
Our expertise lies in the combination of our pre-trained models, the problem-specific feature engineering, and our ability to select and apply the right machine learning technique for the task at hand.
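As an illustration of the simplest end of that range, the imperative handling of an email and its attachments can be sketched with Python's standard `email` package. This is a minimal sketch and not Contract.fit's implementation: the function names `split_email` and `build_example`, the example addresses, and the dummy PDF payload are all assumptions made for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_email(raw: bytes) -> dict:
    """Separate a raw email into its subject, body text and attachments."""
    msg = message_from_bytes(raw, policy=policy.default)
    # Prefer the plain-text body; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    attachments = [
        {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "payload": part.get_payload(decode=True),
        }
        for part in msg.iter_attachments()
    ]
    return {"subject": str(msg["subject"]), "body": body, "attachments": attachments}


def build_example() -> bytes:
    """Build a hypothetical email with one PDF attachment, for demonstration only."""
    msg = EmailMessage()
    msg["Subject"] = "Invoice March"
    msg["From"] = "client@example.com"
    msg["To"] = "billing@example.com"
    msg.set_content("Please find the invoice attached.")
    msg.add_attachment(
        b"%PDF-1.4 dummy",
        maintype="application",
        subtype="pdf",
        filename="invoice.pdf",
    )
    return msg.as_bytes()


if __name__ == "__main__":
    result = split_email(build_example())
    for attachment in result["attachments"]:
        print(attachment["filename"], attachment["content_type"])
```

Each separated attachment could then be handed to a document classifier; the declared content type alone says nothing about whether the PDF is an invoice or a credit note.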
Visual content analysis
Our machine learning algorithms also understand the visual context of a document, so they can determine, for example, whether or not a contract has been signed. But that’s not all! Below, you will find some of the most frequently asked questions we get from (potential) customers.
Frequently asked questions about this business challenge:
- Is it possible to detect whether a contract has been signed or not?
- When customers send back a filled-in form with a barcode, can the barcode be read?
- When a customer makes handwritten notes on a document, can this be detected so that the document is flagged for a manual review by an operator?
- Can you detect a QR code? Can you automate a payment with a SEPA payments code?
- On some documents the company name is not written out; it only appears in the logo. Can this logo be located?
- Can you read a rotated document? If there are multiple receipts or card images on one page, can you split and read these automatically?
Technology – typical components used to solve these questions:
The picture below is a fun example of how visual AI can help you determine whether an image shows a blueberry muffin or a chihuahua. The real value, however, lies in applying visual solutions in the document automation space.
Part of the challenge of bringing visual AI to document processing is to have the algorithm learn the large variation in visual elements (e.g. differences in handwriting and signatures) from a rather small set of samples.
To help the machine, we apply pre-processing techniques, ranging from image orientation correction to shadow removal, all to enhance the quality of the captured information. However, with so few samples, we also depend on transfer learning to bridge the gap.
Specialized Region-based Convolutional Neural Networks (RCNNs, Ren et al.) are trained in a semi-supervised fashion to detect and differentiate between handwritten text and signatures (a task nearly impossible for a rule-based system). Similar neural networks within the framework of object detection are used to detect logos and read barcodes.
Going one step further, we leverage the power of these algorithms (RCNNs) to split multiple attachments (cards, receipts, etc.), originally scanned on the same page, onto separate pages.
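Once an object detector has located the individual cards or receipts, the splitting step itself is simple. The sketch below assumes the detector's output is already available as bounding boxes; the detector is not shown, and the `Image` and `Box` types and the functions `crop` and `split_page` are hypothetical names chosen for the example. A page is represented as rows of grayscale pixel values to keep the example free of imaging libraries.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale page: rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(page: Image, box: Box) -> Image:
    """Cut the region described by `box` out of the page."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


def split_page(page: Image, boxes: List[Box]) -> List[Image]:
    """Return one cropped image per detected box, ordered top-to-bottom, then left-to-right."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return [crop(page, box) for box in ordered]


if __name__ == "__main__":
    # A tiny 4x6 "page" whose pixel value encodes its position (10 * row + column).
    page = [[10 * r + c for c in range(6)] for r in range(4)]
    # Assumed detector output: two receipts side by side, one card underneath.
    boxes = [(3, 0, 6, 2), (0, 0, 3, 2), (0, 2, 6, 4)]
    for part in split_page(page, boxes):
        print(part)
```

Each returned image can then be treated as a separate page and sent through the regular reading pipeline.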
Determining intent or classifying content
Understanding the intent of an email is one of the more difficult AI tasks, but it’s so valuable once you have cracked it. Imagine a nearly flawless automated email routing solution, taking action based on the understanding of the intent of the email (complaint, change request, information request, etc.).
Another incredibly useful application of AI technology is the classification of the content of a document. Imagine you work at the credit disbursement department and your client sends you a document via an online request form or via email. Technology that understands the type of invoice your client submitted can help you detect fraudulent disbursement requests a lot faster.
Frequently asked questions about this business challenge:
- Can you interpret customer emails, and categorize intent?
- Can you check if the invoice type matches the given loan/credit type?
- Can you determine which line items on an invoice or a receipt are covered by the client’s insurance plan?
- Which business line does a certain document belong to?
Technology – typical components used to solve these questions:
N-gram word representations remain a solid baseline for short-document text classification. However, to really understand the intent of semantically rich documents (e.g. long emails), Contract.fit continually invests in putting advanced Natural Language Processing (NLP) algorithms into practice.
Following the recent paradigm shift in NLP, we leverage transfer learning from large pre-trained language modeling architectures (BERT, Devlin et al. 2019) for practical and performant models that make sense of your documents.
Instead of treating documents as an unordered bag of unique tokens, document representations are built up from discrete tokens as learned in-context, which combine hierarchically (character>wordpiece>word>sentence>paragraph>page).
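To make the n-gram baseline mentioned above concrete, here is a minimal sketch of intent classification using word unigrams and bigrams with a multinomial Naive Bayes classifier. The classifier choice, the class name `NgramNaiveBayes`, and the toy training emails are assumptions made for illustration; they are not taken from Contract.fit's models, and a real system would be trained on far more data.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # class prior
            for feat in ngrams(text):
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: two emails per intent.
TRAIN = [
    ("i am unhappy with the late delivery", "complaint"),
    ("this is unacceptable i want a refund", "complaint"),
    ("please change my billing address", "change_request"),
    ("please update my phone number", "change_request"),
    ("can you send me information about your plans", "information_request"),
    ("what are your opening hours", "information_request"),
]

model = NgramNaiveBayes().fit([t for t, _ in TRAIN], [label for _, label in TRAIN])

if __name__ == "__main__":
    print(model.predict("i want a refund for the late delivery"))
    print(model.predict("please update my billing address"))
```

Such a model treats every n-gram as an independent feature, which is exactly the limitation that in-context representations from pre-trained language models are meant to overcome.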
Decision making – Putting it all together
Making a decision in an intelligent automation case involves many different components, as illustrated throughout this document. At some point, all of these need to be brought together to make a final decision on how to handle an accounting item, insurance claim, or credit request.
This is where we bring together the different pieces from the Read and the Reason part, to ultimately deliver the business value.
Frequently asked questions about this business challenge:
- Can we use this technology to pay out smaller insurance claims automatically?
- Can you accept or reject proof documents for a credit request?
- Does the technology know which invoice it should book to which ledger?
- If a client case gets rejected, will we know why?
Technology – typical components used to solve these questions:
These questions link very closely to the whole debate around “explainable AI”. Can an AI system explain why it is making a certain decision or does it remain a black box?
From a technology point of view, the answer is twofold. On the one hand, there is the explainability of individual models: which feature(s) contributed most to the decision? This usually means a trade-off between somewhat less performant models (such as support vector machines) that have very good explainability and highly performant neural networks with little to no explainability.
On the other hand, a final decision is based on the outcome of several individual predictions. Together, these individual predictions form an ensemble, which is again explainable: the final decision can be traced back to the predictions that led to it.
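One way to picture a decision that stays explainable at the ensemble level is a small rule layer that combines the individual predictions and records which of them drove the outcome. This is a sketch under stated assumptions, not the actual decision logic: the `Check` and `Decision` classes, the check names, the `decide` function, and the 0.9 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Check:
    name: str          # which individual model or rule produced this result
    passed: bool       # outcome of that individual prediction
    confidence: float  # confidence of the prediction, in [0, 1]


@dataclass
class Decision:
    outcome: str       # "approve", "reject" or "manual_review"
    reasons: List[str]


def decide(checks: List[Check], min_confidence: float = 0.9) -> Decision:
    """Combine individual predictions into one decision and keep the reasons."""
    failed = [c for c in checks if not c.passed and c.confidence >= min_confidence]
    uncertain = [c for c in checks if c.confidence < min_confidence]
    if failed:
        reasons = [f"{c.name} failed (confidence {c.confidence:.2f})" for c in failed]
        return Decision("reject", reasons)
    if uncertain:
        reasons = [f"{c.name} is uncertain (confidence {c.confidence:.2f})" for c in uncertain]
        return Decision("manual_review", reasons)
    return Decision("approve", [f"{c.name} passed" for c in checks])


if __name__ == "__main__":
    checks = [
        Check("document_type_is_invoice", True, 0.97),
        Check("signature_present", False, 0.95),
    ]
    decision = decide(checks)
    print(decision.outcome, decision.reasons)
```

Because every outcome carries its list of reasons, a rejected client case can be traced back to the specific checks that caused the rejection, even when the individual models behind those checks are black boxes.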
Conclusion
Above, we gave a brief overview of typical business questions, as well as the technologies typically used for AI-based decision making.
However, there is still a gap between running AI in the lab and running it in a real-life environment.
That brings us to the last of the three Rs: Rely on, which is all about deciding whether to process client cases fully automatically, without any human intervention, or to involve a human being in the decision-making process. We’ll tackle this topic in our next article.
About Contract.fit
At Contract.fit, we want to put Intelligent Automation at everyone’s fingertips. We believe that AI and machine learning technologies can simplify our office life and free us from the burden of administrative tasks. With our solution, we want to help our clients increase customer satisfaction by freeing up their time for more value-adding, customer-focused tasks.
Did you recognize your own questions in the FAQs above? Don’t hesitate to reach out. Our experts can help you get started with Intelligent Automation in your company!