KNOWLEDGE BASE ARTICLE

Quick Guide to Regular Expressions in Umango Extract

Regular Expressions (RegEx) are a fast, powerful and accurate way to identify exactly the text you want to extract from an area of a document. For those with some technical know-how, creating Regular Expressions can be simple; for others, it may take a little more time.

The internet is full of websites that help you build and validate Regular Expressions. Some examples you may wish to use include:

  1. www.regex101.com
  2. www.regexpal.com

Here are a few examples of Regular Expressions to help get you started.

Hints:

Fig 1. Test your results: the Tesseract and ABBYY OCR engines can give you different results. ABBYY set to Accurate tends to provide the best results and, compared to Fast, doesn't slow the extraction down. When using Tesseract, leaving the setting on Fast tends to give the best balance of results and speed.

Fig 2. Remember that Format and Validation provides the rules for how data should be structured once it has been captured.

Fig 3. Smart Seek provides the rules used to locate the data you want to capture.

Use 'Test' in Umango Extract to check your settings before saving your job.

Fig 1.

Fig 2.

Fig 3.

Example Regular Expressions using Umango Extract.

Objective

Regular Expression for Format and Validation

Regular Expression for Smart Seek

Image

A 6 to 7 digit number

Reg Ex: REGEX(\d{6,7})
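To see what this pattern does outside Umango, here is a minimal Python sketch. It assumes the text inside REGEX(...) is passed to the regex engine as-is, and uses Python's built-in re module only as a stand-in for Umango's engine, which may differ in details. It shows that \d{6,7} also matches the first seven digits of a longer number; the \b word-boundary version is a hypothetical stricter variant, not part of the original example.

```python
import re
from typing import Optional

# Pattern from the article (the text inside REGEX(...)).
SIX_TO_SEVEN = re.compile(r"\d{6,7}")
# Hypothetical stricter variant: \b word boundaries stop it matching
# part of a longer run of digits.
SIX_TO_SEVEN_WHOLE = re.compile(r"\b\d{6,7}\b")


def first_match(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group() if m else None


print(first_match(SIX_TO_SEVEN, "Invoice No. 1234567"))  # 1234567
print(first_match(SIX_TO_SEVEN, "Ref 12345678"))         # 1234567 (first 7 of 8 digits)
print(first_match(SIX_TO_SEVEN_WHOLE, "Ref 12345678"))   # None
print(first_match(SIX_TO_SEVEN_WHOLE, "Ref 123456"))     # 123456
```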

A number after the word BALANCE

Format and Validation: REGEX(\d{1,5}\.\d{2})

Smart Seek: REGEX((?<=BALANCE.*)\d{1,5}\.\d{2})
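The following Python sketch illustrates both halves of this example, using Python's re module as a stand-in for Umango's engine (an assumption). The amount is written with an escaped decimal point (\.), because an unescaped . matches any character and would let a plain integer such as 1234567 pass. The lookbehind (?<=BALANCE.*) is variable-width, which some engines (for example .NET and modern JavaScript) accept but Python's re does not, so the sketch uses a capture group to get the same effect. Modelling Format and Validation with re.fullmatch assumes it checks the whole captured value.

```python
import re
from typing import Optional

# Format and Validation: 1-5 digits, a literal decimal point (\.), then 2 digits.
# re.fullmatch models the assumption that validation tests the entire value.
AMOUNT = re.compile(r"\d{1,5}\.\d{2}")

# The Smart Seek lookbehind is variable-width; Python's re rejects it.
try:
    re.compile(r"(?<=BALANCE.*)\d{1,5}\.\d{2}")
    LOOKBEHIND_SUPPORTED = True
except re.error:
    LOOKBEHIND_SUPPORTED = False

# Python equivalent: find BALANCE, skip as little as possible on the same
# line (. does not match a newline), and capture the amount in group 1.
AFTER_BALANCE = re.compile(r"BALANCE.*?(\d{1,5}\.\d{2})")


def amount_after_balance(text: str) -> Optional[str]:
    """Return the first amount that follows the word BALANCE on the same line."""
    m = AFTER_BALANCE.search(text)
    return m.group(1) if m else None


print(LOOKBEHIND_SUPPORTED)                                        # False
print(amount_after_balance("Opening 10.00\nBALANCE DUE 1234.50"))  # 1234.50
print(bool(AMOUNT.fullmatch("1234.50")))                           # True
print(bool(AMOUNT.fullmatch("1234567")))                           # False
```

Note that the amount 10.00 before BALANCE is ignored, just as it would be by the lookbehind version.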

Date below looking for NN/NN/NNNN

Format and Validation: REGEX(\d{1,2}/\d{1,2}/\d{2,4})

Smart Seek: REGEX(\d{1,2}/\d{1,2}/\d{2,4})
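A quick way to see how permissive this date pattern is: the minimal Python sketch below (Python's re standing in for Umango's engine, an assumption) shows that it accepts 1 or 2 digit days and months, any 2 to 4 digit year including 3 digits, and does not check whether the day or month is in range.

```python
import re
from typing import Optional

# Date pattern from the article: 1-2 digits, "/", 1-2 digits, "/", 2-4 digits.
DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")


def find_date(text: str) -> Optional[str]:
    """Return the first date-like match in text, or None."""
    m = DATE.search(text)
    return m.group() if m else None


print(find_date("Date: 05/11/2023"))  # 05/11/2023
print(find_date("Date: 5/1/23"))      # 5/1/23
print(find_date("Date: 5/1/202"))     # 5/1/202 (3-digit year accepted)
print(find_date("Date: 99/99/2023"))  # 99/99/2023 (no range check on day/month)
```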

Looking for a number NN NNN NNN NNN

REGEX(\d{2}\s*\d{3}\s*\d{3}\s*\d{3})
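The minimal Python sketch below (Python's re standing in for Umango's engine, an assumption) shows the effect of \s* between the digit groups: the number is found whether it is spaced or not, and because \s also matches line breaks, a number split across two lines is matched too.

```python
import re
from typing import Optional

# Pattern from the article: 2 digits, then three groups of 3 digits,
# with any amount of whitespace (including none, or a newline) between them.
GROUPED = re.compile(r"\d{2}\s*\d{3}\s*\d{3}\s*\d{3}")


def find_grouped(text: str) -> Optional[str]:
    """Return the first NN NNN NNN NNN style number in text, or None."""
    m = GROUPED.search(text)
    return m.group() if m else None


print(find_grouped("No. 12 345 678 901"))     # 12 345 678 901
print(find_grouped("No. 12345678901"))        # 12345678901
print(repr(find_grouped("12 345\n678 901")))  # '12 345\n678 901'
```

If matching across lines is unwanted, replacing \s* with [ ]* (spaces only) is one hypothetical alternative.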

Looking for a string of data after the word Name:

REGEX((?<=Name:\s).*\n)
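The lookbehind (?<=Name:\s) has a fixed width (five characters plus one whitespace character), so even Python's re module supports it. The minimal sketch below, with Python standing in for Umango's engine (an assumption), shows that the trailing \n becomes part of the match, and that a Name: line with no line break after it is not matched at all.

```python
import re
from typing import Optional

# Pattern from the article: text following "Name:" and one whitespace
# character, up to and including the end-of-line character.
NAME = re.compile(r"(?<=Name:\s).*\n")


def find_name(text: str) -> Optional[str]:
    """Return the text after 'Name: ' including the trailing newline, or None."""
    m = NAME.search(text)
    return m.group() if m else None


print(repr(find_name("Name: Jane Citizen\nPhone: 555 0100\n")))  # 'Jane Citizen\n'
print(repr(find_name("Name: Jane Citizen")))                      # None
print(repr(find_name("Name: Jane Citizen\r\n")))                  # 'Jane Citizen\r\n'
```

With Windows-style line endings, . also matches the \r, so in Python the match ends in \r\n.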

Looking for the dollar amount

Format and Validation: REGEX(\$[0-9]{1,5}\.[0-9]{2})

Smart Seek: REGEX(\$[0-9]{1,5}\.[0-9]{2})
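Why the decimal point should be escaped: in the minimal Python sketch below (Python's re standing in for Umango's engine, an assumption), the escaped \. matches only a literal ".", while an unescaped . matches any character and returns a wrong partial amount from a value written with a thousands separator. Neither version accepts commas as thousands separators.

```python
import re
from typing import Optional

# Dollar amount with the decimal point escaped, so only a literal "." matches.
DOLLARS = re.compile(r"\$[0-9]{1,5}\.[0-9]{2}")
# The same pattern with an unescaped ".", which matches any character.
DOLLARS_LOOSE = re.compile(r"\$[0-9]{1,5}.[0-9]{2}")


def find(pattern: re.Pattern, text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group() if m else None


print(find(DOLLARS, "Total due: $1234.50"))         # $1234.50
print(find(DOLLARS, "Total due: $1,234.50"))        # None (comma not allowed)
print(find(DOLLARS_LOOSE, "Total due: $1,234.50"))  # $1,23 (wrong partial match)
```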

Extracting the account number

Format and Validation: REGEX([0-9]{3}\s?[0-9]{3}\s?[0-9]{3})

Smart Seek: REGEX([0-9]{3}\s?[0-9]{3}\s?[0-9]{3})
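The \s? between groups allows at most one whitespace character. The minimal Python sketch below (Python's re standing in for Umango's engine, an assumption) shows that the account number is found with single spaces or none, but not when OCR inserts two spaces between groups.

```python
import re
from typing import Optional

# Pattern from the article: three groups of 3 digits, each optionally
# separated by a single whitespace character (\s?).
ACCOUNT = re.compile(r"[0-9]{3}\s?[0-9]{3}\s?[0-9]{3}")


def find_account(text: str) -> Optional[str]:
    """Return the first NNN NNN NNN style account number in text, or None."""
    m = ACCOUNT.search(text)
    return m.group() if m else None


print(find_account("Account: 123 456 789"))   # 123 456 789
print(find_account("Account: 123456789"))     # 123456789
print(find_account("Account: 123  456 789"))  # None (two spaces)
```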

Link to this article http://umango.com/KB?article=85