KNOWLEDGE BASE ARTICLE

Understanding Document Processors

Adding Umango document processors can significantly increase the speed at which Umango processes documents, but not always. This article explains what document processors are, what determines their benefit, and the scenarios in which adding them helps.

What is a Umango document processor?

A document processor is a worker thread that performs work on a file or batch (a workload package) as it moves through Umango. This work includes importing the file, applying image filters, capturing data (OCR, etc.), converting the file format, and uploading the file to its final destination.

When is it useful to add document processors?

Think of a document processor as a cashier at Walmart, and of the documents as the customers. The documents (customers) line up in a queue and wait for a processor (cashier) to become available so they can be processed. The more processors (cashiers) there are, the faster the queue is cleared. However, adding processors (cashiers) does not make any individual document (customer) process faster. If no customers are waiting in line, adding cashiers will not get a customer through the checkout any sooner. Similarly, adding processors will not speed up document processing unless documents are waiting in the document queue.

So, if you expect 1000 documents to be processed every 24 hours, whether adding processors helps depends entirely on whether all 1000 documents arrive in a short window of time or trickle in over the full 24-hour period. If they all need to be processed in a narrow window of time, adding processors would typically provide a significant benefit.

In summary, if Umango often has a queue of documents/batches with a status of "waiting..." then adding document processors will speed things up. Otherwise, there is no benefit in adding more processors.
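The burst-versus-trickle comparison above can be illustrated with a small simulation. This is only a sketch of a generic first-come, first-served queue, not Umango's actual scheduler. The function name turnaround_times, the 30 seconds of work per document, and the 86-second spacing (roughly 1000 documents spread over 24 hours) are all assumptions made for illustration.

```python
import heapq


def turnaround_times(arrival_seconds, work_seconds, processors):
    """Return how long each document spends waiting plus being processed.

    Models a first-come, first-served queue in which every document needs
    the same amount of work (an assumption; real documents vary).
    """
    # Time at which each processor next becomes idle.
    free_at = [0] * processors
    heapq.heapify(free_at)

    results = []
    for arrival in sorted(arrival_seconds):
        # The document starts when it has arrived AND a processor is idle.
        start = max(arrival, heapq.heappop(free_at))
        end = start + work_seconds
        heapq.heappush(free_at, end)
        results.append(end - arrival)
    return results


# Hypothetical workloads: 1000 documents, 30 seconds of work each.
burst = [0] * 1000                         # all arrive at the same moment
trickle = [i * 86 for i in range(1000)]    # about one every 86 seconds

for name, arrivals in (("burst", burst), ("trickle", trickle)):
    for count in (1, 4):
        worst = max(turnaround_times(arrivals, 30, count))
        print(f"{name:8s} processors={count}  worst turnaround={worst}s")
```

Under these assumptions the burst clears four times faster with 4 processors (the worst turnaround falls from 30000 to 7500 seconds), while the trickle shows no change at all: every document is finished 30 seconds after it arrives regardless of how many processors are available.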

Is a document processor the same as a CPU core?

No, they are different, but they are related. Each document processor consumes up to 100% of the processing power of 1 CPU core. So if you have 1 document processor and 4 CPU cores, Umango will use at most 1/4, or 25%, of the total CPU power. Adding a second document processor allows a second CPU core to be used concurrently (1 per document processor), raising the maximum to 2/4, or 50%, of the total CPU power. As a general rule, it is never effective to license more document processors than the server has CPU cores. In fact, we recommend licensing no more than the number of CPU cores minus 1. This leaves 1 core available to perform other functions and keeps the server from laboring at 100%. Once Umango is consuming 100% of the CPU power, there is no processing power left to assign to another document processor.
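The arithmetic above can be written down as two small helpers. This is a sketch only: the function names are hypothetical, and the floor of 1 processor on a single-core server is an assumption added here, not something the rule of thumb states.

```python
def max_cpu_share(processors, cpu_cores):
    """Upper bound on the fraction of total CPU power in use.

    Each document processor uses at most one core, so processors beyond
    the core count add nothing.
    """
    return min(processors, cpu_cores) / cpu_cores


def recommended_max_processors(cpu_cores):
    """Rule of thumb from the article: CPU cores minus 1.

    Returning at least 1 is an assumption for single-core servers.
    """
    return max(1, cpu_cores - 1)


print(max_cpu_share(1, 4))             # 0.25
print(max_cpu_share(2, 4))             # 0.5
print(recommended_max_processors(4))   # 3
```

On the 4-core server from the example, this gives a recommended maximum of 3 document processors, which caps Umango at 75% of the total CPU power.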

Are processors assigned to jobs or sources?

Neither. They are assigned to the processing queue. There is no association between document processors and jobs or a job's source. Document processors sit and wait for a workload (a document or batch) to hit the queue and then go to work on it. Using the Walmart scenario above, where cashiers are like document processors and customers are like documents, sources are like the aisles in the store.
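The shared-queue idea can be sketched with generic worker threads. This illustrates the pattern only and is not how Umango is implemented. The function run_processors and the source names ("scanner", "email", "folder") are hypothetical.

```python
import queue
import threading


def run_processors(workloads, processor_count):
    """Process (source, document) workloads from ONE shared queue.

    No processor is tied to a source: each takes whatever is next.
    """
    pending = queue.Queue()
    for item in workloads:
        pending.put(item)

    handled = []              # (processor_id, source, document) records
    lock = threading.Lock()   # protects the shared results list

    def processor(processor_id):
        while True:
            try:
                source, document = pending.get_nowait()
            except queue.Empty:
                # A real processor would keep waiting for new work;
                # this sketch simply stops once the queue is drained.
                return
            with lock:
                handled.append((processor_id, source, document))

    threads = [
        threading.Thread(target=processor, args=(i,))
        for i in range(processor_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


workloads = [
    ("scanner", "a.pdf"),
    ("email", "b.pdf"),
    ("folder", "c.pdf"),
    ("scanner", "d.pdf"),
]
for processor_id, source, document in run_processors(workloads, 2):
    print(f"processor {processor_id} handled {document} from {source}")
```

Which processor handles which document varies from run to run. The source a document came from plays no part in which processor picks it up, which is the point of the sketch.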

Link to this article http://umango.com/KB?article=103