{"id":3346,"date":"2020-06-18T14:03:13","date_gmt":"2020-06-18T14:03:13","guid":{"rendered":"https:\/\/47billion.com\/?p=3346"},"modified":"2024-12-23T05:15:14","modified_gmt":"2024-12-23T05:15:14","slug":"document-understanding-doorway-to-efficient-systems","status":"publish","type":"post","link":"https:\/\/47billion.com\/blog\/document-understanding-doorway-to-efficient-systems\/","title":{"rendered":"Document Understanding – A Doorway To Efficient Systems"},"content":{"rendered":"\n
A 60-year-old patient walks into a new medical facility. He carries a thick pile of printed health records containing reports from all his previous medical providers. The operator at the facility starts entering patient data by manually inspecting the last few entries in his records.<\/p>\n\n\n\n
A borrower submits a home loan application. The mortgage provider asks for financial documents and property details. The documents are scanned and sent to an offshore facility for manual data entry. The offshore operator starts a long and painful process of inspecting the documents and entering the financial data into spreadsheets. The structured data is returned to the provider the next day, and only then does the rest of the process continue. A similar manual process is used for contract review, legal case precedent research, resume processing, invoice consolidation, expense reconciliation, etc.<\/p>\n\n\n\n
Every day, millions of documents with unstructured data are generated, inspected, interpreted, exchanged, and used for decision making. These documents are human-readable but not machine-readable. Errors are sometimes made along the way, and core information is lost. Such data is also open to interpretation by the person reading it, which causes inconsistencies further down the chain. It is not easy to exchange such data between multiple parties, and the lack of standards for exchanging it between systems further exacerbates the issue. As a result, most of the unstructured data in businesses is left untouched, leaving a huge gap in analytics and decision making.<\/p>\n\n\n\n
The Platform<\/strong> <\/h3>\n\n\n\n
The solution is to classify such unstructured documents and extract metadata from them using an intelligent, AI-powered document understanding platform. Such a platform can ingest documents in a variety of formats, at scale, and convert them into SMART docs. SMART docs consist of the relevant structured data (XML, JSON) extracted from the unstructured documents.<\/p>\n\n\n\n
To convert unstructured documents to SMART docs, different types of extractors are used that can accurately detect and extract information from MS Word (DOC) files, digital PDF files, scanned PDF files, images, etc. The unstructured documents may contain a variety of content such as titles, headers\/footers, tables, checkboxes, images, signatures, stamps, barcodes, etc. An intelligent document understanding platform performs layout detection to identify each of these areas in the document, then either OCRs the text or extracts it directly, depending on the document type, and creates a structured text output that can be further interpreted. A machine learning-based interpreter can then be used to accurately detect and extract values from this text and generate the corresponding SMART doc.<\/p>\n\n\n\n
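The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the platform's implementation: the document-type labels, the `Region` class, and the functions `choose_method` and `to_structured_text` are all hypothetical names, and the layout detector and OCR engine are replaced by pre-labelled input.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical document-type labels; a real platform would detect these.
OCR_TYPES = {"scanned_pdf", "image"}
TEXT_LAYER_TYPES = {"ms_doc", "digital_pdf"}


@dataclass
class Region:
    kind: str    # layout label, e.g. "title", "table", "signature"
    text: str    # text recovered for that region
    method: str  # "ocr" or "text_layer"


def choose_method(doc_type: str) -> str:
    """Pick OCR for pixel-only inputs, direct text extraction otherwise."""
    if doc_type in OCR_TYPES:
        return "ocr"
    if doc_type in TEXT_LAYER_TYPES:
        return "text_layer"
    raise ValueError(f"unsupported document type: {doc_type}")


def to_structured_text(doc_type: str, blocks: List[Tuple[str, str]]) -> List[Region]:
    """Stand-in for layout detection plus text recovery.

    `blocks` is assumed to be (layout_label, raw_text) pairs that a layout
    detector has already produced; no real OCR is performed here.
    """
    method = choose_method(doc_type)
    return [Region(kind, text.strip(), method) for kind, text in blocks]


regions = to_structured_text(
    "scanned_pdf",
    [("title", " Loan Application "), ("table", "Income | 5000")],
)
for region in regions:
    print(region.kind, region.method, region.text)
```

Keeping the method choice in one small function means a new input format only needs a new label in one of the two sets, while the downstream interpreter sees the same `Region` records regardless of how the text was recovered.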
The accuracy of detection can be further enhanced by using structured data to give meaning to the unstructured data. A domain-specific dictionary, taxonomy, or ontology can be used to support the extraction, making it much more efficient and accurate. Such structured data can be fed in with the document, extracted from one part of the document and used to interpret other parts, or retrieved from external sources during document extraction. For example, claims data can be supplied along with the Electronic Health Record (EHR<\/a>), or borrower names found in one part of a document can be used to search its other parts.<\/p>\n\n\n\n
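As a rough illustration of dictionary-supported extraction, the sketch below uses a small, made-up domain dictionary to find labelled values, then reuses an extracted borrower name to search the other pages. The dictionary contents, the field names, and the functions `extract_fields` and `pages_mentioning` are assumptions for the example only; a production system would likely rely on a trained interpreter rather than regular expressions.

```python
import re
from typing import Dict, List

# Hypothetical domain dictionary: canonical field -> labels seen in documents.
FIELD_SYNONYMS: Dict[str, List[str]] = {
    "borrower_name": ["borrower", "applicant", "name of borrower"],
    "loan_amount": ["loan amount", "amount requested", "principal"],
}


def extract_fields(text: str, synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Return the first 'label: value' match for each canonical field."""
    found: Dict[str, str] = {}
    for field_name, labels in synonyms.items():
        # Try longer labels first so "name of borrower" is tried before "borrower".
        for label in sorted(labels, key=len, reverse=True):
            pattern = rf"(?im)^\s*{re.escape(label)}\s*[:\-]\s*(.+?)\s*$"
            match = re.search(pattern, text)
            if match:
                found[field_name] = match.group(1)
                break
    return found


def pages_mentioning(value: str, pages: List[str]) -> List[int]:
    """Indexes of pages that mention a value extracted elsewhere."""
    needle = value.casefold()
    return [i for i, page in enumerate(pages) if needle in page.casefold()]


pages = [
    "Applicant: Jane Doe\nAmount requested: 250,000",
    "Pay stub for JANE DOE, March",
    "Property appraisal report",
]
fields = extract_fields(pages[0], FIELD_SYNONYMS)
print(fields)
print(pages_mentioning(fields["borrower_name"], pages))
```

The dictionary maps the many labels a document might use onto one canonical field, which is what lets the same extractor handle forms from different providers.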