OCR Scanning
What is OCR scanning
OCR (Optical Character Recognition) is a technology that converts text within images and scanned documents into searchable text.
This means that documents such as:
- scanned PDFs
- image files (e.g. JPG, PNG)
- image-based PDFs
can be indexed and included in search results, just like standard text-based documents.
How OCR works in the Data Room
In Admincontrol, OCR scanning is part of the AI-powered search experience.
When documents are uploaded:
- The system detects files that contain text in image form
- OCR processing extracts the text from those images
- The extracted text is added to the search index
- The document becomes fully searchable using keywords or AI search
This allows users to:
- search across all documents, regardless of format
- locate specific terms, figures, or clauses without opening each file
- work more efficiently during due diligence
What type of files are supported?
OCR is automatically applied to:
- scanned documents
- image files (e.g. JPG, PNG)
- PDFs that contain images instead of selectable text
After processing, these files behave like standard documents in search.
When is OCR applied?
OCR scanning is performed:
- automatically during document upload
- when AI-powered search is enabled in the portal
- only for newly uploaded files.
Only documents uploaded after OCR is enabled will be processed. Existing documents are not retroactively scanned.
.png?width=283&height=51&name=Admincontrol_Logo-RGB_Reverse%20(1).png)