Skip to content

OCR Scanning

What is OCR scanning

OCR (Optical Character Recognition) is a technology that converts text within images and scanned documents into searchable text.

This means that documents such as:

  • scanned PDFs
  • image files (e.g. JPG, PNG)
  • image-based PDFs

can be indexed and included in search results, just like standard text-based documents.

How OCR works in the Data Room

In Admincontrol, OCR scanning is part of the AI-powered search experience.

When documents are uploaded:

  1. The system detects files that contain text in image form
  2. OCR processing extracts the text from those images
  3. The extracted text is added to the search index
  4. The document becomes fully searchable using keywords or AI search

This allows users to:

  • search across all documents, regardless of format
  • locate specific terms, figures, or clauses without opening each file
  • work more efficiently during due diligence

What type of files are supported? 

OCR is automatically applied to:

  • scanned documents
  • image files (e.g. JPG, PNG)
  • PDFs that contain images instead of selectable text

After processing, these files behave like standard documents in search.

When is OCR applied? 

OCR scanning is performed:

  • automatically during document upload
  • when AI-powered search is enabled in the portal 
  • only for newly uploaded files.

Only documents uploaded after OCR is enabled will be processed. Existing documents are not retroactively scanned.