Optical character recognition (OCR) | VDRPro


Product: VDRPro
Applies to: All managers and publishers
 

Question

What is OCR and what file types are supported?

Answer

Optical character recognition (OCR) is the process of converting images of printed, typewritten, or handwritten text into machine-encoded text. In Intralinks, OCR makes the text in scanned images searchable.

When documents are processed by OCR in Intralinks, the system adds the recognized text as metadata to the Intralinks search engine rather than writing it into the original files. As a result, you can search for keywords in those documents through Intralinks. However, the original files are not modified: if you download them and search the files themselves for those keywords, the search will not return any results.
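To illustrate why a keyword can be found in Intralinks but not in the downloaded file, here is a minimal conceptual sketch. It assumes a simple in-memory index that stores OCR output per document; the `SearchIndex` class and `search_file_bytes` helper are hypothetical and do not represent how Intralinks is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Toy stand-in for a platform-side search index (hypothetical, not an Intralinks API)."""
    entries: dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, ocr_text: str) -> None:
        # OCR output is stored here, keyed by document ID; the file itself is untouched.
        self.entries[doc_id] = ocr_text.lower()

    def search(self, keyword: str) -> list[str]:
        # Return the IDs of documents whose indexed text contains the keyword.
        kw = keyword.lower()
        return [doc_id for doc_id, text in self.entries.items() if kw in text]


def search_file_bytes(data: bytes, keyword: str) -> bool:
    # Naive keyword search over the original file's bytes.
    # A scanned image holds pixel data, not text, so OCR-recognized words are absent.
    return keyword.lower().encode("utf-8") in data.lower()


# Placeholder bytes standing in for a scanned JPEG; OCR does not change them.
original_scan = b"\xff\xd8\xff\xe0 ...pixel data..."

index = SearchIndex()
index.add("contract-001", "Purchase agreement between Buyer and Seller")

print(index.search("agreement"))                      # ['contract-001']
print(search_file_bytes(original_scan, "agreement"))  # False
```

The design point is the separation: the recognized text lives only alongside the index entry, so any search that goes through the platform can use it, while a search over the downloaded bytes cannot.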

When OCR is enabled on an exchange, supported files are scanned as they are uploaded, and their content is generally searchable within 30 minutes of upload. If OCR is enabled after documents have already been uploaded, OCR is triggered for the existing documents, but processing may take 24 hours or more to complete.

Intralinks OCR fully supports UTF-8 encoding, so it works with text in all languages.

File types supported for OCR:

  • PDF
  • JPEG
  • GIF
  • TIFF

Microsoft Office files are not supported for OCR.
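Before uploading a batch of documents, a publisher may want to know which files will be scanned. The following is a minimal local sketch that sorts file names by extension against the supported types listed above. The mapping of types to extensions (for example, treating both `.tif` and `.tiff` as TIFF) is an assumption, and the helper names are hypothetical; this does not call any Intralinks API.

```python
from pathlib import Path

# Assumed extensions for the supported types: PDF, JPEG, GIF, TIFF.
# The exact extensions the platform recognizes are an assumption.
OCR_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".gif", ".tif", ".tiff"}


def is_ocr_eligible(filename: str) -> bool:
    """Return True if the file extension matches a type listed as supported for OCR."""
    return Path(filename).suffix.lower() in OCR_EXTENSIONS


def split_by_ocr_support(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Split file names into (eligible for OCR, not eligible), preserving order."""
    eligible = [f for f in filenames if is_ocr_eligible(f)]
    skipped = [f for f in filenames if not is_ocr_eligible(f)]
    return eligible, skipped


files = ["scan_01.TIFF", "board_minutes.pdf", "model.xlsx", "photo.jpg", "memo.docx"]
eligible, skipped = split_by_ocr_support(files)
print(eligible)  # ['scan_01.TIFF', 'board_minutes.pdf', 'photo.jpg']
print(skipped)   # ['model.xlsx', 'memo.docx']
```

The check is case-insensitive because file extensions are often uppercase on scanned output. Office files such as `.docx` and `.xlsx` fall into the skipped list, matching the note above.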
