Embedding Text Information in Scanned Data

You can use the OCR function to embed the text information in the scanned document without processing the data on your computer.

Important

  • For details about the optional units required for this function, see "Functions Requiring Optional Configurations", Getting Started.

  • This function supports the following file types: [PDF], [High Compression PDF], and [PDF/A].

  • If [Black & White: Photo] is selected in [Original Type] when originals are scanned, the text is scanned in shades of gray, and the characters and the top and bottom of the page may not be recognized correctly. If OCR accuracy is more important than image quality, select [Black & White: Text] in [Original Type] when scanning the original.

  • You cannot use the OCR function in the following cases:

    • [TIFF / JPEG] or [TIFF] is selected as the file type.

    • [Store to HDD] or [Store to HDD + Send] is selected in [Store File].

    • [100 dpi] is selected as the resolution.

    • [Preview] is selected.

    • [WSD] or [DSM] is used as the destination of the distribution server.
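The restrictions above can be summarized as a simple pre-scan check. The following Python sketch is illustrative only: the function `ocr_blockers`, its parameters, and the setting values are hypothetical names chosen for this example and do not correspond to any interface provided by the machine. It also assumes that any file type other than the three supported types listed above prevents OCR.

```python
# Hypothetical settings model; names and values are illustrative only and
# are not an interface exposed by the machine.
OCR_FILE_TYPES = {"PDF", "High Compression PDF", "PDF/A"}
BLOCKING_STORE_MODES = {"Store to HDD", "Store to HDD + Send"}
BLOCKING_DESTINATIONS = {"WSD", "DSM"}


def ocr_blockers(file_type, store_file, resolution_dpi, preview, destination):
    """Return the reasons why OCR cannot be used; an empty list if none apply."""
    reasons = []
    # Assumption: only the three supported file types allow OCR.
    if file_type not in OCR_FILE_TYPES:
        reasons.append(f"file type [{file_type}] does not support OCR")
    if store_file in BLOCKING_STORE_MODES:
        reasons.append(f"[{store_file}] is selected in [Store File]")
    if resolution_dpi == 100:
        reasons.append("[100 dpi] is selected as the resolution")
    if preview:
        reasons.append("[Preview] is selected")
    if destination in BLOCKING_DESTINATIONS:
        reasons.append(f"[{destination}] is used as the destination")
    return reasons
```

For example, `ocr_blockers("PDF", None, 200, False, "Email")` returns an empty list, while a call with `"TIFF"`, `"Store to HDD"`, `100`, `True`, and `"WSD"` returns one reason for each restriction.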

1. Place originals.

2. Press [Send File Type / Name].

   (Operation panel screen illustration)

3. Select [PDF] in [File Type].

4. Select [OCR Settings] under the PDF file settings, and then select [On].

5. Configure the settings such as [Add Extrct.Text to File Name], [Delete Blank Page] and [Cognitive Language] as required.

6. Press [OK] twice.

7. Configure the destination address and other required settings.

8. Press the [Start] key.

Note

  • The OCR function can process up to 40,000 characters of text per page.

  • The OCR function can recognize the following languages:

    • English, German, French, Italian, Spanish, Dutch, Portuguese, Polish, Swedish, Finnish, Norwegian, Hungarian, Danish, Japanese.

  • The effective resolution may be less than 200 dpi when an image scanned at 200 dpi or higher is reduced by specifying a reproduction ratio. You can apply the OCR function in such cases, but the text recognition accuracy may deteriorate.

  • Depending on character shapes or types, characters may not be recognized correctly.

  • A PDF file without embedded text is generated if the scanned page contains no area that can be recognized as characters.

  • If a page contains large blank areas, the top and bottom of the page may not be recognized correctly.

  • No PDF file is generated if all pages in a document are determined to be blank. If this happens, make sure that the originals are placed correctly, and then try again.

  • A blank page or the top and bottom of a page may not be recognized correctly if the scanned page has smears or dirty spots, or if an image on the back side of the page shows through.

  • The OCR function does not identify typefaces during scanning. If the widths of the printed and embedded characters differ, the position of the embedded text may not match that of the printed text on the scanned page.

  • If you specify the OCR Settings and scan multiple sets of originals consecutively, the scan speed may become slower depending on the resolution setting and the sizes of the originals.
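The note above about images reduced by a reproduction ratio can be illustrated with simple arithmetic. The following Python sketch assumes that the effective resolution scales linearly with the reproduction ratio; this manual does not state the exact calculation used by the machine, so treat the result as an estimate. The function names are hypothetical.

```python
OCR_REFERENCE_DPI = 200  # threshold mentioned in the note on effective resolution


def effective_dpi(scan_dpi, ratio_percent):
    """Estimate the effective resolution, assuming linear scaling with the ratio."""
    return scan_dpi * ratio_percent / 100


def accuracy_may_deteriorate(scan_dpi, ratio_percent):
    """Return True if the estimated effective resolution falls below 200 dpi."""
    return effective_dpi(scan_dpi, ratio_percent) < OCR_REFERENCE_DPI
```

Under this assumption, an original scanned at 200 dpi and reduced to 71% has an estimated effective resolution of 142 dpi, so recognition accuracy may deteriorate. Scanning at 400 dpi with a 50% reduction keeps the estimate at 200 dpi.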