OCRmyPDF 6.2.2

OCRmyPDF adds an invisible text layer to PDF documents by passing them through the Tesseract OCR engine. The output is a PDF/A file with a selectable but invisible text layer above the scanned page images, which makes the documents searchable and suitable for archiving.
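As a rough illustration of the workflow described above, the sketch below drives the `ocrmypdf` command-line tool from Python. The helper names `build_ocrmypdf_command` and `add_text_layer`, the file names, and the default language are assumptions made for this example, not part of OCRmyPDF itself. It assumes `ocrmypdf` is installed and on the `PATH`.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(input_pdf: str, output_pdf: str,
                           language: str = "eng") -> List[str]:
    """Build the argument list for one ocrmypdf run (hypothetical helper).

    --output-type pdfa requests PDF/A output; -l selects the Tesseract
    language pack used for recognition.
    """
    return ["ocrmypdf", "--output-type", "pdfa", "-l", language,
            input_pdf, output_pdf]


def add_text_layer(input_pdf: str, output_pdf: str,
                   language: str = "eng") -> bool:
    """Run ocrmypdf if it is available; return False when it is not installed."""
    if shutil.which("ocrmypdf") is None:
        return False
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, language),
                   check=True)
    return True
```

Calling `add_text_layer("scan.pdf", "scan_ocr.pdf")` would then write a searchable copy next to the original scan, leaving the input file untouched.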

Tags pdf ocr scanning
License GNU GPLv3
State stable

Recent Releases

6.2.2 15 Jul 2018 03:19 minor feature: Ignore masks when deciding what color to rasterize at. Backport Python 3.7 for ruffus 2.7.0 from ocrmypdf v7.0.0. Cherrypick Python 3.7 documentation updates from v7.0.0. a comment about Tesseract behavior in certain versions. Cherrypick warning about --user-words not having any effect. main: do better parameter validation. Tests: Add ability to disable use of cache. Tests: Speed up a slow test (cherry-picked from v7). Travis: modernize with v7.0.0 updates. problem iterating ruffus exceptions and rotate-pages-threshold pa. ocrmypdf.exec: trap FileNotFoundError too. Skip locale check on Python 3.7. Update release notes for v6.2.2. Travis: v6 build failures. Travis: nevermind xenial, then.
7.0.0 11 Jul 2018 07:25 minor feature: Ignore masks when deciding what color to rasterize at. Remove gpg. Add wiki link to template. recent versions of tesseract not registering as textonly_pdf. v6.2.1 release notes. Use qpdf 8.0.2 backport, force old pytest-timeout to build. Merge branch 'test/ignore-masks'. : doesn't work when installed in non-Unicode path. path error on Py3.5. Remove dependency on private fork of ruffus, change to official 2.7. Remove ruffus 2.6.3 exception special casing. Update release notes. Update readme. Declare certain APIs public. typo introduced in. Merge branch 'develop' (7.0.0) into master.
6.2.1 25 Jun 2018 05:25 minor feature: Remove gpg. Add wiki link to template. recent versions of tesseract not registering as textonly_pdf. v6.2.1 release notes. Use qpdf 8.0.2 backport, force old pytest-timeout to build.
7.0.0rc1 07 Jun 2018 17:45 minor feature: Use python-xmp-toolkit for xmp check. Optimize: use tempdir for cmdline invocation. Suppress some spurious tesseract errors. Optimize: error in Py3.5.
6.2.0 07 May 2018 16:45 minor feature: Use more standard __version__ rather than PILLOW_VERSION. Add support for PDF/A-3. helpers: missing call to complain(). Don't suppress error message from config_notfound. helpers.py again. Add gpg key to template. test_pageinfo: remove duplicate import. --remove-background error on PDFs with colormapped images. Expand size growth reasons to other arguments that trigger transcoding. Update Dockerfile for Ubuntu 18.04. Add 18.04 update procedure. XMP validation with /CreationDate. Merge branch 'feature/pdfa3'. v6.2.0 Release notes. v6.2.0. failure to prevent use of Ghostscript on /UserUnit files. Trap PDF/A-3 errors on old Ghostscript.
6.1.5 03 May 2018 22:00 minor feature:
3.0 14 Sep 2015 17:45 minor feature: bump to v3.0 and move repos. Test case: No longer using JHOVE. Move to my repo: github.com/fritz-hh = jbarlow83.
3.0-rc9 31 Aug 2015 01:45 minor feature: Throw exception if iccprofiles not found instead of returning None. unpaper: support paletted files by conversion instead of bailing. Use png256 raster device when possible. Prevent running validation on missing file after an exception is thrown. Add test cases for additional image formats. ghostscript: quiet startup on rasterize. Bump version to -rc9.