OCR on Linux with Tesseract

June 30, 2024
Linux
Tesseract
OCR

One thing I have wanted on my desktop for a long time is OCR (optical character recognition). It allows quick copying of text from images and from other places where text cannot be selected.

On Windows this is provided by the Text Extractor tool [1] in PowerToys. A keyboard shortcut opens a screenshot-like overlay in which you select the area of the screen containing the text you need. The text is then extracted and copied to the clipboard.

I felt I could get a similar experience on Linux, and came across an open source program called Tesseract [2], which provides an OCR engine and a command line tool.

Following the same approach as the Windows tool, on Wayland we can use slurp to select an area and grim to take a screenshot of it, pipe that screenshot to tesseract, and finally use wl-copy to copy the extracted text to the clipboard.

For Xorg-based window managers, a similar approach works with flameshot for the screenshot and xclip for copying to the clipboard.

Installation

  • Ubuntu
    $ sudo apt install tesseract-ocr libtesseract-dev
    
  • Fedora
    $ sudo dnf install tesseract tesseract-devel
    
  • Arch
    $ sudo pacman -S tesseract
    

Verify the installation by running the following command:

$ tesseract --version
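
Depending on the distribution, the language data that Tesseract needs may be packaged separately from the engine, so it can be worth checking that the language you want is available. The following is a minimal sketch, assuming English (eng) is the target language. The helper name has_lang is hypothetical; --list-langs is an existing tesseract option.

```shell
# has_lang LANG: succeeds if LANG appears as a whole line on stdin.
# Meant to be fed the output of "tesseract --list-langs", which prints
# a header line followed by one language code per line.
has_lang() {
    grep -qx -- "$1"
}

# Example use (requires tesseract to be installed):
#   tesseract --list-langs | has_lang eng && echo "English data found"
```

If the check fails, the distribution's package search should list the available Tesseract language packs.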

For Wayland-based window managers, grim, slurp and wl-copy (from the wl-clipboard package) also need to be installed:

  • Ubuntu
    $ sudo apt install grim slurp wl-clipboard
    
  • Fedora
    $ sudo dnf install grim slurp wl-clipboard
    
  • Arch
    $ sudo pacman -S grim slurp wl-clipboard
    

For Xorg-based window managers, flameshot and xclip also need to be installed:

  • Ubuntu
    $ sudo apt install flameshot xclip
    
  • Fedora
    $ sudo dnf install flameshot xclip
    
  • Arch
    $ sudo pacman -S flameshot xclip
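
Before moving on, it may help to confirm that every command the pipeline relies on is actually on the PATH. This is a small sketch using the POSIX "command -v" builtin. The function name missing_tools is made up for illustration.

```shell
# missing_tools TOOL...: print the name of each tool not found on PATH,
# one per line. Prints nothing when everything is installed.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use on Wayland:
#   missing_tools tesseract grim slurp wl-copy
# Example use on Xorg:
#   missing_tools tesseract flameshot xclip
```

Empty output means nothing is missing. Any name printed points to a package that still needs installing.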
    

Usage

On Wayland

Use grim and slurp to take a screenshot of an area, pipe it to tesseract, and copy the extracted text to the clipboard:

$ grim -g "$(slurp)" - | tesseract stdin stdout | wl-copy
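
One rough edge of the one-liner is that cancelling the selection still runs the rest of the pipeline, which can leave the clipboard empty. The sketch below is one possible way to guard against that, not a definitive implementation. It assumes that slurp exits with a non-zero status when the selection is cancelled and that tesseract ends its text output with a form feed page separator. The function names are hypothetical.

```shell
# Strip the form feed character that tesseract may append as a page separator.
clean_ocr_text() {
    tr -d '\f'
}

# Select a region, OCR it, and copy the result only if text was recognized.
ocr_region_to_clipboard() {
    # Assumption: slurp returns non-zero when the selection is cancelled.
    region=$(slurp) || return 1
    text=$(grim -g "$region" - | tesseract stdin stdout 2>/dev/null | clean_ocr_text)
    # Leave the clipboard untouched when nothing was recognized.
    [ -n "$text" ] || return 1
    printf '%s' "$text" | wl-copy
}

# To run it: ocr_region_to_clipboard
```

Saved as a script, this could be called from a key binding in place of the one-line pipeline.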

This can also be bound to a key combination. On Hyprland, for example, the following binds it to ALT + SHIFT + S:

bind = ALT_SHIFT,s,exec,grim -g "$(slurp)" - | tesseract stdin stdout | wl-copy

On Xorg

$ flameshot gui --raw | tesseract stdin stdout | xclip -in -selection clipboard
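
If the same dotfiles are used on both kinds of session, the two pipelines can be wrapped in a single function that picks one at run time. This is a sketch only. It assumes the compositor sets WAYLAND_DISPLAY and that DISPLAY is set under Xorg. Wayland is checked first because DISPLAY is often also set there through XWayland. The function names are invented for illustration.

```shell
# Print "wayland", "x11" or "unknown" based on the session's environment.
detect_session() {
    if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo wayland
    elif [ -n "${DISPLAY:-}" ]; then
        echo x11
    else
        echo unknown
    fi
}

# Run the matching screenshot -> OCR -> clipboard pipeline.
ocr_to_clipboard() {
    case "$(detect_session)" in
        wayland)
            grim -g "$(slurp)" - | tesseract stdin stdout | wl-copy
            ;;
        x11)
            flameshot gui --raw | tesseract stdin stdout | xclip -in -selection clipboard
            ;;
        *)
            echo "no graphical session detected" >&2
            return 1
            ;;
    esac
}

# To run it: ocr_to_clipboard
```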

Footnotes

  1. https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

  2. https://github.com/tesseract-ocr/tesseract