When reading manga, it is often useful to quickly OCR a portion of the screen to look up new words and add sentences to Anki. To do so, we're going to use an optical character recognition program and a few helper tools.
Install the following dependencies:
$ sudo pacman -S --needed sxiv maim tesseract xclip imagemagick unzip
- sxiv is an excellent image viewer. For this setup you can replace it with any image viewer, but sxiv is what I use.
- tesseract is the OCR engine. It is considered fairly accurate, and many people like it.
- maim is a screenshot utility that can capture a selected part of the screen.
- xclip is a tool for copying text to the clipboard.
- imagemagick is a command-line image editor. It's going to come in handy for editing the screenshots before Tesseract analyzes them.
- unzip is a tool for extracting zip archives.
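To confirm that the tools listed above are actually available after installation, a small check like the following may help. It is only a sketch: the function name missing_deps is made up for illustration, and it assumes that the imagemagick package provides the magick command (ImageMagick 7; on ImageMagick 6 the command is convert).

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Prints nothing if everything is installed.
missing_deps sxiv maim tesseract xclip magick unzip
```

If the last line prints any names, install the corresponding packages before continuing.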
maimocr is a script we are going to use to recognize Japanese text. Save it as ~/.local/bin/maimocr.
Make the file executable:
$ chmod +x ~/.local/bin/maimocr
~/.local/bin should be in your PATH.
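One way to check whether ~/.local/bin is listed in PATH is sketched below. The helper name in_path is hypothetical, and the suggested export line assumes a POSIX-style shell profile such as ~/.profile.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory is an entry in PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_path "$HOME/.local/bin"; then
    echo "~/.local/bin is in PATH"
else
    echo "~/.local/bin is missing; consider adding this to your shell profile:"
    echo 'export PATH="$HOME/.local/bin:$PATH"'
fi
```

Wrapping PATH in colons lets the pattern match the first and last entries as well as the ones in the middle.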
Bind this script to any key in your DE, WM, sxhkd, xbindkeys, etc. Here's an example for i3wm:
bindsym $mod+o exec --no-startup-id maimocr
Now you can quickly call maimocr anywhere by pressing the keyboard shortcut.
Tesseract doesn't work without trained data files.
These files tell Tesseract how to read and recognize text from images.
When you first run maimocr, it should download Japanese data files automatically.
Check the terminal output to see if the process succeeds.
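Besides reading the terminal output, you can check whether the data files are actually on disk. The sketch below assumes that they end up in ~/.local/share/tessdata and that the languages of interest are jpn and jpn_vert (Tesseract's names for horizontal and vertical Japanese). The script may store its files elsewhere or download a different set. The helper name has_traineddata is hypothetical.

```shell
#!/bin/sh
# Assumed data directory; the actual script may use a different one.
TESSDATA_DIR="${TESSDATA_DIR:-$HOME/.local/share/tessdata}"

# Hypothetical helper: succeed if <lang>.traineddata exists and is
# non-empty. An optional second argument overrides the directory.
has_traineddata() {
    [ -s "${2:-$TESSDATA_DIR}/$1.traineddata" ]
}

for lang in jpn jpn_vert; do
    if has_traineddata "$lang"; then
        echo "$lang: found"
    else
        echo "$lang: missing"
    fi
done
```

Running tesseract --list-langs --tessdata-dir ~/.local/share/tessdata should give a similar picture from Tesseract's own point of view.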
When you run it the second time, maimocr will ask you to select an area with Japanese text and try to OCR it.
The resulting text will be saved to the system clipboard.
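To give a rough idea of how a select-and-OCR step like this can be wired together, here is a minimal sketch. It is not the maimocr script itself. The function name ocr_selection, the preprocessing settings, the data directory and the whitespace stripping are all assumptions made for illustration, and the magick command assumes ImageMagick 7.

```shell
#!/bin/sh
# Assumed data directory; adjust to wherever the *.traineddata files live.
TESSDATA_DIR="${TESSDATA_DIR:-$HOME/.local/share/tessdata}"

# Hypothetical sketch of the pipeline:
# 1. maim -s lets you select a region and writes a PNG to stdout.
# 2. magick converts it to grayscale and upscales it, since Tesseract
#    tends to do poorly on small text.
# 3. tesseract reads the image from stdin and prints the text to stdout.
# 4. tr removes the spaces and newlines Tesseract often inserts
#    between Japanese characters.
# 5. xclip stores the result in the system clipboard.
ocr_selection() {
    maim -s |
        magick - -colorspace Gray -resize 300% png:- |
        tesseract stdin stdout -l jpn --tessdata-dir "$TESSDATA_DIR" |
        tr -d ' \n' |
        xclip -selection clipboard
}
```

For vertical text, replacing -l jpn with -l jpn_vert and adding --psm 5 may give better results, provided the corresponding data file is installed.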
Use it in combination with Yomichan Search to quickly look up Japanese words in real time.
To open Yomichan Search, open your web browser and press Alt+Insert. Yomichan should already be installed.
To use additional data files, copy any new *.traineddata files to ~/.local/share/tessdata.
These instructions are no longer necessary. The files are included by default.
We won't need the Capture2Text program itself because it's garbage, but the trained data files are going to be useful.
Extract the contents of the tessdata folder to ~/.local/share/tessdata:
$ unzip -j Capture2Text_v*_64bit.zip 'Capture2Text/tessdata/*' -d ~/.local/share/tessdata
Alternatively, download just the Capture2Text Japanese files from here.
Contents of the ZIP archive.
If you notice that the script fails to OCR certain images, try zooming in or finding a scan with a higher resolution. Tesseract works poorly at low resolutions.
Nonstandard fonts often fail to OCR properly.
I don't have a definitive answer for this case at the moment.
Try searching for more *.traineddata files online and adding them to the tessdata folder.
If you want to add a screenshot from a manga to your Anki card, maim can do that too.
is a script that uses
maim to screenshot parts of the screen and copy them to the clipboard.
Install it to the same location as maimocr, make it executable, and bind it to a key.
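For reference, the core of a screenshot-to-clipboard helper can be very short. The following is a hypothetical sketch, not the script mentioned above, and the function name screenshot_to_clipboard is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: select a region with maim (-s), hide the mouse
# cursor (-u), and hand the PNG data to xclip as an image/png target
# so that it can be pasted as an image.
screenshot_to_clipboard() {
    maim -s -u | xclip -selection clipboard -t image/png
}
```

After calling the function and selecting a region, the image should be available to paste wherever image data from the clipboard is accepted, such as a field in the Anki card editor.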
Note: ames is another program that can add screenshots to Anki.
- kanjitomo. It's quite bloated and forces you to use a Japanese to English dictionary instead of a Japanese to Japanese one.
- manga-ocr. Can be used to OCR Japanese text instead of Tesseract. Unfortunately, I haven't been able to install it and can't comment on it.