Is listening to TTS (text-to-speech) bad?

December 05, 2022 — Tatsumoto Ren

The robot voice doesn't sound like real Japanese. In particular, it makes a lot of pitch accent mistakes. Even if you set pitch accent aside, the computer-generated audio still sounds unnatural. You never want to feed your brain toxic input.

At the word level, the pitch accent data a TTS engine relies on may be wrong or outdated, and many words have more than one accepted accent. When the correct accent depends on usage, the algorithm often can't pick the right one.

At the sentence level, text-to-speech is even less accurate, because there are rules that modify the pitch accents of words within a sentence, and computers don't necessarily know these rules.

You should always listen to real native audio. For example, instead of generating text-to-speech audio for a book, download an audiobook. Instead of adding text-to-speech audio to your Anki cards, copy pronunciations from Qolibri, Forvo, or other sources (banks) that provide native audio. Also, try mining more from movies and TV shows, since they come with audio built in.

Tags: faq