Automatic Speech Recognition in Linux - Seeking Experiences and Recommendations

njordomir@lemmy.world · 2 years ago

RmDebArc_5@sh.itjust.works · 2 years ago

I use Speech Note for STT/TTS and it works great. You can choose between different models; I use Whisper (more accurate) or Vosk (faster). You don’t need a GPU, but one will speed things up greatly.

njordomir@lemmy.world · 2 years ago

I was able to quickly set up Speech Note and use Whisper (base) without issue, and it saved me over 80% of the work I would otherwise have had to do manually. Thank you for the recommendation.

just_another_person@lemmy.world · 2 years ago

Depends on what the audio is. What’s the crisis?

Generally, you can run anything based on PyTorch on a CPU; it will just take substantially longer.

njordomir@lemmy.world · 2 years ago

Transcription of numerous voicemails and phone calls for a legal matter. I would like to supply transcripts with the audio files so we don’t have to pay for as much of the paralegals’ time to review them and decide what is actually going to be useful.

just_another_person@lemmy.world · 2 years ago

Start with Whisper as someone else mentioned. DeepSpeech by Mozilla is another simple one.

Both are similar in performance and accuracy for normal spoken conversation with no extra auditory noise.
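
For anyone who would rather script this than use a GUI, here is a minimal sketch of batch transcription with Whisper on a CPU. It assumes the `openai-whisper` Python package and ffmpeg are installed; the helper names (`find_audio`, `transcribe_folder`, `whisper_cpu_transcriber`), the list of file extensions, and the `voicemails` folder are made up for illustration and not from this thread. The transcriber is passed in as a function, so another engine could be swapped in.

```python
from pathlib import Path
from typing import Callable

# Assumed set of extensions; adjust to whatever your recordings use.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".ogg", ".flac"}


def find_audio(folder: Path) -> list[Path]:
    """Return the audio files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def transcribe_folder(folder: Path, transcribe: Callable[[Path], str]) -> list[Path]:
    """Write '<audio name>.txt' next to each audio file; return the new paths."""
    written = []
    for audio in find_audio(folder):
        out = audio.with_name(audio.name + ".txt")
        if out.exists():  # skip files already transcribed so reruns are cheap
            continue
        out.write_text(transcribe(audio).strip() + "\n", encoding="utf-8")
        written.append(out)
    return written


def whisper_cpu_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Build a transcriber backed by openai-whisper running on the CPU."""
    import whisper  # lazy import: requires `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name, device="cpu")

    def run(audio: Path) -> str:
        # fp16=False because half precision is not used on CPU
        return model.transcribe(str(audio), fp16=False)["text"]

    return run
```

Usage would look something like `transcribe_folder(Path("voicemails"), whisper_cpu_transcriber("base"))`. Each transcript lands beside its audio file, which makes it easy to hand over both together.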

njordomir@lemmy.world · 2 years ago

Whisper worked for me. I’ll have to go back through and tag speakers and fix a few spots, but you guys have saved me 80-90% of the work. Thank you.
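
Since Whisper does not label speakers on its own, one way to make the manual tagging pass easier is to write the transcript out one timestamped segment per line with a placeholder speaker. The sketch below assumes the list of segment dictionaries (with `start`, `end`, and `text` keys) that the `openai-whisper` package returns under `result["segments"]`; the function names and the `SPEAKER ?` placeholder are hypothetical, and the sample segment is made up.

```python
def fmt_time(seconds: float) -> str:
    """Format a number of seconds as H:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments, speaker="SPEAKER ?"):
    """Turn segment dicts into '[start - end] speaker: text' lines."""
    return [
        f"[{fmt_time(seg['start'])} - {fmt_time(seg['end'])}] "
        f"{speaker}: {seg['text'].strip()}"
        for seg in segments
    ]


# Made-up segment in the shape Whisper returns, for illustration only.
sample = [{"start": 0.0, "end": 4.2, "text": " Hi, this is a test."}]
print("\n".join(segments_to_lines(sample)))
```

The placeholder can then be replaced by hand with the right name on each line, and the timestamps make it quick to jump to the matching spot in the audio.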