I made something pretty similar over winter break so I could have something read books to me. ... Then it turned into a prompting mechanism, of course! It uses Whisper, Ollama, and TTS from CoquiAI. It's written in shell and should hopefully be POSIX-compliant, but it does depend on zenity, the GNOME dialog tool I have on Ubuntu; I'm not sure how widely available zenity is elsewhere.
https://github.com/jcmccormick/runtts
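
On the zenity question, one way to hedge against it being missing is to check for it with `command -v` and fall back to a plain terminal prompt. This is only a rough sketch of that idea, not what runtts actually does; the `ask` function and the `ASK_NO_GUI` variable are names made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper (not taken from runtts): ask the user for one line of text.
# Uses zenity when it is installed and a graphical session looks available,
# otherwise falls back to reading from the terminal.
# Setting ASK_NO_GUI=1 (an assumed knob) forces the terminal fallback.
ask() {
    prompt=$1
    if [ -z "${ASK_NO_GUI:-}" ] \
        && command -v zenity >/dev/null 2>&1 \
        && [ -n "${DISPLAY:-}${WAYLAND_DISPLAY:-}" ]; then
        # zenity prints the entered text on stdout.
        zenity --entry --title="Prompt" --text="$prompt"
    else
        # Show the prompt on stderr so stdout carries only the answer.
        printf '%s ' "$prompt" >&2
        IFS= read -r reply || [ -n "$reply" ] || return 1
        printf '%s\n' "$reply"
    fi
}

# Example use:
#   answer=$(ask "What should I read next?") || exit 1
```

Keeping the prompt on stderr means the result can be captured with `$(...)` the same way in both branches.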