Have you tried using the Accessibility API instead of (or alongside) taking screenshots? It won't work with every app, but you can fall back to OCR when it doesn't, and best of all, you can monitor the “DOM” for changes.
Candidly, I don't know how to do this effectively, especially with browsers. I looked into this approach using the notification pattern, but I just couldn't see a good way to do it. I'm no expert in Mac APIs and would love to learn and/or see any specific approaches you have in mind!
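For anyone following along, here is roughly what the notification pattern looks like in Swift. Treat it as a minimal sketch, not a tested recipe: it assumes the process already has Accessibility permission, it watches only the frontmost app, and the OCR fallback is just a placeholder print. How reliably value-change notifications arrive for child elements varies by app, and browsers may not expose page content until accessibility is switched on for them, which this sketch does not handle.

```swift
import AppKit
import ApplicationServices

// Called by the observer on each notification. It has to be a
// non-capturing closure because it is passed as a C function pointer.
let callback: AXObserverCallback = { _, element, notification, _ in
    var value: CFTypeRef?
    // Try to read the element's value; many elements do not have one.
    let err = AXUIElementCopyAttributeValue(element, kAXValueAttribute as CFString, &value)
    if err == .success, let text = value as? String {
        print("\(notification): \(text)")
    } else {
        // Placeholder (assumption): this is where a screenshot + OCR
        // fallback would be triggered.
        print("\(notification): no readable value, fall back to OCR")
    }
}

guard AXIsProcessTrusted() else {
    fatalError("Grant Accessibility permission to this process first")
}
guard let app = NSWorkspace.shared.frontmostApplication else {
    fatalError("No frontmost application")
}

let pid = app.processIdentifier
let appElement = AXUIElementCreateApplication(pid)

var observerRef: AXObserver?
guard AXObserverCreate(pid, callback, &observerRef) == .success,
      let observer = observerRef else {
    fatalError("Could not create AXObserver")
}

// Register on the application element for focus and value changes.
for name in [kAXFocusedUIElementChangedNotification, kAXValueChangedNotification] {
    let result = AXObserverAddNotification(observer, appElement, name as CFString, nil)
    if result != .success {
        print("Could not register \(name): \(result.rawValue)")
    }
}

// Notifications are delivered through the run loop, so keep it running.
CFRunLoopAddSource(CFRunLoopGetCurrent(), AXObserverGetRunLoopSource(observer), .defaultMode)
CFRunLoopRun()
```

The idea is to let the observer tell you when something changed and only take a screenshot when the element has nothing readable, instead of polling the screen on a timer.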