What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now (see the projects linked below).
None of these have stuck, right? And none of them work well enough that web dev agencies no longer have to worry about e2e testing. (Or do some of them? Maybe the market is simply that inefficient.)
I don't see this being a solution for full e2e regression testing. Having to run inference for each command/test seems expensive. I do think there's room for self-healing tests after failure.
If this works well enough, couldn't you save the selectors and only run inference the first time the test runs and again when the UI has changed? Cheaper than dev?
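A rough sketch of what that caching could look like, to make the idea concrete. Everything here is hypothetical: `click` stands in for whatever browser driver you use, `infer_selector` stands in for the LLM call, and `SelectorNotFound` is a made-up exception, not part of any real library. The point is only that inference runs when there is no cached selector or when the cached one stops matching.

```python
from typing import Callable, Dict, Optional


class SelectorNotFound(Exception):
    """Hypothetical error the browser driver raises when a selector matches nothing."""


class CachedSelectorRunner:
    """Runs natural-language test steps, calling the model only on cache misses."""

    def __init__(
        self,
        click: Callable[[str], None],          # assumed driver action, e.g. a page click
        infer_selector: Callable[[str], str],  # assumed LLM call: step text -> selector
        cache: Optional[Dict[str, str]] = None,
    ) -> None:
        self.click = click
        self.infer_selector = infer_selector
        self.cache: Dict[str, str] = {} if cache is None else cache
        self.inference_calls = 0  # lets you see how often the expensive path runs

    def _infer_and_store(self, step: str) -> str:
        self.inference_calls += 1
        selector = self.infer_selector(step)
        self.cache[step] = selector
        return selector

    def run_step(self, step: str) -> str:
        selector = self.cache.get(step)
        if selector is not None:
            try:
                # Cheap path: replay the saved selector without any inference.
                self.click(selector)
                return selector
            except SelectorNotFound:
                # The UI changed under us; fall through and re-infer ("self-heal").
                pass
        selector = self._infer_and_store(step)
        self.click(selector)  # if this also fails, the error surfaces as a real test failure
        return selector
```

The `cache` dict could be dumped to a JSON file next to the test so that CI runs replay saved selectors and only pay for inference when something breaks.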
That would be a good feature, though there are already record-and-playback tools that do just that. I really only see this for low-code users who want to automate a novel task.
- https://github.com/mayt/BrowserGPT
- https://github.com/TaxyAI/browser-extension
- https://github.com/browser-use/browser-use
- https://github.com/Skyvern-AI/skyvern
- https://github.com/m1guelpf/browser-agent
- https://github.com/richardyc/Chrome-GPT
- https://github.com/handrew/browserpilot
- https://github.com/ishan0102/vimGPT
- https://github.com/Jiayi-Pan/GPT-V-on-Web