Can anyone explain why this requires a relatively high-end GPU? Looking at the slo-mo GIFs, it looks like `brightness *= SomeLUT[(y + t) % sizeOfTheLUT]` for each colour channel would do the trick.
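For concreteness, the simplified version I'm picturing is something like the sketch below (plain C, and the names `SomeLUT`, `LUT_SIZE`, `apply_scanline_lut` are all made up, not anything from the actual simulator):

```c
#include <stddef.h>

#define LUT_SIZE 256          /* assumed LUT length                        */
#define CHANNELS 3            /* interleaved R, G, B subpixels             */

/* Hypothetical precomputed beam-intensity curve. */
extern const float SomeLUT[LUT_SIZE];

/* frame: width*height*CHANNELS interleaved floats.
 * t:     scroll offset for this refresh cycle, so the bright band rolls
 *        down the image like a scanning beam.                            */
void apply_scanline_lut(float *frame, size_t width, size_t height, size_t t)
{
    for (size_t y = 0; y < height; ++y) {
        float gain = SomeLUT[(y + t) % LUT_SIZE];
        for (size_t x = 0; x < width * CHANNELS; ++x)
            frame[y * width * CHANNELS + x] *= gain;
    }
}
```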
You need to keep the GPU free to work on the game; doing CRT simulation of 60 fps content on a 480 Hz display requires 8 brand-new frames per videogame frame, and it's doing a bunch of math operations per subpixel per refresh cycle. If you run it at full resolution, that's 2560x1440x480x3 — a lot of processing.
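For scale: 480 Hz / 60 fps = 8 simulated refresh cycles per game frame, and 2560 × 1440 × 480 × 3 ≈ 5.3 billion per-subpixel evaluations every second, even before any of the other CRT filtering.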
Especially since it also uses a variable-MPRT algorithm that cascades the brightest pixels into subsequent refresh cycles.
That's why it's coming to RetroArch: it's best to process the low-resolution framebuffers first, before scaling them and sending them through CRT filters, simulated curvature, etc.
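Very roughly, the cascading idea mentioned above works something like this sketch (my reading of the concept, not the shipped shader; the 1.0 panel cap and all names are illustrative):

```c
#define SLICES_PER_FRAME 8   /* 480 Hz panel / 60 fps content */

/* To keep average brightness, a subpixel lit for only one of the 8
 * refresh slices has to be boosted 8x. If that exceeds the panel's
 * maximum (1.0 here), the excess spills ("cascades") into the
 * following slices, so bright pixels persist longer (higher MPRT)
 * while dim pixels flash for a single slice (lowest MPRT).
 *
 * target: source brightness 0..1 for this subpixel and game frame.
 * out:    brightness emitted on each of the 8 refresh slices.       */
void cascade_subpixel(float target, float out[SLICES_PER_FRAME])
{
    float remaining = target * SLICES_PER_FRAME;  /* total energy owed */
    for (int s = 0; s < SLICES_PER_FRAME; ++s) {
        float emit = remaining < 1.0f ? remaining : 1.0f;
        out[s] = emit;
        remaining -= emit;                        /* carry the rest forward */
    }
}
```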
What makes it so complicated?