Debugging experience aside, I've found that "-O3" is generally worth it if you also set "-march=native". For example, here are some run times for computing SHA256; you can see that there is slightly more to be gained going from -O2 to -O3 with -march=native:
This is basically SHA256 over ~8GB of data, averaged over 5 runs. The numbers are rather crude since I measured them just now, but I remember the difference being more significant when I first did this last month for https://news.ycombinator.com/item?id=40687942
Yeah, -march=native is amazing. I use it when compiling & benchmarking Rust code.
But - to anyone reading this later - please don’t do this blindly. You probably never want to distribute binaries with this flag set. It enables all the features available on the host CPU, so your build will change depending on the physical CPU you have installed. If you have a modern AMD CPU, it may enable AVX-512 extensions and make your binary unusable on many Intel CPUs.
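If you want to see what the flag actually changed, here is a rough sketch in Rust. It prints the CPU features the compiler was allowed to assume at build time next to the ones the current machine reports at run time. The function names (compiled_in_features, runtime_features) and the choice of three features to check are just mine for illustration. Note that in rustc the flag is spelled "-C target-cpu=native" rather than "-march=native".

```rust
// Sketch: compare the CPU features baked in at compile time with the ones
// the machine running the binary reports. Function names are illustrative.

/// Features the compiler was allowed to assume when building this binary.
/// This is the list that `-C target-cpu=native` changes.
fn compiled_in_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "avx512f") {
        found.push("avx512f");
    }
    found
}

/// Features reported by the CPU at run time (x86 / x86_64 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if std::arch::is_x86_feature_detected!("sse2") {
        found.push("sse2");
    }
    if std::arch::is_x86_feature_detected!("avx2") {
        found.push("avx2");
    }
    if std::arch::is_x86_feature_detected!("avx512f") {
        found.push("avx512f");
    }
    found
}

/// On other architectures this sketch reports nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    println!("compiled in: {:?}", compiled_in_features());
    println!("this CPU:    {:?}", runtime_features());
}
```

Build it once with plain "rustc" and once with "rustc -C target-cpu=native" and compare the first line. On a host that supports AVX2 or AVX-512, the native build should list more features, and every extra entry is an instruction set the binary may rely on unconditionally. A machine without it can fail with an illegal instruction, which is why the native build is fine for local benchmarking but risky to ship.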