If the branch predictor guesses wrong, does the CPU have to spend time undoing the steps it has already done? Or are pipelines long enough that the misprediction is detected before registers/pointers are updated?
The results of partially executed instructions aren't "committed" yet, so the pipeline is flushed (its contents are thrown away), and execution starts again with an empty pipeline at the corrected address. IIRC, each stage of a pipeline has its own internal registers. The registers your code knows about are not updated until the end of the pipeline, and the branch instruction is ahead of everything that was executed speculatively after it, so none of the speculative results are stored before the mistake is identified.
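To make the "not committed yet" idea concrete, here is a toy sketch in Python. It is not how any real CPU is built: the function name `resolve_branch`, the register names, and the list of pending writes are all made up for illustration. It only shows that speculative results sit in a side buffer and either retire or get dropped when the branch resolves.

```python
def resolve_branch(registers, speculative_writes, predicted_taken, actually_taken):
    """Return the visible (architectural) registers after the branch resolves.

    registers          -- dict of register name -> value the program can see
    speculative_writes -- list of (register, value) produced past the branch
    Both arguments are hypothetical stand-ins for real hardware structures.
    """
    if predicted_taken == actually_taken:
        # Correct prediction: the pending results retire into visible state.
        return {**registers, **dict(speculative_writes)}
    # Misprediction: the pending results are simply dropped. Nothing was
    # written to visible state, so there is nothing to roll back.
    return dict(registers)


regs = {"r1": 5, "r2": 7}
pending = [("r1", 99), ("r3", 1)]  # work done speculatively after the branch

print(resolve_branch(regs, pending, predicted_taken=True, actually_taken=False))
print(resolve_branch(regs, pending, predicted_taken=True, actually_taken=True))
```

On the mispredicted call the visible registers come back unchanged, which is the whole point: discarding is free, only the refill costs time.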
So, there's no work required to "undo" those mistakes, but starting from an empty pipeline still means you're a dozen (or two) clock cycles from getting back to where you would have been with a successful branch prediction, which is why it is important for the CPU to predict correctly as often as it can.
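For a rough feel of why prediction accuracy matters, here is a back-of-the-envelope model in Python. The formula is the usual textbook approximation, and every number in it (base CPI of 1.0, 20% branches, a 20-cycle refill) is an assumption picked for illustration, not a measurement of any particular CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are included.

    base_cpi        -- cycles per instruction with perfect prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches predicted wrongly
    flush_penalty   -- cycles lost refilling the pipeline per misprediction
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Assumed workload: 1 in 5 instructions is a branch, 20-cycle refill.
for rate in (0.01, 0.05, 0.20):
    cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                        mispredict_rate=rate, flush_penalty=20)
    print(f"mispredict rate {rate:.0%}: CPI ~ {cpi:.2f}")
```

With these made-up numbers, going from a 5% to a 20% misprediction rate raises the average from about 1.2 to about 1.8 cycles per instruction, and a longer pipeline (larger `flush_penalty`) makes the same miss rate hurt more.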
Not undoing AFAIK, as it won't have committed the results before the branch resolves. However, it will have to throw away all the speculative work it did and start again from the correct branch target. This is called a pipeline flush, and the time lost refilling the pipeline is the branch misprediction penalty.
The Pentium 4 in particular suffered here, because its long pipeline made every misprediction expensive. The successor to the Pentium 4 was an evolved Pentium III core with a much improved branch predictor and larger cache[1], which helped it outperform the Pentium 4[2].