Well, reducing the sheer number of branches in your code has exactly the same impact as improving the predictor's performance. So no, I don't think you can draw this conclusion from the article.
The article is just biased toward improving the CPU and ignores any opportunity to improve your program, which is understandable, because it's about CPU architectures.
In practice, you will want to improve whatever part you have the freedom to change; you usually cannot choose which.
That said, I have seen people tricked into thinking branches are only if statements. If you use polymorphism to reduce if statements, you'll still want to reduce the variety of types sent through the code. Well, strictly speaking you just want the sequence of types to be predictable, I suppose?
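To make that concrete, here's a minimal sketch (the Shape/Circle/Square names are made up for illustration): the virtual call is one indirect branch, and grouping the objects by dynamic type makes that branch's target stable for long runs:

```cpp
#include <algorithm>
#include <memory>
#include <typeinfo>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual float area() const = 0;
};

struct Circle : Shape {
    float r = 1.0f;
    float area() const override { return 3.14159f * r * r; }
};

struct Square : Shape {
    float s = 1.0f;
    float area() const override { return s * s; }
};

float total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    // Each s->area() is an indirect branch. If Circles and Squares are
    // interleaved randomly, the target keeps flipping; if they're grouped
    // by type, the target is stable for long runs and easy to predict.
    float sum = 0.0f;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}

// One hypothetical way to get the grouping: sort by dynamic type.
void group_by_type(std::vector<std::unique_ptr<Shape>>& shapes) {
    std::sort(shapes.begin(), shapes.end(),
              [](const auto& a, const auto& b) {
                  return typeid(*a).before(typeid(*b));
              });
}
```

In real code you'd more likely keep a separate container per type in the first place rather than sorting after the fact.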
I've been wondering about this recently. Modern branch predictors are really good at predicting correlated conditional branches. That is, if you have two if statements near each other that use the same condition, modern branch predictors have a very good chance of learning to predict the second one perfectly based on what the first one did.
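Something like this toy loop is what I mean; the second branch is fully determined by the first, so a history-based predictor can nail it even when the condition itself is random:

```cpp
#include <vector>

int process(const std::vector<int>& data, int threshold) {
    int acc = 0;
    for (int x : data) {
        if (x > threshold)   // hard to predict if data is random
            acc += x;
        if (x > threshold)   // perfectly correlated with the branch above
            acc -= 1;
    }
    return acc;
}
```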
Is the same true for indirect calls, i.e. virtual function calls? That could be quite the powerful optimization, but it's probably really hard to do.
My understanding is that this is, essentially, the optimization you get with many common "entity component system" frameworks. That is, you typically try to keep homogeneous collections of entities so that when you process them, you are doing the same work over and over in a tight loop.
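Roughly: instead of a vector of base-class pointers with a virtual call per entity, you keep flat per-component arrays and run one monomorphic loop over each. A minimal sketch with made-up component names:

```cpp
#include <cstddef>
#include <vector>

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// The update loop has no indirect calls at all: the "dispatch" happened
// once, when entities were sorted into homogeneous component arrays.
void integrate(std::vector<Position>& pos,
               const std::vector<Velocity>& vel, float dt) {
    for (std::size_t i = 0; i < pos.size(); ++i) {
        pos[i].x += vel[i].dx * dt;
        pos[i].y += vel[i].dy * dt;
    }
}
```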
Things are getting way too high-level here. Nobody will be able to answer your question because it depends on an overwhelming number of details.
Your compiler's capabilities will have a much larger influence on your code's performance than the low-level properties of your CPU. As always, it's a matter of profiling and discovering what is actually happening. Most of the time when people use polymorphism, it's free.
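One reason it's often free: compilers devirtualize when they can prove the dynamic type. A tiny sketch (worth confirming against your own compiler's output):

```cpp
struct Base {
    virtual int f() const { return 1; }
};

struct Derived final : Base {
    int f() const override { return 2; }
};

int call(const Derived& d) {
    // Derived is final, so d.f() cannot be overridden further; most
    // optimizers turn this into a direct call and usually inline it,
    // at which point there is no indirect branch left to predict.
    return d.f();
}
```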
Agreed, it's tough to answer. Highly polymorphic code is often not that diverse, for one. And many languages that encourage polymorphism run on a JIT, which can be seen as a higher-level branch predictor; predictably, it rewards code paths that don't have a lot of diversity in the types being sent through them.