You might not care about the implementation for a single call that only hits the happy path, but as soon as you make more than one call or have to deal with failures, the implementation definitely matters. And I think that FP makes it easier to build composable abstractions on top of the underlying code.
While in theory composability and encapsulation are orthogonal concerns when using OOP, in practice (for at least Java and C#) I find that there's often tension between the two.
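To make the "more than one call, plus failures" point concrete, here is a minimal sketch in Rust rather than Java or C#. Everything in it is hypothetical and only for illustration: `ConfigError`, `lookup`, `parse_port` and `port_from` are made-up names, and the key/value slice stands in for whatever the underlying calls really are.

```rust
use std::num::ParseIntError;

// Hypothetical error type covering both ways the pipeline can fail.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing(&'static str),
    BadNumber(ParseIntError),
}

// First fallible step: find a key in a list of key/value pairs.
fn lookup<'a>(pairs: &[(&str, &'a str)], key: &'static str) -> Result<&'a str, ConfigError> {
    pairs
        .iter()
        .find(|(k, _)| *k == key)
        .map(|(_, v)| *v)
        .ok_or(ConfigError::Missing(key))
}

// Second fallible step: turn the raw text into a port number.
fn parse_port(raw: &str) -> Result<u16, ConfigError> {
    raw.trim().parse::<u16>().map_err(ConfigError::BadNumber)
}

// Composition: the second step runs only if the first succeeded,
// and either error comes back through the same return type.
fn port_from(pairs: &[(&str, &str)]) -> Result<u16, ConfigError> {
    lookup(pairs, "port").and_then(parse_port)
}
```

Neither step knows about the other, and the caller of `port_from` sees one return type regardless of which step failed. That is the kind of composition I mean; it is one way to do it, not the only one.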
VHDL/Verilog do, but you're not going to write many apps in either.
At a high level:
1. Brute force with automated tests. Great if you have known datasets and platforms.
2. Work from most constrained hardware first. Easy to say, hard to do. Back in the X360/PS3 days almost everyone screwed this up and developed for X360 first.
3. If you need to do N of the same things fast, use a contiguous array. If you want to enforce that, make the array part of your API. CPU prefetchers are amazing and love predictable memory patterns.
4. Rust is one of the few languages that bakes semantics into the language that line up well with modern architectures. Specifically, because mutable references cannot alias, Rust can apply C-style restrict semantics automatically. It also forces you to think about ownership upfront in a way that tends to be performance friendly.
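Points 3 and 4 can be sketched together. The function below is a hypothetical example (`scaled_add` is not from any library): it takes slices, which are contiguous by definition, so the layout is part of the API, and the borrow rules guarantee that `dst` and `src` cannot overlap, which is the guarantee C needs `restrict` for.

```rust
/// Adds `scale * src[i]` to every `dst[i]`.
///
/// Both parameters are slices, so callers must hand over contiguous memory.
/// `dst` is an exclusive borrow, so it cannot overlap `src`; the compiler is
/// free to use that as a no-alias hint when optimizing the loop.
pub fn scaled_add(dst: &mut [f32], src: &[f32], scale: f32) {
    assert_eq!(dst.len(), src.len(), "slices must have equal length");
    // Linear walk over both slices: a predictable access pattern.
    for (d, s) in dst.iter_mut().zip(src) {
        *d += scale * *s;
    }
}
```

Whether the loop actually gets vectorized depends on the compiler version, flags and target, so treat this as an illustration of the semantics rather than a performance promise. Trying to pass the same buffer as both `dst` and `src` is rejected at compile time, which is the "think about ownership upfront" part.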