How certain are you that those challenges are "genuinely novel" and simply not accounted for in the training data?
I'm hardly an expert, but it seems intuitive to me that even if a problem isn't explicitly accounted for in publicly available training data, many underlying partial solutions to similar problems may be, and an LLM amalgamating that data could very well produce something that appears to be "synthesizing a new thought".
Essentially, instead of regurgitating an existing solution, it regurgitates everything around that solution, with a thin conceptual lattice holding it together.
No, most of programming is at least implicitly coming up with a human-language description of the problem and solution that isn't full of gaps and errors. LLM users often don't give themselves enough credit for how much thought goes into the prompt - likely because those thoughts come easily to humans, but not necessarily to LLMs.
Sort of related to how you need to specify the level of LLM reasoning not just to control cost, but because the non-reasoning model just goes ahead and answers incorrectly, and the reasoning model will "overreason" on simple problems. Being able to estimate the reasoning-intensiveness of a problem before solving it is a big part of human intelligence (and IIRC is common to all great apes). I don't think LLMs are really able to do this, except via case-by-case RLHF whack-a-mole.
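To make that concrete, here's a minimal sketch of what "deciding how hard to think before answering" looks like when the decision has to live outside the model, assuming the OpenAI Python SDK and a reasoning-capable model. The keyword heuristic, thresholds, and model name are purely illustrative placeholders, not a real solution to estimating reasoning-intensiveness.

```python
# Sketch: pick a reasoning effort level before asking the model,
# i.e. the judgment a human makes implicitly. The heuristic below
# is a deliberately crude stand-in.
from openai import OpenAI

client = OpenAI()

def estimate_effort(prompt: str) -> str:
    """Guess how reasoning-intensive the problem is (illustrative only)."""
    hard_signals = ["prove", "optimize", "edge case", "concurrency", "why does"]
    hits = sum(signal in prompt.lower() for signal in hard_signals)
    return "high" if hits >= 2 else "medium" if hits == 1 else "low"

def answer(prompt: str) -> str:
    # reasoning_effort is only accepted by reasoning-capable models;
    # "o3-mini" is just an example choice here.
    response = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort=estimate_effort(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The point of the sketch is that the routing step sits in hand-written glue code, not in the model itself - which is exactly the "estimate before solving" ability being discussed.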