> Please leave recursion to math and keep it out of (in particular C) software: it kills and will kill again.
This is just nonsense. The issue is doing an unbounded amount of resource consuming work. Don't do an unbounded amount of resource consuming work, regardless of whether that work is expressed in a recursive or iterative form.
Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
You know, I got spoiled by Haskell, doing recursion everywhere without a care, and all I had to think was the evaluation order (when things blew up). Now that I'm doing some OCaml, I have to stop and think "am I writing a tail recursive function". It's easy to write multiple levels of recursion and lose track if you're writing a tail recursive function that the compiler will optimize.
I think recursions are really easy to make unbounded by mistake. Maybe not so much as for loops and off by ones.
Agreed, and this is why some languages have annotations that ask the compiler to check a function is indeed tail recursive.
However I don't think that is the case in Expat. If the algorithm is tail recursive, but accidentally not expressed in that way, its a simple (but perhaps tedious to manually apply) program transform to express it in a way that does not consume unbounded memory (heap or stack). From the scant details on the fix in the article it appears the parsing algorithm itself was completely changed. ("The third variant "Parameter entities" reuses ... the same mechanism of delayed interpretation.") If this is the case the issue is not recursion, as the original parsing algorithm would consume unbounded resources no matter how it was expressed in C.
I'm aware of the tailcall annotation but I didn't have to rely on it yet. For me the benefit of picking up OCaml is that I can do imperative constructs on a first pass (mutation and for loops), and refactor it after to pure code, when needed.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline)
A computer can chug through millions of loop iterations. I don't think tail recursion will ever result in heap exhaustion. But a stack overflow can be caused with tens of thousands of nested calls, which is tiny in comparison.
Recursion based stack overflow issues are easier to exploit because modern computer's stack space is much smaller relative to other resources.
>Sounds like a design issue in particular language implementations rather than a problem with the programming technique.
You say this like these are fundamentally separable things? I find this comment deeply confusing. Every single real software stack ever has a layer where the sausage gets made so to speak.
Stack overflow should not be a vulnerability for any modern tool chain. As to resource limits, LLVM has supported segmented stacks for something like a decade or maybe longer. Recursion is absolutely not the problem here. Outdated programming practices are.
In the general case? The failure to compile with -fsplit-stack when that's necessary for whatever your requirements are. The failure to enable the stack protector when ... pretty much always.
For this particular CVE? I'm not clear. Possibly none. The writeup didn't provide sufficient detail and I haven't bothered to wade through the code. There may well be a reason recursion won't work here but it certainly isn't general.
I'd be curious to know in this case why resource limits couldn't be enforced for the recursive implementation but could be for the iterative one.
> And it gets more complex when there's a bunch of state that requires you to save the stack frame in your explicit stack.
You are incorrect about tail recursion (look at continuation-passing style; it transforms any program to tail recursive form) but you are getting at the core of the issue here. The issue is either:
* whatever the parser is doing is already tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
In other words, you can shift memory usage from the stack to the heap, but if a client can give you an unbounded amount of memory consuming work you are screwed either way. Ultimately the problem has nothing to do with recursion.
CPS transforms all programs into tail calls, not tail recursive. It buys you TCE in trivially tail recursive functions but does not magically transform something that needs to preserve state across recursion into something that is stateless.
But you're right, that's what I'm getting at. Recursion is an inherent semantic and whether you make the stack explicit or implicit doesn't get rid of memory allocation problems. I said in another comment that stack allocation is still dynamic memory management, it's just something many people take for granted.
>Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.
Not every recursion can be transformed into tail recursion. If it's not simply tail recursion (which is the high level language compiler way of implementing the old tried and true assembly language optimization technique of making the last function call be a JMP instead of a JSR followed by an RTS, that I learned from WOZ's Apple ][ 6502 monitor ROM code), you still have to manage the state somewhere. There ain't no such thing as a free lunch.
And if not on the C stack, then it has to be somewhere else, like another stack or linked list, and then you still have to apply your own limits to keep it from being unlimited, so you're back where you started (but in a Sisyphusly tail call recursive way, you still have to solve the problem again).
>Greenspun's tenth rule of programming "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."
Of course you can always just pass a recursion depth limit parameter down the recursion and stop when it expires, even letting the caller decide how deep a recursion they want to permit, which is easier and safer than using malloc or cons or std::vector to roll your own stack.
Or the compiler could even do that for you, but you still have to figure out how to gracefully bottom out...
So if "Exception Handling" is hopefully failing upward, then "Depthception Handling" is hopelessly failing downward!
Depthception Handling is the vertical reflection of Exception Handling, with "implicit depth" recursion depth passing, like C++'s "implicit this" but deeper, not to be confused with the orthogonal "continuation passing" / "halt passing" dichotomy (passing HALT and CATCHFIRE instructions around but promising to never call them except in fatal emergencies):
"try" / "catch" = An active attempt to solve things.
"giveup" / "bottom" = Passive resignation to inevitability.
"throw" is an emergency exit UP.
"drop" is an emergency slide DOWN.
"try" => "giveup" introduces a futile attempt at unbounded recursion.
"catch" => "bottom" declares a bottoming out handler after a function signature, to guard its body below.
"throw" => "drop" immediately bottoms out from any depth!
fn doomed_recursion()
bottom { println!("Too deep, I give up."); }
{
giveup {
if going_too_deep() {
drop;
}
doomed_recursion();
}
}
And you can even bind the depth for explicit depth monitoring and control:
> Not every recursion can be transformed into tail recursion. If it's not simply tail recursion [...] you still have to manage the state somewhere. There ain't no such thing as a free lunch.
Ok, every recursion in a language that supports programmer-managed state can be transformed into a tail recursion. That still means any recursion in all serious programming languages and most joke programming languages too.
No, still false. Anything that recurses more than once cannot. A Fibonacci series, done by recursion, cannot be done by tail recursion (without memoization). A binary tree walker that visits all the notes (as opposed to searching for one node) cannot be done with tail recursion. And so on.
And creating continuations for recursion requires memory allocation. How recursive! You can't just recursively move the memory allocation problem around and call it solved or somebody else's problem.
At least start with a rigorous, well-specified, robust, high-performance, complete implementation of Common Lisp, if you're going to recurse infinitely.
I agree that algorithms that run in bounded stack space can still consume arbitrary amounts of memory, and that is the central flaw in the article that I'm responding to.
The issue is either:
* whatever the parser is doing is already trivially tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
> Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation.
Can't all recursive functions be transformed to stack-based algorithms? And then, generally, isn't there a flatter resource consumption curve for stack-based algorithms, unless the language supports tail recursion?
E.g. Python has sys.getrecursionlimit() == 1000 by default, RecursionError, collections.deque; and "Performance of the Python 3.14 tail-call interpreter":
https://news.ycombinator.com/item?id=43317592
> Can't all recursive functions be transformed to stack-based algorithms?
Yes, you can explicitly manage the stack. You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled.
The issue is either:
* whatever the parser is doing is already tail recursive, but not expressed in C in a way that doesn't consume stack. In this case it's trivial, but perhaps tedious, to convert it to a form that doesn't use unbounded resources.
* whatever the parser is doing uses the intermediate results of the recursion, and hence is not trivially tail recursive. In this case any reformulation of the algorithm using different language constructs will continue to use unbounded resources, just heap instead of stack.
It's not clear how the issue was fixed. The only description in the blog post is "It used the same mechanism of delayed interpretation" which suggests they didn't translate the existing algorithm to a different form, but changed the algorithm itself to a different algorithm. (It reads like they went from eager to lazy evaluation.)
Either way, it's not recursion itself that is the problem.
> You still consume memory at the same rate, but now it is a heap-allocated stack that is programmatically controlled, instead of a stack-allocated stack that is automatically controlled
To be precise; with precision:
In C (and IIUC now Python, too), a stack frame is created for each function call (unless tail call optimization is applied by the compiler).
To avoid creating an unnecessary stack frame for each recursive function call, instead create a collections.deque and add traversed nodes to the beginning or end enqueued for processing in a loop within one non-recursive function.
Is Tail Call Optimization faster than collections.deque in a loop in one function scope stack frame?
Yes, that is basically it. A tail call should be the same jump instruction as a loop, but performance really depends on the language implementation and is hard to make general statements about.
This is just nonsense. The issue is doing an unbounded amount of resource consuming work. Don't do an unbounded amount of resource consuming work, regardless of whether that work is expressed in a recursive or iterative form.
Any recursive function can be transformed into a tail recursive form, exchanging stack allocation for heap allocation. And any tail recursive program can be transformed into a loop (a trampoline). It's really not the language construct that is the issue here.