As TFA says, the hard part is that "1" and "l" look the same in the selected typeface. Whether your OCR is done by computers or humans, you still have to deal with that problem somehow. You still need to do the part sketched out e.g. by pyrolistical in [1] and implemented by dperfect in [2].
In my amateur opinion, it's almost the opposite. For Plato, the material world, while "real" enough, is less important and in some sense less True than the higher immaterial world of Forms or Ideas. The highest, truest, realest world is "above" this one, related to cognition, and (more or less) accessible by reason. We may be in a cave, but all we have to do is walk up into the sunlight — which, by the way, is nothing but a higher and truer form of light than our current firelight. (This idea that material objects partake of their corresponding higher-level Ideas leads to the Third Man paradox: if it is the Form of Man that compasses similar material instances such as Socrates and Achilles, is there then a third... thing... that compasses Socrates, Achilles, and Man itself?)
For Kant, and therefore for Schopenhauer, the visible world is composed merely of objects, which are by definition only mental representations: a world of objects "exists" only in the mind of a subject. If there is a Thing-in-Itself (which even Kant does not doubt, if I recall correctly), it certainly cannot be a mental representation: the nature of the Thing-in-Itself is unknowable (says Kant) but certainly in no way at all like the mere object that appears to our mental processes. (Schopenhauer says the Thing-in-Itself is composed of pure Will, whatever that means.) The realest world is "behind" or "below" the visible one, completely divorced from human reason, and by definition completely inaccessible to any form of cognition (which includes the sensory perception we share with the animals, as well as the reason that belongs to humans alone). The Third Man paradox makes no sense at all for Kant, first because whatever the ineffable Thing-in-Itself is, it certainly won't literally "partake" of any mental concept we might come up with, and secondly because it would be a category error to suppose that any property could be true of both a mental object and a thing-in-itself, which are nothing alike. (The Thing-in-Itself doesn't even exist in time or space, nor does it have a cause. Time, space, and causality are all purely human frameworks imposed by our cognitive processes: to suppose that space has any real existence simply because you perceive it is, again, a category error, akin to supposing that the world is really yellow-tinged just because you happen to be wearing yellow goggles.)
> The C language is too complicated and too flexible to allow that.
I disagree. In fact, I'd expect something like the following to be a pretty reasonable exercise in a book like "Software Tools": "Write a program to extract all the function declarations from a C source file that does not contain any macro-preprocessor directives." This requires writing a full C lexer; a parser for function declarations (but for function and struct bodies you can do simple brace-matching); and nothing else. To make this tool useful in production, you must either write a full C preprocessor, or else use a pipeline to compose your tool with `cpp` or `gcc -E`. Which is the better choice?
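To make it concrete, here's a minimal sketch of the brace-matching core I have in mind (written in modern C++ purely for brevity, so obviously not what a 1970s implementation would look like; it assumes its input has already been through the preprocessor, and the file name `extract_decls.cpp` is made up):

    // extract_decls.cpp -- toy sketch: read an already-preprocessed C source
    // file on stdin and print a declaration for each top-level function
    // definition found in it.
    #include <cctype>
    #include <iostream>
    #include <iterator>
    #include <string>

    int main() {
        std::string src((std::istreambuf_iterator<char>(std::cin)),
                        std::istreambuf_iterator<char>());
        std::string chunk;   // text accumulated since the last top-level ';' or '}'
        int depth = 0;       // current brace-nesting depth
        for (size_t i = 0; i < src.size(); ++i) {
            char c = src[i];
            if (c == '"' || c == '\'') {                 // skip string and char literals
                for (++i; i < src.size() && src[i] != c; ++i)
                    if (src[i] == '\\') ++i;             // skip escaped characters
            } else if (c == '/' && i + 1 < src.size() && src[i+1] == '*') {
                size_t end = src.find("*/", i + 2);      // skip comments
                i = (end == std::string::npos) ? src.size() : end + 1;
            } else if (depth > 0) {
                if (c == '{') ++depth;
                else if (c == '}') --depth;
            } else if (c == '{') {
                // A '{' at file scope: if 'chunk' looks like a function header,
                // print it as a declaration. (A real tool would also have to
                // recognize struct definitions, aggregate initializers, etc.)
                while (!chunk.empty() && chunk.back() == ' ') chunk.pop_back();
                if (chunk.find('(') != std::string::npos)
                    std::cout << chunk << ";\n";
                chunk.clear();
                ++depth;
            } else if (c == ';') {
                chunk.clear();                           // non-function declaration; ignore
            } else {
                chunk += std::isspace((unsigned char)c) ? ' ' : c;
            }
        }
    }

With a modern compiler the pipeline would be something like `cc -E -P foo.c | ./extract_decls > foo.h` (the -P suppresses #line markers). A real version of the tool needs more care than this (for one thing, K&R-style parameter declarations, where the parameter types sit between the ')' and the '{', would defeat the naive chunking above), but that's the shape of it.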
However, I do see that the actual "Software Tools" book doesn't get as advanced as lexing/parsing; it goes only as far as the tools we today call `grep` and `sed`.
I certainly agree that doing the same for C++ would require a full-blown compiler, because of context-dependent constructs like `decltype(arbitrary-expression)::x < y > (z)`; but there's nothing like that in K&R-era C, or even in C89.
No, I think the only reason such a declaration-extracting tool wasn't disseminated widely at the time (say, the mid-to-late 1970s) is that the cost-benefit tradeoff wouldn't have been seen as very favorable. It would automate only half the task of writing a header file: the other and more difficult half is writing the accompanying code comments, which cannot be automated. Also, programmers of that era might have been more likely to start with the header file (the interface and documentation), and proceed to the implementation only afterward.
> Write a program to extract all the function declarations from a C source file that does not contain any macro-preprocessor directives
There you go. You just threw away the most difficult part of the problem: the macros. Even a medium-sized C library can have maybe 500 lines of dense macros, with ifdef/endif/define that depend on the platform, the CPU architecture, and user-configurable options at ./configure time. Should you evaluate the macro ifdefs or preserve them when you extract the header? It depends on each macro!
And your tool would still be highly incomplete, because it only handles function declarations, not the struct definitions and typedefs you expect the users to use.
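To illustrate with a made-up header fragment (the mylib_* names are hypothetical), each directive calls for a different answer:

    /* hypothetical fragment of a library header */
    #ifdef _WIN32
    typedef SOCKET mylib_socket_t;   /* must be PRESERVED: depends on the user's platform */
    #else
    typedef int mylib_socket_t;
    #endif

    #if MYLIB_ENABLE_THREADS         /* set at ./configure time: could be EVALUATED away */
    int mylib_lock(void);
    #endif

    #define mylib_isopen(s) ((s) >= 0)   /* function-like macro: this IS part of the API */

The first block has to survive verbatim into the shipped header; the second could reasonably be resolved when the library is configured; and the third can't be recovered from preprocessed output at all.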
> the other and more difficult half is writing the accompanying code comments, which cannot be automated
Again, I disagree. Newer languages have taught us that it is valuable to have two syntaxes for comments: one intended for the implementation and one intended for the interface. The latter are more popularly known as docstrings, but you can just reuse the comment syntax and differentiate between // and /// comments, for example. The hypothetical extractor tool would work no differently from a documentation extractor tool.
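Something along these lines (the /// convention is borrowed from later languages, not anything a 1970s tool actually used):

    #include <stdio.h>

    /// Open the log file at `path` for appending; returns NULL on failure.
    /// (interface comment: the extractor copies this into the header)
    FILE *log_open(const char *path)
    {
        // keep the stream line-buffered so an abnormal exit loses at most one line
        // (implementation comment: this stays in the .c file)
        FILE *fp = fopen(path, "a");
        if (fp != NULL)
            setvbuf(fp, NULL, _IOLBF, BUFSIZ);
        return fp;
    }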
I interpreted OP's post to say that you take a C file after the preprocessor has translated it. That preprocessing can be done simply by passing the file through an existing C preprocessor, or you can implement one yourself.
Implementing a C preprocessor is tedious work, but there's nothing remotely complex about it in terms of data structures, algorithms, or architecture. It's basically just a matter of implementing all of the rules, each of which is pretty simple.
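As a rough illustration of that "many simple rules" shape, here is a toy sketch handling only object-like #define/#undef and #ifdef/#ifndef/#else/#endif with naive whole-identifier substitution (no #include, no function-like macros, no # or ##; all names here are made up):

    // mini_cpp.cpp -- toy sketch of a few preprocessor rules: object-like
    // #define/#undef, #ifdef/#ifndef/#else/#endif, and naive whole-identifier
    // substitution. No #include, no function-like macros, no # or ##.
    #include <iostream>
    #include <map>
    #include <regex>
    #include <string>
    #include <vector>

    int main() {
        std::map<std::string, std::string> macros;
        std::vector<bool> active{true};   // one entry per nesting level of #if
        std::regex directive(R"(^\s*#\s*(\w+)\s*(\S*)\s*(.*?)\s*$)");
        std::regex ident(R"([A-Za-z_]\w*)");
        std::string line;
        while (std::getline(std::cin, line)) {
            std::smatch m;
            if (std::regex_match(line, m, directive)) {
                std::string cmd = m[1], name = m[2], body = m[3];
                if (cmd == "define")      { if (active.back()) macros[name] = body; }
                else if (cmd == "undef")  { if (active.back()) macros.erase(name); }
                else if (cmd == "ifdef")  { active.push_back(active.back() && macros.count(name)); }
                else if (cmd == "ifndef") { active.push_back(active.back() && !macros.count(name)); }
                else if (cmd == "else")   { if (active.size() > 1) active.back() = !active.back() && active[active.size()-2]; }
                else if (cmd == "endif")  { if (active.size() > 1) active.pop_back(); }
                continue;                 // directives themselves produce no output
            }
            if (!active.back()) continue; // line is in a suppressed #if branch
            std::string out;              // expand macros: replace each known identifier once
            size_t last = 0;
            for (auto it = std::sregex_iterator(line.begin(), line.end(), ident),
                      end = std::sregex_iterator(); it != end; ++it) {
                out += line.substr(last, it->position() - last);
                auto found = macros.find(it->str());
                out += (found != macros.end()) ? found->second : it->str();
                last = it->position() + it->length();
            }
            out += line.substr(last);
            std::cout << out << '\n';
        }
    }

The tedium is in everything elided here (function-like macros, rescanning, token pasting, #if expression evaluation, #include search paths), but each of those is more code of the same bookkeeping flavor, not deeper code.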
And you had the same misunderstanding as OP. Because you have eliminated all macros during the preprocessing step, you can no longer have macro-based APIs: no function-like macros you expect library users to call, no #ifdef blocks that you expect user code to #define or #undef around, and none of the primitive API-level (though not ABI-level) compatibility that macros provide for many things.
It’s a cute learning project for a student of computer science for sure. It’s not remotely a useful software engineering tool.
Our points of view are probably not too far off, really. Remember this whole thought-experiment is counterfactual: we're imagining what "automatic extraction of function declarations from a .c file" would have looked like in the K&R era, in response to claims (from 50 years later) that "No sane programming language should require a duplication in order to export something" and "The .h could have been a compiler output." So we're both having to imagine the motivations and design requirements of a hypothetical programmer from the 1970s or 1980s.
I agree that the tool I sketched wouldn't let your .h file contain macros, nor C99 inline functions, nor is it clear how it would distinguish between structs whose definition must be "exported" (like struct sockaddr) and structs where a declaration suffices (like FILE). But:
- Does our hypothetical programmer care about those limitations? Maybe he doesn't write libraries that depend on exporting macros. He (counterfactually) wants this tool; maybe that indicates that his preferences and priorities are different from his contemporaries'.
- C++20 Modules also do not let you export macros. The "toy" tool we can build with 1970s technology happens to be the same in this respect as the C++20 tool we're emulating! A modern programmer might indeed say "That's not a useful software engineering tool, because macros" — but I presume they'd say the exact same thing about C++20 Modules. (And I wouldn't even disagree! I'm just saying that that particular objection does not distinguish this hypothetical 1970s .h-file-generator from the modern C++20 Modules facility.)
[EDIT: Or to put it better, maybe: Someone-not-you might say, "I love Modules! Why couldn't we have had it in the 1970s, by auto-generating .h files?" And my answer is, we could have. (Yes it couldn't have handled macros, but then neither can C++20 Modules.) So why didn't we get it in the 1970s? Not because it would have been physically difficult at all, but rather — I speculate — because for cultural reasons it wasn't wanted.]
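(For anyone who hasn't played with Modules yet, here's a sketch of what I mean by "Modules don't export macros"; the names are made up.)

    // mathutils.cppm -- a C++20 module interface unit
    export module mathutils;

    #define MATHUTILS_VERSION 3        // visible only inside this file; never exported
    export int version() { return MATHUTILS_VERSION; }

    // main.cpp
    import mathutils;
    int main() {
        int v = version();             // OK: the exported declaration is visible
        // int w = MATHUTILS_VERSION;  // error: macros do not cross the module boundary
        return v;
    }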
> There's one or two people that will hold your code hostage until you reply to every little nit. At that point they don't feel like nits.
If the comment must be addressed before the review is approved, then it is not a nit, it is a blocker (a "changes required"). Blockers should not be marked as nits — nor vice versa.
I agree that prefixing comments with "Nit:" (or, in extreme cases, the opposite: "This is a big one:") is psychologically useful. Yet another reason it's useful is that it's not uncommon for perceived importance to vary over time: you start with "hmm, this could be named blah" and a week later you've convinced yourself it's a blocker. So force yourself to recognize that it was originally phrased as a nit, and force yourself to come back and say explicitly "I've changed my mind: I think this is important." With or without the "nit/blocker" prefixing pattern, the reviewer may come off as capricious; but with the pattern, he's at least measurably capricious.
That's only for the parallel overload. The ordinary sequential overload doesn't allocate: the only three ordinary STL algorithms that allocate are stable_sort, stable_partition, and (ironically) inplace_merge.
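If you want to see it for yourself, one way is to replace the global allocation functions and count calls. A quick sketch (the exact counts are implementation-specific; on the library implementations I'm familiar with, sort reports zero and stable_sort reports its temporary buffer):

    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    #include <new>
    #include <vector>

    static int new_calls = 0;   // number of operator-new calls since the last reset

    void* operator new(std::size_t n) {
        ++new_calls;
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc();
    }
    void* operator new(std::size_t n, const std::nothrow_t&) noexcept {
        ++new_calls;            // some libraries grab temporary buffers via nothrow new
        return std::malloc(n);
    }
    void operator delete(void* p) noexcept { std::free(p); }
    void operator delete(void* p, std::size_t) noexcept { std::free(p); }
    void operator delete(void* p, const std::nothrow_t&) noexcept { std::free(p); }

    int main() {
        std::vector<unsigned> v(1'000'000);
        for (unsigned i = 0; i < v.size(); ++i) v[i] = i * 2654435761u;  // pseudo-shuffle

        new_calls = 0;
        std::sort(v.begin(), v.end());
        std::printf("sort:        %d allocations\n", new_calls);   // typically 0

        for (unsigned i = 0; i < v.size(); ++i) v[i] = i * 2654435761u;
        new_calls = 0;
        std::stable_sort(v.begin(), v.end());
        std::printf("stable_sort: %d allocations\n", new_calls);   // typically >0 (temporary buffer)
    }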
[1] - https://news.ycombinator.com/item?id=46906897
[2] - https://news.ycombinator.com/item?id=46916065