
TU-level concepts (mostly) dissolve during the linking stage. You need to compile with -c to generate an object file in order to see the distinction.

Also, the difference manifests in the symbol table, not the assembly.
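
Something like this shows it (a rough sketch, assuming g++/clang and the nm tool; the file and function names are invented):

    // tu_demo.cpp: illustrative only; names are made up.
    // `static` gives helper() internal linkage, a TU-level property.
    static int helper(int x) { return x * 2; }    // internal linkage -> local symbol
    int exported(int x) { return helper(x) + 1; } // external linkage -> global symbol

    // Compile to an object file without linking, then inspect the symbol table:
    //   g++ -c tu_demo.cpp
    //   nm tu_demo.o
    // Typical output (name mangling varies):
    //   t _ZL6helperi    <- lowercase 't': local text symbol (internal linkage)
    //   T _Z8exportedi   <- uppercase 'T': global text symbol (external linkage)
    // The instructions emitted for helper() can look identical either way; the
    // distinction lives in the object file's symbol table, not the disassembly.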


To clarify, I was talking about Compiler Explorer-cleaned disassembly, same as the comment I was replying to.


> and to avoid the warning (error) the code is decorated with compiler pacifiers, which makes no sense!

How is that a bad thing, exactly?

Think of it this way: The pacifiers don't just prevent the warnings. They embed the warnings within the code itself in a way where they are acknowledged by the developer.

Sure, just throwing in compiler pacifiers willy-nilly to squelch the warnings is terrible.

However, making developers explicitly write in the code "Yes, this block of code triggers a warning, and yes it's what I want to do because xyz" seems not only perfectly fine, but straight up desirable. Preventing them from pushing the code to the repo before doing so by enabling warnings-as-errors is a great way to get that done.
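
To make that concrete, here is a hedged C++ sketch of the "acknowledged pacifier" pattern under -Wall -Wextra -Werror (the identifiers are invented for illustration; MSVC would use #pragma warning(push/disable/pop) instead):

    #include <cstdio>

    // -Wunused-parameter would normally fire here. [[maybe_unused]] records,
    // in the code itself, that the author knows the parameter is unused and
    // why (the signature is fixed by a callback interface).
    void on_event([[maybe_unused]] void* user_data, int code) {
        std::printf("event %d\n", code);
    }

    [[deprecated("use new_api() instead")]]
    void legacy_api() {}

    int main() {
        on_event(nullptr, 42);

        // For warnings without a dedicated attribute, a scoped pragma does the
        // same job: push/pop keeps the suppression local, and the comment says
        // why the warning is accepted rather than fixed.
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
        legacy_api();  // intentional: kept until the v2 migration lands
    #pragma GCC diagnostic pop
    }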

The only place where I've seen warnings-as-errors become a huge pain is when dealing with multiple platforms and multiple compilers that have different settings. This was a big issue in Gen7 game dev because getting the PS3's gcc, the Wii's CodeWarrior and the Xbox 360's MSVC to align on warnings was like herding cats, and not every dev had every devkit for obvious reasons. And even then, warnings-as-errors was still very much worth it in the long run.


IMHO readability is the absolute, paramount priority. Having the code interrupted by pacifiers makes it more difficult to read. The warning is very visible when compiling. Let me argue, much more visible. Why? Well, regardless of whether my last change had anything directly to do with that piece of code, I will see the warning. If I use some preprocessor magic, I will only see it if I directly work in that part of the code.

Again, IMHO the big problem is people think "warnings are ok, just warnings, can be ignored".

And just as an anecdotal point: "Sure, just throwing in compiler pacifiers willy-nilly to squelch the warnings is terrible." This is exactly what I have seen in real life, 100% of the time.


Well said.

For some reason people stop thinking when it comes to warnings. Often it is the warning which gets one to rethink and refactor the code properly. If for whatever reason you want to live with the warning, comment the code appropriately, do not squelch blindly.


But how do you distinguish between warnings intended by the author and warnings that weren't, and so should be fixed?


Please remember we are coming from "set warnings to errors", which I interpret as: I know better than the people writing the compiler. There is a good reason the two categories exist. If not, there would be no warnings at all; everything would be an error.

My rationale: if you do set warn->error, then there are 2 ways around it: change the code to eliminate the warning, or pacify the compiler. Note, the point of setting it to error is to push lazy programmers to deal with it. If the lazy person is really lazy, then they will deal with it with a pacifier. You won nothing.

There is no one recipe for everything. That is why, even if I do not like to treat warnings as errors, sometimes it may be a possible solution.

I think you should deal with warnings, you should have as few as possible, if any at all. So if you have just a couple, it is not a problem to document them clearly. Developers building the project should be informed anyway of many other things.

In some projects I worked on, we saw warnings as technical debt, so hiding them with a pacifier would make us forget. Because we saw them in every build, we were reminded constantly that we should rework that code. Again, it depends on the setup you have in the project. I know people now are working with this new "ci/cd" trend and never get to see the compiler output. So depending on the setup, one thing or another may be better.


> My rationale: if you do set warn->error, then there are 2 ways around it: change the code to eliminate the warning, or pacify the compiler. Note, the point of setting it to error is to push lazy programmers to deal with it. If the lazy person is really lazy, then they will deal with it with a pacifier. You won nothing.

> You won nothing.

No, you won the ability to distinguish between intended and unintended warnings. Specifying in the code which warnings are expected makes every warning the compiler still outputs something you want to get fixed. When you do not do that, then it is easy to miss a new warning or a warning that changed. So you essentially say that you should not distinguish between intended and non-intended warnings?

Having no warnings is a worthwhile goal, but it is often not possible: you want to be warned about some things, so you need that warning level enabled, but you don't want to be warned about it on one specific line.


> So you essentially say that you should not distinguish between intended and non-intended warnings?

No, I pretty clearly said the opposite. Please read what I wrote:

"[...] is not a problem to document them clearly. Developers building the project should be informed anyway of many other things"

I also stated "warnings, you should have as few as possible, if any at all". In the projects I worked on we hardly had any in the final delivery, but we had many in between, which I find ok. If there are only 2 warnings, I do not see a big risk of missing a 3rd. I expect developers to look at the compiler output, carefully, as if it were a review from a coworker.

Last but not least, you ignore my last paragraph, where I say warnings are typically technical debt. In the long run there should be no "expected" warnings. My whole point is that they are just not errors, so you should allow the program to compile and keep working on other things. I do not think it is ok to have warnings. Also (especially) I think it is a bad idea to silence the compiler.

Anyway, a good compiler will end with a line like "N warnings detected", so there is that. You can just compare an integer to know if there are more warnings… not so difficult, is it?

If you read my comments it should be clear. If not, I cannot help with that. If you want to disagree, as long as you don't work in my code, that's ok. This is just my 2ct opinion.


It's not just a pure matter of law, and looking at it from that perspective is naive.

Legacy publishers in general (and a few big ones in particular, like der Spiegel) have been lobbying hard for legislatures to redirect big tech revenue to their failing businesses.

The focus on AI here is really just the continuation of that ongoing fight that has been raging for over a decade now. If it wasn't that, it would be some other wedge.

I'm not saying Google is squeaky-clean here, far from it. However, it's important to keep in mind that the main drive here is to get publishers paid, not to force Google to be accountable to some specific standards.


In the grand scheme of things, across computer science as a whole, we've only had about a quarter century during which you needed a *very* specific kind of problem for prosumer hardware not to be adequate.

It's kind of amazing we got that at all for a while.


If you discard the early days of gigantic expensive computers. I guess it's come full circle after a fashion.


There is a HUGE difference in that the combination of the short length and the fact that the video starts playing before you even have a chance to decide whether to watch it leads to a "heh! I'm here already, might as well just watch the thing".


This is a response to you and the other Y people that confuse short videos with autoplay and user engagement techniques.

There are people that autoplay long videos; in fact, people stream random Simpsons episodes (or another favorite TV show, podcast, music, books on tape, etc.) in the background while they work. Classic TV has autoplay with no opportunity to decide. Autoplay is not an exclusive short-form video feature. I can make a short video on my computer and it will not autoplay other content.


There's no confusion here. It's pretty easy to make the argument that the combination of autoplay and short form is orders of magnitude more problematic than the sum of its parts.


Yes, but then we're not talking about short-form video being addictive, but rather that the hunt for a good short-form video is addictive. This same idea can be applied to long form and any other medium you enjoy, finish, and immediately want more of. Now if you have only 30 mins before your next task, you may skip starting a long-form video, but that doesn't mean there is anything inherently bad about short-form video; the problem is the tools for viewing it. So yes, you are confusing the two, and intentionally so, so that your point about short-form video stands; but it doesn't, because your points are about the viewing tools.

If you continue to push this point, people will only think that short videos under 3 minutes are somehow the devil, and TikTok et al will just continue making whatever length of video is next in line more addictive.


Honestly, I don't mind the format in principle, and the process that goes from YT's homepage to watching a single one of them is not that bad to me. As long as I get to make a decision that I want to watch something, consciously go "I will click on this thing and watch it" and only then proceed to watch it, then it's _fine_.

It's the algorithmic loop, where the next video starts playing the moment you scroll to it, before you even have a chance to decide whether it's something you want to watch, that's abhorrent to me.


If your argument is that the guardrails only provide a false sense of security, and removing them would ultimately be a good thing because it would force people to account for that, that's an interesting conversation to have.

But it's clearly not the one at play here.


The guardrails clearly don't help.

A computer can not be held accountable, so who is held accountable?


Agreed.

DEATH handing out swords to kids as Santa in the Hogfather is a funny joke, not an example to follow.


> I feel like LLMs are a fairly boring technology. They are stochastic black boxes. The training is essentially run-of-the-mill statistical inference. There are some more recent innovations on software/hardware-level, but these are not LLM-specific really.

This is pretty ironic, considering the subject matter of that blog post. It's a super-common misconception that's gained very wide popularity due to reactionary (and, imo, rather poor) popular science reporting.

The author parroting that with confidence in a post about Dunning-Krugering gives me a bit of a chuckle.


I also find it hard to get excited about black boxes - imo there's no real meat to the insights they give, only the shell of a "correct" answer


I'm not sure what claim you're disputing or making with this.

What more are LLMs than statistical inference machines? I don't know that I'd assert with confidence that's all they are, but all the configuration options I can play with during generation (top-k, top-p, temperature, etc.) are ways to _not_ always select the most likely next token, which leads me to believe that they are, in fact, just statistical inference machines.
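
To make that concrete, here is a toy sketch of what temperature and top-k do mechanically (invented logits, not a real model; top-p is analogous but truncates by cumulative probability instead of by count):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Convert logits to probabilities; temperature < 1 sharpens the
    // distribution toward the argmax, temperature > 1 flattens it.
    std::vector<double> softmax(std::vector<double> logits, double temperature) {
        const double max_logit = *std::max_element(logits.begin(), logits.end());
        double sum = 0.0;
        for (double& l : logits) { l = std::exp((l - max_logit) / temperature); sum += l; }
        for (double& l : logits) l /= sum;
        return logits;
    }

    // Keep only the k most likely tokens and sample among them: deliberately
    // *not* always picking the single most likely next token.
    int sample_top_k(const std::vector<double>& logits, double temperature,
                     std::size_t k, std::mt19937& rng) {
        const std::vector<double> probs = softmax(logits, temperature);
        std::vector<std::size_t> order(probs.size());
        for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });
        order.resize(std::min(k, order.size()));

        std::vector<double> kept;
        for (std::size_t idx : order) kept.push_back(probs[idx]);
        std::discrete_distribution<std::size_t> pick(kept.begin(), kept.end());
        return static_cast<int>(order[pick(rng)]);
    }

    int main() {
        const std::vector<double> logits = {2.0, 1.5, 0.2, -1.0};  // toy 4-token vocabulary
        std::mt19937 rng(0);
        for (int i = 0; i < 5; ++i)
            std::printf("sampled token %d\n", sample_top_k(logits, 0.8, 3, rng));
    }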


What more are human brains than piles of wet meat?

It's not an argument - it's a dismissal. It's boneheaded refusal to think on the matter in any depth, or consider any of the implications.

The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like "how the fuck does training on piles of text make machines that can write new short stories" or "why is a big fat pile of matrix multiplications better at solving unseen math problems than I am".


The way I always like to think about it is: "a computer shouldn't be able to do this."

I'm an SWE working in AI-related development so I have a probably higher baseline of understanding than most, but even I end up awed sometimes. For example, I was playing a video game the other night that had an annoying box sliding puzzle in it (you know, where you've got to move a piece to a specific area but it's blocked by other pieces that you need to move in some order first). I struggled with it for way too long (because I missed a crucial detail), so for shits and giggles I decided to let ChatGPT have a go at it.

I took a photo of the initial game board on my tv and fed it into the high thinking version with a bit of text describing the desired outcome. ChatGPT was able to process the image and my text and after a few turns generated python code to solve it. It didn't come up with the solution, but that's because of the detail I missed that fundamentally changed the rules.

Anyway, I've been in the tech industry long enough that I have a pretty good idea of what should and shouldn't be possible with programs. It's absolutely wild to me that I was able to use a photo of a game board and like three sentences of text and end up with an accurate conclusion (that it was unsolvable based on the provided rules). There's so much more potential with these things than many people realize.


The fundamental assumption under all of software engineering is: "computers don't think like humans do".

They can process 2 megabytes of C sources, but not 2 sentences of natural language instructions. They find it easy to multiply 10-digit numbers but not to tell a picture of a dog from one of a cat. Computers are inhuman, in a very fundamental way. No natural language understanding, no pattern recognition, no common sense.

Machine learning was working to undermine that old assumption for a long time. But LLMs took a sledgehammer to it. Their capabilities are genuinely closer to "what humans can usually do" than to "what computers can usually do", despite them running on computers. It's a breakthrough.


> What more are human brains than piles of wet meat?

Calculation isn't what makes us special; that's down to things like consciousness, self-awareness and volition.

> The main reason to say "LLMs are just next token predictions" is to stop thinking about all the inconvenient things. Things like...

They do it by iteratively predicting the next token.

Suppose the calculations to do a more detailed analysis were tractable. Why should we expect the result to be any more insightful? It would not make the computer conscious, self-aware or motivated. For the same reason that conventional programs do not.


> They do it by iteratively predicting the next token.

You don't know that. It's how the LLM presents, not how it does things. That's what I mean by it being the interface.

There's ever only one word that comes out of your mouth at a time, but we don't conclude that humans only think one word at a time. Who's to say the machine doesn't plan out the full sentence and output just the next token?

I don't know either fwiw, and that's my main point. There's a lot to criticize about LLMs and, believe it or not, I am a huge detractor of their use in most contexts. But this is a bad criticism of them. And it bugs me a lot because the really important problems with them are broadly ignored by this low-effort, ill-thought-out offhand dismissal.


Have you read the literature? Do you have a background in machine learning or statistics?

Yes. We know that LLMs can be trained by predicting the next token. This is a fact. You can look up the research papers, and open source training code.

I can't work it out, are you advocating a conspiracy theory that these models are trained with some elusive secret and that the researchers are lying to you?

Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...


> Have you read the literature? Do you have a background in machine learning or statistics?

Very much so. Decades.

> Being trained by predicting one token at a time is also not a criticism??! It is just a factually correct description...

Of course that's the case. The objection I've had from the very first post in this thread is that using this trivially obvious fact as evidence that LLMs are boring/uninteresting/not AI/whatever is missing the forest for the trees.

"We understand [the I/Os and components of] LLMs, and what they are is nothing special" is the topic at hand. This is reductionist naivete. There is a gulf of complexity, in the formal mathematical sense and reductionism's arch-enemy, that is being handwaved away.

People responding to that with "but they ARE predicting one token at a time" are either falling into the very mistake I'm talking about, or are talking about something else entirely.


Do you have, by chance, a set of benchmarks that could be administered to humans and LLMs both, and used to measure and compare the levels of "consciousness, self-awareness and volition" in them?

Because if not, it's worthless philosophical drivel. If it can't be defined, let alone measured, then it might as well not exist.

What is measurable and does exist: performance on specific tasks.

And the pool of tasks where humans confidently outperform LLMs is both finite and ever diminishing. That doesn't bode well for human intelligence being unique or exceptional in any way.


> Because if not, it's worthless philosophical drivel.

The feeling is mutual:

> ... that doesn't bode well for human intelligence being unique or exceptional in any way.

My guess was that you argued that we "don't understand" these systems, or that our incomplete analysis matters, specifically to justify the possibility that they are in whatever sense "intelligent". And now you are making that explicit.

If you think that intelligence is well-defined enough, and the definition agreed-upon enough, to argue along these lines, the sophistry is yours.

> If it can't be defined, let alone measured

In fact, we can measure things (like "intelligence") without being able to define them. We can generally agree that a person of higher IQ has been measured to be more intelligent than a person of lower IQ, even without agreeing on what was actually measured. Measurement can be indirect; we only need accept that performance on tasks on an IQ test correlates with intelligence, not necessarily that the tasks demonstrate or represent intelligence.

And similarly, based on our individual understanding of the concept of "intelligence", we may conclude that IQ test results may not be probative in specific cases, or that administering such a test is inappropriate in specific cases.


Well, you could do the funny thing, and try to measure the IQ of an LLM using human IQ tests.

Frontier models usually get somewhere between 90 and 125, including on unseen tasks. Massive error bars. The performance of frontier models keeps rising, in line with other benchmarks.

And, for all the obvious issues with the method? It's less of a worthless thing to do than claiming "LLMs don't have consciousness, self-awareness and volition, and no, not gonna give definitions, not gonna give tests, they just don't have that".


I mean, yeah, statistics works. It's not that surprising that super amazing statistical modelling can approximate a distribution. Of course, thoughts, words, arguments are distributions, and with a powerful enough model you can simulate them.

None of this is surprising? Like, I think you just lack a good statistical intuition. The amazing thing is that we have these extremely capable models, and methods to learn them. That process is an active area of research (as is much of statistics), but it is just all statistics...


How is that a misconception? LLMs are just advanced statistical modelling (unsupervised machine learning) with small tweaks (e.g., some fine-tuning for human preference).

At the core, they are just statistical modelling. The fact that statistical modelling can produce coherent thoughts is impressive (and basically vindicates materialism), but that doesn't change the fact that it is all based on statistical modelling. What is your view?


What's the misconception? LLMs are probabilistic next-token prediction based on current context, right?


Yeah, but that's their interface. That informs surprisingly little about their inner workings.

ANNs are arbitrary function approximators. The training process uses statistical methods to identify a set of parameters that approximate the function as best as possible. That doesn't necessarily mean that the end result is equivalent to a very fancy multi-stage linear regression. It's a possible outcome of the process, but it's not the only possible outcome.

Looking at a LLMs I/O structure and training process is not enough to conclude much of anything. And that's the misconception.
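
As a toy illustration of the "function approximator fitted by statistical methods" framing (everything here, including network size, learning rate and epoch count, is an arbitrary toy choice, nothing like a real LLM):

    #include <cmath>
    #include <cstdio>
    #include <random>

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    int main() {
        // Training data: XOR, a function no single linear unit can represent.
        const double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
        const double Y[4]    = {0, 1, 1, 0};

        // A 2-4-1 network; parameters start random and are shaped only by the data.
        const int H = 4;
        double w1[H][2], b1[H], w2[H], b2;
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> init(-1.0, 1.0);
        for (int j = 0; j < H; ++j) {
            w1[j][0] = init(rng); w1[j][1] = init(rng);
            b1[j] = init(rng); w2[j] = init(rng);
        }
        b2 = init(rng);

        // Per-sample gradient descent on squared error: standard statistical
        // fitting; nothing in the loop "knows" what XOR means.
        const double lr = 0.5;
        for (int epoch = 0; epoch < 20000; ++epoch) {
            for (int n = 0; n < 4; ++n) {
                double h[H], out = b2;
                for (int j = 0; j < H; ++j) {
                    h[j] = sigmoid(b1[j] + w1[j][0] * X[n][0] + w1[j][1] * X[n][1]);
                    out += w2[j] * h[j];
                }
                out = sigmoid(out);

                const double d_out = (out - Y[n]) * out * (1.0 - out);
                for (int j = 0; j < H; ++j) {
                    const double d_h = d_out * w2[j] * h[j] * (1.0 - h[j]);
                    w2[j]    -= lr * d_out * h[j];
                    w1[j][0] -= lr * d_h * X[n][0];
                    w1[j][1] -= lr * d_h * X[n][1];
                    b1[j]    -= lr * d_h;
                }
                b2 -= lr * d_out;
            }
        }

        // The fitted parameters now approximate XOR. (An unlucky seed can stall
        // in a local minimum; more units or epochs usually fix that.)
        for (int n = 0; n < 4; ++n) {
            double out = b2;
            for (int j = 0; j < H; ++j)
                out += w2[j] * sigmoid(b1[j] + w1[j][0] * X[n][0] + w1[j][1] * X[n][1]);
            std::printf("%g XOR %g -> %.3f\n", X[n][0], X[n][1], sigmoid(out));
        }
    }

The point isn't the toy itself: the fitting procedure is generic statistics, while what the fitted parameters end up computing is a separate question.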


> Yeah, but that's their interface. That informs surprisingly little about their inner workings.

I'm not sure I follow. LLMs are probabilistic next-token prediction based on current context, that is a factual, foundational statement about the technology that runs all LLMs today.

We can ascribe other things to that, such as reasoning or knowledge or agency, but that doesn't change how they work. Their fundamental architecture is well understood, even if we allow for the idea that maybe there are some emergent behaviors that we haven't described completely.

> It's a possible outcome of the process, but it's not the only possible outcome.

Again, you can ascribe these other things to it, but to say that these external descriptions of outputs call into question the architecture that runs these LLMs is a strange thing to say.

> Looking at a LLMs I/O structure and training process is not enough to conclude much of anything. And that's the misconception.

I don't see how that's a misconception. We evaluate pretty much everything by inputs and outputs, and we use those to infer internal state. Because that's all we're capable of in the real world.


Then why not say "they are just computer programs"?

I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.


> Then why not say "they are just computer programs"?

LLMs are probabilistic or non-deterministic computer programs, plenty of people say this. That is not much different than saying "LLMs are probabilistic next-token prediction based on current context".

> I think the reason people don't say that is because they want to say "I already understand what they are, and I'm not impressed and it's nothing new". But what the comment you are replying to is saying is that the inner workings are the important innovative stuff.

But we already know the inner workings. It's transformers, embeddings, and math at a scale that we couldn't do before 2015. We already had multi-layer perceptrons with backpropagation and recurrent neural networks and markov chains before this, but the hardware to do this kind of contextual next-token prediction simply didn't exist at those times.

I understand that it feels like there's a lot going on with these chatbots, but half of the illusion of chatbots isn't even the LLM, it's the context management that is exceptionally mundane compared to the LLM itself. These things are combined with a carefully crafted UX to deliberately convey the impression that you're talking to a human. But in the end, it is just a program and it's just doing context management and token prediction that happens to align (most of the time) with human expectations because it was designed to do so.

The two of you seem to be implying there's something spooky or mysterious happening with LLMs that goes beyond our comprehension of them, but I'm not seeing the components of your argument for this.


> But we already know the inner workings.

Overconfident and wrong.

No one understands how an LLM works. Some people just delude themselves into thinking that they do.

Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.

It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.


I never claimed we already know everything about LLMs. Knowing "everything about" anything these days is impossible given the complexity of our technology. Even antennas, a centuries-old technology, are something we're still innovating on and don't completely understand in all domains.

But that's a categorically different statement than "no one understands how an LLM works", because we absolutely do.

You're spending a lot of time describing whether we know or don't know LLMs, but you're not talking at all about what it is that you think we do or do not understand. Instead of describing what you think the state of the knowledge is about LLMs, can you talk about what it is that you think that is unknown or not understood?


I think the person you are responding to is using a strange definition of "know."

I think they mean "do we understand how they process information to produce their outputs" (i.e., do we have an analytical description of the function they are trying to approximate).

You and I mean that we understand the training process that produces their behaviour (and this training process is mainly standard statistical modelling / ML).

In short, both sides are talking past each other.


I agree. The two of us are talking past each other, and I wonder if it's because there's a certain strain of thought around LLMs that believes that epistemological questions and technology that we don't fully understand are somehow unique to computer science problems.

Questions about the nature of knowledge (epistemology and other philosophical/cognitive studies) in humans are still unsolved to this day, and frankly may never be fully understood. I'm not saying this makes LLM automatically similar to human intelligence, but there are plenty of behaviors, instincts, and knowledge across many kinds of objects that we don't fully understand the origin of. LLMs aren't qualitatively different in this way.

There are many technologies that we used that we didn't fully understand at the time, even iterating and improving on those designs without having a strong theory behind them. Only later did we develop the theoretical frameworks that explain how those things work. Much like we're now researching the underpinnings of how LLMs work to develop more robust theories around them.

I'm genuinely trying to engage in a conversation and understand where this person is coming from and what they think is so unique about this moment and this technology. I understand the technological feat and I think it's a huge step forward, but I don't understand the mysticism that has emerged around it.


> Saying "I know how LLMs work because I read a paper about transformer architecture" is about as delusional as saying "I read a paper about transistors, and now I understand how Ryzen 9800X3D works". Maybe more so.

Which is to say, not delusional at all.

Or else we have to accept that basically hardly anyone "understands" anything. You set an unrealistic standard.

Beginners play abstract board games terribly. We don't say that this means they "don't understand" the game until they become experts; nor do we say that the experts "haven't understood" the game because it isn't strongly solved. Knowing the rules, consistently making legal moves and perhaps having some basic tactical ideas is generally considered sufficient.

Similarly, people who took the SICP course and didn't emerge thoroughly confused can reasonably be said to "understand how to program". They don't have to create MLOC-sized systems to prove it.

> It takes actual reverse engineering work to figure out how LLMs can do small bits and tiny slivers of what they do. And here you are - claiming that we actually already know everything there is to know about them.

No; it's a dismissal of the relevance of doing more detailed analysis, specifically to the question of what "understanding" entails.

The fact that a large pile of "transformers" is capable of producing the results we see now, may be surprising; and we may lack the mental resources needed to trace through a given calculation and ascribe aspects of the result to specific outputs from specific parts of the computation. But that just means it's a massive computation. It doesn't fundamentally change how that computation works, and doesn't negate the "understanding" thereof.


Understanding a transistor is an incredibly small part of how Ryzen 9800X3D does what it does.

Is it a foundational part? Yes. But if you have it and nothing else, that adds up to knowing almost nothing about how the whole CPU works. And you could come to understand much more than that without ever learning what a "transistor" even is.

Understanding low level foundations does not automatically confer the understanding of high level behaviors! I wish I could make THAT into a nail, and drive it into people's skulls, because I keep seeing people who INSIST on making this mistake over and over and over and over and over again.


My entire point here is that one can, in fact, reasonably claim to "understand" a system without being able to model its high level behaviors. It's not a mistake; it's disagreeing with you about what the word "understand" means.


For the sake of this conversation "understanding" implicitly means "understand enough about it to be unimpressed".

This is what's being challenged: that you can discount LLMs as uninteresting because they are "just" probabilistic inference machines. This completely underestimates just how far you can push the concept.

Your pedantic definition of understand might be technically correct. But that's not what's being discussed.

That is, unless you assign metaphysical properties to the notion of intelligence. But the current consensus is that intelligence can be simulated, at least in principle.


I'm not sure what you mean?

Saying we understand the training process of LLMs does not mean that LLMs are not super impressive. They are shining testaments to the power of statistical modelling / machine learning. Arbitrarily reclassifying them as something else is not useful. It is simply untrue.

There is nothing wrong with being impressed by statistics... You seem to be thinking that statistics is uninteresting, and therefore that to say LLMs are statistics dismisses them. I think perhaps you are just implicitly biased against statistics! :p


Is understanding a system not implicitly saying you know how, on a high level, it works?

You'd have to know a lot about transformer architecture and some reasonable LLM specific stuff to do this beyond just those basics listed earlier.

Where I'd put "understand" is when it's not just a black box and you can say something meaningful to approximate its high-level behavior. Transistors won't get you to CPU architecture, and transformers don't get you to LLMs.


There is so much complexity in interactions of systems that is easy to miss.

Saying that one can understand a modern CPU by understanding how a transistor works is kinda akin to saying you can understand the operation of a country by understanding a human from it. It's a necessary step, probably, but definitely not sufficient.

It also reminds me of a pet peeve in software development where it's tempting to think you understand the system from the unit tests of each component, while all the interesting stuff happens when different components interact with each other in novel ways.


What do you mean? what do you think statistical modelling is?

I am very confused by your stance.

The aim of the function approximation is to maximize the likelihood of the observed data (this is standard statistical modelling), and using machine learning (e.g., stochastic gradient descent) on a class of universal function approximators is a standard approach to fitting such a model.

What do you think statistical modelling involves?


Not by virtue of that alone.

A choice of tech stack can never be enough to prove anything. It only establishes a lower bound on resource usage, but there is never an upper bound as long as while() and malloc() are available.

