“When the Waymo vehicle encounters a particular situation on the road, the autonomous driver can reach out to a human fleet response agent for additional information to contextualize its environment,” the post reads. “The Waymo Driver [software] does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.” [from Waymo's own blog https://waymo.com/blog/2024/05/fleet-response/]
In my opinion there's nothing wrong with it per se, but (a) it's still worth mentioning, because most people have the impression that Waymo cars are completely unassisted, and (b) it makes me wonder how feasible Waymo's operations would be if it weren't for global income inequality.
To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.' is not entirely accurate. Classic LLMs like GPT 3, sure. But LLM-powered chatbots (ChatGPT, Claude - which is what this article is really about) go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
> go through much more than just predict-next-token training (RLHF, presumably now reasoning training, who knows what else).
Yep, but...
> To say that LLMs are 'predictive text models trained to match patterns in their data, statistical algorithms, not brains, not systems with “psychology” in any human sense.' is not entirely accurate.
That's a logical leap, and you'd need to bridge the gap between "more than next-token prediction" and similarity to wetware brains and "systems with psychology".
This blog post is full of bizarre statements and the author seems almost entirely ignorant of the history or present of AI. I think it's fair to argue there may be an AI bubble that will burst, but this blog post is plainly wrong in many ways.
Here's a few clarifications (sorry this is so long...):
"I should explain for anyone who hasn't heard that term [AI winter]... there was much hope, as there is now, but ultimately the technology stagnated. "
The term AI winter typically refers to a period of reduced funding for AI research/development, not the technology stagnating (the technology failing to deliver on expectations was the cause of the AI winter, not the definition of AI winter).
"[When GPT3 came out, pre-ChatGPT] People were saying that this meant that the AI winter was over, and a new era was beginning."
People tend to agree there were two AI winters already, one having to do with symbolic AI disappointments/general lack of progress (70s), and the other related to expert systems (late 80s). That AI winter has long been over. The Deep Learning revolution started in ~2012, and by 2020 (GPT 3) huge amounts of talent and money had already been going into AI for years. This trend just accelerated with ChatGPT.
"[After symbolic AI] So then came transformers. Seemingly capable of true AI, or, at least, scaling to being good enough to be called true AI, with astonishing capabilities ... the huge research breakthrough was figuring out that, by starting with essentially random coefficients (weights and biases) in the linear algebra, and during training back-propagating errors, these weights and biases could eventually converge on something that worked."
Transformers came about in 2017. The first wave of excitement about neural nets and backpropagation goes all the way back to the late 80s/early 90s, and AI (computer vision, NLP, to a lesser extent robotics) was already heavily ML-based by the 2000s, just not neural-net based (this changed in roughly 2012).
"All transformers have a fundamental limitation, which can not be eliminated by scaling to larger models, more training data or better fine-tuning ... This is the root of the hallucination problem in transformers, and is unsolveable because hallucinating is all that transformers can do."
The 'highest number' token is not necessarily chosen, this depends on the decoding algorithm. That aside, 'the next token will be generated to match that bad choice' makes it sound like once you generate one 'wrong' token the rest of the output is also wrong. A token is a few characters, and need not 'poison' the rest of the output.
That aside, there are plenty of ways to 'recover' from starting to go down the wrong route. A key aspect of why reasoning in LLMs works well is that it typically incorporates backtracking - going earlier in the reasoning to verify details or whatnot. You can do uncertainty estimation in the decoding algorithm, use a secondary model, plenty of things (here is a detailed survey https://arxiv.org/pdf/2311.05232 , one of several that is easy to find).
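To make the decoding point concrete, here is a minimal sketch of greedy versus temperature-sampled decoding, plus entropy as one crude per-step uncertainty signal. The vocabulary and logits are made up for illustration, and the helper names are my own; no real model or library API is implied.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature, rng):
    """Temperature sampling: draw a token in proportion to its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical vocabulary and logits, for illustration only.
vocab = ["Paris", "Lyon", "Nice"]
logits = [2.0, 1.5, 0.1]
rng = random.Random(0)

greedy_token = vocab[greedy_pick(logits)]
sampled_tokens = [vocab[sample_pick(logits, 1.0, rng)] for _ in range(10)]
uncertainty = entropy_bits(softmax(logits))
```

Greedy decoding always returns the top-scoring token, while the sampled run can return lower-ranked tokens too. A decoder could use the entropy value to flag steps that deserve a second look.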
"The technology won't disappear – existing models, particularly in the open source domain, will still be available, and will still be used, but expect a few 'killer app' use cases to remain, with the rest falling away."
A quick Google search shows ChatGPT currently has 800 million weekly active users who are using it for all sorts of things. AI-assisted programming is certainly here to stay, and there are plenty of other industries in which AI will be part of the workflow (helping do research, take notes, summarize, build presentations, etc.)
I think discussion is good, but it's disappointing to see stuff with this level of accuracy on the front page of HN.
For reference, the details about how the LLMs are queried:
"How the players work
All players use the same system prompt
Each time it's their turn, or after a hand ends (to write a note), we query the LLM
At each decision point, the LLM sees:
General hand info — player positions, stacks, hero's cards
Player stats across the tournament (VPIP, PFR, 3bet, etc.)
Notes hero has written about other players in past hands
From the LLM, we expect:
Reasoning about the decision
The action to take (executed in the poker engine)
A reasoning summary for the live viewer interface
Models have a maximum token limit for reasoning
If there's a problem with the response (timeout, invalid output), the fallback action is fold"
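As a rough sketch of the "fallback action is fold" rule quoted above: the JSON reply format, the action set, and the `query_llm` callable are all assumptions of mine, not details published by the tournament.

```python
import json

# Assumed action set and reply shape; the real engine may differ.
VALID_ACTIONS = {"fold", "check", "call", "bet", "raise"}
FALLBACK = {"action": "fold", "summary": "fallback: invalid or missing response"}

def decide(query_llm, prompt):
    """Ask the model for a decision; fold on timeout or unusable output."""
    try:
        raw = query_llm(prompt)      # hypothetical callable, may raise TimeoutError
        reply = json.loads(raw)      # assumed: the model answers in JSON
    except (TimeoutError, ValueError, TypeError):
        return dict(FALLBACK)
    action = reply.get("action") if isinstance(reply, dict) else None
    if not isinstance(action, str) or action not in VALID_ACTIONS:
        return dict(FALLBACK)
    return {"action": action, "summary": str(reply.get("summary", ""))}

# Stand-in "models" used only to exercise the three paths.
def good_model(prompt):
    return '{"action": "call", "summary": "pot odds are fine"}'

def slow_model(prompt):
    raise TimeoutError("model took too long")

def confused_model(prompt):
    return "I think I will go all in!"

good = decide(good_model, "hand info")
timed_out = decide(slow_model, "hand info")
garbled = decide(confused_model, "hand info")
```

One consequence of this design is that a model that often times out or produces malformed output will look like a very tight player in the stats.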
The fact that the models are given stats about the other models is rather disappointing to me; it makes this less interesting. I'd be curious how this would go if the models had to use only notes/context. Maybe it's a way to save on costs, this could get expensive...
But LLMs would presumably also condition on past observations of opponents - i.e. LLMs can conversely adapt their strategy during repeated play (especially if given a budget for reasoning as opposed to direct sampling from their output distributions).
The rules state the LLMs do get "Notes hero has written about other players in past hands" and "Models have a maximum token limit for reasoning", so the outcome might be at least more interesting as a result.
The top models on the leaderboard are notably also the ones strongest in reasoning. They even show the models' notes, e.g. Grok on Claude: "About: claude
Called preflop open and flop bet in multiway pot but folded to turn donk bet after checking, suggesting a passive postflop style that folds to aggression on later streets."
PS The sampling params also matter a lot (with temperature 0 the LLMs are going to be very consistent, going higher they could get more 'creative').
PPS the models getting statistics about other models' behavior seems kind of like cheating, they rely on it heavily, e.g. 'I flopped middle pair (tens) on a paired board (9s-Th-9d) against LLAMA, a loose passive player (64.5% VPIP, only 29.5% PFR)'
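For anyone unfamiliar with the stats in that example: VPIP is the share of hands in which a player voluntarily put money in preflop, and PFR is the share in which they raised preflop. A minimal sketch of how they could be computed follows. The hand-record format is my own invention, and I don't know how the tournament actually calculates them.

```python
def preflop_stats(hands):
    """Compute VPIP and PFR percentages.

    `hands` holds one entry per hand dealt to the player: the list of that
    player's voluntary preflop actions (posting blinds is not included).
    """
    total = len(hands)
    if total == 0:
        return {"vpip": 0.0, "pfr": 0.0}
    # VPIP counts hands with at least one voluntary call or raise.
    vpip_hands = sum(1 for acts in hands if any(a in ("call", "raise") for a in acts))
    # PFR counts hands with at least one preflop raise.
    pfr_hands = sum(1 for acts in hands if "raise" in acts)
    return {"vpip": 100.0 * vpip_hands / total, "pfr": 100.0 * pfr_hands / total}

# Hypothetical eight-hand history for one player.
history = [
    ["call"], ["fold"], ["raise"], ["call", "fold"],
    ["fold"], ["raise", "call"], ["check"], ["call"],
]
stats = preflop_stats(history)  # 5 of 8 hands played, 2 of 8 raised
```

Since every preflop raise also counts toward VPIP, PFR can never exceed VPIP. A wide gap between the two (like the 64.5% / 29.5% quoted above) is what gets a player labelled "loose passive".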
Seems like a good overview, but I do find this bit unclear:
"But why don’t market forces correct these issues?
The answer lies in the unique shield that non-dischargeable student loans provide to educational institutions and lenders.
In a normal market, if a product consistently fails to deliver value, consumers stop buying it. Producers either improve or go out of business. But in the world of higher education, this feedback loop is broken.
Colleges and universities, shielded by the guarantee of student loan money, have no real incentive to improve their product or direct students to majors that have an ability to pay back their loans.
They can raise tuition year after year, even as the value of their degrees stagnates or declines. "
Sure, colleges can charge a lot due to loans, but they are still competing with each other and differences in tuition could make a big difference. I went to Georgia Tech over other universities because it was in-state and Georgia has generous scholarships for students with good grades. So why does competition among schools not lower costs?
> But why don’t market forces correct these issues?
Another theory: The value creation is not linked to the value capture. So market forces make a bad feedback loop.
Look, I'm totally pro business, but business is only "good" at allocating capital when value capture and creation are linked. Education isn't like that. The closest we have are the bootcamp schools, where they take a cut out of your first two years' salary if you find a job, or nothing if you don't.
When capture/creation are not linked, you need a different social organization method. "Government" or "Religion/non-profit" come to mind. Perhaps others have additional suggestions.
The fundamental problem in job education is that it needs to be linked to the needs of future employers, but those employers do not have an incentive to hire workers and train them, thereby aligning the education program with the needs of the employers. Employers do not want to pay for training, because employees can leave at any point, so they decided to let employees go to university and pay for their own education. This then leads to a misalignment between what people elect to receive an education in and what employers want, because people aren't mind readers and don't know exactly what will make their boss happy five years into the future. So what happens instead is that higher education becomes purely about standardising worker skills, so that each worker is a replaceable cog according to their degree. This means you can just hire X amount of Y degree holders instead of caring about their individual skills.
I ran a coding bootcamp school that had both your typical pay-upfront and later added an option like you outline. I can't speak for all programs, but schools use an affiliate third party lender for those "free" loan programs.
It was relatively new for us when I left, so I never saw the aftermath. I know it worked out well for some students, but my biggest concern was ensuring payments only kicked in if the job was "in-industry or field". My logic was the value isn't there if you go to a coding bootcamp only to not use the skills.
I was still worried they'd basically ask "do you use a computer?" and consider it in-field.
Another issue here is we had folks just looking to up-skill and the value return was harder to gauge if they were returning or continuing to work their job. This was mostly limited to our part-time program so we didn't offer the delayed-loan for it.
Apparently, yeah: at least arguably the most prominent boot camp takes a very broad stance on 'related' occupations for income sharing: https://www.sandofsky.com/lambda-school/
I don't know about nationally but my local universities are having year over year enrollment decreases. I think there are some market forces in play, but they aren't reducing tuition, just making the universities ask for more state or local tax money.
> I went to Georgia Tech over other universities because it was in-state and Georgia has generous scholarships for students with good grades. So why does competition among schools not lower costs?
All the schools have access to loans that are guaranteed to be repaid. We still have the mindset that degrees are required for employment (I'm not commenting on whether that's good or bad; that's just the current cultural mindset). Because of this, schools have no incentives to control costs. The students will go regardless because they have access to money that will pay for the tuition, no matter how much it costs. There's no penalty for the universities to raise costs because they will get students anyways.
Also people get their first loan when they’ve just been legally considered adults. Nobody knows for sure they’ll be able to start paying these back in five years.
You buy a car so you can work and eat. These are very straightforward causes and effects. No car no job. Buy car that costs << than job. Done. Buy an education and you get more bills, not more income, for years. You might not even finish.
Exciting to see this so soon after Anthropic's "Mapping the Mind of a Large Language Model" (under 3 weeks). I find these efforts really exciting; it is still common to hear people say "we have no idea how LLMs / Deep Learning works", but that is really a gross generalization as stuff like this shows.
Wonder if this was a bit rushed out in response to Anthropic's release (as well as the departure of Jan Leike from OpenAI)... the paper link doesn't even go to Arxiv, and the analysis is not nearly as deep. Though who knows, might be unrelated.
I read this as "we have not built up tools / math to understand neural networks as they are new and exciting" and not as "neural networks are magical and complex and not understandable because we are meddling with something we cannot control".
A good example would be planes - it took a long while to develop mathematical models that could be used to model behavior. Meanwhile practical experimentation developed decent rules of thumb for what worked / did not work.
So I don't think it's fair to say that "we don't" (know how neural networks work); rather, we don't yet have the math / models that can explain/model their behavior...
Chaotic nonlinear dynamics have been an object of mathematical research for a very long time and we have built up good mathematical tools to work with them, but in spite of that turbulent flow and similar phenomena (brains/LLMs) remain poorly understood.
The problem is that the macro and micro dynamics of complex systems are intimately linked, making for non-stationary non-ergodic behavior that cannot be reduced to a few principles upon which we can build a model or extrapolate a body of knowledge. We simply cannot understand complex systems because they cannot be "reduced". They are what they are, unique and unprincipled in every moment (hey, like people!).
The analogy to airplanes is not relevant imo. Our lack of understanding behind the physics of an airplane is different from our lack of understanding of what an LLM is doing.
The lack of understanding is so profound for LLMs that we can’t even fully define the thing we don’t understand. What is intelligence? What is understanding?
Understanding the LLM would be akin to understanding the human brain. Which presents a secondary problem. Is it possible for an entity to understand itself holistically in the same way we understand physical processes with mathematical models? Unlikely imo.
I think this project is a pipe dream. At best it will yield another analogy. This is what I mean: We currently understand machine learning through the analogy of a best fit curve. This project will at best just come up with another high level perspective that offers limited understanding.
In fact, I predict that all AI technology into the far future can only be understood through heavy use of extremely high level abstractions. It’s simply not possible for a thing to truly understand itself.
I think you have to make a distinction between transformers and neural networks in general, maybe also between training and inference.
Many/most types of neural network such as CNNs are well understood since there is a simple flow of information. e.g. In a CNN you've got a hierarchy of feature detectors (convolutional layers) with a few linear classifier layers on top. Feature detectors are just learning decision surfaces to isolate features (useful to higher layers), and at inference time the CNN is just detecting these hierarchical features then classifying the image based on combinations of these features. Simple.
Transformers seem qualitatively different in terms of complexity of operation, not least because it seems we still don't even know exactly what they are learning. Sure, they are learning to predict next word, but just like the CNN whose output classification is based on features learnt by earlier layers, the output words predicted by a transformer are based on some sort of world model/derived rules learned by earlier layers of the transformer, which we don't fully understand.
Not only don't we know exactly what transformers are learning internally (although recent interpretability work gives us a glimpse of some of the sorts of things they are learning), but also the way data moves through them is partially learnt rather than prescribed by the architecture. We have attention heads utilizing learnt lookup keys to find data at arbitrary positions in the context, and then able to copy portions of that data to other positions. Attention heads learn to coordinate to work in unison in ways not specified by the architecture, such as the "induction heads" (consecutive attention head pairs) identified by Anthropic that seem to be one of the workhorses of how transformers are working and copying data around.
Additionally, there are multiple types of data learnt by a transformer, from declarative knowledge ("facts") that seem to mostly be learnt by the linear layers to the language/thought rules learnt by the attention mechanism that then affect the flow of data through the model, as discussed above.
So, it's not that we don't know how neural networks work (and of course at one level they all work the same - to minimize errors), but more specifically that we don't fully know how transformer-based LLMs work since their operation is a lot more dynamic and data dependent than most other architectures, and the complexity of what they are learning far higher.
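To illustrate the "lookup keys" description of attention above, here is a toy single-head scaled dot-product attention in plain Python. The key, value, and query vectors are hand-picked for illustration; in a real transformer they come from learnt projections, which is exactly the part that makes the data flow hard to predict.

```python
import math

def softmax(scores):
    """Normalise scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query position."""
    scale = math.sqrt(len(query))
    # Score each context position by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted mix of the value vectors.
    dim = len(values[0])
    mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, mixed

# Hand-picked (not learnt) toy vectors for three context positions.
keys = [[8.0, 0.0], [0.0, 8.0], [-8.0, 0.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 8.0]  # matches the second key

weights, output = attend(query, keys, values)
```

With these vectors nearly all of the weight lands on the second position, so the output is essentially a copy of the second value vector. That is the "find data at an arbitrary position and copy it" behaviour in miniature.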
Could there also be a “legal hedging” reason for why you would release a paper like this?
By reaffirming that “we don’t know how this works, nobody does” it’s easier to avoid being charged with copyright infringement from various actors/data sources that have sued them.
LLMs aren't the only kind of AI, just one of the two current shiny kinds.
If a "cure for cancer" (cancer is not just one disease so, unfortunately, that's not even as coherent a request as we'd all like it to be) is what you're hoping for, look instead at the stuff like AlphaFold etc.: https://en.wikipedia.org/wiki/AlphaFold
I don't know how to tell where real science ends and PR bluster begins in such models, though I can say that the closest I've heard to a word against it is "sure, but we've got other things besides protein folding to solve", which is a good sign.
(I assume AlphaFold is also a mysterious black box, and that tools such as the one under discussion may help us demystify it too).
Is your argument that because AI can’t currently do the arbitrary things you wish it would do, it is therefore bullshit?
This perspective discounts two important things:
1. All the things it can obviously do very well today
2. Future advancements to the tech (billions are pouring in, but this takes time to manifest in prod)
I’m trying not to be one of the “guys” you’re talking about, but I just can’t comprehend your take. Do you not recognize that there is utility to current models? What makes it all bullshit?
> Is your argument that because AI can’t currently do the arbitrary things you wish it would do, it is therefore bullshit?
It is sold as a research tool, but it cannot be trusted to return facts, because it will happily recombine disconnected pieces of data. AI cannot tell truth from lies, it is good at constructing output that looks like an answer but it does not care about the factual correctness. Google search result summaries are a good example of this problem. When I searched for "what happened to the inventor of Tetris?" it took bios of two Russian-born developers, one Pajitnov and another a murderer and combined them into one presenting Pajitnov as a murderer. I thought it did not sound right and did some additional searching and sure enough it wasn't true, but how many people who were shown that answer have become convinced that he was a murderer? What if his neighbours saw it? What if such made up summaries are fed into a system that decides who can board a plane? When I bring this problem up people tell me it's not an issue and you should always go back and verify facts, but what if the sources I use have the same problem of being made up content? We are not telling people to stop, think, and verify outputs produced by AI, we are telling them AI is making them "more productive" so they use it to produce garbage content without checking the facts. Please explain to me the usefulness of a tool I cannot trust? Producing garbage faster is not something I wake up in the morning wanting to do more of.
> 2. Future advancements to the tech (billions are pouring in, but this takes time to manifest in prod)
Unlike AI, VCs can count and would like to see a return on their investments. I don't think there's much to show for it so far.
Your first section is very much a limit of LLMs, but again, that's not all AI — if you want an AI to play chess, and you want to actually win, you use Stockfish or AlphaZero, because if you use an LLM it will perform illegal moves.
Why would I want to use an AI to win a game of chess? Where's the fun and challenge in it? "Go win me a chess tournament" is the wish nobody has unless we are talking about someone who wants to pretend to be a chess master. It's still a small market. Examples like these are very common in the AI community, they are solutions to problems nobody has.
Though such a question misses the point: use the right tool for the job.
(For a non-ML example, I don't know why 5/8 wrenches exist, but I'm confident of two things: (1) that's a very specific size, unlikely to be useful for a different sized problem; (2) using one as a hammer would be sub-optimal).
I'm not interested in creating a taxonomy of special purpose AI models which each do one thing well and nothing else. What I can do is give a handful of famous examples, such as chess.
Other choices at my disposal (but purely off the top of my head and in no way systematic) include the use of OCR to read and hence sort post faster than any human and also more accurately than all but the very best. Or in food processing for quality control (I passed up on a student work placement for that 20 years ago). Or the entirety of Google search, Gmail's spam filters, Maps' route finding and at least some of its knowledge of house numbers, their CAPTCHA system, and Translate (the one system on this list which is fundamentally the same as an LLM). Or ANPR.
It's like you're saying "food is bad" because you don't like cabbage — the dislike is a valid preference, of course it is, but it doesn't lead to the conclusion.
There are two mindsets at play here, the cynics vs the optimists. I’m an optimist to a fault by nature, but I also think there is a kind of Pascal's bet to be played here.
If you bet sensibly on the current bubble/wave and turn out to be wrong - well you’re in the same place as everyone else with maybe some time and money lost.
In what way? If the current bubble/wave turns out to be right doesn't that mean we're all out of a job? Unless by betting on it you mean buying Nvidia stock?
We might all be out of a job if the AI reached superhuman performance at everything (or even just human at much lower cost), but even without that this can still be a 10x speedup of the rate of change in the industrial revolution.
What "out of a job" means is currently unclear: do we all get UBI, or starve? Just because we could get UBI doesn't mean we will, and the transition has to be very fast (because the software won't be good enough until one day when a software update comes) and very well managed and almost simultaneous worldwide, which probably means it will go wrong.
> 1. All the things it can obviously do very well today
I'm curious what those things are.
At least to me, it isn't obvious that LLMs solve any of their many applications from the past year "very well". I worry about failures (hallucinations, misinterpretation of prompts, regurgitation of incorrect facts, violation of copyright, and more). I don't have a good sense of when they fail, how often this happens, or how to identify these failures in scenarios where I'm not a domain expert.
But maybe some subset of these are solved problems, or at least problems that are actively being worked on for the next generation of models.
Yes, there are many other kinds of AI. Stockfish is better at chess than any human. But when you start talking about emergent behavior from machine learning, the failure modes are much harder to reason about.
> At least to me, it isn't obvious that LLMs solve any of their many applications from the past year "very well". I worry about failures (hallucinations, misinterpretation of prompts, regurgitation of incorrect facts, violation of copyright, and more)
That’s a list of things that get clicks in the popular press.
Some solutions I love:
Recording a video and creating a transcript from it. Then edit the transcript and the video gets edited.
Translating (dubbing) video and changing the lips of the speaker to match where they should be for the new audio.
Scanning invoices for mistakes (I know entire businesses built just on this one thing).
Understanding edge cases. So many things have some nice hard and fast rules that can fail, and an ML system of some sort can figure it out so the system can move on. I do this for processing SEC data, and soon for web scraping (like they change the html but visually it’s kinda the same, and the ai system can figure it out, give me a new html selector, and back in business).
> That’s a list of things that get clicks in the popular press.
Are you saying they're nonissues in practice? I agree that most of those points have shown up in the news, but they're also things that I have personally observed when interacting with LLMs.
Of your (and sibling commenters') cited use cases, I see a number of scenarios where AI is used to perform a quick first pass, and a human then refines that output (transcript generation, scanning invoices, iterating on transformations for syntax trees, etc). That's great that it works for you. My worry here is that you might heuristically observe that it worked perfectly 20 times in a row, then decide to remove that human check even as it admits more errors than is acceptable for your use case.
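To put a number on the "20 times in a row" worry: if each run fails independently with probability p, the chance of n clean runs is (1 - p)^n, so a clean streak only rules out failure rates above 1 - (1 - confidence)^(1/n). A quick sketch follows; the independence assumption is mine and real failures may well be correlated.

```python
def failure_rate_upper_bound(clean_runs, confidence=0.95):
    """Largest per-run failure rate still consistent, at the given confidence,
    with seeing `clean_runs` successes in a row and no failures.

    Assumes runs are independent with a fixed failure probability.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)

bound_after_20 = failure_rate_upper_bound(20)    # roughly 0.14
bound_after_300 = failure_rate_upper_bound(300)  # just under 0.01
```

So 20 clean runs are still compatible with the tool failing on roughly one input in seven. You would need on the order of 300 clean runs before a failure rate above 1% becomes implausible.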
> Scanning invoices for mistakes
This is one of those cases where I would like to better understand the false negatives. If a human reviews the output, then okay, false positives are easy enough to override. But how bad is a false negative? Is it just unnecessary expenses to the company, or does it expose them to liability?
> Translating (dubbing) video and changing the lips of the speaker to match where they should be for the new audio.
This is useful in itself, but surely you too can see the potential for abuse? (This is literally putting words in someone else's mouth.)
> This is useful in itself, but surely you too can see the potential for abuse? (This is literally putting words in someone else's mouth.)
If I was a famous actor, I would demand it. I don’t want people hearing different voices for different movies. I want them to hear my voice. And I’d want it to be authentic. Seeing lips move to the wrong words does not help make a connection.
As for abuse, sure. Not sure anything has had worse abuse than database technology. There should definitely be an avenue for the government to shut down any database instance anywhere (like California is doing with AI). I would have shut down data broker databases long ago.
> This is one of those cases where I would like to better understand the false negatives. If a human reviews the output, then okay, false positives are easy enough to override. But how bad is a false negative? Is it just unnecessary expenses to the company, or does it expose them to liability?
In the companies I know about, these invoices requesting overpayment just got paid. So, worst case, it’s the same. But best case there is way way way more money to save than the cost of the service.
I find them pretty good at reasoning about tree structures that have a depth which I myself find difficult to navigate. For instance, I've been working with libcst (a syntax tree library) and I can say:
1. observe this refactor rule which I like
2. here's the starting code
3. here's the desired code
4. write me a refactor rule, in the style of 1, which transforms 2 into 3
It sometimes takes a few iterations where I show it a diff which highlights how its rule fails to construct 3 given 2, but it usually gets me the transformation I need much faster than I'd have done so by hand.
And once I'm done I have a rule which I can apply without the LLM in the loop and which is much more robust than something like a patch file (which often fail to apply for irrelevant reasons like whitespace or comments that have changed since I wrote the rule).
The key is to find cases, like this one, where you can sort of encircle the problem with context from multiple sides, one of which works as a pass/fail indicator. Hallucinations happen, but you use that indicator to ensure that they get retried with the failure text as correcting context.
It helps to design your code so that those context pieces stay small and reasonably self-describing without taking a foray deep into the dependencies, but then that's just a good idea anyway.
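The retry-with-failure-context loop described above could look something like this. `generate` stands in for whatever LLM call you use and `check` for your pass/fail indicator; both are hypothetical callables, and the fake model below exists only so the sketch runs on its own.

```python
def generate_with_check(generate, check, task, max_attempts=3):
    """Call `generate` until `check` passes, feeding each failure back as context."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(task, feedback)
        ok, message = check(candidate)
        if ok:
            return candidate, attempt
        feedback = message  # the failure text becomes correcting context
    raise RuntimeError(f"no passing candidate after {max_attempts} attempts: {feedback}")

# Stand-in for an LLM: wrong on the first try, right once it sees feedback.
def fake_model(task, feedback):
    return "x = 1" if feedback is None else "x = 2"

# Stand-in pass/fail indicator: compare against the desired code.
def check_matches_target(candidate):
    target = "x = 2"
    if candidate == target:
        return True, ""
    return False, f"expected {target!r}, got {candidate!r}"

result, attempts = generate_with_check(fake_model, check_matches_target, "set x to 2")
```

In the refactoring case, `check` would apply the generated rule to the starting code and diff the result against the desired code, returning the diff as the failure message.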
> But when you start talking about emergent behavior from machine learning, the failure modes are much harder to reason about.
Sure, this is basically why so many are concerned AI might kill us all:
Lots of observed emergent phenomena that are not expected (basically everything ChatGPT can do given it was trained on next token prediction), as a result of doing exactly what we said instead of what we meant (all computer bugs ever), doing it so hard that something breaks (Goodhart's law), doing it so fast that humans can't respond (stock market flash-crashes, many robotic control systems), and being so capable in smaller scale tests that they tempt people to let go of the metaphorical steering wheel (the lawyers citing ChatGPT, but also previously a T-shirt company that dictionary merged verbs into "keep calm and …" without checking, and even more previously either Amazon or eBay dictionary merging nouns into "${x}: buy it cheap on {whichever site it was}" with nouns including "plutonium" and "slaves").
If people had a good model for the AI, it wouldn't be a problem, we'd simply use them only for what they are good at and nothing else.
Just because you can do something with technology doesn't mean the problem is technology itself. It's like newspapers. Printing them is technology, and it allows all kinds of things. If you're of the authoritarian mindset, you'll want to control it all out of some stated fear, but you can do that for everything.
We also know that petroleum mixed with air may be combusted to release energy; we needed to characterise this much better in order for the motor car to be distinguishable from a fuel-air bomb.
Yes, but we also know that a knife can be used to slice vegetables or stab people, and we still allow knives. I can go to Google right now and easily find out how to make Sarin or ricin at home. Are you suggesting that we should ban Google Search because of that?
> Yes, but we also know that a knife can be used to slice vegetables or stab people, and we still allow knives.
I'm from the UK originally, and guess what.
Also missing the point, given stabbing is a crime; what's the AI equivalent of a stabbing? Does anyone on the planet know?
> I can go to Google right now and easily find out how to make Sarin or ricin at home. Are you suggesting that we should ban Google Search because of that?
Google search has restrictions on what you can search for, and on what results it can return. The question is where to set those thresholds, those limits — and politicians do regularly argue about this for all kinds of reasons much weaker than actual toxins. The current fight in the US over Section 230 looks like it's about what can and can't be done and by whom and who is considered liable for unlawful content, despite the USA being (IMO) the global outlier in favour of free speech due to its maximalist attitude and constitution.
People joke about getting on watchlists due to their searches, and at least one YouTuber I follow has had agents show up to investigate their purchases.
Facebook got flak from the UN because they failed to have appropriate limits on their systems, leading to their platform being used to orchestrate the (still ongoing) genocide in Myanmar.
What's being asked for here is not the equivalent of "ban google search", it's "figure out the extent to which we need an equivalent of Section 230, an equivalent of law enforcement cooperation, an equivalent of spam filtering, an equivalent of the right to be forgotten, etc." We don't even have the questions yet; we have the analogies, that's all, and analogies aren't good enough, regardless of whether the system that you fear might do wrong is an AI or a regulatory body.
You may think the UK government is nuts (I do, I left due to an unrelated law), but it is what it is.
> Or Google search somehow doesn’t return the chemical processes to make Sarin?
You're still missing the point of everything I've said if you think that's even a good rhetorical question.
I have no idea if that's me giving bad descriptions, or you being primed with the exact false world model I'm trying to convince you to change from.
Hill climbing sometimes involves climbing down from a local peak before you can climb the global one.
Again, and I don't know how to make this clearer, I am not calling for an undifferentiated ban on all AI just because they can be used for bad ends, I'm saying that we need to figure out how to even tell which uses are the bad ones.
Your original text was:
> We know exactly what the system is capable of doing. It’s capable of outputting tokens which can then be converted into text
Well, we know exactly what a knife is capable of doing.
Does that knowledge mean we allow stabbing? Of course not!
What's the AI equivalent of a stabbing? Nobody knows.
Same as you’re not allowed to commit, e.g., election or postal fraud using LLMs. Are you allowed to carry a hammer? You can use that to kill people. You can also mow them down with a car, push them in front of a train with your bare hands, poison them with otherwise benign household chemicals and so on. It’s the applications that should be regulated, not the underlying tech.
We were planning to release the paper around this time independent of the other events you mention.
I think it is still predominantly accurate to say that we have no idea how LLMs work. SAEs might eventually change that, but there's still a long way to go.
> but that is really a gross generalization as stuff like this shows.
I think this research actually reinforces that we still have very little understanding of the internals. The blog post also reiterates that this is early work with many limitations.
> likely all these guys went to the same metaphorical SF bars, it was in the water
It also comes from a long lineage of thought, no? For instance, one of the things often taught early in an ML course is the notion that “early layers respond to/generate general information/patterns, and deeper layers respond to/generate more detailed/complex patterns/information.” That is obviously an overly broad and vague statement, but it is a useful intuition and can be backed up by inspecting, e.g., what maximally activates particular convolution filters. So there is already a notion of some sort of spatial structure to how semantics are processed and represented in a neural network (even if in a totally different context, as in the image processing mentioned above), where “spatial” here refers to different regions of the network.
Even more simply, in fact as simple as you can get: with linear regression, the most interpretable model you can get, you have a clear notion that different parameter groups of the model respond to different “concepts” (where a concept is taken to be whatever the variables associated with a given subset of coefficients represent).
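To make the linear regression point concrete, here is a minimal sketch on made-up data (the two features, the "true" weights, and the helper name `fit_linear` are all assumptions for illustration, not anything from the research being discussed). Each fitted coefficient lines up with exactly one input variable, which is the sense in which a parameter "responds to" a concept.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares; an intercept column is appended to X."""
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]  # (per-feature weights, intercept)

# Synthetic data: two hypothetical "concept" variables with known effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
true_w = np.array([3.0, -1.5])
y = X @ true_w + 0.5 + 0.01 * rng.normal(size=500)

weights, intercept = fit_linear(X, y)
# weights[0] tracks only the first variable, weights[1] only the second.
print(weights, intercept)
```

In a deep network no such one-to-one mapping between parameters and input variables is given for free, which is roughly the gap the interpretability work is trying to close.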
In some sense, at least in a high-level/intuitive reading of the new research coming out of Anthropic and OpenAI, I think the current research is just a natural extension of these ideas, albeit in a much more complicated context and massive scale.
Somebody else, please correct me if you think my reading is incorrect!!
This project has been in the works for about a year. The initial commit to the public repo was not really closely related to this project, it was part of the release of the Transformer debugger, and the repo was just reused for this release.
ha thank you Leo; i myself felt uneasy pointing out commit date based evidence and you just proved why.
mild followup question: any alpha to be gained from training the same SAEs on two different generations of GPT4, eg GPT4 on march 2023 vs june 2023 vintage, whatever is most architecturally comparable, and diffing them. what would be your priors on what you’d find?
It’s hard to believe it was written overnight.. this seems more like a public stable dump of what they’ve been working on without saying when they started. Some clues could come from looking at when all the deps it uses were released. They’re also calling this version 0.1.67, though I’m not sure that means anything either.
The fact that a paper is implying an LLM has a mind doesn't exactly bode well for the people who wrote it, not to mention the continued meaningless babbling about "safety". It'd also be nice if they could show their work so we could replicate it. Still, not shabby for an ad!
Well - what is a mind exactly? We don't really have a good definition for a human mind. Not sure we should be claiming dominion over the term. It's not a terrible shorthand for discussing something that reads and responds as if it had some kind of mind - whether technically true or not (which we honestly don't know).
> It's not a terrible shorthand for discussing something that reads and responds as if it had some kind of mind
I really don't see it like that: it has very little memory, it has no ability to introspect before "choosing" what to say, no awareness of the concept of the coherency of statements (i.e. whether or not it's saying things that directly contradict its training), and seems to have little sense of non-pattern-driven computation beyond what token patterns can encode at a surface level (e.g. of course it knows 1 + 1 = 2, but does it recognize odd notation, or can it recognize and analyze arbitrary statements? Of course not). I fully grant it is compelling evidence we can replicate many brain-like processes with software neural nets, but that's an entirely different thing than raising it to a level of thought or consciousness or self-awareness (which I argue is necessary in order to appropriately issue coherent statements, as perspective is a necessary thing to address even when attempting to make factual statements). It strikes me as a lot closer to an analogy for a potential constituent component of a mind rather than a mind per se.
Indeed, and the very last section about how they’ve now “open sourced” this research is also a bit vague. They’ve shared their research methodology and findings… But isn’t that obligatory when writing a public paper?
But even with current efforts so far, I don't think we have an understanding of how/why these emergent capabilities are formed. LLMs are as much a black box as ever.
The Deep Visualization Toolbox from nearly 10 years ago is solid precedent for understanding deep models, albeit much smaller models than LLMs. It’s hard to say OpenAI’s “visualization” released today is nearly as effective. It could be that GPT-4 is much harder to instrument.
At the shit-tier level, the majority of people building applications on this technology are projecting abilities onto it that even they can't really demonstrate it has in a reliable way.
At the inventor level, the people who make it are dependent on projecting the idea that magic will happen when they have more compute.
At every level, the products are so far ahead of the knowledge that it's actually unethical.
I have finished a PhD in AI just this past year, and can assure you there exist reviewers who spend hours per review to do it well. It's true that these days it's often the case that you can (and are more likely than not to) get unlucky with lazier reviewers, but that does not appear to have been the case with this paper.
For example just see this from the review of f5bf:
"The main contribution of the paper comprises two new NLM architectures that facilitate training on massive data sets. The first model, CBOW, is essentially a standard feed-forward NLM without the intermediate projection layer (but with weight sharing + averaging before applying the non-linearity in the hidden layer). The second model, skip-gram, comprises a collection of simple feed-forward nets that predict the presence of a preceding or succeeding word from the current word. The models are trained on a massive Google News corpus, and tested on a semantic and syntactic question-answering task. The results of these experiments look promising.
...
(2) The description of the models that are developed is very minimal, making it hard to determine how different they are from, e.g., the models presented in [15]. It would be very helpful if the authors included some graphical representations and/or more mathematical details of their models. Given that the authors still almost have one page left, and that they use a lot of space for the (frankly, somewhat superfluous) equations for the number of parameters of each model, this should not be a problem."
These reviews in turn led to significant (though apparently not significant enough) modifications to the paper (https://openreview.net/forum?id=idpCdOWtqXd60&noteId=C8Vn84f...). These were some quality reviews and the paper benefited from going through this review process, IMHO.
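For anyone who hasn't seen the two architectures the reviewer summarizes, here is a rough sketch of how the training examples differ. This is only the data-preparation step, written from the reviewer's description; the function names and the `window` parameter are my own assumptions, not the paper's code. Skip-gram predicts each nearby word from the current word, while CBOW predicts the current word from its (averaged) neighbors.

```python
def skipgram_pairs(tokens, window=2):
    """One (center, context) pair per neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """One (context_words, target) example per position in the text."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:  # skip positions with no neighbors at all
            examples.append((context, target))
    return examples

sentence = ["the", "cat", "sat"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

The actual models then learn embeddings by training a shallow network on these examples; that part is omitted here.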
"Leadership worked with you around the clock to find a mutually agreeable outcome. Yet within two days of your initial decision, you again replaced interim CEO Mira Murati against the best interests of the company. You also informed the leadership team that allowing the company to be destroyed “would be consistent with the mission.”"
“When the Waymo vehicle encounters a particular situation on the road, the autonomous driver can reach out to a human fleet response agent for additional information to contextualize its environment,” the post reads. “The Waymo Driver [software] does not rely solely on the inputs it receives from the fleet response agent and it is in control of the vehicle at all times.” [from Waymo's own blog https://waymo.com/blog/2024/05/fleet-response/]
What's the problem with this?