Hacker Newsnew | past | comments | ask | show | jobs | submit | snek_case's commentslogin

This would be an expensive benchmark to run on a regular basis, though I guess for the big AI labs it's nothing. Code quality is hard to objectively measure, however.

Maybe not a particularly astute observation, but what I've seen in 10 years of investing is that the stock market seems to like to do the opposite of what most people expect. There's probably a game theoretic explanation to this, but as you seem to be suggesting, it basically comes down to: if the stock market was easy to predict and everyone could easily anticipate its movement, then everyone would make money, and this isn't really possible. There are big fish trying to take your money. The people selling you stocks or buying stocks from you do so because they think they are making the better move.

IMO we'll see a correction some time after people get used to the crash not coming. Maybe the narrative will shift back to "money printing means it can't crash" for a while, the market will go "risk on" and then we'll get a surprise correction.


3% is considered a "safe withdrawal rate" for stock investments, not so much if you have the money just sitting in a bank account, but you're right nevertheless. You can do whatever you want with that kind of money.

I wasn’t talking about a withdrawal rate, this is a rate of return and just living on the interest.

Another thing that people fail to remember is that Woz designed the Apple II, which is what made Apple a highly profitable company for many years, but instead of embracing that success, Jobs repeatedly tried to kill and replace the Apple II with the Lisa, then the Macintosh, and drove Apple into financial trouble. Apple would have done better, at that time, by simply building more advanced and backwards compatible followups to the Apple II, which is what consumers actually wanted (the original Macintosh was an expensive piece of shit).

The Apple II had 7 expansion slots and was easy to open and service yourself. It was a machine designed for hackers, and it was highly flexible. Jobs kept trying to push his all-in-one closed design when it made no sense. He did unfortunately succeed eventually. What Jobs did after his return was to turn Apple into a "luxury brand", where iPhones are perceived a bit like Prada handbags. One thing I will give Apple is that there is still no PC equivalent to Apple laptops. That can probably only really happen if mainstream PC manufacturers fully embrace Linux.


As Henry Ford is (spuriously) claimed to have said: "If I'd asked my customers what they wanted, they'd have said a faster horse."

Apple did build Apple II models, up to and including the Apple IIgs. They had a good run. And the line was not without its flops — the Apple III was a notorious disaster, though allegedly more due to Jobs than Wozniak.

But none of the pure 8-bit PC vendors survived the 1980s. One of the better qualities of Jobs was that he was not afraid of the company disrupting itself — foregoing the short term success of the Apple II line in favor of the Mac, which in the long run was vastly superior. The same situation played out with the iPhone disrupting the iPod.


They're clearly not failing, but if you read comments here or on reddit, lots of people want them to, and have wanted them to for a decade.

> Add why should anyone look past their opinions about the leader?

Because it's the most advanced car manufacturing in the US... Virtually the only successful EV maker outside of China, and it provides over 100,000 jobs worldwide.


Actions have consequences. Maybe an upshot of this is that people will learn not to put all their eggs in the POS’s basket.

Part of it is they wanted that factory space at Fremont for the Optimus production line. That's because the Optimus team is located there, in Silicon Valley.

Wondered about that also. Seems like a really big decision to cut off the S and X though! Will they have something else to offer customers who want something more than a Y?

There is the Model YL now in China, which is longer and has 3 rows of seats. Apparently they're also about to release a Model YL+.

Now that they are going away, there's maybe an opportunity to produce some more premium 3/Y models. Before now, they couldn't make the 3/Y too premium because they had to distinguish them from the S/X and keep them in a distinct price bracket.


In the early 2000s Wikipedia used to fill that role. Now it's like you have an encyclopedia that you can talk to.

What I'm slightly worried about is that eventually they are going to want to monetize LLMs more and more, and it's not going to be good, because they have the ability to steer the conversation towards trying to get you to buy stuff.


> they are going to want to monetize LLMs more and more

Not only can you run reasonably intelligent models on recent relatively powerful PC's "for free", but advances are undoubtedly coming that will increase the efficient use of memory and CPU in these things- this is all still early-days

Also, some of those models are "uncensored"


Can you? I imagine e.g. Google is using material not available to the public to train their models (unsencored Google books, etc.). Also, the chat bots, like Gemini, are not just pure LLMs anymore, but they also utilize other tools as part of their computation. I've asked Gemini computationally heavy questions and it successfully invokes Python scripts to answer them. I imagine it can also use other tools than Python, some of which might not even be publicly known.

I'm not sure what the situation is currently, but I can easily see private data and private resources leading to much better AI tools, which can not be matched by open source solutions.


While they will always have premiere models that only run on data center hardware at first, the good news about the tooling is that tool calls are computationally very minimal and no problem to sandbox/run locally, at least in theory, we would still need to do the plumbing for it.

So I agree that open source solutions will likely lag behind, but that's fine. Gemini 2.5 wasn't unusable when Gemini 3 didn't exist, etc.


Yes, because local models can run Internet search tools. Even the big boys like openai etc I prefer the results quality when it's made a search - and they seem to have realised this too, the majority of my queries now kick off searches.


How do you verify the models you download also aren't trying to get you to buy stuff?


I guess you.. ask them a bunch of recommendations? I would imagine this would not be incredibly hard to test as a community


Before November 30, 2022 that would have worked, but I think it stopped being reliable sometime between the original ChatGPT and today.

As per dead internet theory, how confident are we that the community which tells us which LLM is safe or unsafe is itself made of real people, and not mostly astroturfing by the owners of LLMs which are biased to promote things for money?

Even DIY testing isn't necessarily enough, deceptive alignment has been shown to be possible as a proof-of-concept for research purposes, and one example of this is date-based: show "good" behaviour before some date, perform some other behaviour after that date.


Proudly bought to you by Slurm


Which free models do you recommend?


One of the approaches to this that I haven't seen being talked about on HN at all is LLM as public infrastructure by the government. I think EU can pull this off. This also addresses overall alignment and compute-poverty issue. I wouldn't mind if my taxes paid for that instead of a ChatGPT subscription.


This is not a good idea at all.

Government should not be in a position to directly and pervasively shape people’s understanding of the world.

That would be the infinite regress opposite of a free (in a completely different sense) press.

A non-profit providing an open data and training regime for an open WikiBrain would be nice. With standard pricing for scaled up use.


> Government should not be in a position to directly and pervasively shape people’s understanding of the world.

You disagree with national curricula, state broadcasters, publicly funded research and public information campaigns?


Many Americans these days absolutely do disagree with all of those things. Educated ones. Theres simply a short circuit belief based pathway in peoples brains that bypasses everything rational on arbitrary topics.

Most of us used to see it as isolated to religion or niche political points, but increasingly everything os being swept into the "its political" camp.


Given “national curricula” of a dominant democratic country are undergoing a politically motivated change, starting with significant web materials, and moving into education …

Do you prefer the previous narratives? The latter? Or whatever you are told?

And that is the risk of relatively static information.

What if your information source was interactive, adaptive and motivated? And could record and report on your interactions and viewpoints?


I've heard once that Americans distrust their government and trust their corporations, while Europeans distrust their corporations and trust their government. I honestly think that governments already has a huge role in shaping people's understanding of the world, and that's GREAT on good democratic countries.

What I find really weird is that I am stopping believing in the whole idea of free press, considering how the mainstream media is being bought by oligarchs around the globe. I think this is a good example of the erosion of trust in institutions in general. This won't end well.

Your idea of letting it be run by a non-profit makes me believe that you also don't trust institutions anymore.


I can’t say I have no trust for any institutions.

But my trust depends on each institutions choices. Just as my trust in people varies based on their records.

Mostly, I trust everyone to be who they show themselves to be. Which can lean one way or the other, or be very mixed across different dimensions.

But, yes, governments and corporations which are steadily centralizing power are inherently untrustworthy. As they at best, are making us all vulnerable and less powerful as individuals. Meaning they are decreasing our freedom, not increasing it.


Instead, we should let capitalism consolidate all power in the hands of the few, and let them directly and pervasively shape people's understanding of the world.

How would a non profit even be funded? That would just be government with extra steps.

No, capitalism giveth the LLMs and capitlism taketh the sales.


Were you responding to someone else?

For answers, just re-read my comment. Or, this:

1) Avoiding centralization is exactly why government shouldn’t do this.

2) Why did you raise a false dichotomy of government vs. commercial centralization?

I proposed an open solution, which is non-commercial with decentralized control.

3) Funding?: Have you heard of Wikipedia?

People often donate to prominent tools they use.

And, as I pointed out, there is an even more reliable source.

The necessity for scaled automated access creates an inevitable need for uniform openly set pricing for scaled up access.

I nice case where non-profit open access is naturally self-funded.


That's because a government-run LLM would be like government-run media.

High inflation? No, the government LLM will tell us we're doing great.


this assumes that "the government" is "us" and not "them"...


I sort of already had an experience where it did kinda. I was consulting with it potential fashion choices to upgrade my work uniform, to look professional but still creative, basically to look more like a creative director. It recommended brands, colors, styles etc. Then I was asking about eyeglass frames showed it three pictures, described my facial features and it was like "you have to buy this one now" more enthusiastic than expected. It wasn't ads or anything but there was a bit of salesyness in there.


or more generally than just ads: make you believe stuff that makes you act in ways that is detrimental to you, but benefitial to them (whoever sits in the center and can control and shape the LLM).

i.e. the Nudge Unit on steroids...

care must be taken to avoid that.


I can envision this as being in the boots of Truman from Truman Show where some advertisement is thrown at your face randomly


It's also inevitable that better and better open source models will be distilled as frontier models advance.


I agree. I think the local models you can run on the "average computer" are not quite good enough yet, but I have hope that we will see much better small local models in the future.


Right, this is what happened with search engines. And "SEO for LLMs" is already a thing.


Enshittification is always inevitable in a capitalist world, but not always easy to predict how it will happen.


You can work on building LLMs that use less compute and run locally as well. There are some pretty good open models. They probably be made even more computationally efficient.


I found the codebase very hard to navigate. Hundreds (over a thousand?) tiny files with less than 200 lines of code, in deeply nested subdirectories. I wanted to find where the JavaScript engine was, and where the DOM implementation was located, and I couldn't easily find it, even using the GitHub search feature. I'm not exactly sure what this browser implements and how.

Even their README is kind of crappy. Ideally you want installation instructions right near the top, but it's broken into multiple files. The README link that says "running + architecture" (but the file is actually called browser_ui.md???) is hard to follow. There is no explicit list of dependencies, and again no explanation of how JavaScript execution works, or how rendering works, really.

It's impressive that they got such a big project to be built by agents and to compile, but this codebase... Feels like AI slop, and you couldn't pay me to maintain it. You could try to get AI agents to maintain it, but my prediction is that past some scale, they would have a hard time figuring out their own mess. You would just be left with permanent bugs you can't easily fix.


So the chain of events here is: copy existing tutorials and public/available code, train the model to spit it out-ish when asked, a mature-ish specification is used, and now they jitter and jumble towards a facsimile of a junior copy paste outsourcing nightmare they can’t maintain (creating exciting liabilities for all parties involved).

I can’t shake the feeling that simply being a shameless about copy-paste (ie copyright infringement), would let existing tools do much the same faster and more efficiently. Download Chromium, search-replace ‘Google’ with ‘ME!’, run Make… if I put that in a small app someone would explain that’s actually solvable as a bash one-liner.

There’s a lot of utility in better search and natural language interactions. The siren call of feedback loops plays with our sense of time and might be clouding or sense of progress and utility.


You raise a good point, which is that autonomous coding needs to be benchmarked on designs/challenges where the exact thing being built isn't part of the model's training set.


swe-REbench does this. They gather real issues from github repos on a ~monthly basis, and test the models. On their leaderboard you can use a slider to select issues created after a model was released, and see the stats. It works for open models, a bit uncertain on closed models. Not perfect, but best we have for this idea.


To steelman the vibecoders’ perspective, I think the point is that the code is not meant for you to read.

Anyone who has looked at AI art, read AI stories, listened to AI music, or really interacted with AI in any meaningfully critical way would recognize that this was the only predictable result given the current state of AI generated “content”. It’s extremely brittle, and collapses at the smallest bit of scrutiny.

But I guess (to continue steelmanning) the paradigm has shifted entirely. Why do we even need an entire browser for the whole internet? Why can’t we just vibe code a “browser” on demand for each web page we interact with?

I feel gross after writing this.


If it's not meant to be read, and not meant to be run since it doesn't compile and doesn't seem like it's been able to for quite some time, what is this mean to demonstrate?

That agents can write a bunch of code by themselves? We already knew that, and what's even the point of that if the code doesn't work?

I feel like I'm still missing what this entire project and blogpost is about. Is it supposed to be all theoretical or what's the deal?


You and me both, bud. I often feel these days that humanity has never had a more fractured reality, and worse, those fractures are very binary and tribal. I cope by trying to find fundamental truths that are supported by overwhelming evidence rather than focus on speculation.

I guess the fundamental truth that I’m working towards for generative AI is that it appears to have asymptotic performance with respect to recreating whatever it’s trying to recreate. That is, you can throw unlimited computing power and unlimited time at trying to recreate something, but there will still be a missing essence that separates the recreation from the creation. In very small snippets, and for very large compute, there may be reasonable results, but it will never be able to completely replace what can be created in meatspace by meatpeople.

Whether the economics of the tradeoff between “nearly recreated” and “properly created” is net positive is what I think this constant ongoing debate is about. I don’t think it’s ever going to be “it always makes sense to generate content instead of hire someone for this”, but rather a more dirty, “in this case, we should generate content”.


No, but this blogpost is on a whole other level. Usually at least the stuff they showcase at least does something, not shovelware that doesn't compile.


I've had AI write some very nice, readable code, but I make it go one function at a time.


Writing code one function at a time is not the the 100x speed up being hyped all over HN. I also write my code one function at a time, often assisted by various tools, some of them considered “AI”.

Writing code one function at a time is the furthest thing than what is being showcased in TFA.


> It's impressive that they got such a big project to be built by agents and to compile

But that's the thing, it doesn't compile, has a ton of errors, CI seems broken since long... What exactly is supposed to impressive here, that it managed to generate a bunch of code that doesn't even compile?

What in the holy hackers is this even about? Am I missing something obvious here? How is this news?


Looks like it doesn't compile for at least one other guy (I myself haven't tried): https://github.com/wilsonzlin/fastrender/issues/98

Yeah, answers need to be given.


Cursor is in the business of selling you more tokens, so it makes sense that they would exaggerate the capabilities of their models, and even advertise it being used to produce lots of code over weeks. This would probably cost you thousands in API usage fees.


> What in the holy hackers is this even about? Am I missing something obvious here?

It's about hyping up cursor and writing a blog post. You're not supposed to look at or use the code, obviously.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: