tmule's comments

Your comment history suggests you’re rather bitter about “nerds” who are likely a few standard deviations smarter than you (Anthropic OG team, Jeff Dean, proof nerds, Linus, …)

And they’re all dumber than John von Neumann; who cares?

Transitively, you haven't thought the most thoughts or cared the most about anything, therefore we should disregard what you think and care about?

The person replying was trying to turn the conversation into some sort of IQ pissing contest. Not sure why, that seems like their own problem. I was reminding them that there is always someone smarter.

Your comment history is littered with “nerds”, “elite”, “better” and all sorts of comparisons.

> I was reminding them that there is always someone smarter.

And even with this comment you literally do not understand that you have some skewed view of the world. Do you have some high school trauma?


> Do you have some high school trauma?

I am not sure ad personam is appropriate here.


This is a thread about their personality.

https://news.ycombinator.com/item?id=46701378


Where I come from, nerd is a term of endearment, buddy.

> And even with this comment you literally do not understand that you have some skewed view of the world.

I’m well aware I don’t have a perfect view of reality and the map isn’t the territory. Do you?


My bad. I jumped to an incorrect conclusion. Sorry.

It is highly unusual for someone to stay put after their net worth increases tenfold. Normally, you would expect an individual to seek out more elite social circles and embrace a significantly more opulent lifestyle. Not doing so isn’t a sign of laziness (one can be certain that someone like Warren Buffett lives exactly as he chooses) but rather a reflection of the rare ability to decide that what he has is already enough.


Why? I often feed an entire document I hastily wrote into an AI and prompt it to restructure and rewrite it. I think that’s a common pattern.


It might be, but I really doubt those were the documents flagged as fully AI-generated. If it erased all the originality you had put into that work and made it completely bland and regressed-to-the-mean, I would hope that you would notice.


My objective function isn’t to maximize the originality of presentation - it’s to preserve the originality of thought and maximize interpretability. Prompting well can solve for that.


> I would hope that you would notice.

He didn't say he read it carefully after running it through the slop machine.


China’s breakneck development is difficult for many in the US to grasp (root causes: baselining on sluggish domestic growth, and a condescending view of China). This article offers a far more accurate picture of how China is doing right now: https://archive.is/wZes6


Eye-opening summary... I knew China was ahead, but wow. Thanks for sharing that article.


Thank you for sharing this article. Eye opening.


> The author Andrew Gelman created a whole new branch of Bayesian statistics ...

Love Gelman, but this is playing fast and loose with facts.


His book on hierarchical modeling with Hill has 20,398 citations on Google Scholar https://scholar.google.com/scholar?cluster=94492350364273118... and Wikipedia calls him "a major contributor to statistical philosophy and methods especially in Bayesian statistics[6] and hierarchical models.[7]", which suggests the claim is more true than false.


He co-wrote the reference textbook on the topic and made interesting methodological contributions, but Gelman acknowledges other people as creators of the theoretical underpinnings of multilevel/hierarchical modeling, including Stein and Donoho [1]. The field is quite old; one can find hierarchical models in articles published many decades ago.

Also, IMHO, his best work has been done describing how to do statistics. He has written somewhere (I cannot find it now) that he sees himself as a user of mathematics, not a creator of new theories. His book Regression and Other Stories is elementary but exceptionally well written. He describes how great Bayesian statisticians think and work, and this is invaluable.

He is updating Data Analysis Using Regression and Multilevel/Hierarchical Models to the same standard, and I guess BDA will eventually come next. As part of the refresh, I imagine everything will be ported to Stan. Interestingly, Bob Carpenter and others working on Stan are now pursuing ideas on variational inference to scale things further.

[1] https://sites.stat.columbia.edu/gelman/research/unpublished/...


Totally agree, and great point that hierarchical models have been around for a long time; however, these were primarily analytical, leveraging conjugate priors or requiring pretty extensive integration.

I would say his work with Stan and his writings, along with theorists like Radford Neal, really opened the door to a computational approach to hierarchical modeling. And I think this is a meaningfully different field.
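
To make the analytical/conjugate contrast concrete, here is a minimal sketch in Python (the numbers are made up): in a conjugate Normal-Normal setup, the partially pooled posterior for each group mean is available in closed form, no sampler required.

    import numpy as np

    # Hypothetical group-level estimates with known sampling variances
    y = np.array([2.1, 0.4, -0.3, 1.7])
    sigma2 = np.array([0.5, 0.8, 0.5, 1.2])

    # Conjugate prior on each group mean: theta_j ~ N(mu0, tau2)
    mu0, tau2 = 0.0, 1.0

    # Closed-form posterior: a precision-weighted average of prior and data,
    # which is exactly the "shrinkage" of classical hierarchical models
    post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mean = post_var * (mu0 / tau2 + y / sigma2)

Once you leave conjugate territory (non-normal likelihoods, deeper hierarchies), no such closed form exists, which is where MCMC tools like Stan come in.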


I give Gelman a lot of credit for popularizing hierarchical models, but you give him too much.

Before Stan existed we used BUGS [1] and then JAGS [2]. And most of the work on computation (by Neal and others) was entirely independent of Gelman.

[1] https://en.wikipedia.org/wiki/Bayesian_inference_using_Gibbs...

[2] https://en.wikipedia.org/wiki/Just_another_Gibbs_sampler


This is a remarkable claim. Not a single Indian in tech that I know in my personal or professional life - numbering over a hundred - has ever disputed that Indians have strong (sub)ethnic affinities that color their views on hiring. In addition, nepotism is a real thing in Indian culture. I’d be laughed out of a room with aforesaid folks if I claimed “Indian managers have a tendency to hire anyone else but Indians”. This is either deliberately misleading to “save face” on behalf of the community (another cultural trait), or you’re an extreme outlier in your obliviousness to how things work.


> Not a single Indian in tech that I know in my personal or professional life

Your dataset is very small. I come from India.


Yeah, Sherlock, where do you think I come from if I know upward of 100 Indians well enough to discuss ethnic nepotism with?


Unfortunately, as things stand, it’s well-known that behaviors and optimizations in small-scale models fail to replicate in larger models.


Doing hyperparameter sweeps on lots of small models to find the optimal values for each size and fitting scaling laws to predict the hyperparameters to use for larger models seems to work reasonably well. I think https://arxiv.org/abs/2505.01618 is the latest advance in that vein.
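
As a rough illustration of the general recipe (not the specific method of the linked paper): sweep small models, fit a power law in log-log space, then extrapolate. The numbers below are hypothetical.

    import numpy as np

    # Hypothetical sweep results: model size -> best learning rate found
    sizes = np.array([1e7, 3e7, 1e8, 3e8])
    best_lr = np.array([6e-3, 4e-3, 2.6e-3, 1.7e-3])

    # Fit lr = a * N^b by linear regression in log-log space
    b, log_a = np.polyfit(np.log(sizes), np.log(best_lr), 1)

    # Extrapolate to a model size that was never swept directly
    n_target = 1e10
    print(f"predicted LR at 10B params: {np.exp(log_a) * n_target ** b:.2e}")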


The problem is that the eval processes don't really work here if you believe in "Emergent Abilities": https://arxiv.org/abs/2206.07682


Which we probably should not, at least not the "sudden" emergence that those researchers claimed to see.

https://arxiv.org/abs/2304.15004

Good article about why here; this helped me understand a lot:

https://www.wired.com/story/how-quickly-do-large-language-mo...
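
The core of the mirage argument is easy to simulate: if per-token accuracy improves smoothly with scale, an exact-match metric on a multi-token answer (roughly p^k) still looks like a sudden jump. A toy sketch in Python:

    import numpy as np

    # Per-token accuracy p improves smoothly across model scales...
    p = np.linspace(0.5, 0.99, 6)
    k = 20  # answer length in tokens

    # ...but exact-match over k tokens is p**k, which looks discontinuous
    for pi in p:
        print(f"per-token {pi:.2f} -> exact match {pi ** k:.4f}")

The underlying capability curve is smooth; the apparent discontinuity is an artifact of the thresholded metric.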


Why not? It takes models of a certain size to contain xyz neuron/feature.

https://www.youtube.com/watch?v=AgkfIQ4IGaM

That's not a mirage; it's clearly a capability that a smaller model cannot demonstrate. A model with fewer parameters and fewer hidden layers cannot have a neuron that lights up when it detects a face.


Consider a single-neuron model that just pools all pixels in an image together. It's possible for the average activation of this neuron to be exactly the same on faces and non-faces, but extremely unlikely given the large range of possibilities. So in aggregate, this neuron can distinguish faces from non-faces, even though, when you apply it to classifying a particular image, it'll be better than random only by an extremely tiny amount.

As the number of neurons increases, the best face/non-face distinguisher neuron gets better and better, but there's never a size where the model cannot recognize faces at all and then you add just a single neuron that recognizes them perfectly.
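
To see this numerically, here is a toy sketch (entirely hypothetical data, random linear "neurons", nothing actually face-like): two classes whose pixel means differ slightly, where the best neuron's accuracy creeps up with width rather than jumping.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 64, 20000

    # Two classes whose per-pixel means differ by a small amount
    pos = rng.normal(0.05, 1.0, size=(n, d))
    neg = rng.normal(0.00, 1.0, size=(n, d))

    def best_neuron_accuracy(width):
        w = rng.normal(size=(d, width))           # random linear "neurons"
        s_pos, s_neg = pos @ w, neg @ w
        thr = (s_pos.mean(axis=0) + s_neg.mean(axis=0)) / 2
        acc = ((s_pos > thr).mean(axis=0) + (s_neg <= thr).mean(axis=0)) / 2
        return np.maximum(acc, 1 - acc).max()     # flipped neurons count too

    for width in (1, 10, 100, 1000):
        print(width, round(best_neuron_accuracy(width), 3))

Accuracy improves gradually with width; there is no threshold below which the signal is entirely absent.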


> there's never a size where the model cannot recognize faces at all

True

> then you add just a single neuron that recognizes them perfectly

Not true.

Don't think in terms of neurons, think in terms of features. A feature can be spread out over multiple neurons (polysemanticity), I just use a single neuron as a simplified example. But if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.

The Universal Approximation Theorem implies that a network large enough to achieve that goal exists (call its size n), so somewhere between 0 and n neurons you'd get what you want.


> if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.

You could remove any one of those neurons before retraining the model from scratch and polysemanticity would slightly increase while performance slightly decreases, but really only slightly. There are no hard size thresholds, just a spectrum of more or less accurate approximations.
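
That spectrum is visible even in a toy setting. A quick sketch (a least-squares fit on fixed ReLU features, a stand-in for real training): approximation error falls smoothly as width grows rather than vanishing at one magic size.

    import numpy as np

    x = np.linspace(-3, 3, 400)
    target = np.sin(2 * x)

    for width in (2, 8, 64):
        knots = np.linspace(-3, 3, width)
        relu = np.maximum(0, x[:, None] - knots[None, :])   # hidden features
        feats = np.column_stack([np.ones_like(x), relu])    # plus a bias
        w, *_ = np.linalg.lstsq(feats, target, rcond=None)
        print(f"width={width:3d}  max error={np.abs(feats @ w - target).max():.3f}")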


Which in itself is very interesting and requires study.


It mostly has to do with sparsity in high-dimensional space. When you scale things to the extreme, everything is very far away from everything else, the space is sparse, random vectors have a very high chance of being nearly orthogonal, etc. All of this makes optimization incredibly slow and difficult. Just another facet of the so-called "curse of dimensionality".
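
The near-orthogonality claim is easy to check numerically; the typical |cosine similarity| between random vectors shrinks like 1/sqrt(d):

    import numpy as np

    rng = np.random.default_rng(0)

    # Mean |cosine similarity| between pairs of random Gaussian vectors
    for d in (10, 100, 1000, 10000):
        a, b = rng.normal(size=(2, 1000, d))
        cos = (a * b).sum(axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
        print(f"d={d:5d}  mean |cos| = {np.abs(cos).mean():.4f}")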


Well-known but not well-understood


That's not universally true. E.g., the GPT-4 tech report pointed out that nearly all their experiments were done on models 1,000x smaller than the final model.


Fair point, though I’d argue there’s an inherent selection bias here toward improvements that happen to fit a scaling-law curve in the small-model regime.


But why? If we don't know why then how do we figure it out?


Discussions about Indian politics or the Indian psyche—especially when laced with Indic supremacist undertones—are off-topic and an annoyance here. Please consider sharing these views in a forum focused on Indian affairs, where they’re more likely to find the traction they deserve.


It is not "supremacist" to believe that depriving hundreds of millions of people from higher education in their native language is deeply unjust. This reflection was prompted by a comment on why Indian languages are not represented in international competitions, which was prompted by a comment on the competition being available in many languages.

Discussions online have a tendency to go off into tangents like this. It's regrettable that this is such a contentious topic.


> self-loathing elites in India

Your disdain for English-speaking Indian elites (pejoratively referred to as ‘Macaulayites’ by Modi’s supporters) is quite telling. That said, as I mentioned earlier, this kind of discourse doesn’t belong here.


My disdain is for the fact that hundreds of millions of Indians cannot access higher education in their native language; instead of simply learning a foreign language as a subject like the rest of the world, they have to bear the burden [1] of learning things in a foreign language which they must simultaneously learn. I have disdain for the people responsible for this mess. I do not have any disdain for any language-speaking class, especially not one which I might be part of.

[1] https://www.mdpi.com/2071-1050/14/4/2168


Much more efficient for us to all speak the same language. Trying to create fragmentation is inefficient.


You should take that up with the IMO then, or with the European Union. They provide services in ~two dozen languages.


Sure, but why worsen the situation by using more languages?


Human culture should not be particularly concerned with efficiency.


It’s unclear what is being praised: is it the high working memory (and reasoning power) of those great men, or the ability to have an open discussion about the merits of a case?


I think he was praising their ability to keep their egos out of the discussion while trusting that the best idea would win. The best idea won without much of the inefficient repetition that often happens.


It looks like it's both. But either way, it's not that extraordinary.


What's not extraordinary? "the ability to have an open discussion about the merits of a case"? You'd be surprised...


The memory one could be, depending on how much detail and new ground is covered. Imagine your first day as a developer: you hear their architecture for the first time, and you recall all the points of a discussion between 5 experts there. You remember the words and build the mental model simultaneously. It would be impressive, unless your full-time job is consulting in such meetings.


> It would be impressive, unless your full-time job is consulting in such meetings.

There’s self-selection of individuals with high working memory into such roles. There are many managers who attend meetings all day and can’t synthesize what’s being discussed in real time, which indicates that this isn’t just about practice.


Yeah, but a P/E ratio of 16 makes little sense despite that.


I think a P/E of 16 is too low for Facebook now. Others think it’s too high. That exact disagreement is what makes the market.

