silviot · 2025-11-21T09:38:30 1763717910

I tried the playground at https://playground.allenai.org/ and clicked the "Show OlmoTrace" button.

Above the response it says

> Documents from the training data that have exact text matches with the model response. Powered by infini-gram

So, if I understand correctly, it searches the training data for exact matches with the LLM output. This is not traceability in my opinion. This is an attempt at guessing.

Checking individual sources, I got texts completely unrelated to the question/answer that just happen to share an N-gram [1] (I saw sequences of up to 6 words) with the LLM answer.

I think they're being dishonest in their presentation of what Olmo can and can't do.

[1] https://en.wikipedia.org/wiki/N-gram
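
To make the point about shared N-grams concrete, here is a minimal sketch of exact N-gram matching between a response and a corpus. It is a naive illustration under stated assumptions, not how OlmoTrace or infini-gram is actually implemented: the function names, the whitespace tokenization, and the two toy documents are all hypothetical.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(docs, n):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        for gram in ngrams(text.lower().split(), n):
            index[gram].add(doc_id)
    return index


def shared_ngrams(response, index, n):
    """Return {n-gram: doc ids} for each n-gram of the response found in the index."""
    hits = {}
    for gram in ngrams(response.lower().split(), n):
        if gram in index:
            hits[gram] = index[gram]
    return hits


# Hypothetical toy corpus, invented for illustration only.
docs = {
    "recipe": "at the end of the day you should let the dough rest",
    "finance": "at the end of the day markets closed higher",
}
response = "at the end of the day the model is only predicting tokens"

index = build_index(docs, 6)
hits = shared_ngrams(response, index, 6)
for gram, doc_ids in hits.items():
    print(" ".join(gram), "->", sorted(doc_ids))
```

Both toy documents match the response on a common 6-word phrase even though neither has anything to do with its topic, which is the kind of hit described above.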

comp_raccoon · 2025-11-21T16:21:23 1763742083

Olmo researcher here. The point of OlmoTrace is not to attribute the entire response to one document in the training data. That's not how language models "acquire" knowledge, and finding a single document or a few documents as support for an answer is impossible.

The point of OlmoTrace is to show that fragments of the model response are influenced by its training data. Sometimes it is how specific adjectives are used together in a way that seems unnatural to us but is a combination found in the training data (ask for a movie review!).

A favorite example of mine is asking it to tell a joke or pick a random number, because strangely all LLMs return the same joke or number. Well, with OlmoTrace you can see which docs in the training data contain the super common response!

Hope this helps.

silviot · on May 14, 2024

Not offensive, but much more often wrong than a person.

gardenhedge · on May 17, 2024

So the solution is to avoid the problem?

silviot · on April 12, 2024

> ...and to be clear, Hacker News is not a representative sample of their customers.

I am a customer and I learned about Kagi here. I assume many people are in the same boat, so I wouldn't be so sure about that.

rjbwork · on April 12, 2024

FWIW I'm a customer and had never read about it on HN until this post. I learned about it from a private Discord programmer community.

cqqxo4zV46cp · on April 12, 2024

Same shit, really.

kerkeslager · on April 12, 2024

> I am a customer and I learned about Kagi here.

Perhaps I should have said "target customers" where I said "customers", I don't know. But it should not be surprising that "being an HN user" correlates strongly with "finds out about things on HN".

rrrix1 · on April 13, 2024

Kagi does zero advertising, only word of mouth (or from social news sites).

I would strongly bet their primary source of customers is HN referrals.

We are literally their Target Customers.

silviot · on April 12, 2024

> Upper management are just average people with better networking and less empathy

Very concise and to the point. I might print and hang this!

silviot · on Feb 19, 2024

Excellent game. Excellent attitude towards feedback. Keep up with the good stuff!

silviot · on Feb 13, 2024

> But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.

Thanks for putting this into words. I'm of the same opinion and this is the best articulation I have so far.

silviot · on Feb 5, 2024

This is very different from my experience. Whenever someone I was in a conversation with brought up GEB, it was always a great pleasure of mine. I'd get the chance to discuss the main ideas of the book, and the way I assimilated them. I tend to not even engage in conversations with people who do it mostly to show off the extent of their knowledge. I believe this second point is the important one. GEB is completely orthogonal to the problem you describe.

silviot · on Jan 18, 2024

Because otherwise you spend too much time arguing about not-so-important matters (in a word: bikeshed - you end up bikeshedding less).

silviot · on Nov 18, 2023

I've read and reread this over and over to make sense of it, but to me it sounds like people in the comments speak as if Greg Brockman resigned, while in the article he is not among the three names who resigned. What am I missing here?

frabcus · on Nov 18, 2023

He was fired from being Chair of the Board, but the rest of the board left him in his position as an engineer (?) in the company. Then an hour or two later he resigned as an engineer.

See: https://news.ycombinator.com/item?id=38312704

aidaman · on Nov 18, 2023

he resigned earlier. google it. or bing it. or chatgpt it.

croes · on Nov 18, 2023

ChatGPT doesn't know recent events

earino · on Nov 18, 2023

ChatGPT now has the ability to do web browsing to search for recent events!

https://chat.openai.com/share/c35e3fd1-d94e-477b-a331-b14384...

silviot · on Oct 20, 2023

> It's pretty simple actually

That's true. You just need to exploit your fellow humans! Make them work for you but never pay them for the full value they bring. Always pay them the least they will still accept. So you get a piece of their pie. Your position of power will allow you to do so, and people will see your position of power as "natural" and will see no problem in the exploitation.