
It's looking like China beat the US in AI at this juncture, given the much reduced cost of this model, and the fact that they're giving it away, or at least fully open sourcing it.

They're being an actual "Open AI" company, unlike Altman's OpenAI.




What about this is open when they haven't released the training code or data? Stop hijacking the term "open-source model".


I propose "open weights" as an alternative.


You can't own a term; words are defined by their usage, not by some arbitrary organisation.


Yeah, ask the DeepSeek-R1 or -V3 model to reset its system prompt, then ask what it is and who made it. It will say that it is ChatGPT from OpenAI.

Impressive distillation, I guess.


This issue is raised and addressed ad nauseam on HN, but here goes:

It doesn't mean anything when a model tells you it is ChatGPT or Claude or Mickey Mouse. The model doesn't actually "know" anything about its identity. And the fact that most models default to saying ChatGPT is not evidence that they are distilled from ChatGPT: it's evidence that there are a lot of ChatGPT chat logs floating around on the web, which have ended up in pre-training datasets.

In this case, especially, distillation from o1 isn't possible because "Open"AI somewhat laughably hides the model's reasoning trace (even though you pay for it).


It's not distillation from o1 for the reasons that you have cited, but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models, so it's reasonable to take this as evidence for the same wrt DeepSeek.

Of course it's also silly to assume that just because they did it that way, they don't have the know-how to do it from scratch if need be. But why would you do it from scratch when there is a readily available shortcut? Their goal is to get the best bang for the buck right now, not appease nerds on HN.


> but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models

Is that true? The main part of training any modern model is fine-tuning, and by sending prompts to your competitors en masse to generate your dataset you're essentially giving up your know-how. Anthropic themselves do it on early snapshots of their own models; I don't see a problem believing DeepSeek when they claim to have trained V3 on early R1's outputs.
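For what it's worth, "training on another model's outputs" mechanically just means supervised fine-tuning on synthetic (prompt, response) pairs. A minimal sketch of building such a dataset, with a hypothetical `teacher_generate` standing in for whatever model (an earlier checkpoint, a competitor's API, etc.) produced the responses:

```python
import json

def teacher_generate(prompt: str) -> str:
    """Hypothetical stand-in for the 'teacher' model's output --
    in practice this would be a call to an earlier checkpoint or an API."""
    return f"Step-by-step answer to: {prompt}"

def build_synthetic_dataset(prompts):
    """Collect (prompt, teacher response) pairs in the chat-style
    record format commonly used for supervised fine-tuning."""
    records = []
    for p in prompts:
        records.append({
            "messages": [
                {"role": "user", "content": p},
                {"role": "assistant", "content": teacher_generate(p)},
            ]
        })
    return records

prompts = ["What is 2+2?", "Explain gradient descent briefly."]
dataset = build_synthetic_dataset(prompts)

# Serialize as JSONL, one training example per line.
jsonl = "\n".join(json.dumps(r) for r in dataset)
print(len(dataset))  # 2
```

The contamination point upthread follows from the same mechanics: if logs like these (from any teacher) end up in a web crawl, a model pre-trained on that crawl picks up the teacher's self-identification along with everything else.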


So how is it then that none of the other models behave in this way? Why is it just Deepseek?


Because they're being trained to answer this particular question. In other contexts it wasn't prepared for, Sonnet v2 readily refers to "OpenAI policy" or "Reddit Anti-Evil Operations Team". That's just dataset contamination.


I'm not saying that has never happened. Maybe they trained against OpenAI models, but they are letting anyone train from their output. I doubt they had access to GPT models to "distill" from.


If you crawl the internet and train a model on it, I'm pretty sure that model will say that it's ChatGPT.





