
It's not distillation from o1 for the reasons you have cited, but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models, so it's reasonable to take this as evidence that DeepSeek did the same.

Of course it's also silly to assume that, just because they did it that way, they don't have the know-how to do it from scratch if need be. But why start from scratch when there is a readily available shortcut? Their goal is to get the best bang for the buck right now, not to appease nerds on HN.
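
To be concrete about what "generating synthetic data from another model" usually means in practice, here's a minimal sketch using the OpenAI Python SDK. The teacher model name, the seed prompts, and the output file are placeholders for illustration only; this is the generic pattern, not a claim about what DeepSeek's pipeline actually looks like.

    import json
    from openai import OpenAI  # official OpenAI Python SDK (v1.x interface)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical seed prompts; a real pipeline would use a huge, curated pool of them.
    seed_prompts = [
        "Explain the difference between a process and a thread.",
        "Write a Python function that reverses a linked list.",
    ]

    with open("synthetic_pairs.jsonl", "w") as f:
        for prompt in seed_prompts:
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder teacher model
                messages=[{"role": "user", "content": prompt}],
                temperature=0.7,
            )
            answer = resp.choices[0].message.content
            # Store (prompt, response) pairs in the JSONL format most SFT tooling accepts.
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")

The whole trick is that the hard part (writing good answers) is outsourced to the teacher; the asking side is trivially cheap to build.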

> but it's also no secret that ChatGPT (and Claude) are used to generate a lot of synthetic data to train other models

Is that true? The main part of training any modern model is finetuning, and by sending prompts to your competitors en masse to generate your dataset you're essentially giving away your know-how. Anthropic themselves do it with early snapshots of their own models, so I don't see a problem believing DeepSeek when they claim to have trained v3 on early R1's outputs.
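
And the finetuning half of that loop is equally unremarkable, which is why the interesting question is where the data came from, not how it was consumed. A bare-bones sketch with Hugging Face transformers, consuming pairs like the ones above (the checkpoint name, file name, and hyperparameters are placeholders; batching and masking of the prompt tokens are omitted to keep it short):

    import json
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hypothetical base checkpoint; any small causal LM works for the sketch.
    model_name = "gpt2"
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    model.train()
    with open("synthetic_pairs.jsonl") as f:
        for line in f:
            pair = json.loads(line)
            # Standard SFT: concatenate prompt and response, train with next-token loss.
            text = pair["prompt"] + "\n" + pair["response"] + tok.eos_token
            batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
            out = model(**batch, labels=batch["input_ids"])
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()

Whether those pairs came from a competitor's API or from an early snapshot of your own model, the training code looks the same; only the provenance of the JSONL changes.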
