
A lot of the time, the hardest part of these things (and of personal projects in general) is finding the dataset. How did you curate yours?



You’re spot-on. I might not have started the project if I hadn’t known there was a specific decent dataset available. I already knew of a dataset of Jeopardy questions that is somewhat popular in ML circles, so I just used that. I believe it’s based primarily on the excellent fan-maintained j-archive website. It’s unclear if the dataset was created with the permission of the j-archive maintainers.

I don’t do any real “curation”; I just cache the entire dataset with a web manifest file and do some simple processing on it to find a game with a full set of questions.
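
For anyone curious what that kind of "simple processing" could look like, here is a minimal sketch in Python. It is not the project's actual code. It assumes each clue is a dict with `show_number` and `round` keys (hypothetical field names, modeled on the commonly shared Jeopardy JSON dump), and it treats a game as "full" when it has 30 clues in each of the two main rounds plus one final clue.

```python
from collections import Counter, defaultdict

# Assumed round labels and sizes: 6 categories x 5 clues per main round,
# plus a single final clue. The label strings are an assumption about the data.
FULL_ROUND_SIZES = {
    "Jeopardy!": 30,
    "Double Jeopardy!": 30,
    "Final Jeopardy!": 1,
}


def find_full_games(clues):
    """Return the sorted show numbers whose per-round clue counts are complete."""
    per_show = defaultdict(Counter)
    for clue in clues:
        # Count clues per round for each show.
        per_show[clue["show_number"]][clue["round"]] += 1

    return sorted(
        show
        for show, counts in per_show.items()
        if all(counts[rnd] == size for rnd, size in FULL_ROUND_SIZES.items())
    )
```

Games with unrevealed clues fall short of 30 in a round and get filtered out, which is probably why a check like this is needed at all.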



