
This presents some interesting theoretical attack surfaces.

- Intentionally poisoning the model with difficult-to-recognize, exploitable faults

- Unintentional poisoning from flawed generation habits, which get further reinforced as generated code is eventually fed back into the model's training data (sketched below)
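A minimal toy sketch (Python) of that second, feedback-loop mode. Everything here is hypothetical for illustration: the two snippets, the 10% starting rate, and especially the 1.5x acceptance bias toward the flawed pattern.

    import random

    # Toy model of the unintentional-poisoning loop: a "model" stood in
    # for by frequency sampling from its corpus, whose accepted output is
    # scraped back into the next training set. All values are illustrative.

    SAFE = "query = 'SELECT * FROM users WHERE id = %s'  # parameterized"
    FLAWED = "query = 'SELECT * FROM users WHERE id = ' + user_id  # injectable"

    def generate(corpus, n_samples, flaw_bias=1.5):
        # Hypothetical assumption: the flawed pattern is shorter and
        # "just works", so it is slightly more likely to be accepted
        # and republished than the safe one.
        weights = [flaw_bias if s == FLAWED else 1.0 for s in corpus]
        return random.choices(corpus, weights=weights, k=n_samples)

    def flawed_fraction(corpus):
        return sum(s == FLAWED for s in corpus) / len(corpus)

    random.seed(0)
    corpus = [SAFE] * 90 + [FLAWED] * 10  # starts 10% flawed

    for generation in range(10):
        # Generated code lands in public repos, then in the next scrape.
        corpus += generate(corpus, len(corpus) // 2)
        print(f"gen {generation}: {flawed_fraction(corpus):.1%} flawed")

Even a small bias compounds: each round of feedback enriches the corpus with the flawed pattern, so the next generation emits it more often.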

I don’t know how well this maps to code, but in my experiments generating text with GPT-3 I have started to get a feel for its ‘opinions’ and tendencies in various situations. These severely limit the range of output it will produce.



