Lugh@futurology.today to Futurology@futurology.today · English · 5 months ago
Sycophancy to subterfuge: Investigating reward tampering in language models (www.anthropic.com)