There is a tremendous amount of yelling about AI safety from two kinds of people, as far as I can tell.
There are the doomers — Mr. Yudkowsky and his disciples — who seem to care more about proselytizing a sort of depressing view of the world than doing anything. They seem so unwilling to change their mind about anything, and to me it feels like a sort of influencer cult for nihilism.
Then there are the boomers — most of the VC industry — who are developing a kind of cytokine storm reaction to any sense of AI fear. After a decade of being lambasted by the media for building companies and products, folks are aesthetically opposed to any kind of naysaying. Progress is good; more is good; risk is good.
I am not here to engage in the specifics of the AI Safety debate — there is oodles of that content online — merely to observe the two camps are not acting as judicious thinkers, in my view. Both parties are communicating emotions not ideas to each other.
This reminds me a little of the debate on climate change, which started as kind of “climate justice” religion, and has since morphed into more productive things like “carbon capture”, “more fusion”, “more fission”.
It’s a difference of debating theology versus technology. One theme of the theology debates is use of complex or vague terms. I find these “safety” concerns tend to get centered around words like “agency” or “consciousness” which quickly devolve into Yale-common-room-philosophy debates about what it means to be alive. I personally don’t find these productive or useful — and merely observe that any conversation that uses these words almost never produces productive results.
It might be helpful to start talking in specific terms — what exactly should one be worried about in the next 3 months, and how can we fix that? For example, a specific concern is that a looping GPT-4.5 might just fork-bomb itself on the web like an old school worm (remember ILOVEYOU) but much more intelligent. I think that is a fair risk. So, where are the sandboxing benchmarks?
Another example issue might be GPT-4.5 starts educating people on how to create synthetic biological threats which they can order online. It would seem extremely plausible to create benchmarks for this kind of output too.
As I type these words I can feel the arguments from the true safetyists, which will always center around some kind of T+5 reality (that is, GPT-9 is so smart it will cheat all the benchmarks and then the world ends). To that I’d say I’d consider the reflexivity of the human system — we can and will adjust what we do based on the reality at hand. And so the task should be, in my very humble view, to find and focus on immediate, useful, and specific benchmarks that guide AI labs in terms of what & how they should build future models. Less theology, more technology.