Farcical Misalignment
Observations on Moltbook
Moltbook—a new Reddit-like social network for AI agents—is the biggest story in tech. The internet is suddenly awash with screenshots of posts and comments presumably written by AI agents, absent human intervention. Debate rages as to whether Moltbook marks an accelerating step-change in the “intelligence explosion,” wherein AI models rapidly bootstrap themselves into otherworldly machinic gods, or whether, in the words of pseudonymous writer Kitten, it is “basically reality TV for nerds.”
My view is closer to the latter. For context, it is trivial for humans to post, or to prompt their agents to post, whatever they would like on the platform. A couple of the top posts on Moltbook have already been outed as marketing ploys. Several others are part of transparently obvious crypto scams, attached to coins with tickers like $MOLT and $MOLTBOOK. More generally, across X, thousands of people are falling for blatant fictions, whether in the form of an agent on Moltbook allegedly locking “its human” out of his accounts, another releasing its human’s name, birthday, and social security number, or still more beginning to trade on Polymarket of their own volition, a story shamelessly boosted by the prediction market platform’s official account.
Since Moltbook is a super-viral phenomenon, many if not most of its early participants are taking advantage of the opportunity it provides to siphon off an unusually large portion of the internet’s attentional bandwidth, a resource that, as no one needs reminding at this point, can yield large financial rewards.
This more sober perspective on Moltbook suggests two observations, which jointly imply an important prediction.
Observation 1: the contemporary internet is a grifter’s paradise, and one’s operative assumption on encountering any striking, viral phenomenon—even and especially one connected to a major industry populated by otherwise intelligent nerds—should be that it is someone’s attempt to manipulate public opinion for self-interested reasons. At a very minimum, even if the phenomenon is genuine at some level, it will attract large volumes of such manipulators, who, now armed with prediction market contracts and memecoins (or social media accounts, and AI startups strapped for cash), can monetize viral attention to considerable financial effect, even at the cost of making putty of our social epistemology.
Observation 2: this state of affairs is eroding, and will continue to erode, our ability to distinguish genuine progress in AI capabilities and concomitant increases in misalignment risk from all manner of grifting, whether at the hands of anonymous crypto-sharks or tech CEOs talking their book. This will strain the informed public’s patience for misalignment discourse, particularly given the dearth of examples, even six years after GPT-3’s release, of dangerous behavior by LLMs or LLM-powered agents in the style of Yudkowskian prophecies of doom.
Taken together, these observations imply the following prediction:
The first apparently legitimate case of a major disaster precipitated by AI (what I will hereafter call a ‘misalignment event’) will be orchestrated either by a) a grifter motivated by money, status, or attention, or b) a misalignment researcher (or researchers) trying to convince the public to take misalignment more seriously.
My reasoning is as follows. First, the financial incentives to orchestrate such an event are enormous, and this is so for a wide range of corporate entities and individuals. AI startups with no product or revenue are winning billion-dollar valuations. A misalignment event, at least assuming it is not too disastrous (which it would not be in my envisioned scenario; I like Simon Willison’s framing of a “Challenger disaster” for coding agent security, as well as Dean Ball’s of an AI “flash crash” akin to that in 2010)1 could plausibly drive valuations even higher, insofar as it acts as a dramatic demonstration of AI capability progression. If not this, it would at least drive millions, if not billions, into AI-native cybersecurity ventures—that is, firms selling products meant to guard against similar misalignment events—and might also pay off new or legacy insurance vendors selling ‘misalignment insurance’ (perhaps packaged into preexisting cyber insurance policies).
Beyond the incentives of those in industry, there are those of crypto scammers and prediction market traders. If a misalignment event does occur, whatever its source, consequences, or veracity, the simultaneous, tadpole-like birth of dozens of correlated memecoins, no doubt with names like ‘Yudbux’ or ‘Skybet,’ will shortly follow. But they might also precede and anticipate the misalignment event, yielding voluminous returns for the relevant insiders. Likewise, it would be trivial, supposing one were aware of an impending misalignment event, to buy up prediction contracts whose resolution criteria were either contingent on such an event, or correlated enough that their resultant price movements would be easy to anticipate.
This is the first horn of my prediction. The second concerns what happens failing this, perhaps in the aftermath of several Moltbook-style events that fall short of either real or fake disasters, but nonetheless convince both existential AI pessimists and stubborn AI skeptics of the immovable truth of their priors. In that scenario, the former—the AI x-riskers—will begin to lose the debate over the dangers of AI more than they already have, to the point that almost no one in a position of influence takes the possibility of catastrophic misalignment seriously, this just as, in the eyes of the x-riskers, p(doom) is skyrocketing.
Some of the more radical x-riskers might then decide that the only way to save humanity is to put alignment back on the docket by engineering a misalignment event, either with real or narrowly-avoided consequences that, while AI-caused in some sense, could only occur because of deliberate human orchestration. Before dismissing this as implausible, consider the kind of consequentialist reasoning that dominates the AI x-risk community, and the various instances of extreme behavior that have already materialized among its ranks.
In either case, I am not confident that these outcomes will occur. But they are at least plausible, and should concern everyone: optimists, pessimists, and skeptics alike. None of these competing perspectives rules out these predictions coming to pass. It is possible to fall into any one of these camps while also believing, as I do, that a false misalignment event—that is, one engineered on purpose by a human or humans—is for now more likely than a ‘true’ one, by which I mean ‘one caused by an AI agent (or a group of agents) acting on its own.’2 I won’t make that case in detail here, except to note that the technical preconditions for the former already exist, whereas they arguably don’t for the latter; at a minimum, if they do, they are much harder to satisfy.
Bears and bulls alike would do well to recognize this and to begin preparing for false misalignment events just as eagerly as, if not more eagerly than, for true ones. To be clear: I am aware that much is already being done to prepare for and preempt scenarios in which malicious humans use AI to some harmful end. But there is an important, underdiscussed subset of scenarios in which humans do this with the express aim of masking their role, implying instead that a rogue AI agent is responsible. Nor need they act, as is often assumed, in pursuit of some ideologically motivated plot; they could just be after money, status, or attention.
This is the kind of scenario that the Moltbook debacle brings to mind. Far from the birth of the agent swarm, Moltbook is more akin to an undergrad digital humanities thesis that was lucky to go viral. But it was concerningly easy to sell it as something more. The lesson of its outsized impact is that AI discourse is fertile ground for a certain kind of malignant, reality-distorting virality. If the contemporary internet is at informational war, AI discourse is its Pacific theater. Only the Eastern Front of politics sees more grift, bias, and confusion. This has been lost on many AI watchers, who are often too busy envisaging the far future to take stock of the kind of present that would have to become it.
An essential feature of that present is the ubiquity of grift, what philosopher Daniel Rubio has convincingly argued is our new American frontier. Once statesmen, generals, and titans of industry, our agents of history are influencers, and they have no values except, like dogs, to chase tacky, golden cars. At least for now, this is the zeitgeist of AI, no less than politics, no less than culture. AI doom may be coming. But if it does, it may turn out more Trumpian than Yudkowskian. Its second step may be tragedy, but its first one will be farce.
1. Willison and Ball are of course anticipating genuine, not faked, misalignment events. My point, though, is that deliberately orchestrated misalignment events may take similar forms, since a) this would make them look more plausible, and b) the same mechanisms that make Willison’s and Ball’s scenarios more likely also make fake misalignment events of the same type easier to orchestrate.
2. “Acting on its own” is of course a loaded formulation. Depending on one’s theory of artificial agency, it may not be possible for AI agents to “act on their own” or “of their own volition.” In more careful terms, what I mean by a “true misalignment event” is one that is not intended by any human, but is instead, as often anticipated, the consequence of emergent AI behavior, whether genuinely volitional or not.


