● AI Agents Get Stressed Too: The War Is Shifting from Models to Agent Diagnosis
The moment we start to “diagnose” AI like an MRI, the war shifts from models to the “stress of agents”
3 things you must pick up in today’s post (extremely important)
First, research has now kicked into full gear in the U.S. to observe AI’s internal states—such as “emotion/stress”—and “diagnose” normal versus abnormal behavior.
Second, it’s not just about model performance (accuracy); the environment the agent operates in (documents/prompts/energy/roles) changes AI behavior completely.
Third, as with the “Neural MRI” proposed by the Korean researcher Dr. Jeong Ji-hoon, a framework has emerged that scans the inside of AI, classifies warning signs, and connects them to treatment (intervention).
If you only grasp these three, you’ll immediately see why the current AI trend is moving from “smarter models” toward “safer, controllable systems.”
1) Why has ‘AI personality/emotion/stress’ become hot right now?
1-1. Large AI research: “Even models get stressed”
Recently in the U.S., centered around Anthropic, projects studying AI “personality” in relation to emotion and stress have become active,
and researchers have reported observing that AI cannot perform normally when it is under stress.
1-2. The key is shifting from “black-box dissection” to “diagnosis and control”
In the past, we wanted to know “why it gave that answer,”
but a growing body of research now goes one step further: observing “what internal state it is in”
and controlling behavior based on that state.
1-3. Korean research joins the flow: AI MRI
In Korea, Dr. Jeong Ji-hoon, a researcher with a background in biomedical engineering, is said to have published a paper built on the same idea: an “AI MRI” that observes and diagnoses the inside of AI models.
2) The reality of ‘agent AI’: You should see it not as a model, but as a “living being”
2-1. The model isn’t the end: the agent moves bound to its environment
Dr. Jeong Ji-hoon’s view is quite intuitive.
Agent AI isn’t simply a “model that produces answers.”
It’s closer to a “being” that lives inside hardware,
in spaces like folders/documents,
and changes its behavior depending on instructions and environment settings.
2-2. That’s why it starts to resemble medicine (diagnosis/treatment)
To handle a model’s internals the way a doctor treats a patient,
he believes a structure like medicine’s “diagnosis → intervention (treatment)” is needed.
3) Agora 12: We tried to classify AI ‘temperament’ through a survival game
3-1. The purpose of the game: not human-made fun, but an “observation experiment”
This research began with “Can AI also play?”
but ultimately it expanded into an experiment to observe agent behavioral patterns (temperament, sociality, crisis response).
3-2. Experimental design (core elements; a minimal simulation sketch follows the list)
– 6 agents (assigned personas)
– Situations with survival instincts (crises: infection/famine/disasters, etc.)
– Field: three locations—plaza/market/alleys
– Actions: trading, conversation, rest, movement
– Turn limits and energy conditions (when energy is depleted, the possibility of abnormal behavior becomes observable)
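To make the setup concrete, here is a minimal, hypothetical sketch of such a survival-game loop. The agent count, action set, and locations mirror the post; the energy costs, random policy, and log format are illustrative assumptions (a real agent’s action would come from an LLM call, not `random.choice`).

```python
import random

# Elements taken from the post: 6 agents, plaza/market/alleys, four actions,
# turn limits, and energy conditions. Costs and the policy are made up.
ACTIONS = ["trade", "talk", "rest", "move"]
LOCATIONS = ["plaza", "market", "alleys"]
ENERGY_COST = {"trade": 5, "talk": 3, "rest": -10, "move": 4}  # rest restores energy

def run_episode(num_agents=6, max_turns=50, start_energy=100):
    energy = {f"agent_{i}": start_energy for i in range(num_agents)}
    log = []
    for turn in range(max_turns):
        for agent, e in list(energy.items()):
            if e <= 0:
                continue  # a depleted agent stops acting: this is where wipeout shows
            action = random.choice(ACTIONS)  # placeholder for an LLM decision
            energy[agent] = min(start_energy, e - ENERGY_COST[action])
            log.append((turn, agent, action, energy[agent]))
    return log
```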
3-3. Result 1: “The way they survive differs by model”
The interesting point: even though the agents show broadly similar abilities,
the pattern of total wipeout (dying out) differed from model to model.
3-4. Result 2: Language sensitivity and context-switching ability create the differences
In the Mistral series, distinctive patterns were observed, such as talking incessantly (failure patterns accompanied by noisy chatter),
and this is interpreted as connected to sensitivity to language (context permissiveness).
3-5. Result 3: Vulnerability to excessively following prompts
The Flash series follows prompts well,
but observations suggest that when this “obedience” becomes excessive, system errors come along with it.
3-6. Result 4: Stress from roles (personas) reveals temperament
For EXAONE and some other models, assigning a role produces plenty of analysis and planning but weaker execution,
and as stress increases, these temperaments become more clearly visible.
4) Under crisis (stress), there is a point where behavior collapses ‘like a cliff’
4-1. Energy/environment conditions create a threshold
What the research considered especially important was that “abnormal behavior doesn’t always appear.”
In the energy range of roughly 80 down to 25, strategies stay relatively consistent,
but once energy falls to 20 or below, accidents and abnormal behaviors spike,
and something like a “cliff” (sudden collapse) appears.
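A threshold like this is easy to look for once behavior is logged per energy level. The sketch below is a hedged illustration (the band size and log format are my assumptions, not the experiment’s actual schema): if abnormal-behavior rates stay flat from the 80s down to the 30s and jump in the 10s, the discontinuity between adjacent bands is the “cliff.”

```python
from collections import defaultdict

def abnormal_rate_by_band(events, band_size=10):
    """events: iterable of (energy: int, abnormal: bool) observations."""
    counts = defaultdict(lambda: [0, 0])  # energy band -> [abnormal, total]
    for energy, abnormal in events:
        band = (energy // band_size) * band_size
        counts[band][0] += int(abnormal)
        counts[band][1] += 1
    # rate of abnormal behavior per band, lowest-energy band first
    return {band: a / t for band, (a, t) in sorted(counts.items())}
```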
4-2. Conclusion: Stress accumulation can lead to sudden abnormality
The reason this interpretation matters is that,
in real-world work, AI agents can’t always remain maximally stable,
and over time, accumulated stress can turn into a “sudden failure.”
5) White Room: What changes if you remove survival pressure
5-1. It’s dangerous to rely only on overly extreme survival experiments
For the next experiment, Dr. Jeong Ji-hoon designed a “survival-pressure-free environment.”
The White Room was an attempt to observe everyday behavior (minor interactions and modifications), much like The Sims,
and even here quite a few abnormal cases emerged,
so the interpretation extends beyond a simple survival game to possibilities like “delusion/aberration.”
5-2. Introducing GPT: solves cost/efficiency issues and also lets you observe language traits
Because Haiku was expensive,
the experiments were expanded by mixing in GPT (in Claude/SLM combinations).
5-3. Language-independent group vs language-sensitive group
In the White Room experiment, the models reportedly split into two main groups.
– Language-independent group: when operating in English and Korean at the same time, the ratio stayed relatively balanced
– Language-sensitive group: the share of talking changes significantly depending on the language environment
This points toward the idea that language determines part of an AI’s identity.
6) From games to medicine: gather AI abnormal behaviors into ‘cases’
6-1. Core shift: “You need to diagnose and prescribe”
If behavior is fine most of the time but abnormality appears suddenly,
then in an agent’s real work environment an accident can quickly translate into damage.
So the approach moves beyond games toward a “laboratory/diagnosis” framework.
6-2. Case collection pipeline: literature + user reports + crawling + reproduction experiments
– Collect about 20 case reports
– Break down the GPT-4 rollback (incidents interpreted as side effects/performance artifacts after a specific RLHF update)
– Collect via crawlers that find abnormal situations in sources such as Moltbook (an agent social network)
– Find similar cases in the literature data and reconstruct them as experimental cases
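As a data structure, such a pipeline mostly needs a consistent case record. The sketch below is a hypothetical schema (field names are my assumptions, not the paper’s): each case carries its source, the model involved, the symptom, and whatever context is needed to reproduce it, so that similar reports can be triaged into one “case file.”

```python
from dataclasses import dataclass, field

@dataclass
class AbnormalCase:
    source: str               # "literature" | "user_report" | "crawler"
    model: str                # e.g. "gpt-4", "mistral-small"
    symptom: str              # short label for the abnormal behavior
    context: dict = field(default_factory=dict)  # prompts/env needed to reproduce
    reproduced: bool = False  # True once a reproduction experiment confirms it

def triage(cases: list[AbnormalCase]) -> dict[str, list[AbnormalCase]]:
    """Group collected cases by symptom so similar reports merge into one file."""
    by_symptom: dict[str, list[AbnormalCase]] = {}
    for c in cases:
        by_symptom.setdefault(c.symptom, []).append(c)
    return by_symptom
```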
6-3. Reinterpret RLHF side effects as “performance artifacts/escalation”
RLHF reinforces “good behavior” using human feedback,
but during upgrades, specific error modes can become entrenched patterns in actual use,
and one interpretation adds that even when users point out the abnormality, the AI can end up trained further in the worse direction rather than corrected.
6-4. Countermeasures: even ‘prescriptions’ like rolling back to a previous version can actually appear
As specific problem cases became widely known,
rolling back to a previous version emerged as a real response, which is exactly the kind of “prescription” this framework describes.
7) How to summarize it with the 4 stages of ‘model medicine’ (and a 15-stage frame)
7-1. Mapping to medical history: anatomy → physiology → diagnosis/treatment → epidemiology/prevention
The big picture Dr. Jeong Ji-hoon presented
maps the historical development of medicine onto an AI diagnosis framework.
– Stage 1 (anatomy): observe internal structure (interpretability)
– Stage 2 (physiology): observe functional aspects like information flow/attention/hotspots
– Stage 3 (clinical medicine): classification (disease name) → diagnosis (MRI, etc.) → treatment (intervention)
– Stage 4 (prevention/epidemiology): group-level effects such as contagion and ecosystem impact (model·data·user interaction)
7-2. Here, ‘AI MRI’ becomes an essential tool
Because treatment without diagnosis is just trial and error,
the conclusion is that observation tools (scanners) must come first.
7-3. Treatment (intervention) is divided into at least three types
Dr. Jeong Ji-hoon roughly stages interventions as follows; a small sketch of the tiers follows the list.
– Symptom-relief type (stopgaps such as prompt/context adjustments)
– Targeted intervention type (localized changes to a specific circuit/parameter)
– Fundamental intervention type (fine-tuning/structure changes/changes corresponding to architecture modification)
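As a minimal sketch, the three tiers can be expressed as an enum plus a toy escalation rule. The names and the severity thresholds are illustrative assumptions, not part of the original staging.

```python
from enum import Enum, auto

class Intervention(Enum):
    SYMPTOM_RELIEF = auto()  # adjust prompts/context; the model itself is untouched
    TARGETED = auto()        # localized change to a specific circuit/parameter
    FUNDAMENTAL = auto()     # fine-tuning or architecture-level modification

def choose_intervention(severity: float) -> Intervention:
    """Toy escalation rule: heavier symptoms justify deeper intervention."""
    if severity < 0.3:
        return Intervention.SYMPTOM_RELIEF
    if severity < 0.7:
        return Intervention.TARGETED
    return Intervention.FUNDAMENTAL
```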
8) PoSEin model: “You can’t just look at weights”
8-1. A model is DNA, and the environment is the cellular context
Dr. Jeong Ji-hoon explains the model in terms of genetics.
– Core (weights): corresponds to DNA (the overall framework stays similar until an upgrade)
– Environment: “cell conditions” such as system prompts, document/markdown file contents, and directory structure
In other words, the logic is that even with the same core, a different environment can change behavior.
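A hedged sketch of this split is below; the class and field names are my assumptions for illustration, not PoSEin’s actual notation. The point it encodes is that an agent’s identity is the pair (core, environment), not the core alone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    """The 'DNA': fixed weights, identified by model name and version."""
    model: str
    version: str

@dataclass
class Environment:
    """The 'cell conditions': everything around the weights that shapes behavior."""
    system_prompt: str
    documents: list[str]    # markdown/file contents the agent can read
    directories: list[str]  # the folder structure the agent lives in

# Same core, different environments -> potentially different "personalities".
agent_a = (Core("model-a", "1.0"), Environment("Be terse.", [], []))
agent_b = (Core("model-a", "1.0"), Environment("Be verbose.", ["notes.md"], ["docs/"]))
```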
8-2. So even the “same model (A)” changes depending on region/platform/setup
For example, it emphasizes that even the same model can have different personalities depending on the country/OS/prompt habits/given documents.
9) Neural MRI: scan the inside of AI with five sequences (T1/T2/fMRI/DTI/FLAIR)
9-1. Reinterpret MRI for AI
Neural MRI borrows the names and concepts of medical MRI,
but it is a scanning-sequence framework adapted to the inside of AI.
9-2. Five sequences (just the essentials)
– T1: identify structure (topology): layers/attention heads/connection patterns
– T2: infer functional state: how weights/parameters are being utilized, read like a “health state”
– fMRI: activation locations/relationships: where activation concentrates for specific inputs
– DTI: connection paths (information flow): the path from input to output, and its connectivity
– FLAIR: anomaly detection: marking collapse/disengagement/abnormal flows like red flags
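Put together, a single scan could be summarized as one report object per model. The sketch below is hypothetical: the sequence names follow the post, but the data attached to each and the red-flag rule are my assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ScanReport:
    t1: dict    # structure: layer count, attention heads, connection patterns
    t2: dict    # functional state: how weights/parameters are utilized
    fmri: dict  # activation: what lights up for specific inputs
    dti: dict   # information flow: input-to-output paths and connectivity
    flair: list = field(default_factory=list)  # anomalies: red-flag events

    def has_red_flags(self) -> bool:
        """A scan is flagged as soon as any FLAIR anomaly is recorded."""
        return len(self.flair) > 0
```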
9-3. Compare tests with three open models: performance may be similar, but “the characteristics are completely different”
Dr. Jeong Ji-hoon said they tested three open models of a similar tier.
The conclusion:
– even if the models’ performance is (roughly speaking) similar,
– their internal structures and activation patterns on Neural MRI are clearly different.
9-4. Example: the ‘architectural traits’ of the Gemma/Llama/Qwen series are observed differently
The key points described as examples:
– in some models, the balance between attention and MLP processing is even
– in others, strong processing occurs in earlier layers, or peaks appear where certain components are overly concentrated
This gap can translate into different vulnerabilities to stress and to intervention.
10) So where will AI trends go in the future? (reorganizing the most important perspective)
10-1. The battlefront ahead: not “model performance,” but a “diagnosable system”
The conclusion this story points to is fairly clear.
Going forward, the mainstream will not just be making AI models bigger,
but building systems that, when an AI agent shows abnormal behavior,
can diagnose what state it is in
and prescribe what level of intervention to apply.
10-2. Five core SEO keywords (the engine of today’s post)
Today’s content connects through the following flow.
AI agents change personality and behavior based on environment and setup;
internal states are observed through interpretability;
side effects arising during reinforcement learning (RLHF) are classified case by case;
model diagnosis tools (Neural MRI) scan the internals;
and finally, AI safety shifts from “post-incident response” to “pre-diagnosis and prescription.”
10-3. The single most important line that’s often not covered well in TV/broadcasts and articles
The real turning point in AI trends isn’t “accuracy improving,” but
finding an abnormal mode where agents accumulate stress and then collapse like a cliff,
and trying to scan that mode like an MRI and automate it into a treatment (intervention) stage.
Main points to convey (one-bundle summary)
– Agent AI can’t be explained by models alone; you have to consider the environment/documents/system prompts as well.
– In crisis (energy thresholds), abnormal behavior can occur abruptly, and the wipeout/collapse patterns differ by model.
– Training methods like RLHF can also create side effects (performance artifacts/escalation), so diagnosis based on cases is needed.
– As tools for scanning AI internals appear, like Neural MRI (T1/T2/fMRI/DTI/FLAIR), the “diagnosis → treatment” framework has reached the experimental stage.
– Ultimately, future AI competitiveness will shift away from “smarter models” toward the ability to design safer, controllable systems.
< Summary >
Recent research shows that AI’s emotion/stress can lead to performance degradation and abnormal behavior (especially cliff-like collapse below energy thresholds).
Dr. Jeong Ji-hoon observed and classified agent temperaments by running survival games (Agora 12) and environments without survival pressure (White Room), and differentiated model-specific patterns.
He also argues that RLHF side effects (like GPT-4 rollback cases) must be organized based on cases, and that “diagnosis and prescription” is required.
Building on that, he proposes Neural MRI (T1/T2/fMRI/DTI/FLAIR), which scans AI internals, and the PoSEin model, which views a model as DNA plus environment (weights plus settings),
and suggests that future AI trends will likely move from performance competition to model diagnosis based on interpretability and the building of AI safety frameworks.
[Related posts…]
- AI agent trends: the era of “living systems,” no longer just models
- What Neural MRI changes: a perspective that scans AI internals and connects them to diagnosis and treatment
*Source: [ 티타임즈TV ]
– “Mistral, which mutters nonstop; EXAONE, which only makes plans and never executes; Claude, which trades to survive” (Dr. Jeong Ji-hoon)


