AI Automation Frenzy


● AI Agents Go Full Automation

Three decisive signals that AI is shifting from an “answering model” to an “agent workflow that gets the job done to the end”

Today’s news rests on exactly three pillars.

1) Anthropic is testing Claude in a form that is “always on, responsive to triggers, and runs in its own environment”

2) Z.ai released a screen-aware (vision) coding model designed to carry screen recognition through to real coding and work

3) Alibaba is pushing “repository-level engineering + agent execution” with Qwen 3.6 Plus, which puts a 1M (1 million) token context front and center

These three are converging in one direction.
It’s moving from a setup where the model stops at the “answering” stage to one where it keeps working after looking at the screen (observing) → reasoning (thinking) → connecting with tools/systems (acting).

And if you pin this flow down in SEO keywords:
AI agents, vision-based coding, context windows, multimodal workflows, and software automation run through the entire story.


1) Anthropic: Testing “Claude CONWAY” as a separate ‘standalone environment’ for agents

[Key takeaway] This reads like a signal that Claude is evolving beyond the chat window into a persistent agent environment, with instances that appear as an option in the sidebar.

1-1. Conway is closer to “persistent (always-on)” than “sessions”

  • Instead of normal chat, users select Conway as a separate option (sidebar)
  • When clicked, a “Conway instance” runs
  • Internally, it’s also expressed in a way that’s closer to a resident/maintained agent workspace rather than a simple session
  • In other words, it’s not structured around “the model answers and stops,” but around work units that carry state

1-2. Agent workspace: chat/search/system split apart

  • Chat: the conversation function you’d generally expect
  • Search: appears to be connected to an experimental hotkey
  • System: the real differentiator
  • Manage the agent environment
  • Install/Connect extensions
  • Add UI tabs
  • Configure context handlers

1-3. The “CNW ZIP” extension ecosystem: operating like a ‘platform,’ not just a model

  • Preparing Conway extensions in a CNW ZIP file packaging format
  • A direction where developers package tools and attach them “like an app” inside the agent environment
  • Why this is important:
  • In the future, Claude is likely to become more than a single model—more like an execution platform where tools are plugged in and wired together
  • Ultimately, the competitive point shifts from “performance only” to “extensions/integration/operating structure”
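The CNW ZIP format itself isn’t public, so here is a minimal sketch of what “packaging a tool like an app” could look like, using Python’s standard zipfile module. Every manifest field and function name here is invented for illustration; the real format may differ entirely.

```python
# Hypothetical sketch of a zip-packaged extension in the spirit of the
# "CNW ZIP" idea described above. Manifest fields are invented, not real.
import io
import json
import zipfile

def package_extension(name, version, entrypoint, files):
    """Bundle an illustrative manifest plus source files into an in-memory zip."""
    manifest = {"name": name, "version": version, "entrypoint": entrypoint}
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        for path, source in files.items():
            zf.writestr(path, source)
    return buf.getvalue()

def list_extension(blob):
    """Inspect an extension archive the way a host platform might on install."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        return manifest, sorted(zf.namelist())
```

The design point is the platform shift: once tools ship as self-describing archives, the host can install, list, and wire them together without caring what is inside.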

1-4. Connectors/tools + Chrome toggle: the browser enters the agent loop

  • Display the tools exposed to the connected client
  • There’s a toggle for Claude (browser) to connect directly to Conway
  • The browser itself can become the agent’s input/workspace
  • This reads less like a simple demo and more like a signal to create a real-world loop (observe-act)

1-5. Webhook trigger: a structure that works even if you don’t leave it “open”

  • A webhook system is embedded inside Conway
  • External services call it via a public URL → the agent “wakes up”
  • So it’s not about keeping the user waiting, but moving toward an always-on agent that runs event-based
  • This also aligns with Anthropic’s Claude Code/agent workflow direction
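The “external call wakes the agent” pattern can be sketched with nothing but the standard library. `AgentRunner` and the `/wake` endpoint below are illustrative stand-ins, not Anthropic’s actual API.

```python
# Minimal sketch of the event-driven "wake on webhook" pattern described
# above. All names (AgentRunner, /wake) are invented for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AgentRunner:
    """Stand-in for a persistent agent that stays dormant until triggered."""
    def __init__(self):
        self.events = []

    def wake(self, payload):
        # In a real system this would resume the agent's work loop.
        self.events.append(payload)
        return {"status": "agent woken", "event": payload.get("event")}

agent = AgentRunner()

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = agent.wake(payload)  # external call -> agent wakes up
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet
        pass

def serve(port=0):
    """Return a server on an ephemeral port; the caller drives requests."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

The point of the pattern: the user doesn’t wait on the agent, and the agent doesn’t poll; a public URL turns any external event into a resume signal.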

1-6. Improved developer experience: Claude Code “NO_FLICKER mode” + mouse support

[Key takeaway] The point isn’t only that agents are getting stronger—it’s also that they improved “terminal UX” so developers can actually use it more comfortably.

  • NO_FLICKER mode
  • Fixes flicker/jumps/long-session performance degradation commonly seen in terminals
  • Updates via screen buffer method (updates only the observable area) instead of full re-rendering
  • Stabilizes CPU/memory usage (considering long conversations and even multimodal multi-agent workflows)
  • Full mouse support
  • Cursor position via clicking
  • Clickable tool output expansion
  • Click a URL to open immediately
  • Click a file path to open in the editor
  • Drag to select → automatic clipboard copy
  • Smooth scrolling-wheel navigation
  • More precise selection units with double/triple click (word/line)
  • One-line trade-off
  • Some native search shortcut keys may behave differently (experimental)
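The screen-buffer idea behind flicker reduction can be illustrated in a few lines: diff the new frame against the previous one and emit cursor-move/clear sequences only for lines that changed, instead of reprinting the whole screen. This is a generic sketch of the technique, not Claude Code’s actual implementation.

```python
# Illustrative partial-redraw sketch: update only changed lines via ANSI
# escape sequences rather than re-rendering the full frame.

CSI = "\x1b["  # ANSI control sequence introducer

def diff_redraw(old_frame, new_frame):
    """Return the escape-sequence commands needed to update changed lines only.

    Each frame is a list of strings, one per terminal row. For every row
    that differs, emit: move cursor to (row, 1), erase the line, write new text.
    """
    commands = []
    for row, (old, new) in enumerate(zip(old_frame, new_frame), start=1):
        if old != new:
            commands.append(f"{CSI}{row};1H{CSI}2K{new}")
    return commands
```

With this approach a long session touching one status line costs one small write per update, which is where the CPU and flicker savings come from.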

2) Z.ai: Aiming for “screen awareness + coding/work” in one go with GLM-5V-Turbo

[Key takeaway] It’s not just about having the ability to see the screen; it’s about connecting directly into a screen-to-coding/work flow.
As the name suggests, “5V Turbo” focuses on vision-based coding and the vision agent workflow.

2-1. A direct hit on the existing problem: “can see, but can’t get the work done”

  • Many multimodal models
  • describe images well, but
  • have weaknesses in connecting them to actually useful code/actions
  • GLM-5V-Turbo claims it’s designed to handle both sides “at the same time”

2-2. Inputs include screens/documents/videos: accepting real work formats as-is

  • Supported scope (main points)
  • Images, video, UI layouts, design mockups
  • Dense documents
  • Given real-world workflows, the core is this:
  • In practice, it’s not “clean text”
  • It includes “messy evidence” like broken screens, PDFs, bug screenshots, and problem videos
  • So the model needs to properly capture visual grounding to make real work possible.

2-3. Technical keywords (company claims): Cogvit Vision Encoder + MTP + speed/long-output optimization

  • Preserve fine visual details/layout with Cogvit Vision Encoder
  • Strengthen handling of speed and long outputs with MTP (multi-token prediction)
  • Translated into one line:
  • “Observe clearly (preserve) → think fast (predict) → produce long work results (long output)”

2-4. A 200,000 context window + simultaneous multi-task training

  • 200K context
  • A strategy to handle long documents, large codebases, and long flows based on vision/video in one go
  • Train 30+ tasks at the same time (claim)
  • Including STEM reasoning, vision grounding, video analysis, and tool use
  • It’s not a model that only does one capability well—
  • You can summarize it as aiming for the entire chain that goes from observing → understanding → continuing to the next action.

2-5. Optimizing the agent workflow: oriented toward OpenClaw/Claude Code

  • Optimized so the agent moves in a screen-based work environment
  • Example:
  • Look at the screen and help with setup
  • Decide the next action by reading the screen
  • Proceed step-by-step like real computer work
  • It also mentions linkage with Claude Code
  • Show screenshots/bug situations and it suggests code
  • “Pointing instead of explaining” becomes natural
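The “pointing instead of explaining” flow usually means attaching a screenshot to a chat message. Below is a hedged sketch using the OpenAI-style multimodal payload shape that many vision endpoints accept; the model name is illustrative, not a confirmed identifier.

```python
# Sketch of pairing a bug screenshot with a coding question in the
# common OpenAI-style multimodal message format. The model slug is a
# placeholder, not an official Z.ai identifier.
import base64

def screenshot_message(image_bytes, question, model="glm-5v-turbo"):
    """Build a chat payload that pairs an image with a coding question."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

Usage is exactly the “messy evidence” case from above: read the screenshot bytes, attach a one-line question, and let the model ground its answer in the pixels.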

2-6. Benchmark mentions: evaluating multimodal coding/agent execution

  • CCbench, V2, Zclaw Bench, Claw Eval, etc.
  • The key is that the test isn’t only about “understanding visually”—
  • multimodal coding
  • multimodal multi-step agent execution
  • producing useful results
    That’s what they’re set up to measure.

3) Alibaba: Accelerating “project-level agent coding” with Qwen 3.6 Plus + 1M context

[Key takeaway] The strongest number here is 1M tokens (1 million) context.
And the goal isn’t “a chatbot demo,” but repo-level engineering + real execution.

3-1. “Capability loop”: repeat perception-reasoning-action in one workflow

  • What Alibaba emphasizes is the “full capability loop”
  • In other words, it’s not just answering once and stopping;
  • decomposing tasks
  • doing step-by-step work
  • running tests/corrections
  • moving forward continuously
  • Especially in coding, as agent reliability and repeat execution become more important, this direction aligns well with the market trend (agent workflows).
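The perception-reasoning-action loop described above can be sketched as a retry-driven step runner. The “steps” here are toy functions standing in for real tool calls; this is the shape of the loop, not Alibaba’s implementation.

```python
# Minimal sketch of a full capability loop: decompose a task into steps,
# act, verify the result (tests), and retry a failing step before moving on.

def run_capability_loop(steps, max_retries=2):
    """Execute (name, action, check) steps in order.

    Re-runs a failing step up to max_retries times; records every attempt.
    """
    log = []
    for name, action, check in steps:
        for attempt in range(max_retries + 1):
            result = action()              # act
            if check(result):              # observe + verify
                log.append((name, "ok", attempt))
                break
            log.append((name, "retry", attempt))  # correct and repeat
        else:
            log.append((name, "failed", max_retries))
    return log
```

The design point is exactly the reliability argument in the bullets: a loop that verifies and retries each step is what separates “answer once and stop” from continuous forward progress.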

3-2. Repository engineering: handling the whole project, not snippets

  • Not a single code fragment;
  • perform work across the entire codebase
  • Meaning:
  • Keeping long-form context
  • Tracking dependencies across files
  • Needing multiple edit/validation loops
  • That’s why it matches the nature of 1M context.

3-3. The 1M context window: the foundation for an agent to “keep memory”

  • 1M tokens means the agent can contain more information at once (long documents/large code/long instructions)
  • What an agent really needs isn’t short Q&A but maintained context:
  • what it did previously
  • which files/tools were important
  • what work still remains
  • As the window grows, the cost of re-supplying that context and the risk of omissions both go down.
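A back-of-the-envelope sketch makes the 1M-token claim concrete: with a crude heuristic of roughly 4 characters per token (real tokenizers vary by language and code style), you can estimate how much of a repository fits in a window while reserving room for output.

```python
# Rough token-budget sketch for repo-level context. The 4-chars-per-token
# ratio is a common rule of thumb, not an exact tokenizer measurement.

def estimate_tokens(text, chars_per_token=4):
    """Crude token estimate from character count."""
    return len(text) // chars_per_token

def fits_in_context(files, context_window=1_000_000, reserve=50_000):
    """Check whether a set of file contents fits, leaving room for output.

    files: mapping of path -> file contents.
    Returns (estimated_total_tokens, fits_boolean).
    """
    total = sum(estimate_tokens(src) for src in files.values())
    return total, total <= context_window - reserve
```

By this heuristic, 1M tokens is on the order of 4 MB of source text, which is why repository-scale work becomes plausible in a single window.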

3-4. Preview on OpenRouter + (for now) free access: expanding experimental accessibility

  • Offered in a preview form on OpenRouter
  • Currently, free access based on 1M context is mentioned
  • This can also be viewed as a mechanism to help “developers try it quickly and attach it to workflows.”
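Trying the preview is a single OpenAI-compatible request against OpenRouter’s endpoint. The model slug below is a guess based on the article’s naming; check OpenRouter’s model list for the actual identifier before use.

```python
# Sketch of calling the preview via OpenRouter's OpenAI-compatible
# chat-completions endpoint. MODEL is a hypothetical slug, not confirmed.
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "qwen/qwen-3.6-plus"  # hypothetical slug; verify before use

def build_request(prompt, api_key, model=MODEL):
    """Construct the HTTP request without sending it."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

def ask(prompt, api_key):
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same request shape works with most existing agent tooling, which is precisely the “attach it to workflows quickly” mechanism the article describes.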

3-5. Efficiency/reliability: hybrid architecture + stronger agent execution stability (claim)

  • Improving the hybrid architecture →
  • efficiency
  • reduced energy consumption
  • improved scaling
  • It explains that it strengthened inference/agent execution reliability compared to the 3.5 series.

3-6. Deployment-oriented: Wukong (enterprise automation) + agent tool integrations

  • Wukong: a multi-agent platform for automating enterprise work
  • Mentions connections with OpenClaw, Claude Code, Cline, etc.
  • Also for multimodal:
  • parsing dense documents
  • real-world visual analysis
  • long video inference
  • screenshots/hand-drawn wireframes/mockups → generating frontend code
    It describes everything centered on “real work inputs.”

The most important turning point to take away from this news

The core is one thing.

The market’s competitive point is moving
from “how convincingly it answers”
to “whether it can keep repeating observe-reason-act to the very end inside the agent workflow”.

So even though the three companies’ directions differ (Claude CONWAY vs GLM-5V-Turbo vs Qwen 3.6 Plus), they share the same denominator.

  • Persistent (always-on) agents: triggers/webhooks/independent instances
  • Screen-based multimodality: treat screenshots/video/layouts as “work inputs”
  • Context expansion: keep project/long instructions with 200K~1M scale
  • Software automation: finish results through repo-level coding, tool integration, and repeated execution

In short:
AI is moving beyond being a conversation partner toward becoming the “agent-like entity that runs work,” like an operating system.


5 questions to check from an investment/work perspective on this update

1) Does the workflow I use connect not just “chat,” but also “event triggers/tool execution”?
2) Does it take screens (screenshots/videos/layouts) as real work inputs and carry that through to the results?
3) Is the context window large enough so project-level work doesn’t break apart?
4) Is the structure one where you can support extensions or tool connections?
5) In multi-step execution, does it reliably work “all the way to the end”?


The main takeaway (one-line summary)

As these three come together—Claude CONWAY’s persistent agent structure, Z.ai’s screen-aware coding, and Alibaba’s 1M-context-based project execution—AI agents are entering the “business workflow automation” stage in earnest.


< Summary >

  • Anthropic tests Claude with a Conway persistent agent environment, showing an always-on execution direction with an extension ecosystem (CNW ZIP), connectors, and a webhook trigger
  • Claude Code improves long-session developer UX with NO_FLICKER mode and full mouse support
  • Z.ai has GLM-5V-Turbo designed to more directly understand screen/layout/document/video inputs and carry it through to agent coding (claims include a 200K context and simultaneous multi-task training)
  • Alibaba ships Qwen 3.6 Plus with 1M context by default and puts the perception-reasoning-action loop and repository engineering front and center
  • Conclusion: Competition is moving from “answers” to “the ability to get it done to the end inside the agent workflow”

[Related articles]

*Source: [ AI Revolution ]

– Anthropic’s New Claude CONWAY Is Unlike Any AI Before

