Zhihu Frontier Weekly|OpenClaw Installation Hustle, Qwen Leadership Exit, DualPath Engineering & Gemini Flash-Lite Speed
AI adoption is accelerating—but the real tension lies between user capability, infrastructure limits, and the economics of scale.
Welcome to Zhihu Frontier, your window into the hottest AI convos from China’s knowledge platform.
This week’s Zhihu discussions capture several structural tensions shaping the AI industry: AI hype colliding with real technical barriers, infrastructure constraints forcing engineering innovation, and the economics behind model deployment and pricing.
From people paying others to install AI agents on their computers, to DeepSeek engineers squeezing performance out of limited hardware, and Google pushing the speed frontier with Gemini Flash-Lite, these conversations reflect how the AI ecosystem is evolving beneath the headlines.
All perspectives below come from frontline engineers, researchers, and practitioners on Zhihu👇
🧰 OpenClaw|Paying Someone to Install an AI Agent at Home
A new in-person service has appeared in parts of China: technicians offering door-to-door installation of OpenClaw, an open-source AI agent framework. Some companies even advertise the service for free as part of promotions.
The trend sparked debate: is this genuinely convenient for users, or just another form of “AI anxiety spending”?
💬 Join the discussion:
https://www.zhihu.com/question/2012492934491108892
Zhihu contributor @澄心:
“In the AI era, asking the right questions matters more than answering them, and verifying outcomes matters more than creating things. If someone is unwilling to even understand the basic logic behind installing a tool, then a powerful AI system is more likely to become a disaster than a productivity boost.

As for ‘door-to-door OpenClaw installation’ becoming a business, my view is simple: people are paying premium prices for the cheapest possible psychological reassurance.
In reality, this is no different from the services that used to install software or configure routers for elderly users—except this time the service is wrapped in an AI-branded shell.
I’ve always believed that if you can’t even set up the OpenClaw environment yourself—whether that’s running the Docker container or configuring the Node runtime—then even if someone installs it for you, it will most likely end up as digital dust on your computer.
OpenClaw isn’t a finished consumer product like Meituan or TikTok that works the moment you open it. It’s a delicate engineering experiment.
If you can’t solve the Day-0 installation problem yourself, then when the agent later gets stuck in a background loop, burns through your tokens, or accidentally messes up your files due to permission issues, you won’t know how to fix it. Calling someone back to repair it won’t solve the underlying problem.
Tools like this are designed to help ‘super individuals’ become more efficient, not to be brought home and treated like a shrine.
The popularity of this service is really just a reflection of today’s global AI anxiety. Many people have lost their basic judgment, believing that once they install this ‘magic tool’ on their computer, they’ll instantly gain a digital clone of themselves.”
🎬 Seedance 2.0|Why the Tool Slowed to Near-Unusable Speeds
Some users reported that Seedance 2.0 slowed down significantly, with generation speeds dropping to the point of being nearly unusable.
💬 Explore the thread:
https://www.zhihu.com/question/2010310456192037872/answer/2011126494970286569
Zhihu contributor @野生龙猫:
“Here’s something interesting. When DeepSeek exploded in popularity, its responses frequently timed out during the Spring Festival holiday. Meanwhile, tools like Jimeng were still relatively smooth over the holiday; their slowdowns only appeared once the real work season began.

For some film and animation studios, tools like this are actually extremely useful. According to industry rumors, some animation companies had dozens of employees create hundreds of accounts, all subscribed to the highest-tier membership, and essentially ‘gacha-rolled’ outputs, selecting the best results from many generations.
Even with that approach, the overall cost remained quite low compared with traditional production workflows.
When everyone rushes to use the same tool, computing power inevitably becomes scarce. And from a business perspective, if you see this many people buying premium subscriptions, the first thought is probably that the pricing was set too low.”
🏢 Alibaba Qwen|Model Leader Lin Junyang Steps Down
Lin Junyang, the head of Alibaba’s Qwen large-model project, recently announced he would step down from his position, triggering discussion about potential implications for the project.
💬 Join the discussion:
https://www.zhihu.com/question/2012332008160895210
Zhihu contributor @李明殊:
“Alibaba’s fiscal year starts on April 1. That means if you hold on until April, you can still receive the annual bonus.

For someone at the P10 level, especially in a position like his, the annual bonus is not a small number.
So either he really doesn’t need the money… or he was genuinely disappointed.”
Zhihu contributor @平凡:
“There’s been a lot of discussion on Twitter. Under Lin Junyang’s post I saw a comment mentioning that, just yesterday, someone had deployed a complete large model onto a tiny edge device for the first time: a simple Raspberry Pi.

They used the newly released Qwen 3.5-0.8B model, quantized to 4-bit, which makes it extremely lightweight.
This perfectly illustrates the idea that sincerity can be the ultimate competitive advantage.
In my view, this reflects the real significance of the Qwen ecosystem: insisting on open source and providing models across the full range of sizes.
Everyone knows hundred-billion-parameter models are powerful, but the hardware ceiling for most people is something like an RTX 3080. Many can’t even run a 3090-level setup.
That’s why small models have a huge market. Qwen continuing to release models at all sizes is a genuine contribution to the community—and a great example of open-source spirit.”
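Running a sub-1B model in 4-bit on a Raspberry Pi is well within reach of existing tooling. As a rough illustration only (not the commenter’s actual setup), here is a minimal sketch using llama-cpp-python; the GGUF filename is hypothetical and assumes the Qwen checkpoint mentioned above has already been converted and quantized:

```python
# Minimal sketch: running a 4-bit-quantized small model on a Raspberry Pi
# via llama-cpp-python. The GGUF filename is hypothetical; it assumes the
# Qwen 3.5-0.8B checkpoint has already been converted and quantized
# (e.g. to Q4_K_M).
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3.5-0.8b-instruct-q4_k_m.gguf",  # assumed filename
    n_ctx=2048,    # modest context window to keep memory use low
    n_threads=4,   # one thread per Pi core
)

out = llm("Briefly explain what 4-bit quantization does to a model:",
          max_tokens=64)
print(out["choices"][0]["text"])
```

The arithmetic behind why this fits: at 4 bits per weight, a 0.8B-parameter model needs only around 0.4 GB for its weights, which sits comfortably inside a Raspberry Pi’s RAM.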
🧠 DeepSeek|The “DualPath” Inference Architecture
DeepSeek released a new research paper proposing DualPath, a novel inference system designed to better utilize bandwidth across GPU clusters.
💬 Join the discussion:
https://www.zhihu.com/question/2010670686461452529
Zhihu contributor @deephub:
“This is a very ‘DeepSeek-style’ paper. Its goal is simple: squeeze every bit of bandwidth out of GPUs.

The team identified a resource mismatch that had largely been overlooked. In typical deployments, the storage network interface on Prefill nodes is fully saturated, while the storage interface on Decode nodes remains almost completely idle.
DualPath essentially aggregates these two network paths into a shared global bandwidth pool. The architecture introduces a bypass route—storage → Decode buffer → RDMA → Prefill—allowing the idle bandwidth on Decode nodes to participate in KV-cache transfer.
In other words, the traditional pipeline was storage → Prefill engine. The new architecture adds an alternative route: storage → Decode engine (as an intermediate buffer) → high-speed RDMA network → Prefill engine.
The paper also provides a rigorous mathematical proof deriving safe constraints on the Prefill/Decode node ratio. Under typical data-center hardware configurations, the dual-path architecture can operate without congestion, giving the design strong theoretical backing.
What makes this paper valuable isn’t just the clever architecture—it’s that the system was actually deployed and validated on a production cluster with 1,152 GPUs.”
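To make the routing idea concrete, here is a toy scheduler sketch with made-up bandwidth figures. It illustrates the two paths described above; the congestion-free check at the end is in the spirit of the paper’s node-ratio constraint, not its actual derivation:

```python
# Illustrative sketch only (not the paper's implementation): routing
# KV-cache reads over two paths. Bandwidth numbers are invented.
from dataclasses import dataclass

@dataclass
class Node:
    storage_bw: float  # GB/s capacity of the node's storage NIC
    used: float = 0.0

    @property
    def idle(self) -> float:
        return self.storage_bw - self.used

def route_kv_read(size_gb: float, prefill: Node, decode: Node) -> str:
    """Use the direct path while the Prefill storage NIC has headroom;
    otherwise bounce through the Decode node's idle storage NIC and
    forward over RDMA (assumed fast enough not to be the bottleneck)."""
    if prefill.idle >= size_gb:
        prefill.used += size_gb
        return "storage -> Prefill"
    decode.used += size_gb
    return "storage -> Decode buffer -> RDMA -> Prefill"

def ratio_is_safe(n_prefill: int, n_decode: int,
                  demand_per_prefill: float, nic_bw: float) -> bool:
    """Toy congestion-free condition: total Prefill ingest demand must
    fit within the pooled storage bandwidth of all nodes."""
    return n_prefill * demand_per_prefill <= (n_prefill + n_decode) * nic_bw

prefill, decode = Node(storage_bw=40.0, used=38.0), Node(storage_bw=40.0)
print(route_kv_read(8.0, prefill, decode))  # takes the bypass route
print(ratio_is_safe(3, 1, 50.0, 40.0))      # 150 <= 160 -> True
```

The design choice worth noticing: no new hardware is added. The bypass route only recruits capacity that was already provisioned on the Decode side but sitting idle.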
🤖 Model Updates
🧠 OpenAI|GPT-5.4 Released, GPT-5.3 Instant Opens Up
OpenAI released GPT-5.4 and opened access to GPT-5.3 Instant, triggering discussion about whether the upgrade meaningfully improves everyday use.
💬 Join the discussion:
https://www.zhihu.com/question/2013092446360336363
Zhihu contributor @平凡:
“In one sentence: for programmers, researchers, investment banking or consulting analysts, financial-report specialists, and anyone who needs real automated workflows, GPT-5.4 is the most worthwhile model upgrade of 2026 so far.

For ordinary users, however, the improvement might actually feel less noticeable than last week’s GPT-5.3 Instant.”
⚡ Google|Gemini 3.1 Flash-Lite Pushes Speed and Cost Efficiency
Google released Gemini 3.1 Flash-Lite, positioning it as a high-speed, low-cost model for large-scale usage.
💬 Join the discussion:
https://www.zhihu.com/question/2012358819506841559
Zhihu contributor @小小将:
“If I had to summarize the model in one sentence: It delivers the output speed of Gemini 2.5 Flash-Lite with the performance of Gemini 2.5 Flash.”
Zhihu contributor @toyama nao:
“The short conclusion: a model this fast entering the market will have real impact.

The previous Gemini Lite generation once held the record for fastest response, averaging around 10 seconds per query, for nearly three months before Claude Haiku broke it. Now the new Lite version surpasses that record itself.
With speeds exceeding 200 tokens per second and an average output around 1K tokens, non-reasoning responses now arrive in roughly five seconds. If the task is simple, the response can feel almost instantaneous.
In reasoning mode, however, the model pushes token consumption to around 45K. Thanks to the same token-control mechanism used by Flash models, it can run complex reasoning near the token limit without exceeding it.
Despite the heavy token usage, the extremely high throughput keeps average runtime below about 100 seconds, which is still relatively fast among models with similar reasoning ability.
According to Google’s report, the new Lite model slightly surpasses Gemini 2.5 Flash—about 3% in reasoning tasks (within margin of error) and roughly 7% higher in non-reasoning tasks. That said, there are still some domains where Flash remains stronger.”
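The quoted figures are easy to sanity-check with back-of-the-envelope arithmetic. Note that 45K tokens finishing in under about 100 seconds implies an effective reasoning throughput around 450 tokens per second, comfortably above the 200 tokens-per-second floor quoted for output speed:

```python
# Back-of-the-envelope latency check using the figures quoted above.
def latency_s(tokens: int, tokens_per_s: float) -> float:
    """Time to emit `tokens` at a sustained throughput of `tokens_per_s`."""
    return tokens / tokens_per_s

print(latency_s(1_000, 200))   # non-reasoning: ~5 s, matching the comment
print(latency_s(45_000, 450))  # reasoning: ~100 s implies ~450 tok/s sustained
```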
📚 Recommended Reading
Zhihu contributor @知之不止知: Ling 2.5 Lightning Attention + MLA Hybrid Linear Architecture
📬 That’s all for this week’s AI round-up from Zhihu Frontier.
👉 Subscribe to never miss an update: zhihufrontier.substack.com
