AI in Practice: Signal Over Noise

AI in Practice: Signal Over Noise is a filtered roundup of the most relevant Applied AI developments that show real movement, not just momentum. It brings together updates across research, models, and tools, with a focus on what holds up in real-world use and what that means for teams building with AI.
This section is part of our newsletter, where we curate these signals on a regular basis.

Research Update

SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?

Adding “skills” to AI coding agents is meant to improve how they perform on real software tasks. This paper tests that assumption across real GitHub projects and finds the impact is limited: most skills make no measurable difference, some help in specific cases, and a few even reduce performance when they clash with the project context. The overall improvement is marginal. The conclusion is clear: more capabilities do not automatically translate into better outcomes. What matters is how well those capabilities fit the task.

Why it matters:
AI systems are moving into production, where performance is judged by outcomes, not features. This shows that task-model fit and evaluation matter more than adding capabilities, reinforcing the need for grounded, system-level engineering over feature stacking.

Model Update

Google Opens Gemma 4 Under Apache 2.0 with Multimodal and Agentic Capabilities

Gemma 4 is less about scale and more about accessibility with intent. By releasing a full family, from lightweight edge models to a 31B dense variant, under an Apache 2.0 license, Google is pushing capable models into developer-controlled environments. The addition of native multimodality and long context windows signals readiness for agentic workloads, not just prompt-response tasks. The standout detail is efficiency: the top-end model competes with significantly larger systems, tightening the gap between open-weight and frontier performance. This lowers the barrier for teams that want control without sacrificing capability.

Why it matters:
High-performance, open-weight models reduce dependence on closed APIs and accelerate enterprise adoption of agentic systems.

Tool Update

Anthropic Launches Claude Design

Claude Design expands the role of LLMs from text generation into structured visual output. It enables users to co-create slides, prototypes, and design artifacts directly with the model, collapsing the gap between ideation and execution. The strategic move is clear: bring AI closer to end-user deliverables, not just intermediate thinking. For teams, this reduces tool fragmentation; for vendors, it increases stickiness by owning more of the workflow.

Why it matters:
AI is moving upstream into end-user deliverables, not just supporting tasks behind the scenes.

Build AI Systems That Hold Up in Production

Work with us to design, evaluate, and scale AI products that deliver reliable outcomes.