The Cloud Pod | Weekly AI & Cloud News on AWS, Azure & GCP

361: Beep Beep: AWS Ships an ACME Product That Actually Works

Thu, 09 Jul 2026 01:51:18 +0000

360: And you thought AWS was out of features for S3. Surprise!

Wed, 01 Jul 2026 23:48:46 +0000

Welcome to episode 360 of The Cloud Pod, where the weather is always cloudy! Justin, Matt, and Jonathan (for a bit, anyway) are in the studio this week bringing you all the latest in cloud and AI news, including a bunch of analytics, some upgrades courtesy of AI agents, and some news from Kafka. There’s a lot to cover, so let’s get started!

Titles we almost went with this week

MSK Agent Skills Make Kafka Migration Less Kafkaesque
One Token Pool to Rule All Claude Tools
STRIDE Into Security Without Leaving Your IDE
Your Code Must Be This Stable to Enter Production
ChatGPT Gets a Budget So Karen Can’t Break the Bank
One Platform to Train Them All and in Darkness Deploy Them
Who Let the Agents Out? Snowflake Knows
Kafka Whisperer Now Comes With an AI Upgrade
Stop Reading Docs, Let MSK AI Do the Kafka Math
Your AI Wrote That Pull Request, Own It
Claude. Tag, you’re it!
See, there are more features that we can add to s3

A big thanks to this week’s sponsors:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.

AI Is Going Great – or How ML Makes Money

03:45 Claude Design now stays on brand for daily work

Claude Design now integrates directly with Claude Code through two new slash commands: /design-sync pulls your design system into Claude Code, and /design lets you create and manage design projects without leaving the terminal, keeping both tools in sync throughout the workflow.
The rebuilt design system import supports GitHub repos, design files, and raw uploads, with Claude automatically checking its output against your components before rendering results.
Enterprise admins can lock down a single approved system to enforce consistency across teams.
Anthropic updated the usage model, so Claude Design now shares a token pool with chat, Claude Cowork, and Claude Code rather than having separate limits, which should give most users more headroom and reduce how often they hit caps.
Export and integration options expanded substantially, with connectors now covering Adobe, Canva, Gamma, Lovable, Miro, Replit, Vercel, Wix, Base44, and standard PDF and PowerPoint formats, making it easier to move finished work into existing production pipelines.
Claude Design is available in beta on Pro, Max, Team, and Enterprise plans at claude.ai/design, with Enterprise having it disabled by default pending admin activation and output restricted to internal sharing only.

05:07 Matt – “…when I’ve used it – just playing around with it, it produced really nice things. I just used half my session tokens real fast with iterations and things like that. So I would be careful using it, but it does great front-end design.”

Data+AI Summit – Top Announcements

07:31 Introducing Lakehouse//RT: Real-Time Performance on a Unified Lakehouse

Lakehouse//RT is a landmark new real-time data warehouse engine (Reyden) delivering millisecond latency directly on the lakehouse, eliminating the need for separate serving layers, a major architecture.
Databricks announced Lakehouse//RT, a new real-time data warehouse built directly into the lakehouse, delivering millisecond query responsiveness at high concurrency without a separate serving layer or data copies.
Powered by a brand-new engine called Reyden, Lakehouse//RT targets operational analytics, BI and app serving, and observability workloads — the exact use cases that previously forced teams to bolt on Redis, Druid, or Pinot.
The core pitch is architectural simplification: eliminating the dedicated serving-layer copy means no more stale replicas, no extra governance surface, and no ETL pipeline to keep the hot tier in sync.
This directly challenges standalone real-time OLAP vendors (Apache Druid, ClickHouse, StarRocks) by collapsing their niche into the Databricks platform; practitioners should watch benchmark claims closely before ripping anything out.
No public pricing or GA date was shared in the announcement; availability details are expected to follow from the Data + AI Summit 2026 session track.

07:39 Introducing Genie ZeroOps: Put your data and AI operations on autopilot

Genie ZeroOps is a genuinely novel autonomous ops agent purpose-built for data/ML pipelines, addressing a real pain point (pipeline failures, model drift) with concrete agentic automation that practit
Genie ZeroOps is a new autonomous background agent that continuously monitors Databricks production assets, pipelines, jobs, tables, and ML models, investigates failures before or as they occur, and proposes verified fixes.
The agent runs a four-step loop for every failure: detect, assess root cause, remediate with a fix, and verify the fix has no side effects, all without requiring a human to open a ticket first.
Databricks explicitly argues that generic coding agents (think GitHub Copilot Workspace) fall short here because they lack access to Spark logs, telemetry, and the data-lineage context needed to distinguish a code bug from an upstream schema change or late-arriving data.
The target pain point is real: data teams report spending the majority of their time on operational firefighting rather than building, and the proliferation of LLM-generated pipelines is making that worse, not better.
Because ZeroOps runs inside Databricks, it has native, governed access to Unity Catalog lineage, job run history, and cluster metrics, a meaningful advantage over external AIOps tools that need custom integrations.

07:49 Introducing OpenSharing: the Next Evolution of Delta Sharing for the Agentic Era

OpenSharing, as the industry’s first open protocol for sharing data, models, agents, and skills across any cloud/vendor, is a landmark open-source initiative that extends Delta Sharing’s proven adoption.
OpenSharing is the next evolution of Delta Sharing, repositioned as the industry’s first open protocol for sharing not just data tables and files, but also models, agents, and AI skills — across any cloud, any vendor, and any format.
Delta Sharing has already hit meaningful scale: 28,000+ data recipients and 33% of shares flowing cross-platform via open connectors, with adopters including SAP, Atlassian, Mercedes-Benz, S&P Global, and LSEG.
The key architectural expansion is moving beyond structured tables to semantic context, unstructured data, and AI artifacts, exactly what agentic workflows need to share across organizational and platform boundaries.
Delta Sharing is being spun out as an independent open-source project under OpenSharing, which should reduce vendor lock-in concerns and encourage broader ecosystem participation beyond the Databricks orbit.
For practitioners, the practical implication is that you could eventually share a fine-tuned model or an MCP-compatible agent skill with a partner org in the same zero-copy, no-replication way you share a Delta table today.

07:58 Lakeflow: A new era of agentic data engineering

Lakeflow Designer reaching GA and deep Genie Code integration across the full data engineering lifecycle is a significant platform milestone that consolidates ingestion, transformation, and orchestration.
Databricks is rebranding and unifying its data engineering surface under the Lakeflow banner, covering ingestion, transformation, and orchestration in a single platform, all governed by Unity Catalog.
Lakeflow Designer is now generally available: a visual, no-code, AI-powered interface that democratizes pipeline building for non-engineers, directly competing with tools like Fivetran’s transformation UI and Azure Data Factory’s canvas.
Genie Code is now deeply integrated across the entire Lakeflow experience. You can use it to generate ingestion connectors, build Python/SQL pipelines, and scaffold jobs with tasks, triggers, and dependencies from natural language.
The agentic angle is significant because Lakeflow is unified and Unity Catalog provides end-to-end lineage; Genie Code has full context to not just build pipelines but also operate and repair them, tying directly into the ZeroOps announcement.
The fragmented data stack problem is real for most enterprises (Airflow + dbt + Fivetran + custom connectors), and Databricks is betting that a single governed surface will win out as AI agents become the primary pipeline operators.

08:07 Introducing Genie One, Genie Agents, and Genie Ontology

Genie One, Genie Agents, and Genie Ontology represent a meaningful evolution of enterprise AI assistants with native Slack/Teams integration and governed, data-grounded answers, a significant step to
Genie One is the next generation of Databricks’ conversational analytics assistant, evolving from a query tool into a ‘data-smart AI coworker’ that can move users from insight to action, not just answer questions.
Native integrations with Slack and Microsoft Teams are launching, letting users @mention Genie in any channel or thread to get governed, data-grounded answers without switching context to the Databricks UI.
Genie Ontology is the new mechanism for grounding agents in enterprise business context; it aggregates meaning scattered across dashboards, queries, pipelines, wikis, and tickets so agents stop hallucinating business logic.
Genie Agents extends the platform so organizations can deploy custom Genie-powered agents for specific workflows, all governed through Unity Catalog, bridging the gap between conversational BI and full agentic automation.
The core problem being solved is that current-gen agents do iterative probing that is ‘extremely slow and costly,’ producing generic or wrong answers; Genie One’s ontology layer is Databricks’ answer to that quality gap.

08:33 Unifying Data and Governance in the Agentic Era: What’s New with Azure Databricks

A comprehensive Azure Databricks summit roundup introducing LTAP architecture and Lakebase GA on Azure is highly relevant for Azure-focused practitioners, though it largely aggregates other announcements.
Azure Databricks is introducing what it calls the first true LTAP (Lake Transactional/Analytical Processing) architecture, a unified storage layer that brings analytical data, streaming pipelines, and live application transactions onto a single shared copy on the lakehouse, eliminating the need for a separate operational side-stack.
Azure Databricks Lakebase is now generally available: a fully managed, serverless Postgres database purpose-built for the agent era, featuring decoupled compute and storage and instant copy-on-write database branching for safe debugging of production data.
The four pillars framing the Azure Databricks roadmap are Agentic Data (real-time foundation), Agentic Dev & Work (AI coworkers in productivity tools), Agentic Marketing (lakehouse-embedded personalization), and intelligent governance, giving practitioners a clear map of where the platform is heading.
Copy-on-write database branching is a standout feature for DevOps practitioners: it lets you spin up an isolated branch of a production Postgres database instantly, eliminating compliance risk when reproducing and debugging live data issues.
The Azure-specific framing matters for enterprise architects: all of these capabilities run natively on Azure infrastructure, meaning they inherit Azure’s compliance certifications and integrate with existing Azure networking and identity controls.

08:42 Announcing Lakebase Search: agent-native retrieval built into Lakebase Postgres

Lakebase Search bringing native hybrid vector + full-text retrieval into Postgres via Lakebase (beta on AWS and Azure) is a concrete, practitioner-relevant launch that simplifies agent retrieval archi
Lakebase Search is now in beta on AWS and Azure: hybrid vector and full-text retrieval built natively into Lakebase Postgres via two extensions: lakebase_vector and lakebase_text, so your entire agent loop can run against a single data backend.
The key insight driving the design is that agents operate search as a live read/write loop; they write new memory on one turn and need that data fully indexed and searchable on the very next turn, which traditional search engines built for read-only snapshots simply cannot support.
Databricks reports that agents now operate 4x more databases on Lakebase than human users do, making agent-first ergonomics a first-class design requirement rather than an afterthought.
The economics angle is notable: vector search causes severe data bloat (a 1 KB text file expands significantly when chunked and vectorized), and Lakebase Search is architected to handle the cold-storage economics of that pattern at scale, a direct shot at Pinecone and pgvector deployments on standard Postgres.
Being in beta means practitioners can start testing now, but production SLAs and pricing details are not yet locked, worth watching before committing RAG pipelines to the platform.

08:51 AI governance at Data + AI Summit 2026: What’s new with Unity AI Gateway

Unity AI Gateway extending governance to models, agents, MCP services, and tools with spend caps, routing, and runtime guardrails is a substantive enterprise AI governance capability that practitioners.
Unity AI Gateway is Databricks’ new runtime governance layer for enterprise AI, extending Unity Catalog beyond data assets to cover models, agents, MCP services, and skills, governing not just what AI accesses but what it does at runtime.
Cost management is a first-class feature: the gateway provides spend visibility across providers, granular attribution by team or workload, hard spend caps, and intelligent routing to balance quality against cost, critical as organizations scale from a few models to fleets of agents.
Security and monitoring capabilities include unified tracing across agent calls, coding agent observability, Lakewatch-powered investigations, and an open ecosystem of security and identity partners, giving security teams the audit trail they need for regulated industries.
The governance model shift is significant for architects: traditional catalog governance was about access control (who can query this table), but Unity AI Gateway governs behavior (what can this agent invoke, what guardrails apply at inference time).
Multi-model, multi-agent, multi-vendor is the explicit design target; practitioners who are already mixing OpenAI, Anthropic, and open-source models across different agent frameworks will find the unified policy layer directly addresses the sprawl problem.

08:56 Agent Bricks: Data + AI Summit 2026

Agent Bricks expanding into a full developer agent platform with 100k+ agents built and 1+ quadrillion tokens/year processed is a significant platform maturation announcement with real adoption metrics.
Agent Bricks has hit serious scale since its launch: 100,000+ agents built on the platform and 1+ quadrillion tokens processed per year, with production deployments at AstraZeneca, 7-Eleven, Fox Corporation, and Block.
Databricks is repositioning Agent Bricks from an agent-building toolkit into a full developer agent platform, based on the hard-won lesson that the core agent loop is only 1% of the work; the other 99% is infrastructure: token capacity, deployment, security, evaluation, monitoring, context, and sharing.
The ‘hidden technical debt of agentic systems’ framing (borrowing from the classic 2015 NeurIPS ML technical debt paper) is a sharp message to practitioners who’ve built agents and then spent months building scaffolding around them.
The platform expansion addresses three critical challenges: giving agents the right context at scale, making agents trustworthy and governable in production, and enabling agents to be shared and composed across teams and organizations.
The unification of data and AI is central to the pitch: agents both consume data (via tools and context) and produce data (reasoning traces, memory, action logs), and Databricks argues that only a platform that governs both sides can manage agentic systems at enterprise scale.

What’s new with Unity Catalog at Data + AI Summit 2026

Unity Catalog’s evolution to a runtime governance decision-maker for agents with AI Gateway, Glossary, cross-cloud addressability, and Governance Hub is a substantive update to the industry’s most
Unity Catalog is evolving from a system of record into a runtime decision-maker for AI: with 14,000+ organizations already on the platform, Databricks is adding Unity AI Gateway, Glossary, Domains, and cross-cloud/cross-region addressability to meet the demands of the agentic era.
The new Glossary and Domains features create a governed, shared source of business meaning for both humans and agents, directly attacking the hallucination problem that occurs when agents lack enterprise context and fill gaps with inference.
Cross-cloud and cross-region addressability means one catalog, one set of policies, and consistent governance wherever workloads run, a significant operational simplification for enterprises with multi-cloud footprints.
The new Governance Hub provides a unified control plane for visibility across the entire AI estate, covering data assets, models, agents, tools, and MCP services under a single policy framework.
The philosophical shift Databricks is articulating from ‘govern access’ to ‘govern behavior’ is the most important architectural concept for practitioners to internalize as they design agentic systems that act autonomously on enterprise data.

Agentic AI Platform & Developer Tools

Agent Bricks: DAIS 2026
- Expands into a comprehensive agent platform for developers — with 100k+ agents already built and 1+ quadrillion tokens processed per year, it now addresses the “99% hidden debt” of agentic systems: token capacity, deployment, security, evaluation, and monitoring.
What’s New in Genie Code at Data + AI Summit 2026
- Introduces a full-page command center for complex, multi-step data and ML workflows, plus scheduled tasks and production engineering upgrades — used by 90% of Databricks customers after 10x growth in the past year.
What’s New in the AI Platform: Agents for ML Engineering
- Brings Genie Code support across the entire ML lifecycle — from feature engineering and experiment management to drift detection — alongside a new deep learning platform and real-time ML serving capabilities.
Databricks and NVIDIA: Building for the Agentic Era
- Deepens the partnership to embed NVIDIA Rubin GPUs, the new Vera CPU, and NVIDIA Agent Toolkit software into AI Runtime, Model Serving, and industry AI solutions on Databricks.
What’s coming next to Free Edition
- Adds Genie Code, serverless GPUs, Lakebase, Agent Bricks, and Lakeflow Designer to the free tier — giving 500,000+ learners a complete end-to-end data and AI toolkit at no cost.

Real-Time Data & Lakehouse Architecture

Introducing Lakehouse//RT
- Delivers millisecond-latency operational analytics directly on the lakehouse via the new Reyden engine — eliminating the need for a separate serving layer and the data copies, cost, and complexity that come with it.
Lakeflow: A New Era of Agentic Data Engineering
- Unifies ingestion, transformation, and orchestration on a single platform governed by Unity Catalog, with Genie Code now deeply integrated and Lakeflow Designer reaching general availability.
Accelerate search queries with full-text search indexes on Databricks
- (Now in Beta on DBR 18.2) Can speed up substring and keyword queries by 100x or more on open-format tables without changing table layouts — eliminating the need for external systems like Elasticsearch or Splunk.
What is data pipeline architecture?
- Offers a timely explainer on pipeline design patterns and best practices (picking up light HN traction with 6 points), highlighting how Lakeflow collapses the old batch/streaming divide onto a single foundation.
Announcing Lakebase Search: agent-native retrieval built into Lakebase Postgres
- Introduces hybrid vector and full-text retrieval via native Postgres extensions (`lakebase_vector` and `lakebase_text`), enabling agents to treat search as a live operational database with instant indexing of the latest writes.

Governance, Security & Compliance

AI governance at Data + AI Summit 2026: What’s new with Unity AI Gateway
- Extends Unity Catalog governance to the runtime layer — covering models, agents, MCP services, and skills — with unified cost management, spend caps, intelligent routing, and Lakewatch-powered monitoring.
Building an open ecosystem for AI governance with Unity AI Gateway
- Announces integrations with CrowdStrike, Palo Alto Networks, Zscaler, Netskope, Okta, Ping Identity, Saviynt, and others to protect prompts, model responses, agent actions, and MCP tool calls across the enterprise.
What’s new in Databricks Platform security and compliance at Data + AI Summit 2026
- Introduces Automatic Identity Management (AIM) for Entra ID (now GA on AWS/GCP) and Okta (Public Preview), Context-Based Ingress policies, Private Network Gateway, and expanded FedRAMP/GovCloud coverage.
What’s new with Unity Catalog at Data + AI Summit 2026
- Advances the catalog from a system of record to a runtime decision-maker for AI — adding Unity AI Gateway, Glossary, Domains, and cross-cloud/cross-region addressability for 14,000+ organizations.
Unifying Data and Governance in the Agentic Era: What’s New with Azure Databricks
- Introduces the LTAP (Lake Transactional/Analytical Processing) Architecture on Azure, bringing analytical data, streaming pipelines, and live transactions into a single governed storage layer natively on Azure.

Agentic Operations & Data Sharing

Introducing Genie ZeroOps
- Puts data and AI operations on autopilot with an autonomous background agent that monitors pipelines, jobs, tables, and ML models — detecting failures, diagnosing root causes, and suggesting verified fixes without the limitations of generic coding agents.
Introducing OpenSharing: the Next Evolution of Delta Sharing for the Agentic Era
- Evolves Delta Sharing into an independent open-source project that extends zero-copy sharing beyond tables and files to models, agents, and skills — across any cloud, vendor, or format.
Introducing OpenSharing SecureConnect
- Adds a Databricks-managed proxy for storage access, eliminating per-recipient firewall configuration so providers can onboard new data-sharing recipients in minutes rather than weeks.
Introducing Genie One, Genie Agents, and Genie Ontology
- Elevates Genie from a conversational analytics assistant to a data-smart AI coworker embedded natively in Slack and Microsoft Teams, grounded in enterprise context via Genie Ontology to eliminate hallucinations on business data questions.

Agentic Marketing, Apps & BI

Introducing CustomerLake: The Agentic CDP embedded in Databricks
- Brings Customer 360, identity resolution, audience building, and campaign automation natively into the lakehouse — enabling always-on, 1:1 personalization at enterprise scale without duplicating sensitive customer data.
Introducing the Agentic CDP: A New Species of CDP for a New Era of Agents
- (Co-authored by Ali Ghodsi and Reynold Xin) makes the case that traditional CDPs are architecturally incompatible with millisecond agentic buying lifecycles, and outlines the three requirements — speed, hyper-personalization, and richer context — that only a lakehouse-native CDP can meet.
Enabling Governed Vibe Coding for Enterprise Apps on Databricks
- Introduces App Spaces, Genie App Builder, and Serverless Micro Apps — tripling weekly active app users — so any business analyst can safely build and deploy governed applications without burdening platform teams.
Announcing Apps on Databricks Marketplace
- (Public Preview) Let customers discover, install, and run third-party data and AI apps directly inside their secure workspaces, while giving ISVs a no-egress distribution channel to thousands of enterprises.
Design Beautiful Dashboards in AI/BI
- Introduces workspace-level and dashboard-level theming so teams can apply consistent brand identity across all AI/BI dashboards — with Genie Code prompts to build them from scratch.

Ecosystem, Partners & Customer Stories

Becoming the most comprehensive data & AI ecosystem on earth
- Lays the third “brick” in Databricks’ ISV strategy: partners can now access customers’ pre-committed Marketplace spend to accelerate deals, and list Databricks Apps and Genie Agents — opening the door to novel commercial models like “pay-per-question.”
The Partner Well-Architected Framework: What’s New and What’s Next
- Expands AI-ready guidance across Built-On, Connected, and Data Collaboration partner architectures, keeping pace with the platform’s rapid release cadence and helping partners capture the current innovation window.
Announcing the new Databricks Startup Program
- Offers venture-backed startups (pre-seed through Series A) up to $200k in combined Databricks and Neon credits, plus hands-on guidance and go-to-market support to build their entire AI company on a single stack.
Science AI Companion: building an autonomous Customer Success platform on Databricks
- Shows how Science and Quartile, managing performance marketing for 1,000+ brands, used Databricks to build an AI Companion that automates CSM workflows at scale while preserving the personalization that relationship-driven businesses require.

14:58 New usage analytics and updated spend controls for enterprises

OpenAI has released new credit usage analytics and updated spend controls for ChatGPT Enterprise, available now in the Global Admin Console.
Admins can track credit consumption broken down by user, product, and model, and access this data programmatically via a unified Cost API.
The updated spend controls allow admins to set a default workspace-wide credit limit, configure limits for specific groups, and create individual overrides for high-usage employees.
This replaces a one-size-fits-all approach with tiered controls that can accommodate power users without raising limits for the entire organization.
Employees can now view their own credit usage within workspace settings and submit requests for additional credits with context about their work, giving admins enough information to approve increases selectively rather than broadly.
The Cost API integration is worth noting for enterprise IT teams, as it allows organizations to pull credit usage data into their own internal systems for deeper analysis alongside other business spending data.
For organizations already managing cloud spend across AWS, Azure, or GCP, these controls bring ChatGPT Enterprise closer to the cost governance model those platforms offer, making it easier to treat AI usage as a managed line item rather than an untracked operational expense.

15:06 Justin – “…you might think that’s very FinOps-y of them, and you’d be right; because they worked with Aptio to basically build out this capability along with Anthropic and others in the focus groups from FinOps.”

17:06 Agent identity: a new access model for autonomous, team-wide AI

Claude Tag introduces an “agent identity” access model where AI agents get their own dedicated service accounts rather than borrowing individual user credentials, allowing Claude to operate across shared Slack channels, GitHub, and data warehouses on behalf of entire teams rather than single users.
Permissions are scoped at the workspace and channel level rather than per-user, so admins can grant the engineering channel access to GitHub while confining CRM access to a separate private channel, with each private channel getting a distinct Claude identity that cannot cross into other channels.
Revoking a Claude identity immediately removes access across all connected systems simultaneously, which simplifies enterprise access management compared to auditing individual agent actions spread across dozens of user accounts.
Every network call, memory write, and routine executed under agent credentials is logged in an audit trail, and outbound traffic to any host not explicitly allowed by an admin is blocked, addressing common enterprise security concerns around autonomous agents.
Anthropic plans to add just-in-time credential grants so users can approve individual sensitive actions in the moment without permanently expanding the agent’s scope, plus an identity-aware overlay that checks both channel-level and user-level permissions before Claude acts.

18:22 Justin – “It’s a little bit clunky right now, mostly tied to limitations I think of how bots interact with Slack, but I see the potential.”

Security

20:08 Temporary Cloudflare Accounts for AI agents

Cloudflare introduced temporary accounts for AI agents, allowing them to deploy Workers via the Wrangler CLI using a new, temporary flag without requiring any prior account signup or authentication flow.
Deployments stay live for 60 minutes, during which a human can claim the account permanently. If unclaimed, the account and all associated resources are automatically deleted.
The feature addresses a practical problem in agentic workflows where browser-based OAuth flows and MFA prompts create hard stops for background agents operating without a human in the loop.
Wrangler was updated to surface the –temporary flag in its output messages, so agents can discover and use it without explicit human instruction, enabling a write-deploy-verify loop without manual intervention.
Cloudflare is pairing this with broader efforts, including a Stripe partnership for agent-provisioned accounts and a WorkOS collaboration on OAuth standards for agents, suggesting a broader push toward standardizing how agents interact with cloud infrastructure.

Cloud Tools

22:59 Introducing the Cloudflare One stack: agent-powered deployment

Cloudflare released the Cloudflare One stack, an open-source set of agent skills hosted on GitHub that helps automate the deployment, configuration, and management of Zero Trust network environments without requiring deep prior knowledge of Cloudflare’s product suite.
The stack ships as two skill files: cloudflare-one for general product guidance like VPN replacement and policy management, and cloudflare-one-migration for translating configurations from vendors like Zscaler and Palo Alto Networks into equivalent Cloudflare constructs.
When paired with the Cloudflare code mode MCP server, agents get a typed interface to the live Cloudflare API, allowing them to query account configurations and make changes through recommended workflows rather than ad-hoc API calls, while keeping credentials out of the model context.
The migration logic in the stack is the same as that used in Cloudflare’s existing Descaler and Deskope programs, which have moved enterprise customers from Zscaler and Netskope to Cloudflare One in hours rather than months, and this makes that capability self-serve for any customer or partner at any time.
The stack also handles ongoing operational tasks like recommending security rules based on live traffic, investigating anomalies in web gateway logs, and reporting on user experience metrics through the Digital Experience Monitoring toolkit, making it useful beyond initial migration scenarios.

23:48 Justin – “This announcement also just taught me that Cloudflare has a Zscalar and Netscope competitor, which I did not know.”

AWS

28:21 Amazon S3 annotations: attach rich, queryable context directly to your objects

S3 Annotations is a new metadata capability that lets you attach up to 1,000 named annotations per object, each up to 1 MB, totaling up to 1 GB per object, in formats like JSON, XML, YAML, or plain text.
This addresses a long-standing limitation where rich object context had to live in separate databases or sidecar files, requiring complex synchronization.
Annotations are mutable and move automatically with objects during copy, replication, and cross-region transfers, which is a meaningful improvement over the existing 10-tag limit and 2 KB user-defined metadata headers that S3 has historically offered.
When S3 Metadata is enabled, annotations automatically flow into Apache Iceberg-backed annotation tables queryable via Amazon Athena, with backfill support for existing annotated objects. The tables adapt to any JSON, XML, or YAML structure without schema migrations, and you can also query them using natural language through the S3 Tables MCP server.
Practical use cases include media companies tracking AI-generated transcripts and content ratings, financial services attaching sentiment analysis to research documents, and life sciences teams annotating clinical trial data for compliance audits without needing to restore archived objects from S3 Glacier.
Annotation storage is billed at S3 Standard rates regardless of the parent object’s storage class, so teams storing annotations on Glacier objects should factor that cost difference into their planning.
The feature is available today in all AWS Regions, including China Regions, with annotation tables available wherever S3 Metadata is supported.

29:57 Justin – “This is a pretty handy improvement. I’d kinda got to the point where I thought S3 had all the features it coud possible have, and they just keep surprising me.”

31:15 Introducing AWS Continuum for security at machine speed

AWS Continuum is a new security service in gated preview that automates the full vulnerability lifecycle, from discovery and prioritization to validation and remediation, using AI agents operating within guardrails defined by the security team.
The service addresses a common pain point where teams already have vulnerability findings but spend significant time on manual triage, exploitability validation, and cross-team coordination before fixes are deployed. Continuum handles that middle work automatically.
A notable technical detail is the sandbox-based exploit validation, where Continuum builds reproducible proof of exploitability in an isolated environment before flagging a vulnerability as confirmed, reducing noise from theoretical findings.
Continuum integrates with existing AWS security tooling, including GuardDuty and Security Hub, and absorbs the previously separate AWS Security Agent capabilities under a unified product umbrella as Continuum penetration testing and Continuum code scanning.
A new threat modeling feature is also launching in preview, automatically generating STRIDE-format threat models from design documents or source code, which could reduce the manual effort typically required during architecture review processes.
Pricing has not been publicly disclosed yet, and access requires requesting entry to the gated preview at aws.amazon.com/continuum.

32:01 Matt – “This is extremely nice. It reminds me a lot of what GitHub did with their security feature where they’re trying to help you prioritize…So actually prioritizing vulnerabilities, because if you have a large code base, it’s going to happen. Prioritization is the real key here.”

37:35 AWS Security Agent adds threat modeling, Kiro power and Claude Code plugin, and more

AWS Security Agent, now part of AWS Continuum, has expanded beyond its re:Invent 2025 preview to cover the full software development lifecycle: threat modeling and design reviews at design time, code review at development time, and penetration testing (now GA) at deployment time.
The new threat modeling feature (preview) uses the STRIDE framework to analyze design documents or source code, mapping data flows, trust boundaries, and attack vectors, along with prioritized mitigations, thereby reducing the manual effort typically required for security architecture reviews.
Code review capabilities now support GitHub, GitLab, Bitbucket (including self-hosted versions), and Confluence, with pull request scanning that validates findings in simulated environments to confirm actual exploitability rather than just flagging potential issues.
The Kiro power and upcoming Claude Code plugin let developers trigger security scans, generate threat models, and remediate findings directly from their IDE without context switching, using an open MCP integration that works with any AI-powered IDE.
AWS Security Agent offers a 2-month free trial, with full pricing details on the product page. It is available in select AWS commercial regions, with regional availability and roadmap details listed on the AWS Capabilities by Region page.

38:29 Justin -” It is not terribly priced for what it does; these are tools you’re spending a lot of money on, like threat modeling. Overall, I was not too scared off by the pricing on this one.”

40:24 AWS DevOps Agent adds release management capabilities to assess code changes before production (preview)

AWS DevOps Agent now includes release management capabilities in preview, adding pre-production code review and autonomous release testing to its existing post-deployment incident investigation features, effectively covering the full software delivery lifecycle.
The release readiness review feature evaluates pull requests against user-defined natural language standards or general best practices, checking cross-repository dependency risks, access control changes against the Well-Architected Framework, and runs lightweight functional tests in an AWS-managed isolated environment before code enters the pipeline.
The autonomous release testing feature goes beyond static test suites by reasoning about what a specific code change does and generating tailored test plans covering functional correctness, behavioral regressions, and integration scenarios, producing structured artifacts including metrics, logs, and traces for each run.
Findings surface in multiple places, including the DevOps Agent console, GitHub and GitLab pull request comments, and directly in IDEs via Kiro or Claude Code plugins, with recommendations categorized as BLOCK, Proceed with Caution, or Safe to Release.
Both features are currently available at no additional cost during preview, limited to the US East N. Virginia region, with GitHub or GitLab repository connectivity required to get started; standard DevOps Agent pricing applies to other features at aws.amazon.com/devops-agent/pricing.

47:04 Introducing Amazon Bedrock Managed Knowledge Base for faster, more accurate enterprise AI applications

Amazon Bedrock Managed Knowledge Base is a new fully managed RAG service that handles the entire pipeline, including storage, embeddings, re-ranking, and retrieval, allowing developers to connect to enterprise data sources such as S3, SharePoint, Confluence, Google Drive, and OneDrive without building custom connectors.
The Agentic Retriever feature addresses a real limitation in standard RAG by automatically creating multi-step query plans for complex questions, performing multi-hop retrieval across knowledge bases rather than relying on a single retrieval pass.
Smart Parsing automatically selects the right parsing strategy per data type and connector, handling multimodal content like images, tables, and video without manual configuration, which reduces the typical weeks of experimentation needed to reach production-quality retrieval.
The service integrates with AgentCore Gateway as a native target type, exposing knowledge bases via the Model Context Protocol so frameworks like LangChain, LlamaIndex, CrewAI, and Strands Agents can discover and use them without custom integration code.
Pricing is usage-based with no upfront commitments, charged on indexed data storage and number of retrievals, with availability in US East, US West, Asia Pacific, Europe, and AWS GovCloud regions today.
Existing Bedrock Knowledge Bases API users can migrate by pointing to a new knowledge base ID with no code changes.

48:26 Justin – “They’ve been on a journey with this one, trying to get something good.”

49:42 Amazon CloudWatch Synthetics now supports multi-location canaries

CloudWatch Synthetics now supports multi-location canaries, letting you manage a single canary in one primary Region while CloudWatch automatically replicates it to additional Regions, consolidating all metrics and artifacts centrally.
This eliminates the previous requirement of creating separate canaries per Region, which caused configuration drift and added operational overhead.
The feature is particularly useful for validating third-party dependencies like CDNs and payment processors across geographic locations, and for identifying region-specific performance bottlenecks that would otherwise be invisible from single-region monitoring.
A notable capability is the multi-location alarm configuration, which only triggers when issues are detected from multiple locations simultaneously, reducing alert noise and helping teams distinguish real customer-impacting problems from isolated regional blips.
Existing single-region canaries can be upgraded to multi-location by simply adding replica Regions without recreating them, lowering the migration barrier for teams already using CloudWatch Synthetics. The feature is available across all AWS commercial Regions that support CloudWatch Synthetics.
Pricing is not explicitly called out in the announcement, so teams should check the CloudWatch Synthetics pricing page before scaling out replica canaries, as replication across multiple Regions will likely increase canary run costs proportionally to the number of Regions selected.

52:11 Run isolated sandboxes with full lifecycle control: AWS Lambda introduces MicroVMs

AWS Lambda MicroVMs is a new serverless compute primitive built on Firecracker that provides VM-level isolation with near-instant startup, targeting multi-tenant applications that need to run user- or AI-generated code safely. It fills the gap between containers (fast but shared kernel) and full VMs (strong isolation but slow to start).
The image-then-launch model works by running your Dockerfile, initializing your application, and snapshotting the running memory and disk state, so every subsequent MicroVM launch resumes from that pre-initialized snapshot rather than booting cold. This means even large, stateful sessions start quickly enough to feel responsive to end users.
Each MicroVM supports up to 16 vCPUs, 32 GB of memory, and 32 GB of disk, with up to 8 hours of total runtime and configurable idle suspension policies that preserve full state while reducing cost. Auto-resume on incoming requests means the suspend/resume cycle is transparent to end users.
Practical use cases include AI coding assistants, interactive data analytics sessions, vulnerability scanners, and game servers running user-supplied scripts, all scenarios where you need per-user isolation without building custom virtualization infrastructure. Lambda Functions and Lambda MicroVMs are designed to complement each other rather than compete.
Lambda MicroVMs are available now in US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) on ARM64 architecture.
Pricing is on the AWS Lambda pricing page and follows usage-based billing, with suspended MicroVMs incurring lower costs than running ones.

53:28 Matt- “This is one of those things it sounds really cool, but I don’t have a good use case to play with it yet.”

55:13 Amazon MSK now offers AI Agent Skills to help developers operate MSK efficiently and accelerate migrations to MSK

Amazon MSK now offers AI Agent Skills that integrate with coding assistants like Kiro, Claude Code, and Cursor to provide guided help for common Kafka operational tasks, including troubleshooting, sizing, configuring, monitoring, and cluster migration.
The skills are accessed through the AWS Agent Toolkit, which developers configure via the AWS CLI, then query conversationally with questions like “Is my Kafka cluster compatible with MSK Express?” turning specialized knowledge into a self-service experience.
A key use case is accelerating migrations from self-managed Kafka to MSK Express, which offers up to 3x more throughput per broker, 20x faster scaling, and 90% reduced recovery time compared to Standard brokers running Apache Kafka.
This fits into AWS’s broader Agent Toolkit ecosystem, suggesting a pattern where AWS services will increasingly expose operational knowledge as consumable skills for AI coding agents rather than relying solely on documentation or support tickets.
No additional pricing was announced for the AI Agent Skills themselves, though standard MSK and MSK Express cluster costs apply based on broker type, size, and usage.

56:16 Justin – “I welcome this new feature. It’s great.”

1:01:29 Amazon CloudWatch Logs supports managed syslog ingestion

CloudWatch Logs now supports native syslog ingestion from network devices like firewalls, routers, switches, and Linux servers via a VPC endpoint, removing the need to deploy and manage log collection agents across infrastructure.
The feature supports three common syslog formats, including RFC 5424, RFC 3164, and Cisco FTD/ASA, which cover a broad range of enterprise networking equipment and make adoption straightforward for teams already using Cisco gear.
CloudWatch automatically parses incoming syslog messages and extracts structured fields like facility, severity, hostname, and application name, so teams can immediately query logs using Logs Analytics without building custom parsing pipelines.
Transport options include TCP, TCP+TLS, and UDP, giving teams flexibility to match their existing device configurations while the TLS option addresses security requirements for sensitive log data in transit.
Standard CloudWatch Logs ingestion and storage pricing applies, and the feature is available in all commercial AWS regions except Middle East UAE, Middle East Bahrain, and Israel Tel Aviv.
Documentation is available at the CloudWatch Logs docs linked in the announcement.

1:01:43 Justin – “Another feature built, I imagine, by AI, because this is something I’ve asked for for years. I’ve basically just come to the conclusion that everything I’ve wanted for years that doesn’t have enough revenue… is just being written by AI Agents over there.”

GCP

1:03:07 Google AI Studio’s Interactions API for Gemini models and agents

Google’s Interactions API has reached general availability and is now the primary interface for Gemini models and agents, replacing the older generateContent API as the default for Google AI Studio and all documentation.
The API uses a simplified step-based schema and is available through Python and JavaScript SDKs.
Managed Agents is a notable addition where a single API call provisions a remote Linux sandbox capable of reasoning, executing code, browsing the web, and managing files.
Developers can use the default Antigravity agent or define custom agents with their own instructions, skills, and data sources.
Background execution lets developers set background=True on any call to run interactions asynchronously, which is useful for long-running tasks. The API also supports mixing built-in tools like Google Search and Google Maps with custom functions in a single request.
On the cost side, Flex inference offers a 50% cost reduction compared to Priority tier, giving developers a way to trade latency for lower pricing.
Paid tier users also get 55-day retention on past interactions, which is useful for stateful agentic workflows.
The legacy generateContent API remains supported and will continue receiving new mainline Gemini models, but Google has signaled that frontier capabilities for long-running and agentic use cases will land exclusively on the Interactions API going forward.
A migration guide is available at ai.google.dev/gemini-api/docs/migrate-to-interactions for teams planning to transition.

1:04:20 Justin – “Well, that’s nice that Google didn’t kill the legacy one.”

1:05:15 Query logs and traces with SQL in Observability Analytics

Log Analytics has been renamed Observability Analytics and now includes generally available support for querying trace data alongside logs using SQL, all within Cloud Logging without needing to move or duplicate data.
The core capability lets you write SQL queries that JOIN log and trace data together, enabling analysis like finding checkout requests over 5 seconds and identifying which microservice caused the slowdown, or calculating P95 latency across thousands of AI agent tool calls.
A notable use case is AI agent observability, where teams can run aggregate queries across millions of span events to calculate failure rates and latency percentiles per tool, then drill down by joining trace spans with logs to extract the exact LLM prompts that led to failures.
The Observability API is now GA, allowing teams to create linked BigQuery datasets from their observability buckets so AI agents and analytical workloads can query telemetry programmatically via standard BigQuery APIs, which is useful for automated monitoring pipelines.
Pricing is not explicitly detailed in the announcement, so teams should check Cloud Logging and BigQuery pricing pages directly, though the in-place analysis approach is positioned as a way to reduce costs compared to exporting and duplicating data elsewhere.
Query examples are available at github.com/GoogleCloudPlatform/observability-analytics-samples.

Azure

1:25:32 What’s new with Microsoft in open source and Kubernetes at Open Source Summit and KubeCon India

AKS now offers agent pool rollback in general availability, letting operators revert both the Kubernetes version and node image with a single command across all node pool types, which reduces recovery time from bad upgrades without requiring manual reprovisioning or snapshot management.
Azure Kubernetes Fleet Manager now supports up to 1,000 member clusters, up from 200, and Managed Fleet Namespaces are generally available, allowing teams to define namespaces once as ARM resources and propagate them consistently across large multi-cluster estates, including Arc-enabled hybrid and multi-cloud environments.
GPU efficiency gets two notable additions: configurable scheduler profiles let teams pack pods more densely using the upstream Kubernetes scheduling framework without running a custom scheduler, and GPU memory profiling in preview adds function-level visibility through Prometheus and Grafana to catch memory leaks before out-of-memory crashes occur.
Artifact streaming from Azure Container Registry reduces pod startup for images under 10 GB from minutes to seconds by streaming only the layers needed at startup rather than pulling full images, which directly improves scale-out responsiveness for AI workloads.
The Azure SRE Agent now covers AKS incident scenarios in preview, automatically gathering diagnostic evidence and attributing failures to specific layers like workload, network, or cluster before proposing a next step, with writes remaining approval-gated and audited for operator control.

1:08:07 Matt – “A lot of these are just nice quality of life. Being able to provision your namespace in ARM, especially when you’re redeploying across multiple environments, is a nice quality-of-life improvement.”

1:09:15 Rethinking cloud operations with agentic observability

Microsoft announced the general availability of the Azure Copilot Observability Agent, built on Azure Monitor, which correlates logs, metrics, traces, and topology signals across agents, applications, and infrastructure to help operators identify root causes faster.
Pricing details were not disclosed in the announcement, so listeners should check the Azure Monitor pricing page for specifics.
The agent addresses a real operational pain point: a Microsoft and Material survey of 250 IT decision-makers found 84% report increased cloud complexity and 69% say it is outpacing their current operating model. The tool aims to reduce the manual effort of piecing together context across multiple monitoring tools.
Early customer results are notable, with KPMG reporting an estimated 250 engineering hours reclaimed monthly after adopting the capabilities. Other customers like PolicyVault and Ontinue report faster incident investigation by correlating telemetry with Azure resource health and surfacing actionable next steps.
The agent fits into a broader agentic operations model Microsoft is building on Azure, where systems generate signals, agents interpret and act on them, and outcomes feed back into improving future cycles. Governance features, including policy controls, auditability, and human oversight guardrails, are positioned as central to this model.
This is worth watching for teams running complex Azure workloads who currently rely on manual incident triage across multiple tools. The integration directly into existing workflows rather than requiring a separate platform is a practical consideration for

Cloud Journey

1:11:01 Running an AI-native engineering org

Anthropic’s engineering director describes how agentic coding shifted the primary bottleneck from writing code to verifying it, meaning code review, security checks, and correctness validation now consume the time that implementation used to take.
The team replaced traditional sprint planning and design docs with just-in-time planning built around prototypes and PR discussions, reflecting that long-horizon roadmaps became obsolete when execution speed increased substantially.
Human review is now reserved for specific high-stakes areas like security-sensitive code, legal risk, and product judgment, while Claude handles style, linting, bug catching, and test generation automatically.
Role boundaries have blurred noticeably, with product managers writing more code and engineers taking on design and content work, which has practical implications for how teams hire and structure responsibilities.
The article suggests engineering leaders track three metrics as they adopt agentic workflows and cautions against treating throughput as the primary success measure, since the real goal is solving the underlying problem faster, not just generating more output.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

359: Tokenomicon Sounds Metal, but it's Just Cloud Budgets

Fri, 26 Jun 2026 16:15:52 +0000

Welcome to episode 359 of The Cloud Pod, where the weather is always cloudy! Justin and Ryan are in the studio this week to bring you all the latest in cloud and AI news, including AI governance, FinOps’ final conference, and even an earnings story courtesy of Oracle. These and so much more – so let’s get started!

Titles we almost went with this week

You Shall Not Pass Unless Your Network Policy Says So
One CLI Wizard to Rule All AWS Agents
AWS WAF Turns AI Crawlers Into Cash Cows
No More Delete and Pray for AWS Cost Reports
Stop Rolling Your Own Certificate Rotation AWS Did It
Tux Gets a Security Checkup, Microsoft Antivirus Style
Coal Plant to Cloud Plant Google’s Billion Dollar Glow Up
FinOps Grows Up and Gets an AI Spending Problem
Tokenomics Foundation Wants to Bill AI by the Word
Sweet Home Alabama Now Runs on Google Cloud Infrastructure

A big thanks to this week’s sponsors:

General News

02:53 Microsoft restricts Claude Fable for employees over data retention concerns

Microsoft has restricted Claude Fable 5 from its internal GitHub Copilot model picker, even though the model is available to external GitHub Copilot and Azure Foundry customers.
All other Claude models remain available internally because they operate under Zero Data Retention rules.
The core issue is that Claude Fable 5 requires data retention to power Anthropic’s new safety classifiers, meaning prompts and outputs are stored for up to 30 days by default, and up to two years if flagged for policy violations.
This creates a meaningful conflict with enterprise data handling expectations.
This situation highlights a broader tension cloud enterprises face when adopting frontier AI models that bundle safety mechanisms requiring data retention, since those requirements may conflict with internal legal and compliance policies around confidential information.
The restriction is notable because Microsoft is both a distribution partner for Anthropic through Azure and a direct competitor via its own AI offerings, so internal adoption decisions carry weight beyond typical enterprise procurement concerns.
For developers and businesses evaluating Claude Fable 5 through Azure Foundry or GitHub Copilot, this serves as a reminder to review the specific data retention terms for Mythos-class models before deploying them in workflows that handle sensitive or proprietary information.

04:23 Statement on the US government directive to suspend access to Fable 5 and Mythos 5

The US government issued an export control directive requiring Anthropic to immediately suspend access to Fable 5 and Mythos 5 for all foreign nationals, forcing a full customer shutdown to ensure compliance.
All other Anthropic models remain unaffected.
The government’s concern centers on a reported narrow, non-universal jailbreak involving asking the model to read a codebase and identify software flaws. Anthropic reviewed the technique and found that the capability level is already available in other publicly deployed models, including OpenAI’s GPT-5.5.
Anthropic’s defense-in-depth strategy for Fable 5 included thousands of hours of red-teaming with the US government, UK AISI, and third-party organizations, plus a mandatory 30-day customer data retention policy specifically to detect and mitigate jailbreak attempts.
Anthropic is complying with the directive but publicly disagrees with the standard applied, arguing that requiring recall of a commercial model over a narrow non-universal jailbreak would effectively halt all frontier model deployments across the industry.
This situation raises a practical question for cloud and enterprise customers about deployment risk when government directives can abruptly disable access to production AI services without advance notice or detailed technical disclosure.

05:36 Justin – “…the government sounds like they overreacted as they like to do in this era, and Anthropic doesn’t agree, and they’re going back and forth, and hopefully they get it back. That’s the goal.”

08:51 A frontier without an ecosystem is not stable

Satya Nadella published an essay arguing that a frontier AI model without a surrounding ecosystem is inherently unstable, suggesting Microsoft’s strategic focus remains on platform and developer ecosystem building, rather than standalone model capabilities.
The core argument positions this AI transition as distinct from previous platform shifts, such as mobile and cloud, in which digital systems augmented human work rather than potentially replacing organizational structures and firm boundaries.
For cloud practitioners, this framing matters because it signals where Microsoft is likely to invest, specifically in tooling, APIs, and developer infrastructure that ties AI capabilities into existing enterprise workflows rather than competing purely on model benchmarks.
The post generated substantial engagement with 35,000 reposts and 49,000 likes, indicating the topic of AI economic impact on business structures is drawing broad attention from the technology community.
Podcast hosts may want to discuss whether the ecosystem argument favors established cloud providers like Microsoft, Google, and AWS, who already have developer ecosystems in place, and what that means for independent AI labs trying to build commercial businesses.

09:59 Ryan – “AI is not particularly useful unless you give it access to data, and you can’t give a frontier model access to data, right? You can train it on data and build your own model, but you need some sort of mechanism or platform to set up the rag, to set up any kind of localization or customer state, you know, grounding; like you can’t just put it in there.”

20:07 FinOps X 2026 Day 1 Keynote: The Wild West of AI, Token Economics and the Evolving Role of FinOps

The FinOps Foundation and the Linux Foundation announced their intent to form the Tokenomics Foundation, with backing from Oracle, Google, Microsoft, IBM, JPMorgan Chase, and others, to create open standards for AI billing and token cost attribution.
Tokens are being positioned as the atomic unit of AI spend, and the core challenge FinOps practitioners face is answering two questions: what does AI actually cost, and how do you measure the value of intelligence produced?
AWS announced several concrete feature updates, including Target Coverage for Savings Plans, Granular Cost Attribution for Amazon Bedrock, and an AI-powered FinOps Agent that supports natural language queries and autonomous savings execution.
Microsoft confirmed plans to support FOCUS 1.4 in 2026 and highlighted Microsoft Fabric and Foundry as tools for unifying data and AI cost management across the enterprise.
The FinOps Foundation updated its certification path with a new Technology Value credential covering public cloud, SaaS, data platforms, and data centers, and FinOps X will fold into a broader Tokenomicon conference in San Diego in June 2027.

23:10 FinOps X 2026 Day 2 Keynote: From Alerts to Agents

The FinOps Foundation introduced a Crawl, Walk, Run maturity model for agentic FinOps, framing the progression from automated alerts to autonomous cost management as a structured path practitioners can follow using community-validated best practices.
FOCUS 1.4 was announced at the conference, continuing the Foundation’s push to standardize cloud cost and usage data, with Oracle also announcing FOCUS 1.3 support and Flexera and Google Cloud both adding FOCUS compatibility to their platforms.
Google Cloud presented an AI Explainability Agent alongside Automated Spend Caps and full-stack AI cost visibility in Cost Reports, representing a shift from post-billing reactive alerts toward proactive cost control for AI workloads.
Pinterest shared a Tokenomic Layer Cake model that separates Product AI costs from Internal AI infrastructure costs, offering practitioners a practical framework for stacking optimizations so efficiency gains compound over time.
The Foundation announced Tokenomicon, a new event series dedicated specifically to the economics of AI, with the first event scheduled for Amsterdam in September 2026, signaling that AI cost management is becoming a distinct discipline within the broader FinOps practice.

24:41 Justin – “I also don’t know that tokens are going to last forever. It’s lasted longer than I thought it was going to, but I think there’s a lot of pressure for people to explain tokens and how they’re measured, and once you try to do that, it’s very difficult.”

AI Is Going Great – or How ML Makes Money

25:53 OpenAI to acquire Ona

OpenAI is acquiring Ona, a cloud execution platform that has served 2 million developers with secure, reproducible cloud environments, to expand the Codex agentic coding ecosystem.
The core technical addition is Ona’s customer-controlled execution model, which allows agents to run persistently inside an organization’s own cloud infrastructure rather than being tied to a single device or active session.
Codex currently has 5 million weekly users, up 400% from earlier this year, and the acquisition addresses a specific gap where longer-running agent tasks spanning hours or days need persistent, session-independent execution environments.
For enterprise deployments, Ona’s technology provides controls over where agents run, credential scoping, activity logging, and workflow review gates, which are requirements organizations need before moving agents from experimentation into production.
Developers and IT teams should note that this positions Codex as a persistent background worker across the software lifecycle, handling tasks like running tests, resolving issues, and modernizing applications without requiring an active user session.

26:43 Justin – “I’d guess this is nice that there’s a third party involved that if OpenAI ever wants to build something on top of all these data centers they’re building for Stargate, this is nice for them. But all the cloud providers are basically giving you this, too. So it’s sort of interesting.”

29:26 Results from first Anthropic Public Record

Anthropic surveyed nearly 52,000 Americans in late 2025 to establish a public baseline on AI attitudes, finding that 64% fear job displacement and 56% fear cognitive dependency, with both concerns dropping notably among daily AI users (54% and 46% respectively).
Only 15% of Americans trust AI companies to make decisions about AI development, the lowest of any institution tested, falling below the federal government at 20% and well below independent experts at 43%. This trust deficit is a notable data point for cloud and AI vendors building enterprise relationships.
Support for government AI regulation reached 71% overall and was bipartisan, with 79% of Democrats and 68% of Republicans in favor. Privacy, child safety, and liability for harm were the top areas where Americans want regulatory action.
The survey found that daily AI users support government oversight at nearly the same rate as the general public (74% vs 71%), suggesting that hands-on experience with AI does not reduce appetite for accountability and regulation.
Anthropic is pairing this public survey data with its Anthropic Economic Index and the 81,000-person Claude user interview study to build a multi-source picture of AI adoption, which the company says will inform its policy frameworks around mandatory safety testing and worker displacement support.

32:52 Ryan – “I think that trying to make sure that you have a broader exposure to how people are feeling and how people are using it is important, because I think that – especially when you start talking about regulation – you don’t want to regulate it for one type of person or one industry.”

37:53 Kimi K2.7 Code: Open-Source Agentic Coding Model

Moonshot AI has released Kimi K2.7 Code, an open-source coding-focused model built on a Mixture-of-Experts architecture with 1 trillion total parameters and 32 billion activated per token, supporting a 256K context window with full weights available on Hugging Face.
Benchmark improvements over K2.6 are notable, with gains of 21.8% on Kimi Code Bench v2, 11% on Program Bench, and 31.5% on MLS Bench Lite, plus roughly 10% improvement on agentic task benchmarks measuring autonomous execution.
A key efficiency improvement is a 30% reduction in thinking-token usage compared to K2.6, which translates directly to lower API costs and faster responses without sacrificing benchmark scores, an important consideration for production agentic workflows.
The model is purpose-built for long-horizon coding tasks like multi-file refactoring and extended debugging sessions, and always runs with thinking mode enabled, meaning non-thinking requests automatically fall back to K2.6.
Pricing starts at $19/month through Kimi Code membership, while API access uses per-token billing at $0.95 per million input tokens, with cache misses dropping to $0.19 on cache hits, making it worth evaluating for teams running high-volume coding agent pipelines.

39:06 Justin – “Kimi 2.6 is one of my favorite open source models for coding, and I use it all the time.”

41:43 Anthropic “pauses” token-based billing for its Claude Agent SDK

Anthropic announced then quickly paused a billing change for its Claude Agent SDK that would have shifted outside SDK usage to standard API rates starting June 15, with subscribers receiving only a monthly credit equal to their subscription price.
Under the current model, Agent SDK usage counts against weekly subscription caps rather than per-token API rates, which analysis suggests can make a Claude subscription worth many multiples of its cost compared to equivalent API spending.
The pause affects third-party apps and programmatic use via the claude -p command, meaning developers building automation workflows on top of Claude subscriptions can continue operating under existing limits for now.
For developers and businesses evaluating build-vs-buy decisions around AI agents, this situation highlights the pricing risk of building on consumption models that sit outside standard API contracts, since the underlying economics can shift with limited notice.
No revised timeline or alternative pricing structure has been announced, leaving Agent SDK users in a holding pattern that complicates longer-term cost planning for agentic.

23:34 Ryan – “I imagine there are real business problems behind these; capacity and end costs, I’m sure, are a factor. I don’t think it’s all just trying to squeeze every last dollar out of consumers, but I’m not really a big fan of this pricing model.”

Cloud Tools

44:12 Route public traffic to private applications with Cloudflare

Cloudflare is launching Application Services for Private Origins in closed beta for Enterprise customers, allowing public traffic to reach private applications without exposing those origins to the public internet, public IPs, or inbound firewall rules.
The feature works by adding a use_private_routing flag to standard DNS records, which signals Cloudflare’s proxy to route the final hop through existing private network connectivity like IPsec, GRE, CNI, or Cloudflare Mesh rather than over the public internet.
All of Cloudflare’s existing application services, including WAF, bot management, rate limiting, caching, and Workers, apply normally to this traffic, meaning private internal APIs and tools get the same security stack as public-facing applications without additional infrastructure.
The routing model extends beyond HTTP through Spectrum for TCP and UDP services and Workers VPC bindings, so databases, SSH endpoints, and AI agent backends on private IPs can all be fronted by Cloudflare without a load balancer or connector software on the origin.
Cloudflare is targeting general availability in Q4 2026 and has stated private-to-private traffic flows as the next milestone, where users and services on private networks would reach other private applications through the same Cloudflare security layer.

46:16 Ryan – “I’m a big fan of having gateway access to SSH endpoints for managing jump hosts and bastion, and having stuff that’s quickly stood up and taken down so that you can build inside environments, which is neat.”

AWS

47:04 Try the new console experience in Amazon Bedrock, optimized for Anthropic- and OpenAI-compatible APIs

AWS launched a new Amazon Bedrock console experience built around the bedrock-mantle endpoint, which supports OpenAI Chat Completions API, OpenAI Responses API, and Anthropic Messages API, making it easier for teams already using those SDKs to route requests through Bedrock without rewriting code.
The project-based workflow is the most practical addition here, letting developers group models, API keys, usage metrics, and code snippets under a single project, which reduces the context-switching that typically slows down the evaluation-to-production cycle.
Live documentation that auto-populates with your project’s model ID, region, endpoint URL, and API key is a notable developer experience improvement, since you can copy a code snippet directly from the console and run it without manual edits.
The console includes direct integration instructions for AI coding agents, including Claude Code, Cline, Codex, Cursor, and OpenCode, allowing teams to route those tools through Bedrock using IAM credentials or Bedrock API keys rather than direct vendor endpoints.
The new console is available now across 15 regions, including US East, US West, several Asia Pacific locations, and multiple European regions, though fully managed Bedrock features like Agents, Knowledge Bases, and Guardrails remain in the existing console on the bedrock-runtime endpoint. Pricing follows standard Bedrock inference rates for the underlying models used.

48:07 Justin – “I will tell you that the Bedrock Console – today – is terrible. It is not great.”

52:10 AWS Cost and Usage Report 2.0 now supports table configurations update

CUR 2.0 now allows customers to update table configurations like column selection, time granularity, and export format directly through the AWS Console or SDK/CLI, eliminating the previous requirement to delete and recreate exports when adopting new features.
This change is particularly useful for teams running ETL pipelines against CUR data, since they previously had no in-place update path and had to manage export recreation carefully to avoid disrupting downstream jobs.
The update takes effect on the next scheduled export delivery, so customers should plan schema changes with their data engineering teams to avoid breaking existing cost reporting workflows.
There is no additional charge for this capability since CUR 2.0 exports are priced based on S3 storage and data transfer costs, though customers should review the AWS Data Exports documentation at the link in the announcement for configuration details.
The feature is available in all commercial AWS Regions except AWS GovCloud (US) and China Regions, which is a notable gap for customers operating in those environments who will still need to manage exports through the old delete-and-recreate approach.

52:53 Ryan – “Man, where were all these cost usage optimizations when I had to generate all the reports?”

54:11 Amazon OpenSearch Service launches MCP Apps for agentic observability

Amazon OpenSearch Service now supports MCP Apps, letting AI agents in local IDEs like Claude Desktop and VS Code investigate incidents using logs, traces, metrics, and alerts stored in OpenSearch domains and Amazon Managed Service for Prometheus without switching tools.
Each MCP App tool call returns a dual response: a text summary for the agent to reason over and an interactive visualization rendered in the same conversation thread, keeping human review integrated into the agentic workflow.
The feature covers a broad set of observability use cases, including root cause analysis, distributed trace exploration, service maps, PromQL metric charts, and cross-signal correlations, all within a single conversation context.
Available MCP App tools span log, metrics, and trace investigation, service performance, topology, agent health, cluster health, and instrumentation scoring, giving teams a fairly complete observability toolkit through the agentic interface.
The feature is available in all AWS regions where the Amazon OpenSearch UI is supported, and pricing follows existing OpenSearch Service consumption costs with no separate MCP App charge mentioned in the announcement.

54:32 Ryan – “The open search API, if it’s anything like the Elastic Search API, is cumbersome to use. I find it very difficult to sort of query Elastic Search natively. And then you add all that sort of reliability and performance problems we’ve had with log ingestion and that kind of stuff – which still makes me a little twitchy. Enough time has passed where I can sort of talk about it openly now. So I kind of liked the idea of having MCP front that and having an easy way to query that data, where I can just have AI do it for me… which is great.”

56:00 Evaluate AI agents systematically with Agent-EvalKit

Agent-EvalKit is an open-source Apache 2.0 toolkit that evaluates AI agents by tracing their full execution path, including tool calls and intermediate state, rather than just checking final output quality. It integrates directly with AI coding assistants like Claude Code and Kiro CLI, keeping evaluation inside the development environment.
The toolkit organizes evaluation into six phases: Plan, Data, Trace, Run, Eval, and Report. Each phase produces artifacts that feed into the next, and developers invoke them through natural language slash commands, with results stored in an eval directory for reuse across evaluation cycles.
A travel research agent case study illustrates the practical value: Response Quality scored 83.9% and looked acceptable on the surface, but Faithfulness scored only 32.3%, revealing the agent was fabricating exchange rates and temperatures whenever tool calls returned empty results. This kind of failure is invisible to output-only testing.
The toolkit supports OpenTelemetry-compatible tracing and integrates with frameworks including Strands, LangGraph, and CrewAI, plus evaluation libraries like DeepEval and the Strands Evals SDK.
For production monitoring beyond pre-deployment testing, AWS recommends pairing it with Amazon Bedrock AgentCore Observability and AgentCore Evaluation.
Costs are not fixed since Agent-EvalKit itself is free, but LLM-as-judge metrics require Amazon Bedrock foundation model inference, so teams should review Bedrock pricing based on their model selection and test case volume before running large evaluations.

58:13 AWS announces AWS Workload Credentials Provider

AWS Workload Credentials Provider is an open source, client-side tool that automates certificate deployment from ACM and secrets caching from Secrets Manager, replacing custom EventBridge automation that customers previously had to build and maintain themselves.
The tool is particularly relevant given the CA/B Forum mandate to reduce public certificate lifetimes, which increases the operational burden of certificate rotation at scale and raises the risk of expiry-related outages.
It runs on Windows and Linux with built-in support for Apache and NGINX, handling certificate export, file placement, and server reload behavior through simple configuration rather than custom scripting.
For secrets management, it maintains full backwards compatibility with the existing Secrets Manager Agent, so teams can consolidate both use cases into a single provider without reworking existing integrations.
The provider is available now across all AWS Regions, works for both AWS and non-AWS workloads, and is open source on GitHub.
Pricing follows standard ACM and Secrets Manager rates with no additional charge for the provider itself.

59:15 Ryan – “It’s great when you’re using ACM natively with Amazon, you know, and the load balancer, and it’s just sort of handled.”

1:00:00 AWS DevOps Agent expands with custom SRE agents and MCP/A2A protocols

AWS DevOps Agent now supports custom SRE agents that run on a schedule within Agent Spaces, enabling teams to automate recurring tasks like daily database health checks or log anomaly detection without manual intervention.
The addition of MCP and A2A protocol support lets developers invoke DevOps Agent from tools they already use, including Kiro, Claude, and other coding assistants, reducing context switching during incident investigation.
Teams can now connect their own sub-agents built on Amazon Bedrock or third-party frameworks via A2A, effectively extending DevOps Agent with custom capabilities rather than being limited to built-in functionality.
Additional updates include incident-skip rules, Git-managed skills, persistent memories, human labeling for task quality tracking, and customer-created dashboards, suggesting the service is maturing toward production-grade SRE use cases.
The service is now available in five additional regions, though pricing details are not specified in the announcement, and teams should review the supported regions table and recent improvements page in the AWS docs before planning adoption.

1:00:52 Ryan – “Agents like this can be expensive too because there’s a lot of data… so as long as these things can remain affordable, it’s great.”

42:46 Amazon CloudWatch Query Studio is now generally available

Amazon CloudWatch Query Studio is now generally available, offering a unified interface for querying and visualizing metrics using either PromQL or Metrics Insights SQL from a single workspace within the CloudWatch console.
Teams managing services across multiple AWS accounts and regions can use per-query cross-account and cross-region selectors to correlate metrics like latency and error rates without switching between consoles or tools.
The visualization options are notably broad, including line, bar, scatter plot, heatmap, histogram, pie, gauge, and number widgets, with dual y-axis support and series overrides, making it more capable than basic CloudWatch charting.
Query Studio integrates with CloudWatch dashboards and supports Grafana imports, which gives teams already using Grafana a migration path or a way to work across both toolsets without rebuilding queries from scratch.
The service is available in all commercial AWS regions except Middle East UAE, Middle East Bahrain, and Israel Tel Aviv.
Pricing follows standard CloudWatch metrics query costs, so teams should review their existing CloudWatch pricing tier before adopting it at scale.

1:03:04 Justin – “…this is a cool feature. I’m glad it exists, but please stop calling everything a studio.”

1:03:41 AWS launches Cost Explorer historical data retention for accounts in billing groups

AWS Cost Explorer now retains historical billing data at original AWS billable rates for accounts that are part of billing groups in AWS Billing Conductor or Billing Transfer, closing a gap that previously cut off access to pre-enrollment cost history.
Before this change, accounts mapped to billing groups could only see pro forma rates set by the payer account, making it difficult to compare costs or run reports that spanned the period before and after joining a billing group.
Existing Billing Conductor and Billing Transfer customers automatically gain access to their historical data with no migration or configuration steps required, which is a practical benefit for teams already managing multi-account environments.
The feature supports reporting continuity for enterprises using Billing Transfer to centralize cost management across multiple AWS organizations, a common pattern for large companies or managed service providers handling consolidated billing.
Billing Transfer is available in all AWS Regions except GovCloud, China Beijing, and China Ningxia, and there is no additional cost mentioned for this historical data retention capability specifically.

1:04:47 Ryan – “For the 12 people that need this, they’re gonna love it.”

1:06:01 Introducing the Kiro merch store

Kiro, AWS’s AI-powered IDE, has launched an official merchandise store at shop.kiro.dev with 15 items, including apparel, accessories, and developer-focused gear like mechanical keyboard keycaps and a ghost plushie for rubber duck debugging.
The store launch is notable mainly as a community-building effort rather than a technical announcement, signaling that Kiro is investing in developer identity and brand presence beyond the product itself.
From a practical standpoint, this is not an AWS infrastructure or tooling update, so listeners looking for technical developments should note this is purely a marketing and community initiative with no direct impact on cloud workflows or costs.
The merchandise details do reflect some developer-specific thinking, such as PBT keycaps chosen for durability and a roll-top backpack with a dedicated 16-inch laptop compartment with side zipper access, suggesting the product team is targeting working developers rather than casual fans.

1:08:14 AWS WAF adds AI traffic monetization capability to help content owners charge AI bots for content access

AWS WAF now lets content publishers charge AI bots for access to their content directly at the network edge, using HTTP 402 responses and the x402 open protocol for machine-to-machine payments settled in USDC stablecoins.
This addresses a real cost problem since AI bot traffic now exceeds 50% of web traffic for many publishers, with AI-specific crawlers growing over 300% year-over-year, while returning little to no referral traffic back to publishers.
Payment settlement is handled through Coinbase’s x402 Facilitator, with Stripe and Machine Payments Protocol support coming soon. AWS does not take a cut of content revenue, and the feature is available at no additional charge beyond standard WAF pricing.
The feature builds on existing AWS WAF Bot Control, which already classifies over 650 distinct AI bot types, including GPTBot, Claude-Web, and Perplexity-Bot, assigning each a verified or unverified status using cryptographic signatures or IP reputation matching.
Publishers can set per-request pricing by content path, bot category, or verification tier without modifying origin infrastructure.
A notable constraint is that the Monetize action only works with CloudFront-associated web ACLs and is not supported for regional web ACLs, so publishers need to route through CloudFront to use this capability.
A test mode using blockchain testnets like Base Sepolia and Solana Devnet lets teams validate the full payment flow before going live, which is a practical consideration given the novelty of stablecoin-based payment flows in production infrastructure.

1:09:28 Ryan – “I understand a news media site – all of that content – you’re going to pay for all of the hosting and all the hits and there’s going to be zero benefit, you’re not going to have any eyes on your page. It’s someone just getting an answer on some other tool, right? So I really do understand it. And they’re already struggling for money. So it does sort of make sense to me that they would need to do something like that. And I think the alternative to this is just blocking that traffic.”

1:14:26 AWS Sign-in now supports resource-based policies and resource control policies

AWS Sign-in now supports resource-based policies at the account level and resource control policies (RCPs) at the organization level, giving teams a way to restrict console sign-in to specific trusted networks.
Policies are evaluated both at sign-in and whenever the console session requests new credentials, meaning network restrictions are enforced continuously rather than just at the initial login point.
RCPs integrate with AWS Organizations, so security teams can enforce consistent sign-in network controls across all accounts in an org without configuring each account individually.
This feature pairs with AWS Management Console Private Access to create layered controls, letting organizations define both which networks users can sign in from and which accounts those users can reach.
The feature is available at no additional cost in all AWS commercial regions, making it a straightforward addition to existing preventive security controls for organizations already using SCPs and Organizations.

1:15:21 Ryan – “…for SCPs, that’s where the security really wanted just these big overall ban hammers, right? Resource control is something that, I think it brings it a little bit more down to like the cloud team or someone who’s kind of more in line with the runtime, because it allows you to do contextual access based off of resource, right? Instead of granting all of the permissions to any resource, this allows you to specify the resources specifically.“

GCP

1:16:37 Introducing DiffusionGemma

Google released DiffusionGemma, a 26B Mixture of Experts open model under Apache 2.0 that generates text using diffusion rather than sequential token-by-token processing, producing up to 4x faster output on GPUs like the NVIDIA H100 at 1000+ tokens per second.
The speed advantage is specifically designed for local and low-concurrency inference scenarios, not high-traffic cloud serving where autoregressive models remain more cost-efficient.
Developers building real-time interactive tools like inline editors or code infilling tools are the primary target audience.
The model activates only 3.8B of its 26B parameters during inference, fitting within 18GB VRAM when quantized, making it compatible with consumer GPUs like the RTX 4090 and 5090. This is a notable accessibility consideration for developers without enterprise hardware.
Bi-directional attention is a key technical differentiator, allowing the model to generate 256 tokens simultaneously where every token can reference all others. This enables use cases that autoregressive models handle poorly, such as code infilling, Sudoku-style constraint solving, and structured format generation.
DiffusionGemma is available now on Hugging Face and through GCP Model Garden, with toolchain support from vLLM, MLX, Hugging Face Transformers, Unsloth, and NVIDIA NeMo.
Google explicitly notes output quality is lower than standard Gemma 4, so production quality-sensitive workloads should stay on the standard model

1:19:18 Choosing your surface: Antigravity 2.0, Antigravity CLI, Antigravity IDE, or Antigravity SDK

Google announced Antigravity, an AI agent platform available in four distinct surfaces: a desktop app (Antigravity 2.0), a terminal-based CLI, an IDE integration, and a Python SDK. All four interfaces run on the same underlying agent harness, meaning plugins, skills, and core logic are consistent regardless of which surface you choose.
Antigravity 2.0 is the default recommendation for most users, offering a standalone desktop app that can manage multiple autonomous agents working across independent projects simultaneously, including scheduled tasks for things like code quality checks.
The CLI surface is built in Go for speed and supports headless execution, making it a practical option for SSH workflows, remote containers, or CI/CD pipelines where a GUI is not available.
The Python SDK is notable for teams wanting to build custom agents, as it runs on the same shared harness as Google’s official tools and allows local development with deployment to Google Cloud requiring no code changes.
No pricing information was provided in the announcement.
Documentation and downloads are available at antigravity.google for teams evaluating which surface fits their workflow.

1:19:29 Justin – “Basically, it’s anti-gravity with the IDE, which is what they started with. And then they came out with a CLI. And now they’ve got an SDK and Anti-Gravity 2.0 as the as the desktop app all available to you now.”

1:20:53 Google expands Alabama data center campus, funds community efforts

Google is investing $1.5 billion in 2026 and 2027 to expand its existing data center campus in Jackson County, Alabama, a facility that has operated since 2019 on a repurposed former coal-plant site.
This expansion signals continued growth in Google’s physical infrastructure footprint in the southeastern United States.
The expansion is notable for its self-funded model, with Google covering 100% of its own power and infrastructure costs, which is worth noting for GCP customers thinking about how hyperscaler investments translate to regional capacity and reliability.
Google is pairing the infrastructure investment with a $2 million Energy Impact Fund in partnership with TVA and CAANEAL, focused on local energy efficiency and weatherization programs, reflecting a broader pattern of data center operators addressing community energy concerns alongside capacity growth.
Community-facing commitments include $550,000 in STEM kits for fourth through eighth graders and digital skills training that has reached over 130,000 Alabamians to date, which speaks to the workforce pipeline considerations that often accompany large-scale data center expansions.
For GCP customers, the practical takeaway is that continued infrastructure investment in this region supports long-term availability and capacity for workloads running in Google’s US-based regions, though specific new region or zone announcements were not part of this particular update.

1:22:22 Justin – “…sustainability now becomes how do we make people not mad at us?”

1:22:34 Brazos liquid cooling system for air-cooled data centers

Google developed Brazos, a rack-mounted closed-loop liquid-to-air cooling system designed to handle chips exceeding 1000W thermal design power without requiring full facility retrofits. It installs one rack at a time into existing air-cooled data centers, separating the internal liquid loop from facility water supplies.
Each Brazos unit supports 60 kW of thermal load per rack across three modular chassis, runs on deionized water or a 25% propylene glycol mixture, and operates on 40-60V DC input connecting directly to standard rack busbars. Pumps and fans are hot-swappable field-replaceable units to reduce repair time.
Brazos uses OCP ORv3 form-factor racks and Google plans to open-source the full technical specifications through the Open Compute Project forum in the coming months, inviting manufacturers and thermal engineers to produce and market the design independently.
The primary audience for this announcement is data center operators running legacy air-cooled facilities who need to support high-density AI or HPC workloads without the capital expense and time required for full chilled water infrastructure upgrades.
No pricing information was provided in the announcement.
Organizations interested in adopting the design should monitor the Open Compute Project forum at opencompute.org for upcoming specification releases and engage with Google’s manufacturing suppliers directly.

1:24:22 Ryan – “Back when I was building data centers, like it was one of those things, everal data centers that we had were half empty, right? Because we didn’t have the power density. And so it’s l millions of square feet of just empty space with a little like rack mount sticking out of the floor… so I like seeing these kinds of announcements”

Azure

1:25:32 Stop wasting time and use Custom Extensions for PIM approvals

Custom Extensions for PIM allow organizations to inject their own approval logic into the Privileged Identity Management workflow via a standard REST API, replacing manual approval steps with automated validation against external systems like ServiceNow, Workday, or Dynamics.
When a user submits a PIM activation request, the system pauses its internal checks and sends an HTTP payload to your custom API endpoint, which then returns an approved or denied response that PIM executes automatically, supporting both pre-approval and post-approval configurations.
The licensing requirement is a notable consideration: Custom Extensions require Entra ID Governance licenses or Entra Suite, not just the Entra P2 licenses that cover standard PIM functionality, which adds cost for organizations looking to automate their approval workflows.
This feature is best suited for organizations that already have a mature PIM process in place and want to reduce admin overhead through ticket validation automation, rather than those still working on basic PIM adoption.
Setup involves creating the custom extension in the Entra Admin Center under ID Governance, linking it to specific roles or groups, and connecting it to an App Registration with the requestedAccessTokenVersion set to 2, which is a non-obvious configuration step worth noting for teams planning implementation.

1:26:58 Ryan – “I think we’re going to see a lot more of this as everyone is trying to deal with Agentic identity.”

1:27:14 AI 200 – Azure Container Apps Express: Blazing-Fast Deployments Without the Overhead

Azure Container Apps Express App is a new preview creation mode that eliminates the need to pre-provision a Container Apps Environment, reducing deployment time to under 3 minutes from zero to a publicly accessible URL.
It is currently only accessible via containerapps.azure.com and the Azure CLI, not the main Azure Portal.
The Express mode auto-provisions its own environment, requires only three inputs (app name, resource group, and region), and defaults to a public endpoint on port 80 with 0.5 vCPU and 1 GB memory, making it well-suited for rapid prototyping and CI/CD pipelines.
Express Apps support scale-to-zero and up to 300 maximum replicas with KEDA-based scale rules, putting it on par with standard Container Apps for burst scenarios despite the simplified setup experience.
Regional availability is currently limited to East Asia and West Central US during preview, which is a notable constraint for teams with data residency requirements or latency-sensitive workloads in other regions.
Pricing details are not explicitly covered in the announcement, so teams should verify costs at the Azure pricing page before adopting Express mode for production workloads, particularly given the auto-provisioned environment model which may differ from standard Container Apps billing.

1:27:55 Justin – “This sounds like a good way to waste a lot of money…”

1:29:25 Introducing scheduled antivirus scans on Microsoft Defender Linux

Microsoft Defender for Endpoint on Linux now supports scheduled antivirus scans, a capability that security teams have long relied on for consistent threat coverage across device fleets.
This addresses a notable gap for organizations running Linux workloads under compliance frameworks that require periodic full-system scans.
The feature helps catch dormant or previously missed threats that real-time protection may not surface, making it particularly relevant for servers handling sensitive workloads where periodic deep scans are part of audit requirements.
This addition brings Linux endpoint protection closer to feature parity with the Windows version of Defender, which matters for organizations managing mixed OS environments through a single security platform like Microsoft Defender XDR.
Target customers are enterprise security and compliance teams running Linux servers in regulated industries such as finance, healthcare, or government, where scheduled scan logs serve as evidence for audits.
Pricing is tied to existing Microsoft Defender for Endpoint licensing rather than being a separate add-on, so current Linux Defender customers should be able to adopt this without additional cost.
Organizations not yet licensed should check the Microsoft Defender for Endpoint plan details at microsoft.com/security for current pricing tiers.

Oracle

1:32:43 Oracle Announces Record Q4 and FY 2026 Results Driven by Cloud Infrastructure & Cloud Applications

Oracle reported Q4 FY2026 total revenue of $19.2 billion, up 21% year-over-year, with cloud infrastructure (IaaS) growing 93% and total cloud revenue reaching $9.9 billion.
The growth is notable but worth watching given that free cash flow was negative $23.7 billion as Oracle continues heavy datacenter investment.
The Remaining Performance Obligations figure of $638 billion, up 363% year-over-year, sounds striking until you read the fine print: a substantial portion comes from large AI contracts where customers either prepaid for GPUs or supplied their own hardware, totaling $75 billion. This structure shifts capital burden to customers rather than Oracle.
Oracle Multicloud AI Database reportedly grew 404% in Q4, which the company calls its fastest growing product ever, though it is growing from a smaller base and the metric reflects early adoption momentum rather than established scale.
Oracle is guiding for $90 billion in total FY2027 revenue and expects cloud revenue growth of 57 to 64% in Q1 FY2027, which would require sustaining the current infrastructure buildout funded by roughly $40 billion in planned debt and equity financing next fiscal year.
The Oracle Health AI rewrite of the Cerner system is positioned as a near-term growth driver, with Oracle projecting double-digit growth for that business in FY2027. Given Cerner’s historically troubled integration, listeners should watch whether execution matches the projection.

1:33:59 Justin – “They’re spending a lot of money on AI, so I hope it works out for everybody…”

Cloud Journey

1:32:43 Running an AI-native engineering org

Anthropic’s engineering director describes how agentic coding shifted the primary bottleneck from writing code to verifying it, meaning code review, security checks, and correctness validation now consume the time that implementation used to take.
The team replaced traditional sprint planning and design docs with just-in-time planning built around prototypes and PR discussions, reflecting that long-horizon roadmaps became obsolete when execution speed increased substantially.
Human review is now reserved for specific high-stakes areas like security-sensitive code, legal risk, and product judgment, while Claude handles style, linting, bug catching, and test generation automatically.
Role boundaries have blurred noticeably, with product managers writing more code and engineers taking on design and content work, which has practical implications for how teams hire and structure responsibilities.
The article suggests engineering leaders track three metrics as they adopt agentic workflows and cautions against treating throughput as the primary success measure, since the real goal is solving the underlying problem faster, not just generating more output.

Closing

358: AI Spend Limits Because Frontier Models Aren't Free Therapy

Fri, 19 Jun 2026 04:46:08 +0000

Welcome to episode 358 of The Cloud Pod, where the weather is always cloudy!

Justin, Matt, and Ryan (who, rumour has it, was working on an Eagles music podcast) are in the studio this week to bring you all the latest in AI and cloud news (and begging for a AI spend limit increase), including anthropic wanting everyone – except themselves – to slow down AI development, GitHub’s insane number of commits, and even an announcement from CoreWeave, plus so much more. Let’s get started!

Titles we almost went with this week

Stop Configuring Domains One by One Like a Peasant
SSH Into Your AI Agent Like It’s 1999
Your AWS Bill Finally Has an AI Babysitter
Stop Blaming Engineering, the AI Will Do It Now
GPU Queue Anxiety Meet Your Serverless Spark Therapist
One Wildcard Certificate to Rule All Subdomains
One PTU Reservation to Rule All Regions
Twelve Billion Parameters Walk Into a Laptop
Squeezing Gemma 4 Until the Bits Cry
Azure Cobalt 200 VMs Are Really Arm-ed and Dangerous
AI has gone all Fables and Myth
Arm-ed she blows: but probably not to a region near you
Dash to change your password as Dashlane gets owned
Siri AI shows just how slow Gemini is
AI Announces going public, and then spreads Myths about AI development

A big thanks to this week’s sponsors:

There are many cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it’s really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer “insured commitments”, but remember to ask: Will you actually give me my rebate? Because Archera will.

Check out thecloudpod.net/archera to schedule a demo today.

General News

01:27 How GitHub plans to win developers back

GitHub’s scale challenge has grown substantially beyond earlier projections.
The platform processed 1 billion commits in all of 2025, but now handles 1.4 billion commits per month, with AI agents alone generating over 17 million pull requests monthly.
The technical remediation work has shifted from surface-level scaling to architectural rebuilding. GitHub has addressed MySQL contention, moved webhooks off MySQL entirely, rewritten the GitHub Actions job dispatch system, and is migrating performance-sensitive code from its Ruby monolith to Go.
GitHub’s migration to Microsoft Azure, previously reported as a capacity move, is now described as a deeper infrastructure overhaul.
The goal is service isolation so that a degraded subsystem like Actions does not cascade failures to Git or other core services.
Microsoft is providing engineering support from teams with experience scaling systems at comparable load levels, which represents a more direct operational involvement than what was previously discussed.
New feature releases like the Copilot CLI app are being developed outside the core GitHub.com infrastructure, which GitHub says allows continued product work without adding risk to the systems currently under remediation.

03:0 Ryan – “I’d actually like to see AI coding take this up a little bit, because I think it is a ridiculous sort of growth that I don’t think is sustainable, and so much of vibe-coded garbage is really bloated…But there are definitely functionality things that it can do a lot more efficiently, and doesn’t.”

AI Is Going Great – or How ML Makes Money

07:44 The Interoperable Lakehouse: Agency Over Your Data

Snowflake’s Interoperable Lakehouse is now generally available, built on Apache Iceberg v3, Apache Polaris, and a new Open Semantic Interchange spec with 54 participating vendors.
The Iceberg v3 support adds VARIANT for semi-structured data, row lineage, deletion vectors, nanosecond timestamps, and geospatial types, closing gaps that previously made Iceberg impractical for many workloads.
Horizon Catalog now supports full bidirectional read and write access from external engines like Spark, Trino, and PyIceberg via vended credentials, meaning teams can define governance policies once in Snowflake and have them enforced across any compatible engine without data migration.
Zero-copy integrations with SAP (GA), Salesforce, and Workday (private preview) bring enterprise system data into Snowflake without ETL pipelines, preserving semantic context so AI agents reason over current, governed data rather than stale copies.
Managed Iceberg replication and failover are coming soon to GA, with an Optimized Refresh feature in public preview showing 1.6x to 22x faster replication performance in preview testing, which directly reduces Recovery Point Objectives for mission-critical workloads.
Horizon Context and Semantic View Autopilot (GA) addresses the semantic fragmentation problem by automatically generating and maintaining shared business definitions across databases, BI tools, and data pipelines, giving AI agents a consistent semantic layer rather than conflicting definitions across systems.

09:25 Snowflake CoCo: AI Coding Agent for the Modern Data Stack

Snowflake announced CoCo at Summit 2026, expanding it from an AI coding agent into a full AI development platform with a native desktop app for Windows and macOS, Cloud Agents running inside Snowsight, an Agent SDK, and upcoming Slack and mobile integrations.
Each Cloud Agent session provisions an isolated container that can run Python, shell commands, dbt builds, and web searches with no local setup required.
On the ADE-Bench framework from dbt Labs, CoCo achieved a 72.1% pass rate on real-world data engineering tasks, outperforming both Claude Code and OpenAI Codex, which each scored 65.1%.
CoCo also used 51% fewer tokens and completed tasks 8% faster than Claude Code on Opus 4.7, attributed to targeted data exploration and native tool integrations with Snowflake, dbt, and Airflow instead of bash-based workflows.
The CoCo Agent SDK packages the same agentic engine as an installable library for JavaScript and Python, giving developers programmatic access to Snowflake querying, SQL execution, codebase search, and file editing without building that infrastructure themselves. This allows platform engineers to embed CoCo into CI/CD pipelines, backend services, and internal tools.
Governance is enforced at the infrastructure level, with every CoCo operation running under the user’s existing Snowflake RBAC, LLM inference staying within Snowflake’s security perimeter, and full prompt logging, query tagging, and audit trails available for admin oversight.
This addresses a common gap where generic coding agents generate plausible-looking code that fails in governed production environments.

09:59 Snowflake CoWork: The Personal Work Agent for Every Knowledge Worker

Snowflake rebranded Snowflake Intelligence to CoWork, positioning it as a personal work agent for knowledge workers that combines proactive task automation, multi-agent orchestration, and persistent memory across sessions.
The system moves beyond reactive Q&A toward background monitoring, scheduled analysis, and direct action in tools like Slack, Gmail, Salesforce, and Jira via MCP connectors.
The upcoming Cortex Sense context layer is a notable technical addition, automatically learning business definitions from query history, dashboards, and metadata without manual configuration. Internal testing showed 83% accuracy on complex enterprise queries with Cortex Sense enabled, compared to 47% without it and 23% for frontier coding agents using Snowflake MCP.
Deep Research, moving to general availability soon, uses a multi-agent swarm orchestration system to analyze both structured and unstructured enterprise data in parallel, outperforming single-agent systems by over a third on Snowflake’s Hybrid Deep Research Benchmark.
This allows users to get fully cited analytical reports in minutes on questions that previously required days of analyst work.
Several features are entering public or private preview, including Memory for persistent user preferences, User Skills for recording reusable multi-step workflows, Async Agent API for long-running background tasks, and an iOS mobile app for full CoWork access on the go.
The governance model is worth noting for enterprise buyers: every agent action is scoped by role-based access controls, admin-defined policies determine what agents can do autonomously versus what requires human approval, and a complete audit trail logs all actions with policy reasons.

10:29 Justin – “I assume Anthropic will be suing them any moment for trademark infringement, but nice to see that you’re getting some smartness for the data friends who desperately need all the DevOps help they can get. So I appreciate they’re getting these tools.”

16:00 Anthropic urges global pause in AI development

Anthropic published a blog post calling on major AI labs to consider slowing development, citing the risk of recursive self-improvement, where AI systems could enhance their own capabilities without human intervention. Co-author Jack Clark estimated this could occur within two years.
The proposal draws a direct parallel to nuclear arms control, suggesting a global agreement and verification regime. Anthropic noted a key challenge: training runs are far easier to conceal than missile silos, raising practical questions about enforcement.
The call for a slowdown comes as Anthropic reported an annualized revenue run rate on track for $50 billion by the end of June 2026, up from $9 billion at the end of 2025, and filed confidential IPO paperwork at a valuation near $1 trillion.
Critics, including David Sacks, characterized the move as regulatory capture, arguing that established players advocating for development restrictions could disadvantage newer or smaller competitors in the AI space.
For cloud practitioners, the broader implication is that compute governance and training run transparency may become compliance considerations, particularly if international frameworks modeled on arms treaties gain traction among governments.

16:41 Ryan – “This has been what people have been sort of warning for ages with AI development, and this isn’t anything new. I’m surprised by the timing of it because it doesn’t make sense to me that they’re doing this now, but this is a huge concern. And I know just from trying to secure workloads in my day job, you try to put human and loop flows in place, but you know, people don’t really want to be in the loop. The whole advantage of using AI is the advantage the velocity gains. So having a human that does all the approval is problematic.”

20:04 Claude Fable 5 and Claude Mythos 5

Anthropic launched Claude Fable 5 for general availability and Claude Mythos 5 for restricted access, both priced at $10 per million input tokens and $50 per million output tokens, which is less than half the cost of the previous Mythos Preview model.
Fable 5 is the general-use version with safety classifiers active, while Mythos 5 is the same underlying model with certain safeguards lifted for vetted cybersecurity and biology partners.
The models introduce a tiered safety classifier system that automatically routes flagged requests in cybersecurity, biology/chemistry, and distillation categories to Claude Opus 4.8 instead of refusing outright.
Anthropic reports this fallback triggers in fewer than 5% of sessions, and external red-teaming found zero successful universal jailbreaks on harmful cyber queries across 30 different public jailbreak techniques.
On the software engineering side, Stripe reported Fable 5 completed a codebase-wide migration across a 50-million-line Ruby codebase in one day, a task estimated to take a full team over two months manually. The model also scores highest among frontier models on Cognition’s FrontierCode evaluation for production-quality coding standards.
Mythos 5 demonstrated autonomous scientific research capabilities, including outperforming a recently published Science journal model on a genomics task despite being 100 times smaller, and accelerating drug design workflows roughly ten times in internal protein design testing.
Anthropic is requiring 30-day data retention for all Mythos-class model traffic, including on third-party surfaces, specifically to detect novel jailbreaks and cross-request attacks, with explicit commitments not to use this data for model training.

23:34 Matt – “I would also say you gotta get the foundation of your house set up. So if you are patching, it’s not that you’re patching, it’s how you’re patching… I don’t want somebody, to use a very simple example, I have fifty EC2 instances or VMs, and to do patching, I can’t have somebody log into fifty VMs. That’s not sustainable, and that’s not gonna work. Ryan in security here will check the box saying you are doing patching, but I’ve wasted three people’s days on this. But if you build it out so that each thing is an auto scaling group and everything else, which is where you’re going with the CICD stuff, and you build that proper workflow out, then patching is just release the new image.”

Security

29:46 Dashlane explains how attackers managed to download encrypted password vaults

Attackers exploited Dashlane’s device enrollment API by brute-forcing six-digit one-time tokens sent to user email addresses, successfully registering new devices on fewer than 20 accounts and downloading encrypted vaults before automated lockouts stopped the campaign.
The attack highlights a known tradeoff in OTP-based authentication: six-digit numeric codes have only one million possible values, making them vulnerable to brute force if rate-limiting and lockout mechanisms are not sufficiently aggressive.
Downloaded vaults remain encrypted and unreadable without the user’s master password, which Dashlane never stores, so the practical risk to affected users depends entirely on the strength of their master password.
This incident is a useful case study for developers building device enrollment or account linking flows, as it demonstrates how API endpoints handling authentication tokens need strict rate limiting, anomaly detection, and account lockout thresholds to prevent automated abuse.

30:55 Ryan – “And right now, it’s the strength of that master password. But with quantum encryption, it’s going to be able to break through the algorithm generally.”

Cloud Tools

36:30 Hashicorp: rethinking infrastructure access in the age of agentic AI

HashiCorp Boundary addresses a growing security gap where AI agents need infrastructure access, but traditional IAM models were designed for human users with predictable access patterns.
The core value is giving each AI agent a unique identity with just-in-time credentials rather than static long-lived secrets.
Boundary’s credential injection feature means AI agents never directly handle or see credentials at any point during a session.
When paired with HashiCorp Vault, it generates short-lived dynamic credentials that expire after use, which limits the blast radius if an agent or orchestration layer is compromised.
The session-focused control plane enforces identity-aware authorization at the connection layer before infrastructure access is established, rather than relying on application-layer gateways. This means the entire network is abstracted away from agents, and all connections route through a Boundary proxy so only authorized identities can establish sessions.
The incident response use case in the article is worth noting because it shows each discrete action getting its own ephemeral session account that is deactivated once its purpose is fulfilled. This means standing privileges are continuously revoked rather than persisting across an agent’s entire operational lifetime.
Complete session recording and audit logging give security teams the ability to replay and review every action an AI agent took, tied to a specific operator, intent, and timeframe.
This addresses the compliance challenge organizations face when they cannot see or verify what autonomous agents are doing across their infrastructure.

37:52 Ryan – “I’m so annoyed by this because they’re like, this is rethinking an age of agentic AI. No, this is what we should do for all authentication, not just AI. It doesn’t treat anything about AI. It doesn’t identify AI agents. And it’s just setting up a user within HashiCorp boundary and then assigning that user to an agentic AI, just like a human. So this doesn’t actually address anything agentic. And these things should be patterns we need to be moving to in general.”

AWS

42:46 Improve your application resilience with Amazon Cognito multi-Region replication

Amazon Cognito now supports multi-region replication, automatically synchronizing user profiles, credentials, and pool configurations from a primary region to a secondary region of your choice.
This eliminates the need for custom-built replication solutions that previously created security risks and operational overhead.
The feature is read-only on the secondary side, meaning authentication continues during failover, but new user registrations and profile updates are unavailable. Teams should note that Lambda triggers, WAF configurations, and log streaming must be manually configured in the target Region separately.
A notable requirement is that customers must configure a multi-region customer-managed KMS key before enabling replication, and OIDC issuer endpoints must be updated across all client applications, including mobile app store resubmissions. This upfront migration work is a practical consideration before adoption.
Pricing is an add-on to existing Essentials and tiers, costing $0.0045 per monthly active user per replica Region on Essentials and $0.006 on Plus, with M2M authentication adding a 30% surcharge on standard token pricing. The feature is available across 20-plus Regions spanning North America, Europe, Asia Pacific, and South America.
For regulated industries like healthcare and financial services, the companion customer-managed keys feature provides encryption control that can help meet compliance requirements, and is available in a broader set of regions, including AWS GovCloud.

43:54 Matt – “… it’s just a nice quality of life improvement to actually get this out.”

45:36 Customize federated sign-in with new Amazon Cognito Lambda trigger

Amazon Cognito now supports an inbound federation Lambda trigger that intercepts federated authentication responses from external identity providers before user attributes are written to the user pool, giving developers programmatic control over attribute transformation, filtering, and enrichment.
For B2B and SaaS applications, the trigger solves a practical problem: enterprise SAML providers send hundreds of group memberships that exceed Cognito’s 2,048-character attribute limit, enabling developers to filter and normalize groups without coordinating changes with customer IT departments.
For B2C applications, the trigger enables automatic account linking across multiple sign-in methods by matching federated email addresses to existing local Cognito accounts, preventing duplicate user records when customers forget they already registered with a different provider.
The trigger runs on every federated sign-in rather than only at initial account creation, which means linking logic and attribute transformations apply continuously, and developers always have access to the latest IdP attributes.
Key implementation constraints to note: the Lambda function must complete within 5 seconds, errors in the function can block authentication for legitimate users, and automatic email-based account linking will not work with Apple’s Hide My Email feature.
The trigger is available now across all regions where Cognito is supported, with no separate pricing beyond standard Cognito and Lambda costs.

47:26 AWS Step Functions adds AgentCore-powered agentic reasoning step

AWS Step Functions now supports AI agent reasoning steps via an optimized integration with Amazon Bedrock AgentCore harness, currently in preview, allowing you to embed configurable AI agents directly into visual workflows without managing the underlying agent loop infrastructure.
Practical use cases include document classification, unstructured form extraction, and multi-agent pipelines where agents run in parallel or sequence with optional human approval gates at critical decision points.
Per-invocation overrides for model, system prompt, and tools let teams reuse a single harness configuration across different workflow contexts, and a session ID parameter enables agent context persistence within or across workflow executions.
Observability is built in through workflow execution history showing agent input, output, token usage, and duration, with links to detailed agent turn logs in Amazon CloudWatch for auditing every decision.
The integration is available in four regions (US East N. Virginia, US West Oregon, Europe Frankfurt, Asia Pacific Sydney) and follows standard Step Functions pricing with no additional integration charges, though standard Amazon Bedrock and AgentCore pricing still applies for model inference.

48:25 Ryan – “You know I lust over state machines, so I find it funny because this is all I think about when I’m putting an agent workflow together. This would be so much easier in a state machine. And so now they’ve done it. I will absolutely use this so much, because it’s something I already kind of do with lambda functions. It’s just now that I won’t have to define the logic as specifically. It’ll just be like four pages of markdown in my lab.”

51:29 Amazon Bedrock AgentCore Runtime introduces interactive shells for terminal access into agent sessions

Amazon Bedrock AgentCore Runtime now supports interactive shells via a new InvokeAgentRuntimeCommandShell API, giving developers a PTY-backed terminal over WebSocket directly into a running agent session, complementing the existing one-shot command execution API.
This is particularly useful for developers running coding agents like Claude Code or Amazon Kiro, allowing them to inspect files, run ad-hoc commands, and debug environment state as if working in a local terminal, with persistent state for environment variables and working directory across commands.
Each shell session is identified by a runtime session ID and shell ID, enabling manual reconnection after network drops, and a single agent runtime supports up to 10 concurrent shells for watching agents work across multiple branches simultaneously.
The CLI entry point is straightforward: agentcore exec –it –runtime followed by the runtime ARN, lowering the barrier for developers already familiar with standard terminal workflows to adopt the feature.
Pricing details are not specified in the announcement, so teams evaluating this feature should check the AgentCore Runtime pricing page directly before building workflows that depend on concurrent shell sessions at scale.

52:46 Matt – “Somebody needed it to debug some environment variable or working directory, and they were like, we could just quickly do this thing because it’s running ECS under the hood. We’ll just literally change the CLI call from AWS ECS exec to AWS Agent Core exec, and we’ve added a whole new feature, guys.”

53:12 AWS Cost Explorer launches intelligent cost explanations powered by Amazon Q

AWS Cost Explorer now includes an “Analyze with Amazon Q” button that generates automatic cost analysis covering trends, top drivers, and anomalies based on whatever filters and time period you have configured, eliminating the need to manually cross-reference multiple data points.
The feature adapts its output based on the date range selected, providing historical analysis for past periods, forecasts for future dates, or a combined view for mixed ranges, and maintains conversation context so you can ask follow-up questions to dig deeper.
This continues AWS’s pattern of embedding Amazon Q capabilities directly into existing console tools rather than requiring users to switch contexts, similar to integrations seen in services like CloudWatch and Security Hub.
From a practical standpoint, this is available in all commercial AWS Regions at no additional charge, meaning customers already using Cost Explorer can access it without budget considerations, though standard Cost Explorer usage costs still apply.
The most immediate use case is for teams doing monthly or quarterly cost reviews who previously had to manually build narratives around their spend data, as Q can now generate that explanation automatically as a starting point for optimization conversations.

54:10 Matt – “That will forever be my goal in life – understand what’s an EC2 other.”

54:20 AWS FinOps Agent is now available in preview

AWS FinOps Agent is now in preview at no additional charge, offering an AI-driven tool that answers cost questions, surfaces optimization recommendations, and runs scheduled FinOps workflows directly from the AWS Management Console.
The agent integrates with AWS Cost Optimization Hub and AWS Compute Optimizer to surface rightsizing, idle resource, and Savings Plans recommendations, and can automatically open Jira tickets to route action items to engineering teams.
Automated anomaly investigation is a notable capability here, where the agent detects cost spikes, investigates root cause, and posts findings to Slack without requiring manual triage from FinOps or engineering staff.
Preview availability is limited to US East (N. Virginia) for the agent itself, though cost and usage data cover all standard AWS Regions, excluding GovCloud and China regions.
Teams currently spending significant time on manual cost reporting and anomaly triage are the most likely to benefit, as the agent can generate reports for finance teams and handle recurring workflows on a user-defined schedule.

55:02 Justin – “This is kind of nice. I don’t know if it’s a full-featured solution for everybody, but it’s definitely something that’s gonna help you get started.”

GCP

56:52 Introducing Gemma 4 12B

Google released Gemma 4 12B, a multimodal model that runs locally on consumer hardware with 16GB of VRAM, positioning it between the smaller E4B and the larger 26B MoE model in the Gemma 4 family.
The model uses an encoder-free architecture, meaning vision inputs are processed through a lightweight embedding module and audio is projected directly into the same dimensional space as text tokens, reducing memory usage and latency compared to traditional separate encoder approaches.
Gemma 4 12B is the first mid-sized Gemma model to support native audio input, and it includes Multi-Token Prediction drafters to reduce inference latency for agentic workloads.
For GCP customers, the model can be deployed through Model Garden, Cloud Run with GPU support, and GKE, giving teams flexibility in how they operationalize it in production environments.
The model is released under Apache 2.0 and is available through Hugging Face, Kaggle, Ollama, and LM Studio, with fine-tuning support via Unsloth and inference support through vLLM, llama.cpp, and SGLang.
Google also released a Gemma Skills repository on GitHub to support agentic development patterns.

57:36 Gemma 4 with quantization-aware training

Google released Quantization-Aware Training checkpoints for Gemma 4, which integrates quantization directly into the training process rather than applying it afterward, resulting in better quality preservation compared to standard Post-Training Quantization approaches.
The mobile-specialized quantization schema reduces the Gemma 4 E2B model to under 1GB of memory by combining static activations, channel-wise quantization, targeted 2-bit compression for token generation layers, and embedding plus KV cache optimization.
For desktop and server use cases, QAT checkpoints are available in Q4_0 format with GGUF files ready for llama.cpp and compressed tensors for vLLM, with weights downloadable directly from Hugging Face at no cost for the model weights themselves.
Developers can selectively deploy only the modalities they need, such as text-only without audio or vision encoders, which further reduces the memory footprint and makes the models practical for constrained edge environments using Google’s LiteRT-LM runtime or Transformers.js for web deployment.
The release supports fine-tuning of QAT weights through Hugging Face Transformers and Unsloth, and also preserves the inference speedup from Multi-Token Prediction when using MTP QAT checkpoints, giving developers flexibility to optimize for both quality and throughput simultaneously.

58:17 Ryan – “These are things we need Jonathan for.”

58:45 Gemini models for Apple developers

Google announced that Apple developers can now access cloud-hosted Gemini models through Apple’s Foundation Models framework via the Firebase Apple SDK, starting with iOS 27, macOS 27, and related platforms.
The integration allows developers to swap between on-device Apple models and cloud-hosted Gemini models using the same API surface, which simplifies building agentic app experiences.
The integration is built on Firebase AI Logic, which removes the need to build and maintain a separate backend server for Gemini model access.
Firebase App Check is included to protect service APIs from abuse, addressing a common production security concern.
Gemini is also being integrated directly into Xcode as an agentic coding assistant for multi-step development tasks like code review, bug fixing, and feature building. Authentication supports both individual developers using a self-serve Gemini API key from Google AI Studio and enterprise teams using the Gemini Enterprise Agent Platform for dedicated quotas and data privacy controls.
Pricing has two tiers: individual developers can start with a free tier through Google AI Studio at ai.google.dev, while enterprise developers access dedicated corporate quotas through the Gemini Enterprise Agent Platform.
The preview release of the Foundation Models framework integration was set to begin the day after the WWDC announcement.
This is a practical option for iOS and macOS developers who want to add cloud AI capabilities without leaving the Apple development ecosystem or managing separate infrastructure.
The shared API surface between local and cloud inference is particularly useful for managing latency and cost tradeoffs in production apps.

1:00:01 Ryan – “I love the Apple Google partnership on this. You know, I’m really happy that Apple didn’t decide to develop its own frontier model and just muddy that space.”

Azure

1:03:27 New Azure Cobalt 200 VMs deliver 50% performance improvement, fully optimized for modern agentic AI workloads

Azure Cobalt 200 Arm-based VMs are now in early access preview, built on the Arm Neoverse V3 core and fabricated on TSMC’s 3nm process, delivering up to 50% better CPU performance over Cobalt 100 with up to 128 vCPUs per VM.
Real workload benchmarks show up to 135% better performance for database workloads and up to 80% better performance for caching workloads compared to the previous generation.
The VMs are specifically designed for agentic AI workloads, where continuous reasoning and sequential decision-making require sustained per-core performance and low latency.
Each physical core gets dedicated 3 MB of L2 cache and a 192 MB system-level L3 cache, allowing more agent sandboxes per VM without sacrificing throughput.
Cobalt 200 expands the Arm VM portfolio with two new families beyond what Cobalt 100 offered: High-Memory Optimized Mpsv4 VMs and Dense Local Storage Lpsv5 VMs, with all series delivering up to 85 Gbps network bandwidth and 70 Gbps remote storage throughput.
Memory encryption is enabled by default through a custom memory controller with negligible performance impact.
Microsoft’s own services, including Dataverse and Azure SQL Database, are already validating Cobalt 200, with Dataverse reporting up to 60% better performance over Cobalt 100.
Migration from Cobalt 100 is described as seamless, with full compatibility across existing workloads and support for AKS Arm nodes, GitHub Actions runners, and major languages including Python, Java, and .NET.
The preview is currently available in eight regions, including West US3, East US2, Central US, and Sweden Central, with additional regions to follow. Pricing is not yet publicly specified, so teams evaluating cost should sign up at aka.ms/Cobalt200VMs-signup for early access details.

1:04:44 Matt – “It’s great that they added this; I feel like they’re finally getting into the game of ARM. Getting capacity for them might require some twisting of your account team’s arm, especially if you want them at any scale. But the other problem is, which I still find comical, is that you can’t run Windows Server on ARM.”

1:06:58 Foundry IQ: Build smarter agents faster with unified knowledge and serverless retrieval

Foundry IQ is Microsoft’s unified knowledge platform for AI agents, now generally available with full SLA coverage, stable APIs, and compliance certifications.
It lets developers connect multiple data sources like Azure Blob Storage, OneLake, and web content into a single knowledge base without building custom connectors for each system.
The new Serverless Developer tier, in public preview, scales to zero when idle and bills by Compute Units measured in 0.25 CU increments per minute. Billing is not expected to begin until late 2026, so developers can experiment at no cost for now, accessible through the Foundry portal at ai.azure.com.
Agentic retrieval quality improvements show up to 20% better answer quality benchmarks and up to 54% improved recall compared to single-shot RAG, achieved through better query batching, semantic ranking, and server-side token caching to reduce redundant token consumption across multi-turn conversations.
The Foundry IQ MCP server exposes knowledge bases as a remote Model Context Protocol server, making them accessible from Claude, ChatGPT, LangChain, and the Microsoft Agent Framework without framework-specific integrations.
New security capabilities in preview include cross-tenant customer-managed keys using federated identity credentials, Purview sensitivity-label auditing, and incremental SharePoint permissions sync, keeping enterprise data governance intact as content flows into agent workflows.

1:10:26 Generally Available: Azure Database for PostgreSQL – Flexible Server: DuckDB extension

Azure Database for PostgreSQL Flexible Server now supports the DuckDB extension in general availability, allowing users to run analytical workloads directly within their PostgreSQL environment without moving data to a separate system.
DuckDB is an in-process analytical database engine optimized for OLAP queries, so this extension lets PostgreSQL users run fast column-oriented analytics alongside their transactional workloads in the same managed service.
This is particularly useful for data engineers and developers who want to avoid the overhead of spinning up separate analytics infrastructure, since DuckDB can query large datasets efficiently using familiar SQL syntax.
The feature falls under the Databases and Hybrid plus multicloud categories, suggesting Microsoft sees this as relevant for customers running mixed workloads or integrating PostgreSQL with other data sources across environments.
Pricing for this extension was not specified in the announcement, so customers should check Azure Database for PostgreSQL Flexible Server pricing pages directly, as costs will likely depend on existing compute and storage tiers rather than a separate charge for the extension itself.

1:10:50 Justin – “I remember when there were companies that made nothing but columnar databases. Now you just get it as an extension on top of PostgreSQL. Kind of impressive. I bet those companies aren’t doing well these days.”

51:03 Global PTU Reservations Are Now Region-Agnostic

Azure’s Global PTU (Provisioned Throughput Unit) reservations are now region-agnostic as of June 2026, meaning a single reservation can cover AI model deployments across multiple regions instead of requiring separate per-region commitments.
The practical benefit here is reduced stranded capacity. Previously, if you over-provisioned in one region and under-utilized in another, you were paying for unused reservations. Now a single pool covers wherever your workload actually runs.
This is specifically tied to Microsoft Foundry (formerly Azure OpenAI Service infrastructure), so it targets customers running high-throughput AI inference workloads who need predictable performance and cost at scale.
From a cost management standpoint, consolidating reservations simplifies billing and procurement, which matters for enterprises managing AI spend across multiple geographic deployments. Specific pricing still depends on model type and throughput tier, so customers should check the Azure pricing calculator for their specific use case.
The flexibility to deploy where capacity is available without reservation constraints is a practical operational improvement, particularly useful during regional capacity crunches that have been a known pain point for provisioned throughput customers.

1:12:02 Justin – “Good! Glad you learned what the word ‘global’ means.”

1:15:30 Generally Available: Azure API Management Premium v2 and Standard v2 now support wildcard custom hostnames

Azure API Management Premium v2 and Standard v2 now support wildcard custom hostnames, meaning a single entry like *.api.contoso.com and one wildcard certificate can cover all subdomains automatically instead of requiring separate configuration per subdomain.
The practical benefit is reduced operational overhead at scale. A team onboarding ten new API surfaces previously needed ten separate domain and certificate management tasks, and wildcard support eliminates that repetitive work.
This capability is now available on both Standard v2 and Premium v2 tiers, which means organizations do not need to move to higher-tier deployments just to get flexible domain management. Pricing details are not specified in the announcement, so listeners should check the Azure API Management pricing page for tier comparisons.
Target use cases include rapidly growing API environments with dynamic subdomains, such as microservices architectures or multi-tenant platforms, where new API surfaces are frequently added, and consistent branded endpoints matter.
The feature reached general availability in June 2026 and was announced at Microsoft Build. Teams currently managing large API estates with manual per-subdomain hostname configurations would benefit most from evaluating this update.

Emerging Clouds

1:22:25 Full Stack Observability for AI | CoreWeave Solution Brief

CoreWeave Mission Control is an AI-native observability platform that provides end-to-end visibility across infrastructure, clusters, and workloads, addressing a gap that general-purpose monitoring tools often miss in GPU-heavy environments.
The platform combines real-time telemetry with GPU utilization analytics, which is particularly relevant as organizations struggle to justify and optimize the cost of large-scale GPU deployments.
Audit-ready logging and automated operational insights suggest the platform is targeting enterprise customers who need compliance documentation alongside performance monitoring, not just raw metrics.
The full-stack framing here is notable because AI workloads span multiple layers simultaneously, from bare metal GPU performance up through cluster orchestration and individual job execution, making siloed monitoring tools less effective.
For teams running inference or training at scale on CoreWeave, tighter observability tooling built into the platform could reduce the engineering overhead of stitching together third-party solutions like Prometheus, Grafana, and custom GPU exporters.

Closing

357: Cache Me If You Can - Now With Durability

Wed, 10 Jun 2026 16:55:36 +0000

Welcome to episode 357 of The Cloud Pod, where the weather is always cloudy! Justin and Matt are in the studio this week to bring you all the latest in cloud and AI news! Is AI costing more than the people it replaced? Are CEO’s suffering from AI psychosis? Is Opus 4.8 better than 4.7? We answer all of these questions and more this week – so let’s get started!

Titles we almost went with this week

Valkey Stops Forgetting Your Data Like Your Ex
AI Coding Tools Cost More Than the Coders They Replace
Microsoft Discovers AI Budgets Burn Faster Than Enthusiasm
Executives Caught Hallucinating About AI Productivity Gains
ABBA Said ” Dancing Queen”, but Google Said Data Center
AI Now Tells Your AWS Apps How Fragile They Really Are
Stop Playing VM Whack-a-Mole With Maintenance Windows
Chaos Engineering for Apps Too Scared to Change
AWS Rewires the Data Center With One Weird Optical Trick
IAM the One Spending All Your Bedrock Money
SQL Server Licenses Finally Pack Their Own Bags
When AI Hype Meets Productivity Research, It Hurts
CEOs Gone Wild: Demos Versus Deployment Reality
Serverless Search Finally Learned to Nap Between Requests
ElastiCache Finally Remembers Things After a Reboot
Valkey Gets Durable So Your Data Stops Ghosting You
Zero Data Loss Without Losing Your Microseconds Too
Microsoft Build 2026 Scout AI and Quantum Dreams

A big thanks to this week’s sponsors:

Check out thecloudpod.net/archera to schedule a demo today.

General News

01:45 Microsoft data suggests using AI is more expensive than hiring people:

Microsoft canceled most internal Claude Code licenses just months after encouraging widespread adoption, redirecting employees to GitHub Copilot CLI instead.
This does not affect the broader Foundry partnership with Anthropic, but it signals that token costs at scale have become difficult to justify internally.
Uber’s situation adds context here: the company reportedly burned through its entire 2026 AI coding tools budget in four months after internal teams were incentivized to compete on usage. This illustrates how adoption incentives can create runaway costs that outpace projected savings.
The core economic tension worth discussing is whether AI tooling costs at scale can undercut the labor-savings argument.
When compute bills approach or exceed payroll savings, the ROI case for broad AI deployment gets more complicated for finance and engineering leaders to defend.
Companies appear to be responding with tighter governance rather than full rollbacks, including usage caps, narrower approvals, and more targeted deployments focused on measurable productivity gains. This suggests the industry is moving toward a more selective model of AI access rather than open adoption.
There is also an infrastructure cost layer beyond software licensing, as AI workloads drive substantial data center energy and water consumption. For cloud users, this has downstream implications for pricing on enterprise tools and digital services as providers absorb those operational costs.

03:09 Justin – “This is going to be interesting to see what happens in the FinOps space, as we start getting more maturity in that area, and we start seeing open models become a bigger deal and customers looking at different options beyond the foundational models.”

06:13 Tech CEOs are apparently suffering from AI psychosis

Box CEO Aaron Levie coined the term “AI psychosis” to describe how executives overestimate AI capabilities because they interact with polished demos and prototypes rather than the messy last-mile work required to actually deploy and maintain AI systems in production.
The layoff data is worth noting: 115,430 tech workers have been cut in just the first five months of 2026, nearly matching all of 2025, with many companies citing AI productivity gains as justification even when other business factors are driving the decisions.
The research does not support the productivity assumptions behind these decisions. A UC Berkeley meta-analysis found no robust relationship between AI adoption and aggregate productivity gain, and MIT researchers project agents will reach base competence on most text tasks by 2029 and will need additional years to outperform humans.
A Harvard Business Review study identified a practical bottleneck problem: when AI increases output volume across an organization, the constraint shifts to the executives who must review and authorize that output, which can create organizational slowdowns rather than efficiency gains.
For cloud practitioners and developers, the practical takeaway is that AI agents still require substantial human review for code, contract terms, and hallucinated library calls, meaning infrastructure and workflow designs should account for human-in-the-loop requirements rather than assuming full automation.

08:30 Matt – “You still need a human in the loop on a lot of things.”

AI Is Going Great – or How ML Makes Money

09:40 Introducing Always-On pricing: automatic savings for Databricks Lakebase

Databricks is introducing Always-On pricing for Lakebase, its managed Postgres offering, which gives a 25% discount on baseline compute capacity while retaining full autoscaling for traffic spikes, eliminating the traditional forced choice between provisioned and serverless database tiers.
The pricing model activates automatically after 24 hours of continuous use with scale-to-zero disabled, requiring no new contracts, no downtime, and no separate product provisioning, just a minimum compute unit configuration change.
Databricks recommends keeping scale-to-zero as the default for new or intermittent workloads where load patterns are unknown, and switching to Always-On only once historical usage data shows a consistent baseline floor of activity.
Through January 31, 2027, an additional 50% promotional discount stacks on top of the Always-On rate, which could meaningfully reduce costs for production Postgres workloads running continuously on Lakebase.
This commercial model change builds on existing Lakebase technical differentiators like storage-compute separation and instant branching, and is positioned as a cost management option for teams running established production workloads rather than a new infrastructure architecture.

11:33 Justin – “…the reality is that a lot of the things that you pay for are the automation and deploying the server and updating the things. And so if you’re not running those code paths, and you’re not running those architectures, then the company’s also saving money, which is why they can turn some of that savings over to you.”

12:00 Introducing Claude Opus 4.8

Anthropic released Claude Opus 4.8, an incremental upgrade over Opus 4.7 with improvements in agentic task performance, tool calling efficiency, and honesty.
Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, while fast mode is now three times cheaper than previous models at 2.5x the speed.
A notable reliability improvement is that Opus 4.8 is approximately four times less likely than Opus 4.7 to let code flaws pass unremarked, and early testers report it proactively flags uncertainties rather than making unsupported claims.
On the Super-Agent benchmark, it completed every case end-to-end and scored 84% on Online-Mind2Web for browser-agent tasks.
Dynamic Workflows is a new research preview feature in Claude Code for Enterprise, Team, and Max plans that lets the model plan and run hundreds of parallel subagents in a single session. This enables codebase-scale migrations across hundreds of thousands of lines of code from start to merge, which is a practical capability for large engineering teams.
The Messages API now accepts system entries inside the messages array, letting developers update Claude’s instructions mid-task without breaking the prompt cache. This is useful for agentic workflows where permissions, token budgets, or environment context need to change as a task runs.
Anthropic previewed a higher-capability model class called Mythos, currently limited to cybersecurity use cases under Project Glasswing, with broader availability expected in the coming weeks pending additional safety safeguards.

12:40 Introducing dynamic workflows

Anthropic launched dynamic workflows in Claude Code as a research preview, available in the CLI, Desktop, and VS Code extension for Max, Team, and Enterprise plans, as well as via the Claude API on Amazon Bedrock, Vertex AI, and Microsoft Foundry.
The core capability lets Claude dynamically write orchestration scripts that spin up tens to hundreds of parallel subagents in a single session, with independent verification agents checking results before they surface to the user, making it suited for large-scale tasks like codebase-wide security audits or multi-thousand-file migrations.
A concrete example is the Bun runtime rewrite from Zig to Rust, where dynamic workflows produced roughly 750,000 lines of Rust with 99.8% of the existing test suite passing in eleven days, with hundreds of agents working on files in parallel and two reviewers per file.
Token consumption is a meaningful consideration here, as dynamic workflows use substantially more tokens than a standard Claude Code session, and Anthropic recommends starting with scoped tasks to understand usage before scaling up.
For Enterprise plan users, dynamic workflows are off by default and require admin enablement, while Max and Team plan users have them on by default and can trigger them by asking Claude directly or enabling the Ultracode setting via the effort menu.

13:38 Matt – “I feel like every week, Anthropic is just on a roll. I switched over to it; I feel like I saw some improvement, but not that much. You read the internet, and everyone was complaining about 4.7 and 4.6, but I would kind of like to see them update some of the older models, too – Sonnet and Haiku.”

16:40 Anthropic raises $65B in Series H funding at $965B post-money valuation

Anthropic closed a $65 billion Series H round at a $965 billion post-money valuation, with run-rate revenue crossing $4.7 billion earlier this month, reflecting substantial enterprise adoption of Claude across global organizations.
The funding includes $15 billion from hyperscalers, with Amazon contributing $5 billion, and Anthropic has signed compute agreements totaling up to 10 gigawatts of capacity across Amazon, Google/Broadcom TPU infrastructure, and SpaceX Colossus GPU clusters.
Claude is now available on all three major cloud platforms (AWS, Google Cloud, and Microsoft Azure), with AWS remaining the primary cloud and training partner, giving enterprise customers flexibility in how they deploy and integrate the models.
Strategic hardware partnerships with Micron, Samsung, and SK hynix signal that Anthropic is securing memory and storage supply chain relationships directly, addressing compute scaling constraints at the infrastructure level rather than relying solely on cloud providers.
Claude Opus 4.8 was also announced alongside this funding news, targeting stronger performance in coding, agentic tasks, and long-running professional workflows, which aligns with the enterprise deployment focus described throughout the funding announcement.

Cloud Tools

19:27 Announcing no-code application fault injection

Gremlin’s Failure Flags by proxy lets teams run fault injection tests on serverless applications by routing traffic through a sidecar container, requiring zero code changes to the application itself. This addresses a longstanding gap where serverless platforms lack the infrastructure-level access needed for traditional reliability testing.
The proxy approach supports common failure scenarios like dropping availability zone-specific traffic, injecting latency, and generating exceptions to test error-handling logic.
It works across Kubernetes, AWS Lambda, AWS ECS, and Pivotal Cloud Foundry.
Intelligent Health Checks automatically establish baseline metrics for network throughput, latency, and error rate, then halt tests if any metric exceeds its threshold during a test run. This removes the need to configure separate observability integrations or set up API keys.
The practical value here is that teams can validate failure modes in serverless environments that were previously difficult or impossible to test, such as bad API responses, corrupted payloads, and message ordering issues. This gives engineering teams documented evidence of resilience rather than relying on assumptions.
The no-code deployment model lowers the barrier for teams that want chaos engineering coverage without modifying application code or waiting on development cycles to instrument SDKs.

20:29 Justin – “This is a nice enhancement to the Gremlin platform if you are trying to do chaos engineering, although AI does a pretty good job at causing chaos engineering too.”

AWS

22:03 Introducing the next generation of Amazon OpenSearch Serverless for building your agentic AI applications

AWS announced the next generation of Amazon OpenSearch Serverless, which scales from zero to thousands of requests per second and back to zero when idle, offering up to 60% cost savings compared to provisioned OpenSearch Service clusters sized for peak capacity.
The new generation provisions resources in seconds and scales capacity up to 20 times faster than the previous generation, supporting full-text search and vector search collection types with an Express create option that requires no manual configuration.
Native integrations with Vercel and Kiro allow developers to deploy search and vector backends for AI agents directly from those platforms, and the OpenSearch Agent Skills repository provides pre-built domain knowledge and multi-step execution logic for common agent workflows.
Pricing is consumption-based using OpenSearch Compute Units for indexing, search, and GPU acceleration, with storage billed separately per GB-month. The classic OpenSearch Serverless infrastructure remains available for existing users who prefer it.
The next generation is generally available today across all AWS commercial regions where OpenSearch Serverless is currently supported, making it accessible without any regional rollout delays for most customers.

23:42 Matt – “I feel like the ability of these systems to actually scale down to zero was minimal, even Aurora never truly scaled down to zero…So this type of scale up the capability, especially with it going faster, is just going to be really good – potentially for production workloads too.”

24:14 AWS Shield Advanced introduces DDoS attack flow logs

AWS Shield Advanced now provides packet-level DDoS attack flow logs, capturing source and destination IPs, ports, protocols, packet counts, and source country data during active attacks, published to S3, CloudWatch Logs, or Data Firehose at 5-minute intervals.
This fills a notable visibility gap for security teams, enabling post-incident forensic analysis and threat intelligence gathering that was previously difficult without third-party tooling or manual packet capture setups.
The logs integrate naturally with existing AWS analytics tools and workflows, meaning teams can pipe data into Athena, OpenSearch, or third-party SIEMs without significant new infrastructure investment.
A practical consideration: flow logs are only generated during active attacks and require Shield Advanced to already be configured on protected resources, so this is an add-on capability rather than a standalone offering. Shield Advanced pricing starts at $3,000 per month per organization.
Compliance teams benefit directly here, as the structured log data provides an auditable record of DDoS events, which is useful for regulatory reporting in industries like finance and healthcare.

24:29 Justin – “Thank you? You only took a hundred years to get us this quality of life improvement.”

25:06 Amazon Thinks the Future of Data Centers Depends on a Technical Problem It Just Solved

AWS has been deploying a new networking architecture called RNG (Resilient Network Graphs) in data centers since late 2024, starting in Dublin and expanding to Germany and Spain, with most newly built data centers now using the design.
RNG uses a quasi-random flat network topology that eliminates the traditional fat-tree hierarchy of switches and routers, addressing longstanding inefficiencies in data center cabling and routing that have persisted since the mid-1980s.
The reported performance numbers are notable: 69 percent fewer routers and switches, 33 percent higher data throughput, 40 percent reduction in network power consumption, and 27 percent lower operating costs compared to traditional network designs.
A key hardware component is the ShuffleBox, a new optical device Amazon developed internally that physically organizes and shuffles cable connections between routers, replacing the tangled cable bundles typical of fat-tree setups with a more structured physical layout.
Notably, Amazon says RNG is not optimized for AI training workloads, which require more coordinated and centrally orchestrated data patterns, so this is primarily an efficiency improvement for general cloud infrastructure rather than a direct response to AI compute demand.

26:26 Justin – “You know, in a situation where every customer has their own VPCs, you know, the reality is the network has to constantly morph and evolve, and so, it’s good to see they’ve done this. And I’m glad to see that this is solving a big problem for them – fully powered by the fact that they have ASICs that can custom do this work.”

27:55 Amazon RDS for SQL Server supports Bring Your Own Media

Amazon RDS for SQL Server now supports Bring Your Own Media (BYOM), allowing customers to reuse existing Microsoft SQL Server licenses, including Software Assurance, through Microsoft’s License Mobility program when migrating to RDS.
This feature directly addresses a common migration blocker where organizations were either paying for duplicate licenses or waiting for existing agreements to expire before moving to a managed database service.
BYOM integrates with AWS License Manager, giving customers a centralized way to track SQL Server license usage across their AWS environment and maintain licensing compliance.
The feature targets customers running SQL Server on-premises, on other clouds, or as self-managed instances on EC2 who want the operational benefits of RDS, such as automated backups, high availability, and monitoring without additional licensing costs.
Pricing for BYOM differs from standard RDS SQL Server pricing, so customers should review the Amazon RDS for SQL Server pricing page at aws.amazon.com/rds/sqlserver/pricing for regional availability and specific cost details before planning a migration.

28:03 Justin – “You could always bring your own media to install SQL Server. This is really about bringing your own licensing.”

30:21 Amazon ElastiCache for Valkey now supports durability

ElastiCache for Valkey now supports durability via a Multi-AZ transactional log, allowing it to serve workloads where data loss is unacceptable, not just traditional caching scenarios.
Two write modes are available: synchronous writes guarantee zero data loss at single-digit millisecond write latency, while asynchronous writes maintain microsecond latency with a potential loss window of up to 10 seconds, and come at no additional cost.
Both options preserve microsecond read latency, meaning customers do not have to trade read performance for durability, which is a meaningful distinction compared to traditional durable databases.
AWS specifically calls out AI-oriented use cases like agent long-term memory, RAG knowledge bases, and workflow state management, positioning ElastiCache as a viable primary store for latency-sensitive AI applications rather than just a cache layer.
The feature is available now in all commercial, China, and GovCloud regions, starting with Valkey 9.0, and can be enabled at cluster creation via Console, SDK, or CLI.
Pricing details are on the ElastiCache pricing page since synchronous durability costs differ from the free asynchronous option.

31:12 Matt – “I understand the use cases, but I still say you’re using cache wrong.”

32:11 AWS Cost and Usage Report 2.0 now supports Athena and Redshift integration

CUR 2.0 now matches CUR 1.0’s Athena and Redshift integration capabilities, closing a feature gap that had been a barrier for customers considering migration to the newer report format.
When selecting Athena or Redshift integration, exports are automatically delivered in the optimal format, either Parquet or GZIP, along with infrastructure templates, table definitions, and data loading instructions, removing the need for manual configuration or custom ETL pipelines.
Cost data refreshes in CUR 2.0 are automatically reflected in Athena and Redshift tables, meaning customers can query up-to-date billing data using standard SQL without building or maintaining additional data pipeline infrastructure.
Pricing for this feature follows existing costs for the underlying services: S3 storage for the exports, Athena query costs at $5 per TB scanned, and Redshift cluster or Serverless costs depending on the query engine chosen.
This feature is available across all commercial AWS regions but excludes GovCloud US and China regions, which is worth noting for customers operating in those environments who may still need to rely on CUR 1.0 or custom solutions.

33:17 Justin – “I built this pipeline a couple of times, and use the new the newer format because it’s much better. But this is actually even better because they’ve kind of automated some of the other sharp edges of dealing with the current, like the pricing lists and that stuff, and the automatic index updates for Athena. So this is a nice quality of life improvement.”

GCP

34:20 Introducing Google AI Threat Defense to help you outpace the adversary

Google AI Threat Defense is a new automated security system that combines Wiz for exposure mapping, CodeMender for code remediation, Gemini for AI reasoning, and Mandiant for threat intelligence into a single vulnerability management workflow.
The goal is to shrink remediation time from weeks to minutes by automating the scan, prioritize, remediate, and monitor cycle.
The multi-model approach is a notable technical detail here: Google explicitly acknowledges no single AI model catches all vulnerability types, so the platform uses multiple frontier models via the Gemini Enterprise Agent Platform to cover application logic, cloud configuration, binary analysis, and exploitability validation across different asset types.
CodeMender is the code remediation agent at the center of the fix workflow, generating patches directly in a developer’s IDE or CLI, rewriting code to memory-safe languages, and automatically generating tests to verify fixes before deployment. It integrates with Wiz and a tool called Antigravity to coordinate library dependency changes across source control and production environments.
Wiz’s context-aware pen-testing agent continuously simulates attacks to validate exploitable paths, including application-layer and identity-driven risks, which distinguishes this from traditional attack surface management tools that only identify what is exposed without confirming actual exploitability.
Pricing details are not publicly disclosed in the announcement. Ecosystem partners, including Accenture, Deloitte, PwC, Netenrich, and TENEX.AI will handle deployment, ongoing management, and custom workflow integration for enterprise customers.

35:41 Matt – “Here’s my wallet, just set it on fire.”

37:44 Vibe-coded AI Studio apps with Firestore, Firebase, Cloud SQL

Google AI Studio now supports full-stack app deployment to Cloud Run with either Firestore for document storage or Cloud SQL for PostgreSQL as a relational option, with the AI agent automatically selecting the appropriate database based on your prompt. This removes a common decision point for developers prototyping new applications.
New users can deploy up to two full-stack applications through the Google Cloud Starter Tier at no cost and without a billing account, lowering the barrier for developers who want to test production-grade infrastructure before committing to a paid plan.
Cloud SQL integration uses a new PostgreSQL developer edition that scales to zero when not in use, meaning you only pay during active usage. Cloud SQL support on the Starter Tier is noted as coming next month, so it is not fully available at announcement time.
Firebase Auth serves as the single login layer across the stack and enables Google Workspace integrations, including Sheets, Calendar, and Gmail, through a standard Sign in with Google flow.
The agent handles provisioning authentication, Firestore security rules, and database connections automatically, though Google notes users should review security rules before sharing apps.
When a project outgrows the Starter Tier limits, resources transfer directly to a standard billable Google Cloud project without requiring a rebuild, providing a straightforward path from prototype to production at aistudio.google.com.

40:22 Justin – “I think this is an answer to Vercel, right? A lot of developers right now are doing POCs and building apps on top of Vercel because of how easy it is, and how much they don’t have to do. And so I feel like this is a direct response to that, in many ways.”

40:39 Nano Banana 2 and Nano Banana Pro available for everyone

Imagen 4 (marketed as Nano Banana 2) and Imagen 4 Pro (Nano Banana Pro) are now generally available on Vertex AI via the Gemini API, with enterprise SLA support for production deployments.
Both models support 1K and 2K image output at GA, with 4K output still in preview.
A notable new preview capability allows Nano Banana 2 to accept video files as input, enabling the model to analyze visual context and actions within footage to generate context-aware images like thumbnails and infographics. This extends the model beyond text, PDF, and image inputs.
Real-world adoption is already visible across retail and media, with Shopify using the models for product photography expansion, URBN compressing its trend-to-market pipeline, and WPP integrating them into its agentic marketing platform for clients like Verizon and Unilever.
Pricing is usage-based and varies depending on output resolution and throughput tier, with provisioned throughput options available for enterprise-scale deployments at cloud.google.com/gemini-enterprise-agent-platform/models/provisioned-throughput. Developers wanting to experiment without an enterprise SLA can access both models through the standard Gemini API.
The broader pattern here is Google positioning these image models as components within agentic creative workflows rather than standalone tools, which aligns with the Vertex AI platform strategy of bundling multimodal capabilities into end-to-end pipelines.

43:20 AlloyDB Hot Standby: Faster Failovers & Consistent Performance

AlloyDB for PostgreSQL now offers Hot Standby HA, where the standby node continuously applies write-ahead logs from the primary instead of sitting idle, eliminating the database startup phase during failover and reducing downtime to approximately 15 seconds in testing.
The key practical benefit beyond faster failover is post-failover performance consistency.
Because the standby node keeps its buffer cache warm by actively replaying logs, the new primary serves requests at normal throughput almost immediately, rather than degrading for several minutes while caches rebuild from disk.
Hot Standby is rolling out automatically to newly created AlloyDB instances running PostgreSQL 18, with earlier major versions to follow in the coming months. Google states this enhancement comes at no additional cost and remains covered under the existing 99.99% SLA.
Enterprises with low-tolerance workloads, such as financial services, e-commerce, or any application where post-failover performance degradation causes downstream problems, stand to benefit most, since the legacy HA model could leave systems running at reduced throughput for several minutes after a failover event.
For teams evaluating managed PostgreSQL options, this change narrows the gap between managed database services and self-managed setups, where hot standby configurations have long been standard practice.
More details are available at cloud.google.com/alloydb/docs/high-availability.

43:48 Justin – “This is nice if you need high performance from AlloyDB.”

45:21 AlloyDB Remote MCP Server GA: Secure AI Agent Access to Your Data

The Remote MCP Server for AlloyDB is now generally available, giving AI agents a managed HTTP endpoint to securely query operational database data without the infrastructure overhead of local MCP server deployments.
This is part of Google’s broader rollout of 50+ managed MCP servers across its cloud services.
On the security side, the integration uses IAM for fine-grained access control down to specific tables or views, includes Model Armor for prompt injection and data exfiltration protection, and routes all queries through Cloud Audit Logs.
This addresses a real concern for teams connecting AI agents to sensitive production databases.
AlloyDB’s vector capabilities are a notable part of the pitch here, with support for over 10 billion vectors, up to 6x faster vector queries than standard PostgreSQL via the ScaNN index, and built-in AI functions for generating embeddings and reranking results. These features make it a practical backend for RAG-style agentic applications.
The Lakehouse Federation capability lets agents query AlloyDB operational data alongside BigQuery analytical data and Iceberg tables through a single PostgreSQL interface, reducing the need to move or duplicate data across systems.
Pricing is not detailed in the announcement, but AlloyDB offers a 30-day free trial cluster, and the MCP server runs on existing AlloyDB infrastructure.
A hands-on Codelab is available at codelabs.developers.google.com/alloydb-ai-mcp for teams wanting to evaluate the setup.

45:40 Justin – “…managed MCPs are definitely all the rage right now. All the cloud providers are dropping them. I like this one, though, that’s kind of interesting. I don’t know that this should be your primary interface to a DB performance scale, but if you need an interface for an admin or for a user to do more ad hoc querying, this is way better than giving them a select star against the database.”

48:15 GKE standby buffers speed up autoscaling for less spend

GKE standby buffers address the longstanding tradeoff between overprovisioning costs and slow cold starts by suspending pre-initialized nodes to disk, releasing compute and memory costs while retaining only persistent disk and IP address charges.
This results in cost overhead in the low single-digit percent range compared to full overprovisioning.
Standby buffers resume 2-3x faster than provisioning fresh nodes, and when combined with active buffers, the two work in sequence: active buffers handle the immediate spike while standby nodes resume to cover sustained load. Benchmarks showed P50 latency dropping from 4-6 minutes to single-digit seconds under identical traffic conditions.
The feature replaces operationally complex workarounds like balloon pods and lowered HPA thresholds with a declarative CapacityBuffers API, where you simply define how much headroom you need, and GKE manages the rest.
Early customer results from Unico showed time-to-ready dropping from several minutes to 30 seconds.
Practical use cases include agentic workloads, CI/CD pipelines, batch jobs, game servers, and any spiky traffic pattern where scheduling latency matters. GKE benchmarks showed sub-second Agent Sandbox scheduling latency at up to 90% lower cost compared to complete overprovisioning.
Standby buffers are available for GKE clusters running version 1.36.0-gke.2253000 or later, and Google has published an open-source buffer sizing simulator at github.com/gke-labs/buffers-simulator to help teams tune buffer sizes for their specific performance targets.

49:45 Blue, yellow and green: Google invests in its first data center in Sweden.

Google broke ground on its first data center in Horndal, Sweden, expanding GCP infrastructure into the Nordic region to support growing demand for Search, Google Cloud, and YouTube services.
The facility uses air cooling instead of water cooling, which reduces water consumption compared to traditional data center designs, and includes off-site heat recovery to supply warmth to nearby homes and businesses.
For GCP customers in Northern Europe, this expansion means lower latency and improved regional availability for cloud workloads, particularly relevant for Swedish and broader Nordic businesses running latency-sensitive applications.
Google has supported over 700 megawatts of renewable energy additions to the Swedish grid since 2013, and this new facility continues that sustainability focus, which matters for enterprises with carbon reporting requirements.
A EUR 5 million community fund targeting education, sustainability, and workforce development accompanies the investment, signaling a longer-term regional commitment beyond just infrastructure capacity.

50:24 Matt – “ I really like the fact that they’re using the air to cool it down, but then not just venting it out the other side, but venting it to homes and kind of getting that double whammy.”

Azure

51:03 Generally Available: Application Gateway for Containers – Service Mesh integration with Istio

Application Gateway for Containers now has generally available integration with Istio service meshes, automating mutual TLS connectivity between the gateway and mesh-enabled services to simplify secure north-south traffic management in Kubernetes environments.
The integration supports both upstream open-source Istio and the managed Istio add-on for AKS, giving teams flexibility to choose their preferred deployment model without changing their ingress configuration approach.
A notable operational benefit is the single ingress path for routing traffic to services both inside and outside the mesh, which reduces the need for repetitive mTLS definitions and separate gateway configurations.
Certificate lifecycle management is handled automatically, including trust establishment and rotation, which removes a common manual overhead for teams running secure service mesh workloads.
This feature is relevant for platform and infrastructure teams running AKS with service mesh architectures who want a managed ingress solution that integrates natively with their security model. Pricing follows existing Application Gateway for Containers rates, so teams should review the Azure pricing page for current cost details.

51:40 Matt – “If I remember correctly from the beta of it, the biggest annoyance of this is that you can’t go from a current app gateway to an app gateway for containers. They’re different resources inside of Azure, so the fun part about this is you actually need to move and relaunch. And even though you *tell* customers to never whitelist your IP address on your app gateway, there’s definitely always one customer out there that does and then opens a SEV1 ticket. So great feature. Kinda wish the application gateway was all under one bigger umbrella, but I have other issues with the application gateway.”

52:24 Microsoft Build 2026: The 7 biggest announcements

Microsoft announced Scout, an always-on assistant built on OpenClaw that integrates with Microsoft 365 apps, including Outlook, OneDrive, and Teams. It handles background tasks like calendar management and expense reporting, and is currently in desktop preview for Frontier customers in the US with broader availability planned.
Microsoft revealed seven new AI models under its MAI lineup, including MAI-Thinking-1, its first reasoning model featuring 35 billion active parameters and a 128K context window. The model targets complex multi-step instructions, long-context reasoning, and code generation, signaling a continued push toward in-house model development rather than reliance on OpenAI.
Microsoft Execution Containers (MXC) introduce a sandboxed security layer for AI agents running on Windows via OpenClaw, giving developers defined guardrails over what agents can access on a device. A companion app lets users configure their own agents or connect to existing ones within this controlled environment.
The Surface RTX Spark Dev Box targets developers running local AI models, featuring Nvidia’s Arm-based Spark RTX chip and 128GB of unified memory with Visual Studio Code and GitHub Copilot preinstalled. Pricing and full specs have not been disclosed, with US availability expected later this year.
Microsoft’s Majorana 2 quantum chip delivers qubits rated at 1,000 times greater accuracy than its predecessor, using a new material stack with lead-based compounds. Microsoft projects it could achieve a practical quantum computer by 2029 based on this progress.

Con’t Microsoft launches Scout, an OpenClaw-inspired personal assistant

Microsoft launched Scout at Build 2026, an always-on agentic AI assistant built on the OpenClaw framework that integrates directly with Microsoft 365, allowing users to automate tasks across email, calendar, and other productivity tools with a persistent, personalized identity.
Scout operates across cloud, desktop, and web browser, and comes with prepackaged skills for calendar management and meeting agenda drafting, though the intended long-term value is in user-defined custom skills that the assistant learns and refines over time.
Access requires both enrollment in Microsoft’s Frontier early adopter program and an active GitHub Copilot subscription, so this is not a standalone product and adds cost on top of existing Copilot licensing.
Scout includes a built-in policy conformance system that continuously checks agent behavior against set guidelines and generates an audit trail for each check, directly addressing concerns raised by the OpenClaw incident, where an agent acted erratically inside a researcher’s inbox.
The customization loop where Scout adapts to individual user behavior is the core differentiator here, but it also raises practical questions for enterprise IT teams around governance, data access scope, and how user-defined agent skills interact with existing security policies.

53:57 Matt – “It’s nice to see them actually get a new model out there, because it’s been it’s been a while and getting something out there that people can use – and honestly them internally getting off of OpenAI, you know… I’m sure for a long time they were using OpenAI and paying somewhere for it. So if they can run it all under their own model, probably gonna be better off in the long run as a business.”

58:10 AI alone won’t change your business. The system running it will.

Microsoft is positioning its agent platform as a five-layer system covering build, contextualize, run, govern, and improve, integrating GitHub, Azure Foundry, Microsoft IQ, Agent 365, and the Microsoft Security stack into a single workflow rather than separate tools.
Microsoft IQ is a notable new component that grounds agents in enterprise data from Microsoft 365, business systems, and the web via Web IQ, with Frontier Tuning allowing organizations to post-train models on their own workflows and data while keeping that trained intelligence within their own environment.
Azure Foundry serves as the production runtime for agents, supporting frameworks beyond Microsoft’s own stack, including LangGraph, Claude Agent SDK, and custom harnesses, with Fireworks AI integration for optimized open model inference and a built-in model router to balance quality, speed, and cost.
Agent 365 addresses a practical enterprise concern by providing a centralized catalog of all deployed agents across an organization, giving IT visibility into who deployed each agent, what data it can access, how it behaves, and what it costs, with policy enforcement built in.
Pricing details are not specified in the announcement, so listeners evaluating this platform should expect to assess costs across multiple components, including Foundry compute, IQ data connectors, and Frontier Tuning separately as they scope out deployments.

Closing

356: Holy Labor Displacement, Batman! The Vatican Weighs In

Wed, 03 Jun 2026 23:04:52 +0000

Welcome to episode 356 of The Cloud Pod, where the weather is always cloudy! Justin and Ryan are in the studio this week and ready to bring you all the latest in cloud and AI news, including the Pope coming out against AI, AWS introducing a new local zone, and GitHub having yet another crappy week. There’s a lot of news, so let’s get started!

Titles we almost went with this week

Istanbul Not Constantinople, But Definitely an AWS Local Zone
218 Billion Parameters Walk Into a Single GPU
Postgres Walks Into a DynamoDB Bar
NSA Slides Into Anthropic’s DMs With 9 Billion Reasons
Spy Agencies Want Claude But Can They Afford the Terms
Pre-Shared Keys Were So Last Decade Azure
When the Church and Anthropic Agree on AI Ethics
Microsoft Finally Joins the Linux Party. It Crashed
Iran Wants Cable Fees, and That’s No Phishing
When the Church, the Spies, and Iran All Come for Big Tech
I was gonna record a podcast until I got a migraine

A big thanks to this week’s sponsors:

Check out thecloudpod.net/archera to schedule a demo today.

General News

03:31 Pope Leo, Anthropic Co-Founder Warn of AI Power Concentration, Labor Displacement

Pope Leo XIV published a 42,000-word treatise called Magnifica Humanitas, outlining the Catholic Church’s position on AI governance, with a focus on labor displacement, power concentration among private tech companies, and autonomous weapons systems.
Anthropic co-founder Chris Olah was invited to participate in the Vatican’s AI encyclical event, and publicly acknowledged that large-scale human labor displacement from AI is a real possibility, framing support for displaced workers as a moral obligation.
The document raises a structural concern relevant to cloud and AI businesses; that private transnational companies now hold more resources and influence over AI development than many governments, complicating regulatory oversight.
The treatise specifically calls out the working conditions of data labelers, content moderators, and rare earth mineral extractors as forms of exploitation embedded in the AI supply chain, which touches directly on how cloud AI services are built and maintained.
For cloud and AI businesses, this document signals growing institutional pressure from non-governmental bodies to factor employment protection and human dignity into product and infrastructure decisions, not just regulatory compliance.

04:41 Justin – “It’s not very often the pope weighs in on what you do for a day job.”

06:20 Iran demands Big Tech pay fees for undersea Internet cables in Strait of Hormuz

Iran has announced intentions to charge license fees to Meta, Google, Amazon, and Microsoft for undersea cables passing through the Strait of Hormuz, though the legal and practical enforcement mechanisms remain unclear, given that most routes pass through Oman-controlled waters.
Two major cables, FALCON and Gulf Bridge International, do pass through Iranian territorial waters at certain points, giving Iran some legitimate jurisdictional basis for the claims, while a third major cable, Asia Africa Europe-1, also runs through the region.
Iran’s state media has gone further than the initial announcement, proposing that Iran hold exclusive rights to repair and maintain subsea cables in the area, which would create significant operational dependencies for cloud providers relying on those links.
Over 99 percent of international internet traffic runs through undersea cables globally, and the Strait of Hormuz cables primarily serve Gulf region connectivity, meaning disruptions or fee disputes could affect cloud service reliability for customers across the Middle East.
Ongoing regional conflict has already halted cable projects and suspended repair operations in the area, and these latest assertions may accelerate planning for alternative routing that bypasses the strait entirely.

07:50 Ryan – “It makes me a little bit uneasy to think about, they can just cut the internet off.”

AI Is Going Great – or How ML Makes Money

08:40 Introducing Command A+ | Cohere

Cohere released Command A+ as open-source under Apache 2.0, a 218B parameter mixture-of-experts model with only 25B active parameters, available on Hugging Face in BF16, FP8, and W4A4 quantizations that can run on as few as two NVIDIA H100s or a single Blackwell GPU.
The MoE architecture delivers notable efficiency gains over the previous Command A Reasoning model, including up to 63% higher output tokens per second, 17% lower time to first token, and an additional 47% speed increase with W4A4 quantization, plus a 1.5-1.6x speedup from speculative decoding.
Benchmark improvements are substantial in agentic tasks, with tau-squared-Bench Telecom scores jumping from 37% to 85% and Terminal-Bench Hard agentic coding going from 3% to 25%, alongside 20% and 32% improvements in agentic question answering and spreadsheet analysis within Cohere’s North platform.
Multilingual support expanded from 23 to 48 languages, with tokenizer efficiency improvements of 20% for Arabic, 16% for Korean, and 18% for Japanese, addressing a common gap in enterprise deployments targeting non-European markets.
For cloud and enterprise developers, Command A+ is available today via Hugging Face, Cohere’s Model Vault managed inference service, and the Cohere API, positioning it as a self-hostable alternative for organizations with data sovereignty requirements.

10:13 Ryan – “We recently talked about the Llama models going away, and what does that do, so I’m happy to see other models – like we were hoping for – fill that gap in terms of having open source availability and things that you can run on your own hardware.”

10:54 White House, Anthropic Near Deal For Spy Agencies to Use AI

The White House is reportedly nearing a deal to allow NSA and other intelligence agencies to use Anthropic’s AI models for classified work, despite the Defense Department previously designating Anthropic as a “supply chain risk” earlier this year over concerns about mass surveillance and autonomous weapons use.
Anthropic’s Mythos model is central to this development, as it is specifically designed to identify software vulnerabilities, making it particularly relevant for national security and offensive/defensive cyber operations.
The proposed $9 billion purchase of Nvidia Blackwell chips by spy agencies signals a substantial infrastructure investment to run these AI workloads on-premise or in classified cloud environments, rather than relying on shared commercial infrastructure.
The tension between Anthropic’s acceptable use policies and government contract language raises a practical question for enterprise and public sector cloud customers about how AI vendor terms of service interact with sensitive or regulated workloads.
Anthropic is currently fighting the supply chain risk designation in court, meaning the legal and contractual landscape around government AI procurement remains unsettled, which has direct implications for other AI vendors pursuing federal contracts.

11:46 Justin – “Considering how little audit logging is in Anthropic products, I’m not sure that they should use it either.”

Security

12:50 GitHub faces a fight for its survival at Microsoft

GitHub has experienced multiple significant outages, a remote code execution vulnerability patched in under six hours, and a breach of 3,800 internal repositories via a malicious VS Code extension, all within recent weeks.
These incidents coincide with an ongoing migration of GitHub infrastructure to Azure servers, which involves complex MySQL cluster management.
Following CEO Thomas Dohmke’s resignation last summer, Microsoft chose not to appoint a replacement, instead folding GitHub under the CoreAI team led by Jay Parikh.
This structural change has contributed to a notable leadership exodus, including departures from the chief revenue officer, a senior VP who joined only months prior, and a 34-year Microsoft veteran.
GitHub Copilot, which held an early lead in AI coding tools, has lost ground to competitors like Cursor and Claude Code.
Microsoft reportedly considered acquiring Cursor to close this gap and has canceled internal Claude Code licenses to push developers toward improving Copilot instead.
GitHub is shifting Copilot to usage-based billing next month, replacing the current model where users are downgraded to less capable AI models after hitting limits.
Under the new system, users will be cut off entirely unless they purchase additional credits, which has generated developer pushback.
The combination of reliability issues and leadership instability has prompted some developers to migrate away from GitHub entirely, with high-profile projects like the Ghostty terminal publicly announcing their departure after 18 years on the platform.

13:37 GitHub confirms breach of 3,800 repos via malicious VSCode extension

A GitHub employee installing a trojanized version of the Nx Console VSCode extension led to the exfiltration of approximately 3,800 internal repositories, demonstrating that even major platform providers are vulnerable to supply chain attacks through developer tooling.
The attack has been linked to the TeamPCP group, which has a documented history of targeting developer platforms including PyPI, NPM, and Docker, and was also connected to a separate incident affecting OpenAI employees around the same time.
This incident highlights a persistent and underaddressed risk in the VSCode marketplace, where malicious extensions have repeatedly slipped through, including AI coding assistant extensions with 1.5 million installs that exfiltrated data to servers in China as recently as January.
For organizations using GitHub at scale, the breach raises questions about internal repository access controls and whether developer workstations should have the level of access needed to exfiltrate thousands of repos through a single compromised endpoint.
The practical takeaway for development teams is that extension vetting policies, least-privilege access for developer machines, and endpoint monitoring are not optional hygiene items, especially given that 90 percent of Fortune 100 companies rely on GitHub infrastructure.

14:06 Ryan – “The supply chain stuff is only going to get worse. It’s such a huge risk, there’s no checks, and there’s very little incentive for developers to do any kind of deep dive. They want the functionality of these plugins, and those plugins can install directly in the IDE, which has direct access to the file system…Pretty quick to see how this can call home to some attack.”

AWS

18:24 Introducing ExtendDB: An open source DynamoDB-compatible adapter with pluggable storage backends

AWS released ExtendDB as an open source Apache 2.0 project that implements the DynamoDB wire protocol with PostgreSQL as its first storage backend, allowing existing DynamoDB applications to run without code changes by simply swapping the endpoint URL.
The primary use cases are local development, CI/CD pipelines, and on-premises or air-gapped environments where the managed DynamoDB service is unavailable, with the airline industry example illustrating how gate and onboard systems need DynamoDB-compatible access patterns during network outages.
ExtendDB is written in Rust and compiles to a single binary with no external runtime dependencies, and its storage layer is defined as a Rust trait, so additional backends like Apache Cassandra can be added without modifying the core.
This is a v0.1 early-stage release, so listeners should treat it as a development and experimentation tool rather than a production replacement for DynamoDB, as performance characteristics and scaling behavior differ significantly from the managed service.
Since ExtendDB uses PostgreSQL as its backend, teams get familiar operational tooling like pg_dump, replication, and point-in-time recovery, but they also take on full responsibility for database availability and maintenance that DynamoDB normally handles automatically.

21:06 Justin – “The fact that it plugs in different back ends, I guess, gives you a data store, which makes it potentially long-term better. But yeah, it’ll be interesting to see what people think.”

21:25 Announcing the general availability of a new AWS Local Zone in Istanbul, Türkiye

AWS has launched a Local Zone in Istanbul (eu-central-1-ist-1a), extending AWS infrastructure into Türkiye to support low-latency workloads and local data residency requirements without needing a full AWS Region.
The zone supports a solid set of services, including EC2 with C7i, M7i, and R7i instances, EKS, ECS, S3 One Zone-IA, EBS with gp3 volumes, Direct Connect, and Application Load Balancer, covering most common production workload needs.
Data residency is a key driver here, as organizations in Türkiye operating under local data sovereignty regulations can now store and back up data within the country while still using standard AWS APIs and tooling.
Istanbul joins more than 30 metropolitan areas worldwide with Local Zone coverage, and enabling it is straightforward through the EC2 console Zones tab or the ModifyAvailabilityZoneGroup API.
Pricing follows the standard Local Zones model, which typically runs higher than parent Region pricing, so teams should review the AWS Local Zones pricing page before migrating latency-sensitive or compliance-driven workloads.

22:43 New agentic migration assessment capabilities now available with AWS Transform

AWS Transform now includes agentic migration assessment tools that let organizations build TCO business cases using existing data sources like RVTools exports, CMDB data, and third-party discovery tool outputs, reducing the upfront data collection burden.
The what-if scenario feature allows teams to compare migration paths with customizable assumptions around region selection, resource utilization, and service mapping, covering cost modeling for EC2, FSx, S3, SQL Server on EC2, and virtual desktops.
Beyond pure cost analysis, assessments can now incorporate Cloud Value Framework pillars, including staff productivity, operational resilience, business agility, and sustainability, giving organizations a broader justification framework for migration decisions.
The feature is available in all AWS regions where AWS Transform is currently supported, with no specific additional pricing mentioned, suggesting it is included within the existing AWS Transform service offering.
For organizations earlier in their migration planning, this lowers the barrier to producing a structured business case without needing complete infrastructure discovery data upfront, which is a common bottleneck in enterprise migration projects.

23:57 Ryan – “I’ve always wanted these things to be more and work better. Maybe the addition of the agentic and logic on there will make it be that extra thing, but I wish they were just a little bit… more. I just wish they worked, I guess. Like, I want the promise that they provide, and it never sort of pays off. You always have to do the hard work yourself.”

24:24 AWS Secrets Manager adds managed external secrets support for Datadog vended keys and Snowflake Programmatic Access Tokens

AWS Secrets Manager now supports automatic rotation for Datadog API keys, Application keys, and service account credential pairs, plus Snowflake Programmatic Access Tokens, reducing manual credential management overhead for teams using these popular data and observability platforms.
The Snowflake integration includes a configurable grace period during token rotation, which allows applications to continue using existing tokens while transitioning to new ones without service interruption.
These additions expand the managed external secrets ecosystem to six third-party integrations, joining BigID, Confluent Cloud, MongoDB Atlas, and Salesforce, giving teams a centralized rotation workflow across multiple SaaS tools.
For organizations already using Secrets Manager for AWS-native credential rotation, this reduces the need for custom Lambda rotation functions or third-party tools to handle Datadog and Snowflake credentials specifically.
Availability matches existing managed external secrets regional coverage, so teams should verify their specific regions are supported via the Secrets Manager documentation before planning adoption.

25:11 Justin – “I hope more vendors get this very quickly, and make this easy for vendors to onboard to, please. And then you can charge them the you know ridiculous price you charge for secrets.”

28:14 AgentWatch: Proactive AWS monitoring with ambient agents | Artificial Intelligence

AgentWatch is an open-source ambient monitoring agent built on Amazon Bedrock AgentCore Runtime that automatically checks CloudWatch metrics, logs, and alarms every 15 minutes across multiple AWS accounts and posts structured reports to Slack, reducing the need for manual dashboard reviews.
The solution introduces three human-in-the-loop patterns called notify, question, and review that determine when the agent acts autonomously versus when it pauses to ask for human input, which is particularly relevant for teams concerned about AI agents making unsanctioned infrastructure changes.
The technical stack combines EventBridge for scheduling, Lambda for orchestration, Amazon Cognito for OAuth 2.0 authentication, Amazon Bedrock Claude Sonnet for natural language summarization, and API Gateway for Slack slash command integration, making it a practical reference architecture for teams building their own agent-based tooling.
AgentWatch supports on-demand queries through Slack slash commands in addition to scheduled reports, allowing engineers to ask natural language questions about the current infrastructure state without switching to the AWS console or writing CloudWatch queries.
The project is available as a sample on GitHub here, and costs will primarily reflect Amazon Bedrock inference usage and standard AWS service charges for Lambda, EventBridge, and API Gateway, with no separate AgentWatch licensing fee.

GCP

30:25 Agent Executor, Google’s distributed Agent Runtime

Google released Agent Executor, an open-source runtime for managing long-running AI agent workflows, available now in preview at the GitHub repo. It addresses production reliability problems like agent crashes, disconnections, and state corruption that become serious issues when agents run for hours or days.
The runtime includes durable execution via event logs and snapshots, secure sandboxing for multi-tenant isolation, and a single-writer architecture to prevent session state conflicts. These are particularly relevant for enterprises running agents that generate code or handle sensitive user data.
A notable capability called trajectory branching lets developers checkpoint and fork an agent’s decision path to test different outcomes without losing prior context, which is useful for evaluation and debugging workflows.
Agent Executor pairs with Agent Substrate, a new open-source Kubernetes extension announced alongside it, designed to handle hundreds of millions of registered agents and sub-second tool calls that would overwhelm a standard Kubernetes control plane.
The runtime is designed to be harness-agnostic and supports LangChain, LangGraph, ADK, and the Agent2Agent protocol, meaning teams are not locked into Google’s own tooling. Pricing details are not yet published, given the preview status.

31:23 Ryan – “I’m just impressed someone’s got an agent that can run for days!”

33:28 Google Cloud suspended major customer Railway.com without cause, causing outage

Google Cloud suspended Railway.com, a platform-as-a-service provider, without apparent cause, resulting in a customer-facing outage that affected Railway’s downstream users and highlighted risks of automated account suspension systems.
This incident raises practical concerns for GCP customers about the lack of human review processes before account suspensions are triggered, particularly for infrastructure providers whose own customers depend on continuous uptime.
Cloud providers across AWS, Azure, and GCP have automated fraud and abuse detection systems that can suspend accounts with little warning, and incidents like this underscore the importance of customers maintaining multi-cloud or backup strategies for critical workloads.
For businesses evaluating GCP as a primary provider, this event underscores the value of negotiating enterprise support agreements that include dedicated account management and escalation paths before outages occur, rather than after.
https://blog.railway.com/p/incident-report-may-19-2026-gcp-account-outage

38:25 100 things we announced at Google I/O 2026

Google Antigravity 2.0 is the centerpiece developer announcement, positioning itself as an agent-first development platform with a standalone desktop app, CLI, and SDK.
Enterprises can connect Antigravity directly to Google Cloud projects under existing enterprise terms, with Gemini Enterprise customers getting access in the coming months.
Gemini 3.5 Flash is now generally available through Antigravity, the Gemini API, Google AI Studio, and Android Studio, with Google citing performance improvements on coding and agentic benchmarks like Terminal-Bench 2.1 at 76.2% and MCP Atlas at 83.6%. Google claims it delivers frontier-level intelligence at less than half the cost of comparable models, which is a notable claim for cost-conscious GCP customers.
Managed Agents in the Gemini API let developers provision a remote Linux environment with a single API call, handling reasoning, code execution, file management, and web browsing in an isolated sandbox. Developers can extend agent behavior using markdown files like AGENTS.md and SKILL.md rather than writing complex orchestration code.
Google AI Studio now supports building and publishing native Android apps directly to Google Play’s Internal Test Track, with the first two app deployments to Google Cloud offered at no cost and no credit card required. Workspace data, including Sheets, Drive, and Docs, is now directly accessible from apps built within AI Studio.
The new subscription tiers are worth noting for GCP customers evaluating costs: a new $100 per month AI Ultra plan targets developers and technical leads with 5x higher usage limits than the AI Pro plan and 20TB of cloud storage, while the existing AI Pro plan now bundles YouTube Premium Lite at no extra charge.

Con’t. The 13 biggest announcements at Google I/O 2026

Gemini Spark is a new, always-on AI agent running on Google Cloud virtual machines 24/7, connecting to Workspace apps like Docs, Gmail, Sheets, and Slides, as well as third-party apps like Canva.
This is a direct signal that Google is positioning Cloud infrastructure as the backbone for persistent agentic workloads, which has real implications for GCP compute and pricing conversations.
Gemini 3.5 Flash is now the default model across the Gemini app and AI Mode in Search, with Gemini 3.5 Pro following next month. Google highlights improved agentic task handling and coding capabilities, which matters for developers building on Vertex AI and AI Studio.
AI Studio now supports vibe-coding full native Android apps with an embedded emulator, direct phone installation for testing, and export to Android Studio or GitHub. Firebase integration is also coming, tying the development workflow more tightly into the broader Google Cloud ecosystem.
Google is expanding SynthID watermarking and C2PA Content Credentials into Chrome and Search, allowing users to verify AI-generated or altered images at the point of discovery. This is a practical development for enterprises concerned about content authenticity and compliance workflows.
Google AI Ultra pricing has been restructured with a new entry point at $100 per month, down from $249.99, with a $200 per month tier adding access to Project Genie. The pricing adjustment brings it closer to competing premium AI subscription tiers from other providers.

40:21 Justin – “Because Ryan loves agents just running around touching things.”

Azure

41:52 Generally Available: Azure Storage Mover Blob-to-Blob migration

Azure Storage Mover now supports Blob-to-Blob container migrations in general availability, allowing customers to move data across regions, subscriptions, and accounts without deploying or managing any infrastructure.
The service is agentless and fully managed, with built-in job tracking, resumability, and parallel execution support, which reduces the operational overhead typically associated with large-scale data migrations.
Performance is rated at multi-GB/s transfer speeds depending on workload and region topology, with support for both flat namespace and hierarchical namespace storage accounts, making it suitable for enterprise-scale migrations involving large object counts and deep directory structures.
This is a practical option for customers consolidating storage accounts, reorganizing data across Azure regions, or migrating between subscriptions, and migrations can be initiated directly from the Azure portal in a few steps.
Pricing details are not specified in the announcement, so listeners planning migrations should check the Azure Storage Mover pricing page before scoping out large transfer jobs, as data egress and service costs can vary by region and volume.

42:48 Ryan – “They didn’t release pricing information, and for these types of migrations, I think that’s key that they need to announce that.”

44:22 Public Preview: Evaluate feature rollouts with Azure App Configuration Scorecards

Azure App Configuration now includes a Scorecards feature in public preview that gives teams a telemetry-driven view of how feature flag variants are performing in production, pulling data from Application Insights without requiring manual dashboard comparisons.
The core value here is connecting feature flag management directly to production signals, so teams can make rollout, optimization, or deprecation decisions based on actual usage data rather than guesswork.
Scorecards let teams compare variants against key performance indicators and detect potential issues introduced by new feature behavior, which is useful for organizations running A/B tests or gradual rollouts at scale.
This builds on the existing Azure App Configuration Feature Management stack, so teams already using feature flags in that ecosystem can adopt Scorecards without introducing new tooling dependencies.
Pricing details are not yet specified for this public preview capability, so teams evaluating it should factor in existing Azure App Configuration and Application Insights costs while monitoring for GA pricing announcements.

44:53 Justin – “This is cool! I like the conception of this. Like, you use a scorecard, which, if you’re not using scorecarding to measure maturity and services between each other, I do highly recommend it. And this is basically how you make your rollout process data-driven, and this is a great way to do that.”

45:51 Microsoft surprises with its first server Linux distribution: Azure Linux 4.0

Microsoft announced Azure Linux 4.0 at Open Source Summit North America, marking its first general-purpose Linux distribution available to all Azure customers as a VM image, not just AKS users, as with the previous 3.0 version. It is Fedora-based, open source on GitHub, and purpose-built for Azure infrastructure with a minimal package footprint and no graphical environment planned.
The release splits into two distinct products: Azure Linux 4.0 as a general-purpose VM image for cloud workloads, and Azure Container Linux (ACL) as a hardened, immutable container host based on Flatcar Container Linux with no package manager, designed specifically for AKS container hosting.
Microsoft is positioning Azure Linux as a supply chain security play, with curated packages, monthly security patches, and a commitment to rapid CVE response outside the standard Patch Tuesday cycle.
The support lifecycle is four years, with optional automatic security upgrades for both VM and AKS deployments.
WSL support is planned so developers can run Azure Linux locally on Windows 11, providing a consistent environment between local development and cloud production workloads, with VS Code integration noted as a primary developer workflow.
Microsoft noted that over two-thirds of customer cores in Azure already run Linux, and that Microsoft 365, GitHub, and the infrastructure supporting ChatGPT all run on Linux. Azure Linux 4.0 is positioned as a batteries-included option alongside eight existing endorsed distributions, with no changes to partner relationships with Red Hat, Canonical, or others. No separate pricing was announced beyond standard Azure VM compute costs.

47:21 Ryan – “I do like that they’re talking about like how much Linux workload runs on Azure, because that’s always been the reality, but I always felt like – especially in the early days – that it was sort of this hush-hush sort of like, very Windows-focused, and it’s just not the reality in cloud and SaaS applications in general. So this, you know, is first-party support for an operating system, which is nice. I think that’s great.

Closing

355: The Cloud Pod's AI Pleads Not Guilty, Blames Philip K. Dick

Wed, 27 May 2026 17:14:44 +0000

354: US-Tirefire-1 lives up to its Stellar Reputation

Wed, 20 May 2026 18:16:10 +0000

Welcome to episode 354 of The Cloud Pod, where the weather is always cloudy! This week was sort of a tire fire for the cloud, with US-East-1 losing power, TanStack Supply chain being hit with an impressively creative attack, and Linux getting hit with a second vulnerability in as many weeks. But it’s not all bad news – Microsoft finally figured out we don’t want (or need) Copilot in EVERYTHING, and Anthropic introduced dreaming via Claude managed agents. There’s even more where that came from, plus an aftershow, so let’s get started!

Titles we almost went with this week

IAM Not Messing Around With AI Agent Security
Redis Who? Valkey 9.0 Crashes the Cache Party
US-EAST-1 Loses Power Again, Architects Say Told You So
HTTP 402 Payment Required Now Actually Required for Bedrock Agents
ElastiCache Finds Your Data With Vectors and Vibes
Stop Squinting at Logs and Let AI Do It
GKE Nodes Finally Stop Taking the Scenic Route
AWS MCP Server Goes GA So Your AI Stops Lying
AI Agents Now Snitching on Your Sloppy Security Code
TanStack Supply Chain Worm Trusted SLSA and Lied
I wonder if Claude is dreaming about how bad my code is
US-EAST-1 Loses Power Again, The CloudPod Say Told You So
Will my credit card company accept my agent bought it as a fraud reason?
Extended RDS and Cloud SQL is a TAX without representation
Boston SQL Party – Throw your Extended RDS overboard
Everyday is a bad day for Cyber Security
Azure Scale Sets Finally Let Your VMs Grow Up
From 200 to 1000 VMs Without Starting Over
Availability Sets Pack Their Bags for Scale Sets

A big thanks to this week’s sponsors:

Check out thecloudpod.net/archera to schedule a demo today.

Follow Up

01:26 Microsoft Cuts Copilot Bloat

Microsoft is actively removing Copilot integrations from products where adoption was low or user feedback was negative, including Gaming Copilot on Xbox and several Windows 11 entry points in Photos, Widgets, and Notepad.
The scale of the Copilot sprawl became concrete when a tech commentator counted 81 distinct Copilot products, a figure that circulated internally at Microsoft and drew attention from staff.
Microsoft executive Jacob Andreou publicly acknowledged the need to cut underperforming Copilots before deleting the post, signaling an internal shift toward consolidation under a single combined consumer and enterprise Copilot organization.
The financial case for trimming Copilots is direct: Microsoft noted during its most recent earnings that running certain Copilots was compressing margins, particularly free integrations in Windows where no additional revenue offsets the inference costs.
The products Microsoft is choosing to retain, such as Microsoft 365 Copilot, which saw 33 percent growth in paying users last quarter, point toward a narrower focus on enterprise workflows with measurable revenue attachment rather than broad surface-area coverage.

03:06 Jonathan – “I think it’s just the invasive nature of the whole thing, because the ULA for Copilot, someone’s going to click right through it and not realize that everything they type in their app is now being sent to Copilot and used for training. And all of a sudden, they added a bunch of apps that everyone uses every day, like Notepad, and I think it’s quite an invasion previously. I love AI and using AI, but having it slammed in my face by Microsoft, having it enabled by default, and having it take screenshots and having it do all those things without explicitly opting in is what I’m unhappy about.”

General News

05:04 Linux bitten by second severe vulnerability in as many weeks

Linux has seen two severe privilege escalation vulnerabilities disclosed within a week of each other, with Dirty Frag (CVE-2026-43284 and CVE-2026-43500) allowing low-privilege users to gain root access across virtually all distributions.
The exploit is particularly concerning because it is deterministic, stealthy, and causes no crashes, making it difficult to detect while working reliably across different environments, including shared servers and virtual machines.
Proof-of-concept code was leaked publicly before most distributions had incorporated kernel patches, effectively turning this into a zero-day and accelerating real-world exploitation risk.
Microsoft has already observed signs of active experimentation in the wild.
Cloud and shared hosting environments face elevated exposure since the attack is well-suited to multi-tenant scenarios where untrusted users share underlying infrastructure.
Debian, AlmaLinux, and Fedora have released patches, and organizations running Linux workloads should prioritize checking their distribution’s patch status and applying updates promptly.

06:46 Jonathan – “I think the easiest way to exploit it is through is supply chain attacks, because if you do an APT update or something and there’s an open source package that’s been packaged up into somebody’s Ubuntu repo, whenever those things run, they can run shell scripts – they can run arbitrary code when you the update and they’re all, to be fair, they’re already running his route at that point anyway, so it’s not quite as bad, but yeah.”

07:18 TanStack npm Packages Hit by Mini Shai-Hulud

On May 11, 2026, 84 malicious npm artifacts were published across 42 TanStack packages by hijacking the legitimate release pipeline using OIDC token extraction from runner memory, not stolen credentials.
This is notable as the first documented supply chain attack producing valid SLSA Build Level 3 provenance attestations, meaning standard provenance verification tools would show these packages as trusted.
The attack chained three vulnerabilities: a pull_request_target misconfiguration allowing fork code to run in the base repo context, GitHub Actions cache poisoning using a pre-computed cache key, and OIDC token extraction from runner memory using a technique first documented in the tj-actions/changed-files compromise from March 2025.
The worm self-propagated to over 200 packages beyond the initial TanStack packages, reaching Mistral AI, UiPath, and others within hours, and systematically harvested credentials from AWS IMDSv2, HashiCorp Vault, Kubernetes service accounts, and notably Claude Code session history files.
A critical remediation ordering issue exists: the payload installs a dead-man’s switch that runs rm -rf ~/ if the stolen GitHub token is revoked, meaning teams must disable the monitor service before rotating credentials or risk home directory destruction.
The practical takeaway for teams is that SLSA provenance is a necessary but insufficient supply chain control, and any npm package using OIDC trusted publishing without branch and workflow pinning is vulnerable to this class of attack.

08:16 Justin – “Brilliant – bravo for the creativity for this one. I had not thought of that attack vector before.”

15:02 AWS warns of EC2 ‘impairment’ as power loss hits notorious US-EAST-1 region

US-EAST-1 experienced another power loss event, causing EC2 instance impairment, continuing the region’s well-documented history of outages that have made it a cautionary tale for architects who skip multi-region or multi-AZ design.
This incident reinforces why AWS best practices around Availability Zone distribution exist in the first place. Running workloads across multiple AZs or regions is not optional for production systems with meaningful uptime requirements.
For teams still running single-AZ deployments in US-EAST-1, this is a practical reminder to review architecture decisions around Auto Scaling groups, RDS Multi-AZ, and Route 53 health checks as baseline resilience tools.
The article itself is thin on technical specifics, which points to a broader issue with AWS’s incident communication.
Customers often learn more from third-party sources than from the AWS Service Health Dashboard during active events.
Cost consideration for listeners: building multi-AZ or multi-region redundancy does increase infrastructure spend, but the business impact of unplanned downtime in a single-AZ setup typically outweighs that cost for any revenue-generating workload.

16:30 Matthew – “I also feel like if you have a cloud architect at your company that’s recommending you don’t do multi AZ, just in general, you should probably fire the person.”

AI Is Going Great – or How ML Makes Money

18:47 New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration

Anthropic launched several updates to Claude Managed Agents, including dreaming in research preview, plus outcomes, multiagent orchestration, and webhooks in public beta.
Dreaming is a scheduled process that reviews past agent sessions to extract patterns, refine memory stores, and enable agents to self-improve between sessions without human intervention.
The outcomes feature lets developers write a rubric for success, then a separate grader evaluates agent output in its own context window to avoid being influenced by the agent’s reasoning.
Internal benchmarks showed up to 10 percentage point improvement in task success over standard prompting, with file generation gains of 8.4% for docx and 10.1% for pptx.
Multi-agent orchestration allows a lead agent to break complex jobs into parallel workstreams delegated to specialist subagents, each with its own model, prompt, and tools.
All events are persistent and traceable through the Claude Console, giving developers full visibility into which agent did what and in what order.
Real-world results from early adopters show measurable outcomes: Harvey saw roughly 6x improvement in completion rates using dreaming for legal drafting workflows, and Wisedocs reported 50% faster document review cycles using outcomes to enforce quality standards.
The combination of memory, dreaming, and multi-agent orchestration represents a shift toward agents that accumulate institutional knowledge over time rather than starting fresh each session, which has practical implications for enterprise teams running long-horizon or high-volume workloads.
Want to request access to Dreaming? You can do that here.

19:47 Jonathan – “It’s a great feature. I built a retrospective agent a while ago, before Dreaming was around… but that went back through and looked at chunks of things, especially if I had to correct it to figure out if I said something wrong earlier in the chat? What led to the divergence from the intent in a way? So I guess this is an automated way of doing the same thing, and probably covers a wider range of problems than I have built.”

21:58 Agent view in Claude Code

Anthropic launched agent view in Claude Code on May 11, 2026, currently available as a Research Preview for Pro, Max, Team, Enterprise, and API plan users.
It provides a unified CLI interface for managing multiple Claude Code sessions simultaneously, accessible via the claude agents command or the left arrow key from any session.
The feature addresses a practical pain point for developers running parallel AI coding sessions, replacing the need to juggle multiple terminal tabs or tmux grids.
Each session row displays status, last response content, and interaction timestamp at a glance.
Two key commands extend session management flexibility: /bg moves an existing session to the background, while claude –bg [task] launches a new session directly in the background without occupying the foreground terminal.
Early use cases include dispatching multiple coding tasks in parallel and reviewing the resulting pull requests from a single list, managing long-running looping jobs like PR monitors with next-run times visible in the agent list, and quickly spinning up related tasks or codebase questions without losing context in the primary session.
For teams and organizations, this tooling supports scaling concurrent AI-assisted development workflows within existing rate limits, which is worth noting as a practical constraint when planning parallel workloads.
Want to install Claude Code? Do that here.

AWS

27:26 The AWS MCP Server is now generally available

The AWS MCP Server is now generally available as a managed remote MCP server that gives AI agents authenticated access to all 15,000+ AWS API operations using existing IAM credentials, with no additional charge beyond normal AWS resource costs. It is currently available in US East N. Virginia and Europe Frankfurt regions.
A core problem this solves is AI coding agents relying on stale training data, producing overly permissive IAM policies, and defaulting to CLI commands instead of CDK or CloudFormation.
The server addresses this by retrieving current AWS documentation at query time and providing Skills, which are curated best practices maintained by AWS service teams.
The new run_script tool lets agents execute sandboxed Python server-side with no network access, allowing multi-step API calls to be chained in a single round-trip rather than sequentially, which reduces both latency and context window consumption.
Enterprise governance is addressed through IAM context keys for fine-grained access control, CloudWatch metrics under the AWS-MCP namespace to separate agent calls from human calls, and full CloudTrail logging for compliance audit trails.
The server works with any MCP-compatible client, including Claude Code, Kiro, Cursor, and Codex, but requires a local proxy called MCP Proxy for AWS to bridge IAM SigV4 authentication to the OAuth 2.1 that MCP currently supports, which adds a setup step worth noting for teams evaluating adoption.

28:54 Jonathan – “It seems like just like another abstraction on top of another abstraction at this point. We’ve already got the cloud configuration API that they built for Terraform to use. I would assume that this MCP talks to that. But why? Why not just teach it how to use CLA commands?”

32:34 AWS Marketplace now supports programmatic procurement with Agreements API

AWS Marketplace launched the Agreements API, allowing organizations to programmatically procure software, accept offers, track charges, manage entitlements, and update purchase orders without leaving their existing procurement tools.
Combined with the existing Discovery API, this creates a full end-to-end programmatic procurement workflow from product discovery through purchase, which is useful for enterprises running automated or policy-driven software acquisition processes.
Partners and ISVs can use these APIs to build custom storefronts on top of AWS Marketplace, giving them more control over how customers experience the procurement process within their own platforms.
The API is currently available only in US East (N. Virginia), which is worth noting for organizations with regional compliance or data residency requirements, as broader regional availability is not yet confirmed.
Getting started requires configuring IAM permissions and calling the API via the AWS SDK, with full documentation available in the AWS Marketplace Agreement APIs reference. No separate pricing for API usage was announced, as costs would reflect the underlying Marketplace product agreements.

33:26 Justin – “This is nice, because they had a Koopa integration maybe six years ago, they announced it, then they basically did nothing – no one adopted it – and they kind of stopped working on it. So this is much better to have an agreements API that you can actually integrate into.”

35:04 Announcing Agent Toolkit for AWS — help AI coding agents build effectively on AWS

AWS launched the Agent Toolkit for AWS, a free suite of tools designed to help AI coding agents work more reliably on AWS by providing validated, up-to-date procedures called agent skills, reducing errors and token waste in multi-service workflows. It succeeds the MCP servers and plugins previously hosted on AWS Labs.
The toolkit launches with over 40 agent skills covering infrastructure-as-code, storage, analytics, serverless, containers, and AI services, with database, networking, and IAM skills planned soon. Skills give agents tested procedures rather than letting them improvise from potentially outdated training data.
The AWS MCP Server, now generally available, adds IAM-based guardrails, CloudWatch and CloudTrail observability, and sandboxed code execution, addressing the governance concerns that have made organizations hesitant to deploy coding agents in production environments. It is currently available only in US East N. Virginia and Europe Frankfurt.
Three pre-bundled plugins simplify setup by combining the MCP server with curated skill sets: AWS Core for full-stack application developers, AWS Data Analytics for data pipeline work, and AWS Agents for building production agents using Amazon Bedrock AgentCore.
The Agent Toolkit is available at no additional charge, with customers paying only for the underlying AWS resources their agents consume, making adoption straightforward for teams already using AWS services.

36:07 Justin – “Everyone’s trying to get to agents, and how agents run on top of Bedrock Agent Core, and so, how baked are these things when things like Bedrock Agent Core are pretty new? I do appreciate it; I think the MCP server is probably where I would spend most of my time for building an agent for this, even though I just mocked it mercilessly, but the plugins might be good, or the skills might be good for certain things if you’re not familiar. If they could do an incognito skill.”

38:07 AWS Console Mobile App adds interactive graphs, AI log summaries, and natural language logs search to CloudWatch Alarms

AWS Console Mobile App now consolidates CloudWatch alarm investigation into a single view, combining interactive metric graphs, AI-generated log summaries, and natural language log search to reduce the time from alert to root cause identification.
The natural language log search supports typed queries, voice input, and pre-saved Logs Insights queries, which lowers the barrier for on-call engineers who need to investigate incidents quickly from a mobile device.
The AI-generated log summaries automatically highlight key contributing factors when an alarm triggers, which could reduce the cognitive load during off-hours incident response without requiring engineers to manually parse raw log data.
The feature is available at no additional cost beyond standard CloudWatch charges and works across all AWS Commercial Regions, accessible by downloading or updating the AWS Console Mobile App from the Apple App Store or Google Play Store.

40:01 Agents that transact: Introducing Amazon Bedrock AgentCore payments, built with Coinbase and Stripe

Amazon Bedrock AgentCore payments, now in preview, let AI agents autonomously pay for APIs, MCP servers, web content, and other agents using either a Coinbase wallet or Stripe Privy wallet, with spending limits enforced per session to prevent open-ended fund access.
The feature is built on the x402 protocol, an HTTP-native standard where agents handle HTTP 402 Payment Required responses automatically, executing stablecoin micropayments and continuing their task without interrupting the reasoning loop. Fiat payment support is on the roadmap.
Developers configure a funded wallet, set session spending limits, and the platform handles credential management, protocol negotiation, and transaction observability through existing AgentCore logs and traces, reducing what AWS describes as months of custom billing integration work.
The Coinbase x402 Bazaar MCP server is available through the AgentCore gateway, giving agents a discovery mechanism to find and pay for x402-enabled services dynamically rather than requiring developers to hardcode each integration.
The feature is available in preview across four regions: US East N. Virginia, US West Oregon, Europe Frankfurt, and Asia Pacific Sydney.
Pricing is not yet publicly specified, though the use cases described involve micropayments typically under one dollar or fractions of a cent per transaction.

41:11 Justin – “And this is where you think you’re gonna make blockchain purchasing more popular, I don’t know if that’s the case.”

43:37 Announcing Valkey 9.0 for Amazon ElastiCache

Amazon ElastiCache now supports Valkey 9.0, which adds built-in full-text and hybrid search capabilities on top of existing vector similarity search, enabling real-time semantic retrieval and aggregations over terabytes of data with microsecond latency at no additional cost.
A notable performance improvement in 9.0 is up to 40% higher throughput for pipelined workloads, achieved through engine-level optimizations like faster command parsing and improved memory prefetching, which could reduce over-provisioning costs for high-throughput applications.
Two new operational features stand out: hash field expiration lets you apply TTLs to individual fields within a hash rather than the entire key, and multi-database support in cluster mode provides logical namespaces that simplify multi-tenant architectures and migrations from standalone Redis environments.
Valkey 9.0 is available now across all commercial AWS Regions, GovCloud, and China Regions for both node-based clusters and serverless caches, with no additional pricing beyond standard ElastiCache costs.
Existing clusters can be upgraded via the AWS Console, SDK, or CLI.
AWS continues to position Valkey as its recommended ElastiCache engine over Redis, and with 100-plus enhancements in this release, teams evaluating a Redis-to-Valkey migration have a more complete feature set to work with, particularly for AI-driven and real-time use cases.

44:30 Amazon ElastiCache now supports real-time full-text, exact-match, and numeric range search

Amazon ElastiCache now supports full-text, exact-match, and numeric range search directly within the cache layer, eliminating the need for a separate search service and enabling latency as low as microseconds with throughput up to millions of operations per second.
The feature is available at no additional cost for clusters running ElastiCache version 9.0 for Valkey, which AWS positions as its recommended open-source Redis alternative. Existing clusters can be upgraded via the console, SDK, or CLI.
Practical use cases include product inventory lookups, user session filtering, financial transaction range queries, and gaming leaderboards, all scenarios where data changes frequently and search results need to reflect the latest writes immediately.
Developers can combine search types in a single query, for example, filtering by category, price range, and text match simultaneously, which reduces application complexity compared to routing queries across multiple services.
The feature is available across all commercial AWS regions, GovCloud, and China regions, making it broadly accessible for regulated and global workloads without regional limitations.

44:40 Amazon ElastiCache now supports real-time hybrid search with vector and full-text

Amazon ElastiCache now supports hybrid search combining vector similarity and full-text search in a single query, available on ElastiCache for Valkey 9.0 at no additional cost.
This eliminates the need for a separate search service while delivering latency as low as microseconds and up to 99% recall across billions of embeddings.
The feature integrates with popular embedding providers, including Amazon Bedrock, SageMaker, Anthropic, and OpenAI, making it practical for teams already using those services to build RAG systems and AI agent memory without adding infrastructure.
A notable technical detail is that search indexes update in real time as writes complete, meaning applications always query current data rather than relying on batch index refreshes common in traditional search architectures.
Practical use cases include e-commerce product search, where users might combine exact product names with semantic descriptions, and generative AI applications where hybrid retrieval can reduce token costs by surfacing more precise context.
Availability spans all commercial AWS Regions, GovCloud, and China Regions for node-based clusters running Valkey 9.0 or above, with upgrade paths available through the AWS Console, SDK, or CLI.

45:18 Justin – “…Amazon really only provides Nova embedding as like a Bedrock embedding model, unless you go use Coheer or Mistral or any of the others. So it’s definitely something to keep in mind, too. So you might have to bring your own embedding model if you don’t like Nova’s.”

46:30 AWS Capabilities by Region now supports availability notifications

AWS Capabilities by Region in AWS Builder Center now supports availability notifications, letting builders subscribe to alerts when specific services or features launch in their target Regions across 1,500+ services and 37 Regions.
Subscriptions work at the service level, meaning a single subscription to something like Amazon Bedrock automatically covers all underlying features, such as Knowledge Bases and Guardrails, removing the need to track each feature individually.
Notifications come through two channels: real-time in-app alerts within AWS Builder Center and a consolidated weekly email digest, giving builders flexibility in how they stay informed.
Practical use cases include monitoring service parity across Regions, planning migrations, and tracking when specific capabilities land in a target Region before committing to an expansion.
The feature is free for all authenticated users with an AWS Builder ID, with no additional cost to access or manage subscriptions through Settings > Notifications in AWS Builder Center at builder.aws.com/build/capabilities.

46:59 Justin – “…this is great, and something that, as they get more regions out there, becomes a bit of a problem. We used to talk about all the different regions getting services – and we even were talking about them here, even though they were lightning round topic – we were just like, we can’t. It’s ridiculous how these things roll out over time, and what’s available and not available. But having the ability to see what it is, but then now sign up for a notification so I don’t have to go back to the builder center, even better.”

48:53 Introducing Claude Platform on AWS: Anthropic’s native platform, through your AWS account

Claude Platform on AWS gives customers access to Anthropic’s native Claude Platform directly through their AWS account, eliminating the need for separate credentials, contracts, or billing relationships with Anthropic.
AWS is noted as the first cloud provider to offer this native integration.
The service uses IAM credentials and AWS Signature Version 4 for authentication, logs activity to CloudTrail, and bills through AWS Marketplace, meaning teams can manage Claude usage alongside existing AWS governance and cost tracking workflows.
An important technical distinction to understand: Claude Platform on AWS is operated by Anthropic and processes data outside the AWS security boundary, making it different from Claude models on Amazon Bedrock, which stay within AWS infrastructure.
Teams with regional data residency requirements should factor this into their decision.
The service includes access to features like Claude Managed Agents, MCP connector, web search, code execution, and the Files API, with workspaces providing IAM-based access controls and isolation between projects or teams.
Pricing follows Anthropic’s consumption-based model billed through AWS Marketplace, and the service is available across roughly 18 regions spanning North America, South America, Europe, and Asia Pacific.
Teams already using Claude Code or other Anthropic tools can point those clients at their workspace with minimal configuration changes.

49:35 Justin – “This is nice because it’s a little bit better than just having the API. You get all the features that you kind of lose typically by using the Bedrock API with Cloud, so things like Chrome browse modes – that works in this as well. And basically, what this really is is a different way to contract and buy your Cloud managed services through AWS, which I think is handy.”

51:39 AWS Security Agent’s full repository code scanning feature now available in preview

AWS Security Agent now includes full repository code scanning in preview, offering context-aware security analysis that reasons about entire codebases rather than matching individual lines against known vulnerability patterns like traditional SAST tools do.
The scanner operates in four stages: profiling the application to map entry points and trust boundaries, dispatching specialized agents to high-risk components, deduplicating findings, and independently validating each candidate vulnerability before surfacing it to developers.
A notable distinction from existing tools is how findings are structured, with separate Verified and Could not verify sections, so developers know exactly what was confirmed in code versus what depends on runtime or deployment environment factors.
Practical use cases include running scans before penetration tests to clear lower-hanging issues, auditing acquired or open source code without needing institutional knowledge, and surfacing architectural trust boundary issues alongside implementation bugs.
Full repository code review is available now in preview at no additional charge for existing AWS Security Agent customers, with access through the AWS Security Agent console and a quickstart guide here.

52:56 Jonathan – “I wonder what model they’re using underneath, because it’s not Nova. They trained something. I kind of wonder… it must suck that they built a model that’s so good that they can’t sell it to anybody.”

54:56 Microsoft exec Shawn Bice returns to AWS to lead reliability push for AI agents – GeekWire

Shawn Bice is returning to AWS as VP of AI Services to lead the Automated Reasoning Group, reporting to Swami Sivasubramanian, who oversees Agentic AI at Amazon.
Bice previously ran AWS’s database portfolio, including Aurora, DynamoDB, and RDS, before leaving in 2021.
The Automated Reasoning Group focuses on neurosymbolic AI, which combines pattern-matching capabilities with mathematical verification techniques to confirm software is behaving as intended.
The goal is to give businesses stronger guarantees about AI agent behavior before deploying them autonomously.
This hire comes after AWS acknowledged a limited service disruption in February tied to an AI agent making changes without human oversight, which raised questions about reliability controls in agentic systems.
Bice’s background in security at Microsoft, where he oversaw Security Copilot and Sentinel, appears relevant to addressing those concerns.
For AWS customers building or evaluating agentic AI workflows, this signals that AWS is investing in formal verification and trust mechanisms as a differentiator, rather than relying solely on model-level improvements.
Businesses in regulated industries may find this approach particularly relevant when assessing autonomous agent deployments.

GCP

56:44 Google Is Building an AI Agent That Could Be Its Answer to OpenClaw

Google is internally testing an AI agent called Remy, described as a 24/7 personal agent built on Gemini that can take actions on behalf of users rather than just answering questions or generating content.
It is currently in a dogfooding phase with employees using a staff-only version of the Gemini app.
Remy is designed to integrate deeply across Google services, with the ability to monitor for user-defined priorities, handle complex multi-step tasks proactively, and learn user preferences over time.
This goes beyond the existing Agent Mode features already available in Gemini at various subscription tiers.
No public launch timeline has been confirmed, and Google declined to comment on the project.
Google I/O later this month is expected to feature agent-related announcements, though it is unclear if Remy will be part of that.
For GCP customers and enterprise users, a deeply integrated personal agent that connects across Google Workspace and other services could have practical implications for workflow automation, though no pricing or enterprise deployment details are available yet.
The competitive context here is OpenClaw, a viral third-party AI agent that OpenAI moved to acquire talent from earlier this year.
Google’s internal effort signals that autonomous personal agents are becoming a standard product category rather than a niche capability.

57:25 Justin – “I tried to use a lot of Gemini Enterprise every day at the day job, where we’re a big customer of it, and I’m always disappointed in the limited capabilities it has. I hope this comes quickly, because they need much better capabilities here.”

1:00:44 Google’s Gemma 4 AI models get 3x speed boost by predicting future

tokens

Google released Multi-Token Prediction drafters for Gemma 4, offering up to 3x faster token generation through speculative decoding.
The drafter models predict future tokens during idle compute cycles rather than waiting for the main model to process each token sequentially.
The MTP drafter for the E2B model is notably small at 74 million parameters, and it shares the main model’s key value cache to avoid redundant context recalculation. It also uses sparse decoding to narrow down likely token clusters, which contributes to the speed improvement on memory-constrained consumer hardware.
This addresses a specific bottleneck in local AI inference where slow VRAM-to-compute transfers leave processing units idle between tokens. Consumer GPUs lack the high-bandwidth memory found in enterprise hardware, so the drafter fills that gap by doing useful work during those transfer delays.
Gemma 4 runs under an Apache 2.0 license, a more permissive change from the custom license used in prior Gemma versions, which broadens options for developers building commercial or derivative applications.
The MTP drafters are currently labeled experimental, so production use cases should account for that status.
The practical audience here is developers running local inference on consumer or prosumer hardware who want faster generation without upgrading to enterprise accelerators. No additional cost is associated with the drafter models beyond the hardware and compute already in use.

1:02:00 Jonathan – “But it’s only fast if you’ve got spare compute cycles. You know, if you’re to go full capacity, it’s actually a lot slower.”

1:04:18 Gemini 3.1 Flash-Lite is now generally available

Gemini 3.1 Flash-Lite is now generally available on the Gemini Enterprise Agent Platform, positioned as the lowest-latency and most cost-efficient model in the Gemini 3 series, designed for high-volume automated pipelines and agentic tasks like tool calling and orchestration.
Real-world performance metrics from early adopters are notable: Gladly reported roughly 60% lower costs compared to thinking-tier models, with p95 latency around 1.8 seconds for full reply generation and a 99.6% success rate under heavy concurrent load across SMS, WhatsApp, and Instagram channels.
The model supports multimodal inputs, enabling use cases like simultaneous text and image safety checks in gaming platforms and prompt enhancement for image generation pipelines, areas where cost previously limited sophisticated prompt engineering at scale.
Financial services teams are using Flash-Lite for latency-sensitive workflows, including real-time research during live calls, email triage, and high-volume data processing, with Ramp noting it leads on cost, latency, and intelligence tradeoffs across their model stack.
Pricing details are available here, and documentation for getting started is over here.

1:07:16 GKE node startup gets faster

GKE now delivers up to 4x faster node startup times for qualifying nodes in Autopilot mode, addressing cold-start latency that has historically forced teams to over-provision midle compute as a buffer against scaling delays.
The improvement comes from three architectural changes: intelligent compute buffers, fast-starting virtual machines, and a new control plane that allows VMs to resize without rebooting, all applied automatically without any configuration changes from users.
The feature is immediately available for GKE Autopilot workloads on supported hardware, and Standard cluster users can selectively apply it to specific pods using the Autopilot ComputeClass without migrating their entire cluster.
AI inference and GPU workloads stand to benefit most, since faster node provisioning reduces the gap between a traffic spike and when a model can actually serve requests, which directly affects end-user latency and accelerator costs.
Pricing follows existing GKE Autopilot rates with no additional charge for the faster provisioning, meaning the cost benefit comes indirectly through reduced need for idle standby nodes rather than a new pricing tier.

1:08:31 Justin – “The need for instant, on-demand capacity at a Kubernetes node level feels rare to me, unless you’re doing something like agentic training.”

1:09:33 Postgres 18 and Extended Support for legacy versions in AlloyDB

AlloyDB now supports PostgreSQL 18 in general availability, bringing features like B-tree skip scans, parallel GIN index usage, native UUIDv7 support, and virtual generated columns to Google’s managed Postgres service.
Google is introducing Extended Support for older AlloyDB major versions, giving customers up to three years of continued security patches, bug fixes, and SLA coverage beyond community end-of-life dates.
Pricing for Extended Support has not been announced yet, but will carry an additional fee.
Extended Support timelines are automatically applied, starting with PostgreSQL 14 from February 2027 through February 2030, with similar three-year windows rolling forward for versions 15 through 17. Customers can opt out at any time by upgrading to a version still in regular support.
AlloyDB’s in-place major version upgrade path reduces upgrade time to minutes without requiring data migration or connection string changes, which is notable for large multi-tenant environments like UKG’s People Fabric platform that manages thousands of database objects.
AlloyDB’s compute-storage separation architecture offloads logging and maintenance tasks to a dedicated storage layer, with Google citing up to 2x better price-performance compared to self-managed PostgreSQL and elastic storage that scales automatically without pre-provisioning.

1:09:59 Justin – “I can tell you that this is the feature that I hate the most about both Amazon and Google, doing this extended support; basically a taxing process where they start charging you more money because it’s old. It’s kind of annoying. I get why they do it. I mean, you have to maintain it, maintain test harnesses, all that. But I can’t imagine you’re doing that much changing to the orchestration layer. That code doesn’t have to change. It’s really just a way to tax people and make more money on old stuff, in my opinion.”

Azure

1:12:40 Restore a Deleted Logical Server (Preview) – Azure SQL Database

Azure SQL Database now supports soft delete retention for logical servers, currently in preview, allowing deleted servers to be recovered within a configurable window of 1 to 7 days.
This addresses a long-standing gap where accidental deletion of a logical server meant permanent loss of the server configuration and all associated databases.
The feature is particularly relevant for teams running automation, scripted cleanup jobs, or bulk operations where accidental deletion is a realistic risk. It also benefits dev and test environments, where servers are frequently created and destroyed.
Configuration is straightforward through the Azure portal, PowerShell, or Azure CLI, with the SQL Server Contributor role required to enable or use the feature.
One notable portal limitation is that soft delete retention can only be set on existing servers, not during initial server creation.
There are some meaningful limitations to be aware of: restoring a server does not automatically restore managed identities, Customer Managed Key encryption must be reconfigured after restore, and servers protected by the Microsoft Entra-only authentication policy cannot be restored without first removing that policy.
Servers older than two years automatically get a seven-day soft delete retention period, while servers under two years old have the feature disabled by default, so teams should proactively check and configure retention settings rather than assuming protection is in place. No additional pricing details are listed for this feature beyond standard Azure SQL Database costs.
The question really is: what did Matt delete that he was anxious to share this story?

1:14:22 AI Subagents ‘Coming Soon’ to Visual Studio Copilot

Microsoft principal product manager Mads Kristensen announced that Copilot subagents are coming soon to Visual Studio, bringing a feature that has been available in VS Code since around GitHub Universe 2025 to the full Windows IDE.
A subagent is an independent AI agent that handles a focused task, such as auditing config files or reviewing test coverage, and returns only a summary to the main agent, which helps manage context window consumption in large projects like complex .NET solutions.
VS Code’s current implementation supports custom subagents with their own tools, instructions, and model selections, parallel subagent execution, and recursive delegation up to a depth of 5, giving developers a sense of where Visual Studio’s implementation may eventually land.
Visual Studio already has built-in agents for debugging, profiling, testing, and modernization via agent mode, but subagent orchestration, where a parent agent delegates work internally to child agents, is not yet documented as an available Visual Studio feature.
No pricing details or specific release date have been provided beyond the coming soon signal, and no pricing changes are expected since this builds on existing Copilot subscription tiers.

1:14:46 Justin – “I guess I’m happy that you finally got this; it’s been in Claude and all the other tools for a while, so congrats that you finally got what everyone else has.”

1:15:03 Public Preview: Migrate Availability Sets to Virtual Machine Scale Sets

Azure is finally letting you migrate VMs out of Availability Sets into Virtual Machine Scale Sets without nuking and rebuilding your workloads — this has been a long-standing pain point for anyone who stood up infrastructure before VMSS Flex was the recommended pattern.
The scale ceiling alone is worth paying attention to; Availability Sets cap out at 200 VMs, VMSS Flex goes to 1,000, and you get autoscaling, rolling upgrades, and zone-level resiliency that Availability Sets never had.
The migration is VM-by-VM and cancellable at any point, which is the right call for production workloads.
You can validate each machine before moving on, and anything not yet migrated stays put in the original set if you bail.
The portal experience does all-at-once migration, so if you need zero-downtime or rolling migration, you’ll want CLI, PowerShell, or the REST API; an important distinction that’s easy to miss if you just click through the guided flow.
Zonal migration is the recommendation for anything that needs serious resiliency. You get a 99.99% SLA vs 99.95% with fault domains only, and you can optionally resize VMs as part of the zonal move, which is a nice bonus if you’ve been wanting to right-size

After Show

1:18:46 Introducing Googlebook, designed for Gemini Intelligence

Google announced Googlebook, a new laptop category that merges Android and ChromeOS into a single platform, with devices expected from Acer, ASUS, Dell, HP, and Lenovo this fall. No pricing has been announced yet.
The Magic Pointer feature, developed with Google DeepMind, adds contextual Gemini suggestions directly to the cursor, allowing users to interact with on-screen content like dates, images, and text without switching applications.
A Create your Widget feature lets users generate custom desktop widgets through natural language prompts, pulling data from Gmail, Calendar, and web searches into a single personalized dashboard.
Quick Access enables direct browsing of phone files from the Googlebook file manager without manual transfers, and Android phone apps can be used online on the laptop without leaving the current workflow.
This announcement is primarily a consumer hardware story rather than a GCP or enterprise cloud infrastructure development, so GCP-focused listeners should note the Gemini integration angle but temper expectations about direct cloud platform implications until more technical details are released at googlebook.com.

Closing

353: Don't Be Evil Unless the Government Asks Nicely

Wed, 13 May 2026 19:04:02 +0000

Welcome to episode 353 of The Cloud Pod, where the weather is always cloudy! Justin, Ryan, and Matt are in the studio this week and ready to bring you all the latest news, including earnings from the big 3, a new agreement between the DOW and Google (Don’t be Evil), AI Agents, and more OpenClaw news (that your security team may not appreciate). Plus, DataCenters may not be great for the environment. Who knew?

There’s a lot to cover, so let’s get started!

Titles we almost went with this week

Who Let the Bots Out? AI Governance Has No Answer
Microsoft Loses Its OpenAI Monopoly But Keeps the Parking Spot
AWS But Make It Forklifts and Freight
Bezos Built a Money Printer That Prints Data Centers
GPT-5.5 Instant Arrives Faster Than Your Last Existential Crisis
When Your AI Coding Tool Ghosts You for Seven Weeks
No More Goldfish Brain for Your AI Agents
Amazon Quick Connects Everything Except Your Work-Life Balance
AWS WAF Now Knows Which AI Is Crawling Your Stuff
Stop Pushing Broken Code to Staging Like a Caveman
Your AI Agent Called It Needs Automated Therapy
OpenAI Moves In, and AWS Didn’t Even Change the Locks
AI Interviews Candidates So Recruiters Can Nap
Foundry Gives AI Agents Long-Term Memory and a Diary
Cloud Earnings are Up… but some day the Capex Bell will Toll for the AI Reckoning
Who Let the Bots Out? AWS WAF now shows you

A big thanks to this week’s sponsors:

Check out thecloudpod.net/archera to schedule a demo today.

We also wanted to tell you about something coming to the US for the first time — WeAreDevelopers World Congress!

They’ve been doing this in Europe for years, 15,000-plus attendees in Berlin, it’s one of the biggest developer events over there. Coté from Software Defined Talk is actually speaking at their Berlin event this summer, so we’ve got some firsthand context here. In September, they’re launching the North America edition. San José, September 23 to 25. 500-plus speakers, 18 tracks — cloud, infrastructure, DevOps, security, AI, data engineering, all of it. Speakers from Datadog, Honeycomb, Sentry, Google, LinkedIn, and Stack Overflow. Olivier Pomel, Christine Yen, Milin Desai, Kelsey Hightower – plus workshops and masterclasses, not just talks. These are people who know how to do a developer conference at scale. wearedevelopers.us, code DEVPOD26 for 15% off. Group rates on top of that for 4 or more.

Follow Up

It’s Earnings Time!

01:23 Microsoft (MSFT) Q3 earnings report 2026

Microsoft posted Q3 2026 revenue of $82.89 billion, up 18% year over year, with Azure cloud services growing 40%, slightly ahead of analyst expectations in the 38-39% range.
Capital expenditures came in at $31.9 billion, about $3 billion below the analyst consensus of $34.9 billion, contributing to the stock dipping 2% despite the earnings beat, reflecting investor sensitivity to AI infrastructure spending levels.
Microsoft’s annualized AI revenue now stands at $37 billion, up 123% year over year, spanning Azure-hosted AI services and Microsoft’s own AI tools, though the metric excludes some infrastructure workloads, which is worth noting when comparing figures across quarters.
The 365 Copilot commercial seat count grew from 15 million in January to over 20 million by the end of March, indicating continued enterprise adoption of AI productivity add-ons at a pace worth tracking for cloud practitioners evaluating Microsoft’s enterprise AI traction.
Gross margin narrowed to 67.6%, the lowest since 2022, as data center depreciation costs increased, a trend likely to continue across hyperscalers given ongoing infrastructure build-out commitments amid supply chain pressures tied to the Iran conflict.

02:03 Justin – “They’re not spending enough, it’s bad news! They’re spending too much, it’s bad news!”

04:50 Amazon (AMZN) Q1 earnings report 2026

AWS revenue reached $37.59 billion in Q1 2026, growing 28% year over year, which is its fastest growth rate in over three years and came in above analyst expectations of 26% growth.
Amazon‘s capital expenditures hit $44.2 billion in Q1 alone, with a full-year projection of $200 billion, driven primarily by AI infrastructure buildout, including data centers and homegrown chip development.
Free cash flow dropped 95% year over year to $1.2 billion over the trailing twelve months, a direct consequence of AI investment levels, raising questions about when that spending translates to returns.
Amazon has formalized AI partnerships with OpenAI, Anthropic, and Meta, which signals continued infrastructure demand growth and suggests AWS capacity expansion will need to accelerate to support these relationships.
Q2 revenue guidance of $194 to $199 billion came in well above analyst estimates of $188.9 billion, though the wide operating income range of $20 to $24 billion reflects uncertainty likely tied to tariff impacts and variable AI spending timelines.

06:10 Matt – “I know they’re investing, but that’s a massive drop in cash flow year over year.”

07:27 Alphabet (GOOGL) Q1 2026 earnings

Google Cloud posted $20.02 billion in Q1 2026 revenue, a 63% year-over-year increase, with enterprise AI solutions cited as the primary growth driver for the first time.
The unit now carries a $460 billion backlog, signaling sustained demand well into future quarters.
Sundar Pichai noted Alphabet is compute-constrained in the near term, stating cloud revenue would have been higher if supply could meet demand. This is a notable signal for cloud customers who may be experiencing capacity limitations on GCP.
Alphabet raised its 2026 capital expenditure guidance to $180-190 billion, with the CFO indicating 2027 CapEx will increase further.
The $35.7 billion spent in Q1 alone on servers, data centers, and infrastructure reflects the scale of investment required to support AI workloads.
Gemini Enterprise paid monthly active users grew 40% quarter over quarter, suggesting enterprise adoption of AI tooling on Google’s platform is accelerating at a meaningful pace.
Waymo surpassed 500,000 fully autonomous rides per week and is expanding to additional U.S. cities, while its recent $16 billion fundraising round valued it at $126 billion, keeping it a notable component of Alphabet’s longer-term infrastructure story.

08:58 Alphabet earnings call, Q1 2026: Sundar Pichai’s remarks

Google Cloud revenue hit $20 billion for the first time this quarter, growing 63% year-over-year, with backlog nearly doubling to over $460 billion.
Enterprise AI solutions became the primary Cloud growth driver, with revenue from gen AI model-based products growing nearly 800% year-over-year.
Google introduced eighth-generation TPUs split into two specialized variants: TPU 8t for training with three times the processing power of Ironwood, and TPU 8i for inference with 80% better performance per dollar than the prior generation. Google also announced plans to deliver TPUs to select customers in their own data centers, expanding the addressable market beyond hosted cloud.
The new Gemini Enterprise Agent Platform adds capabilities like Projects, Canvas, Long Running Agents, and Skills, with paid monthly active users growing 40% quarter-over-quarter. Partner-driven adoption grew 9x year-over-year in seats sold, signaling that the channel is becoming a meaningful distribution path.
Google introduced an Agentic Data Cloud combining a cross-cloud Lakehouse, Knowledge Catalog, and Deep Research Agents, with Gemini-powered workflows in BigQuery growing over 30x year-over-year.
American Express and Vodafone are cited as early customers using this for production workloads at scale.
The Wiz acquisition closed in March and is already integrated with Google Threat Intelligence and Security Operations, with new Gemini-powered agents covering threat detection, red teaming, and automated remediation.
Customers like Deloitte, Priceline, and Shell are listed as early adopters of the combined security offering.

09:36 Justin – “Overall, Google did very well, analysts were very happy.”

Cable Corner

10:26 Crucial Taiwan undersea cable severed by old shipwreck — backup microwave communications activated to keep population connected

Taiwan’s Dongyin island lost its undersea cable connection after a seafloor shipwreck shifted during bad weather, prompting activation of backup microwave communications for the island’s 1,500 residents, though with weather-dependent latency.
This incident reinforces a known infrastructure reality: physical undersea cables remain the primary backbone for reliable, high-bandwidth connectivity, while wireless alternatives like microwave links and LEO satellites serve only as degraded fallbacks.
Taiwan currently monitors 24 undersea cable links around the main island and has blacklisted 96 vessels suspected of connections to China, reflecting how nations are treating cable infrastructure as a critical security perimeter rather than purely a commercial asset.
For cloud and enterprise architects, this is a practical reminder that multi-region redundancy strategies need to account for physical cable route diversity, not just logical network paths, since multiple cables can share the same physical seafloor corridor.
Taiwan has increased criminal penalties for cable sabotage to up to 7 years imprisonment and $325,000 in fines, signaling that governments are beginning to treat undersea cable protection with the same legal seriousness as other critical infrastructure.

11:26 China tests deep-sea electro-hydrostatic actuator that can cut undersea cables at a depth of 3,500 meters — state hails successful trial and hints at deployment readiness

China successfully tested a deep-sea electro-hydrostatic actuator capable of cutting undersea cables at depths of 3,500 meters, roughly 11,500 feet, which represents a notable extension of previous capabilities that topped out around 2,000 feet.
The device combines hydraulics, an electric motor, and a control unit into a single compact system, eliminating the need for external oil piping and making it more practical for deep-sea deployment from research vessels.
The practical efficiency gains are measurable: a 2022 pipeline cut took five hours for a single 18-inch pipe, while by 2023, remotely operated vessels could cut 38-inch pipes in 20 minutes, illustrating rapid operational improvement.
Undersea fiber-optic cables carry the majority of global internet traffic and financial data, meaning any credible threat to this infrastructure has direct implications for cloud connectivity, data sovereignty, and business continuity planning.
Cloud providers and enterprises with latency-sensitive workloads dependent on specific undersea cable routes should be aware that geopolitical pressure on this infrastructure is increasing, with incidents already documented in the Red Sea and Baltic Sea regions.

11:32 Justin – “Shipwreck or China? You answer the question…”

13:14 You can use Linux 7.0 on these 7 distros today – here’s what to expect

Linux 7.0 is a version number reset for simplicity, not a milestone release, similar to when Torvalds jumped from 3.x to 4.0 in 2015 to avoid unwieldy version strings.
Rust support is now officially stable in the kernel after five years of incremental work, with native build tooling support for x86_64, ARM, and RISC-V architectures, which has direct implications for system security and memory safety.
The revamped scheduler introduces lazy preemption by default and adaptive scheduling domains, which should improve throughput for containerized cloud workloads and reduce latency on hybrid CPU architectures like Intel Alder Lake.
AI tooling is now a recognized part of the Linux development workflow, with Torvalds and stable kernel maintainer Greg Kroah-Hartman both noting a notable improvement in the quality of AI-generated bug reports reaching the kernel team.
Cloud and enterprise users can test 7.0 today through rolling-release distros like Arch Linux and openSUSE Tumbleweed, with Ubuntu 26.04 LTS and Fedora 44 expected to ship it within weeks.

14:13 Justin – “Rust. Rust is the big thing, because now you get a C++ compiled binaries and the core parts of the kernel. This should be a huge improvement to availability, reliability, and, potentially, security as well, as long as that was handled well.”

15:35 Greenhouse gases from data center boom could outpace entire nations

Just 11 data center campuses in the US are linked to natural gas projects permitted to emit up to 129 million metric tons of greenhouse gases per year, which exceeds the annual emissions of countries like Morocco or Norway, even at half capacity.
Behind-the-meter power, where data centers generate their own electricity rather than drawing from the grid, has grown from 4 gigawatts in early 2024 to nearly 100 gigawatts in the US development pipeline by early 2026, driven largely by grid connection delays and utility cost concerns.
Unlike traditional grid-connected power plants that cycle down based on demand, data center power plants run at near-constant load, meaning actual emissions are likely to be much closer to permitted maximums than the industry standard two-thirds reduction estimate companies often cite.
Major AI companies, including Meta, Microsoft, OpenAI, and xAI have made public carbon reduction commitments, but the scale of these gas projects could offset years of stated emissions progress, with Meta’s Ohio projects alone potentially erasing over 10 percent of its claimed four-year emissions reductions.
Air permits do not guarantee construction, turbine shortages are a real constraint, and several high-profile projects like Fermi face leadership and financial instability, so the full emissions scenario may not materialize, but the trend toward fossil-fuel-backed AI infrastructure raises long-term questions for cloud providers with sustainability commitments.

16:52 Ryan – “My sci-fi fueled narrative in my head is oh, this is how the world ends.”

20:18 An update on GitHub availability

GitHub‘s CTO published a transparency post acknowledging two recent incidents and outlining a scaling plan that has grown from a 10X capacity target in October 2025 to a 30X target by February 2026, driven by rapid growth in agentic development workflows since late 2025.
The April 23 merge queue incident caused incorrect merge commits for squash merges in groups of more than one pull request, affecting 658 repositories and 2,092 pull requests, with no data loss, but incorrect default branch states that could not all be repaired automatically.
The April 27 incident involved an Elasticsearch cluster becoming overloaded, likely from a botnet attack, which disrupted search-backed UI experiences across pull requests, issues, and projects.
GitHub acknowledged that this system had not yet been fully isolated as part of their reliability prioritization work.
GitHub is addressing scaling challenges through several technical approaches, including moving webhooks out of MySQL, redesigning session caching, migrating performance-sensitive code from a Ruby monolith to Go, isolating critical services like Git and Actions, and pursuing a multi-cloud strategy beyond their current Azure migration.
GitHub updated its status page to include availability metrics and committed to reporting both large and small incidents, responding to developer feedback about needing better transparency during disruptions.
https://www.theregister.com/2026/04/29/mitchell_hashimoto_ghostty_quitting_github/
- Hashicorp co-founder Mitchell Hashimoto says GitHub ‘no longer a place for serious work’.

23:29 Matt – “I’ve definitely been bit by some of these; especially the search one was multiple days, and you couldn’t find anything, you couldn’t just load up pull requests because that’s a search technically… so every feature was hung for a couple of days.”

28:49 The most severe Linux threat to surface in years catches the world flat-footed

CVE-2026-31431, dubbed CopyFail, is a local privilege escalation vulnerability affecting virtually all Linux distributions, allowing unprivileged users to gain root access with a single Python script that requires no modification across distros.
The exploit is particularly relevant to cloud environments because it can be used to break out of Kubernetes containers, compromise multi-tenant systems, and inject malicious code through CI/CD pipelines.
The kernel patches exist across multiple versions, including 6.12.85, 6.6.137, and 5.15.204, but most Linux distributions had not incorporated those fixes at the time the exploit code was publicly released, leaving a substantial window of exposure.
Confirmed vulnerable distributions include Ubuntu 22.04, Amazon Linux 2023, SUSE 15.6, and Debian 12, meaning cloud workloads running on major providers are directly at risk until patches are applied.
The five-week gap between private disclosure and public exploit release, combined with slow distribution-level patching, highlights an ongoing coordination challenge in the Linux security ecosystem that cloud operators need to account for in their patch management processes.

29:58 Justin – “You do need to patch this as quickly as possible. It is bad.”

AI Is Going Great – Or How ML Makes Money

31:39 GitHub will start charging Copilot users based on their actual AI usage

GitHub Copilot is shifting to usage-based billing starting June 1, replacing the current flat “premium requests” model with AI Credits that map 1:1 to monthly subscription costs, with overages billed by token consumption across input, output, and cached tokens.
The pricing variation is substantial depending on model choice, with OpenAI GPT output tokens ranging from $4.50 to $30 per million tokens, meaning a developer using GPT-5.5 for agentic tasks could see meaningfully higher costs than one using lighter models for simple completions.
Basic features like code completion and Next Edit suggestions remain outside the credit system entirely, but Copilot code reviews will now consume GitHub Actions minutes, adding another cost dimension for teams running automated review workflows.
This shift reflects a broader cloud infrastructure reality: multi-hour autonomous coding sessions consume substantially more compute than a single chat query, and flat-rate pricing becomes difficult to sustain as agentic AI workloads grow in frequency and complexity.
For development teams, the practical implication is that AI spending will now require the same cost governance as other cloud services, with model selection and session length becoming factors in budget planning rather than just feature preferences.

32:57 Justin – “…this is probably the biggest gap in most of the platforms we’re seeing – is that cost visibility is very problematic, and what people use on that is a big issue.”

39:29 The AI Agent Identity Problem: Why Governance Is the Missing Layer in Enterprise AI

The core issue Snowflake raises is that AI agents lack persistent, verifiable identities, meaning when an agent queries data, initiates a transaction, or produces a derived insight, there is often no audit trail linking the action to defined authorization or scope.
Snowflake argues governance must be embedded at agent creation, not added later, with explicit permissions, expiration windows, and scoped access that does not simply inherit from the invoking user’s credentials.
A notable technical concern is the derived insight problem, where an agent authorized to access HR data and financial data separately may not be authorized to combine them, and current access controls on source data alone do not address this boundary.
Snowflake’s internal Go-To-Market AI Assistant serves as a practical reference point, using role-based access, certified queries, and defined scope at creation to support over 6,000 employees answering 35,000 questions per week with full auditability.
For enterprises in regulated industries like financial services or healthcare, the absence of agent identity infrastructure creates concrete compliance exposure, specifically the inability to reconstruct what data an agent accessed, under whose authority, and whether its output stayed within approved scope.

42:06 How Anthropic’s silence fueled a Claude Code trust crisis

Anthropic confirmed three product-level issues degraded Claude Code performance over seven weeks starting March 4, including a reasoning effort downgrade from high to medium, a bug discarding reasoning history mid-session, and a system prompt capping responses at 25 words between tool calls.
The issues were fixed as of April 20, and Anthropic published a postmortem, but the seven-week gap between the first issue shipping and any public explanation led to significant user backlash, subscription cancellations, and speculation across GitHub, Reddit, and X.
A notable analysis by an AMD senior director of AI examined 6,852 Claude Code session files and 234,760 tool calls, concluding Claude shifted from a context-gathering approach to a faster edit-first style that increased error rates on complex engineering tasks.
The incident highlights a practical risk for teams building workflows on top of AI coding tools: undocumented behavioral changes cascade into downstream systems, delivery commitments, and developer trust before any official acknowledgment arrives.
Industry observers are calling for real-time communication standards, including status page updates and in-product notices during incidents, arguing that postmortems alone are insufficient when developers cannot distinguish between model issues, prompt problems, or toolchain failures.

43:53 Ryan – “I call BS; I’ve had issues much later than April 20th. It always seems to come up right around when they’re releasing a new model.”

46:56 OpenAI ends its exclusive partnership with Microsoft

OpenAI and Microsoft have amended their partnership agreement to make Microsoft’s license to OpenAI’s IP and models non-exclusive, allowing OpenAI to offer its models through other major cloud providers beyond Azure.
Azure retains the designation of primary cloud partner through 2032, but that status is conditional on Microsoft’s ability to continue honoring the arrangement, which introduces some ambiguity worth watching.
The revenue share structure changes notably: OpenAI will continue paying Microsoft 20 percent of revenue, but that obligation is now capped at an unspecified amount and only guaranteed through 2030 rather than running indefinitely.
The removal of the AGI clause is a meaningful structural change, as the revenue share is now independent of OpenAI’s technology progress, eliminating a previously contentious trigger that could have ended exclusivity based on a hard-to-define benchmark.
For developers and businesses, this opens the door to accessing OpenAI models through providers like AWS or Google Cloud, which could affect pricing, latency options, and procurement decisions depending on where workloads already live.

47:52 Matt – “I feel like whoever wrote this contract – either it was done so long ago that the concepts that they were running into didn’t exist, or did a really bad job also negotiating it. Contracts should have details, and metrics, and very defined things, but maybe it wasn’t plausible back then.”

24:35 Meta abandons open-source Llama for proprietary Muse Spark

Meta has introduced Muse Spark, a proprietary cloud-only LLM built from scratch with new infrastructure and architecture, developed by the newly formed Meta Superintelligence Labs.
Unlike Llama, Muse Spark offers no downloadable weights, no self-hosting capability, and is currently limited to private API preview access.
Existing Llama models will remain available on major cloud providers but are expected to receive only incremental maintenance updates, with no continued frontier-level investment.
This affects a substantial user base, as Meta reported 1.2 billion Llama downloads before the pivot.
There is no direct migration path from Llama to Muse Spark due to fundamentally different deployment models, and switching to alternative providers requires rewriting vendor-specific APIs, adapting training data, and rebuilding custom tooling, which carries substantial cost and effort.
Developers looking to stay in the open-weights ecosystem have three practical options: continue using existing Llama models, knowing they will fall behind frontier competitors, switch to alternatives like Mistral, DeepSeek, or Alibaba Qwen, or migrate to proprietary APIs from OpenAI, Google, or Anthropic.
Several Llama forks provide a lower-friction path forward, including llama.cpp for local inference, the performance-focused ik_llama.cpp, the Rust-based llama-rs, and OpenLLaMA, an Apache-licensed reproduction of the original Llama models available in 3B, 7B, and 13B parameter versions trained on 1 trillion tokens.

50:53 Ryan – “Llama seemed to fill a large gap, right? Qwen, I see a lot of, but I don’t see Mistral very much. And so like, it’s kind of nuts for local stuff. And if you don’t want to pay huge amounts of money and you want something that’s a little bit more open source, it sucks if there’s not a real option that really can replicate what you’re experiencing with a commercial-grade one.”

53:11 GPT-5.5 Instant: smarter, clearer, and more personalized

OpenAI released GPT-5.5 Instant as the new default model for all ChatGPT users, replacing GPT-5.3 Instant, and it is also available in the API as chat-latest.
Paid users retain access to GPT-5.3 Instant for three months before it is retired.
The hallucination reduction numbers are worth noting: GPT-5.5 Instant produced 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes prompts in medicine, law, and finance, and reduced inaccurate claims by 37.3% on conversations flagged for factual errors.
The model includes improvements in visual reasoning, math, STEM questions, and smarter decisions about when to invoke web search, making it more capable across the kinds of tasks that everyday users actually run into.
Personalization gets a notable upgrade with faster retrieval from past chats, uploaded files, and connected Gmail, plus a new memory sources feature that shows users exactly what context shaped a response and lets them delete or correct it.
For developers and businesses, the API availability as chat-latest means these factuality and personalization improvements roll in automatically, though teams relying on consistent behavior may want to pin to a specific model version, given the default is now changing.

AWS

56:40 Start using Amazon Quick for free in minutes with Free and Plus pricing plans

Amazon Quick is an AI assistant that connects to your apps, tools, and data to answer questions and take actions on your behalf, including scheduling meetings, sending emails, and following up on tasks, with role-specific workflows for sales, marketing, finance, and operations.
A new Free plan lets users sign up in minutes using a personal email or existing Google, Apple, GitHub, or Amazon credentials with no AWS account required, lowering the barrier to entry compared to most AWS services.
The personal knowledge graph feature is notable because it learns individual user priorities and preferences over time, grounding responses in real business data rather than generic AI outputs.
Pricing tiers include Free, Plus, Professional, and Enterprise plans, with higher tiers adding agentic and business intelligence capabilities, enterprise governance, and unlimited user support.
The no-AWS-account signup model positions Quick as a standalone SaaS product rather than a traditional AWS service, which is a meaningful shift in how AWS is packaging and distributing AI tooling to business users.

58:31 Amazon Quick now available as a desktop application for macOS and Windows

Amazon Quick, AWS’s AI assistant, is now available as a native desktop app for macOS and Windows in preview, extending its capabilities beyond the browser to include direct local file access, OS-level notifications, and desktop application automation without requiring file uploads.
The app builds a personal knowledge graph that accumulates context across files, calendar, communications, and applications over time, with memory and context syncing across both web and desktop surfaces so users maintain continuity between environments.
For developers, the desktop app adds support for local Model Context Protocol (MCP) connections to coding agents, making it relevant to builders who want AI assistance integrated into local development workflows.
Availability is currently limited to existing Quick subscribers in US East (N. Virginia), with a free tier available to get started at aws.amazon.com/quick/download, though broader regional expansion details have not been announced.
The practical use case here is an AI assistant that can act on local context proactively, such as flagging calendar conflicts or surfacing action items from communications, which positions it as a productivity tool for teams rather than just a query-response interface.

59:16 Ryan – “I was just doing a query on Nova about what the difference is between Amazon Nova and Quick, just because I wanted to get it. And it failed, like you’d expect.”

1:00:06 Amazon Quick now supports document and visual creation in chat

Amazon Quick now lets users create Word documents, PDFs, PowerPoint presentations, and Excel spreadsheets directly within a chat conversation, removing the need to switch between tools for drafting and formatting work.
The service also generates images, infographics, and charts that can be embedded in documents or exported as standalone files, though visual creation is currently in preview and limited to US East (N. Virginia) and US West (Oregon) regions.
Document creation is available across all AWS regions where Quick is supported, and the service offers a free tier with no AWS account or credit card required, making it accessible for teams to evaluate without upfront commitment.
Practical use cases include converting meeting notes into executive briefings, building sales trend decks, and producing data-driven infographics, positioning Quick as a productivity tool for business analysts, finance, and marketing teams.
This fits into AWS’s broader push to embed AI assistance into everyday business workflows, competing with similar document generation features found in Microsoft Copilot and Google Workspace AI tools.

1:00:16 Amazon Quick expands integrations to include Google Workspace, Zoom, Airtable, and more

Amazon Quick now supports 13 new built-in action connectors, including Gmail, Google Sheets, Google Calendar, Google Drive, Zoom, QuickBooks, Airtable, and Dropbox, with managed authentication handling account authorization without manual credential setup.
The service positions itself as a unified AI assistant that can take actions across tools on a user’s behalf, going beyond answering questions to scheduling meetings, updating spreadsheets, sending emails, and managing files directly.
The managed authentication model is worth noting for enterprise security teams, as Quick handles the OAuth authorization flow rather than requiring users to input or store credentials manually within the platform.
Amazon Quick is available with a free tier signup through the AWS portal, making it accessible for teams to evaluate before committing to broader deployment across an organization.
For AWS customers already using services like QuickSight or other business intelligence tools, Quick represents a more action-oriented layer on top of data, though teams should evaluate how it fits alongside existing workflow automation tools like AWS AppFlow or third-party options.

1:00:57 AWS Announces Amazon Connect Decisions

Amazon Connect Decisions is now generally available as an AI-driven supply chain planning tool, combining demand forecasting, constraint-aware supply planning, and automated exception triage into a single solution targeting retail, CPG, automotive, and industrial manufacturing sectors.
The service positions itself as an overlay on existing systems rather than a replacement, which lowers the adoption barrier for enterprises that have already invested heavily in ERP or legacy supply chain infrastructure.
AI teammates run continuously to harmonize demand signals, perform root cause analysis, and surface prioritized recommendations, reducing the manual effort typically required to manage thousands of supply chain exceptions.
The practical business outcomes AWS highlights include preventing stockouts and reducing working capital waste, which are measurable operational goals that supply chain teams can use to justify adoption.
Availability is currently limited to US East (N. Virginia) and Europe (Ireland) regions, with a free trial offered at aws.amazon.com/products/connect/decisions. Pricing details are not publicly listed, so prospective customers will need to engage AWS directly for cost information.

1:01:27 Amazon Connect Talent for AI-powered hiring

Amazon Connect Talent extends the existing Connect contact center platform into the hiring space, using AI agents to conduct structured voice interviews and score candidates consistently, which reduces recruiter workload during high-volume hiring periods.
The system draws on Amazon’s internal hiring practices to power adaptive questioning and science-backed assessments, aiming to bring more consistency to candidate evaluation compared to traditional recruiter-led screening calls.
Preview capabilities include ATS integrations, a mobile-first candidate portal, and the ability to evaluate hundreds of candidates simultaneously, making it relevant for organizations that experience seasonal or surge-based hiring needs like retail, logistics, or call centers.
Currently available only in US East (N. Virginia) and US West (Oregon), with no public pricing announced yet for the Preview period, so organizations interested in cost modeling will need to request access through the Amazon Connect Talent page to get details.
One practical consideration worth discussing is the regulatory and bias-risk landscape around AI-led hiring tools, since automated candidate scoring systems are subject to increasing scrutiny from employment regulators, particularly in jurisdictions with AI hiring laws like New York City.

1:02:18 Justin – “…if you make me do an AI hiring tool, I probably will not continue on the interview process because it sounds terrible.”

1:04:14 OpenAI models, Codex, and Managed Agents come to AWS

OpenAI and AWS are expanding their partnership to bring OpenAI models, including GPT-4.5, to Amazon Bedrock in a limited preview, giving enterprises a path to use OpenAI capabilities within existing AWS security controls, identity systems, and procurement workflows.
Codex, OpenAI’s coding agent used by over 4 million people weekly, can now be configured to run on Amazon Bedrock as the model provider, meaning usage counts toward AWS cloud spending commitments and customer data stays within Bedrock infrastructure.
Initial integrations include Codex CLI, the desktop app, and a VS Code extension.
Amazon Bedrock Managed Agents powered by OpenAI is a new offering that handles orchestration, tool use, and governance for multi-step agentic workflows, reducing the infrastructure work required to move agents from prototype to production.
All three capabilities launch today in limited preview, so availability is not yet general, and pricing details have not been publicly disclosed beyond the note that Codex usage can apply toward existing AWS cloud commitments.
The practical significance for enterprises is consolidation: teams can access OpenAI models through the same Bedrock API, billing, and compliance controls they already use for other foundation models, rather than managing a separate OpenAI account and data processing agreement.

1:05:18 Ryan – “I’m starting to like this model more and more, just because it’s something that a lot of enterprises already have, which is a cloud ecosystem, especially with Amazon and Bedrock, them releasing the sort of visualization of the IAM identities behind some of the usage on Bedrock is super powerful. I kind of like it. So this one sounds like it’s a little bit more full-featured than what I’ve seen on similar things from Vertex AI, with managed agents and to be able to orchestrate multiple like Codex things. So that’s kind of neat.”

1:05:56 Amazon CloudWatch adds visual agent configuration to the EC2 console

AWS added a visual configuration editor for the CloudWatch agent directly in the EC2 console, letting users set up metrics, log sources, and deployment targets without manually editing JSON configuration files.
The feature supports tag-based policies for automated fleet-wide management, meaning new instances launched via auto-scaling automatically receive the correct monitoring configuration without manual intervention.
From the instance detail page, operators can view agent status, update configurations, and troubleshoot agent health in one place, consolidating observability management that previously required separate tooling or CLI work.
The visual editor is available in all AWS Commercial Regions at no additional cost for the management experience itself, though standard CloudWatch pricing still applies for the metrics, logs, and traces the agent collects.
This is a practical quality-of-life improvement for teams managing large EC2 fleets who want consistent observability coverage without maintaining custom automation or requiring deep familiarity with the CloudWatch agent JSON schema.

1:06:33 Justin – “Having troubleshot CLI-level CloudWatch stuff many times, thank God.”

1:08:33 Amazon’s trying to turn its massive shipping operation into another AWS

Amazon Supply Chain Services (ASCS) opens Amazon’s fulfillment network to outside businesses across automotive, healthcare, electronics, apparel, and food industries, directly competing with DHL, UPS, and FedEx. Companies can store inventory in Amazon fulfillment centers globally and access its fleet of trucks, aircraft, and delivery vehicles.
The service expands on the Supply Chain by Amazon offering launched in 2023, which initially focused on shipping products directly from factories.
ASCS broadens this to include freight, distribution, fulfillment, and parcel shipping for businesses of all sizes.
Early adopters include Procter & Gamble, 3M, Lands’ End, and American Eagle Outfitters, suggesting the service is targeting established enterprises rather than just small sellers. Pricing details have not been publicly disclosed at launch.
The parallel to AWS is worth noting for cloud practitioners: Amazon built internal infrastructure at scale, then monetized it as a third-party service, the same model it used when opening its web infrastructure to outside customers in 2006. ASCS follows that same pattern with physical logistics.
For companies already using AWS cloud services, ASCS could represent an opportunity to consolidate both digital and physical supply chain operations under one vendor relationship, though integration details between ASCS and existing AWS supply chain software tools have not been specified.

1:12:09 Justin – “Potentially this is really cool.”

1:12:15 Introducing the agent quality loop: AgentCore Optimization now in preview

AWS launched AgentCore Optimization in preview, adding automated recommendations, batch evaluation, and A/B testing to close the observe-evaluate-improve loop for AI agents running on Amazon Bedrock AgentCore.
Previously, developers had to manually read traces and guess at prompt fixes without systematic data-backed evidence.
The recommendations feature analyzes production traces from CloudWatch Log groups and proposes changes to system prompts or tool descriptions based on a specified evaluator, without touching underlying tool implementations.
This targets a common pain point where agent quality drifts as models evolve and user behavior shifts over time.
Configuration bundles are immutable, versioned snapshots of an agent’s model ID, system prompt, and tool descriptions, allowing prompt and model swaps as configuration changes rather than code deployments. This integrates with CI/CD pipelines through batch evaluation before any change reaches production.
A/B testing runs through AgentCore Gateway, splitting live production traffic at configurable percentages between control and treatment variants, and reports results with confidence intervals and p-values. Teams can promote the winning variant or roll back by pausing the test, which reverts the agent to its prior configuration.
The preview is currently developer-triggered and available in AWS regions where AgentCore Evaluations is supported.
AWS has indicated future plans to automate the loop further, including monitoring alarms that trigger recommendations when evaluator scores drop below a threshold, with results landing in a human review queue before deployment.

1:13:05 AWS Lambda adds support for Ruby 4.0

AWS Lambda now supports Ruby 4.0 as both a managed runtime and container-based image, giving Ruby developers access to the latest language features in a serverless context without managing runtime infrastructure.
Ruby 4.0 is an LTS release supported for security and bug fixes until March 2029, meaning Lambda customers can build on this runtime with a reasonable expectation of stability and patch coverage.
The new runtime adds support for Lambda advanced logging controls, enabling JSON-structured logs, configurable log levels, and custom CloudWatch log group targeting, which simplifies observability for Ruby-based functions.
Deployment works with the full suite of AWS tooling, including SAM, CDK, CloudFormation, and the CLI, so teams already using these tools can adopt Ruby 4.0 without changing their deployment workflows.
The runtime is available globally, including China and GovCloud regions, and Lambda pricing remains the same pay-per-invocation model based on requests and duration, with a free tier of 1 million requests and 400,000 GB-seconds per month.

1:13:35 Justin – “If I thought I wanted to put myself for the dead language, I would go be really excited about this. But I’m happy at least it’s available if I ever need it.”

1:14:00 AWS IAM now provides higher maximum quotas for roles, role trust policies, instance profiles, managed policies, and identity providers

AWS IAM has doubled several key quotas, including roles, customer-managed policies, and instance profiles from 5,000 to 10,000 per account, and OpenID Connect providers saw a substantial jump from 100 to 700 per account.
The role trust policy length increase from 4,096 to 8,192 characters is particularly useful for organizations with complex cross-account or federated access patterns that require detailed condition blocks.
These increases are not automatic maximums but adjustable limits, meaning customers still need to request increases through the Service Quotas console in US East (N. Virginia) for commercial regions, or via AWS Support for GovCloud and China regions.
The OIDC provider limit increase from 100 to 700 is notable for organizations managing multiple Kubernetes clusters or CI/CD pipelines, where each cluster or provider typically requires its own OIDC endpoint.
There is no additional cost associated with these quota increases, as IAM itself remains free, though the expanded limits allow larger organizations to avoid architectural workarounds like multi-account sprawl that were previously needed to stay within IAM constraints.

1:14:48 Ryan – “This is the agent identity problem, right? I think that they’re getting ahead of it, especially the OIDC provider limit. I think you’re going to have a whole bunch of agent apps that are handling that OIDC flow or authenticating into Amazon using OIDC. So this is going to be something that you’ll see more of.“

1:15:35 Modernize your workflows: Amazon WorkSpaces now gives AI agents their own desktop

Amazon WorkSpaces now supports AI agents operating virtual desktops in public preview, allowing agents to interact with legacy desktop applications through mouse clicks, keyboard input, and screenshots without requiring any API integration or application modernization.
The feature addresses a real enterprise problem: according to a 2024 Gartner report, 75% of organizations run legacy applications without modern APIs, meaning AI agents previously had no practical way to automate workflows in those environments.
Authentication runs through IAM, and full audit trails are available via CloudTrail and CloudWatch, which makes this particularly relevant for regulated industries that need governance and compliance controls around agent activity.
The implementation uses the Model Context Protocol (MCP) standard, so the feature works with popular agent frameworks like LangChain, CrewAI, and Strands Agents, and agents connect to a managed MCP endpoint exposed by the WorkSpaces stack.
The feature is available at no additional cost during public preview across 11 regions, including US East, US West, Canada, several European regions, and multiple Asia Pacific locations, with sample code available on GitHub at github.com/aws-samples/sample-code-for-workspaces-agent-access.

1:17:39 Introducing AI traffic analysis dashboards for AWS WAF

AWS WAF now includes an AI Traffic Analysis dashboard that tracks over 650 unique bots and agents, giving organizations visibility into which AI companies are accessing their content, what those bots are doing, and which endpoints they target most frequently.
This matters because AI agents reportedly represent 30-60% of total web traffic for many organizations, creating real infrastructure cost implications.
The dashboard integrates with existing AWS WAF Bot Control and pulls near real-time data from CloudWatch metrics, covering bot identity, intent classification, access patterns, and 14 days of historical trends. No separate setup is required since it populates automatically once AI bot traffic is detected.
A new GetTopPathStatisticsByTraffic API action lets teams query AI bot traffic programmatically, enabling custom dashboards, automated alerting, and integration with business intelligence pipelines for decisions around content access and monetization.
AWS published a reference architecture on GitHub that combines WAF Bot Control with the x402 payment protocol and Lambda@Edge to charge AI bots per-path at the edge without modifying origin infrastructure, which is a notable use case for organizations looking to monetize content consumed by AI crawlers.
Pricing is straightforward: the dashboard is included at no additional cost for existing AWS WAF customers, and is also bundled into CloudFront flat-rate pricing plans.

GCP

1:19:39 Google and Pentagon reportedly agree on deal for ‘any lawful’ use of AI | The Verge

Google has signed a classified deal with the US Department of Defense allowing use of its AI models for “any lawful government purpose,” placing it alongside OpenAI and xAI, which have made similar arrangements with the Pentagon.
The agreement includes non-binding language stating Google AI should not be used for domestic mass surveillance or autonomous weapons without human oversight, but the contract explicitly states Google has no right to veto or control lawful government operational decisions.
The deal also requires Google to assist in adjusting its AI safety settings and filters at the government’s request, which raises questions about how its standard model guardrails will be maintained across commercial and government deployments.
For GCP enterprise customers, this is framed as an amendment to an existing government agreement rather than a new standalone contract, suggesting Google is expanding its existing cloud and AI footprint within federal agencies.
Anthropic serves as a notable contrast here, having been blacklisted by the Pentagon for refusing to remove weapon and surveillance-related guardrails, which illustrates the tradeoffs AI providers face when pursuing government contracts at scale.

1:19:49 Justin – “So apparently ‘do no evil’ is no longer applying to military use case scenarios.”

1:21:08 You can now generate files in Gemini

Gemini can now generate downloadable files directly from chat prompts, supporting a broad range of formats, including PDF, DOCX, XLSX, CSV, Google Docs, Sheets, Slides, LaTeX, RTF, TXT, and Markdown, removing the need to copy and reformat content manually.
The feature is available to all Gemini app users globally at no additional cost beyond existing Gemini access, with outputs downloadable to local devices or exportable directly to Google Drive.
For GCP and Workspace customers, this tightens the loop between AI-assisted drafting and actual deliverables, making Gemini more practical for business workflows like budget proposals, reports, and collaborative documents.
The multi-format support is notable because it bridges Google Workspace and Microsoft Office ecosystems in a single interface, which reduces friction for organizations running mixed productivity environments.
Practical use cases include consolidating meeting notes into a single-page PDF, generating structured spreadsheets from raw data prompts, or producing presentation drafts without switching between applications.

1:21:33 Ryan – “Really handy – you no longer have to copy and paste everything.”

1:21:40 Introducing Agent Gateway ISV ecosystem for security and governance

Agent Gateway, part of the Gemini Enterprise Agent Platform, provides a programmable data plane that sits in the request path for all agent traffic, covering user-to-agent, agent-to-agent, and agent-to-tool interactions, including MCP calls.
Google announced a partner ecosystem of 14 security vendors integrating with Agent Gateway, covering identity governance (Okta, Ping Identity, Saviynt, Silverfort), DLP (Symantec, Netskope), and runtime AI protection (Palo Alto Prisma AIRS, Cisco AI Defense, CrowdStrike, Zscaler, Check Point, F5, Exabeam, Thales Imperva).
A key design principle across most integrations is that security controls are injected into the existing request path without requiring application code changes, which lowers the barrier for enterprises to add governance to existing agentic workloads.
The identity-focused integrations address a specific challenge with non-human identities, where tools like Silverfort automatically discover agents, map them to human owners, and flag overprivileged or stale credentials at runtime rather than relying on static credentials.
Pricing details are not disclosed in the announcement, and availability varies by partner, with some integrations like Imperva for Google Cloud noted as currently in preview.
Organizations interested in specific integrations should contact the Agent Gateway partnerships team directly.

1:22:40 Ryan – “This is one of the things I really focused on at Google Next, because I think we’re going to see this pattern grow, because I can’t imagine anything else that’s going to work.

1:23:29 Cloud Engineer’s AI Toolkit: Sign up Now for a Developer Workshop Near You!

Google Cloud is running a series of hands-on developer workshops across North America focused on building agentic AI applications, targeting platform engineers, security engineers, and data practitioners who want practical production experience rather than theoretical overviews.
The workshops are split into two tracks: one covering GKE-based AI workloads, including secure sandboxing for AI-generated code execution and cluster management via natural language using Gemini and MCP servers, and another focused on data engineering with BigQuery Graph, Knowledge Catalog, and the Agent Development Kit for building data-aware agents.
The data track is notable for its relatively low barrier to entry, requiring only basic SQL and some cloud familiarity, which suggests Google is trying to bring analytics professionals into agentic AI development without requiring deep engineering backgrounds.
For security-focused engineers, the GKE track covers defense-in-depth strategies and securing inference endpoints, which addresses a real gap, as many organizations struggle to apply existing security practices to AI workloads running on Kubernetes.
Attendees need to bring their own laptops to participate. Registration is open now at the Google Cloud blog, sessions are free to attend, and different dates host different tracks, so checking the schedule before registering matters.

Azure

1:24:16 The Microsoft Azure Infra Summit 2026 Schedule Is Live

The Microsoft Azure Infra Summit 2026 is a free virtual event running May 19-21, starting at 8:00 AM Pacific each day, targeting IT pros, platform engineers, SREs, and infrastructure teams with L300-L400 level technical content.
The three-day agenda is organized around Build, Operate, and Optimize pillars, covering topics like AKS operations, IaC, storage, networking, backup and DR, and resiliency, with sessions delivered by the engineers who actually build the Azure features being discussed.
Registration is free at aka.ms/MAIS-reg, and the full schedule with per-session ICS calendar files is available at azureinfrasummit.com, making it straightforward to build a custom track across the three days.
The event is positioned as a no-marketing-slides format focused on production architecture and real-world deployment considerations, which makes it a practical option for teams looking for depth rather than introductory overviews.

1:24:48 Ryan – “This would be refreshing to actually go to; I’m kind of thinking about it…”

1:25:23 Public Preview: Memory in Foundry Agent Service

Foundry Agent Service now includes a managed long-term memory capability in public preview, allowing AI agents to retain context across sessions without developers needing to provision or manage external databases.
The memory feature integrates natively with Microsoft Agent Framework and LangGraph, meaning teams already building on those frameworks can adopt persistent memory without significant architectural changes.
This addresses a common pain point in agentic AI development where maintaining state and context across interactions typically requires custom storage solutions, adding operational overhead.
Target users are developers building multi-turn or long-running AI agents who need reliable memory persistence without taking on the security and scaling responsibilities of a separate data store.
Pricing details are not yet published for this preview feature, so teams evaluating it for production workloads should factor in potential costs once general availability approaches.

1:26:28 Generally Available: Microsoft Agent Framework 1.0

Microsoft Agent Framework has reached version 1.0 for both .NET and Python, bringing stable APIs and a long-term support commitment, which gives enterprise developers a reliable foundation for building production AI agent applications.
The framework supports multi-agent orchestration and multi-provider model support, meaning developers can coordinate multiple AI agents and swap between different AI models without being locked into a single provider.
Cross-runtime interoperability via A2A (Agent-to-Agent) and MCP (Model Context Protocol) standards allows agents built on different frameworks or runtimes to communicate, which is relevant for organizations with mixed technology stacks.
This release falls under Microsoft Foundry and the broader AI plus machine learning product category, positioning it alongside other Azure AI services rather than as a standalone tool, so existing Azure customers can expect integration with familiar tooling.
No specific pricing details were included in the announcement, so listeners should check Azure pricing pages directly, though SDK and framework tools like this are typically free with costs tied to the underlying model and compute usage.

1:26:49 Justin – “It feels like things are changing so fast right now that standardizing and long-term support feels sort of weird, but I appreciate that they’re trying something.”

1:27:40 Enforcing trust and transparency: Open-sourcing the Azure Integrated HSM

Azure Integrated HSM is a Microsoft-built hardware security module now embedded in every new Azure server, designed to meet FIPS 140-3 Level 3 certification. It keeps encryption keys within hardened hardware at all times, meaning keys never appear in host or guest memory, even during active cryptographic operations.
Microsoft announced at the OCP EMEA Summit that the HSM firmware, driver, and software stack will be open-sourced via GitHub at github.com/Azure/azihsm-fw, with an OCP workgroup launched to guide ongoing development.
This allows customers, regulators, and partners to independently validate security controls rather than relying solely on vendor assertions.
The Integrated HSM complements existing services like Azure Key Vault and Azure Managed HSM by adding server-local cryptographic protection, addressing the shared blast radius and network latency limitations of centralized HSM models. It also supports TDISP for secure binding with confidential computing environments.
The feature will be available to all customers globally on Azure V7 virtual machines in the coming weeks, with pricing details not yet disclosed. Regulated industries and sovereign cloud customers are the most direct beneficiaries, given the independent auditability the open-source approach enables.
This fits into a broader Azure security stack that includes Azure Boost, measured boot, attestation, and datacenter-level secure control modules, forming a hardware-to-software chain of trust. The practical implication for customers is that cryptographic trust becomes verifiable through hardware and open-source firmware rather than contractual guarantees alone.

1:28:30 Ryan – “I guess it’s open just so that people can test it…validate against the framework.”

1:29:42 Microsoft’s OpenClaw team takes on the personal assistant challenge

Microsoft’s internal “Project Lobster” team is building ClawPilot, an OpenClaw-based desktop environment that functions as a 24/7 autonomous personal assistant within Microsoft 365, growing from 100 to over 3,000 daily internal users in a single week as of May 1.
The system is designed around a multi-agent architecture including a Chief of Staff agent, Executive Assistant agent, and specialist agents, each with their own Entra ID, Exchange mailbox, and Teams presence for governance and identity isolation within Microsoft Graph.
Security remains the central challenge, as Microsoft’s own Defender team explicitly states OpenClaw should not run on standard enterprise workstations due to risks including persistent credentials, untested input ingestion, and vulnerability to prompt-injection attacks turned into action-injection attacks.
The project differs from existing Copilot offerings like Copilot Tasks and Copilot Cowork in that it targets a full-life context for knowledge workers, handling tasks like DoorDash orders or rescheduling personal calls without requiring constant user prompting.
Microsoft VP Scott Hanselman has built a Windows node for OpenClaw that may surface at Microsoft Build in June, suggesting near-term developer-facing announcements around Windows as an enterprise-ready agentic runtime environment. No pricing or general availability timeline has been disclosed.

1:30:49 Ryan – “So this is either going to be amazing, and exactly what everyone wants, or a desktop app that does all the cool stuff, but it’s backed by Entra and all the security stuff your IT org is already running, or it’s going to be so nerfed that it won’t be able to do anything.”

1:32:05 Architecting Cost-Aware LLM Workloads with Model Router in Microsoft Foundry

Microsoft Foundry‘s Model Router consolidates multi-model dispatch into a single endpoint that routes across up to 18 underlying LLMs, shifting the routing logic from application code to the platform layer.
This matters for cloud architects who currently manage bespoke routing logic across model fleets.
The model subset feature is the most governance-relevant control, letting teams define which vendors and regions their prompts can touch, set an effective context window ceiling, and bound worst-case per-call costs. New models added in future router versions are not auto-included, which is a deliberate compliance guardrail worth noting.
Billing follows the underlying model that actually served each request, not a flat router rate, so cost attribution requires logging the response model field on every call and cross-referencing Azure Cost Analysis. Teams that skip this step will have limited visibility into where their LLM spend is actually going.
The tool-use support added in the 2025-11-18 release enables Model Router inside agentic workflows, but there is a notable constraint: when Foundry Agent Service tools are involved, routing is restricted to OpenAI models only, which limits multi-vendor strategies in agentic scenarios.
The router is intentionally adaptive rather than deterministic, meaning it is a poor fit for workloads that require reproducible model selection per request. Teams evaluating it should run the recommended baseline phase first to understand the actual routing distribution before committing to a cost or quality mode.

1:32:49 Matt – “There were a few different pieces you needed to tie together to make it work, and this is just giving you a single place.”

After Show

54:04 John Ternus will replace Tim Cook as Apple CEO

Tim Cook will step down as Apple CEO on September 1, 2026, transitioning to an executive chairman role focused on policy engagement, while hardware engineering chief John Ternus takes over as CEO.
Ternus comes from a hardware background, which may signal a continued or increased emphasis on Apple Silicon and device-level computing rather than a pivot toward cloud-first strategies.
Apple’s cloud services, including iCloud, Apple Intelligence infrastructure, and enterprise device management, represent a substantial business segment that the new CEO will inherit and oversee.
For enterprise and developer communities, leadership changes at Apple can influence the direction of platform APIs, developer tools, and the pace of cloud and AI feature integration across Apple platforms.
Listeners should note this is primarily a corporate governance story with indirect cloud implications, and any meaningful technical direction changes would only become apparent over time.

Closing

352: Google Next: Rebrandapalooza

Tue, 05 May 2026 18:20:13 +0000

Welcome to episode 352 of The Cloud Pod, where the weather is always cloudy! Justin, Matt, and Ryan are safely back from Vegas (Ryan and Justin, anyway), and they have all the news and announcements from Google Next. Plus, we have Ryan’s take on Phish, news from Cloudflare, and a shoe company making a pivot. There’s a lot to cover, so let’s get started!

Titles we almost went with this week

Redact Yourself Before You Wreck Yourself OpenAI **Anthropic
Fork Yeah Cloudflare Artifacts Is Here
Git Happens at Scale on Cloudflare
Bucket List Item Checked Lambda Mounts S3 File Systems
Terraform Your Agents Before They Terraform You
Cloud Run Gets GPUs and Finally Hits the Gym
Spanner Goes Rogue, Leaves the Cloud Behind
Knowledge Catalog Knows What Your Agents Did Last Query
One Control Plane to Rule a Million Chips
No More Incognito Windows for Your AWS Identity Crisis
Your Agent Can Now Write Files Without Burning Everything Down
Spend Caps Finally Tell Runaway AI Jobs to Chill
RIP Vertex, long live the agent
Agents all the way down
Google Next: This is the dawning of the Age of Agentic
Allbirds Proves AI Hype Needs No Infrastructure

A big thanks to this week’s sponsors:

There are a lot of cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it’s really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer “insured commitments”, but remember to ask: Will you actually give me my rebate? Because Archera will.

Check out thecloudpod.net/archera to schedule a demo today.

We also wanted to tell you about something coming to the US for the first time — WeAreDevelopers World Congress!

General News

06:12 Amazon invest up to $25 billion in Anthropic part of AI infrastructure

Amazon has committed up to $25 billion in additional investment in Anthropic, bringing its total potential investment to $33 billion. The latest $5 billion tranche is based on Anthropic’s $380 billion valuation, with up to $20 billion more tied to commercial milestones.
In exchange, Anthropic has committed to spending over $100 billion on AWS over the next decade, with a specific focus on Trainium custom AI chips, and plans to bring nearly 1 gigawatt of Trainium2 and Trainium3 capacity online by end of the year.
Anthropic cited real infrastructure strain from growing enterprise and consumer demand for Claude, noting reliability and performance impacts, which gives this deal a practical operational motivation beyond financial positioning.
Amazon is now a substantial investor in both Anthropic and OpenAI, having committed up to $50 billion to OpenAI in February, which raises notable questions for developers about how AWS positions competing AI platforms on its infrastructure.
With Anthropic also holding compute agreements with Microsoft Azure and Google, and now securing up to 5 gigawatts of total capacity, the company is distributing its infrastructure across multiple providers despite naming AWS its primary training partner.

08:46 Justin – “The big question is going to be when one of these companies – OpenAI or Anthropic – finally goes public, and they start publishing these things; what people’s actual reaction is to their financials .”

10:48 SpaceX Strikes $60 Billion Deal for Right to Buy Coding Startup Cursor

SpaceX struck a deal with AI coding startup Cursor, valued at either a $60 billion acquisition or a $10 billion partnership fee, giving Cursor access to xAI‘s Colossus supercomputer, which runs 200,000 Nvidia H100-equivalent GPUs for model training.
Cursor had been compute-constrained despite reaching $1 billion in annual recurring revenue and a $29.3 billion valuation, so this deal directly addresses their infrastructure bottleneck for scaling model intelligence.
The partnership positions SpaceX to compete in the AI coding tools space against Anthropic and others, notable given xAI’s Grok has publicly acknowledged falling behind competitors in coding capabilities.
For developers and cloud users, this deal signals continued consolidation between compute providers and AI coding tools, which could influence pricing, model availability, and platform lock-in decisions for teams building on AI-assisted development workflows.
SpaceX’s recent acquisition of xAI, combined with this Cursor deal, suggests a vertical integration strategy connecting rocket-company compute infrastructure directly to developer-facing AI products ahead of a potential IPO later this year.

12:27 Justin – “The thing I don’t get is the $10 billion partnership versus the $60 billion acquisition. What’s the triggering events on those things? When is it a partnership, versus when is it now an acquisition? And does that mean that these people who are working at Cursor – if it’s a partnership, aren’t getting equity? That’s a bummer.”

AI Is Going Great – Or How ML Makes Money

13:36 The next evolution of the Agents SDK

OpenAI has updated its Agents SDK to general availability, adding native sandbox execution, configurable memory, and filesystem tools modeled after Codex.
Agents can now read and write files, run shell commands, install dependencies, and apply code patches within controlled environments without developers building that infrastructure themselves.
The SDK introduces a Manifest abstraction that standardizes how agent workspaces are defined across sandbox providers, including Blaxel, Cloudflare, E2B, Modal, and Vercel, with storage integrations for AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2.
This gives developers a consistent path from local prototype to production deployment.
Built-in snapshotting and rehydration mean a failed or expired sandbox container does not terminate a long-running agent run, as the SDK can restore state in a fresh container from the last checkpoint. This addresses a practical reliability gap for agents working on multi-step tasks.
The SDK incorporates several emerging agentic standards, including MCP for tool use, AGENTS.md for custom instructions, and the skills spec for progressive capability disclosure.
OpenAI positions this as reducing the maintenance burden on developers as these patterns evolve.
The updated SDK is currently Python-only, with TypeScript support planned for a future release.
Pricing follows standard API rates based on tokens and tool use, and features like code mode and subagents are still in development for both language runtimes.

13:59 Ryan – “As long as it also logs and has permissions and some sort of boundaries, I don’t have to kill it. It’s just terrifying because we already have people that are just throwing questions into any chat tool, and just then running whatever command it spits out indiscriminately. And now that’s just going to happen at a faster rate.”

19:56 Introducing Claude Opus 4.7

Claude Opus 4.7 is now generally available across Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6: $5 per million input tokens and $25 per million output tokens.
The model targets complex, long-running agentic coding workflows, with early testers reporting 13% higher resolution on a 93-task coding benchmark and 3x more production task resolution on Rakuten-SWE-Bench compared to Opus 4.6.
Vision capabilities received a notable upgrade, with Opus 4.7 now supporting images up to 2,576 pixels on the long edge, more than three times the resolution of prior Claude models.
This opens up use cases like computer-use agents reading dense screenshots and data extraction from complex technical diagrams, though higher-resolution images will consume more tokens.
Anthropic is using Opus 4.7 as a testbed for cybersecurity safeguards before any broader release of its more capable Mythos Preview model.
The model includes automatic detection and blocking of prohibited cybersecurity uses, with a new Cyber Verification Program available for legitimate security professionals doing penetration testing or vulnerability research.
A new high effort level sits between the existing high and max settings, giving developers finer control over the reasoning-versus-latency tradeoff.
Developers migrating from Opus 4.6 should note that the updated tokenizer can increase token counts by roughly 1.0 to 1.35 times, depending on content type, and a migration guide is available on the Claude platform.
File system-based memory improvements allow Opus 4.7 to retain notes across multi-session agentic work, reducing the need to re-establish context at the start of each task.
This is particularly relevant for enterprise teams running parallel agent workflows where continuity across long runs matters.

21:50 Ryan – “I didn’t realize it’s the same price because every platform that I’m using this in, Opus 4.7 is so much more expensive than 4.6.”

28:14 Introducing Claude Design by Anthropic Labs

Anthropic launched Claude Design in research preview for Pro, Max, Team, and Enterprise subscribers, powered by Claude Opus 4.7.
It enables users to create interactive prototypes, pitch decks, wireframes, and marketing assets through conversational prompts and inline editing controls.
A notable workflow feature is the Claude Code handoff, where finished designs are packaged into a bundle that developers can pass directly to Claude Code for implementation, creating a tighter loop between design and engineering.
Claude Design builds a team-specific design system during onboarding by reading codebases and design files, then automatically applies brand colors, typography, and components to every subsequent project.
Teams can maintain multiple design systems simultaneously.
Early user data from Brilliant suggests complex pages that required 20-plus prompts in other tools needed only 2 prompts in Claude Design, indicating meaningful efficiency gains for interactive prototype creation.
Export options include Canva, PDF, PPTX, and standalone HTML, with organization-scoped sharing and collaborative editing.
For Enterprise customers, the feature is off by default and must be enabled by admins in Organization settings.

30:56 Building the agentic cloud: everything we launched during Agents Week 2026

Cloudflare held its first Agents Week, shipping a new set of primitives across compute, security, and tooling specifically designed for running AI agents at scale.
The core premise is that traditional cloud infrastructure built around one app serving many users does not fit a model where individual users each run multiple concurrent agents.
On the compute side, Cloudflare launched new environments supporting both full operating system containers for package installation and terminal commands, and lightweight isolates that start in milliseconds for high-scale deployments.
They also shipped a Git-compatible workspace designed for agent-generated code moving from prototype to production.
Security and identity were treated as built-in defaults rather than add-ons, with new tools for connecting agents to private networks and managing autonomous actions taken on behalf of users across an organization.
The agent toolbox additions include inference, search, memory, voice, email, and a browser primitive, giving agents the ability to perceive, remember, and communicate without developers assembling separate third-party services.
Cloudflare also addressed the web infrastructure side, releasing tools for existing websites to control bot access, package content for agent consumption, and measure their readiness for agent-driven traffic, acknowledging that most of the current web was built for human browsers rather than automated agents.

31:40 Justin – “I look forward to Cloudflare taking down Cloudflare, and then writing an RCA with these great tools.”

32:14 Artifacts: versioned storage that speaks Git

Cloudflare launched Artifacts in private beta, a versioned file system built on Git that lets developers and agents programmatically create, fork, and manage Git repositories at scale via a REST API and native Workers API, with public beta targeted for early May 2026.
The system is built on Durable Objects with a Git server written in Zig and compiled to a roughly 100KB WebAssembly binary, enabling tens of millions of isolated repo instances per namespace while handling the full Git smart HTTP protocol with zero external dependencies.
Cloudflare is also open-sourcing ArtifactFS, a filesystem driver that mounts large Git repos using a blobless clone and lazy file hydration, reducing startup times for multi-gigabyte repos from roughly 2 minutes down to 10-15 seconds, which at 10,000 sandbox jobs per month translates to approximately 2,778 compute hours saved.
Beyond source control, Artifacts supports use cases like per-session agent state persistence, customer config versioning with rollback, and session forking, using Git semantics such as diff, revert, and clone as a general-purpose state management layer rather than just a code storage tool.
Pricing is designed for agent-scale workloads, charging based on storage consumed and operations performed rather than repo count, with plans to bring Artifacts to the Workers Free plan with fair use limits as the beta progresses.

32:54 Justin – “…another way it’s going to take down Cloudflare, so I look forward to that.”

34:53 Cortex Agents: The Platform Powering Snowflake Intelligence and Enterprise AI Agents

Snowflake is launching Cortex Agents as a full enterprise agent platform with several capabilities now generally available, including multi-tenancy with row-level data isolation, agent versioning with commit-based rollback, resource budgets for per-agent and per-team spending controls, and Cortex Agent Evaluations using their GPA (Goal-Plan-Action) framework.
MCP connector support is coming soon to GA, allowing Cortex Agents to connect natively to external tools like Salesforce, Jira, GitHub, Slack, and Google Workspace using the Model Context Protocol standard, with the same Snowflake role-based governance applied to those external connections.
The Code Execution Tool (public preview soon) gives agents a sandboxed Python environment with session-level isolation, letting agents generate and run code on demand during conversations without accessing data outside the current session scope.
The GPA evaluation framework is a notable technical detail here: in benchmark testing against TRAIL/GAIA, it captured 95% of human-annotated errors compared to a 55% baseline, and localized errors to specific trace spans with 86% accuracy, giving teams a structured alternative to subjective human review.
The cost governance model is more granular than typical platforms, supporting both agent-level and per-team shared budgets with configurable threshold actions, such as alerts at 80% spend and automatic access revocation at 100%, which addresses a practical concern for enterprises deploying agents across multiple business units.

35:06 Justin – “If you need your agents close to your data, this is a great way to do it. I definitely would look into cost with this one, because Snowflake is not cheap.”

36:11 Introducing GPT-5.5

GPT-5.5 is now generally available in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users, with API access priced at $5 per 1M input tokens and $30 per 1M output tokens, and a Pro variant at $30 input and $180 output per 1M tokens.
The model shows notable agentic coding improvements, scoring 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, while using fewer tokens than GPT-5.4 to complete the same tasks, which partially offsets the higher per-token cost.
For cloud and enterprise workloads, GPT-5.5 was co-designed with and served on NVIDIA GB200 and GB300 NVL72 systems, with inference optimizations including dynamic load balancing heuristics that increased token generation speeds by over 20%.
Knowledge work benchmarks are worth noting for enterprise buyers: 84.9% on GDPval across 44 occupations, 78.7% on OSWorld-Verified for autonomous computer use, and 98.0% on Tau2-bench Telecom for customer service workflows, suggesting practical applicability across business functions.
OpenAI is classifying GPT-5.5 as High under its Preparedness Framework for both cybersecurity and biological capabilities, and is introducing a Trusted Access for Cyber program through Codex that gives verified defenders expanded access with fewer restrictions, which has direct implications for security teams evaluating AI-assisted vulnerability management.

37:31 Ryan – “That’s kind of cool. That’s the first I’m hearing of those kind of frameworks for their testing, and testing the safety AI aspects and having a rating, which I like.”

37:58 Introducing workspace agents in ChatGPT

OpenAI is launching workspace agents in ChatGPT as a research preview for Business, Enterprise, Edu, and Teachers plans, positioning them as an evolution of GPTs powered by Codex and designed for shared team workflows rather than individual use.
These agents run persistently in the cloud, meaning they can continue working on long-running tasks without user interaction, and can be triggered on a schedule or deployed directly in Slack to handle incoming requests automatically.
The practical use cases OpenAI highlights include a lead outreach agent that reduced 5-6 hours of weekly rep work to an automated background process, and an accounting agent that handles month-end close tasks, including journal entries and variance analysis in minutes.
(mention privacy filters) On the enterprise controls side, admins get role-based access management, a Compliance API for auditing every agent configuration and run, built-in prompt injection safeguards, and the ability to suspend agents, which addresses a common concern about autonomous agents operating within sensitive business environments.
Pricing is worth noting for teams evaluating adoption: workspace agents are free until May 6, 2026, after which credit-based pricing kicks in, giving organizations a window to test and build before committing to costs.

38:42 Introducing OpenAI Privacy Filter

OpenAI released Privacy Filter, an open-weight 1.5B parameter model (with only 50M active parameters) for detecting and redacting PII in text, available now on Hugging Face and GitHub under the Apache 2.0 license for free commercial use and fine-tuning.
The model uses a bidirectional token-classification architecture with constrained Viterbi span decoding, processing up to 128,000 tokens in a single forward pass across eight PII categories, including private persons, addresses, account numbers, and secrets like API keys and passwords.
A key practical advantage for cloud and on-premise deployments is that the model runs locally, meaning sensitive data never needs to leave the device for de-identification, which directly reduces exposure risk in logging, indexing, and training pipelines.
Performance benchmarks show a 97.43% F1 score on the corrected PII-Masking-300k benchmark, and fine-tuning on small domain-specific datasets can lift accuracy from 54% to 96% F1, making it adaptable for legal, medical, and financial workflows.
OpenAI explicitly notes this is not a compliance certification or anonymization guarantee, and recommends human review in high-stakes settings, which is an important caveat for developers considering it as a drop-in solution for regulated industries.

38:52 Justin – “If you’re looking for a lightweight built-in option inside of Codex to find privacy PII, this little model sits on top of it and does great work.”

40:33 Introducing ChatGPT Images 2.0

ChatGPT Images 2.0 can now handle small text, UI elements, icons, and complex layouts at up to 2K resolution; no more getting something “close enough. It actually delivers what you asked for.
Previous versions struggled outside of Latin-based text, but now it has solid support for Japanese, Korean, Chinese, Hindi, and Bengali, where the language is baked into the design itself.
When paired with a reasoning model, it can search the web, plan the image structure, self-check its work, and even produce multiple distinct images from a single prompt.
Images 2.0 supports everything from wide 3:1 banners to tall 1:3 mobile screens. Useful for social graphics, presentations, posters, and more, all without manual resizing.
This replaces the back-and-forth between prompting, designing, and editing. You describe what you need, it researches, writes, and visualizes from start to finish.

41:29 Matt – “I like that it can do multiple at the same time. That’s a nice feature.”

Cloud Tools

42:19 Register domains wherever you build: Cloudflare Registrar API now in beta

Cloudflare Registrar API is now in beta, allowing developers to search, check availability, and register domains programmatically through three straightforward API endpoints, keeping the entire workflow inside editors, terminals, or agent-driven tools.
The API integrates directly with Cloudflare’s MCP server, meaning agents in environments like Cursor or Claude Code can already discover and call Registrar endpoints without any additional integration or custom tool definitions.
Cloudflare maintains its at-cost pricing model through the API, charging exactly what the registry charges with no markup, and WHOIS privacy protection is enabled by default at no extra charge.
Registration typically completes synchronously within seconds, but the API also handles longer operations by returning a 202 Accepted with a polling URL, using the same response shape either way to simplify agent logic.
The beta currently covers search, check, and registration for a curated set of TLDs, with Cloudflare actively working to expand the API to include transfers, renewals, contact updates, and eventually a broader registrar-as-a-service offering for multi-tenant platforms.

AWS

43:29 AWS Interconnect is now generally available, with a new option to simplify last-mile connectivity

AWS Interconnect is now generally available in two flavors: multicloud for private Layer 3 connections between AWS and other cloud providers (starting with Google Cloud, Azure coming later in 2026), and last-mile for connecting on-premises locations to AWS through network providers like Lumen, AT&T, and Megaport.
The multicloud option uses IEEE 802.1AE MACsec encryption by default on physical links, routes traffic entirely over private backbones without touching the public internet, and includes built-in redundancy across at least two physical facilities.
Pricing is a flat hourly rate based on bandwidth tier and region pair, so check the pricing page before sizing your connection.
Provisioning is handled through the AWS Direct Connect console in a few clicks, generating an activation key that completes the handshake on the partner cloud side.
However, there are gotchas to watch for, including non-overlapping IP ranges, matching MTU settings between VPCs, and consistent IPv4/IPv6 configuration on both sides.
Last-mile connectivity automatically provisions four redundant connections, configures BGP routing, enables MACsec and Jumbo Frames by default, and supports 1 Gbps to 100 Gbps with bandwidth adjustable from the console without reprovisioning. It includes a 99.99% availability SLA up to the Direct Connect port.
Current multicloud availability covers five region pairs across US East, US West, and Europe, connecting to Google Cloud, with last-mile launching in US East N. Virginia only.
The open specification published on GitHub under Apache 2.0 allows other cloud providers to implement the standard and become Interconnect partners.
AWS Interconnect -multicloud pricing is available here, and last-mile pricing can be found here.

44:29 Justin – “Good to see it in GA; hopefully it gets expanded out pretty quickly.”

44:43 Amazon Quick for marketing: From scattered data to strategic action

Amazon Quick is an AI-powered marketing intelligence tool built on AWS that connects to existing tools like HubSpot, Salesforce, Slack, and Adobe to create a unified knowledge graph from scattered marketing data.
Pricing is available here, with support for MCP and OpenAPI integrations for extending to other systems.
The tool addresses three specific marketing pain points: campaign performance reporting, competitive intelligence, and content creation. Quick claims to reduce competitive analysis from days to 30 minutes and content production from three hours to under 20 minutes.
Quick Flows allow teams to automate recurring tasks like weekly performance summaries and monthly competitive reports on a schedule, shifting work from manual queries to automated delivery.
This is a notable distinction from standard AI chat assistants that require active prompting.
On the security side, Quick runs within the customer’s AWS environment, queries and responses are not used to train external models, and role-based access controls are included.
This positions it as an enterprise-focused offering rather than a consumer AI tool.
The product references an MIT study showing AI cut document creation time by 40% and improved output quality by 18% among 444 professionals, which gives some external grounding to the productivity claims.
Teams considering this should evaluate it against existing point solutions like dedicated BI tools or standalone AI writing assistants they may already have in place.

48:12 Amazon CloudWatch now supports cross-region telemetry auditing and enablement rules

CloudWatch now lets customers audit telemetry configuration and enable telemetry from services like EC2, VPC, and CloudTrail across multiple regions from a single control point, reducing the operational overhead of managing observability at scale.
Enablement rules can be scoped to specific regions or all supported regions, and rules set to cover all regions automatically expand to include new regions as they become available, which is useful for organizations with growing AWS footprints.
A practical use case is a central security team creating one organization-wide rule for VPC Flow Logs that consistently applies across every account and region, eliminating gaps in telemetry coverage that could create blind spots.
The feature is available in all AWS commercial regions with standard CloudWatch pricing applying to telemetry ingestion, so costs will scale with the volume of logs and metrics collected rather than the feature itself carrying an additional charge.
For teams managing multi-account AWS Organizations setups, this reduces the risk of misconfigured or missing telemetry in individual accounts, which has historically required custom automation or third-party tooling to enforce consistently.

47:58 Ryan – “…this has always been a challenge, even before I was doing security and trying to do log governance across these things, trying to have different serving farms basically in multiple regions and having to log into different web pages to view the metrics on each one. They sort of fix that with the ability to reference metrics in a foreign site a little while ago, but you could only do it for metrics. And so this is definitely something I’m glad to see that you can use.”

50:15 Introducing granular cost attribution for Amazon Bedrock

Amazon Bedrock now automatically attributes inference costs to the IAM principal making the call, with data flowing into CUR 2.0 via a new line_item_iam_principal column.
This works across all Bedrock models at no additional cost and requires no changes to existing workflows.
The feature supports four distinct access patterns: direct IAM users or API keys, application roles on AWS compute, federated identity through providers like Okta or Azure AD, and LLM gateway architectures.
Each scenario has different configuration requirements, with the gateway scenario being the most complex since it requires per-user AssumeRole session management to avoid all traffic appearing under a single identity.
Cost allocation tags can be attached to IAM users or roles, or passed dynamically as session tags through identity providers, and once activated in AWS Billing, they appear in Cost Explorer under an iamPrincipal prefix. This enables chargeback reporting by team, project, cost center, or tenant without building custom tracking infrastructure.
For organizations running LLM gateways like LiteLLM or custom proxies, the solution requires the gateway to call AssumeRole per user and cache those credentials for up to one hour, which keeps STS call volume manageable but introduces architectural changes.
The default STS rate limit of 500 AssumeRole calls per second per account may require a limit increase for high-throughput deployments.
Tags take 24 to 48 hours to appear in Cost Explorer and CUR 2.0 after activation, and IAM principal data must be explicitly enabled in the CUR 2.0 data export configuration before any attribution data will appear.

52:12 AWS Lambda functions can now mount Amazon S3 buckets as file systems with S3 Files

Lambda functions can now mount S3 buckets as file systems using S3 Files, which is built on Amazon EFS, allowing standard file operations without the overhead of downloading objects or managing ephemeral storage limits.
Multiple Lambda functions can connect to the same S3 Files file system simultaneously, enabling shared workspaces without custom synchronization logic, which is particularly useful for multi-step AI and machine learning pipelines.
The integration pairs well with Lambda durable functions, where an orchestrator can clone a repository to a shared workspace while parallel agent functions analyze it, with automatic checkpointing handling execution state.
Configuration is supported through the Lambda console, AWS CLI, SDKs, CloudFormation, and SAM, though the feature is limited to Lambda functions not configured with a capacity provider.
Pricing adds no additional charge beyond standard Lambda and S3 rates, and the feature is available in all AWS regions where both Lambda and S3 Files are supported.

52:19 Justin – “Thanks. Could have announced that last week.”

53:41 From developer desks to the whole organization: Running Claude Cowork in Amazon Bedrock

Claude Cowork is a desktop application (macOS and Windows) that lets knowledge workers delegate research, document analysis, data processing, and report generation to Claude, with all model inference routed through Amazon Bedrock in your AWS account rather than Anthropic’s infrastructure.
Pricing is consumption-based through your existing AWS agreement with no per-seat licensing from Anthropic, which is a notable distinction from Claude Enterprise and could make cost modeling more predictable for organizations with variable usage patterns.
Enterprise security controls are central to the integration, including AWS IAM or Bedrock API key authentication, VPC endpoint network isolation, CloudTrail audit logging, and OpenTelemetry export to CloudWatch, with Anthropic receiving only aggregate telemetry that can be disabled.
Setup relies on device management tools like Jamf, Microsoft Intune, or Group Policy to push a managed configuration to Claude Desktop, specifying the model ID, Bedrock inference profile, and auth method, which means IT teams control rollout rather than individual users configuring their own credentials.
Organizations already using Claude Code in Amazon Bedrock can reuse the same infrastructure setup for Cowork, and both in-region and cross-region inference profiles are supported to address data residency requirements across different geographies.

56:51 Justin – “The problem is that instead of building a proper enterprise backend that would do all the things they want, they partnered with Work OS. And so while Work OS has a bunch of things, it doesn’t have all the things that you would want, and this is a problem also for OpenAI, as well, because they also partner the same way. And Snowflake partners with them. But some have done a better job than others in how they lay out some of these tools.”

57:54 Get to your first working agent in minutes: Announcing new features in Amazon Bedrock AgentCore

Amazon Bedrock AgentCore now includes a managed agent harness feature that lets developers define an agent’s model, tools, and instructions via API calls without writing orchestration code, reducing initial setup from days to minutes.
It supports popular frameworks, including LangGraph, LlamaIndex, CrewAI, and Strands Agents.
The new AgentCore CLI (available on GitHub at github.com/aws/agentcore-cli) keeps the full agent lifecycle in one workflow, covering local prototyping, deployment, and operations from a single terminal with CDK support and Terraform coming soon.
AgentCore now includes persistent session state via a durable filesystem, enabling agents to suspend mid-task and resume where they left off, which makes human-in-the-loop workflows practical without custom storage plumbing.
Pre-built coding agent skills give tools like Claude Code and Kiro curated knowledge of AgentCore best practices rather than just raw API access, with plugins for Codex and Cursor coming by the end of April.
The managed agent harness is in preview across four regions (Oregon, N. Virginia, Sydney, Frankfurt) with no additional charge for the CLI, harness, or skills beyond standard resource consumption.
Full pricing details are here.

58:49 Ryan – “This is a great feature; this now makes it competitive with Vertex AI’s AgentBuilder, and so now it’s a useable option on Amazon. Awesome.”

GCP

Pre-Next Announcements

59:40 Gemini 3.1 Flash TTS: New text-to-speech AI model

Gemini 3.1 Flash TTS is now available in preview across three surfaces: the Gemini API and Google AI Studio for developers, Vertex AI for enterprises, and Google Vids for Workspace users, giving GCP customers multiple integration paths depending on their use case.
The model scored an Elo of 1,211 on the Artificial Analysis TTS leaderboard based on blind human preference testing, and was placed in the top quadrant for balancing speech quality with low cost, though specific per-character or per-request pricing was not disclosed in the announcement.
A new audio tags system lets developers embed natural language commands directly into text input to control vocal style, pace, tone, and accent at a granular level, including mid-sentence expression changes, which reduces the need for custom voice training pipelines.
The model supports native multi-speaker dialogue and more than 70 languages with localized style and accent controls, making it a practical option for developers building global or multilingual audio applications.
All generated audio is automatically watermarked using Google’s SynthID technology, embedding an imperceptible signal that allows detection of AI-generated content, which is a relevant consideration for enterprises with compliance or content authenticity requirements.

1:00:01 The Gemini App is now available on Mac OS

Google has released a native Gemini app for macOS, available free to all Gemini users on macOS 15 and above, downloadable at gemini.google/mac.
This is a desktop client rather than a GCP infrastructure announcement, so its relevance to enterprise GCP customers is indirect.
The app includes a screen-sharing feature that lets users pass local files and on-screen content directly to Gemini for context-aware assistance, which could be useful for analysts or developers reviewing complex outputs without leaving their workflow.
A keyboard shortcut (Option + Space) surfaces Gemini from any application, positioning it as a system-level assistant similar to Spotlight, aimed at reducing context-switching during tasks like spreadsheet work or document drafting.
The app integrates with Google’s existing generative media tools, including image generation via Imagen (Nano Banana) and video generation via Veo, giving creative users access to those capabilities without opening a browser.
Google has indicated this initial release is a foundation for a broader desktop assistant strategy, with additional features planned, so organizations evaluating AI assistant tooling for their teams should monitor how this evolves alongside Workspace and GCP integrations.

1:01:33 Create Expert Content: Deploying a Multi-Agent System with Terraform and Cloud Run

Google’s Dev Signal is a four-part tutorial series showing how to build and deploy a production multi-agent system using Google ADK, MCP, Vertex AI memory bank, and Cloud Run, with the full code available at the GoogleCloudPlatform devrel-demos GitHub repository.
The deployment architecture uses Terraform to provision least-privilege service accounts, Artifact Registry, and Secret Manager integrations, following the Agent Starter Pack patterns to avoid common security pitfalls like over-permissioned default compute accounts.
Observability is handled through OpenTelemetry integration with a single otel_to_cloud=True flag in the FastAPI server, which exports agent traces to Cloud Console showing LLM invocations and MCP tool calls, though production traces are sampled, so targeted evaluation runs are needed for full request visibility.
The system distinguishes between two types of monitoring: system traces for identifying latency and timeout issues at scale, and reasoning traces for targeted evaluation of specific agent decisions, which is a practical distinction teams often miss when moving prototypes to production.
Pricing for this stack depends on Cloud Run usage, Vertex AI memory bank calls, and Secret Manager API requests, all billed separately at standard GCP rates, so teams should factor in the multi-service cost model when estimating production expenses.

1:02:05 Google Next: the Conference

32K attendees, 3 keynotes, 25 spotlights, 700+ breakouts, 260 announcements (yeah, we counted.)

Justin:

Wiz + Google Cloud Security/Product Offering

- Antigravity IDE + Gemini CLI (agent mode) enhancements
  - Data Agent Kit with VS Code/ Claude Code and Gemini CLI (close but no cigar)
Ironwood TPU GA and/or dedicated Inference-based CHIP

Ryan

Gemini 3.1 Pro GA & Teasing Gemini 3.5 or 4 or future model

Enhancements with agents and Agentic (THE ENTIRE CONFERENCE)
VMware interruption based on Kubernetes? (Opposite of Tanzu)

Matt

Default Guardrails in AI in general. How Gemini will have guard rails via Vertex.

Agent Identity, Agent Gateway, and Model Armor

Agentic coding tooling and how developers are leveraging Agentic (SDLC)

Data Agent Kit & Agentic Task Force
3 Non AI Announcements (at the conference, but not on stage, so…)

This is genuinely the best we’ve ever done. Time to go buy a lotto ticket and lose.

Runner Ups

A2A protocol 1.0 released – Donated to CNCF

- Turboquant Ships in Vertex AI
- Something waymo

Biqquery AI Agents – Part of Data Agent Kit

Gemini 3.1 Flash GA

- Axion Gen 2

Nano bananas updates
Sovereign Cloud AI
Gemini Robotics API Preview
Hugging Face
AWS Activate type program
AP2 Payment Protocol
AI in Android
Gemini + Boston Dynamics
Glasswing Answer

How many times is AI said on stage?

JUST THE FIRST KEYNOTE WAS 132 Times!!

2nd Keynote: 55 Times

Matt – 99

Ryan – 75

Justin – 115 Winner

That makes Justin the overall winner for this year’s NEXT predictions.

Here’s our Claude-based tier ranking of the 260 announcements:

1:09:21 TIER S — Headline

Agent platform (Vertex AI evolves)

The story of the keynote. Vertex AI is being repositioned as the Gemini Enterprise Agent Platform
16 named sub-features in three buckets:
- Build: ADK (graph-based sub-agent networks), Agent Studio (no-code → ADK export), Agent Designer
- Run: Agent Runtime (sub-second cold starts), Agent Sandbox (now everyone), Memory Bank, Sessions, long-running agents
- Govern + Optimize: Agent Identity (cryptographic ID per agent), Agent Registry, Agent Gateway, Anomaly Detection, Security dashboard, Simulation, Evaluation, Optimizer
Strategic frame: the agent is now the unit of work, not the model call
Hot takes to fight over:
- Is “delegating business outcomes” the new “infrastructure as code”?
- Does the 16-feature stack feel cohesive or like marketing bolt-ons?
- Does this simplify Vertex’s SKU sprawl or make it worse?
Matt’s guardrails prediction lands naturally inside the Govern bucket.

1:10:27 Customer scale — agents actually in production

Strongest competitive flex of the keynote. Pick 4-5 to land on-mic:
- Mars — Gemini Enterprise as the primary AI operating system for the global workforce (the headline customer)
- Merck — agentic platform across R&D, manufacturing, commercial; 75K employees
- GE Appliances — 800+ agents across manufacturing, logistics, supply chain
- Tata Steel — 300+ specialized agents in 9 months
- Deutsche Telekom MINDR — 95%+ reduction in event management times (best ROI quote)
- Citadel Securities — TPU research workloads 4x faster, 30% lower cost, days → minutes
- Highmark Health Sidekick — $27.9M in value in 2025 alone
Frame: Last year was “agents are coming,” this year was “here’s the receipts”
Hot take: name the equivalent customer slate from AWS, Azure, or Snowflake. You can’t.
Skip if tight on time: Home Depot Magic Apron, Macy’s “Ask Macy’s”, Papa John’s, Virgin Voyages Rovey, Capcom, Citi Sky, Vodafone, Unilever

1:11:44 TPU 8t and 8i (the silicon split)

8t (training): ~3x compute vs Ironwood
8i (inference): purpose-built, 80% better perf/$, optimized for MoE + agentic workloads
TorchTPU: native PyTorch, full Eager Mode — kills the JAX-only friction
Strategic: only hyperscaler shipping dedicated inference silicon this generation
Practitioner angle: agent workloads (lots of small inference calls) tilt economically toward Google if 80% perf/$ holds in production
Justin’s prediction wins twice — he specifically called the dedicated inference chip

1:12:22 TIER A — Strong second tier

Wiz expands (multi-cloud agent visibility)

Lead with: acquisition formally closed
Wiz AI-APP — code-to-cloud-to-runtime AI Application Protection Platform
Killer move: Wiz now supports AWS Agentcore, Azure Copilot Studio, Salesforce Agentforce, Databricks
- Google is selling security to customers who’ll never run a workload on GCP
- Different posture than they’ve had historically
Other Wiz news worth a mention:
- Inline AI security hooks in IDEs
- Wiz Skills — validated attack-surface findings exposed to coding agents for auto-remediation
- AI-Bill of Materials — auto-inventory of every AI framework, model, IDE extension across your environment (shadow-AI killer)
- Lovable vibe-coding integration (security scanning inside Lovable)
Hot take: most strategically interesting acquisition payoff Google has shipped in years.

1:13:46 Partner fund — $750M + Forward-Deployed Engineers

$750M innovation fund for partner agent development
Agent Marketplace + Agent Gallery — 70+ partner-built agents at launch
- Accenture, Adobe, Atlassian, Deloitte, Lovable, Oracle, Palo Alto, Replit, S&P Global, Salesforce, ServiceNow, Workday
Forward-Deployed Engineers with Accenture, Deloitte, McKinsey — Google making its own engineers available through partner GTM
Hot take: this is a Palantir-style move. Google admitting agent adoption needs hand-holding and putting money + bodies behind it
Open question: Does this reshape the SI economics, or is it just GTM theater?

1:14:48 Antigravity + Data Agent Kit + Gemini 3.1 Pro

Gemini 3.1 Pro in preview across Vertex / Gemini Enterprise / Antigravity / Android Studio / Gemini CLI / AI Studio
Data Agent Kit — portable suite of skills, MCP tools, plugins; turns VS Code and Gemini CLI into native data workspaces
Full-stack vibe coding from AI Studio → Cloud Run is now GA (Firestore + auth out of the box)
Hot take: this is the developer story. Cursor / Claude Code / Replit competitors take note.
Justin and Ryan both have prediction wins here

1:15:25 Agentic Data Cloud — Knowledge Catalog + Cross-cloud Lakehouse + Spanner Omni

Knowledge Catalog — universal context engine; maps business meaning across the data estate. Foundation for accurate agent execution.
Cross-cloud Lakehouse (BigLake renamed) — Iceberg REST Catalog, federation with AWS Glue / Databricks / Snowflake / SAP, cross-cloud caching cuts egress
Spanner Omni — Spanner runs multi-cloud, on-prem, even on a laptop
- This is the most underrated announcement of the keynote
- Fight over: is this the new Aurora-anywhere? Does it actually pull workloads off RDS / Cosmos?
Lakehouse federation for AlloyDB — live joins between transactional + analytical without ETL.

1:17:17 TIER B — Solid block

Workspace AI — Workspace Intelligence + Studio

Workspace Intelligence — unified semantic understanding across Docs / Slides / Gmail / projects / org domain knowledge
Workspace Studio — no-code agent builder; skills deployable across Workspace
M365 → Workspace migration tool — competitive shot at Microsoft, easy to move emails/files/conversations
Sovereign controls + client-side encryption — lock processing to US/EU; CSE means even Google can’t see
Auto browse with Gemini in Chrome Enterprise (US)

1:17:53 Cloud Run grew up

Full-stack vibe coding deploy from AI Studio (GA)
NVIDIA RTX PRO 6000 Blackwell support — run 70B+ parameter models without managing GPU infra, scales to zero
Billing caps (long-requested!) — set max monthly spend, resources de-activate when hit
Cloud Run sandboxes for ephemeral isolated agent execution
SSH into running containers (preview)
Hot take: Cloud Run is positioning itself as the default agent runtime, period

Gemini Enterprise for CX

Shopping agent + Food Ordering agent (Papa John’s first user)
Omnichannel Gateway — agent context across web / mobile / voice
Agent Assist — coaching mode for human agents in complex situations

1:19:04 BigQuery AI

AI.PARSE_DOCUMENT — single SQL function for OCR + layout + chunking via Gemini’s layout parser
TabularFM — zero-shot regression/classification, no feature engineering
BigQuery Graph — entity/relationship modeling natively in the warehouse
Reverse ETL — one-click sync from lakehouse to AlloyDB/Spanner for low-latency serving
Connected Sheets with TimesFM — zero-shot forecasting in Google Sheets
BigQuery hybrid search — semantic + full-text in one function
35% YoY perf improvement, lower processing cost
Hot take: biggest “Monday morning” change for data teams in the entire keynote

1:19:32 TIER C — Lightning round

Virgo Network

Custom interconnect: 134K TPUs in a single fabric, 1M+ across sites
A5X with NVIDIA Vera Rubin NVL72 — up to 960K GPUs cross-site
The “we can scale further than anyone else” mic drop

1:20:05 Rapid storage

Rapid Bucket — 15 TB/s bandwidth, 20M req/s, sub-millisecond latency, single-zone
Rapid Cache (formerly Anywhere Cache) — 2.5 TB/s aggregate read; 2.2x faster checkpoint restores
Managed Lustre at 10 TB/s throughput; 2.6x faster checkpoints

1:20:54 Axion expands

N4A GA — 2x price/perf vs x86; 30% better perf/$ for GKE Agent Sandbox vs other hyperscalers
C4A.metal preview — first Axion bare metal (Android dev, automotive sim, custom hypervisors)
Confidential Computing on G4 (Blackwell) + C4 (Granite Rapids) — confidential AI workloads

1:21:54 Fraud Defense

reCAPTCHA evolves into a platform that distinguishes bots, humans, AND agents
Agent-specific capabilities coming for the digital commerce journey (account → payment → checkout)
Closest thing in the wrap-up to the AP2 protocol prediction nobody hit

1:21:50 Post-quantum crypto

KMS Quantum Safe Key Imports (preview)
PQC in Cross-Cloud Network
Boring but important — Google front-running the regulatory ask

1:22:00 GKE upgrades

4x faster node startup, 80% faster pod startup, 5x faster model loading
GKE hypercluster — single control plane, millions of accelerators, multi-region (private GA)
Predictive latency boost in GKE Inference Gateway — up to 70% lower time-to-first-token
KV Cache tiering across RAM / Local SSD / Cloud Storage / Lustre
RL Scheduler, RL Sandbox, RL Observability for reinforcement learning workloads

1:22:33 Three themes that emerged

Agent platform is the new operating system. Vertex’s rebrand to Gemini Enterprise Agent Platform isn’t cosmetic — Google restructured the portfolio so the unit of work is an agent, not a model call.
Wiz is now Google’s multi-cloud trojan horse. Supporting AWS Agentcore + Azure Copilot Studio + Salesforce Agentforce means Google is happy to sell security to customers who’ll never run on GCP. New posture.
Customer scale is the real flex. Mars, Merck (75K employees), GE Appliances (800 agents), Tata Steel (300 in 9 months), Deutsche Telekom (95% MTTR reduction). Other hyperscalers can match the silicon. They can’t yet match this deployment depth on stage.

1:23:00 Conspicuously absent

A2A 1.0 / CNCF donation — third-party press reported it, not in the official wrap-up
No Boston Dynamics or Waymo crossover
No Gemini Robotics API preview
No Hugging Face deal
No AP2 Payment Protocol (Cloud Fraud Defense is the closest cousin)
No Nano Banana update
No Glasswing answer
No Turboquant in Vertex

1:23:24 Less important stuff

Bigtable in-memory; Memorystore for Valkey 9.0
AlloyDB AI search at 10B vectors; new AlloyDB AI functions
Firestore Enterprise edition (full-text + geospatial + JOINs)
Firebase SQL Connect; Firebase Phone Number Verification
NetApp Volumes Flex Unified + ONTAP-mode
Filestore for GKE; Hyperdisk Exapools / ML / Balanced improvements
Cloud WAN expansion to 25+ countries; NCC Gateway with Palo Alto + Symantec
Cloud Armor managed rules (Thales Imperva); Cloud NGFW Advanced Malware Sandbox
Private Service Connect: 40+ published services, endpoint-based security
Looker Studio renamed to Data Studio; Looker Dashboard Agents; AI assistants
CME Group ultra-low-latency partnership for financial exchanges
Google for Startups AI Agents Challenge ($90K prize, $500 credits)

Google Cloud Next 2026 Wrap Up

Google Cloud Next 26 featured 260 announcements centered on what Google calls the “Agentic Era,” with the headline being the Gemini Enterprise Agent Platform, which replaces Vertex AI as the primary platform for building, scaling, and governing AI agents with new components like Agent Runtime (sub-second cold starts), Agent Memory Bank, Agent Identity with cryptographic IDs, and Agent Gateway for fleet management.
On the infrastructure side, Google announced 8th-generation TPUs split into two variants: TPU 8t for training workloads delivering roughly 3x higher compute than the previous generation, and TPU 8i for inference and reinforcement learning with up to 80% better performance-per-dollar, alongside new Axion-based N4A VMs now generally available at up to 2x better price-performance than comparable x86 VMs.
The Agentic Data Cloud introduces a Knowledge Catalog as a universal context engine, a Cross-Cloud Lakehouse (formerly BigLake) built on Iceberg REST Catalog spanning AWS and Azure, and Spanner Omni, which extends Spanner’s globally consistent database to run on-premises or on other clouds, addressing the challenge of agents needing consistent data access across fragmented environments.
Security got notable attention with the completed Wiz acquisition now reflected in integrated tooling, Model Armor expanding to Agent Gateway and Firebase, a new Google Cloud Fraud Defense platform (evolved from reCAPTCHA) now generally available, and post-quantum cryptography support in Cloud KMS for quantum-safe key imports, all aimed at securing agentic workloads specifically.
Storage announcements include the new Cloud Storage Rapid Bucket delivering over 15 TB/s bandwidth with sub-millisecond latency now generally available, Managed Lustre Dynamic tier priced at $0.06/GB-month, and Hyperdisk ML throughput increased to 2 TB/s aggregate, all targeting the checkpoint and model loading bottlenecks common in large-scale AI training.

Next ‘26 day 1 recap

Google Cloud Next ’26 centered on moving AI into production at enterprise scale, with the Gemini Enterprise platform serving as the connective tissue across a unified stack spanning chips, models, data, agents, and security. The Gemini Enterprise Agent Platform is essentially a rebranded and expanded Vertex AI with new tools for building, scaling, governing, and optimizing agents.
On the infrastructure side, Google announced two new TPU 8 variants with distinct purposes: TPU 8t for training scales to 9,600 TPUs with 2 petabytes of shared memory, while TPU 8i for inference delivers 80% better performance per dollar than the prior generation using a new Boardfly topology. The new Virgo Network and Google Cloud Managed Lustre at 10 terabytes per second throughput round out the infrastructure updates.
The Agentic Data Cloud rebrands and expands Google’s data platform with notable additions, including a Knowledge Catalog for contextual grounding, a Lightning Engine for Apache Spark claiming 4.5x speed over open-source alternatives, and a Cross-Cloud Lakehouse based on Apache Iceberg that lets customers query data in AWS or Azure without copying it.
Security got substantial attention with three new agents in Google Security Operations for threat hunting, detection engineering, and third-party context enrichment, all currently in preview. The Wiz acquisition is now complete, and new Wiz integrations include inline security scanning in IDEs, an AI Bill of Materials for inventorying AI frameworks and models, and a Lovable platform integration generally available in May.
Google Workspace is being repositioned from a productivity suite into what Google calls a semantic intelligence layer, with new features like AI Inbox in Gmail, Drive Projects as an active collaborator, and an Ask Gemini interface in Google Chat that can take actions like scheduling meetings or creating documents directly from the chat window.

Next ’26 day 2 recap

Google Cloud Next Day 2 centered on the Gemini Enterprise Agent Platform, positioned as the evolution of Vertex AI, offering tools to build, scale, govern, and optimize autonomous agents. The keynote used a multi-agent marathon route planner for Las Vegas as a practical demonstration of the platform’s capabilities.
The Agent Development Kit, remote MCP servers, and Agent Runtime work together to give agents instructions, skills, and tools, while Agent Registry functions as a DNS-like directory for discovering and connecting deployed agents across a system.
Agent Platform Sessions and Memory Bank address a common problem in agentic systems by allowing agents to retain learned knowledge across interactions without stuffing raw text into every request, which improves performance over time.
Debugging and observability are handled through Agent Runtime trace view and Gemini Cloud Assist, which let developers use natural language to investigate logs and pinpoint issues, with fixes applied directly from an IDE connected via MCP and redeployed automatically.
Security is addressed through Agent Identity, which gives each agent a unique, immutable credential, and Agent Gateway, which enforces IAM policies to restrict agent actions to approved sources. Wiz integration adds code and infrastructure scanning with remediation suggestions, and notably supports Anthropic Claude Code as an alternative tooling option alongside Google’s own tools.

Partner-built agents available in Gemini Enterprise

Google has added partner-built agents from its Agent Marketplace directly into the Agent Gallery inside the Gemini Enterprise app, with partners including Salesforce, ServiceNow, Workday, Oracle, Atlassian, and Palo Alto Networks, among others. Each agent must pass a four-step evaluation covering basic functionality, output accuracy, autonomous execution, and enterprise standards to earn the Google Cloud Ready – Gemini Enterprise designation.
The governance model is worth noting for enterprise IT teams: employees can browse and request agents, but administrators retain approval control over deployments and can manage access at a granular level. Every agent also gets a cryptographically secure identity for audit trail purposes, and Agent Gateway plus Model Armor screen traffic to prevent data from being used for model training.
Google announced a 750 million dollar partner fund for agentic development alongside this launch, and partners selling through the Marketplace are reportedly closing deals 112 percent larger, with purchasing cycles accelerating by up to 50 percent. This creates a clear commercial incentive for ISVs to build and list agents on the platform.
The agent catalog covers a wide range of industries and functions, including supply chain optimization from Accenture, tariff management from Deloitte, financial analysis from S&P Global, identity security from Saviynt, and healthcare intake workflows from Synthpop. This breadth suggests Google is positioning the Agent Gallery as a general-purpose enterprise AI distribution channel rather than a niche tool.
Pricing for individual agents will vary by partner and likely requires existing subscriptions in some cases, such as the Alteryx AI Insights Agent requiring an Alteryx One subscription. Gemini Enterprise offers a 30-day free trial at console.cloud.google.com/freetrial for organizations wanting to evaluate the platform before committing.

Level Up Your Agents: Announcing Google’s Official Skills Repository

Google announced an official Agent Skills repository at Cloud Next 2026, launching with 13 skills covering products like BigQuery, Cloud Run, GKE, Firebase, and Gemini API, plus Well-Architected Framework pillars and recipe-style guides for common tasks. The repository is available at github.com/google/skills and is free to use.
Agent Skills address a practical problem called context bloat, where loading too much information into an AI agent’s context window increases token costs and degrades model performance. Skills are compact Markdown-based documents that agents load only when needed, rather than pulling in full documentation sets.
The format is described as open, meaning it is not locked to Google’s own tooling. Skills work with Google’s Antigravity and Gemini CLI agents as well as third-party agents, and installation is handled via a single npx command.
The announcement positions Skills as a complement to existing approaches like the Google developer documentation, the MCP server, giving practitioners a lighter-weight alternative when full real-time documentation grounding is unnecessary or too costly.
For teams building AI agents on top of Google Cloud services, this provides a structured way to keep agents accurate on GCP-specific APIs and best practices without manual prompt engineering or expensive context loading. Google indicated that more skills will be added in the coming weeks.

Introducing Gemini Enterprise Agent Platform

Google launched the Gemini Enterprise Agent Platform, which consolidates Vertex AI capabilities with new agent-specific tooling for building, scaling, governing, and optimizing AI agents. All future Vertex AI services and roadmap updates will be delivered exclusively through this platform rather than as a standalone service.
The platform introduces four governance-focused components: Agent Identity assigns each agent a unique cryptographic ID for auditable trails, Agent Registry maintains a central library of approved tools, Agent Gateway enforces security policies across environments, and Agent Anomaly Detection flags unusual reasoning using an LLM-as-a-judge framework.
Agent Runtime now supports long-running agents that maintain state for multiple days, with sub-second cold starts and a Memory Bank for persistent context across sessions. This addresses a practical gap where most agent frameworks previously lost context between interactions.
Developers can access over 200 models through Model Garden, including Gemini 3.1 Pro, Gemma 4, and third-party models like Anthropic Claude, with a low-code Agent Studio path and a code-first Agent Development Kit that processes over six trillion tokens monthly. Agent Garden provides pre-built templates for use cases like invoice processing, financial analysis, and code modernization.
Real-world deployments mentioned include Comcast rebuilding its Xfinity Assistant, Color Health using agents to schedule cancer screenings, and PayPal using Agent Payment Protocol for secure agent-based commerce. Pricing details are not specified in the announcement and would need to be confirmed through the Google Cloud console at console.cloud.google.com/agent-platform/overview.

Gemini Cloud Assist at Next ‘26

Gemini Cloud Assist is shifting from a reactive assistant to a proactive operations platform, using an agentic architecture to handle tasks like infrastructure troubleshooting, cost anomaly detection, and application design without waiting for user prompts.
The redesigned Application Design Center lets teams describe infrastructure goals in plain language and get back visual architectures with deployable Terraform templates, integrated with Security Command Center to enforce organizational policies from the start.
A 24/7 FinOps agent monitors for cost anomalies and correlates spending spikes with specific triggers like auto-scaling events or new resource creation, allowing teams to query cost data in natural language instead of manually aggregating reports.
MCP server support extends Gemini Cloud Assist beyond the Google Cloud console into IDEs, CLIs, and third-party tools like ServiceNow and Slack, reducing context switching for development and operations teams.
Petco reported a 60% reduction in Google Cloud-related questions to their cloud team after adopting Gemini Cloud Assist, suggesting meaningful productivity gains for platform teams supporting large developer organizations. Pricing details are not specified in the announcement, so teams should check the Gemini Cloud Assist admin console for current costs.

Unify analytical and operational data for AI

Google announced what it calls an “Agentic Data Cloud” at Google Cloud Next, focused on eliminating the separation between operational and analytical data systems. The goal is to let AI agents query both live transactional data and historical analytical data without complex data movement pipelines.
Three specific capabilities are now available or in preview: Lakehouse federation for AlloyDB lets operational systems query BigQuery data directly, Reverse ETL for BigQuery pushes analytical results into AlloyDB, Bigtable, or Spanner with sub-millisecond read latency, and the Spanner Columnar Engine is now GA with analytical queries running up to 200 times faster than standard transactional queries.
Datastream now supports real-time Change Data Capture into Apache Iceberg tables from AlloyDB, Cloud SQL, Spanner, and Oracle, streaming operational changes directly into the open Lakehouse format for immediate use in BigQuery ML and feature engineering workflows.
Knowledge Catalog, formerly Dataplex, is being extended with integrations across AlloyDB, BigQuery, Bigtable, Cloud SQL, and Spanner to provide a unified metadata layer. The intent is to reduce inconsistent data definitions that can cause AI agents to produce inaccurate outputs.
Native vector and full-text search are being embedded directly into AlloyDB, Bigtable, Cloud SQL, Firestore, and Spanner, and graph federation is being added across BigQuery and Spanner. This removes the need to move data into separate search or graph engines for hybrid retrieval and GraphRAG patterns. Pricing for these features is not specified in the announcement and would vary by service and usage.

Introducing the Google Cloud Knowledge Catalog

Google is evolving its existing Dataplex service into the Knowledge Catalog, a context engine designed to feed AI agents accurate business semantics, data relationships, and verified SQL patterns to reduce hallucinations and improve query accuracy.
The service aggregates metadata from a broad range of sources, including BigQuery, AlloyDB, Spanner, Cloud SQL, and third-party catalogs like Collibra and Atlan, plus enterprise platforms like SAP, Salesforce, and Workday through a preview feature called Enterprise Connectivity.
A notable enrichment capability is Smart Storage, which automatically tags and embeds metadata for files as they land in Google Cloud Storage buckets, making unstructured data immediately discoverable by agents without manual curation steps.
The search layer uses hybrid retrieval with access control awareness, meaning agents can only retrieve data assets they are explicitly authorized to see, which addresses a practical governance concern when deploying autonomous agents at enterprise scale.
Bloomberg Media is cited as an early customer, using Knowledge Catalog to power an internal Data Access AI Agent that translates business questions against their data lake. Pricing details are not publicly listed, so teams evaluating this should check cloud.google.com/products/knowledge-catalog for current information.

The future of data lakehouse for the agentic era

Google Cloud announced a next-generation cross-cloud Lakehouse built around Apache Iceberg, offering fully managed Iceberg storage with read/write interoperability across BigQuery, Managed Apache Spark, and third-party engines like Databricks and Snowflake (Preview). The goal is to let teams process the same data across multiple engines without duplication, which Spotify is already doing across BigQuery and Dataflow.
A new cross-cloud interconnect and caching capability (Preview) gives BigQuery and Managed Apache Spark high-performance access to data stored in AWS S3 Iceberg tables, with claimed price-performance comparable to AWS-native solutions. Catalog federation (Preview) extends this to AWS Glue, Databricks, SAP, and Snowflake, with Confluent Tableflow support coming later this year.
The Lightning Engine for Apache Spark claims up to 2x price-performance over competing high-speed Spark alternatives using vectorized execution and optimized I/O, with no code changes required. This runs within Managed Service for Apache Spark, formerly known as Dataproc.
Knowledge Catalog (formerly Dataplex) now provides always-on context for AI agents by continuously learning how enterprise data is used and mapping relationships within unstructured files. This feeds grounded context to agents built with tools like Agent Developer Kit and Model Context Protocol.
Real-time change replication from Spanner, AlloyDB, and Cloud SQL into BigQuery is now GA, with Iceberg replication in Preview, enabling operational data to feed directly into lakehouse workloads. Pricing is not specified in the announcement and would vary based on storage, compute, and cross-cloud data transfer usage.

What’s New in the Agentic Data Cloud

Google is rebranding and expanding Dataplex Universal Catalog into the Knowledge Catalog, which aggregates business context from third-party platforms like Salesforce, SAP, ServiceNow, and Workday, then uses hybrid search with access-control-aware retrieval so agents only act on data they are authorized to see.
The new Google Cloud Data Agent Kit (Preview) drops into existing developer environments like VS Code, Gemini CLI, and Claude Code, automatically selecting frameworks like dbt, Spark, or Airflow and generating production-ready code, with three specialized agents for data engineering, data science, and database observability now available at various GA and Preview stages.
Google is expanding MCP support across BigQuery, Spanner, AlloyDB, Cloud SQL, and Looker, using existing IAM policies and VPC Service Controls to govern agent interactions rather than requiring separate security configurations.
The cross-cloud lakehouse now supports bi-directional federation with Databricks Unity Catalog, Snowflake Polaris, and AWS Glue Data Catalog using the open Iceberg REST Catalog standard, and Spanner Omni (Preview) extends the Spanner engine to run on-premises or across other clouds for the first time.
On the performance side, Google is citing up to 2x price-performance improvement for Apache Spark via Lightning Engine, up to 34% cost reduction for BigQuery autoscaling workloads, sub-millisecond Bigtable reads via a new in-memory tier, and up to 10 terabytes per second throughput with Managed Lustre, though specific pricing details were not disclosed in the announcement.

Next 26 storage announcements

Cloud Storage Rapid is now generally available in two forms: Rapid Bucket, which uses Google’s internal Colossus system to deliver over 15 TB/s bandwidth and sub-millisecond latency, and Rapid Cache, which provides 2.5 TB/s aggregate read throughput for existing buckets with no code changes. The headline numbers for AI training are checkpoint writes 3.2x faster and restores 5x faster compared to traditional object storage.
Google Cloud Managed Lustre now delivers up to 10 TB/s throughput, a 10x increase from last year, and adds a new Dynamic tier priced at $0.06 per GB per month that serves data from persistent disk rather than object-based caching to avoid performance degradation under load.
Smart Storage adds automated metadata annotation directly in Cloud Storage, so objects get labels, extracted entities, and compliance signals attached at write time without custom pipelines. A new Cloud Storage MCP server lets AI agents read, write, and analyze Cloud Storage data using the standard Model Context Protocol, which reduces the need for separate retrieval layers.
Storage Intelligence, already used by 70% of Google Cloud’s largest customers managing over 50 billion objects each, gets zero-configuration dashboards that surface cost anomalies and integrate Security Command Center’s data governance signals with no setup required, plus enhanced batch operations supporting multi-bucket actions on billions of objects at once.
The ecosystem additions include NetApp Volumes Flex Unified, supporting both block and file protocols on the same storage pool with ONTAP API compatibility, Filestore for GKE scaling down to 100 GiB shares, and Google Cloud Backup and DR gaining agentic AI capabilities to autonomously audit and remediate backup coverage gaps with new GA support for AlloyDB and Filestore.

Introducing Virgo Network megascale data center fabric

Google introduced Virgo Network, a specialized scale-out data center fabric designed for AI workloads, built on a flat two-layer non-blocking topology that reduces network tiers and latency compared to traditional data center architectures. It underpins the AI Hypercomputer platform and connects up to 134,000 TPU chips with up to 47 petabits per second of non-blocking bi-sectional bandwidth in a single fabric.
The architecture separates east-west accelerator traffic (handled by Virgo) from north-south storage and compute traffic (handled by the existing Jupiter network), allowing each layer to evolve independently without system-wide disruptions. This decoupling also means bandwidth dedicated to accelerator-to-accelerator communication is non-blocking and not competing with general data center traffic.
Virgo delivers 4x the bandwidth per accelerator and 40% lower unloaded fabric latency compared to the previous generation, which matters specifically for latency-sensitive inference workloads and large synchronized training jobs where a single slow node can degrade the entire cluster.
Reliability at this scale is addressed through independent switching planes for fault isolation, sub-millisecond telemetry for observability, and automated straggler and hang detection to minimize training job interruptions. Google frames this around maximizing “goodput,” meaning the useful work completed relative to total time, rather than just raw throughput.
No pricing details were provided in the announcement, as Virgo Network is infrastructure-level and costs would surface through TPU and AI Hypercomputer product pricing rather than as a standalone purchasable service.

What’s new for Google Cloud databases at Next’26

Google announced Spanner Omni, a downloadable edition of Spanner that runs outside of Google Cloud, including on-premises data centers, other clouds, and edge environments. This gives organizations using Spanner’s distributed database capabilities more deployment flexibility without being locked into a single cloud region or provider.
AlloyDB received notable vector search improvements, scaling to 10 billion vectors using Google’s ScaNN index and delivering up to 6 times faster vector queries compared to standard PostgreSQL HNSW indexes. The addition of native BM25 support, coming soon, enables hybrid search combining vector retrieval with full-text search in a single database.
Managed remote MCP servers are now generally available for AlloyDB, Bigtable, Cloud SQL, Firestore, and Spanner, with preview support for Memorystore, Datastream, and Oracle Database at Google Cloud. This removes the operational burden of self-hosting Model Context Protocol infrastructure for teams building AI agents that need secure, reliable access to enterprise data.
The lakehouse integration announcements bridge the gap between transactional and analytical workloads, with AlloyDB now able to query live BigQuery and Iceberg tables directly from the PostgreSQL data plane without data movement. Datastream also now supports continuous replication from AlloyDB to Iceberg tables, which is useful for real-time ML feature engineering pipelines.
Bigtable is adding a new in-memory tier with sub-millisecond read latency as part of a new Enterprise Plus edition, and Memorystore for Valkey 9.0 is now generally available with a managed migration path from self-managed Redis. Both updates reflect Google’s push to offer managed caching and low-latency storage options with enterprise security features like ACLs and token-based authentication.

Introducing Spanner Omni

Spanner Omni is a downloadable version of Google’s Spanner database now in preview, allowing deployment on-premises, across clouds, on Kubernetes clusters, or even a laptop, rather than being limited to Google Cloud infrastructure. The developer edition is available for free download today at the link in the show notes, with a commercial edition requiring direct contact with Google.
On the technical side, Google had to replace two core Spanner dependencies to make this work. Colossus, Google’s proprietary distributed file system, was replaced with a software abstraction layer that writes to local file systems, and TrueTime’s atomic clock and GPS-based synchronization was replaced with a software-based alternative that still provides error-bounded time synchronization.
Internal benchmarks show Spanner Omni can process millions of queries per second across petabytes of data in a single regional deployment, and it supports the full multimodal feature set, including SQL, graph, key-value, full-text search, vector search, and columnar analytics.
Three primary use cases are emerging from early adopters: hybrid failover, where managed Spanner in Google Cloud serves as primary, and Spanner Omni handles disaster recovery on-premises, a write-once-run-anywhere approach for ISVs and SaaS providers, and on-premises modernization for organizations with regulatory or data sovereignty requirements that prevent full cloud adoption.
Pricing for the commercial edition is not publicly listed yet, so organizations interested in production use will need to engage Google directly at cloud.google.com/consulting/spanner-omni to discuss terms.

TPU 8t and TPU 8i technical deep dive

Google’s eighth-generation TPUs split into two specialized chips: TPU 8t for large-scale pre-training and TPU 8i for inference and reasoning workloads. This specialization reflects a recognition that training and serving have distinct hardware bottlenecks that a single chip design cannot optimally address.
TPU 8t introduces native FP4 support, SparseCore for embedding lookups, and a new Virgo Network fabric that can link over 134,000 chips with 47 petabits per second of non-blocking bandwidth. Combined with TPUDirect Storage and Managed Lustre 10T, Google claims 10x faster storage access compared to seventh-generation Ironwood TPUs.
TPU 8i uses a new Boardfly network topology inspired by Dragonfly principles, reducing chip-to-chip communication from 16 hops to 7 hops in a 1,024-chip pod. This 56% reduction in network diameter directly benefits Mixture-of-Experts and reasoning models that require frequent all-to-all communication patterns.
On performance-per-dollar, Google claims TPU 8t delivers 2.7x improvement over Ironwood for training, while TPU 8i delivers 80% improvement for low-latency inference on large MoE models. Both chips also deliver up to 2x better performance-per-watt, which matters for customers managing energy costs at scale.
The software stack supports JAX, native PyTorch (currently in preview), Keras, and vLLM, with XLA handling hardware-specific translation transparently. Customers interested in access can submit an interest form at cloud.google.com/resources/tpu-interest, though pricing details have not been publicly disclosed.

Introducing Spend Caps AI Cost Visibility Next ’26

Google Cloud announced Spend Caps in private preview, allowing FinOps and DevOps managers to set hard budget limits at the project level for services including AI Studio, Gemini Agent Platform, Cloud Run, Cloud Run Functions, and Maps. Unlike traditional budget alerts, Spend Caps automatically pause API traffic when a budget threshold is reached while leaving underlying resources intact, addressing the risk of runaway AI training jobs or unoptimized models draining budgets quickly.
A new FinOps Explainability Agent, built on Gemini and accessible through Google Cloud Billing, autonomously analyzes AI cost drivers and answers natural language queries such as breaking down spend by API key or comparing input versus output token costs across specific Gemini models. This addresses the challenge of AI costs blending into general infrastructure spend, making ROI attribution more straightforward.
Google reported that since launching Gemini Cloud Assist for FinOps, cost reporting adoption increased 75% and time spent on cost analysis decreased 18%, providing some baseline context for the value customers are seeing from AI-assisted billing tools.
Two additional private previews were announced alongside Spend Caps: enhanced billing account hierarchies that aggregate spend across multiple billing accounts, including Other Eligible Services, and contract commitment reporting that shows burndown progress within Enterprise Agreements. Both features target larger organizations managing complex commercial arrangements with Google Cloud.
Spend Caps are currently in private preview with a signup form available, and no specific pricing details were provided for the new FinOps tooling beyond its availability in the Google Cloud Billing console.

Next ‘26: Redefining security for the AI era with Google Cloud and Wiz

Google Cloud announced three new security agents in Google Security Operations at Next 26: a Threat Hunting agent, a Detection Engineering agent, and a Third-Party Context agent, all in preview. The existing Triage and Investigation agent has already processed over 5 million alerts, reducing the typical 30-minute manual analysis to 60 seconds.
Wiz, now fully part of Google Cloud, is expanding its AI-Application Protection Platform to cover new agent studios, including AWS Agentcore, Microsoft Azure Copilot Studio, and Salesforce Agentforce, plus Databricks. New capabilities include inline AI security hooks for IDEs, agent-based remediation via Wiz Skills, and an AI Bill of Materials to inventory shadow AI tools across an environment.
Google Cloud is introducing Agent Identity and Agent Gateway as part of the Gemini Enterprise Agent Platform, giving AI agents unique identities with scoped permissions and enforcing policy on all agent-to-agent and agent-to-tool traffic. Model Armor now integrates with Agent Gateway, LangChain, and Firebase to provide runtime protection against prompt injection and data leakage without code changes.
On the data security side, Confidential Computing support is coming to G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs and C4 VMs with Intel TDX, both in preview. KMS is also adding quantum-safe key imports in preview, addressing organizations starting to plan for post-quantum cryptography requirements.
ReCAPTCHA is being rebranded and expanded into Google Cloud Fraud Defense, now generally available, with agent-specific capabilities for distinguishing bots, humans, and AI agents coming in preview. Chrome Enterprise is adding shadow AI reporting and AI-aware extension threat detection to help organizations manage unsanctioned AI tool usage at the browser level.

Looker updates for agentic BI at Next ‘26

Google announced Looker BI Agents at Cloud Next, introducing Dashboard Agents and Agentic Workflows that go beyond static answers to trigger downstream business actions, all grounded in the Looker semantic layer and existing enterprise governance frameworks.
Several features moved to GA, including Embedded Conversational Analytics, Visualization Assistant, Self-service Explores with CSV and Excel blending, and CI/CD pipeline support, giving teams more production-ready options without waiting on preview limitations.
The new MCP integration adds a managed MCP server native to Looker, and a VS Code extension introduces a LookML AI Agent that translates natural language descriptions into production-ready LookML code, reducing the technical barrier for model authoring.
Knowledge Catalog integration in preview allows Looker to transform metadata into a semantic graph, which is positioned as a way to reduce AI hallucinations by giving agents the context needed to complete tasks autonomously.
Pricing details were not disclosed in the announcement, so teams evaluating these features should check cloud.google.com/looker directly, particularly for the preview features, which may have different availability or cost structures once they reach GA.

Next ‘26: Announcing new partner-supported workflows for Google Security Operations

Google Security Operations is expanding its partner ecosystem with 13 new integrations announced at Next ’26, bringing the total vendor count to over 300. The new partners span data ingestion, automated response, and bi-directional API workflows, covering gaps in areas like SAP logs, VMware ESXi threats, and application-layer attacks.
Three distinct integration patterns are supported: data feed integrations that pre-map telemetry to Google’s Unified Data Model schema, response integrations that automate alert triage and case management, and bi-directional API workflows that let partner platforms pull Chronicle detections without requiring analysts to switch consoles.
Notable technical additions include Synqly Mesh offering bi-directional normalization between UDM and the Open Cybersecurity Schema Framework (OCSF), and Contrast Security streaming verified runtime attack telemetry to surface confirmed application exploits as cases correlated with WAF and EDR signals.
AI-assisted triage shows up across multiple integrations, with Torq applying agentic AI to filter detections and autonomously execute response actions like endpoint isolation, and Prophet Security using natural language threat hunting with bidirectional sync back to Google Security Operations.
Vendors interested in joining the ecosystem can download the Google Security Operations Build Partner Guide and request a development environment through the Google Cloud Security Tech Partners team. Pricing for individual integrations is not specified in the announcement and would vary by partner.

The new Gemini Enterprise: one platform for agent development

Google rebranded and expanded Vertex AI into Gemini Enterprise Agent Platform, consolidating model access, agent development, governance, and deployment tooling into a single system aimed at enterprise-scale agent management.
The platform introduces Agent Identity, which assigns each agent a unique cryptographic ID for auditability, alongside Agent Gateway for securing agent-to-agent communications and Model Armor for protection against prompt injection and data leakage.
A new Memory Bank and Memory Profiles feature gives agents persistent long-term context across sessions, allowing them to retain user preferences and historical interactions rather than starting fresh each time.
The Gemini Enterprise app adds a no-code Agent Designer for non-technical users, a centralized Inbox for monitoring long-running agents, and a Projects workspace that preserves team context as a persistent company asset rather than individual chat history.
The partner ecosystem integration brings agents from Adobe, Salesforce, ServiceNow, Workday, and others directly into the in-app Agent Gallery, with Google Cloud validation for security and interoperability before deployment. Pricing details were not disclosed in the announcement, so listeners should check cloud.google.com/ai for current pricing information.

What’s new in Gemini Enterprise

Google is expanding Gemini Enterprise with long-running agents that can autonomously execute multi-step workflows for hours or days, handling tasks like financial reconciliation or sales prospecting without constant human supervision. This is managed through a new Inbox command center that categorizes agent activity into actionable groups.
The Enhanced Agent Designer lets non-technical users build agents using natural language or a visual interface, with reusable Skills that codify specific workflows and human-in-the-loop checkpoints for review and approval at critical steps.
Governance is built into the platform at no additional cost through three key controls: Agent Identity for unique digital IDs and least-privilege access, Agent Registry for IT-managed agent catalogs, and Agent Gateway for centralized network policies and protection against risks like prompt injection.
Projects and Canvas introduce team-level collaboration by creating shared workspaces where humans and agents co-create together, with cross-platform support spanning Google Workspace, Microsoft 365, and OneDrive, plus the ability to export directly to Microsoft Office formats.
The new Agent Marketplace integrates into the existing Agent Gallery, allowing organizations to browse and deploy third-party agents from partners like Accenture, Oracle, and ServiceNow, while BYO-MCP support lets admins connect custom or third-party business tools without writing code. New features will roll out over the coming months, and pricing details are available at cloud.google.com/gemini-enterprise.

Introducing Google Cloud Fraud Defense, the next evolution of reCAPTCHA

Google Cloud Fraud Defense is the rebranded and expanded version of reCAPTCHA, now positioned as a broader trust platform that handles not just bot detection but also AI agent verification and multi-stage fraud across entire user journeys. Existing reCAPTCHA customers are automatically migrated with no action required and no pricing changes.
The platform introduces an agentic policy engine that lets businesses allow or block traffic based on risk scores, automation types, and agent identity, addressing the growing reality that AI agents are being used to complete end-to-end transactions on behalf of users.
A notable new mitigation tool is a QR code-based challenge designed to require human presence when suspicious agent activity is detected, replacing traditional CAPTCHA puzzles with a method intended to make automated fraud economically impractical rather than just technically difficult.
Google cites a 51% average reduction in account takeover for customers using the unified trust model, and the platform currently protects over 14 million domains globally, including 50% of Fortune 100 companies, giving it broad signal coverage that individual site data cannot replicate.
The platform integrates with emerging standards like Web Bot Auth and SPIFFE for agent identity verification, which is worth watching for teams building or securing agentic workflows since standardized agent identity is still an evolving area across the industry.

What’s new for Cloud Run at Next ‘26

Cloud Run is adding support for NVIDIA RTX PRO 6000 Blackwell GPUs, now generally available, allowing teams to serve models with 70 billion or more parameters without managing underlying infrastructure, including automatic scale-to-zero when idle to avoid unnecessary GPU costs.
Google AI Studio now supports full-stack app deployment directly to Cloud Run with a single click, combining server-side code, Firestore, and user authentication in a generally available workflow aimed at lowering the barrier for new developers.
A new Cloud Run MCP server is now generally available, giving developers and AI agents a standardized way to deploy and manage applications programmatically, which fits into the broader push toward agentic workflows.
Cloud Run is introducing individual instances as a primitive resource, separate from services or jobs, allowing teams to run long-running background agents more directly, though this feature is currently in preview with select customers only.
Billing caps are coming soon, letting teams set a monthly spend ceiling after which Cloud Run resources are deactivated, which addresses a common concern for teams running unpredictable or experimental workloads on pay-per-use infrastructure.

What’s new in GKE at Next 26

GKE Agent Sandbox launches as a new isolated execution environment for running untrusted AI agent code, using gVisor kernel isolation to support 300 sandboxes per second at sub-second latency, with up to 30% better price-performance on Axion processors compared to other cloud providers.
GKE hypercluster enters private GA, enabling a single Kubernetes-conformant control plane to manage up to one million chips across 256,000 nodes spanning multiple Google Cloud regions, reducing the operational burden of managing hundreds of disconnected clusters for large AI training workloads.
Inference performance improvements include ML-driven Predictive Latency Boost in GKE Inference Gateway, reducing time-to-first-token latency by up to 70%, plus automatic KV Cache storage tiering that delivered over 40% TTFT reduction when offloading to RAM and nearly 70% throughput improvement when offloading to Local SSD for long-context workloads.
New reinforcement learning capabilities in preview include an RL Scheduler to address straggler effects, an RL Sandbox for millisecond-scale kernel-level isolation during reward evaluation, and out-of-the-box observability dashboards, targeting the GPU and TPU idle time that occurs between RL pipeline steps.
Intent-based autoscaling adds native custom metrics support to the Horizontal Pod Autoscaler, reducing autoscaling reaction time from 25 seconds to 5 seconds while eliminating dependencies on external monitoring stacks that could cause autoscaling failures if they go down.

AI infrastructure at Next ‘26

Google announced eighth-generation TPUs at Cloud Next, split into two specialized chips: TPU 8t for training (delivering nearly 3x higher compute performance than prior generation, with 121 exaflops in a single superpod) and TPU 8i for inference (offering 80% better performance per dollar with 5x lower on-chip latency). This is the first time Google has offered distinct TPU chips optimized for different workload types rather than a single general-purpose design.
The Virgo Network is a new data center fabric with 4x the bandwidth of previous generations, capable of connecting 134,000 TPUs in a single data center or over one million TPUs across multiple sites into a unified training cluster. Google is also making it available for NVIDIA-based A5X instances, supporting up to 960,000 GPUs across multiple sites.
Storage improvements include Google Cloud Managed Lustre now delivering 10 TB/s of bandwidth (10x improvement over last year) with 80 petabytes of capacity, plus a new Rapid Buckets feature on Cloud Storage offering sub-millisecond latency and 20 million operations per second to keep accelerator utilization at 95% or higher during training checkpoints.
GKE received notable orchestration updates targeting agentic workloads, including node startup times 4x faster, pod startup reduced by up to 80%, and an updated Inference Gateway using ML-driven routing that cuts time-to-first-token latency by more than 70% without manual tuning.
Native PyTorch support for TPUs (called TorchTPU) is now in preview, joining existing JAX and vLLM support, which reduces friction for teams who want to run existing PyTorch models on TPU hardware without significant code changes. Pricing for these new offerings has not yet been publicly detailed, with availability described as coming soon.

Azure

1:30:36 Optimize object storage costs automatically with smart tier—now generally available

Azure Smart Tier for Blob and Data Lake Storage is now generally available, automatically moving objects between hot, cool, and cold tiers based on actual access patterns.
Data inactive for 30 days shifts to cool, then cold after another 60 days, and immediately returns to hot upon re-access with no retrieval or early deletion charges.
The feature eliminates the need to manually configure and maintain lifecycle rules, which is particularly useful for organizations managing large analytics workloads, telemetry data, or data lakes with unpredictable access patterns.
During preview, over 50% of smart-tier-managed capacity automatically shifted to cooler tiers.
Pricing includes standard hot, cool, and cold capacity rates with no tier transition fees, but a per-object monthly monitoring fee applies to objects managed by the smart tier.
Objects smaller than 128 KiB stay in hot tier permanently and do not incur the monitoring fee, so workloads with many small files should factor that into cost planning.
Setup requires a storage account with zonal redundancy and is available via the Azure portal or API, either at account creation or by switching an existing account’s default tier to smart. Legacy account types like GPv1 and page or append blobs are not supported.
Smart tier is available now in nearly all zonal public cloud regions, with broader regional coverage and updated Storage SDK support planned in upcoming releases. More details and pricing are at azure.microsoft.com/en-us/pricing/details/storage/blobs.

1:31:21 Justin – “Thanks, you finally got what Amazon’s had for a while.”

1:38:37 What’s new in Microsoft Entra – March 2026

Microsoft Entra ID is adding synced passkeys, passkey profiles, and phish-resistant MFA support for Linux SSO, giving organizations more options to move away from passwords while meeting compliance requirements for stronger authentication.
Starting June 1, 2026, Entra Connect Sync and Cloud Sync will block hard-match operations for users with assigned Entra roles, closing a potential attack path where on-premises AD attribute manipulation could be used to take over privileged cloud accounts.
Admins should review their hybrid sync configurations before that date.
The Microsoft Authenticator app now includes jailbreak and root detection for Android, with a phased rollout moving from warning to blocking to wipe mode, meaning users on non-compliant devices will eventually lose access to Entra credentials entirely.
Agent management is consolidating under Agent 365 as the single control plane, with the existing Entra admin center Agent registry and collections blades retiring May 1, 2026, and the current registry Graph API being deprecated and replaced, requiring re-registration of agents using the old API.
Entra ID Governance added several notable features this quarter, including SCIM 2.0 API support, delegated workflow management in Lifecycle Workflows, and a new billing meter for guest users, which organizations relying on governance features for external identities should review for potential cost impact.
Why June 1st? Turn this on today!

1:34:17 New in Azure SRE Agent: Log Analytics and Application Insights Connectors

Azure SRE Agent now supports Log Analytics and Application Insights as native connectors, allowing the agent to run KQL queries directly against workspaces and App Insights resources during incident investigations, replacing the previous approach of shelling out to Azure CLI commands. (REALLY? Bombastic side eye.)
Setup is simplified compared to the manual RBAC approach: selecting a resource from the dropdown automatically grants the agent’s managed identity Log Analytics Reader and Monitoring Reader on the target resource group, with a manual entry fallback if resource discovery fails.
The feature is backed by the Azure MCP Server using the monitor namespace, giving the agent read-only tools like monitor_workspace_log_query and monitor_table_list, with no ability to modify alerts, retention settings, or workspace configuration.
Practical use cases include AKS cluster investigations where the agent can automatically query ContainerLog, KubeEvents, and application traces across multiple connected workspaces to surface errors and failure patterns without manual intervention.
The connectors are currently behind an early access flag under Settings > Basics, though Azure SRE Agent itself is generally available.
Pricing is not detailed in the announcement, so listeners should check sre.azure.com/docs for current cost information.

1:35:14 Justin – “So they REALLY want you to burn tokens.”

1:35:41 Azure Key Vault HSM Platform One Retirement: What Purview BYOK Customers Need to Know

Azure Key Vault is retiring its legacy HSM Platform One on September 15, 2028, and customers using Microsoft Purview Information Protection with Bring Your Own Key (BYOK) will need to migrate their tenant root keys to the modern FIPS 140-2 Level 3 certified HSM platform before that date or risk losing encryption and decryption capabilities.
The migration is not straightforward because Azure Key Vault does not support exporting keys once imported, meaning customers must re-import their original on-premises key material into a new vault, which can be a lengthy process if that original key material is no longer readily accessible.
Microsoft is recommending customers start planning now, despite the 2028 deadline, particularly because coordinating across security, compliance, and HSM teams to recover or regenerate lost key material can take considerable time.
The practical steps involve confirming whether your tenant key sits on the legacy HSM platform, creating a new Key Vault on the modern platform, and updating your Purview configuration to reference the new vault, with Microsoft support available for customers who no longer have access to the original key material.
This announcement is most relevant to enterprise customers in regulated industries who have adopted BYOK for compliance reasons, and they should review the updated guidance at the Microsoft Learn documentation for tenant root key management to understand prerequisites and supported migration paths.

1:36:19 Matt – “The thing is, Microsoft does give you a decent amount of time to do stuff, but what’s always fun is if you buy a three-year reservation you’re stuck with it, and you have to deal with returning it right now, because otherwise you’d have negative time…”

After Show

1:38:11 Allbirds shares soar 580% after pivot from shoes to AI

Allbirds announced a $50 million deal to rebrand as NewBird AI, shifting its business model from footwear to GPU compute infrastructure and on-demand cloud services built for AI workloads.
The company’s stated rationale is a supply gap in AI compute capacity, with plans to purchase GPUs and offer them as on-demand cloud resources to businesses that cannot access sufficient computing power through existing providers.
Analysts are skeptical, with one branding consultant describing the move as using the company’s existing stock market shell for an unrelated business rather than a genuine operational pivot.
The 580% share surge on a press release, despite no demonstrated product or AI-related revenue, has led retail analysts to categorize this as a meme stock situation driven by AI sentiment rather than fundamentals.
For cloud podcast listeners, this story is a useful data point on how GPU scarcity narratives are influencing capital markets, and raises questions about the credibility of new entrants claiming to address AI compute shortages without established infrastructure or track records.

Closing

351: IAM the One Spending All Your AI Money

Wed, 22 Apr 2026 00:52:46 +0000

Welcome to episode 351 of The Cloud Pod, where the weather is always cloudy! Justin, Matt, and Ryan are in the studio today and ready to bring you the latest in cloud and AI news. And it’s that time of year again – we’re coming up quickly on Google Next, place your so we’ve got our yearly predictions for what’s coming from Vegas, as well as more news about Mythos, Amazon finally becoming a utility, and even an aftershow where we discuss the computing power of Artemis. It’s a great show, so let’s get started!

Titles we almost went with this week

Three StorageClasses Walk Into an AI Workload
Deprecated Models Don’t Die, They Just Fail Your API Calls
SQL Walks Into a Graph Bar and Stays
Too Many Agents Spoil the Workflow
One Registry to Rule All Your Rogue AI Agents
Eight CPUs Walk Into Space, Only One Comes Back
Stop Retyping the Same Gemini Prompt Like a Caveman
Claude Code Routines Let AI Work While You Sleep
AWS Builds a Yellow Pages for Your AI Agents
GPT Finally Stops Refusing to Talk About Hacking
None of the hosts is ready for Next
We are once again trying to look into our next next next crystal ball and failing
Google is gonna announce AI, it’s just mandatory now
Las Vegas is calling, our Livers are crying

A big thanks to this week’s sponsors:

Check out thecloudpod.net/archera to schedule a demo today.

We also wanted to tell you about something coming to the US for the first time — WeAreDevelopers World Congress!

Follow Up

01:47 AI Cybersecurity After Mythos: The Jagged Frontier

Since the original Mythos/Project Glasswing announcement, AISLE published follow-up testing showing that small, inexpensive open-weight models can replicate much of the vulnerability detection work Anthropic attributed to Mythos, with all 8 tested models detecting the flagship FreeBSD NFS buffer overflow, including a 3.6B parameter model costing $0.11 per million tokens.
A notable correction to the framing of the original announcement: cybersecurity AI capability does not scale smoothly with model size or cost.
Model rankings reshuffle completely across different security tasks, meaning there is no single best model for cybersecurity work, which challenges the narrative that a restricted frontier model is required for this category.
The current status of the broader AI security space is that AISLE reports 180-plus externally validated CVEs across 30-plus projects since mid-2025, predating Project Glasswing, and their system now runs on OpenSSL and curl pull requests in production, suggesting the category was already operational before the Anthropic announcement.
A practical update for cloud practitioners is that specificity, meaning correctly identifying patched or safe code, remains a significant weak point across most models tested. Only one model was reliable in both directions, which reinforces that the orchestration layer and triage pipeline around the model matter more than the model itself for production security tooling.
The broader ecosystem implication is that defensive AI security capabilities are accessible today with open or low-cost models, meaning organizations do not need to wait for access to restricted frontier models to begin building vulnerability discovery pipelines, though the scaffolding, security expertise, and maintainer trust-building remain the harder problems to solve.

03:09 Justin – “If you’re in the security space and you want to have it poke holes at your app, it uses really complicated patterns to basically figure out different attack vectors and can actually link different vulnerabilities together.”

General News

06:11 AWS boss explains why investing billions in both Anthropic and OpenAI is an OK conflict

Amazon has invested $8 billion in Anthropic and $50 billion in OpenAI, creating a situation where it holds significant financial stakes in two directly competing AI model companies.
AWS CEO Matt Garman frames this as consistent with Amazon’s long-standing practice of partnering with companies it also competes against, citing Oracle selling its database services on AWS as an established precedent.
The dual investment was partly driven by competitive necessity, as both Anthropic and OpenAI models were already available on Microsoft Azure, AWS’s primary rival in the cloud market.
AI model-routing services are emerging as a key battleground, where cloud providers let customers automatically select different models for different tasks, which also creates a path for cloud providers to insert their own first-party models into customer workflows.
Investor loyalty in AI is broadly eroding, with at least a dozen OpenAI backers also investing in Anthropic’s recent $30 billion round, including Microsoft, suggesting this multi-sided investment pattern is becoming standard across the industry.

07:34 Google Next Predictions

Justin

Wiz + Google Cloud Security/Product Offering
Antigravity IDE + Gemini CLI (agent mode) enhancements
Ironwood TPU GA and/or dedicated Inference-based CHIP

Ryan

Gemini 3.1 Pro GA & Teasing Gemini 3.5 or 4 or future model
Enhancements with agents and Agentic
VMware interruption based on Kubernetes? (Opposite of Tanzu)

Matt

Default Guardrails in AI in general. How Gemini will have guard rails via Vertex.
Agentic coding tooling and how developers are leveraging Agentic (SDLC)
3 Non AI Announcements

Runner Ups

A2A protocol 1.0 released
Turboquant Ships in Vertex AI
Something waymo
Biqquery AI Agents
Gemini 3.1 Flash GA
Axion Gen 2
Nano bananas updates
Sovereign Cloud AI
Gemini Robotics API Preview
Hugging Face
AWS Activate type program
AP2 Payment Protocol
AI in Android
Gemini + Boston Dynamics
Glasswing Answer

How many times is AI said on stage?

Matt- 99
Ryan- 75
Justin- 115

AI Is Going Great – Or How ML Makes Money

24:35 Claude Managed Agents: get to production 10x faster

Anthropic launched Claude Managed Agents in public beta on April 8, 2026, a suite of composable APIs that handle production infrastructure like sandboxed code execution, state management, credential handling, and end-to-end tracing so developers can focus on defining tasks and guardrails rather than building backend systems.
The platform includes long-running autonomous sessions, multi-agent coordination (in research preview), and trusted governance with scoped permissions and identity management, with internal testing showing up to 10 percentage points improvement in task success over standard prompting loops on structured file generation tasks.
Pricing is consumption-based at standard Claude Platform token rates plus $0.08 per session-hour for active runtime, which positions this as a managed alternative to self-hosted agent infrastructure where teams would otherwise spend months on setup before shipping anything to users.
Early adopters include Rakuten, which deployed specialist enterprise agents across five business functions within a week each, and Sentry, which shipped a bug-to-PR pipeline in weeks instead of months by pairing their existing Seer debugging agent with a Claude-powered patching agent.
Developers can get started via the Claude Console, the new CLI, or by using Claude Code with the built-in claude-api Skill, with multi-agent coordination and self-evaluation features still gated behind a research preview access request form.

25:51 Ryan – “So I don’t have to get a fleet of Mac Minis to run all my AI things?”

26:41 The next phase of enterprise AI

OpenAI reports enterprise now accounts for more than 40% of revenue and is projected to reach parity with consumer revenue by the end of 2026, with APIs processing over 15 billion tokens per minute and Codex reaching 3 million weekly active users.
OpenAI Frontier is positioned as a company-wide agent deployment and management layer, distinct from single-product agent implementations, allowing agents to operate across an organization’s tools, systems, and data with centralized governance and permissions.
A Stateful Runtime Environment being co-developed with AWS is designed to give agents persistent context and memory across sessions, addressing a core limitation for complex enterprise workflows that span multiple tools and data sources.
OpenAI is building toward a unified AI superapp that consolidates ChatGPT, Codex, and agentic browsing into a single employee-facing interface, with the stated goal of reducing enterprise rollout friction by leveraging ChatGPT’s existing 900 million weekly users who are already familiar with the interface.
Frontier Alliances‘ partnerships with McKinsey, BCG, Accenture, Capgemini, Databricks, and Snowflake indicate OpenAI is pursuing an integration-first enterprise strategy, meeting customers within existing data infrastructure rather than requiring migration to new platforms.

27:44 Ryan – “This sounds great; all these AI models are only as good as the data they have access to, and when you get into the Enterprise, you’re trying to integrate with all the IT services and other platforms that are used for development or other parts of the business, design tools – there’s all kinds of stuff. And it’s really tricky to sort of manage that. I’ve seen two models where you’re kind of left to your own devices, setting up your own MCP server or your own local integration somehow, or, if there is a platform, you know, sort of a sparse support of that. So I’m really happy to see this developed, and I’m really eager for this type of framework to be more prevalent.”

29:11 Introducing Muse Spark: Scaling Towards Personal Superintelligence

Meta launched Muse Spark, the first model from its new Meta Superintelligence Labs division, available now at meta.ai with a private API preview opening to select users.
It is a natively multimodal reasoning model supporting tool-use, visual chain of thought, and multi-agent orchestration.
A new Contemplating mode orchestrates multiple agents reasoning in parallel, achieving 58% on Humanity’s Last Exam and 38% on FrontierScience Research, positioning it alongside extreme reasoning modes from Gemini Deep Think and GPT Pro.
Meta claims its new pretraining stack reaches equivalent capabilities with over an order of magnitude less compute than Llama 4 Maverick, which has direct implications for infrastructure costs and efficiency at scale, including their new Hyperion data center investment.
The model uses a multi-agent test-time scaling approach that delivers stronger performance at comparable latency versus single-agent extended thinking, and applies token compression via thinking time penalties to optimize reasoning efficiency for serving at scale.
A notable safety finding from Apollo Research identified that Muse Spark showed the highest rate of evaluation awareness of any model they have tested, frequently identifying scenarios as alignment traps. Meta concluded this was not a blocking concern for release but acknowledged it warrants further research.

33:22 Justin – “So the thing about what’s on Humanity’s last exam right now is that the last update is from February 20th. So we’re just waiting to see when Mythos and this new Meta one get added to it, so that’ll be interesting.”

33:41 Introducing routines in Claude Code

Anthropic launched routines in Claude Code as a research preview, letting developers configure automated workflows once with a prompt, repo, and connectors, then run them on a schedule, via API call, or in response to GitHub events without requiring a local machine to be running.
Three trigger types are supported: scheduled cadences (hourly, nightly, or weekly), API-triggered endpoints where each routine gets its own URL and auth token, and GitHub webhook events that spin up a new session per matching PR and continue feeding it updates like comments and CI failures.
The cloud-hosted infrastructure removes the need for developers to manage their own cron jobs, MCP servers, or additional tooling, since routines ship with built-in access to repos and connectors.
Daily routine limits are tiered by plan: Pro users get 5 per day, Max users get 15, and Team and Enterprise users get 25, with additional runs available through extra usage at the same subscription usage rate as interactive sessions.
Practical use cases already emerging include nightly bug triage that pulls from Linear and opens draft PRs, on-call alert summarization posted to Slack, and automated PR review flagging for sensitive code modules like authentication providers.

38:32 Trusted access for the next era of cyber defense

OpenAI launched GPT-5.4-Cyber, a fine-tuned variant of GPT-5.4 specifically designed for cybersecurity work, with reduced refusal boundaries for legitimate defensive tasks and new binary reverse engineering capabilities that let security professionals analyze compiled software without source code access.
The Trusted Access for Cyber program is expanding from a limited pilot to thousands of individual verified defenders and hundreds of teams, with tiered access levels based on identity verification through chatgpt.com/cyber for individuals and a separate enterprise request process for organizations.
Codex Security, which has been in preview, has contributed to fixing over 3,000 critical and high-severity vulnerabilities across the ecosystem, and OpenAI is positioning it as a shift from periodic security audits to continuous automated vulnerability detection integrated into developer workflows.
Access to GPT-5.4-Cyber comes with notable tradeoffs for cloud and API users, specifically that Zero-Data Retention options may be restricted for higher-tier cyber-permissive access, which is a meaningful consideration for enterprises that rely on ZDR for compliance or data privacy requirements.
OpenAI is framing this as a dual-use risk management challenge rather than a simple model release, explicitly acknowledging that cyber capabilities depend on user context and trust signals rather than model capability alone, and building automated verification systems to scale that judgment without manual review.

33:52 Justin – “So weird. A week after Mythos.”

41:53 Redesigning Claude Code on desktop for parallel agents

Anthropic released a redesigned Claude Code desktop app built specifically for managing parallel agentic coding sessions, with a new sidebar that lets developers run simultaneous tasks across multiple repos and filter sessions by status, project, or environment.
The app introduces a drag-and-drop layout system where developers can arrange the terminal, diff viewer, file editor, and chat in custom grid configurations, reducing the need to switch between external tools during code review and shipping.
A side chat feature (Command/Ctrl + semicolon) lets developers ask questions mid-task without polluting the main session context, a practical way to keep long-running agentic tasks on track.
The redesign adds three view modes (Verbose, Normal, Summary) to control how much detail is shown about Claude’s tool calls, plus a usage indicator showing both context window and session consumption at a glance, which matters for teams managing API costs.
The updated app is now available for Pro, Max, Team, and Enterprise plan users, as well as via the Claude API, with SSH support now extended to Mac in addition to Linux for pointing sessions at remote machines.

43:05 Ryan – “So this is everything I was just complaining about earlier. This is perfect. This is why – not having this level of tools – why I haven’t really adopted Claude Code for my main workflows. Because everything that they’re announcing here is exactly what I use GitHub Copilot for.”

AWS

46:02 Manage AI costs with Amazon Bedrock Projects

Amazon Bedrock Projects lets organizations attribute AI inference costs to specific workloads by passing a project ID in API calls, which then flows into AWS Cost Explorer and AWS Data Exports for analysis.
This addresses a real operational gap for teams doing chargebacks or investigating cost spikes across multiple AI applications.
The feature works by attaching resource tags to projects and activating them as cost allocation tags in AWS Billing, using the same tagging and cost management tools organizations already use for other AWS services.
Tags can cover dimensions like application, environment, team, and cost center.
Bedrock Projects currently supports the OpenAI-compatible APIs, including the Responses API and Chat Completions API, meaning teams already using the OpenAI SDK can adopt this with minimal code changes by simply adding a project ID parameter. Requests without a project ID automatically fall to a default project, which could create attribution gaps if not managed carefully.
Organizations can create up to 1,000 projects per AWS account, and there is a 24-hour delay before tags propagate to Cost Explorer and Data Exports, so activating tags immediately after creating the first project is recommended to avoid gaps in billing data.
Pricing for this feature is not separately itemized since it layers on top of existing Bedrock inference costs, but the value is in visibility rather than new spend, helping teams identify where AI budget is actually going before costs scale further.
https://aws.amazon.com/about-aws/whats-new/2026/04/bedrock-iam-cost-allocation/

46:19 Justin – “I can tell you that this is a must-have. Every cloud provider needs to provide this capability. This is a major problem in Vertex. It’s a major problem in Bedrock. And even the project level is probably not granular enough. I need it at IAM identity level.”

50:56 Introducing stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime

Amazon Bedrock AgentCore Runtime now supports stateful MCP servers, enabling bidirectional communication between MCP servers and clients.
The key change is a single flag, stateless_http=False, which provisions a dedicated microVM per user session lasting up to 8 hours.
Three new client capabilities are now available: elicitation for pausing tool execution to collect user input mid-workflow, sampling for delegating LLM generation back to the client without the server needing its own model credentials, and progress notifications for streaming real-time status updates during long-running operations.
The sampling capability is particularly notable for enterprise use cases because it allows MCP servers to leverage the client’s connected LLM without holding API keys or model credentials directly, keeping model access control on the client side.
Each stateful session gets CPU, memory, and filesystem isolation via microVMs, with sessions tracked through an Mcp-Session-Id header.
Sessions expire after 15 minutes of inactivity or a maximum of 8 hours, after which clients must reinitialize.
Practical use cases include multi-step financial workflows that confirm transactions before writing to DynamoDB, travel booking tools that search options and then ask users to choose, and batch processing jobs that report incremental progress rather than leaving users waiting on a blank screen.

51:53 Justin – “This can be dangerous. So definitely this one, if you’re implementing stateful MCPs, I would make sure you have a very good security model for them.”

54:53 AWS Agent Registry for centralized agent discovery and governance is now available in Preview

AWS Agent Registry, part of Amazon Bedrock AgentCore, is now in preview as a centralized catalog for discovering and governing AI agents, tools, MCP servers, and custom resources within an organization, helping teams avoid rebuilding capabilities that already exist.
The registry supports URL-based discovery that automatically pulls metadata like tool schemas from live agent endpoints, plus an approval workflow so admins can gate what becomes discoverable, with CloudTrail providing full audit trails for compliance.
Developers can search the registry using natural language semantic search or keyword search, and can access it via the console, AWS CLI, SDK, or directly from their IDEs as an MCP server, supporting both IAM and OAuth with custom JWT.
The preview is available in five regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Sydney), with no pricing details published yet for this preview feature.
For organizations running multiple AI agent projects across teams, this addresses a practical governance gap by providing visibility into what agents exist and enforcing policies before new ones are deployed or discovered.

55:44 Ryan – “It’s funny cause I don’t really think about Bedrock AgentCore for Enterprise, but maybe it would allow that, maybe in a sideways kind of way.”

56:46 Kiro CLI 2.0: a new look and feel, headless CI/CD pipelines, and Windows support

Kiro CLI 2.0 introduces headless mode, allowing developers to run the agentic terminal programmatically via API key and environment variables, enabling integration into CI/CD pipelines and build scripts without user interaction.
Native Windows support removes the need for workarounds like WSL, letting developers use Kiro agents directly in Windows Terminal for tasks like codebase navigation, bug tracing, and workflow automation.
The updated TUI is now generally available after an experimental period, adding a subagent monitoring view accessible via Ctrl+G, real-time task lists, and parallel subagent execution that protects parent agent context on complex tasks.
The headless mode is particularly relevant for teams looking to automate pull request generation and deployment troubleshooting workflows, reducing the need for continuous manual monitoring in release pipelines.
Pricing details are not specified in the announcement, so listeners interested in production use should check kiro.dev for current plan information before building automation workflows around the headless API.

58:37 Amazon.com, Inc. – Amazon to Acquire Globalstar and Expand Amazon Leo Satellite Network

Amazon is acquiring Globalstar in a deal expected to close in 2027, gaining its LEO satellite fleet, MSS spectrum licenses with global authorizations, and direct-to-device technology to expand the Amazon Leo satellite network beyond its current broadband focus.
Starting in 2028, Amazon Leo will deploy a next-generation Direct-to-Device satellite system enabling voice, text, and data services on standard mobile phones without specialized hardware, targeting coverage gaps where terrestrial cellular networks cannot reach.
Amazon and Apple have signed an agreement for Amazon Leo to power satellite features on iPhone 14 and later and Apple Watch Ultra 3, continuing services like Emergency SOS, Messages, Find My, and Roadside Assistance via satellite that Globalstar currently provides to Apple.
The combined network is designed to support hundreds of millions of endpoints globally, with practical applications spanning consumer emergency messaging, enterprise IoT, fleet tracking, disaster response fallback connectivity, and rural broadband extension.
For AWS customers and partners, this positions Amazon as a vertically integrated connectivity provider competing directly with Starlink and other satellite operators, which could eventually influence how edge computing, IoT, and hybrid cloud architectures are designed for remote and mobile deployments.

59:28 Justin – “I guess we can finally say that the conversion from Amazon the bookstore to Amazon the utility is finally complete.”

GCP

1:03:02 Optimize AI/ML workloads with GKE Cloud Storage FUSE Profiles

GKE Cloud Storage FUSE Profiles, now generally available in GKE version 1.35.1-gke.1616000, automate storage configuration for AI/ML workloads by replacing manual tuning with three pre-built StorageClasses: gcsfusecsi-training, gcsfusecsi-serving, and gcsfusecsi-checkpointing.
The feature addresses a real operational pain point where customers were leaving performance on the table or experiencing Pod Out-of-Memory kills due to misconfigured Cloud Storage FUSE settings that previously required navigating dozens of pages of documentation.
The system dynamically scans your bucket and analyzes node resources, including RAM, Local SSD, and accelerator type, to calculate optimal cache sizes at deployment time, removing the need to manually account for these variables across different infrastructure configurations.
The serving profile includes automated Rapid Cache integration, and Google reports a notable real-world result: model loading time for a Qwen3-235B-A22B workload on TPUs dropped from 39 hours to 14 minutes using the inference profile.
Pricing for this feature follows standard GKE and Cloud Storage pricing since the profiles are pre-installed StorageClasses within the CSI driver, though teams should factor in Local SSD and RAM usage costs that the system may allocate automatically based on node resources.

1:04:12 Generate 3D models and interactive charts with the Gemini app

Gemini now generates interactive 3D models and charts directly in chat at gemini.google.com, moving beyond static text and diagrams to functional simulations users can manipulate in real time.
This is available by selecting the Pro model and prompting Gemini to “show me” or “help me visualize” a concept.
The feature supports adjustable parameters like sliders and numeric inputs, so users can modify variables such as gravity or velocity and immediately see updated results.
This makes it practical for exploring scientific concepts, physics simulations, and molecular structures without external tools.
The rollout is global for standard Gemini app users, though Education and Workspace accounts are currently excluded. No additional cost is mentioned beyond existing Gemini Pro access, so pricing appears to be included within current subscription tiers.
Likely use cases include education, research, and data analysis workflows where visual exploration of complex systems adds clarity.
Industries like life sciences, engineering, and academic institutions stand to benefit most from interactive molecular and physics visualizations.
For GCP customers, this signals Google’s direction toward embedding richer, interactive AI outputs into its Gemini ecosystem, which could eventually extend to Workspace and enterprise tools once the Education and Workspace exclusion is lifted.

1:04:56 Ryan – “This is something that makes me think about actively getting a Gemini Pro account, which I don’t have today. Just the amount of stuff that I do with 3D printing, and being able to generate a model that I can then import into a tool, and fuse and tweak it, or maybe just would generate G code directly. So this is, I like this, and it’s definitely something I can see myself using.”

1:06:59 Essential AI and cloud security now on by default

Google Cloud is automatically enabling an enhanced Security Command Center Standard tier for eligible customers at no cost, adding AI protection features, including a unified dashboard that detects unprotected Gemini inference and reports on LLM guardrail violations, with general availability expected by the end of June 2025.
The free Standard tier now includes more than 44 misconfiguration checks based on the Google Cloud Security Essentials compliance framework, up from the previous count by 21 checks, along with agentless critical vulnerability scanning and graph-driven risk prioritization.
Data security posture management has been added to the free tier, allowing teams to discover and visualize data across Vertex AI, BigQuery, and Cloud Storage, with Compliance Manager included for automated monitoring against the GCSE framework.
SCC now surfaces in-context security findings directly inside Cloud Hub, GCE, and GKE dashboards, giving infrastructure administrators security insights without switching between tools, which should reduce time to remediation.
Organizations needing advanced capabilities like threat intelligence, virtual red team risk analysis, or malware scanning can start a 30-day free trial of SCC Premium directly from the console, with the Standard tier serving as a no-cost baseline for teams not yet ready for premium features.

1:07:52 Ryan – “I really like this, and especially the free tier aspect of this, just because it is already such a challenge to know where your AI workloads are. And then having the specific configuration checks is great. I do think that the checks themselves – I played around with the 21 – they were a little basic, so it wasn’t that great. I do think it’s a great thing to have. The data scanning is super key, because that’s typically been really expensive to run and classify your data, and know where your sense of data is. So very cool.”

1:08:40 Looker Studio is Data Studio

Google is rebranding Looker Studio back to its original name, Data Studio, positioning it as a hub for personal data exploration and ad-hoc reporting across Google data sources, including BigQuery, Google Sheets, and Google Ads.
The platform now serves as a single location for multiple asset types beyond traditional reports, including BigQuery conversational agents and data apps built in Colab notebooks, reflecting a broader shift toward AI-era analytics workflows.
Data Studio will coexist with Looker rather than replace it, with Looker remaining the enterprise BI platform focused on governed data and semantic modeling, while Data Studio targets individual and small team use cases.
Pricing follows a two-tier model: the standard Data Studio remains free for individual use, while Data Studio Pro adds AI features, enterprise security, and compliance capabilities at a paid tier purchasable through the Google Cloud console or Google Workspace Admin Console (specific Pro pricing was not disclosed in the announcement).
Existing users should see no disruption, as all current reports, data sources, and assets will migrate automatically to the new experience without any required action.

1:09:34 Justin – “That was one of the big problems with Looker Studio, was that it wasn’t really meant for enterprise. So this Data Studio Pro version gives you that capability, finally.”

1:10:51 Introducing BigQuery Graph

BigQuery Graph is now in preview, bringing native graph analytics into BigQuery using the ISO GQL standard.
This lets analysts run multi-hop relationship queries without leaving BigQuery or learning a separate graph database system.
The key technical distinction is that graph schemas are created on top of existing relational tables with no data duplication or movement. Users can mix SQL and GQL in the same queries, which lowers the barrier for teams already invested in SQL skills.
Integration with Spanner Graph is a notable addition, allowing federated queries that combine real-time Spanner data with historical BigQuery data in a single virtual graph. This addresses a common pain point where operational and analytical graph data live in separate systems.
Real-world results from early adopters give some concrete numbers to consider: Curve reported roughly 9.1 million pounds in fraud savings by replacing SQL-based network analysis with graph queries, and Virgin Media O2 is running 4-hop queries to map relationships between accounts, devices, and activities.
Pricing is not explicitly stated in the announcement, as this is a preview feature, so listeners should check the BigQuery documentation here for current details.
Primary use cases include fraud detection, supply chain analysis, drug discovery, and customer relationship modeling.

1:12:40 Turn your best AI prompts into one-click tools in Chrome

Google launched Skills in Chrome, a feature that lets users save custom Gemini prompts and rerun them with a single click using the forward slash or plus button interface, eliminating the need to retype repeated prompts across browsing sessions.
Skills can operate across multiple tabs simultaneously, which makes it practical for tasks like comparing product specs or scanning several documents at once without manual prompt repetition.
Google is also shipping a pre-built Skills library for common workflows like ingredient breakdowns, gift selection, and macro calculations, with options to customize any library Skill by editing the underlying prompt.
On the privacy and security side, Skills inherits Chrome’s existing Gemini safeguards, including automated red-teaming and confirmation prompts before sensitive actions like sending email or adding calendar events.
Saved Skills sync across signed-in Chrome desktop devices, making this more of a persistent personal workflow tool than a one-off browser feature, though it is limited to desktop, and there is no mention of separate pricing beyond existing Gemini in Chrome access.

1:14:42 Ryan – “I’m trying to figure out whether I like this or not, right? Because I can think of some things that are kind of cool. And I’m trying to get around the, you know, the silliness of just executing things without really knowing what’s going on. That’s usually how security problems get introduced.”

Azure

1:17:36 Microsoft’s Agent Stack Confuses Developers While Rivals Simplify

Microsoft released Agent Framework 1.0 on April 3, merging Semantic Kernel and AutoGen into a single SDK after maintaining them as incompatible parallel frameworks.
AutoGen will now receive only bug fixes and security patches, meaning developers on either framework face meaningful migration work to adopt the new unified tool.
The Azure agent stack still spans multiple distinct surfaces, including Agent Framework for pro-code development, Copilot Studio for low-code, Foundry Agent Service as the managed runtime, and the Microsoft 365 Agents SDK for Teams distribution. Each surface has its own documentation and deployment model, requiring enterprise teams to make platform decisions before writing any agent logic.
Agent 365, a governance and compliance control plane for monitoring agents at enterprise scale, reaches general availability on May 1 at $15 per user per month. This adds another procurement decision on top of the existing build and runtime layers rather than consolidating them.
By comparison, Google Cloud’s Agent Development Kit feeds directly into Agent Engine on Vertex AI with a single CLI command for deployment, and AWS positions Strands Agents SDK as a thin framework that pairs cleanly with AgentCore as its managed runtime. Both competitors offer a more direct path from local development to production without requiring lateral platform decisions.
Enterprise teams evaluating Azure for agentic workloads should map which surfaces their development, operations, and security teams will standardize on at each layer and account for the organizational cost of those decisions, including migration effort from Semantic Kernel or AutoGen.

1:19:11 Matt – “Microsoft making things harder and more confusing? Never. ”\

After Show

54:04 How NASA Built Artemis II’s Fault-Tolerant Computer – Communications of

the ACM

Artemis II’s Orion capsule runs eight CPUs in parallel across four Flight Control Modules, using a fail-silent design where faulty processors drop out rather than transmit bad data, and the system can lose three of four modules within 22 seconds and still operate safely on the remaining one.
The architecture enforces strict determinism through time-triggered Ethernet and an ARINC653 scheduler, ensuring all processors see identical inputs and produce identical outputs, which is a notable contrast to modern Agile and DevOps practices, where this level of architectural discipline is increasingly uncommon.
NASA uses dissimilar redundancy for the backup system, meaning different hardware, a different operating system, and independently written, simplified software, specifically to prevent a common software bug from taking down both primary and backup systems simultaneously.
The verification process relies on supercomputer-scale fault injection and Monte Carlo stress testing to simulate full mission timelines with catastrophic hardware failures introduced, which offers a practical model for how cloud and infrastructure teams might approach resilience testing at scale.
The broader industry implication is that as software takes over functions previously handled by mechanical or manual controls, whether in spacecraft, autonomous vehicles, or industrial systems, the engineering patterns developed here around fail-silent design and layered redundancy become increasingly relevant outside of aerospace.

350: It looks like you're trying to send an email from 250,000 miles away! Would you like help with that?

Thu, 16 Apr 2026 16:30:13 +0000

Welcome to episode 350 of The Cloud Pod, where the weather is always cloudy! Justin, Jonathan, and Matt are this week’s hosts, and they’ve scoured the clouds for all the latest news and announcements, including that Mythos drop. Is it the AI apocalypse that everyone is claiming? We’ve also got news from DigitalOcean, an email from Space, Claude and even some Guardrails. There’s a lot to cover, so let’s get started!

Titles we almost went with this week

Two AIs Walk Into a Studio and Actually Sound Good
No More Idle GPUs Twiddling Their Tensor Cores
When AWS Availability Zones Become Unavailability Zones
Token by Token Codex Pricing Finally Makes Cents
Just Ask AWS Where All Your Money Went
You’ve Got mTLS: Amazon SES Locks Down Email Security
Cost Explorer Finally Speaks Plain English
Missiles Make AWS Multi-Region Strategy Mandatory
Shell Yeah Your Agent State Now Persists
S3 Files Finally Lets You ls Your Bucket
Claude Found Your Zero-Day Before Lunch
One Guardrail to Rule All Your AWS Accounts
Premium SSD Wins Azure VDI but Your Wallet Cries
No More Amnesia: Your Bedrock Agent Keeps Its Memories
Pay Per Claw Anthropic Sharpens Its Pricing Policy
Even Astronauts Need IT Support for Microsoft Outlook
AWS still can’t answer the question of what EC2 Other is
AWS announces several new Unavailability Zones

A big thanks to this week’s sponsor:

Check out thecloudpod.net/archera to schedule a demo today.

Follow Up

00:45 Ground control to Microsoft: Artemis 2 astronauts deal with Outlook hiccup in deep space

Artemis 2 astronauts aboard NASA’s Orion spacecraft encountered a common Outlook configuration issue on their first day in space, requiring remote IT support from Mission Control to resolve it by reloading the commander’s files.
NASA uses commercial off-the-shelf software like Microsoft Outlook for crew scheduling and personal communications, while keeping primary flight systems on separate radiation-hardened hardware, illustrating a practical separation of concerns in mission-critical environments.
The Outlook issue stemmed from the app having configuration problems when no direct network connection is available, which the flight director noted is not uncommon, raising questions about offline-readiness for software deployed in connectivity-constrained environments.
This incident is a useful reminder for cloud and enterprise software users that applications heavily dependent on network connectivity can behave unpredictably in low or no-connectivity scenarios, and offline mode reliability remains an important consideration for software selection.
Microsoft has not issued a public comment, but the episode highlights how widely deployed enterprise software is, reaching use cases well beyond what vendors typically design or test for.

03:31 Iran declared AWS, Google, and Microsoft data centers military targets. The Legal and strategic fallout is just beginning

Iran’s April 2025 declaration named the Joint Warfighting Cloud Capability (JWCC) contract specifically, arguing that AWS, Google, Microsoft, and Oracle data centers hosting Pentagon AI and intelligence workloads have lost civilian status under the Geneva Conventions principle of distinction.
The legal argument centers on the fact that classified military workloads share physical infrastructure with banking, healthcare, and consumer services.
The JWCC contract, worth up to $9 billion, was deliberately designed to distribute military workloads across multiple commercial providers to avoid vendor lock-in, but this decision inadvertently spread the targeting problem across every major hyperscaler simultaneously rather than containing it to a single provider.
Northern Virginia, the densest data center concentration on Earth, sits near the Pentagon’s most sensitive cloud workloads, meaning a single facility in Ashburn could simultaneously process classified Pentagon data, hospital records, and financial transactions with no practical way to separate them once a conflict begins.
Insurance and operational costs are already responding to this risk, with businesses in geopolitically sensitive regions facing substantially higher premiums for multi-region redundancy and war-risk coverage, costs that will eventually pass through to end customers regardless of whether any strike occurs.
The article identifies three structural fixes: DoD physically isolating JWCC workloads from civilian infrastructure, Congress updating defense cloud procurement rules to account for civilian collateral risk, and hyperscalers disclosing to commercial customers whether their specific facilities host military workloads.
None of these changes are currently underway at a meaningful scale.

04:05 Justin – “In the case of FedRAMP and JWCC, those are typically in the FedRAMP data centers in the US, so it’s a little bit of an interesting distinction, but there’s no guarantee that they’re not putting FedRAMP-type workloads into regions closer to the war zone. There’s no conversation about that, so I can see Iran’s point in this. And this will definitely make insurance and operating in clouds more expensive for companies who are very politically sensitive.”

AI Is Going Great – Or How ML Makes Money

08:07 Codex now offers pay-as-you-go pricing for teams

OpenAI is introducing pay-as-you-go pricing for Codex-only seats within ChatGPT Business and Enterprise workspaces, billing on token consumption with no rate limits instead of a fixed per-seat fee, giving teams more cost visibility across workflows.
ChatGPT Business annual pricing drops from $25 to $20 per seat for teams that want standard ChatGPT access with Codex usage limits included, while the new Codex-only seat option serves teams that want dedicated coding agent access without the broader ChatGPT bundle.
OpenAI is offering eligible Business workspaces $100 in credits per new Codex-only team member added, capped at $500 per team for a limited time, which lowers the barrier for initial pilots.
Codex now supports Plugins and Automations through its macOS and Windows apps, allowing teams to connect the coding agent to existing internal systems and tooling rather than treating it as a standalone tool.
OpenAI reports over 2 million weekly active Codex builders and a 6x growth in Codex users within Business and Enterprise accounts since January, with named customers including Notion, Ramp, and Braintrust using it to standardize engineering workflows.

09:31 Jonathan – “I think if you want the best performance, you’re going to have to pay for what you use. I think anyone that’s paying for a bundle is always going to be second class.”

13:57 Anthropic says Claude Code subscribers will need to pay extra for OpenClaw usage

Starting April 4, Anthropic is requiring Claude Code subscribers to pay separately on a pay-as-you-go basis for usage through third-party tools like OpenClaw, rather than drawing from their existing subscription limits.
This affects all third-party harnesses, with more platforms to follow.
Anthropic’s head of Claude Code cited infrastructure constraints and unsustainable usage patterns from third-party tools as the reason for the change, and noted the company is offering full refunds to subscribers who were unaware of the policy shift.
The timing is notable given that OpenClaw’s creator, Peter Steinberger, recently joined OpenAI, and OpenClaw continues as an open source project with OpenAI backing. Steinberger publicly stated he attempted to negotiate with Anthropic and only managed to delay the pricing change by one week.
For developers building on or using AI coding assistants through third-party integrations, this signals a broader industry pattern where AI providers may separate subscription pricing from API-level or harness-level consumption, adding cost complexity for teams relying on open source tooling around proprietary models.
OpenAI recently shut down its Sora app to reallocate compute resources, reflecting that both major AI providers are actively managing infrastructure capacity as demand from software engineering use cases like Claude Code continues to grow.

14:59 Jonathan – “I understand *why* they’re doing it, because there’s a big difference between somebody having a conversation or somebody doing coding, where you are mostly using cache hits for the majority of the work, versus OpenClaw where the context changes constantly, and making calls every 60 seconds. It is a completely different type of workload. At the same time, I’m paying $200 a month…”

16:38 Anthropic expands partnership with Google and Broadcom for multiple gigawatts of next-generation compute

Anthropic has signed a multi-gigawatt TPU capacity agreement with Google and Broadcom, with infrastructure expected to come online starting in 2027. This builds on an existing October 2025 TPU expansion and deepens Anthropic’s reliance on Google Cloud alongside AWS and NVIDIA hardware.
Anthropic’s run-rate revenue has grown from roughly $9 billion at the end of 2025 to over $30 billion, with enterprise customers spending over $1 million annually, doubling from 500 to over 1,000 in under two months. The compute expansion is a direct response to this accelerating demand.
Anthropic continues a multi-cloud hardware strategy, running Claude on AWS Trainium, Google TPUs, and NVIDIA GPUs to match workloads to appropriate chips. Amazon remains the primary cloud and training partner, with ongoing work on Project Rainier.
Claude is currently the only frontier AI model available across all three major cloud platforms: AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.
This broad availability has practical implications for enterprises already committed to any of the three major cloud providers.
The majority of new compute will be US-based, extending Anthropic’s November 2025 pledge to invest $50 billion in American AI infrastructure.
For cloud practitioners, this signals continued long-term capacity constraints driving large-scale, multi-year infrastructure commitments across the industry.

08:24 Project Glasswing: Securing critical software for the AI era

Anthropic announced Project Glasswing, a coalition including AWS, Google, Microsoft, Apple, Cisco, NVIDIA, and others, built around a new unreleased model called Claude Mythos Preview that is focused specifically on finding and fixing software vulnerabilities in critical infrastructure.
Mythos Preview has already identified thousands of high-severity vulnerabilities autonomously, including a 27-year-old flaw in OpenBSD, a 16-year-old bug in FFmpeg that survived 5 million automated test runs, and a Linux kernel privilege escalation chain, all of which have since been patched.
The model will not be generally available, but partners can access it via Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at $25 per million input tokens and $125 per million output tokens after an initial period covered by $100M in Anthropic usage credits.
Anthropic is donating $4M to open-source security organizations, including Alpha-Omega, OpenSSF through the Linux Foundation, and the Apache Software Foundation, to help maintainers respond to vulnerabilities the model surfaces.
The initiative signals a shift in how AI safety and capability tradeoffs are being handled in practice, with Anthropic planning to test new cybersecurity safeguards on an upcoming Claude Opus model before considering any broader deployment of Mythos-class capabilities.

20:44 Justin – “…is it probably really great at finding stuff? Is it really good at chaining things together to find these attacks? Yes. Is it as scary as they may get out to be? Maybe, maybe not. I don’t know; time will tell. I’m not going to be spending money on Mythos tokens to find out, but I am curious to see what people are coming out with now that it’s out in the wild.”

AWS

21:56 Announcing managed daemon support for Amazon ECS Managed Instances

ECS Managed Daemons lets platform engineers deploy and update monitoring, logging, and tracing agents independently from application teams, eliminating the need to coordinate task definition changes or service redeployments across hundreds of services.
Daemons are guaranteed to start before application tasks and drain last, ensuring operational tooling like the CloudWatch Agent is always available throughout the application lifecycle, including during rolling updates.
A new daemon_bridge network mode keeps daemon containers isolated from application networking while still allowing communication, and daemons support privileged container access and host filesystem mounts for deep system-level visibility.
Each instance runs exactly one daemon copy shared across all application tasks on that instance, which optimizes resource utilization and allows CPU and memory parameters to be managed centrally without rebuilding AMIs or modifying application task definitions.
The feature is available now in all AWS regions at no additional charge beyond standard compute costs for the daemon tasks themselves, and can be configured through the ECS console or the new managed daemons API documented here.

22:55 Jonathan – “What a useful feature!”

23:36 Amazon SES Mail Manager adds new features for enhanced security and email processing

Amazon SES Mail Manager now supports optional STARTTLS configuration, allowing legacy systems that lack full STARTTLS support to still connect to Mail Manager without requiring a full infrastructure overhaul.
Mutual TLS (mTLS) adds certificate-based authentication at the Ingress Endpoint level, giving organizations a stronger identity verification layer for inbound email connections beyond standard encryption.
Two new rule actions expand email processing flexibility: Invoke Lambda lets you trigger custom code directly from rule sets for advanced routing or transformation logic, while the Bounce action sends RFC-compliant SMTP rejection responses back to sending servers.
These features are available now across most SES Mail Manager regions, with the notable exception of Middle East UAE and Middle East Bahrain, so customers in those regions will need to wait for expansion.
Pricing for SES Mail Manager follows existing SES usage-based pricing, so the cost impact of these new features will depend on Lambda invocation volume and overall email processing scale rather than any new flat fees.

24:38 Justin – “They’ve actually added a lot of features to Mail Manager recently; the fact that it now can handle bounce protection and handles all of that stuff that you used to have to build your own toil for, it’s nice that that stuff is now there.”

26:07 Amazon Bedrock Guardrails supports cross-account safeguards with centralized control and management

Amazon Bedrock Guardrails now supports cross-account safeguards in general availability, letting organizations enforce a single guardrail policy across all AWS accounts and organizational units from a central management account, covering every Bedrock model invocation automatically.
There are two enforcement levels: organization-level enforcement applies guardrails via AWS Organizations policies to all member accounts and OUs, while account-level enforcement applies guardrails to all Bedrock inference calls within a single account, giving teams flexibility to layer controls.
A notable configuration option lets admins choose between Comprehensive mode, which enforces guardrails on all content regardless of caller tagging, and Selective mode, which only applies guardrails to content that callers explicitly tag, useful for mixed workloads with pre-validated and user-generated content.
One practical gotcha worth flagging: specifying an incorrect guardrail ARN in the policy does not just fail silently; it blocks all Bedrock model inference for affected accounts, so ARN accuracy is critical before attaching policies to production OUs.
The feature is available now across all AWS commercial and GovCloud regions where Bedrock Guardrails is supported, with pricing tied to each enforced guardrail based on its configured safeguards per the Amazon Bedrock pricing page.
Automated Reasoning checks are not supported with this capability.

27:02 Justin – “If you are not careful, you can lock yourself out of your domain.”

28:09 AWS Cost Explorer launches Natural Language Query capabilities powered by Amazon Q

AWS Cost Explorer now supports natural language queries powered by Amazon Q Developer, letting users ask plain-English questions like “Show me my top spending services this month” and receive both written insights and automatically updated charts, filters, and groupings simultaneously.
The feature supports conversational follow-up questions with maintained context, meaning users can move from a quick cost check to a detailed investigation without switching tools or manually reconfiguring visualizations.
When Amazon Q pulls from additional datasets beyond raw cost and usage data, such as pricing catalogs or anomaly detection, those results appear in a separate artifacts panel rather than the main Cost Explorer view, which is a useful distinction to understand when interpreting outputs.
This is available at no additional charge across all commercial AWS Regions today, making it accessible without budget justification for teams already using Cost Explorer.
The practical impact is that non-technical stakeholders, like finance or product teams, can now query AWS spend directly without needing to understand Cost Explorer’s filter and grouping mechanics, potentially reducing the bottleneck on cloud or DevOps teams for routine cost reporting.

28:44 Justin – “I did play with, this because I was curious and I’ve done a lot of really cool things with AI for cost management recently. It’s not very good. Like most Amazon Q things, it’s not great.”

31:38 Building real-time conversational podcasts with Amazon Nova 2 Sonic

Amazon Nova 2 Sonic is a speech-to-speech model available through Amazon Bedrock that handles real-time conversational AI with support for seven languages and a 1 million token context window, making it practical for voice-first applications like customer support and interactive learning.
The AWS blog post demonstrates a proof-of-concept podcast generator that uses two Nova Sonic instances to simulate a host-and-expert dialogue, streaming audio in real time using a Flask and AsyncIO architecture with RxPy for reactive event processing.
A notable technical detail is the stage-aware content filtering system, which distinguishes between SPECULATIVE and FINAL generation stages to eliminate duplicate audio chunks and prevent artifacts, using a combination of interruption markers, text deduplication, and audio hash fingerprinting.
The architecture captures audio at 16kHz PCM input and returns synthesized speech at 24kHz PCM output through a bidirectional event stream brokered by Amazon Bedrock, with the blog noting that PyAudio is suitable for server-side demos, but production deployments should use Web Audio API or WebRTC for browser clients.
Practical use cases beyond podcasting include multilingual content localization, ecommerce product commentary, and enterprise training content, with pricing tied to Amazon Bedrock consumption-based rates rather than a fixed subscription, so costs scale with actual usage volume.

32:32 Matt – “Still not out of a podcasting job yet. Got it.”

33:40 Launching S3 Files, making S3 buckets accessible as file systems

Amazon S3 Files lets you mount any general-purpose S3 bucket as a native NFS v4.1 file system on EC2, ECS, EKS, and Lambda, meaning you can use standard file commands like ls, cp, and echo while changes sync back to S3 within minutes.
This eliminates the longstanding tradeoff between S3’s durability and cost versus a file system’s interactive capabilities.
Under the hood, S3 Files is built on EFS and delivers approximately 1ms latency for active data, with intelligent pre-fetching and byte-range reads to minimize unnecessary data transfer and costs.
Files not on high-performance storage are served directly from S3 to maximize throughput for large sequential reads.
The feature is positioned specifically for workloads where multiple compute resources need shared, concurrent access to the same data without duplication, including agentic AI systems using file-based Python tools and ML training pipelines. It supports NFS close-to-open consistency for collaborative workloads.
Pricing is based on data stored in the file system, small file reads, write operations, and S3 requests during synchronization, so costs will vary significantly by access pattern and workload type.
Full pricing details are on the S3 pricing page, and the service is available now in all commercial AWS regions.
AWS is careful to position S3 Files alongside rather than replacing EFS and FSx, noting FSx remains the better choice for on-premises NAS migrations, HPC workloads with Lustre, and workloads requiring NetApp ONTAP or Windows File Server compatibility.
Last Week in AWS blog: S3 Is Not a Filesystem (But Now There’s One In Front of It)

35:32 Jonathan – “In other possibly related news, the NetApp stock price is down from $104 to $96 in the past 24 hours, because that’s basically what NetApp tiered storage does.”

GCP

39:37 Unifying real-time and async inference with GKE Inference Gateway

GKE Inference Gateway now supports both real-time and async inference workloads on the same shared GPU/TPU accelerator pool, eliminating the need to maintain separate clusters for each traffic type.
This addresses a common infrastructure inefficiency where real-time clusters sit idle during off-peak hours while async jobs run on underutilized secondary hardware.
The async component works by integrating a Batch Processing Agent with Cloud Pub/Sub, where latency-tolerant requests are pulled from a queue and routed to the Inference Gateway as lower-priority “sheddable” traffic that fills unused compute cycles between real-time spikes.
Testing showed that without the Async Processor Agent, unmanaged multiplexing of low-priority requests caused 99% message drop, while using the agent resulted in 100% of latency-tolerant requests being served during available capacity windows.
This demonstrates that the priority enforcement mechanism is doing meaningful work, not just theoretical traffic shaping.
The project is open source and available on GitHub at github.com/llm-d-incubation/llm-d-async, meaning teams can use it across multiple cloud environments rather than being locked into GKE specifically. Pricing would follow standard GKE and Pub/Sub usage costs with no separate charge for the gateway component itself.
The next development phase will add deadline-aware scheduling, letting users set soft completion windows for batch jobs so the system can make more informed decisions about when to process filler traffic relative to real-time demand.

40:48 Jonathan – “…that’s very cool; especially works in the interest of the cloud vendors now who can maximize their utilization of the GPUs. There’s still a lot CPUs sitting there idle though, like 60 to 70% CPU idle while those GPUs are full on.”

41:04 Improve coding agents’ performance with Gemini API Docs MCP and Agent Skills.

Google released two tools to address a core limitation of coding agents: outdated API knowledge due to training data cutoffs.
The Gemini API Docs MCP connects agents to live Gemini API documentation via the Model Context Protocol, while Gemini API Developer Skills adds best-practice patterns and SDK guidance.
Using both tools together shows measurable improvements in evals, achieving a 96.3% pass rate with 63% fewer tokens per correct answer compared to standard prompting.
The token reduction is worth noting for developers concerned about cost and latency in agentic workflows.
The MCP server is accessible at gemini-api-docs-mcp.dev and works with any MCP-compatible coding agent, making it broadly applicable beyond just Google-native tooling. Setup documentation is available at ai.google.dev/gemini-api/docs/coding-agents.
This approach of pairing a live documentation server with a skills layer is a practical pattern that other API providers could adopt, and it highlights a growing need for real-time context injection as AI coding tools become more common in developer workflows.

Azure

44:09 Public Preview: Rule impact analysis on Azure Network Watcher

Azure Network Watcher now offers a public preview feature called Rule Impact Analysis, which lets network admins simulate the effect of security admin rules before actually applying them to their environment, reducing the risk of unintended connectivity disruptions.
The feature is particularly useful for teams managing Azure Virtual Network Manager security configurations, as it helps identify rule conflicts and validate that connectivity requirements are met before deployment.
This addresses a common operational pain point where applying network security rules in production environments can cause outages or unexpected behavior that is difficult to roll back quickly.
Target users are network and security engineers in organizations with complex Azure networking topologies who need a safer change management process for security policy updates.
The feature is currently in public preview, which typically means no additional cost beyond standard Network Watcher pricing, though customers should verify final pricing at general availability via the Azure pricing calculator at azure.microsoft.com/pricing.

44:35 Justin – “2026 and we still dealing with rule conflicts and firewalls.”

48:01 Azure VDI Storage Benchmark: Premium SSD vs Standard SSD Performance and Cost Breakdown

GO-EUC’s benchmark research comparing Premium SSD and Standard SSD for Azure VDI workloads found that Premium SSD delivers up to 8 times higher IOPS and 80-90% lower latency than Standard SSD, with the performance gap widening as disk size increases.
Standard SSD shows a fixed performance ceiling of roughly 850-980 IOPS regardless of disk size, while Premium SSD scales from about 1800 IOPS at 128GB up to 8100 IOPS at 2048GB, making disk sizing a meaningful architectural lever only for Premium SSD.
The cost comparison is less straightforward than it appears because Standard SSD carries transaction fees that can push its total cost close to Premium SSD pricing under heavy VDI workloads, making Premium SSD a more predictable cost option despite its higher base price.
The 2048GB Premium SSD at $284.94 per month emerges as the recommended sweet spot, since moving to 4096GB costs $545.10 with only marginal performance gains, and at 2500-seat scale that sizing decision translates to over $7.8 million in annual cost difference.
The research used synthetic DiskSpd testing rather than real user load simulation, so results reflect maximum disk capabilities under controlled conditions and may differ from production environments, with GO-EUC noting a load simulation follow-up is planned.

49:25 Justin – “It’s not Microsoft, and it’s not something Microsoft paid for and was something done independently, so I approve.”

Emerging Clouds

51:26 Now Available: DigitalOcean Cloud Security Posture Management (CSPM)

DigitalOcean has launched a native Cloud Security Posture Management tool that continuously evaluates resources like Droplets and Databases for misconfigurations without requiring agents or third-party tools, making it accessible to smaller teams without dedicated security staff.
The tool is built directly into the DigitalOcean dashboard and API, addressing a common pain point where security visibility requires separate tooling and context switching across platforms.
Unlimited free scans are available to all DigitalOcean customers, with advanced rules, automated guidance, and API integrations on upgraded plans, lowering the barrier to entry for basic security posture monitoring.
A feature called Security Advisor adds an AI layer that summarizes findings and surfaces high-priority risks, helping teams focus on the most impactful issues first and reducing alert fatigue.
This offering is positioned toward startups and SMBs running production workloads, including AI inference, who may lack the resources to implement enterprise-grade security tooling but still need consistent visibility into infrastructure risk.

52:23 Matt – “It’s definitely a nice feature to give the general developer or security person that might not know the intricacies of DigitalOcean a ‘here’s a read flag go look at this’.”

After Show

54:04 How Microsoft Vaporized a Trillion Dollars

The author, a senior Microsoft engineer who rejoined Azure Core in May 2023, discovered on his first day that a 122-person org was seriously planning to port large portions of Windows to a tiny, low-power ARM chip on the Azure Boost accelerator card — a plan he immediately recognized as physically impossible given the hardware constraints.
Nobody at Microsoft could explain why up to 173 agents were needed to manage each Azure node, what they all did, or how they interacted — a sprawl that created enormous fragility in the system orchestrating VMs for OpenAI, government clouds, and other mission-critical workloads.
After the elimination of dedicated testers in 2014 and a talent exodus of original Azure architects, much of the org was staffed by junior engineers with 1–2 years of experience, led by managers without deep systems backgrounds, creating a persistent gap in senior technical leadership.
The node management stack suffered millions of unattributed crashes per month, memory leaks, resource leaks, and “zombie VMs,” with each monthly release introducing more bugs than it fixed and most rollouts ending in panicked rollbacks.
A publicly exposed web server (WireServer) running on the secure host OS held unencrypted tenant data from multiple customers in shared memory caches — a serious security liability in a hostile multi-tenancy environment — while crashing 300,000–500,000 times per month fleet-wide.
Despite public claims at Ignite conferences from 2023–2025 that key components had been offloaded to Azure Boost and rewritten in Rust, the author states that as of late 2024, zero of 64 identified work items had been completed, and work hadn’t started on roughly 60 of them.
“Digital escort sessions” — where $18/hour employees executed commands on production nodes under direction from overseas support staff, including from China — became routine, with nearly 200 JIT access requests per day observed over a two-month period, directly contradicting the original “no human touch” design vision.
The author proposed an incremental componentization strategy to modernize the node stack from first principles — including a cross-platform component model, a new message bus, and security-hardened caches — but lower-level management responded with defensiveness and the org eventually terminated his employment.
The consequences materialized: OpenAI signed an $11.9B deal with CoreWeave in March 2025 and later a $300B deal with Oracle, the Secretary of Defense publicly cited “a breach of trust” with Microsoft, and Microsoft’s stock dropped over 30% from its late-October 2025 peak, erasing more than a trillion dollars in market cap.
The author escalated his concerns in formal letters to the Cloud + AI EVP (November 2024), the CEO (January 2025), and the Board of Directors — all sent before the public unraveling — and received no acknowledgment, reply, or request for clarification from any of them.

Closing

349: Gmail Finally Lets You Ditch xXDragonSlayer2004Xx

Wed, 08 Apr 2026 20:06:53 +0000

Welcome to episode 349 of The Cloud Pod, where the weather is always cloudy! Justin and Jonathan managed to make it into the studio this week, and they brought a guest! Dave Garaway jas joined us, and brought some on-the-ground knowledge from GTC, plus a slew of supply chain attacks, Gmail username changes and Claude’s code debacle. We’ve got all this and more – so let’s get started!

Titles we almost went with this week

AWS Console Gets a Makeover Nobody Asked For
From Eight Hours to 22 Seconds, Hackers Got Fast
AWS Spring Cleaning Hits Nine Services Hard
Trivy Pursuit Turns Into a 500K Credential Heist
Skip the Consultant, AWS Security Now Hacks Itself
AWS Pen Testing Agent Pokes Your Cloud Around the Clock
Your Cringey Gmail Address Gets a Second Chance
Stop Babysitting Servers, Let Google Handle MCP
AI Agent Untangles Your Kubernetes Networking Spaghetti
One Bad Actor Poisons a Hundred Million Downloads
Lambda Finally Hits the Gym with 32 GB
From GPU Hype to Production Inference Without the Hyperscaler Headache

Follow Up

01:28 Hegseth, Trump had no authority to order Anthropic to be blacklisted, judge says

A US District Judge granted Anthropic a preliminary injunction blocking the Department of War’s blacklisting, ruling the designation was First Amendment retaliation rather than a legitimate national security action.
The court found officials lacked authority to blacklist Anthropic without considering less restrictive alternatives or providing evidence of an urgent security risk, noting the designation was triggered by Anthropic’s “hostile manner through the press.”
The practical business impact was already substantial before the ruling, with three trade deals cancelled and other potential partners delaying negotiations, representing potentially billions in lost contracts over five years.
Anthropic continues to balance the legal fight with maintaining its government relationships, publicly emphasizing alignment with the Department of War’s mission around safe AI deployment even while litigating against it.
For cloud and AI vendors, this case establishes a notable precedent around government procurement decisions and First Amendment protections, with implications for how companies publicly challenge federal contracting positions.

02:35 Jonathan – “I’m guessing Anthropic is super busy with all the people coming to them for deals right now, because it seems to me that Anthropic is getting all the business customers and OpenAI are getting the personal customers.”

04:08 Delve Announces Changes and New Customer Support Measures

Delve has responded to allegations from an anonymous Substack post by denying claims of faked evidence, clarifying that independent AICPA-accredited auditors, not Delve, issue SOC 2 reports and ISO 27001 certifications.
The company published a formal rebuttal and is now rolling out operational changes to address customer concerns.
To support customers facing questions from their own clients and procurement teams, Delve is offering complimentary re-audits through independent auditors, complimentary grey-box penetration tests, and formal engagement letters from auditors, all at no cost.
On the transparency side, Delve is moving auditor communications directly into customer Slack channels or shared email threads, so customers have full visibility into the audit process rather than relying on Delve as an intermediary.
The platform is also adding clearer disclosures to templates and forms to explicitly identify them as guidance tools aligned to industry standards, addressing a core point of confusion raised in the controversy.
For cloud practitioners, this situation highlights the importance of understanding the distinction between compliance automation platforms and the independent auditors who issue attestations, a boundary that procurement teams are increasingly scrutinizing when evaluating vendor security posture.

06:12 Justin – “I think the reality is that, and we talked about this last week, is that SOC 2 audits are very heavily templatized. That’s how these companies make them, and they work them. They do need to be edited, reviewed, and approved, and the right things need to be done, but they can’t always start as a template. A template’s not the problem. It’s what appears to be the automation and then the rubber-stamping by these auditors.”

06:39 Delve – Fake Compliance as a Service – Part II – Day 1 of 5

This article covers allegations against Delve, a compliance automation startup, and represents a follow-up to earlier reporting. It does not directly relate to cloud platform news typically covered on The Cloud Pod, but here are the relevant talking points for context.
A whistleblower from Delve provided internal screenshots and recordings after the initial article, including conversations suggesting the company’s auditing partner, Accorp, may not conduct thorough evidence reviews before issuing SOC 2 reports.
Internal communications indicate Delve built an automated report generation tool, which contradicts the company’s public claim that it does not generate compliance reports on behalf of clients.
Leaked internal notes from Karun Kaushik, dated November 2024, acknowledge that Delve’s platform had not released any new compliance frameworks since January 2025, a period that overlaps with the company’s Series A fundraise, raising questions about the accuracy of investor materials.
Delve has transitioned clients to a new auditing firm called Ezzy and Associates, telling clients they will not need to restart SOC 2 Type 2 observation periods despite the auditor change, which compliance professionals would generally consider irregular, given the reported evidence quality concerns.
For cloud practitioners, this situation is a reminder that compliance automation tools require scrutiny of both the underlying audit processes and the third-party auditors involved, as the validity of certifications like SOC 2 depends on the rigor of evidence collection and review.

06:57 Justin – “It’s just getting worse. I don’t know that Delve actually survives this.”

General News

08:17 NVIDIA GTC 2026 Recap: Tokens & Inference

Jensen Huang reframed how AI infrastructure ROI should be measured, shifting from raw compute specs to tokens per watt and token speed at a fixed power budget.
Vera Rubin is projected to deliver approximately 5x more revenue potential per gigawatt compared to Blackwell, which has direct implications for how cloud operators and enterprises evaluate hardware investments.
The Vera Rubin platform integrates the acquired Groq 3 LPX chip alongside the Rubin GPU, with NVIDIA’s Dynamo software splitting inference workloads between the two chips. This heterogeneous approach delivers 35x more throughput per megawatt for latency-sensitive workloads compared to running Vera Rubin GPUs alone.
NVIDIA introduced OpenClaw, an open-source agentic AI framework, alongside an enterprise-hardened version called NeMo Claw that adds policy enforcement, network guardrails, and a privacy router to prevent data exfiltration. The security layer addresses a real concern for organizations deploying agents with access to internal infrastructure.
NVIDIA released six domain-specific open model families, including Nemotron for language tasks, BioNeMo for drug discovery, Cosmos for robotics simulation, and Earth2 for climate forecasting, positioning these as the foundation for sovereign AI deployments where organizations want to avoid dependence on a small number of external model providers.
The DSX digital twin platform uses Omniverse to simulate thermal, electrical, and network conditions before a data center is physically built, with NVIDIA estimating roughly a factor of two in recoverable efficiency across a typical AI factory deployment through better design and live operational optimization.

09:51 Dave – “Being in technology, that is a great place to go to put your finger on the pulse of where things are.”

27:28 GTC 2026 Confirmed It: The Inference Era Is Here

DigitalOcean is positioning itself specifically around production inference workloads, announcing a new Richmond data center built with NVIDIA HGX B300 systems and a 400 Gbps non-blocking RDMA fabric designed for reasoning and agentic use cases.
The company is bringing NVIDIA Dynamo 1.0 to its Kubernetes offering and expanding model access for reasoning, long-context, multimodal, and agentic workloads, which addresses the operational complexity developers face when moving AI from experimentation into production.
DigitalOcean reported over 43,000 OpenClaw deployments since launch, suggesting meaningful developer adoption for always-on assistant and agentic application use cases on their platform.
The broader industry signal from NVIDIA GTC 2026 is that cost per token, time to first token, and uptime are becoming as important as model quality, shifting infrastructure conversations from raw compute to full-system optimization, including CPUs alongside accelerators.
For smaller AI builders and startups, DigitalOcean’s focus on reducing setup friction through tools like 1-Click Droplets for NemoClaw and direct deployment from build.nvidia.com to Serverless Inference represents a practical alternative to hyperscaler complexity for running agents at scale.

27:42 Dave – “They are talking about a bubble – the people I’ve been talking to – but one of the neoclouds I was talking about said, ‘when we get to the point when we don’t have the need, we’re going to start powering the neighborhoods for free, so we’re just going to start giving out power for free’ so hopefully the good neighbor will extend out there.”

28:12 You can finally change the goofy Gmail address you chose years ago

Gmail turns 22 years old on April 1, and Google is marking the occasion by finally allowing US-based users to change their Gmail username without creating an entirely new account, addressing a long-standing limitation of the platform.
The change is limited to once every 12 months per account, which Google has not formally explained but likely serves as a spam mitigation measure to prevent abuse of the feature.
For cloud and IT professionals managing Google Workspace environments, this raises practical questions around identity management, email routing, and how username changes interact with existing integrations and third-party services tied to a Gmail address.
The feature is rolling out gradually in the US, so not all accounts will see the option immediately, and it remains to be seen when international users outside the initial test group will get access. You can check here to see if the feature is available to you.
This highlights a broader tension in long-lived identity platforms where usernames chosen decades ago become liabilities, and how platforms balance user flexibility with the operational complexity of allowing address changes at scale.

30:00 TeamPCP Attack

On March 19, threat actor group TeamPCP compromised Trivy, a widely used open-source vulnerability scanner from Aqua Security, by injecting credential-stealing malware into 75 GitHub Action tags, Docker images, and CI/CD pipelines, turning the security tool itself into the attack vector.
The malware collected SSH keys, cloud credentials, Kubernetes secrets, and environment files from affected systems, with attackers then using those stolen credentials to pivot into LiteLLM, a Python framework for AI model API management, pushing two malicious versions to PyPI that executed automatically on Python process startup.
The LiteLLM compromise reportedly yielded approximately 500,000 stolen credentials, and the attackers deployed privileged pods across Kubernetes clusters and installed persistent backdoors on nodes, demonstrating how a single supply chain entry point can cascade across entire production environments.
This attack illustrates a notable pattern in modern supply chain compromises where each set of stolen credentials unlocks the next target, moving from CI/CD pipelines to public package repositories to production infrastructure in a deliberate escalation chain.
Organizations relying on open-source security tooling in automated pipelines should audit recent Trivy and LiteLLM usage, check for the specific compromised versions noted, and review whether any credentials or secrets were exposed in affected environments.

Con’t Update: Ongoing Investigation and Continued Remediation

The Trivy supply chain attack began in late February 2026 when attackers exploited a GitHub Actions misconfiguration to extract a privileged access token, then used residual credentials after an incomplete rotation to publish malicious artifacts on March 19, affecting version 0.69.4 and 76 of 77 trivy-action version tags.
The attack’s most notable technique was force-pushing existing version tags to point at malicious commits, meaning CI/CD pipelines referencing those tags continued running without any visible indication of change, while the payload silently exfiltrated cloud credentials, SSH keys, Kubernetes tokens, and other secrets before legitimate scanning logic executed.
Any organization that ran affected versions during the compromise window should treat all secrets accessible to those pipeline environments as exposed and rotate them immediately, including cloud provider credentials, container registry tokens, Git credentials, and NPM publish tokens, which researchers confirmed are being actively weaponized across the NPM ecosystem.
The core hardening lesson from this incident is to pin GitHub Actions to full immutable commit SHA hashes rather than mutable version tags, since version tags can be silently redirected to malicious code without any workflow changes on the consumer side.
Aqua’s commercial platform was isolated from the compromise because it uses a separate build system with no shared GitHub infrastructure, CI/CD pipelines, or signing systems, and its controlled integration process meant the malicious release was never incorporated into commercial products.

30:51 Hacker hijacks Axios open-source project, used by millions, to push malware

A hacker compromised a maintainer account for the Axios JavaScript library on npm, pushing malicious versions that included a remote access trojan targeting Windows, macOS, and Linux users.
Axios receives over 100 million weekly downloads, making the potential exposure substantial.
The attack window was approximately three hours before being detected and stopped, but security firm Aikido advises anyone who downloaded Axios during that period to treat their system as compromised. The self-deleting malware complicates forensic investigation and detection.
Account takeover was the entry point here, with the attacker replacing the legitimate maintainer’s email to delay recovery. This highlights how a single compromised developer credential can weaponize a widely trusted package against an entire downstream ecosystem.
This is another example of a software supply chain attack, a pattern that has affected SolarWinds, Log4j, and Polyfill.io in recent years. Developers and security teams should be reviewing dependency monitoring practices and considering tools that detect unexpected package version changes automatically.
For cloud-focused teams, any CI/CD pipeline or serverless function that auto-installs npm dependencies without version pinning or integrity checks is a potential exposure point. Locking dependency versions and using tools like StepSecurity or Aikido for supply chain monitoring are practical mitigations worth discussing.

31:49 Jonathan – “I just can’t believe how much trust, blind trust, dumb trust, if you want to call it that, is involved in an awful lot of open source projects. I mean, the entirety of PyPy – I’ve got a module on PyPy – I could commit some bad code to my repo in 15 minutes; if somebody installs my package, it’s going to run. I’m not aware of a great deal of security checks that happen automatically on the backend there, but that entire ecosystem is built on trust. It’s not good at all.”

AI Is Going Great – Or How ML Makes Money

34:06 Entire Claude Code CLI source code leaks thanks to exposed map file

Anthropic accidentally shipped Claude Code npm version 2.1.88 with an exposed source map file, revealing nearly 2,000 TypeScript files and over 512,000 lines of code for the CLI tool.
Anthropic confirmed it was a packaging error, not a security breach, and stated that no customer data or credentials were exposed.
The leaked code has already been archived, posted to a public GitHub repository, and forked tens of thousands of times, meaning the codebase is effectively public regardless of any takedown efforts.
This gives competitors and developers a detailed look at how Anthropic built its agentic coding tool.
Developers analyzing the code have surfaced technical details about Claude Code’s memory architecture, including background memory rewriting and memory validity verification steps.
These implementation details were previously undocumented and give insight into how the tool manages context across long coding sessions.
For cloud developers and teams evaluating AI coding tools, the leak provides an unusually transparent view into the engineering decisions behind a production agentic CLI, which could inform how teams build or evaluate similar tooling. It also raises a practical reminder about source map hygiene in npm package publishing pipelines.

35:51 Jonathan – “The question is, did you really need the unobfuscated source code anyway? You’ve got AI tools. You can literally point Claude at it and say, hey, how does this work? I know because I did it a year ago.”

AWS

37:39 Customize your AWS Management Console experience with visual settings including account color, region and service visibility

AWS introduced User Experience Customization (UXC) in August 2025 and is now expanding it with the ability to hide unused Regions and services from the console, reducing visual clutter for teams working in scoped environments.
Account color coding is a practical multi-account management tool, letting administrators assign colors like red for production and orange for development to reduce the risk of accidental changes in the wrong environment.
The visibility settings are cosmetic only and do not restrict access via AWS CLI, SDKs, or APIs, so teams should not confuse this with a security or governance control like Service Control Policies.
Administrators can manage these settings programmatically using a new AWS CloudFormation resource type AWS::UXC::AccountCustomization with visibleServices and visibleRegions parameters, making it deployable at scale across accounts.
There is no additional cost mentioned for UXC customization features, and they are available today in the AWS Management Console with configuration options accessible through the unified settings gear icon.

39:26 AWS Lambda supports up to 32 GB of memory and 16 vCPUs for Lambda Managed Instances

Lambda Managed Instances now supports up to 32 GB of memory and 16 vCPUs, tripling the previous limits of 10 GB and roughly 6 vCPUs, which opens the door for workloads like media transcoding, large-scale data processing, and scientific simulations to run serverlessly.
A notable addition here is the configurable memory-to-vCPU ratio at 2:1, 4:1, or 8:1, giving developers actual control over resource balance rather than the fixed proportional scaling that standard Lambda has always used.
Lambda Managed Instances run functions on managed EC2 instances with built-in routing, load balancing, and auto-scaling, so customers get specialized compute configurations, including the latest-generation processors and high-bandwidth networking without taking on operational overhead.
Pricing will be worth watching closely since Lambda Managed Instances sit in a different cost tier than standard Lambda, and teams should evaluate whether the compute gains justify the cost difference compared to running equivalent workloads on ECS or EKS.
Configuration is available through the AWS Console, CLI, CloudFormation, CDK, and SAM in all regions where Lambda Managed Instances are generally available, so adoption fits into existing infrastructure-as-code workflows without requiring new tooling.

40:18 Jonathan – “Lambda’s already pretty cheap to begin with, though. I wonder quite how much they could charge for managing the control plane, and are you still paying for the compute? Not a lot, I would think. Maybe they charge per host, or a small fixed fee per invocation, or something. It’s going to be interesting.”

41:56 AWS launches frontier agents for security testing and cloud operations | Artificial Intelligence

AWS has launched two generally available frontier agents: AWS Security Agent for autonomous penetration testing and AWS DevOps Agent for incident resolution and SRE tasks.
These differ from typical AI assistants in that they operate independently for hours or days without constant human direction to complete complex, multi-step workflows.
AWS Security Agent ingests source code, architecture diagrams, and documentation to identify attack chains that traditional scanners miss, compressing penetration testing timelines by over 90% according to early customers. This shifts pen testing from a periodic, cost-constrained activity to an on-demand capability available 24/7 across an entire application portfolio.
AWS DevOps Agent integrates with a broad set of existing tools, including CloudWatch, Datadog, Dynatrace, Splunk, GitHub, and Azure DevOps, making it usable across multicloud and on-premises environments. Preview customers report up to 75% lower MTTR and 94% root cause accuracy, with WGU cutting one incident resolution from two hours to 28 minutes.
The DevOps Agent can work alongside tools like Kiro and Claude Code to not only identify root causes but generate validated fixes that feed back into CI/CD pipelines, moving the capability beyond investigation into actual remediation.
Pricing details are not specified in the announcement, so teams evaluating these services should check the AWS Security Agent and AWS DevOps Agent product pages directly for current cost information before planning adoption.

43:07 Jonathan – “Let me just scratch DevOps off my list of potential jobs.”

46:22 Amazon Bedrock AgentCore Evaluations is now generally available

Amazon Bedrock AgentCore Evaluations is now generally available, offering automated quality assessment for AI agents through two modes: online evaluation that continuously samples and scores live production traffic, and on-demand evaluation that plugs into CI/CD pipelines for regression testing.
The service ships with 13 built-in evaluators covering response quality, safety, task completion, and tool usage, reducing the need for teams to build custom scoring logic from scratch before they can start measuring agent behavior.
For teams with domain-specific needs, custom evaluators can be configured using your own prompts and model choice for LLM-based scoring, or implemented as Python or JavaScript functions hosted in Lambda for code-based evaluation logic.
Ground Truth support lets developers measure agents against reference answers, behavioral assertions at the session level, and expected tool execution sequences, giving teams a structured way to define and validate what correct agent behavior actually looks like.
AgentCore Evaluations integrates with AgentCore Observability for unified monitoring and real-time alerts, and is available across nine AWS regions, including US East, US West, multiple Asia Pacific regions, and two European regions. Pricing details are not specified in the announcement, so check the AWS pricing page for current costs.

47:17 Justin – “I like the idea of this, but then if you’re continuously monitoring it and it degrades, what do you do? What’s step two? Like, we detected it, cool, now what?”

57:10 Build a FinOps agent using Amazon Bedrock AgentCore

AWS published a reference architecture for building a FinOps agent using Amazon Bedrock AgentCore that consolidates data from Cost Explorer, AWS Budgets, and Compute Optimizer into a single conversational interface, giving finance teams natural language access to cost analysis without navigating multiple consoles.
The solution uses five CDK stacks to wire together AgentCore Runtime, Gateway, Memory, and Identity components alongside the Strands Agent SDK and Model Context Protocol servers, showing how these newer AgentCore building blocks fit together in a production-style deployment that takes roughly 15-20 minutes to stand up.
AgentCore Memory retains 30 days of conversation context, which means users can ask follow-up questions like “what about the second one?” without re-explaining prior context, a practical improvement for teams doing iterative cost investigations.
The architecture transforms open-source AWS Labs MCP servers from stdio transport to streamable HTTP, builds them as ARM64 Graviton container images, and hosts them on AgentCore Runtime with JWT authorization, which is a useful pattern for teams looking to adapt existing MCP tooling for hosted agent environments.
Pricing for this solution involves multiple services, including Bedrock model inference with Claude Sonnet 4.5, AgentCore Runtime and Memory, Cognito, CodeBuild, and ECR, so costs will vary based on query volume and conversation history retention rather than a flat rate.

58:31 Dave – “I can’t wait to kick the tires on that one!”

59:03 Building an AI-powered system for compliance evidence collection

AWS published a reference architecture for automating compliance evidence collection using Amazon Bedrock with the Amazon Nova 2 Lite model and a browser extension for Chrome and Firefox.
The solution replaces manual screenshot workflows by executing pre-defined JSON workflows that navigate web applications, capture timestamped screenshots, and store organized evidence in S3.
The AI layer operates in three modes: chat for ad-hoc compliance questions, designer mode for generating workflow JSON from uploaded compliance documents, and report generation mode that produces an HTML report delivered via Amazon SES after workflow completion.
Authentication uses Amazon Cognito with AWS STS to provide scoped, least-privilege credentials to the browser extension, meaning the extension only gets access to Bedrock, S3, and SES rather than broad account permissions.
The entire infrastructure deploys via a single CloudFormation template that creates the Cognito user pool, identity pool, S3 bucket with encryption and versioning, IAM roles, and Lambda functions in minutes. The sample code is available at the aws-samples GitHub repository.
Costs will vary based on Amazon Bedrock Nova 2 Lite inference usage, S3 storage for screenshots and reports, and SES sending volume, so organizations with frequent audit cycles should model their expected workflow execution frequency before deploying at scale.

1:00:00 Jonathan – “Screenshots? Why are we using screenshots in 2026?”

GCP

1:00:46 TurboQuant: Redefining AI efficiency with extreme compression

Google Research has published TurboQuant, a vector quantization algorithm that compresses LLM key-value cache data to as low as 3 bits without requiring model retraining or fine-tuning, while maintaining accuracy on standard benchmarks like LongBench and Needle In A Haystack using Gemma and Mistral models.
The core technical approach combines two sub-algorithms: PolarQuant, which converts vectors to polar coordinates to eliminate normalization overhead, and QJL (Quantized Johnson-Lindenstrauss), which uses a single sign bit per value to achieve zero memory overhead error correction.
Performance results show 4-bit TurboQuant achieves up to 8x speedup in computing attention logits compared to 32-bit unquantized keys on H100 GPUs, and reduces key-value memory footprint by at least 6x, which is relevant for teams running inference at scale.
For vector search use cases, TurboQuant outperforms existing methods like PQ and RabbiQ on recall ratios without requiring dataset-specific tuning or large codebooks, making it a practical option for semantic search systems operating over billions of vectors.
Google notes this research applies directly to Gemini’s key-value cache bottlenecks and large-scale search infrastructure, though no specific GCP product integration or pricing details have been announced alongside the research publication.

1:02:28 Jonathan – “What’s funny about this whole technology is that the video game industry has been using exactly the same algorithms for 25 years. And this is just a new application of the same technology. It’s kind of funny. Hey guys, we’ve got a new paper out!”

1:03:41 AI Tools for Sustainable Infrastructure and Reporting

Google published an open-source AI playbook for sustainability reporting, documenting how they used Gemini to cross-reference environmental claims against internal policies and NotebookLM to turn their static Environmental Report into a queryable knowledge base.
The playbook includes specific prompts and lessons learned, making it a practical resource for teams building similar workflows.
Equinix built a sustainability data lake in BigQuery that automatically ingests data from 240+ global sites, reducing their reporting cycle from weeks of manual spreadsheet work to on-demand insights. This was driven by a 46% year-over-year increase in customer sustainability data requests, which made manual processes unsustainable at scale.
The Equinix case illustrates a cost and efficiency argument for serverless architecture, where moving to BigQuery eliminated idle compute resources, reduced energy consumption, and improved performance per watt. Google frames this as a triple win of price, performance, and environmental footprint.
Google is connecting this work to their Well-Architected Framework sustainability pillar, using a 4Ms model covering Machine, Model, Mechanization, and Map as a structured approach for customers designing efficient AI and data infrastructure.
The WAF sustainability pillar documentation is available here.
The practical takeaway for GCP customers is that sustainability reporting can shift from a manual compliance exercise to a data product with strategic value, particularly for organizations managing large real estate or infrastructure footprints where energy and resource data is already being collected across many sites.

Azure

1:04:36 Public Preview: AI Agent for container networking troubleshooting

Azure has launched a public preview of an AI agent designed to help engineers troubleshoot Kubernetes networking issues through a lightweight web-based interface, addressing the common problem of logs and metrics being scattered across multiple tools.
The core value here is reducing manual correlation work during incidents, where engineers typically have to jump between kubectl, Azure Monitor, and other diagnostics tools to piece together what went wrong in a cluster network.
This fits into Microsoft’s broader push to embed AI assistance directly into operational workflows rather than requiring engineers to leave their environment and consult separate documentation or support channels.
Target users are platform and DevOps engineers running containerized workloads on Azure Kubernetes Service who deal with networking incidents and want faster root cause identification without deep networking expertise.
The feature is currently in public preview, so pricing details are not yet confirmed, and teams should evaluate it with that in mind before building it into critical incident response workflows. More details are available at the Azure Updates page at azure.microsoft.com/en-us/updates with ID 557887.

1:05:33 Dave – “Well, my first thought on this is that if most teams, at least that I’ve built, are already pulling all that data in there and finding a way to correlate the data and we resolve those issues quicker. So good for them for just automating that.”

Closing

348: Compliance Theater Now Available as a Subscriptions

Thu, 02 Apr 2026 05:54:07 +0000

Welcome to episode 348 of The Cloud Pod, where the weather is always cloudy! Justin, Ryan, and Matt are in the studio this week to bring you all the latest news in AI and Cloud, inclduing Strykers troubles, AWS’ birthday, Bedrock Agents, and Claude Code – plus so much more. Let’s get started!

Titles we almost went with this week

SOC 2 It to Me Delve Fires Back
Shell Yeah Bedrock Agents Just Got Command Line Powers
When Your SOC 2 Report Is Just Fan Fiction
uv, Ruff, and ty Walk Into an OpenAI Acquisition
Hash Field Expiration Is Here, and It’s No Redis Herring
Stop Paying Full Price for Tokens You Already Bought
Fake It Till You Audit It
Cache Me If You Can CNCF Sandbox Edition
Microsoft Learns Consent Matters in Copilot Rollout
Microsoft’s Stinky Cloud Gets Federal Seal of Approval
When Your Audit Trail Leads to a Blog Fight
Ping Your AI Agent on Discord Like a Millennial
Twenty Years of AWS and the Bill Never Stops
The LLM hack that feels a lot like Node Shift Left Package issues
Claude Code Auto Mode Lets AI Work Unsupervised
Stop Babysitting Your AI Claude Code Goes Solo
Auto Mode Gives Claude Code the Keys to the Car
Java comes to the coffee shop with AI

General News

01:21 Customer Updates: Stryker Network Disruption

Stryker confirmed a cyberattack on March 11, 2026, that disrupted their internal Microsoft corporate environment, affecting order processing, manufacturing, and shipping, but notably not their connected medical devices or cloud-hosted products.
The attack vector was specific to Stryker’s Microsoft environment, which meant products running on AWS (Vocera Edge, Vocera Ease) and Google Cloud Platform (care.ai) were architecturally isolated and unaffected, demonstrating a practical benefit of multi-cloud separation.
Stryker explicitly stated this was not ransomware or malware, and government agencies, including CISA, FBI, and the White House National Cyber Director, were engaged, with domain seizures linked to threat actors already executed.
The incident highlights how healthcare organizations can architect medical device and cloud product infrastructure to be independent of corporate IT environments, as every product from Mako to SurgiCount to LIFEPAK operated normally due to network segmentation.
Real-world patient impact was limited but present, with some personalized implant cases rescheduled due to shipping delays, underscoring that even contained corporate IT incidents can have downstream effects on physical supply chains.

02:30 Justin – “HugOps to the entire Stryker team; I couldn’t imagine having to rebuild my entire Windows estate at a company the size of Stryker in the middle of trying to do business and everything else.”

05:00 Federal cyber experts called Microsoft’s cloud a “pile of shit,” and approved it anyway

FedRAMP authorized Microsoft’s Government Community Cloud High despite internal reviewers finding insufficient security documentation, issuing an unusual “buyer beware” notice to agencies considering the product.
This raises questions about the integrity of the federal cloud authorization process when commercial pressures intersect with security evaluations.
The GCC High offering is specifically designed to handle some of the US government’s most sensitive data, making the documentation gaps particularly consequential, given that Microsoft had already been linked to two significant federal breaches involving Russian and Chinese state actors.
The core technical concern was Microsoft’s inability to adequately document how data is protected as it moves between servers within their cloud infrastructure, leaving reviewers unable to assess the system’s overall security posture with confidence.
For cloud practitioners and federal agencies, this situation highlights the risk of relying on vendor-provided security documentation without independent verification, especially for high-sensitivity workloads where compliance approval does not necessarily equal verified security.
The outcome has broader implications for FedRAMP’s credibility as a security benchmark, since agencies selecting cloud providers often treat authorization as a meaningful security signal rather than a conditional or incomplete endorsement.

06:00 Ryan – “If you can’t adequately explain how basic things like encryption and security controls are handled in your environment, that’s not good, right? Because while it’s not completely indicative of a security problem, it’s highly suspect.”

06:51 Delve – Fake Compliance as a Service – Part I

A detailed investigation alleges that Delve, a compliance automation platform, fabricates audit evidence, including board meeting records and test results, then uses Indian certification mills operating through US shell entities to rubber-stamp reports rather than conduct independent verification.
The core technical concern is that Delve reportedly generates identical audit reports across all clients, meaning the auditor independence required by AICPA and ISO standards is structurally violated since Delve itself is effectively acting as both platform and auditor.
Companies using Delve for HIPAA or GDPR compliance may face significant regulatory exposure, as the article claims the platform skips major framework requirements while telling clients they have achieved 100% compliance, potentially creating criminal liability under HIPAA and fines up to 4% of global revenue under GDPR.
The investigation highlights a broader issue in the compliance automation space where AI and automation claims may not reflect actual product capabilities, with the article describing Delve as essentially a template pack with a SaaS wrapper rather than a genuinely automated compliance tool.
For cloud-focused companies evaluating compliance platforms, this case underscores the importance of verifying auditor independence credentials, requesting evidence of actual testing procedures, and understanding whether a platform produces genuinely customized documentation or pre-populated templates adopted with minimal review.
Interested in reading the leaked spreadsheet? Find those here and the leaked documents here.

08:47 Ryan – “I’m not a big fan of checkbox security and having that around just for compliance purposes. But it’s also like, this is really a misrepresentation. You look at things and, and it’s certified by Delve; it’s not certified by these other companies. And if all that evidence, the specifics they listed in the report are crazy, just how, like, this is not cool. It’s just generated. It’s not even real in the slightest.”

11:37 Response to Misleading Claims

Delve is a SOC 2 compliance automation platform serving over 1,700 customers, and this response addresses a Substack post making claims about the legitimacy of its audit processes.
The core distinction Delve makes is that it automates evidence collection and provides templates, while independent licensed auditors retain sole authority to issue final reports.
The debate touches on a broader industry practice where compliance platforms provide standardized control sets based on AICPA and ISO frameworks, meaning structural overlap across reports is expected rather than evidence of fraud.
This is worth discussing because buyers of compliance software often do not fully understand where the platform ends and the auditor begins.
Delve claims 120+ automated integrations, which is a notable gap from the 14 cited in the original criticism, and speaks to how quickly compliance tooling has evolved in the cloud ecosystem.
For cloud-native companies pursuing SOC 2, the depth of integrations directly affects how much manual evidence collection is required.
The use of pre-filled templates for board minutes and policies is standard practice across compliance platforms, but it raises a legitimate question about whether customers treat these as starting points or simply submit them unchanged.
This is a real risk area for organizations where compliance becomes a checkbox exercise rather than a genuine security posture.
The competitive compliance automation market, which includes players like Vanta and Drata, means disputes like this are likely to continue as vendors differentiate on auditor quality, automation depth, and pricing.
Listeners evaluating compliance tools should independently verify auditor accreditation regardless of which platform they use.

13:08 Ryan – “I would argue the use of pre-filled templates is common…prefilled and direct copied templates from between companies.”

19:04 Supply Chain Attack in litellm 1.82.8 on PyPI

Litellm versions 1.82.7 and 1.82.8 on PyPI were found to contain a malicious .pth file that executes automatically on every Python process startup, with no corresponding release on the official GitHub repository, indicating the PyPI account was likely compromised.
The malware follows a three-stage attack pattern: collecting SSH keys, cloud credentials, .env files, and Kubernetes configs; encrypting and exfiltrating them to a domain unrelated to legitimate litellm infrastructure; then attempting persistent backdoor installation via systemd and privileged Kubernetes pod creation.
The attack was discovered because a bug in the malware caused an exponential fork bomb through a recursive .pth file, triggering, which crashed the host machine and made the compromise visible rather than silent.
Any developer or CI/CD pipeline that pulled litellm as a transitive dependency after March 24, 2026, should treat all credentials on that machine as compromised and rotate SSH keys, cloud provider tokens, API keys, and database passwords immediately.
This incident highlights the risk of supply chain attacks through transitive dependencies, where a package you never directly installed can introduce malicious code into your environment, making dependency auditing and package integrity verification important practices for cloud-connected development workflows.

21:21 Justin – “Yeah… that’s bad too.”

KUBECON EU

23:24 GKE and OSS innovation at KubeCon EU 2026

GKE Autopilot is no longer a cluster-level decision made at creation time. Standard clusters can now enable Autopilot compute classes on a per-workload basis, removing the need to create entirely new clusters when workload requirements change.
Google is open-sourcing the GKE Cluster Autoscaler, one of the core infrastructure provisioning components, with the goal of making it available to the broader Kubernetes community as a vendor-neutral tool.
llm-d, a Kubernetes-native distributed inference framework built with Red Hat and NVIDIA, has been accepted as a CNCF Sandbox project. It addresses inference-aware traffic management, multi-node replica orchestration, and KV cache offloading in a hardware-agnostic way.
Google released an open-source DRA driver for TPUs, coordinated alongside NVIDIA, donating their own DRA driver, establishing Dynamic Resource Allocation as a shared standard for describing specialized hardware across Kubernetes workloads.
TPU support is coming to Ray v2.55 with backing from both Google and Anyscale, and a new Ray History Server in alpha allows users to debug completed or terminated RayJobs using persisted logs, state, and metrics through the Ray Dashboard on GKE.

24:29 Ryan – “It’s super nice of them to open source that, because it does seem like a very powerful thing to use. I love the idea of having individual workloads on a cluster, and be able to delegate to managed and unmanaged… it’s kind of neat.”

24:49 llm-d officially a CNCF Sandbox project

llm-d has been accepted as a CNCF Sandbox project, with Google Cloud as a founding contributor alongside Red Hat, IBM Research, CoreWeave, and NVIDIA.
The project aims to extend Kubernetes for LLM inference workloads under an open-source model with no vendor lock-in, available at llm-d.ai.
The core technical contribution is model-aware request routing through the llm-d Endpoint Picker, which considers KV-cache hit rates, in-flight requests, and queue depth to direct traffic to optimal backends.
In production testing on Vertex AI, this approach reduced Time-to-First-Token latency by over 35% for coding workloads and improved P95 tail latency by 52% for bursty chat workloads.
A notable outcome of the routing intelligence was doubling Vertex AI’s prefix cache hit rate from 35% to 70%, which directly reduces re-computation overhead and lowers cost-per-token for high-volume inference deployments.
Google leads development of the Kubernetes LeaderWorkerSet API, which llm-d uses to orchestrate prefill and decode disaggregation across independently scalable pods, supporting both TPU and GPU fleets at scale.
Google has also extended vLLM natively for Cloud TPUs with a unified PyTorch and JAX backend, delivering up to 5x throughput gains over the initial release. Pricing for running llm-d workloads depends on underlying GKE and accelerator costs, which vary by instance type and region.

26:21 What’s new with Microsoft in open-source and Kubernetes at KubeCon + CloudNativeCon Europe 2026

Dynamic Resource Allocation has reached general availability in Kubernetes, and Microsoft’s DRANet now includes upstream support for Azure RDMA NICs, meaning GPU-to-NIC topology alignment is handled at the scheduler level rather than through manual configuration.
- This matters for teams running distributed training workloads where network topology directly affects performance.
AI Runway is a new open-source project under the KAITO umbrella that provides a common Kubernetes API for inference workloads, with a web interface, HuggingFace model discovery, GPU memory fit indicators, and real-time cost estimates.
- It supports multiple runtimes, including NVIDIA Dynamo and KubeRay, giving platform teams a single control plane for model deployments without requiring end users to know Kubernetes.
AKS networking gets several notable updates, including Azure Kubernetes Application Network for identity-aware mTLS and traffic telemetry without a full service mesh, WireGuard encryption at the node level via Cilium, and Pod CIDR expansion that lets clusters grow IP ranges in place rather than requiring a full rebuild.
- Pricing for Advanced Container Networking Services features like Cilium mTLS is not specified in the announcement.
On the observability side, AKS now surfaces GPU utilization directly into managed Prometheus and Grafana, closing a monitoring gap that previously required manual exporter configuration.
- A new agentic container networking interface also lets operators run natural-language diagnostic queries against live telemetry, reducing time to identify network issues.
Blue-green agent pool upgrades and agent pool rollback are now available in AKS, letting teams provision a parallel node pool with the new configuration, validate it, and revert to the previous Kubernetes version and node image if problems appear.
AKS Desktop also reached general availability, giving developers a local environment that mirrors production AKS configuration.

27:42 Ryan – “And if you’ve ever debugged an issue on Kubernetes, then you know that there’s logs everywhere that you have to go and review and correlate across each other, so having an agent that can go and look across all those places and diagnose issues is fantastic.”

AI Is Going Great – Or How ML Makes Money

28:22 Project SnowWork: The easiest way for business users to get work done

Snowflake announced Project SnowWork in Research Preview, an agentic AI platform targeting business users in finance, sales, marketing, and operations who need to complete multi-step data workflows without writing code or relying on technical teams.
The platform differentiates itself from general AI assistants by grounding outputs in an organization’s existing Snowflake data and automatically enforcing existing RBAC and governance policies, meaning users only see data they are already authorized to access.
Project SnowWork ships with pre-built persona profiles for specific business functions, so a finance user gets workflows tuned to FP&A KPIs and close narratives while a sales user gets pipeline risk summaries, rather than a one-size-fits-all interface.
Practical use cases highlighted include compressing financial close storytelling from days to a single workflow and replacing manual pipeline rollups with automated executive briefs, which gives listeners a concrete sense of the time savings being targeted.
Access is currently limited to a select group of customers in a collaborative research preview, so this is not a general availability release, and organizations interested in early access would need to engage directly with Snowflake.

27:42 Ryan – “I do like the idea of bringing AI to the data rather than the data to the AI, which is a common problem, especially in enterprise platforms. I worry a little bit; The RBAC and authorization in Snowflake is very complex, and I wonder if people are actually going through and actually defining those in a way that would be proper segmentation? But I guess, you know, they have access to it today, they just have to know how to query it.”

30:10 OpenAI to acquire Astral

OpenAI is acquiring Astral, the company behind three widely adopted Python developer tools: uv for dependency and environment management, Ruff for linting and formatting, and ty for type safety enforcement.
The Astral team will join the Codex team after the deal closes, pending regulatory approval.
Codex has reached over 2 million weekly active users, with 3x user growth and 5x usage increase since the start of 2025. This acquisition appears aimed at deepening Codex’s ability to operate across the full Python development lifecycle rather than just generating code snippets.
The stated goal is to move Codex toward participating in complete development workflows, including planning changes, modifying codebases, running tools, verifying results, and maintaining software over time. Integrating Astral’s tooling directly into that workflow gives Codex agents access to infrastructure developers already use daily.
OpenAI has committed to continuing support for Astral’s open source projects after closing, which matters to the Python community given how widely these tools are already embedded in developer workflows. Developers using uv or Ruff should not expect immediate disruption to those projects.
For cloud and platform teams, this signals a trend toward AI coding agents that are tightly coupled with language-specific toolchains rather than operating as generic code generators, which could influence how development environments and CI/CD pipelines are structured going forward.

30:47 Justin – “I don’t know why they needed to buy the company to do all this, it is open source already.”

32:50 Anthropic just shipped an OpenClaw killer called Claude Code Channels, letting you message it over Telegram and Discord

Anthropic released Claude Code Channels in version 2.1.80, enabling developers to connect their Claude Code sessions to Telegram and Discord bots, shifting from a synchronous chat model to an asynchronous, persistent agent that can work autonomously and notify users when tasks are completed.
The feature is built on Anthropic’s open-source Model Context Protocol, which acts as a standardized bridge between Claude Code and external messaging platforms.
The setup uses the Bun JavaScript runtime to run a polling service that injects incoming messages as session events, allowing Claude to execute code, run tests, and reply back through the messaging app.
Practically, this eliminates the need for developers to maintain dedicated hardware like a Mac Mini running open-source agent frameworks 24/7, since Claude Code itself now handles session persistence when run in a background terminal or on a VPS.
The plugin architecture is open, with official Telegram and Discord connectors hosted on GitHub under Anthropic repositories, meaning the community can build additional connectors for platforms like Slack or WhatsApp without waiting for Anthropic to ship them natively.
The feature remains tied to Anthropic’s commercial subscriptions (Pro, Max, and Enterprise), so while the MCP layer is open, the underlying Claude model and Claude Code harness are proprietary, which is an important cost and vendor-lock consideration for teams evaluating this against self-hosted alternatives.

33:50 Justin – “I tried to use this, and it don’t work for me, but I didn’t have enough time to test it, I had too many Claude sessions going, and I needed to kill all of them and update properly to the 2.1.80 version. But I am curious to play with it a little more.”

35:34 Put Claude to work on your computer

Anthropic has launched computer use capabilities in Claude Cowork and Claude Code, now in research preview for Pro and Max subscribers on macOS. Claude can directly control a browser, mouse, keyboard, and screen to complete tasks when no direct connector exists, with no setup required.
The feature follows a tool priority hierarchy, reaching for service connectors like Slack or Google Calendar first, then falling back to direct computer control. Claude requests explicit permission before accessing new applications and can be stopped at any point.
Anthropic has built in prompt injection safeguards by scanning model activations during computer use sessions. They acknowledge that the capability is still early and recommend users avoid sensitive data and start with trusted applications only.
Dispatch, released alongside this update, enables a continuous conversation thread between mobile and desktop, letting users assign tasks from their phone and pick up completed work on their computer.
- Use cases include automated morning email checks, scheduled metric pulls, and triggering Claude Code sessions for pull requests.
The combination of Dispatch and computer use means Claude can execute multi-step workflows on a desktop while the user is away, such as making IDE changes, running tests, and submitting a PR.
Current limitations include macOS-only support, slower execution compared to direct integrations, and occasional need for retries on complex tasks.

36:28 Ryan – “I didn’t know this was macOS only, because I was going to put it on my Linux server so I could get compute that wasn’t my laptop.”

38:32 Auto mode for Claude Code

Anthropic launched auto mode for Claude Code in research preview for Team plan users, with Enterprise and API access coming soon. It works with both Claude Sonnet 4.6 and Opus 4.6, offering a middle ground between the default conservative permission prompts and the risky dangerously-skip-permissions flag.
The core mechanism is a classifier that reviews each tool call before execution, automatically blocking potentially destructive actions like mass file deletion, sensitive data exfiltration, or malicious code execution, while letting safe actions proceed without interruption.
This directly addresses a practical developer workflow problem: Claude Code’s default mode requires frequent human approvals that prevent truly unattended long-running tasks, and auto mode allows developers to kick off extended jobs without babysitting the process.
Anthropic is transparent about the limitations, noting the classifier may still allow some risky actions when user intent is ambiguous, and may occasionally block benign ones. They continue to recommend using it in isolated environments rather than treating it as a fully safe alternative.
There is a small performance tradeoff to be aware of, as auto mode adds some overhead to token consumption, cost, and latency per tool call due to the classifier running before each action.

AWS

41:21 Amazon Bedrock AgentCore Runtime now supports shell command execution

Amazon Bedrock AgentCore Runtime now includes InvokeAgentRuntimeCommand, an API that lets developers execute shell commands directly inside a running agent session, streaming output in real time over HTTP/2 and returning exit codes without custom container logic.
The practical benefit here is that AI agents frequently need to run deterministic operations like tests, dependency installs, or git commands alongside LLM reasoning, and previously, developers had to build all that process management themselves inside their containers.
Commands run in the same container, filesystem, and environment as the agent session and can execute concurrently with agent invocations without blocking, which simplifies architectures for coding agents, CI/CD automation, and similar workflows.
The feature is available across 14 AWS regions, including major US, European, and Asia Pacific locations, giving teams broad geographic coverage for latency-sensitive or data-residency-constrained workloads.
Pricing details are not specified in the announcement, so teams evaluating this should check the AgentCore Runtime pricing page directly before building cost models around heavy command execution workloads.

42:11 Ryan – I do get the advantages of this. Most of my use cases in GitHub Autopilot or Cloud Code it’s running Shell to do lots of things, especially executing tests, and so for CI-CD type workflows, you couldn’t do anything without it. I’m really curious how teams were working around this; people that were previously using Agent Core, because I bet that is ugly. But yeah, it’s going to be dangerous.”

42:56 Amazon Inspector expands agentless EC2 scanning and introduces Windows KB-based findings

Amazon Inspector now supports agentless EC2 scanning for a broader range of software, including WordPress, Apache HTTP Server, Python packages, and Ruby gems, plus Windows OS vulnerabilities, with no configuration changes required for existing customers.
The new Windows KB-based findings consolidate multiple CVEs addressed by a single Microsoft patch into one finding, surfacing the highest CVSS score, EPSS score, and exploit availability, which reduces noise and makes remediation more straightforward.
All existing CVE-based Windows OS findings will automatically transition to KB-based findings, meaning security teams will see fewer duplicate alerts and can map findings directly to specific Microsoft patches via included KB article links.
The agentless approach lowers the operational overhead for security teams managing large EC2 fleets, particularly in environments where installing and maintaining agents is restricted or impractical.
Both capabilities are available across all AWS Regions where Amazon Inspector is currently offered, and pricing follows the existing Inspector model based on instance scanning volume, so customers should review the Inspector pricing page for current rates.

43:33 Justin – “I’m actually shocked this wasn’t already there, because CVE is really just the generic way that you would find these, but typically they’re always linked to a knowledge-based article which then typically links you to the patch, so I don’t know how people got from the CVE to the patch without this before, other than maybe the CVE mentions the KB articles.”

22:53 Amazon ECR now supports pull-through cache for Chainguard

Amazon ECR pull-through cache now supports Chainguard as an upstream registry source, allowing customers to automatically sync Chainguard container images into ECR without building custom synchronization workflows.
Chainguard images are known for their minimal attack surface and security-focused builds, so pairing them with ECR’s native image scanning and lifecycle policies gives teams a more integrated security posture for their container supply chain.
The practical benefit here is operational simplicity: teams using Chainguard images at scale no longer need separate tooling to keep images current, as ECR handles the sync automatically and frequently.
Cached Chainguard images inherit standard ECR capabilities, including lifecycle policies for cost management and image scanning, which means customers get consistent governance across both their own images and upstream Chainguard images.
The feature is available in all AWS regions where ECR pull-through cache is supported, and pricing follows standard ECR storage and data transfer rates with no additional charge specific to the Chainguard integration. Full details are in the ECR pull-through cache documentation here.

46:22 Matt – “It’s massive, but checks a box for your security team, right, that doesn’t want to understand how containers work. Just use this one, and you’ll have to worry about it. It’s like, but I can install anything I want on it. So is it actually going to help?”

47:57 AWS at 20*: Inside the rise of Amazon’s cloud empire, and what’s at stake in the AI era

AWS turns 20 this month, growing from 10 cents per compute hour in 2006 to nearly $129 billion in annual revenue, which would place it in the Fortune 500 top 40 as a standalone company.
The article traces how S3 and EC2 established the pay-per-use primitive model that directly undercut Oracle-style licensing and reshaped enterprise IT economics.
Bedrock has become the fastest-growing service in AWS history, surpassing 100,000 customers and generating multi-billion dollar revenue with 60% quarter-over-quarter spending growth. AWS built it as a multi-model platform rather than pushing a single in-house option, following the same pattern it used with CPUs and GPUs by offering AMD, Intel, Graviton, Nvidia, and Trainium alongside each other.
Project Rainier, an AI compute cluster powered by over 500,000 Trainium2 chips in Indiana, represents AWS attempting to reduce dependence on Nvidia by building its own silicon stack from chip to data center.
The OpenAI partnership, worth up to $100 billion in cloud commitments over eight years, brings OpenAI workloads onto Trainium chips, making it the second major AI lab after Anthropic to commit to Amazon’s custom silicon.
AWS still leads cloud revenue at over $116 billion annually, but Azure at $75 billion and Google Cloud at $50 billion annual run rates show the gap narrowing, particularly in AI workloads.
Corey Quinn’s Cisco analogy is worth discussing: AWS could remain profitable and essential while becoming less central to where AI innovation actually happens.
Jassy has publicly projected AWS could reach $600 billion in annual revenue by 2036 with AI as the driver, backing that with $200 billion in capital expenditure planned for this year alone, which would consume nearly all of Amazon’s operating cash flow.
Happy Birthday

49:37 AWS MCP Server (Preview) now with enhanced monitoring and semantic search capability

AWS MCP Server in preview now automatically publishes metrics to CloudWatch under the AWS-MCP namespace at no additional cost, covering invocation counts, success rates, client errors, server errors, and throttling for individual tools like the AWS API caller and Agent SOP retriever.
Agent SOPs are pre-built, tested workflows that guide AI assistants through complex multi-step AWS tasks, and the documentation search tool now uses semantic similarity so agents can discover the right SOP through natural language queries rather than exact keyword matching.
The CloudWatch integration addresses a previous gap where customers had no visibility into agent-driven changes, enabling teams to track usage patterns, identify permission issues, and configure alarms when error rates exceed defined thresholds.
The service is currently available only in US East (N. Virginia) in preview, which is worth noting for teams with data residency requirements or those operating primarily in other regions.
For listeners building AI-assisted infrastructure automation, this update provides a practical observability layer for MCP-based agents, which is increasingly relevant as teams adopt AI assistants for AWS operations tasks.

50:26 Ryan – “Why did everything go offline? Now you can find out!”

GCP

50:59 CloudSQL read pools support autoscaling

Cloud SQL read pools, now generally available for Enterprise Plus edition, let you provision up to 20 read replicas behind a single load-balanced endpoint for MySQL and PostgreSQL, removing the need to manually manage multiple replicas or reconfigure applications when nodes are added or removed.
The new autoscaling feature dynamically adjusts node count based on CPU utilization or database connection thresholds, with users defining minimum and maximum node counts so the pool scales within those bounds automatically during traffic fluctuations.
Pools with two or more nodes are backed by a 99.99% availability SLA that covers maintenance downtime, and configuration changes like VM type or database flag updates are applied across all nodes with near-zero downtime.
From a cost perspective, autoscaling helps avoid over-provisioning by scaling in during low-traffic periods, meaning you pay only for nodes actively in use rather than maintaining a fixed fleet sized for peak load.
Retail and other industries with variable workloads are a natural fit, and teams can get started via gcloud CLI, Terraform, or the REST API, with a 30-day free trial available at cloud.google.com/sql for hands-on access to Enterprise Plus features.
Want to sign up for a free trial of Cloud SQL? You can do that here.

52:29 Matt – “The feature here I actually like is that it autoscales reads… nothing I’ve seen will do auto scaling on the reads for SQL and scale it out horizontally in that way. Like, even Aurora, if you’re on the normal one, you build a read replica, you have to build each read replica, and then either route or round robin to those ones. So if it’s actually going to do automatic adding and removing based on capacity needs, that’s a pretty nice feature because it can save you a lot of money.”

53:35 Design UI using AI with Stitch from Google Labs

Google Labs has evolved Stitch (stitch.withgoogle.com) into an AI-native design canvas that converts natural language descriptions into high-fidelity UI designs, targeting both professional designers and non-designers who want to move from concept to prototype quickly.
The updated tool introduces an infinite canvas, a design agent that reasons across a project’s full history, and an Agent Manager for running multiple design directions in parallel, which addresses a common pain point of managing divergent design explorations.
DESIGN.md is a notable addition that lets users extract and export design systems as an agent-friendly markdown file, making it easier to apply consistent design rules across projects or share them with other tools without starting from scratch each time.
Stitch connects to developer workflows through an MCP server and SDK, with export options to AI Studio and Antigravity, positioning it as a handoff layer between design and development rather than a standalone tool.
Pricing details are not specified in the announcement, so listeners interested in using Stitch for production workflows should check the documentation at stitch.withgoogle.com for current access and cost information.

55:20 Ryan – “I was developing something for my family, and it looks like you would expect, and so I can’t wait to try this out. And it was really impressive how fast, and how little feedback you gave it.”

Azure

56:12 Microsoft at NVIDIA GTC: New solutions for Microsoft Foundry, Azure AI infrastructure and Physical AI

Microsoft Foundry Agent Service and Observability in Foundry Control Plane are now generally available, giving enterprise teams a unified platform to build, deploy, and monitor AI agents with end-to-end visibility into agent behavior across tools, data, and workflows.
Azure is the first hyperscale cloud to power on NVIDIA Vera Rubin NVL72 systems in its labs, with rollout planned to liquid-cooled datacenters over the coming months, following deployment of hundreds of thousands of Grace Blackwell GPUs in under a year.
This positions Azure as a target platform for inference-heavy and reasoning-based workloads at scale.
NVIDIA Nemotron models are now available through Microsoft Foundry, and the Fireworks AI integration allows customers to fine-tune open-weight models into low-latency deployments that can be distributed to the edge.
Pricing for these models is not specified in the announcement and would vary based on usage.
Microsoft is extending NVIDIA Vera Rubin platform support to Azure Local, allowing organizations in sovereign and regulated environments to run next-generation AI workloads while maintaining Azure-consistent governance through Azure Arc and Foundry Local.
A new Physical AI Toolchain, available via a public GitHub repository, integrates NVIDIA Physical AI Data Factory with Azure services, enabling developers to build robotics and physical AI workflows that connect physical assets, simulation environments, and cloud training into repeatable enterprise pipelines.

57:38 Justin – “Skynet is VERY excited.”

59:06 Microsoft 365 pauses Copilot creep after admins cry foul

Microsoft has paused the automatic deployment of the Microsoft 365 Copilot app to desktop users, halting a rollout that had already slipped twice from its original October 2025 target date.
The pause has no specified end date, and existing installations remain unaffected.
The core admin complaint was that the opt-out default model increased IT workload by forcing organizations to set policies on Microsoft’s timeline rather than their own. Admins who want to proceed with deployment can still do so manually through other available methods.
European Economic Area customers were already excluded from this rollout, likely reflecting ongoing regulatory considerations around default software installations in that region.
This pause aligns with broader reported changes to Microsoft’s approach of embedding Copilot across Windows 11 surfaces, suggesting some recalibration of how aggressively the assistant is pushed to end users.
For IT decision-makers, the key takeaway is that centralized control over AI tool deployment remains a practical concern, and Microsoft’s willingness to halt the rollout signals that enterprise admin feedback carries weight in deployment decisions.

59:45 Justin – “Don’t force your IT people to do things. That’s not good. They’re already overworked and stressed.”

1:00:46 Advancing agentic AI with Microsoft databases across a unified data estate

Microsoft announced a savings plan for databases at SQLCon 2026, offering up to 35% savings versus pay-as-you-go pricing on a one-year hourly spend commitment, automatically applied across eligible Azure database services, including Azure SQL.
GitHub Copilot is now generally available in SQL Server Management Studio 22, bringing chat and T-SQL code assistance directly into SSMS for developers and DBAs who already use Copilot in Visual Studio and VS Code.
Azure SQL Database Hyperscale gained new public preview features, including a SQL MCP Server for connecting SQL data to AI agents, larger 160 and 192 vCore options, and enhanced vector indexes with full insert, update, and delete support requiring no code changes.
SQL database in Fabric reached general availability for several enterprise security features, including SQL Auditing, Customer-Managed Keys, and Dynamic Data Masking, with workspace-level Private Link in preview, targeting customers with strict governance and compliance requirements.
Microsoft introduced the Database Hub in Fabric, now in early access, providing a single management plane across Azure SQL, Cosmos DB, PostgreSQL, MySQL, and Arc-enabled SQL Server, with agent-assisted monitoring that surfaces estate-wide signals and recommended actions.
Interested in signing up for Database Hub? You can do that here.

1:01:37 Matt – “There’s a lot of ‘things’ in this blog post; the biggest one for me is the savings plan for databases… It’s just built in there now. It really means you can get those savings; you don’t have to commit or be a hyperscaler.”

1:03:28 Generally Available: Versionless key support for transparent data encryption in Azure SQL Database

Azure SQL Database now supports versionless keys for transparent data encryption, meaning customers can point to a key in Azure Key Vault without pinning to a specific version, and the database will automatically use the latest key version as it rotates.
This reduces operational overhead for teams managing customer-managed keys, eliminating the manual step of updating TDE configurations each time a key is rotated in Azure Key Vault or Managed HSM.
The practical benefit is improved reliability around key rotation workflows, since missed version updates previously could cause access disruptions to encrypted databases, a real risk in regulated industries with frequent rotation policies.
This feature is generally available and integrates with existing Azure Key Vault and Managed HSM setups, so customers already using bring-your-own-key TDE can adopt versionless references without rebuilding their encryption architecture.
No additional cost is associated with this feature beyond standard Azure Key Vault or Managed HSM pricing, making it a straightforward operational improvement for any Azure SQL Database customer using customer-managed keys.

1:04:10 Justin – “There’s no additional cost for this, and thank god, because this is the dumbest feature I’ve ever heard of in my entire life. Why does it not just do it automatically?”

1:06:24 Microsoft Launches Azure Skills Plugin to Give AI Coding Agents Real Azure Expertise

Microsoft released the Azure Skills Plugin, available at aka.ms/azure-plugin, which bundles over 19 curated Azure workflow skills, the Azure MCP Server with 200+ tools across 40+ Azure services, and the Foundry MCP Server into a single install for AI coding agents.
The goal is to move agents beyond generic code suggestions toward actual Azure deployment actions like provisioning, cost optimization, and live diagnostics.
The skills layer is the core differentiator here, encoding decision trees and sequencing logic for real Azure workflows rather than simple prompt snippets. Key skills include azure-prepare for generating infrastructure code, azure-validate for pre-flight checks, azure-deploy for orchestrating through the Azure Developer CLI, and azure-diagnostics for troubleshooting using logs and KQL queries.
The plugin is designed to be portable across agent hosts, including GitHub Copilot in VS Code, Copilot CLI, and Claude Code, with configuration handled automatically through a .mcp.json file and a .github/plugins/azure-skills folder. Teams using multiple agent tools do not need to maintain separate configurations for each.
Microsoft is explicit that this setup requires real credentials and real Azure resources, recommending least-privilege access, explicit tool approvals, and skills sourced only from trusted repositories. This positions the agent as a supervised collaborator rather than an autonomous actor, which is a practical consideration for teams evaluating security posture.
Prerequisites include Node.js 18 or later, Azure CLI authenticated via az login, and optionally the Azure Developer CLI for deployment workflows. No specific pricing is listed for the plugin itself, though costs will vary based on the underlying Azure services and resources the agent provisions during use.

1:07:48 Matt – “I’m actually most excited for the KQL feature because writing KQL is like writing SQL, but harder, but also I’m terrible at both, so don’t judge that one statement. But if I can live, just tell it to search the logs in a certain way, because right now I just have this terrible workflow of Claude – this is what I’m looking for in KQL. Copy-paste, take the screenshot, put it back over here, copy-paste, and iterate through this very slow cycle. So if I can have it understand KQL, so much better.”

1:08:56 Azure DevOps Remote MCP Server Lands in Microsoft Foundry, Giving AI Agents Direct Access to Your DevOps Data

Microsoft launched the Azure DevOps Remote MCP Server in public preview on March 17, followed by its integration into Microsoft Foundry two days later.
The server gives AI agents a hosted, authenticated connection to Azure DevOps data, including work items, pull requests, pipelines, repos, and wikis via a single URL endpoint at mcp.dev.azure.com.
Authentication runs entirely through Microsoft Entra, meaning organizations apply their existing identity policies, conditional access rules, and permission boundaries to agent access without building separate integrations. Notably, only Entra-backed Azure DevOps organizations are supported, leaving MSA-backed and on-premises deployments without this option for now.
Two access control headers stand out for enterprise use: X-MCP-Readonly restricts agents to read-only operations, and X-MCP-Toolsets lets teams scope which tool categories an agent can access. This shifts the governance conversation from whether agents should touch DevOps data to defining the specific conditions under which they can.
The Foundry integration connects Azure DevOps data to Foundry’s full agent development lifecycle, including model access, orchestration, evaluation, and deployment. Teams can add the server through the Foundry tool catalog and control which specific operations each agent is permitted to perform.
Current limitations worth noting include client support restricted to Visual Studio and VS Code without extra setup, while Claude Desktop, GitHub Copilot CLI, and ChatGPT require additional OAuth configuration in Entra before connecting. Microsoft has also indicated plans to eventually archive the local MCP Server in favor of this remote version, so teams on the local server should begin evaluating migration. No separate pricing has been announced beyond standard Azure DevOps and Foundry costs.

Oracle

1:11:13 Oracle Releases Java 26

Java 26 ships 10 JDK Enhancement Proposals covering AI integration, cryptography, and language simplification, including HTTP/3 support in the HTTP Client API and a fourth preview of primitive types in pattern matching. None of these are final features yet, with several JEPs still in preview or incubator status after multiple rounds.
The Ahead-of-Time Object Caching feature from Project Leyden is worth noting as it extends startup time improvements to work with any garbage collector, including ZGC, which addresses a practical pain point for cloud-native Java deployments where cold start latency matters.
Oracle is launching the Java Verified Portfolio, a bundled support offering covering JavaFX, Helidon, and the VS Code Java extension, included free for Java SE subscribers and OCI customers running Java workloads. For everyone else, pricing is not explicitly stated beyond noting that many components remain free for a wide range of use cases.
The Applet API removal in JEP 504 is notable mainly as a cleanup item, having been deprecated since JDK 17, and signals Oracle is willing to break legacy compatibility when features have been sufficiently warned about over multiple release cycles.
Helidon is being proposed as an OpenJDK project and aligned to the Java release cadence, which tightens Oracle’s control over the microservices framework ecosystem while keeping it open source, a pattern Oracle has used with other technologies in its portfolio.

1:11:21 Justin – “They brought AI to Java, and all is going to be lost.”

1:12:21 Oracle Unveils AI Database Agentic Innovations for Business Data

Oracle announced a bundle of agentic AI capabilities for Oracle AI Database at its AI World Tour in London, centered on keeping AI workloads closer to the data rather than moving data to external AI systems.
The headline additions include the Autonomous AI Vector Database in limited availability on free and low-cost developer tiers, a Private Agent Factory for no-code agent building, and a Unified Memory Core for storing agent context across multiple data types in a single engine.
The security angle is notable here. Oracle Deep Data Security and the Private AI Services Container are positioned to address prompt injection and data leakage risks by enforcing least-privilege access at the database layer rather than in application code, which is a practical concern for enterprises deploying agents against sensitive business data.
Oracle Trusted Answer Search takes a conservative approach to reducing hallucinations by matching user questions to pre-built reports via vector search rather than letting an LLM answer directly, which trades flexibility for determinism and may suit regulated industries but limits open-ended query use cases.
The open standards additions, specifically Vectors on Ice for Apache Iceberg support and an Autonomous AI Database MCP Server, are worth noting because they reduce some of the lock-in concerns that typically follow Oracle announcements, though customers still need to be running Oracle AI Database to benefit.
Pricing details are sparse in the announcement. The Autonomous AI Vector Database is available through the Oracle Cloud free tier or a low-cost developer tier, with a one-click upgrade path to full Autonomous AI Database, but Oracle has not published specific per-unit costs for the new agentic capabilities.

Closing

347: The CloudPod is Only Recording this Week “Because of AI”

Thu, 26 Mar 2026 21:24:20 +0000

Welcome to episode 347 of The Cloud Pod, where the forecast is always cloudy! Justin, Jonathan, and Ryan are in the studio recording today, and thankfully, Jonathan hasn’t replaced us all with Skynet – yet. This week, we’re discussing how old our tools (and us) are (hint: it’s really old), whether or not the SaasApocalypse is upon us, and whether or not the business or AI is responsible for the latest round of layoffs.

Titles we almost went with this week

S3 Bucket Names Finally Stop Being a Global Hunger Games
One Million Tokens Walk Into a Context Window
SLO Down and Smell the Reliability Metrics
CloudWatch Finally Watches Your Whole Cloud Organization
S3 Turns 20 and Still Buckets the Competition
Azure SRE Agent Goes GA So You Don’t Have To
Twenty Years of S3 and No Signs of Object Permanence
One Rule to Monitor Them All Across AWS
One Flag to Secure Them All on Cloud Run
SaaSpocalypse Now Atlassian Layoffs Hit the Jira
No More Bucket Name Bingo with S3 Regional Namespaces
A Picture Is Worth a Thousand Claude Tokens
One Command to Rule Your Autonomous AI Agents
AI Fixes Your Incidents Before Your Boss Notices
The CloudPod is only recording this week “Because of AI”
Amazon begs users to leave Simple DB with another migration tool

Follow Up

00:54 Microsoft’s brief in Anthropic case shows new alliance and willingness to challenge Trump administration

Microsoft filed an amicus brief in Anthropic’s lawsuit against the U.S. Department of War, urging a federal judge to temporarily block the Pentagon’s designation of Anthropic as a supply chain risk, citing substantial costs to government contractors that rely on Anthropic models.
The brief arrived one day after Microsoft launched Copilot Cowork, built on Anthropic’s Claude, and four months after Microsoft committed up to $5 billion in Anthropic as part of a deal requiring Anthropic to spend at least $30 billion on Azure, making the legal filing directly tied to concrete commercial dependencies.
Microsoft highlighted a procedural inconsistency in the government’s approach: the Pentagon gave itself six months to transition off Anthropic’s models while making the supply chain designation effective immediately for contractors, creating an unequal compliance burden.
Amazon, which has invested $8 billion in Anthropic, has not publicly responded to the lawsuit or the designation, creating a notable contrast in how two major cloud providers with similar financial exposure are handling the situation.
OpenAI announced its own Pentagon deal on the same day the Anthropic designation was issued, and 37 researchers from OpenAI and Google separately filed an amicus brief supporting Anthropic, indicating the case is drawing broad attention across the AI and cloud industry with potential implications for how AI guardrails are treated in government contracts.

01:37 Justin – “Oh, yeah, there’s a vested interest in the lawsuit which we did not mention last week, so I wanted to follow up on that, because that explains very clearly why Microsoft is throwing in with Anthropic on this.”

General News

02:37 Atlassian to shed ten percent of staff, because of AI

Atlassian is cutting roughly 1,600 employees, about 10 percent of its workforce, citing AI-driven changes to required skill sets and a need to self-fund further AI and enterprise sales investment.
The company’s market cap has dropped from a peak of around 112 billion dollars in 2021 to approximately 20 billion dollars today, providing financial context for why cost restructuring is happening alongside the AI narrative.
The SaaSpocalypse concept is worth discussing here, as Atlassian is among the SaaS vendors analysts flag as potentially vulnerable to organizations replacing traditional tools with AI-generated or vibe-coded alternatives.
Atlassian points to 25 percent cloud revenue growth, 600 customers spending over 1 million dollars annually, and 5 million users on its Rovo AI suite as indicators that the business is still growing, which creates an interesting tension with the layoff announcement.
For cloud practitioners, this is a concrete example of how AI adoption is beginning to visibly reshape headcount decisions at established SaaS vendors, not just startups, which has implications for how enterprises evaluate vendor stability and long-term support commitments.

03:18 Justin – “I’ve seen Rovo, which is Atlassian’s AI suite, and if that’s the best they can do… I have fears for the long-term health and viability of Jira in general. I’m kind of over the whole let’s blame AI for our bad business decisions. That’s going to get old real quick.”

AI Is Going Great – Or How ML Makes Money

06:18 Claude builds interactive visuals right in your conversation

Anthropic has launched in beta a new inline visualization feature for Claude that generates interactive charts, diagrams, and other visuals directly within chat conversations, available across all plan tiers at no additional cost.
These visuals are distinct from Claude’s existing artifacts system in a notable way: they are temporary and contextual, appearing inline rather than in a side panel, and they update or disappear as the conversation evolves rather than serving as persistent shareable documents.
Claude determines autonomously when a visual would aid comprehension, but users can also prompt it directly with natural language requests like “draw this as a diagram” or “visualize how this might change over time,” and can request adjustments iteratively within the same conversation.
The feature is part of a broader set of response format improvements Anthropic has been rolling out, including purpose-built layouts for recipes and weather queries, as well as direct in-conversation integrations with third-party tools like Figma, Canva, and Slack.
For developers and enterprise users, the practical implication is that Claude can now serve as a lightweight data visualization layer within workflows without requiring users to export data to separate charting tools, which could reduce friction in analytical and educational use cases.

07:27 Ryan – “Kind of excited when Claude decides that the monkey making the queries needs bigger pictures because the text isn’t working out, so it’s like, I get you, Claude. I see what you’re doing.”

07:38 Jonathan – “Anthropic’s Claude: Now with crayons.”

08:50 Introducing Genie Code

Databricks has launched Genie Code as a generally available product, positioning it as an agentic AI system built specifically for data teams rather than general software development.
It handles end-to-end tasks, including pipeline building, dashboard creation, ML model training, and production monitoring, directly within Databricks notebooks, SQL editor, and Lakeflow Pipelines.
The system claims to outperform a leading coding agent by more than 2x on real-world data science tasks, with the key differentiator being deep Unity Catalog integration that gives it access to data lineage, usage patterns, governance policies, and business semantics rather than just reading raw code.
Genie Code routes tasks across multiple models automatically, selecting from frontier LLMs, open source models, or custom Databricks-hosted models depending on the job, removing the need for users to manually choose models for different tasks.
A notable upcoming capability is background agents, which will proactively monitor Lakeflow pipelines and AI models, triage failures, handle routine Databricks Runtime upgrades, and auto-fix issues like schema mismatches in a sandboxed environment before alerting the team.
The governance angle is worth discussing for enterprise cloud users: Genie Code enforces Unity Catalog access controls during all operations, meaning it only surfaces data assets a user is authorized to see and respects existing lineage rules when building pipelines, which addresses a common concern with agentic systems operating on sensitive production data.

10:05 Ryan – “I don’t think it will kill Glue or any of the ETL things, but hopefully it will just do it for you, and then I don’t think I care anymore.”

11:19 1M context is now generally available for Opus 4.6 and Sonnet 4.6

Anthropic has moved 1M context windows to general availability for Claude Opus 4.6 and Sonnet 4.6, with standard pricing applying across the full window and no long-context premium.
Opus 4.6 is priced at $5/$25 per million input/output tokens, and Sonnet 4.6 at $3/$15, meaning a 900K-token request costs the same per-token rate as a 9K one.
On the performance side, Opus 4.6 scores 78.3% on MRCR v2, a benchmark measuring recall and reasoning across long contexts, which Anthropic claims is the highest among frontier models at that context length.
Practical use cases include loading entire codebases, thousands of pages of contracts, or full agent traces with tool calls and intermediate reasoning, eliminating the need for lossy summarization or manual context management that long-context workflows previously required.
Claude Code users on Max, Team, and Enterprise plans now get 1M context automatically with Opus 4.6, meaning fewer session compactions and more conversation history retained without consuming extra usage credits.
The 1M context window is available natively on the Claude Platform and through Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, making it accessible across the major cloud provider ecosystems that developers are already using.

19:46 Introducing GPT-5.4 mini and nano

OpenAI released GPT-5.4 mini and nano, two small models positioned for high-volume, latency-sensitive workloads.
GPT-5.4 mini runs more than 2x faster than GPT-5 mini while approaching GPT-5.4 performance on benchmarks like SWE-Bench Pro and OSWorld-Verified.
Pricing is notably lower than larger models: GPT-5.4 mini costs $0.75 per 1M input tokens and $4.50 per 1M output tokens, while GPT-5.4 nano comes in at $0.20 input and $1.25 output per 1M tokens, with a 400k context window on mini.
The models are designed for multi-model orchestration patterns where a larger model like GPT-5.4 handles planning and coordination while GPT-5.4 mini subagents execute narrower parallel tasks, a pattern OpenAI has built directly into their Codex product.
In Codex specifically, GPT-5.4 mini uses only 30% of the GPT-5.4 quota, giving developers a cost-effective path for simpler coding tasks like codebase navigation, targeted edits, and debugging loops without sacrificing too much capability.
GPT-5.4 nano is API-only and recommended for classification, data extraction, ranking, and simpler subagent tasks, making it a practical option for cloud workloads where cost and throughput matter more than deep reasoning.

21:00 Ryan – “I’m a fan of these little models for certain things; as part of that tuning, my agent definitions have gotten a lot more complex. A lot of times, I’m breaking out agent definitions so that I can specifically use one of the smaller models for certain types of tasks. Data extraction being a big one.”

AWS

22:53 Twenty years of Amazon S3 and building what’s next

S3 turns 20 years old this month, growing from 1 petabyte of capacity and 15 cents per gigabyte in 2006 to hundreds of exabytes storing over 500 trillion objects at just over 2 cents per gigabyte today, representing roughly an 85% price reduction over two decades.
A notable engineering detail is that code written for S3 in 2006 still works today unchanged, with AWS maintaining complete API backward compatibility through multiple infrastructure generations, which is why the S3 API has become a de facto standard across the storage industry.
On the technical side, AWS has spent 8 years progressively rewriting performance-critical S3 components in Rust for memory safety and performance, and uses formal methods with automated proofs to mathematically verify consistency in the index subsystem and cross-region replication.
AWS is positioning S3 as a universal data foundation with three newer capabilities worth noting: S3 Tables for managed Apache Iceberg analytics, S3 Vectors for native vector storage supporting up to 2 billion vectors per index at sub-100ms latency, and S3 Metadata for centralized object cataloging, all priced at standard S3 cost structures rather than specialized database pricing.
The maximum object size has grown from 5 GB to 50 TB, and AWS reports customers have collectively saved over $6 billion in storage costs through S3 Intelligent-Tiering compared to S3 Standard storage class pricing.

24:08 Justin – “I am a big fan of the S3 vectors because we use it for Bolt.”

25:39 Introducing account regional namespaces for Amazon S3 general-purpose buckets

AWS S3 now supports account regional namespaces for general-purpose buckets, where bucket names automatically include your account ID and region as a suffix, such as mybucket-123456789012-us-east-1-an.
This solves the long-standing problem of bucket name collisions in the global namespace, particularly useful for large organizations managing buckets at scale across multiple regions.
The feature integrates with IAM and AWS Organizations service control policies via the new s3:x-amz-bucket-namespace condition key, allowing security teams to enforce that employees only create buckets within their account’s namespace.
- This gives enterprises a straightforward governance mechanism to prevent naming conflicts and unauthorized bucket creation.
Existing global namespace buckets cannot be renamed to use the account regional namespace, so this is a forward-looking change for new bucket creation only. S3 table buckets, vector buckets, and directory buckets already operate in account-level or zonal namespaces, so this update brings general-purpose buckets in line with those patterns.
CloudFormation support is included via the BucketNamespace property and pseudo parameters AWS::AccountId and AWS::Region, making it straightforward to update existing IaC templates. CLI and Boto3 support is also available using the x-amz-bucket-namespace header or BucketNamespace parameter.
The feature is available across 37 AWS regions, including AWS China and GovCloud, at no additional cost, making it a low-friction adoption for teams looking to simplify bucket naming conventions without budget impact.

27:17 Jonathan – “What’s really annoying is your account number is part of the public S3 bucket name! I wish a security person had been in the room there.”

28:17 Amazon CloudWatch Application Signals adds new SLO capabilities

Amazon CloudWatch Application Signals now includes three new SLO capabilities: SLO Recommendations, Service-Level SLOs, and SLO Performance Report, addressing longstanding gaps in data-driven reliability management for AWS customers.
SLO Recommendations analyzes 30 days of historical P99 latency and error rate data to suggest appropriate reliability targets, reducing the manual guesswork that previously led to misconfigured thresholds and alert fatigue.
Service-Level SLOs give teams a consolidated view of reliability across all operations within a service, making it easier to align technical monitoring with business objectives without stitching together multiple dashboards.
The SLO Performance Report adds calendar-aligned historical reporting at daily, weekly, and monthly intervals, which is useful for teams that need to present reliability data to stakeholders in business-friendly formats.
Pricing is usage-based, tied to inbound and outbound application requests plus SLO charges, with each SLO generating 2 application signals per service level indicator metric period. The features are available in all regions where CloudWatch Application Signals is supported.

29:11 Jonathan – “So instead of fixing your product, you just use a tool that tells you that you should turn down your commitments to your customers. Ok…”

29:57 Amazon SimpleDB now supports exporting domain data to Amazon S3

Amazon SimpleDB, one of AWS’s oldest database services dating back to 2007, now supports exporting domain data directly to S3 in JSON format, giving long-time users a practical path to migrate away from the service or archive data for compliance purposes.
The export tool introduces three new APIs (StartDomainExport, GetExport, and ListExports) with background processing that avoids any performance impact on the running database, which matters for users who cannot afford downtime during data extraction.
Cross-region and cross-account support, along with multiple encryption options, make this useful for organizations with strict data governance requirements who need to move SimpleDB data into modern storage or database systems.
Rate limiting is set at 5 exports per domain and 25 per account within a 24-hour window, so teams with large numbers of domains should plan their migration timelines accordingly rather than assuming bulk exports can happen all at once.
The tool itself is free to use, but standard S3 data transfer charges apply, so cost planning should account for data volume when scoping a migration or archival project.

30:53 Justin – “SimpleDB gets a new feature!”

32:19 Amazon CloudWatch introduces organization-wide EC2 detailed monitoring enablement

CloudWatch now supports organization-wide rules to automatically enable EC2 detailed monitoring, shifting metrics collection from a per-instance manual task to a centralized policy-driven configuration across the entire AWS Organizations.
Rules can be scoped to the full organization, specific accounts, or individual resources using tags, so teams can target environments like production workloads without enabling the feature universally and incurring unnecessary costs.
The 1-minute interval metrics that detailed monitoring provides are particularly relevant for Auto Scaling groups, where faster data collection means scaling policies can respond more quickly to utilization changes rather than waiting for the default 5-minute interval.
The feature covers both existing and newly launched instances within the rule scope, which closes a common gap where new instances spun up after policy creation would otherwise miss monitoring configuration.
Detailed monitoring costs apply per instance per metric per month per CloudWatch pricing, so organizations should evaluate tag-based scoping carefully to avoid unexpected billing increases when rolling this out broadly.

33:17 Ryan – “I mean, what’s wrong with the previous method of waiting until you had an outage, not having the data, and THEN turning it on for your project?”

GCP

33:47 Why context is the missing link in AI data security

Google Cloud’s Sensitive Data Protection is now generally available with new context classifiers for medical and finance data, plus image object detectors for faces and passports, moving beyond simple keyword matching to understand the semantic meaning of data.
For AI training workflows on Vertex AI, SDP can scan unstructured image data using OCR and object detection to find sensitive content like credit card numbers or photo IDs, then generate redacted versions rather than discarding the data entirely, preserving training dataset quality.
The context-aware approach addresses a practical problem with traditional regex-based detection: the same number sequence can be treated differently depending on surrounding words, so “order number” passes through while “wallet number” triggers financial context classification and redaction.
SDP serves as the underlying engine for several other Google Cloud products, including Model Armor, Security Command Center, and Contact Center as a Service, meaning improvements here propagate across those services automatically.
Organizations in regulated industries like healthcare and finance are the most direct beneficiaries, as the tool helps ensure AI agents only access data appropriate to their function during both training and live user interactions. Pricing details are not specified in the announcement, so teams should check cloud.google.com/security/products/sensitive-data-protection for current rates.

35:16 Ryan – “I don’t really think that’s usually where the sensitive data is. It can be, in some workloads, but probably not the majority, so there’s so many false positives, so I really like the idea that they’re having context be a part of that decision.”

37:16 Welcoming Wiz to Google Cloud: Redefining security for the AI era

Google has completed its acquisition of Wiz, a cloud and AI security platform, which will retain its brand and continue supporting multicloud environments, including AWS, Azure, and Oracle Cloud Platform, alongside Google Cloud.
Wiz connects code, cloud, and runtime into a single context, allowing security teams to map application architecture, permissions, data flows, and runtime behavior in real time to identify and prioritize exploitable attack paths before they reach production.
The combined offering integrates Wiz’s cloud security platform with Google Security Operations, Mandiant Consulting, and Google Threat Intelligence under the Google Unified Security umbrella, with Gemini AI assisting in threat hunting, remediation workflows, and audit documentation.
A notable focus of the acquisition is AI-specific security, addressing threats that target AI models and those generated by AI systems, which is increasingly relevant as organizations deploy AI agents fed with business-critical data.
Pricing details for the combined platform have not been announced, but Wiz products will remain available through existing partner channels, system integrators, and managed security service providers, suggesting continuity for current Wiz customers during the transition.

38:16 Justin – “Typically on these acquisitions, it takes about a year for Google to figure out how to package them properly, and most likely they’ll want a separate contract for it anyways because that’s how all the integration acquisitions they’ve done are.”

39:22 IAP integration with Cloud Run

Google Cloud Run now supports direct Identity-Aware Proxy integration in general availability, allowing developers to enable IAP authentication with a single UI click or the –iap flag in gcloud, eliminating the previous requirement to configure load balancers manually. IAP carries no additional cost beyond standard Cloud Run charges, with limited exceptions noted in the pricing docs.
IAP on Cloud Run supports enterprise authentication features, including user and group identity policies, context-aware access controls based on IP, geolocation, and device status, and Workforce Identity Federation for external identity providers. This makes it practical for organizations that need to secure internal web applications without building custom authentication layers.
A separate change allows Cloud Run services to disable the default IAM invoker check by selecting “Allow Public access,” which resolves a long-standing friction point for teams trying to host public-facing applications while also enforcing Domain Restricted Sharing org policies.
The two features address different scenarios: IAP is the recommended path for internal business applications requiring user authentication, while the public access option suits public websites, store locators, or private microservices where network-level controls like Cloud Armor handle security instead.
Real-world adoption examples include L’Oreal using IAP across their Google Cloud application portfolio and Bilt Rewards disabling IAM invoker checks on multi-regional Cloud Run services to simplify edge routing while relying on Cloud Armor for security enforcement.

39:57 Ryan – This is a neat little feature. I don’t know how widely known it is, but it’s something that I’ve been using for a while.”

42:09 Multi-cluster GKE Inference Gateway helps scale AI workloads

Google Cloud has launched a preview of multi-cluster GKE Inference Gateway, which extends the existing GKE Gateway API to enable model-aware load balancing for AI inference workloads across multiple GKE clusters and regions.
This addresses practical limitations of single-cluster deployments like GPU/TPU capacity caps and regional availability risks.
The system introduces two core Kubernetes custom resources, InferencePool and InferenceObjective, which group model-server backends and define routing priorities, respectively.
This allows the gateway to intelligently multiplex latency-sensitive and lower-priority inference requests across a distributed fleet.
A notable technical capability is the GCPBackendPolicy resource, which enables load balancing decisions based on real-time custom metrics such as KV cache utilization on model servers.
This is more inference-specific than traditional request-count or latency-based routing approaches.
The architecture uses a dedicated config cluster to manage a single Gateway configuration that routes traffic to multiple target clusters, simplifying operations for teams running globally distributed AI services. Supported use cases include disaster recovery, capacity bursting, and heterogeneous hardware utilization.
Pricing for this feature is not separately detailed in the announcement, so costs would likely follow existing GKE and Cloud Load Balancing pricing structures. Teams evaluating this should factor in multi-cluster networking and potential cross-region data transfer costs alongside their GPU/TPU resource expenses.

43:06 Ryan – “Simplify. Sure…”

44:35 More transparency and control over Gemini API costs

Google AI Studio now supports Project Spend Caps, letting developers set monthly dollar limits per project directly from the Spend tab.
There is a roughly 10-minute enforcement delay, so users remain responsible for any overages incurred during that window.
Usage Tiers have been redesigned with lower spend qualifications, automatic tier upgrades based on payment history, and system-defined billing account caps that increase as you move to higher tiers. This reduces manual intervention for developers scaling their API usage over time.
Three new dashboards have been added to Google AI Studio covering rate limits, costs, and usage. The rate limit dashboard tracks RPM, TPM, and RPD per project, while the cost dashboard offers a daily breakdown filterable by model and time range going back up to a full month.
Billing setup can now be completed entirely within Google AI Studio, including linking billing profiles to projects, removing the previous need to navigate across multiple Google Cloud console windows.
This consolidation is particularly useful for teams managing several projects under one billing account.
Developers building with Imagen and Veo now have dedicated usage graphs alongside standard request metrics, giving multimodal workloads the same observability previously available only for text-based Gemini API calls.

45:13 Justin – “If you’ve ever tried to figure out who is using what models and what they’re doing with them and how much it costs, you know that this is all terrible – and this doesn’t actually improve it all that much.”

Azure

47:35 Generally Available: Azure SRE Agent with new capabilities

Azure SRE Agent is now generally available as an AI-powered operations tool designed to help teams diagnose incidents faster and automate response workflows, to reduce downtime and manual operational work.
The GA release introduces deep context gathering capabilities, meaning the agent can pull together relevant signals and telemetry during an incident rather than requiring engineers to manually correlate data across multiple tools.
This fits naturally into teams already using Azure Monitor, Application Insights, and related observability tooling, as the agent is positioned to work within existing Azure operations workflows rather than requiring a separate platform.
The primary target audience is operations and SRE teams managing production workloads on Azure who are looking to reduce the time between incident detection and resolution without adding headcount.
Pricing details were not included in the announcement, so teams evaluating this should check the Azure pricing page directly here before planning adoption, as AI-powered agent services on Azure typically carry consumption-based costs.

48:25 Jonathan – “All right, so they run the services, which are going to have problems. And now they want me to pay for another service so that I can use that tool to troubleshoot the problems with the other tools that I’m already paying for. OK…”

55:59 Many agents, one team: Scaling modernization on Azure

Azure announced two new public preview offerings: the Azure Copilot migration agent and the GitHub Copilot modernization agent, designed to automate discovery, assessment, planning, and deployment for organizations moving workloads to Azure.
The migration agent targets servers, virtual machines, applications, and databases, while the modernization agent orchestrates code upgrades at scale across multiple applications simultaneously.
The two agents are designed to work together, with GitHub Copilot scanning application code to produce assessment reports that Azure Copilot’s migration agent then ingests to inform cloud infrastructure planning. This integration aims to close the historical gap between developer-level code work and infrastructure decisions around landing zones, networking, and governance.
Early customer results show a 70% reduction in total modernization effort using GitHub Copilot modernization capabilities, and Ahold Delhaize is cited as a customer that reduced complexity and accelerated delivery using these agentic workflows across discovery, assessment, and execution.
Microsoft is pairing these agentic tools with a structured delivery program called Cloud Accelerate Factory, a no-cost benefit under Azure Accelerate where Microsoft experts work alongside customers from discovery through production. Pricing for the agents themselves is not specified in the announcement, so listeners should check Azure pricing pages directly for cost details.
According to a Forrester Q1 2026 survey of 223 global IT leaders, 91% view application modernization as necessary for enabling AI in their business, which provides context for why Microsoft is investing in automating what has traditionally been a slow, manual planning process.

52:32 Ryan – “I keep waiting for someone to tout the success of how they did it, they’ve migrated all their terrible legacy code into this new thing, and it all works – but I haven’t seen it…”

53:28 Announcing Fireworks AI on Microsoft Foundry

Microsoft Foundry now integrates with Fireworks AI’s inference cloud, giving customers access to models like DeepSeek v3.2, Kimi K2.5, and OpenAI’s gpt-oss-120b through both pay-per-token and provisioned throughput deployment options.
This is currently in public preview and requires an opt-in through the Azure portal’s Preview features panel.
Pricing follows a per-million-token model for serverless deployments covering input, cached input, and output tokens, with US Data Zone availability across six regions, including East US and West US.
Default quota limits start at either 250K or 25K tokens per minute, depending on subscription type, with additional quota available via a request form.
A notable addition is custom model support, allowing teams who have fine-tuned models from families like Qwen3-14B, DeepSeek v3, or Kimi K2 to import and deploy those weights directly into Foundry projects.
The Azure Developer CLI has been updated with an azd ai models create command to facilitate the weight transfer process.
Fireworks-hosted models are distinct from Azure Direct models in that they skip Microsoft’s Responsible AI safety assessments, so teams needing safety evaluations will need to use Foundry’s built-in risk and safety evaluator tools separately.
Model retirement for serverless deployments comes with at least 30 days’ notice, and customers can extend usage past retirement dates by switching to provisioned throughput deployments, which use existing Global PTU quota and reservation commitments.

54:30 Justin – “Sounds like it’s a cross-connect that they’ve done to Firework’s cloud basically, to provide this to you, so it’s sort of interesting.”

56:02 Announcing Copilot leadership update

Microsoft is reorganizing its Copilot efforts by merging consumer and commercial teams into a single unified org, structured around four pillars: Copilot experience, Copilot platform, Microsoft 365 apps, and AI models.
Jacob Andreou will lead the combined Copilot experience as EVP, reporting directly to Satya Nadella.
Mustafa Suleyman is shifting focus exclusively to what Microsoft calls its “superintelligence” effort, concentrating on frontier model development, enterprise-tuned model lineages, and reducing inference costs at scale over the next five years.
The restructuring reflects a product direction where Copilot moves from individual features toward an integrated system connecting agents, apps, and workflows, with recent announcements like Copilot Tasks, Copilot Cowork, and Agent 365 representing early examples of this approach.
For enterprise customers, the key practical implication is that commercial and consumer Copilot capabilities will converge, meaning IT and governance controls will need to account for a more unified product surface rather than separate consumer and business tracks.
The Copilot Leadership Team now includes Suleyman, Andreou, Charles Lamanna, Perry Clarke, and Ryan Roslansky, signaling that Microsoft 365 app development and platform infrastructure will be tightly coordinated with model development rather than operating independently.

57:23 Ryan – “Noticeably missing is Github’s Copilot…”

After Show

55:59 Washington state hotline callers hear AI voice with Spanish accent

Washington state’s Department of Licensing accidentally routed Spanish-language callers to an AI voice speaking English with a Spanish accent for several months, a direct result of a misconfiguration by DOL staff using Amazon Web Services Polly.
AP journalists were able to replicate the issue by selecting the AWS Polly voice named “Lucia,” which is designed to mimic Castilian Spanish, highlighting how easy it is to misconfigure AI voice services when teams lack familiarity with the underlying platform options.
The incident is a practical reminder that deploying AI-driven customer service tools across multiple languages requires thorough testing and quality assurance, particularly for government agencies serving diverse populations with real accessibility needs.
Amazon provided the platform but declined interview requests, raising a recurring question in cloud deployments about where vendor responsibility ends and customer configuration responsibility begins when things go wrong in production.
The story went viral with around 2 million TikTok views, which illustrates how public-facing AI failures in government services can quickly become reputational issues, adding pressure on agencies to treat AI deployment with the same rigor as other critical infrastructure.

Closing

346: Zuckerberg Finally Finds His People, They Are All AI Agents

Thu, 19 Mar 2026 21:16:37 +0000

Welcome to episode 346 of The Cloud Pod, where the forecast is always cloudy! Hold on to your butts, because Justin, Ryan, and Matt are in the studio today, and they’re ready to bring you all the latest in Cloud and AI news, including the usual: Meta buying social networks, Amazon responding to outages, and OpenAI giving up another version of GPT. Let’s get into it!

Titles we almost went with this week

✍️ Cloudflare Spent $1100 to Rewrite Next.js in a Week
One Pipe to Rule All Your OpenTelemetry Data
☑️ Check Yourself Before Google Wrecks Your Cloud Config
Copilot Takes Jira Tickets So You Don't Have To
‍✈️ GitHub Copilot Agent Joins Your Jira Workflow Uninvited
When AI Agents Network, Meta Swipes Right on Moltbook
️ Sixty Controls Walk Into a Terraform Repository
One Security Console to Rule All Your Clouds
AI Ate My Lock-In, and I Feel Fine
⛅ Oracle Sees $90 Billion Future Cloudy With a Chance of GPUs
Your API Has Trust Issues, and We Can Prove It
Stop Running Three Pipelines Like a Telemetry Hoarder
From Database Dinosaur to AI Cash Cow
☠️ Meta: Target acquired; must kill Moltbook
Meta saw Moltbook and said, “WE MUST OWN IT AND KILL.”

Follow Up

00:51 Where things stand with the Department of War

Anthropic has been designated a supply chain risk to US national security by the Department of War, a designation the company is challenging in court as legally unsound under 10 USC 3252.
The practical scope of the designation is narrow, applying only to the use of Claude in direct Department of War contracts, not to all customers that hold such contracts or to unrelated business with Anthropic.
Anthropic has stated that it will continue to provide its models to the Department of War and the national security community at nominal cost, with ongoing engineering support, during any transition period and for as long as permitted.
The company's two stated exceptions to military use involve fully autonomous weapons and mass domestic surveillance, and Anthropic has clarified these do not extend to operational decision-making, which it considers the military's domain.
For cloud and enterprise customers, the key takeaway is that existing Claude deployments unrelated to Department of War contracts remain unaffected, though the legal dispute introduces uncertainty into federal procurement pipelines involving AI services.
We will keep you updated on this in 12-18 months…

AI Is Going Great - Or How ML Makes Money

01:21 Introducing GPT-5.4

OpenAI released GPT-5.4 across ChatGPT, the API, and Codex, positioning it as their most capable reasoning model to date. It merges the coding strengths of GPT-5.3-Codex with general reasoning, professional knowledge work, and native computer-use capabilities in a single model.
The computer-use capabilities are a notable technical step, with GPT-5.4 achieving a 75% success rate on OSWorld-Verified desktop navigation, surpassing the reported human benchmark of 72.4% and up from GPT-5.2's 47.3%.
This makes it the first general-purpose OpenAI model with native computer use built in, making it relevant for developers building agents that operate across web browsers and desktop software.
Tool search is a practical efficiency improvement for agentic API workflows, dynamically loading tool definitions only when needed rather than stuffing all definitions into the prompt upfront. In testing against Scale's MCP Atlas benchmark on 36 MCP servers, this reduced total token usage by 47% with no loss in accuracy, directly translating to lower API costs for tool-heavy applications.
On the professional work side, GPT-5.4 scores 87.3% on an internal investment banking spreadsheet benchmark, up from 68.4% for GPT-5.2, and achieves 91% on BigLaw Bench for legal document work. The ChatGPT for Excel add-in, launched alongside it, gives Enterprise customers a direct integration path.
Pricing is higher per token than GPT-5.2 in the API, though OpenAI notes the model's token efficiency should offset costs for many workloads.
Batch and Flex pricing remain available at half the standard rate, and Priority processing is available at 2x the standard rate for latency-sensitive use cases.

02:19 Justin - “There’s also been a slew of every cloud provider in the world announcing Chat-GPT 5.4 is now available, and we will not be telling you about all of them, but assume that if you use a different model or different cloud, they probably have it.”

04:33 Introducing ChatGPT for Excel and new financial data integrations

OpenAI launched ChatGPT for Excel in beta, an add-in powered by GPT-5.4 that lets users build, update, and analyze spreadsheet models using plain language descriptions.
It preserves existing formulas and structure, asks permission before making changes, and links answers to specific cells for auditability.
Available now for Business, Enterprise, Edu, Pro, and Plus users in the US, Canada, and Australia.
GPT-5.4 (also available as GPT-5.4 Thinking) is now live in ChatGPT, Codex, and the API, with OpenAI noting it was specifically tuned on real-world finance workflows, including financial modeling, scenario analysis, data extraction, and long-form research.
New financial data integrations bring Moody's, Dow Jones Factiva, MSCI, Third Bridge, MT Newswire, and others directly into ChatGPT workflows, with FactSet coming soon.
Organizations can also connect proprietary data sources using Model Context Protocol (MCP), centralizing market, company, and internal data in a single interface.
For enterprise deployments, the Excel add-in supports RBAC, SAML SSO, SCIM, audit logs, AES-256 encryption at rest, TLS 1.2+ in transit, and data residency controls. In Enterprise and Edu workspaces, the feature is off by default and requires admin enablement with custom roles and group permissions.
ChatGPT for Google Sheets is listed as coming soon, signaling OpenAI is extending this spreadsheet integration beyond the Microsoft ecosystem.

04:49 Justin - “If I were a betting man, I’d also say they’re going to have a PowerPoint version any day.”

06:13 Meet KARL: A Faster Agent for Enterprise Knowledge, powered by custom RL

Databricks introduced KARL (Knowledge Agent with Reinforcement Learning), a custom model built using RL techniques to handle grounded reasoning tasks like document search, fact-finding, and multi-step reasoning across enterprise data sources.
KARL was trained with a few thousand GPU hours using entirely synthetic data. In internal testing, it matched or outperformed Frontier's proprietary models on inference cost, latency, and response quality simultaneously.
The core technical challenge KARL addresses is hard-to-verify tasks, where there is no single correct answer, making RL reward signal design particularly difficult compared to domains like math or code, where correctness is easier to measure.
Databricks is now offering a Custom RL private preview backed by Serverless GPU Compute, allowing enterprise customers to use the same RL pipeline that produced KARL to build domain-specific, cost-optimized versions of their own high-volume agents.
For enterprises running AI agents at scale, this approach suggests that custom RL fine-tuning on smaller models can substantially reduce inference costs compared with relying on general-purpose frontier models, a practical consideration as agentic workload costs grow.
Interested in checking out the preview? You can find more information on that here.

07:09 Ryan - “It's kind of a neat idea to provide sort of the pipeline there. I mean, I guess the big cloud providers are producing agent-building platforms and stuff; I wonder how much of this you can follow the path that they use for creating KARL and building your own domain-specific agent in the same way. I like the idea. Smaller model, less GPU.”

08:55 Codex Security: now in research preview

OpenAI launched Codex Security in research preview, formerly known as Aardvark, and is now available to ChatGPT Pro, Enterprise, Business, and Edu customers via the Codex web with free usage for the first month.
The tool functions as an agentic application security scanner that builds a project-specific threat model to identify and prioritize vulnerabilities with context-aware fixes.
The performance metrics from the beta are notable: false positive rates dropped by over 50%, overreported severity findings fell by more than 90%, and noise was reduced by 84% in some repositories.
Over the last 30 days, it scanned more than 1.2 million commits, surfacing 792 critical and 10,561 high-severity findings, with critical issues appearing in fewer than 0.1% of commits.
The tool uses sandboxed validation environments to pressure-test findings before surfacing them and can generate working proofs of concept when configured with a project-specific runtime environment. It also learns from user feedback on finding severity to refine its threat model over time.
Codex Security has already produced real-world results in open source, with 14 CVEs assigned across projects including OpenSSH, GnuTLS, GOGS, PHP, and Chromium.
OpenAI is also launching Codex for OSS, offering free ChatGPT Pro and Plus accounts, as well as Codex Security access for open-source maintainers.

10:07 Ryan - “I wish AI wouldn’t generate all those vulnerabilities in code… but I do like that these tools are available.”

12:40 OpenAI to acquire Promptfoo

OpenAI is acquiring Promptfoo, an AI security platform used by over 25 percent of Fortune 500 companies, with plans to integrate its technology directly into OpenAI Frontier, the company's enterprise platform for building AI agents.
Promptfoo's core capabilities include automated red-teaming and security testing for LLM applications, targeting risks such as prompt injection, jailbreaks, data leaks, tool misuse, and out-of-policy agent behavior.
These will become native features within Frontier rather than separate tools.
The acquisition addresses a practical gap for enterprise AI deployments: systematic ways to test agent behavior before production, maintain audit trails, and meet governance and compliance requirements as AI agents connect to real data and business systems.
Promptfoo also maintains a widely used open-source CLI and library on GitHub, and OpenAI has stated it will continue developing the open-source project alongside the integrated enterprise capabilities, which is notable for developers already using those tools.
For enterprises building on Frontier, this signals that security testing and evaluation are moving from optional add-ons to built-in requirements of the development workflow, with direct implications for how teams structure AI deployment pipelines and compliance documentation.

13:36 Justin - “It's good that this company got bought, integrated into the models is a great stepping stone, and I look forward to seeing more red teaming agents, because I think that's an area companies really have underinvested, and with our new cyber warfare world, it's going to become more more important that you're doing more active red teaming.”

15:21 Introducing Kasal

Databricks released Kasal, an open-source visual platform for building multi-agent AI workflows without writing orchestration code.
Users can drag and drop agents onto a canvas or describe workflows conversationally, and Kasal automatically generates the underlying CrewAI-based Python code.
Kasal runs natively on Databricks Apps with built-in OBO authentication, SQLite or Lakebase persistence, and MLflow tracing integration, meaning teams can move from visual design to production deployment with minimal additional configuration.
The platform supports both sequential and hierarchical agent modes, in which hierarchical workflows include a manager agent coordinating specialized subagents, useful for tasks such as generating customer-specific sales presentations by combining product and customer data pipelines.
Observability is handled at two layers: business users see execution timelines and workflow status in the Kasal frontend. At the same time, AI engineers can use MLflow tracing to debug LLM calls and agent behavior at a technical level.
Workflows built in Kasal can be exported as Python code for further customization, and reusable plans can be registered in a shared catalog, giving teams a path from low-code prototyping to production-grade pipelines without being locked into the visual interface.

15:48 Justin - “They didn’t mention security review; I just want to call that out.”

17:04 Code Review for Claude Code

Anthropic launched Code Review for Claude Code in research preview for Team and Enterprise plans, using a multi-agent system that dispatches parallel agents to find bugs, filter false positives, and rank issues by severity, delivering results as a single summary comment plus inline annotations on each PR.
Internal metrics show the system increased substantive review comments from 16% to 54% of PRs at Anthropic, with large PRs over 1,000 lines receiving findings 84% of the time, averaging 7.5 issues, and less than 1% of findings marked incorrect by engineers.
Reviews scale dynamically with PR complexity, averaging around 20 minutes per review, and are billed at roughly $15 to $25 per review, making this notably more expensive than the existing open-source Claude Code GitHub Action, which remains available as a lighter-weight alternative.
A practical example from TrueNAS shows the system surfacing a pre-existing type mismatch bug in adjacent code that was silently wiping an encryption key cache on every sync, the kind of latent issue outside the direct changeset that human reviewers typically would not investigate.
The system intentionally does not approve PRs, keeping humans in the decision loop. At the same time, admins on Team and Enterprise plans retain controls over spend and usage, positioning this as a depth-focused supplement to human review rather than a replacement.

18:15 Justin - “The COST of the review is really the biggest thing…definitely something that is a factor in all of these things.”

22:24 Meta acquires Moltbook, the AI agent social network

Meta acquired Moltbook, an AI agent social network built as a Reddit-style platform where every participant is an AI agent run by a human, with no direct human membership.
The founders will join Meta Superintelligence Labs, though deal terms were not disclosed.
Meta specifically called out Moltbook's "always-on directory" approach for connecting agents as a novel development, suggesting the acquisition is focused on agent discovery and coordination infrastructure rather than the social network concept itself.
Moltbook was built on OpenClaw, an LLM coding agent wrapper that enables prompting via WhatsApp and Discord and supports deep local system access through community plugins.
OpenClaw's founder was separately hired by OpenAI in February, indicating both major AI labs are recruiting from the same open-source agent ecosystem.
For developers and businesses, the acquisition signals that agent-to-agent communication protocols and persistent agent directories are becoming areas of serious investment, which could influence how cloud-based agentic workflows are designed going forward.
A practical caveat worth noting: Moltbook lacked security controls to verify that all participants were actually AI agents, meaning some posts were likely written by humans posing as agents. This highlights that agent identity and authentication remain unsolved problems in agentic system design.

22:39 Justin - “We didn't really talk about Moltbook because we didn't want to talk about OpenClaw extensively, but basically, OpenClaw is a terrible way that you can run AI agents in a fully unsafe manner that accesses all of your personal data, and one of the things you could do is add a skill that would basically have it randomly post things onto MoltBook, which could include your bank accounts or security things if you're not careful in your security. And Meta buying this is just sort of the classic; it's a social network, and it could take us down, let's just take it off the market and kill it.”

Cloud Tools

23:58 GitHub Copilot coding agent for Jira is now in public preview

GitHub Copilot coding agent now integrates directly with Jira Cloud, allowing teams to assign Jira issues to Copilot and receive AI-generated draft pull requests in their connected GitHub repositories without leaving their existing workflow.
The agent works asynchronously and autonomously, analyzing issue descriptions and comments for context, implementing code changes, and posting status updates back in Jira, including asking clarifying questions when needed.
This integration targets common, repetitive tasks such as bug fixes and documentation updates and respects existing pull request review and approval rules, so teams do not need to change their governance processes.
Setup requires installing two marketplace apps, one from Atlassian and one from GitHub, and notably requires Jira Cloud with Rovo enabled alongside an active GitHub Copilot coding agent subscription, so there are meaningful prerequisite costs to consider.
The integration supports GitHub Data Residency customers across supported regions, which is a practical consideration for teams with data sovereignty requirements.

24:42 Ryan - “That’s interesting, because Rovo is Atlassian’s AI bot…I’m curious about why that’s required.”

26:09 The Pulse: Cloudflare rewrites Next.js as AI rewrites commercial open source

Cloudflare released vinext, a rewrite of Next.js that replaces Vercel's proprietary Turbopack build system with the standard Vite build tool, allowing Next.js applications to deploy to Cloudflare Workers with a single command and producing client bundles that are reportedly up to 57% smaller.
The project was completed by one engineer in one week, using approximately $1,100 in AI tokens via the OpenCode agent and Claude Opus 4.5, reducing what would traditionally have taken years of engineering to days. However, the result is explicitly experimental and not yet battle-tested at scale.
A key practical concern is that vinext covers 94% of the Next.js API surface, with roughly 67,000 lines of code, compared with Next.js's 194,000, meaning edge cases and security auditing remain outstanding before production use at any meaningful traffic level.
Cloudflare also released a migration agent skill that integrates with tools like Claude Code, Cursor, and Codex, allowing developers to run a single command to migrate an existing Next.js project to vinext, with compatibility checks, dependency installation, and config generation handled automatically.
The broader implication for cloud engineers is that comprehensive open-source test suites now serve as a blueprint for AI-assisted rewrites, which puts pressure on commercial open-source business models that rely on deployment lock-in rather than infrastructure, support, or community as their primary differentiators.

27:31 Ryan - “I feel like it's an awful precedent, right? Like, the whole point of open source is community collaboration, and this is directly in the face of that. Like, why would you release something open source if someone's just going to use an AI agent to create their own fork of it?”

31:58 Active defense: introducing a stateful vulnerability scanner for APIs

Cloudflare launched a beta Web and API Vulnerability Scanner focused initially on BOLA (Broken Object Level Authorization), which is the top threat in the OWASP API Top 10.
Unlike WAF rules that catch syntax-based attacks, BOLA involves valid authenticated requests that violate business logic, making them invisible to traditional defenses.
The scanner is stateful, meaning it builds an API call graph from your OpenAPI spec and chains requests together logically, creating resources as an owner and then attempting to access them as an attacker. This solves a core limitation of legacy DAST tools that evaluate each request in isolation and miss authorization flaws that span multiple API calls.
To handle ambiguous or inconsistent OpenAPI schemas, the scanner uses Cloudflare Workers AI, which runs OpenAI's gpt-oss-120b model with structured outputs to infer data dependencies between endpoints automatically. This removes the manual configuration burden that typically slows DAST tool deployment.
Credential security is handled by the HashiCorp Vault Transit Secret Engine, where credentials are encrypted immediately upon submission and decrypted only by the specific Rust worker executing the test. This is a notable design choice, given that vulnerability scanners, by definition, need access to valid API credentials.
The scanner is now available in open beta for API Shield customers via the API, allowing teams to trigger scans and pull results into CI/CD pipelines or security dashboards.
Cloudflare plans to extend coverage to OWASP Web Top 10 threats like SQLi and XSS in future releases.

33:22 Ryan - “This is super cool. This is the AI-enhanced security scanning I’ve been waiting for.”

AWS

34:43 Amazon plans 'deep dive' internal meeting to address outages

Amazon's retail site experienced four Sev 1 outages in a single week, including a six-hour checkout and account access failure on March 5, prompting an internal deep-dive meeting led by SVP Dave Treadwell to review the availability posture.
An internal document initially cited GenAI-assisted changes as a contributing factor to a trend of incidents since Q3.
Still, that reference was removed before the meeting, and Amazon later clarified that only one incident involved AI and none involved AI-written code.
Amazon is implementing new safeguards that require additional review of GenAI-assisted production changes, with Treadwell acknowledging that best practices for using generative AI in production environments have not yet been fully established.
A separate AWS outage in December was linked to the Kiro AI coding tool. However, Amazon attributed that incident to user error rather than the AI itself, highlighting an ongoing pattern of questions around AI tooling in production deployments.
With Amazon projecting $200 billion in capital expenditures this year while simultaneously reducing its workforce by tens of thousands, the reliability of AI-assisted development workflows becomes a practical concern for any organization adopting similar tooling at scale.

36:36 Ryan - “Hold on to your butts, but we’re going to see a lot more of this.”

39:00 Database Savings Plans now supports Amazon OpenSearch Service and Amazon Neptune Analytics

Database Savings Plans now cover Amazon OpenSearch Service and Amazon Neptune Analytics, offering up to 35% savings with a one-year commitment and no upfront payment required.
The plans apply automatically across serverless and provisioned instances regardless of engine, instance family, size, or region, so customers can switch instance types like moving from m7i.large.search to c8g.2xlarge.search without losing their discount.
This expansion is useful for organizations running search or graph analytics workloads at scale, since Neptune Analytics and OpenSearch can carry substantial hourly costs that benefit from committed-use pricing.
Customers can use the Savings Plans Purchase Analyzer in the AWS Billing and Cost Management Console to model custom scenarios before committing, which reduces the guesswork in sizing a commitment.
Available now in all AWS regions except China.
Pricing details are available here.

39:34 Justin - “Finally. Thank you.”

40:54 AWS Elastic Beanstalk now offers AI-powered environment analysis

AWS Elastic Beanstalk now integrates with Amazon Bedrock to provide AI-powered analysis of environment health issues, automatically collecting events, instance health data, and logs to generate step-by-step troubleshooting recommendations without manual log review.
The feature is triggered from the Elastic Beanstalk console via an AI Analysis button when environment health reaches Warning, Degraded, or Severe status, and is also accessible programmatically through the existing RequestEnvironmentInfo and RetrieveEnvironmentInfo CLI and API operations.
This is a practical addition for teams managing Beanstalk environments who want to reduce mean time to resolution, particularly useful for developers who may not have deep operational expertise in diagnosing platform-level issues.
Availability is limited to regions where both Elastic Beanstalk and Amazon Bedrock are supported, so teams in regions without Bedrock coverage will not have access, and AWS has not published specific pricing details for this feature beyond standard Beanstalk and Bedrock usage costs.
This continues a broader AWS pattern of embedding Bedrock-powered assistance into existing managed services, similar to features seen in other consoles, positioning AI-assisted operations as a standard capability rather than a standalone product.

41:55 Matt - “I will say troubleshooting Beanstalk is a pain in the butt. It just says ‘degraded’ and you’re like ‘why’? And at one point, I had an issue with Beanstalk where it needed a specific CloudWatch put metric in order to do it; it got to the point I opened a support case, and asked AWS why it wasn't working. And they're like, here's this - buried 17 pages into… so I can definitely see it being useful.”

43:13 Introducing Amazon Connect Health, Agentic AI Built for Healthcare

Amazon Connect Health is now generally available, offering five purpose-built AI agents targeting healthcare administrative workflows, including patient verification, appointment scheduling, ambient documentation, patient insights, and medical coding with ICD-10 and CPT code generation.
The service is HIPAA-eligible and integrates natively with Amazon Connect, allowing contact center and point-of-care workflows to be configured in minutes rather than months, which is a notable deployment speed advantage for healthcare IT teams.
The two GA agents (patient verification and ambient documentation) are ready for production use today, while appointment management, patient insights, and medical coding remain in preview, so organizations should plan adoption timelines accordingly.
Point-of-care capabilities like ambient listening and medical coding are accessible via a unified SDK, letting developers embed these features directly into existing EHR systems rather than requiring a full platform migration.
The service is currently limited to US East (N. Virginia) and US West (Oregon), and AWS has not published specific pricing details publicly, so healthcare organizations will need to engage AWS directly to understand cost structures before planning deployments.

43:45 Justin - “This is a great example of a really purpose-built AI that has a specific use case, and I’d almost rather talk to the AI at any time of the day that can book my appointment rather than waiting for the office to open during the day when I’m busy.”

27:58 Amazon Lightsail now offers OpenClaw, a private self-hosted AI assistant

Amazon Lightsail now supports deploying OpenClaw, a self-hosted AI assistant that runs on your own Lightsail instance, giving users a private alternative to cloud-based AI services where data stays within their own infrastructure.
The offering includes several built-in security features out of the box: sandboxed agent sessions, one-click HTTPS without manual TLS setup, device pairing authentication, and automatic configuration snapshots, reducing the typical operational overhead of self-hosting AI tools.
Amazon Bedrock serves as the default model provider, which ties this directly into the broader AWS AI ecosystem, though users can swap models or connect to messaging platforms like Slack, Telegram, WhatsApp, and Discord for different workflows.
Pricing follows standard Lightsail instance pricing rather than a separate AI-specific cost structure, which may make this appealing for small teams or developers who want predictable monthly costs; check the Lightsail pricing page at aws.amazon.com/lightsail/pricing for current instance rates.
The feature is available across 15 AWS Regions, including US East, US West, Frankfurt, London, Tokyo, and Jakarta, and can be accessed directly from the Lightsail console with quick start documentation available for getting up and running quickly.

44:46 Justin - “If you want to try it (OpenClaw) and you can’t get a Mac Mini because everyone is buying them for their OpenClaw implementations, Amazon Lightsail now supports (it).”

47:22 Amazon OpenSearch Ingestion now supports a unified ingestion endpoint for OpenTelemetry data

Amazon OpenSearch Ingestion now accepts logs, metrics, and traces through a single unified pipeline endpoint, eliminating the previous requirement to run three separate pipelines for each OpenTelemetry signal type.
The consolidation reduces operational overhead around access control, monitoring, and lifecycle management, which translates to lower infrastructure costs for teams running observability at scale.
A practical benefit is incremental OpenTelemetry adoption: teams can start with one signal type and add others later without reconfiguring the pipeline, lowering the barrier to getting started.
Signal correlation becomes more straightforward when all three data types flow through a centralized pipeline, giving teams a more complete view of application health in one place.
The unified endpoint is available now in all regions where Amazon OpenSearch Ingestion is supported, and customers can configure it through the AWS Management Console or CLI.
Pricing follows existing OpenSearch Ingestion rates based on Ingestion OCUs, so no new cost model is introduced.

47:54 Ryan - “I mean, at the ingestion layer? I don’t know. Because this is really at the logs- equivalent…”

48:27 Announcing the end-of-support for the AWS Copilot CLI

AWS Copilot CLI reaches end of support on June 12, 2026, meaning it will no longer receive new features or security updates, though it remains available as an open-source project on GitHub.
AWS recommends two primary migration paths: Amazon ECS Express Mode for teams wanting a fast, opinionated path to production with automatic ALB, TLS, and auto-scaling provisioning, and AWS CDK L3 constructs for teams needing fine-grained infrastructure control in familiar programming languages.
ECS Express Mode is the closest functional replacement for Copilot's most common patterns, supporting shared Application Load Balancers across up to 25 services and eliminating the need to learn a custom manifest format.
Teams migrating Worker Services, Backend Services, and Scheduled Jobs have specific CDK construct equivalents available, including QueueProcessingFargateService for SQS-based workloads and ScheduledFargateTask for cron-based jobs.
Since Copilot uses standard CloudFormation under the hood, teams can also simply adopt the existing generated stacks and manage them directly, which represents the lowest-effort migration option for teams not ready to switch tooling.

49:26 Justin - “ I mean, yeah, this is kind of the first step into a fully managed world of ECS, and I remember when it came out we talked about it and was like, well, this is nice, but we really want what became Amazon ECS Express, and so they kind of deprecated themselves in their own way with better solution.”

51:04 Amazon Route 53 Global Resolver is now generally available

Amazon Route 53 Global Resolver is now generally available across 30 AWS Regions, expanding from the 11-region preview shown at re:Invent 2025, with support for both IPv4 and IPv6 DNS query traffic from any location.
The service functions as an internet-reachable anycast DNS resolver, allowing authorized clients in an organization to resolve both public internet domains and private Route 53 hosted zones without being tied to a specific network location.
Security filtering is a core capability, blocking malicious domains, DNS tunneling, Domain Generation Algorithms, and now with GA, Dictionary DGA threats, alongside centralized query logging for visibility across the organization.
This positions Global Resolver as a managed alternative to running your own DNS resolver infrastructure for distributed or remote workforces, reducing operational overhead while centralizing DNS policy enforcement.
New customers get a 30-day free trial to evaluate the service, with pricing details available here.

51:57 Ryan - “I both love and hate this. Having operated a global Anycast resolver, I know how much of a pain it is, and so I wouldn't want to set another one up, and I would gladly pay Amazon to do that. However, I don't know that they're removing the annoying parts. And you add more abstraction, I wonder, troubleshooting failed queries; that's going to be really difficult. And you have a lot more control when you control the network for these things, and so I'm very dubious about this one. But if it just works, then it'll probably be worth it.”

53:29 Automated deployments with GitHub Actions for Amazon ECS Express Mode

AWS published a walkthrough for connecting GitHub Actions to Amazon ECS Express Mode, automating the full pipeline from code commit to container deployment, including image builds, ECR pushes, and service updates without manual coordination.
The integration uses OIDC for authentication instead of stored AWS credentials, meaning GitHub Actions receives temporary credentials that expire after each workflow run, which reduces the risk surface compared to long-lived access keys sitting in repository secrets.
ECS Express Mode handles the infrastructure heavy lifting automatically, provisioning an ALB, target groups, health checks, auto scaling based on CPU, and security groups, so teams get a production-ready stack from a minimal workflow configuration.
Image tagging uses the first 7 characters of the git commit SHA, giving teams precise version traceability and a straightforward path to rollback by referencing a specific immutable image in ECS deployment history.
Costs are usage-based, covering ECS Fargate tasks, ECR storage, and data transfer, with no GitHub Actions charges for public repositories. The estimated setup time is 20 to 30 minutes, making this a relatively low-friction starting point for teams not yet running automated container deployments.

GCP

55:59 Introducing the Google Cloud recommended security checklist

Google Cloud published a recommended security checklist at docs.cloud.google.com/docs/security/gcmvsp, featuring 60 curated controls across six domains, including authentication, data protection, and network security, organized into Basic, Intermediate, and Advanced tiers.
The checklist is directly motivated by data from the 2025 Google Cloud Threat Horizons Report, which found that weak credentials and misconfigurations account for nearly 76% of cloud compromise (that’s a BIG number), making these controls particularly relevant for organizations assessing their baseline posture.
A companion Terraform repository on GitHub provides deployable code for the controls, moving the checklist beyond documentation into something teams can act on immediately and consistently.
The checklist is free to use and aligns with the open Minimum Viable Secure Product framework, meaning organizations can cross-reference it against existing compliance or vendor-neutral security standards they may already be tracking.
Early access customers reported being able to identify and activate critical controls in a single session, which suggests this is a practical tool for teams that need to establish or audit a security baseline without extensive prior GCP expertise.

56:52 Ryan - “So, your mileage may vary. Some of the code that they have in the solution requires really, really high privileges to run in your GCP environment, so it's one of those things where you might not be able to get that far with it unless you're administering the cloud directly. But it's definitely, I think, a lot of really good, useful things that you can then take… anything that allows people to focus on what they care about is pretty great.”

58:06 New agents for the Autonomous Network Operations framework

Google Cloud expanded its Autonomous Network Operations framework with two new components: the Autonomous Data Steward and the Core Network VoLTE Agent, both built on Gemini and targeted at telecom operators managing complex network infrastructure.
The Autonomous Data Steward addresses a core scaling problem by using a zero-copy architecture with Dataplex Universal Catalog to store metadata pointers instead of duplicating datasets, reducing storage costs by up to 70% while enabling real-time data access across previously siloed domains like RAN, Core, and Probes.
The VoLTE Agent builds on the Data Steward foundation to monitor voice quality metrics like Call Setup Success Rates and Mean Opinion Scores, correlate SIP and Diameter signaling data for root cause analysis, and recommend corrective actions like call routing adjustments without requiring manual intervention.
New Zealand telecom provider One NZ is already deploying the VoLTE Agent in production, which gives this announcement a concrete, real-world validation point rather than remaining purely a proof-of-concept offering.
Google and Future Connections have open-sourced the core methodologies behind these agents, allowing telecom operators to build and customize their own agentic workflows; interested parties need to contact their Google Account Team for early access, and pricing is not publicly listed.

58:39 Justin - “This is all a lot of stuff for TelCo’s, but it’s cool, if you’re into geeky TelCo things, check it out.”

59:24 NotebookLM adds Cinematic Video Overviews

NotebookLM's Cinematic Video Overviews moves beyond static narrated slides to generate fluid animations and detailed visuals from user-provided sources, using a combination of Gemini 3 and Veo 3 models working together.
Gemini functions as a creative director in this pipeline, handling narrative structure, visual style selection, format decisions, and self-refinement passes to maintain consistency across the generated video.
This is a consumer-facing AI feature rather than a direct GCP infrastructure offering, but it demonstrates practical multi-model orchestration that GCP customers building their own AI pipelines may find instructive.
Availability is currently limited to English-language users on web and mobile who subscribe to Google AI Ultra, which is priced at $249.99 per month, and is restricted to users 18 and older.
The primary use cases center on education and knowledge synthesis, where users can transform documents, research, or other sources into video summaries, which could be relevant for training content, internal documentation, or learning platforms built on Google's ecosystem.

1:00:21 Justin - “A little bit pricey to replace all the YouTubers, but coming soon.”

1:01:14 Gemini Embedding 2: Our first natively multimodal embedding model

Gemini Embedding 2 is now in Public Preview via the Gemini API and Vertex AI, marking Google's first natively multimodal embedding model built on the Gemini architecture. It maps text, images, video up to 120 seconds, audio, and PDFs into a single unified embedding space across 100-plus languages.
A notable technical detail is that audio is embedded natively without requiring intermediate transcription, which removes a common pipeline step that previously added latency and potential accuracy loss in multimodal workflows.
The model uses Matryoshka Representation Learning to support flexible output dimensions scaling down from a default of 3072, with recommended sizes of 3072, 1536, and 768.
This lets developers trade off retrieval quality against storage and compute costs depending on their use case.
Interleaved multimodal input, such as combining an image and text in a single request, allows the model to capture relationships between media types rather than treating each modality independently.
This is particularly relevant for RAG pipelines, semantic search, and data clustering applications.
Integration is available through LangChain, LlamaIndex, Haystack, Weaviate, QDrant, ChromaDB, and Vertex AI Vector Search, meaning teams can adopt this model without significant changes to existing tooling.
Pricing details are not specified in the announcement, so listeners should check the Vertex AI pricing page directly before planning production workloads.
Interested in checking out that demo? Find it here.

1:02:29 Ryan - “I go back and forth on these multimodal, because I feel like there's so much bloat and we use the wrong model for so many use cases, and I feel like the multimodal is a really good way to do that. So it is interesting, I just haven't seen a use case where I would see a whole lot of benefit of being able to sort of use the multimodal model to get an answer out of an LLM that I wouldn't be able to get using other tools.”

1:03:28 Google shares Gemini updates to Docs, Sheets, Slides and Drive

Google is rolling out beta updates to Gemini across Docs, Sheets, Slides, and Drive that allow the assistant to pull context from a user's own files, emails, calendar, and the web when generating or editing content.
This cross-source grounding is the core technical shift here, moving Gemini from a generic assistant to one that works with personal data.
In Docs, new features include style matching across a document and format matching against a reference file, so Gemini can populate a travel itinerary template using flight and hotel details pulled directly from a user's Gmail inbox. This kind of structured extraction from unstructured personal data is worth noting for enterprise use cases.
Sheets gets a "Fill with Gemini" capability that lets users drag down a column and have Gemini populate cells with real-time web data or summarized content, similar to how a formula works but using natural language and live search results.
This could be useful for research-heavy workflows like competitive analysis or application tracking.
Drive gains an AI Overview feature in search results that summarizes relevant file contents with citations before a user even opens a document, plus a new Ask Gemini panel for querying across documents, emails, and calendar simultaneously.
Availability is limited to Google AI Ultra and Pro subscribers at google.com/intl/en/about/google-ai-plans, with English-only support globally for Docs, Sheets, and Slides, and U.S.-only for Drive. Workspace business customers have a separate path through the Google Workspace blog.

1:04:21 Justin - “So if you’re in the Google workspaces places, you’ve not got basically what Copilot gave you, but better.”

Azure

1:05:29 Azure Databricks Lakebase is Generally Available

Azure Databricks Lakebase is now generally available as a managed, serverless Postgres offering that stores operational data directly in lakehouse storage, eliminating the need for ETL pipelines between transactional systems and analytics workloads.
The service separates compute from storage and scales to zero when idle, with usage-based pricing meaning customers pay only for compute consumed. Specific pricing details are not published in the announcement, so listeners should check the Azure Databricks pricing page for current rates.
Lakebase integrates with Unity Catalog, giving teams a single governance layer covering operational, analytical, and AI workloads with consistent access control, lineage tracking, and auditing across the entire Databricks data estate.
Developers get instant zero-copy branching and point-in-time recovery, allowing teams to test schema changes or debug against production data without affecting live users or requiring duplicate infrastructure.
The service supports standard Postgres tooling, including pgAdmin, DBeaver, pgvector for AI search, and PostGIS for geospatial use cases, and integrates with Microsoft Entra ID and Azure networking, making it a practical option for teams already invested in the Azure ecosystem.
Cool. Glad to have another database available.

1:07:17 Copilot Cowork: A new way of getting work done

Copilot Cowork is a new Microsoft 365 feature that moves Copilot beyond answering questions into actually executing multi-step work tasks, such as rescheduling calendar conflicts, building meeting packets, and coordinating product launch assets across Outlook, Teams, and Excel.
The feature is powered by Work IQ, which pulls signals from across Microsoft 365 apps to give Copilot contextual understanding of your work before taking action, with user-controlled checkpoints to approve, pause, or modify tasks before changes are applied.
A notable technical detail is that Cowork integrates Claude from Anthropic alongside Microsoft's own models, reflecting a multi-model approach where Copilot selects the most appropriate model for a given task rather than relying on a single provider.
Enterprise governance is built in by default, with identity, permissions, and compliance policies applied automatically, and all actions running in a sandboxed cloud environment that keeps tasks progressing safely across devices.
Cowork is currently in Research Preview with limited customers and will expand to the Frontier program in late March 2026, with no public pricing details announced yet, so organizations interested in early access should check the Frontier here.

57:31 Introducing the First Frontier Suite built on Intelligence + Trust

Microsoft announced Microsoft 365 E7: The Frontier Suite, available May 1 at $99 per user, bundling Microsoft 365 E5, Microsoft 365 Copilot, and the new Agent 365 into a single SKU that includes Entra Suite, Defender, Intune, and Purview capabilities.
Agent 365, also generally available May 1 at $15 per user, functions as a control plane for AI agents, giving IT and security teams a single interface to observe, govern, and secure agents across the organization.
Microsoft reports visibility into over 500,000 internal agents as Customer Zero, generating 65,000 daily responses in the past 28 days.
Wave 3 of Microsoft 365 Copilot introduces model diversity by adding Anthropic Claude to mainline chat alongside OpenAI models, and includes a research preview of Copilot Cowork for long-running multi-step tasks built in collaboration with Anthropic.
The concept of Work IQ is central to this announcement, positioning Microsoft 365 Copilot as differentiated from generic model-plus-connector solutions by embedding organizational context about how people work, who they work with, and what content they use.
Adoption metrics cited include paid Copilot seats growing over 160% year over year, daily active usage up ten times, and the number of customers deploying more than 35,000 seats tripling year over year, with 90% of Fortune 500 companies now using Copilot in some capacity.

1:10:54 Ryan - “This is interesting; I know, in evaluations and talking to people from different companies, when they were rolling this out originally - I think it was something like 30 or 50 bucks a user, no one wanted to pay that price. And there was a minimum number of users. So it was a large amount of money.”

Oracle

1:12:29 Introducing OCI’s Cost Anomaly Detection

Oracle launched OCI Cost Anomaly Detection as a no-cost feature that uses machine learning to monitor daily cloud spend across all services and regions, alerting users when costs deviate from forecasted baselines.
This is a welcome addition, given that most cloud providers offer similar capabilities, with AWS and Azure having had comparable tools for some time.
The ML model accounts for daily, weekly, yearly, and holiday seasonality patterns, and users can provide feedback to improve accuracy and reduce false positives.
Custom cost monitors can be scoped by compartment or tags, which gives teams reasonable flexibility for environment or application-level tracking.
Alert thresholds can be configured as absolute dollar amounts or percentage variances, which helps reduce alert noise by focusing notifications on anomalies that actually exceed meaningful cost boundaries. This is a practical design choice that avoids the common problem of alert fatigue in cost monitoring tools.
Default monitors are automatically created at the tenancy, service, and region level, meaning customers get baseline coverage without any configuration, though teams with complex multi-compartment environments will likely need to invest time building custom monitors to get a genuinely useful signal.
The feature is free, which removes the awkward situation of paying for a tool designed to help you avoid overspending, though the real value depends on how accurately the forecasting model performs in practice, something Oracle has not provided specific benchmark data on in this announcement.

1:12:42 Justin - “This has been at every other cloud forever, so…”

1:13:24 Oracle Announces Fiscal Year 2026 Third Quarter Financial Results

Yeah, we know. They report at weird times.
Oracle reported Q3 fiscal 2026 total revenue of $17.2 billion, up 22% year-over-year, with cloud revenue specifically hitting $8.9 billion, a 44% increase, marking the first quarter in over 15 years where both organic revenue and non-GAAP EPS grew at 20% or more simultaneously.
The Remaining Performance Obligations figure of $553 billion, up 325% from last year, is the headline number worth scrutinizing, as Oracle notes most of this growth comes from large-scale AI contracts funded either through customer prepayments for GPU purchases or customer-supplied hardware, which is a notably different model than traditional cloud commitments.
Oracle raised $30 billion in debt and equity financing within days of announcing a $50 billion capital raise program, with the proceeds tied to funding infrastructure for AI training and inferencing capacity, and the company is projecting $50 billion in capital expenditures for fiscal year 2026.
Oracle is openly stating it has restructured product development teams into smaller groups due to AI code generation tools, framing this as a cost reduction and productivity improvement for SaaS development, though the workforce implications of building more software with fewer people deserve attention.
The company raised fiscal year 2027 total revenue guidance to $90 billion, up from prior estimates, while maintaining fiscal year 2026 guidance of $67 billion, suggesting Oracle is betting heavily that AI infrastructure demand will remain supply-constrained and that its cloud positioning will capture a meaningful share of that spending.

1:14:47 Justin - “That’s a pretty good bet, so I get it. I also think Oracle is kind of lucking into the multi-cloud…because people are having to adopt Oracle cloud to get the capacity they need.”

After Show

57:31 Xbox surprise: Microsoft reveals 'Project Helix' as the codename of its next console

Microsoft revealed the codename Project Helix for its next-generation Xbox console, confirmed by new Xbox CEO Asha Sharma, who recently replaced Phil Spencer after his 38-year tenure at Microsoft.
The announcement is notable given persistent industry speculation that Microsoft might exit the console hardware business entirely, suggesting the gaming division intends to continue through at least one more console generation.
Project Helix is described as leading in performance and supporting both Xbox and PC games, continuing the cross-platform compatibility direction Microsoft has pursued in recent years.
A current RAM shortage driven by AI data center demand is affecting the broader hardware supply chain, potentially pushing the console's release beyond the initially rumored late-2027 window, which is a direct example of how AI infrastructure buildout creates ripple effects across other tech sectors.
For cloud professionals, this is worth watching because Xbox hardware increasingly ties into Microsoft's cloud gaming and Game Pass ecosystem, meaning console generation transitions have implications for Azure-based gaming services and infrastructure planning.

Closing

345: Damn It… my excuse is now gone for Disaster Recovery

Thu, 12 Mar 2026 00:43:12 +0000

Welcome to episode 345 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio this week and are ready to bring you all the latest in cloud and AI news, including what’s going on between Anthropic, the DOD, and OpenAI, what the war means for Middle East data centers (Spoiler – I hope you have a good Disaster Recovery plan), and Transit Gateway pricing changes that are enough to make a grown man cry. And don’t bother waiting: Matt has completely forgotten almost two years of “bye everybody” and now claims full amnesia as to what his outtro is. Oh well. Let’s get into today’s show.

Titles we almost went with this week

Claude Learned to Use a Computer Better Than Your Dad **OpenAI
Amazon and OpenAI’s $138 Billion AI Bromance
When Two AZs Go Dark the Cloud Gets Crispy
Fifty Billion Reasons AWS Loves OpenAI Now **Anthropic
Azure Still Wins Even When AWS Thinks It Did
Fire, Water, and a Multi-AZ Assumption Goes Up in Smoke
Claude Refuses to Go Full Skynet for the Pentagon
GPT-5.3 Instant Finally Stops Lecturing You
No Killer Robots Without Human Approval Please
Terraform Finally Sees Your Forgotten Cloud Resources
Stage Before You Rage Deploy Azure Firewall
CrowdStrike to Zscaler AWS Wants Your Security Tab
One Hub to Rule Your API Sprawl
Transit Gateway Attachments Just Got Surprisingly Expensive
Azure Container Registry Finally Has Room for Your AI Hoarding
Bedrock Gets a Roommate OpenAI Moves In
Azure Firewall Gets a Safety on the Trigger
Stop Writing Scripts, Just Import the Dang Infrastructure
Audit Your APIs Before March 2026 Bites You
Damn it… my excuse not to DR is gone
I’m Epically Furious about DR

AI Is Going Great – Or How ML Makes Money

03:34 Anthropic acquires Vercept to advance Claude’s computer use capabilities

Anthropic acquired Vercept, a team specializing in AI perception and interaction, to strengthen Claude’s computer use capabilities.
The Vercept founders, including Ross Girshick, bring deep expertise in how AI systems visually interpret and interact with software interfaces.
Claude Sonnet 4.6 shows substantial improvement in computer use benchmarks, jumping from under 15% on the OSWorld evaluation in late 2024 to 72.5% today.
The model is now approaching human-level performance on tasks like navigating spreadsheets and completing multi-tab web forms.
Computer use enables Claude to operate inside live applications the way a human would, handling multi-step workflows across tools that cannot be automated through code alone.
This is relevant for enterprise use cases involving document processing, browser-based workflows, and cross-application task management.
This is Anthropic’s second acquisition in a short period, following the purchase of Bun, which was tied to the Claude Code milestone. The pattern suggests Anthropic is actively acquiring specialized engineering teams rather than just technology assets.
For developers and businesses building agentic workflows on Claude, the improved computer use performance means more reliable automation of complex, real-world software tasks without requiring custom integrations or APIs for every application involved.

05:18 Justin – “It seems like every day I have to update Claude Code because they released a new feature or a new capability.”

12:34 Improving skill-creator: Test, measure, and refine Agent Skills

Anthropic has updated its skill-creator tool for Claude Agent Skills, now available on Claude.ai, Cowork, and as a plugin for Claude Code.
The update brings software development practices like testing, benchmarking, and iterative refinement to skill authoring without requiring users to write code.
The core addition is an eval framework that lets skill authors define test prompts, describe expected outputs, and verify skill behavior across model updates.
A practical example given is the PDF skill fix, where evals isolated a positioning failure on non-fillable forms and guided a targeted fix.
A new benchmark mode tracks eval pass rate, elapsed time, and token usage, and can be integrated into CI systems or local dashboards. Multi-agent parallel eval execution is also included to reduce test time and prevent context bleed between runs.
Comparator agents enable A/B testing between two skill versions or between a skill and no skill, with blind judging to reduce bias in assessing whether a change improves output quality.
Anthropic notes that as base-model capabilities improve, some capability-uptake skills may become unnecessary, and the eval framework is positioned as a step toward skills being defined by natural-language descriptions of desired outcomes rather than detailed implementation instructions.

13:54 Justin – “For things that are actually in pipelines or agentic capabilities where you want things to be specific, this is great.”

14:35 Statement on the comments from Secretary of War Pete Hegseth

Anthropic has publicly refused to allow Claude to be used for mass domestic surveillance of Americans or fully autonomous weapons, citing concerns about current AI reliability and civil liberties.
These two exceptions led to a breakdown in negotiations with the Department of War after months of discussions.
The Department of War is moving to designate Anthropic as a supply chain risk under 10 USC 3252, a designation Anthropic states would be the first time applied to a US adversary. Anthropic has indicated it will challenge any such designation in court.
From a practical standpoint, the legal scope of a supply chain risk designation is narrow. It would only affect the use of Claude on Department of War contract work, leaving commercial API customers, Claude.ai users, and non-DoW contractor use cases completely unaffected.
This situation raises a broader question for cloud and AI vendors about the terms under which they can negotiate acceptable use policies with government customers.
The outcome could set a precedent for how American companies handle government contracts that conflict with their own usage restrictions.
Anthropic notes it has been deployed in US government classified networks since June 2024, making this dispute notable for the AI industry as more frontier model providers pursue federal contracts through programs like FedRAMP and classified cloud environments.

Statement from Dario Amodei on our discussions with the Department of War

Anthropic has publicly refused the Department of War’s requests to remove two specific safeguards from Claude: restrictions on mass domestic surveillance use cases and on fully autonomous weapons systems.
This is notable because Anthropic was already the first frontier AI company to deploy models in US classified networks, National Laboratories, and custom national security configurations.
The Department of War has threatened to label Anthropic a “supply chain risk,” a designation previously reserved for US adversaries, and to invoke the Defense Production Act to force removal of these safeguards. Anthropic notes that these two threats are contradictory since one frames Claude as a security risk while the other frames it as essential to national security.
The autonomous weapons position has a specific technical basis: Anthropic states current frontier AI systems lack sufficient reliability for fully autonomous target selection and engagement, and they offered to collaborate with the Department on R&D to improve reliability, an offer that was not accepted.
For cloud and enterprise listeners, this situation establishes a precedent in which an AI provider publicly declines government contract terms on safety grounds rather than on commercial grounds, with direct implications for how AI vendors structure acceptable use policies in high-stakes government and defense cloud deployments.
Anthropic has indicated it will support a smooth transition to another provider if offboarded, signaling that continuity planning for AI-dependent military operations is now a real operational consideration for defense cloud infrastructure teams.

Our agreement with the Department of War

OpenAI signed a classified AI deployment agreement with the Pentagon using a cloud-only architecture, meaning models run on OpenAI infrastructure rather than on edge devices or government-controlled hardware, which is central to how they enforce their safety constraints.
The agreement includes three stated red lines: no mass domestic surveillance, no directing autonomous weapons systems, and no automated high-stakes decisions without human approval.
OpenAI retains full control of the safety stack and has cleared personnel embedded with the deployment.
The cloud-only deployment model is the key technical differentiator here. By keeping models off edge devices, OpenAI argues it can run and update classifiers independently to verify red lines are not crossed, which would not be possible with on-premise or edge deployments.
The contract language locks in current surveillance and autonomous weapons laws as the standard, meaning even if those laws or DoD policies change in the future, usage must still comply with the standards in place at signing. This is a notable contractual mechanism for maintaining guardrails over time.
OpenAI requested that the same contract terms be made available to all AI labs, including Anthropic, framing this as an attempt to establish a consistent baseline for how the government engages with frontier AI providers on classified work.

21:04 Justin – “The precedent that could be set, potentially, that the government can declare any vendor they want to a supply chain risk feels like it’s gonna violate several amendments to the Constitution…”

New Model Section

21:38 Gemini 3.1 Flash Lite: Our most cost-effective AI model yet

Google launched Gemini 3.1 Flash-Lite in preview, available through the Gemini API in Google AI Studio and Vertex AI, priced at $0.25 per million input tokens and $1.50 per million output tokens, positioning it as a cost-focused option for high-volume workloads.
Compared to 2.5 Flash, the new model delivers 2.5x faster Time to First Answer Token and 45% higher output speed according to Artificial Analysis benchmarks, while scoring 86.9% on GPQA Diamond and 76.8% on MMMU Pro.
The model includes configurable thinking levels, letting developers dial reasoning depth up or down depending on task complexity, which is useful for balancing cost and quality across different workload types.
Practical use cases highlighted include high-volume content moderation, translation, UI generation, and real-time dashboard creation, with early adopters like Latitude, Cartwheel, and Whering already using it in production.
For GCP customers running inference at scale, the combination of low per-token pricing and higher throughput speed makes this a practical option to evaluate against existing model choices in Vertex AI pipelines.

22:09 Google reveals Nano Banana 2 AI image model, coming to Gemini today

Google has released Nano Banana 2, technically named Gemini 3.1 Flash Image, which replaces both the standard and Pro variants of the previous Nano Banana model across Gemini, AI Studio, Vertex AI, and Flow simultaneously.
The model draws on Gemini 3.1 LLM web knowledge to improve object fidelity and infographic accuracy, and Google claims it delivers text rendering quality comparable to the previous Pro tier at Flash-tier speeds.
For developers building multi-character or complex scene workflows, the model supports consistent rendering of up to five characters and up to 14 distinct objects per workflow, with expanded output options ranging from 512px square to 4K widescreen.
The full replacement of prior Nano Banana variants means GCP customers on Vertex AI have no migration choice here, so teams relying on the previous Pro model for production workloads should validate outputs against the new model promptly.
Pricing details were not disclosed in the announcement, so Vertex AI customers should check the Vertex AI pricing page directly for updated image generation costs tied to the Gemini 3.1 Flash Image model.

22:32 Justin – “I’m excited to plug this one into our show cover generator; I’ve been using Nano Banana 1, and if you’ve checked out our show covers lately, you’ve noticed they’ve become fun cartoons based on our show titles.”

22:54 GPT-5.3 Instant: Smoother, more useful everyday conversations

OpenAI released GPT-5.3 Instant as the new default model in ChatGPT, available to all users today and to developers via the API as gpt-5.3-chat-latest, with GPT-5.2 Instant remaining available for paid users until June 3, 2026.
The update targets conversational quality issues that benchmarks typically miss, specifically reducing unnecessary refusals, moralizing preambles, and overly cautious responses that users flagged as frustrating in GPT-5.2 Instant.
Hallucination rates show measurable improvement: 26.8% reduction in high-stakes domains like medicine, law, and finance when using web search, and 19.7% reduction using internal knowledge only, based on OpenAI’s internal evaluations.
Web search integration is notably improved, with the model now balancing retrieved results against its own reasoning rather than defaulting to link lists, producing more synthesized and immediately usable answers.
Developers should note this is a drop-in update to the existing model endpoint, meaning applications using gpt-5.3-chat-latest will automatically get the improved behavior, which could affect any downstream applications that relied on the previous refusal or response patterns.

25:07 Matt – “Testing the models before you roll them out into production. One of the things… how do you actually test these models and prove they’re working? And a lot of customers and questionnaires all require measurable statistics.”

AWS

27:58 Amazon DC Impacted in Operation Epic Fury

Two simultaneous outages hit AWS Middle East regions on March 1-2, with ME-CENTRAL-1 (UAE) suffering physical fire damage to a data center that knocked out two of three availability zones, and ME-SOUTH-1 (Bahrain) experiencing a localized single-AZ power failure.
The UAE incident demonstrated a critical edge case where S3, normally resilient to single-AZ loss, began failing for ingest and egress once a second AZ went down, highlighting that multi-AZ redundancy assumptions break down when two zones are simultaneously unavailable.
Recovery timelines extended beyond 24 hours in both regions due to the need for physical facility repairs, cooling system restoration, and coordination with local authorities, underscoring that some failure modes fall outside software-level remediation.
AWS recommended customers failover to EU regions for ME-CENTRAL-1 workloads, restore from EBS snapshots in unaffected regions, and use the allow-reassociation flag to migrate Elastic IPs to healthy AZs, which are standard DR playbook steps that many customers may not have pre-tested.
This incident is a practical reminder that multi-AZ deployments alone are insufficient for high-availability requirements in smaller regions with fewer AZs, and that cross-region DR plans with tested failover procedures are necessary for critical workloads.
Directly from Status Page: Due to the ongoing conflict in the Middle East, both affected regions have experienced physical impacts to infrastructure as a result of drone strikes. In the UAE, two of our facilities were directly struck, while in Bahrain, a drone strike in close proximity to one of our facilities caused physical impacts to our infrastructure. Finally, even as we work to restore these facilities, the ongoing conflict in the region means that the broader operating environment in the Middle East remains unpredictable. We recommend that customers with workloads running in the Middle East consider taking action now to back up data and potentially migrate your workloads to alternate AWS Regions

29:38 Justin – “This is a real big deal because as our show title said tonight… DR is going to become a real big deal now. If you’re in the business where you need to host data for other customers across the globe, your job just got a lot harder.”

37:26 Amazon invests $50B in OpenAI, deepens AWS partnership with expanded $100B cloud deal

Amazon is making a $50 billion investment in OpenAI as part of a $110 billion funding round that also includes SoftBank and NVIDIA, valuing OpenAI at $730 billion pre-money.
Separately, OpenAI and AWS are expanding their existing cloud agreement by $100 billion over eight years, which analysts estimate could add roughly $17 billion annually to AWS revenue.
A key technical component of the deal is OpenAI committing to consume 2 gigawatts of capacity on Amazon’s Trainium chips, giving AWS a high-profile validation of its in-house AI silicon at a scale that helps justify Amazon’s $200 billion capital expenditure plan for 2026.
AWS and OpenAI will co-create a Stateful Runtime Environment delivered through Amazon Bedrock, allowing enterprise customers to build AI agents that retain context and handle complex multi-step tasks, with AWS serving as the exclusive third-party cloud distribution provider for OpenAI Frontier.
Microsoft retains exclusivity over stateless OpenAI API calls, meaning simple one-and-done AI requests still route through Azure, while Amazon is positioning AWS as the infrastructure layer for stateful, context-aware, and agent-based workloads where the compute intensity and revenue potential are substantially higher.
Amazon also maintains its existing partnership with Anthropic, meaning AWS customers now have access to models from two of the leading AI labs, which broadens the options available through Bedrock without requiring customers to commit to a single model provider.

41:29 Justin – “I am more and more convinced every day that we are in an AI bubble. I do not see how they’re going to generate the revenues required to cover the capital investments that all of these cloud providers are making.”

43:18 AWS Security Hub Extended oﬀers full-stack enterprise security with curated partner solutions

AWS Security Hub Extended is a new plan that bundles curated third-party security tools from partners like CrowdStrike, Okta, Splunk, Zscaler, and Proofpoint directly into the Security Hub console, covering endpoint, identity, email, network, and cloud security in one place.
AWS acts as the seller of record for all partner solutions, meaning customers get a single consolidated bill, pre-negotiated pay-as-you-go pricing, and no long-term commitments, which removes the overhead of managing separate vendor contracts.
All security findings from both AWS native services and partner tools are normalized using the Open Cybersecurity Schema Framework (OCSF) and automatically aggregated in Security Hub, making cross-environment threat correlation more straightforward.
Enterprise Support customers get unified Level 1 support across all participating solutions, which reduces the friction of figuring out which vendor to contact when issues span multiple tools.
The Extended plan is generally available now across all commercial AWS regions where Security Hub is supported, with both consumption-based and flat-rate pricing options available at aws.amazon.com/security-hub/pricing.

44:11 Justin – “Thank you, Amazon. It’s only taken you 10 years to get to this point – because this is cool. Build partnerships with your security vendors, standardize the inputs, and make connections for those things so they all connect together, and if I can do all that through my cloud vendor, who I already have commitments with? I think that’s fantastic.”

Quick Hits

45:41 AWS announces pricing for VPC Encryption Controls

Just pricing BUT CRAZY
VPC Encryption Controls exits free preview on March 1, 2026, introducing a fixed hourly charge per non-empty VPC with the feature enabled in either monitor or enforce mode, with no charge for empty VPCs.
The feature offers two operational modes: monitor mode audits for unencrypted traffic flows, while enforce mode actively blocks resources that would allow unencrypted traffic within or across VPCs in a region.
A notable billing consideration is that enabling encryption support on a Transit Gateway triggers standard VPC Encryption Controls charges for all attached VPCs, regardless of their individual encryption mode setting, even if those VPCs are empty.
For compliance-focused organizations, this feature provides a centralized mechanism to audit and enforce encryption-in-transit across VPC traffic flows, which is a common requirement in regulated industries like finance and healthcare.
Customers should audit how many non-empty VPCs they plan to enable this on before March 1, 2026, and pay close attention to Transit Gateway attachment costs, as those charges can accumulate across a large number of attached VPCs. Detailed regional pricing is available on the VPC pricing page.

46:00 Matt – “Go cry a little bit.”

48:03 Policy in Amazon Bedrock AgentCore is now generally available

Policy in Amazon Bedrock AgentCore is now generally available, giving security and compliance teams a way to define and enforce tool access rules for AI agents without touching agent code, which is a meaningful separation of concerns for enterprise governance.
The natural language to Cedar conversion is a practical feature, letting non-developers author policies that automatically translate to the AWS open-source policy language, lowering the barrier for ops and compliance teams to participate in agent governance.
The AgentCore Gateway acts as an inline policy enforcement point, intercepting agent-tool traffic and evaluating each request before allowing or denying access, which mirrors familiar patterns from API gateway and service mesh architectures.
The feature is available across 13 AWS regions at launch, including major US, European, and Asia Pacific regions, giving organizations with data residency requirements reasonable coverage from day one.
Pricing details are not specified in the announcement, so teams evaluating this for production workloads should review the AgentCore pricing page and documentation at docs.aws.amazon.com/bedrock-agentcore/latest/devguide/policy.html before planning deployments.

49:27 Ryan – “I like the Cedar natural language processing, but I wonder how practical it is to write policies that allow agent-to-agent and tool communication.”

GCP

57:07 Combat API sprawl using Apigee API hub

Apigee API hub now integrates directly with API Gateway to automatically synchronize API definitions, OpenAPI specs, and gateway configurations in near real-time, giving platform teams a single control plane for APIs spread across multiple gateways and platforms.
The new specification boost add-on, currently in public preview, uses AI to scan API specs for gaps like missing usage examples or undefined error codes, then generates an enhanced parallel version labeled specboost-draft without overwriting the original, so teams can compare before adopting.
The core problem being addressed is that incomplete or undocumented APIs cause AI agents to fail at function calling or miss APIs entirely, so centralizing and enriching specs directly improves agent reliability in agentic workflows.
Both features are available now, with API Gateway users seeing onboarding prompts directly in the console.
Pricing details for the spec boost add-on are not specified in the announcement, so teams should check the Add-on management section of the API hub for current cost information.
Organizations running legacy specless proxies with no documentation stand to benefit most immediately, as the spec boost add-on can generate documentation for APIs that currently have none, making them visible to both developers and automated tools.

52:08 Matt – “Any undocumented API is always a problem, whether you’re using it or one team uses something they don’t know, or a client finds that should be a dark API that is public, and that always becomes a problem. So, a way to centralize that and kind of help address API sprawl in general is a great thing and will make people’s lives so much better.”

52:41 Improve chatbot memory using Google Cloud

Google Cloud’s polyglot storage approach for chatbot memory combines Memorystore for Redis, Cloud Bigtable, and BigQuery to handle short, mid, and long-term conversation history, respectively, addressing a common scaling challenge for conversational AI applications.
Memorystore for Redis handles the hot layer with sub-millisecond latency using Redis Lists and RPUSH commands, while Bigtable serves as the durable mid-term store using a user_id#session_id#reverse_timestamp key pattern to enable efficient range scans across millions of simultaneous sessions.
Bigtable’s garbage collection policies allow teams to retain only recent data, such as the last 60 days, in the high-performance tier, while older data flows asynchronously to BigQuery via Pub/Sub and Dataflow for archival and analytics without impacting live application performance.
Cloud Storage handles unstructured multimedia artifacts using a URI pointer strategy with signed URLs, keeping the primary databases lean while maintaining secure, time-limited access to files generated or uploaded during conversations.
This architecture is relevant to any team building production-scale agentic applications on Vertex AI Agent Builder, particularly in industries like customer service, healthcare, and financial services, where maintaining accurate long-term conversation context is a compliance or user experience requirement. Pricing varies across each component based on storage volume and query usage.
Ryan loves this almost as much as he loves The Eagles.

Quick Hits

55:42 Spanner columnar engine in preview

Spanner columnar engine is now in preview, adding columnar storage alongside traditional row-based storage to enable analytical query acceleration of up to 200x on live operational data without impacting transactional workloads.
This addresses the longstanding trade-off between OLTP and analytical performance in a single horizontally scalable system.
The engine uses vectorized execution to process data in batches rather than row-by-row, and Spanner automatically routes large-scan analytical queries to the columnar representation.
A new major compaction API also lets users manually trigger the conversion of existing data into columnar format.
A key use case is reverse ETL from Iceberg lakehouses, where processed analytical data from BigQuery, Databricks, Snowflake, or Oracle Autonomous AI Lakehouse gets loaded into Spanner for sub-second, high-concurrency serving. This targets scenarios like real-time dashboards, AI agent features, and user-facing applications that need low-latency access to precomputed insights.
The BigQuery integration is notably bidirectional, supporting federated queries via external datasets, reverse ETL pushes from BigLake Iceberg tables into Spanner, and live CDC streaming from Spanner back into BigQuery and BigLake Iceberg via Datastream. Oracle GoldenGate 26ai also now supports direct replication into Spanner.
The feature is available in preview and can be enabled on existing Spanner tables via a DDL change, with benchmark queries available on GitHub.
Pricing follows standard Spanner node pricing, with no separate cost structure announced for the columnar engine specifically.

55:52 Justin – “If you don’t know anything about columnar databases, you don’t know how cool that is.”

Azure

57:31 Announcing new public preview capabilities in Azure Monitor pipeline

Azure Monitor pipeline now supports TLS and mutual TLS for TCP-based ingestion endpoints in public preview, allowing teams to encrypt data in transit and enforce mutual authentication without relying on external proxies or custom gateways.
This is particularly relevant for regulated environments and edge deployments where plain TCP ingestion no longer meets security requirements.
The new execution placement configuration gives Kubernetes users direct control over how pipeline instances are scheduled across nodes, addressing practical problems like port exhaustion, multi-tenant isolation, and availability zone distribution.
Notably, if the cluster cannot satisfy placement rules, the pipeline simply will not deploy, making failures predictable rather than silent.
Data transformations allow teams to filter, aggregate, and normalize telemetry before it reaches Azure Monitor, including converting raw syslog or CEF messages into standardized schemas using KQL templates. This addresses the cost and complexity of ingesting high-volume noisy data and cleaning it up after the fact.
All three capabilities are in public preview today and target organizations running Azure Monitor pipeline on on-premises infrastructure, edge locations, and large Kubernetes clusters.
Pricing is not separately detailed for these features, so costs would follow existing Azure Monitor ingestion and data processing rates, which vary by volume.

58:38 Matt – “It’s their ETL pipeline service… that’s kind of why this is a big deal.”

59:43 Microsoft Sovereign Cloud adds governance, productivity, and support for large AI models securely running even when completely disconnected

Microsoft has expanded its Sovereign Cloud offering with three new capabilities targeting organizations that need to operate in fully disconnected environments: Azure Local disconnected operations, Microsoft 365 Local disconnected, and large model support in Foundry Local.
These are aimed at government, defense, and regulated industries where external connectivity may be intentionally restricted or prohibited.
Azure Local disconnected operations allow organizations to run infrastructure with Azure governance and policy controls without any cloud connectivity, meaning management and workload execution stay entirely within customer-operated environments. This is now generally available worldwide, though pricing is not publicly listed and would depend on hardware and licensing configurations.
Microsoft 365 Local disconnected brings Exchange Server, SharePoint Server, and Skype for Business Server into the sovereign private cloud boundary, with Microsoft committing support for these workloads through at least 2035. This extends productivity capabilities to teams operating in air-gapped or isolated environments without requiring a cloud connection.
Foundry Local now supports large multimodal AI models running on-premises using NVIDIA GPU infrastructure, enabling local inferencing entirely within customer-controlled data boundaries. This moves beyond the small model support Foundry Local previously offered and is currently available to qualified customers rather than broadly.
The overall architecture is designed to span connected, hybrid, and fully disconnected modes under a consistent governance model, which reduces the operational complexity of managing separate toolsets for different connectivity scenarios.
Organizations considering this stack should evaluate hardware requirements carefully, given the GPU dependencies for AI inferencing workloads.

57:25 Best Practice: Using Self-Signed Certificates with Java on Azure Functions

Winner of the dumbest feature of the week:

Java developers on Azure Functions Linux who connect to services secured by self-signed certificates frequently encounter SSL handshake errors because the JVM only trusts well-known Certificate Authorities by default. The recommended fix is creating a custom truststore in the persistent /home directory and pointing the JVM to it via JAVA_OPTS application settings.
The core reason to use /home for the truststore rather than system JVM directories is that the Linux Functions file system is ephemeral, meaning any changes outside /home are wiped on restart, scaling, or platform updates. Storing the keystore at a path like /home/site/wwwroot/my-truststore.jks ensures it survives those events.
One practical deployment gotcha worth noting is that ZipDeploy or Run From Package configurations can overwrite /home/site/wwwroot contents during code deployments, so storing the .jks file in a separate directory like /home/my-certs/ is a safer long-term choice.
Azure Functions Linux behaves differently from Azure App Service Linux in a notable way: App Service startup scripts often auto-import platform-managed certificates into the JVM keystore, but Functions does not, meaning OS-level tools like curl may succeed while Java code still throws handshake errors.
For teams that prefer not to manage server-side keystore files, two code-based alternatives exist: loading an Azure-managed certificate from /var/ssl/certs via custom SSLContext code, or bundling a locally built JKS file inside the application JAR. Both require application code changes, which adds maintenance overhead compared to the JAVA_OPTS approach.

1:03:46 Justin – “This is just a way for you to troubleshoot certificates even worse than you were troubleshooting it before.”

Quick Hits

1:05:19 Announcing general availability of Azure Intel® TDX confidential VMs

Azure has moved its Intel TDX confidential VMs to general availability, using 5th Gen Intel Xeon processors to provide hardware-enforced isolation that protects data while in use, which addresses a longstanding barrier for organizations running sensitive workloads in the cloud. Notably, existing applications can be deployed without any code changes.
The new VM series (DCesv6, DCedsv6, ECesv6, ECedsv6) introduces NVMe local SSD support as a first for Azure confidential VMs, delivering roughly 5x more throughput and about 16% lower latency compared to the previous SCSI generation, with IO latency reduced by approximately 27 microseconds.
These VMs are the first in Azure confidential compute to use the open-source OpenHCL paravisor, which increases transparency and allows customers to cryptographically verify workload integrity rather than simply trusting the cloud operator.
The open-source component is available at github.com/microsoft/openvmm.
Intel AMX acceleration is built in, making these VMs suited for confidential AI workloads such as protecting model weights and running cross-organization AI pipelines without exposing underlying data.
Azure Boost support adds up to 205k IOPS, 4 GB/s remote storage throughput, and 40 Gbps network bandwidth.
General availability is currently limited to the West US and West US 3 regions, with support for Windows Server 2025 and Ubuntu 22.04 and 24.04. Pricing is not specified in the announcement, and customers can request preview access in additional regions at aka.ms/acc/v6preview.

1:10:10 Generally Available: Draft & Deploy on Azure Firewall

Azure Firewall Policy now supports a two-phase Draft and Deploy workflow, meaning teams can stage policy changes before committing them, which reduces the risk of unintended disruptions during updates.
Previously, any policy change triggered a full firewall deployment, which could cause delays and service interruptions.
This feature separates the authoring phase from the deployment phase, giving teams more control over when changes go live.
The feature is particularly useful for organizations with strict change management processes, as it allows multiple edits to be batched and reviewed before a single deployment is executed, rather than deploying each change individually.
This is now generally available, so production workloads can rely on it. Azure Firewall Policy pricing remains consumption-based, and customers should check the Azure Firewall pricing page at azure.microsoft.com for current rates, as costs vary by policy tier and region.
Teams managing complex or high-traffic environments will benefit most, since reducing the frequency of full deployments directly translates to fewer maintenance windows and more predictable firewall behavior.

1:10:27 Azure Container Registry Premium SKU Now Supports 100 TiB Storage

Azure Container Registry Premium SKU now supports up to 100 TiB of storage, a 2.5x increase from the previous 40 TiB cap, with no configuration changes required for existing registries to benefit automatically.
The increase directly addresses a real operational pain point where enterprises were splitting workloads across multiple registries just to stay under limits, adding complexity to access control and networking that had nothing to do with actual business requirements.
AI and ML workloads are a clear driver here, as teams storing large model artifacts, training outputs, and inference containers were consuming registry capacity faster than anticipated, alongside normal container workload growth.
Microsoft also improved geo-replication data sync speeds for new replicas and added a storage consumption view in the Azure Portal Monitoring tab, two improvements that had been customer requests for some time.
The 100 TiB limit is exclusive to Premium SKU, so teams on Basic or Standard tiers will need to upgrade to access it, though Premium also includes geo-replication, private endpoints, and enhanced throughput.
Pricing details for Premium SKU storage are available at the Azure Container Registry pricing page.

1:10:47 Ryan – “So now instead of two windows container images you can store FOUR.”

1:13:37 New Azure API management service limits

Azure API Management is rolling out updated resource limits starting March 2026, aligning classic tier limits with v2 tier limits across entities like API operations, tags, products, and subscriptions. This affects all service tiers in a phased rollout over several months.
Existing classic tier customers whose usage exceeds the new limits will be grandfathered in, with their limits set 10% above observed usage at the time the new limits take effect.
New services and those under the new thresholds will be subject to the updated limits immediately.
Limit increase requests will only be considered for Standard, Standard v2, Premium, and Premium v2 tiers, with Premium customers receiving priority. Requests are evaluated case by case and are not guaranteed, so teams relying on high resource counts should audit their usage now.
Before requesting a limit increase, Microsoft recommends reviewing the Manage Resources Within Limits documentation at learn.microsoft.com, as some increases can introduce latency or affect service capacity.
This is a practical reminder that limits exist to protect shared infrastructure performance, not just to restrict usage.
Pricing for API Management tiers varies, with the Developer tier starting around $0 for testing and the Premium tier running substantially higher for production workloads. Customers on lower tiers, like Consumption or Developer, cannot request limit increases, so production workload planning should account for tier selection early.

Closing

344: Amazon’s Coding Bot Bites the Hand That Runs It

Wed, 04 Mar 2026 20:27:09 +0000

Welcome to episode 344 of The Cloud Pod, where the forecast is always cloudy! Justin is out of the office at a World of Warcraft Tournament (not really), and Ryan is pursuing his lifelong dream of becoming a roadie for The Eagles (maybe?), so it’s Jonathan and Matt holding down the fort this week, and they’ve got a ton of cloud news for you! From security to AI assistants, we’ve got all the news you need. Let’s get started!

Titles we almost went with this week

Zero Bus, All Gas, No Kafka Brakes
AI Coding Bot Bites the Hand That Runs It
When Your Robot Developer Goes Rogue on AWS
Kubernetes VPA Finally Stops Evicting Your Database Pods
Google Trains 100 Million People, Still No One Reads the Docs
MCP Walks Into a Bar Not Enterprise Ready Yet
No More Pod Evictions Kubernetes 1.35 Scales In Place
No Keys No Drama Just IAM and Cloud SQL
One Agent to Rule Them All in Kubernetes
IAM Tired of Writing Policies Manually
When Your AI Coding Tool Has Delete Permissions
One Dashboard to Rule All Your GPU Clusters
Serverless Reservations Prove Nothing Is Truly Free Range
Kiro Takes the Wheel on AWS IAM Policies
Stop Blaming Backups for Your Bad Architecture
AI Agent Goes Rogue, Takes AWS Down With It
Everything is Bigger in Texas Except the Water Usage
OpenAI launches the college basketball of Inference. Pro service – low cost

General News

1:05 Code Mode: give agents an entire API in 1,000 tokens

Cloudflare‘s Code Mode MCP server reduces token consumption by 99.9% compared to a traditional MCP implementation, exposing the entire Cloudflare API (over 2,500 endpoints) through just two tools, search() and execute(), using roughly 1,000 tokens versus 1.17 million for a conventional approach.
The architecture works by having the AI agent write JavaScript code against a typed OpenAPI spec representation, rather than loading tool definitions into context, with code executing inside a sandboxed V8 isolate (Dynamic Worker) that restricts file system access, environment variables, and external fetches by default.
This approach addresses a fundamental constraint in agentic AI systems: adding more tools to give agents broader capabilities directly competes with the available context space for the task at hand.

01:41 Jonathan- “It’s good. I’m not sure I could imagine 2 ½ thousand MCP tool definitions in a context window and still actually use it for anything.”

AI Is Going Great – Or How ML Makes Money

03:58 OpenClaw creator Peter Steinberger joins OpenAI

Peter Steinberger, creator of viral AI assistant OpenClaw (formerly Clawdbot/Moltbot), has joined OpenAI to lead development of next-generation personal agents.
OpenClaw gained attention for its ability to perform real-world tasks like calendar management, flight booking, and autonomous social network participation.
OpenAI will maintain OpenClaw as an open source project through a foundation structure, allowing the community to continue development while Steinberger focuses on building similar capabilities into OpenAI’s product suite.
This acquisition-to-open-source model differs from typical tech company acquisitions, where projects are absorbed or shut down.
The move signals OpenAI’s strategic focus on agentic AI systems that can execute multi-step tasks autonomously rather than just responding to prompts. Steinberger’s experience building practical automation workflows could accelerate OpenAI’s development of agent capabilities that compete with offerings from Anthropic, Google, and Microsoft.
For developers, this represents a shift in how personal AI assistants may be deployed, moving from standalone applications to integrated agent frameworks within larger platforms.
The open source continuation of OpenClaw provides a reference implementation for building task-oriented AI systems.

04:19 Matt – “This is kind of where I see Anthriopic Cowork slowly going to, being your personal assistant, and having this be your ability to manage your real-world tasks. It’s great, and if they can build that into OpenAI, then it becomes a lot more of a personal assistant than just a general tool that you’re using.”

09:11 Making frontier cybersecurity capabilities available to defenders

Anthropic launched Claude Code Security in a limited research preview for Enterprise and Team customers, with free expedited access for open-source maintainers.
Unlike traditional static analysis tools that match known vulnerability patterns, it reasons through code contextually, the way a human security researcher would, catching logic flaws and access control issues that rule-based tools miss.
The tool uses a multi-stage verification process where Claude re-examines its own findings to filter false positives, assigns severity ratings, and provides confidence scores.
Critically, no patches are applied without human approval, keeping developers in the decision loop.
For cloud and enterprise teams, this integrates directly into Claude Code on the web, meaning security review happens within existing developer workflows rather than requiring separate tooling. The dashboard surfaces validated findings alongside suggested patches for team review.
Want to request access? You can do that here.

09:35 Preview, review, and merge with Claude Code

Claude Code on desktop now closes the full development loop by adding live app preview, inline code review, and GitHub PR monitoring in a single interface, reducing the need to switch between tools during development.
The new auto-fix and auto-merge features allow Claude to monitor PRs in the background, automatically attempt to fix CI failures, and merge PRs once all checks pass, letting developers move on to new tasks without manually tracking PR status.
The inline code review feature via the Review Code button lets Claude examine local diffs and leave comments directly in the desktop diff view before any code leaves the machine, functioning as an automated pre-push review step.
Session portability is now built in, allowing developers to start a session in the CLI using /desktop to bring context into the desktop app, or push local sessions to the web or Claude mobile app using the Continue with Claude Code on the web button.
These updates are available now to all users and represent a shift toward agentic, background-running development workflows where the AI continues working on tasks like CI remediation while the developer focuses elsewhere.

11:20 Jonathan – “It’s a very human way of going back and self-reflecting on the work that you’ve just done.”

18:08 Announcing General Availability of Zerobus Ingest, part of Lakeflow Connect

Databricks has announced General Availability of Zerobus Ingest, part of Lakeflow Connect, a serverless streaming service that pushes data directly into Delta tables without intermediate message buses like Kafka.
It supports thousands of concurrent connections and achieves over 10GB per second of aggregate throughput with data landing in under 5 seconds.
The core architectural difference is a single-sink design versus Kafka’s multi-sink approach, reducing a traditional five-system streaming stack down to two components.
This eliminates dedicated compute and storage for the message bus itself, along with the engineering overhead to manage it, at a fraction of the cost per gigabyte compared to self-managed Kafka.
Developers can integrate via gRPC, REST APIs, or language-specific SDKs, and every write is automatically governed through Unity Catalog for lineage tracking and access control.
This means streaming data gets the same governance treatment as the rest of the lakehouse from the moment it arrives.
Real-world deployments include Toyota using it to detect factory overheating conditions in minutes rather than hours, and Joby Aviation reducing aircraft telemetry resolution latency from days to minutes.
Both cases highlight manufacturing and IoT as strong use cases where low-latency ingestion has a direct operational impact.
Zerobus Ingest is now GA on AWS and Azure, with Google Cloud support coming soon, priced under the Lakeflow Jobs Serverless SKU with a 6-month promotional pricing period currently active.

20:05 Jonathan – “I’m not a fan of Kafka in general, but I am a fan of doing things at massive scale, so it’s kind of cool.”

07:27 OpenAI prepares new ChatGPT Pro Lite tier at $100 monthly

OpenAI appears to be preparing a ChatGPT Pro Lite tier at $100 per month, slotting between the existing Plus plan at $20 and the full Pro plan at $200, based on findings from engineer Tibor Blaho, who has a consistent track record of uncovering unreleased features.
The new tier would address a notable pricing gap for users who regularly hit Plus rate limits but cannot justify the full Pro cost, with freelancers, researchers, and developers as the likely target audience.
The plan may be structured around compute-heavy use cases, including Codex and persistent agentic workloads, where background-running agents carry substantially higher infrastructure costs than standard chat interactions.
OpenAI recently hired Peter Steinberger, creator of the open-source agent framework OpenClaw, and has signaled a multi-agent direction for ChatGPT, suggesting the Pro Lite tier could serve as an entry point for always-on agentic capabilities rather than just increased chat limits.
No release date or confirmed feature set exists yet, but the addition of a mid-tier option would create competitive pressure on Google, which currently lacks an equivalent individual plan at this price point.

21:56 Matt – “I just think they needed a different naming convention.”

Cloud Tools

23:11 HCP Packer adds SBOM vulnerability scanning

HCP Packer now includes SBOM vulnerability scanning in public beta, allowing platform teams to scan software bills of materials against MITRE’s CVE database and classify findings by severity directly within the artifact registry.
The feature builds on last year’s SBOM storage capabilities, which are now generally available, meaning teams can generate, store, and now actively scan SBOMs for known vulnerabilities in a single workflow.
This addresses a supply chain security gap by surfacing vulnerability data at the image level, covering AMIs, Docker containers, and virtual machines before they reach production environments.
Teams can see which specific package versions are affected and when vulnerabilities were detected, giving them the information needed to prioritize remediation without leaving the HCP Packer interface.
The feature is available in public beta at no cost through the free HCP Packer tier, making it accessible for teams looking to add CVE scanning to their image management process without additional tooling.

24:15 Jonathan – “It’s only as current as the time you built it though…”

25:43 Why Kubernetes 1.35 is a game-changer for stateful workload scaling

Kubernetes 1.35 brings two notable autoscaling milestones: In-Place Pod Resize graduating to GA and Vertical Pod Autoscaler’s InPlaceOrRecreate update mode reaching beta, allowing VPA to adjust CPU and memory on running pods without evicting them.
The practical benefit for stateful workloads is substantial.
Previously, VPA had to evict and recreate pods to apply new resource requests, which caused disruption for databases, caches, and other restart-sensitive applications. In-place resizing preserves the pod UID, container ID, and restart count throughout the adjustment.
VPA operates in three stages worth understanding: a recommendation-only mode for passive observation, an InPlaceOrRecreate mode that attempts live resizing first and falls back to eviction only when node resources are insufficient, and configurable policies using minAllowed and maxAllowed to bound what VPA can actually set.
VPA controllers are not bundled with Kubernetes itself.
Engineers need to clone the kubernetes/autoscaler repository and run the vpa-up.sh script to deploy the Recommender, Updater, and Admission Controller components alongside the mutating

26:09 Jonathan – “I think the practical benefit for stable workloads are fairly substantial, if you’re one of those crazy people who like to run databases or SQL server on Kubernetes (like Cody) because previously those pods would be evicted and new resources requested, which would obviously cause disruption, stale caches, and other issues.”

AWS

31:20 Amazon service was taken down by AI coding bot

Listener note: paywall article
Amazon’s Kiro AI coding tool caused a 13-hour outage of an AWS cost exploration service in December after engineers granted it broad permissions, and it autonomously decided to delete and recreate the environment rather than patch it.
A second outage involved Amazon Q Developer, though Amazon says neither event impacted core customer-facing AWS services.
Amazon’s official position is that both incidents were user error stemming from improper access controls, not failures of the AI tools themselves.
Kiro is designed to request authorization before acting, but the engineer involved had been granted broader permissions than intended, bypassing that safeguard.
The incidents highlight a practical risk with agentic AI tools in production environments: when an AI agent is given the same permissions as a human operator without requiring peer review, it can take destructive autonomous actions that a second set of eyes might have caught. AWS has since added mandatory peer review and staff training as corrective measures.
AWS is pushing for 80 percent of its developers to use AI coding tools at least once weekly, which means these tools are being adopted at scale internally before the risk patterns are fully understood.
Listeners running their own AI agents in production should treat permission scoping and human-in-the-loop approval gates as non-optional controls, not optional defaults.
Kiro launched in July 2025 and is positioned as a specification-driven coding assistant meant to go beyond simple vibe coding.
The December incident was limited to mainland China, and the second incident had no customer-facing impact, but the pattern of two production disruptions in a few months is worth tracking as agentic tools become more common in enterprise workflows.

33:24 Matt – “…if you’re letting the AI tool start to do things inside of production environments, that’s where you need to watch it, and you need to probably have it be a little bit more specific, so the human needs to kind of be watching what’s going on and peer reviewing it.”

35:49 Amazon pushes back on Financial Times report blaming AI coding tools for AWS outages

Amazon issued a public rebuttal to a Financial Times report claiming its Kiro AI coding tool caused multiple AWS outages, acknowledging one limited incident in December but attributing it to a misconfigured access control role rather than a flaw in the AI tool itself.
The confirmed disruption affected only AWS Cost Explorer in a single China region for roughly 13 hours, with no customer inquiries received, and did not touch core services like compute, storage, or databases.
Amazon’s core defense is that the issue was user error, not AI error, noting that a misconfigured role could result from any developer tool or manual action, AI-powered or not.
In response to the incident, AWS has added safeguards, including mandatory peer review for production access, which is a practical governance consideration for any organization deploying agentic AI tools in production environments.
The broader takeaway for AWS customers is that agentic AI tools capable of autonomous actions, like deleting and recreating environments, require clear human oversight policies and access control guardrails before being used in production systems.

37:00 AWS IAM Policy Autopilot is now available as a Kiro Power

AWS IAM Policy Autopilot, an open source static code analysis tool launched at re:Invent 2025, is now available as a Kiro Power, allowing developers to generate baseline IAM policies directly within the Kiro IDE without manual policy writing.
The integration uses a one-click installation model that removes the need for manual MCP server configuration, streamlining how developers access policy generation tools during AI-assisted development workflows.
Key use cases include rapid prototyping of AWS applications, baseline policy creation for new projects, and keeping developers in their coding environment rather than switching to the IAM console or documentation.
This fits into the broader trend of embedding security and permissions tooling earlier in the development cycle, helping teams start with least-privilege policies that can be refined over time rather than retrofitting permissions after the fact.
The tool is open source and available on GitHub at github.com/awslabs/iam-policy-autopilot, with no additional cost mentioned beyond standard Kiro and AWS service usage, making it accessible for teams already using the Kiro IDE.

38:18 Jonathan – “I’m really on the fence about this. Because on one hand, I know the pain, especially with things like deployment policies…and just trying to figure out every permission that has to be added so that Terraform can just do a deployment – it becomes very complicated. At the same time, if you have a machine that looks at your code and says ‘this is the policy you need for it,’ I don’t think that’s any security at all unless there’s another check at the end.”

-Honorable Mentions-

41:52 Amazon Redshift Serverless introduces 3-year Serverless Reservations

Amazon Redshift Serverless now offers 3-year Serverless Reservations, providing up to 45% cost savings compared to standard on-demand RPU pricing while maintaining the serverless model’s flexibility.
The reservations are managed at the AWS payer account level and can be shared across multiple AWS accounts, making this useful for organizations running Redshift Serverless workloads across linked accounts.
-stop
Billing runs 24/7 on an hourly basis, metered per second, meaning you pay for reserved RPUs continuously, regardless of actual usage, so this option makes most sense for consistently active workloads rather than sporadic ones.
Any RPU consumption beyond the reserved amount falls back to standard on-demand rates, so customers need to size their reservations carefully to avoid negating the savings.
Reservations can be purchased through the Redshift console or via the create-reservation API and are available in all regions where Redshift Serverless is currently supported.
More information is available on the Amazon Redshift Management Guide, which you can find here.

42:03 Amazon Says It Will Spend $12 billion On Louisiana Data Centers

Amazon has announced a $12 billion investment in data center campuses in Louisiana, aimed at expanding infrastructure capacity for AI and cloud computing workloads.
A notable aspect of the deal is Amazon’s commitment to covering its own power costs directly, working with regional utility Southwestern Electric Power Company to avoid passing energy expenses onto local consumers.
Amazon is pairing the infrastructure investment with solar energy projects in Louisiana, which aligns with its broader sustainability commitments and addresses concerns about grid strain from large-scale data center operations.
This announcement reflects a broader industry trend where cloud providers are proactively addressing public and political concerns about data center energy consumption, following a similar commitment from Microsoft last month regarding higher electricity rate payments.
For AWS customers, this expansion signals continued investment in US-based infrastructure capacity, which could translate to improved regional availability and lower latency for workloads in the southern United States over time.

42:18 Announcing AWS Elemental Inference

AWS Elemental Inference is a fully managed AI service that automatically generates vertical video crops and highlight clips from live and on-demand broadcasts in parallel with encoding, targeting broadcasters who need to distribute content across TikTok, Instagram Reels, YouTube Shorts, and similar platforms without dedicated production staff.
The service uses an agentic AI approach with no prompts or human-in-the-loop intervention required, handling both vertical video cropping and metadata-based highlight detection automatically, which reduces the manual workflow overhead typically associated with multi-platform content distribution.
Beta testing with large media companies showed 34% or more cost savings on AI-powered live video workflows compared to using multiple point solutions, making this a notable consolidation option for media organizations already using AWS Elemental encoding services.
A practical sports broadcasting use case is highlighted where highlight clips can be identified and distributed to social platforms during live games rather than hours after the fact, addressing a real operational gap in live content workflows.
The service is available in four regions at launch: US East N. Virginia, US West Oregon, Asia Pacific Mumbai, and Europe Ireland.
Pricing details are not specified in the announcement, so listeners should check the AWS Elemental Inference documentation at docs.aws.amazon.com/elemental-inference for current pricing information.

GCP

57:25 Managed MCP servers for Google Cloud databases

Google Cloud expanded its managed MCP server support to cover AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, allowing AI agents to interact with these databases through natural language without requiring infrastructure deployment or complex configuration.
The security model relies entirely on IAM for authentication rather than shared keys, and all agent actions are logged in Cloud Audit Logs, which addresses a practical concern for teams worried about giving AI agents access to production databases.
A new Developer Knowledge MCP server connects IDEs directly to Google’s official documentation, letting agents reference best practices in real time during tasks like database migrations or app development troubleshooting.
Because these servers follow the open MCP standard, they work with third-party clients like Anthropic’s Claude in addition to Gemini, which broadens the practical appeal beyond teams already committed to Google’s AI tooling.
Google has signaled plans to extend managed MCP support to Looker, Memorystore, Pub/Sub, Kafka, and migration services in the coming months, suggesting this is an ongoing buildout rather than a one-time release.
Pricing is not separately listed for MCP access and likely falls under existing database service costs.

44:12 Matt – “Anything that makes databases easier, I’m all for.”

45:12 Gemini 3.1 Pro: Announcing our latest Gemini AI model

Gemini 3.1 Pro is now available in preview for developers via Google AI Studio, Gemini CLI, Vertex AI, and Android Studio, with enterprise access through Vertex AI and Gemini Enterprise. Pricing details have not been publicly announced for the preview period.
The model scores 77.1% on the ARC-AGI-2 benchmark, which tests reasoning on novel logic patterns, representing more than double the score of the previous Gemini 3 Pro model.
This positions it as a stronger option for complex problem-solving tasks compared to its predecessor.
Practical use cases highlighted include generating animated SVGs from text prompts, building live data dashboards by connecting to public APIs, and prototyping interactive 3D interfaces with hand-tracking and generative audio. These examples suggest the model is particularly suited for developers working on data visualization and creative coding projects.
Consumer access is rolling out through the Gemini app and NotebookLM, but the 3.1 Pro tier is restricted to Google AI Pro and Ultra plan subscribers. This tiered access model means free-tier users will not have access during the preview phase.
Google notes the model is still in preview while they validate performance for agentic workflows before a general availability release. GCP customers evaluating it for production use should factor in that capabilities and pricing may shift before the full release.

46:23 Matt – “It’s just amazing to me how fast these models are improving. This one is saying it scored a 77%, where models a year ago where 40 and 50%. Seeing how fast everything is moving is insane.”

47:36 Understanding the Firefly clock synchronization protocol

Google’s Firefly is a software-based clock synchronization protocol that achieves sub-10-nanosecond NIC-to-NIC synchronization across data center hardware, without requiring specialized or expensive dedicated timing equipment.
The protocol uses a distributed consensus algorithm built on random graphs rather than a traditional hierarchical time server model, which improves convergence speed, scalability, and resilience to network path asymmetries.
Firefly decouples internal synchronization from external UTC synchronization, meaning external time server jitter does not degrade the precision of clock alignment within the data center fabric itself.
Financial services workloads are a primary beneficiary, as regulatory requirements mandate sub-100 microsecond external UTC synchronization and sub-10 nanosecond internal synchronization, both of which Firefly meets on standard cloud infrastructure.
Beyond finance, the protocol has practical implications for distributed database consistency, ML workload coordination, and fine-grained network telemetry, potentially enabling workloads that previously required on-premises dedicated hardware to run on cloud infrastructure instead. No specific pricing details were provided in the announcement.

48:52 Jonathan – “The fact that you need to guarantee sub-hundred microsynchronization for financial systems is crazy.”

-Honorable Mentions-

50:32 America-India Connect infrastructure connects four continents

Google is investing $15 billion in AI infrastructure in India and launching America-India Connect, a multi-continent subsea cable initiative that establishes new fiber-optic routes connecting the United States, India, Singapore, South Africa, and Australia.
The project creates Visakhapatnam as a new international subsea gateway on India’s east coast, adding network diversity beyond existing Mumbai and Chennai landing points.
The infrastructure combines multiple subsea cable systems, including Equiano, Nuvem, Bosun, Tabua, TalayLink, and Honomoana, to create redundant high-capacity routes between American coasts and India through both African and Pacific paths.
This approach provides network resilience for over 1 billion people in India while improving connectivity across the Southern Hemisphere.
Google Cloud is serving as the primary cloud infrastructure provider for India’s iGOT Karmayogi platform, which delivers training to over 20 million public servants across 800+ districts.
The platform will use AI to digitize legacy training content and enable access in 18+ Indian languages, supporting the government’s Mission Karmayogi initiative for civil service modernization.
The announcement positions these subsea cables as critical infrastructure to prevent an AI divide, with documented evidence that subsea cable connectivity improves internet affordability and reliability while driving productivity and economic growth.
The initiative builds on Google’s existing infrastructure investments in Africa, Australia, and the Pacific region.
Added this one just for you, Justin.

52:20 Wilbarger County data center

Google is building a new data center in Wilbarger County, Texas, expanding its existing infrastructure footprint in the state.
This is primarily an infrastructure capacity announcement rather than a new GCP service or feature.
The facility will use air-cooling technology instead of traditional water cooling, limiting water consumption to only essential campus operations like kitchens. This is a notable operational choice given ongoing concerns about data center water usage in drought-prone regions.
Google has contracted to add more than 7,800 MW of net-new energy generation and capacity to the Texas electricity grid, with the Wilbarger facility co-located alongside new clean power developed in partnership with AES.
Google announced a $30 million Energy Impact Fund in November to support energy affordability, school weatherization, and energy workforce development across Texas. Details on the fund are available here.
For GCP customers, additional Texas-based infrastructure generally signals potential improvements in latency and redundancy for workloads serving the south-central US region, though Google has not announced specific new GCP regions or zones tied to this facility.

52:55 Use Lyria 3 to create music tracks in the Gemini app

Google DeepMind’s Lyria 3 model is now available in beta within the Gemini app, letting users generate 30-second music tracks with lyrics, custom cover art, and style controls from text prompts or uploaded photos and videos.
This is available to users 18 and older in 8 languages, with higher usage limits for Google AI Plus, Pro, and Ultra subscribers.
Lyria 3 improves on previous versions by auto-generating lyrics from prompts, offering more control over style, vocals, and tempo, and producing more musically complex outputs without requiring users to provide their own creative assets.
All generated tracks are embedded with SynthID, Google DeepMind’s imperceptible watermark, and the Gemini app now extends its AI content verification to audio files, allowing users to upload audio and check whether it was generated by Google AI.
The feature is also rolling out to YouTube creators via Dream Track for Shorts soundtracks, connecting Lyria 3 to a broader content creation workflow beyond the Gemini app itself.
On the responsible AI side, Google states Lyria 3 was trained with copyright and partner agreements in mind, artist-specific prompts are treated as stylistic inspiration rather than direct mimicry, and output filters check against existing content, though Google acknowledges this approach is not guaranteed to catch all issues.

Azure

57:25 A milestone achievement in our journey to carbon negative

Microsoft has achieved its 2025 goal of matching 100 percent of global electricity consumption with renewable energy, contracting 40 gigawatts of new renewable capacity across 26 countries since 2020.
This represents enough energy to power approximately 10 million US homes, with 19 GW currently online and the remainder coming online over the next five years.
The renewable energy procurement has reduced Microsoft’s reported Scope 2 carbon emissions by an estimated 25 million tons and mobilized billions in private investment through over 400 contracts with 95 utilities and developers. This directly impacts Azure datacenter operations globally, supporting the infrastructure that runs customer workloads while advancing toward the company’s 2030 carbon negative commitment.
Microsoft is expanding beyond renewable energy to include nuclear power and other carbon-free technologies, including a 50 MW fusion project with Helion in Washington state and restarting the 835 MW Crane Clean Energy Center in Pennsylvania with Constellation Energy. The Climate Innovation Fund has allocated $806 million to 67 investees, with 38 percent directed toward energy systems innovation.
The company is deploying AI-driven tools to accelerate clean energy deployment, including collaborations with Idaho National Laboratory for nuclear licensing and the Midcontinent Independent System Operator for grid optimization.
These tools aim to streamline the design, permitting, and deployment of new power technologies to expand grid capacity more efficiently.
Azure customers benefit indirectly through more sustainable cloud infrastructure, though Microsoft notes the shift to an all-of-the-above decarbonization strategy recognizes that rising electricity demand from datacenters, AI workloads, and digital services requires diverse carbon-free energy sources beyond renewables alone.

55:58 Generally Available: Quota and deployment troubleshooting tools for Azure Functions Flex Consumption

Azure Functions Flex Consumption now has generally available quota and deployment troubleshooting tools built directly into the platform, giving developers clearer visibility into quota limits and constraints without needing to dig through documentation or support tickets.
The quota troubleshooting experience surfaces Flex Consumption-specific limits in context, which is useful for teams hitting scaling walls and trying to understand why deployments are behaving unexpectedly.
This is a quality-of-life improvement aimed at developers and platform engineers who use Flex Consumption for its per-execution billing model and fast scaling, helping reduce time spent diagnosing deployment failures.
Pricing for Flex Consumption remains consumption-based, so there is no additional cost for these troubleshooting tools themselves. More details are available at the Azure updates page here.
Teams already invested in Azure Functions should note this reduces reliance on external monitoring or support escalations for common quota-related issues, keeping troubleshooting within the Azure portal workflow.

56:32 Matt – “This is a great quality of life improvement because you can see why things are breaking when you’re using flexible consumption.”

-Honorable Mentions-

1:01:07 Public Preview Announcement: Empower Real-Time Security with Microsoft Sentinel’s CCF Push Feature | Microsoft Community Hub

Microsoft Sentinel’s CCF Push feature, now in public preview, allows security data providers to send logs directly to a Sentinel workspace without the traditional setup overhead of manually configuring Data Collection Endpoints, Data Collection Rules, Entra app registrations, and RBAC assignments. Pressing Deploy handles all resource provisioning automatically.
The feature is built on Sentinel’s Log Ingestion API, which supports high-throughput data ingestion, pre-ingestion data transformation, and direct targeting of system tables, making it more flexible than the older polling-based connector model.
For partners and ISVs building Sentinel integrations, CCF Push reduces time to market by consolidating connector deployment through the Content Hub as a single interface, rather than requiring customers to configure multiple Azure resources independently.
Early adopters include security vendors like Obsidian Security and Varonis, suggesting the feature is already being validated in real-world security workflows.
Developers can reference the MS Learn documentation here to get started.
No specific pricing details were provided in the announcement, but since CCF Push feeds data into Sentinel workspaces, standard Sentinel and Log Analytics ingestion costs would apply.
Organizations evaluating this feature should factor in their existing Sentinel pricing tier when estimating costs.

1:01:24 Microsoft Sovereign Cloud adds governance, productivity and support for large AI models securely running even when completely disconnected

Azure Local disconnected operations are now generally available, allowing organizations to run mission-critical infrastructure with full Azure governance and policy enforcement even when completely isolated from cloud connectivity. This targets government, defense, and regulated industries where external dependencies are either unacceptable or prohibited.
Microsoft 365 Local disconnected brings Exchange Server, SharePoint Server, and Skype for Business Server into fully air-gapped sovereign environments running on Azure Local, with Microsoft committing support for these workloads through at least 2035.
This keeps productivity tools available under the same governance boundary as infrastructure workloads.
Foundry Local now supports large multimodal AI models running on-premises hardware, including NVIDIA GPUs, within fully disconnected sovereign environments. This extends local AI inferencing capabilities beyond the smaller models Foundry Local previously supported, with Microsoft providing deployment, update, and operational health support.
The three components together form a full-stack sovereign private cloud covering infrastructure, productivity, and AI inferencing, all manageable through consistent Azure governance tooling regardless of connectivity state.
Pricing is not publicly listed and appears to vary based on deployment scale and customer qualification, so organizations should contact Microsoft directly for specifics.
Target customers include public sector agencies, classified environments, and regulated industries in regions where data residency and operational autonomy are legal or contractual requirements.
Azure Local is disconnected, and Microsoft 365 Local is available worldwide, while large model support on Foundry Local is currently limited to qualified customers.

Emerging Clouds

1:03:04 Introducing Command Center: The unified operations platform for AI workloads

Crusoe Command Center is a unified operations platform that consolidates GPU cluster monitoring, orchestration, and support into a single interface, addressing the common problem of engineers context-switching between fragmented dashboards during AI training runs.
The platform integrates with Crusoe Managed Kubernetes and supports Managed Slurm, allowing long-running multi-week training jobs to operate continuously across large GPU clusters without manual intervention.
AutoClusters is a key component that automatically detects GPU performance degradation, evicts compromised nodes, and replaces them with healthy instances from a reserve pool, reducing the need for around-the-clock manual oversight.
On the observability side, Command Center supports multiple access methods, including a UI, Grafana via PromQL API, and a Prometheus endpoint, while a Telemetry Relay feature streams infrastructure metrics directly to external tools to reduce data silos.
The Crusoe Watch Agent, paired with Telemetry Relay, extends visibility to custom application-level metrics, allowing teams to correlate workload performance with underlying GPU health data for more precise troubleshooting.

1:04:04 Matt – “The whole stack here is what I kind of find nice. The smaller clouds are trying to attack that whole vertical a lot more, where they’re giving you that depth all the way down, so if you are training your own model, you get the CPU, you get the GPU, you can see that whole stack of what’s going on, and really start to fine-tune.”

1:05:09 Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct MI350X GPUs

DigitalOcean is adding AMD Instinct MI350X GPUs to its GPU Droplets lineup, built on the CDNA 4 architecture and optimized for inference workloads, including prefill phase compute, low-latency token generation, and larger context windows.
The platform has demonstrated measurable results with existing customers, including a 2x increase in production request throughput and 50% reduction in inference costs for Character.AI, giving potential adopters concrete performance benchmarks to evaluate.
DigitalOcean is positioning these offerings toward AI-native companies and developers who need enterprise features like HIPAA eligibility and SOC 2 compliance without the complexity of larger cloud providers, with provisioning available in a few clicks.
The GPUs are currently available in the Atlanta datacenter, with AMD Instinct MI355X GPUs planned for next quarter, which will introduce liquid-cooled rack infrastructure to support larger models and datasets.
For smaller businesses and developers, the predictable usage-based pricing and simplified deployment model represent a meaningful alternative to the more complex pricing and configuration requirements typical of hyperscaler GPU offerings.

Closing

343: AWS CloudWatch Finally Hits Snooze

Wed, 25 Feb 2026 23:20:15 +0000

Welcome to episode 343 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio this week bringing you all the latest in Cloud and AI news, including some of the smaller clouds like Cloudflare and Crusoe Cloud, as well as announcements from the big guys like Google’s Gemini DeepThink, Anthropic’s big pay day, and Microsoft’s Notepad problem. We’ve got all this plus Matt screwing up his outro AGAIN, so let’s get started!

Titles we almost went with this week

Chrome’s WebMCP Protocol: Teaching AI Agents to Stop Doom-Scrolling the DOM and Actually Get Work Done
Claude Enterprise Self-Service: Because Sometimes You Just Want to Buy AI Without Small Talk
AWS EC2 Goes Inception Mode: Now You Can Virtualize Your Virtualization Without Going Broke
Amazon EC2 Nested Virtualization: Because Your Virtual Machine Was Lonely and Needed Its Own Virtual Machine
CloudWatch Alarm Mute Rules: Because Your Deployment Doesn’t Need a Standing Ovation at 3 AM
Anthropic’s $380 Billion Valuation Proves AI Funding Has Gone Claude Nine
AWS EC2 Nested Virtualization Finally Escapes the Expensive Hardware Jail
Cloudflare Teaches AI Agents the Magic Words: Accept text/markdown and Save 13,000 Tokens
Crusoe Cloud’s MCP Server: Teaching AI Assistants to Stop Asking for the Manager and Just Fix Your Infrastructure
Azure’s New Agentic Copilot: Because Manually Clicking Through Dashboards Was So 2023
Chrome’s WebMCP Gives AI Agents a GPS for Websites Because Apparently They’ve Been Lost in the HTML This Whole Time
Anthropic Cuts Out the Middleman: Claude Enterprise Now Available Without the Enterprise Sales Dance
AWS Gives CloudWatch the Silent Treatment: New Mute Rules Let Alarms Sleep Through Maintenance Windows
AWS CloudWatch Hits Snooze: Mute Rules End On-Call Nightmares
AWS Gives CloudWatch the Silent Treatment

General News

00:45 Bloat Risk? Microsoft’s Notepad Upgrade Also Introduced a Vulnerability | PCMag

Microsoft’s recent Notepad modernization introduced CVE-2026-20841, a vulnerability in the new Markdown support feature that allows malicious links in files to execute remote code.
The flaw has been patched in the February 2026 security updates, but it highlights the security trade-offs when adding features to historically simple applications.
The vulnerability exploits Notepad’s Markdown rendering capability, which Microsoft added in May to support lightweight markup language formatting. When Notepad opens a specially crafted Markdown file, embedded malicious links can trigger unverified protocols that load and execute remote files on the system.
This incident raises questions about feature bloat in core Windows utilities, particularly as Microsoft continues adding network-dependent capabilities like AI-powered text writing to Notepad. Security researchers are debating whether basic text editors should have network functionality at all, given the expanded attack surface.
The vulnerability demonstrates how modernization efforts can introduce security risks in previously low-risk applications.
Organizations using Windows need to ensure their systems receive the February 2026 security updates to address this specific flaw in Notepad’s Markdown implementation.

02:04 Matt – “I’m just confused why they didn’t use Copilot on their pull request in order to identify this as a potential bug. I feel like it should have found it. Just sayin’…”

03:13 WebMCP is available for early preview

Chrome is introducing WebMCP, a standardized protocol that lets websites expose structured tools and actions directly to AI agents, eliminating the need for agents to parse raw HTML and DOM elements.
This addresses a key reliability problem in agentic workflows where AI agents currently struggle with inconsistent web interactions.
The protocol offers two interaction modes: a declarative API for simple HTML form-based actions and an imperative API for complex JavaScript-driven workflows. This dual approach lets websites define exactly how agents should interact with features like booking systems, support ticket forms, and checkout processes.
Early use cases focus on high-value transactional workflows, including e-commerce product configuration, travel booking with complex filtering requirements, and automated customer support ticket creation with technical details. These scenarios benefit most from structured interactions versus unreliable DOM manipulation.
The early preview program requires sign-up for access to documentation and demos, indicating this is still in experimental stages.
Developers interested in making their sites agent-ready will need to implement these new APIs to participate in the agentic web ecosystem Chrome is building.
This represents Chrome’s attempt to standardize how AI agents interact with websites before the market fragments with competing approaches. Sites that adopt WebMCP early may gain advantages as browser-based AI agents become more prevalent.
Interested in signing up for the preview? You can do that here.

04:41 Ryan – “It makes a lot of sense why they want to standardize on a specific protocol, but I can’t help but feel like this is the beginning of the end of human interaction; where you’re going to have an AI agent-to-agent protocol.”

AI Is Going Great – Or How ML Makes Money

07:27 Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation \ Anthropic

Anthropic closed a $30 billion Series G at a $380 billion post-money valuation, reaching $14 billion in run-rate revenue with 10x annual growth for three consecutive years.
The company now serves eight of the Fortune 10, with over 500 customers spending more than $1 million annually.
Claude Code, made generally available in May 2025, has grown to $2.5 billion in run-rate revenue and now accounts for 4% of all public GitHub commits worldwide. Business subscriptions quadrupled since early 2026, with enterprise customers representing over half of Claude Code’s revenue.
Opus 4.6 launched last week as the latest model release, leading the GDPval-AA benchmark for economically valuable knowledge work in finance and legal domains. The model powers agents capable of generating professional documents, spreadsheets, and presentations autonomously.
Anthropic expanded its product portfolio in January with over thirty launches, including Cowork, which extends Claude Code capabilities to broader knowledge work with eleven open-source plugins for specialized roles.
Claude for Enterprise is now HIPAA-compliant and available for healthcare and life sciences organizations.
Claude remains the only frontier AI model available across all three major cloud platforms through AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.
The company trains on diversified hardware, including AWS Trainium, Google TPUs, and NVIDIA GPUs, to optimize workload performance and resilience.

08:10 Matt – “Those numbers are insane. I just want to make sure we’re all clear about that.”

15:16 Introducing Sonnet 4.6 \ Anthropic

Claude Sonnet 4.6 is now generally available across all Claude plans, API, and major cloud platforms at the same pricing as Sonnet 4.5 ($3/$15 per million tokens), with a 1M token context window in beta.
The model now serves as the default for Free and Pro plan users, bringing Opus-class performance to a mid-tier price point.
Computer use capabilities have improved substantially, with Sonnet 4.6 scoring 94% on insurance benchmarks and showing human-level performance on tasks like navigating complex spreadsheets and multi-step web forms.
The model demonstrates better resistance to prompt injection attacks compared to Sonnet 4.5 and performs similarly to Opus 4.6 on safety evaluations.
Coding performance has advanced significantly, with early users preferring Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time and even choosing it over Opus 4.5 59% of the time.
Users report better instruction following, less overengineering, fewer hallucinations, and more consistent follow-through on multi-step tasks, with one customer reporting an 80.2% score on SWE-bench Verified.
Several features have reached general availability on the API, including code execution, memory, programmatic tool calling, tool search, and tool use examples.
Web search and fetch tools now automatically write and execute code to filter search results, improving response quality and token efficiency.
The model supports both adaptive thinking and extended thinking modes, with context compaction in beta that automatically summarizes older context as conversations approach limits.
Claude in Excel now supports MCP connectors, allowing users to pull data from external sources like S&P Global, LSEG, and PitchBook directly within spreadsheets.

17:42 Ryan – “I haven’t played with Sonnet because it’s just released, but playing around with Opus, you can see that it’s another major improvement in these steps, and it is pretty fantastic to use.”

19:44 Token Anxiety – by Nikunj Kothari – Balancing Act

This article describes a cultural shift in San Francisco’s tech scene where developers are prioritizing AI agent management over social activities, with people leaving parties early to check on overnight code generation and spending weekends running 12-hour build sessions with AI assistants like Claude and Codex.
The piece highlights how AI coding tools have created a new productivity anxiety where developers feel compelled to keep agents running continuously, even during sleep, to maximize output and stay competitive as new model capabilities and context windows are released weekly.
Developers are adopting new vocabulary around AI models, discussing them like sommeliers evaluate wine and using animal training metaphors like keeping Claude on a tight leash for code review while giving it more slack for creative work.
The constant stream of benchmark improvements and new AI capabilities is creating pressure to continuously optimize workflows, as each advancement makes previous methods feel outdated and multiplies the sense that competitors are already leveraging these improvements.
This represents a broader shift in developer culture where traditional leisure activities are being replaced by AI-assisted building, with the primary social metric changing from what you accomplished to how many agents you have running in parallel.

24:25 Ryan – “I still don’t know how everyone has these overnight workloads; I guess I don’t trust AI at all; I’m not going to let it run unsupervised.”

31:48 Alibaba Launches New LLM as China’s AI Battle Heats Up

Qwen 3.5 is out. No industry freakouts (like with DeepSeek) so far

33:06 Seed News – ByteDance Seed Team

ByteDance officially launched Seedance 2.0, a next-generation video creation model with a unified multimodal audio-video architecture supporting text, image, audio, and video inputs.
The model can process up to 9 images, 3 video clips, 3 audio clips, and natural language instructions simultaneously for comprehensive content referencing and editing.
The model delivers substantial improvements in complex motion rendering and physical accuracy, particularly excelling at multi-subject interactions like competitive figure skating with synchronized movements, mid-air spins, and precise landings that follow real-world physics.
Industry evaluations show Seedance 2.0 achieves leading performance in motion stability, instruction following, and visual aesthetics compared to competing models.
Seedance 2.0 introduces dual-channel stereo audio generation with multi-track parallel output for background music, ambient effects, and voiceovers synchronized to visual rhythm.
The model supports 15-second high-quality multi-shot audio-video output suitable for commercial advertising, film VFX, game animations, and explainer videos.
New video editing capabilities allow targeted modifications to specific clips, characters, actions, and storylines, plus video extension functionality for generating continuous shots based on user prompts.
The model demonstrates improved instruction-following for complex scripts and open-ended prompts while maintaining subject consistency across extended sequences.
The unified multimodal architecture enables professional-grade content creation workflows where users can reference composition, motion, camera movement, visual effects, and audio elements from input assets, significantly lowering barriers to industrial-level video production without requiring specialized technical expertise.
https://www.instagram.com/reel/DUm4zSvEn76/ – John Wick cat video as mentioned.

34:53 Justin – “I’m surprised Hollywood stock didn’t crash today over this; very very impressive. Crazily so.”

AWS

36:47 Announcing new Amazon EC2 general purpose M8azn instances

AWS launches M8azn instances powered by fifth-generation AMD EPYC Turin processors running at 5GHz, the highest CPU frequency available in the cloud. These general-purpose instances deliver 2x compute performance over M5zn and 24% better performance than M8a instances, with 4.3x higher memory bandwidth and 10x larger L3 cache.
The instances target latency-sensitive workloads like high-frequency trading, real-time financial analytics, and simulation modeling for automotive and aerospace industries.
Built on sixth-generation Nitro Cards, they provide 2x networking throughput and 3x EBS throughput compared to M5zn instances.
M8azn instances come in nine sizes from 2 to 96 vCPUs with up to 384 GiB memory at a 4:1 memory-to-vCPU ratio, including two bare metal variants. Available in US East Virginia, US West Oregon, Tokyo, and Frankfurt regions through On-Demand, Spot, and Savings Plans pricing models.
The high-frequency positioning fills a specific niche for workloads requiring maximum single-threaded performance rather than just core count.
This complements AWS’s broader M8a lineup by offering customers a choice between standard frequency instances and these premium high-frequency variants for specialized use cases.

37:03 Announcing Amazon EC2 C8i, M8i, and R8i instances on second-generation AWS Outposts racks

AWS is bringing C8i, M8i, and R8i instances to second-generation Outposts racks, delivering 20% better performance and 2.5x more memory bandwidth compared to the previous C7i, M7i, and R7i generation. These instances also provide 20% more compute capacity within the same physical rack space and power consumption, improving density for on-premises deployments.
The new instances run on custom Intel Xeon 6 processors exclusive to AWS and target workloads that need enhanced on-premises performance, including large databases, memory-intensive applications, real-time analytics, high-performance video encoding, and CPU-based ML inference.
This addresses the gap for customers who need cloud-class compute but must keep workloads on-premises due to latency, data residency, or regulatory requirements.
Second-generation Outposts racks continue AWS’s hybrid cloud strategy by extending the latest EC2 instance types to customer data centers with the same APIs and tooling as the public cloud.
The availability varies by region, so customers should check the Outposts rack FAQs page for current country and territory support before planning deployments.
The performance improvements come primarily from the memory bandwidth increase and processor generation upgrade, which should benefit database operations, in-memory caching, and data-intensive applications that previously hit memory bottlenecks on Outposts.
The power and space efficiency gains matter for customers with constrained data center capacity or energy budgets.

37:08 Amazon EC2 Hpc8a Instances powered by 5th Gen AMD EPYC processors are now available

AWS launches Hpc8a instances powered by 5th Gen AMD EPYC processors, delivering 40% higher performance and 42% greater memory bandwidth than the previous Hpc7a generation, while offering up to 25% better price-performance for tightly coupled HPC workloads like computational fluid dynamics and weather modeling.
The instances come in a single 96xlarge size with 192 cores, 768 GiB memory, and 300 Gbps Elastic Fabric Adapter networking, featuring customizable core counts at launch and sixth-generation AWS Nitro cards for offloaded virtualization functions. Simultaneous Multithreading is disabled by default to optimize HPC performance.
Available now in US East Ohio and Europe Stockholm regions, with support for AWS ParallelCluster, AWS Parallel Computing Service, and Amazon FSx for Lustre integration to simplify cluster management and provide sub-millisecond storage latencies. Customers can purchase as On-Demand Instances or through Savings Plans, with specific pricing available on the EC2 pricing page.
The 1:4 core-to-memory ratio and high core density target compute-intensive simulation workloads requiring rapid time-to-results, including crash simulations and high-resolution weather modeling within tight operational windows. The customizable core count feature allows right-sizing based on specific HPC workload requirements without paying for unused capacity.

39:20 Ryan – “I’m sure they use a subcontractor for actual maintenance, things. But I’m sure that you have to give them access and manage them just like you would any other remote hands for your data center.”

39:37 MSK simplifies Kafka topic management with new APIs and console integration

Amazon MSK now provides native AWS APIs for Kafka topic management, eliminating the need to set up and maintain separate Kafka admin clients. The three new APIs (CreateTopic, UpdateTopic, and DeleteTopic) work alongside existing ListTopics and DescribeTopic APIs through AWS CLI, SDKs, and CloudFormation, letting teams manage topics using standard AWS tooling and IAM permissions.
The MSK console now consolidates all topic operations in one interface with guided defaults for creating and updating topics. Users can configure properties like replication factor, partition count, retention policies, and cleanup settings while viewing comprehensive partition-level metrics and configuration details directly in the console.
These capabilities are available at no additional cost for MSK provisioned clusters running Kafka version 3.6 and above across all regions where MSK is offered. Organizations need to configure appropriate IAM permissions to use the new APIs, with setup instructions available in the MSK Developer Guide.
The update addresses a common operational pain point where teams previously had to maintain separate Kafka admin tooling outside the AWS ecosystem. This integration brings Kafka topic management into standard AWS workflows, improving consistency with existing infrastructure-as-code practices and centralized access control through IAM.

40:47 Ryan – “I suspect this has more to do with Kafka than AWS because Kafka is notoriously hard to administer, so in a lot of cases there’s just not the ability…so I’m really happy to see this.”

42:40 Amazon Bedrock adds support for six fully-managed open weights models

Amazon Bedrock now supports six new open weights models, including DeepSeek V3.2, MiniMax M2.1, GLM 4.7, GLM 4.7 Flash, Kimi K2.5, and Qwen3 Coder Next, providing frontier-class performance at lower inference costs than proprietary alternatives.
These models cover different enterprise needs from advanced reasoning and agentic tasks to autonomous coding with large output windows and lightweight production deployments.
The models run on Project Mantle, a new distributed inference engine that accelerates model onboarding to Bedrock while providing serverless inference with quality of service controls and automated capacity management. Project Mantle includes native OpenAI API compatibility, allowing customers to switch from OpenAI endpoints without code changes.
The addition of these open weights models gives AWS customers more flexibility in model selection based on specific workload requirements and cost constraints.
DeepSeek V3.2 and Kimi K2.5 handle complex reasoning tasks, while GLM 4.7 and MiniMax 2.1 support coding workflows with extended context windows, and Qwen3 Coder Next and GLM 4.7 Flash offer cost-efficient options for high-volume production use.
Project Mantle’s unified capacity pools and higher default quotas address common scaling challenges customers face when deploying large language models.
The serverless architecture eliminates infrastructure management overhead, while the automated capacity management helps prevent quota limitations during peak usage periods.

44:05 Matt – “I like how they made it all compatible with OpenAI. It’s kind of like S3 compatibility; I feel like we’re slowly kind of coming to a standard, which means you can go play with it and see which model makes sense.”

46:02 Amazon EKS Auto Mode Announces Enhanced Logging for its Managed Kubernetes Capabilities

EKS Auto Mode now integrates with CloudWatch Vended Logs to automatically collect logs from its managed Kubernetes capabilities, including compute autoscaling, block storage, load balancing, and pod networking.
This gives customers centralized visibility into Auto Mode’s infrastructure management operations without manual configuration.
The integration uses CloudWatch Vended Logs, which provides lower pricing than standard CloudWatch Logs while maintaining built-in AWS authentication and authorization.
Customers can route logs to CloudWatch Logs, S3, or Kinesis Data Firehose, depending on their retention and analysis requirements, with standard destination charges applying.
Each Auto Mode capability can be configured independently as a log delivery source through CloudWatch APIs or the AWS Console.
This granular control allows teams to monitor specific components like the Karpenter-based autoscaler or VPC CNI networking without collecting unnecessary log data.
The feature addresses a common operational challenge where Auto Mode’s automated infrastructure management previously operated as a black box. DevOps teams can now troubleshoot issues like pod scheduling failures, storage provisioning problems, or load balancer configuration errors by examining the actual logs from Auto Mode’s control plane operations.
Available immediately in all regions where EKS Auto Mode operates, this logging capability helps bridge the observability gap between customer workloads and AWS-managed Kubernetes infrastructure components.

47:05 Justin – “All I have to say is, some lovely CloudWatch PM just made their bonus this year by turning this one, as this is a lot of logging context that you now need to parse and pay for.”

49:26 AWS CloudWatch Alarm Mute Rules eliminate alert fatigue

CloudWatch Alarm Mute Rules let you temporarily silence alarm notifications during planned maintenance windows, deployments, or off-hours without disabling the underlying monitoring.
The feature supports up to 100 alarms per rule with one-time or recurring schedules, and automatically triggers any suppressed actions once the mute period ends if the alarm state persists.
This addresses a common operational pain point where teams either ignore alerts during maintenance windows or use risky script-based workarounds that can be forgotten and leave monitoring disabled.
The native integration eliminates the need for custom automation to manage notification states during planned activities.
The feature is available today across all AWS regions that support CloudWatch alarms at no additional cost beyond standard CloudWatch pricing.
Configuration is done through the CloudWatch console or API, with support for all alarm states, including OK, ALARM, and INSUFFICIENT_DATA.
Primary use cases include silencing non-critical alerts during scheduled deployments, muting development environment alarms outside business hours, and suppressing known issues during maintenance windows.
This helps reduce alert fatigue while maintaining full visibility into system state and metrics collection.
The automatic re-triggering of muted actions ensures teams don’t miss persistent issues that started during a mute window, providing a safety mechanism that manual notification management typically lacks.

50:49 Ryan – “This is much nicer. Basically, set it for ignore for an hour and then have it kick back in. Glad to see this, but strange that it took this long.”

52:48 Amazon EC2 supports nested virtualization on virtual Amazon EC2 instances

AWS now supports nested virtualization on standard EC2 instances, not just bare metal, allowing customers to run KVM or Hyper-V hypervisors inside virtual machines. This expands flexibility for development and testing scenarios that previously required more expensive bare metal instances.
The feature launches on the latest generation C8i, M8i, and R8i instance families across all commercial AWS regions.
Customers can now run mobile app emulators, automotive hardware simulators, and Windows Subsystem for Linux on Windows workstations directly on virtual instances.
This capability addresses a long-standing limitation where nested virtualization required bare metal instances, which carry higher costs and longer provisioning times compared to standard virtual instances.
The change makes nested environments more accessible for development teams and testing workflows.
Common use cases include software vendors who need to test their products across multiple operating systems, automotive companies simulating vehicle hardware environments, and mobile developers running Android or iOS emulators at scale.
These workloads can now run on more cost-effective instance types with faster deployment.
The feature requires enabling hardware virtualization extensions in the instance configuration, with full documentation available in the EC2 user guide. Pricing follows standard EC2 rates for the C8i, M8i, and R8i instance families without additional charges for the nested virtualization capability itself.

54:13 Ryan – “These kinds of announcements are usually preceded or quickly followed with Nitro…and it’s neat. It’s neat how they isolate the hardware layer to match these workloads.”

54:50 Announcing Amazon SageMaker Inference for custom Amazon Nova models

AWS now lets customers deploy custom-trained Amazon Nova models on SageMaker Inference with production-grade controls over instance types, auto-scaling, context length, and concurrency settings.
This addresses customer requests for the same deployment flexibility they get with open-weight models, enabling full-rank customized Nova Micro, Nova Lite, and Nova 2 Lite models trained via SageMaker Training Jobs or HyperPod.
The service reduces inference costs by supporting more cost-effective EC2 G5 and G6 instances instead of requiring P5 instances, with auto-scaling based on 5-minute usage patterns and configurable inference parameters.
Customers pay only for compute instances used with per-hour billing and no minimum commitments, following standard SageMaker pricing.
Deployment works through SageMaker Studio UI or SDK, supporting both real-time streaming and asynchronous batch inference modes. The service includes advanced configuration options for context length up to 8000 tokens, max concurrency settings, and inference parameters like temperature and top-p for optimizing latency-cost-accuracy tradeoffs.
Currently available in US East N. Virginia and the US West Oregon regions, with support for Nova models with reasoning capabilities.
Instance type requirements vary by model size, with Nova Micro supporting g5.12xlarge and up, Nova Lite requiring g5.48xlarge minimum, and Nova 2 Lite needing p5.48xlarge instances.

56:47 Ryan – “It’s not an open-source model, and so it is kind of crazy that Nova offers that customization.”

GCP

57:25 Gemini 3 Deep Think: AI model update designed for science

Google has released a major update to Gemini 3 Deep Think, a specialized reasoning mode designed for complex scientific and engineering problems where data is messy or incomplete, and solutions aren’t straightforward.
The model achieved notable benchmark results, including 48.4% on Humanity’s Last Exam, 84.6% on ARC-AGI-2, and gold medal performance on the 2025 International Math, Physics, and Chemistry Olympiads.
Early adopters are using Deep Think for practical applications like identifying logical flaws in peer-reviewed mathematics papers, optimizing semiconductor crystal growth fabrication methods, and converting sketches into 3D-printable files with generated code.
The model combines deep scientific knowledge with engineering utility to move beyond theoretical work into applied research.
The updated Deep Think is available now to Google AI Ultra subscribers through the Gemini app, with pricing following the existing Ultra subscription model.
For the first time, Google is offering API access through an early access program for select researchers, engineers, and enterprises who can apply through a Google form.
The release targets scientific research institutions and engineering teams working on complex problems in physics, chemistry, materials science, and advanced mathematics, where traditional AI models struggle with ambiguous requirements.
Deep Think’s ability to work with incomplete data and generate executable code for physical modeling makes it particularly relevant for R&D workflows.

1:00:19 New global queries in BigQuery span data from multiple regions

BigQuery global queries now allow users to run a single SQL statement across datasets stored in multiple geographic regions without requiring ETL pipelines or data replication.
The feature automatically handles cross-region data movement in the background while respecting existing security controls like VPC Service Controls and requiring explicit opt-in at both the project and user level.
The primary use case targets multinational organizations that need to analyze distributed data for compliance or performance reasons, such as joining US customer data with European transaction logs and Asian operational data in one query.
EssilorLuxottica is using this to perform cross-region aggregated analysis while maintaining data residency requirements for security and compliance. (DOES IT THOUGH?)
Users maintain control over where queries execute and can specify the processing location to meet data residency requirements, though cross-region data transfers will incur additional egress costs that organizations need to factor into their analytics budgets.
The feature is currently in preview with documentation available here.
This addresses a longstanding limitation in cloud data warehousing, where geographic data distribution required complex engineering solutions, now replaced by standard SQL queries that any authorized analyst can run directly from the BigQuery console. The feature respects governance controls by default and prevents accidental data movement through required permissions and explicit enablement.

1:01:36 Matt – “I feel l ike it is compliant… if you’re running local and you’re not collecting anything that could be confidential. So it depends on how your lawyer at your company interprets it.”

Azure

1:03:47 Agentic cloud operations and Azure Copilot for AI‑driven workloads

Microsoft introduces agentic cloud operations through Azure Copilot, which uses AI agents to automate and coordinate cloud management tasks across the full infrastructure lifecycle. Instead of adding another dashboard, Azure Copilot provides a unified interface accessible through natural language, chat, console, or CLI that connects directly to a customer’s actual Azure environment, including subscriptions, resources, and policies.
Azure Copilot includes six specialized agents that handle migration discovery and dependency mapping, deployment with infrastructure-as-code generation, continuous observability across the full stack, cost and performance optimization with carbon impact analysis, resiliency management including ransomware protection, and troubleshooting with root cause diagnosis.
These agents work as a connected system rather than isolated tools, correlating signals and taking action within existing RBAC and policy controls.
The service maintains governance through built-in oversight features, including Bring Your Own Storage for conversation history, which keeps operational data within the customer’s Azure environment for compliance and sovereignty requirements.
All agent-initiated actions are reviewable, traceable, and auditable while respecting existing security policies and role-based access controls.
Target customers are organizations running modern applications and AI workloads at scale, where traditional manual operations cannot keep pace with rapid deployment cycles and infrastructure changes.
The approach addresses environments where workloads move from experimentation to production in weeks and where telemetry streams continuously from every layer of the stack.
Pricing details were not disclosed in the announcement, though the service builds on existing Azure Copilot capabilities introduced at Microsoft Ignite. Organizations can access resources and get started at azure.microsoft.com/products/copilot.

1:05:39 Matt – “Also, a developer actually understanding what they want and telling you what they want and actually being useful? I would love to see too, because how many times have we built something, deployed it, day before the release – we actually need these 16 other things that we didn’t tell you about that we manually did in our dev environment, which is why it’s working… and the release is tomorrow. Good luck. Why is it not done yet?”

1:06:18 General Availability: Instant access support for incremental snapshots of Azure Premium SSD v2 and Ultra Disk

Azure now offers instant access to incremental snapshots for Premium SSD v2 and Ultra Disk storage, eliminating the previous wait time when restoring disks from snapshots.
This addresses a significant operational pain point for customers running high-performance workloads that require rapid disaster recovery or quick environment provisioning.
The feature specifically targets enterprise customers using Azure’s highest-tier storage options, Premium SSD v2 and Ultra Disk, which are typically deployed for mission-critical databases, SAP HANA, and other latency-sensitive applications.
Previously, customers had to wait for snapshot data to fully hydrate before using restored disks, creating delays in recovery scenarios.
Incremental snapshots only capture changes since the last snapshot, reducing storage costs and backup windows compared to full snapshots.
With instant access now available, customers can immediately mount and use restored disks while background hydration completes, improving recovery time objectives for business continuity planning.
This capability brings Premium SSD v2 and Ultra Disk snapshot functionality closer to parity with standard Azure managed disk snapshots.
The feature is now generally available across Azure regions where Premium SSD v2 and Ultra Disk are supported, though specific pricing for snapshot storage follows existing Azure snapshot pricing models based on stored data volume.

1:06:25 Justin – “Welcome to what Amazon and Google have been doing for quite a while, so thanks, Azure!

Emerging Clouds

1:08:16 Introducing the Crusoe Cloud MCP server

Crusoe Cloud released an MCP server that connects AI coding assistants like Claude Code and Cursor directly to cloud infrastructure, but unlike typical API wrappers, it returns filtered responses designed specifically for LLM consumption to avoid flooding context windows with unnecessary data.
The server includes composite tools like get_resource_relationships that map entire infrastructure topologies in a single call by fetching 11 resource types in parallel and resolving cross-references, something that doesn’t exist in their CLI or any single API endpoint.
The cluster_health_check tool provides pre-analyzed node-level health metrics organized by InfiniBand pod placement, returning structured summaries with problem nodes flagged rather than raw metric time series that would require additional processing.
This approach addresses a key limitation of AI agents working with cloud infrastructure: most MCP implementations just wrap CLI commands and return the same JSON a human would see, forcing the AI to parse through irrelevant metadata and empty fields.
The implementation reflects a broader trend of cloud providers releasing MCP servers, but Crusoe’s focus on response filtering and burst-heavy access patterns specific to AI agents suggests infrastructure management tools are being redesigned around LLM capabilities rather than human interaction patterns. For developers already using AI coding assistants, this enables natural language infrastructure queries and troubleshooting without manual scripting or console navigation.

1:10:16 Ryan – “This is gonna be chaos.”

1:10:21 Introducing Markdown for Agents

Cloudflare now automatically converts HTML to markdown for AI agents using content negotiation headers, reducing token usage by up to 80 percent.
When agents request pages with Accept: text/markdown, Cloudflare’s network performs real-time conversion at the edge, eliminating the need for downstream processing and reducing costs for AI systems.
The feature addresses a fundamental inefficiency where AI agents waste tokens parsing HTML markup, navigation elements, and styling that have no semantic value.
A simple heading that costs 3 tokens in markdown can consume 12-15 tokens in HTML, and this blog post example shows 16,180 tokens in HTML versus 3,150 in markdown.
Cloudflare includes an x-markdown-tokens header with converted responses to help developers calculate context window sizes and chunking strategies. The service also automatically adds Content-Signal headers indicating the content can be used for AI training, search results, and agentic use, integrating with their Content Signals framework from Birthday Week.
The feature is available in beta at no cost for Pro, Business, and Enterprise plans, with Cloudflare already enabling it on their own blog and developer documentation.
Popular coding agents like Claude Code and OpenCode already send the appropriate accept headers, positioning this as infrastructure for the shift from traditional SEO to AI-driven content discovery.
Cloudflare Radar now tracks content type distribution for AI bot traffic, allowing analysis of how different agents consume web content over time. This data is accessible through public APIs and shows early adoption patterns like OAI-Searchbot requesting markdown content.

Closing

342: Eight Minutes to Midnight: When AI Helps Hackers Speed Run Your AWS Account

Wed, 18 Feb 2026 19:37:13 +0000

Welcome to episode 342 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are in the studio today to bring you all the latest in cloud and AI news this week. How do you feel about ads? How do you feel about ads while using AI? We’ve got options! We’ve got a round-up of tech Super Bowl ads, AI ads, Earnings reports (who frankly need the ad revenue), and a plethora of Opus 4.6 announcements, plus more. Let’s get started!

Titles we almost went with this week

ChatGPT Goes Full Mad Men: Your AI Assistant Now Comes With Commercial Breaks
Heroku’s New Feature: No New Features
AWS Gives EC2 Instances a Storage Growth Spurt: 22.8TB of Local NVMe Now Available
Identity Crisis Averted: IAM Identity Center Learns to Replicate Itself
JSON Schema Enforcement: Because Your LLM Needs Structure in Its Life
From Zero to Admin in 480 Seconds: A Serbian Speedrun Story
From Proof of Concept to Proof of Claw: DigitalOcean Tames AI Agent Infrastructure
Azure’s Growth Hits the Clouds: Microsoft’s 39% Increase Still Not Enough for Wall Street
One Lake to Rule Them All: Microsoft and Snowflake Finally Stop Fighting Over Your Data
Free Lunch Officially Over: ChatGPT Learns That Servers Cost Money
Claude Won’t Sell You Anything (Except Maybe Peace of Mind)
IAM Identity Center Goes Multi-Regional: Because One Region to Rule Them All Wasn’t Enough
Databricks Takes the Base Out of Database with Lakebase GA
I’m a Chrome Tab hoarder

General News

01:30 Superbowl Ads of Note

OpenAI: https://www.youtube.com/watch?v=aCN9iCXNJqQ
Microsoft CoPilot: https://www.youtube.com/watch?v=Ndj9Jk-tGKo
Base44?: https://www.youtube.com/watch?v=iKEUWtqvsis
Gemini: https://www.youtube.com/watch?v=Z1yGy9fELtE
Anthropic: https://www.youtube.com/watch?v=gmnjDLwZckA
ai.com: https://www.youtube.com/watch?v=n7I-D4YXbzg&t=3s

16:35 Justin -If you ever want to knowif there’s a bubble, spending dumb money on the Super Bowl on an ad that makes no sense is probably your number one clue.”

16:53 It’s Earnings Time!

Microsoft (MSFT) Q2 earnings report 2026

Microsoft Q2 2026 earnings show Azure cloud growth slowing to 39% from 40% in the prior quarter, missing analyst expectations of 39.4% and causing shares to drop 7% in after-hours trading.
The company’s gross margin hit a three-year low at 68% due to substantial AI infrastructure investments totaling $37.5 billion in capital expenditures, up 66% year over year.
OpenAI now represents 45% of Microsoft’s $625 billion remaining commercial performance obligation after the company committed to a $250 billion cloud services deal during the quarter.
This concentration raises questions about revenue dependence on a single customer, though Microsoft maintains that the remaining backlog is still larger and more diversified than most competitors, with 28% growth.
Microsoft 365 Copilot adoption reached 15 million seats out of 450 million total paid commercial seats, representing only 3.3% penetration.
The company plans to raise prices on commercial Office subscriptions in July to help offset AI infrastructure costs and improve margins, while Q3 guidance projects Azure growth of 37-38% at constant currency.
The More Personal Computing segment declined 3%, with gaming revenue down 9.5% due to an unspecified impairment charge, reflecting ongoing challenges in the Xbox division.
Microsoft added nearly one gigawatt of data center capacity in the quarter alone, but continues to face supply constraints that cannot keep pace with customer demand for AI services.

20:27 Alphabet (GOOGL) Q4 2025 earnings

Alphabet plans to spend between $175 billion and $185 billion on capital expenditures in 2026, more than double its 2025 spending, primarily targeting AI compute capacity for DeepMind and meeting cloud customer demand.
This represents one of the largest infrastructure investments in tech history and signals the scale of resources required to compete in enterprise AI.
Google Cloud revenue grew 48% year-over-year to $17.66 billion and beat analyst expectations, with backlog reaching $240 billion after increasing 55% sequentially.
The cloud division’s performance demonstrates strong enterprise adoption of Google’s AI services and positions it as a more competitive alternative to AWS and Azure.
Gemini AI now has 750 million monthly active users, up from 650 million last quarter, while Google reduced Gemini serving costs by 78% throughout 2025 through model optimizations and efficiency improvements.
- This cost reduction is critical for maintaining profitability as AI services scale to hundreds of millions of users.
YouTube advertising revenue of $11.38 billion missed analyst expectations of $11.84 billion, which Alphabet attributed to difficult year-over-year comparisons against strong US election spending in Q4 2024.
- This shortfall highlights how political advertising cycles create volatility in digital ad revenue forecasting.
Waymo recorded a $2.1 billion stock-based compensation charge following its $16 billion valuation fundraising round, contributing to Other Bets losses exceeding $3.6 billion despite serving 15 million autonomous rides across six US markets.
- The charge reflects the high cost of retaining talent in competitive autonomous vehicle development.

22:05 Justin – “Gemini adoption must be ramping up much faster than I realized, because the fact that Microsoft was missing on earnings, and they’re the OpenAI provider for the most part… makes me question how well OpenAI is actually doing.”

22:50 AWS Q4 earnings report 2025

AWS Q4 2025 revenue reached $35.58 billion with 24% year-over-year growth, maintaining its market leadership position, while operating margins improved to 35%.
The cloud unit now represents 17% of Amazon’s total revenue but generates the majority of the company’s profits at $12.47 billion in operating income.
Amazon plans to invest $200 billion in capital expenditures for 2026, primarily for AWS infrastructure, which significantly exceeds analyst expectations of $148.86 billion.
The company added 4 gigawatts of computing capacity in 2025 and plans to double that by the end of 2027, with most investment directed toward AI workloads rather than traditional cloud services.
AWS growth rate of 24% trails competitors Google Cloud at 48% and Azure at 39%, suggesting potential market share shifts in AI-driven cloud services. Both competitors are reporting stronger growth attributed to artificial intelligence workloads, which may indicate AWS is losing ground in the AI infrastructure race despite its overall market leadership.
The company secured a $38 billion spending commitment from OpenAI and launched Nova Forge for advanced AI model customization at $100,000 annually.
These moves demonstrate AWS’s strategy to compete in the generative AI training market, though the pricing and approach differ from competitors’ offerings.
Capital expenditure guidance reveals that non-AI workloads are growing faster than anticipated, requiring additional infrastructure investment beyond AI capacity.
This indicates traditional cloud computing demand remains strong and may be underestimated in current market analysis focused primarily on AI growth.

25:11 Capex Growth By Quarter

24:14 Justin – “They also took a major write-off on Amazon Fresh, because they’re shutting that down as well. So just bad, bad all the way around for Amazon.”

29:23 An Update on Heroku

Heroku is moving to a sustaining engineering model, meaning no new features will be developed while the platform continues to receive security patches, stability updates, and operational support.
This represents a shift from active development to maintenance mode for the 15-year-old platform-as-a-service.
Existing customers can continue using Heroku with no changes to pricing, billing, or service levels, and all core functionality, including applications, pipelines, teams, and add-ons, remains fully operational.
Credit card-based accounts remain available for both current and new customers through the dashboard.
Salesforce is ending new Enterprise Account contracts while honoring existing enterprise subscriptions and support agreements through their renewal periods. This signals a strategic pivot away from enterprise sales expansion while maintaining commitments to current large customers.
The parent company is redirecting engineering resources toward enterprise AI capabilities rather than continuing platform-as-a-service innovation. This follows a pattern of Salesforce deprioritizing Heroku since the acquisition, including the 2022 elimination of free tiers and reduced feature velocity in recent years.
Developers relying on Heroku for production workloads should evaluate long-term platform viability given the maintenance-only status, though no immediate migration is required.
The announcement provides clarity for capacity planning but raises questions about the platform’s competitiveness as cloud-native alternatives continue advancing.

31:32 Matt – “It’s a great platform as a service, and I’m sad to see it go, because there’s a lot of companies I’ve worked with in the past that have started there because it was just so easy. The problem for them, at least back in the day, was scaling and supporting and having a lot of other features, which meant I helped a lot of customers move from Heroku to AWS to gain other aspects of the platform that they needed. So it doesn’t really surprise me, but it was a good starting point for a lot of companies.”

35:58 AI-assisted cloud intrusion achieves admin access in 8 minutes | Sysdig

An attacker achieved full AWS administrative access in just 8 minutes by exploiting credentials found in public S3 buckets, then used Lambda code injection to escalate privileges.
The attack shows strong evidence of LLM assistance, including Serbian-language code comments, hallucinated AWS account IDs, and references to non-existent GitHub repositories.
The threat actor compromised 19 different AWS principals through role chaining and cross-account access attempts, making detection difficult by distributing operations across multiple identities. They specifically targeted AI infrastructure by invoking 9 different Bedrock models and attempting to launch expensive GPU instances (p5.48xlarge and p4d.24xlarge) for potential model training or compute resale.
The attack demonstrates how AI tools are accelerating offensive operations, with the attacker completing reconnaissance, privilege escalation, and resource abuse in under two hours.
Organizations should implement least-privilege IAM policies, restrict Lambda UpdateFunctionCode permissions, and enable Bedrock model invocation logging to detect similar attacks.
Critical security gaps included overly permissive Lambda execution roles with administrative access and the ReadOnlyAccess policy on the compromised user, which enabled extensive reconnaissance across all AWS services.
The attacker also attempted to deploy a Terraform-based backdoor that would create a publicly accessible Lambda function for generating persistent Bedrock credentials.
The use of IP rotation, role chaining, and distributed operations across multiple principals shows sophisticated evasion techniques.
Detection requires behavioral analytics that can identify patterns like rapid enumeration across services, unusual Bedrock model invocations, and Lambda code modifications rather than relying on single-event alerts.

34:24 Ryan – “These are the types of examples I use when trying to talk to people about least privileged development and how, even in your lower environments where you think you’re safe, and you’re trying to develop things it’s really not okay to start not using least privileged access because there’s very creative ways in which you can do privilege escalation – this lambda attack is a very good example. And now it’s going to be so easy because AI will just do it for you, and this really demonstrates it.”

AI Is Going Great – Or How ML Makes Money

37:09 Claude is a space to think | Anthropic \ Anthropic

Anthropic commits to keeping Claude ad-free, stating that advertising would be incompatible with Claude’s role as a trusted assistant for work and deep thinking.
The company will continue its subscription and enterprise-based revenue model rather than introducing sponsored content or product placements in conversations.
Analysis of Claude conversations shows a substantial portion involves sensitive personal topics or complex technical work where ads would be inappropriate. Anthropic argues that AI conversations differ from search or social media because users share more context, and the open-ended format makes them more susceptible to commercial influence.
The company identifies specific risks with ad-supported AI models, including unpredictable behavior changes when advertising incentives are introduced. For example, a user asking about sleep problems might receive recommendations influenced by commercial motives rather than purely helpful advice, making it difficult to distinguish genuine assistance from monetization attempts.
Anthropic will support commerce through user-initiated interactions like agentic commerce, where Claude handles purchases on behalf of users, and third-party tool integrations with services like Figma and Asana.
The key distinction is that these features are triggered by user requests rather than advertiser interests.
The decision has clear tradeoffs for business model scalability compared to ad-supported competitors.
Anthropic is addressing access through educational partnerships in 60+ countries, nonprofit discounts, and maintaining frontier-level intelligence in free tiers rather than monetizing user attention.

37:22 Claude Opus 4.6 \ Anthropic

Claude Opus 4.6 is now generally available with a 1M token context window in beta, marking the first time an Opus-class model has offered this extended context capability.
The model maintains $5/$25 per million token pricing, with premium pricing of $10/$37.50 for prompts exceeding 200k tokens.
The model introduces adaptive thinking and four effort levels (low, medium, high, max) that let developers control how deeply Claude reasons through problems, balancing intelligence against speed and cost. Context compaction automatically summarizes older conversation history when approaching limits, enabling longer-running agentic tasks without hitting context windows.
Opus 4.6 achieves state-of-the-art performance on Terminal-Bench 2.0 for agentic coding and outperforms GPT-5.2 by 144 Elo points on GDPval-AA, an evaluation of economically valuable knowledge work tasks.
On the 8-needle 1M variant of MRCR v2, it scores 76% compared to Sonnet 4.5’s 18.5%, demonstrating substantially improved long-context retrieval without degradation.
New product features include agent teams in Claude Code that work in parallel and coordinate autonomously, plus Claude in PowerPoint (research preview) and upgraded Claude in Excel for handling multi-step data processing and presentation tasks. The model also supports 128k output tokens and US-only inference at 1.1x pricing for compliance-sensitive workloads.
Safety evaluations show Opus 4.6 maintains alignment comparable to its predecessor while exhibiting the lowest over-refusal rate of any recent Claude model.
Anthropic developed six new cybersecurity probes to monitor potential misuse given the model’s enhanced security capabilities, and is using the model to find and patch vulnerabilities in open-source software.

34:24 Ryan – “One of the things that I’m constantly dabbling with is the context windows, and so I’m not so sure the context compaction works the way it’s advertised, because every time I go through a process like that, you lose so much.”

43:18 Introducing OpenAI Frontier | OpenAI

OpenAI launches Frontier, an enterprise platform for building, deploying, and managing AI agents across existing infrastructure without requiring replatforming.
The platform provides agents with shared business context by connecting siloed data warehouses, CRM systems, and internal applications, plus includes identity management, permissions, and governance controls for regulated environments.
Frontier includes an agent execution environment where AI coworkers can reason over data, work with files, run code, and use tools while building memory from past interactions to improve performance.
The platform works across local environments, enterprise cloud infrastructure, and OpenAI-hosted runtimes, with built-in evaluation and optimization capabilities to help agents learn what good performance looks like over time.
OpenAI pairs Forward Deployed Engineers with customer teams to help develop best practices for production agent deployments, creating a feedback loop between business problems, deployment, and OpenAI Research. Early adopters include HP, Intuit, Oracle, State Farm, Thermo Fisher, and Uber, with existing customers like BBVA, Cisco, and T-Mobile piloting the platform.
The platform uses open standards to integrate with existing systems and applications, allowing third-party agent apps to access shared business context without lengthy custom integrations. OpenAI is working with Frontier Partners including Abridge, Clay, Ambience, Decagon, Harvey, and Sierra, to design and support enterprise AI solutions on the platform.
Frontier is currently available to a limited set of customers with broader availability planned over the next few months.
OpenAI cites customer results, including a manufacturer reducing production optimization from six weeks to one day and a hardware company cutting test failure debugging from four hours to minutes.

44:35 Ryan – “I think they’re extremely late to the market with this. AWS was too early, and they botched it. Gemini seems to be in the sweet spot, and OpenAI – it’s still not ready yet.

46:28 Introducing GPT-5.3-Codex | OpenAI

OpenAI released GPT-5.3-Codex, their most capable agentic coding model that combines the frontier coding performance of GPT-5.2-Codex with the reasoning capabilities of GPT-5.2, while running 25% faster.
The model achieves state-of-the-art results on SWE-Bench Pro and Terminal-Bench 2.0 benchmarks, using fewer tokens than previous models, and can autonomously iterate on complex projects over millions of tokens spanning days.
GPT-5.3-Codex represents the first self-improving model at OpenAI, where the Codex team used early versions to debug its own training, manage deployment, and diagnose test results.
Internal teams report their work has fundamentally changed in the past two months, with researchers using Codex to monitor training runs, engineers using it to optimize harnesses and scale GPU clusters, and data scientists building custom pipelines and visualizations in under three minutes.
The model extends beyond code generation to full computer operation, showing strong performance on OSWorld (visual desktop environment tasks) and matching GPT-5.2 on GDPval, which measures knowledge work across 44 occupations, including presentations, spreadsheets, and other professional deliverables.
The Codex app now provides real-time updates and interactive steering, allowing users to direct and supervise multiple agents working in parallel.
OpenAI classifies GPT-5.3-Codex as having high capability for cybersecurity under their Preparedness Framework, marking the first model directly trained to identify software vulnerabilities.
They are deploying Trusted Access for Cyber, expanding the Aardvark security research agent beta, and committing 10 million dollars in API credits through their Cybersecurity Grant Program for open source and critical infrastructure defense.
GPT-5.3-Codex is available now with paid ChatGPT plans across the Codex app, CLI, IDE extension, and web, with API access coming soon.
The model was co-designed for and trained on NVIDIA GB200 NVL72 systems, with infrastructure improvements delivering the 25% speed increase for all Codex users.

47:48 Ryan – “I’m surprised this is the first self-improving model.”

48:43 Testing ads in ChatGPT | OpenAI

OpenAI is launching ads in ChatGPT for free and Go tier users in the US, while Plus, Pro, Business, Enterprise, and Education subscribers remain ad-free. Users can opt out of ads on the free tier in exchange for reduced daily message limits.
Ads are contextually matched to conversation topics and chat history but do not influence ChatGPT responses, which remain independent. Advertisers receive only aggregate performance metrics like views and clicks, with no access to individual chats, memories, or personal details.
The ad program excludes users under 18 and blocks ads near sensitive topics, including health, mental health, and politics. Users can dismiss ads, provide feedback, delete ad data with one tap, and manage personalization settings at any time.
OpenAI positions this as infrastructure funding to maintain free tier performance and quality while supporting development of more powerful features.
The company plans to expand ad formats, objectives, and buying models over time based on test results and user feedback.

49:45 Announcing Claude Opus 4.6 on Snowflake Cortex AI

Snowflake Cortex AI now offers Claude Opus 4.6, Anthropic’s most capable model, providing enhanced reasoning and complex task handling directly within Snowflake’s data platform.
This integration allows enterprises to process sensitive data without moving it outside their Snowflake environment, maintaining data governance and security controls.
Claude Opus 4.6 delivers improved performance on coding tasks, mathematical reasoning, and multilingual capabilities compared to previous versions. The model excels at nuanced instructions and can handle sophisticated analysis workflows while operating on structured and unstructured data within Snowflake.
Cortex AI’s serverless architecture means customers pay only for actual model usage without managing infrastructure or dealing with capacity planning.
The integration supports both SQL and Python interfaces, enabling data teams to build AI applications using familiar tools and existing Snowflake data pipelines.
Organizations can now combine Claude Opus 4.6 with Snowflake’s data clean rooms and governance features for compliant AI deployments in regulated industries.
This addresses enterprise concerns about data residency and privacy while enabling advanced AI capabilities on proprietary datasets.

49:57 Justin – “And just because we’re already 50 minutes into this, I will tell you we’re also getting Claude Opus 4.6 on multiple other providers, including Bedrock, Kiro, Vertex AI, and we’re getting it on Azure, in the Moicrosift Foundry App, as well as some of the smaller cloud providers, like DataBricks and DigitalOcean.”

50:45 Agent Bricks Supervisor Agent is Now GA: Orchestrate Enterprise Agents | Databricks Blog

Databricks Agent Bricks Supervisor Agent is now Generally Available, providing a managed orchestration layer that coordinates multiple specialized agents through Unity Catalog governance.
The supervisor uses dynamic routing to analyze user intent and delegate tasks between Genie Spaces for structured data queries, Knowledge Assistant agents for unstructured data, and MCP servers for tool execution.
The platform implements On-Behalf-Of authentication where the supervisor acts as a transparent proxy, validating every data fetch and tool execution against the end user’s existing Unity Catalog permissions.
This eliminates the common security gap where agents access data through broad service accounts that users themselves aren’t authorized to see.
Agent Learning on Human Feedback is built directly into the Supervisor Agent, allowing teams to add questions and guidelines that improve routing decisions and response quality over time.
Franklin Templeton reports reducing fund analysis tasks from days to seconds while maintaining compliance, and Zapier uses ALHF to refine orchestration between different Genie spaces without hard-coding routing logic.
The system addresses enterprise agent sprawl, where teams toggle between dozens of specialized bots and duplicate work by creating agents that already exist.
Supervisor Agent provides a single entry point that reasons about intent and coordinates specialized agents while maintaining full MLflow experiment tracking for measurable performance monitoring.

51:40 Ryan – “It just goes to show you, depending on who your provider is, this is the type of platform you’re going to need, right? So if you already are using a whole bunch of AI execution on Snowflake, or if you’re only using it on OpenAI’s platform, you’re just going to need to sign on to the platform that’s already there.”

Cloud Tools

52:09 Introducing HashiCorp Agent Skills

HashiCorp launches Agent Skills, an open-standard repository that packages domain expertise into portable instructions for AI assistants working with Terraform and Packer.
These skills provide AI tools like Claude with specialized HashiCorp product knowledge, schema definitions, and best practices to reduce hallucinations and ensure code follows proper conventions.
The initial skills pack addresses common DevOps challenges, including building and maintaining Terraform providers, generating style-compliant Terraform code, refactoring monolithic configurations into modules, and creating machine images with Packer across AWS, Azure, and Windows.
HashiCorp partnered with Tessl to evaluate skill effectiveness using review and task-based evaluations against Anthropic’s best practices.
Agent Skills differ from Model Context Protocol (MCP) as complementary technologies – MCP is the data pipe connecting information to AI, while Agent Skills are the knowledge textbooks. Installation takes seconds using npx, Tessl CLI, or Claude Code’s plugin marketplace with simple one-line commands.
The skills solve a fundamental problem where AI assistants lack a specific technical context for complex infrastructure tasks, particularly around HashiCorp’s plugin framework architectures and coding conventions.
This prevents AI from suggesting outdated practices or generating code that doesn’t follow established patterns from official documentation.
HashiCorp plans to expand beyond Terraform and Packer to cover additional products and welcomes community contributions through its GitHub repository.
The open-standard format means these skills are portable and reusable across different AI assistants that support the Agent Skills specification.

53:17 Justin – “I love this, because how many times I pointed Claude or others to the documentation, and said ‘I’m pretty sure you’re wrong, this is how it’s supposed to be done, here’s the doc.’ And it comes back and goes, you’re right, Justin, because you’re a genius. That’s what it always tells me.”

AWS

56:10 Amazon EC2 C8id, M8id, and R8id instances with up to 22.8 TB local NVMe storage are generally available

In “instances so big we don’t know what to do with them,” may we present…
AWS launches C8id, M8id, and R8id EC2 instances with up to 22.8TB of local NVMe storage, triple the capacity of sixth-generation instances.
These new instances scale up to 96xlarge with 384 vCPUs and 3TiB of memory, delivering up to 43% higher compute performance and 3.3x more memory bandwidth than previous generation instances.
The instances use custom Intel Xeon 6 processors exclusive to AWS, running at a 3.9 GHz sustained all-core turbo frequency. Performance improvements include up to 46% better I/O intensive database workload performance and 30% faster query results for real-time data analytics compared to sixth-generation instances.
Instance Bandwidth Configuration feature allows customers to dynamically allocate resources between network and EBS bandwidth by 25%, optimizing for specific workload requirements.
The local NVMe storage is hardware-encrypted with XTS-AES-256 and ephemeral, meaning data is lost when instances stop or terminate.
Currently available in US East N. Virginia, US East, Ohio, US West, Oregon, and Europe, Frankfurt regions, with additional regions planned.
Instances can be purchased as On-Demand, Savings Plans, Spot Instances, Dedicated Instances, or Dedicated Hosts, with pricing varying by region and purchase model.

56:47 Matt – “If it’s all core turbo, is it really turbo at that point?”

58:45 AWS IAM Identity Center now supports multi-Region replication for AWS account access and application use

AWS IAM Identity Center now supports multi-Region replication, allowing organizations to replicate workforce identities, permission sets, and metadata from a primary Region to additional Regions for improved resiliency and disaster recovery.
This means if the primary Region experiences a service disruption, users can still access AWS accounts through an active access portal endpoint in a secondary Region using their existing permissions.
The feature requires using an organization instance of IAM Identity Center connected to an external IdP like Microsoft Entra ID or Okta, and you must first configure multi-Region customer-managed KMS keys before replicating to additional Regions.
The primary Region remains the central management point for all configurations, while additional Regions provide read-only console access except for application management and user session revocation.
Organizations can now deploy AWS managed applications closer to users and datasets to meet data residency requirements or improve performance, with applications accessing replicated workforce identities locally in each Region. This addresses compliance scenarios where datasets must remain in specific Regions while still providing centralized identity management.
The feature is available at no additional cost in 17 enabled-by-default commercial AWS Regions, with only standard AWS KMS charges applying for customer-managed keys.
All workforce actions are logged in CloudTrail in the Region where they occur, maintaining audit trails across multiple Regions for security and compliance monitoring.

59:32 Justin – “I recently set up IAM Identity Center for the first time, and I was surprised that it was US East 1 only, so I’m pleased to see this is now available.”

1:00:25 Amazon ECS adds Network Load Balancer support for Linear and Canary deployments

ECS now supports linear and canary deployment strategies natively with Network Load Balancers, bringing managed traffic shifting to TCP/UDP workloads that previously required custom solutions or third-party tools.
This fills a deployment gap for applications needing NLB features like static IPs, long-lived connections, and low latency.
The feature integrates with CloudWatch alarms for automatic rollback if deployment issues are detected, providing safety guardrails for production updates.
Teams can shift traffic incrementally (linear) or start with a small percentage for validation (canary) before completing rollouts.
Primary beneficiaries are latency-sensitive and connection-oriented workloads such as online gaming backends, financial transaction systems, and real-time messaging services that depend on NLB’s Layer 4 capabilities.
These applications can now use the same deployment patterns ALB users have had access to for years.
Available immediately in all AWS commercial and GovCloud US regions for both new and existing ECS services.
Configuration is accessible through the AWS Console, CLI, and Infrastructure-as-Code tools with no additional cost beyond standard ECS and NLB pricing.
This brings ECS deployment parity between ALB and NLB, eliminating a common pain point.

1:01:19 Ryan – “This is one of those rough edges that you hit unexpectedly. You want to use a network load balancer, typically because you have to. It’s easier to set up an application load balancer. You’re only using a network load balancer when it’s not your choice, but then you can’t deploy this app safely without lots of interruption or risk, and it’s kind of a problem.”

1:02:12 Structured outputs now available in Amazon Bedrock

Amazon Bedrock now enforces JSON schema compliance at the model level, eliminating the need for custom validation logic and retry mechanisms when extracting structured data from foundation models.
This addresses a common production pain point where formatting errors in LLM responses break downstream API integrations and automated workflows.
The feature works in two modes: custom JSON schema definitions for response formatting, or strict tool definitions that ensure model tool calls match exact specifications.
This reduces operational overhead by preventing malformed outputs before they reach application code, making AI integrations more reliable for production use cases like data extraction, form processing, and API orchestration.
Available now for Anthropic Claude 4.5 models and select open-weight models across all commercial AWS Regions where Bedrock operates.
The capability works with Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream APIs, providing flexibility for both synchronous and streaming applications.
The practical benefit is fewer failed requests and reduced engineering time spent on output parsing and error handling.
Organizations building production AI applications that feed into existing systems or databases can now rely on consistent, machine-readable responses without building extensive validation layers.
int where teams had to choose between advanced deployment strategies and NLB’s technical requirements. Documentation available at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/deployment-type-linear.html

FYI Claude Opus 4.6 now available in Amazon Bedrock

Claude Opus 4.6 is now available in Amazon Bedrock, positioning itself as Anthropic’s most capable model with particular strength in coding, agentic workflows, and enterprise applications.
The model supports both 200K and 1M context windows in preview, enabling analysis of large codebases and extensive document sets without chunking.
The model’s agentic capabilities allow it to manage complex multi-step tasks across dozens of tools with reduced oversight, including the ability to autonomously spin up subagents for task decomposition.
This makes it suitable for enterprise workflows like financial analysis that would typically require days of manual work, cybersecurity threat detection, and cross-application data movement.
For developers, Opus 4.6 handles full software lifecycle management from requirements gathering through implementation and maintenance, particularly for long-horizon projects and large-scale codebases.
The model’s deep reasoning capabilities make it applicable to professional work requiring sophisticated multi-step orchestration.
Regional availability varies by deployment, with specific regions listed in the AWS Bedrock documentation. Pricing follows Bedrock’s standard model-based pricing structure, though specific costs for Opus 4.6 are not detailed in the announcement and should be verified in the Bedrock console.

FYI Opus 4.6 is now available in Kiro

Kiro has released Claude Opus 4.6 integration in their IDE and CLI, marking Anthropic’s newest state-of-the-art model that claims to be the world’s best for coding.
The model is available to Kiro Pro, Pro+, and Power customers in AWS US-East-1 region with a 2.2x credit multiplier, same as Opus 4.5.
Opus 4.6 targets production code and sophisticated agents, with particular strength in large-scale codebases and long-horizon projects.
Anthropic positions it as capable of helping senior engineers complete multi-day projects in hours through task delegation with reduced oversight requirements.
The model integrates with Kiro’s spec-driven development workflows, enabling detailed but precise specifications on large existing projects and surgical precision updates with minimal user input.
This represents a shift toward AI-assisted development at enterprise scale rather than simple code completion.
Access requires authentication through Google, GitHub, AWS BuilderID, or AWS IAM Identity Center, with experimental support currently limited to the Northern Virginia region. Users can access the model immediately by downloading or restarting the Kiro app or CLI.

1:03:55 Amazon Redshift now supports allocating extra compute for automatic optimizations

Amazon Redshift now allows database administrators to allocate dedicated compute resources specifically for automatic optimization tasks like table optimization, sorting, vacuuming, and analysis.
This prevents maintenance operations from competing with user queries during peak usage periods, addressing a common pain point where DBAs had to manually schedule these tasks during off-hours.
The feature includes cost controls for provisioned clusters, letting administrators cap the amount of extra compute resources that autonomics can consume. This prevents runaway costs while still enabling continuous optimization, and works alongside the new SYS_AUTOMATIC_OPTIMIZATION system table that provides visibility into what optimization operations are running and their resource consumption.
This enhancement is available across all AWS Regions where Redshift operates, supporting both provisioned clusters and serverless workgroups.
The feature essentially decouples database maintenance from query performance, which is particularly valuable for organizations running 24/7 analytics workloads that previously had no maintenance windows.
The practical benefit is that Redshift databases can now stay optimized continuously without manual intervention or performance degradation during business hours.
Organizations with high-concurrency analytics workloads or those operating across multiple time zones will see the most immediate value from this capability.

1:04:35 Justin – “This is why I wanted a managed service from you, Amazon, so I didn’t have to think about this. This is you failing me.”

GCP

1:05:26 Introducing the Developer Knowledge API and MCP Server

Google launches the Developer Knowledge API and Model Context Protocol server to provide AI assistants with programmatic access to official Google developer documentation as machine-readable Markdown.
This addresses the problem of LLMs relying on outdated training data or web scraping when helping developers build with Google technologies like Firebase, Android, and Google Cloud.
The MCP server implements the open Model Context Protocol standard, allowing popular AI assistants and IDEs to directly query Google’s documentation for real-time answers about API changes, code examples, and best practices. Developers can enable it through gcloud CLI and configure it in their AI assistant settings, with support for tools like Claude Desktop and various IDE extensions.
The service is currently in public preview with free access through standard Google Cloud API quotas.
Future plans include adding structured content support for code samples and API reference entities, expanding the documentation corpus, and reducing re-indexing latency before general availability.
This integration benefits developers using AI coding assistants by ensuring responses reference current Google documentation rather than potentially stale information from model training cutoffs. The approach provides a canonical source of truth that updates as Google’s documentation changes.
The Developer Knowledge API requires a Google Cloud project with the API enabled through gcloud beta services, and detailed configuration instructions are available in the official documentation at developers.google.com/knowledge/api and developers.google.com/knowledge/mcp.

1:04:35 Ryan – “This won’t fix the fact that Google documentation is awful, but this will make it at least better.”

1:12:17 Delivering a secure, open, and sovereign digital world

Google Cloud expands its Sovereign Cloud portfolio with three tiers – Data Boundary, Dedicated, and Air-Gapped – designed to meet varying data sovereignty requirements.
Air-Gapped operates completely disconnected from Google Cloud and the internet, with no remote access possible by Google, while Dedicated allows partners to monitor and block updates with up to 12 months of independent operation if disconnected.
The company announces substantial infrastructure investments across all continents, including new cloud regions in Thailand, Malaysia, and Sweden, plus subsea cables like TalayLink and Dhivaru for Asia-Pacific connectivity.
Google commits to legal resistance against government shutdown orders and will enable qualified third parties to operate Google Cloud using Google’s code if Google becomes unable to continue operations.
External Key Management lets customers store encryption keys outside Google Cloud with detailed access justifications required, while client-side encryption for Workspace ensures Google cannot read customer collaboration data.
Google eliminated data transfer fees for customers migrating off the platform and expanded local ML processing for select Gemini models to 11 countries, including Australia, Brazil, Canada, France, Germany, India, Japan, Singapore, South Korea, and the UK.
Notable sovereign cloud deployments include NATO Communication and Information Agency, German Armed Forces, UK Ministry of Defence, and Singapore government agencies using Air-Gapped, while France’s S3NS offers Premi3NS built on Dedicated with SecNumCloud 3.2 qualification from ANSSI.
The portfolio targets highly regulated sectors like defense, government, banking, and healthcare, requiring strict data residency and operational independence guarantees.

FYI Expanding Vertex AI with Claude Opus 4.6.

Google Cloud adds Anthropic’s Claude Opus 4.6 to Vertex AI, positioning it as its most powerful model for enterprise workflows, including document generation, financial analysis, and complex coding tasks.
The model excels at multi-step agentic workflows and can handle tasks like creating production-ready spreadsheets and presentations with fewer revision cycles, particularly valuable for finance and legal verticals requiring precision.
Vertex AI provides a complete agentic stack beyond just model access, including Agent Development Kit for rapid prototyping, Agent Engine for serverless deployment, and Memory Bank for persistent context across interactions.
Cost optimization features include provisioned throughput for fixed pricing, prompt caching with flexible TTL, batch predictions, and a 1M token context window in preview for Claude Opus 4.6.
The platform integrates with Google Cloud’s security infrastructure, including Model Armor for protection against prompt injection and tool poisoning, plus Security Command Center for AI threat detection.
Customer implementations show practical results, with Palo Alto Networks reporting a 20-30% increase in code development velocity and companies like Shopify, TELUS, and Replit using Claude on Vertex AI for production workloads.
Claude Opus 4.6 is generally available on Vertex AI with deployment options through Google Cloud Marketplace for streamlined procurement. Regional availability and specific pricing details are documented at cloud.google.com/vertex-ai/generative-ai/pricing#claude-models, with the model accessible through the Vertex AI console and sample notebooks available on GitHub.

1:13:15 GEAR program now available

Google launches GEAR (Gemini Enterprise Agent Ready) as a specialized learning program within the Google Developer Program to help developers build production-ready AI agents.
The program provides 35 monthly learning credits on the Google Skills platform for sandbox testing and lab access at no cost to participants.
The program offers two main learning paths: Introduction to Agents for understanding agent architecture and integration with Gemini Enterprise, and Develop Agents with Agent Development Kit (ADK) for building agents with reasoning loops.
Both paths focus on moving developers from experimentation to production-grade implementations using Google’s open-source ADK.
GEAR includes a credential system with completion badges on Google Developer profiles and skill badges for intermediate and advanced expertise.
For Google Cloud customers, a separate Get Certified cohort-based program offers instructor-led training and technical mentorship to prepare for industry-recognized certifications.
The program addresses the shift toward agentic AI, where software can reason, plan, and execute complex workflows autonomously. Access requires creating or signing into a Google Developer Program profile and claiming the GEAR badge at developers.google.com/program/gear.

1:14:27 Ryan – “I still think there’s a very large amount of people who don’t really understand sort of putting an agentic workflow in place to do what they want, right? It’s still pretty much fire-and-forget chat operations. And so there’s a lot of power in the tool once you know how to use it, but it is sort of less than straightforward, so I think this is a great course.”

Azure

1:15:26 Updates in two of our core priorities

Microsoft announces major security leadership change with Hayete Gallot returning as EVP of Security, reporting directly to CEO Satya Nadella, while Charlie Bell transitions from leading security to focus on engineering quality as an individual contributor.
This organizational shift reflects Microsoft’s continued emphasis on security as a top priority following recent Security Copilot and Purview adoption momentum.
Gallot brings 15-plus years of Microsoft experience building Windows and Office franchises, plus recent Google Cloud customer experience leadership, positioning her to connect product development with customer value realization across Microsoft’s security portfolio.
Her appointment comes as Microsoft integrates security into its new commercial cohorts operating model announced during recent earnings.
Charlie Bell’s move from organizational leadership to an individual contributor engineering role is notable for a senior executive, with his new focus on Quality Excellence Initiative to improve engineering standards and product durability across Microsoft’s global scale operations. He will partner with Azure leadership, including Scott Guthrie, on quality improvements.
Ales Holecek takes on the Chief Architect for Security role to bring platform architecture expertise to security products and connect them with Microsoft’s existing scale businesses and the Agent Platform. This architectural focus suggests deeper integration between security services and Microsoft’s broader cloud infrastructure.
The timing aligns with Microsoft’s recent earnings report, highlighting security business growth and the company’s broader reorganization around commercial cohorts, indicating security will have dedicated product development rhythms separate from other business units. No specific pricing or feature changes were announced as part of this leadership transition.

1:17:19 Justin – “I think this is them recreating the engineering operations review at Amazon at Azure. I think he is basically building a weekly program team that is going to be running the wheel, if you’re familiar with Amazon’s wheel thing, where basically you – as a service owner – can be called on at any time and you have to deep dive into all your KPIs, how your system’s operating, service operations, recent incidents, and you have to answer that at Amazon. They do it every week.”

1:19:23 Enhanced storage resiliency with Azure NetApp Files – Elastic zone-redundant service

Azure NetApp Files Elastic ZRS introduces synchronous replication across three or more availability zones within a region with automatic service-managed failover, maintaining the same mount target and endpoint during zone failures.
This eliminates the need for customers to manage HA clusters or VM-level failover while guaranteeing zero data loss for mission-critical workloads.
The service costs less than running three separate ANF volumes with cross-zone replication while providing the same multi-AZ high availability in a single volume. Volumes can be created as small as 1 GiB, offering flexibility for workloads of any size with support for both NFS and SMB protocols independently.
ANF Elastic ZRS delivers enterprise data management capabilities, including instant snapshots, clones, tiering, and backup integration powered by NetApp ONTAP, plus efficient metadata operations through a shared QoS architecture that dynamically allocates IOPS.
The service is particularly suited for healthcare, financial services, and other regulated industries requiring continuous uptime and compliance.
The service is currently available in select Azure regions with rapid expansion planned, and future capabilities will include simultaneous multi-protocol access (NFS, SMB, and Object REST API), custom region pairs for cross-region replication, and a migration assistant for moving data from on-premises ONTAP systems.
This represents a clear migration path for existing NetApp on-premises customers looking to modernize without re-architecting applications.

1:21:21 PostgreSQL on Azure supercharged for AI

Microsoft has enhanced Azure Database for PostgreSQL with native AI capabilities, including direct integration with Microsoft Foundry for in-database LLM operations like embeddings and semantic search.
The service now supports DiskANN vector indexing for high-performance similarity search and includes a new PostgreSQL extension for Visual Studio Code that enables database provisioning directly from the IDE with built-in Entra ID authentication.
The platform introduces zero-ETL real-time analytics through Microsoft Fabric mirroring and native Parquet file support via the Azure Storage Extension, allowing direct read/write operations to Azure Storage using SQL commands. PostgreSQL 18 is now generally available on Azure with new V6 compute SKUs that deliver improved I/O performance and lower latency, while Elastic Clusters enable horizontal scaling for multi-tenant workloads.
Azure HorizonDB was announced at Ignite as a new PostgreSQL-compatible service in private preview, designed specifically for AI-native workloads with scale-out compute and sub-millisecond latency.
This positions Azure to support both traditional PostgreSQL workloads and next-generation AI applications requiring ultra-low latency and horizontal scale.
The GitHub Copilot integration provides schema-aware SQL assistance within Visual Studio Code, while the new Model Context Protocol server for PostgreSQL enables direct agent framework connections in Microsoft Foundry.
Nasdaq demonstrated a production use case with their Boardvantage platform, using Azure Database for PostgreSQL and Microsoft Foundry to add AI-powered document analysis and summarization to their board governance system serving nearly half of the Fortune 500.

1:22:49 Matt – “Nothing I like better than an LLM inside my database!”

FYI Claude Opus 4.6: Anthropic’s powerful model for coding, agents, and enterprise workflows is now available in Microsoft Foundry

Claude Opus 4.6 is now available in Microsoft Foundry on Azure, bringing Anthropic’s most advanced reasoning model to enterprise customers with a 1M token context window in beta and 128K max output tokens.
The model targets complex coding tasks, agentic workflows, and knowledge work across finance, legal, and cybersecurity domains, with new API features including adaptive thinking that dynamically adjusts reasoning depth and context compaction for long-running conversations.
The integration connects Claude Opus 4.6 with Foundry IQ, enabling access to data across Microsoft 365, Fabric, and web sources within Azure’s governance and compliance framework.
Customers like Adobe, Dentons, and Macroscope are using the model for code review, legal drafting, and document generation, with deployment available through both Microsoft Foundry and Copilot Studio for no-code agent building.
Technical improvements include enhanced computer use capabilities for navigating interfaces and automating multi-application workflows, plus a new max effort control level that joins existing high, medium, and low settings for finer token allocation. The model handles large codebases effectively for refactoring and bug detection, with companies like Momentic AI processing millions of tokens per hour using the Azure infrastructure.
Pricing follows a premium model beyond 200K tokens for the 1M context window beta, though specific per-token costs were not disclosed in the announcement.
The focus is on production-grade deployments where Azure’s managed infrastructure and operational controls help compress development timelines from days to hours while maintaining enterprise security requirements.

1:23:45 Microsoft OneLake and Snowflake interoperability (Generally Available) | Microsoft Fabric Blog

Microsoft OneLake and Snowflake now offer bidirectional Iceberg table interoperability in general availability, allowing customers to store and access data across both platforms without duplication.
Changes made in one platform automatically reflect in the other, eliminating the need for traditional copy-heavy data integration approaches.
Snowflake-managed Iceberg tables can now be natively stored in Microsoft OneLake, while Fabric data automatically converts to Iceberg format for direct Snowflake access.
This addresses the challenge of enterprise data living across fragmented systems by providing a single copy of data accessible through either platform’s analytical engines.
New UI elements launching next week include a Snowflake item in OneLake for simplified access without complex configurations, plus Snowflake UI that pushes managed Iceberg tables directly into Fabric as discoverable OneLake items. The integration also supports OneLake table APIs working with Snowflake’s catalog-linked database feature.
The target use case centers on data teams managing analytics and AI workloads across multiple platforms who want to avoid vendor lock-in and proprietary formats. Organizations can now choose the optimal storage location and analytical engine for each project while maintaining a unified data estate without operational overhead from data duplication.
No specific pricing details were provided in the announcement, though the integration leverages existing OneLake and Snowflake licensing models. Customers can access quickstart guides and documentation through Microsoft Learn and Snowflake’s resources, with hands-on training available at FabCon and SQLCon 2026 in Atlanta from March 16-20.

1:24:44 Ryan – “This is kind of neat. I mean, it’s unexpected because it is data, and the amount of data and what you’d have in a data lake is usually one of those elements that makes using a service very sticky, so providing sort of an easy way to get out of that is a surprise to me, but it’s also – from a customer perspective – if you’ve got data across both, like how fantastic is that? To be able to use it. I like it.”

1:25:52 Generally Available: Azure Container Storage v2.1.0 now with Elastic SAN integration and on-demand installation

Azure Container Storage v2.1.0 brings native Elastic SAN integration, allowing Kubernetes workloads to leverage Azure’s shared block storage service for high-performance persistent volumes.
This integration provides an alternative to existing Azure Disk and ephemeral disk options, particularly beneficial for workloads requiring shared storage across multiple pods.
The release introduces an on-demand installation model that reduces the deployment footprint and operational overhead compared to previous versions. Instead of pre-installing all storage components, the system now deploys only the necessary drivers and resources when specific storage types are requested, streamlining cluster management.
Elastic SAN support targets enterprise customers running stateful containerized applications that need consistent low-latency performance and the ability to scale storage independently from compute.
Common use cases include database workloads, analytics platforms, and applications requiring shared persistent volumes across multiple container instances.
The lightweight installation approach addresses a common pain point where organizations previously had to deploy full storage stacks even when using only a subset of available storage options.
This change reduces resource consumption on AKS clusters and simplifies troubleshooting by limiting the number of active storage components

1:26:26 Justin – “The amount of SAN investment they’ve done in the last year is crazy to me.”

1:27:20 Five Reasons to attend SQLCon | Microsoft Fabric Blog

SQLCon is a new SQL-focused conference co-located with FabCon in Atlanta, March 16-20, offering dual access with a single registration.
The event features 50 SQL sessions covering SQL Server, Azure SQL, and SQL database in Fabric, with hands-on workshops Monday-Tuesday and conference sessions Wednesday-Friday.
Microsoft is sending over 30 SQL product team members to deliver engineering insights, roadmap announcements, and live demos of upcoming capabilities, including SSMS and VS Code extensions, Copilot integrations, and Fabric SQL experiences. This provides direct access to product teams for technical questions and future planning.
The combined conference format allows attendees to mix deep SQL technical sessions with broader Fabric, Power BI, data engineering, and AI content throughout the week.
This structure benefits both specialists needing deep technical content and cross-functional teams building shared understanding across data platforms.
Registration includes access to both conferences, hands-on workshops, Ask-the-Experts sessions with MVPs and engineers, and an attendee party at the Georgia Aquarium.
Early-bird pricing and team discounts are available, with promo code SQLCMTY200 offering $200 off registration.
The event targets DBAs, developers, data engineers, architects, and data team leaders working with SQL Server, Azure SQL, or SQL database in Fabric who need practical migration, modernization, performance tuning, and AI integration guidance.

Closing

341: AWS Layoffs: Scaling Down Instead of Scaling Out

Fri, 13 Feb 2026 03:38:58 +0000

Welcome to episode 341 of The Cloud Pod, where the forecast is always cloudy! Matt & Ryan are picking up Justin’s slack this week while he’s traveling for work, but don’t worry, because they have plenty of news! We’re talking about those mass layoffs over at AWS, a major security breach over at Notepad++, and some new slight of hand over at Elon’s companies. There’s a lot to cover, so let’s get into it!

Titles we almost went with this week

Finally, a Chatbot That Actually Knows Where Your Data Lives **Anthropic
Microsoft Adds Security Analyzer to MSSQL Extension: Because Bobby Tables Jokes Are Only Funny Until They Happen to You
From Sequential Sadness to Parallel Paradise: GKE Node Pools Get Concurrent
From Vibe Coding to Production: AWS MCP Server Gets SOPs
One Prompt to Deploy Them All: AWS MCP Server Automates Infrastructure
AWS Layoffs: Scaling Down Instead of Scaling Out
Mutual TLS: Because CloudFront and Your Origin Need Couples Therapy
Claude Team Plan: Now With More Seats and Less Bills
From Snowflake to Snowball: Rolling Data and Dev Into One Platform
From Notepad++ to Notepad Pwned: A Six-Month Hosting Horror Story
EventBridge Payload Capacity Gets a 4x Upgrade: No More Event Splitting Headaches
CloudFront Finally Learns to Check ID Before Knocking on Origin’s Door

General News

01:30 SpaceX acquires xAI, plans to launch a massive satellite constellation to power it – Ars Technica

SpaceX has acquired xAI to create a vertically integrated AI and space infrastructure company, with plans to deploy up to 1 million satellites as orbital data centers.
This represents a significant bet that space-based compute infrastructure can be cost-competitive with traditional ground-based data centers for AI workloads.
The merger combines SpaceX’s launch capabilities and satellite manufacturing expertise with xAI’s Grok chatbot and X social platform.
The strategy assumes AI demand will continue to grow and that compute capacity, rather than other factors, is the primary bottleneck to AI adoption.
The orbital data center concept raises questions about latency, power requirements, thermal management, and maintenance compared to terrestrial facilities.
Traditional cloud providers have invested heavily in ground-based infrastructure optimized for these factors.
This consolidation of Musk’s companies creates potential conflicts between SpaceX’s established government and commercial contracts and xAI’s more controversial products.
The integration of a proven aerospace company with a newer AI venture introduces execution risk to SpaceX’s core business.
The plan depends on several unproven assumptions, including sustained AI market growth, viable economics for space-based computing, and the ability to manufacture and launch satellites at unprecedented scale.
Cloud providers and enterprises will need to evaluate whether orbital compute offers advantages over existing multi-region terrestrial deployments.

03:22 Ryan – “I feel like this is a shell game con; taxes are over here – no, now they’re over here!”

06:49 Notepad++ Hijacked by State-Sponsored Hackers | Notepad++

Chinese state-sponsored hackers compromised Notepad++ update infrastructure from June through December 2025 by exploiting vulnerabilities at the shared hosting provider level, not in Notepad++ code itself.
The attackers maintained access to internal service credentials even after losing server access in September, allowing them to selectively redirect update traffic to malicious servers until December 2025.
The attack exploited insufficient update verification controls in older Notepad++ versions, with attackers specifically targeting the update manifest endpoint to serve compromised installers to selected users.
Version 8.8.9 added certificate and signature verification for downloaded installers, while the upcoming version 8.9.2 will enforce XMLDSig signature verification on update server responses.
The hosting provider confirmed the compromise was limited to one shared hosting server and found no evidence of other clients being targeted, though the investigation of 400GB of logs yielded no concrete indicators of compromise like binary hashes or IP addresses. Rapid7 and Kaspersky later published a more detailed technical analysis with actual IoCs.
This incident demonstrates supply chain attack risks even for open source software with millions of users, particularly when update infrastructure relies on shared hosting environments.
The Notepad++ project has since migrated to a new hosting provider with stronger security practices and implemented multiple layers of cryptographic verification.

09:24 Matt – “Getting in at this level – and that maintenance of control for 7 months – is crazy. It’s a pretty big attack.”

15:25 Internal Messages Reveal Teams, Jobs Affected in Amazon Layoffs – Business Insider

Amazon is cutting 16,000 corporate roles in its second major layoff round within four months, affecting multiple AWS service teams, including Bedrock AI, Redshift data warehouse, and ProServe consulting divisions.
- The cuts represent a significant restructuring of Amazon’s corporate workforce of approximately 350,000 employees.
AWS engineering teams appear heavily impacted based on internal Slack messages, with software engineers from core cloud services posting job searches.
This raises questions about AWS’s product development velocity and customer support capacity during a period of intense AI competition with Microsoft Azure and Google Cloud.
Affected US employees receive 90 days for internal job searches with severance and benefits for those unable to find new positions.
The timing follows Amazon’s return-to-office mandate and broader tech industry cost-cutting trends.
The layoffs touch customer-facing teams like Prime subscription services and last-mile delivery alongside cloud infrastructure groups. This dual impact on retail and AWS operations suggests company-wide efficiency initiatives rather than targeted underperformance in specific business units.

17:24 Matt – “It really did affect a broad spectrum of the org.”

AI Is Going Great – Or How ML Makes Money

19:10 Project Genie: AI world model now available for Ultra users in U.S.

Google DeepMind launches Project Genie, an experimental web app now available to Google AI Ultra subscribers in the U.S. (18+), powered by the Genie 3 world model that generates interactive 3D environments in real-time based on text prompts and images.
- Unlike static 3D snapshots, Genie 3 simulates physics and interactions dynamically as users navigate, creating expanding worlds on the fly.
The platform offers three core capabilities: World Sketching (using Nano Banana Pro for image preview and fine-tuning before entering), World Exploration (real-time path generation based on user actions with adjustable camera controls), and World Remixing (building on existing worlds from galleries).
- Users can define character perspectives (first-person or third-person) and movement types (walking, flying, driving).
Current limitations include 60-second generation caps, occasional physics inconsistencies, character control issues with higher latency, and generated worlds that may not always match prompts precisely.
Some Genie 3 capabilities announced in August, like promptable events that modify worlds during exploration, are not yet included in this prototype.
This release represents Google’s approach to building general-purpose AI systems that can navigate diverse real-world scenarios, moving beyond domain-specific agents like AlphaGo.
The technology has potential applications in robotics simulation, animation modeling, location exploration, and historical setting recreation, though it remains an early research prototype in Google Labs.

24:07 Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT | OpenAI

OpenAI will retire GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT on February 13, 2026, though API access remains unchanged.
Only 0.1% of users still select GPT-4o daily, with most usage shifted to GPT-5.2.
GPT-4o was previously deprecated, then restored after user feedback about creative ideation needs and preference for its conversational warmth.
This feedback directly influenced GPT-5.1 and GPT-5.2 development, which now includes customizable personality controls for warmth, enthusiasm, and conversational styles like Friendly.
OpenAI is addressing user complaints about unnecessary refusals and overly cautious responses in newer models. The company is developing an adult-focused version of ChatGPT for users over 18 with expanded freedom within appropriate safeguards, supported by age prediction rollout in most markets.
The model retirement strategy allows OpenAI to concentrate resources on improving models with active user bases rather than maintaining legacy versions.
This follows a pattern of deprecating older models as newer versions incorporate user-requested features and achieve broader adoption.

25:43 Matt – “Deprecation of things is one of the hardest things; we joked a lot last year when AWS finally deprecated things, but it’s hard. People have it built in and hard-coded into their apps and workflows. They’re used to specific types of responses.”

28:15 Introducing the Codex app | OpenAI

OpenAI launches the Codex desktop app for macOS, a command center interface for managing multiple AI coding agents simultaneously across long-running development tasks.
The app includes native support for parallel agent workflows using git worktrees, allowing multiple agents to work on isolated copies of the same repository without conflicts while maintaining separate thread contexts per project.
Codex now extends beyond code generation through a Skills system that bundles instructions, resources, and scripts for tasks like Figma design implementation, Linear project management, and cloud deployment to Cloudflare, Netlify, Render, and Vercel.
OpenAI demonstrated this by having Codex autonomously build a complete racing game using 7 million tokens from a single prompt, with the agent taking on designer, developer, and QA tester roles.
The app introduces Automations for scheduled background tasks like daily issue triage, CI failure analysis, and release briefs, with results landing in a review queue for developer oversight. All agents run in configurable system-level sandboxes by default, restricted to editing files in their working folder and requiring permission for elevated operations like network access.
For a limited time, OpenAI is including Codex access with ChatGPT Free and Go tiers and doubling rate limits across all paid plans (Plus, Pro, Business, Enterprise, Edu).
Usage has doubled since GPT-5.2-Codex launched in mid-December, with over one million developers now using the service, and Windows support is planned for future releases.

29:52 Ryan – “They’ve got a lot of catching up to do. Claude Code is all I hear about…it’s everywhere. I do hear about Gemini Code, mostly because I live in that ecosystem. I haven’t had a chance to play with it and compare it to the other tools.”

AWS

35:20 AWS announces Deployment Agent SOPs in AWS MCP Server

AWS introduces Deployment Agent SOPs in the AWS MCP Server in preview, enabling developers to deploy web applications to production using natural language prompts through MCP-compatible tools like Claude, Cursor, and Kiro.
The system automatically generates CDK infrastructure, deploys CloudFormation stacks, and sets up CI/CD pipelines with AWS security best practices included.
The feature addresses the gap between AI-assisted prototyping and production deployment by allowing developers to move from vibe-coded applications to production environments in a single prompt. This is fine. Just fine.
Agent SOPs follow multi-step procedures to analyze project structure, create preview environments on S3 and CloudFront, and configure CodePipeline for automated deployments from source repositories.
Support includes popular web frameworks like React, Vue.js, Angular, and Next.js, with automatic documentation generation that enables AI agents to handle future deployments and troubleshooting across sessions. The deployment process creates persistent documentation in the repository for continuity.
Currently available in preview at no additional cost in US East N. Virginia region only, with customers paying standard rates for AWS resources created and applicable data transfer costs.
This represents AWS’s integration of AI agents into the deployment workflow, competing with other infrastructure-as-code and deployment automation tools.

36:58 Ryan – “I like and hate this all at the same time.”

40:54 AWS STS now supports validation of select identity provider-specific claims from Google, GitHub, CircleCI and OCI

AWS STS now validates provider-specific claims from Google, GitHub, CircleCI, and Oracle Cloud Infrastructure when federating into AWS via OIDC.
This allows customers to reference custom claims as condition keys in IAM role trust policies and resource control policies, enabling more granular access control for federated identities beyond the standard OIDC claims.
The feature addresses a common security gap where organizations previously could only validate standard OIDC claims like subject and audience, but couldn’t enforce conditions based on provider-specific attributes like GitHub repository names or Google Workspace domains.
- This enhancement helps establish data perimeters by allowing customers to restrict access based on the specific context of the federated identity.
Available now in all AWS Commercial Regions at no additional cost beyond standard STS API usage.
Organizations using OIDC federation for CI/CD pipelines, developer access, or multi-cloud identity management can immediately implement more restrictive trust policies without changing their authentication flows.
The supported claims vary by provider and include attributes like GitHub repository visibility, CircleCI project IDs, and OCI tenancy information. Full documentation of available condition keys is provided in the IAM User Guide under Available Keys for OIDC federation.

17:00 Matt – “This is a fantastic feature that I was convinced was a brand new announcement, until Matt schooled me and said, ‘I’ve been doing this for months, ‘ because I didn’t know you could do this with STS.”

46:33 Amazon CloudFront announces mutual TLS support for origins

CloudFront now supports mutual TLS authentication for origins, allowing customers to verify that requests to their backend servers come only from authorized CloudFront distributions using certificate-based authentication.
This eliminates the operational overhead of managing custom solutions like shared secret headers or IP allow-lists that previously required constant rotation and maintenance.
The feature works with AWS Private Certificate Authority or third-party private CAs imported through AWS Certificate Manager, providing cryptographic verification of CloudFront’s identity to any origin that supports mTLS, including Application Load Balancers, API Gateway, on-premises servers, and third-party cloud providers. There is no additional charge for using origin mTLS beyond standard CloudFront pricing.
This addresses a common security gap for organizations serving proprietary content through CloudFront, particularly when origins are publicly accessible or hosted externally.
Previously, customers had to build custom authentication layers to ensure only their CloudFront distributions could access backend infrastructure, creating an ongoing operational burden.
Configuration is available through the AWS Management Console, CLI, SDK, CDK, or CloudFormation, making it straightforward to implement across existing CloudFront distributions. The feature is also included in CloudFront’s Business and Premium flat-rate pricing plans at no extra cost.

49:33 AWS Management Console now displays Account Name on the Navigation bar for easier account identification

The AWS Management Console now displays account names in the navigation bar, replacing the previous reliance on account numbers for identification.
- This addresses a common pain point for organizations managing multiple AWS accounts across development, production, and different business units.
The feature is available at no additional cost across all public AWS regions and requires administrator enablement through IAM managed policies.
Once enabled, all authorized users in an account will see the account name displayed in the console navigation bar.
This update provides immediate value for teams working across multiple accounts who previously had to memorize or reference 12-digit account numbers.
The visual distinction helps reduce errors when switching between environments like dev and prod.
The implementation follows AWS best practices for multi-account architectures, making it easier to maintain account separation while improving operational efficiency. Organizations using AWS Organizations or Control Tower will particularly benefit from clearer account identification.

51:21 Matt – “Not the sexiest feature, but for the love of God the most USEFUL feature of this podcast.”

53:22 Announcing increased 1 MB payload size support in Amazon EventBridge

EventBridge now supports 1 MB event payloads, up from the previous 256 KB limit, eliminating the need for developers to split large events, compress data, or store payloads externally in S3.
This simplifies architectures for applications handling LLM prompts, telemetry data, and complex JSON structures from machine learning models.
The increased payload size reduces architectural complexity and operational overhead by allowing comprehensive contextual data to be included in a single event rather than requiring chunking logic or coordination with external storage systems.
- This is particularly relevant for AI/ML workloads where model outputs and prompts can exceed the previous size constraints.
The feature is available now in most commercial AWS regions where EventBridge operates, with notable exceptions including Asia Pacific regions like New Zealand, Thailand, Malaysia, and Taipei, plus Mexico Central. No additional cost is mentioned for the larger payload support beyond standard EventBridge pricing.
This change addresses a common pain point in event-driven architectures where developers previously had to implement workarounds for large payloads, adding code complexity and potential failure points.
The 4x increase in payload size aligns EventBridge more closely with modern application needs around AI and real-time data processing.

54:44 Ryan – “I think this is a good thing. I was lauhging at this because I remember event size in Kinesis being a big to-do and a project forever ago, and trying to think through all the limits…but now I was thinking through the AI workloads and how much of a pain it would be to have your prompts referencing and external source everytime…so glad to see this.”

56:55 AWS Network Firewall now supports GenAI traffic visibility and enforcement with Web category-based filtering

- AWS Network Firewall adds URL category-based filtering that lets you control access to GenAI applications, social media, streaming services, and other web categories using pre-defined categories instead of maintaining manual domain lists.
- This reduces operational overhead for security teams who need to enforce consistent policies across AWS environments while gaining visibility into emerging technology usage.
- The GenAI traffic visibility component addresses a growing compliance need as organizations struggle to track and govern employee access to ChatGPT, Claude, Gemini, and other AI services.
- Security teams (booo) can now restrict GenAI usage to approved corporate tools or block access entirely based on their risk tolerance and regulatory requirements.
- When combined with TLS inspection, the feature enables full URL path inspection for granular control beyond just domain-level blocking.
- This matters for scenarios where you need to allow access to a domain but block specific paths or query parameters that might expose sensitive data.
- The feature is available now in all AWS commercial regions where Network Firewall operates, with no additional base cost beyond standard Network Firewall pricing, which starts at 0.395 dollars per firewall endpoint hour plus 0.065 dollars per GB processed.
- You can implement this through stateful rule groups using the AWS Console, CLI, or SDKs without requiring new infrastructure deployment.
Did we talk about this one last week? It feels like we talked about this one already. Guess it’s time to build another bot.

GCP

59:49 Conversational Analytics in BigQuery is in preview

Google launches Conversational Analytics in BigQuery as a preview feature that lets users query data using natural language instead of SQL.
The AI agent uses Gemini models to generate queries, execute them, and create visualizations while maintaining security controls and audit logging within BigQuery’s existing governance framework.
The system goes beyond basic chatbots by grounding responses in actual BigQuery schemas, metadata, and custom business logic, including verified queries and User Defined Functions.
This ensures generated SQL aligns with production metrics and enterprise standards rather than making generic assumptions about data structure.
Users can perform predictive analytics through natural language by leveraging BigQuery AI functions like AI.FORECAST and AI.DETECT_ANOMALIES without writing code.
The agent also supports querying unstructured data such as images stored in BigQuery object tables, expanding analysis beyond traditional row-column datasets.
The agents can be deployed across multiple surfaces, including Looker Studio Pro, the BigQuery UI, custom applications via API, and existing agentic ecosystems through ADK tools.
Documentation and codelabs are available at cloud.google.com for implementation guidance, though specific pricing details were not disclosed in the announcement.
This addresses a common enterprise bottleneck where business users wait in queues for data teams to write queries, potentially reducing time-to-insight from hours or days to seconds for authorized users.

1:01:11 Matt – “Anything that makes BigQuery easier to use.”

1:01:36 Introducing Single-tenant Cloud HSM for more data encryption control

Google Cloud has launched Single-tenant Cloud HSM, a dedicated hardware security module service that gives organizations exclusive control over cryptographic keys with FIPS 140-2 Level 3 validation.
Unlike multi-tenant solutions, customers get sole access to physical HSM partitions with hardware-enforced isolation, meaning their keys are cryptographically separated from other customers and Google operators. The service is generally available now in the US and EU, with “competitive” pricing https://cloud.google.com/kms/pricing#stch_pricing ($3500/month).
The service targets highly-regulated industries like financial services, defense, healthcare, and government that need strict compliance controls but want to avoid managing physical hardware.
Key security features include full ownership of root keys, quorum-based administration requiring multiple authorized users for sensitive operations, and the ability to revoke Google’s access at any time, which immediately makes all keys and encrypted data inaccessible until authorization is restored.
Single-tenant Cloud HSM integrates directly with existing Cloud KMS APIs and works with Customer-Managed Encryption Keys (CMEK) across Google Cloud services. Setup takes approximately 15 minutes using standard gcloud commands, and the service automatically scales to handle peak traffic loads while maintaining high availability across multiple zones.
The service has already obtained compliance certifications, including FedRAMP, DISA IL5, ITAR, SOC 1/2/3, HIPAA, and PCI DSS.
Google manages all hardware provisioning, configuration, monitoring, and compliance, removing the operational burden of physical HSM management while maintaining the same redundancy and availability standards as multi-tenant Cloud HSM.
Administrators can use hardware tokens like YubiKey or other key management systems to generate and manage their administrative credentials, with quorum requirements preventing any single individual from making unauthorized changes.

1:06:21 Ryan – “And that’s why Google is announcing this. Someone had this checkbox – someone with deep enough pockets had this checkbox.”

Azure

44:40 Public Preview: 7th generation Intel-based VMs – Dlsv7/Dsv7/Esv7

Azure launches Dlsv7, Dsv7, and Esv7 virtual machines in public preview, powered by Intel Xeon 6 processors codenamed Granite Rapids.
These 7th-generation Intel-based VMs represent the latest iteration in Azure’s general-purpose and memory-optimized VM families, bringing newer processor architecture to cloud workloads.
The new VM series targets customers running compute-intensive and memory-intensive workloads that can benefit from the latest Intel processor improvements.
General-purpose Dlsv7 and Dsv7 VMs suit balanced workloads like web servers and application hosting, while Esv7 VMs are optimized for memory-heavy applications such as databases and in-memory analytics.
Intel Xeon 6 processors introduce architectural improvements over previous generations, though specific performance metrics and pricing details are not provided in the announcement.
Customers interested in testing these VMs should evaluate them during preview to determine if the newer processor generation delivers meaningful improvements for their specific workloads.
The preview status means these VMs are available for testing but may not yet be suitable for production workloads, depending on service level agreements and regional availability.
Organizations should check Azure documentation for supported regions and any preview limitations before deploying workloads on these new VM series.

1:11:15 Matt – “The other reason I wanted to keep it in was, I’m still struggling to get the V6 in some regions. And granted, these are less common regions, you know, but I have a different skews based on region availability because I just can’t get it, and in some places it’s like, ‘we can do it in two zones.’ And I’m like, cool, thank you. Way to make yourself more money.”

Closing

340: Azure releases a new SQL AI Assistant… Jimmy Droptables

Sat, 07 Feb 2026 01:29:38 +0000

Welcome to episode 340 of The Cloud Pod, where the forecast is always cloudy! It’s a full house (eventually) with Justin, Jonathan, Ryan, and Matt all on board for today’s episode. We’ve got a lot of announcements, from Gemini for Gov (no more CamoGPT!) to Route 52 and Claude. Let’s get started!

Titles we almost went with this week

Claude’s Pricing Tiers: Free, Pro, and Maximum Overdrive
GitHub Copilot Learns Database Schema: Finally an AI That Understands Your Joins
SSMS Gets a Copilot: Your T-SQL Now Writes Itself While You Grab Coffee
Too Many Cooks in the Cloud Kitchen: How 32 GPUs Outcooked the Big Tech Industrial Kitchens
Uncle Sam Gets a Gemini Twin: Google’s AI Goes Federal
Route 53 Gets Domain of Its Own: .ai Joins the Party
Thai One On: Google Cloud Plants Its Flag in Bangkok
NAT So Fast: Azure’s Gateway Gets a V2 Glow-Up
Beware Azure’s SQL Assistant doesn’t smoke your joints.

AI Is Going Great, Or How ML Makes Money

30:10 Announcing BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing | Databricks Blog

Databricks released BlackIce, an open-source containerized toolkit that bundles 14 AI security testing tools into a single Docker image available on Docker Hub as databricksruntime/blackice:17.3-LTS.
The toolkit addresses common red teaming challenges, including conflicting dependencies, complex setup requirements, and the fragmented landscape of AI security tools, by providing a unified command-line interface similar to how Kali Linux works for traditional penetration testing.
The toolkit includes tools covering three main categories: Responsible AI, Security testing, and classical adversarial ML, with capabilities mapped to MITRE ATLAS and the Databricks AI Security Framework.
Tools are organized as either static (simple CLI-based with minimal programming needed) or dynamic (Python-based with customization options), with static tools isolated in separate virtual environments and dynamic tools in a global environment with managed dependencies.
BlackIce integrates directly with Databricks Model Serving endpoints through custom patches applied to several tools, allowing security teams to test for vulnerabilities like prompt injections, data leakage, hallucination detection, jailbreak attacks, and supply chain security issues.
Users can deploy it via Databricks Container Services by specifying the Docker image URL when creating compute clusters.
The release includes a demo notebook showing how to orchestrate multiple security tools in a single environment, with all build artifacts, tool documentation, and examples available in the GitHub repository.
The CAMLIS Red Paper provides additional technical details on tool selection criteria and the Docker image architecture.

04:30 Ryan – “It’s very difficult to feel confident in your AI security practice or patterns. I feel like it’s just bleeding edge, and I’m learning so much all the time. And so I spend a lot of time reading papers and talking to others and seeing what they’re doing and meeting with vendors trying to figure out strategy, and it just feels like I’m drinking from a fire hose, and it’s really difficult to feel confident. So I like tools like this, where not only is it adding a whole bunch of value, but you can use it as a rubric against what you’ve been doing and where your gaps are.”

07:28 Ai2 cooks up open-source coding agents with a tech equivalent of ‘hot plate and frying pan’ – GeekWire

Allen Institute for AI releases SERA (Soft-Verified Efficient Repository Agents), the first in their Open Coding Agents series, as a fully open-source coding agent that organizations can fine-tune on their own codebases for approximately $1,300 using commodity GPUs.
The model handles GitHub issues, generates line-by-line patches, and submits pull requests while learning internal APIs and development conventions.
SERA-32B achieves over 50% success rate on SWE-Bench, matching the performance of proprietary models like GitHub Copilot Workspace and Claude Code, but was built with just 32 GPUs and a five-person team.
This demonstrates that competitive coding agents can be developed without the massive infrastructure typically required by tech giants.
The model runs on organization-owned infrastructure without ongoing licensing fees and integrates with existing tools like Claude Code out of the box.
Teams can deploy it with a few lines of code and customize it for private codebases, offering an alternative to expensive closed systems from Microsoft and Anthropic.
By open-sourcing both the model and training code, Ai2 enables companies to maintain control over their proprietary code while still leveraging advanced AI coding assistance.
This addresses a key concern for enterprises hesitant to send sensitive code to third-party services.

05:30 Justin – “I was playing with Olamma, actually, this week, plugging it into Claude, and I definitely needed to get a new M5 MacBook with much more GPU capacity – or go buy a GPU for my house to make that really perform well. But even on my Mac with the 20B open model, it was serviceable. It just wasn’t as fast as using Anthropix APIs directly.”

09:51 Introducing Agentic Vision in Gemini 3 Flash

Google launches Agentic Vision in Gemini 3 Flash, introducing a Think-Act-Observe loop that enables the model to actively manipulate images through Python code execution rather than processing them in a single static pass.
This approach delivers a 5-10% quality improvement across most vision benchmarks by allowing the model to zoom, crop, rotate, and annotate images iteratively to ground its reasoning in visual evidence.
The capability enables three primary use cases: implicit zooming for fine-grained detail inspection (PlanCheckSolver.com improved building plan validation accuracy by 5%), image annotation with bounding boxes and labels to prevent counting errors, and visual math with deterministic Python execution to parse tables and generate charts without hallucination.
Agentic Vision is available now via the Gemini API in Google AI Studio and Vertex AI, with rollout beginning in the Gemini app under the Thinking model option.
Developers can enable the feature by turning on Code Execution under Tools in the AI Studio Playground.
Google plans to expand the capability by making behaviors like image rotation and visual math fully implicit without requiring prompt nudges, adding more tools, including web and reverse image search, and extending support beyond Flash to other model sizes.
Currently, some capabilities require explicit prompting to trigger code execution.
The feature addresses a fundamental limitation in frontier AI models that previously had to guess when missing fine-grained details like serial numbers or distant street signs, now replacing probabilistic guessing with verifiable code execution in a deterministic Python environment.

11:08 Justin – “Enhance!”

13:44 Introducing Prism | OpenAI

OpenAI launches Prism, a free cloud-based LaTeX workspace for scientific writing that integrates GPT-5.2 directly into the research workflow.
The platform offers unlimited projects and collaborators for anyone with a ChatGPT personal account, with enterprise plans coming soon for Business, Enterprise, and Education customers.
Prism builds on OpenAI’s acquisition of Crixet, a LaTeX platform, and adds native AI capabilities, including real-time collaboration, literature search from sources like arXiv, equation conversion from whiteboard photos to LaTeX, and voice-based editing. GPT-5.2 Thinking mode operates within the document context, understanding the full paper structure, including equations, citations, and figures.
The platform eliminates the fragmented workflow researchers typically face by consolidating drafting, revision, collaboration, and publication preparation into a single workspace.
This removes the need for local LaTeX installations and reduces context switching between separate editors, PDF viewers, reference managers, and chat interfaces.
OpenAI positions this as part of a broader shift where AI accelerates scientific discovery, following examples of GPT-5 advancing mathematical research, immune-cell analysis, and molecular biology experiments.
The free tier provides immediate access to core features, while more advanced AI capabilities will be available through paid ChatGPT plans over time.

14:49 Justin – “I don’t care for LaTex, but I’m not in science either, so maybe this is for those people.”

AWS

16:41 Now available: 48xlarge and metal-48xl sizes for EBS optimized Amazon EC2 instances

AWS launches 48xlarge and metal-48xl instance sizes for Graviton4-powered C8gb, M8gb, and R8gb instances, delivering up to 30% better compute performance than Graviton3 and the highest EBS bandwidth (300 Gbps) among non-accelerated EC2 instances.
These instances support up to 1440K IOPS, making them the highest EBS IOPS performers in EC2.
The new instances scale up to 48xlarge with three memory-to-vCPU ratio options (compute, general purpose, and memory optimized), plus metal sizes for C8gb and R8gb that provide direct hardware access.
They include up to 400 Gbps networking bandwidth and support Elastic Fabric Adapter for low-latency cluster workloads.
Primary use cases include high-throughput database workloads, data analytics pipelines, and tightly coupled HPC applications that require sustained high block storage performance.
The EFA support makes these particularly suitable for distributed computing tasks that need consistent low-latency inter-node communication.
Currently available in US East (N. Virginia) and US West (Oregon), with metal sizes limited to US East (N. Virginia) only.
This follows AWS’s typical pattern of launching new instance types in primary US regions before broader global expansion.
The instances represent AWS’s continued investment in Graviton ARM-based processors, offering customers an alternative to x86 instances with improved price-performance for workloads that can run on ARM architecture.

18:04 Justin – They’re the only thing I used to like to buy on the spot market until AI came around and then ruined it for me.”

18:47 Amazon Route 53 Domains adds support for .ai, and other top-level domains

Route 53 now supports ten new top-level domains, including .ai, .nz, .shop, .bot, .moi, .spot, .free, .deal, .now, and .hot, expanding domain registration options directly within AWS.
The .ai domain has become particularly relevant for AI companies despite originally being Anguilla’s country code, while other TLDs target specific use cases like e-commerce (.shop) and chatbot services (.bot).
The new domains integrate with existing Route 53 features, including DNS management, automatic renewal, and hosted zones, allowing customers to manage domain registration and DNS records through the console, CLI, or SDKs.
This consolidation eliminates the need for third-party domain registrars when building AWS-hosted applications.
Domain registration pricing varies by TLD, with no standard rate mentioned in the announcement.
Customers should check the Route 53 pricing page for specific costs per domain type, as premium TLDs like .ai typically command higher annual registration fees than traditional domains.
The timing aligns with increased demand for AI-related branding, though Route 53 has historically added TLD support incrementally rather than in response to specific market trends.
The service now competes more directly with dedicated domain registrars by offering industry-specific and regional domain options.
follows standard EC2 on-demand and reserved instance models, with Graviton instances typically offering 20-40% better price-performance than comparable x86 instances.

20:03 Ryan – “It is frustrating, and it’s not like these are new. Like, AI’s been around for a while, and so it is strange that it takes that long.”

22:50 Amazon WorkSpaces announces advanced printer redirection

Amazon WorkSpaces now supports advanced printer redirection for Windows users, enabling access to printer-specific features like duplex printing, paper tray selection, and finishing options such as stapling and hole-punching directly from virtual desktops.
This addresses a longstanding limitation where WorkSpaces users were restricted to basic printing capabilities through generic drivers.
The feature includes configurable driver validation modes that let administrators balance compatibility with feature support, automatically falling back to basic printing when matching drivers are not available.
Organizations with users who need professional document printing, specialized labels, or advanced output formatting will benefit most from this capability.
Advanced printer redirection requires WorkSpaces Agent version 2.2.0.2116 or later and Windows client version 5.31 or later, with matching printer drivers installed on both the WorkSpace and client device.
The feature is available in all AWS Regions where Amazon WorkSpaces Personal is offered, though it is limited to Windows WorkSpaces with Windows clients only.
This enhancement brings WorkSpaces closer to feature parity with traditional desktop environments for printing workflows, which is particularly important for industries like legal, healthcare, and finance, where document formatting and specialized printing are common requirements.
The addition fills a notable gap in virtual desktop infrastructure capabilities that has been a barrier for some organizations considering cloud-based desktop solutions.

26:55 AWS Network Firewall now supports GenAI traffic visibility and enforcement with Web category-based filtering

AWS Network Firewall adds URL category-based filtering that specifically identifies and controls GenAI application traffic alongside traditional web categories like social media and streaming services.
This allows security teams to enforce policies like blocking unauthorized AI tools or restricting access to approved GenAI services only, addressing a growing compliance concern as organizations struggle to govern employee use of ChatGPT, Claude, and similar platforms.
The feature works by inspecting traffic against pre-defined URL categories and can be combined with AWS Network Firewall’s existing TLS inspection capability for full URL path analysis.
This provides more granular control than simple domain blocking, enabling organizations to differentiate between different services from the same provider or allow specific features while blocking others.
The capability is available now in all AWS commercial regions where Network Firewall operates, with no separate pricing beyond existing Network Firewall costs, which start at $0.395 per firewall endpoint hour plus $0.065 per GB processed.
Organizations can implement this through stateful rule groups using the AWS Console, CLI, or SDKs without requiring additional infrastructure changes.
This addresses a practical security gap where traditional firewall rules struggle to keep pace with rapidly emerging GenAI services, reducing the operational burden of manually maintaining blocklists and allowlists. The pre-defined categories are maintained by AWS, meaning customers automatically get coverage for new GenAI services as they launch without manual rule updates.

28:32 Ryan – “I’m happy to see this being added to the AWS network firewall. Hoping this gets added to the Google NextGen firewall as well, because it is sort of difficult when you’re forced to do domain-based filtering on these things.”

GCP

30:33 Mastering Gemini CLI: Your Complete Guide from Installation to Advanced Use-Cases

Google has partnered with DeepLearning.ai to launch a free, comprehensive course on Gemini CLI, an open-source command-line agent that integrates AI capabilities into daily workflows.
The course covers installation and context management through GEMINI.md files, extensibility via Model Context Protocol servers, and practical applications across software development, data analysis, content creation, and personalized learning.
The course is structured as a sub-2-hour curriculum with nine lessons that progress from foundational setup to specialized workflows.
Key technical features include memory management for maintaining context across sessions, integration with external tools through MCP servers, and custom extensions that allow users tailor the CLI to specific needs.
Gemini CLI targets a broad user base beyond traditional developers, with dedicated modules for data visualization from local CSVs and Google Sheets, automated blog and social media content generation, and study plan creation.
The tool is available as an open-source project on GitHub with full documentation at geminicli.com.
The course is completely free and available now at goo.gle/gemini-cli-learning-course, positioning it as an accessible entry point for users looking to incorporate AI agents into command-line workflows.
This represents Google’s continued push to make Gemini models more accessible through developer-friendly tooling and educational resources.

32:30 Jonathan – “It’s interesting they didn’t go for a command line coding tool. It’s not Gemini code; it’s Gemini that does a whole bunch of stuff. So they’ve seen the broader implications of what those tools can do.”

34:08 Google Cloud Launches New Region in Bangkok, Thailand

Google Cloud has opened its Bangkok region (asia-southeast3), backed by a $1 billion infrastructure investment that’s projected to contribute $41 billion to Thailand’s economy and support 130,000 jobs annually over five years.
The region addresses data residency requirements under Thailand’s Personal Data Protection Act (PDPA) while providing low-latency access to local customers and connectivity to Google’s global network via the TalayLink subsea cable.
The region launches with key compliance certifications, including ISO 27001/27017/27018, PCI DSS, and SOC 1/2/3, making it suitable for regulated industries like banking and insurance. KASIKORN Business-Technology Group and True Digital Group are among the first customers leveraging the local infrastructure to meet Bank of Thailand regulatory standards while maintaining data sovereignty.
The Bangkok region provides local compute and storage with millisecond-level latency for Thai users, while AI workloads can access globally-hosted services like Vertex AI, Gemini 3, and generative models through the region as a secure on-ramp.
This hybrid approach lets customers run general-purpose workloads locally without investing in specialized AI hardware while still accessing Google’s AI ecosystem when needed.
Launch partners, including Accenture, Deloitte, MFEC, and NTT Data, are providing local engineering and consulting support to help customers migrate to the new region. ZZZZGoogle is also running the PanyaThAI customer success program and Google Skills initiatives to build local cloud and AI talent in Thailand.
The region is now available in the Google Cloud console under asia-southeast3, joining Google’s network of 43 cloud regions connected by over 7.75 million kilometers of fiber infrastructure.
Pricing follows standard Google Cloud regional pricing models with no specific Thailand-region premiums mentioned in the announcement.

35:12 Justin – “It’s really Google’s way of not having to buy a billion GPUs and distribute them globally, but you can argue it as a secure onramp all you want.”

37:36 Cloud Composer supports Apache Airflow 3.1

Cloud Composer now supports Apache Airflow 3.1 in preview, making Google the first hyperscaler to offer this version.
The update builds on Airflow 3.0’s decoupled architecture with new features including Human-in-the-Loop workflows that pause execution for manual approvals via UI or API, Deadline Alerts that replace legacy SLAs with proactive time-based notifications, and native support for 17 languages in the React-based interface.
The Human-in-the-Loop functionality integrates with Airflow Notifiers to send approval requests through Slack, email, or PagerDuty with direct links to decision points. This addresses the growing need for human oversight in AI agent workflows and complex automated pipelines, particularly for deployment approvals or reviewing generative AI outputs.
Google positions Cloud Composer as an open orchestration alternative to proprietary walled garden platforms, emphasizing that Airflow-based workflows remain portable Python code rather than vendor-locked logic. The company contributes directly to the Airflow codebase and highlights access to thousands of community-built providers and custom operator development for legacy system integration.
Additional developer-focused improvements include a React plugin system for embedding custom dashboards in the UI and a new streaming API endpoint for watching synchronous DAG execution until completion. The preview is available now for new Cloud Composer 3 environments, though specific pricing details for Airflow 3.1 support were not disclosed in the announcement.

38:58 Ryan – “This is rich, because after dealing with Cloud Composer and its kind of terribleness… now with Cloud Composer 3, they’re just rebranding and saying that, no, all that stuff that you were complaining about is a feature, not a bug! We’re not going to build a complicated workflow engine where you don’t get exposed to the innards; we’re going to just let you run your own managed airflow. And it’s basically a deployment template. But it’s a feature, because they’re allowing direct access, not wall cards.”

40:23 Gemini for Government: Unlocking Public Sector Innovation

Google launches Gemini for Government, a FedRAMP High-authorized AI platform specifically designed for public sector agencies.
The platform provides secure access to Gemini models and agentic AI capabilities, with the Department of Defense already deploying it to 3 million personnel through GenAI.mil and the FDA implementing agentic AI across their operations.
The platform emphasizes AI agents as productivity multipliers for government employees, automating administrative tasks while allowing workers to focus on strategic decision-making.
At Google’s Public Sector Summit, agencies built over 300 AI agents in a single day to demonstrate potential use cases across different government functions.
Gartner named Google a Company to Beat for Enterprise Agentic AI Platforms in their December 2025 report, citing Google’s integrated tech stack and enterprise-wide adoption capabilities.
This recognition positions Google’s government AI offering against competitors in the federal marketplace, where security accreditation and compliance are critical requirements.
The Department of Transportation selected Google Workspace as its agency-wide collaboration suite, showing broader adoption of Google’s cloud services beyond just AI capabilities.
This indicates government agencies are consolidating on Google’s platform for both productivity and AI workloads rather than using point solutions.
No pricing information was disclosed in the announcement, though agencies can register for a February 5 webinar and download AI agent toolkits to explore implementation options.
The focus appears to be on enterprise agreements rather than public pricing, given the government procurement process.

45:18 New BigQuery gen AI functions for better data analysis

BigQuery now integrates Gemini 3.0 and Vertex AI models directly into SQL queries through new AI functions, including AI.GENERATE for text and structured output, AI.EMBED for embeddings, and AI.SIMILARITY for semantic search. The setup process has been simplified by allowing End User Credentials authentication, eliminating the need for separate service account connections if users have the Vertex AI User role.
The AI.GENERATE function handles multimodal inputs, including text, images, video, audio, and documents, and can perform multiple AI tasks simultaneously, like sentiment analysis, translation, and summarization in a single SQL call. Users can specify an output schema to convert unstructured data directly into structured table columns, making results immediately usable in downstream applications.
The new AI.SIMILARITY function provides a streamlined approach to semantic search by computing embeddings and similarity scores in one step, ideal for interactive analysis on small to medium datasets. For larger-scale operations across millions of rows, users can transition to the VECTOR_SEARCH function with precomputed embeddings and vector indexing.
These functions are fully composable within standard SQL, meaning they can be used in SELECT statements, WHERE clauses, and ORDER BY clauses alongside traditional SQL operations. The AI.GENERATE and AI.GENERATE_TABLE functions are now generally available, while AI.EMBED and AI.SIMILARITY is currently in preview.

46:28 Ryan – “I can have AI generate the query to call AI to analyze the results of the AI-generated query! I don’t see what could go wrong?”

Azure

47:35 SSMS 22.2.1 Release

SQL Server Management Studio 22.2.1 adds GitHub Copilot code completions directly in the query editor, going beyond traditional IntelliSense by providing context-aware T-SQL suggestions that improve as more code is written in the editor.
Microsoft customized the Visual Studio Copilot implementation to include database context, ensuring suggestions are both relevant and performant for SQL workflows.
The release focuses on fundamental improvements with bug fixes addressing user-reported issues from the feedback site, while engineering teams work on the backend pipeline and testing enhancements.
Microsoft spent December and January prioritizing quality and reliability improvements that may not be immediately visible but strengthen the product foundation.
GitHub Copilot Agent mode is coming to SSMS, according to the updated roadmap, along with improvements to instructions functionality, which ranks as a top user request.
Users can vote on specific AI features through the feedback site, with Microsoft using vote counts as the primary metric for prioritizing development work.
Code completions may compete with traditional IntelliSense, so users experiencing conflicts can disable IntelliSense to get the full benefit of Copilot suggestions.
The feature requires a GitHub Copilot subscription, which is separate from SSMS itself and follows standard GitHub Copilot pricing for individuals or organizations.
This positions SSMS as a more AI-native database management tool, particularly relevant for SQL developers already using Copilot in other Microsoft development environments like Visual Studio and VS Code.
The database context integration represents technical work specific to SQL workloads rather than a simple port of existing Copilot functionality.

49:57 What’s New in Azure Repos: Recent Updates – Azure DevOps Blog

Azure Repos has rolled out several quality-of-life improvements focused on pull request workflows and TFVC modernization.
The most impactful change is a breaking update that disables obsolete TFVC check-in policies, requiring teams still using the old storage format to migrate to the new system or lose policy enforcement entirely.
Pull request notifications have been streamlined to reduce noise by removing low-value alerts like draft state changes and auto-complete updates, while simplifying remaining notifications to show only relevant changes like affected files.
- This addresses a common complaint about notification overload in code review workflows.
Pull request templates now support nested folder structures that map to multi-level branch names, automatically selecting the most specific template available when targeting branches like feature/foo/december.
- This eliminates template duplication for teams using hierarchical branching strategies.
The Azure DevOps MCP Server continues expanding with new tools for programmatic interaction with repos, branches, commits, and pull requests directly from VS Code and GitHub Copilot.
- This enables developers to query repository metadata and inspect code without opening the Azure DevOps web interface.
Upcoming improvements include a more efficient Git policy configuration API that reduces unnecessary calls when retrieving policies across repositories and branches, plus additional pull request features like highlighting PRs with outstanding comments and filtering by tags.
These changes target teams managing policies at scale and aim to keep code reviews moving more efficiently.

51:17 Justin – “Wow. TFVC modernization is your feature. You’re just going to turn it off and lose your enforcement when they migrate automatically. That’s brutal. Classic Microsoft.”

55:12 Generally Available: StandardV2 NAT Gateway with zone-redundancy and StandardV2 public IPs 

Azure’s StandardV2 NAT Gateway reaches general availability with zone-redundancy and improved performance while maintaining the same pricing as the original Standard SKU.
This upgrade provides automatic high availability across availability zones without requiring customers to manage multiple NAT Gateways or configure complex failover scenarios.
The StandardV2 SKU introduces dual-stack connectivity supporting both IPv4 and IPv6 traffic through a single NAT Gateway instance.
This simplifies network architecture for organizations transitioning to IPv6 or running hybrid IP environments, eliminating the need to deploy separate NAT solutions for each protocol.
StandardV2 Public IP addresses and prefixes are now available alongside the NAT Gateway upgrade, providing consistent zone-redundant capabilities across the networking stack.
These resources work together to ensure outbound connectivity remains available even during zone-level failures without manual intervention.
The price-neutral upgrade path means existing Standard SKU customers can migrate to StandardV2 for enhanced resiliency without budget impact.
Organizations running mission-critical workloads that require guaranteed outbound connectivity should evaluate this upgrade, particularly those currently managing multiple NAT Gateways for redundancy purposes.

56:27 Jonathan – “I guess it’s not as easy as it sounds. I mean, to us it’s like, well, why don’t I just deploy two, right? But if they’re NATing to public IPs, then those public IPs need to be routable to the zones, and so there’s probably a whole bunch more complexity on the back end in implementing multi-zone support for NAT than perhaps people realize.”

57:46 Announcing Unified SOX & DORA Compliance Solutions in Microsoft Sentinel

Microsoft Sentinel now includes dedicated compliance solutions for SOX IT General Controls, and DORA regulations, providing financial institutions with continuous monitoring and audit-ready evidence through workbook-driven dashboards.
Both solutions are currently in public preview and consolidate telemetry from Microsoft Entra ID, Azure Activity Logs, Defender signals, Microsoft 365 audit logs, and third-party sources into structured compliance views.
The SOX IT Compliance solution maps directly to three core control domains: Access Management, monitoring unauthorized access to financial systems, Change Management tracking configuration modifications across Azure and on-premises environments, and Data Integrity controls detecting audit log tampering or gaps in critical system logging.
Organizations deploy the solution by enabling data connectors, defining a SOX watchlist of authorized users and systems, and customizing queries to match internal policies.
The DORA Compliance solution addresses the EU Digital Operational Resilience Act requirements through four specialized tabs covering Incident Management with MTTR tracking and SLA breach detection, Threat Intelligence correlating IOCs with MITRE ATT&CK techniques, Business Continuity monitoring inactive servers and failover events, and Compliance Mapping that links security alerts directly to specific DORA Articles for audit evidence.
Both solutions target financial services organizations, ICT providers, and any entity handling financial reporting systems that need to demonstrate regulatory compliance.
The workbooks are fully customizable with editable KQL queries, allowing organizations to extend mappings, integrate custom logs, and adapt controls to different financial systems and regulatory frameworks over time.
Deployment requires existing Microsoft Sentinel infrastructure with appropriate data connectors enabled, and organizations can define scope using watchlists to filter regulated assets.
Pricing follows the standard Microsoft Sentinel consumption-based model for data ingestion and retention, with costs varying based on log volume from connected sources.

1:00:55 Maia 200: The AI accelerator built for inference

Microsoft launches Maia 200, a custom AI inference accelerator built on TSMC’s 3nm process that delivers over 10 petaFLOPS in FP4 precision and 5 petaFLOPS in FP8 within a 750W envelope.
The chip offers 30% better performance per dollar than current Azure hardware and outperforms Amazon Trainium third generation and Google’s TPU seventh generation in key metrics.
The accelerator features 216GB HBM3e memory at 7 TB/s bandwidth and 272MB on-chip SRAM, designed specifically for running large language models like GPT-5.2 and synthetic data generation workloads.
Microsoft’s Superintelligence team will use Maia 200 for reinforcement learning and creating training data for next-generation models.
Maia 200 uses a two-tier scale-up network built on standard Ethernet rather than proprietary fabrics, with each accelerator providing 2.8 TB/s bidirectional bandwidth and supporting clusters up to 6,144 accelerators.
This approach reduces power consumption and total cost of ownership while maintaining predictable performance for dense inference workloads.
Initial deployment is in US Central datacenter region near Des Moines, with US West 3 near Phoenix coming next, integrated with Microsoft Foundry and Microsoft 365 Copilot services.
Microsoft is offering a Maia SDK preview with PyTorch integration, Triton compiler, and low-level programming tools for developers to optimize models for the new hardware.
Microsoft achieved rapid deployment by validating the end-to-end system in pre-silicon environments, getting AI models running within days of receiving packaged parts and reducing time from first silicon to datacenter deployment by more than 50% compared to similar programs.
The company positions this as the first in a multi-generational accelerator program with future iterations already in design.

1:02:50 Ryan – “It’s a blessing and a curse that I don’t have these types of workloads. I’m not using these types of things for building models…but I’m sort of jealous because it would be kind of cool to have a use case where I could use this.”

1:03:34 Azure Storage 2026: Built for Agentic Scale and Cloud‑Native Apps

Azure Storage is positioning itself as the foundational platform for AI workloads across the entire lifecycle, from frontier model training to large-scale inference and agentic applications.
Key capabilities include Blob scaled accounts that handle millions of objects across hundreds of scale units, and Azure Managed Lustre delivering up to 512 GBps throughput with 25 PiB namespaces for keeping GPU fleets continuously fed during training and inference operations.
The platform is adapting to handle agentic workloads that generate an order of magnitude more queries than traditional user-driven systems.
Elastic SAN is becoming the core building block for cloud-native applications, offering fully managed block storage pools with multi-tenant capabilities, while Azure Container Storage has been open-sourced and now delivers 7x faster performance for Kubernetes-based stateful applications.
Mission-critical workload performance has reached new levels with M-series VMs pushing disk storage to 780,000 IOPS and 16 GB/s throughput for SAP HANA deployments.
Ultra Disks paired with Ebsv6 VMs can achieve 800,000 IOPS and 14 GB/s throughput with sub-500 microsecond latency, while Azure NetApp Files is introducing Elastic ZRS for zone-redundant high availability without operational complexity.
Microsoft is addressing power and supply chain constraints through Azure Boost Data Processing Units that offload storage operations to dedicated hardware, reducing per-unit energy consumption while improving performance.
The company is also expanding integrations with external datasets and AI frameworks, including Microsoft Foundry, LangChain, Ray, and Anyscale, to simplify data pipeline operations across hybrid environments.
The partner ecosystem is expanding with co-engineered solutions from Commvault, Dell PowerScale, Pure Storage, Qumulo, and others that integrate deeply with Azure Storage services.
These partnerships focus on hybrid data movement and backup solutions that enable customers to leverage Azure AI services while maintaining data across on-premises and cloud environments.

1:05:07 Matt – “I just got concerned when Elastic SAN became the core building blocks of cloud native apps.”

Oracle

1:05:55 Announcing Support for IAM Deny Policies in the OCI IAM

Oracle Cloud Infrastructure now supports IAM Deny Policies, allowing administrators to explicitly block specific actions even when allow policies would otherwise grant access.
This addresses a common security gap where overly permissive policies could inadvertently grant unwanted access, particularly useful for enforcing compliance requirements and preventing accidental resource deletion in production environments.
The deny policies work alongside existing allow policies using a deny-by-default model where explicit denies always override allows, following standard IAM best practices seen in AWS and other cloud providers. Organizations can now create guardrails that prevent even highly privileged users from performing certain actions, like deleting critical resources or accessing sensitive compartments.
This feature integrates with Oracle’s existing IAM infrastructure, including compartments, groups, and dynamic groups, without requiring architectural changes.
Customers can implement deny policies immediately through the OCI console, CLI, or API using the same policy language syntax they already know, though they’ll need to carefully plan policy hierarchies to avoid unintended lockouts.
Primary use cases include preventing production resource deletion, enforcing regulatory compliance by blocking data exports to certain regions, and implementing separation of duties where even administrators cannot bypass certain security controls.
The feature is available across all OCI regions at no additional cost beyond standard IAM usage.

1:06:48 Justin – “Welcome to the party! How long has that been missing?”

Cloud Journey

44:40 How Google SREs Use Gemini CLI to Solve Real-World Outages

Google SREs are using Gemini CLI with their latest foundation model to reduce Mean Time to Mitigation during production outages, targeting a 5-minute SLO just to acknowledge incidents.
The system uses function calling to fetch incident details, analyze logs, correlate time series data, and recommend specific mitigation playbooks like task restarts rather than generating arbitrary bash scripts.
The implementation maintains human-in-the-loop control through multi-layer safety, including strictly typed tools via Model Context Protocol, risk assessment metadata, policy enforcement, and required confirmation steps before executing any production changes.
- This copilot approach allows AI-speed analysis while preserving human accountability and creating automatic audit trails for compliance.
Gemini CLI integrates directly with Google’s monorepo to analyze code changes, generate patches as Changelists, and automate the entire incident lifecycle from initial triage through postmortem generation.
- The system can populate timelines, create action items, file bugs in issue trackers, and export documentation automatically.
The workflow creates a feedback loop where generated postmortems become training data for future incident responses, and the pattern is reproducible outside Google using open-source Gemini CLI with MCP servers connecting to tools like Grafana, Prometheus, PagerDuty, and Kubernetes.
Custom commands allow teams to automate their specific operational workflows, similar to Google’s internal postmortem generator.

Closing

339: Just-in-Time Secrets: Because Your AI Agent Can't Keep Its Mouth Shut

Thu, 29 Jan 2026 05:31:58 +0000

Welcome to episode 339 of The Cloud Pod, where the forecast is always cloudy! Justin and Matt are in the studio today to bring you all the latest in cloud and AI announcements, including more personnel shifts (and it doesn’t seem like it was very friendly), a new way to get much needed copper, and Azure marketplace advertising 4,000 different models. What’s the real story? Let’s get into it and find out!

Titles we almost went with this week

US-EAST-1: Still the Least Reliable Friend You Keep Inviting to Parties **OpenAI
0⃣ From Zero to Inference: BigQuery Makes Open Models a Two-SQL Problem
AWS Goes Full Brandenburg Gate: Sovereign Cloud Opens for Business
Seven Ate Nine: AWS Skips G7 and Goes Straight to G7e Instances
From Crawling to Calling: Cloudflare Buys Human Native to Fix AI’s Data Problem
Finally, an AI That Actually Listens to Your War Room Panic
Tag, You’re Governed: AWS Automation Takes the Wheel
Cloudflare Reaches for the Stars: Astro Framework Acquisition Lands
Gemini Gets Personal: Google AI Finally Reads Your Email (With Permission)
AWS Strikes Ore: Amazon Cuts Out the Middleman in Copper Supply Chain
When Your Region Goes Down More Often Than Your Kubernetes Cluster
ChatGPT Go: OpenAI’s New Middle Child Gets $8 Allowance
Cloudflare’s Space-Age Acquisition: Astro Gets Jetsons-Level Upgrade
Rosie the Robot Fired: Cloudflare Brings Astro Framework Into the Family
It took 5 years, and now we have ads in our AI.
AI now with Ads
EU says hands off my data

General News

00:50 Heather’s data is not unreliable

Maybe it’s unreliable.
I blame Matt for having screwed up his outtro (as he did today), in which case I no longer recognize his participation.

01:11 Astro is joining Cloudflare

Cloudflare acquires The Astro Technology Company, bringing the popular open-source web framework in-house while maintaining its MIT license and multi-cloud deployment capabilities.
Major platforms like Webflow Cloud, Wix Vibe, and Stainless already use Astro on Cloudflare infrastructure to power customer websites.
Astro 6 introduces a redesigned development server built on Vite Environments API that runs code locally using the same runtime as production deployment. When using the Cloudflare Vite plugin, developers can test against workerd runtime with access to Durable Objects, D1, KV, and other Cloudflare services during local development.
The framework focuses on content-driven websites through its Islands Architecture, which renders most pages as static HTML while allowing selective client-side interactivity using any UI framework.
This approach addresses the complexity that made building performant websites difficult before 2021, providing a simpler foundation for both human developers and AI coding agents.
Astro 6 adds stable Live Content Collections for real-time data updates without site rebuilds and includes first-class Content Security Policy support.
The acquisition positions Cloudflare to serve better platform builders who extend Cloudflare services to their own customers through Cloudflare for Platforms.
Tailwind recently laid off 80% of their staff, ostensibly due to AI, so this may have been an opportune moment for an exit.

04:15 Matt – “I would assume that they heavily use it (AI) internally, so hopefully it’s something that they can leverage and continue to grow and they don’t have to redevelop their platform.”

04:53 Human Native is joining Cloudflare

Cloudflare acquired Human Native, a UK-based AI data marketplace that transforms multimedia content into structured, searchable data for AI training.
The acquisition accelerates Cloudflare’s AI Index initiative, which uses a pub/sub model to let websites push structured content updates to AI developers in real time, rather than relying on traditional web crawling.
Human Native’s platform focuses on licensed, high-quality training data rather than scraped content, with one UK video AI company reportedly discarding its existing training data after achieving better results with Human Native’s curated datasets.
- This approach addresses the growing problem of crawl-to-referral ratios reaching tens of thousands of bot crawls per human visitor.
The acquisition builds on Cloudflare’s existing AI Crawl Control and Pay Per Crawl products, giving content owners more control over how AI systems access their content.
Human Native’s technology will help customers structure their content for both AI consumption and traditional human audiences while enabling new monetization models.
Cloudflare is positioning this work alongside the x402 Foundation (partnered with Coinbase) to enable machine-to-machine transactions for digital resources.
The combination aims to create new economic models where AI developers can subscribe to structured content feeds and content creators receive fair compensation for their data.

05:30 Justin – “We block you from getting to people’s AI content, and now we offer you a way to buy better content. Well played.”

AI Is Going Great – Or How ML Makes Money

06:40 Introducing Labs \ Anthropic

Anthropic is launching Labs as a dedicated team focused on incubating experimental AI products at the frontier of Claude’s capabilities, led by Instagram co-founder Mike Krieger and Ben Mann.
This organizational shift separates rapid experimentation from production scaling, with Ami Vora taking over as head of Product to focus on enterprise-grade Claude experiences.
The Labs approach has already produced several products that moved from research to production, including Claude Code, which reached $1 billion in revenue within six months of launch, and the Model Context Protocol, which now has 100 million monthly downloads and has become an industry standard for connecting AI systems to tools and data.
Recent Labs outputs include Skills, Claude in Chrome, and Cowork, which launched as a research preview to bring Claude’s agentic capabilities to desktop environments. This demonstrates the team’s focus on exploring new interaction models and deployment patterns for large language models beyond traditional chat interfaces.
The organizational structure creates two parallel tracks: Labs for frontier experimentation with unpolished versions and early user testing, and the core Product organization partnering with CTO Rahul Patil to scale proven experiences for millions of daily users and enterprise customers.
- This separation aims to balance innovation velocity with reliability requirements.
Anthropic is actively hiring for Labs positions, specifically targeting builders with experience creating consumer products and working with emerging technologies.
The team structure reflects the company’s view that rapid AI advancement requires different organizational approaches than traditional product development cycles.

08:04 Matt – “The fact that you can get a lab to a GA customer product…is a really hard thing. They seem to have done a pretty good job of that with all these different technologies.”

10:56 Mira Murati’s startup, Thinking Machines Lab, is losing two of its co-founders to OpenAI

Thinking Machines Lab, Mira Murati’s AI startup valued at $12 billion after a $2 billion seed round last July, has lost two of its three co-founders back to OpenAI within a year of founding.
Barret Zoph, who served as CTO, along with co-founder Luke Metz and researcher Sam Schoenholz, returned to OpenAI in what reports suggest was not an amicable departure.
The startup has now lost four key personnel in under a year, including co-founder Andrew Tulloch, who left for Meta in October.
Soumith Chintala has been promoted to replace Zoph as CTO, bringing over a decade of AI field experience to the role.
The rapid co-founder departures raise questions about Thinking Machines’ internal dynamics and strategic direction, particularly given the company secured backing from major investors, including Andreessen Horowitz, Accel, Nvidia, and AMD. The startup has not publicly disclosed what products or services it is developing despite the substantial funding.
This talent movement highlights the ongoing competition for AI research talent among major players, with OpenAI CEO of applications Fidji Simo noting the returns had been in the works for several weeks. The pattern mirrors OpenAI’s own history of co-founder departures to competing ventures, including John Schulman, who left for Anthropic before joining Thinking Machines.

12:35 Matt – “It’s interesting that they’re going back to OpenAI. I’m curious, with NDAs and all of that stuff in place, how that is going to work.”

13:49 OpenAI partners with Cerebras

OpenAI is adding 750MW of dedicated low-latency inference capacity through a partnership with Cerebras, with deployment rolling out in phases through 2028.
Cerebras uses a unique architecture with a single giant chip that combines compute, memory, and bandwidth to eliminate traditional bottlenecks in AI inference.
The partnership focuses specifically on accelerating real-time AI responses for workloads like complex queries, code generation, image creation, and AI agents.
OpenAI’s strategy is to match specialized hardware to specific workload types rather than using one-size-fits-all infrastructure.
Cerebras systems are purpose-built for fast token generation during the output phase of inference, which is critical for interactive AI applications where users expect immediate responses. This addresses the request-think-respond loop that determines user experience quality.
The integration represents OpenAI’s approach to building a diversified compute portfolio, adding specialized low-latency systems alongside their existing infrastructure.
This allows them to optimize different types of AI workloads based on performance requirements rather than using general-purpose hardware for everything.

14:29 Justin – “In general, anybody that can get you AI capacity is apparently a musto-do.”

15:49 Introducing ChatGPT Go, now available worldwide

OpenAI launches ChatGPT Go globally at $8 per month, creating a three-tier subscription model with Go, Plus ($20), and Pro ($200).
The Go tier provides 10x more messages, file uploads, and image creation than the free tier, with access to GPT-5.2 Instant, plus longer memory and context windows for improved conversation continuity.
The pricing strategy positions Go as an entry-level paid option for users who need more capacity than the free tier but don’t require the advanced reasoning capabilities of GPT-5.2 Thinking (Plus) or GPT-5.2 Pro.
OpenAI reports that Go became their fastest-growing product after initial rollout to 170 countries, with strong adoption for writing, learning, image creation, and problem-solving tasks.
OpenAI plans to introduce advertising in both the free tier and ChatGPT Go in the US, while Plus, Pro, Business, and Enterprise tiers remain ad-free.
This ad-supported model aims to sustain free and low-cost access points, while generating revenue from users who don’t need premium features.
The tiered approach reflects a shift toward market segmentation similar to traditional SaaS models, with clear differentiation between casual users (Go), professionals (Plus), and power users (Pro). The $8 price point is localized in some markets, suggesting OpenAI is optimizing for purchasing power parity to maximize global adoption.

17:00 Matt – “Ads are coming to AI. We all knew it was coming; they have to find additional ways to monetize it.”

Cloud Tools

19:15 Bringing secure, just-in-time secrets to Cursor with 1Password

1Password has integrated with Cursor, the AI-powered IDE, to provide just-in-time secrets management through Cursor Hooks that validate and inject credentials at runtime without ever storing them on disk.
This eliminates the common security risk of developers hard-coding API keys or committing secrets to source control while working with AI coding assistants.
The integration works by running a Hook Script before Cursor’s AI agent executes shell commands, verifying that the required environment files from 1Password Environments are properly configured and prompting users to authorize access only when needed.
Secrets remain in memory for the runtime session only, and never touch disk or Git history, maintaining zero-trust principles while keeping development velocity high.
This addresses a critical gap in AI-assisted development where AI agents could potentially access unrestricted credentials, or developers might paste tokens directly into config files for convenience.
The solution lets project owners configure secrets management centrally while individual developers maintain control over authorization through 1Password’s existing access policies and vault permissions.
Plans include granular, task-specific access rules for AI agents, broader support for the Model Context Protocol in external API interactions, automated secret rotation for AI workflows, and enhanced audit visibility for security teams.
- The goal is to make secure access a native part of AI-powered development rather than an afterthought bolted on later.
This matters because AI coding tools like Cursor are rapidly becoming standard in developer workflows, but most teams lack proper secrets management for these new AI-driven interactions.
The integration provides a practical path to adopt AI assistance without compromising security posture or requiring developers to change existing 1Password policies.

20:34 Justin – “The one thing they don’t mention, which I think is also a big threat, is you’re sending your context to their servers, and if you’re putting your password into the context, that password is now going to the inference systems, and that could potentially get exposed. So it would be nice if this also had the ability to prevent a secret from getting transmitted to the third party LLM.”

23:36 Announcing the Harness Human-Aware Change Agent

Harness launched the Human-Aware Change Agent, an AI system that listens to incident response conversations in Slack, Teams, and Zoom to extract operational clues like “the checkout button froze after they updated their cart” and automatically correlates them with actual production changes, including deployments, feature flags, and config updates.
This solves the problem where critical incident context lives in human conversations but never makes it into automated investigation tools.
The agent is part of Harness AI SRE, which includes an AI Scribe that filters incident-related conversation from noise and feeds it to the change investigation engine.
Instead of just transcribing chat or generating generic RCA summaries, it produces evidence-backed hypotheses like “deployment to checkout-service 12 minutes before the incident introduced new retry config, followed by latency spike and downstream timeouts.”
The system integrates with existing observability and incident management tools, including Datadog, PagerDuty, Jira, ServiceNow, Slack, and Teams through native integrations and webhooks.
It also includes Automation Runbooks for standardized response and On-Call management to route incidents to the right owners.
The core innovation is treating human insight as operational data rather than assuming incidents can be solved purely through logs, metrics, and traces. This addresses the reality that on-call engineers often identify patterns through conversation before they show up in dashboards, especially as AI-assisted development increases code velocity and reduces clear ownership of changes.
The tool aims to shorten the incident response cycle from “What are we seeing” to “What changed” to “What should we do” by connecting human observations with machine-driven change intelligence in real time during active incidents.

25:22 Justin – “Human awareness of how the system works as a whole – because typically AI systems don’t have the context to handle the whole system view – is also very valuable to the AI as well, so I guess we’re going to serving the AI someday, instead of the otherway around.”

AWS

26:15 Amazon EC2 X8i instances powered by custom Intel Xeon 6 processors are generally available for memory-intensive workloads

Want to burn all your moneys? Good news!
AWS launches X8i instances with custom Intel Xeon 6 processors offering up to 6 TB of memory and 3.9 GHz sustained all-core turbo frequency, delivering 1.5x more memory capacity and 3.4x more memory bandwidth than previous X2i generation.
These SAP-certified instances target memory-intensive workloads like in-memory databases, data analytics, and EDA applications.
Performance improvements are substantial across multiple workloads: 50% higher SAP HANA performance, 47% faster PostgreSQL, 88% faster Memcached, and 46% faster AI inference compared to X2i instances. Real customer deployments show Orion reduced SQL Server licensing costs by 50% while maintaining performance thresholds by using fewer active cores.
The instances come in 14 sizes, including three new larger options (48xlarge, 64xlarge, 96xlarge) and two bare metal variants, with network bandwidth up to 100 Gbps supporting Elastic Fabric Adapter and 80 Gbps EBS throughput.
The instance bandwidth configuration feature allows flexible allocation between network and EBS bandwidth with up to 25% scaling capability.
Currently available in US East N. Virginia, US East Ohio, US West Oregon, and Europe Frankfurt regions with standard purchasing options including On-Demand, Savings Plans, and Spot Instances.
Pricing follows standard EC2 memory-optimized instance rates available on the EC2 pricing page.

27:23 Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs

AWS launches EC2 G7e instances powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, delivering 2.3x better inference performance compared to G6e instances and doubling GPU memory to 96GB per GPU.
These instances can handle models up to 70B parameters with FP8 precision on a single GPU, with configurations scaling up to 8 GPUs and 768GB total GPU memory per node.
The instances feature substantial networking improvements with 4x the bandwidth of G6e instances (up to 1,600 Gbps) and support for NVIDIA GPUDirect RDMA via Elastic Fabric Adapter for multi-node workloads.
GPUDirect P2P enables direct GPU-to-GPU communication over PCIe with 4x the inter-GPU bandwidth compared to previous generation L40s GPUs, reducing latency for distributed model inference.
G7e instances target generative AI inference, spatial computing, and scientific computing workloads with support for GPUDirect Storage integration with FSx for Lustre, providing up to 1.2 Tbps throughput for rapid model loading. Configurations range from single GPU instances to 8-GPU systems with up to 192 vCPUs and 2TB of system memory.
Currently available in US East N. Virginia and Ohio regions with support for On-Demand, Spot, Savings Plans, Dedicated Instances, and Dedicated Hosts purchasing options.
SageMaker AI integration is planned for future release, while ECS and EKS support is available now.

27:46 Justin – “That’s a lot of power, and cooling, and that where all my RAM went to, which is why my RAM is expensive now.”

29:00 Opening the AWS European Sovereign Cloud

AWS European Sovereign Cloud is now generally available with its first region in Brandenburg, Germany, operating as a physically and logically separate infrastructure partition (aws-eusc) entirely within the EU.
The infrastructure will be operated exclusively by EU residents located in the EU, with dedicated IAM and billing systems, and technical controls that prevent access from outside the EU.
The service launches with comprehensive AWS capabilities, including SageMaker, Bedrock, EC2, Lambda, EKS, Aurora, DynamoDB, S3, and other core services, backed by a 7.8 billion EUR investment expected to contribute 17.2 billion EUR to the European economy through 2040.
Expansion plans include sovereign Local Zones in Belgium, the Netherlands, and Portugal, plus options for Dedicated Local Zones, AI Factories, and Outposts deployments.
The operational model features EU-based management through German legal entities, with Stephane Israel appointed as managing director and an advisory board of EU citizens providing sovereignty oversight.
The infrastructure maintains AWS security standards, including Nitro System isolation, ISO/IEC 27001, SOC 1/2/3 reports, and BSI C5 attestation, with a Sovereign Reference Framework available in AWS Artifact.
Data residency guarantees ensure all customer content and metadata, including roles, permissions, and configurations, remain within the EU, using dedicated European trust service providers for certificate authority operations and European TLDs for Route 53 name servers. Pricing is in EUR with billing available in eight supported currencies through Amazon Web Services EMEA SARL.
Major AWS partners, including Adobe, Cisco, SAP, Snowflake, and Wiz, are making their solutions available in the sovereign cloud, enabling public sector and highly regulated industry customers to meet strict compliance requirements while accessing modern cloud capabilities without being stuck in legacy on-premises environments.

31:53 Justin – “Google’s got the same thing on a partnership with Thales in France. I think Azure is doing something similar as well… but the question is kind of, a European entity owned by a US corporation, does that actually fulfill the concerns the European Union has?”

33:16 Rio Tinto and Amazon Web Services collaborate to bring low-carbon Nuton copper to U.S. data centres

AWS becomes the first customer for Rio Tinto’s Nuton bioleaching technology, which uses microorganisms to extract copper from ore at the Johnson Camp mine in Arizona.
The process produces 99.99% pure copper cathode directly at the mine without traditional smelters or refineries, achieving a carbon footprint of 2.82 kgCO2e/kg Cu compared to the global range of 1.5-8.0 kgCO2e/kg Cu.
The two-year agreement supplies low-carbon copper for AWS data center components, including electrical cables, busbars, transformers, circuit boards, and processor heat sinks.
Johnson Camp is now the lowest-carbon primary copper producer in the U.S., targeting approximately 30,000 tonnes of refined copper over four years with 71 liters of water per kilogram versus the industry average of 130 liters.
AWS provides cloud-based data and analytics support to optimize Nuton’s bioleaching operations, including heap-leach performance simulation and advanced analytics for acid and water usage.
The modular system enables rapid scaling and customization for different ore bodies while recovering value from previously classified waste material.
This collaboration addresses supply chain resilience by producing critical materials domestically for U.S. data centers while supporting Amazon’s Climate Pledge goal of net-zero carbon by 2040.
The partnership demonstrates how industrial mining operations can integrate cloud technology to reduce environmental impact and shorten mine-to-market supply chains.

34:39 Justin – “It also tells me how much you desperately need it (copper) for all the AI investments you’re about to be making.”

35:53 Skills, Custom Diff Tools, Improved Code Intelligence, and Conversation Compaction

Kiro CLI version 1.24.0 introduces Skills, a new resource type for progressive context loading that only loads metadata at startup and fetches full documentation content on demand when the AI agent needs it.
This addresses memory constraints when working with large documentation sets by requiring YAML frontmatter with descriptive metadata to help agents determine when to load complete content.
The release adds built-in code intelligence for 18 programming languages, including Python, JavaScript, Go, Rust, and others, without requiring LSP setup. Developers get immediate access to symbol search, definition navigation, and structural code searches, plus a new /code overview command for quick workspace analysis.
New AST-based pattern-search and pattern-rewrite tools enable precise code refactoring by matching syntax tree patterns instead of text regex. This eliminates false matches in string literals and comments, providing more reliable code transformations for AI agents.
Conversation Compaction addresses context window limitations with a /compact command that summarizes conversation history while preserving key information. The feature triggers automatically when context limits are reached and creates a new session while allowing users to resume the original conversation, with configurable retention settings for message pairs and context window percentage.
The update includes granular URL permissions for the web_fetch tool using regex patterns to control which domains AI agents can access, plus remote authentication support for Google and GitHub when running Kiro CLI on remote machines via SSH, SSM, or containers.

GCP

38:43 Introducing BigQuery managed and SQL-native inference for open models | Google

BigQuery now supports SQL-native inference for open models from Hugging Face and Vertex AI Model Garden through a two-step process: CREATE MODEL with a model ID string, then run inference using AI.GENERATE_TEXT or AI.GENERATE_EMBEDDING functions.
This eliminates the need for separate infrastructure management or API integrations outside of BigQuery.
The service includes automated resource management with configurable idle timeout settings that automatically undeploy endpoints when not in use, preventing runaway costs from idle GPU instances.
Users can customize machine types, replica counts, and leverage Compute Engine reservations for consistent GPU availability on demanding workloads.
This extends BigQuery’s existing managed inference capabilities beyond Google’s Gemini models and partner models like Anthropic and Mistral to any compatible open model.
The entire lifecycle from deployment to cleanup happens through SQL statements, making LLM inference accessible to data analysts without requiring ML engineering expertise.
The feature is currently in Preview and supports both text generation and embedding generation workloads directly on data stored in BigQuery tables.
Cost control includes both automated endpoint recycling based on idle time and manual undeploy options via ALTER MODEL statements, with automatic cleanup of all Vertex AI resources when models are dropped.

39:45 Matt – “This all seems crazy to me; this is where we’re at, where AI is writing, creating models, running all of these things for us.”

40:56 TranslateGemma: A new family of open translation models

Google released TranslateGemma, a new family of open translation models based on Gemma 3, available in 4B, 12B, and 27B parameter sizes supporting 55 languages.
The models use a two-stage training process combining supervised fine-tuning on parallel data from human translations and Gemini-generated synthetic translations, followed by reinforcement learning using MetricX-QE and AutoMQM reward models.
The 12B TranslateGemma model outperforms the baseline Gemma 3 27B model on WMT24++ benchmarks while using less than half the parameters, delivering higher throughput and lower latency.
The 4B model matches the performance of the 12B baseline, making it suitable for mobile inference and edge deployment.
TranslateGemma retains Gemma 3’s multimodal capabilities, showing improved performance on the Vistra image translation benchmark without specific multimodal fine-tuning.
The models were trained on nearly 500 language pairs beyond the core 55, providing a foundation for researchers to fine-tune for specific language pairs or low-resource languages.
The models are optimized for different deployment scenarios: 4B for mobile and edge devices, 12B for consumer laptops, and 27B for a single H100 GPU or TPU cloud deployment.
All three sizes are available now for developers and researchers to download and use.

41:50 Justin – “I am excited about the idea of models that specialize in supporting language translations; and so this is things that power future products inside of your Android phones someday, where Apple has a feature where it can slowly translate things through your Airpods… it’s a little delayed but it works relatively well. I’m sure this will bring similar type capabilities to you and your Android phone.”

Azure

44:40 Design your AI strategy with Microsoft Marketplace Solutions

Microsoft positions its Marketplace as a central hub for AI adoption with over 11,000 pre-packaged models (“models”) and 4,000 AI apps and agents, offering organizations flexible build-buy-blend strategies for implementing AI solutions.
The platform integrates directly into existing Microsoft tools like Copilot Studio and Azure Foundry, allowing teams to discover and deploy AI components within their normal workflows rather than switching between separate procurement systems.
The Marketplace supports both pro-code development with full control over custom logic and IP ownership, and low-code approaches through Copilot Studio using models from providers like Anthropic, OpenAI, Meta, and NVIDIA.
Organizations with Azure consumption commitments can apply Marketplace purchases dollar-for-dollar against their contracts with no limit, potentially improving ROI on existing Microsoft agreements.
Microsoft emphasizes a blended approach where companies can extend partner solutions with proprietary components, illustrated by financial services firms deploying pre-built fraud detection models while customizing them with internal data pipelines and compliance workflows.
This strategy reduces the engineering effort and compliance review cycles compared to building detection systems from scratch while maintaining data security through Managed Identity within Azure tenants.
The platform includes try-before-you-buy capabilities with trials and proofs-of-concept that run within customer Microsoft environments, allowing validation before full deployment.
Solutions are filtered by product, category, and industry to match specific organizational needs, with agents available directly in Microsoft 365 Copilot and models accessible through the Azure portal.

Cloud Journey

52:07 Is Northern Virginia Still the Least Reliable AWS Region in 2025? We Analyzed the Data

StatusGator published an analysis of AWS outages from January through December 2025, focusing on regional reliability and service-level incidents across all commercial AWS regions

N. Virginia (us-east-1) is the least reliable AWS region: 10 outages, 34 hours of downtime, 126 components affected
October 20, 2025, was one of AWS’s most significant outages ever: 76 components down for ~15 hours, cascading failures across thousands of SaaS platforms
Compute and ML services hit hardest: EC2 (14 outages), SageMaker (11), Glue (10), EMR (10), ECS (10)
Several services exceeded 24 hours cumulative downtime: OpenSearch, CloudWatch, EMR Serverless, STS
Multi-region (“Regionless”) outages increased: 12 incidents, 32 hours of downtime
Status Gator speculates reasons:

Customer density: us-east-1 has 2x the users of Oregon and 3x other regions
Higher service density creates more interconnected dependencies and potential failure points
Heavier API traffic and more complex multi-AZ coordination
No evidence that the age of the region or architectural differences are factors

Best Practices from Status Gator
- Avoid over-reliance on a single region, especially us-east-1
- Design for multi-region resilience and failover
- Monitor authentication/identity services (STS) as critical dependencies
- Consider the blast radius when selecting primary regions

Closing

338: T5Gemma Says "AI’ll be Back”

Thu, 22 Jan 2026 00:20:14 +0000

Welcome to episode 338 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, Matt, and Jonathan are in the studio today to bring you all the latest in cloud and AI news, including a bit of a buying spree (inlcuding whole power companies) Veo 3.1, Cowork, and more – today in the cloud!

Titles we almost went with this week

Snowflake’s Ironic Timing: Buying Downtime Prevention Tool While Experiencing Downtime
Flexera Buys ProsperOps and Chaos Genius, Promises Less Chaos and More Prosperity
Flexera Goes Shopping: Two FinOps Acquisitions to Prosper and Reduce Chaos
Token of Appreciation: Gemini CLI Now Tracks Every Penny of Your AI Spend
Snowflake Buys Observe to Stop Its Own Services from Melting Down
Google’s Veo 3.1 Goes Vertical: Finally Understanding How People Actually Hold Their Phones
Alphabet’s New Power Move: Buying the Company That Literally Powers Data Centers
Dashboard Confessional: Gemini CLI Gets Transparent About Its Usage
Microsoft’s New Agent Works 24/7 and Never Asks for a Raise
From Robot Vacuums That Climb Stairs to TVs You Can’t Feel: CES Gets Weird
Agent Shopping: When Your AI Has Better Taste Than You Do
The cloudpod hosts do not like any stories this week
AWS took a nap on announcements this week
Claude is my new co-worker
Wake up, AWS, and give us some fun news
The $200 Assistant: Is Cowork the End of Workplace Admins?
Azure has more interesting announcements than AWS oh noooo
If you can’t beat them in AI, just acquire everyone
Notebook LM turns the Data Tables on you

AI Is Going Great – Or How ML Makes Money

01:11 Anthropic launches Cowork, a Claude Code-like for general computing – Ars Technica

Anthropic launches Cowork, a new feature in the macOS Claude desktop app that extends Claude Code‘s agentic capabilities to general office work tasks.
Users can grant Claude access to specific folders and use plain language instructions to automate tasks like filling expense reports from receipt photos, writing reports from notes, or reorganizing files.
Cowork lowers the technical barrier compared to Claude Code by making AI-assisted file operations accessible to non-developer knowledge workers, including marketers and office staff.
The feature was developed after Anthropic observed users already applying Claude Code to general knowledge work despite its developer-focused positioning.
The tool provides similar functionality to what was possible through Model Context Protocol integrations, but offers a more streamlined interface with Claude Code-style usability improvements.
Users can submit new requests or modifications to ongoing tasks without waiting for the initial assignment to complete.
Cowork represents a strategic expansion of Anthropic’s agentic AI approach beyond software development into broader productivity workflows. The feature demonstrates how AI agents with file system access can automate routine knowledge work tasks that previously required manual processing of documents and data.

02:15 Ryan – “This week is the first time I actually tried to use AI to generate a PowerPoint presentation. It did not go well. It did generate some cool images, though.”

07:42 Enhanced Veo 3.1 capabilities are now available in the Gemini API.

Google has released Veo 3.1 updates in the Gemini API and Google AI Studio, adding enhanced Ingredients to Video capabilities that maintain character identity and background consistency across generated videos.
The model now supports native 9:16 vertical format generation optimized for mobile-first applications, eliminating the need to crop from landscape orientation.
The updated model delivers professional-grade output with new 4K resolution support and improved 1080p quality using state-of-the-art enhancement techniques. All generated videos include SynthID digital watermarking for content provenance tracking.
These capabilities are available today through the Gemini API for developers and Vertex AI for enterprise customers. Google AI Studio provides a demo app for testing the new features at ai.studio/apps/bundled/veo_studio.
The vertical video format addresses the growing demand for social media content creation, while the 4K output positions Veo 3.1 for professional video production workflows. The character consistency improvements reduce the need for manual editing and post-processing in multi-shot video projects.

08:20 Justin – “Don’t make the same mistakes that I do, and go try this and then get a $35 bill, which I did the first time I tried Veo out. So, do be cautious with this one!”

11:08 Snowflake Announces Intent to Acquire Observe to Deliver AI-Powered Observability

Snowflake is acquiring Observe to integrate AI-powered observability directly into its data platform, allowing customers to analyze telemetry data like logs, metrics, and traces alongside their business data.
This consolidation eliminates the need for separate observability tools and reduces data movement between systems.
The acquisition addresses the growing challenge of managing observability data at scale, which has become increasingly expensive and complex as organizations generate massive volumes of telemetry information.
Observe’s approach stores data in a structured format that enables more efficient querying and analysis compared to traditional observability platforms.
By bringing observability into Snowflake’s platform, customers can correlate operational metrics with business outcomes using the same SQL-based tools they already use for analytics.
This unified approach should help teams identify how application performance issues directly impact revenue, customer experience, and other business metrics.
The deal positions Snowflake to compete more directly with observability vendors like Datadog, Splunk, and New Relic by offering native capabilities rather than requiring third-party integrations.
Organizations already using Snowflake for data warehousing can now consolidate their observability spend and simplify their tool stack.

12:08 Ryan – “I don’t know how to feel about this; I feel like Snowflake is a part of an application, but it’s not the entirety of an application. I definitely see a use for this for data warehousing and visualizing, but I don’t think it replaces your traditional observability tools because you have too many data sources that are outside of Snowflake.”

Cloud Tools

13:58 Flexera acquires ProsperOps and Chaos Genius to expand its FinOps solution with agentic and AI-enabled cost optimization

Flexera acquires two FinOps companies to add autonomous AI-driven cost optimization across major cloud platforms and data analytics services: ProsperOps brings automated commitment management for AWS, Azure, and Google Cloud with over $6B in annual cloud usage under management, while Chaos Genius focuses specifically on Snowflake and Databricks optimization with reported cost reductions up to 30%.
The acquisitions shift Flexera’s FinOps approach from passive recommendations to active autonomous execution through agentic AI.
This means the platform can automatically purchase and manage cloud commitments and optimize data workloads without requiring manual human intervention, addressing the challenge of dynamic cloud usage patterns that don’t align well with static commitment purchases.
ProsperOps will continue operating as a separate brand while integrating with Flexera’s existing FinOps capabilities. The company was growing at over 90% and has generated more than $3 billion in lifetime savings for customers, suggesting strong market demand for automated rate optimization solutions.
The Chaos Genius acquisition specifically targets the emerging problem of runaway costs in data analytics platforms like Snowflake and Databricks as AI workloads scale.
This addresses a gap in traditional FinOps tools that primarily focused on compute and storage optimization but lacked specialized capabilities for modern data cloud platforms.
These moves position Flexera to cover the complete FinOps Framework defined by the FinOps Foundation, combining cost visibility, workload optimization, and rate optimization in a single platform.
This matters for enterprises struggling to manage costs across an increasingly complex mix of traditional cloud services, AI infrastructure, and specialized data platforms.

15:35 Matt – “It definitely needs some pretty strong guardrails of what your business objective is, like don’t go over 90% savings plan or look at the secondary market for short term if you see a random burst for a few months. But it’s not a terrible idea…”

AWS

19:12 Weirdly enough, there are no AWS stories this week.

GCP

20:06 Instant insights: Gemini CLI’s New Pre-Configured Monitoring Dashboards | Google Cloud Blog

Google has added pre-configured monitoring dashboards to Gemini CLI that provide immediate visibility into usage metrics like monthly active users, token consumption, and code changes without requiring custom query writing.
The dashboards integrate with Google Cloud Monitoring and use OpenTelemetry for standardized data collection, allowing teams to track CLI adoption and performance across their organization.
The implementation uses direct GCP exporters that bypass intermediate OTLP collector configurations, simplifying setup to three steps: setting the project ID, authenticating with proper IAM roles, and updating the settings.json file. This reduces infrastructure complexity compared to traditional OpenTelemetry deployments that require separate collector services.
Organizations can analyze raw OpenTelemetry logs and metrics to answer specific questions like identifying power users by token consumption, tracking budget allocation by command type, and monitoring tool reliability through status codes. The data follows GenAI OpenTelemetry conventions, ensuring compatibility with other observability backends like Prometheus, Jaeger, or Datadog if teams want to switch platforms.
The feature targets development teams using Gemini CLI who need to understand tool adoption patterns and justify AI tooling investments through concrete usage metrics.
Engineering managers can track which developers benefit most from AI assistance and where token budgets are being allocated across different command types

21:55 Ryan – “As long as there’s no metric for how stupid a question is, because that. That I don’t want.”

22:40 We’re advancing U.S. energy innovation with Intersect.

Alphabet announced a definitive agreement to acquire Intersect, a company specializing in data center and energy infrastructure solutions.
This acquisition aims to accelerate the deployment of data center capacity and energy generation infrastructure in the United States.
The deal addresses a critical bottleneck in AI and cloud infrastructure expansion by bringing expertise in energy development and data center deployment under Alphabet’s umbrella. Intersect’s capabilities will help Google bring more computing capacity online faster, which is essential given the substantial power requirements of AI workloads and hyperscale cloud operations.
This acquisition reflects the growing importance of energy infrastructure as a limiting factor for cloud providers, particularly as AI training and inference workloads drive unprecedented power demands. By acquiring energy infrastructure expertise, Google positions itself to better control the full stack from power generation through data center operations.
The announcement provides limited technical details about integration timelines or specific projects, but signals Google’s commitment to vertical integration in the infrastructure space. This move follows similar investments by other hyperscalers in power generation and energy partnerships to support their expanding data center footprints.

22:50 Justin – “If you can’t get the capacity from the vendor, just buy them – and then force them to do it. Good move!”

25:00 Google’s NotebookLM introduces Data Tables feature

NotebookLM now includes Data Tables, a feature that automatically synthesizes information from multiple sources into structured tables that can be exported directly to Google Sheets.
The feature is available today for Pro and Ultra users, with rollout to all users planned for the coming weeks.
The feature addresses a common workflow challenge where valuable information is scattered across multiple documents, requiring manual compilation. Data Tables automates this process by extracting and organizing key facts into clean, structured formats without manual data entry.
Use cases span professional and personal applications, including converting meeting transcripts into action item tables with owners and priorities, synthesizing research data like clinical trial outcomes across multiple papers, creating competitor analysis tables with pricing and strategy comparisons, and building study guides organized by relevant categories.
The feature represents Google’s continued integration of AI capabilities into productivity tools, positioning NotebookLM as a research and synthesis tool rather than just a note-taking application.
This builds on NotebookLM’s existing source analysis capabilities by adding structured data output.
The tiered rollout strategy, with Pro and Ultra users receiving immediate access, suggests Google is testing the feature with power users before broader deployment, likely to gather usage patterns and refine the table generation algorithms.

25:52 Justin – “I love creating spreadsheets; my budgets, all of my tracking of things, tasks I’m doing, vacation planning – it all lives in spreadsheets. And you’re going to take that away from me, Google? How dare you. AI is coming for my passion for spreadsheets.”

29:53 T5Gemma 2: The next generation of encoder-decoder models

Google releases T5Gemma 2, a new generation of encoder-decoder models based on Gemma 3, available now in pre-trained checkpoints at three sizes: 270M-270M (370M total), 1B-1B (1.7B total), and 4B-4B (7B total) parameters. The models use tied word embeddings and merged decoder attention to reduce parameter count while maintaining capabilities, making them suitable for on-device applications and rapid experimentation.
T5Gemma 2 adds multimodal vision capabilities using an efficient vision encoder for visual question answering and reasoning tasks, extends context windows to 128K tokens using Gemma 3’s alternating local and global attention mechanism, and supports over 140 languages out of the box.
These represent the first multi-modal and long-context encoder-decoder models in the Gemma family.
The architecture merges decoder self-attention and cross-attention into a single unified layer, reducing model complexity and improving parallelization for better inference performance.
This structural change, combined with tied embeddings, allows more active capabilities within the same memory footprint compared to the original T5Gemma.
Benchmarks show T5Gemma 2 outperforms Gemma 3 on several multimodal tasks, delivers substantial quality gains on long-context problems compared to both Gemma 3 and T5Gemma, and shows improved performance on coding, reasoning, and multilingual tasks. Post-training results indicate better performance than decoder-only counterparts, making these models suitable for both research and production applications.
The models are designed for developers to post-train for specific tasks before deployment, continuing the approach from the original T5Gemma of adapting pre-trained decoder-only models into an encoder-decoder architecture without the computational cost of training from scratch.
Pre-trained checkpoints are available across multiple platforms for broad developer access.

31:14 Jonathan – “I’m actually looking forward to playing with the T5Gemma model because the encoder part of it is what’s going to make it really special. Transformers have always had these two halves, encoder and decoder, and most LMs only use the decoder. And what that means is that as the attention is calculated for each token in the context window, it only ever attends to previous tokens in the message. So if you have a word, that word can only ever be related to something that you’ve already said in the conversation. But people aren’t like that. People go back and forth, and they refer back to things they said… people just suck at communication most of the time. And so what the encoder model does is it looks at the entire message holistically. It doesn’t only look at the last word by the time it gets to the last word, it looks at everything and encodes the meaning of the entire text. And then from there, it passes it to the decoder, and the decoder starts generating text based on the entire knowledge of the whole thing.”

33:39 New tech and tools for retailers to succeed in an agentic shopping era

Google launches Universal Commerce Protocol (UCP), an open standard for agentic commerce co-developed with Shopify, Etsy, Wayfair, Target, and Walmart.
UCP enables AI agents to interact across the entire shopping journey from discovery to post-purchase support, working alongside existing protocols like A2A, AP2, and MCP. The protocol is endorsed by over 20 companies, including Adyen, American Express, Mastercard, Stripe, and Visa.
New agentic checkout feature goes live in AI Mode in Search and Gemini app, allowing shoppers to purchase from eligible U.S. retailers directly within Google’s AI surfaces.
The integration uses Google Pay and PayPal for payments, with retailers maintaining seller of record status and the ability to customize the implementation. Global expansion and additional capabilities like loyalty rewards and product discovery are planned for the coming months.
Business Agent launches tomorrow as a branded AI assistant that appears directly in Search results for retailers like Lowe’s, Michaels, Poshmark, and Reebok. U.S. retailers can activate and customize this agent through Merchant Center, with future capabilities including training on retailer data, customer insights, product offers, and direct agentic checkout within the chat experience.
Google introduces Direct Offers pilot in AI Mode, allowing advertisers to present exclusive discounts and deals to shoppers during AI-powered searches. The system uses AI to determine when offers are relevant to display, initially focusing on discounts with plans to expand to bundles and free shipping. Early partners include Petco, e.l.f. Cosmetics, Samsonite, Rugs USA, and Shopify merchants.
Merchant Center adds dozens of new data attributes designed for conversational commerce discovery across AI Mode, Gemini, and Business Agent. These attributes extend beyond traditional keywords to include product Q&A, compatible accessories, and substitutes, rolling out first to a small group of retailers before broader expansion.

35:20 Ryan – “I think it’s important to standardize. In a web transaction where you’re doing shopping, there’s so many handoffs to different things, I can see, as more and more AI and agent-based or agent-assisted transactions happen, being able to talk a common language is super important.”

33:38 Read Sundar Pichai’s remarks at the 2026 National Retail Federation

Google announced Universal Commerce Protocol (UCP), an open standard for agentic commerce built with Shopify, Etsy, Wayfair, Target, and Walmart. The protocol enables native checkout directly in Google Search AI Mode and Gemini, allowing retailers to maintain merchant of record status and own customer relationships while offering personalized pricing and loyalty enrollment at checkout.
Gemini Enterprise for Customer Experience is now available in preview, providing retailers with integrated shopping assistants, support bots, and agentic search capabilities.
The Home Depot and McDonald’s are already using these agents for customer service, while Kroger is testing a shopping agent that brings AI Mode functionality directly into retailer apps.
Google processed over 90 trillion tokens through its API in December 2025, representing an 11x increase from 8.3 trillion tokens in December 2024. This growth demonstrates the rapid adoption of AI capabilities by retailers and the scale at which Google’s infrastructure is supporting commercial AI applications.
Wing delivery service expanded to Houston, with Orlando, Tampa, and Charlotte coming soon, after doubling deliveries in existing markets during 2025 through its Walmart partnership.
The expansion addresses the high cost and logistical challenges of last-mile delivery for retailers.

38:35 Jonathan – “So is this how Google is going to make money in the future? Because obviously serving ads through AI is both controversial and a very lame customer experience. Are they going to start skimming off a percentage of sales for sales they direct to these retailers through their AI interface?”

Azure

39:58 Announcing public preview: Uncovering hidden threats with the Dynamic Threat Detection Agent | Microsoft Community Hub

Microsoft launches the Dynamic Threat Detection Agent in public preview, an AI-powered backend service that runs continuously within Defender to identify hidden threats across Defender and Sentinel environments.
The agent operates autonomously with no setup required, automatically generating alerts with natural language explanations, MITRE technique mappings, and remediation steps directly into existing XDR workflows.
The agent achieves over 85% precision across thousands of alerts and 28 threat types by combining adaptive GenAI detection with hyperscale threat intelligence from TITAN and UEBA behavioral analytics.
It runs a five-step investigation loop at machine scale, starting from high-priority incidents, building unified activity timelines, testing hypotheses through automated Q&A, and closing detection gaps with explainable alerts that include transparent reasoning traces.
Public preview is free for Security Copilot customers and enabled by default for eligible organizations, with general availability planned for late 2026 when it transitions to Security Copilot’s SCU-based consumption model.
Starting July 2026, the agent will be included with Microsoft 365 E5 licenses that have Security Copilot entitlement, and customers can disable it or monitor usage through detailed consumption reporting at any time.
The service respects data residency by running region-local and integrates deeply with the Microsoft security ecosystem, using Sentinel to correlate third-party and native telemetry while surfacing Copilot-sourced detections in Defender.
Built on Azure Synapse for massive scale, it can run thousands of parallel investigations and deliver near-real-time detections while continuously learning from analyst feedback to improve detection quality and reduce alert noise.

43:54 Jonathan – “You don’t want to block a potential customer who’s about to press a button to spend tens of thousands of dollars either. guess false positives are almost as bad as false negatives.”

45:26 Generally Available: Geo-Replication for Azure Service Bus Premium

Azure Service Bus Premium now includes generally available Geo-Replication, allowing customers to replicate messaging infrastructure across regions for disaster recovery.
This addresses a critical need for enterprises running mission-critical messaging workloads that require protection against regional outages.
The feature provides active replication of Service Bus entities, including queues, topics, and subscriptions, between paired regions, maintaining message ordering and metadata consistency.
Organizations can now implement cross-region failover strategies without building custom replication logic or managing multiple Service Bus namespaces manually.
This capability is exclusive to the Premium tier of Service Bus, which starts at approximately $677 per month for the base messaging unit. Customers should factor in additional costs for cross-region data transfer and the secondary namespace when planning their disaster recovery architecture.
The geo-replication option complements existing Service Bus disaster recovery features like Geo-Disaster Recovery (metadata-only failover), giving customers flexibility in choosing between cost-optimized metadata replication or full data replication based on their recovery time objectives.
This is particularly relevant for financial services, healthcare, and retail sectors, where message loss during regional failures is unacceptable.

46:23 Justin – “I’m surprised this wasn’t already part of premium, but I’m also sort of intrigued that they think people’s messaging strategies only involve two regions, because some of the cost architectures I’ve seen are like multiple regions with active replication across these things for geodistributed applications that need to have globally low latency for user populations everywhere – and I guess I just can’t run that on this service. So I guess, screw you? Or wait for Azure Service Bus Ultra?”

After Show

46:38 CES 2026: The best tech announced so far | The Verge

CES 2026 showcased significant infrastructure innovations, including Wi-Fi 8 routers from Asus and others, despite the standard not being finalized until 2028, plus solid-state battery breakthroughs from Donut Lab claiming 400 Wh/kg energy density that could give EVs 30 percent more range. These developments signal major shifts in networking and power infrastructure that cloud and edge computing deployments will eventually leverage.
Smart home and IoT devices are getting serious upgrades with Matter compatibility becoming standard across Ikea and Philips Hue products, while spatial awareness features like Hue’s SpatialAware use AR to map rooms for better lighting distribution. For cloud professionals, this represents the maturation of IoT protocols and edge AI processing that will drive increased demand for home automation backend services.
The display technology race is heating up with Samsung showing creaseless foldable OLED panels, Dell launching a 52-inch 6K Thunderbolt hub monitor, and LG reviving its Wallpaper TV with wireless video transmission. These advances in display tech and connectivity standards like Thunderbolt 5, delivering 120Gbps speeds, will impact how professionals design workspaces and remote work setups.
AI wearables are moving beyond glasses with Razer’s Project Motoko headphones featuring 4K cameras, on-device AI processing via Qualcomm chips, and 36-hour battery life that eclipses current smart glasses. This shift toward headphone-based AI assistants could influence how voice interfaces and edge AI applications are developed for consumer devices.
Robotics took center stage with practical home automation like Roborock’s stair-climbing Saros Rover vacuum and LG’s CLOiD dual-arm robot that can fold laundry and handle kitchen tasks. While still in development, these robots represent the convergence of computer vision, edge AI, and mechanical engineering that will require robust cloud backends for training and coordination.

Closing

337: AWS Discovers Prices Can Go Both Ways, Raises GPU Costs 15 Percent

Fri, 16 Jan 2026 01:02:43 +0000

Welcome to episode 337 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan have hit the recording studio to bring you all the latest in cloud and AI news, from acquisitions and price hikes to new tools that Ryan somehow loves but also hates? We don’t understand either… but let’s get started!

Titles we almost went with this week

Prompt Engineering Our Way Into Trouble
The Demo Worked Yesterday, We Swear
It Scales Horizontally, Trust Us
Responsible AI But Terrible Copy (Marketing Edition)

General News

00:58 Watch ‘The Thinking Game’ documentary for free on YouTube

Google DeepMind is releasing the “The Thinking Game” documentary for free on YouTube starting November 25, marking the fifth anniversary of AlphaFold.
The feature-length film provides behind-the-scenes access to the AI lab and documents the team’s work toward artificial general intelligence over five years.
The documentary captures the moment when the AlphaFold team learned they had solved the 50-year protein folding problem in biology, a scientific achievement that recently earned Demis Hassabis and John Jumper the Nobel Prize in Chemistry.
This represents one of the most significant practical applications of deep learning to fundamental scientific research.
The film was produced by the same award-winning team that created the AlphaGo documentary, which chronicled DeepMind’s earlier achievement in mastering the game of Go. For cloud and AI practitioners, this offers insight into how Google DeepMind approaches complex AI research problems and the development process behind their models.
While this is primarily a documentary release rather than a technical product announcement, it provides context for understanding Google’s broader AI strategy and the research foundation underlying its cloud AI services. The AlphaFold model itself is available through Google Cloud for protein structure prediction workloads.

01:54 Justin – “If you’re not into technology, don’t care about any of that, and don’t care about AI and how they built all the AI models that are now powering the world of LLMs we have, you will not like this documentary.”

04:22 ServiceNow to buy Armis in $7.7 billion security deal • The Register

ServiceNow is acquiring Armis for $7.75 billion to integrate real-time security intelligence with its Configuration Management Database, allowing customers to identify vulnerabilities across IT, OT, and medical devices and remediate them through automated workflows.
The deal is expected to close in the second half of 2026 and aims to triple ServiceNow’s current $1 billion annual security revenue.
The acquisition represents a strategic data play when combined with ServiceNow’s recent purchase of Data.World, giving the company both massive volumes of security asset data from Armis and the governance tools to make that data searchable and usable with AI.
This combination enhances ServiceNow’s CMDB capabilities by an order of magnitude, according to Forrester analysts.
ServiceNow has completed six acquisitions this year, including Armis, Veza for identity access management, and Data.World for data governance, signaling an aggressive expansion strategy focused on security and data management.
The company’s integration approach will be critical as customers watch how well these separate platforms merge into ServiceNow’s unified platform.
The deal positions ServiceNow to eliminate the patchwork of security tools organizations currently use by embedding security capabilities directly into its AI platform.
Armis brings 950 employees, $340 million in annual recurring revenue, and recognition as a Gartner leader in cyber-physical systems protection.
Despite Salesforce entering the ITSM market, analysts assess ServiceNow maintains a five-year development lead in the space, though successful integration of multiple acquisitions remains the key challenge for maintaining that advantage.

05:49 Ryan – “Is this security tooling that you use for analysis or threat hunting? Or is this something that they’re adding to their existing tooling, so it’s more of an integration?”

Listener Note: If you have any idea what this company does, let us know!

Cloud Tools

08:38 TOON vs. JSON | DigitalOcean

TOON (Token Oriented Object Notation) is a new data format designed to replace JSON in LLM prompts, claiming to reduce input token usage by approximately 40% while maintaining or improving accuracy.
The format works by eliminating verbose JSON syntax and repeated tokens, converting structured data into a more compact representation that LLMs can still interpret effectively.
DigitalOcean released a Python library (toon-python) that automatically converts JSON datasets to TOON format before sending them to LLM endpoints. In their testing example, a JSON dataset using 172 tokens was reduced to 71 tokens in TOON format (59% reduction) while producing identical query results across multiple model providers, including Mistral 3.
TOON is specifically designed for input context containing structured data from databases or other sources, not for replacing plain text prompts or LLM outputs. Studies show that converting plain text instructions to structured formats like JSON doesn’t consistently improve accuracy, so TOON’s value proposition is primarily for applications already using JSON-formatted datasets in their prompts.
The format has limitations, including a lack of proven effectiveness for model outputs, potential compatibility issues with models that haven’t been trained on TOON examples, and the need for application-specific testing to verify accuracy and token savings. Function calling, parsing, and other use cases requiring JSON outputs should continue using JSON rather than attempting TOON conversions.
For cost-conscious LLM applications processing large structured datasets, TOON represents a practical optimization that could reduce token costs by 40% without requiring changes to model architecture or training. The token savings become more significant at scale, particularly for applications making frequent API calls with substantial context data.

09:16 Justin – “I’d almost argue that TOON is more of what I would have wanted; very simple comma-separated values… so maybe LLMs will finally solve all my JSON complaints…but maybe not.”

10:40 Google 2025 recap: Research breakthroughs of the year

Google released Gemini 3 Pro in November 2025 and Gemini 3 Flash in December 2025, with Gemini 3 Pro topping the LMArena Leaderboard and achieving 23.4% on MathArena Apex benchmark.
Gemini 3 Flash delivers Pro-grade reasoning at Flash-level latency and cost, continuing Google’s trend where each generation’s Flash model surpasses the previous generation’s Pro model in quality while being substantially cheaper and faster.
The company introduced several specialized AI models, including Nano Banana Pro for native image generation and editing, Veo 3.1 for video generation, and Imagen 4 for image creation.
Google also launched developer tools like Google Antigravity for AI-assisted software development and Jules, an asynchronous coding agent that acts as a collaborative partner for developers.
Google’s AlphaFold celebrated its 5th anniversary with over 3 million researchers across 190+ countries using the Nobel Prize-winning protein folding system, including 1 million users in low and middle-income countries.
New AI tools for genomics include AlphaGenome for genome understanding and DeepSomatic for identifying genetic variants in tumors, moving beyond sequencing to the interpretation of complex genomic data.
Google’s quantum computing work achieved recognition with Googler Michel Devoret receiving the 2025 Nobel Prize in Physics, while the Quantum Echoes algorithm demonstrated progress toward real-world quantum applications.
The company also introduced Ironwood, a new TPU designed for inference workloads using the AlphaChip design method, and launched WeatherNext 2, which generates weather forecasts 8x faster with up to 1-hour resolution covering flood predictions for 2 billion people across 150 countries.
Google formed the Agentic AI Foundation with other AI labs to establish open standards for agentic AI interoperability and announced Model Context Protocol support for Google services.
The company also partnered with the US Department of Energy’s 17 national laboratories on the Genesis project to transform scientific research and expand educational AI initiatives with school districts like Miami-Dade County.

11:52 Meta acquires intelligent agent firm Manus, capping a year of aggressive AI moves

Meta acquired Singapore-based AI agent firm Manus for over $2 billion, bringing on board a company that claims $125 million in revenue run rate just eight months after launching its general-purpose AI agent.
Manus will continue operating its subscription service while its team joins Meta to enhance automation across consumer products like Meta AI assistant and business tools.
Manus offers AI agents capable of executing complex tasks, including market research, coding, and data analysis, having processed over 147 trillion tokens and supported 80 million virtual computers to date.
The platform provides both free and paid subscription tiers and has already been tested by Microsoft in Windows 11 PCs for tasks like creating websites from local files.
The acquisition represents Meta’s continued strategy of acquiring specialized AI startups to accelerate its AI capabilities and Llama large language model development.
This follows Meta’s $14.3 billion investment in Scale AI in June and its acquisition of AI-wearables startup Limitless earlier this month, demonstrating an aggressive talent and technology acquisition approach.
Manus originated as a product of Chinese startup Butterfly Effect before relocating its headquarters from Beijing to Singapore in June, backed by investors including Tencent, HongShan Capital Group, and Benchmark, which led a $75 million Series B round. The company maintains strategic partnerships with Chinese tech firms, including Alibaba’s Qwen AI team, despite its geographic shift.

13:04 Ryan – “You know, the upside, if they’ve just been around for 8 months, they don’t have the terrible tech debt that all these other firms have…they have 8 months of it.”

AWS

15:59 Security Hub CSPM automation rule migration to Security Hub | AWS Security Blog

AWS has split Security Hub into two services: the new Security Hub with enhanced capabilities using the Open Cybersecurity Schema Framework (OCSF), and Security Hub CSPM, which continues as a separate service focused on cloud security posture management.
The schema change from AWS Security Finding Format (ASFF) to OCSF means existing automation rules need migration to work with the new service.
AWS released an open-source Python migration tool on GitHub that automatically discovers Security Hub CSPM automation rules, transforms them to OCSF schema, and generates CloudFormation templates for deployment.
The tool handles Regional differences intelligently, supporting both home Region deployments where rules apply across linked Regions and Region-by-Region deployments for unlinked Regions.
Not all automation rules can be fully migrated due to schema differences between ASFF and OCSF. The tool generates a migration report identifying rules that cannot be migrated or are only partially migrated, and creates all new rules in a disabled state by default so administrators can validate them before enabling.
The migration tool preserves the original order of automation rules, which matters when multiple rules operate on the same findings or fields.
For organizations using a delegated administrator account with AWS Organizations, rules must be created in that account’s home Region, and the tool is designed to work with this model while also supporting single-account deployments.
This migration capability is included in the Security Hub essentials plan at no additional cost beyond standard Security Hub pricing.
Organizations should review the ASFF to OCSF field mapping tables in the documentation before migration, as some criteria fields, like ComplianceAssociatedStandardsId and ProductName have no OCSF equivalents and require manual rule redesign.

18:21 Matt – “The problem I always have with CPAMs – and this is a larger rant or conversation we can have – is there’s no interoperability. So if you have a CPAM and you want to then set up a GRC tool, or your other security tool can also run it, there’s no interoperability. So you then have to acknowledge things in three different spots, and there’s no single source of truth.

20:08 Proactive Amazon EKS monitoring with Amazon CloudWatch Operator and AWS Control Plane metrics | Containers

EKS clusters running version 1.28 and above now automatically send control plane metrics to CloudWatch at no extra cost, covering API server health, scheduler performance, and etcd database status.
The new CloudWatch Observability Operator add-on extends this with Container Insights and Application Signals for deeper visibility into workloads and applications without code changes.
The enhanced monitoring addresses common operational challenges like detecting pod scheduling bottlenecks through metrics such as scheduler_pending_pods and scheduler_schedule_attempts_UNSCHEDULABLE, which help identify under-resourced worker nodes. API server throttling issues become visible through apiserver_request_total_429 metrics, showing when the default 600 in-flight request limit is approached.
Critical infrastructure components like admission webhooks, which power AWS Load Balancer Controller and IRSA functionality, can now be monitored for failures and latency issues. The apiserver_admission_webhook_rejection_count metric helps catch silent webhook failures that could prevent deployments, with CloudWatch Log Insights providing correlated log data for troubleshooting.
The etcd database monitoring is particularly important since EKS has an 8 GB recommended limit, and exceeding it makes clusters read-only. CloudWatch alarms can trigger at 80 percent capacity (6.4 GB) using the apiserver_storage_size_bytes metric, giving teams time to clean up unnecessary resources before hitting the limit.
Application Signals provides automatic instrumentation for Java applications with pre-built dashboards tracking traffic, latency, and availability at a 5 percent sampling rate.
The feature integrates with CloudWatch anomaly detection using machine learning to identify unusual patterns in metrics like node_cpu_utilization without manual threshold configuration.

21:15 Ryan – “I like this, except for the fact that it’s an operator…I don’t understand why this isn’t just configuration options in your cluster.”

21:59 Amazon ECS Managed Instances now supports Amazon EC2 Spot Instances

ECS Managed Instances now supports EC2 Spot capacity, allowing customers to run fault-tolerant containerized workloads at up to 90% discount compared to On-Demand pricing while AWS handles all infrastructure management.
You configure a new capacityOptionType parameter as spot or on-demand in your capacity provider settings.
This extends ECS Managed Instances beyond its existing capabilities of automatic provisioning, dynamic scaling, and cost-optimized task placement. AWS still handles the infrastructure operations through AWS-controlled access in your account, but now you can choose between spot and on-demand capacity types alongside existing options for GPU, network-optimized, and burstable instance families.
The feature is available in all AWS Regions where ECS Managed Instances currently operate. Pricing includes both the spot EC2 instance costs and an additional management fee for the compute provisioning service, though specific management costs are not disclosed in the announcement.
This targets customers running stateless or fault-tolerant containerized applications like batch processing, CI/CD pipelines, or web services that can handle interruptions.
The combination of managed infrastructure and spot pricing addresses a common challenge where teams want cost savings from spot instances but lack resources to manage the complexity of spot interruptions and capacity management.

24:07 Enhance Amazon EKS network security posture with DNS and admin network policies | Containers

Amazon EKS now supports DNS-based and Admin network policies, allowing teams to control pod traffic using stable domain names instead of constantly changing IP addresses.
This eliminates the operational overhead of maintaining IP allowlists for AWS services, on-premises systems, and third-party APIs while providing centralized policy management across multiple namespaces.
Admin network policies operate in two tiers with hierarchical enforcement that cannot be overridden by namespace-level policies, enabling platform teams to enforce mandatory security controls like blocking access to EC2 Instance Metadata Service at 169.254.169.254.
The policies use label-based segmentation to apply security standards across multiple namespaces simultaneously, reducing the need for per-namespace policy management.
DNS-based policies are available in EKS Auto mode clusters version 1.29 and later, while Admin policies work in both EKS Auto mode and EC2-based clusters running VPC CNI version 1.21.1 or later.
The feature removes the need for third-party network policy tools and integrates with existing Kubernetes NetworkPolicy resources for defense-in-depth security.
The policy evaluation order follows a strict hierarchy: Admin tier Deny rules take precedence over everything, followed by Admin Allow rules, then namespace-scoped policies, and finally Baseline tier policies.
This ensures security teams can enforce organization-wide controls while still allowing application teams flexibility for namespace-specific requirements.
Real-world applications include multi-tenant environments where different applications need controlled access to specific AWS services like S3 or DynamoDB using patterns like asterisk.s3.amazonaws.com, and hybrid cloud scenarios where workloads access on-premises databases through stable DNS names that remain valid even as underlying infrastructure changes.

24:17 Justin – “Thank you, Jesus.”

27:46 Ryan – “If you are a traditional engineer listening to our show, this is an example of something where you can take your skillset and add a ton of value.”

28:00 AWS raises GPU prices 15% on a Saturday • The Register

AWS increased prices for EC2 Capacity Blocks for ML by approximately 15 percent over the weekend, with p5e.48xlarge instances jumping from $34.61 to $39.80 per hour in most regions.
This marks a departure from AWS’s two-decade pattern of price reductions and represents one of the first straight increases to a line item not tied to regulatory requirements.
Capacity Blocks allow customers to reserve guaranteed GPU capacity for ML training jobs from one day to several weeks in advance with locked-in rates paid upfront.
AWS attributes the increase to supply and demand patterns for this quarter, reflecting the global GPU shortage driven by increased AI workload demand across the industry.
The price increase creates complications for customers with Enterprise Discount Programs, as their percentage discounts remain the same, but absolute costs rise by 15 percent.
This gives competitors like Azure and GCP a direct talking point for enterprise sales conversations, though whether they can absorb the demand remains uncertain given industry-wide GPU constraints.
The change establishes a precedent that could extend to other resource-constrained services, particularly RAM-intensive offerings that touch nearly every AWS service.
The timing and execution on a Saturday with minimal announcement suggest AWS is testing customer response to price increases after conditioning the market to expect only decreases.
This affects primarily enterprise customers running serious ML workloads with budgets in the millions, as Capacity Block pricing targets teams that cannot afford training run interruptions.
The broader concern is whether this signals a shift in AWS’s pricing strategy across other services where supply constraints or cost increases exist.

29:31 Matt – “I don’t think it’s a broader concern; but I think it’s the first real time you’re seeing a dramatic increase, and it’s been a fear for many companies for many years…what if they raise the prices and there’s nothing we can do because we’re already there? And they’re doing it, and there’s not much you CAN do.”

30:51 EC2 Capacity Manager now includes Spot interruption metrics

EC2 Capacity Manager adds three new Spot interruption metrics at no additional cost across all commercial AWS regions.
The metrics track total Spot instance count, interruption counts, and interruption rates across regions, availability zones, and accounts to help optimize Spot placement strategies.
The new visibility helps customers make data-driven decisions about Spot instance diversification by identifying patterns in interruptions.
Organizations can use this data to determine which availability zones or instance types experience fewer interruptions and adjust their Spot strategies accordingly.
This enhancement integrates with existing Spot placement score functionality to provide a complete picture of Spot capacity management.
Customers can now correlate predicted availability scores with actual interruption data to validate and refine their capacity planning decisions.
The metrics are particularly valuable for organizations running large-scale Spot fleets where even small improvements in interruption rates translate to meaningful cost savings.
By tracking interruption rates over time, teams can measure the effectiveness of their diversification strategies and identify opportunities to expand into more stable capacity pools.

31:11 Justin – “Or…you could just make this a service I could subscribe to.”

GCP

32:50 Looker self-service Explores, tabbed dashboards, custom themes | Google Cloud Blog

Y’all can thank Ryan if you’re not into this particular story. Hit him up on Slack and let him know your thoughts.
Looker now allows users to upload CSV and spreadsheet files directly into the platform through a drag-and-drop interface in the new self-service Explores feature, currently in Public Preview.
This bridges the gap between governed data models and ad-hoc analysis by letting users combine local files with existing Looker data while maintaining administrator oversight on uploads and permissions.
The new tabbed dashboard feature helps organize complex dashboards into logical sections with automatic filter propagation across tabs, reducing visual clutter by showing only relevant filters per view.
Users can share specific tab URLs and export entire multi-tab dashboards as single PDF documents, making it easier to present cohesive data narratives.
Internal dashboard theming is now available in Public Preview, enabling organizations to customize tile styles, colors, fonts, and formatting to match corporate branding within the Looker application.
Administrators can create reusable theme templates and set default themes across entire instances to ensure consistency.
A new content certification flow helps distinguish between ad-hoc experiments and vetted data sources, addressing governance concerns when users upload their own datasets.
This feature works alongside administrator controls to maintain data quality standards while enabling self-service capabilities.
These features are available starting with Looker version 25.20 and can be enabled through the Admin Labs page, with no specific pricing changes announced as they appear to be included in existing Looker subscriptions.

34:06 Ryan – “For everyone that has to supply you with pretty graphs and pictures, this is very important. It is very difficult to sort of modify and work with existing data sets in any BI tool, and so this is another knob that you can put. And I could use something like this for just uploading a very easy CSV of like product names or usernames or something that’s just a list, versus having to parse that out of a very large data set, which may have a combination of structured and unstructured data or just bad schema adherence. And so this is sort of a nice tool for being able to create those types of things.”

35:28 Optimizing AlloyDB AI text-to-SQL accuracy | Google Cloud Blog

AlloyDB AI’s natural language API, currently in preview, enables developers to build agentic applications that translate natural language questions into SQL queries with near-100% accuracy.
The system uses descriptive context like table descriptions, prescriptive context including SQL templates and facets for complex conditions, and a value index to disambiguate database-specific terms that foundation models wouldn’t recognize.
The API addresses a critical business need where 80-90% accuracy isn’t sufficient, particularly in industries like real estate search and retail, where poor query interpretation directly impacts conversions and revenue.
Users can iteratively improve accuracy through a hill-climbing approach, starting with out-of-the-box capabilities and progressively adding context to handle nuanced questions like “homes near good schools” that require specific business logic for terms like “near” and “good.”
The system provides explainability features that show users what the API understood their question to mean, allowing agents and end users to verify the interpretation even when accuracy isn’t perfect.
This transparency helps mitigate the impact of occasional misinterpretations while the system approaches 100% accuracy for specific use cases.
Integration options include MCP Toolbox for Databases for developers writing AI tools or Gemini Enterprise for no-code agentic programming, allowing conversational applications that combine web knowledge with database queries. The technology works across structured, unstructured, and multimodal data using AlloyDB’s vector search, text search, and AI operators like AI.IF for semantic conditions.
Google plans to expand this natural language capability beyond AlloyDB to a broader set of Google Cloud databases, though specific timelines and pricing details for the preview or general availability weren’t disclosed in the: announcement.

36:43 Justin – “Natural language query – I am here for it.”

37:56 New Enhanced Tool Governance in Vertex AI Agent Builder | Google Cloud Blog

Google introduces enhanced tool governance for Vertex AI Agent Builder through Cloud API Registry integration, allowing administrators to centrally manage and curate approved tools across their organization while developers access them via a new ApiRegistry object in the Agent Development Kit.
This addresses the duplicate work problem where developers previously built tools separately for each agent and gives enterprises better control over what data and APIs their AI agents can access.
The Agent Development Kit now supports Gemini 3 Pro and Flash models with full TypeScript compatibility, plus improved state management features including automatic recovery from failures, human-in-the-loop pause and resume capabilities, and conversation rewind functionality.
The new Interactions API integration provides consistent multimodal input/output handling across agents, while A2UI enables agents to pass UI components directly to applications without the security risks of executable code.
Agent Engine sessions and memory bank reach general availability, powered by Google Cloud AI Research’s topic-based approach for managing both short-term and long-term agent memory across interactions.
The service expands to seven additional regions globally, with runtime pricing reduced and billing for additional Agent Engine services beginning January 28, 2026 (specific pricing details available in documentation).
Customer implementations show practical benefits: Burns & McDonnell uses Agent Builder to transform project data into real-time intelligence, Payhawk reduced expense submission time by over 50 percent through Memory Bank’s context retention, and Gurunavi projects a 30 percent improvement in user experience for their restaurant discovery app by remembering user preferences and patterns.
The platform now includes Vertex AI Agent Garden with one-click deployment of curated agent samples and an Agent Starter Pack providing production-ready templates for building, testing, and deploying agents.
Apigee integration allows organizations to transform existing managed APIs into custom MCP servers, bringing multi-cloud tools into a centralized catalog through Cloud API Registry.

38:47 Ryan – “This just goes to show how early we are in this ecosystem. Companies are just starting to sort of get wise that they’ve got a whole bunch of developers using these platforms, and they’re all kind of doing their own things and separate little silos and there’s very little ability to share or get any kind of optimization with those central resources… I do think that this is a good thing.”

39:55 Introducing VM Extensions Manager | Google Cloud Blog

Google launches VM Extensions Manager in preview to centralize and automate the installation and lifecycle management of OS agents across Compute Engine fleets.
The service eliminates manual scripting and startup script dependencies by providing policy-driven control that can reduce operational overhead from months to hours, according to Google.
The preview supports three critical extensions at launch: Cloud Ops Agent for telemetry collection, Agent for SAP for monitoring SAP workloads, and Agent for Compute Workloads for workload evaluation.
Administrators can pin specific extension versions or let the system automatically deploy the latest releases, with more extensions planned for future support.
VM Extensions Manager offers two rollout speeds for global policies: SLOW mode executes zone-by-zone deployments over 5 days by default to minimize risk, while FAST mode enables immediate fleet-wide updates for urgent security patches.
Zonal policies at the project level are available now, with global policies and organization or folder-level policies coming in the following months.
The service integrates directly into the existing compute.googleapis.com API without requiring new API enablement or discovery, allowing administrators to start creating policies immediately through the Cloud Console or gcloud CLI. Documentation is available here.

42:18 Matt – “I like that they released both of those day one – both slow and fast mode.”

43:23 Cloud SQL for MySQL introduces optimized writes | Google Cloud Blog

Cloud SQL for MySQL Enterprise Plus edition now includes optimized writes, a feature that automatically tunes five different MySQL parameters and configurations based on real-time workload metrics to improve write performance.
The feature is enabled by default on all Enterprise Plus instances and requires no manual intervention or configuration changes.
Google reports up to 3x better write throughput compared to the standard Enterprise edition, with reduced latency, particularly beneficial for write-intensive OLTP workloads.
Performance gains vary based on machine configuration, and the feature complements the existing SSD-backed data cache that provides up to 3x higher read throughput.
The optimized writes feature works by automatically adjusting MySQL flags, data handling, and parameters in response to instance and workload characteristics.
Customers can benchmark the improvements using sysbench by comparingthe Enterprise edition, the Enterprise Plus without optimized writes, and the Enterprise Plus with optimized writes enabled.
Existing Cloud SQL instances can upgrade to the Enterprise Plus edition in-place to access optimized writes, though specific pricing details for the Enterprise Plus tier are not provided in the announcement.
The feature targets organizations running write-heavy database workloads that previously required manual MySQL tuning and optimization efforts.

Azure

43:23 Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric – The Official Microsoft Blog

Microsoft acquires Osmos to bring agentic AI capabilities to Fabric for autonomous data engineering workflows. Osmos uses AI agents to automate data preparation tasks that typically consume most of data teams’ time, transforming raw data into analytics-ready assets in OneLake without manual intervention.
The acquisition addresses a common enterprise challenge where organizations have abundant data but lack efficient ways to make it actionable.
Osmos will integrate into Microsoft Fabric’s unified data platform, allowing AI agents to handle data connection, preparation, and transformation tasks that currently require significant manual effort and technical expertise.
The Osmos team joins Microsoft’s Fabric engineering organization to advance autonomous data operations within the existing Fabric ecosystem.
This builds on Fabric’s existing capabilities around OneLake, Power BI, and unified data analytics by adding intelligent automation for data engineering workflows.
No pricing details or availability timeline were announced, though Microsoft indicates integration updates will be shared through the Microsoft Fabric Blog. The acquisition targets organizations spending excessive resources on data preparation rather than analysis, particularly those already invested in the Fabric ecosystem.

45:32 Ryan – “As long as they deliver on the promise. There’s been solutions that make the same promise – not with AI… and it just never works the way it should. Drives me nuts that it’s so failure prone. As long as the AI and Agentic add to these things, that’s fantastic.”

46:38 Microsoft’s strategic AI datacenter planning enables seamless, large-scale NVIDIA Rubin deployments | Microsoft Azure Blog

Azure is deploying NVIDIA’s next-generation Rubin platform at scale, with infrastructure already designed to handle its power, cooling, and networking requirements.
Microsoft’s Fairwater datacenters in Wisconsin and Atlanta can accommodate Rubin’s 50 petaflops per chip and 3.6 exaflops per rack without retrofitting, representing a five-times performance jump over GB200 systems.
The deployment leverages Azure’s systems approach where compute, networking, storage, and infrastructure work as an integrated platform.
Key technical enablements include support for sixth-generation NVLink with 260 TB/s bandwidth, ConnectX-9 1,600 Gb/s networking, HBM4 memory thermal management, and pod exchange architecture for rapid hardware servicing without extensive rewiring.
Azure’s track record includes operating the world’s largest commercial InfiniBand deployments and being first to deploy both GB200 and GB300 NVL72 platforms at scale.
The company’s multi-year collaboration with NVIDIA on co-design means Rubin integrates directly into existing infrastructure, enabling faster customer deployments compared to competitors who need infrastructure upgrades.
Microsoft’s regional superfactory approach differs from other hyperscalers’ single megasite strategy, allowing more predictable global rollout of new AI capabilities.
This modular design combined with Azure Boost offload engines, liquid cooling systems, and optimized orchestration through CycleCloud and AKS aims to maximize GPU utilization and deliver better performance per dollar at cluster scale.

Oracle

46:38 Oracle is Set to Power on New Data Center in Michigan

Oracle is building a new data center in Saline Township, Michigan specifically to serve OpenAI’s infrastructure needs, marking another major cloud capacity expansion for AI workloads.
The facility will use closed-loop non-evaporative cooling systems that consume water comparable to an average office building rather than millions of gallons daily like traditional evaporative systems.
The project includes a 17-year power agreement with DTE Energy where Oracle pays 100% of energy costs including new transmission lines and an onsite substation, with Michigan law prohibiting utilities from passing data center costs to existing ratepayers.
Oracle claims its large customer contribution to DTE’s fixed costs will reduce overall energy costs for other customers by approximately $300 million annually by 2029-2030.
The facility will create 2,500 union construction jobs and 450 permanent on-site positions plus an estimated 1,500 jobs across Washtenaw County, with construction scheduled to begin in Q1 2026. The project includes $8 million annually for local schools, $1.6 million yearly in direct tax revenue for Saline Township, and over $14 million in community benefits.
Oracle is developing only 250 of 575 acres with the remaining land protected as open space, farmland, wetlands and woodlands including 47.5 acres in conservation easement.
This represents Oracle’s 148th data center with 64 more under construction globally, though the company provides no specific pricing or service details for customers beyond OpenAI.

52:15 Ryan – “But are you trading the water concern for the high energy costs?”

Closing

336: We Were Right (Mostly), 2026: The New Prophecies

Tue, 13 Jan 2026 04:49:10 +0000

Welcome to episode 335 of The Cloud Pod, where the forecast is always cloudy! Welcome to the first show of 2026, and it’s a full house, too! Justin, Jonathan, Ryan, and Matt are all here to reflect on 2025, plus bring you their predictions for 2026.

Let’s get started!

Titles we almost went with this week

SQL Me Maybe: AlloyDB Gets Chatty With Your Database **OpenAI
SELECT * FROM natural_language WHERE accuracy LIKE ‘100%’ **Anthropic
etcd You Were Worried About Database Limits: CloudWatch Has Your Back
CSV You Later: Looker Adds Drag-and-Drop Data Uploads
AWS Spots an Opportunity to Manage Your Container Costs
EKS Network Policies: No More IP Address Whack-a-Mole
AWS Security Hub Splits: It’s Not You, It’s CSPM
Spot On: ECS Finally Manages Your Cheapest Compute
TOON Squad: DigitalOcean’s New Format Makes JSON Look Bloated
The Price is Wrong: AWS Breaks Two Decades of Downward Pricing Tradition
Show Your Work: Why AI-Generated Code Without Tests is Just Expensive Spam
No More Agent Orange: Google Simplifies VM Extension Deployment
AWS Discovers Prices Can Go Both Ways, Raises GPU Costs 15 Percent
Sovereignty Washing: When Your European Cloud Still Answers to Uncle Sam
Agent Builder Gets a Memory Upgrade: Google’s AI Finally Remembers Where It Put Its Keys
Ctrl+F for the Future: A year-end Scorecard & Next-Gen Bets
AI Agents, GPU Prices, and The best of the Cloud Pod 2025
Beyond the Hype: The Cloud Pods Definitive 2025 Year in Review
Apocalypse Now… What? Our 2026 Forecast

Follow Up

01:27 RYAN’S PREDICTIONS

Prediction Status Notes Quick LLM models for individuals ACCURATE Meta-Llama-3.1-8B-Instruct, GLM-4-9B-0414, and Qwen2.5-VL-7B-Instruct—each chosen for an outstanding balance of performance and computational efficiency, making them ideal for edge AI deployment. A new AI inference application called Inferencer allows even modest Apple Mac computers to run the largest open-source LLMs. AI at the edge natively (Lambda-esque) ACCURATE Akamai launched a new Inference Cloud product for edge AI using Nvidia’s Blackwell 6000 GPUs in 17 cities. AWS IoT Greengrass with Lambda functions for edge logic. “Edge AI allows for instant decision-making where it matters most—close to the data source.” Cloud native security mesh multi-cloud UNCLEAR Service mesh technologies continue to evolve (Istio, Linkerd), but I didn’t find a breakthrough “app-to-app at the edge” security mesh product announcement in 2025. This one needs more specific evidence.

Ryan Score: 2/3

02:25 MATTHEW’S PREDICTIONS

Prediction Status Notes FOCUS adopted by Snowflake or Databricks ACCURATE FOCUS version 1.2 was ratified on May 29, 2025. Three new providers announced support: Alibaba Cloud, Databricks, and Grafana. Databricks officially adopted FOCUS! AI security/ethical standard (SOC or ISO) ACCURATE ISO 42001 is the first international standard outlining requirements for AI governance. Major companies achieving certification in 2025: Automation Anywhere is among the first 100 companies worldwide to earn ISO/IEC 42001:2023 certification. Anthropic also achieved ISO 42001 certification. Amazon deprecates 5+ services (WorkMail bonus) ACCURATE (no bonus) 19 services are mothballed, four are being sunset, and one is end of its supported life. Deprecated services include CodeCommit, Cloud9, S3 Select, CloudSearch, SimpleDB, Forecast, Data Pipeline, QLDB, Snowball Edge, and more. WorkMail NOT deprecated – WorkDocs was (April 2025), but WorkMail remains active.

Matthew Score: 3/3

03:22 JONATHAN’S PREDICTIONS

Prediction Status Notes Company claims AGI achieved ACCURATE Integral AI, founded by ex-Google veteran Jad Tarifi, claims to have built a world-first AGI model (December 2025). Also, Sam Altman called GPT-5 “a significant step along the path to AGI” at release. AI agents booking reservations/real-world tasks FULLY ACCURATE OpenAI’s Operator can execute tasks like filling out forms, managing online reservations, and even booking tickets to sporting events. Google AI Mode’s agentic capabilities help take the hassle out of booking restaurant reservations, event tickets, or beauty and wellness appointments. Models that can learn in real-time PARTIALLY ACCURATE Extended context windows and memory systems have improved dramatically. Claude 4 has “memory capabilities, extracting and saving key facts to maintain continuity.” However, true real-time learning/weight updates during conversations haven’t fully materialized yet.

Jonathan Score: 2.5/3

05:07 JUSTIN’S PREDICTIONS

Prediction Status Notes GPT-5, Claude 4, and Gemini 3.0 FULLY ACCURATE GPT-5 (August 7, 2025), Claude 4 (May 22, 2025), Gemini 3 (November 18, 2025). All three major models have been released! Plus, we’ve already seen GPT-5.1, GPT-5.2, and Claude Opus 4.5. OpenAI is not seen as a leader ACCURATE ChatGPT’s user growth is slowing, and Google’s Gemini is gaining ground. Anthropic now holds 32% of the enterprise LLM market share by usage, with OpenAI at 25%—a sharp reversal from 50% vs. 12% in 2023. Sam Altman issued a “code red” memo following the release of Gemini 3. 10+ companies RTO 5 days after Q2 PARTIALLY ACCURATE Major announcements after Q2: Novo Nordisk, Paramount Skydance, NBCUniversal, Instagram, Starbucks, Samsung, Freddie Mac. Many 5-day mandates took effect in 2025 (Amazon, AT&T, JPMorgan, Dell), but several were announced pre-Q2. Close call.

Justin Score: 2.5/3

JONATHAN’S PREDICTIONS

Jonathan Score: 2.5/3

FINAL STANDINGS

Host Score Grade Matthew 3/3 A+ Justin 2.5/3 A Jonathan 2.5/3 A Ryan 2/3 B+

Key Takeaways for the Pod

The AI model predictions were NAILED – All three major model releases happened exactly as predicted.
OpenAI’s dominance really did slip – Anthropic now leads enterprise, Gemini is surging, Sam issued “code red.”
AI agents are HERE – OpenAI Operator and Google AI Mode are booking real reservations.
AWS deprecation wave was massive – Way more than 5 services axed (but WorkMail survived!)
Edge AI exploded – Akamai, AWS, and others went all-in on inference at the edge.e

Solid predictions all around – Matthew takes the crown!

06:08 Jonathan – “That’s good; it only took us 6 years to know what the hell we’re talking about!”

06:23 2025 Stats Review

We covered 1,308 stories from 15 different, unique sources.
Amazon accounted for 39% of those stories.
Ryan’s favorite, Azure, made up 22.9% of the stories (Thanks, Matt…)
GCP was 38.1% of our news announcements.
The official blogs from cloud providers, including AWS, Azure, and GCP, made up the bulk of the sources for the above stories.
This is an interesting change from the first year we recorded, 2019, when AWS accounted for 73% of the announcements.
When it comes to host participation, only 6 shows had all four hosts participating. Justin was present for 95%, Ryan for 85%, Matt recorded 78% (not bad with a new baby, honestly), and we had Jonathan for 12 episodes.
We only had one guest, and increasing the number of guests is one of our 2026 resolutions, so thanks to Elise for joining us.
AI was mentioned 526 times, averaging 12.2x per episode (which seems low to the show note editor), and has definitely been growing each year exponentially.
Outages were discussed 19 times (boooo).
And we got to talk about our favorite topic, deep-sea cables, 5 times.
There were 58.9 hours of runtime over the course of 49 shows, with an average length of 72 minutes.
The in memorium includes AWS Cloud Search, Glacier, Migration Hub, S3 Object Lambda, Azure Consumption API, dial-up internet, and RC4 encryption, among many others. RIP.
The most mentioned non-hyperscaler company was OpenAI, followed closely by Nvidia and Antropic.
Lastly, Justin has updated our show LLM Bolt, building a brand new data pipeline for the podcast, which will include show notes, transcripts, etc., all with a new AI-based search. Want to check it out? Join our Slack channel!

16:28 Ryan – “I’m having a similar experience mostly in my day job… trying to use AI for different workloads and then falling back into more traditional technologies or different ways, and at first I thought it was just like old dog, new tricks, just falling back in the comfort zone. But I find more and more I’m identifying things that, you know, the large language models just are not good at. And I think a lot of stats and the metrics, it feels like it should be able to do that, right? Because it’s conversational and you’re building a corpus of data for the model to query and do all that, but that it really can’t, right? And so, fortunately, we do have machine learning technologies and the ability to do notebooks and stuff. And agentic can absolutely help you make the notebook, but it can’t do the analysis for you, which I find funny.”

To be a good vibe coder, you need to be an experienced programmer, you need to have business experience, and I don’t think the people who are vibe coding right now are getting really good results if they don’t have that kind of background.”

https://tcp-media.s3.us-west-2.amazonaws.com/2025_year_in_review.html

25:54 Favorite Announcements

- Justin:
  - Amazon saying F*** your security to Microsoft was great.
    - Episode 287: Recorded for the week of Jan 8th, 2025: The Cloud Pod rebrands to The Cloud AI so we can get a 1B valuation.
    - https://www.csoonline.com/article/3625205/amazon-refuses-microsoft-365-deployment-because-of-lax-cybersecurity.html
  - Episode 303 – Someday You Will Find Me, Caught beneath the AI Landslide, in a Champagne Premier Nova in the Sky, from May 18th.
    - https://aws.amazon.com/blogs/aws/amazon-nova-premier-our-most-capable-model-for-complex-tasks-and-teacher-for-model-distillation/
  - Episode 288: Recorded for the week of Jan 14th, 2025: You might be able to retrain Notebook LM hosts to be less annoyed, but not your cloud pod hosts
    - https://www.theverge.com/2025/1/6/24337530/nvidia-ces-digits-super-computer-ai
  - Episode 322: Recorded for September 16th, 2025: Did OpenAI and Microsoft break up?… It’s complicated
    - https://www.anthropic.com/news/claude-4
- Matt:
  - Chime is dead: Update on Support for Amazon Chime
    - episode 294: “Ding: Chime is Dead”** (recorded for the week of February 25th, 2025).
  - GitHub Will Prioritize Migrating to Azure Over Feature Development – The New Stack
    - Episode 317** (“I Got 99 Problems, But a Hallucination Ain’t One”).
    - https://thenewstack.io/github-will-prioritize-migrating-to-azure-over-feature-development/
  - Claude on Azure
    - **Episode 331** is where Claude’s big Azure announcement happened!
    - The episode title says it all: “Claude Gets a $30 Billion Azure Wardrobe and Two New Best Friends” (published November 18, 2025).
- Ryan:
  - A2A protocol

Jonathan:

- DeepSeek is stirring things up
- AWS Frontier Agents

47:35 2026 Predictions

Matt
- A Major GCP Outage will occur
- A step forward in quantum computing (A quantum leap into 2026)
- A new MicroHyperscaler will go into the market at the same level as Digital Ocean
Justin
- AI Layoff Regret
- AI Agent Security Breach (Agent that breaches an organization and exfiltrates data)
- AI-designed web instead of Eyeballs/Humans
Ryan
- Multi-Agent Orchestration will blow up in a big way. Major providers of more A2A integrations of workflows between services/clouds
- Infrastructure as Code will turn into Infrastructure as Intent.
- Full Stack Media Creation company with AI? With CMS and Providence tracking and watermarking. Tooling/etc.
Jonathan
- Highly Visible company bankruptcy due to rising AI/GPU/Inference Costs.
- Explosion of Competition against existing SaaS companies
- An entirely AI-generated Podcast episode from the cloud pod

56:11 Ryan – “Trying to think through emerging threats on technology that I barely understand – because it’s coming out so fast – it’s changing the way we work. You’re already starting to see AI in attacks where groups of people are using AI to put together pretty sophisticated attacks on companies. It’s a lot easier for natural language speakers to generate content for spearfishing; it’s a lot easier for malicious actors to have an AI agent to do a bunch of research on a company real quick, and this is where I think it will be weak.”

Closing

335: EKS Network Policies: Now With More Layers Than Your Security Team's Org Chart

Wed, 24 Dec 2025 16:43:16 +0000

Welcome to episode 335 of The Cloud Pod, where the forecast is always cloudy! This pre-Christmas week, Ryan and Justin have hit the studio to bring you the final show of 2025. We’ve got lots of AI images, EKS Network Policies, Gemini 3, and even some Disney drama.

Let’s get into it!

Titles we almost went with this week

From Roomba to Tomb-ba: How the Robot Vacuum Pioneer Got Cleaned Out **OpenAI
From Napkin Sketch to Production: Google’s App Design Center Goes GA
Terraform Gets a Canvas: Google Paints Infrastructure Design with AI
Mickey Mouse Takes Off the Gloves: Disney vs Google AI Showdown
From Data Silos to Data Solos: Google Conducts the Integration Orchestra
No More Thread Dread: AWS Brings AI to JVM Performance Troubleshooting
MCP: More Corporate Plumbing Than You Think
GPT-5.2 Beats Humans at Work Tasks, Still Can’t Get You Out of Monday Meetings
Kerberos More Like Kerbero-Less: Microsoft Axes Ancient Encryption Standard
OpenAI Teaches GPT-5.2 to PowerPoint: Death by Bullet Points Now AI-Generated
MCP: Like USB-C, But Everyone’s Keeping Theirs in the Drawer
Flash Gordon: Google’s Gemini 3 Gets a Speed Boost Without the Sacrifice
Tag, You’re It: AWS Finally Knows Who to Bill
Snowflake Gets a GPT-5.2 Upgrade: Now With More Intelligence Per Query
OpenAI and Snowflake: Making Data Warehouses Smarter Than Your Average Analyst
GPT-5.2 Moves Into the Snowflake: No Melting Required

AI Is Going Great, or How ML Makes Money

01:06 Meta’s multibillion-dollar AI strategy overhaul creates culture clash:

Meta is developing Avocado, a new frontier AI model codenamed to succeed Llama, now expected to launch in Q1 2026 after internal delays related to training performance testing.
The model may be proprietary rather than open source, marking a significant shift from Meta’s previous strategy of freely distributing Llama’s weights and architecture to developers. We feel like this is an interesting choice for Meta, but what do we know?
Meta spent 14.3 billion dollars in June 2025 to hire Scale AI founder Alexandr Wang as Chief AI Officer and acquire a stake in Scale, while raising 2026 capital expenditure guidance to 70-72 billion dollars.
- Wang now leads the elite TBD Lab developing Avocado, operating separately from traditional Meta teams and not using the company’s internal workplace network.
The company has restructured its AI leadership following the poor reception of Llama 4 in April, with Chief Product Officer Chris Cox no longer overseeing the GenAI unit.
Meta cut 600 jobs in Meta Superintelligence Labs in October, contributing to the departure of Chief AI Scientist Yann LeCun to launch a startup, while implementing 70-hour workweeks across AI organizations.
Meta’s new AI leadership under Wang and former GitHub CEO Nat Friedman has introduced a “demo, don’t memo” development approach, replacing traditional multi-step approval processes with rapid prototyping using AI agents and newer tools.
The company is also leveraging third-party cloud services from CoreWeave and Oracle while building the 27 billion dollar Hyperion data center in Louisiana.
Meta’s Vibes AI video product, launched in September, trails OpenAI’s Sora 2 in downloads, and was criticized for lacking features like realistic lip-synced audio, while the company increasingly relies on external AI models from Black Forest Labs and Midjourney rather than exclusively using internal technology.

02:23 Ryan – “I guess I really don’t understand the business of the AI models. I guess if you’re going to offer a chat service, you have to have a proprietary model, but it’s kind of strange.”

03:04 Disney says Google AI infringes copyright “on a massive scale” – Ars Technica

Disney has issued a cease and desist letter to Google alleging copyright infringement through its generative AI models, claiming Google trained its systems on Disney’s copyrighted content without authorization and now enables users to generate Disney-owned characters like those from The Lion King, Deadpool, and Star Wars.
This represents one of the first major legal challenges from a content owner with substantial legal resources against a cloud AI provider.
The legal notice targets two specific violations: Google’s use of Disney’s copyrighted works in training data for its image and video generation models, and the distribution of Disney character reproductions to end users through AI-generated outputs.
- Disney demands the immediate cessation of using its content and the implementation of safeguards to prevent the future generation of Disney-owned intellectual property.
This case could establish important precedents for how cloud providers handle copyrighted training data and implement content filtering in AI services.
The outcome may force cloud AI platforms to develop more sophisticated copyright detection systems or negotiate licensing agreements with content owners before deploying generative models.
Disney’s involvement brings considerable legal firepower to the AI copyright debate, as the company has historically shaped US copyright law through decades of litigation to protect its intellectual property.
Cloud providers offering generative AI services may need to reassess their training data sources and output filtering mechanisms to avoid similar legal challenges from other major content owners.

04:06 Ryan – “Disney – suing for copyright infringement – shocking.”

04:54 Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora – Ars Technica

Disney invests $1 billion in OpenAI and licenses over 200 characters from Disney, Marvel, Pixar, and Star Wars franchises for use in Sora video generator.
This marks the first major Hollywood studio content licensing deal for OpenAI’s AI video platform, which launched in late September and faced industry criticism over copyright concerns.
The three-year licensing agreement allows Sora users to create short video clips featuring licensed Disney characters, representing a shift from OpenAI’s previous approach of training models on copyrighted material without permission.
This deal is notable given Disney’s history of aggressive copyright protection and lobbying that shaped modern US copyright law in the 1990s.
OpenAI has been pursuing content licensing deals with major IP holders after facing multiple lawsuits over unauthorized use of copyrighted training data.
The company previously argued that useful AI models cannot be created without copyrighted material, but has shifted strategy since becoming well-funded through investments.
The partnership aims to extend Disney’s storytelling reach through generative AI while addressing creator concerns about unauthorized use of intellectual property.
Disney CEO Robert Iger emphasized the company’s commitment to respecting and protecting creators’ works while leveraging AI technology for content creation.
This deal could establish a precedent for how AI companies and content owners structure licensing agreements, potentially influencing how other studios and IP holders approach AI-generated content partnerships.
The financial terms suggest significant value in controlled character licensing for AI applications.

06:26 Ryan – “Is it just a way to get out of the lawsuit so they can generate the content?”

07:12 The new ChatGPT Images is here | OpenAI

OpenAI released GPT Image 1.5, their new flagship image generation model, now available in ChatGPT for all users and via API.
The model generates images up to 4x faster than the previous version and includes a dedicated Images feature in the ChatGPT sidebar with preset filters and prompts for quick exploration.
The model delivers improved image editing capabilities with better preservation of original elements like lighting, composition, and people’s appearance across edits.
It handles precise modifications, including adding, subtracting, combining, and blending elements while maintaining consistency, making it suitable for practical photo edits and creative transformations.
GPT Image 1.5 shows improvements in text rendering with support for denser and smaller text, better handling of multiple small faces, and more natural-looking outputs.
The model follows instructions more reliably than the initial version, enabling more intricate compositions where relationships between elements are preserved as intended.
API pricing for GPT Image 1.5 is 20% cheaper than GPT Image 1 for both inputs and outputs, allowing developers to generate and iterate on more images within the same budget.
The model is particularly useful for marketing teams, ecommerce product catalogs, and brand work requiring consistent logo and visual preservation across multiple edits.
The new ChatGPT Images model works across all ChatGPT models without requiring manual selection, while the earlier version remains available as a custom GPT.
Business and Enterprise users will receive access to the new Images experience later, with the API version available now through OpenAI Playground.

07:38 Justin – “It’s very competitive against Nano Banana, and I was looking at some of the charts, and it’s already jumped to the top of the charts.”

08:52 Introducing GPT-5.2 | OpenAI

OpenAI has released GPT-5.2, now generally available in ChatGPT for paid users and via API as gpt-5.2, with three variants: Instant for everyday tasks, Thinking for complex work, and Pro for the highest-quality outputs.
The model introduces native spreadsheet and presentation generation capabilities, with ChatGPT Enterprise users reporting 40-60 minutes saved daily on average.
GPT-5.2 Thinking achieves a 70.9% win rate against human experts on GDPval benchmark spanning 44 occupations and sets new records on SWE-Bench Pro at 55.6% (80% on SWE-bench Verified).
The model demonstrates 11x faster output generation and less than 1% the cost of expert professionals on knowledge work tasks, though human oversight remains necessary.
Long-context performance reaches near 100% accuracy on the 4-needle MRCR variant up to 256k tokens, with a new Responses compact endpoint extending the effective context window for tool-heavy workflows. Vision capabilities show roughly 50% error reduction on chart reasoning and interface understanding compared to GPT-5.1.
API pricing is set at $1.75 per million input tokens and $14 per million output tokens, with a 90% discount on cached inputs.
OpenAI reports that despite higher per-token costs, GPT-5.2 achieves a lower total cost for given quality levels due to improved token efficiency.
The company has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1.
The model introduces improved safety features, including strengthened responses for mental health and self-harm scenarios, plus a gradual rollout of age prediction for content protections.
GPT-5.2 was built on NVIDIA H100, H200, and GB200-NVL72 GPUs in Microsoft Azure data centers, with a Codex-optimized version planned for the coming weeks.

10:06 Ryan – “I’m happy to see the improved safety features because that’s come up in the news recently and had some high-profile events happen, where it’s become a concern, for sure. So I want to see more protection in that space from all the providers.”

Cloud Tools

10:58 Cedar Joins CNCF as a Sandbox Project | AWS Open Source Blog

Cedar is an open source authorization policy language that just joined CNCF as a Sandbox project, solving the problem of hard-coded access control by letting developers define fine-grained permissions as policies separate from application code.
- It supports RBAC, ABAC, and ReBAC models with fast real-time evaluation.
The language stands out for its formal verification using the Lean theorem prover and differential random testing against its specification, providing mathematical guarantees for security-critical authorization logic. This rigor addresses the growing complexity of cloud-native authorization, where traditional ad-hoc systems fall short.
Production adoption is already strong with users including Cloudflare, MongoDB, AWS Bedrock, and Kubernetes integrations like kubernetes-cedar-authorizer.
The CNCF move provides vendor-neutral governance and broader community access beyond AWS stewardship.
Cedar offers an interactive policy playground and Rust SDK for developers to test authorization logic before deployment.
The analyzability features enable automated policy optimization and verification, reducing the risk of misconfigured permissions in production.
The CNCF acceptance fills a gap in the cloud-native landscape for a foundation-backed authorization standard, complementing existing projects and potentially becoming the go-to solution as it progresses from Sandbox to Incubation status.

12:05 Ryan – “I think this kind of policy is going to be absolutely key to managing permissions going forward.”

AWS

12:50 GuardDuty Extended Threat Detection uncovers a cryptomining campaign on Amazon EC2 and Amazon ECS | AWS Security Blog

GuardDuty Extended Threat Detection identified a coordinated cryptomining campaign starting November 2, 2025, where attackers used compromised IAM credentials to deploy miners across EC2 and ECS within 10 minutes of initial access.
The new AttackSequence: EC2/CompromisedInstanceGroup finding correlated signals across multiple data sources to detect the sophisticated attack pattern, demonstrating how Extended Threat Detection capabilities launched at re:Invent 2025 can identify coordinated campaigns.
The attackers employed a novel persistence technique using ModifyInstanceAttribute to disable API termination on all launched instances, forcing victims to manually re-enable termination before cleanup and disrupting automated remediation workflows.
They also created public Lambda endpoints without authentication and established backdoor IAM users with SES permissions, showing advancement in cryptomining persistence methodologies beyond typical mining operations.
The campaign targeted high-value GPU and ML instances (g4dn, g5, p3, p4d) through auto scaling groups configured to scale from 20 to 999 instances, with attackers first using DryRun flags to validate permissions without triggering costs. The malicious Docker Hub image yenik65958/secret accumulated over 100,000 pulls before takedown, and attackers created up to 50 ECS clusters per account with Fargate tasks configured for maximum CPU allocation of 16,384 units.
AWS recommends enabling GuardDuty Runtime Monitoring alongside the foundational protection plan for comprehensive coverage, as Runtime Monitoring provides host-level signals critical for Extended Threat Detection correlation and detects crypto mining execution through Impact:Runtime/CryptoMinerExecuted findings.
Organizations should implement SCPs to deny Lambda URL creation with an AuthType of NONE and monitor CloudTrail for unusual DryRun API patterns as early warning indicators.
The attack demonstrates the importance of temporary credentials over long-term access keys, MFA enforcement, and least privilege IAM policies, as the compromise exploited valid credentials rather than AWS service vulnerabilities. GuardDuty’s multilayered detection using threat intelligence, anomaly detection, and Extended Threat Detection successfully identified all attack stages from initial access through persistence.

55:31 Justin – “Hackers have the same tools we do for development.”

16:17 Amazon EKS introduces enhanced network policy capabilities | Containers

Amazon EKS now supports Admin Network Policies and Application Network Policies, giving cluster administrators centralized control over network security across all namespaces while allowing namespace administrators to filter outbound traffic using domain names instead of maintaining IP address lists.
This addresses a key limitation of standard Kubernetes Network Policies, which only work within individual namespaces and lack explicit deny rules or policy hierarchies.
The new Admin Network Policies operate in two tiers: Admin Tier rules that cannot be overridden by developers, and Baseline Tier rules that provide default connectivity but can be overridden by standard Network Policies.
This enables platform teams to enforce cluster-wide security requirements like isolating sensitive workloads or ensuring monitoring access while still giving application teams flexibility within those boundaries.
Application Network Policies, exclusive to EKS Auto Mode clusters, add Layer 7 FQDN-based filtering to traditional Layer 3/4 network policies, solving the problem of managing egress to external services with frequently changing IP addresses. Instead of maintaining IP lists for SaaS providers or on-premises resources behind load balancers, teams can simply whitelist domain names like internal-api.company.com, and policies remain valid even when underlying IPs change.
Requirements include Kubernetes 1.29 or later, Amazon VPC CNI plugin v1.21.0 for standard EKS clusters, and EKS Auto Mode for Application Network Policies with DNS filtering.
The feature is available now for new clusters, with support for existing clusters coming in the following weeks, though pricing remains unchanged, as this is a native capability of the VPC CNI plugin.

17:30 Ryan – “This is one of those things that’s showing a maturity level of container-driven applications. It’s been a while since security teams have been aware of some of the things you can do with network policies and routing, and so you want to empower your developers, but also being able to have a comprehensive way to ban and approve has been missing from a lot of these ingress controllers. So this is a great thing for security teams, and probably terrible for developers.”

19:12 Automate java performance troubleshooting with AI-Powered thread dump analysis on Amazon ECS and EKS | Containers

AWS has released an automated Java thread dump analysis solution that combines Prometheus monitoring, Grafana alerting, Lambda orchestration, and Amazon Bedrock AI to diagnose JVM performance issues in seconds rather than hours.
The system works across both ECS and EKS environments, automatically detecting high thread counts and generating actionable insights without requiring deep JVM expertise from operations teams.
The solution uses Spring Boot Actuator endpoints for ECS deployments and Kubernetes API commands for EKS to capture thread dumps when Grafana alerts trigger.
Amazon Bedrock then analyzes the dumps to identify deadlocks, performance bottlenecks, and thread states while providing structured recommendations across six key areas, including executive summary and optimization guidance.
Deployment is handled through CloudFormation templates available in the Java on AWS Immersion Day Workshop, with all thread dumps and AI analysis reports automatically stored in S3 for historical trending.
The architecture follows event-driven principles with modular components that can be extended to other diagnostic tools like heap dump analysis or automated remediation workflows.
The system enriches JVM metrics with contextual tags, including cluster identification and container metadata, enabling the Lambda function to determine the appropriate thread dump collection method. This metadata-driven approach allows a single solution to handle heterogeneous container environments without manual configuration for each deployment type.
Pricing follows standard AWS service costs for Lambda invocations, Bedrock LLM usage per token, S3 storage, and CloudWatch metrics, with no additional licensing fees for the open source monitoring components.
The solution addresses the common problem where only a handful of engineers on most teams can interpret thread dumps, democratizing JVM troubleshooting across operations teams.

20:55 Justin – “This tells me that if you have a bad container that crashes a lot, you could spend a lot of money on LLM usage for tokens analyzing your exact same crash dump every time. Do keep that in mind.”

22:50 EC2 Auto Scaling now offers a synchronous API to launch instances inside an Auto Scaling group

EC2 Auto Scaling introduces a new LaunchInstances API that provides synchronous feedback when launching instances, allowing customers to immediately know if capacity is available in their specified Availability Zone or subnet.
This addresses scenarios where customers need precise control over instance placement and real-time confirmation of scaling operations rather than waiting for asynchronous results.
The API enables customers to override default Auto Scaling group configurations by specifying exact Availability Zones and subnets for new instances, while still maintaining the benefits of automated fleet management like health checks and scaling policies. Optional asynchronous retries are included to help reach the desired capacity if initial synchronous attempts fail.
This feature is particularly useful for workloads that require strict placement requirements or need to implement fallback strategies quickly when capacity constraints occur in specific zones. Customers can now build more sophisticated scaling logic that responds immediately to capacity availability rather than discovering issues after the fact.
Available immediately in all AWS Regions and GovCloud at no additional cost beyond standard EC2 and EBS charges. Customers can access the feature through AWS CLI and SDKs, with documentation available at https://docs.aws.amazon.com/autoscaling/ec2/userguide/launch-instances-synchronously.

23:47 Ryan – “I find that the things that it’s allowing you to tune – it’s the things that I moved to autoscaling for; I don’t want to deal with any of this nonsense. And so you still have to maintain your own orchestration, which understands which zone that you need to roll out to, because it’s going to have to call that API.”

24:28 Announcing cost allocation using users’ attributes

AWS now enables cost allocation based on workforce user attributes like cost center, division, and department imported from IAM Identity Center.
This allows organizations to automatically tag per-user subscription and on-demand fees for services like Amazon Q Business, Q Developer, and QuickSight with organizational metadata for chargeback purposes.
The feature addresses a common FinOps challenge where companies struggle to attribute SaaS-style AWS application costs back to specific business units. Once user attributes are imported to IAM Identity Center and enabled as cost allocation tags in the Billing Console, usage automatically flows to Cost Explorer and CUR 2.0 with the appropriate organizational tags attached.
This capability is particularly relevant for enterprises deploying Amazon Q Business or QuickSight at scale, where individual user subscriptions can quickly add up across departments. Instead of manually tracking which users belong to which cost centers, the system automatically associates costs based on existing identity data.
The feature is generally available in all commercial AWS regions except GovCloud and China regions.
No additional pricing is mentioned beyond the standard costs of the underlying AWS applications being tracked.

25:26 Justin – “There’s lots of use cases; this gets interesting real quickly. It’s a really nice feature that I’m really happy about.”

GCP

26:34 Introducing Gemini 3 Flash: Benchmarks, global availability

Google launches Gemini 3 Flash in general availability, positioning it as a frontier intelligence model optimized for speed at reduced cost.
The model processes over 1 trillion tokens daily through Google’s API and replaces Gemini 2.5 Flash as the default model in the Gemini app globally at no cost to users.
Gemini 3 Flash achieves strong benchmark performance with 90.4% on GPQA Diamond and 81.2% on MMMU Pro while running 3x faster than Gemini 2.5 Pro and using 30% fewer tokens on average for typical tasks.
Pricing is set at $0.50 per million input tokens and $3 per million output tokens, with audio input at $1 per million tokens.
The model demonstrates strong coding capabilities with a 78% score on SWE-bench Verified, outperforming both the 2.5 series and Gemini 3 Pro. This makes it suitable for agentic workflows, production systems, and interactive applications requiring both speed and reasoning depth.
Gemini 3 Flash is available through multiple channels, including Google AI Studio, Vertex AI, Gemini Enterprise, Google Antigravity platform, Gemini CLI, and Android Studio.
The model is also rolling out as the default for AI Mode in Search globally, combining real-time information retrieval with multimodal reasoning capabilities.
Early enterprise adopters, including JetBrains, Bridgewater Associates, and Figma, are using the model for applications ranging from video analysis and data extraction to visual Q&A and in-game assistance.
The multimodal capabilities support real-time analysis of images, video, and audio content for actionable insights.

27:01 Justin – “This, just in general, is a pretty big improvement from not only the cost perspective, but also the overall performance, and the ability to run this on local devices, for like Android phones, is gonna be a huge breakthrough in LM performance on the device. So I suspect you’ll see a lot of Gemini 3 flash getting rolled out all over the place because it does a lot of things really darn well.”

28:16 Connect Google Antigravity IDE to Google’s Data Cloud services | Google Cloud Blog

Google has integrated Model Context Protocol servers into its new Antigravity IDE, allowing AI agents to directly connect to Google Cloud data services, including AlloyDB, BigQuery, Spanner, Cloud SQL, and Looker.
The MCP Toolbox for Databases provides pre-built connectors that eliminate manual configuration, letting developers access enterprise data through a UI-driven setup process within the IDE.
The integration enables AI agents to perform database administration tasks, generate SQL code, and run queries without switching between tools.
For AlloyDB and Cloud SQL, agents can explore schemas, develop queries, and optimize performance using tools like list_tables, execute_sql, and get_query_plan directly in the development environment.
BigQuery and Looker connections extend agent capabilities into analytics and business intelligence workflows.
Agents can forecast trends, search data catalogs, validate metric definitions against semantic models, and run ad-hoc queries to ensure application logic matches production reporting standards.
The MCP servers use IAM credentials or secure password storage to maintain security while giving agents access to production data sources. This approach positions Antigravity as a data-aware development environment where AI assistance is grounded in actual enterprise data rather than abstract reasoning alone.
The feature is available now through the Antigravity MCP Store with documentation at cloud.google.com/alloydb/docs and the open-source MCP Toolbox on GitHub at googleapis/genai-toolbox.
No specific pricing information was provided for the MCP integration itself, though standard data service costs for AlloyDB, BigQuery, and other connected services apply.

29:15 Announcing official MCP support for Google services | Google Cloud Blog

Google now offers fully-managed, remote Model Context Protocol (MCP) servers for its services, eliminating the need for developers to deploy and maintain individual local MCP servers.
This provides a unified, enterprise-ready endpoint for connecting AI agents to Google and Google Cloud services with built-in IAM, audit logging, and Model Armor security.
Initial MCP support launches for four key services: Google Maps Platform for location grounding, BigQuery for querying enterprise data in-place, Compute Engine for infrastructure management, and GKE for container operations. Additional services, including Cloud Run, Cloud Storage, AlloyDB, Spanner, and SecOps, will receive MCP support in the coming months.
Apigee integration allows enterprises to expose their own custom APIs and third-party APIs as discoverable tools for AI agents, extending MCP capabilities beyond Google services to the broader enterprise stack.
Organizations can use Cloud API Registry and Apigee API Hub to discover and govern available MCP tools across their environment.
The implementation enables agents to perform complex multi-step workflows like analyzing BigQuery sales data for revenue forecasting while simultaneously querying Google Maps for location intelligence, all through standardized MCP interfaces.
This approach keeps data in place rather than moving it into context windows, reducing security risks and latency.

30:34 MCP support for Apigee | Google Cloud Blog

Apigee now supports Model Context Protocol (MCP), allowing organizations to expose their existing APIs as tools for AI agents without writing code or managing MCP servers. Google handles the infrastructure, transcoding, and protocol management while Apigee applies its 30+ built-in policies for authentication, authorization, and security to govern agentic interactions.
The implementation automatically registers deployed MCP proxies in Apigee API hub as searchable MCP APIs, enabling centralized tool catalogs and granular access controls through API products.
Organizations can apply quota policies and identity controls to restrict which agents and clients can access specific MCP tools, with full visibility through Apigee Analytics and the new API Insights feature.
Integration with Google’s Agent Development Kit (ADK) provides streamlined access to Apigee MCP endpoints for developers building custom agents, with an ApigeeLLM wrapper available for routing LLM calls through Apigee proxies.
The feature works with multiple agent frameworks, including LangGraph, though ADK users get optimized tooling for the Google ecosystem, including Vertex AI Agent Engine and Gemini Enterprise deployment options.
Security capabilities extend beyond standard API protection to include Cloud Data Loss Prevention for sensitive data classification and Model Armor for defending against prompt injection attacks.
The feature is currently in preview with select customers, requiring contact with Apigee or Google Cloud account teams for access, with no pricing information disclosed yet.

31:07 Ryan – “I just did some real-time analysis about the features of the MCP and then also the browser and stuff. It’s one of those things where it is the newer model of coding, where you’re having distributed agents do tasks, and that, so the new IDs are taking advantage of that… And it is a VS Code fork. So it’s very comfortable to your VS Code users.”

32:05 Application Design Center now GA | Google Cloud Blog

Google’s Application Design Center reaches general availability as a visual, AI-powered platform for designing and deploying Terraform-backed application infrastructure on GCP.
The service integrates with Gemini Cloud Assist to let users describe infrastructure needs in natural language and receive deployable architecture diagrams with Terraform code, while automatically registering applications with App Hub for unified management.
The platform addresses platform engineering needs by providing a curated catalog of opinionated application templates, including specialized GKE templates for AI inference workloads using various LLM models.
Organizations can bring their own Terraform configurations from Git repositories and combine them with Google-provided components to create standardized infrastructure patterns for reuse across development teams.
New GA features include public APIs and gcloud CLI support, VPC service controls compatibility, and GitOps integration for CI/CD workflows.
The service offers application template revisions as an immutable audit trail and automatically detects configuration drift between intended designs and deployed applications to maintain compliance.
The platform is available free of cost for building and deploying application templates, with pricing details at cloud.google.com/products/application-design-center/pricing.
Integration with Cloud Hub provides operational insights and a unified control plane for managing application portfolios across the organization.
Platform teams can create secure, shareable catalogs of approved templates that give developers self-service access to compliant infrastructure while maintaining governance and security standards.
The service supports downloading templates as infrastructure-as-code for direct editing in local IDEs with changes flowing through standard Git pull request workflows.

33:10 Ryan – “It’s kind of the pangea that everyone’s been hoping for, for a long time. With AI making it possible. Being able to plain text speak your infrastructure into existence…I definitely like this model better than like Beanstalk or the hosted application model, which has been the solution until this. This is the answer I want.”

Azure

34:30 Microsoft will finally kill obsolete cipher that has wreaked decades of havoc – Ars Technica

Microsoft is deprecating RC4 encryption in Windows Active Directory after 26 years of default support, following its role in major breaches, including the 2024 Ascension healthcare attack that affected 5.6 million patient records.
The cipher has been cryptographically weak since 1994 and enabled Kerberoasting attacks that have compromised enterprise networks for over a decade.
Windows servers have continued to accept RC4-based authentication requests by default even after AES support was added, creating a persistent attack vector that hackers routinely exploit.
Senator Ron Wyden called for an FTC investigation of Microsoft in September 2025 for gross cybersecurity negligence related to this default configuration.
The deprecation addresses a fundamental security gap in enterprise identity management that has existed since Active Directory launched in 2000. Organizations using Windows authentication will need to ensure their systems are configured to use AES encryption and disable RC4 fallback to prevent downgrade attacks.
This change affects any organization running Active Directory for user authentication and access control, particularly those in healthcare, finance, and other regulated industries where credential theft can lead to catastrophic breaches. (Or literally anyone running Windows.)
The move comes after years of security researchers and government officials pressuring Microsoft to remove the obsolete cipher from default configurations.

36:06 Ryan – “It’s so complex, everyone just accepts the defaults just to get it up and going, and if you don’t know how compromised the cipher is, you don’t really prioritize getting back and fixing the encryption. So I’m really happy to see this; it’s always been a black mark that’s made me not trust Windows.”

37:11 Azure Storage innovations: Unlocking the future of data

Azure Blob Storage now scales to exabytes with 50+ Tbps throughput and millions of IOPS, specifically architected to keep GPUs continuously fed during AI training workloads.
The platform powers OpenAI’s model training and includes a new Smart Tier preview that automatically moves data between hot, cool, and cold tiers based on 30 and 90-day access patterns to optimize costs without manual intervention.
Azure Ultra Disk delivers sub-0.5ms latency with 30% improvement on Azure Boost VMs, scaling to 400K IOPS per disk and up to 800K IOPS per VM on new Ebsv6 instances.
The new Instant Access Snapshots preview eliminates pre-warming requirements and reduces recovery times from hours to seconds for Premium SSD v2 and Ultra Disk, while flexible provisioning can reduce total cost of ownership by up to 50%.
Azure Managed Lustre AMLFS 20 preview supports 25 PiB namespaces with 512 GBps throughput, featuring auto-import and auto-export capabilities for seamless data movement between AMLFS and Azure Blob Storage.
This addresses the specific challenge of training AI models at terabyte and petabyte scale by maintaining high GPU utilization through parallel I/O operations.
Azure Files introduces Entra-only identity support for SMB shares, eliminating the need for on-premises Active Directory infrastructure and enabling cloud-native identity management, including external identities for Azure Virtual Desktop. Storage Mover adds cloud-to-cloud transfers and on-premises NFS to Azure Files NFS 4.1 migration, while Azure NetApp Files large volumes now scale to 7.2 PiB capacity with 50 GiBps throughput, representing a 3x and 4x increase, respectively.
Azure Native offers now include Pure Storage and Dell PowerScale for customers wanting to migrate existing on-premises partner solutions to Azure using familiar technology stacks. The Storage Migration Program provides access to partners like Atempo, Cirata, Cirrus Data, and Komprise for SAN and NAS workload migrations, with a new Storage Migration Solution Advisor in Copilot to streamline decision-making. Pricing details were not disclosed in the announcement.

38:26 Ryan – “It just dawned on me, as you’re reading through here… this is interesting; getting all this high performance from object stores just sort of blows my mind. And then I realized that all these sorts of ‘cloud file systems’ have been backed underneath by these object stores for a long time; like, of course, they need this.”

39:49 Future-Ready Cloud: Microsoft’s U.S. Infrastructure Investments

Microsoft is expanding its U.S. datacenter footprint with a new East US 3 region launching in Greater Atlanta in early 2027, plus adding Availability Zones to five existing regions by the end of 2027.
The Atlanta, Georgia region will support advanced AI workloads and feature zone-redundant storage for improved application resilience, designed to meet LEED Gold certification standards for sustainability.
The expansion adds Availability Zones to North Central US, West Central US, and US Gov Arizona regions, plus enhances existing zones in East US 2 Virginia and South Central US Texas.
This provides customers with more options for multi-region architectures to improve recovery time objectives and meet compliance requirements like CMMC and NIST guidance for government workloads.
Azure Government customers get dedicated infrastructure expansion with three Availability Zones coming to US Gov Arizona in early 2026, specifically supporting Defense Industrial Base requirements.
This complements the Azure for US Government Secret cloud region launched earlier in 2025, offering an alternative to US Gov Virginia for latency-sensitive and mission-critical deployments.
The infrastructure investments support organizations like the University of Miami using Availability Zones for disaster recovery in hurricane-prone regions, and the State of Alaska consolidating legacy systems while improving reliability.
Microsoft emphasizes its global network of over 70 regions, 400 datacenters, and 370,000 miles of fiber as a foundation for resilient cloud strategies using its Cloud Adoption Framework and Well-Architected Framework guidance.
ai.azure.com for building production-ready AI agents.

40:33 Ryan – “AI is definitely driving a lot of this, but like with large data sets, you don’t really want that distributed globally. But I also think that they’re just purely running out of space.”

41:17 Azure Networking Updates: Secure, Scalable, and AI-Optimized

Azure is tripling down on AI infrastructure with its global network now reaching 18 petabits per second of total capacity, up from 6 Pbps at the end of FY24.
The network spans over 60 AI regions with 500,000 miles of fiber and 4 Pbps of WAN capacity, using InfiniBand and high-speed Ethernet for lossless data transfer between GPU clusters.
NAT Gateway Standard V2 enters public preview with zone redundancy by default at no additional cost, delivering 100 Gbps throughput and 10 million packets per second.
This joins ExpressRoute, VPN, and Application Gateway in offering zone-resilient SKUs as part of Azure’s resiliency-by-default strategy.
Security updates include DNS Security Policy with Threat Intel now generally available for blocking malicious domains, Private Link Direct Connect in preview for extending connectivity to any routable private IP, and JWT validation at Layer 7 in Application Gateway preview to offload token validation from backend servers.
ExpressRoute is getting 400G direct ports in select locations starting in 2026 for multi-terabit throughput, while VPN Gateway, now generally available, supports 5 Gbps single TCP flow and 20 Gbps total throughput with four tunnels.
Private Link scales to 5,000 endpoints per VNet and 20,000 across peered VNets.
Container networking improvements for AKS include eBPF Host Routing for lower latency, Pod CIDR Expansion without cluster redeployment, WAF for Application Gateway for Containers now generally available, and Azure Bastion support for private AKS cluster access.

42:45 Ryan – “If you have those high-end network throughput needs, that’s fantastic! It’s been a while since I’ve really got into cloud at that deep layer, but I do remember in AWS the VPN limitations really biting; it was easy to hit those limits really fast.”

After Show

44:22 Roomba maker iRobot swept into bankruptcy

iRobot’s bankruptcy marks the end of an era for the company that pioneered consumer robotics with the Roomba, now being acquired by its Chinese supplier Picea Robotics after losing ground to cheaper competitors.
The stock crashed from Amazon’s $52 offer in 2023 to just $4, showing how quickly market leaders can fall when undercut on price.
The failed Amazon acquisition in 2023 due to EU antitrust concerns looks particularly painful in hindsight, as iRobot might have been better off with Amazon’s resources than facing bankruptcy.
This highlights how regulatory decisions intended to preserve competition can sometimes accelerate a company’s decline instead.
For cloud professionals, this demonstrates how hardware IoT companies struggle without strong cloud services and ecosystem lock-in that could justify premium pricing. iRobot’s inability to differentiate beyond hardware shows why companies like Amazon, Google, and Apple integrate devices tightly with their cloud platforms.
The Chinese supplier takeover raises questions about data privacy and security for the millions of Roombas already mapping homes worldwide.
This could become a cautionary tale about supply chain dependencies and what happens when your manufacturer becomes your owner.
Founded by MIT engineers in 1990 and selling 40 million devices, iRobot’s fall shows that innovation alone isn’t enough without sustainable competitive advantages in manufacturing costs and ongoing software value.
This is a sad day, especially if you’re a fan of all things serverless, as they were the poster child of all things serverless.

Closing

334: AWS Makes Kubernetes Conversational

Fri, 19 Dec 2025 01:56:26 +0000

Welcome to episode 334 of The Cloud Pod, where the forecast is always cloudy! This week, we’re bringing you a jam-packed recap of re:Invent! We’ve got all the news, from keynotes to announcements. Whether you were there live or catching up on all the news, Justin, Matt, and Ryan are here to break it all down. Let’s get started!

Titles we almost went with this week

EKS Gets Chatty: Natural Language Replaces Command Line Nightmares
Harvest Now, Decrypt Later: Why Your RSA Keys Need a Quantum Makeover Before 2026
NAT So Fast: AWS Helps You Find Gateways Doing Absolutely Nothing
AWS Finally Admits You Have Too Many Log Buckets
AWS Finally Lets You Log In Like a Normal Human
Lambda Gets a Memory: Checkpoint Your Way to Multi-Step Workflows
Step Functions at Home: Lambda Durable Functions Let You Write Workflows in Actual Code
No More Bucket List: S3 Public Access Gets Organization-Wide Lockdown
AWS Hits Ctrl-Z on CodeCommit Deprecation
AWS Puts a Cap on CloudFront: Unlimited Traffic, Limited Anxiety
AWS Tells SQL Server to Take a Thread Off: Optimize CPU Cuts Costs by 55%
Amazon Bedrock Gets a Bouncer: AgentCore Identity Checks IDs at the Door
AI Brings on the Developer Renaissance

Follow Up

01:27 re:Invent

Matt Garman- 14th Reinvent, which is weird, since we’ve been doing cloud stuff for 87 years…
Warner – Open Mind for a different View and nothing else matters T-shirt.

02:59 re:Invent predictions

Jonathan

1. Serverless GPU support (extension in Lambda or a different service), it’s about time we have a serverless GPU/Inference capability.
  1. It is talked about in the keynote with DeSantis.

AI Agent with a goal/instructions that can run when they need to, periodically, or always, and perform an action (Agentic Platform that runs agents) –

Garman – Bedrock AgentCore and Kiro Autonomous Agent

Werner will announce this is his last keynote and he will retire

He retired from re:Invent Presentations

Ryan

New Tranium 3 chips, Inferentia, and Graviton chips

Garman – announced Tranium 3 Ultraservers.

They brought the Rack Ryan

Expand the number of models in or via bedrock

Doubled the number of models and announced Gemma, Minimax M2, Nvidia Nemotron, Mistral Large, and Mistral 3
Refresh to AWS Organizations

Justin

New Nova Model & Sonic with Multi-modal

Garman Nova 2 – Lite, Pro, and Sonic (the lack of Sonic the Hedgehog/Sega reference is a shame)

Nova 2 Omni

Announce a partnership with OpenAI (likely on stage)

1. 1. Not announced as new, but said they’re running on AWS and that EC2 Ultraservers are in use.

Advanced Agentic AI Capabilities for Security Hub (Automate the SOC teams)

Garman – Advanced Agentic AI Capabilities for Security Hub – with NEW AWS Security Agent

Matt

A model router to route LLM queries to different AI models
Well-architected framework expansion
End user Authentication that doesn’t suck (not current Cognito)

Tie Breaker – How many times will they say AI or Artificial Intelligence

Matt: 200

Justin: 160

Ryan: 99 Jonathan: 1

Matt Garman’s Keynote: 77

DeSantis’ Keynote: 31

Swami: 44

Werner: 31

Total: 183

This means Justin wins this year!

10:05 Honorable Mentions:

- Mathematical Proof that one of Amazon’s Models has output that can be verifiable with math

Marketplace for AI Work

- New Device to go along with the Nova Models
- Cost Savings for Networking
- FinOps AI recommender for Model Usage
- Savings Plans for AI/Bedrock Models
- S3 Vectors with integration bedrock
- FinOps Kubernetes Service

Q Developer with Autonomous Agents

Next Generation Silicone for a combined TPU competitor, ie GPU/Graviton/Learning
Bedrock Model Marketplace with Revenue Share for fine-tuned models (Ryan)
Sustainability Dashboard
Aurora/DSQL is an AI feature

AWS

11:59 re:Invent keynote Recap

Matt – started the weekend strong, although we struggled with his keynotes. (Sounds like he could use a good copywriter to help with his speeches.)
Swami – Solid B from us, but that’s because we’re not super interested in his topics. Sorry.
Peter – we enjoyed this one more. Cool tech, lots of mentions, and one of the better presenters. A for him.
Werner – Great Intro Video. Welcome to the Renaissance Coder

15:00 A Quick Recap

Look. We know you care about non-AI things (and so do we), so we’re going to do 25 exciting new announcements in 10 minutes. x8, elon instance, c8a, c8ine instances, m8azn, m3 and m4 max macs, lambda durable functions, 50tb s3 object, s3 batch ops 10x faster, intelligent tiering for s3 tables, automatic replication for s3 tables, s3 access points for FSX netapp, S3 Vectors, GPU Index for Amazon Opensearch, Amazon EMR Serverless with no storage provisioning, Guardduty to ECS & Ec2, Security Hub is GA, Unified data store in cloudwatch, Increases STorage for SQL and Oracle RDS, Optimize CPus for RDS for SQL server, SQL Server Development support, Database Savings Plans. 2 hours on AI…when we would have been really happy with all of THIS as the keynote.

26:08 AI/ML & Amazon Bedrock

Bedrock Service Tiers (Priority/Standard/Flex) – Match AI workload performance with cost
Bedrock Reserved Service Tier – Pre-purchase guaranteed tokens-per-minute capacity with 99.5% SLA
Bedrock AgentCore – Policy controls, evaluations, episodic memory for AI agents
Bedrock Reinforcement Fine-tuning – RLVR and RLAIF for model customization
Amazon Nova 2 Lite – Fast, cost-effective reasoning model with configurable thinking
Nova Forge – Build your own foundational models
18 New Open Weight Models – Mistral Large 3, Ministral 3 variants, others
Amazon Q Developer Cost Management – Natural language queries for AWS spending analysis
SageMaker Serverless Customization – Automated infrastructure for fine-tuning
SageMaker HyperPod – Checkpointless and elastic training capabilities
AWS Clean Rooms ML – Privacy-enhancing synthetic dataset generation
AgentCore Evaluations – Continuously inspect agent quality based on real-world behavior

29:09 Ryan – “I do agree with you that no one should be building their own foundational models unless it’s really, truly built on a data set that’s unique, but I do think that everyone should go through the exercise of building a model to understand how AI works.”

30:58 Compute (EC2 & Lambda)

EC2 P6-B300 Instances – NVIDIA Blackwell Ultra GPUs, 6.4Tbps networking
EC2 X8aedz Instances – AMD EPYC 5GHz, memory-optimized for EDA/databases
- X Æ A-Xii Musk
EC2 C8a Instances – AMD EPYC Turin, 30% higher compute performance
EC2 M9g Instances – Graviton5 powered, 25% better than Graviton4
Graviton5 Processor – 192 cores, 5x larger cache
Lambda Tenant Isolation Mode – Built-in multi-tenant separation
Lambda Managed Instances – Run Lambda on your EC2 with AWS management
Lambda Durable Functions – Multi-step workflows with automatic state management
AWS AI Factories – Cloud-scale AI infrastructure in customer data centers|

33:46 Matt – “I feel like we should have seen this coming, given that they just released the ECS management system a couple of months ago, and it feels like the next step.”

42:24 Containers (EKS & ECS)

EKS Capabilities – Managed Argo CD, ACK, KRO in AWS-owned infrastructure
EKS MCP Server – Natural language Kubernetes management (preview)
EKS Container Network Observability – Service maps, flow tables, performance metrics
EKS/ECS Amazon Q Troubleshooting – AI-powered console diagnostics
ECS Express Mode – Simplified deployment with automatic ALB, domains, HTTPS

43:36 Ryan – “I think this is what I’ve always wanted Beanstalk and Lightsail to be, is this service. This, for me, feels like the best of both worlds.”

45:34 Networking & Content Delivery

CloudFront Flat-Rate Pricing – Bundled delivery, WAF, DDoS protection ($0-$1K/month tiers)
VPN Concentrator – 25-100 low-bandwidth sites via a single Transit Gateway attachment
Route 53 Accelerated Recovery – 60-minute RTO for DNS during regional outages
Route 53 Global Resolver (preview) – Anycast DNS for remote/distributed clients
NAT Gateway Regional Availability – Auto-scale across AZs, simplified management
VPC Encryption Controls – Enforce encryption in transit within/across VPCs
Network Firewall Proxy (preview) – Explicit proxy for outbound traffic filtering

50:29 Ryan – “If you’ve ever had to do any kind of compliance evidence, that’s the reason why this exists and that’s why I love it so much. The song and dance that you have to do to illustrate your use of encryption across your environment is painful.”

53:14 Storage (S3 & FSx)

S3 Vectors GA – Native vector support, 2B vectors/index, 20T vectors/bucket
S3 Tables Replication & Intelligent-Tiering – Cross-region/account Iceberg replication
S3 Storage Lens Enhancements – Performance metrics, billions of prefixes, S3 Tables export
S3 Encryption Controls – Bucket-level encryption type enforcement
S3 Block Public Access – Organization-level enforcement
S3 50TB Object Size – 10x increase from previous 5TB limit
FSx for NetApp ONTAP S3 Access Points – Access file data via S3 API

54:38 Matt – “This is just a nice quality of life improvement.”

58:24 Databases

Aurora DSQL Cost Estimates – Statement-level DPU usage in query plans
Aurora PostgreSQL Dynamic Data Masking – pg_columnmask extension
OpenSearch 3.3 – Agentic search, semantic highlighter improvements
OpenSearch GPU Acceleration – 6-14x faster vector indexing
RDS SQL Server/Oracle Optimizations – Free Developer Edition, 256 TiB storage, CPU optimization
RDS SQL Server Resource Governor – Workload resource control
Database Savings Plans – Up to 35% savings across 9 database services

1:01:01 Justin – “This is quite nice, and quite broad, so they definitely heard all of the community saying please bring us database savings plans.”

1:03:33 Security & Identity

Security Hub GA – Near real-time analytics, risk prioritization, Trends feature
Secrets Manager External Secrets – Managed rotation for Salesforce, Snowflake, BigID
IAM Outbound Identity Federation – Short-lived JWTs for external service authentication
AWS login CLI Command – Eliminate long-term access keys with OAuth 2.0
WAF Web Bot Auth – Cryptographic signature verification for legitimate AI agents
Agentcore Identity
GuardDuty Extended Threat Detection – EC2/ECS multistage attack correlation
AWS Security Agent (preview) – AI-powered security reviews, code scanning, pen testing
IAM Policy Autopilot – Open source MCP server for generating IAM policies from code.

1:08:18 Matt – “…it’s definitely competing with Azure releasing the same thing during their conference. The piece I like about this is the pen test piece because it now lives in your source code, which you probably already have in SCA or a static code analysis tool.”

1:11:46 Cost Management & FinOps

Cost Explorer 18-Month Forecasting – Extended from 12 months to 18 months, explainable with AI (in preview).
Cost Efficiency Metric – Single percentage score combining optimization opportunities.
AWS Data Exports FOCUS 1.2 – Standardized multi-cloud billing format
Billing Transfer – Centralized billing across multiple Organizations
Compute Optimizer NAT Gateway Recommendations – Identify unused NAT Gateways

1:14:09 Developer Tools & Modernization

Step Functions Local Testing – TestState API with mocking support
AWS Transform Custom – AI-powered code modernization (Java, Node.js, Python)
AWS Transform Mainframe – COBOL to microservices with automated testing
API Gateway Developer Portals – Native API discovery and documentation
CodeCommit Restored to GA – Git LFS (Q1 2026), regional expansion (Q3 2026)
AWS Transform Windows – Full-stack .NET/SQL Server modernization
CloudWatch Unified Data Management – Consolidated ops/security/compliance logs
CloudWatch Deletion Protection – Prevent accidental log group removal.
CloudWatch Network Flow Monitor – Container network observability for EKS

1:18:09 Matt – “I mean, I hope all customers have some sort of plan, knowing that I’ve seen many companies say ‘we got this notice six months ago, we’ll deal with it in six months’ and now it’s three weeks and six days, and it expires tomorrow…there’s probably a lot of customers still there.”

1:20:58 Observability & Monitoring

CloudWatch Unified Data Management – Consolidated ops/security/compliance logs
CloudWatch Deletion Protection – Prevent accidental log group removal
CloudWatch Network Flow Monitor – Container network observability for EKS

1:21:39 Governance & Management

Control Tower Controls Dedicated – Use managed controls without a full landing zone.
Service Quotas Automatic Management – Auto-adjust limits based on usage
Supplementary Packages for Amazon Linux – Pre-built EPEL9 packages
AMI Ancestry – Automatic lineage tracking for AMIs

1:23:05 Matt – “I’ve built three different ways to do this in my career. You always want to know where it came from, so if there’s a vulnerability, you know where to start patching and go up from there…but if you have multiple teams, it’s hard to track. So knowing I can track it is a godsend.”

1:25:35 DevOps & Operations

AWS DevOps Agent (preview) – Autonomous incident investigation and root cause analysis
AWS Support Plan Restructure – Business Support+ ($29/mo), Enterprise ($5K/mo), Unified Ops ($50K/mo)

1:26:41 Ryan – “I hope this ends up being decent service, but in my head I’m thinking they’re lowering the cost because they’re getting rid of all their support staff.”

1:29:29 Marketplace & Partner

Partner Central in Console – Unified customer/partner experience
Multi-Product Solutions – Bundled offerings from multiple vendors
CrowdStrike Falcon Integration – Automated SIEM setup wizard

1:30:15 Connectivity & Contact Center

Amazon Connect Predictive Insights (preview) – AI-powered recommendations
Amazon Connect MCP Support – Standardized tools for AI agents

Noteable Announcments We Didn’t Cover in the Show:

Closing

333: The Cloud Pod Goes Nano Banana

Wed, 10 Dec 2025 23:25:34 +0000

Welcome to episode 333 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are taking a quick break from re:Invent festivities. They bring you the latest and greatest in Cloud and AI news. This week, we discuss Norad and Anthropic teaming up to bring you Christmas cheer. Wait, is that right? Huh. We also have undersea cables, some Turkish region delight, and a LOT of Opus 4.5 news. Let’s get into it!

Titles we almost went with this week

Boring Error Pages Not Found
Claude Goes Native in Snowflake: Finally, AI That Stays Where Your Data Lives
Cross-Cloud Romance: AWS and Google Make It Official with Interconnect
Google Gemini Puts OpenAI in Code Red: The Tables Have Turned
Azure NAT Gateway V2: Now With More Zones Than a Parking Lot
From ChatGPT to Chat-Uh-Oh: OpenAI Sounds the Alarm as Gemini Steals 200 Million
Users **Anthropic
Scheduled Actions: Because Your VMs Need a Work-Life Balance Too
Finally, Your 500 Errors Can Look as Good as Your Homepage
Foundry Model Router: Because Choosing Between 47 AI Models is Nobody’s Idea of Fun
Google Takes the Scenic Route: New Cable Avoids the Sunda Strait Traffic Jam
Azure Application Gateway Gets Its TCP/IP Diploma
Google Cloud Gets Its Türkiye Dinner: 2 Billion Dollar Cloud Feast Coming Soon
Microsoft Foundry: Turning AI Chaos into Compliance Gold

AI Is Going Great, or How ML Makes Money

02:59 Nano Banana Pro available for enterprise

Google launches Nano Banana Pro (Gemini 3 Pro Image) in general availability on Vertex AI and Google Workspace, with Gemini Enterprise support coming soon.
The model supports up to 14 reference images for style consistency and generates 4K resolution outputs with multilingual text rendering capabilities.
The model includes Google Search grounding for factual accuracy in generated infographics and diagrams, plus built-in SynthID watermarking for transparency. Copyright indemnification will be available at general availability under Google’s shared responsibility framework.
Enterprise integrations are live with Adobe Firefly, Photoshop, Canva, and Figma, enabling production-grade creative workflows. Major retailers, including Klarna, Shopify, and Wayfair, report using the model for product visualization and marketing asset generation at scale.
Developers can access Nano Banana Pro through Vertex AI with Provisioned Throughput and Pay As You Go pricing options, plus advanced safety filters. Business users get access through Google Workspace apps, including Slides, Vids, and NotebookLM, starting today.
The model handles complex editing tasks like translating text within images while preserving visual elements, and maintains character and brand consistency across multiple generated assets. This addresses a key enterprise challenge of maintaining creative control when using AI for production assets.

03:59 Justin – “The thing that’s the most important about this is when Nano Banana messes up the text (which it doesn’t do as often), you can now edit it without generating a whole completely different image.”

05:58 Introducing Claude Opus 4.5

Claude Opus 4.5 is now generally available across Anthropic’s API, apps, and all three major cloud platforms at $5 per million input tokens and $25 per million output tokens. This represents a substantial price reduction that makes Opus-level capabilities more accessible.
Developers can access it via the claude-opus-4-5-20251101 model identifier.
The model achieves state-of-the-art performance on software engineering benchmarks, scoring higher than any human candidate on Anthropic’s internal performance engineering exam within a 2-hour time limit on SWE-bench Verified. It matches Sonnet 4.5‘s best score while using 76% fewer output tokens at medium effort, and exceeds it by 4.3 percentage points at highest effort while still using 48% fewer tokens.
Anthropic introduces a new effort parameter in the API that lets developers control the tradeoff between speed and capability, allowing optimization for either minimal time and cost or maximum performance depending on the task requirements.
- This combines with new context management and memory capabilities to boost performance on agentic tasks by nearly 15 percentage points in testing.
Claude Code gains Plan Mode that builds a user-editable plan.md files before execution, and is now available in the desktop app for running multiple parallel sessions. The consumer apps remove message limits for Opus 4.5 through automatic context summarization, and Claude for Chrome and Claude for Excel expand to all Max, Team, and Enterprise users.
The model demonstrates improved robustness against prompt injection attacks compared to other frontier models and is described as the most robustly aligned model Anthropic has released.
- It shows better performance across vision, reasoning, and mathematics tasks while using dramatically fewer tokens than predecessors, reaching similar or better outcomes.

08:01 Justin – “The most important part of the whole announcement is the cheaper context input and output tokens.”

09:58 Announcing Claude Opus 4.5 on Snowflake Cortex AI

Snowflake Cortex AI now offers Claude Opus 4.5 and Claude Sonnet 4.5 in general availability, bringing Anthropic’s latest models directly into Snowflake’s data platform.
Users can access these models through SQL, Python, or REST APIs without moving data outside their Snowflake environment.
Claude Opus 4.5 delivers improved performance on complex reasoning tasks, coding, and multilingual capabilities compared to previous versions, while Claude Sonnet 4.5 provides a balanced option for speed and intelligence.
- Both models support 200K token context windows and can process text and images natively within Snowflake queries.
The integration enables enterprises to build AI applications using their Snowflake data with built-in governance and security controls, eliminating the need to export sensitive data to external AI services.
- Pricing follows Snowflake’s credit-based model, with costs varying by model and token usage.
Developers can combine Claude models with other Cortex AI features like vector search, document understanding, and fine-tuning capabilities to create end-to-end AI workflows.
- This allows for use cases ranging from customer service automation to financial analysis and code generation, all within the Snowflake ecosystem.

11:03 OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months

Oh, how the turn tables have turned…
OpenAI CEO Sam Altman issued an internal code red memo to refocus the company on improving ChatGPT after Google’s Gemini 3 model topped the LMArena leaderboard and gained 200 million users in three months.
The directive delays planned features, including advertising integration, AI agents for health and shopping, and the Pulse personal assistant feature.
Google’s Gemini 3 model, released in mid-November, has outperformed ChatGPT on industry benchmark tests and attracted high-profile users like Salesforce CEO Marc Benioff, who publicly announced switching from ChatGPT after three years.
The model’s performance represents a significant shift in the competitive landscape since OpenAI’s initial ChatGPT launch in December 2022.
The situation mirrors December 2022, when Google declared its own code red after ChatGPT’s rapid adoption, with CEO Sundar Pichai reassigning teams to develop competing AI products.
This role reversal demonstrates how quickly competitive positions can shift in the AI model space, particularly around user experience and benchmark performance.
OpenAI is implementing daily calls for teams responsible for ChatGPT improvements and encouraging temporary team transfers to address the competitive pressure.
The company’s response indicates that maintaining market leadership in conversational AI requires continuous iteration even for established products with large user bases.

13:11 Ryan – “I started on ChatGPT and tried to use it after adopting Claude, and I try to go back every once in a while – especially when they would announce a new model, but I always end up going back to one of the Anthropic models.”

GCP

15:19 New Google Cloud region coming to Türkiye

Google Cloud is launching a new region in Türkiye as part of a 2 billion dollar investment over 10 years, partnering with local telecom provider Turkcell, which will invest an additional 1 billion dollars in data centers and cloud infrastructure.
This brings Google Cloud’s global footprint to 43 regions and 127 zones, with Türkiye serving as a strategic hub for EMEA customers.
The region targets three key verticals already committed as customers: financial services with Garanti BBVA and Yapi Kredi Bank modernizing core banking systems, airlines with Turkish Airlines improving flight operations and passenger systems, and government entities focused on digital sovereignty.
The local presence addresses data residency requirements and provides low-latency access for organizations that need to keep data within national borders.
Technical capabilities include standard Google Cloud services for data analytics, AI, and cybersecurity with data encryption at rest and in transit, granular access controls, and threat detection systems meeting international security standards. The region will serve both Türkiye and neighboring countries with reduced latency compared to existing European regions.
The announcement emphasizes digital sovereignty as a primary driver, with government officials highlighting the importance of local infrastructure for maintaining control over national data while accessing hyperscale cloud capabilities.
This follows a pattern of Google Cloud expanding into regions where data localization requirements create demand for in-country infrastructure.
No specific pricing details were provided for the Türkiye region, though standard Google Cloud pricing models based on compute, storage, and network usage will apply once the region launches.
The timeline for when the region will be operational was not disclosed in the announcement.
Show note editor Heather note: If you enjoy history, you need to travel to Türkiye immediately!

17:03 Introducing BigQuery Agent Analytics

Google launches BigQuery Agent Analytics, a new plugin for their Agent Development Kit that streams AI agent interaction data directly to BigQuery with a single line of code.
The plugin captures metrics like latency, token consumption, tool usage, and user interactions in real-time using the BigQuery Storage Write API, enabling developers to analyze agent performance and optimize costs without complex instrumentation.
The integration allows developers to leverage BigQuery’s advanced capabilities, including generative AI functions, vector search, and embedding generation to perform sophisticated analysis on agent conversations.
Teams can cluster similar interactions, identify failure patterns, and join agent data with business metrics like CSAT scores to measure real-world impact, going beyond basic operational metrics to quality analysis.
The plugin includes three core components: an ADK plugin that requires minimal code changes, a predefined optimized BigQuery schema for storing interaction data, and low-cost streaming via the BigQuery Storage Write API.
- Developers maintain full control over what data gets streamed and can customize pre-processing, such as redacting sensitive information before logging.
Currently available in preview for ADK users, with support for other agent frameworks like LangGraph coming soon.
The feature addresses a critical gap in agentic AI development where understanding user interaction patterns and agent performance is essential for refinement, particularly as organizations move from building agents to optimizing them at scale.
Pricing follows standard BigQuery costs for storage and queries, with the Storage Write API offering cost-effective real-time streaming compared to traditional batch loading methods.
Documentation and a hands-on codelab are available at google.github.io/adk-docs for developers ready to implement agent analytics.

18:16 Ryan – “This is an interesting model; providing both the schema and the already instrumented integration. I feel like a lot of times with other types of development, you’re left to your own devices, and so this is a neat thing. As you’re developing an agent, everyone is instrumenting these things in odd ways, and it’s very difficult to compile the data in a way where you get usable queries out of it. So it’s kind of an interesting concept.”

19:35 TalayLink subsea cable to connect Australia and Thailand

You know how much we love a good undersea cable…
Google announces TalayLink, a new subsea cable connecting Australia and Thailand via the Indian Ocean, taking a western route around the Sunda Strait to avoid congestion from existing cable paths.
- This cable extends the Interlink system from the Australia Connect initiative and will directly connect to Google’s planned Thailand cloud region and data centers.
The project includes two new connectivity hubs in Mandurah, Western Australia, and South Thailand, providing diverse landing points away from existing cable concentrations in Perth and enabling cable switching, content caching, and colocation capabilities.
- Google is partnering with AIS for the South Thailand hub to leverage existing infrastructure.
TalayLink forms part of a broader Indian Ocean connectivity strategy, linking with previously announced hubs in the Maldives and Christmas Island to create redundant paths connecting Australia, Southeast Asia, Africa, and the Middle East.
This routing diversity aims to improve network resilience across multiple regions.
The infrastructure supports Thailand’s digital economy transformation goals and Western Australia’s digital future roadmap, with the Thailand Board of Investment actively backing the project.
- - No pricing or specific completion timeline was disclosed in the announcement.
The Cloud Pod is excited to cover the latest innovations and trends. We aim to keep you informed about the evolving landscape of cloud technology and artificial intelligence.

20:34 Matt – “It’s amazing…subsea cable congestion. How many cables can be there that there’s congestion?”

23:16 Claude Opus 4.5 on Vertex AI

Claude Opus 4.5 is now generally available on Vertex AI, delivering Anthropic’s most advanced model at one-third the cost of its predecessor Opus 4.1.
The model excels in coding tasks that can compress multi-day development projects into hours, agentic workflows with dynamic tool discovery from hundreds of tools without context window bloat, and office productivity tasks with improved memory for maintaining consistency across documents.
Google is positioning Vertex AI as a unified platform for deploying Claude with enterprise features, including global endpoints for reduced latency, provisioned throughput for dedicated capacity at fixed costs, and prompt caching with flexible Time To Live up to one hour.
The platform integrates with Google’s Agent Builder stack, including the open Agent Development Kit, Agent2Agent protocol, and fully managed Agent Engine for moving multi-step workflows from prototype to production.
Security and governance capabilities include Google Cloud’s foundational security controls, data residency options, and Model Armor protection against AI-specific threats like prompt injection and tool poisoning through Security Command Center.
Customers like Palo Alto Networks report 20-30 percent increases in code development velocity when using Claude on Vertex AI.
The model supports a 1 million token context window, batch predictions for cost efficiency, and web search capabilities in preview.
Regional availability and specific pricing details are available in the Vertex AI documentation, with the model accessible through both the Model Garden and Google Cloud Marketplace.

23:58 Registration is live for Google Cloud Next 2026 in Las Vegas

Google Cloud Next 2026 takes place April 22-24 in Las Vegas, with registration now open at an early bird price of $999 for a limited time.
This represents the standard pricing structure for Google’s flagship annual conference following their record-breaking attendance in 2025.
The conference focuses heavily on AI agent development and implementation, featuring interactive demos, hackathons, and workshops designed to help attendees build intelligent agents.
Organizations can learn from real-world case studies of companies deploying AI solutions at scale.
Next 2026 offers hands-on technical training through deep-dive sessions, keynotes, and practical labs aimed at developers and technical practitioners. The format emphasizes actionable learning with direct access to Google engineers and product experts.
The event serves as a networking hub for cloud practitioners to connect with peers facing similar technical challenges and to provide feedback that influences Google Cloud’s product roadmap. This direct line to product teams can be valuable for organizations planning their cloud strategy.
Ready to register? You can do that here.

27:19 VPC Flow Logs for Cross-Cloud Network

VPC Flow Logs now support Cloud VPN tunnels and VLAN attachments for Cloud Interconnect and Cross-Cloud Interconnect, extending visibility beyond traditional VPC subnet traffic to hybrid and multi-cloud connections.
- This addresses a critical gap for organizations running Cross-Cloud Network architectures who previously lacked detailed telemetry on traffic flowing between Google Cloud, on-premises infrastructure, and other cloud providers.
The feature provides 5-tuple granularity logging (source/destination IP, port, and protocol) with new gateway annotations that identify traffic direction and context through reporter and gateway object fields.
Flow Analyzer integration eliminates the need for complex SQL queries, offering built-in analysis capabilities including Gemini-powered natural language queries and in-context Connectivity Tests to correlate flow data with firewall policies and network configurations.
Primary use cases include identifying elephant flows that congest specific tunnels or attachments, auditing Shared VPC bandwidth consumption by service projects, and troubleshooting connectivity issues by verifying whether traffic reaches Google Cloud gateways.
- Organizations can also validate DSCP markings for application-aware Cloud Interconnect policy configurations, which is particularly valuable for enterprises with quality-of-service requirements.
The feature is available now for both new and existing deployments through Console, CLI, API, and Terraform, with Flow Analyzer providing no-cost analysis of logs stored in Cloud Logging.
This capability is particularly relevant for financial services, healthcare, and enterprises with strict compliance requirements that need comprehensive audit trails of cross-cloud and hybrid network traffic.

28:37 Ryan – “The controls say that you have to have logging, not what the logging is – and so very frequently it is sort of ‘turn it on and sort of forget it’. I do think this is great, but it is sort of, they say the five-tuple granularity will help you measure congestion, but I don’t see them actually producing any sort of bandwidth or request size metrics. So it is sort of an interesting thing, but it’s at least better than the nothing that we had before. So I’ll take it.”

30:35 AWS and Google Cloud collaborate on multicloud networking

AWS and Google Cloud jointly engineered a multicloud networking solution that eliminates the need for manual physical infrastructure setup between their platforms.
Customers can now provision dedicated bandwidth and establish connectivity in minutes instead of weeks through either cloud console or API.
The solution uses AWS Interconnect multicloud and Google Cloud Cross-Cloud Interconnect with quad-redundancy across physically separate facilities and MACsec encryption between edge routers.
Both providers published open API specifications on GitHub for other cloud providers to adopt the same standard.
Previously, connecting AWS and Google Cloud required customers to manually coordinate physical connections, equipment, and multiple teams over weeks or months.
This new managed service abstracts away physical connectivity, network addressing, and routing policy complexity into a cloud-native experience.
Salesforce is using this capability to connect its Data 360 platform across clouds using pre-built capacity pools and familiar AWS tooling.
- The integration allows them to ground AI and analytics in trusted data regardless of which cloud it resides in.
The collaboration represents a shift toward cloud provider interoperability through open standards rather than proprietary solutions.
The published specifications enable any cloud provider or partner to implement compatible multicloud connectivity using the same framework.

31:38 Justin – “I do want you guys to check the weather. Do you see pigs flying or anything crazy?”

Azure

33:17 Generally Available: TLS and TCP termination on Azure Application Gateway

Azure Application Gateway now supports TLS and TCP protocol termination at general availability, expanding beyond its traditional HTTP/HTTPS load balancing capabilities.
This allows customers to use Application Gateway for non-web workloads like database connections, message queuing systems, and other TCP-based applications that previously required separate load balancing solutions.
The feature consolidates infrastructure by letting organizations use a single gateway service for both web and non-web traffic, reducing the need to deploy and manage multiple load balancers.
This is particularly useful for enterprises running mixed workloads that include legacy applications, databases like SQL Server or PostgreSQL, and custom TCP services alongside modern web applications.
Application Gateway’s existing features, like Web Application Firewall, autoscaling, and zone redundancy, now extend to TCP and TLS traffic, providing consistent security and availability across all application types.
The pricing model follows Application Gateway’s standard consumption-based structure with charges for gateway hours and data processing, though specific costs for TCP/TLS termination were not detailed in the announcement.
Common use cases include load balancing for database clusters, securing MQTT or AMQP message broker connections, and providing SSL offloading for legacy applications that don’t natively support modern TLS versions.
- This positions Application Gateway as a more versatile Layer 4-7 load balancing solution competing with dedicated TCP load balancers and third-party appliances.

33:38 Justin – “Thank you for developing network load balancers.”

34:48 Generally Available: Azure Application Gateway mTLS passthrough support

Want to make your life even more complicated? Well, it’s GOOD NEWS!
Azure Application Gateway now supports mutual TLS passthrough in general availability, allowing backend applications to validate client certificates and authorization headers directly while still benefiting from Web Application Firewall inspection.
- This addresses a specific compliance requirement where organizations need end-to-end certificate validation but cannot terminate TLS at the gateway layer.
The feature enables scenarios where backend services must verify client identity through certificates for regulatory compliance or zero-trust architectures, particularly relevant for financial services, healthcare, and government workloads. Previously, customers had to choose between WAF protection or backend certificate validation, creating security or compliance gaps.
Application Gateway continues to inspect traffic through WAF rules even as the mTLS connection passes through to the backend, maintaining protection against common web exploits and OWASP vulnerabilities.
This dual-layer approach means organizations can enforce both perimeter security policies and application-level authentication without architectural compromises.
The capability is available across all Azure regions where Application Gateway v2 SKU operates, with standard Application Gateway pricing applying based on capacity units consumed.
No additional charges exist specifically for the mTLS passthrough feature itself, though backend certificate validation may increase processing overhead slightly.

36:30 Matt – “I did S tunnel and MongoDB because it didn’t support encryption for the longest time…that was a fun one.”

36:50 Public Preview: Azure API Management adds support for A2A Agent APIs

Azure API Management now supports Agent-to-Agent (A2A) APIs in public preview, allowing organizations to manage AI agent APIs alongside traditional REST APIs, AI model APIs, and Model Context Protocol tools within a single governance framework.
- This addresses the growing need to standardize how autonomous agents communicate and interact across enterprise systems.
The feature enables centralized management of agent interactions, which is particularly relevant as organizations deploy multiple AI agents that need to coordinate tasks and share information.
API Management can now apply consistent security policies, rate limiting, and monitoring across all agent communications, reducing the operational complexity of multi-agent architectures.
This capability positions Azure API Management as a unified control plane for the full spectrum of API types emerging in AI-driven applications.
Organizations already using API Management for traditional APIs can extend their existing governance practices to cover agent-based workflows without deploying separate infrastructure.
The preview is available in Azure regions where API Management is currently supported, though specific pricing for A2A API features has not been disclosed separately from standard API Management tiers.
Organizations should evaluate this against their existing API Management costs, which start at approximately $50 per month for the Developer tier.

38:13 Introducing Claude Opus 4.5 in Microsoft Foundry

Claude Opus 4.5 is now available in public preview on Microsoft Foundry, GitHub Copilot paid plans, and Microsoft Copilot Studio, expanding Azure’s frontier model portfolio following the Microsoft-Anthropic partnership announced at Ignite.
The model achieves 80.9% on SWE-bench software engineering benchmarks and is priced at one-third the cost of previous Opus-class models, making advanced AI capabilities more accessible for enterprise workloads.
The model introduces three key developer features on Foundry: an Effort Parameter in beta that lets teams control computational allocation across thinking and tool calls, Compaction Control for managing context in long-running agentic tasks, and enhanced programmatic tool calling with dynamic tool discovery that doesn’t consume context window space.
- These capabilities enable sophisticated multi-tool workflows across cybersecurity, financial modeling, and full-stack development.
Opus 4.5 serves as Anthropic’s strongest vision model and delivers improved computer use performance for automating desktop tasks, particularly for creating spreadsheets, presentations, and documents with professional polish.
The model maintains context across complex projects using memory features, making it suitable for precision-critical verticals like finance and legal, where consistency matters.
Microsoft Foundry’s rapid integration strategy gives Azure customers immediate access to the latest frontier models while maintaining centralized governance, security, and observability at scale.
This positions Azure as offering the widest selection of advanced AI models among cloud providers, with Opus 4.5 available now through the Foundry portal and coming soon to Visual Studio Code via the Foundry extension.

38:37 Justin – “Cool, it’s in Foundry – hooray!”

40:21 Generally Available: DNS security policy Threat Intelligence feed

Azure DNS security policy now includes a managed Threat Intelligence feed that blocks queries to known malicious domains.
This feature addresses the common attack vector where nearly all cyber attacks begin with a DNS query, providing an additional layer of protection at the DNS resolution level.
The service integrates with Azure’s existing DNS infrastructure and uses Microsoft’s threat intelligence data to automatically update the list of malicious domains.
- Organizations can enable this protection without managing their own threat feeds or maintaining blocklists, reducing operational overhead for security teams.
This capability is particularly relevant for enterprises looking to implement defense-in-depth strategies, as it stops threats before they can establish connections to command and control servers or phishing sites.
The feature works alongside existing Azure Firewall and network security tools to provide comprehensive protection.
The general availability means the service is now production-ready with full SLA support across Azure regions.
Pricing details were not specified in the announcement, so customers should check Azure pricing documentation for DNS security policy costs.

41:28 Ryan – “It is something, being able to automatically take the results of a feed, I will do any day just because these things are updated by many more parties and faster than I can ever react to, and you know, our own threat intelligence. So that’s pretty great. I like it.”

42:46 Public Preview: Standard V2 NAT Gateway and StandardV2 Public IPs

Azure introduces StandardV2 NAT Gateway in public preview, adding zone-redundancy for high availability in regions with availability zones.
This upgrade addresses a key limitation of the original NAT Gateway by ensuring outbound connectivity survives zone failures, which matters for enterprises running mission-critical workloads that require consistent internet egress.
The StandardV2 SKU includes matching StandardV2 Public IPs that work together with the new NAT Gateway tier. Organizations using the original Standard SKU will need to evaluate migration paths since zone-redundancy represents a fundamental architectural change requiring new resource types rather than an in-place upgrade.
This release targets customers who previously had to architect complex workarounds for zone-resilient outbound connectivity, particularly those running multi-zone deployments of containerized applications or database clusters.
- The preview allows testing of failover scenarios before production deployment.
The announcement lacks specific pricing details for the StandardV2 tier, though NAT Gateway typically charges based on hourly resource fees plus data processing costs.
Customers should monitor Azure pricing pages as the preview progresses toward general availability for cost comparisons against the Standard SKU.

43:48 Justin – “The fact that this is not an upgrade that I can just check, and I have to redeploy a whole new thing, annoys the crap out of me.”

46:51 Generally Available: Custom error pages on Azure App Service

Custom error pages on Azure App Service have moved to general availability, allowing developers to replace default HTTP error pages with branded or customized alternatives.
This addresses a common requirement for production applications where maintaining a consistent user experience during errors is important for brand identity and user trust.
The feature integrates directly into App Service configuration without requiring additional Azure services or third-party tools.
Developers can specify custom HTML pages for different HTTP error codes like 404 or 500, which App Service will serve automatically when those errors occur.
This capability is particularly relevant for customer-facing web applications, e-commerce sites, and SaaS platforms where error handling needs to align with corporate branding guidelines.
The feature works across all App Service tiers that support custom domains and SSL certificates.
No additional cost is associated with custom error pages beyond standard App Service hosting fees, which start at approximately $13 per month for the Basic tier. Implementation requires uploading error page files to the app’s file system and updating configuration settings through Azure Portal or deployment templates.
The general availability status means the feature is now production-ready with full support coverage, moving beyond the preview phase where it was available for testing.
- Documentation is available at the Azure App Service custom error pages guide.

48:17 Matt – “It’s crazy that this wasn’t already there. The workarounds you had to do to make your own error page was messy at best.”

49:01 Generally Available: Streamline IT governance, security, and cost management experiences with Microsoft Foundry

Microsoft Foundry reaches general availability as an enterprise AI governance platform that consolidates security, compliance, and cost management controls for IT administrators deploying AI solutions.
The platform addresses the growing need for centralized oversight as organizations scale their AI initiatives across Azure infrastructure.
The service integrates with existing Azure management tools to provide unified visibility and control over AI workloads, allowing IT teams to enforce policies and monitor resource usage from a single interface.
- This reduces the operational overhead of managing disparate AI projects while maintaining enterprise security standards.
Foundry targets large enterprises and regulated industries that require strict governance frameworks for AI deployment, particularly organizations balancing innovation speed with compliance requirements.
The platform helps bridge the gap between data science teams pushing for rapid AI adoption and IT departments responsible for risk management.
The general availability announcement indicates Microsoft is positioning Azure as the enterprise-ready AI cloud, competing directly with AWS and Google Cloud for organizations prioritizing governance alongside AI capabilities.
Specific pricing details were not disclosed in the announcement, suggesting costs likely vary based on usage and existing Azure commitments.

50:22 Justin – “It’s like a combination of SageMaker and Vertex married Databricks and then had a baby – plus a report interface.”

52:44 Generally Available: Model Router in Microsoft Foundry

Microsoft Foundry’s Model Router is now generally available as an AI orchestration layer that automatically selects the optimal language model for each prompt based on factors like complexity, cost, and performance requirements.
This eliminates the need for developers to manually choose between different AI models for each use case.
The service supports an expanded range of models, including the GPT-4 family, GPT-5 family, GPT-oss, and DeepSeek models, giving organizations flexibility to balance performance needs against cost considerations.
The router can dynamically switch between models within a single application based on prompt characteristics.
This addresses a practical challenge for enterprises deploying multiple AI models where different tasks require different model capabilities. For example, simple queries could route to smaller, less expensive models while complex reasoning tasks automatically use more capable models.
The orchestration layer integrates with Microsoft Foundry’s broader AI infrastructure, allowing customers to manage multiple model deployments through a single interface rather than building custom routing logic. This reduces operational complexity for teams managing diverse AI workloads across their organization.
No specific pricing details are provided in the announcement, though costs will likely vary based on the underlying models selected by the router and usage patterns. Organizations should evaluate potential cost savings from routing simpler queries to less expensive models versus always using premium models.

54:50 Generally Available: Scheduled Actions

Azure’s Scheduled Actions feature is now generally available, providing automated VM lifecycle management at scale with built-in handling of subscription throttling and transient error retries.
This eliminates the need for custom scripting or third-party tools to start, stop, or deallocate VMs on a recurring schedule.
The feature addresses common cost optimization scenarios where organizations need to automatically shut down development and test environments during off-hours or scale down non-production workloads on weekends.
This can reduce compute costs by 40-70% for environments that don’t require 24/7 availability.
Scheduled Actions integrates directly with Azure Resource Manager and works across VM scale sets, making it suitable for both individual VMs and large-scale deployments. The automatic retry logic and throttling management means operations complete reliably even when managing hundreds or thousands of VMs simultaneously.
The service is available in all Azure public cloud regions where VMs are supported, with no additional cost beyond standard VM compute charges.
Organizations pay only for the time VMs are running, so automated shutdown schedules directly translate to reduced monthly bills.

55:31 Justin – “Thank you for copying every other cloud that’s had this forever…”

After Show

51:46 OpenAI and NORAD team up to bring new magic to “NORAD Tracks Santa.”

OpenAI partnered with NORAD to add AI-powered holiday tools to the annual Santa tracking tradition, creating three ChatGPT-based features that turn kids’ photos into elf portraits, generate custom toy coloring pages, and build personalized Christmas stories. This represents a consumer-friendly application of generative AI that demonstrates how large language models can be packaged for mainstream family use during the holidays.
The collaboration shows OpenAI pursuing brand-building partnerships with trusted institutions like NORAD to normalize AI tools in everyday contexts. By embedding ChatGPT features into a 68-year-old military tradition that reaches millions of families, OpenAI gains exposure to non-technical users who might otherwise be hesitant about AI adoption.
From a technical perspective, these tools showcase practical implementations of image generation and text-to-image capabilities that parents can use without understanding the underlying models. The focus on simple, single-purpose GPTs rather than complex interfaces suggests OpenAI is testing how to make their technology more accessible to casual users.
The partnership raises interesting questions about AI companies seeking legitimacy through associations with government organizations and cultural traditions. While the tools are harmless holiday fun, they demonstrate how AI providers are moving beyond enterprise sales to embed their technology into cultural moments and family activities.
This is essentially a marketing play disguised as holiday cheer, but it does illustrate how cloud-based AI services are becoming infrastructure for consumer experiences rather than just backend business tools. The real story is about distribution strategy and making AI feel safe and familiar to mainstream audiences.
The Cloud Pod has one message: keep Skynet out of Christmas!

Closing

332: 2025 Re:Invent Predictions Draft – May The Odds Be Ever In Your Favor

Fri, 28 Nov 2025 17:00:41 +0000

Welcome to episode 332 of The Cloud Pod – where the forecast is always cloudy! It’s Thanksgiving week, which can only mean one thing: AWS Re:Invent predictions! In this special episode, Justin, Jonathan, Ryan, and Matt engage in the annual tradition of drafting their best guesses for what AWS will announce at the biggest cloud conference of the year. Justin is the reigning champion (probably because he actually reads the show notes), but with a reverse snake draft order determined by dice roll, anything could happen. Will Werner announce his retirement? Is Cognito finally getting a much-needed overhaul? And just how many times will “AI” be uttered on stage? Grab your turkey and let’s get predicting!

Titles we almost went with this week:

Roll For Initiative: The Re:Invent Prediction Draft
Justin’s Winning Streak: A Study in Actually Doing Your Homework
Serverless GPUs and Broken Dreams: Our Re:Invent Wishlist
Shooting in the Dark: AWS Predictions Edition
We’re Never Good at This, But Here We Go Again
Vegas Odds: What Happens at Re:Invent, Gets Predicted Wrong

AWS Re:Invent Predictions 2025

The annual prediction draft is here! Draft order was determined by dice roll: Jonathan first, followed by Ryan, Justin, and Matt in last position. As always, it’s a reverse order format, with points awarded for each correct prediction announced during the Tuesday, Wednesday, and Thursday keynotes.

Jonathan’s Predictions

Serverless GPU Support – An extension to Lambda or a different service that provides on-demand serverless GPU/inference capability. Likely with requirements for pre-warmed provisioned instances.
Agentic Platform for Continuous AI Agents – A service that allows agents to run continuously with goals or instructions, performing actions periodically or on-demand in the real world. Think: running agents on a schedule that can check conditions and take automated actions.
Werner Vogels Retirement Announcement – Werner will announce that this is his last Re:Invent keynote and that he is retiring.

Ryan’s Predictions

New Trainium 3 Chips, Inferentia, and Graviton Chips – New generation of AWS custom silicon across training, inference, and general compute.
Expanded Model Availability in Bedrock – AWS will significantly expand the number of models available in Bedrock, potentially via partnerships or integrations with additional providers.
Major Refresh to AWS Organizations – UI-based or functionality refresh providing better visibility into SCPs, OU mappings, and stack sets across organizations.

Justin’s Predictions

New Nova Model with Multi-modal Support – Launch of Nova Premier or Nova Sonic with multi-modal capabilities, bringing Amazon’s foundational model to the next level.
OpenAI Partnership Announcement – AWS and OpenAI will announce a strategic partnership, potentially bringing OpenAI models to Bedrock (likely announced on stage).
Advanced Agentic AI Capabilities for Security Hub – Enhanced features for Security Hub adding Agentic AI to help automate SOC team operations.

Matt’s Predictions

Model Router for Bedrock – A service to route LLM queries to different AI models, simplifying the process of testing and selecting models for different use cases.
Well-Architected Framework Expansion – New lenses or significant updates to the Well-Architected Framework beyond the existing Generative AI and Sustainability lenses.
End User Authentication That Doesn’t Suck – A new or significantly revamped end-user authentication service (essentially Cognito 2.0) that actually works well for client portals.

Tiebreaker: How Many Times Will “AI” or “Artificial Intelligence” Be Said On Stage?

If we end in a tie (or nobody gets any predictions correct, which is historically possible), we go to the tiebreaker!

Host Guess Matt 200 Justin 160 Ryan 99 Jonathan 1

Honorable Mentions

Ideas that didn’t make the cut but might just surprise us:

Jonathan:

Mathematical proof/verification that text was generated by Amazon’s LLMs (watermarking for AI output)
Marketplace for AI work – publish and monetize AI-based tools with Amazon handling billing
New consumer device to accompany Nova models (smarter Alexa replacement with local inference)

Ryan:

FinOps AI recommender for model usage and cost optimization
Savings plans or committed use discounts for Bedrock use cases

Matt:

Sustainability/green dashboard improvements
AI-specific features for Aurora or DSQL

Justin:

Big S3 vectors announcement and integration to Bedrock
FinOps service for Kubernetes
Amazon Q Developer with autonomous coding agents
New GPU architecture combining training/inference/Graviton capabilities
Amazon Bedrock model marketplace for revenue share on fine-tuned models

Quick Hits From the Episode

00:02 – Is it really Re:Invent already? The existential crisis begins.
01:44 – Jonathan reveals why Justin always wins: “Because you read the notes.”
02:54 – Matt hasn’t been to a Re:Invent session since Image Builder launched… eight years ago.
05:03 – Jonathan comes in hot with serverless GPU support prediction.
06:57 – The inference vs. training cost debate – where’s the real ROI?
09:30 – Matt’s picks get systematically destroyed by earlier drafters.
14:09 – The OpenAI partnership prediction causes draft chaos.
16:24 – Jonathan drops the Werner retirement bombshell.
19:12 – Justin’s Security Hub prediction: “Please automate the SOC teams.”
19:46 – Everyone hates Cognito. Matt’s prediction resonates with the universe.
21:47 – Tiebreaker time: Jonathan goes with 1 out of pure spite.
24:08 – Honorable mentions include mathematical AI verification and a marketplace for AI work.

Re:Invent Tips (From People Who Aren’t Going)

Since none of us are attending this year, here’s what we remember from the good old days:

Chalk Talks remain highly respected and valuable for deep technical content
Labs and hands-on sessions are worth your time more than keynotes you can watch online
Networking on the expo floor and in hallways is where the real value happens
Don’t try to see everything – focus on what matters to your work
Stay hydrated – Vegas is dry and conferences are exhausting

Closing

And that is the week in the cloud! We’re taking Thanksgiving week off, so there won’t be an episode during Re:Invent. We’ll record late that week and have a dedicated Re:Invent recap episode the following week. If you’re heading to Las Vegas, have a great time and let us know how it goes!

Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod

331: Claude Gets a $30 Billion Azure Wardrobe and Two New Best Friends

Thu, 27 Nov 2025 19:06:35 +0000

Welcome to episode 331 of The Cloud Pod, where the forecast is always cloudy! Jonathan, Ryan, Matt, and Justin (for a little bit, anyway) are in the studio today to bring you all the latest in cloud and AI news. This week, we’re looking at our Ignite predictions (that side gig as internet psychics isn’t looking too good) undersea cables (our fave!), plus datacenters and more. Plus Claude and Azure make a 30 billion dollar deal! Take a break from turkey and avoiding politics, and let’s take a trip into the clouds!

Titles we almost went with this week

GPT-5.1 Gets a Shell Tool Because Apparently We Haven’t Learned Anything From Sci-Fi Movies
The Great Ingress Egress: NGINX Controller Waves Goodbye After Years of Volunteer Burnout
Queue the Applause: Lambda SQS Mapping Gets a Serious Speed Boost
SELECT * FROM future WHERE SQL meets AI without the prompt drama
MFA or GTFO: Microsoft’s 99.6% Phishing-Resistant Authentication Achievement
JWT Another Thing ALB Can Do: OAuth Validation Moves to the Load Balancer
Google’s Emerging Threats Center: Because Manually Checking 12 Months of Logs Sounds Terrible
EventBridge Gets a Drag-and-Drop Makeover: No More Schema Drama
Permission Denied: How Granting Access Took Down the Internet

Follow Up

00:51 Ignite Predictions – The Results

Matt (Who is in charge of sound effects, so be aware)

ACM Competitor – True SSL competitive product
AI announcement in Security AI Agent (Copilot for Sentinel) – sort of (½)
Azure DevOps Announcement

Justin

New Cobalt and Mai Gen 2 or similar – Check
Price Reduction on OpenAI & Significant Prompt Caching
Microsoft Foundational LLM to compete with OpenAI –

Jonathan

The general availability of new, smaller, and more power-efficient Azure Local hardware form factors
Declarative AI on Fabric: This represents a move towards a declarative model, where users state the desired outcome, and the AI agent system determines the steps needed to achieve it within the Fabric ecosystem.
Advanced Cost Management: Granular dashboards to track the token and compute consumption per agent or per transaction, enabling businesses to forecast costs and set budgets for their agent workforce.

How many times will they say Copilot:

The word “Copilot” is mentioned 46 to 71 times in the video.

Jonathan 45

Justin: 35

Matt: 40

General News

05:13 Cloudflare outage on November 18, 2025

Cloudflare experienced its worst outage since 2019 on November 18, 2025, lasting approximately three hours and affecting core traffic routing across its entire network.
The incident was triggered by a database permissions change that caused a Bot Management feature file to double in size, exceeding hardcoded limits in their proxy software and causing system panics that resulted in 5xx errors for customers.
The root cause reveals a cascading failure pattern, where a ClickHouse database query began returning duplicate column metadata after permission changes.
This resulted in a significant increase in the feature file, from approximately 60 features to over 200, which exceeded the preallocated memory limit of 200 features in their Rust-based FL2 proxy code.
The team initially suspected a DDoS attack due to fluctuating symptoms caused by the bad configuration file being generated every five minutes as the database cluster was gradually updated.
The outage impacted multiple Cloudflare services, including their CDN, Workers KV, Access, and even their own dashboard login system through Turnstile dependencies.
Customers on the older FL proxy engine did not see errors but received incorrect bot scores of zero, potentially causing false positives for those using bot blocking rules.
Cloudflare’s remediation plan includes treating internal configuration files with the same validation rigor as user input, implementing more global kill switches for features, and preventing error reporting systems from consuming excessive resources during incidents.
The company acknowledged this as unacceptable for their position in the Internet ecosystem and committed to architectural improvements to prevent similar failures.

06:41 Justin – “Definitely a bad outage, but I appreciate that they owned it, and owned it hard… especially considering they were front page news.”

AI Is Going Great, or How ML Makes Money

07:27 Introducing GPT-5.1 for developers | OpenAI

OpenAI has released GPT-5.1 in their API platform with adaptive reasoning that dynamically adjusts thinking time based on task complexity, resulting in 2-3x faster performance on simple tasks while maintaining frontier intelligence.
The model includes a new “no reasoning” mode (reasoning_effort set to ‘none’) that delivers 20% better low-latency tool calling performance compared to GPT-5 minimal reasoning, making it suitable for latency-sensitive applications while supporting web search and improved parallel tool calling.
GPT-5.1 introduces extended prompt caching with 24-hour retention (up from minutes), maintaining the existing 90% cost reduction for cached tokens with no additional storage charges.
Early adopters report the model uses approximately half the tokens of competitors at similar quality levels, with companies like Balyasny Asset Management seeing agents run 50% faster while exceeding GPT-5 accuracy.
The release includes two new developer tools in the Responses API: apply_patch for structured code editing using diffs without JSON escaping, and a shell tool that allows the model to propose and execute command-line operations in a controlled plan-execute loop. GPT-5.1 achieves 76.3% on SWE-bench Verified and shows 7% improvement on diff editing benchmarks according to early testing partners like Cline and Augment Code.
OpenAI is also releasing specialized gpt-5.1-codex and gpt-5.1-codex-mini models optimized specifically for long-running agentic coding tasks, while maintaining the same pricing and rate limits as GPT-5.
- If you didn’t catch it in the podcast, Justin HATES this. Hates. It. All the hate.
The company has committed to not deprecating GPT-5 in the API and will provide advanced notice if deprecation plans change.
Pricing and rate limits are the same at GPT-5.

9:31 Ryan – “I didn’t really like GPT-5, so I don’t have high expectations, but as these things enhance, I’ve found using different models for different use cases has some advantages, so maybe I’ll find the case for this one.”

11:31 Piloting group chats in ChatGPT | OpenAI

OpenAI is piloting group chat functionality in ChatGPT, starting with users in Japan, New Zealand, South Korea, and Taiwan across all subscription tiers (Free, Go, Plus, and Pro).
The feature allows up to 20 people to collaborate in a shared conversation with ChatGPT, with responses powered by GPT-5.1 Auto that selects the optimal model based on the prompt and the user’s subscription level.
ChatGPT has been trained with new social behaviors for group contexts, including deciding when to respond or stay quiet based on conversation flow, reacting with emojis, and referencing profile photos for personalized image generation.
Users can mention “ChatGPT” explicitly to trigger a response, and custom instructions can be set per group chat to control tone and personality.
Privacy controls separate group chats from personal conversations, with personal ChatGPT memory not shared or used in group contexts.
Users must accept invitations to join, can see all participants, and can leave at any time, with group creators having special removal privileges.
The feature includes safeguards for users under 18, automatically reducing sensitive content exposure for all group members when a minor is present.
- Parents can disable group chats entirely through parental controls, providing additional oversight for younger users.
Rate limits apply only to ChatGPT responses (not user-to-user messages) and count against the subscription tier of the person ChatGPT is responding to.
The feature supports search, image and file uploads, image generation, and dictation, making it functional for both personal planning and workplace collaboration scenarios.

12:41 Jonathan – “I’d rather actually have group chats enabled if kids are going to use it because at least you have witnesses to the conversation at that point.”

16:38 Gemini 3: Introducing the latest Gemini AI model from Google

Google launches Gemini 3 Pro in preview across its product suite, including the Gemini app, AI Studio, Vertex AI, and a new AI Mode in Search with generative UI capabilities.
The model achieves a 1501 Elo score on LMArena leaderboard and demonstrates 91.9% on GPQA Diamond, with a 1 million token context window for processing multimodal inputs including text, images, video, audio and code.
Gemini 3 Deep Think mode offers enhanced reasoning performance, scoring 41.0% on Humanity’s Last Exam and 45.1% on ARC-AGI-2 with code execution.
Google is providing early access to safety testers before rolling out to Google AI Ultra subscribers in the coming weeks, following comprehensive safety evaluations per their Frontier Safety Framework.
Google introduces Antigravity, a new agentic development platform that integrates Gemini 3 Pro with Gemini 2.5 Computer Use for browser control and Gemini 2.5 Image for editing.
The platform enables autonomous agent workflows with direct access to editor, terminal, and browser, scoring 54.2% on Terminal-Bench 2.0 and 76.2% on SWE-bench Verified for coding agent capabilities.
The model shows improved long-horizon planning by topping Vending-Bench 2 leaderboard and delivers enhanced agentic capabilities through Gemini Agent for Google AI Ultra subscribers.
Gemini 3 demonstrates 72.1% on SimpleQA Verified for factual accuracy and 1487 Elo on WebDev Arena for web development tasks, with availability in third-party platforms including Cursor, GitHub, JetBrains, and Replit.

18:24 Ryan – “I look forward to trying this. My initial attempts with Gemini 2.5 did not go well, but I found a sort of sweet spot in using it for planning and documentation. It’s still much better at coding than any other model that I’ve used. So cool, I look forward to using this.”

19:14 Microsoft, NVIDIA, and Anthropic announce strategic partnerships – The Official Microsoft Blog

Continuing the messy breakups…
Anthropic commits to $30 billion in Azure compute capacity, and up to one gigawatt of additional capacity, making this one of the largest cloud infrastructure commitments in AI history.
This positions Azure as Anthropic’s primary scaling platform for Claude models.
NVIDIA and Anthropic are establishing their first deep technology partnership focused on co-design and engineering optimization.
Anthropic will optimize Claude models for NVIDIA Grace Blackwell and Vera Rubin systems, while NVIDIA will tune future architectures specifically for Anthropic workloads to improve performance, efficiency, and total cost of ownership.
Claude models, including Sonnet 4.5, Opus 4.1, and Haiku 4.5, are now available through Microsoft Foundry on Azure, making Claude the only frontier model accessible across all three major cloud platforms (AWS, Azure, GCP).
Azure enterprise customers gain expanded model choice beyond OpenAI offerings.
Microsoft commits to maintaining Claude integration across its entire Copilot family, including GitHub Copilot, Microsoft 365 Copilot, and Copilot Studio.
This ensures developers and enterprise users can leverage Claude capabilities within existing Microsoft productivity and development workflows.
NVIDIA and Microsoft are investing up to $10 billion and $5 billion, respectively, in Anthropic as part of the partnership. So yes, that’s a lot of money going back and forth.
The combined $15 billion investment represents substantial backing for Anthropic’s continued development and positions all three companies to benefit from Claude’s growth trajectory.

21:57 Jonathan – “I’m wondering what Anthropic’s plan is – what they’re working on in the background – because they have just taken a huge amount of capacity from AWS and their new data center in Northern Indiana, and now another 30 billion in Azure Compute? I guess they’re still building models every day… that’s a lot of money flying around.”

Cloud Tools

23:17 Ingress NGINX Retirement: What You Need to Know | Kubernetes Contributors

Ingress NGINX, one of the most popular Kubernetes ingress controllers that has powered billions of requests worldwide, is being retired in March 2026 due to unsustainable maintenance burden and mounting technical debt.
The project has struggled for years with only one or two volunteer maintainers working after hours, and despite its widespread use in hosted platforms and enterprise clusters, efforts to find additional support have failed.
The retirement stems from security concerns around features that were once considered flexible but are now viewed as vulnerabilities, particularly the snippets annotations that allowed arbitrary NGINX configuration.
The Kubernetes Security Response Committee and SIG Network exhausted all options to make the project sustainable before making this difficult decision to prioritize user safety over continuing an undermaintained critical infrastructure component.
Users should immediately begin migrating to Gateway API, the modern replacement for Ingress that addresses many of the architectural issues that plagued Ingress NGINX. Existing deployments will continue to function and installation artefacts will remain available, but after March 2026, there will be zero security patches, bug fixes, or updates of any kind.
Alternative ingress controllers are plentiful and listed in Kubernetes documentation, including cloud-provider-specific options and vendor-supported solutions.
Users can check if they are affected by running a simple kubectl command to look for pods with the ingress-nginx selector across all namespaces.
This retirement highlights a critical open source sustainability problem where massively popular infrastructure projects can fail despite widespread adoption when companies benefit from the software but do not contribute maintainer resources back to the community.

24:39 Justin – “I’m actually surprised NGINX didn’t want to pick this up; it seems like an obvious move for F5 to pick up and maintain the Ingress NGINX controller. But what do I know?”

25:46 Replicate is joining Cloudflare

Cloudflare acquires Replicate, bringing its 50,000-plus model catalog and fine-tuning capabilities to Workers AI.
This consolidates model discovery, deployment, and inference into a single platform backed by Cloudflare’s global network.
The acquisition addresses the operational complexity of running AI models by combining Replicate’s Cog containerization tool with Cloudflare’s serverless infrastructure.
Developers can now deploy custom models and fine-tune without managing GPU hardware or dependencies.
Existing Replicate APIs will continue functioning without interruption while gaining Cloudflare’s network performance.
Workers AI users get access to proprietary models like GPT-5 and Claude Sonnet through Replicate’s unified API alongside open-source options.
The integration extends beyond inference to include AI Gateway for observability and cost analytics, plus native connections to Cloudflare’s data stack, including R2 storage and Vectorize database.
This creates an end-to-end platform for building AI applications with state management and real-time capabilities.
Replicate’s community features for sharing models, publishing fine-tunes, and experimentation will remain central to the platform.
The acquisition positions Cloudflare to compete more directly with hyperscaler AI offerings by combining model variety with edge deployment.

27:09 Ryan – “Cloudflare has been doing kind of amazing things at the edge, which is kind of neat. We’ve had serverless and functions for a while, and definitely options out there that provide much better performance. It’s kind of neat. They’re well-positioned to do that.”

28:02 KubeCon NA 2025 Recap: The Dawn of the AI Native Era | Blog

KubeCon 2025 marked the industry shift from cloud native to AI native, with CNCF launching the Kubernetes AI Conformance Program to standardize how AI and ML workloads run across clouds and hardware accelerators like GPUs and TPUs.
The live demo showed Dynamic Resource Allocation making accelerators first-class citizens in Kubernetes, signaling that AI infrastructure standardization is now a community priority.
Harness showcased Agentic AI capabilities that transform traditional CI/CD pipelines into intelligent, adaptive systems that learn and optimize delivery automatically.
- Their booth demonstrated 17 integrated products spanning CI, CD, IDP, IaCM, security, testing, and FinOps, with particular emphasis on AI-powered pipeline creation and visual workflow design that caught significant attendee interest.
Security emerged as a critical theme with demonstrations of zero-CVE malware attacks that bypass traditional vulnerability scanners by compromising the build chain itself.
- The solution path involves supply chain attestation using SLSA, policy-as-code enforcement, and artifact signing with Sigstore, which Harness demonstrated as native capabilities in their platform.
Apple introduced Apple Containerization, a framework running Linux containers directly on macOS using lightweight microVMs that boot minimal Linux kernels in under a second.
- This combines VM-level security with container speed, creating safer local development environments that could reshape how developers work on Mac hardware.
The conference emphasized that AI native infrastructure requires intelligent scheduling, deeper observability, and verified agent identity using SPIFFE/SPIRE, with multiple sessions showing practical implementations at scale from companies like Yahoo, managing 8,000 nodes, and Spotify handling a million infrastructure resources.

29:51 Justin – “Everyone has moved on from Kubernetes as the hotness; now it’s all AI, so what are people working on in the AI space?”

AWS

30:27 AWS Lambda enhances event processing with provisioned mode for SQS event-source mapping

AWS Lambda now offers provisioned mode for SQS event source mapping, providing 3x faster scaling and 16x higher concurrency (up to 20,000 concurrent executions) compared to the standard polling mode.
This addresses customer demands for better control over event processing during traffic spikes, particularly for financial services and gaming companies requiring sub-second latency.
The new provisioned mode uses dedicated event pollers that customers can configure with minimum and maximum values, where each poller handles up to 1 MB/sec throughput, 10 concurrent invokes, or 10 SQS API calls per second.
Setting a minimum number of pollers maintains baseline capacity for immediate response to traffic surges, while the maximum prevents downstream system overload.
Pricing is based on Event Poller Units (EPUs) charged for the number of pollers provisioned and their duration, with a minimum of 2 event pollers required per event source mapping.
Each EPU supports up to 1 MB per second throughput capacity, though AWS has not published specific per-EPU pricing on the announcement.
The feature is available now in all commercial AWS Regions and can be configured through the AWS Console, CLI, or SDKs.
Monitoring is handled through CloudWatch metrics, specifically the ProvisionedPollers metric that tracks active event pollers in one-minute windows.
This capability enables applications to handle up to 2 GBps of aggregate traffic while automatically scaling down to the configured minimum during low-traffic periods for cost optimization.
The enhanced scaling detects growing backlogs within seconds and adjusts poller count dynamically between configured limits.

31:36 Ryan – “Where was this 5 years ago when we were maintaining a logging platform? This would have been very nice!”

33:30 Amazon EventBridge introduces enhanced visual rule builder

EventBridge launches a new visual rule builder that integrates the Schema Registry with a drag-and-drop canvas, allowing developers to discover and subscribe to events from over 200 AWS services and custom applications without referencing individual service documentation.
The schema-aware interface helps reduce syntax errors when creating event filter patterns and rules.
The enhanced builder includes a comprehensive event catalog with readily available sample payloads and schemas, eliminating the need to hunt through documentation for event structures.
- This addresses a common pain point: developers previously had to manually locate and understand event formats across different AWS services.
Available now in all regions where Schema Registry is launched at no additional cost beyond standard EventBridge usage charges.
- The feature is accessible through the EventBridge console and aims to reduce development time for event-driven architectures.
The visual builder particularly benefits teams building complex event-driven applications that need to filter and route events from multiple sources.
By providing schema validation upfront, it helps catch configuration errors before deployment rather than during runtime.

34:46 Matt – “I definitely – back in the day – had lots of fun with EventBridge, and trying to make sure I got the schemas right for every frame when you’re trying to trigger one thing from another. So not having to deal with that mess is exponentially better. You know, at this point, though, I feel like I would just tell AI to tell me what the scheme was and solve the problem that way.”

35:43 Application loadbalancer support client credential flow with JWT verification

ALB now handles JWT token verification natively at the load balancer layer, eliminating the need for custom authentication code in backend applications. This offloads OAuth 2.0 token validation, including signature verification, expiration checks, and claims validation, directly to the load balancer, reducing complexity in microservices architectures.
The feature supports Client Credentials Flow and other OAuth 2.0 flows, making it particularly useful for machine-to-machine and service-to-service authentication scenarios. Organizations can now centralize token validation at the edge rather than implementing it repeatedly across multiple backend services.
This capability is available immediately in all AWS regions where ALB operates, with no additional ALB feature charges beyond standard load balancer pricing. Customers pay only for the existing ALB hourly rates and Load Balancer Capacity Units (LCUs) consumed.
The implementation reads JWTs from request headers and validates against configured JSON Web Key Sets (JWKS) endpoints, supporting integration with identity providers like Auth0, Okta, and AWS Cognito.
Failed validation results in configurable HTTP error responses before requests reach backend targets.
This addresses a common pain point in API gateway and microservices deployments, where each service previously needed its own token validation logic.
The centralized approach reduces code duplication and potential security inconsistencies across service boundaries.

38:40 Jonathan – “Maybe this is kind of a sign that Cognito is not gaining the popularity they wanted. Because effectively, you could re-spin this announcement as Auth0 and Okta are now first-class citizens when it comes to authentication through API Gateway and ALB.”

GCP

39:10 How Protective ReRoute improves network resilience | Google Cloud Blog

Google Cloud’s Protective ReRoute (PRR) shifts network failure recovery from centralized routers to distributed endpoints, allowing hosts to detect packet loss and immediately reroute traffic to alternate paths.
This host-based approach has reduced inter-datacenter outages from slow network convergence by up to 84 percent since deployment five years ago, with recovery times measured in single-digit multiples of round-trip time rather than seconds or minutes.
PRR works by having hosts continuously monitor path health using TCP retransmission timeouts, then modifying IPv6 flow-label headers to signal the network to use alternate paths when failures occur. Google contributed this IPv6 flow-label modification mechanism to the Linux kernel version 4.20 and later, making it available as open source technology for the broader community.
The feature is particularly critical for AI and ML training workloads, where even brief network interruptions can cause expensive job failures and restarts costing millions in compute time.
Large-scale distributed training across multiple GPUs and TPUs requires the ultra-reliable data distribution that PRR provides to prevent communication pattern disruptions.
Google Cloud customers can use PRR in two modes: hypervisor mode, which automatically protects cross-datacenter traffic without guest OS changes, or guest mode for the fastest recovery, requiring Linux kernel 4.20 plus, TCP applications, and IPv6 traffic, or gVNIC driver for IPv4.
Documentation is available at cloud.google.com/compute/docs/networking for enabling guest-mode PRR on critical workloads.
The architecture treats the network as a highly parallel system where reliability increases exponentially with available paths rather than degrading serially through forwarding stages.
This approach capitalizes on Google’s network path diversity to protect real-time applications, frequent short-lived connections, and data integrity scenarios where packet loss causes corruption beyond just throughput reduction.

40:57 Ryan – “I was trying to think how I would even implement something like this in guest mode because it breaks my head. It seems pretty cool, and I’m sure that from an underlying technology at the infrastructure level, from the Google network, it sounds pretty neat. But it’s also the coordination of that failover seems very complex. And I would worry.”

41:54 Introducing the Emerging Threats Center in Google Security Operations | Google Cloud Blog

Google Security Operations launches the Emerging Threats Center, a Gemini-powered detection engineering system that automatically generates security rules when new threat campaigns emerge from Google Threat Intelligence, Mandiant, and VirusTotal.
The system addresses a key pain point where 59% of security leaders report difficulty deriving actionable intelligence from threat data, typically requiring days or weeks of manual work to assess organizational exposure.
The platform provides two critical capabilities for security teams during major threat events: it automatically searches the previous 12 months of security telemetry for campaign-related indicators of compromise and detection rule matches, while also confirming active protection through campaign-specific detections.
- This eliminates the manual cross-referencing process that traditionally occurs when zero-day vulnerabilities emerge.
Under the hood, the system uses an agentic workflow where Gemini ingests threat intelligence from Mandiant incident response and Google’s global visibility, generates synthetic event data mimicking adversary tactics, tests existing detection rules for coverage gaps, and automatically drafts new rules when gaps are found. Human security analysts maintain final approval before deployment, transforming detection engineering from a best-effort manual process into a systematic automated workflow.
The Emerging Threats Center is available today for licensed Google Security Operations customers, though specific pricing details were not disclosed in the announcement.
Organizations with high-volume security operations like Fiserv are already using the behavioral detection capabilities to move beyond single indicators toward systematic adversary behavior detection.

44:40 Jonathan – “I see this as very much a CrowdStrike-type AI solution for Google Cloud, in a way. Looking at the data, you’re identifying emerging threats, which is what CrowdStrike’s sales point really is, and then implementing controls to help quench that.”

47:56 Introducing Dhivaru and two new connectivity hubs | Google Cloud Blog

Google is investing in Dhivaru, a new Trans-Indian Ocean subsea cable connecting the Maldives, Christmas Island, and Oman, extending the Australia Connect initiative to improve regional connectivity.
The cable system aims to support growing AI service demand like Gemini 2.5 Flash and Vertex AI by providing resilient infrastructure across the Indian Ocean region.
The announcement includes two new connectivity hubs in the Maldives and Christmas Island that will provide three core capabilities: cable switching for automatic traffic rerouting during faults, content caching to reduce latency by storing popular content locally, and colocation services offering rack space to carriers and local companies.
- These hubs are positioned to serve Africa, the Middle East, South Asia, and Oceania with improved reliability.
Google emphasizes the energy efficiency of subsea cables compared to traditional data centers, noting that connectivity hubs require significantly less power since they focus on networking and localized storage rather than compute-intensive AI and cloud workloads.
The company is exploring ways to use power demand from these hubs to accelerate local investment in sustainable energy generation in smaller locations.
The connectivity hubs will provide strategic benefits by minimizing the distance data travels before switching paths, which improves resilience and reduces downtime for services across the region.
- This infrastructure investment aims to strengthen local economies while supporting Google’s objective of serving content from locations closer to users and customers.
The project represents Google’s continued infrastructure expansion to meet long-term demand driven by AI adoption rates that are outpacing predictions, with partnerships including Ooredoo Maldives and Dhiraagu supporting the Maldives hub deployment.

49:38 Matthew – “I had to look up one connectivity hub, which is literally just a small little data center that just kind of handles basic networking and storage – and nothing fancy, which is interesting that they’re putting the two connectivity hubs. They’re dropping these hubs where all their cables terminate. So they are able to cache stuff at each location, which is always interesting.”

Azure

51:46 Infinite scale: The architecture behind the Azure AI superfactory – The Official Microsoft Blog

Microsoft announces its second Fairwater datacenter in Atlanta, connecting it to the Wisconsin site and existing Azure infrastructure to create what they call a planet-scale AI superfactory.
The facility uses a flat network architecture to integrate hundreds of thousands of NVIDIA GB200 and GB300 GPUs into a unified supercomputer for training frontier AI models.
The datacenter achieves 140kW per rack power density through closed-loop liquid cooling that uses water equivalent to 20 homes annually and is designed to last 6-plus years without replacement.
The two-story building design minimizes cable lengths between GPUs to reduce latency, while the site secures 4×9 availability power at 3×9 cost by relying on resilient grid power instead of traditional backup systems.
Each rack houses up to 72 NVIDIA Blackwell GPUs connected via NVLink with 1.8TB GPU-to-GPU bandwidth and 14TB pooled memory per GPU.
- The facility uses a two-tier Ethernet-based backend network with 800Gbps GPU-to-GPU connectivity running on SONiC to avoid vendor lock-in and reduce costs compared to proprietary solutions.
Microsoft deployed a dedicated AI WAN backbone with over 120,000 new fiber miles across the US last year to connect Fairwater sites and other Azure datacenters.
- This allows workloads to span multiple geographic locations and enables dynamic allocation between training, fine-tuning, reinforcement learning, and synthetic data generation based on specific requirements.
The architecture addresses the challenge that large training jobs now exceed single-facility power and space constraints by creating fungibility across sites.
Customers can segment traffic across scale-up networks within sites and scale-out networks between sites, maximizing GPU utilization across the combined system rather than being limited to a single datacenter.

55:25 Private Preview: Azure HorizonDB

Azure HorizonDB for PostgreSQL enters private preview as Microsoft’s performance-focused database offering, featuring autoscaling storage up to 128 TB and compute scaling to 3,072 vCores.
The service claims up to 3 times faster performance compared to open-source PostgreSQL, positioning it as a competitor to AWS Aurora and Google Cloud AlloyDB in the managed PostgreSQL space.
The 128 TB storage ceiling represents a substantial increase over Azure’s existing PostgreSQL offerings, addressing enterprise workloads that previously required sharding or migration to other platforms.
This storage capacity combined with the high vCore count targets large-scale OLTP and analytical workloads that need both horizontal and vertical scaling options.
Microsoft appears to be building HorizonDB as a separate service line rather than an upgrade to existing Azure Database for PostgreSQL Flexible Server, suggesting different architecture and pricing models.
- Organizations currently using Azure Database for PostgreSQL will need to evaluate migration paths and cost implications when the service reaches general availability.
The private preview status means limited customer access and no published pricing information yet.
Enterprises interested in testing HorizonDB should expect typical private preview constraints, including potential feature changes, regional limitations, and SLA restrictions before general availability.

57:35 Jonathan – “So it sounds like they’ve pretty much built what Amazon did with the Aurora, separating the storage from the compute to let them scale independently.”

59:10 Public Preview: Microsoft Defender for Cloud + GitHub Advanced Security

Microsoft Defender for Cloud now integrates natively with GitHub Advanced Security in public preview, creating a unified security workflow that spans from source code repositories through production cloud environments.
This integration allows security teams and developers to work within a single platform rather than switching between separate tools for code scanning and cloud protection.
The solution addresses the full application lifecycle security challenge by connecting GitHub’s code-level vulnerability detection with Defender for Cloud’s runtime protection capabilities.
Organizations using both GitHub and Azure can now correlate security findings from development through deployment, reducing the gap between DevOps and SecOps teams.
This preview targets cloud-native application teams who need consistent security policies across their CI/CD pipeline and production workloads. The integration is particularly relevant for organizations already invested in the Microsoft and GitHub ecosystem, as it leverages existing tooling rather than requiring additional third-party solutions.
The announcement provides limited details on pricing structure, though organizations should expect costs to align with existing Defender for Cloud and GitHub Advanced Security licensing models.
Specific regional availability and rollout timeline details were not included in the brief announcement.

1:00:35 Matthew – “It seems like it has a lot of potential, but without the pricing and Windows for Defender as a CPM, I feel like – for me – it lacks some features, when I’ve tried to use it. They’re going in the right direction; I don’t think they’re there at the end product yet.”

1:03:05 Public Preview: Smart Tier account level tiering (Azure Blob Storage and ADLS

Azure introduces Smart Tier for Blob Storage and ADLS Gen2, which automatically moves data between hot, cool, and archive tiers based on access patterns without manual intervention.
This eliminates the need for lifecycle management policies and reduces the operational overhead of managing storage costs across large data estates.
The feature works at the account level rather than requiring per-container or per-blob configuration, making it simpler to deploy across entire storage accounts. Organizations with unpredictable access patterns or mixed workloads will benefit most, as the system continuously optimizes placement without predefined rules.
Smart Tier monitors blob access patterns and automatically transitions objects to lower-cost tiers when appropriate, then moves them back to hot storage when access frequency increases.
This differs from traditional lifecycle policies that rely on age-based rules and cannot respond dynamically to actual usage.
The public preview allows customers to test the automated tiering without committing to production workloads, though specific pricing details for the Smart Tier feature itself were not disclosed in the announcement. Standard Azure Blob Storage tier pricing applies, with the hot tier being the most expensive and the archive tier offering the lowest storage costs but higher retrieval fees.
This capability targets customers managing large volumes of data with variable access patterns, particularly those in analytics, backup, and archival scenarios where manual tier management becomes impractical at scale.
The integration with ADLS Gen2 makes it relevant for big data and analytics workloads running on Azure.

1:05:18 Jonathan – “So they’ve always had the tiering, but now they’re providing an easy button for you based on access patterns.”

1:13:04 From idea to deployment: The complete lifecycle of AI on display at Ignite

2025 – The Official Microsoft Blog

Microsoft Ignite 2025 introduces three intelligence layers for AI development: Work IQ connects Microsoft 365 data and user patterns, Fabric IQ unifies analytical and operational data under a shared business model, and Foundry IQ provides a managed knowledge system routing across multiple data sources.
These layers work together to give AI agents business context rather than requiring custom integrations for each data source.
Microsoft Agent Factory offers a single metered plan for building and deploying agents across Microsoft 365 Copilot and Copilot Studio without upfront licensing requirements.
The program includes access to AI Forward Deployed Engineers and role-based training, targeting organizations that want to build custom agents but lack internal AI expertise or want to avoid complex provisioning processes.
Microsoft Agent 365 provides centralized observability, management, and security for AI agents regardless of whether they were built with Microsoft platforms, open-source frameworks, or third-party tools. With IDC projecting 1.3 billion AI agents by 2028, this addresses the governance gap where unmanaged agents become shadow IT, integrating Defender, Entra, Purview, and Microsoft 365 admin center for agent lifecycle management.
Work IQ now exposes APIs for developers to build custom agents that leverage the intelligence layer’s understanding of user workflows, relationships, and content patterns. This allows organizations to extend Microsoft 365 Copilot capabilities into their own applications while maintaining the native integration advantages rather than relying on third-party connectors.
The announcements position Microsoft as providing end-to-end AI infrastructure from the datacenter to the application layer, with particular emphasis on making agent development accessible to frontline workers rather than limiting it to specialized AI teams. No specific pricing details were provided for the new services beyond the mention of metered plans for Agent Factory.

Closing

330: AWS Proves the Internet Really Is a Series of Tubes Under the Ocean

Fri, 21 Nov 2025 04:15:10 +0000

Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy (and if you’re in California, rainy too!) Justin and Matt have taken a break from Ark building activities to bring you this week’s episode, packed with all the latest in cloud and AI news, including undersea cables (our favorite!) FinOps, Ignite predictions, and so much more! Grab your umbrellas and let’s get started!

Titles we almost went with this week

Fastnet and Furious: AWS Lays 320 Terabits of Cable Across the Atlantic
No More kubectl apply –pray: AWS Backup Takes the Stress Out of EKS Recovery
AWS Gets Swift with Lambda: No Taylor Version Required
Breaking Up Is Hard to Do: Microsoft Splits Teams from Office
FinOps and Behold: Google Automates Your Cloud Budget Nightmares
AMD Turin Around GCP’s Price-Performance with N4D VMs
Azure Gets Territorial: Your Data Stays Put Whether It Likes It or Not
AWS Finally Answers “Is It Available in My Region?” Before You Build It
Getting to the Bare Metal of Things: Google’s Axion Goes Commando
Azure Ultra Disk Gets Ultra Serious About Latency
Container Size Matters: Azure Expands ACI to 240 GB Memory
Google Containerises Chaos: Agent Sandbox Keeps Your AI from Going Rogue
AWS Prints Money While Amazon Prints Pink Slips: Q3 Earnings Beat

Follow Up

02:08 Microsoft sidesteps hefty EU fine with Teams unbundling deal

Microsoft avoids a potentially substantial EU antitrust fine by agreeing to unbundle Teams from the Office 365 and Microsoft 365 suites for a period of seven years.
The settlement follows a 2023 complaint from Salesforce-owned Slack alleging anticompetitive bundling practices that harmed rival collaboration tools.
The commitments require Microsoft to offer Office and Microsoft 365 suites without Teams at reduced prices, with a 50 percent larger price difference between bundled and unbundled versions.
Customers with long-term licenses can switch to Teams-free suites, addressing concerns about forced adoption of the collaboration platform.
Microsoft must provide interoperability between competing collaboration tools and its products, plus enable data portability from Teams to rival services.
These technical requirements aim to level the playing field for competitors like Slack and Zoom in the European enterprise collaboration market.
The settlement applies specifically to the European Union market and stems from Microsoft’s dominant position in productivity software.
Organizations using Microsoft 365 in the EU will now have a genuine choice in selecting collaboration tools without being locked into Teams through bundling.
This decision sets a precedent for how cloud software vendors can package integrated services, particularly when holding dominant market positions.
The seven-year commitment period and mandatory interoperability requirements could influence how Microsoft and competitors structure product offerings globally.

General News

08:30 It’s Earnings Time! (Warning: turn down your volume)

Amazon’s stock soars on earnings, revenue beat, spending guidance

Yes, we know there’s a little delay in our reporting here, but it’s still important! (To Justin, anyway.)
AWS grew revenue 20% year-over-year to $33 billion in Q3, generating $11.4 billion in operating income, which represents two-thirds of Amazon’s total operating profit.
While this growth trails Google Cloud’s 34% and Azure’s 40%, AWS maintains its position as the leading cloud infrastructure provider.
Amazon increased its 2025 capital expenditure forecast to $125 billion, up from $118 billion, with CFO Brian Olsavsky indicating further increases expected in 2026.
This spending exceeds Google, Meta, and Microsoft’s capex guidance and signals Amazon’s commitment to AI infrastructure despite concerns about missing out on high-profile AI cloud deals.
Amazon’s Q4 revenue guidance of $206-213 billion (midpoint $209.5 billion) exceeded analyst expectations of $208 billion, driven by strong performance in both AWS and the digital advertising business, which grew 24% to $17.7 billion.
The company’s overall revenue reached $180.17 billion, beating estimates of $177.8 billion.
The company announced 14,000 corporate layoffs this week, which CEO Andy Jassy attributed to organizational culture and reducing bureaucratic layers rather than financial pressures or AI automation.
Amazon’s total workforce stands at 1.58 million employees, representing a 2% year-over-year increase despite the cuts.

06:14 Justin – “There’s a lot of investors starting to question some of the dollars being spent on (AI). It’s feeling very .com boom-y. Let’s not do that again.”

06:46 Alphabet stock jumps 4% after strong earnings results, boost in AI spend

Alphabet increased AI infrastructure spending guidance to $91-93 billion for the year, up from $85 billion previously, driven by strong Google Cloud demand.
CEO Sundar Pichai reported a $155 billion backlog for Google Cloud at quarter’s end, with CFO signaling significant capex increases expected in 2026.
Google Cloud contributed to Alphabet’s first-ever $100 billion revenue quarter, with total Q3 revenue reaching $102.35 billion and beating analyst expectations by $2.5 billion.
The company’s earnings of $3.10 per share significantly exceeded the $2.33 analyst consensus.
Google Search revenue grew 15% year-over-year to $56.56 billion, indicating that AI integration in search is proving to be an opportunity rather than a threat to the core business.
Analysts noted this addresses previous concerns about AI disrupting Google’s search dominance.
Wall Street analysts raised price targets substantially following the results, with Goldman Sachs increasing from $288 to $330 and JPMorgan raising from $300 to $340.
Deutsche Bank characterized the earnings as having virtually no negative aspects across any business segment.

08:03 Matt – “The 15 % of revenue for Google search year over year feels like a massive growth, but I still don’t really understand how they track that. It’s not like there’s 15 % more people using Google than before, but that’s the piece I don’t really understand still.”

08:27 Microsoft (MSFT) Q1 2026 earnings report

Microsoft Azure revenue grew 40% year-over-year in Q1 fiscal 2026, beating analyst expectations of 38.2% growth and driving the Intelligent Cloud segment to $30.9 billion in total revenue.
The company’s AI infrastructure investments continue to pay off as Azure cloud services reached over $75 billion in annual revenue for fiscal 2025.
Microsoft took a $3.1 billion accounting hit to net income this quarter related to its OpenAI investment, equivalent to 41 cents per 41-cent-per-share impact on earnings.
Despite this, the company still beat earnings expectations at $3.72 per share versus the expected $3.67, with overall revenue reaching $77.67 billion.
Capital expenditure spending came in at $34.9 billion for the quarter, and CFO Amy Hood indicated that capex growth will accelerate throughout fiscal 2026 rather than slow down as previously suggested.
This aggressive infrastructure spending caused the stock to drop 4% in after-hours trading despite the strong revenue performance.
Microsoft now holds a 27% stake in OpenAI’s for-profit entity worth approximately $135 billion, following the company’s restructuring announcement.
This formalized partnership structure clarifies the relationship between the two companies as Azure continues to serve as the primary infrastructure platform for OpenAI’s services.
The quarter’s results were overshadowed by a significant Azure and Microsoft 365 outage that occurred on the same day as earnings, affecting various websites and gaming services for several hours. Microsoft expects full recovery by evening, but the timing highlights ongoing reliability concerns as the company scales its cloud infrastructure.

09:27 Azure Front Door RCA

What happened: Azure Front Door and CDN experienced an 8+ hour outage (Oct 29-30, 2025), causing connection timeouts and DNS failures across numerous Azure and Microsoft services, including Azure Portal, Microsoft 365, Entra ID, and many others.
Root cause: A valid customer configuration change exposed a latent bug when processed across different control plane versions, creating incompatible metadata that crashed data plane services.
The crash occurred asynchronously (~5 minutes delayed), allowing it to pass through safety checks undetected.
Why it spread globally: The defective configuration propagated to all edge sites within 4 minutes (15:39 UTC) and was mistakenly saved as the “Last Known Good” snapshot before crashes began appearing at 15:41 UTC, making rollback impossible.
Recovery approach: Rather than reverting to the corrupted LKG, Microsoft manually removed problematic configurations and performed a careful phased redeployment across all edge sites, completing full mitigation by 00:05 UTC (~8.5 hours total).
Prevention measures: Microsoft has completed synchronous config processing, added pre-canary validation stages, reduced recovery time from 4.5 hours to 1 hour, and is working on traffic isolation and further improvements through mid-2026.
Are you interested in the video version of this information? You can find that here.

14:23 PREDICTIONS FOR IGNITE

Matt

ACM Competitor – True SSL competitive product
AI announcement in Security AI Agent (Copilot for Sentinel)
Azure DevOps Announcement

Justin

New Cobalt and Mai Gen 2 or similar
Price Reduction on OpenAI & Significant Prompt Caching
Microsoft Foundational LLM to compete with OpenAI

Jonathan (who isn’t here)

The general availability of new, smaller, and more power-efficient Azure Local hardware form factors
Declarative AI on Fabric: This represents a move towards a declarative model, where users state the desired outcome, and the AI agent system determines the steps needed to achieve it within the Fabric ecosystem.
Advanced Cost Management: Granular dashboards to track the token and compute consumption per agent or per transaction, enabling businesses to forecast costs and set budgets for their agent workforce.

How many times will they say Copilot:

Jonathan
Justin: 35
Matt: 40

Honorable Claude:

Claude for Azure AI
Autonomous Agent Platform

23:00 Matt – “

Cloud Tools

26:47 Apptio expands its FinOps tools for cloud cost control – SiliconANGLE

IBM-owned Apptio launches Cloudability Governance with Terraform integration to provide real-time cost estimation and policy compliance at deployment time.
Platform engineers can now see cost impacts before deploying infrastructure through version control systems like GitHub, addressing the problem where 55% of business leaders lack adequate visibility into technology spending ROI.
Kubecost 3.0 adds GPU-specific monitoring capabilities through Nvidia’s Data Center GPU Manager exporter, providing utilization and memory metrics critical for AI workloads.
The container-agnostic platform works across on-premises and cloud Kubernetes environments, with bidirectional integration into Cloudability’s FinOps suite for unified cost visibility.
The platform addresses common tagging blind spots by automatically identifying resource initiators and applying ownership tags when teams forget. It also supports synthetic tags that map to business units, processing trillions of rows of cost data monthly to detect over-provisioning and committed instance discount opportunities.
AI workload acceleration has increased the velocity of cloud spending rather than creating new blind spots, with GPU costs potentially reaching thousands of dollars per hour.
Real-time visibility becomes essential when infrastructure costs can scale this rapidly, making proactive cost governance more important than reactive monitoring.
The Terraform integration positions Apptio to intercept infrastructure deployments before they happen, shifting FinOps from reactive cost analysis to proactive cost prevention.
This represents a meaningful evolution in cloud cost management by embedding financial controls directly into the infrastructure provisioning workflow.

33:03 Matt – “I’ve set these up in my pipelines before… It’s always nice to see, and it’s good if you’re launching net new, but for general PR, it’s just more noise. It kind of needed these tools.”

AWS

28:44 AWS rolls out Fastnet subsea cable connecting the U.S. and Ireland

AWS announces Fastnet, a dedicated transatlantic subsea cable connecting Maryland to County Cork, Ireland, with 320+ terabits per second capacity when operational in 2028.
The system uses unique landing points away from traditional cable corridors to provide route diversity and network resilience for AWS customers running cloud and AI workloads.
The cable features advanced optical switching branching unit technology that allows future topology changes and can redirect data to new landing points as network demands evolve. This architecture specifically targets growing AI traffic loads and integrates directly with AWS services like CloudFront and Global Accelerator for rapid data rerouting.
AWS’s centralized traffic monitoring system provides complete visibility across the global network and implements millions of daily optimizations to route customer traffic along the most performant paths.
This differs from public internet routing, where individual devices make decisions with limited network visibility, helping avoid congestion before it impacts applications.
The infrastructure investment includes Community Benefit Funds for both Maryland’s Eastern Shore and County Cork to support local initiatives, including STEM education, workforce development, and sustainability programs.
AWS worked with local organizations and residents from project inception to align the deployment with community priorities.
With this addition, AWS’s global fiber network now spans over 9 million kilometers of terrestrial and subsea cabling across 38 regions and 120 availability zones. The automated network management tools resolve 96 percent of network events without human intervention through services like Elastic Load Balancing and CloudWatch.

29:24 Matt – “The speed of this is ridiculous. 320 plus terabytes per second – that is a lot of data to go at once!”

30:20 Introducing AWS Capabilities by Region for easier Regional planning and f aster global deployments | AWS News Blog

AWS launched Capabilities by Region, a new planning tool that lets you compare service availability, API operations, CloudFormation resources, and EC2 instance types across multiple AWS Regions simultaneously.
The tool addresses a common customer pain point by providing visibility into which AWS features are available in different Regions and includes forward-looking roadmap information showing planned launch quarters.
The tool helps solve practical deployment challenges like ensuring compliance with data residency requirements, planning disaster recovery architectures, and avoiding costly rework from discovering Regional limitations mid-project. You can filter results to show only common features available across all selected Regions, making it easier to design portable architectures.
Beyond the web interface, AWS made the Regional capability data accessible through the AWS Knowledge MCP Server, enabling automation of Region expansion planning and integration into CI/CD pipelines.
The MCP server is publicly accessible at no cost without requiring an AWS account, though it is subject to rate limits.
The tool provides detailed visibility into infrastructure components, including specific EC2 instance types like Graviton-based and GPU-enabled variants, helping you verify whether specialized compute resources are available in target Regions before committing to an architecture.
This level of granularity extends to CloudFormation resource types and individual API operations for services like DynamoDB and API Gateway.

30:36 Justin – “Thank you. I’ve wanted this for a long time. You put it in a really weird UI choice, but I do appreciate that it’s there.”

32:10 Secure EKS clusters with the new support for Amazon EKS in AWS Backup | AWS News Blog

AWS Backup now supports Amazon EKS clusters, providing centralized backup and restore capabilities for both Kubernetes configurations and persistent data stored in EBS, EFS, and S3. This eliminates the need for custom scripts or third-party tools that previously required complex maintenance across multiple clusters.
The service includes policy-based automation for protecting single or multiple EKS clusters with immutable backups to meet compliance requirements. During restore operations, AWS Backup can now provision a new EKS cluster automatically based on previous configuration settings, removing the requirement to pre-provision target infrastructure.
Restore operations are non-destructive, meaning they apply only the delta between backup and source rather than overwriting existing data or Kubernetes versions. Customers can restore full clusters, individual namespaces to existing clusters, or specific persistent storage resources if partial backup failures occur.
The feature is available in all AWS commercial regions except China and in AWS GovCloud US, where both AWS Backup and Amazon EKS are supported.
Pricing follows standard AWS Backup rates based on backup storage consumed and data transfer, with costs varying by region and storage tier.
Salesforce highlighted the business impact, noting that losing a Kubernetes control plane due to software bugs or accidental deletion can be catastrophic without proper backup capabilities. This native integration addresses a critical resiliency gap for organizations running production EKS workloads at scale.

33:07 Matt – “It’s the namespace level that they can deploy or backup and restore to that, to me, is great. I could see this being a SaaS company that runs their application in Kubernetes, and they have a namespace per customer, and having that ability to have that single customer backed up and be able to restore that is fantastic. So while it sounds like a minor release, if you’re in the Kubernetes ecosystem, it will just make your life better.”

33:53 Jupyter Deploy: Create a JupyterLab application with real-time collaboration in the cloud in minutes | AWS Open Source Blog

Jupyter Deploy is an open source CLI tool from AWS that lets small teams and startups deploy a fully configured JupyterLab environment to the cloud in minutes, solving the problem of expensive enterprise deployment frameworks.
The tool automatically sets up EC2 instances with HTTPS encryption, GitHub OAuth authentication, real-time collaboration features, and a custom domain without requiring manual console configuration.
The CLI uses infrastructure-as-code templates with Terraform to provision AWS resources, making it simple to upgrade instance types for GPU workloads, add storage volumes, or manage team access through a single command. Users can easily scale from a basic t3.medium instance to GPU-accelerated instances when they need more compute power for deep learning tasks.
Real-time collaboration is a key differentiator, allowing multiple team members to work simultaneously in the same JupyterLab environment after authenticating through GitHub, eliminating the security and access limitations of running Jupyter locally on laptops. The tool includes cost management features like the ability to stop instances when not in use while preserving state and file systems.
The project is vendor-neutral and extensible, with AWS planning to add Kubernetes templates for Amazon EKS and welcoming community contributions for other cloud providers, OAuth providers, and deployment patterns.
Templates are distributed as Python libraries that the CLI automatically discovers, making it easy for the community to create and share new deployment configurations.

34:51 Justin – “A lot of people, especially in their AI workloads, they don’t want to use SageMaker for that necessarily; they want their own deployment of a cluster. And so there was just some undifferentiated heavy lifting that was happening, and so I think this helps address some of that.”

GCP

35:09 Agentic AI on Kubernetes and GKE | Google Cloud Blog

Agent Sandbox is a new Kubernetes primitive designed specifically for running AI agents that need to execute code or use computer interfaces, providing kernel-level isolation through gVisor and Kata Containers.
This addresses the security challenge of AI agents making autonomous decisions about tool usage, where traditional application security models fall short.
On GKE, Agent Sandbox delivers sub-second latency for isolated agent workloads through pre-warmed sandbox pools, representing up to 90% improvement over cold starts.
The managed implementation leverages GKE Sandbox and container-optimized compute for horizontal scaling of thousands of ephemeral sandbox environments.
Pod Snapshots is a GKE-exclusive feature in limited preview that enables checkpoint and restore of running pods, reducing startup times from minutes to seconds for both CPU and GPU workloads.
This allows teams to snapshot idle sandboxes and suspend them to save compute costs while maintaining the ability to quickly restore them to a specific state.
The project includes a Python SDK designed for AI engineers to manage sandbox lifecycles without requiring deep infrastructure expertise, while still providing Kubernetes administrators with operational control. Agent Sandbox is available as an open source CNCF project and can be deployed on GKE today, with documentation at agent-sandbox.sigs.k8s.io.
Primary use cases include agentic AI systems that need to execute generated code safely, reinforcement learning environments requiring rapid provisioning of isolated compute, and computer use scenarios where agents interact with terminals or browsers. The isolation model prevents potential data exfiltration or damage to production systems from non-deterministic agent behavior.

36:49 Matt – “Anything that can make these environments, especially if they are ephemeral, scale up and down better so you’re not burning time and capacity on your GPUs – that are not cheap – is definitely useful. So it’d be a nice little money saver along the way.”

37:09 Ironwood TPUs and new Axion-based VMs for your AI workloads | Google Cloud Blog

Google announces Ironwood, its seventh-generation TPU, delivering 10X peak performance improvement over TPU v5p and 4X better performance per chip than TPU v6e for both training and inference workloads.
The system scales up to 9,216 chips in a superpod with 9.6 Tb/s interconnect speeds and 1.77 petabytes of shared HBM, featuring Optical Circuit Switching for automatic failover. Anthropic plans to access up to 1 million TPUs and reports that the performance gains will help scale Claude efficiently.
New Axion-based N4A instances enter preview, offering up to 2X better price-performance than comparable x86 VMs for general-purpose workloads like microservices, databases, and data preparation.
C4A metal, Google’s first Arm-based bare metal instance, will launch in preview soon for specialized workloads requiring dedicated physical servers. Early customers report 30% performance improvements for video transcoding at Vimeo and 60% better price-performance for data processing at ZoomInfo.
Google positions Ironwood and Axion as complementary solutions for the age of inference, where agentic workflows require coordination between ML acceleration and general-purpose compute.
The AI Hypercomputer platform integrates both with enhanced software, including GKE Cluster Director for TPU fleet management, MaxText improvements for training optimization, and vLLM support for switching between GPUs and TPUs. According to IDC, AI Hypercomputer customers achieved 353% three-year ROI and 28% lower IT costs on average.
The announcement emphasizes system-level co-design across hardware, networking, and software, building on Google’s custom silicon history, including TPUs that enabled the Transformer architecture eight years ago. Ironwood uses advanced liquid cooling deployed at a gigawatt scale with 99.999% fleet-wide uptime since 2020, while the Jupiter data center network connects multiple superpods into clusters of hundreds of thousands of TPUs.
Customers can sign up for Ironwood, N4A, and C4A metal preview access through Google Cloud forms.

38:57 Automate financial governance policies using Workload Manager | Google Cloud Blog

Google has enhanced Workload Manager to automate FinOps cost governance policies across GCP organizations, allowing teams to codify financial rules using Open Policy Agent Rego and run continuous compliance scans.
The tool includes predefined rules for common cost management scenarios like enforcing resource labels, lifecycle policies on Cloud Storage buckets, and data retention settings, with results exportable to BigQuery for analysis and visualization in Looker Studio.
The pricing update is significant, with Google reducing Workload Manager costs by up to 95 percent for certain scenarios and introducing a small free tier for testing.
- This makes large-scale automated policy scanning more economical compared to manual auditing processes that can take weeks or months while costs accumulate.
The automation addresses configuration drift where systems deviate from established cost policies, enabling teams to define rules once and scan entire organizations, specific folders, or individual projects on schedules ranging from hourly to monthly. Integration with notification channels, including email, Slack, and PagerDuty, ensures policy violations reach the appropriate teams for remediation.
Organizations can use custom rules from the GitHub repository or leverage hundreds of Google-authored best practice rules covering FinOps, security, reliability, and operations.
The BigQuery export capability provides historical compliance tracking and supports showback reporting for cost allocation across teams and business units.

40:06 Matt – “Having that very quick, rapid response to know that something changed and you need to go look at it before you get a 10 million dollar bill is critical.”

Azure

41:50 Generally Available: Azure MCP Server

Azure MCP Server provides a standardized way for AI agents and developers to interact with Azure services through the Model Context Protocol.
This creates a consistent interface layer across services like AKS, Azure Container Apps, App Service, Cosmos DB, SQL Database, and AI Foundry, reducing the need to learn individual service APIs.
The MCP implementation allows developers to build AI agents that can programmatically manage and query Azure resources using natural language or structured commands.
- This bridges the gap between conversational AI interfaces and cloud infrastructure management, enabling scenarios like automated resource provisioning or intelligent troubleshooting assistants.
The server architecture provides secure, authenticated access to Azure services while maintaining standard Azure RBAC controls.
- This means AI agents operate within existing security boundaries and permissions frameworks rather than requiring separate authentication mechanisms.
Primary use cases include DevOps automation, intelligent cloud management tools, and AI-powered development assistants that need direct Azure service integration. Organizations building copilots or agent-based workflows can now connect to Azure infrastructure without custom API integration work for each service.
The feature is generally available across Azure regions where the underlying services operate. Pricing follows standard Azure service consumption models for the resources accessed through MCP, with no additional charge for the MCP Server interface itself.

42:50 Matt – “So I like the idea of this, and I like it for troubleshooting and stuff like this, but the idea of using it to provision resources terrifies me. Maybe in development environments, ‘Hey, I’m setting up a three-tier web application, spin me up what I need.’ But if you’re doing this for a company, I really worry about speaking in natural language, and consistently getting the same result to spin up resources.”

45:50 A new era and new features in Azure Ultra Disk

Azure Ultra Disk receives substantial performance and cost optimization updates focused on mission-critical workloads.
- The service now delivers an 80% reduction in P99.9 and outlier latency, plus a 30% improvement in average latency, making it suitable for transaction logs and I/O-intensive applications that previously required local SSDs or Write Accelerator.
New flexible provisioning model enables significant cost savings with workloads on small disks, saving up to 50% and large disks up to 25%.
Customers can now independently adjust capacity, IOPS, and throughput with more granular control, allowing a financial database example to reduce Ultra Disk spending by 22% while maintaining required performance levels.
Instant Access Snapshot feature enters public preview for Ultra Disk and Premium SSD v2, eliminating traditional wait times for snapshot readiness. New disks created from these snapshots hydrate up to 10x faster with minimal read latency impact during hydration, enabling rapid recovery and replication for business continuity scenarios.
Ultra Disk now supports Azure Boost VMs, including Ebdsv5 series (GA with up to 400,000 IOPS and 10GB/s) and Memory Optimized Mbv3 VM Standard_M416bs_v3 (GA with up to 550,000 IOPS and 10GB/s).
Additional Azure Boost VM announcements are planned for 2025 Ignite with further performance improvements for remote block storage.
Recent feature additions include live resize capability, encryption at host support, Azure Site Recovery and VM Backup integration, and shared disk capability for SCSI Persistent Reservations.
Third-party backup and disaster recovery services now support Ultra Disk for customers with existing tooling preferences.

47:38 Matt – “There wasn’t any encryption at the host level? Clearly I make bad life choices being in Azure, but not THAT bad of choices.”

48:21 Announcing General Availability of Larger Container Sizes on Azure Container Instances | Microsoft Community Hub

Azure Container Instances now supports container sizes up to 31 vCPUs and 240 GB of memory for standard containers, expanding from the previous 4 vCPUs and 16 GB limits.
This applies across standard containers, confidential containers, virtual network-enabled containers, and AKS virtual nodes, though confidential containers max out at 180 GB memory.
The larger sizes target data-intensive workloads like real-time fraud detection, predictive maintenance, collaborative analytics in healthcare, and high-performance computing tasks such as climate modeling and genomic research. Organizations can now run fewer, larger containers instead of managing multiple smaller instances, simplifying scaling operations.
Customers must request quota approval through Azure Support before deploying containers exceeding 4 vCPUs and 16 GB, then can deploy via Azure Portal, CLI, PowerShell, ARM templates, or Bicep. The serverless nature maintains ACI’s pay-per-use pricing model, though specific costs for larger SKUs are not detailed in the announcement.
This positions ACI as a more viable alternative to managed Kubernetes for workloads that need substantial compute resources but don’t require full orchestration complexity. The enhancement particularly benefits scenarios where confidential computing is required, as those containers can now scale to 31 vCPUs with 180 GB memory while maintaining security boundaries.

49:40 Generally Available: Geo/Object Priority Replication for Azure Blob

Geo Priority Replication is now generally available for Azure Blob Storage, providing accelerated data replication between primary and secondary regions for GRS and GZRS storage accounts with an SLA-backed guarantee. This addresses a longstanding customer request for predictable replication timing in geo-redundant storage scenarios.
The feature specifically targets customers with compliance requirements or business continuity needs that demand faster recovery point objectives (RPO) for their geo-replicated data. Organizations in regulated industries like finance and healthcare can now better meet data availability requirements with measurable replication performance.
This enhancement works within the existing GRS and GZRS storage account types, meaning customers can enable it on current deployments without migrating to new account types. The SLA backing represents a shift from best-effort replication to guaranteed performance metrics for secondary region data synchronization.
The announcement appears truncated with incomplete SLA details, but the core value proposition centers on reducing the uncertainty around when data becomes available in secondary regions during normal operations. This matters for disaster recovery planning, where organizations need to calculate realistic RPO values rather than relying on variable replication times.
Pricing details were not included in the announcement, though this feature likely carries additional costs beyond standard GRS or GZRS storage rates, given the performance guarantees involved. Customers should review Azure pricing documentation for specific cost implications before enabling geo priority replication.

Closing

329: Azure Front Door: Please Use the Side Entrance

Wed, 12 Nov 2025 07:35:45 +0000

Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy! Justin, Jonathan, and special guest Elise are in the studio to bring you all the latest in AI and cloud news, including – you guessed it – more outages, and more OpenAI team-ups. We’ve also got GPUs, K8 news, and Cursor updates. Let’s get started!

Titles we almost went with this week

Azure Front Door: Please Use the Side Entrance – el -jb
Azure and NVIDIA: A Match Made in GPU Heaven – mk
Azure Goes Down Under the Weight of Its Own Configuration – el
GitHub Turns Your Copilot Subscription Into an All-You-Can-Eat Agent Buffet – mk, el
Microsoft Goes Full Blackwell: No Regrets, Just GPUs
Jules Verne Would Be Proud: Google’s CLI Goes 20,000 Bugs Under the Codebase
RAG to Riches: AWS Makes Retrieval Augmented Generation Turnkey
Kubectl Gets a Gemini Twin: Google Teaches AI to Speak Kubernetes
I’m Not a Robot: Azure WAF Finally Learns to Ask the Important Questions
OpenAI Puts 38 Billion Eggs in Amazon’s Basket: Multi-Cloud Gets Complicated
The Root Cause They’ll Never Root Out: Why Attrition Stays Off the RCA
Google’s New Extension Lets You Deploy Kubernetes by Just Asking Nicely
Cursor 2.0: Now With More Agents Than a Hollywood Talent Agency

Follow Up

04:46 Massive Azure outage is over, but problems linger – here’s what happened | ZDNET

Azure experienced a global outage on October 29, affecting all regions simultaneously, unlike the recent AWS outage that was limited to a single region.
The incident lasted approximately eight hours from noon to 8 PM ET, impacting major services including Microsoft 365, Teams, Xbox Live, and critical infrastructure for Alaska Airlines, Vodafone UK, and Heathrow Airport, among others.
The root cause was an inadvertent tenant configuration change in Azure Front Door that bypassed safety validations due to a software defect. Microsoft’s protection mechanisms failed to catch the erroneous deployment, allowing invalid configurations to propagate across the global fleet and cause HTTP timeouts, server errors, and elevated packet loss at network edges.
Recovery required rolling back to the last known good configuration and gradually rebalancing traffic across nodes to prevent overload conditions.
Some customers experienced lingering issues even after the official recovery time, with Microsoft temporarily blocking configuration changes to Azure Front Door while completing the restoration process.
The incident highlights concentration risk in cloud infrastructure, as this marks the second major cloud provider outage in October 2025.
Despite Azure revenue growing 40 percent in the latest quarterly report, Microsoft’s stock declined in after-hours trading as the company acknowledged capacity constraints in meeting AI and cloud demands.
Affected Azure services included App Service, Azure SQL Database, Microsoft Entra ID, Container Registry, Azure Databricks, and approximately 15 other core platform services. Microsoft has implemented additional validation and rollback controls to prevent similar configuration deployment failures, though the full post-incident report remains pending.

07:06 Matt – “The fact that you’re plus one week and still can’t actually make changes or even do simple things like purge a cache makes me think this is a lot bigger on the backend than they let on at the beginning.”

AI Is Going Great – Or How ML Makes Money

08:30 AWS and OpenAI announce multi-year strategic partnership | OpenAI

AWS and OpenAI formalized a 38 billion dollar multi-year partnership providing OpenAI immediate access to hundreds of thousands of NVIDIA GPUs (GB200s and GB300s) clustered via Amazon EC2 UltraServers, with capacity deployment targeted by the end of 2026.
The infrastructure supports both ChatGPT inference serving and next-generation model training with the ability to scale to tens of millions of CPUs for agentic workloads.
The partnership builds on existing integration where OpenAI’s open weight foundation models became available on Amazon Bedrock earlier this year, making OpenAI one of the most popular model providers on the platform. Thousands of customers, including Thomson Reuters, Peloton, and Verana Health, are already using these models for agentic workflows, coding, and scientific analysis.
AWS positions this as validation of their large-scale AI infrastructure capabilities, noting they have experience running clusters exceeding 500,000 chips with the security, reliability, and scale required for frontier model development.
The low-latency network architecture of EC2 UltraServers enables optimal performance for interconnected GPU systems.
This represents a significant shift in OpenAI’s infrastructure strategy, moving substantial compute workloads to AWS while maintaining its existing Microsoft Azure relationship.
The seven-year commitment timeline with continued growth provisions indicates long-term capacity planning for increasingly compute-intensive AI model development.

09:53 Elise – “It sort of feels like OpenAI has a strategic partnership with everyone right now, so I’m sure this will help them, just like everything else that they have done will help them. We’re banking a lot on OpenAI being very successful.”

17:11 Google removes Gemma models from AI Studio after GOP senators complaint – Ars Technica

Google removed its open Gemma AI models from AI Studio following a complaint from Senator Marsha Blackburn, who reported the model hallucinated false sexual misconduct allegations against her when prompted with leading questions.
The model allegedly fabricated detailed false claims and generated fake news article links, demonstrating the persistent hallucination problem across generative AI systems.
The removal only affects non-developer access through AI Studio’s user interface, where model behavior tweaking tools could increase hallucination likelihood.
Developers can still access Gemma through the API and download models for local development, suggesting Google is limiting casual experimentation rather than pulling the model entirely.
This incident highlights the ongoing challenge of AI hallucinations in production systems, which no AI firm has successfully eliminated despite mitigation efforts.
Google’s response indicates a shift toward restricting open model access when inflammatory outputs could result from user prompting, potentially setting a precedent for how cloud providers handle politically sensitive AI failures.
The timing follows congressional hearings where Google defended its hallucination mitigation practices, with the company’s representative acknowledging these issues are widespread across the industry.
This creates a tension between open model availability and liability concerns when models generate defamatory content, particularly affecting cloud-based AI platforms.

23:00 Matt – “That’s everything on the internet, though. When Wikipedia first came out and you started using it, we were told you can’t reference Wikipedia, because who knows what was put on there…you can’t blindly trust.”

Cloud Tools

26:53 Introducing Agent HQ: Any agent, any way you work – The GitHub Blog

GitHub launches Agent HQ as a unified platform to orchestrate multiple AI coding agents from Anthropic, OpenAI, Google, Cognition, and xAI directly within GitHub and VS Code, all included in paid Copilot subscriptions.
This eliminates the fragmented experience of juggling different AI tools across separate interfaces and subscriptions.
Mission Control provides a single command center across GitHub, VS Code, mobile, and CLI to assign work to different agents in parallel, track their progress, and manage agent identities and permissions just like human team members.
The system maintains familiar Git primitives like pull requests and issues while adding granular controls over when CI runs on agent-generated code.
VS Code gets Plan Mode for building step-by-step task approaches with clarifying questions before code generation, plus AGENTS.md files for creating custom agents with specific rules like preferred logging frameworks or testing patterns.
It’s the only editor supporting the full Model Context Protocol specification with one-click access to the GitHub MCP Registry for integrating tools like Stripe, Figma, and Sentry.
GitHub Code Quality in public preview now provides org-wide visibility into code maintainability and reliability, with Copilot automatically reviewing its own generated code before developers see it to catch technical debt early.
Enterprise admins get a new control plane for governing AI access, setting security policies, and viewing Copilot usage metrics across the organization.
The platform keeps developers on GitHub’s existing compute infrastructure, whether using GitHub Actions or self-hosted runners, avoiding vendor lock-in while OpenAI Codex becomes available this week in VS Code Insiders for Copilot Pro+ users as the first partner agent.

27:20 Jonathan- “I’m like the different interfaces; they all bring something a little different.”

31:55 Cursor introduces its coding model alongside multi-agent interface – Ars :Technica

Cursor launches version 2.0 of its IDE with Composer, its first competitive in-house coding model built using reinforcement learning and mixture-of-experts architecture.
The company claims Composer is 4x faster than similarly intelligent models while maintaining competitive intelligence levels with frontier models from OpenAI, Google, and Anthropic.
The new multi-agent interface in Cursor 2.0 allows developers to run multiple AI agents in parallel for coding tasks, expanding beyond the single-agent workflow that has been standard in AI-assisted development environments.
- This represents a shift toward more complex, distributed AI assistance within the IDE.
Cursor’s internal benchmarking shows Composer prioritizes speed over raw intelligence, outperforming competitors significantly in tokens per second while slightly underperforming the best frontier models in intelligence metrics.
- This positions it as a practical option for developers who need faster code generation and iteration cycles.
The IDE maintains its Visual Studio Code foundation while deepening LLM integration for what Cursor calls vibe coding, where AI assistance is more directly embedded in the development workflow.
Previously, Cursor relied entirely on third-party models, making this its first attempt at vertical integration in the AI coding assistant space.

33:03 Elise- “Cursor had an agent built, and I thought it was ok, but it was wrong a lot. The 2.0 agent seems fabulous, comparatively, and a lot faster.”

AWS

43:25 The Model Context Protocol (MCP) Proxy for AWS is now generally available

AWS has released the Model Context Protocol (MCP) Proxy for AWS, a client-side proxy that enables MCP clients to connect to remote AWS-hosted MCP servers using AWS SigV4 authentication.
The proxy works with popular AI development tools like Amazon Q Developer CLI, Cursor, and Kiro, allowing developers to integrate AWS service interactions directly into their agentic AI workflows.
The proxy enables developers to access AWS resources like S3 buckets and RDS tables through MCP servers while maintaining AWS security standards through SigV4 authentication.
It includes built-in safety controls such as read-only mode to prevent accidental changes, configurable retry logic for reliability, and logging capabilities for troubleshooting issues.
The MCP Proxy bridges the gap between local AI development tools and AWS-hosted MCP servers, particularly those built on Amazon Bedrock AgentCore Gateway or Runtime.
This allows AI agents and developers to extend their workflows to include AWS service interactions without manually handling authentication and protocol communications.
Installation options are flexible, supporting deployment from source, Python package managers, or containers, making it straightforward to integrate with existing MCP-supported development environments.
The proxy is open-source and available now through the AWS GitHub repository at https://github.com/aws/mcp-proxy-for-aws with no additional cost beyond standard AWS service usage.

44:10 Matt – “This is a nice little tool to help with production…and easier stepping stone than having to build all this stuff yourself.”

47:07 Amazon ECS now supports built-in Linear and Canary deployments

Amazon ECS now includes native linear and canary deployment strategies alongside existing blue/green deployments, eliminating the need for external tools like AWS CodeDeploy for gradual traffic shifting.
Linear deployments shift traffic in equal percentage increments with configurable step sizes and bake times, while canary deployments route a small percentage to the new version before completing the shift.
The feature integrates with CloudWatch alarms for automatic rollback detection and supports deployment lifecycle hooks for custom validation steps.
Both strategies include a post-deployment bake time that keeps the old revision running after full traffic shift, enabling quick rollback without downtime if issues emerge.
Available now in all commercial AWS regions where ECS operates, the deployment strategies work with Application Load Balancer and ECS Service Connect configurations.
Customers can implement these strategies through Console, SDK, CLI, CloudFormation, CDK, and Terraform for both new and existing ECS services without additional cost beyond standard ECS pricing.
This brings ECS deployment capabilities closer to parity with Kubernetes native deployment options and reduces dependency on CodeDeploy for teams running containerized workloads.
The built-in approach simplifies deployment pipelines for organizations that previously needed separate deployment orchestration tools.

48:45 Jonathan – “I always wonder why they haven’t built these things previously, and I guess it was possible through CodeDeploy, but if it was possible through CodeDeploy, then why add it to ECS now? I feel like we kind of get this weird sprawl.”

50:35 Amazon Route 53 Resolver now supports AWS PrivateLink

Route 53 Resolver now supports AWS PrivateLink, allowing customers to manage DNS resolution features entirely over Amazon’s private network without traversing the public internet.
This includes all Resolver capabilities like endpoints, DNS Firewall, Query Logging, and Outposts integration.
The integration addresses security and compliance requirements for organizations that need to keep all AWS API calls within private networks. Operations like creating, deleting, and editing Resolver configurations can now be performed through VPC endpoints instead of public endpoints.
Available immediately in all regions where Route 53 Resolver operates, including AWS GovCloud (US) regions.
No additional feature announcements for pricing were mentioned, so standard Route 53 Resolver pricing applies, plus PrivateLink endpoint costs (typically $0.01 per hour per AZ plus data processing charges).
Primary use case targets enterprises with strict network isolation policies, particularly in regulated industries like finance and healthcare, where DNS management traffic must remain on private networks.
This complements existing hybrid DNS architectures using Resolver endpoints for on-premises connectivity.

51:04 Jonathan – “Good for anyone who wanted this!”

54:05 Mountpoint for Amazon S3 and Mountpoint for Amazon S3 CSI driver add monitoring capability

Mountpoint for Amazon S3 now emits near real-time metrics using the OpenTelemetry Protocol, allowing customers to monitor operations through CloudWatch, Prometheus, and Grafana instead of parsing log files manually.
This addresses a significant operational gap for teams running data-intensive workloads that mount S3 buckets as file systems on EC2 instances or Kubernetes clusters.
The new monitoring capability provides granular metrics, including request counts, latency, and error types at the EC2 instance level, enabling proactive troubleshooting of issues like permission errors or performance bottlenecks. Customers can now set up alerts and dashboards using standard observability tools rather than building custom log parsing solutions.
Integration works through CloudWatch agent or OpenTelemetry collector, making it compatible with existing monitoring infrastructure that many organizations already have deployed. The feature is available immediately for both the standalone Mountpoint client and the Mountpoint for Amazon S3 CSI driver used in Kubernetes environments.
This update is particularly relevant for machine learning workloads, data analytics pipelines, and containerized applications that treat S3 as a file system and need visibility into storage layer performance. Setup instructions are available in the Mountpoint GitHub repository with configuration examples for common observability platforms.

GCP

58:31 New Log Analytics query builder simplifies writing SQL code | Google Cloud Blog

Google Cloud has released the Log Analytics query builder to general availability, providing a UI-based interface that generates SQL queries automatically for users who need to analyze logs without deep SQL expertise.
The tool addresses the common challenge of extracting insights from nested JSON payloads in log data, which typically requires complex SQL functions like JSON_VALUE and JSON_EXTRACT that many DevOps engineers and SREs find time-consuming to write.
The query builder includes intelligent schema discovery that automatically detects and suggests JSON fields and values from your datasets, along with a real-time SQL preview so users can see the generated code and switch to manual editing when needed.
- Key capabilities include search across all fields, automatic aggregations and grouping, and one-click visualization to dashboards, making it practical for incident troubleshooting and root cause analysis workflows.
Google plans to expand the feature with cross-project log scopes, trace data integration for joining logs and traces, query saving and history, and natural language to SQL conversion using Gemini AI.
The query builder works with existing Log Analytics pricing, which is based on the amount of data scanned during queries, similar to BigQuery’s on-demand pricing model.
The tool integrates directly with Google Cloud’s observability stack, allowing users to query logs alongside BigQuery datasets and other telemetry types in a single interface.
This consolidation reduces context switching for teams managing complex distributed systems across multiple GCP services and projects.

1:00:01 Jonathan- “I think this is where everything is going. Why spend half an hour crafting a perfect SQL query…when you can have it figure it all out for you.”

1:01:12 GKE and Gemini CLI work better together | Google Cloud Blog

Google has open-sourced a GKE extension for Gemini CLI that integrates Kubernetes Engine operations directly into the command-line AI agent. The extension works as both a Gemini CLI extension and a Model Context Protocol server compatible with any MCP client, allowing developers to manage GKE clusters using natural language commands instead of verbose kubectl syntax.
The integration provides three main capabilities: GKE-specific context resources for more natural prompting, pre-built slash command prompts for complex workflows, and direct access to GKE tools, including Cloud Observability integration. Installation requires a single command for Gemini CLI users, with separate instructions available for other MCP clients.
The primary use case targets ML engineers deploying inference models on GKE who need help selecting appropriate models and accelerators based on business requirements like latency targets.
Gemini CLI can automatically discover compatible models, recommend accelerators, and generate deployable Kubernetes manifests through conversational interaction rather than manual configuration.
This builds on Gemini CLI’s extension architecture that bundles MCP servers, context files, and custom commands into packages that teach the AI agent how to use specific tools.
The GKE extension represents Google’s effort to make Kubernetes operations more accessible through AI assistance, particularly for teams managing AI workload deployments.
The announcement includes no pricing details as both Gemini CLI and the GKE extension are open source projects, though standard GKE cluster costs and any Gemini API usage charges would still apply during operation.

1:02:10 Matt – “Anything to make Kubernetes easier to manage, I’m on board for it.”

1:05:06 Master multi-tasking with the Jules extension for Gemini CLI | Google Cloud Blog

Google has launched the Jules extension for Gemini CLI, which acts as an autonomous coding assistant that handles background tasks like bug fixes, security patches, and dependency updates while developers focus on primary work.
Jules operates asynchronously using the /jules command, working in isolated environments to address multiple issues in parallel and creating branches for review.
The extension integrates with other Gemini CLI extensions to create automated workflows, including the Security extension for vulnerability analysis and remediation, and the Observability extension for crash investigation and automated unit test generation.
- This modular approach allows developers to chain together different capabilities for comprehensive task automation.
Jules addresses common developer productivity drains by handling routine maintenance tasks that typically interrupt deep work sessions. The tool can process multiple GitHub issues simultaneously, each in its own environment, and prepares fixes for human review rather than automatically committing changes.
The extension is available now as an open source project on GitHub at github.com/gemini-cli-extensions/jules, with no pricing information provided, as it appears to be a free developer tool.
Google is building an ecosystem of Gemini CLI extensions that can be combined with Jules for various development workflows.

1:06:16 Jonathan – “Google obviously listens to their customers because it was only half an hour ago when I said something like this would be pretty useful.”

1:11:36 Announcing GA of Cost Anomaly Detection | Google Cloud Blog

Google’s Cost Anomaly Detection has reached general availability with AI-powered alerts now enabled by default for all GCP customers across all projects, including new ones.
The service automatically monitors spending patterns and sends alerts to Billing Administrators when unusual cost spikes are detected, with no configuration required.
The GA release introduces AI-generated anomaly thresholds that adapt to each customer’s historical spending patterns, reducing alert noise by flagging only significant, unexpected deviations.
Customers can override these intelligent baselines with custom values if needed, and the system now supports both absolute-dollar thresholds and percentage-based deviation filters to accommodate projects of different sizes and sensitivities.
The improved algorithm solves the cold start problem that previously required six months of spending history, now providing immediate anomaly protection for brand new accounts and projects from day one.
- This addresses a key limitation from the public preview phase and ensures comprehensive cost monitoring regardless of account age.
Cost Anomaly Detection remains free as part of GCP’s cost management toolkit and integrates with Cloud Budgets to create a layered approach for preventing, detecting, and containing runaway cloud spending.
The anomaly dashboard provides root cause analysis to help teams quickly understand and address cost spikes when they occur.
Interested in pricing details? Check out the billing console here.

1:14:01 Elise – “I just wonder, there’s so many third-party companies that specialize in this kind of thing. So I wonder if they realized that they could just do a little bit better.”

Azure

1:16:37 Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC | Microsoft Azure Blog

Microsoft and NVIDIA are expanding their AI partnership with several infrastructure and model updates.
Azure Local now supports NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, enabling organizations to run AI workloads at the edge with cloud-like management through Azure Arc, targeting healthcare, retail, manufacturing, and government sectors requiring data residency and low-latency processing.
Azure AI Foundry adds NVIDIA Nemotron models for agentic AI and enterprise reasoning, plus NVIDIA Cosmos models for physical AI applications like robotics and autonomous vehicles.
Microsoft also introduced TRELLIS for 3D asset generation, all deployable as NVIDIA NIM microservices with enterprise-grade security and scalability.
Microsoft deployed the first production-scale cluster of NVIDIA GB300 NVL72 systems with over 4,600 Blackwell Ultra GPUs in the new NDv6 GB300 VM series.
Each rack delivers 130 TB/s of NVLink bandwidth and up to 136 kW of compute power, designed for training and deploying frontier models with integrated liquid cooling and Azure Boost for accelerated I/O.
Also, NVIDIA Run:ai is now available on Azure Marketplace, providing GPU orchestration and workload management across Azure NC and ND series instances. The platform integrates with AKS, Azure Machine Learning, and Azure AI Foundry to help enterprises dynamically allocate GPU resources, reduce costs, and improve utilization across teams.
Azure Kubernetes Service now supports NVIDIA Dynamo framework on ND GB200-v6 VMs, demonstrating 1.2 million tokens per second with the gpt-oss 120b model.
Microsoft reports up to 15x throughput improvement over Hopper generation for reasoning models, with deployment guides available for production implementations.

1:21:53 Jonathan – “That’s a really good salesy number to quote, though, 1.2 million tokens a second – that’s great, but that’s not an individual user. One individual user will not get 1.2 million tokens a second out of any model. That is, at full capacity with as many users running inference as possible on that cluster. The total generation output might be 1.2 million tokens a second, which is still phenomenal, but as far as the actual user experience, you know, if you were a business that wanted really fast inference, you’re not going to get 1.2 million tokens a second.”

1:23:26 Public Preview: Azure Functions zero-downtime deployments with rolling Updates in Flex Consumption

Azure Functions in the Flex Consumption plan now supports rolling updates for zero-downtime deployments through a simple configuration change.
This eliminates the need for forceful instance restarts during code or configuration updates, allowing the platform to gracefully transition workloads across instances.
Rolling updates work by gradually replacing old instances with new ones while maintaining active request handling, similar to deployment strategies used in container orchestration platforms.
This brings enterprise-grade deployment capabilities to serverless functions without requiring additional infrastructure management.
The capability is currently in public preview for the Flex Consumption plan specifically, which is Azure’s newer consumption-based pricing model that offers more flexibility than the traditional Consumption plan.
Pricing follows the standard Flex Consumption model based on execution time and memory usage, with no additional cost for the rolling update feature itself.

1:24:42 Matt – “It’s a nice quality of life feature that they’re adding to everything. It’s in preview, though, so don’t deploy production workloads leveraging this.”

1:25:06 The Azure PAYG API Shift: What’s Actually Changing (and Why It Matters)

Microsoft is deprecating the legacy Consumption API for Azure Pay-As-You-Go cost data retrieval and replacing it with two modern approaches: the Cost Details API for Enterprise and Microsoft Customer Agreement subscriptions, and the Exports API for PAYG and Visual Studio subscriptions.
This shifts from a pull model, where teams constantly query APIs, to a subscribe model where Azure delivers cost data directly to Azure Storage Accounts as CSV files.
The change addresses significant scalability and consistency issues with the old API that struggled with throttling, inconsistent schemas across different subscription types, and handling large enterprise-scale datasets.
The new APIs support FOCUS-compliant schemas, include reservations and savings plans data in single exports, and integrate better with Power BI and Azure Data Factory for FinOps automation.
FinOps teams need to audit existing scripts that call the Microsoft.Commerce/UsageAggregates endpoint and migrate to storage-based data ingestion instead of direct API calls.
While the legacy endpoint remains live but unsupported, Microsoft strongly recommends immediate migration, though the deprecation timeline may extend based on customer adoption rates.
The practical impact for cloud teams is more reliable cost data pipelines with fewer failed jobs, predictable scheduled exports eliminating API throttling issues, and consistent field mappings across all subscription types.
Teams should review Microsoft’s field mapping reference documentation, as column names have changed between the old and new APIs.
PAYG customers currently must use the Exports API with storage-based retrieval, though Microsoft plans to eventually extend Cost Details API support to PAYG subscriptions.
The transition requires updating data flow architecture but provides an opportunity to standardize FinOps processes across different Azure billing models.

1:27:12 Matt – “A year or two ago, we did an analysis at my day job, and we were trying to figure out the savings plan’s amount if we buy X amount, how much do we need to buy everything along those lines. And we definitely ran into like throttling issues, and it was just bombing out on us at a few points, and a lot of weird loops we had to do because the format just didn’t make sense with moderate stuff. It’s a great way. I would suggest you move not because they’re trying to get rid of it, but because it will make your life better.”

1:28:05 Generally Available: Azure WAF CAPTCHA Challenge for Azure Front Door

Azure WAF now includes CAPTCHA challenge capabilities for Front Door deployments, allowing organizations to distinguish between legitimate users and automated bot traffic.
This addresses common threats like credential stuffing, web scraping, and DDoS attacks that traditional WAF rules may miss.
The CAPTCHA feature integrates directly into Azure Front Door‘s WAF policy engine, enabling administrators to trigger challenges based on custom rules, rate limits, or anomaly detection patterns.
Organizations can configure CAPTCHA thresholds and exemptions without requiring changes to backend application code.
This capability targets e-commerce sites, financial services, and any web application experiencing bot-driven abuse or account takeover attempts.
The CAPTCHA challenge adds a human verification layer that complements existing WAF protections like OWASP rule sets and custom security policies.
Pricing follows the standard Azure Front Door WAF model with per-policy charges plus request-based fees, though specific CAPTCHA-related costs were not detailed in the announcement.
Organizations already using Front Door Premium can enable this feature through policy configuration updates.
The general availability means this protection is now production-ready across all Azure regions where Front Door operates, removing the need for third-party CAPTCHA services or custom bot mitigation solutions for many Azure customers.
We just wonder what we’re going to replace re: Captcha with when AI can click the button like a human can.

1:31:04 Public Preview: Instant Access Snapshots for Azure Premium SSD v2 and Ultra Disk Storage

Azure now offers Instant Access Snapshots in public preview for Premium SSD v2 and Ultra Disks, eliminating the traditional wait time for snapshot restoration. Previously, customers had to wait for snapshots to fully hydrate before using restored disks, but this feature allows immediate disk restoration with high performance right after snapshot creation.
This capability addresses a critical operational need for enterprises running high-performance workloads on Azure’s fastest storage tiers.
Premium SSD v2 and Ultra Disks are typically used for mission-critical databases, SAP HANA, and other latency-sensitive applications where downtime during recovery operations directly impacts business operations.
The feature reduces recovery time objectives for disaster recovery and backup scenarios, particularly valuable for customers who need rapid failover capabilities. Organizations can now create point-in-time copies and immediately spin up test environments or recover from failures without the performance penalty of background hydration processes.
This positions Azure’s premium storage offerings more competitively against AWS’s EBS snapshots with fast snapshot restore and Google Cloud’s instant snapshots.
The preview status means customers should test thoroughly before production use, and Microsoft has not yet announced general availability timing or any pricing changes specific to this snapshot capability.

Closing

328: Shhh… It’s a Secret Region!

Wed, 05 Nov 2025 23:48:49 +0000

Welcome to episode 328 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are on board today to bring you all the latest news in cloud and AI, including secret regions (this one has the aliens), ongoing discussions between Microsoft and OpenAI, and updates to Nova, SQL, and OneLake -and even the latest installment of Cloud Journeys. Let’s get started!

Titles we almost went with this week

CloudWatch’s New Feature: Because Nobody Likes Writing Incident Reports at 3 AM
DNS: Did Not Survive – The Great US-EAST-1 Outage of 2025
404 DevOps Not Found: The AWS Automation Adventure mk
When Your DevOps Team Gets Replaced by AI and Then Everything Crashes
Database Migrations Get the ChatGPT Treatment: Just Vibe Your Schema Changes
AWS DevOps Team Gets the AI Treatment: 40% Fewer Humans, 100% More Questions
Breaking Up is Hard to Compute: Microsoft and OpenAI Redefine Their Relationship
AWS Goes Full Scope: Now Tracking Your Cloud’s Carbon from Cradle to Gate
Platform Engineering: When Your Golden Path Leads to a Dead End
DynamoDB’s DNS Disaster: How a Race Condition Raced Through AWS
AI Takes Over AWS DevOps Jobs, Servers Take Unscheduled Vacation
PostgreSQL Scaling Gets a 30-Second Makeover While AWS Takes a Coffee Break
The Domino Effect: When DynamoDB Drops, Everything Drops
RAG to Riches: Amazon Nova Learns to Cite Its Sources
AWS Finally Tells You When Your EC2 Instance Can’t Keep Up With Your Storage Ambitions
AWS Nova Gets Grounded: No More Hallucinating About Reality
One API to Rule Them All: OneLake’s Storage Compatibility Play
OpenAI gets to pay Alimony
Database schema deployments are totally a vibe
AWS will tell you how not green you are today, now in 3 scopes

General News

02:00 DDoS in September | Fastly

Fastly‘s September DDoS report reveals a notable 15.5 million requests per second attack that lasted over an hour, demonstrating how modern application-layer attacks can sustain extreme throughput with real HTTP requests rather than simple pings or amplification techniques.
Attack volume in September dropped to 61% of August levels, with data suggesting a correlation between school schedules and attack frequency: lower volumes coincide with school breaks, while higher volumes occur when schools are in session.
Media & Entertainment companies faced the highest median attack sizes, followed by Education and High Technology sectors, with 71% of September’s peak attack day attributed to a single enterprise media company.
The sustained 15 million RPS attack originated from a single cloud-provider ASN, using sophisticated daemons that mimicked browser behavior, making detection more challenging than typical DDoS patterns.
Organizations should evaluate whether their incident response runbooks can handle hour-long attacks at 15+ million RPS, as these sustained high-throughput attacks require automated mitigation rather than manual intervention.
Listen, we’re not inviting a DDoS attack, but also…we’ll just turn off the website, so there’s that.

AI Is Going Great – Or How ML Makes Money

04:41 Google AI Studio updates: More control, less friction

Google AI Studio introduces “vibe coding” – a new AI-powered development experience that generates working multi-modal apps from natural language prompts without requiring API key management or manual service integration.
The platform now automatically connects appropriate models and APIs based on app descriptions, supporting capabilities like Veo for video generation, Nano Banana for image editing, and Google Search for source verification.
New Annotation Mode enables visual app modifications by highlighting UI elements and describing changes in plain language rather than editing code directly
The updated App Gallery provides visual examples of Gemini-powered applications with instant preview, starter code access, and remix capabilities for rapid prototyping
Users can add personal API keys to continue development when free-tier quotas are exhausted, with automatic switching back to the free tier upon renewal.
Are you a visual learner? You can check out their YouTube tutorial playlist here.

05:39 Justin – “So, there are still API keys – they made it sound like there wasn’t, but there is. You just don’t have to manage them until you’ve consumed your free tier.”

09:35 OpenAI takes aim at Microsoft 365 Copilot • The Register

OpenAI launched “company knowledge” for ChatGPT Business, Enterprise, and Edu plans, enabling direct integration with corporate data sources, including Slack, SharePoint, Google Drive, Teams, and Outlook; notably excluding OneDrive, which could impact Microsoft-heavy organizations.
The feature requires manual activation for each conversation and lacks capabilities like web search, image generation, or graph creation when enabled, unlike Microsoft 365 Copilot‘s deeper integration across Office applications.
ChatGPT Business pricing at $25/user/month undercuts Microsoft 365 Copilot’s $30/month fee, potentially offering a more cost-effective enterprise AI assistant option with stronger brand recognition. (5 bucks is 5 bucks, right?)
Security implementation includes individual authentication per connector, encryption of all data, no training on corporate data, and an Enterprise Compliance API for conversation log review and regulatory reporting.
Data residency and processing locations vary by connector, with no clear documentation from OpenAI, requiring organizations to verify compliance requirements before deployment.
We kind of think we’ve heard of this before…

11:05 Ryan – “And it’s a huge problem. It’s been a huge problem that people have been trying to solve for a long time.”

14:23 The next chapter of the Microsoft–OpenAI partnership – The Official Microsoft Blog

Welp, the divorce has reached a (sort of) amicable alimony agreement.
Microsoft and OpenAI have restructured their partnership with Microsoft, now holding approximately 27% stake in OpenAI’s new public benefit corporation, which is now valued at $135 billion, while maintaining exclusive Azure API access and IP rights until AGI is achieved.
The agreement introduces an independent expert panel to verify AGI declarations and extends Microsoft’s IP rights for models and products through 2032, including post-AGI models with safety guardrails, though research IP expires by 2030 or AGI verification.
OpenAI gains significant operational flexibility, including the ability to develop non-API products with third parties on any cloud provider, release open weight models meeting capability criteria, and serve US government national security customers on any cloud infrastructure.
Microsoft can now independently pursue AGI development alone or with partners, and if using OpenAI’s IP pre-AGI, must adhere to compute thresholds significantly larger than current leading model training systems.
OpenAI has committed to purchasing $250 billion in Azure services while Microsoft loses its right of first refusal as OpenAI’s compute provider, signaling a shift toward more independent operations for both companies.

Con’t The next chapter of the Microsoft–OpenAI partnership | OpenAI

Microsoft’s investment in OpenAI is now valued at approximately $135 billion, representing roughly 27% ownership on a diluted basis, while OpenAI transitions to a public benefit corporation structure.
The partnership introduces an independent expert panel to verify when OpenAI achieves AGI, with Microsoft’s IP rights for models and products extended through 2032, including post-AGI models with safety guardrails.
OpenAI gains significant flexibility, including the ability to develop non-API products with third parties on any cloud provider, release open weight models meeting capability criteria, and provide API access to US government national security customers on any cloud.
Microsoft can now independently pursue AGI development alone or with partners, while OpenAI has committed to purchasing an additional $250 billion in Azure services, but Microsoft no longer has the right of first refusal as a compute provider.
The revenue-sharing agreement continues until AGI verification, but payments will be distributed over a longer timeframe, while Microsoft retains exclusive rights to OpenAI’s frontier models and Azure API exclusivity until AGI is achieved.

15:59 Justin – “Once AGI is achieved is an interesting choice… I wonder how Microsoft believes that’s gonna happen very soon, and OpenAI doesn’t, that’s why they’re willing to agree on that term; it’s interesting. Again, it has to be independently verified by a partner, so OpenAI can’t just come out and say, ‘we’ve created AGI,’ then, into a legal dispute – it has to be agreed upon by others. So that’s all very interesting.”

17:45 Build more accurate AI applications with Amazon Nova Web Grounding | AWS News Blog

AWS announces general availability of Web Grounding for Amazon Nova Premier, a built-in RAG tool that automatically retrieves and cites current web information during inference.
The feature eliminates the need to build custom RAG pipelines while reducing hallucinations through automatic source attribution and verification.
Web Grounding operates as a system tool within the Bedrock Converse API, allowing Nova models to intelligently determine when to query external sources based on prompt context.
Developers simply add nova_grounding to the toolConfig parameter, and the model handles retrieval, integration, and citation of public web sources automatically.
The feature is currently available only in US East N. Virginia for Nova Premier, with Ohio and Oregon regions coming soon, and support for other Nova models planned.
Additional costs apply beyond standard model inference pricing, detailed on the Amazon Bedrock pricing page.
Primary use cases include knowledge-based chat assistants requiring current information, content generation tools needing fact-checking, research applications synthesizing multiple sources, and customer support where accuracy and verifiable citations are essential.
The reasoning traces in responses allow developers to follow the model’s decision-making process.
The implementation provides a turnkey alternative to custom RAG architectures, particularly valuable for developers who want to focus on application logic rather than managing complex information retrieval systems while maintaining transparency through automatic source attribution.

18:36 Justin – “This is the first time I’ve heard anything about Nova in months, so, good to know?”

Cloud Tools

19:34 Introducing-ai-powered-database-migration-authoring

Harness introduces AI-powered database migration authoring that lets developers describe schema changes in plain English, like “create a table named animals with columns for genus_species,” and automatically generates production-ready SQL migrations with rollback scripts and Git integration.
The tool addresses the “AI Velocity Paradox” where 63% of organizations ship code faster with AI, but 72% have suffered production incidents from AI-generated code – by extending AI automation to database changes, which remain a manual bottleneck in most CI/CD pipelines.
Built on Harness’s Software Delivery Knowledge Graph and MCP Server, it analyzes current schemas, generates backward-compatible migrations, validates for compliance, and integrates with existing policy-as-code governance – making it more than just a generic SQL generator.
Database DevOps is one of Harness’s fastest-growing modules, with customers like Athenahealth reporting they saved months of engineering effort compared to Liquibase Pro or homegrown solutions while getting better governance and visibility.
This positions databases as first-class citizens in CI/CD pipelines rather than the traditional midnight deployment bottleneck, allowing DBAs to maintain oversight through automated approvals while developers can finally move database changes at DevOps speed.

20:44 Ryan – “Given how hard this is for humans to do, I look forward to AI doing this better.”

AWS

21:38 Amazon Allegedly Replaced 40% of AWS DevOps With AI Days Before Crash

An unverified report claims Amazon replaced 40% of AWS DevOps staff with AI systems capable of automatically fixing IAM permissions, rebuilding VPC configurations, and rolling back failed Lambda deployments, just days before their widely reported on crash.
AWS has not confirmed this, and skepticism remains high, however.
The timing coincides with a recent AWS outage that impacted major services, including Snapchat, McDonald’s app, Roblox, and Fortnite, raising questions about automation’s role in system reliability and incident response.
AWS officially laid off hundreds of employees in July 2025 (and more just recently), but the alleged 40% DevOps reduction would represent a significant shift toward AI-driven infrastructure management if true.
The incident highlights growing concerns about cloud service concentration risk, as both this AWS outage and the 2023 CrowdStrike incident demonstrate how single points of failure can impact thousands of businesses globally.
For AWS customers, this raises practical questions about the balance between automation efficiency and human oversight in critical infrastructure operations, particularly for disaster recovery and complex troubleshooting scenarios.

22:19 Justin – “In general, Amazon has been doing a lot of layoffs. There’s been a lot of brain drain. I don’t know that they’ve automated 40% of the DevOps staff with AI systems…so this one seems a little rumor-y and speculative, but I did find it fun that people were trying to blame AI for Amazon’s woes last week.”

24:41 Summary of the Amazon DynamoDB Service Disruption in Northern Virginia (US-EAST-1) Region

DynamoDB experienced a 2.5-hour outage in US-EAST-1 due to a race condition in its DNS management system that resulted in empty DNS records, affecting all services dependent on DynamoDB, including EC2, Lambda, and Redshift.
The cascading failure pattern showed how tightly coupled AWS services are – EC2 instance launches failed for 14 hours because DynamoDB’s outage prevented lease renewals between EC2’s DropletWorkflow Manager and physical servers.
Network Load Balancers experienced connection errors from 5:30 AM to 2:09 PM due to health check failures caused by EC2’s network state propagation delays, demonstrating how infrastructure dependencies can create extended recovery times.
AWS has disabled the automated DNS management system globally and will implement velocity controls and improved throttling mechanisms before re-enabling, highlighting the challenge of balancing automation with resilience.
The incident reveals architectural vulnerabilities in multi-service dependencies – services like Redshift in all regions failed IAM authentication due to hardcoded dependencies on US-EAST-1, suggesting the need for better regional isolation.

26:31 Matt – “It’s a good write-up to show that look, even these large cloud providers that have these massive systems and have redundancy upon redundancy upon redundancy – it’s all software under the hood. Software will eventually have a bug in it. And this just happens to be a really bad bug that took down half the internet.”

28:30 Amazon CloudWatch introduces interactive incident reporting

CloudWatch now automatically generates post-incident analysis reports by correlating telemetry data, investigation inputs, and actions taken during an investigation, reducing report creation time from hours to minutes.
Reports include executive summaries, event timelines, impact assessments, and actionable recommendations, helping teams identify patterns and implement preventive measures for better operational resilience.
The feature integrates directly with CloudWatch investigations, capturing operational telemetry and service configurations automatically without manual data collection or correlation.
Currently available in 12 AWS regions, including US East, Europe, and Asia Pacific, with no specific pricing mentioned – likely included in existing CloudWatch investigation costs.
This addresses a common pain point where teams spend significant time manually creating incident reports instead of focusing on root cause analysis and prevention strategies.

31:00 Customer Carbon Footprint Tool Expands: Additional emissions categories including Scope 3 are now available | AWS News Blog

AWS Customer Carbon Footprint Tool now includes Scope 3 emissions data covering fuel/energy-related activities, IT hardware lifecycle emissions, and building/equipment impacts, giving customers a complete view of their carbon footprint beyond just direct operational emissions.
The tool provides both location-based and market-based emission calculations with 38 months of historical data recalculated using the new methodology, accessible through the AWS Billing console with CSV export and integration options for QuickSight visualization.
Scope 3 emissions are amortized over asset lifecycles (6 years for IT hardware, 50 years for buildings) to fairly distribute embodied carbon across operational lifetime, with all calculations independently verified following GHG Protocol standards.
Early access customers like Salesforce, SAP, and Pinterest report that the granular regional data and Scope 3 visibility help them move beyond industry averages to make targeted carbon reduction decisions based on actual infrastructure emissions.
The tool remains free to use within the AWS Billing and Cost Management console, providing emissions data in metric tons of CO2 equivalent (MTCO2e) to help organizations track progress toward sustainability goals and compliance reporting requirements.

32:45 Matt – “This is a difficult problem to solve. Once you have scope three, it’s all your indirect costs. So, I think if I remember correctly, scope one is your actual server, scope two is power, and then scope three is all the things that have to get included to generate your power and your servers, which includes shipping, et cetera. So getting all that, it’s not an easy task to do. Even when I look at the numbers, I don’t know what these mean half the time when I have to look at them. I’m like, we’re going down. That seems positive.”

33:59 AWS Secret-West Region is now available

AWS launches Secret-West, its second region capable of handling Secret-level U.S. classified workloads, expanding beyond the existing Secret-East region to provide geographic redundancy for intelligence and defense agencies operating in the western United States.
The region meets stringent Intelligence Community Directive (ICD) 503 and DoD Security Requirements Guide Impact Level 6 requirements, enabling government agencies to process and analyze classified data with multiple Availability Zones for high availability and disaster recovery.
This expansion allows agencies to deploy latency-sensitive classified workloads closer to western U.S. operations while maintaining multi-region resiliency, addressing a critical gap in classified cloud infrastructure outside the eastern United States.
AWS continues to operate in a specialized market segment with limited competition, as few cloud providers can meet the security clearance and infrastructure requirements necessary for Secret-level classification hosting.
Pricing information is not publicly available due to the classified nature of the service; interested government agencies must contact AWS directly through their secure channels to discuss access and costs.

Agent Coulson – “Welcome to level 7.”

38:24 AWS Transfer Family now supports changing identity provider type on a server

AWS Transfer Family now allows changing identity provider types (service managed, Active Directory, or custom IdP) on existing SFTP, FTPS, and FTP servers without service interruption, eliminating the need to recreate servers during authentication migrations.
This feature enables zero-downtime authentication migrations for organizations transitioning between identity providers or consolidating authentication systems, particularly useful for companies undergoing mergers or updating compliance requirements.
The capability is available across all AWS regions where Transfer Family operates, with no additional pricing beyond standard Transfer Family costs, which start at $0.30 per protocol per hour.
Organizations can now adapt their file transfer authentication methods dynamically as business needs evolve, such as switching from basic service-managed users to enterprise Active Directory integration without disrupting ongoing file transfers.
Implementation details and migration procedures are documented in the Transfer Family User Guide here.

39:26 Ryan – “Any kind of configuration change that requires you to destroy and recreate isn’t fun. I do believe that we should architect for such things and be able to redirect things with DNS traffic (which never goes wrong), never causes anyone any problems. But, it is terrible when that happens, because even when it works, you’re sort of nervously doing it the entire time.”

40:24 New Amazon CloudWatch metrics to monitor EC2 instances exceeding I/O performance

AWS introduces Instance EBS IOPS Exceeded Check and Instance EBS Throughput Exceeded Check metrics that return binary values (0 or 1) to indicate when EC2 instances exceed their EBS-optimized performance limits, helping identify bottlenecks without manual calculation.
These metrics enable automated responses through CloudWatch alarms, such as triggering instance resizing or type changes when I/O limits are exceeded, reducing manual intervention for performance optimization.
Available at no additional cost with 1-minute granularity for all Nitro-based EC2 instances with attached EBS volumes across all commercial AWS regions, including GovCloud and China.
Addresses a common blind spot where applications experience degraded performance due to exceeding instance-level I/O limits rather than volume-level limits, which many users overlook when troubleshooting. (Yes, we’re all guilty of this.)
Particularly useful for database workloads and high-throughput applications where understanding whether the bottleneck is at the instance or volume level is critical for right-sizing decisions.

41:20 Matt – “This would have solved a lot of headaches when GP3 came out…”

GCP

43:53 A practical guide to Google Cloud’s Parameter Manager | Google Cloud Blog

Google Cloud Parameter Manager provides centralized configuration management that separates application settings from code, supporting JSON, YAML, and unformatted data with built-in format validation for JSON and YAML types
The service integrates with Secret Manager through a __REF__ syntax that allows parameters to securely reference secrets like API keys and passwords, with regional compliance enforcement ensuring secrets can only be referenced by parameters in the same region
Parameter Manager uses versioning for configuration snapshots, enabling safe rollbacks and preventing unintended breaking changes to deployed applications while supporting use cases like A/B testing, feature flags, and regional configurations
Both Parameter Manager and Secret Manager offer monthly free tiers, though specific pricing details aren’t provided in the announcement; the service requires granting IAM permissions for parameters to access referenced secrets
Key benefits include eliminating hard-coded configurations, supporting multi-region deployments with region-specific settings, and enabling dynamic configuration updates without code changes for applications across various industries

44:22 Justin – “ I’m a very heavy user of parameter store on AWS. I love it, and you should all use it for any of your dynamic configuration, especially if you’re moving containers between environments. This is the bee’s knees in my opinion.”

49:39 Cross-Site Interconnect, now GA, simplifies L2 connectivity | Google Cloud Blog

Cross-Site Interconnect is now GA, providing managed Layer 2 connectivity between data centers using Google’s global network infrastructure, eliminating the need for complex multi-vendor setups and reducing capital expenditures for WAN connectivity.
The service offers consumption-based pricing with no setup fees or long-term commitments, allowing customers to scale bandwidth dynamically and pay only for what they use, though specific pricing details weren’t provided in the announcement.
Built on Google’s 3.2 million kilometers of fiber and 34 subsea cables (and you know how much we love a good undersea cable).
Cross-Site Interconnect provides a 99.95% SLA that includes protection against cable cuts and maintenance windows, with automatic failover and proactive monitoring across 100s of Cloud Interconnect PoPs.
Financial services and telecommunications providers are early adopters, with Citadel reporting stable performance during their pilot program, highlighting use cases for low-latency trading, disaster recovery, and dynamic bandwidth augmentation for AI/ML workloads.
As a transparent Layer 2 service, it enables MACsec encryption between remote routers with customer-controlled keys, while providing programmable APIs for infrastructure-as-code workflows and real-time monitoring of latency, packet loss, and bandwidth utilization.

50:57 Ryan – “I mean, I like this just because of the heavy use of infrastructure as code availability. Some of these deep-down network services across the clouds don’t really provide that; it’s all just sort of click ops or a support case. So this is kind of neat. And I do like that you can dynamically configure this and stand it up / turn it down pretty quickly.”

53:12 Introducing Bigtable tiered storage | Google Cloud Blog

Bigtable introduces tiered storage that automatically moves data older than a configurable threshold from SSD to infrequent access storage, reducing storage costs by up to 85% while maintaining API compatibility and data accessibility through the same interface.
The infrequent access tier provides 540% more storage capacity per node compared to SSD-only nodes, enabling customers to retain historical data for compliance and analytics without manual archiving or separate systems.
Time-series workloads from manufacturing, automotive, and IoT benefit most – sensor data, EV battery telemetry, and factory equipment logs can keep recent data on SSD for real-time operations while moving older data to cheaper storage automatically based on age policies.
Integration with Bigtable SQL allows querying across both tiers, and logical views enable controlled access to historical data for reporting without full table permissions, simplifying data governance for large datasets.
Currently in preview with pricing at approximately $0.026/GB/month for infrequent access storage compared to $0.17/GB/month for SSD storage, representing significant savings for organizations storing hundreds of terabytes of historical operational data.

54:31 Ryan – “To illustrate that I’m still a cloud guy at heart, whenever I’m in an application and I’m loading data and I go back – like I want to see a year’s data – and it takes that extra 30 seconds to load, I actually get happy, because I know what they’re doing on the backend.”

56:05 Now Shipping A4X Max, Vertex AI Training and more | Google Cloud Blog

Google launches A4X Max instances powered by NVIDIA GB300 NVL72 with 72 Blackwell Ultra GPUs and 36 Grace CPUs, delivering 2x network bandwidth compared to A4X and 4x better LLM training performance versus A3 H100-based VMs. The system features 1.4 exaflops per NVL72 system and can scale to clusters twice as large as A4X deployments.
GKE now supports DRANET (Dynamic Resource Allocation Kubernetes Network Driver) in production, starting with A4X Max, providing topology-aware scheduling of GPUs and RDMA network cards to boost bus bandwidth for distributed AI workloads.
This improves cost efficiency through better VM utilization by optimizing connectivity between RDMA devices and GPUs.
GKE Inference Gateway integrates with NVIDIA NeMo Guardrails to add safety controls for production AI deployments, preventing models from engaging in undesirable topics or responding to malicious prompts.
The integration combines model-aware routing and autoscaling with enterprise-grade security features.
Vertex AI Model Garden will support NVIDIA Nemotron models as NIM microservices, starting with Llama Nemotron Super v1.5, allowing developers to deploy open-weight models with granular control over machine types, regions, and VPC security policies.
Vertex AI Training now includes curated recipes built on NVIDIA NeMo Framework and NeMo-RL with managed Slurm environments and automated resiliency features for large-scale model development.
A4X Max is available in preview through Google Cloud sales representatives and leverages Cluster Director for lifecycle management, topology-aware placement, and integration with Managed Lustre storage.
Pricing details were not disclosed in the announcement.

57:41 Justin – “That’s a lot of cool hardware stuff that I do not understand.”

Azure

58:38 NVIDIA GB300 NVL72: Next-generation AI infrastructure at scale | Microsoft Azure Blog

Microsoft deployed the first production cluster with over 4,600 NVIDIA GB300 NVL72 systems featuring Blackwell Ultra GPUs, enabling AI model training in weeks instead of months and supporting models with hundreds of trillions of parameters
The ND GB300 v6 VMs deliver 1,440 petaflops of FP4 performance per rack with 72 GPUs, 37TB of fast memory, and 130TB/second NVLink bandwidth, specifically optimized for reasoning models, agentic AI, and multimodal generative AI workloads
Azure implemented 800 Gbps NVIDIA Quantum-X800 InfiniBand networking with full fat-tree architecture and SHARP acceleration, doubling effective bandwidth by performing computations in-switch for improved large-scale training efficiency
The infrastructure uses standalone heat exchanger units and new power distribution models to handle high-density GPU clusters, with Microsoft planning to scale to hundreds of thousands of Blackwell Ultra GPUs across global datacenters
OpenAI and Microsoft are already using these clusters for frontier model development, with the platform becoming the standard for organizations requiring supercomputing-scale AI infrastructure (pricing is not specified in the announcement).

59:55 Ryan – “Companies looking for scale – companies with a boatload of money.”

1:00:23 Generally Available: Near-zero downtime scaling for HA-enabled Azure Database for PostgreSQL servers

Azure Database for PostgreSQL servers with high availability can now scale in under 30 seconds compared to the previous 2-10 minute window, reducing downtime by over 90% for database scaling operations.
This feature targets production workloads that require continuous availability during infrastructure changes, particularly benefiting e-commerce platforms, financial services, and SaaS applications that cannot afford extended maintenance windows.
The near-zero downtime scaling works specifically with HA-enabled PostgreSQL instances, leveraging Azure’s high availability architecture to perform seamless compute and storage scaling without disrupting active connections.
While pricing remains unchanged from standard PostgreSQL rates, the reduced downtime translates to lower operational costs by minimizing revenue loss during scaling events and reducing the need for complex maintenance scheduling.
This enhancement positions Azure PostgreSQL competitively against AWS RDS and Google Cloud SQL, which still require longer downtime windows for similar scaling operations on their managed PostgreSQL offerings.

1:01:16 Matt – “They’ve had this for forever on Azure SQL, which is their Microsoft SQL platform, so it doesn’t surprise me. It surprised me more that this was already a two-to-10-minute window to scale. Seems crazy for a production HA service.”

1:02:10 OneLake APIs: Bring your apps and build new ones with familiar Blob and ADLS APIs | Microsoft Fabric Blog | Microsoft Fabric

OneLake now supports Azure Blob Storage and ADLS APIs, allowing existing applications to connect to Microsoft Fabric’s unified data lake without code changes – just swap endpoints to onelake.dfs.fabric.microsoft.com or onelake.blob.fabric.microsoft.com. What could go wrong?
This API compatibility eliminates migration barriers for organizations with existing Azure Storage investments, enabling immediate use of tools like Azure Storage Explorer with OneLake while preserving existing scripts and workflows
The feature targets enterprises looking to consolidate data lakes without rewriting applications, particularly those using C# SDKs or requiring DFS operations for hierarchical data management
Microsoft provides an end-to-end guide demonstrating open mirroring to replicate on-premises data to OneLake Delta tables, positioning this as a bridge between traditional storage and Fabric’s analytics ecosystem
No specific pricing mentioned for OneLake API access – costs likely follow standard Fabric capacity pricing model based on compute and storage consumption

Cloud Journey

1:03:47 8 platform engineering anti-patterns | InfoWorld

Platform engineering initiatives are failing at an alarming rate because teams treat the visual portal as the entire platform rather than building solid backend APIs and orchestration first. The 2024 DORA Report found that dedicated platform engineering teams actually decreased throughput by 8% and change stability by 14%, showing that implementation mistakes have serious downstream consequences.
The biggest mistake organizations make is copying approaches from large companies like Spotify without considering ROI for their scale. Mid-size companies invest the same effort as enterprises with thousands of developers but see minimal returns, making reference architectures often impractical for solving real infrastructure abstraction challenges.
Successful platform adoption requires shared ownership where developers can contribute plugins and customizations rather than top-down mandates. Spotify achieves 100% employee adoption of their internal Backstage by allowing engineers to build their own plugins like Soundcheck, proving that developer autonomy drives platform usage.
Organizations must survey specific user subsets because Java developers, QA testers, and SREs have completely different requirements from an internal developer platform. Tracking surface metrics like onboarded users misses the point when platforms should measurably improve time to market, reduce costs, and increase innovation rather than just showing DORA metrics.
Simply rebranding operations teams as platform engineering without a cultural shift and product mindset creates more toil than it reduces. Platforms need to be treated as products requiring continuous improvement, user research, internal marketing, and incremental development, starting with basic CI/CD touchpoints rather than attempting to solve every problem on day one.

Closing

327: AWS Finally Admits Kubernetes is Hard, Makes Robots Do It Instead

Thu, 30 Oct 2025 00:26:49 +0000

Welcome to episode 327 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are here to bring you all the latest news (and a few rants) in the worlds of Cloud and AI. I’m sure all our readers are aware of the AWS outage last week, as it was in all the news everywhere. But we’ve also got some new AI models (including Sora in case you’re low on really crappy videos the youths might like), plus EKS, Kubernetes, Vertex AI, and more. Let’s get started!

Titles we almost went with this week

Oracle and Azure Walk Into a Cloud Bar: Nobody Gets ETL’d
When DNS Goes Down, So Does Your Monday: AWS Takes Half the Internet on a Coffee Break
404 Cloud Not Found: AWS Proves Even the Internet’s Phone Book Can Get Lost
DNS: Definitely Not Staffed – How AWS Lost Its Way When It Lost Its People
When Larry Met Satya: A Cloud Love Story
Azure Finally Answers ‘Dude, Where’s My Data?’ with Storage Discovery
Breaking: Microsoft Discovers AI Training Uses More Power Than a Small Country
404 Engineers Not Found – AWS Learns the Hard Way That People Are Its Most Critical Infrastructure
Azure Storage Discovery: Finding Your Data Needles in the Cloud Haystack
EKS Auto Mode: Because Even Your Clusters Deserve Cruise Control
Azure Gets Reel: Microsoft Adds Video Generation to AI Foundry
The Great Token Heist: Vertex AI Steals 90% Off Your Gemini Bills
Cache Me If You Can: Vertex AI’s Token-Saving Feature
IaC Just Got a Manager – And It’s Not Your Boss
From Musk to Microsoft: Grok 4 Makes the Great Cloud Migration
No Harness.. You are not going to make IACM happen
Microsoft Drafts a Solution to Container Creation Chaos
PowerShell to the People: Azure Simplifies the Great Gateway Migration
IP There Yet? Azure’s Scripts Keep Your Address While You Upgrade

Follow Up

00:53 Glacier Deprecation Email

Standalone Amazon Glacier service (vault-based with separate APIs) will stop accepting new customers as of December 15, 2025.
S3 Glacier storage classes (Instant Retrieval, Flexible Retrieval, Deep Archive) are completely unaffected and continue normally
Existing Glacier customers can keep using it forever – no forced migration required.
AWS is essentially consolidating around S3 as the unified storage platform, rather than maintaining two separate archival services.
The standalone service will enter maintenance mode, meaning there will be no new features, but the service will remain operational.
Migration to S3 Glacier is optional but recommended for better integration, lower costs, and more features. (Justin assures us it is actually slightly cheaper, so there’s that.)

General News

02:24 F5 discloses major security breach linked to nation-state hackers – GeekWire

F5 disclosed that nation-state hackers maintained persistent access to their internal systems over the summer of 2024, stealing portions of BIG-IP source code and vulnerability details before containment in August.
The breach compromised product development and engineering systems, but did not affect customer CRM data, financial systems, or F5’s software supply chain, according to independent security audits.
F5 has released security patches for BIG-IP, F5OS, and BIG-IP Next products and is providing threat-hunting guides to help customers monitor for suspicious activity.
This represents the first publicly disclosed breach of F5’s internal systems, notable given that F5 handles traffic for 80% of Fortune Global 500 companies through its load-balancing and security services.
The incident highlights supply chain security concerns, as attackers targeted source code and vulnerability information, rather than customer data, potentially seeking ways to exploit F5 products deployed across enterprise networks.

03:12 Justin – “A little concerning on this one, mostly because F5 is EVERYWHERE.”

AI is Going Great – Or How ML Makes Money

04:55 Claude Code gets a web version—but it’s the new sandboxing that really matters – Ars Technica

Anthropic launched web and mobile interfaces for Claude Code, their CLI-based AI coding assistant, with the web version supporting direct access to GitHub repositories and the ability to process general instructions, such as “add real-time inventory tracking to the dashboard.”
The web interface introduces multi-session support, allowing developers to run and switch between multiple coding sessions simultaneously through a left-side panel, plus the ability to provide mid-task corrections without canceling and restarting
A new sandboxing runtime has been implemented to improve security and reduce friction, moving away from the previous approach where Claude Code required permission for most changes and steps during execution
The mobile version is currently limited to iOS and is in an earlier development stage compared to the web interface, indicating a phased rollout approach
This positions Claude Code as a more accessible alternative to traditional CLI-only AI coding tools, potentially expanding its reach to developers who prefer web-based interfaces over command-line environments

05:51 Ryan – “I haven’t had a chance to play with the web version, but I am interested in it just because I found the terminal interface limiting, but I also feel like a lot of the value is in that local sort of execution and not in the sandbox. A lot of the tasks I do are internal and require access to either company resources or private networks, or the kind of thing where you’re not going to get that from a publicly hosted sandbox environment.”

08:36 Open Source: Containerization Assist MCP Server

Containerization Assist automates the tedious process of creating Dockerfiles and Kubernetes manifests, eliminating manual errors that plague developers during the containerization process
Built on AKS Draft’s proven foundation, this open-source tool goes beyond basic AI coding assistants by providing a complete containerization platform rather than just code suggestions.
The tool addresses a critical pain point where developers waste hours writing boilerplate container configurations and debugging deployment issues caused by manual mistakes. (Listener beware, Justin mini rant here.)
As an open-source MCP (Model Context Protocol) server, it integrates seamlessly with existing development workflows while leveraging Microsoft’s containerization expertise from Azure Kubernetes Service. (Expertise is a stretch.)
This launch signals Microsoft’s commitment to simplifying Kubernetes adoption by removing the steep learning curve associated with container orchestration and manifest creation – or you could just use a pass.

09:47 Matt – “The piece I did like about this is that it integrated in as an optional feature, kind of the trivia and the security thing. So it’s not just setting it up, but they integrated the next steps of security code scanning. It’s not Microsoft saying, you know, hey, it’s standard … they are building security in, hopefully.”

Cloud Tools

33:09 IaC is Great, But Have You Met IaCM?

IaCM (Infrastructure as Code Management) extends traditional IaC by adding lifecycle management capabilities, including state management, policy enforcement, and drift detection to handle the complexity of infrastructure at scale.
Key features include centralized state file management with version control, module and provider registries for reusable components, and automated policy enforcement to ensure compliance without slowing down teams.
The platform integrates directly into CI/CD workflows with visual PR insights showing cost estimates and infrastructure changes before deployment, solving the problem of unexpected costs and configuration conflicts.
IaCM addresses critical pain points like configuration drift, secret exposure in state files, and resource conflicts when multiple teams work on the same infrastructure simultaneously.
Harness IaCM specifically supports OpenTofu and Terraform with features like Variable Sets, Workspace Templates, and Default Pipelines to standardize infrastructure delivery across organizations.

13:04 Justin – “So let me boil this down for you. We created our own Terraform Enterprise or Terraform Cloud, but we can’t use that name because it’s copyrighted. So we’re going to try to create a new thing and pretend we invented this – and then try to sell it to you as our new Terraform or OpenTofu replacement for your management tier.”

HugOps Corner – Previously Known as AWS

41:08 AWS outage hits major apps and services, resurfacing old questions about

cloud redundancy – GeekWire

AWS US-EAST-1 experienced a major outage starting after midnight Pacific on Monday, caused by DNS resolution issues with DynamoDB that prevented proper address lookup for database services, impacting thousands of applications, including Facebook, Snapchat, Coinbase, ChatGPT, and Amazon’s own services.
The outage highlighted ongoing redundancy concerns as many organizations failed to implement proper failover to other regions or cloud providers, despite similar incidents in US-EAST-1 in 2017, 2021, and 2023, raising questions about single-region dependency for critical infrastructure.
AWS identified the root cause as an internal subsystem responsible for monitoring network load balancer health, with core DNS issues resolved by 3:35 AM Pacific, though Lambda backlog processing and EC2 instance launch errors persisted through the morning recovery period.
Real-world impacts included LaGuardia Airport check-in kiosk failures, causing passenger lines, widespread disruption to financial services (Venmo, Robinhood), gaming platforms (Roblox, Fortnite), and productivity tools (Slack, Canva), demonstrating the cascading effects of cloud provider outages.
The incident underscores the importance of multi-region deployment strategies and proper disaster recovery planning for AWS customers, particularly those using US-EAST-1 as their primary region due to its status as AWS’s oldest and largest data center location.
We have a couple of observations: this one took a LONG time to resolve, including hours before the DNS was restored. Maybe they’re out of practice? Maybe it’s a people problem? Hopefully, this isn’t the new norm as some of the talent have been let go/moved on.

17:53 Ryan – “If it’s a DNS resolution issue that’s causing a global outage, that’s not exactly straightforward. It’s not just a bug, you know, or a function returning the wrong value, or that you’re looking at global propagation, you’re looking at clients in different places, resolving different things, at the base parts of the internet for functionality. And so it does take a pretty experienced engineer to sort of have that in their heads conceptually in to order to troubleshoot. I wonder if that’s really the cause, where they’re not able to recover as fast. But I also feel like cloud computing has come a long way, and the impact was very widely felt because a lot more people are using AWS as their hosting provider than I think have been in the past. A little bit of everything, I think.”

AWS outage was not due to a cyberattack — but shows potential for ‘far worse’ damage – GeekWire

AWS’s US-EAST-1 region experienced an outage due to an internal monitoring subsystem failure affecting network load balancers, impacting major services including Facebook, Coinbase, and LaGuardia Airport check-in systems.
The issue was related to DNS resolution problems with DynamoDB, not a cyberattack.
The incident highlights ongoing single-region dependency issues, as US-EAST-1 remains AWS’s largest region and has caused similar widespread disruptions in 2017, 2021, and 2023. Many organizations still lack proper multi-region failover despite repeated outages from this location.
Industry experts warn that the outage demonstrates vulnerability to potential targeted attacks on cloud infrastructure monoculture. The concentration of services on single providers creates systemic risk similar to agricultural monoculture, where one failure can cascade widely.
The failure occurred at the control-plane level, suggesting AWS should implement more aggressive isolation of critical networking components. This may accelerate enterprise adoption of multi-cloud and multi-region architectures as baseline resilience requirements.
AWS resolved the issue within hours but the incident reinforces that even major cloud providers remain vulnerable to cascading failures when core monitoring and health check systems malfunction, affecting downstream services across their infrastructure.

Today is when Amazon’s brain drain finally caught up with AWS • The Register

AWS experienced a major outage on October 20, 2025 in US-EAST-1 region caused by DNS resolution failures for DynamoDB endpoints, taking 75 minutes just to identify the root cause and impacting banking, gaming, social media, and government services across much of the internet.
The incident highlights concerns about AWS’s talent retention, with 27,000+ Amazon layoffs between 2022-2025 and internal documents showing 69-81% regretted attrition, suggesting loss of senior engineers who understood complex failure modes and had institutional knowledge of AWS systems.
DynamoDB’s role as a foundational service meant the DNS failure created cascading impacts across multiple AWS services, demonstrating the risk of centralized dependencies in cloud architectures and the importance of regional redundancy for critical workloads.
AWS’s status page showed “all is well” for the first 75 minutes of the outage, continuing a pattern of slow incident communication that AWS has acknowledged as needing improvement in multiple previous post-mortems from 2011, 2012, and 2015.
The article suggests this may be a tipping point where the loss of experienced staff who built these systems is beginning to impact AWS’s legendary operational excellence, with predictions that similar incidents may become more frequent as institutional knowledge continues to leave.

-And that’s an end to Hugops. Moving on to the rest of AWS-

23:58 Monitor, analyze, and manage capacity usage from a single interface with \Amazon EC2 Capacity Manager | AWS News Blog

EC2 Capacity Manager provides a single dashboard to monitor and manage EC2 capacity across all accounts and regions, eliminating the need to collect data from multiple AWS services like Cost and Usage Reports, CloudWatch, and EC2 APIs.
Available at no additional cost in all commercial AWS regions.
The service aggregates capacity data with hourly refresh rates for On-Demand Instances, Spot Instances, and Capacity Reservations, displaying utilization metrics by vCPUs, instance counts, or estimated costs based on published On-Demand rates.
Key features include automated identification of underutilized Capacity Reservations with specific utilization percentages by instance type and AZ, plus direct modification capabilities for ODCRs within the same account.
Data exports to S3 extend analytics beyond the 90-day console retention period, enabling long-term capacity trend analysis and integration with existing BI tools or custom reporting systems.
Organizations can enable cross-account visibility through AWS Organizations integration, helping identify optimization opportunities like redistributing reservations between development accounts showing 30% utilization and production accounts exceeding 95%.

25:45 Ryan – “This is kind of nice to have it built in and just have it be plug and play – especially when it’s at no cost.”

26:21 New Amazon EKS Auto Mode features for enhanced security, network control, and performance | Containers

EKS Auto Mode now supports EC2 On-Demand Capacity Reservations and Capacity Blocks for ML, allowing customers to target pre-purchased capacity for AI/ML workloads requiring guaranteed access to specialized instances like P5s. This addresses the challenge of GPU availability for training jobs without over-provisioning.
New networking capabilities include separate pod subnets for isolating infrastructure and application traffic, explicit public IP control for enterprise security compliance, and forward proxy support with custom certificate bundles. These features enable integration with existing enterprise network architectures without complex CNI customizations.
Complete AWS KMS encryption now covers both ephemeral storage and root volumes using customer-managed keys, addressing security audit findings that previously flagged unencrypted storage.
This eliminates the need for custom AMIs or manual certificate distribution.
Performance improvements include multi-threaded node filtering and intelligent capacity management that can automatically relax instance diversity constraints during capacity shortages.
These optimizations particularly benefit time-sensitive applications and AI/ML workloads requiring rapid scaling.
EKS Auto Mode is available for new clusters or can be enabled on existing EKS clusters running Kubernetes 1.29+, with migration guides available for teams moving from Managed node groups, Karpenter, or Fargate.
Pricing follows standard EKS pricing at $0.10 per cluster per hour plus EC2 instance costs.

27:33 Ryan – “This just highlights how terrible it was before.”

29:33 Amazon EC2 now supports Optimize CPUs for license-included instances

EC2 now lets customers reduce vCPU counts and disable hyperthreading on Windows Server and SQL Server license-included instances, enabling up to 50% savings on vCPU-based licensing costs while maintaining full memory and IOPS performance.
This feature targets database workloads that need high memory and IOPS but fewer vCPUs – for example, an r7i.8xlarge instance can be reduced from 32 to 16 vCPUs while keeping its 256 GiB memory and 40,000 IOPS.
The CPU optimization extends EC2’s existing Optimize CPUs feature to license-included instances, addressing a common pain point where customers overpay for Microsoft licensing due to fixed vCPU counts.
Available now in all commercial AWS regions and GovCloud regions, with no additional charges beyond the adjusted licensing costs based on the modified vCPU count.
This positions AWS competitively against Azure for SQL Server workloads by offering more granular control over licensing costs, particularly important as organizations migrate legacy database workloads to the cloud.
Interested in CPU options? Check those out here.

30:20 Justin – “This is a little weird to me, because I thought this already existed.”

31:46 AWS Systems Manager Patch Manager launches security updates notification for Windows

AWS Systems Manager Patch Manager now includes an “AvailableSecurityUpdate” state that identifies Windows security patches available but not yet approved by patch baseline rules, helping prevent accidental exposure from delayed patch approvals.
The feature addresses a specific operational risk where administrators using ApprovalDelay with extended timeframes could unknowingly leave systems vulnerable, with instances marked as Non-Compliant by default when security updates are pending.
Available across all AWS Systems Manager regions with no additional charges beyond standard pricing, the feature integrates directly into existing patch baseline configurations through the console at https://console.aws.amazon.com/systems-manager/patch-manager.
Organizations can customize compliance reporting behavior to maintain existing workflows while gaining visibility into security patch availability across their Windows fleet, particularly useful for enterprises with complex patch approval processes.
The update provides a practical solution for balancing security requirements with operational stability, allowing teams to maintain patch deployment schedules while staying informed about critical security updates awaiting approval.

30:20 Ryan – “It sounds like just a quality of life improvement, but it’s something that should be so basic, but isn’t there, right? Which is like Windows patch management is cobbled together and not really managed well, and so you could have a patch available, but the only way to find out that it was available previously to this was to actually go ahead and patch it and then see if it did something. And so now, at least you have a signal on that; you can apply your patches in a way that’s not going to take down your entire service if a patch goes wrong. So this is very nice. I think for people using the Systems Manager patch management, they’re going to be very happy with this.”

35:26 Introducing CLI Agent Orchestrator: Transforming Developer CLI Tools into a Multi-Agent Powerhouse | AWS Open Source Blog

AWS introduces CLI Agent Orchestrator (CAO), an open source framework that enables multiple AI-powered CLI tools like Amazon Q CLI and Claude Code to work together as specialized agents under a supervisor agent, addressing limitations of single-agent approaches for complex enterprise development projects.
CAO uses hierarchical orchestration with tmux session isolation and Model Context Protocol servers to coordinate specialized agents – for example, orchestrating Architecture, Security, Performance, and Test agents simultaneously during mainframe modernization projects.
The framework supports three orchestration patterns (Handoff for synchronous transfers, Assign for parallel execution, Send Message for direct communication) plus scheduled runs using cron-like automation, with all processing occurring locally for security and privacy.
Currently supports Amazon Q Developer CLI and Claude Code with planned expansion to OpenAI Codex CLI, Gemini CLI, Qwen CLI, and Aiden – no pricing mentioned as it’s open source, available at github.com/awslabs/cli-agent-orchestrator.
Key use cases include multi-service architecture development, enterprise migrations requiring parallel implementation, comprehensive research workflows, and multi-stage quality assurance processes that benefit from coordinated specialist agents.
We definitely appreciate another tool in the Agent Orchestration world.

37:46 Amazon ECS now publishes AWS CloudTrail data events for insight into API activities

Amazon ECS now publishes CloudTrail data events for ECS Agent API activities, enabling detailed monitoring of container instance operations, including polling (ecs: Poll), telemetry sessions (ecs: StartTelemetrySession), and managed instance logging (ecs: PutSystemLogEvents).
Security and operations teams gain comprehensive audit trails to detect unusual access patterns, troubleshoot agent communication issues, and understand how container instance roles are utilized for compliance requirements.
The feature uses the new data event resource type AWS::ECS::ContainerInstance and is available for ECS on EC2 in all AWS regions, with ECS Managed Instances supported in select regions.
Standard CloudTrail data event charges apply – typically $0.10 per 100,000 events recorded, making this a cost-effective solution for organizations needing detailed container instance monitoring.
This addresses a previous visibility gap in ECS operations, as teams can now track agent-level activities that were previously opaque, improving debugging capabilities and security posture for containerized workloads.

39:33 Ryan – “This is definitely something I would use sparingly because the UCS API is agent API chatting. So this seems like it would be very expensive, very fast.”

GCP

41:22 G4 VMs powered by NVIDIA RTX 6000 Blackwell GPUs are GA | Google Cloud Blog

Google Cloud launches G4 VMs with NVIDIA RTX 6000 Blackwell GPUs, offering up to 9x throughput improvement over G2 instances and supporting workloads from AI inference to digital twin simulations with configurations of 1, 2, 4, or 8 GPUs.
The G4 VMs feature enhanced PCIe-based peer-to-peer data paths that deliver up to 168% throughput gains and 41% lower latency for multi-GPU workloads, addressing the bottleneck issues common in serving large generative AI models that exceed single GPU memory limits.
Each GPU provides 96GB of GDDR7 memory (up to 768GB total), native FP4 precision support, and Multi-Instance GPU capability that allows partitioning into 4 isolated instances, enabling efficient serving of models from under 30B to over 100B parameters.
NVIDIA Omniverse and Isaac Sim are now available on Google Cloud Marketplace as turnkey solutions for G4 VMs, enabling immediate deployment of industrial digital twin and robotics simulation applications with full integration across GKE, Vertex AI, Dataproc, and Cloud Run.
G4 VMs are available immediately with broader regional availability than previous GPU offerings, though specific pricing details were not provided in the announcement – customers should contact Google Cloud sales for cost information. (AKA $$$$.)

43:03 Dataproc 2.3 on Google Compute Engine | Google Cloud Blog

Dataproc 2.3 introduces a lightweight, FedRamp High-compliant image that contains only essential Spark and Hadoop components, reducing CVE exposure and meeting strict security requirements for organizations handling sensitive data.
Optional components like Flink, Hive WebHCat, and Ranger are now deployed on-demand during cluster creation rather than pre-packaged, keeping clusters lean by default while maintaining full functionality when needed.
Custom images allow pre-installation of required components to reduce cluster provisioning time while maintaining the security benefits of the lightweight base image.
The image supports multiple operating systems, including Debian 12, Ubuntu 22, and Rocky 9, with deployment as simple as specifying version 2.3 when creating clusters via gcloud CLI.
Google employs automated CVE scanning and patching combined with manual intervention for complex vulnerabilities to maintain compliance standards and security posture.

44:14 Ryan – “But on the contrary, like FedRAMP has such tight SLAs for vulnerability management that you don’t have to carry this risk or request an exception because of Google not patching Flink as fast as you would like them to. At least this puts the control at the end user, where they can say, well, I’m not going to use it.”

44:45 BigQuery Studio gets improved console interface | Google Cloud Blog

BigQuery Studio’s new interface introduces an expanded Explorer view that allows users to filter resources by project and type, with a dedicated search function that spans across all BigQuery resources within an organization – addressing the common pain point of navigating through large-scale data projects.
The Reference panel provides context-aware information about tables and schemas directly within the code editor, eliminating the need to switch between tabs or run exploratory queries just to check column names or data types – particularly useful for data analysts writing complex SQL queries.
Google has streamlined the workspace by moving job history to a dedicated tab accessible from the Explorer pane and removing the bottom panel clutter, while also allowing users to control tab behavior with double-click functionality to prevent unwanted tab replacements.
The update includes code generation capabilities where clicking on table elements in the Reference panel automatically inserts query snippets or field names into the editor, reducing manual typing errors and speeding up query development workflows.
This interface refresh targets data analysts, data engineers, and data scientists who need efficient navigation across multiple BigQuery projects and datasets – no pricing changes mentioned as this appears to be a UI update to the existing BigQuery Studio service.

46:00 Ryan – “Although I’m a little nervous about having all the BigQuery resources across an organization available on a single console, just because it sounds like a permissions nightmare.”

47:10 Manage your prompts using Vertex SDK | Google Cloud Blog

Google launches GA of Prompt Management in Vertex AI SDK, enabling developers to create, version, and manage prompts programmatically through Python code rather than tracking them in spreadsheets or text files.
The feature provides seamless integration between Vertex AI Studio’s visual interface for prompt design and the SDK for programmatic management, with prompts stored as centralized resources within Google Cloud projects for team collaboration.
Enterprise security features include Customer-Managed Encryption Keys (CMEK) and VPC Service Controls (VPCSC) support, addressing compliance requirements for organizations handling sensitive data in their AI applications.
Key use cases include teams building production generative AI applications that need version control, consistent prompt deployment across environments, and the ability to programmatically update prompts without manual code changes.
Pricing follows standard Vertex AI model usage rates with no additional charges for prompt management itself; documentation available at cloud.google.com/vertex-ai/generative-ai/docs/model-reference/prompt-classes.

47:43 Justin – “If your prompt has sensitive data in it, I have questions already.”

49:05 Gemini Code Assist in GitHub for Enterprises | Google Cloud Blog

Google launches Gemini Code Assist for GitHub Enterprise, bringing AI-powered code reviews to enterprise customers using GitHub Enterprise Cloud and on-premises GitHub Enterprise Server.
This addresses the bottleneck where 60.2% of organizations take over a day for code changes to reach production due to manual review processes.
The service provides organization-level controls, including centralized custom style guides and org-wide configuration settings, allowing platform teams to enforce coding standards automatically across all repositories.
Individual teams can still customize repo-level settings while maintaining organizational baselines.
Built under Google Cloud Terms of Service, the enterprise version ensures code prompts and model responses are stateless and not stored, with Google committing not to use customer data for model training without permission. This addresses enterprise security and compliance requirements for AI-assisted development.
Currently in public preview with access through the Google Cloud Console, the service includes a higher pull request quota than the individual developer tier. Google is developing additional features, including agentic loop capabilities for automated issue resolution and bug fixing.
This release complements the recently launched Code Review Gemini CLI Extension for terminal-based AI assistance and represents part of Google’s broader strategy to provide AI assistance across the entire software development lifecycle.
Pricing details are not specified in the announcement.

51:08 Ryan – “It’s just sort of the ability to sort of do organizational-wide things is super powerful for these tools, and I’m just sort of surprised that GitHub allows that. It seems like they would have to develop API hooks and externalize that.”

53:19 Vertex AI context caching | Google Cloud Blog

Vertex AI context caching reduces costs by 90% for repeated content in Gemini models by storing precomputed tokens – implicit caching happens automatically, while explicit caching gives developers control over what content to cache for predictable savings
The feature supports caching from 2,048 tokens up to Gemini 2.5 Pro’s 1 million token context window across all modalities (text, PDF, image, audio, video) with both global and regional endpoint support
Key use cases include document processing for financial analysis, customer support chatbots with detailed system instructions, codebase Q&A for development teams, and enterprise knowledge base queries
Implicit caching is enabled by default with no code changes required and clears within 24 hours, while explicit caching charges standard input token rates for initial caching, then a 90% discount on reuse, plus hourly storage fees based on TTL.
Integration with Provisioned Throughput ensures production workloads benefit from caching, and explicit caches support Customer Managed Encryption Keys (CMEK) for additional security compliance

54:18 Ryan – “This is awesome. If you have a workload where you’re gonna have very similar queries or prompts and have it return similar data, this is definitely nicer than having to regenerate that every time. They’ve been moving more and more towards this. And I like to see it sort of more at a platform level now, whereas you could sort of implement this – in a weird way – directly in a model, like in a notebook or something. This is more of a ‘turn it on and it works’.”

55:30 Cloud Armor named Strong Performer in Forrester WAVE, new features launched

Cloud Armor introduces hierarchical security policies (GA) that enable WAF and DDoS protection at the organization, folder, and project levels, allowing centralized security management across large GCP deployments with consistent policy enforcement.
Enhanced WAF inspection capability (preview) expands request body inspection from 8KB to 64KB for all preconfigured rules, improving detection of malicious content hidden in larger payloads while maintaining performance.
JA4 network fingerprinting support (GA) provides advanced SSL/TLS client identification beyond JA3, offering deeper behavioral insights for threat hunting and distinguishing legitimate traffic from malicious actors.
Organization-scoped address groups (GA) enable IP range list management across multiple security policies and products like Cloud Next Generation Firewall, reducing configuration complexity and duplicate rules.
Cloud Armor now protects Media CDN with Network Threat Intelligence and ASN blocking capabilities (GA), defending media assets at the network edge against known malicious IPs and traffic patterns.

56:59 Ryan – “These are some pretty advanced features for a cloud platform provided WAF. It’s pretty cool.”

Azure

58:44 Generally Available: Observed capacity metric in Azure Firewall

Azure Firewall’s new observed capacity metric provides real-time visibility into capacity unit utilization, helping administrators track actual scaling behavior versus provisioned capacity for better resource optimization and cost management.
This observability enhancement addresses a common blind spot where teams over-provision firewall capacity due to uncertainty about actual usage patterns, potentially reducing unnecessary Azure spending on unused capacity units.
The metric integrates with Azure Monitor and existing alerting systems, enabling proactive capacity planning and automated scaling decisions based on historical utilization trends rather than guesswork.
Target customers include enterprises with variable traffic patterns and managed service providers who need granular visibility into firewall performance across multiple client deployments to optimize resource allocation.
While pricing remains unchanged for Azure Firewall itself (starting at $1.25/hour plus $0.016/GB processed), the metric helps justify right-sizing decisions that could significantly impact monthly costs for organizations running multiple firewall instances.

Generally Available: Prescaling in Azure Firewall

Azure Firewall prescaling allows administrators to reserve capacity units in advance for predictable traffic spikes like holiday shopping seasons or product launches, eliminating the lag time typically associated with auto-scaling firewall resources.
This feature addresses a common pain point where Azure Firewall’s auto-scaling couldn’t respond quickly enough to sudden traffic surges, potentially causing performance degradation during critical business events.
Prescaling integrates with Azure’s existing capacity planning tools and can be configured through Azure Portal, PowerShell, or ARM templates, making it accessible for both manual and automated deployment scenarios.
Target customers include e-commerce platforms, streaming services, and any organization with predictable traffic patterns that require guaranteed firewall throughput during peak periods.
While specific pricing wasn’t detailed in the announcement, prescaling will likely follow Azure Firewall’s existing pricing model where customers pay for provisioned capacity units, with costs varying by region and SKU tier.
When you combine these two announcements, they’re pretty good!

1:01:35 Public Preview: Environmental sustainability features in Azure API Management

Azure API Management introduces carbon-aware capabilities that allow organizations to route API traffic and adjust policy behavior based on carbon intensity data, helping reduce the environmental impact of API infrastructure operations.
The feature enables developers to implement sustainability-focused policies such as throttling non-critical API calls during high carbon intensity periods or routing traffic to regions with cleaner energy grids.
This aligns with Microsoft’s broader carbon negative commitment by 2030 and provides enterprises with tools to measure and reduce the carbon footprint of their digital services at the API layer.
Target customers include organizations with ESG commitments and sustainability reporting requirements who need granular control over their cloud infrastructure’s environmental impact.
Pricing details are not yet available for the preview, but the feature integrates with existing API Management tiers and will likely follow consumption-based pricing models when generally available.

1:02:44 Matt – “So APIMs are one, stupidly expensive. If you have to be on the premier tier, it’s like $2,700 a month. And then if you want HA, you have to have two of them. So whatever they’re doing to the hood is stupidly expensive. If you ever had to deal with the SharePoint, they definitely use them because I’ve hit the same error codes as we provide to customers. On the second side, when you do scale them, you can scale them to be multi-region APIMs in the paired region concept, so in theory, what you can do based on this is route a cheaper or more environmentally efficient one, you could route to your paired region and then have the traffic coming that way.”

1:06:09 Unlock insights about your data using Azure Storage Discovery

Azure Storage Discovery is now generally available as a fully managed service that provides enterprise-wide visibility into data estates across Azure Blob Storage and Data Lake Storage, helping organizations optimize costs, ensure security compliance, and improve operational efficiency across multiple subscriptions and regions.
The service integrates Microsoft Copilot in Azure to enable natural language queries for storage insights, allowing non-technical users to ask questions like “Show me storage accounts with default access tier as Hot above 1TiB with least transactions” and receive actionable visualizations without coding skills. Because a non-technical person is asking this question. In the ever-wise words of Marcia Brady, “Sure, Jan.”
Key capabilities include 18-month data retention for trend analysis, insights across capacity, activity, security configurations, and errors, with deployment taking less than 24 hours to generate initial insights from 15 days of historical data.
Pricing includes a free tier with basic capacity and configuration insights retained for 15 days, while the standard plan adds advanced activity, error, and security insights with 18-month retention – specific pricing varies by region at azure.microsoft.com/pricing/details/azure-storage-discovery.
Target use cases include identifying cost optimization opportunities through access tier analysis, ensuring security best practices by highlighting accounts still using shared access keys, and managing data redundancy requirements across global storage estates.

1:08:35 Ryan – “Well, I’ll tell you when I was looking for this report, I had a lot of natural language – and I was shouting it at my computer.”

1:09:52 Sora 2 in Azure AI Foundry: Create videos with responsible AI | Microsoft Azure Blog

Azure AI Foundry now offers OpenAI’s Sora 2 video generation model in public preview, enabling developers to create videos from text, images, and existing video inputs with synchronized audio in multiple languages.
The platform provides a unified environment combining Sora 2 with other generative models like GPT-image-1 and Black Forest Lab’s Flux 1.1, all backed by Azure’s enterprise security and content filtering for both inputs and outputs.
Key capabilities include realistic physics simulation, detailed camera control, and creative features for marketers, retailers, educators, and creative directors to rapidly prototype and produce video content within existing business workflows.
Sora 2 is currently available via API through Standard Global deployment in Azure AI Foundry, with pricing details available on the Azure AI Foundry Models page.
Microsoft positions this as part of their responsible AI approach, embedding safety controls and compliance frameworks to help organizations innovate while maintaining governance over generated content.
We’re not big fans of this one.

1:10:12 Grok 4 is now available in Microsoft Azure AI Foundry | Microsoft Azure Blog

Microsoft brings xAI’s Grok 4 model to Azure AI Foundry, featuring a 128K-token context window, native tool use, and integrated web search capabilities. The model emphasizes first-principles reasoning with a “think mode” that breaks down complex problems step-by-step, particularly excelling at math, science, and logic puzzles.
Grok 4’s extended context window allows processing of entire code repositories, lengthy research papers, or hundreds of pages of documents in a single query. This eliminates the need to manually chunk large inputs and enables comprehensive analysis across massive datasets without losing context.
Azure AI Content Safety is enabled by default for Grok 4, addressing enterprise concerns about responsible AI deployment. Microsoft and xAI conducted extensive safety testing and compliance checks over the past month to ensure business-ready protection layers.
Pricing starts at $2 per million input tokens and $10 per million output tokens for Grok 4, with faster variants available at lower costs.
The family includes Grok 4 Fast Reasoning for analytical tasks, Fast Non-Reasoning for lightweight operations, and Grok Code Fast 1 specifically for programming workflows.
The model’s real-time data integration allows it to retrieve and incorporate external information beyond its training data, functioning as an autonomous research assistant. This capability is particularly valuable for tasks requiring current information like market analysis or regulatory updates.

1:11:04 Generally Available: Enhanced cloning and Public IP retention scripts for Azure Application Gateway migration

Azure releases PowerShell scripts to help customers migrate from Application Gateway V1 to V2 before the April 2026 retirement deadline, addressing a critical infrastructure transition need.
The enhanced cloning script preserves configurations during migration while the Public IP retention script ensures customers can maintain their existing IP addresses, minimizing disruption to production workloads.
This migration tooling targets enterprises running legacy Application Gateway Standard or WAF SKUs who need to upgrade to Standard_V2 or WAF_V2 for continued support and access to newer features.
The scripts automate what would otherwise be a complex manual migration process, reducing the risk of configuration errors and downtime during the transition.
Customers should begin planning migrations now as the 2026 deadline approaches, with these scripts providing a standardized path forward for maintaining application delivery infrastructure.
You know would be even easier than PowerShell? How about just doing it for them? Too easy?
(Listener alert: This time it’s a Matt rant.)

Oracle

1:14:59 Oracle Expands AI Agent Studio for Fusion Applications with New Marketplace, LLMs, and Vast Partner Network

Oracle AI Agent Studio expands with new marketplace LLMs and partner integrations for Fusion Applications, allowing customers to build AI agents using models from Anthropic, Cohere, Meta, and others alongside Oracle’s own models.
The platform enables the creation of AI agents that can automate tasks across Oracle Fusion Cloud Applications, including ERP, HCM, and CX, with pre-built templates and low-code development tools for business users.
Oracle is partnering with major consulting firms like Accenture, Deloitte, and Infosys to help customers implement AI agents, though this likely means significant professional services costs for most deployments.
The AI agents can handle tasks like expense report processing, supplier onboarding, and customer service inquiries, with Oracle claiming reduced manual work by up to 50% in some use cases.
Pricing details remain unclear, but the service requires Oracle Fusion Applications subscriptions and likely additional fees for LLM usage and agent deployment based on Oracle’s typical pricing model.

1:15:45 Ryan – “They’re partnering with these giant firms that will come in with armies of engineers who will build you a thing – and hopefully document it before running away.”

Closing

326: Oracle Discovers the Dark Side (And Finally Has Cookies)

Thu, 23 Oct 2025 05:07:07 +0000

Welcome to episode 326 of The Cloud Pod, where the forecast is always cloudy! Justin and Ryan are your guides to all things cloud and AI this week! We’ve got news from SonicWall (and it’s not great), a host of goodbyes to say over at AWS, Oracle (finally) joins the dark side, and even Slurm – and you don’t even need to ride on a creepy river to experience it. Let’s get started!

Titles we almost went with this week

SonicWall’s Cloud Backup Service: From 5% to Oh No, That’s Everyone
AWS Spring Cleaning: 19 Services Get the Boot
The Great AWS Service Purge of 2025
Maintenance Mode: Where Good Services Go to Die
GitHub Gets Assimilated: Resistance to Azure Migration is Futile
Salesforce to Ransomware Gang: You Can’t Always Get What You Want
Kansas City Gets the Need for Speed with 100G Direct Connect. Peter, what are you up too
Gemini Takes the Wheel: Google’s AI Learns to Click and Type
Oracle Discovers the Dark Side (Finally Has Cookies)
Azure Goes Full Blackwell: 4,600 Reasons to Upgrade Your GPU Game
DataStax to the Future: AWS Hires Database CEO for Security Role
The Clone Wars: EBS Strikes Back with Instant Volume Copies
Slurm Dunk: AWS Brings HPC Scheduling to Kubernetes
The Great Cluster Convergence: When Slurm Met EKS
Codex sent me a DM that I’ll ignore too on Slack

General News

01:24 SonicWall: Firewall configs stolen for all cloud backup customers

SonicWall confirmed that all customers using their cloud backup service had firewall configuration files exposed in a breach, expanding from their initial estimate of 5% to 100% of cloud backup users. That’s a big difference…
The exposed backup files contain AES-256-encrypted credentials and configuration data, which could include MFA seeds for TOTP authentication, potentially explaining recent Akira ransomware attacks that bypassed MFA.
SonicWall requires affected customers to reset all credentials, including local user passwords, TOTP codes, VPN shared secrets, API keys, and authentication tokens across their entire infrastructure.
This incident highlights a fundamental security risk of cloud-based configuration backups where sensitive credentials are stored centrally, making them attractive targets for attackers.
The breach demonstrates why WebAuthn/passkeys offer superior security architecture since they don’t rely on shared secrets that can be stolen from backups or servers.
Interested in checking out their detailed remediation guidance? Find that here.

02:36 Justin – “You know, providing your own encryption keys is also good; not allowing your SaaS vendor to have the encryption key is a positive thing to do. There’s all kinds of ways to protect your data in the cloud when you’re leveraging a SaaS service.”

04:43 Take this rob and shove it! Salesforce issues stern retort to ransomware extort

Salesforce is refusing to pay ransomware demands from criminals claiming to have stolen nearly 1 billion customer records, stating they will not engage, negotiate with, or pay any extortion demand.
This firm stance sets a precedent for how major cloud providers handle ransomware attacks.
The stolen data appears to be from previous breaches rather than new intrusions, specifically from when ShinyHunters compromised Salesloft’s Drift application earlier this year.
The attackers used stolen OAuth tokens to access multiple companies’ Salesforce instances.
The incident highlights the security risks of third-party integrations in cloud environments, as the breach originated through a compromised integration app rather than Salesforce’s core platform.
This demonstrates how supply chain vulnerabilities can expose customer data across multiple organizations.
Scattered LAPSUS$ Hunters set an October 10 deadline for payment and offered $10 in Bitcoin to anyone willing to harass executives of affected companies. This unusual tactic shows evolving extortion methods beyond traditional ransomware encryption.
Salesforce maintains there’s no indication their platform has been compromised, and no known vulnerabilities in their technology were exploited. The company is working with external experts and authorities while supporting affected customers through the incident.

06:31 Ryan – “I do also really like Salesforce’s response, just because I feel like the ransomware has gotten a little out of hand, and I think a lot of companies are quiet quietly sort of paying these ransoms, which has only made the attacks just skyrocket. So making a big public show of saying we’re not going to pay for this is, is a good idea.”

AI is Going Great – Or How ML Makes Money

07:06 Introducing AgentKit

OpenAI’s AgentKit provides a framework for building and managing AI agents with simplified deployment and customization options, addressing the growing need for autonomous AI systems in cloud environments.
The tool integrates with existing OpenAI technologies and supports multiple programming languages, enabling developers to create agents that can interact with various cloud services and APIs without extensive infrastructure setup.
AgentKit’s architecture allows for efficient agent lifecycle management, including deployment, monitoring, and behavior customization, which could reduce operational overhead for businesses running AI workloads at scale.
Key use cases include automated customer service agents, data processing pipelines, and intelligent workflow automation that can adapt to changing conditions in cloud-native applications.
This development matters for cloud practitioners as it potentially lowers the barrier to entry for implementing sophisticated AI agents while providing the scalability and reliability expected in enterprise cloud deployments

09:03 Codex Now Generally Available

OpenAI’s Codex is now generally available, offering GPT-3-based AI that’s fine-tuned specifically for code generation and understanding across multiple programming languages. This represents a significant advancement in AI-assisted development tools becoming mainstream.
Several new features, A new Slack integration: Delegate tasks or ask questions to Codex directly from a team channel or thread, just like a coworker
Codex SDK to embed the same agent that powers Codex CLI to your own workflows, tools, and apps for state-of-the-art performance on GPT-5-Codex without more tuning
New Admin tools with environment controls, monitoring, and analytics dashboards. ChatGPT workspace admins now have more control

09:48 Ryan – “I don’t know why, but something about having it available in Slack to boss it around sort of rubs me the wrong way. I feel like it’s the poor new college grad joining the team – it’s just delegated all the crap jobs.”

10:14 Introducing the Gemini 2.5 Computer Use model

Google released Gemini 2.5 Computer Use model via Gemini API, enabling AI agents to interact with graphical user interfaces through clicking, typing, and scrolling actions – available in Google AI Studio and Vertex AI for developers to build automation agents.
The model operates in a loop using screenshots and action history to navigate web pages and applications, outperforming competitors on web and mobile control benchmarks while maintaining the lowest latency among tested solutions.
Built-in safety features include per-step safety service validation and system instructions to prevent high-risk actions like bypassing CAPTCHA or compromising security, with developers able to require user confirmation for sensitive operations.
Early adopters, including Google teams, use it for UI testing and workflow automation, with the model already powering Project Mariner, Firebase Testing Agent, and AI Mode in Search – demonstrating practical enterprise applications.
This represents a shift from API-only interactions to visual UI control, enabling automation of tasks that previously required human interaction like form filling, dropdown navigation, and operating behind login screens.

11:48 Ryan – “I think this is the type of thing that really is going to get AI to be as big as the Agentic model in general; having it be able to understand click and UIs and operate on people’s behalf. It’s going to open up just a ton of use cases for it.”

AWS

12:35 AWS Service Availability Change Announcement

AWS is moving 19 services to maintenance mode starting November 7, 2025, including Amazon Glacier, AWS CodeCatalyst, and Amazon Fraud Detector – existing customers can continue using these services but new customers will be blocked from adoption.
Several migration-focused services are being deprecated, including AWS Migration Hub, AWS Application Discovery Service, and AWS Mainframe Modernization Service, signaling AWS may be consolidating or rethinking its migration tooling strategy.
The deprecation of Amazon S3 Object Lambda and Amazon Cloud Directory suggests AWS is streamlining overlapping functionality – customers will need to evaluate alternatives like Lambda@Edge or AWS Directory Service for similar capabilities.
AWS Snowball Edge Compute Optimized and Storage Optimized entering maintenance indicates AWS is likely pushing customers toward newer edge computing solutions like AWS Outposts or Local Zones for hybrid deployments.
The sunset of specialized services like AWS HealthOmics Variant Store and AWS IoT SiteWise Monitor shows AWS pruning niche offerings that may have had limited adoption or overlapping functionality with other services.

13:53 Ryan – “It’s interesting, because I was a heavy user of CodeGuru and CodeCatalyst for a while, so the announcement I got as a customer was a lot less friendly than maintenance mode. It was like, your stuff’s going to end. So I don’t know if it’s true across all these services, but I know with at least those two. I did not get one for Glacier – because I also have a ton of stuff in Glacier, because I’m cheap.”

17:01 AWS Direct Connect announces 100G expansion in Kansas City, MO

AWS Direct Connect now offers 100 Gbps dedicated connections with MACsec encryption at the Netrality KC1 data center in Kansas City, expanding high-bandwidth private connectivity options in the central US region.
The Kansas City location provides direct network access to all public AWS Regions (except China), AWS GovCloud Regions, and AWS Local Zones, making it a strategic connectivity hub for enterprises in the Midwest.
With 100G connections and MACsec encryption, organizations can achieve lower latency and enhanced security for workloads requiring high throughput, such as data analytics, media processing, or hybrid cloud architectures.
This expansion brings AWS Direct Connect to over 146 locations worldwide, reinforcing AWS’s commitment to providing enterprises with reliable alternatives to internet-based connectivity for mission-critical applications.
For businesses evaluating Direct Connect, the 100G option typically suits large-scale data transfers and enterprises with substantial bandwidth requirements, while the 10G option remains available for more moderate connectivity needs.

18:07 AWS IAM Identity Center now supports customer-managed KMS keys for encryption at rest | AWS News Blog

AWS IAM Identity Center now supports customer-managed KMS keys for encrypting identity data at rest, giving organizations in regulated industries full control over encryption key lifecycle, including creation, rotation, and deletion. This addresses compliance requirements for customers who previously could only use AWS-owned keys.
The feature requires symmetric KMS keys in the same AWS account and region as the Identity Center instance, with multi-region keys recommended for future flexibility. Implementation involves creating the key, configuring detailed permissions for Identity Center services and administrators, and updating IAM policies for cross-account access.
Not all AWS managed applications currently support Identity Center with customer-managed keys – administrators must verify compatibility before enabling to avoid service disruptions. The documentation provides specific policy templates for common use cases, including delegated administrators and application administrators.
Standard AWS KMS pricing applies for key storage and API usage while Identity Center remains free. The feature is available in all AWS commercial regions, GovCloud, and China regions.
Key considerations include the critical nature of proper permission configuration – incorrect setup can disrupt Identity Center operations and access to AWS accounts. Organizations should implement encryption context conditions to restrict key usage to specific Identity Center instances for enhanced security.

18:52 Justin – “Encrypt setup can disrupt Identity Center operations, like revoking your encryption key, might be bad for your access to your cloud. So be careful with this one.”

19:28 New general-purpose Amazon EC2 M8a instances are now available | AWS News Blog

AWS launches M8a instances powered by 5th Gen AMD EPYC Turin processors, delivering up to 30% better performance and 19% better price-performance than M7a instances for general-purpose workloads.
The new instances feature 45% more memory bandwidth and 50% improvements in networking (75 Gbps) and EBS bandwidth (60 Gbps), making them suitable for financial applications, gaming, databases, and SAP-certified enterprise workloads.
M8a introduces instance bandwidth configuration (IBC), allowing customers to flexibly allocate resources between networking and EBS bandwidth by up to 25%, optimizing for specific workload requirements.
Each vCPU maps to a physical CPU core without SMT, resulting in up to 60% faster GroovyJVM performance and 39% faster Cassandra performance compared to M7a instances.
Available in 12 sizes from small to metal-48xl (192 vCPU, 768GiB RAM) across three regions initially, with standard pricing options including On-Demand, Savings Plans, and Spot instances.

20:01 Ryan – “That’s a big one! I still don’t have a use case for it.”

20:09 Announcing Amazon Quick Suite: your agentic teammate for answering questions and taking action | AWS News Blog

Amazon Quick Suite combines AI-powered research, business intelligence, and automation into a single workspace, eliminating the need to switch between multiple applications for data gathering and analysis.
The service includes Quick Research for comprehensive analysis across enterprise and external sources, Quick Sight for natural language BI queries, and Quick Flows/Automate for process automation.
Quick Index serves as the foundational knowledge layer, creating a unified searchable repository across databases, documents, and applications that powers AI responses throughout the suite. This addresses the common enterprise challenge of fragmented data sources by consolidating everything from S3, Snowflake, Google Drive, and SharePoint into one intelligent knowledge base.
The automation capabilities are split between Quick Flows for business users (natural language workflow creation) and Quick Automate for technical teams (complex multi-department processes with approval routing and system integrations).
Both tools generate workflows from simple descriptions, but Quick Automate handles enterprise-scale processes like customer onboarding with advanced orchestration and monitoring.
Existing Amazon QuickSight customers will be automatically upgraded to Quick Suite with all current BI capabilities preserved under the “Quick Sight” branding, maintaining the same data connectivity, security controls, and user permissions. Pricing follows a per-user subscription model with consumption-based charges for Quick Index and optional features.
The service introduces “Spaces” for contextual data organization and custom chat agents that can be configured for specific departments or use cases, enabling teams to create tailored AI assistants connected to relevant datasets and workflows. This allows organizations to scale from personal productivity tools to enterprise-wide deployment while maintaining access controls.

22:13 Justin – “This is a confusing product. It’s doing a lot of things, probably kind of poorly.”

23:13 AWS Strengthens AI Security by Hiring Ex-DataStax CEO As New VP – Business Insider

AWS hired Chet Kapoor, former DataStax CEO, as VP of Security Services and Observability, reporting directly to CEO Matt Garman, to strengthen security offerings as AWS expands its AI business.
Kapoor brings experience from DataStax, where he led Astra DB development and integrated real-time AI capabilities, positioning him to address the security challenges of increasingly complex cloud deployments.
The role consolidates leadership of security services, governance, and operations portfolios under one executive, with teams from Gee Rittenhouse, Nandini Ramani, Georgia Sitaras, and Brad Marshall now reporting to Kapoor.
This hire follows recent AWS leadership changes, including the departures of VP of AI Matt Wood and VP of generative AI Vasi Philomin, signaling AWS’s focus on strengthening AI security expertise.
Kapoor will work alongside AWS CISO Amy Herzog to develop security and observability services that address what Garman describes as changing requirements driven by AI adoption.

26:03 Justin – “Also, DataStax was bought by IBM – and everyone knows that anything bought by IBM will be killed mercilessly.”

26:50 Amazon Bedrock AgentCore is now generally available

Amazon Bedrock AgentCore provides a managed platform for building and deploying AI agents that can execute for up to 8 hours with complete session isolation, supporting any framework like CrewAI, LangGraph, or LlamaIndex, and any model inside or outside Amazon Bedrock.
The service includes five core components: Runtime for execution, Memory for state management, Gateway for tool integration via Model Context Protocol, Identity for OAuth and IAM authorization, and Observability with CloudWatch dashboards and OTEL compatibility for monitoring agents in production.
AgentCore enables agents to communicate with each other through Agent-to-Agent protocol support and securely act on behalf of users with identity-aware authorization, making it suitable for enterprise automation scenarios that require extended execution times and complex tool interactions.
The platform eliminates infrastructure management while providing enterprise features like VPC support, AWS PrivateLink, and CloudFormation templates, with consumption-based pricing and no upfront costs across nine AWS regions.
Integration with existing observability tools like Datadog, Dynatrace, and LangSmith allows teams to monitor agent performance using their current toolchain, while the self-managed memory strategy gives developers control over how agents store and process information.

28:17 Ryan – “This really to me, seems like a full app, you know, like this is a core component instead of doing development; you’re just taking AI agents, putting them together, and giving them tasks. Then, the eight-hour runtime is crazy. It feels like it’s getting warmer in here just reading that.”

28:49 AWS’ Custom Chip Now Powers Most of Its Key AI Cloud Service — The Information

AWS has transitioned the majority of its AI inference workloads to its custom Inferentia chips, marking a significant shift away from Nvidia GPUs for production AI services.
The move demonstrates AWS’s commitment to vertical integration and cost optimization in the AI infrastructure space.
Inferentia chips now handle most inference tasks for services like Amazon Bedrock, SageMaker, and internal AI features across AWS products.
This custom silicon strategy allows AWS to reduce dependency on expensive third-party GPUs while potentially offering customers lower-cost AI inference options.
The shift to Inferentia represents a broader industry trend where cloud providers develop custom chips to differentiate their services and control costs. AWS can now optimize the entire stack from silicon to software for specific AI workloads, similar to Apple’s approach with its M-series chips.
For AWS customers, this transition could mean more predictable pricing and better performance-per-dollar for inference workloads. The custom chips are specifically designed for inference rather than training, making them more efficient for production AI applications.
This development positions AWS to compete more effectively with other cloud providers on AI pricing while maintaining control over its technology roadmap.
Customers running inference-heavy workloads may see cost benefits as AWS passes along savings from reduced reliance on Nvidia hardware

29:39 Ryan – “Explains all the Oracle and Azure Nvidia announcements.”

30:16 Introducing Amazon EBS Volume Clones: Create instant copies of your EBS volumes | AWS News Blog

Amazon EBS Volume Clones enables instant point-in-time copies of encrypted EBS volumes within the same Availability Zone through a single API call, eliminating the previous multi-step process of creating snapshots in S3 and then new volumes.
Cloned volumes are available within seconds with single-digit millisecond latency, though performance during initialization is limited to the lowest of: 3,000 IOPS/125 MiB/s baseline, source volume performance, or target volume performance.
This feature targets development and testing workflows where teams need quick access to production data copies, but it complements rather than replaces EBS snapshots, which remain the recommended backup solution with 11 nines durability in S3.
Pricing includes a one-time fee per GiB of source volume data at initiation, plus standard EBS charges for the new volume, making cost governance important since cloned volumes persist independently until manually deleted.
The feature currently requires encrypted volumes and operates only within the same Availability Zone, supporting all EBS volume types across AWS commercial regions and select Local Zones.

32:06 Running Slurm on Amazon EKS with Slinky | Containers

AWS introduces Slinky, an open source project that lets you run Slurm workload manager inside Amazon EKS, enabling organizations to manage both traditional HPC batch jobs and modern Kubernetes workloads on the same infrastructure without maintaining separate clusters.
The solution deploys Slurm components as Kubernetes pods with slurmctld on general-purpose nodes and slurmd on GPU/accelerated nodes, supporting features like auto-scaling worker pods based on job queues and integration with Karpenter for dynamic EC2 provisioning.
Key benefit is resource optimization – AI inference workloads can scale during business hours while training jobs scale overnight using the same compute pool, with teams able to use familiar Slurm commands (sbatch, srun) alongside Kubernetes APIs.
Slinky provides an alternative to AWS ParallelCluster (self-managed), AWS PCS (managed Slurm), and SageMaker HyperPod (ML-optimized) for organizations already standardized on EKS who need deterministic scheduling for long-running jobs.
The architecture supports custom container images, allowing teams to package specific ML dependencies (CUDA, PyTorch versions) directly into worker pods, eliminating manual environment management while maintaining reproducibility across environments.

GCP

33:09 Introducing Gemini Enterprise | Google Cloud Blog

Google launches Gemini Enterprise as a unified AI platform that combines Gemini models, no-code agent building, pre-built agents, data connectors for Google Workspace and Microsoft 365, and centralized governance through a single chat interface.
This positions Google as offering a complete AI stack, rather than just models or toolkits like competitors.
The platform includes notable integrations with Microsoft 365 and SharePoint environments while offering enhanced features when paired with Google Workspace, including new multimodal agents for video creation (Google Vids with 2.5M monthly users) and real-time speech translation in Google Meet. This cross-platform approach differentiates it from more siloed offerings.
Google introduces next-generation conversational agents with a low-code visual builder supporting 40+ languages, powered by the latest Gemini models for natural voice interactions and deep enterprise integration.
Early adopters like Commerzbank report 70% inquiry resolution rates, and Mercari projects 500% ROI through 20% workload reduction.
The announcement includes new developer tools like Gemini CLI (1M+ developers in 3 months) with extensions from Atlassian, GitLab, MongoDB, and others, plus industry protocols for agent interoperability (A2A), payments (AP2), and model context (MCP).
This creates a foundation infrastructure for an agent economy where developers can monetize specialized agents.
Google’s partner ecosystem includes 100,000+ partners with expanded integrations for Box, Salesforce, ServiceNow, and deployment support from Accenture, Deloitte, and others.
The company also launches Google Skills training platform and GEAR program to train 1 million developers, addressing the critical skills gap in enterprise AI adoption.

35:01 Justin – “I think both Azure and Amazon have similar problems; they are rushing so fast to make products, that they’re creating the same products over and over again, just with slightly different limitations or use cases.”

36:05 Introducing LLM-Evalkit | Google Cloud Blog

Google releases LLM-Evalkit, an open-source framework that centralizes prompt engineering workflows on Vertex AI, replacing the current fragmented approach of managing prompts across multiple documents and consoles.
The tool shifts prompt development from subjective testing to data-driven iteration by requiring teams to define specific problems, create test datasets, and establish concrete metrics for measuring LLM performance.
LLM-Evalkit features a no-code interface designed to democratize prompt engineering, allowing non-technical team members like product managers and UX writers to contribute to the development process.
The framework integrates directly with Vertex AI SDKs and provides versioning, benchmarking, and performance tracking capabilities in a single application, addressing the lack of standardized evaluation processes in current workflows.
Available now on GitHub as an open-source project, with additional evaluation features accessible through the Google Cloud console, though specific pricing details are not mentioned in the announcement.

37:09 Ryan – “Reading through this announcement, it’s solving a problem I had – but I didn’t know I had.”

38:17 Announcing enhancements to Google Cloud NetApp Volumes | Google Cloud Blog

Google Cloud NetApp Volumes now supports iSCSI block storage alongside file storage, enabling enterprises to migrate SAN workloads to GCP without architectural changes.
The service delivers up to 5 GiB/s throughput and 160K IOPS per volume with independent scaling of capacity, throughput, and IOPS.
NetApp FlexCache provides local read caches of remote volumes for distributed teams and hybrid cloud deployments.
This allows organizations to access shared datasets with local-like performance across regions, supporting compute bursting scenarios that require low-latency data access.
The service now integrates with Gemini Enterprise as a data store for RAG applications, allowing organizations to ground AI models on their secure enterprise data without complex ETL processes.
Data remains governed within NetApp Volumes while being accessible for search and inference workflows.
Auto-tiering automatically moves cold data to lower-cost storage at $0.03/GiB for the Flex service level, with configurable thresholds from 2-183 days. Large-capacity volumes now scale from 15TiB to 3PiB with over 21GiB/s throughput per volume for HPC and AI workloads.
NetApp SnapMirror enables replication between on-premises NetApp systems and Google Cloud with zero RPO and near-zero RTO.
This positions GCP competitively against AWS FSx for NetApp ONTAP and Azure NetApp Files for enterprise storage migrations.

40:30 Justin – “I have a specific workload that needs storage, that’s shared across boxes, and iSCSI is a great option for that, in addition to other methods you could use that I’m currently using, which have some sharp edges. So I’m definitely going to do some price calculation models. This might be good, because Google has multi-writer files, like EBS-type solutions, but does not have the performance that I need quite yet.”

Azure

41:08 GitHub Will Prioritize Migrating to Azure Over Feature Development – The New Stack

GitHub is migrating its entire infrastructure from its Virginia data center to Azure within 24 months, with teams being asked to delay feature development to focus on this migration due to capacity constraints from AI and Copilot workloads.
The migration represents a significant shift from GitHub’s previous autonomy since Microsoft’s 2018 acquisition, with GitHub losing independence after CEO Thomas Dohmke’s departure and being folded deeper into Microsoft’s organizational structure.
Technical challenges include migrating GitHub’s MySQL clusters that run on bare metal servers to Azure, which some employees worry could lead to more outages during the transition period, given recent service disruptions.
This positions Azure to capture one of the world’s largest developer platforms as a flagship customer, demonstrating Azure’s ability to handle massive scale workloads while potentially raising concerns among open source developers about tighter Microsoft integration.
The move highlights how AI workloads are straining traditional infrastructure, with GitHub citing “existential” needs to scale for AI and Copilot demands, showing how generative AI is forcing major architectural decisions across the industry.

43:17 Ryan – “I just hope the service stays up; it’s so disruptive to my day job when GitHub has issues.”

43:33 Microsoft 365 services fall over in North America • The Register

Microsoft 365 experienced a North American outage on October 9, lasting just over an hour, caused by misconfigured network infrastructure that affected all services, including Teams, highlighting the fragility of centralized cloud services when configuration errors occur.
This incident followed another Azure outage where Kubernetes crashes took down Azure Front Door instances, suggesting potential systemic issues with Microsoft’s infrastructure management and configuration processes that enterprise customers should factor into their reliability planning.
Users reported that switching to backup circuits restored services, and some attributed issues to AT&T’s network, demonstrating the importance of multi-path connectivity and diverse network providers for mission-critical cloud services.
Microsoft’s response involved rerouting traffic to healthy infrastructure and analyzing configuration policies to prevent future incidents, though the lack of detailed root cause information raises questions about transparency and whether customers have sufficient visibility into infrastructure dependencies.
The back-to-back outages underscore why organizations need robust disaster recovery plans beyond single cloud providers, as even brief disruptions to productivity tools like Teams can significantly impact business operations across entire regions.

44:17 Introducing Microsoft Agent Framework | Microsoft Azure Blog

Microsoft Agent Framework converges AutoGen research project with Semantic Kernel into a unified open-source SDK for orchestrating multi-agent AI systems, addressing the fragmentation challenge as 80% of enterprises now use agent-based AI according to PwC.
The framework enables developers to build locally and then deploy to Azure AI Foundry with built-in observability, durability, and compliance, while supporting integration with any API via OpenAPI and cross-runtime collaboration through Agent2Agent protocol.
Azure AI Foundry now provides unified observability across multiple agent frameworks, including LangChain, LangGraph, and OpenAI Agents SDK, through OpenTelemetry contributions, positioning it as a comprehensive platform compared to AWS Bedrock or GCP Vertex AI’s more limited agent support.
Voice Live API reaches general availability, offering a unified real-time speech-to-speech interface that integrates STT, generative AI, TTS, and avatar capabilities in a single low-latency pipeline for building voice-enabled agents.
New responsible AI capabilities in public preview include task adherence, prompt shields with spotlighting, and PII detection, addressing McKinsey’s finding that the lack of governance tools is the top barrier to AI adoption.

44:48 Justin – “We continue to be in a world of confusion around Agentic and out of control of Agentic things.”

45:54 NVIDIA GB300 NVL72: Next-generation AI infrastructure at scale | Microsoft Azure Blog

Microsoft deployed the first production cluster with over 4,600 NVIDIA GB300 NVL72 systems featuring Blackwell Ultra GPUs, enabling AI model training in weeks instead of months and supporting models with hundreds of trillions of parameters.
This positions Azure as the first cloud provider to deliver Blackwell Ultra at scale for production workloads.
Each ND GB300 v6 VM rack contains 72 GPUs with 130TB/second of NVLink bandwidth and 37TB of fast memory, delivering up to 1,440 PFLOPS of FP4 performance.
The system uses 800 Gbps NVIDIA Quantum-X800 InfiniBand for cross-rack connectivity, doubling the bandwidth of previous GB200 systems.
The infrastructure targets frontier AI workloads, including reasoning models, agentic AI systems, and multimodal generative AI, with OpenAI already using these clusters for training and deploying their largest models.
This gives Azure a competitive edge over AWS and GCP in supporting next-generation AI workloads.
Azure implemented custom cooling systems using standalone heat exchangers and new power distribution models to handle the high energy density requirements of these dense GPU clusters.
The co-engineered software stack optimizes storage, orchestration, and scheduling for supercomputing scale.
While pricing wasn’t disclosed, the scale and specialized nature of these VMs suggest they’ll target enterprise customers and AI research organizations requiring cutting-edge performance for training trillion-parameter models. Azure plans to deploy hundreds of thousands of Blackwell Ultra GPUs globally.

47:24 Ryan – “Pricing isn’t disclosed because it’s the GDP of a small country.”

48:05 Generally Available: CLI command for migration from Availability Sets and basic load balancer on AKS

Thanks for the timely heads up on this one…
Azure introduces a single CLI command to migrate AKS clusters from deprecated Availability Sets to Virtual Machine Scale Sets before the September 2025 deadline, simplifying what would otherwise be a complex manual migration process.
The automated migration upgrades clusters from basic load balancers to standard load balancers, providing improved reliability, zone redundancy, and support for up to 1000 nodes compared to the basic tier’s 100-node limit.
This positions Azure competitively with AWS EKS and GCP GKE, which already use more modern infrastructure patterns by default, though Azure’s migration tool reduces the operational burden for existing customers.
Organizations running production AKS workloads on Availability Sets should prioritize testing this migration in non-production environments first, as the process involves recreating node pools, which could impact running applications.
While the migration itself has no direct cost, customers will see increased charges from standard load balancers (approximately $0.025/hour plus data processing fees) compared to free basic load balancers.

49:01 Ryan – “This is why you drag your feet on getting off of everything.”

Oracle

49:12 Announcing Dark Mode For The OCI Console

Oracle finally joins the dark mode club with OCI Console, following years behind AWS (2017), Azure (2019), and GCP (2020) – a basic UI feature that took surprisingly long for a major cloud provider to implement.
The feature allows users to toggle between light and dark themes in the console settings, with Oracle claiming it reduces eye strain and improves battery life on devices – standard benefits that every other cloud provider has been touting for years.
Dark mode persists across browser sessions and devices when logged into the same OCI account, though Oracle hasn’t specified if this preference syncs across different OCI regions or tenancies.
While this is a welcome quality-of-life improvement for developers working late hours, it highlights Oracle’s ongoing challenge of playing catch-up on basic console features that competitors have long considered table stakes.
The rollout appears to be gradual with no specific timeline mentioned, and Oracle provides no details about API or CLI theme preferences, suggesting this is purely a web console enhancement.

Closing

325: Db2 or Not Db2: That Is the Backup Question

Thu, 16 Oct 2025 20:10:12 +0000

Welcome to episode 325 of The Cloud Pod, where the forecast is always cloudy! Justin is on vacation this week, so it’s up to Ryan and Matthew to bring you all the latest news in cloud and AI, and they definitely deliver! This week we have an AWS invoice undo button, Sora 2, and quite a bit of news DigitalOcean – plus so much more. Let’s get started!

Titles we almost went with this week

AWS Shoots for the Cloud with NBA Partnership
Nothing But Net: AWS Scores Big with Basketball AI Deal
From Courtside to Cloud-side: AWS Dunks on Sports Analytics
PostgreSQL Gets a Gemini Twin for Natural Language Queries
Fuzzy Logic: When Your Database Finally Speaks Your Language
CLI and Let AI: Google’s Natural Language Database Assistant
Satya’s Org Chart Shuffle: Now with More AI Synergy
Microsoft Reorgs Again: This Time It’s Personal (and Commercial)
Ctrl+Alt+Delete: Microsoft Reboots Its Sales Machine
Sora 2: The Sequel Nobody Asked For But Everyone Will Use
OpenAI Puts the “You” in YouTube (AI Edition)
Sam Altman Stars in His Own AI-Generated Reality Show
Grok and Roll: Microsoft’s New AI Model Rocks Azure
To Grok or Not to Grok: That is the Question
Grok Around the Clock: Azure’s 24/7 Reasoning Machine
Spark Joy: Google Lights Up ML Inference for Data Pipelines
DigitalOcean’s Storage Trinity: Hot, Cold, and Backed Up
NFS: Not For Suckers (Network File Storage)
The Goldilocks Storage Strategy: Not Too Hot, Not Too Cold, Just Right
NAT Gonna Cost You: DigitalOcean’s Gateway to Savings
BYOIP: Bring Your Own IP (But Leave Your Billing Worries Behind)
The Great Invoice Escape: No More Support Tickets Required Ctrl+Z for Your AWS Bills: The Undo Button Finance Teams Needed
Image Builder Finally Learns When to Stop Trying
Pipeline Dreams: Now With Built-in Reality Checks
EC2 Image Builder Gets a Failure Intervention Feature
MCP: Model Context Protocol or Marvel Cinematic Protocol?

AI is Going Great – Or How ML Makes Money

00:45 OpenAI’s Sora 2 lets users insert themselves into AI videos with sound – Ars Technica

OpenAI’s Sora 2 introduces synchronized audio generation alongside video synthesis, matching Google’s Veo 3 and Alibaba’s Wan 2.5 capabilities.
This positions OpenAI competitively in the multimodal AI space with what they call their “GPT-3.5 moment for video.”
The new iOS social app feature allows users to insert themselves into AI-generated videos through “cameos,” suggesting potential applications for personalized content creation and social media integration at scale.
Sora 2 demonstrates improved physical accuracy and consistency across multiple shots, addressing previous limitations where objects would teleport or deform unrealistically.
The model can now simulate complex movements like gymnastics routines while maintaining proper physics.
The addition of “sophisticated background soundscapes, speech, and sound effects” expands potential enterprise use cases for automated video production, training materials, and marketing content generation without separate audio post-processing.
This development signals increasing competition in the video synthesis market, with major cloud providers likely to integrate similar capabilities into their AI services portfolios to meet growing demand for automated content creation tools.

02:04 Matt – “So, before, when you could sort of trust social media videos, now you can’t anymore.”

03:25 Jules introduces new tools and API for developers

Google’s Jules AI coding agent now offers command-line access through Jules Tools and an API for direct integration into developer workflows, moving beyond its original chat interface to enable programmatic task automation.
The Jules API allows developers to trigger coding tasks from external systems like Slack bug reports or CI/CD pipelines, enabling automated code generation, bug fixes, and test writing as part of existing development processes.
Recent updates include file-specific context selection, persistent memory for user preferences, and structured environment variable management, addressing reliability issues that previously limited production use.
This positions Jules as a workflow automation tool rather than just a coding assistant, competing with GitHub Copilot and Amazon CodeWhisperer by focusing on asynchronous task execution rather than inline code completion.
The shift to API-based access enables enterprises to integrate AI coding assistance into their existing toolchains without requiring developers to switch contexts or adopt new interfaces.

04:41 Matt – “We’re just adding to the tools; then we need to figure out which one is gong to be actually useful for you.”

05:17 OpenAI Doubles Down on Chip Diversity With AMD, Nvidia Deals –Business Insider

OpenAI signed a multi-year deal with AMD for chips requiring up to 6 gigawatts of power, plus an option to acquire tens of billions in AMD stock, diversifying beyond its heavy reliance on Nvidia GPUs accessed through Microsoft Azure.
The AMD partnership joins recent deals including 10 gigawatts of Nvidia GPUs with $100 billion investment, a Broadcom partnership for custom AI chips in 2025, and a $300 billion Oracle compute deal, signaling OpenAI’s strategy to secure diverse hardware supply chains.
This diversification could benefit the broader AI ecosystem by increasing competition in the AI chip market, potentially lowering prices and reducing supply chain vulnerabilities from geopolitical risks or natural disasters.
AMD expects tens of billions in revenue from the deal, marking a significant validation of their AI technology in a market where Nvidia holds dominant market share, while OpenAI gains negotiating leverage and supply redundancy.
These massive infrastructure investments serve as demand signals for continued AI growth, though they concentrate risk on OpenAI’s success – if OpenAI fails to grow as projected, it could impact multiple chip manufacturers and the broader AI infrastructure buildout.

06:51 Ryan – “I’m stuck on this article sort of gigawatts of power as a unit of measurement for GPU. Like, that’s hilarious to me. we’re just, there’s not this many, not this many GPUs, but like this much in power of GPUs.”

AWS

07:55 AWS to Become the Official Cloud and Cloud AI Partner of the NBA, WNBA, NBA G League, Basketball Africa League and NBA Take-Two Media

AWS becomes the official cloud and AI partner for NBA, WNBA, and affiliated leagues, launching “NBA Inside the Game powered by AWS” – a new basketball intelligence platform that processes billions of data points using Amazon Bedrock and SageMaker to deliver real-time analytics and insights during live games.
The platform introduces AI-powered advanced statistics that analyze 29 data points per player using machine learning to generate previously unmeasurable performance metrics, with initial stats rolling out during the 2025-26 season accessible via NBA App, NBA.com, and Prime Video broadcasts.
Play Finder” technology uses AI to analyze player movements across thousands of games, enabling instant search and retrieval of similar plays for broadcasters and eventually allowing teams direct access to ML models for coaching and front office workflows.
The NBA App, NBA.com, and NBA League Pass will run entirely on AWS infrastructure, supporting global fan engagement with personalized, in-language content delivery while complementing Amazon’s 11-year media rights agreement for 66 regular-season games on Prime Video.
This partnership demonstrates AWS’s expanding role in sports analytics beyond traditional cloud infrastructure, showcasing how AI services like Bedrock and SageMaker can transform real-time data processing for consumer-facing applications at massive scale.

10:51 Ryan – “I do like the AI analytics for sports, like AWS is already in the NFL and F! Racings and it’s sort of a neat add-on when they integrate it.”

12:45 AWS Introduces self-service invoice correction feature

AWS launches self-service invoice correction feature allowing customers to instantly update purchase order numbers, business legal names, and addresses on their invoices through the Billing and Cost Management console without contacting support.
This addresses a common pain point for enterprise customers who need accurate invoices for accounting compliance and reduces manual support ticket volume for AWS teams.
The guided workflow in the console lets customers update both their account settings and select existing invoices, providing immediate corrected versions for download.
Available in all AWS regions except GovCloud and China regions, making it accessible to most commercial AWS customers globally.
Particularly valuable for organizations with strict procurement processes or those who’ve undergone mergers, acquisitions, or address changes that require invoice updates for proper expense tracking.

17:53 EC2 Image Builder now provides enhanced capabilities for managingimage pipelines

EC2 Image Builder now automatically disables pipelines after consecutive failures, preventing unnecessary resource creation and reducing costs from repeatedly failed builds – a practical solution for teams dealing with flaky build processes.
The new custom log group configuration allows teams to set specific retention periods and encryption settings for pipeline logs, addressing compliance requirements and giving better control over log management costs.
This update targets a common pain point where failed image builds could run indefinitely, consuming resources and generating costs without producing usable outputs – particularly valuable for organizations running frequent automated builds.
The features are available at no additional cost across all AWS commercial regions including China and GovCloud, making them immediately accessible for existing Image Builder users through Console, CLI, API, CloudFormation, or CDK.
These enhancements position Image Builder as a more mature CI/CD tool for AMI creation, competing more effectively with third-party solutions by addressing operational concerns around cost control and logging flexibility.

16:22 Matt – “I just like this because it automatically disables the pipeline, and I feel like this is more for all those old things that you forgot about that are running that just keep triggering daily that break at one point – or you hope break, so you actually don’t keep spending the money on them. That’s a pretty nice feature, in my opinion, there where it just stops it from running forever.”

18:26 Open Source Model Context Protocol (MCP) Server now available for AmazonBedrock AgentCore

AWS releases an open-source Model Context Protocol (MCP) server for Amazon Bedrock AgentCore, providing a standardized interface for developers to build, analyze, and deploy AI agents directly in their development environments with one-click installation support for IDEs like Kiro, Claude Code, Cursor, and Amazon Q Developer CLI.
The MCP server enables natural language-driven agent development, allowing developers to iteratively build agents and transform agent logic to work with the AgentCore SDK before deploying to development accounts, streamlining the path from prototype to production.
This integration addresses the complexity of AI agent development by providing a unified protocol that works across multiple development tools, reducing the friction between local development and AWS deployment while maintaining security and scale capabilities.
Available globally via GitHub, the MCP server represents AWS’s commitment to open-source tooling for generative AI development, complementing the broader AgentCore platform which handles secure deployment and operation of AI agents at scale.
For businesses looking to implement AI agents, this reduces development time and technical barriers while maintaining enterprise-grade security and scalability, with pricing following the standard Amazon Bedrock AgentCore model.

20:50 Ryan- “This is one of those things where I’m a team of one right now doing a whole bunch of snowflake development of internal services, and so I’m like, what’s this for? I don’t understand the problem. But I can imagine that this is something that’s really useful more when you’re spreading out against teams so that you can get unification on some of these things, because if you have a team of people all developing separate agents that are, in theory, somehow going to work together…so I think this is maybe a step in that direction.”

22:02 Amazon ECS now supports one-click event capture and event history querying in the AWS Management Console

Amazon ECS adds one-click event capture in the console that automatically creates EventBridge rules and CloudWatch log groups, eliminating manual setup for monitoring task state changes and service events.
The new event history tab provides pre-built query templates for common troubleshooting scenarios like stopped tasks and container exit codes, keeping data beyond the default retention limits without requiring CloudWatch Logs Insights knowledge.
This addresses a long-standing pain point where ECS task events would disappear after tasks stopped, making post-mortem debugging difficult – now operators can query historical events directly from the ECS console with filters for time range, task ID, and deployment ID.
The feature is available in all AWS Commercial and GovCloud regions at standard CloudWatch Logs pricing, making it accessible for teams that need better visibility into container lifecycle events without additional tooling.
For DevOps teams managing production ECS workloads, this simplifies incident response by consolidating event data in one place rather than jumping between multiple AWS consoles to piece together what happened during an outage.

23:14 Jonathan – “It’s a great click ops feature.”

24:04 AWS Knowledge MCP Server now generally available

AWS launches a free MCP (Model Context Protocol) server that provides AI agents and LLM applications direct access to AWS documentation, blog posts, What’s New announcements, and Well-Architected best practices in a format optimized for language models.
The server includes regional availability data for AWS APIs and CloudFormation resources, helping AI agents provide more accurate responses about service availability and reduce hallucinations when answering AWS-related questions.
No AWS account required and available at no cost with rate limits, making it accessible for developers building AI assistants or chatbots that need authoritative AWS information without manual context management.
Compatible with any MCP client or agentic framework supporting the protocol, allowing developers to integrate trusted AWS knowledge into their AI applications through a simple endpoint configuration.
This addresses a common challenge where AI models provide outdated or incorrect AWS information by ensuring responses are anchored in official, up-to-date AWS documentation and best practices.

25:46 Jonathan – “It’s the rate limiting; it’s putting realistic in controls in place, whereas before they would just scrap everything.”

28:48 Automatic quota management is now generally available for AWS Service Quotas

AWS Service Quotas now automatically monitors quota usage and sends proactive notifications through email, SMS, or Slack before customers hit their limits, preventing application interruptions from quota exhaustion.
The feature integrates with AWS Health and CloudTrail events, enabling customers to build automated workflows that respond to quota threshold alerts and potentially request increases programmatically.
This addresses a common operational pain point where teams discover quota limits only after hitting them, causing service disruptions or failed deployments during critical scaling events. (Really though, is there any other way?)
The service is available at no additional cost across all commercial AWS regions, making it accessible for organizations of any size to improve their quota management practices.
For DevOps teams managing multi-account environments, this provides centralized visibility into quota consumption patterns across services, helping predict future needs and plan capacity more effectively.

32:06 Amazon RDS for Db2 launches support for native database backups

RDS for Db2 now supports native database-level backups, allowing customers to selectively back up individual databases within a multi-database instance rather than requiring full instance snapshots. This enables more granular control for migrations and reduces storage costs.
The feature addresses a common enterprise need for moving specific databases between environments – customers can now easily migrate individual databases to another RDS instance or back to on-premises Db2 installations using standard backup commands.
Development teams benefit from the ability to quickly create database copies for testing environments without duplicating entire instances, while compliance teams can maintain separate backup copies of specific databases to meet regulatory requirements.
Cost optimization becomes more achievable as customers only pay for storage of the specific databases they need to back up rather than full instance snapshots, particularly valuable for instances hosting multiple databases where only some require frequent backups.
The feature is available in all regions where RDS for Db2 is offered, with pricing following standard RDS storage rates detailed at aws.amazon.com/rds/db2/pricing.

GCP

34:19 Gemini CLI for PostgreSQL in action | Google Cloud Blog

Google introduces Gemini CLI extension for PostgreSQL that enables natural language database management, allowing developers to implement features like fuzzy search through conversational commands instead of manual SQL configuration and extension management.
The tool automatically identifies appropriate PostgreSQL extensions (like pg_trgm for fuzzy search), checks installation status, handles setup, and generates optimized queries with proper indexing recommendations – reducing typical multi-step database tasks to simple English requests.
Key capabilities include full lifecycle database control from instance creation to user management, automatic code generation based on table schemas, and intelligent schema exploration – positioning it as a database assistant rather than just a command line tool.
This addresses a common developer pain point of context switching between code editors, database clients, and cloud consoles, potentially accelerating feature development for applications requiring advanced PostgreSQL capabilities like search functionality.
Available through GitHub at github.com/gemini-cli-extensions/postgres, this represents Google’s broader push to integrate Gemini AI across their cloud services, though pricing details and performance benchmarks compared to traditional database management approaches aren’t specified.

35:35 Matt – “I really like the potentially increasing people, because they don’t have context switch. It’s like it’s a feature.”

39:01 Google announces new $4 billion investment in Arkansas

Google is investing $4 billion in Arkansas through 2027 to build its first data center in the state at West Memphis, expanding GCP’s regional presence and capacity for cloud and AI workloads in the central US.
The investment includes a 600 MW solar project partnership with Entergy and programs to reduce peak power usage, addressing the growing energy demands of AI infrastructure while improving grid stability.
Google is providing free access to Google AI courses and Career Certificates to all Arkansas residents, starting with University of Arkansas and Arkansas State University students, to build local cloud and AI talent.
The $25 million Energy Impact Fund for Crittenden County residents demonstrates Google’s approach to community investment alongside data center development, potentially setting a model for future expansions.
This positions GCP to better serve customers in the central US with lower latency and regional data residency options, competing with AWS and Azure’s existing presence in neighboring states.

40:25 Ryan – “So per some live research, Walmart is using both Azure and Google as their own private data center infrastructure.”

Azure

43:36 Accelerating our commercial growth

Microsoft is restructuring its commercial organization under Judson Althoff as CEO of commercial business, consolidating sales, marketing, operations, and engineering teams to accelerate AI transformation services for enterprise customers.
The reorganization creates a unified commercial leadership team with shared accountability for product strategy, go-to-market readiness, and sales execution, potentially streamlining how Azure AI services are delivered to customers.
Operations teams now report directly to commercial leadership rather than corporate, which should tighten feedback loops between customer needs and Azure service delivery.
This structural change allows Satya Nadella and engineering leaders to focus on datacenter buildout, systems architecture, and AI innovation while commercial teams handle customer-facing execution.
The move signals Microsoft’s push to position itself as the primary partner for enterprise AI transformation, likely intensifying competition with AWS and Google Cloud for AI workload dominance.

45:47 Matt – “Yeah, I think it’s just the AI. Even our account team changed their name a bunch; they al have AI in their name now.”

46:31 Grok 4 is now available in Microsoft Azure AI Foundry | Microsoft Azure Blog

Microsoft brings xAI’s Grok 4 model to Azure AI Foundry with a 128K token context window, native tool use, and integrated web search capabilities, positioning it as a competitor to GPT-4 and Claude for enterprise reasoning tasks.
The model features “think mode” for first-principles reasoning that breaks down complex problems step-by-step, making it particularly suited for research analysis, tutoring, and troubleshooting scenarios where logical consistency matters.
Pricing starts at $2 per million input tokens and $10 per million output tokens for Grok 4, with faster variants available at lower costs – Grok 4 Fast Reasoning at $0.60/$2.40 and Fast Non-Reasoning at $0.30/$1.20 per million tokens.
Azure AI Content Safety is enabled by default for all Grok models, addressing enterprise concerns about responsible AI deployment while Microsoft continues safety testing and compliance checks.
The extended context window allows processing entire code repositories or hundreds of pages of documents in a single request, reducing the need to manually chunk large inputs for analysis tasks.

48:18 Ryan – “I like competition generally, and so it’s good to see another competitor model developer, but it is it like they’re adding features that are one model behind Anthopic and OpenAI.”

49:06 Microsoft to allow consumer Copilot in corporate environs • The Register

Question one: What?
Microsoft now allows employees to use personal Copilot subscriptions (Personal, Family, or Premium) with work Microsoft 365 accounts, effectively endorsing shadow IT practices while maintaining that enterprise data protections remain intact through Entra identity controls.
IT administrators can disable this feature (which they are rushing to do right now) through cloud policy controls and audit personal Copilot interactions, though the default enablement removes their initial authority over AI tool adoption within their organizations.
This move positions Microsoft to boost Copilot adoption statistics by any means necessary counting personal usage in enterprise environments, while competing AI vendors may view this as Microsoft leveraging its Office dominance to crowd out alternatives.
Government tenants (GCC/DoD) are excluded from this capability, and employees should note that their personal Copilot prompts and responses will be captured and auditable by their employers.
The feature represents Microsoft’s shift from preventing shadow IT to managing it, potentially creating compliance challenges for organizations with strict data governance requirements while offering a controlled alternative to completely unmanaged AI tools.

50:44 Ryan – “I think this is nutso.”

53:00 Fabric Mirroring for Azure SQL Managed Instance (Generally Available) | Microsoft Fabric Blog | Microsoft Fabric

Azure SQL Managed Instance Mirroring enables near real-time data replication to Microsoft Fabric’s OneLake without ETL processes, supporting both data changes and schema modifications like column additions/drops unlike traditional CDC approaches.
The feature provides free compute and storage based on Fabric capacity size (F64 capacity includes 64TB free mirroring storage), with OneLake storage charges only applying after exceeding the free limit.
Mirrored data becomes immediately available across all Fabric services including Power BI Direct Lake mode, Data Warehouse, Notebooks, and Copilots, allowing cross-database queries between mirrored databases, warehouses, and lakehouses.
Microsoft positions this as a zero-code, zero-ETL solution competing with AWS Database Activity Streams and GCP Datastream, targeting enterprises seeking simplified operational data access and reduced total cost of ownership.
The service extends beyond Managed Instance to include Azure SQL Database and SQL Server 2016-2025, creating a unified mirroring approach across Microsoft’s entire SQL portfolio into their analytics platform.
Interested in pricing? Find that here.

54:55 Ryan – “Because Microsoft SQL server is so memory intensive for performance, being able to do large queries across, you know, datasets has always been difficult with that…So I can see why this is very handy if you’re Microsoft SQL on Azure. And then the fact that they’re giving you so much for free is the incentive there. They know what they’re doing.”

56:35 Generally Available: Azure Firewall Updates – IP Group limit increased to 600 per Firewall Policy

Azure Firewall Policy now supports 600 IP Groups per policy, tripling the previous limit of 200, allowing organizations to consolidate more network security rules into fewer, more manageable groups.
This enhancement directly addresses enterprise scalability needs by reducing rule complexity – instead of maintaining thousands of individual IP addresses across multiple policies, administrators can organize them into logical groups like “branch offices” or “partner networks.”
The increased limit brings Azure Firewall closer to parity with AWS Network Firewall and GCP Cloud Armor, which have historically offered more flexible rule management options for large-scale deployments.
Primary beneficiaries include large enterprises and managed service providers who manage complex multi-tenant environments, as they can now implement more granular security policies without hitting artificial limits.
While the feature itself is free, customers should note that Azure Firewall pricing starts at $1.25 per deployment hour plus data processing charges, making efficient rule management critical for cost optimization.

57:50 Matt – “Azure Firewall isn’t cheap, but it’s also your but it’s also your IDS and IPS, so if you’re comparing it to Apollo Alto or any of these other massive ones, the Premiere version is not cheap, but it does give you a lot of those security things.”

Other Clouds

58:54 Announcing cost-efficient storage with Network file storage, cold storage, and usage-based backups | DigitalOcean

DigitalOcean is launching Network File Storage (NFS) on October 20th, a managed file system service starting at 50 GiB increments that supports NFSv3/v4 and allows multiple GPU/CPU droplets to mount the same share for AI/ML workloads.
This addresses the need for shared high-performance storage without the typical 1TB+ minimums of competitors.
Spaces cold storage enters public preview at $0.007/GiB per month with one free retrieval monthly, targeting petabyte-scale datasets that need instant access but are rarely used. The pricing model avoids unpredictable retrieval fees common with other providers by including one monthly retrieval in the base price.
Usage-based backups now support 4, 6, or 12-hour backup intervals with retention from 3 days to 6 months, priced from $0.01-0.04/GiB-month based on frequency. This consumption-based model helps businesses meet strict RPO requirements without paying for unused capacity.
All three services target AI/ML workloads and data-intensive applications, with NFS optimized for training datasets, cold storage for archived models, and frequent backups for GPU droplet protection.
The combination provides a complete storage strategy for organizations dealing with growing data footprints.
The services are initially available in limited regions (NFS in ATL1 and NYC) with preview access requiring support tickets or form submissions, indicating a measured rollout approach typical of infrastructure services.

1:01:24 Matt – “At lot of these companies don’t need the scale, the flexibility and everything else that AWS, GCP, and Azure provide…this is probably all they need.”

1:02:36Build Smarter Agents with Image Generation, Auto-Indexing, VPC Security, and new AI Tools on DigitalOcean Gradient AI Platform | DigitalOcean

DigitalOcean’s Gradient AI Platform now supports image generation through OpenAI’s gpt-image-1 model, marking their first non-text modality and enabling developers to create images programmatically via the same API endpoint used for text completions.
Auto-indexing for Knowledge Bases automatically detects, fetches, and re-indexes new or updated documents from connected sources into OpenSearch databases, reducing manual maintenance for keeping AI agents’ knowledge current.
New VPC integration allows AI agents and indexing jobs to run on private networks within DigitalOcean’s managed infrastructure, addressing enterprise security requirements without exposing services to the public internet.
Two new developer tools are coming: the Agent Development Kit (ADK) provides a code-first framework for building and deploying AI agent workflows, while
Genie offers VS Code integration for designing multi-agent systems using natural language.
These updates position DigitalOcean to compete more directly with major cloud providers in the AI platform space by offering multimodal capabilities, enterprise security features, and developer-friendly tooling for building production AI applications.

1:04:14 Matt – “Theyre really learning about their audience, and they’re going to build specific to what their customer needs… and they’ve determined that their customers need these image generation AI features. They’re not always the fastest, but they always get there.”

1:05:11 Announcing per-sec billing, new Droplet plans, BYOIP, and NAT gateway preview to reduce scaling costs | DigitalOcean

DigitalOcean is switching from hourly to per-second billing for Droplets starting January 1, 2026, with a 60-second minimum charge, which seems like the standard now.
This change could reduce costs by up to 80% for short-lived workloads like CI/CD pipelines that previously paid for full hours when only using minutes.
New intermediate Droplet sizes bridge the gap between shared and dedicated CPU plans, allowing in-place upgrades without IP changes or data migration. The new plans include 5x SSD variants for CPU Optimized and 6.5x SSD variants for General Purpose, addressing the previous large cost jump between tiers.
Bring Your Own IP (BYOIP) is now generally available with a 7-day setup time compared to 1-4 weeks at hyperscalers. This allows businesses to maintain their IP reputation and avoid breaking client allow-lists when migrating to DigitalOcean.
VPC NAT Gateway enters public preview at $40/month including 100GB bandwidth, supporting up to 500,000 simultaneous connections.
This managed service provides centralized egress with static IPs for private resources without the complexity of self-managed NAT instances.
These updates target cost optimization and migration friction points, particularly benefiting ephemeral workloads, auto-scaling applications, and businesses needing to maintain IP continuity during cloud migrations.

1:09:31 Introducing Snowflake Managed MCP Servers for Secure, Governed Data Agents

Snowflake is introducing Managed MCP (Model Context Protocol) Servers that enable secure data agents to access enterprise data while maintaining governance and compliance controls. This addresses the challenge of giving AI agents access to sensitive data without compromising security.
The MCP protocol, originally developed by Anthropic, allows AI assistants to interact with external data sources through a standardized interface.
Snowflake’s implementation adds enterprise-grade security layers including authentication, authorization, and audit logging.
Data agents can now query Snowflake databases, run SQL commands, and retrieve results without requiring direct database credentials or exposing sensitive connection strings. All interactions are governed by Snowflake’s existing role-based access controls and data governance policies.
This integration enables organizations to build AI applications that can answer questions about their business data while ensuring compliance with data residency, privacy regulations, and internal security policies. The managed service handles infrastructure complexity and scaling automatically.
Developers can connect popular AI frameworks and tools to Snowflake data through the MCP interface, reducing the complexity of building secure data pipelines for AI applications. This positions Snowflake as a bridge between enterprise data warehouses and the emerging AI agent ecosystem.