DeepMind demos an AI-enabled pointer and Gemini help in Chrome
DeepMind shared interaction principles and demos for an AI-enabled mouse pointer, and says Gemini in Chrome can answer questions about the exact part of a webpage you point to.
AI tools guide
This guide collects real AI product launches, APIs, developer tools, and workflow updates from The AI Tea. Each brief explains what changed, who it affects, and where the original source lives.
Terms to know
DeepMind shared interaction principles and demos for an AI-enabled mouse pointer, and says Gemini in Chrome can answer questions about the exact part of a webpage you point to.
NVIDIA describes a “verified agent skills” catalog with scanning, signing, and machine-readable skill cards to help teams trust and audit reusable agent capabilities.
Stability AI released Stable Audio 3.0, including open-weight Small and Medium checkpoints trained on licensed data, plus a Large model offered via its API for higher-volume use.
Cohere released Command A+, an Apache 2.0 open-source MoE model positioned for agentic workflows, multimodal inputs, and long context while targeting self-hosted enterprise deployment.
Google says the Gemini app is adding Daily Brief and Gemini Spark, plus new models like Gemini 3.5 Flash and Gemini Omni, to make it more proactive.
OpenAI says Codex is now available in preview in the ChatGPT mobile app, letting you check in on long-running work and approve or redirect it from your phone.
Hugging Face and NVIDIA shared a workflow for fine-tuning Cosmos Predict with LoRA and DoRA methods for robot-video generation tasks.
OpenAI introduced newer API voice models for realtime conversation, translation, and transcription workflows that developers can build into apps.
Hugging Face and AWS published a practical overview of infrastructure pieces teams use to train, deploy, and serve foundation models.
AWS says its MCP Server is generally available, letting AI agents call AWS APIs and read current documentation under IAM guardrails with CloudTrail and CloudWatch visibility.
OpenAI says three Realtime API audio models—GPT‑Realtime‑2, GPT‑Realtime‑Translate, and GPT‑Realtime‑Whisper—support voice agents that reason, translate, and transcribe in real time.
Google says Gemini API File Search now supports images plus text, metadata filtering, and page citations to ground RAG responses.
OpenAI says GPT‑5.5 Instant, ChatGPT’s default model, is more accurate, cuts hallucinated claims in internal tests, and adds visibility into what context was used for personalization.
OpenAI describes a relay-plus-transceiver WebRTC design that keeps voice sessions stable while avoiding huge public UDP port ranges in Kubernetes.