Sessions at last week’s AWS re:Invent conference highlighted Amazon’s emphasis on integrating AI into applications rather than treating it as a standalone technology. With that shift, developers are urged to prioritize cost and efficiency when leveraging AI.
“Generative AI inference is going to be a core building block for every single application,” stated Matt Garman, CEO of Amazon Web Services. “I believe generative AI has the potential to transform every industry, company, workflow, and user experience.”
To support this vision, Garman introduced a range of updates, including advancements in storage, databases, computing chips, and various AI tools. These upgrades primarily aim to reduce both costs and complexity for users.
Generative AI tools were a central focus at the event. Garman highlighted Bedrock, the company’s AI platform, saying, “Every application is going to use inference in some way to enhance or build or really change an application.” One standout feature is model distillation, which lets users train smaller, subject-specific models on the prompts and outputs of larger models. These distilled models are more efficient, running up to 500% faster at 75% lower cost.
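Conceptually, distillation works by harvesting a large “teacher” model’s answers and using them as training data for a smaller “student” model. The sketch below shows the data-collection half of that loop using Bedrock’s Converse API; the model ID, prompts, and output path are placeholders, and Bedrock’s managed distillation feature automates these steps behind its own API.

```python
# Illustrative sketch: collecting teacher-model outputs to build a
# distillation dataset. Model ID, prompts, and file path are placeholders.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

TEACHER_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder ID

prompts = [
    "Summarize the return policy for electronics.",
    "Classify this support ticket: 'My package arrived damaged.'",
]

with open("distillation_dataset.jsonl", "w") as f:
    for prompt in prompts:
        # Ask the large "teacher" model for a high-quality completion.
        response = bedrock.converse(
            modelId=TEACHER_MODEL_ID,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
        )
        completion = response["output"]["message"]["content"][0]["text"]
        # Each prompt/completion pair becomes a training example for
        # fine-tuning the smaller "student" model.
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```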
Garman also discussed enhanced guardrails and security features, including a preview of automated reasoning checks designed to verify that models behave as intended, reducing the risk of hallucinations. Additional upgrades to Bedrock include improved retrieval-augmented generation (RAG) tools, featuring better methods for ingesting and evaluating knowledge bases.
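For illustration, here is a minimal sketch of the retrieve-and-generate pattern against a Bedrock knowledge base: the service pulls relevant chunks from the knowledge base and feeds them to the model to ground its answer. The knowledge base ID, model ARN, and query are placeholders for your own resources.

```python
# A minimal RAG sketch against a Bedrock knowledge base.
# Knowledge base ID and model ARN are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund window for electronics?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder
            ),
        },
    },
)

# The retrieved passages ground the generated answer.
print(response["output"]["text"])
```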
Agents were another hot topic this year, as they were at Microsoft Ignite. Garman introduced a preview of new agent services focused on multi-agent collaboration and orchestration. He emphasized that we are still in “the earliest days of generative AI.”
“The most success that we’ve seen from companies everywhere in the world is in cost avoidance and productivity,” said Amazon CEO Andy Jassy. “But you also are starting to see completely reimagined and reinvented customer experiences.”
Jassy highlighted Amazon’s internal applications, such as customer service chatbots that recognize users and their orders. These chatbots have improved customer satisfaction by 500 basis points (5 percentage points), sped up processing times by 25%, and cut costs by 25%. He also mentioned “Sparrow,” a robotic system that transfers items into customer-specific totes, and “Rufus,” a feature enabling customers to ask questions on any product detail page. Altogether, Amazon has over 1,000 generative AI applications deployed or in development.
While praising Anthropic’s Claude models, Jassy emphasized that “there will never be one tool to rule the world,” stressing the importance of choice in model selection. To that end, he introduced Amazon’s new “frontier” models, branded under the Nova family.
Jassy also discussed changes to SageMaker, originally a tool for training AI models, which has evolved into a unified platform for data, analytics, and AI. The new SageMaker Unified Studio consolidates previously separate studios, query editors, and visual tools. Additionally, SageMaker Lakehouse offers an integrated view of data across multiple lakes, warehouses, and third-party sources for analytics and AI/machine learning. The original SageMaker is now rebranded as SageMaker AI.
Compute, Storage, and Databases
Matt Garman announced the general availability of EC2 instances built on the new Trainium 2 chips, along with Trainium 2 UltraServers, which combine four nodes into a 64-chip configuration delivering 83 petaflops of performance (roughly 1.3 petaflops per chip) for both AI training and inference. Garman said the instances provide 30–40% better price performance than current GPU-based options, though real-world results may vary.
Looking ahead, Trainium 3 is slated for release next year as AWS’s first 3nm chip, offering double the computing power of Trainium 2 and 40% greater efficiency. For traditional computing needs, Garman highlighted the ongoing success of Amazon’s Graviton CPU chips, introduced in 2018. Graviton now handles as much compute as all of AWS delivered across both x86 and Arm in 2019, while providing 40% better price performance than x86 server chips and consuming 60% less energy. According to Garman, 90% of AWS’s top 1,000 customers use Graviton.
Garman also introduced the Graviton 4 chip, which is 30% faster per core and offers triple the CPU cores and memory. It promises 40% faster performance for database applications and up to 45% gains for large Java applications.
In storage and databases, Garman unveiled new S3 Tables for Apache Iceberg and an S3 Metadata service. He also spotlighted Aurora DSQL, which he described as the fastest distributed SQL database. Aurora DSQL is a multi-region, serverless, PostgreSQL-compatible service built for always-available applications. It leverages a new transaction engine and builds on the time-sync service Amazon introduced last year. Garman claimed it is up to four times faster than Google’s Spanner database.
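Because Aurora DSQL is PostgreSQL-compatible, existing Postgres drivers should connect to it unchanged. The sketch below assumes a placeholder cluster endpoint and a hypothetical orders table; in practice, DSQL uses short-lived IAM authentication tokens generated via the AWS SDK or CLI in place of a static password.

```python
# Illustrative sketch: connecting to Aurora DSQL with a standard
# PostgreSQL driver. Endpoint, credentials, and table are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-cluster.dsql.us-east-1.on.aws",  # placeholder endpoint
    port=5432,
    user="admin",
    password="<iam-auth-token>",  # placeholder; generated via AWS SDK/CLI
    dbname="postgres",
    sslmode="require",
)

with conn.cursor() as cur:
    # Ordinary PostgreSQL SQL runs unchanged on the distributed engine.
    cur.execute("SELECT id, status FROM orders WHERE status = %s", ("shipped",))
    for row in cur.fetchall():
        print(row)

conn.commit()
conn.close()
```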