
The tool first evaluates prompts against user-defined datasets and metrics, then rewrites them to optimize performance on up to five inference models. It then benchmarks the optimized versions against the originals across those models, helping developers identify the best-performing configuration for a given workload, AWS said.
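The evaluate-rewrite-benchmark loop described above can be sketched roughly as follows. This is an illustrative outline only, not the actual AWS API; the function names, metric signature, and scoring logic are all assumptions made for the example.

```python
# Hypothetical sketch of the evaluate -> rewrite -> benchmark loop.
# None of these names come from AWS; they illustrate the described workflow.

def evaluate(prompt: str, dataset: list, metric) -> float:
    """Average a user-defined metric over the examples in a dataset."""
    return sum(metric(prompt, example) for example in dataset) / len(dataset)

def benchmark(original: str, optimized: str, dataset: list,
              metric, models: list) -> dict:
    """Score original vs. optimized prompt per model; keep the winner."""
    results = {}
    for model in models:
        score_orig = evaluate(original, dataset,
                              lambda p, ex: metric(p, ex, model))
        score_opt = evaluate(optimized, dataset,
                             lambda p, ex: metric(p, ex, model))
        results[model] = "optimized" if score_opt > score_orig else "original"
    return results
```

In practice the rewriting step and the metric would both involve model inference calls; here they are left abstract so the control flow stays visible.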
The tool is generally available across multiple AWS regions, including US East, US West, Mumbai, Seoul, Singapore, Sydney, Tokyo, Canada (Central), Frankfurt, Ireland, London, Zurich, and São Paulo.
The company said that enterprise customers will be billed based on the Bedrock model inference tokens consumed during the optimization process, at the same per-token rates applied to standard Bedrock inference workloads.
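That billing model makes the cost of an optimization run a straightforward function of tokens consumed. A minimal sketch, with per-token rates that are placeholders rather than actual AWS pricing:

```python
# Illustrative cost estimate for token-based billing as described above.
# The rates used in the example call are assumed values, not AWS pricing.

def optimization_cost(input_tokens: int, output_tokens: int,
                      rate_in_per_1k: float, rate_out_per_1k: float) -> float:
    """Cost = tokens consumed during optimization x standard per-token rates."""
    return ((input_tokens / 1000) * rate_in_per_1k
            + (output_tokens / 1000) * rate_out_per_1k)

# e.g. 50k input tokens and 20k output tokens at hypothetical rates:
cost = optimization_cost(50_000, 20_000,
                         rate_in_per_1k=0.003, rate_out_per_1k=0.015)
```

Because optimization runs inference against up to five models, the same calculation would apply per model, at each model's own standard rates.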
Will help with economics of scaling AI in production
The tool’s focus on automated prompt refinement, analysts say, will help enterprises tackle operational challenges, especially the economics around scaling generative AI workloads in production.

