Google TPU 8t and TPU 8i: Purpose-built for the Agentic Era
Models have evolved to keep up with changing demands, and the infrastructure needs to evolve as well. Recognizing that the infrastructure requirements for pre-training, post-training, and real-time serving have radically diverged, I'm thrilled to announce that Google will be launching two eighth-generation TPUs: TPU 8t and TPU 8i.

The new frontier models will reach trillions of parameters. To build these frontier models of tomorrow, we need infrastructure that can handle mega-scale pre-training without being stalled by data bottlenecks. TPU 8t is optimized to reduce training time for trillion-parameter frontier models, delivering a staggering 121 exaflops of native FP4 compute and two petabytes of shared HBM within a single 9,600-chip superpod, a 3x increase in peak performance per superpod over the previous generation.

As we enter the agentic era, the industry is hitting what we call the latency wall, where traditional architectures struggle with the real-time demands of autoregressive decoding and complex chain-of-thought reasoning models. That is why we are announcing a second eighth-gen TPU that directly addresses this issue. TPU 8i is our specialized post-training and inference engine. We are optimizing its core features to be the best reinforcement learning and serving infrastructure for the next generation of reasoning models. A defining breakthrough for TPU 8i lies in its ICI networking architecture, which pioneers the Boardfly topology for TPUs. By shortening the network diameter needed for all-to-all communication, the very heartbeat of reasoning models, Boardfly achieves up to a 50% improvement in latency for communication-intensive workloads.

Customers often ask if scale must come at the expense of speed. The reality of our eighth-generation TPUs is that we have transitioned away from a singular one-size-fits-all approach.
While both architectures remain highly capable across pre-training, reinforcement learning, fine-tuning, and serving, we have purposefully optimized each system to unlock maximum efficiency and value for the most critical stages of AI development. These next-gen TPUs will power Google's own AI infrastructure needs to build the best models, like Gemini. They will also be available to our Google Cloud customers by the end of the year. We can't wait to see what the world will build with the power of TPU 8i and TPU 8t.
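
Editor's note: the superpod figures quoted in the talk can be turned into rough per-chip numbers with simple division. The sketch below is not from the video. It assumes decimal units (1 exaflop = 1,000 petaflops, 1 petabyte = 1,000,000 gigabytes) and an even split of compute and memory across chips, and the function names are made up for illustration. The results are back-of-the-envelope estimates, not published per-chip specifications.

```python
# Figures quoted in the talk for a single superpod (121 exaflops of FP4
# compute, two petabytes of shared HBM, 9,600 chips).
POD_EXAFLOPS_FP4 = 121
POD_HBM_PETABYTES = 2
CHIPS_PER_POD = 9_600


def per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Peak petaflops per chip, assuming an even split (decimal units)."""
    return pod_exaflops * 1_000 / chips


def per_chip_hbm_gb(pod_petabytes: float, chips: int) -> float:
    """HBM gigabytes per chip, assuming an even split (decimal units)."""
    return pod_petabytes * 1_000_000 / chips


if __name__ == "__main__":
    pf = per_chip_petaflops(POD_EXAFLOPS_FP4, CHIPS_PER_POD)
    gb = per_chip_hbm_gb(POD_HBM_PETABYTES, CHIPS_PER_POD)
    print(f"~{pf:.1f} PFLOPS of FP4 per chip, ~{gb:.0f} GB of HBM per chip")
```

Under these assumptions the quoted pod totals work out to roughly 12.6 petaflops of peak FP4 compute and about 208 GB of HBM per chip. "Two petabytes" is a rounded spoken figure, so the real per-chip memory may differ.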