How CPUs will address the energy challenges of generative AI

By Jeff Wittich

The vast majority of company leaders (98%) recognize the strategic importance of AI, with nearly 65% planning increased investments. Global AI spending is expected to reach $300 billion by 2026. Also by 2026, AI’s electricity usage could increase tenfold, according to the International Energy Agency. Clearly, AI presents businesses with a dual challenge: maximizing AI’s capabilities while minimizing its environmental impact.

In the United States alone, power consumption by data centers is expected to double by 2030, reaching 35GW (gigawatts), primarily due to the growing demand for AI technologies. This increase is largely driven by the deployment of AI-ready racks, which draw 40kW to 60kW (kilowatts) each because of their GPU-intensive workloads.
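To put the per-rack figures in context, here is a minimal Python sketch that converts a rack's power draw into energy used over a year. The function name and the utilization parameter are illustrative assumptions, not something the figures above specify; by default the sketch assumes the rack draws its rated power around the clock.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_mwh(rack_power_kw: float, utilization: float = 1.0) -> float:
    """Return the energy, in MWh, a rack draws over one year.

    `utilization` is a hypothetical average load factor between 0 and 1;
    the default of 1.0 assumes continuous draw at the rated power.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return rack_power_kw * utilization * HOURS_PER_YEAR / 1000.0


# The 40kW and 60kW bounds come from the paragraph above.
for kw in (40, 60):
    print(f"{kw} kW rack: {annual_energy_mwh(kw):.1f} MWh per year")
```

Under the continuous-draw assumption, a single rack at the low end of the range uses roughly 350 MWh per year and one at the high end roughly 526 MWh.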

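There are three main strategies available to address these looming energy challenges effectively: choosing the right compute for AI workloads, matching CPUs with performance and energy needs, and advancing CPU technology for AI through industry collaboration.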

CPUs vs. GPUs for AI inference workloads

Contrary to common belief, CPUs, not just high-powered GPUs, are suitable for most AI tasks. For example, 85% of AI compute is used for inference, which does not require a GPU.

For AI inference tasks, CPUs offer a balanced blend of performance, energy efficiency, and cost-effectiveness. They handle diverse, less-intensive inference tasks well, which makes them particularly energy-efficient. Additionally, their ability to process parallel tasks and adapt to fluctuating demand helps keep energy use in line with the work actually being done. This stands in stark contrast to the more power-hungry GPUs, which excel in AI training due to their high-performance capabilities but often remain underutilized between intensive tasks.

Moreover, the lower energy and financial spend associated with CPUs make them a preferable option for organizations striving for sustainable and cost-effective operations. Further enhancing this advantage, software optimization libraries tailored for CPU architectures significantly reduce energy demands. These libraries optimize AI inference tasks to run more efficiently, aligning computational processes with the CPU’s operational characteristics to minimize unnecessary power usage.

Similarly, enterprise developers can utilize cutting-edge software tools that enhance AI performance on CPUs. These tools integrate seamlessly with common AI frameworks such as TensorFlow and ONNX, automatically tuning AI models for optimal CPU performance. This not only streamlines the deployment process but also eliminates the need for manual adjustments across different hardware platforms, simplifying the development workflow and further reducing energy consumption.

Lastly, model optimization complements these software tools by refining AI models to eliminate unnecessary parameters, creating more compact and efficient models. This pruning process not only maintains accuracy but also reduces computational complexity, lowering the energy required for processing.
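As a rough illustration of the pruning idea, the sketch below zeroes out the smallest-magnitude weights in a list. The article does not name a specific pruning method; magnitude pruning is used here as one common choice, and the function name, the sample weights, and the 50% sparsity target are all hypothetical. In practice, accuracy would need to be checked after pruning rather than assumed.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    This is a toy, framework-free version of magnitude pruning: real models
    are usually pruned per layer with the tooling of the chosen framework.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Hypothetical weights for a single small layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # the three smallest-magnitude weights become 0.0
```

Zeroed weights reduce work only when the runtime or storage format can skip them, which is why pruning is typically paired with the CPU-oriented software tools described above.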

Choosing the right compute for AI workloads

For enterprises to fully leverage the benefits of AI while maintaining energy efficiency, it is critical to strategically match CPU capabilities with specific AI priorities. This involves several steps:

Data centers currently account for about 4% of global energy consumption, a figure that the growth of AI threatens to increase significantly. Many data centers have already deployed large numbers of GPUs, which consume tremendous power and suffer from thermal constraints.

For example, GPUs like Nvidia’s H100, with 80 billion transistors, push power consumption to extremes, with some rack configurations exceeding 40kW. As a result, data centers must employ immersion cooling, a process that submerges the hardware in thermally conductive liquid. While effective at heat removal and allowing for higher power densities, this cooling method consumes additional power, compelling data centers to allocate 10% to 20% of their energy solely for this task.
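The cooling share can be turned into a quick estimate. The sketch below is a simplified model, assuming total energy consists only of compute plus cooling and that the 10% to 20% figure is a fraction of that total; the function name is hypothetical and real facilities have other overheads that are ignored here.

```python
def cooling_power_kw(compute_kw: float, cooling_fraction: float) -> float:
    """Estimate cooling power when cooling is a fixed fraction of total power.

    Simplifying assumption: total = compute + cooling, so
    total = compute / (1 - cooling_fraction).
    """
    if not 0.0 <= cooling_fraction < 1.0:
        raise ValueError("cooling_fraction must be in [0, 1)")
    total_kw = compute_kw / (1.0 - cooling_fraction)
    return total_kw * cooling_fraction


# A 40kW compute load with the 10% and 20% cooling shares cited above.
for fraction in (0.10, 0.20):
    extra = cooling_power_kw(40, fraction)
    print(f"cooling share {fraction:.0%}: about {extra:.1f} kW for cooling")
```

Under these assumptions, a 40kW compute load needs roughly 4.4kW to 10kW of additional power for cooling alone.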

Conversely, energy-efficient CPUs offer a promising solution to future-proof against the surging electricity needs driven by the rapid expansion of complex AI applications. Companies like Scaleway and Oracle are leading this trend by implementing CPU-based AI inferencing methods that dramatically reduce reliance on traditional GPUs. This shift not only promotes more sustainable practices but also showcases the ability of CPUs to efficiently handle demanding AI tasks.

To illustrate, Oracle has successfully run generative AI models with up to seven billion parameters, such as the Llama 2 model, directly on CPUs. This approach has demonstrated significant energy efficiency and computational power benefits, setting a benchmark for effectively managing modern AI workloads without excessive energy consumption.
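One reason a model of this size is within reach of a CPU server is that its weights fit comfortably in system memory. The sketch below estimates the weight footprint of a seven-billion-parameter model at a few common numeric precisions. The precisions are assumptions for illustration; the article does not say which precision Oracle used, and the estimate covers weights only, not activations or caches.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return the approximate weight storage in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


params = 7_000_000_000  # seven billion parameters, as in the example above
for precision in ("fp32", "fp16", "int8"):
    print(f"{precision}: {weight_memory_gb(params, precision):.0f} GB")
```

By this estimate, the weights take about 28 GB at 32-bit precision, 14 GB at 16-bit, and 7 GB at 8-bit.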

Matching CPUs with performance and energy needs

Given the superior energy efficiency of CPUs in handling AI tasks, we should consider how best to integrate these technologies into existing data centers. The integration of new CPU technologies demands careful consideration of several key factors to ensure both performance and energy efficiency are optimized:

By focusing on these key considerations, we can effectively balance performance and energy efficiency in our data centers, ensuring a cost-effective and future-proofed infrastructure prepared to meet the computational demands of future AI applications.

Advancing CPU technology for AI

Industry AI alliances, such as the AI Platform Alliance, play a crucial role in advancing CPU technology for artificial intelligence applications, focusing on enhancing energy efficiency and performance through collaborative efforts. These alliances bring together a diverse range of partners from various sectors of the technology stack—including CPUs, accelerators, servers, and software—to develop interoperable solutions that address specific AI challenges. This work spans from edge computing to large data centers, ensuring that AI deployments are both sustainable and efficient.

These collaborations are particularly effective in creating solutions optimized for different AI tasks, such as computer vision, video processing, and generative AI. By pooling expertise and technologies from multiple companies, these alliances aim to forge best-in-breed solutions that deliver optimal performance and remarkable energy efficiency.

Cooperative efforts such as the AI Platform Alliance fuel the development of new CPU technologies and system designs that are specifically engineered to handle the demands of AI workloads efficiently. These innovations lead to significant energy savings and boost the overall performance of AI applications, highlighting the substantial benefits of industry-wide collaboration in driving technological advancements.

Jeff Wittich is chief product officer at Ampere Computing.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.

© InfoWorld