On May 17, the Digital Cloud Power Conference 2024 · AI Empowerment Enterprise Innovation Forum concluded successfully under the theme "Evolving by Crossing Business Boundaries." At the forum, Shenzhou Kuntai addressed users' pain points around LLM training and its heavy computing power demands by presenting products and solutions for green deployment in multi-cloud heterogeneous environments: the Heterogeneous Intelligent Scheduling Operation Platform (HISO), the Heterogeneous Intelligent Computing Acceleration Platform (HICA), and the integrated delivery of Shenzhou Kuntai's fully liquid-cooled enclosure products. These offerings help enterprise customers find better combinations of computing power across the entire heterogeneous intelligent computing resource pool, improve the resource utilization of GPU server clusters, and reduce the energy consumption of both the nodes and the interconnects between them. The result is an intelligent computing infrastructure foundation with higher performance, lower cost, higher energy efficiency, and lower energy consumption.
Zhou Chuan, Vice President of the Digital China Information Innovation Group and General Manager of the R&D Center
In the era of heterogeneous computing, how can enterprises implement intelligent computing to reduce costs and increase efficiency?
In the new era of heterogeneous computing, multi-cloud heterogeneous computing infrastructure has become a necessity, and this distinctive "intelligent computing era" calls for a new intelligent computing architecture. Meanwhile, as LLMs and generative AI are deployed, vast numbers of model training and inference tasks not only create massive demand for underlying computing power but also pose significant challenges to resource utilization. According to data from OpenAI, the MFU (Model FLOPs Utilization) achieved when training GPT-4 ranged from 32% to 36%, and the current industry average is only about 30% to 40%. Raising the utilization of intelligent computing resources could therefore save enterprises substantial costs.
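To make the MFU figures concrete, here is a minimal Python sketch of how MFU is commonly estimated for dense transformer training: the model FLOPs actually achieved per second divided by the hardware's peak FLOPs per second. It assumes the widely used approximation of about 6 FLOPs per parameter per training token and ignores attention FLOPs; the function name and the example numbers are hypothetical and are not taken from OpenAI's or Shenzhou Kuntai's measurements.

```python
def training_mfu(n_params: float, tokens_per_second: float,
                 peak_flops_per_second: float) -> float:
    """Estimate Model FLOPs Utilization for dense transformer training.

    Assumes ~6 FLOPs per parameter per token (about 2 for the forward
    pass, 4 for the backward pass) and ignores attention FLOPs.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical inputs: a 7B-parameter model processing 20,000 tokens/s on
# hardware with a combined peak of 8 x 312 TFLOP/s (illustrative only).
mfu = training_mfu(n_params=7e9, tokens_per_second=20_000,
                   peak_flops_per_second=8 * 312e12)
print(f"MFU = {mfu:.1%}")
```

With these illustrative inputs the estimate lands at roughly 34%, inside the 30% to 40% band cited above; improving MFU means pushing more tokens per second through the same peak hardware.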
Energy consumption is another major challenge. Computing power itself is set to become a primary source of energy consumption and carbon dioxide emissions, and a GPU consumes more than twice as much energy as a CPU. Research by MIT suggests that meeting future demand for AI applications will require an additional 10% of energy. To put it vividly, "refining" LLMs will consume more electricity than making steel. For a company, each additional intelligent computing rack increases electricity consumption by about 150,000 kWh per year, enough to supply 100 households for a year, and emits approximately 1.5 tons of carbon dioxide. Both the energy consumption and the carbon emissions are enormous.
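The rack figures can be sanity-checked with simple arithmetic. This sketch uses only the 150,000 kWh and 100-household figures from the text and assumes the rack runs around the clock all year; it converts annual energy into an average continuous power draw and into per-household consumption.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

rack_kwh_per_year = 150_000  # annual consumption per rack, from the text
households = 100             # number of households, from the text

# Average power the rack would draw if it ran continuously all year.
average_rack_power_kw = rack_kwh_per_year / HOURS_PER_YEAR
# Annual consumption per household implied by the comparison.
kwh_per_household = rack_kwh_per_year / households

print(f"Average continuous rack draw: {average_rack_power_kw:.1f} kW")
print(f"Implied annual use per household: {kwh_per_household:,.0f} kWh")
```

At roughly 17 kW of continuous draw, one rack uses as much electricity as 100 households consuming 1,500 kWh each per year.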
Shenzhou Kuntai takes a two-pronged approach to improving computing resource utilization
Shenzhou Kuntai has launched two platforms, the Heterogeneous Intelligent Scheduling Operation Platform (HISO) and the Heterogeneous Intelligent Computing Acceleration Platform (HICA), which address the complex heterogeneous compatibility issues that intelligent computing clusters face both within and across clusters and significantly improve the utilization of computing resources.
Built on cloud-native technology, Shenzhou Kuntai's HISO combines hard (hardware) and virtual GPU partitioning to virtualize or pool GPU resources and schedule computing power across clusters. Based on each user's business needs, the platform finds the optimal combination of computing power across the entire heterogeneous intelligent computing resource pool, improving the resource utilization of GPU server clusters. HISO's key capabilities include networking domestic and international GPU resources together, hybrid scheduling, and precise isolation of computing power, so that GPU resources spread across multiple clusters can be managed and scheduled "as if managing a single GPU host." Using GPU container pass-through and IaaS offloading, the platform loads models three times faster than traditional methods. It also collects full-stack, end-to-end metrics from the intelligent computing center in real time to identify and locate software and hardware faults, making computing power observable.
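The pooling idea, treating GPU fractions from several clusters as one resource pool, can be illustrated with a minimal best-fit placement sketch in Python. This is not HISO's scheduler, whose algorithm the article does not describe: `GpuSlot`, `best_fit`, and the cluster and GPU names are hypothetical, and best-fit bin packing is a swapped-in, illustrative policy that ignores isolation, topology, and chip-model differences.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuSlot:
    """One physical GPU in a hypothetical multi-cluster pool."""
    cluster: str
    gpu_id: str
    free_fraction: float  # unallocated share of this GPU, 0.0 to 1.0


def best_fit(pool: list[GpuSlot], request: float) -> Optional[GpuSlot]:
    """Place a fractional-GPU request on whichever GPU, in any cluster,
    would be left with the least spare capacity (best-fit).

    Returns None if no GPU has enough free capacity.
    """
    candidates = [s for s in pool if s.free_fraction >= request]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda s: s.free_fraction - request)
    chosen.free_fraction -= request  # reserve the slice on the chosen GPU
    return chosen


# Hypothetical pool spanning two clusters.
pool = [
    GpuSlot("cluster-a", "gpu0", 1.0),
    GpuSlot("cluster-a", "gpu1", 0.5),
    GpuSlot("cluster-b", "gpu0", 0.75),
]
placements = []
for request in (0.5, 0.5, 0.25):
    slot = best_fit(pool, request)
    placements.append((slot.cluster, slot.gpu_id) if slot else None)
print(placements)
```

Best-fit fills partially used GPUs first, which keeps whole GPUs free for larger jobs; here the untouched `cluster-a/gpu0` remains available for a full-GPU request. A production scheduler would also weigh interconnect locality and chip type.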
Shenzhou Kuntai's HICA focuses on optimizing computing power scheduling within a cluster. By masking differences between the underlying computing ecosystems in a cluster and removing key computational efficiency bottlenecks, it improves both the utilization and the availability of computing power. Through its proprietary service layer, intermediate adaptation layer, and scheduling and orchestration algorithms, HICA applies data parallelism, model parallelism, and other strategies to break parallel computing tasks into subtasks and match each with a suitable software stack and computing resources. When available GPU resources change, the platform dynamically reschedules subtasks in real time and adjusts model topology and architecture to fully aggregate the different computing resources.
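One simple way to divide a data-parallel workload across heterogeneous accelerators is to size each device's share of the global batch by its measured throughput. The sketch below assumes those throughput values are already known; `split_batch` and the device names are hypothetical, and this proportional split is an illustrative technique, not HICA's documented algorithm.

```python
def split_batch(global_batch: int, throughputs: dict[str, float]) -> dict[str, int]:
    """Divide a data-parallel global batch across heterogeneous devices in
    proportion to their throughput, using the largest-remainder method so
    the integer shares always sum exactly to global_batch."""
    total = sum(throughputs.values())
    exact = {d: global_batch * t / total for d, t in throughputs.items()}
    shares = {d: int(x) for d, x in exact.items()}  # round each share down
    leftover = global_batch - sum(shares.values())
    # Give the remaining samples to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


# Hypothetical relative throughputs for three chip types in one cluster.
shares = split_batch(512, {"gpu-type-a": 3.0, "gpu-type-b": 2.0, "npu-type-c": 1.0})
print(shares)
```

When the set of available devices or their throughput changes, the split is simply recomputed, a miniature version of the dynamic rescheduling described above.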
HICA also provides "one cloud, multiple chips" capability, supporting mainstream AI chips from both domestic and international manufacturers. It enables mixed training and inference across intelligent computing clusters built from chips of different brands and models, and is expected to reduce idle computing power by 20%.
In addition, the platform adaptively selects the most suitable communication parameters based on the characteristics of the collective communication traffic between different models and operators, improving communication efficiency. Because different models require different memory-to-compute ratios, HICA selects the most appropriate combination of memory and compute resources to load models of every scale, from the largest to the smallest, increasing throughput and reducing latency. The result is a 10% to 20% improvement in MFU and a 5% improvement in MBU (Model Bandwidth Utilization).
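MBU is commonly defined for LLM inference as the memory bandwidth a decoding step actually achieves, divided by the hardware's peak memory bandwidth, where the bytes read per generated token are approximated as the model weights plus the KV cache. The minimal sketch below assumes that this definition is the one behind the article's MBU figure; the function name and all input values are hypothetical.

```python
def mbu(n_params: float, bytes_per_param: float, kv_cache_bytes: float,
        time_per_output_token_s: float, peak_bandwidth_bytes_per_s: float) -> float:
    """Model Bandwidth Utilization for autoregressive decoding.

    Each generated token requires reading all weights plus the KV cache;
    the achieved bandwidth is that byte count divided by the time per
    output token, expressed as a fraction of peak memory bandwidth.
    """
    bytes_per_token = n_params * bytes_per_param + kv_cache_bytes
    achieved_bandwidth = bytes_per_token / time_per_output_token_s
    return achieved_bandwidth / peak_bandwidth_bytes_per_s


# Hypothetical inputs: 7B parameters in 16-bit precision (2 bytes each),
# a 1 GB KV cache, 10 ms per output token, and 2 TB/s peak bandwidth.
utilization = mbu(7e9, 2, 1e9, 0.010, 2e12)
print(f"MBU = {utilization:.0%}")
```

Because decoding is usually memory-bound, MBU rather than MFU tends to be the more telling efficiency metric for inference, which is why the article reports the two separately.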
Silicon photonics + liquid cooling make the electricity meter run slower, and integrated delivery saves customers time and effort
As generative AI deployment drives up demand for computing power and network bandwidth soars, high energy consumption in intelligent computing centers, both within nodes and between them, has become an increasingly acute pain point. In a 10,000-card intelligent computing center, for example, 200G interfaces require about 80,000 optical modules, and interconnect accounts for 5% of total energy consumption.
To reduce energy consumption between nodes, Shenzhou Kuntai adopts silicon photonics, using a single-source, multi-modulator design to lower modulator voltage together with technologies such as distributed feedback (DFB) lasers, which cuts interconnect energy consumption by 25%.
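Read together with the 5% interconnect share given for the 10,000-card example, the 25% interconnect saving can be translated into a share of total center energy. This back-of-the-envelope sketch assumes the 25% reduction applies to all of the interconnect energy in that 5% share and that nothing else in the center changes.

```python
interconnect_share = 0.05      # interconnect share of total energy (from the text)
interconnect_reduction = 0.25  # silicon-photonics saving on interconnect (from the text)

# Saving expressed as a fraction of the whole center's energy consumption,
# assuming all other consumption stays the same.
total_saving = interconnect_share * interconnect_reduction
print(f"Saving as a share of total energy: {total_saving:.2%}")
```

The center-wide effect is therefore about 1.25% of total energy, which is why node-level measures such as liquid cooling matter as well.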
For energy consumption within nodes, Shenzhou Kuntai offers liquid-cooled servers that raise cooling efficiency through integrated cold plates and intelligent flow regulation. A negative-pressure piping system and near-end leak detection, combined with service management systems, make the cooling system more reliable, and node energy consumption falls by 30%.
Numerous interfaces, complex connectors, difficult on-site deployment, and long implementation cycles are common obstacles for customers. To remove them, Shenzhou Kuntai officially released the "KunTai Pod2000 Full Liquid-Cooled Enclosure" solution at the opening ceremony of the Digital Cloud Power Conference 2024. Its integrated delivery approach reduces the complexity of deployment and maintenance and gives data centers a cost-effective, 100% liquid-cooled solution that pushes PUE (power usage effectiveness) toward 1.15. With a maximum power of over 60 kW per enclosure, it achieves an energy efficiency ratio 1.5 times the industry average, giving customers powerful computing capacity while keeping energy costs under control.
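PUE is the ratio of total facility energy to the energy delivered to IT equipment, so a PUE of 1.15 means 15% overhead for cooling and power delivery on top of the IT load. The sketch below assumes one enclosure running continuously with the 60 kW figure as pure IT load, and compares it with a baseline PUE of 1.5 that is a hypothetical value, not one from the article.

```python
HOURS_PER_YEAR = 365 * 24  # leap years ignored


def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """PUE = total facility power / IT power, so total = IT power * PUE."""
    return it_power_kw * pue


it_load_kw = 60.0    # one enclosure at the 60 kW figure from the text (assumed all IT load)
target_pue = 1.15    # PUE target from the text
baseline_pue = 1.5   # hypothetical comparison value, not from the text

# Cooling and power-delivery overhead on top of the IT load.
overhead_kw = facility_power_kw(it_load_kw, target_pue) - it_load_kw
# Yearly facility energy avoided versus the hypothetical baseline.
annual_saving_kwh = (facility_power_kw(it_load_kw, baseline_pue)
                     - facility_power_kw(it_load_kw, target_pue)) * HOURS_PER_YEAR

print(f"Overhead at PUE {target_pue}: {overhead_kw:.1f} kW")
print(f"Annual saving vs. PUE {baseline_pue}: {annual_saving_kwh:,.0f} kWh")
```

Under these assumptions, one enclosure at PUE 1.15 carries about 9 kW of overhead, and the gap against the hypothetical 1.5 baseline comes to roughly 184,000 kWh per year.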
Shenzhou Kuntai's enclosure products can be fitted with Kunpeng and Ascend mainboards. Building on the well-received integrated delivery model, silicon photonics and liquid cooling further strengthen the enclosure products, giving customers intelligent computing centers that are more powerful, consume less energy, and run more efficiently.
With the arrival of ChatGPT at the end of 2022, AI is becoming a core engine of innovation. IT infrastructure has entered a new phase of development in which advances in models and in computing power reinforce each other in an upward spiral. Facing these new opportunities, Shenzhou Kuntai has proposed a strategy built on a new intelligent computing architecture that aims to improve the overall performance of intelligent computing center systems. The strategy establishes a diversified intelligent computing architecture that is high-throughput, highly parallel, efficient, and energy-saving. With rapid deployment and low investment, this architecture breaks through computing power bottlenecks to build intelligent computing centers with better performance, lower cost, and higher energy efficiency. Shenzhou Kuntai envisions that, in the future, every intelligent computing center and every computer will adopt this new architecture, making computing power universally accessible.