New consortium to redefine Ethernet for AI and HPC

The network protocol faces challenges as AI and high-performance computing (HPC) workloads evolve rapidly

The Ultra Ethernet Consortium (UEC) has been set up to foster industry-wide cooperation to build an Ethernet-based stack architecture for high-performance networking.

The UEC intends to capitalise on Ethernet’s ubiquity and flexibility in handling varied workloads to build an Ultra Ethernet stack that is scalable and cost-effective.

The founding members of UEC include AMD, Arista, Broadcom, Cisco, Eviden (an Atos Business), HPE, Intel, Meta and Microsoft.

Fit for purpose

“This isn’t about overhauling Ethernet,” said Dr. J Metz, Chair of the Ultra Ethernet Consortium. “It’s about tuning Ethernet to improve efficiency for workloads with specific performance requirements. We’re looking at every layer – from the physical all the way through the software layers – to find the best way to improve efficiency and performance at scale.”

Karl Freund, Founder and Principal Analyst at Cambrian-AI Research, noted, “There has been an ongoing discussion, dare I say battle, over the best networking to use for infrastructure supporting the training and inference of large language models for generative AI. Some companies have been shifting to Ethernet-based networking, preferring its ease of installation and use. The UEC initiative will be a welcome addition to the AI community.”

Four areas of focus

The consortium will work on minimising changes to the communication stack while maintaining and promoting Ethernet’s interoperability. The technical goals for the consortium are to develop specifications, APIs, and source code to define:

• protocols, electrical and optical signalling characteristics, application program interfaces and/or data structures for Ethernet communications

• link-level and end-to-end network transport protocols to extend or replace existing link and transport protocols

• link-level and end-to-end congestion, telemetry and signalling mechanisms; each of the foregoing suitable for AI, machine learning and HPC environments

• software, storage, management and security constructs to facilitate a variety of workloads and operating environments.

UEC will take a systematic approach of modular, compatible, interoperable layers with tight integration to provide holistic improvement for demanding workloads via four working groups. They are: Physical Layer, Link Layer, Transport Layer and Software Layer.

UEC is a Joint Development Foundation project hosted by The Linux Foundation. UEC will begin accepting applications for new members in Q4 2023. More information can be found at ultraethernet.org.