No-Nvidias networking club convenes in search of open GPU interconnect

Ultra Accelerator Link consortium promises its 200 gigabit-per-second-per-lane spec will debut in Q1 2025

The Register

The Ultra Accelerator Link Consortium – an alliance of enterprise tech vendors that pointedly excludes Nvidia because it wants a shared standard for accelerator-to-accelerator links – has opened its doors and promised to deliver a spec in the first quarter of 2025.

The Consortium announced its existence in May, and promised to "define and establish an open industry standard that will enable AI accelerators to communicate more effectively."

The group's members include AMD, AWS, Broadcom, Cisco, Google, HPE, Intel, Meta and Microsoft – a Who's Who of AI, other than Nvidia.

Why exclude the market leader?

Nvidia's networking business has quietly grown to over a $14 billion annual run rate – a figure matched in datacenter networking sales only by the likes of Cisco and Huawei. Plenty of that revenue comes from Nvidia's proprietary InfiniBand and NVLink GPU-to-GPU connection offerings – which rival vendors can't easily access.

When UALink proclaims it wants "an interconnect based upon open standards [to] enable system OEMs, IT professionals and system integrators to create a pathway for easier integration, greater flexibility and scalability of their AI-connected datacenters," it is therefore both declaring its members' ambition and pointing out that vendors and buyers alike generally appreciate an open alternative to proprietary products.

Vendors like open standards because standards give them something to sell. As the enterprise tech community watches Nvidia dominate AI, many players would very much like to sell more stuff, but know it's too late to create proprietary tech of their own. Buyers like a contested market too, because competition tends to drive down prices.

Which brings us to Tuesday's announcement that the UALink Consortium has done enough paperwork to be formally incorporated, and is open to other entities joining the group.

The org also teased its forthcoming version 1.0 spec, which it claims "will enable up to 200Gbit/sec per lane scale-up connection for up to 1024 accelerators within an AI pod." That's rather faster than the 112Gbit/sec possible over Nvidia's NVLink, and leaves PCIe 5.0 eating dust.
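For a rough sense of scale, here's a back-of-the-envelope comparison in Python. The UALink and NVLink per-lane figures are the ones quoted above; PCIe 5.0's 32GT/s per lane is the published PCI-SIG rate. The four-lanes-per-link aggregation is a purely illustrative assumption – the consortium hasn't said how many lanes a UALink link will bundle.

    # Back-of-the-envelope per-lane bandwidth comparison.
    # UALink and NVLink rates are as quoted in this article;
    # PCIe 5.0's 32GT/s per lane is the published PCI-SIG rate.
    per_lane_gbps = {
        "UALink 1.0 (claimed)": 200,
        "NVLink (as quoted)": 112,
        "PCIe 5.0": 32,
    }

    # Hypothetical lane count per link, for illustration only.
    ASSUMED_LANES_PER_LINK = 4

    baseline = per_lane_gbps["NVLink (as quoted)"]
    for name, gbps in per_lane_gbps.items():
        link_gbytes = gbps * ASSUMED_LANES_PER_LINK / 8  # Gbit/s -> GB/s
        print(f"{name}: {gbps} Gbit/s/lane "
              f"~ {link_gbytes:.0f} GB/s per {ASSUMED_LANES_PER_LINK}-lane link, "
              f"{gbps / baseline:.1f}x the quoted NVLink rate")

Run it and UALink's claimed rate works out to roughly 1.8 times the quoted NVLink figure, and more than six times PCIe 5.0's per-lane rate.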

The consortium promised the spec will emerge in a form fit for general review in the first quarter of 2025. Even then it will be just a document: hardware implementing whatever emerges in early 2025 will be many months, if not years, away.

Which is why Nvidia CEO Jensen Huang dismissed the threat posed by UALink last May at Taiwan's Computex exhibition, when he said: "By the time the first gen of UALink comes out, we will be at NVLink seven or eight."

Which is not to say Nvidia is necessarily hostile to open standards: the acceleration champ this week proudly pointed to its Spectrum-X take on Ethernet being used in the 100,000-GPU AI training cluster built by Elon Musk's xAI.

UALink Consortium chair Kurtis Bowman offered a canned quote to the effect that "The release of the UALink 1.0 specification in Q1 2025 represents an important milestone as it will establish an open industry standard enabling AI accelerators and switches to communicate more effectively, expand memory access to meet large AI model requirements and demonstrate the benefits of industry collaboration." ®