Arrow Lake splashdown: Intel pins hopes on replacement for Raptors

New silicon, new architecture, and loads of new motherboards rise to support it, but will power be anchored down?

by · The Register

Back in September 2023, Intel unveiled its newly designed Meteor Lake SoC for the mobile market, its first disaggregated mobile chip built from multiple tiles in one package. While the consensus is that Meteor Lake flopped, it did pave the way for Intel to try new things in the consumer space.

Its next-generation Lunar Lake processors appear to improve efficiency and performance in the low-end to mid-range mobile segment, and while Intel has been busy bolstering its mobile portfolio, it has also just announced its next client-focused desktop platform.

Enter Arrow Lake, Intel's new desktop platform, which aims to build on what works and, no doubt, to move on from the controversy surrounding the instability of its previous 14th Gen Raptor Lake Refresh chips. Intel is integrating its AI-focused neural processing unit (NPU) into its Arrow Lake desktop products, the first time it has done so outside of its mobile-focused SoCs. Arrow Lake uses a disaggregated, scalable tiled approach to design and manufacturing, with separate tiles for compute, graphics, I/O, and the SoC.

When CEO Pat Gelsinger announced Intel's aggressive and ambitious "five nodes in four years" roadmap in 2021, the tech giant was at a critical juncture. Since then, it has had a rocky ride and, more recently, has been posting relatively weak financials.

Touching more on the nodes, much of the focus around Intel's latest Arrow Lake desktop processors was the expectation that they would be built on its 20A node, the first Intel node to move from nanometer-based naming to angstrom-based naming, marking the dawn of the "angstrom era." Unfortunately, Intel recently cancelled 20A, which in reality was only meant to be a stepping stone to its 18A node. Intel is expected to deliver its upcoming Panther Lake client chips and Clearwater Forest server chips on 18A.

Intel Arrow Lake slide deck depicting Foveros' tiled design

Although many expected Intel to lean on its own fabs and the 20A node to manufacture Arrow Lake, the company has actually outsourced manufacturing of the compute, graphics, SoC, and I/O tiles for this generation to TSMC, while Intel still packages the disaggregated design in its own foundry. All of these tiles sit on a base tile using Intel's Foveros 3D packaging, which integrates them onto one package, as it did previously with its disaggregated Meteor Lake SoC architecture.

Let's dive into Arrow Lake, starting with its new compute tile. This tile is manufactured on TSMC's N3B node and introduces two new cores to Intel's desktop portfolio, Lion Cove and Skymont, which the company also uses in its Lunar Lake mobile SoCs.

The Lion Cove performance (P) cores primarily focus on delivering IPC and single-threaded performance gains. Intel has previously claimed that Lion Cove brings up to a 9 percent improvement in performance over the previous Raptor Lake P-cores. As always, take in-house performance figures with a healthy grain of salt.

Intel Arrow Lake slide deck showing out-of-order engine improvements

Architecturally, the biggest enhancements in the Lion Cove P-core are in the execution engine. Intel has expanded its out-of-order (OoO) window by widening both the allocation/rename and retirement stages of the pipeline. This wider pipeline should allow the core to issue more instructions per clock cycle, which in theory should show up as a strong increase in throughput during compute-heavy tasks. Intel has also greatly expanded the number of execution ports, which should let Lion Cove handle more instructions in parallel. The resulting gains in instruction-level parallelism (ILP) should be most pronounced in workloads such as rendering, AI inferencing, and physics simulations.

Intel Arrow Lake slide showing P core improvements

Branch prediction is one of the key techniques modern processors use to maintain high throughput. Lion Cove's redesigned branch predictor is designed to cut down on mispredictions, which are expensive because of the pipeline stalls they cause. By improving both prediction accuracy and the latency of recovering from a mispredicted branch, it should deliver significant efficiency gains, especially in dynamic workloads such as AI-driven tasks or gaming. Intel has also increased the size of the reorder buffer, which tracks the status of every in-flight instruction as it works its way through the pipeline. In theory, this lets the core keep more instructions executing out of order while it waits on a branch or memory access that has not yet resolved.
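To make the idea of a branch predictor concrete, here is a minimal sketch of the classic textbook scheme, a table of 2-bit saturating counters indexed by branch address. This is a generic illustration only: Lion Cove's actual predictor is far more sophisticated and its internals are not described here, and the class, function, and trace below are hypothetical.

```python
# A textbook 2-bit saturating-counter branch predictor. This is a generic
# illustration of branch prediction, NOT Lion Cove's actual design.


class TwoBitPredictor:
    """Per-entry 2-bit counters: 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # Start every counter at 1 ("weakly not-taken").
        self.counters = [1] * (1 << table_bits)

    def _index(self, pc: int) -> int:
        # Index the table with the low bits of the branch address.
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move the counter one step toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(predictor: TwoBitPredictor, trace: list[tuple[int, bool]]) -> int:
    """Count how often the predictor guesses wrong on a (pc, taken) trace."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A hypothetical loop branch: taken nine times, then falls through once, repeated.
loop_trace = [(0x400, i % 10 != 9) for i in range(100)]
print(mispredictions(TwoBitPredictor(), loop_trace))
```

On this trace the sketch mispredicts 11 of 100 branches: one cold-start miss plus one miss at each of the ten loop exits. The two-bit hysteresis is the design point, since a single not-taken outcome does not flip the prediction, so each loop exit costs one misprediction rather than two.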

Intel Arrow Lake slide showing cache hierarchy

Intel has completely redesigned the cache hierarchy of the new performance cores compared to previous generations. Lion Cove features a multi-level data cache consisting of a 48 KB L0D cache with 4-cycle load-to-use latency, a 192 KB L1D cache with 9-cycle latency, and an L2 cache enlarged to 3 MB per core. A larger cache means fewer memory accesses spill over to the slower L3 cache or even main memory (DRAM), which is welcome under heavy workloads such as video editing, large-scale simulations, and high-framerate gaming.

The shared L3 cache tops out at 36 MB on the flagship, giving the cores a large pool of data they can share between threads. For multi-core workloads, inter-core communication needs to be both low-latency and high-bandwidth; cache hierarchy improvements on both counts should translate directly into smoother multi-threaded performance, especially in workloads where several cores access the same data sets.
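Why does a bigger cache help? A rough way to see it is to weight each level's latency by the probability that an access is served there. The sketch below uses the 4-cycle L0D and 9-cycle L1D figures quoted above; every hit rate and every L2, L3, and DRAM latency is a hypothetical placeholder, not a measured Lion Cove number.

```python
# Expected load-to-use latency for a multi-level cache hierarchy.
# L0D (4 cycles) and L1D (9 cycles) are the figures quoted above; every
# other number below is a HYPOTHETICAL placeholder for illustration.

from dataclasses import dataclass


@dataclass
class Level:
    name: str
    latency_cycles: float  # load-to-use latency when the access is served here
    hit_rate: float        # local hit rate among accesses that reach this level


def expected_latency(levels: list[Level]) -> float:
    """Weight each level's latency by the probability an access is served there."""
    reach = 1.0  # probability that an access gets as far as the current level
    total = 0.0
    for level in levels:
        total += reach * level.hit_rate * level.latency_cycles
        reach *= 1.0 - level.hit_rate
    return total


def hierarchy(l2_hit_rate: float) -> list[Level]:
    return [
        Level("L0D", 4, 0.90),         # latency quoted above; hit rate hypothetical
        Level("L1D", 9, 0.50),         # latency quoted above; hit rate hypothetical
        Level("L2", 17, l2_hit_rate),  # hypothetical latency
        Level("L3", 80, 0.50),         # hypothetical
        Level("DRAM", 400, 1.0),       # hypothetical; every access that gets here is served
    ]


for rate in (0.6, 0.8):  # e.g. a smaller vs. a larger L2
    print(f"L2 hit rate {rate:.0%}: {expected_latency(hierarchy(rate)):.2f} cycles")
```

With these made-up inputs, raising the L2 hit rate from 60 to 80 percent cuts the modeled average from about 9.4 to about 7.1 cycles, almost entirely because fewer accesses fall through to DRAM.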

Intel Arrow Lake slide showing Skymont E-core performance

The Skymont E-cores mark a notable upgrade, at least on paper, over Gracemont, the E-core found in 12th, 13th, and 14th Gen Core processors. Skymont should be much more efficient than its predecessor, with Intel promising gains of up to 32 percent in integer performance and as much as 72 percent in floating-point performance over Gracemont at iso-frequency. Again, take these claims with a pinch of salt, but in theory they're plausible. The E-cores reduce power consumption in multi-threaded tasks, which is their job: they free up the higher-performing P-cores by handling background processes, parallel computation, and light-to-moderate work. Intel has also dropped Hyper-Threading from the P-cores (the E-cores never had it), so every core runs exactly one thread. This is designed to shrink the overall power envelope and free up more power and thermal headroom.

One of Skymont's major architectural enhancements is increased vector throughput, with more lanes for its SIMD units, which should allow each E-core to execute more operations per clock cycle. This is essential in workloads such as multimedia, AI, and scientific computing. Secondly, each four-core E-core cluster shares 4 MB of L2 cache, which should help relieve memory-access bottlenecks.

Together with deeper instruction queues and wider instruction dispatch, this should make the Skymont cores better at parallel workloads and improve execution efficiency across multi-threaded applications. On paper, these changes make Skymont a clear step up from Gracemont (the E-core in Raptor Lake), and it should handle content rendering, AI inference, and background system management, all without consuming unreasonable amounts of power. That's the nature of an efficiency core, after all.

The GPU tile on Arrow Lake is fabricated on TSMC's N5P process, an improved version of the 5nm-class N5 node that offers better performance and power efficiency at higher clock speeds. Based on an improved version of Intel's Xe architecture, the GPU tile includes four Xe-cores, each with ray tracing units and enhanced vector engines. Intel says this allows up to 2x the graphics performance of the previous 14th Gen processors.

Intel Arrow Lake slide deck showing the Xe Graphics tile composition

With 64 vector engines (16 per Xe-core) and support for AI workloads through DP4a instructions, the GPU tile looks architecturally solid on paper. It also supports XeSS (Xe Super Sampling), which uses AI-based upscaling to improve graphical output, especially compared to previous generations of Intel's chips. In short, the balance between power efficiency and performance does make the N5P-based GPU tile stand out among desktop chips, but don't expect to play AAA titles at maximum settings; integrated graphics is still a long way from that, no matter what the marketing wants you to believe.
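DP4a packs four 8-bit multiplies and an accumulate into a single instruction, which is why it matters for quantized (int8) AI inference. The sketch below models that general behavior in software, assuming signed 8-bit inputs and wrap-around 32-bit accumulation; it is not Intel's hardware definition, and the exact signedness and saturation variants Intel's GPU exposes are not covered here. The helper names are hypothetical.

```python
# A software model of a DP4a-style operation: a dot product of four 8-bit
# integers packed into each 32-bit operand, added to a 32-bit accumulator.
# Assumes signed int8 inputs and wrap-around int32 accumulation.

import struct


def pack_int8x4(values: list[int]) -> int:
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def unpack_int8x4(word: int) -> tuple[int, ...]:
    """Split a 32-bit word back into four signed 8-bit integers."""
    return struct.unpack("<4b", struct.pack("<I", word))


def wrap_int32(x: int) -> int:
    """Wrap an arbitrary Python int into the signed 32-bit range."""
    return (x + 2**31) % 2**32 - 2**31


def dp4a(a: int, b: int, acc: int) -> int:
    """acc + a0*b0 + a1*b1 + a2*b2 + a3*b3, wrapped to int32."""
    products = (x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))
    return wrap_int32(acc + sum(products))


a = pack_int8x4([1, -2, 3, 4])
b = pack_int8x4([5, 6, -7, 8])
print(dp4a(a, b, 10))  # 10 + (5 - 12 - 21 + 32) = 14
```

A hardware unit does all four multiplies and the add in one operation, so an int8 matrix multiply needs roughly a quarter as many instructions as a one-multiply-per-instruction approach.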

Built on TSMC's N6 process node, the SoC and I/O tiles handle system connectivity, memory, and data flow control within Intel's Arrow Lake architecture. They support high-bandwidth DDR5-6400 memory and provide up to 24 CPU PCIe lanes, 20 of them PCIe 5.0, for fast data transfer to and from GPUs and NVMe SSDs. The platform also supports the latest connectivity standards, including Thunderbolt 4, Wi-Fi 7, and Bluetooth 5.4. The N6 process strikes a balance between performance density and energy efficiency, keeping power consumption low while offering high throughput.
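To put the PCIe figures in perspective, theoretical link bandwidth follows directly from the per-lane transfer rate. This sketch uses the standard rates (16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0, both with 128b/130b encoding) and ignores packet and protocol overhead, so real throughput is lower; the x16 and x4 widths are just typical graphics and NVMe examples.

```python
# Theoretical one-direction PCIe bandwidth from the per-lane transfer rate.
# PCIe 4.0 runs at 16 GT/s and PCIe 5.0 at 32 GT/s per lane, both using
# 128b/130b line encoding; packet/protocol overhead is ignored.

RATE_GT_S = {"4.0": 16.0, "5.0": 32.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gb_s(gen: str, lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s for a link of `lanes` lanes."""
    gbit_per_lane = RATE_GT_S[gen] * ENCODING_EFFICIENCY  # one bit per transfer per lane
    return gbit_per_lane * lanes / 8  # bits -> bytes


for gen, lanes in [("5.0", 16), ("5.0", 4), ("4.0", 4)]:
    print(f"PCIe {gen} x{lanes}: {pcie_bandwidth_gb_s(gen, lanes):.1f} GB/s")
```

That works out to roughly 63 GB/s each way for a PCIe 5.0 x16 graphics slot and about 15.8 GB/s for a PCIe 5.0 x4 NVMe drive, double the PCIe 4.0 figure at the same width.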

While they are not as central to performance as their compute and GPU siblings, the SoC and I/O tiles serve a critical function, enabling efficient data flow across the platform from CPU to GPU to memory, all the way down to peripheral devices. It's also worth noting that Intel has raised the supported memory speed with Arrow Lake to DDR5-6400 at JEDEC settings, up from DDR5-5600 on its previous 14th Gen Raptor Lake Refresh chips. This means every Arrow Lake CPU, regardless of the silicon lottery, is guaranteed to handle DDR5-6400. Going beyond that with XMP memory profiles, with some DDR5 kits rated at up to 8,000 MT/s, technically voids Intel's warranty.
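The jump from DDR5-5600 to DDR5-6400 is easier to judge as peak bandwidth, which is simply transfers per second multiplied by bus width. This sketch assumes a typical dual-channel desktop setup with two 64-bit DIMM channels (each DDR5 DIMM splits its 64 bits into two 32-bit subchannels); the figures are theoretical peaks, not measured throughput.

```python
# Peak theoretical memory bandwidth = transfers per second x bus width.
# Assumes a typical dual-channel desktop: two 64-bit DIMM channels.

BUS_BYTES_PER_CHANNEL = 64 // 8  # 64-bit channel = 8 bytes per transfer


def peak_bandwidth_gb_s(mt_s: int, channels: int = 2) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes) for a given MT/s rating."""
    return mt_s * 1e6 * BUS_BYTES_PER_CHANNEL * channels / 1e9


for speed in (5600, 6400, 8000):
    print(f"DDR5-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s")
```

The JEDEC bump lifts the dual-channel peak from 89.6 GB/s to 102.4 GB/s, roughly 14 percent, while an 8,000 MT/s XMP kit would reach 128 GB/s on paper.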

In terms of motherboard support, Intel is introducing a new socket, LGA 1851, which means a whole load of new motherboards are arriving to support the launch of Arrow Lake. The chipset for Arrow Lake, or at least the premium chipset launching alongside the chips, is named Z890 and brings a wide range of I/O options and configurations. The Z890 chipset features up to 24 PCIe 4.0 lanes, 10 USB 3.2 ports, 14 USB 2.0 ports, and 8 SATA ports. How these features are enabled and implemented is primarily down to motherboard vendors, with many premium models offering USB4, Thunderbolt 4, the latest consumer networking such as the new Wi-Fi 7 CNVi modules, and even 10G Ethernet for users who require it.

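| Model | Cores (P+E) / Threads | Cache (L3 / L2) | P-core max turbo | P-core base | E-core max turbo | E-core base | Base TDP | Max turbo TDP | Price (MSRP) |
|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 9 285K | 8+16 / 24 | 36 MB / 40 MB | 5.6 GHz | 3.7 GHz | 4.6 GHz | 3.2 GHz | 125 W | 250 W | $589 |
| Core Ultra 7 265K | 8+12 / 20 | 30 MB / 36 MB | 5.4 GHz | 3.9 GHz | 4.6 GHz | 3.3 GHz | 125 W | 250 W | $394 |
| Core Ultra 7 265KF | 8+12 / 20 | 30 MB / 36 MB | 5.4 GHz | 3.9 GHz | 4.6 GHz | 3.3 GHz | 125 W | 250 W | $379 |
| Core Ultra 5 245K | 6+8 / 14 | 24 MB / 26 MB | 5.2 GHz | 4.2 GHz | 4.6 GHz | 3.6 GHz | 125 W | 159 W | $309 |
| Core Ultra 5 245KF | 6+8 / 14 | 24 MB / 26 MB | 5.2 GHz | 4.2 GHz | 4.6 GHz | 3.6 GHz | 125 W | 159 W | $294 |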
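The table's thread and L2 columns can be cross-checked against the architecture described earlier: with no Hyper-Threading, threads should equal P-cores plus E-cores, and total L2 should equal 3 MB per Lion Cove P-core plus 4 MB per Skymont cluster. This sketch assumes four E-cores per cluster; the KF parts are omitted because they share their K siblings' core configurations, and the helper names are hypothetical.

```python
# Cross-check the spec table: threads should equal P-cores + E-cores, and
# total L2 should equal 3 MB per P-core plus 4 MB per E-core cluster.
# The four-E-cores-per-cluster figure is an assumption.

from typing import NamedTuple

L2_PER_P_CORE_MB = 3
L2_PER_E_CLUSTER_MB = 4
E_CORES_PER_CLUSTER = 4  # assumption


class Sku(NamedTuple):
    name: str
    p_cores: int
    e_cores: int
    threads: int  # as listed in the table
    l2_mb: int    # as listed in the table


SKUS = [
    Sku("Core Ultra 9 285K", 8, 16, 24, 40),
    Sku("Core Ultra 7 265K", 8, 12, 20, 36),
    Sku("Core Ultra 5 245K", 6, 8, 14, 26),
]


def expected_l2_mb(sku: Sku) -> int:
    """Total L2 implied by per-P-core and per-E-cluster capacities."""
    clusters = sku.e_cores // E_CORES_PER_CLUSTER
    return sku.p_cores * L2_PER_P_CORE_MB + clusters * L2_PER_E_CLUSTER_MB


for sku in SKUS:
    ok = sku.threads == sku.p_cores + sku.e_cores and sku.l2_mb == expected_l2_mb(sku)
    print(f"{sku.name}: {'consistent' if ok else 'MISMATCH'}")
```

All three configurations check out; for example, the 285K's 40 MB of L2 is 8 × 3 MB plus 4 × 4 MB.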

As it stands, Intel is launching five new Core Ultra 200-series processors, with prices starting at $294 for the Core Ultra 5 245KF (6P+8E/14T) and topping out at $589 for the flagship Core Ultra 9 285K (8P+16E/24T). The 285K can boost up to 5.6 GHz on its P-cores, comes with 36 MB of L3 cache, and has a base TDP of 125 W and a max turbo TDP of up to 250 W.

While TDP is a fairly loose figure in the world of client desktop chips, motherboard vendors typically skirt these limits anyway through their interpretation of multi-core enhancement (MCE) to stay ahead of the competition. Because MCE often pushes CPUs beyond their official limits, any gains can come at the cost of thermal efficiency and potentially stability. Instability is a particular risk because vendors' preset profiles typically apply more CPU VCore voltage than needed, tuned to accommodate the widest range of chips rather than what each individual piece of silicon is capable of.

The Intel Core Ultra 200-series Arrow Lake processors are expected to go on sale on October 24. During its Arrow Lake press briefing, Intel also said that vPro SKUs based on Arrow Lake, for commercial systems and SMEs, will definitely be coming, but would not be drawn on when.

In another twist, Intel also announced that its Core Ultra H and HX series for premium gaming laptops are expected to arrive in Q1 2025, provided all goes to plan, as things haven't been so great for the chip giant lately. Intel could really do with a trouble-free launch for Arrow Lake as it continues laying off some 16,000 workers following a disastrous second quarter in which it reported $1.6 billion in losses. ®