AMD Ryzen AI 300 Series Launched

AdrianBc

Member
Mar 29, 2021
32
17
8
"It feels strange, but in this generation, I am more excited to use the Ryzen AI 300 series than the Ryzen 9000 series based on the demos I have seen."

I do not agree with this.

While it is true that the Ryzen AI 300 brings significant improvements, especially in the GPU for everybody and in the NPU for those interested in machine learning inference, the Ryzen 9000 series brings the greatest advance in CPU throughput since 2019, when the Zen 2 Ryzen 9 3950X was introduced.

The Ryzen AI 300 series has the same per-core vector throughput as Zen 3 and Zen 4; only in the desktop and server Zen 5 parts is the vector throughput doubled.

The floating-point throughput of desktop CPUs (i.e. non-server, non-workstation CPUs, which are much cheaper per core) has evolved as follows. After a period of rapid increase in the sequence Core 2 => Nehalem => Sandy Bridge => Haswell, Intel was left without competition and desktop CPUs stagnated at 32 FP64 FMA units per socket.

When Zen began to put pressure on Intel, Intel increased the number of FP64 FMA units to 48 with Coffee Lake in 2017 and to 64 with Coffee Lake Refresh in 2018 (Zen 1 and Zen 1+ had lower floating-point throughput per core and per socket).

One year later, in 2019, AMD launched Zen 2 with up to 128 FP64 FMA units per socket and Intel has not been able to match AMD since then.

Now, in 2024, the Ryzen 9 9950X brings another doubling of the floating-point throughput, with 256 FP64 FMA units per socket.
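
The per-socket figures in this history follow from cores × FMA pipes per core × FP64 lanes per pipe. Here is a minimal Python sketch of that arithmetic. The core counts and pipe widths (two FMA pipes per core, 256-bit up to Zen 2 and 512-bit in desktop Zen 5) are the commonly cited configurations and are my assumptions; the post only gives the totals.

```python
# FP64 FMA units per socket = cores * FMA pipes per core * FP64 lanes per pipe.
# Core counts and pipe widths below are assumptions used for illustration;
# only the resulting totals appear in the post.

def fma_units_per_socket(cores: int, pipes_per_core: int, pipe_bits: int) -> int:
    lanes = pipe_bits // 64  # number of FP64 lanes in one SIMD pipe
    return cores * pipes_per_core * lanes

desktop_parts = [
    ("Haswell, 4 cores", 4, 2, 256),
    ("Coffee Lake, 6 cores (2017)", 6, 2, 256),
    ("Coffee Lake Refresh, 8 cores (2018)", 8, 2, 256),
    ("Ryzen 9 3950X, Zen 2 (2019)", 16, 2, 256),
    ("Ryzen 9 9950X, Zen 5 (2024)", 16, 2, 512),
]

for name, cores, pipes, bits in desktop_parts:
    print(f"{name}: {fma_units_per_socket(cores, pipes, bits)} FP64 FMA units")
```

Under these assumptions the sketch reproduces the sequence 32, 48, 64, 128, 256 quoted in the post.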

Despite the hype around machine learning, there are still far more important engineering applications that depend on traditional floating-point computation. Even if the most efficient way to run those would be on FP64 GPUs, the prices of such GPUs have grown enormously and are no longer acceptable for any small business or individual.

I still have some old AMD GPUs from almost a decade ago, which were cheaper and faster at FP64 computations than any later CPUs or GPUs.

Until this year there was nothing that could replace them. Now the Ryzen 9 9950X has about the same speed as the old GPUs (close to 2 FP64 Tflop/s), but it costs less, consumes less power and is much easier to program, so it is a good upgrade.
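
As a rough check of the "close to two FP64 Tflop/s" figure: each FMA unit performs two floating-point operations per cycle, so peak throughput is 2 × FMA units × clock. A minimal sketch follows; the 256 units come from the post, while the sustained all-core clock under AVX-512 load is an assumption, so a range is shown.

```python
# Peak FP64 throughput: each FMA unit performs 2 floating-point operations
# (one multiply and one add) per clock cycle.

def peak_fp64_tflops(fma_units: int, clock_ghz: float) -> float:
    flops_per_cycle = 2 * fma_units
    return flops_per_cycle * clock_ghz * 1e9 / 1e12

# 256 FMA units is the figure from the post; the clock values are assumed,
# hypothetical sustained all-core frequencies, not measurements.
for clock_ghz in (3.5, 4.0, 4.5):
    print(f"{clock_ghz} GHz -> {peak_fp64_tflops(256, clock_ghz):.2f} Tflop/s")
```

For assumed clocks between 3.5 and 4.5 GHz this gives roughly 1.8 to 2.3 Tflop/s, which is a theoretical peak; real code will reach less.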

No other current CPU or GPU will offer FP64 performance per dollar similar to that of the Ryzen 9 9950X, at least not at a reasonable total price. After its launch the AMD Instinct MI300X was offered at $16500, which means a much lower performance per dollar than the 9950X, but I recently saw an offer from Dell at a reduced price of $9000. At that price, the performance per dollar would be similar to that of the 9950X. Nevertheless, even if that offer were valid for those who do not use Dell servers, $9000 is too much for any small business or individual. Such a price is acceptable only for an organization that can keep such a GPU busy almost 24/7 with work that supports some profitable activity, allowing it to recover the purchase price.

So for most people interested in FP64 computations, the 9950X will from now on be the only good choice. It will bring either a great jump in performance over computers of similar cost, or a much lower cost than computers of similar performance, as the 3950X did 5 years ago.

Compared with the 9950X and its 256 FP64 FMA units per socket, a Ryzen AI 9 HX 370 has only 96 FP64 FMA units per socket, the same as a Ryzen 9 5900X. That is a 50% increase over Phoenix/Hawk Point (due to 12 cores vs. 8), but it is not a doubling as in desktop Zen 5.
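
The ratios in this comparison can be reproduced with a few lines of Python. This sketch assumes 8 FP64 FMA lanes per core per cycle for Zen 3, Zen 4 and mobile Zen 5, and 16 for desktop Zen 5, as the post argues. It also treats all 12 cores of the HX 370 as equal, which is a simplification.

```python
# FP64 FMA lanes per core per cycle, as argued in the post (an assumption,
# not an official AMD figure): 8 for Zen 3, Zen 4 and mobile Zen 5,
# 16 for desktop Zen 5.
LANES_HALF, LANES_FULL = 8, 16

parts = {
    "Ryzen 9 9950X (16 cores)": 16 * LANES_FULL,
    "Ryzen AI 9 HX 370 (12 cores)": 12 * LANES_HALF,
    "Ryzen 9 5900X (12 cores)": 12 * LANES_HALF,
    "Phoenix/Hawk Point (8 cores)": 8 * LANES_HALF,
}

for name, units in parts.items():
    print(f"{name}: {units} FP64 FMA units per socket")

strix = parts["Ryzen AI 9 HX 370 (12 cores)"]
gain_vs_phoenix = strix / parts["Phoenix/Hawk Point (8 cores)"]
desktop_vs_strix = parts["Ryzen 9 9950X (16 cores)"] / strix
print(f"HX 370 vs Phoenix/Hawk Point: {gain_vs_phoenix:.2f}x")
print(f"9950X vs HX 370: {desktop_vs_strix:.2f}x")
```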
 

Patriot

Moderator
Apr 18, 2011
1,485
820
113
"In comparison with 9950X with 256 FP64 FMA units per socket, a Ryzen AI 9 HX 370 has only 96 FP64 FMA units per socket, the same with a Ryzen 9 5900X. It is a 50% increase vs. Phoenix/Hawk Point (due to 12 cores vs. 8), but it is not a doubling like in the desktop Zen 5."

I am not following your math... AMD switched from double-pumped 256-bit units, which could run two 256-bit AVX operations or one AVX-512 operation spanned across the two units, to a full-width AVX-512 unit. This is a Zen 5 architecture shift and not something unique to the desktop parts.
Comparing the 12-core Strix with the 16-core desktop part is therefore also only going to be a 50% increase.

I have seen nothing to suggest that the FMA units per core are halved on the monolithic APUs as opposed to the chiplet desktop/server parts, but if you saw something I missed, please link it.
 

AdrianBc


AMD has not said a word about this officially, but extensive microarchitecture benchmarks of a Ryzen AI 9 365 Strix Point sample have been published. They show that while the integer instructions have all the expected improvements of Zen 5, the SIMD throughput is the same as for Zen 4. Moreover, this difference between laptop Zen 5 and desktop/server Zen 5 was apparently well known among the early testers of Zen 5, even if it could not be published due to the NDA.


You can use Google Translate in Chromium/Chrome to read this text. The test results do not need any translation.


It makes sense for laptop CPUs to provide only the same throughput as the Intel cores or the big Arm cores, i.e. one 512-bit FP64 FMA or two 256-bit FP64 FMAs per clock cycle.

Otherwise, the AMD mobile cores would have required a greater area and power consumption, which would have made it very difficult to implement simultaneously a 3/2 increase in CPU cores, a 4/3 increase in GPU cores and a threefold increase in NPU throughput, all in a TSMC CMOS process similar to that of Phoenix/Hawk Point.

For a laptop CPU, which will less often run applications that can benefit from AVX-512 or AVX, it is more useful to have 12 cores with half SIMD throughput than the perhaps 8 cores with full SIMD throughput that could have been put in the same chip area.

While I am one of the few who would have preferred fewer, but full-throughput, Zen 5 cores, because that is more energy-efficient when running AVX-512 code, which is important for me, I agree that AMD's decision to prefer more, but weaker, cores is a better choice for most laptop users. For instance, when compiling a software project on a laptop, having 12 cores instead of 8 would increase the compilation speed by almost 50%, while the SIMD throughput would have very little influence.

It is expected that in early 2025 AMD will also launch a series of desktop Zen 5 CPUs repackaged as laptop CPUs, for big laptops with discrete GPUs, exactly as they did in the previous Zen generations. Those laptops will have Zen 5 cores with full SIMD throughput, but they will be heavy and expensive, being sold as gaming laptops or desktop-replacement laptops.
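
To put rough numbers on this trade-off, here is a toy Python model. It is only a sketch under strong assumptions: perfect scaling across cores, equal clocks, power ignored, and the hypothetical alternative of 8 full-width cores in the same area taken from the "perhaps 8 cores" guess above. The parameter `simd_fraction` is the share of run time that is SIMD-bound on a half-throughput core.

```python
# Toy model: relative throughput of a perfectly parallel workload, where a
# fraction of the run time (measured on a half-throughput core) is SIMD-bound.
# All numbers are illustrative assumptions, not measurements.

def throughput(cores: int, lanes: int, simd_fraction: float, base_lanes: int = 8) -> float:
    # Time per unit of work on one core: the scalar part is unchanged,
    # the SIMD part shrinks in proportion to the wider SIMD datapath.
    time_per_unit = (1.0 - simd_fraction) + simd_fraction * base_lanes / lanes
    return cores / time_per_unit

for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    many = throughput(12, 8, f)   # 12 cores with half SIMD throughput
    wide = throughput(8, 16, f)   # hypothetical 8 cores with full SIMD throughput
    print(f"SIMD-bound fraction {f:.2f}: 12 cores = {many:.1f}, 8 wide cores = {wide:.1f}")
```

In this model the 12-core design is 50% faster for purely scalar work such as compilation, and the 8 wide cores only pull ahead once more than two thirds of the run time is SIMD-bound.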
 