Who wouldn’t want to play their favorite PC games on the go? The Nintendo Switch already offers HD gaming on a handheld, but not every game is available on Nintendo’s console, and there are PC games we’d like to take anywhere: a hotel room, our apartment, or a long car trip where we’re not the ones driving.
That’s why we’ve decided to take a look at the Steam Deck hardware with one specific use in mind: PC gaming anywhere and everywhere. With that in mind, we’ve pointed out the parts where we think both Valve and AMD could have done better in terms of portability.
Van Gogh, next gen console technology at scale
If we look at the two main building blocks of the Steam Deck’s APU, we find a CPU with the Zen 2 architecture and a GPU with the RDNA 2 architecture, two points in common with all next gen consoles, but on a smaller scale.
Starting with the CPU, Steam Deck uses a single CCX with 4 Zen 2 cores, instead of the 2 CCXs used in the consoles, so we are looking at a design with 4 cores and 8 threads of execution. It is surprising, to say the least, that a device that runs on a battery opts for multithreading, but it must also be taken into account that the energy cost of fitting 8 cores would be much higher. Its clock speed? Between 2.4 GHz and 3.5 GHz. This is another surprising element if we consider that the system is designed to be used away from a power outlet.
As for the GPU, we have an RDNA 2 design with 8 Compute Units and a clock speed between 1 GHz and 1.6 GHz. Valve has not talked about the use of AMD’s SmartShift technology, implemented in gaming laptops with RX 5000 and RX 6000 graphics and in the PlayStation 5 APU. But we have to take into account the problem of thermal throttling, which forces a cut in the clock speed of the CPU, the GPU or both at the same time, all the more so in low power devices.
The way Nintendo handles this on the Nintendo Switch, for example, is by leaving the CPU at 1 GHz and allowing the GPU speed to vary. Only during a screen transition, such as a fade to black, or during a video that can’t be skipped, is the Switch CPU set to maximum speed, in order to copy the next level’s data from the console’s NAND flash memory to RAM. In general, though, it’s rare for a portable system that’s meant to give hours of gaming on the go to end up running at Boost speeds on CPU and GPU.
The Infinity Cache is crucial
We don’t know whether Steam Deck has Infinity Cache, but we think not, because it has not been confirmed for the rest of AMD’s APUs at the moment and it seems to be a feature unique to dedicated GPUs rather than integrated ones. Its absence matters especially because current GPUs rasterize by tiles. We are not talking about tile rendering: tile rastering is found on all NVIDIA GPUs since Maxwell under the name Tiled Caching and on all AMD GPUs since Vega under the name DSBR.
The idea is that the part of the 3D pipeline that goes from rasterizing the triangle to drawing into the image buffer is done as in tiled rendering, but with one difference: the tiles are stored in the L2 cache, which is not RAM and does not behave like it. This means that any data evicted from the L2 cache lines ends up directly in RAM. Since the L2 cache is directly tied to the bandwidth of the memory controller, which is much lower with an integrated GPU and a 128-bit LPDDR5 bus, the chances of data falling into RAM are much higher.
In RDNA 2 for PC, and only in the Radeon RX 6000, AMD added the Infinity Cache. This is an additional cache level that acts as a Victim Cache, picking up the data discarded from the L2 cache and keeping it. The importance of this is that accessing data held in the Infinity Cache increases energy efficiency, since it requires fewer pJ/bit than going out to RAM. It’s true that LPDDR5 has a lower power consumption than GDDR6, at around 4 pJ/bit on average, but adding the Infinity Cache to the APU would have made the Steam Deck more efficient.
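To make the victim cache idea more tangible, here is a toy model in Python. It is only a sketch under our own assumptions: the cache sizes, the LRU replacement policy, the access trace and every name in it (`LRUCache`, `count_ram_accesses`) are invented for illustration and do not describe how AMD actually implements the L2 or the Infinity Cache. It simply counts how many accesses have to go out to RAM with and without a second level that adopts the lines the L2 throws away.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and mark the line as most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def remove(self, addr):
        self.lines.pop(addr, None)

    def insert(self, addr):
        """Insert a line; return the evicted address, or None if nothing was evicted."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            evicted, _ = self.lines.popitem(last=False)
            return evicted
        return None


def count_ram_accesses(trace, l2_lines, victim_lines=0):
    """Count accesses that miss every cache level and must be served by RAM."""
    l2 = LRUCache(l2_lines)
    victim = LRUCache(victim_lines) if victim_lines else None
    ram_accesses = 0
    for addr in trace:
        if l2.lookup(addr):
            continue  # served by the L2
        if victim is not None and victim.lookup(addr):
            victim.remove(addr)  # the line moves back up into the L2
        else:
            ram_accesses += 1  # nobody had it: go out to RAM
        evicted = l2.insert(addr)
        if evicted is not None and victim is not None:
            victim.insert(evicted)  # the victim cache adopts the discarded line
    return ram_accesses


# Hypothetical trace: a working set of 6 lines reused 4 times, with a 4-line L2.
trace = [0, 1, 2, 3, 4, 5] * 4
ram_without_victim = count_ram_accesses(trace, l2_lines=4)
ram_with_victim = count_ram_accesses(trace, l2_lines=4, victim_lines=4)
print(ram_without_victim, ram_with_victim)
```

In this made-up trace the working set is slightly larger than the L2, so without the extra level every access ends up in RAM, while with it only the first pass does. Real workloads are nowhere near this clean, but the mechanism is the same.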
Ray Tracing is not necessary, VRS is.
Because the integrated GPU is RDNA 2, it supports Ray Tracing through the inclusion of Ray Intersection Units. But Ray Tracing is not only the calculation of the intersection; it is also the traversal of the BVH tree, which RDNA 2 performs through compute, and believe us, the Steam Deck does not have enough power in that respect.
Another matter is Variable Rate Shading, which groups pixels that share both color value and shader program so that the Pixel or Fragment Shader processes them as one and the result is then copied to the rest. Depending on the game, this gives between 10% and 30% more performance than not using it, apart from cutting the damn accesses to VRAM, which are deadly in terms of energy consumption, as we said before.
Most Steam Deck users will be playing their existing Steam library, where 99% of games don’t require Ray Tracing, so they won’t have to worry about the ray tracing capabilities of the Steam Deck GPU.
RAM, Speed, Size and Access in Steam Deck
In all AMD APUs/SoCs whose CPU belongs to one of the Zen architecture generations, there is one common element: the way they access memory. In all cases their unified memory controller, UMC, communicates with the RAM over a 256-bit bus at memclk speed, which in DDR and LPDDR memory is half the transfer rate: in this case 2750 MHz, half of the 5500 MT/s. The total bandwidth? 88 GB/s, which is more than triple the 25.6 GB/s of the Nintendo Switch’s SoC.
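The bandwidth figure can be checked with a few lines of Python. This is just the arithmetic from the paragraph above; the function name `bandwidth_gb_s` is ours, and we take 1 GB as 10^9 bytes.

```python
def bandwidth_gb_s(bus_width_bits, clock_mhz):
    """Peak bandwidth in GB/s for a bus that moves bus_width_bits per clock cycle."""
    bytes_per_cycle = bus_width_bits / 8
    return bytes_per_cycle * clock_mhz * 1e6 / 1e9


transfer_rate_mt_s = 5500              # LPDDR5 transfer rate quoted in the article
memclk_mhz = transfer_rate_mt_s / 2    # memclk is half the transfer rate: 2750 MHz

steam_deck_bw = bandwidth_gb_s(256, memclk_mhz)
switch_bw = 25.6                       # GB/s, figure quoted in the article

print(f"Steam Deck: {steam_deck_bw:.1f} GB/s")
print(f"Ratio vs Switch: {steam_deck_bw / switch_bw:.2f}x")
```

The result is 88 GB/s, about 3.4 times the Switch figure, which is where the "more than triple" comes from.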
The UMC has therefore been upgraded from the one used in the Ryzen 4000 and Ryzen 5000 PC APUs, as it has gone from supporting 4 LPDDR4 channels to supporting the same number of LPDDR5 channels. It should be noted that each new generation of any type of memory lowers the voltage needed to reach a given clock speed, which allows the clock speed to be raised and gives faster RAM with higher bandwidth. The problem comes with energy efficiency, which is measured in pJ/bit, and it can be said that the evolution in that aspect is going backwards.
Since watts are Joules per second, we can easily work out from the bandwidth of the RAM what it consumes every second. The answer? In the Steam Deck we get a much higher figure than what the Nintendo Switch RAM consumes, so right off the bat we have the first design problem: the power consumption of the memory is much higher than that of its direct competition, which weighs on the energy consumption of the device and therefore on battery life.
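As a rough sketch of that extrapolation, the snippet below multiplies bits moved per second by energy per bit. It uses the 88 GB/s and the approximate 4 pJ/bit for LPDDR5 quoted in this article, and it assumes the memory is driven at its full peak bandwidth, which is a worst case rather than typical behavior. The function name `memory_power_watts` is ours. We don't have an energy-per-bit figure for the Switch's RAM, so the comparison is limited to the ratio of bandwidths.

```python
def memory_power_watts(bandwidth_gb_s, energy_pj_per_bit):
    """Power in watts = bits moved per second * joules spent per bit."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit


STEAM_DECK_BW_GB_S = 88.0    # peak bandwidth from the article
LPDDR5_PJ_PER_BIT = 4.0      # approximate average quoted in the article
SWITCH_BW_GB_S = 25.6        # peak bandwidth from the article

peak_power = memory_power_watts(STEAM_DECK_BW_GB_S, LPDDR5_PJ_PER_BIT)
print(f"Steam Deck RAM at peak bandwidth: {peak_power:.2f} W")

# At an equal energy per bit (an assumption), power scales with bandwidth.
print(f"Bandwidth ratio vs Switch: {STEAM_DECK_BW_GB_S / SWITCH_BW_GB_S:.2f}x")
```

Under those assumptions the memory alone would draw close to 2.8 W when saturated, a noticeable share of the budget in a battery-powered device.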
Arithmetic and Texturing Intensity in Steam Deck
In computing there is the concept of Bytes per FLOP, or Bytes per floating point operation. We use it to measure the arithmetic intensity of different algorithms when running them on the GPU. Another thing to measure is what we have called texturing intensity, which we use to check whether there is a bottleneck in texture fetching compared to GPUs with the RDNA 2 architecture on PC.
For this we have decided to take the AMD RX 6700 XT, in order to make the comparison within the same graphics architecture. The AMD RX 6700 XT has a bandwidth of 412.8 GB/s and a power of 13.21 TFLOPS, which gives us about 0.031 Bytes per FLOP. The Steam Deck’s maximum GPU power is 1.6 TFLOPS with a bandwidth of 88 GB/s, as we deduced earlier. The figure we get? 0.055 Bytes per FLOP, so compared to the desktop AMD RX 6000 the memory is not a bottleneck for compute shaders or any other graphics shaders except Pixel or Fragment Shaders.
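The Bytes per FLOP comparison boils down to one division per GPU. The sketch below uses only the figures given in this article; the function name `bytes_per_flop` is ours.

```python
def bytes_per_flop(bandwidth_gb_s, tflops):
    """Bytes of memory bandwidth available per floating point operation."""
    bytes_per_second = bandwidth_gb_s * 1e9
    flops_per_second = tflops * 1e12
    return bytes_per_second / flops_per_second


rx_6700_xt = bytes_per_flop(412.8, 13.21)   # figures used in the article
steam_deck = bytes_per_flop(88.0, 1.6)

print(f"RX 6700 XT: {rx_6700_xt:.3f} Bytes per FLOP")
print(f"Steam Deck: {steam_deck:.3f} Bytes per FLOP")
```

A higher value means more bandwidth available for each operation the GPU can perform, which is why the Steam Deck comes out ahead here.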
The other thing to measure alongside arithmetic intensity in the Steam Deck has to do with the texturing units that operate in conjunction with the Pixel Shaders. On the RX 6700 XT we have a texturing rate of 413 GTexels/s. In the case of the Steam Deck we have 8 Compute Units, which make 32 texture units in total, and a maximum clock speed of 1.6 GHz, which translates into a rate of 51.2 GTexels/s. Applying the same rule as with Bytes per FLOP, we can measure the bandwidth available per texel.
And what do we get? On the RX 6700 XT it is 0.99, practically a 1:1 correlation. Obviously the bandwidth will not all be used for texturing; it’s just a way to measure the memory intensity for this task. What about the Steam Deck? Again the memory intensity is better, at 1.71. So once more the balance between integrated GPU and bandwidth is one of the strong points, which ensures that the memory is not a bottleneck for the console’s graphics performance.
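The texturing numbers can be reproduced the same way. In this sketch the 4 texture units per Compute Unit follow from the 32 units spread over 8 Compute Units, and the function names are ours; everything else comes from the figures above.

```python
def texel_rate_gtexels(compute_units, texture_units_per_cu, clock_ghz):
    """Peak texel rate in GTexels/s: one texel per texture unit per clock."""
    return compute_units * texture_units_per_cu * clock_ghz


def bytes_per_texel(bandwidth_gb_s, gtexels_per_s):
    """Bytes of memory bandwidth available per texel fetched."""
    return bandwidth_gb_s / gtexels_per_s


deck_rate = texel_rate_gtexels(8, 4, 1.6)        # 32 texture units at 1.6 GHz
deck_intensity = bytes_per_texel(88.0, deck_rate)
rx_intensity = bytes_per_texel(412.8, 413.0)      # RX 6700 XT figures from the article

print(f"Steam Deck: {deck_rate:.1f} GTexels/s, {deck_intensity:.2f} bytes per texel")
print(f"RX 6700 XT: {rx_intensity:.2f} bytes per texel")
```

The exact Steam Deck ratio is 1.71875, which the article gives as 1.71, and the RX 6700 XT lands just under 1, so the conclusion holds either way.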
The dark side, Steam Deck storage
The base version of the console comes with 64 GB of eMMC, a figure that seems ridiculous to us and on which it will be practically impossible to install anything at all, making it essential to purchase an M.2 2230 module to install inside the console. That maneuver could void the warranty, since Valve does not recommend tinkering with the console to do it. And this is where our first slap on the wrist to Valve comes in: at the time of writing this article, we can find M.2 2230 modules for a much lower price than the difference between the different models. For us it would have been much better to give easy access to the M.2 interface through a cover.
But what happens if we run out of space and don’t have the M.2 SSD installed? Don’t worry, Valve has put in a microSD card slot. At first glance this looks great, but the performance of a microSD card when it comes to transferring data is very, very low and you’re going to see how loading games takes forever. That’s why we recommend that you buy an M.2 2230 PCIe module and install it, or go for one of the two more advanced models.
It is not only about capacity: we are talking about multiplying the bandwidth several tens of times, which turns the device into a different one in terms of performance. However, bearing in mind that the Steam Deck is intended to be a portable system, we are surprised by the choice of this type of storage. An NVMe SSD is the best you can put in a system in terms of access speed, that’s true.
Is it the best for a portable system? Not really. It would have made more sense to use eUFS 3.x type memory, since there’s no point in using a PCIe NVMe if the games from the PC catalog that the machine will be able to run without performance problems of any kind are those that do not take good advantage of it. By the time games that take advantage of NVMe SSDs are released on PC, the Steam Deck will be too outdated in power to run them at a decent speed.
And finally the screen
For many, the inclusion of an 800p display with a 16:10 aspect ratio will seem like a step backwards, but it’s not when you consider the associated costs in terms of bandwidth and computational needs of higher resolutions.
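To put a number on that cost, we can compare pixel counts. The sketch below takes 800p at 16:10 as 1280x800; the 1080p comparison point is our own choice for illustration and is not a panel Valve announced.

```python
def pixel_count(width, height):
    """Number of pixels the GPU has to shade and write for every frame."""
    return width * height


deck_pixels = pixel_count(1280, 800)       # 800p at a 16:10 aspect ratio
full_hd_pixels = pixel_count(1920, 1080)   # hypothetical higher-resolution panel

ratio = full_hd_pixels / deck_pixels
print(f"Steam Deck: {deck_pixels:,} pixels per frame")
print(f"1080p:      {full_hd_pixels:,} pixels per frame ({ratio:.3f}x)")
```

A 1080p panel would mean roughly twice as many pixels to shade and write every frame, with the bandwidth and compute cost that implies.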
One negative point is that the screen isn’t OLED, as OLED screens are more power efficient and pixel burn-in shouldn’t be an issue in a gaming system. Nintendo has adopted it in the new model of Switch, and Kyoto takes years to adopt a technology and only does so when it comes at a bargain price. The inclusion of an OLED screen by Valve in its Steam Deck would therefore have been a better option. And no, we are not saying that the screen is bad; we are speaking in terms of energy consumption, which is important for a portable system.
In any case it has enough resolution to give good image quality and performance, as Valve ensures that all games in its catalog run at a minimum frame rate of 30 FPS, which is not ideal for all games either.