Add eGPU docs

Quad 2021-07-12 17:48:07 +02:00
parent 48c82ba137
commit 2be94f081e
1 changed files with 133 additions and 0 deletions

docs/eGPU.md Normal file

@ -0,0 +1,133 @@
# eGPU
Tested with a Razer Core X Mercury and two AMD cards: an RX 580 4GB and an RX 6700 XT 12GB.
eGPUs are mostly plug and play. Plug one in and put `DRI_PRIME=1` in front of a command to offload its graphics to the eGPU. You may need to reboot first.
However, to optimize performance, especially over a Thunderbolt link, there are two things you'll want to fix:
1. Run your X server directly on the eGPU to minimize the amount of data passed back and forth
2. Ensure your AMD eGPU is running at its peak speed (All AMD eGPU readers should check this part)
## X configuration
To make the X server run on the eGPU, you will need to put a little snippet in `/etc/X11/xorg.conf.d` telling it which GPU to use.
First you will need the PCI-E address of your eGPU. You can find this by running `sudo lspci` in a terminal (sudo not always needed).
My GPU shows up as this line:
```
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 (rev c5)
```
This means `06:00.0` is its PCI-E address. Be aware that lspci displays the address in hexadecimal, while the X config needs decimal, so you might need to convert. For example, if lspci shows `0a:00.0`, that converts to `10:00.0`. Note also that the `BusID` option separates all three fields with colons (`PCI:bus:device:function`), so the `.` before the function number becomes a `:`. The [Arch wiki](https://wiki.archlinux.org/title/External_GPU) is a great resource for further details on setting up eGPUs.
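The hexadecimal to decimal conversion can be sketched as a small shell function. `lspci_to_busid` is a hypothetical helper name, not an existing tool, and it assumes the plain `bus:device.function` form that lspci prints by default (no PCI domain prefix). It prints the numbers without leading zeros, which is the form usually seen in xorg.conf examples.

```shell
# Convert an lspci address such as "0a:00.0" (hexadecimal) into the
# decimal "PCI:bus:device:function" form used by the BusID option.
# lspci_to_busid is a hypothetical helper name for this sketch.
lspci_to_busid() {
    bus=${1%%:*}      # part before the first ":"  -> "0a"
    rest=${1#*:}      # part after the first ":"   -> "00.0"
    dev=${rest%%.*}   # part before the "."        -> "00"
    fn=${rest#*.}     # part after the "."         -> "0"
    # printf parses the "0x" prefix as hexadecimal and prints decimal
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}
```

For example, `lspci_to_busid 0a:00.0` prints `PCI:10:0:0`.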
You will now need to create the file `/etc/X11/xorg.conf.d/11-egpu.conf` and add the following content for an AMD GPU:
```
Section "Device"
    Identifier "Device0"
    Driver "amdgpu"
    BusID "PCI:06:00:0" # Replace with your PCI-E id
EndSection
Section "Module"
    Load "modesetting"
EndSection
```
And the following for an Nvidia GPU:
```
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    BusID "PCI:06:00:0" # Replace with your PCI-E id
    Option "AllowExternalGpus" "True"
EndSection
Section "Module"
    Load "modesetting"
EndSection
```
When you start X, it will run exclusively on the eGPU for maximum performance. With this config in place, X will only start when the eGPU is present, so to use the device handheld you will need to remove or comment out the config file. Unfortunately I do not have a script for this currently, but there are plenty of others online. Alternatively, do what I do: only use X when connected to the eGPU, and use Wayland otherwise. In both KDE and Gnome, the session type can be selected at the login screen.
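As a rough idea of what such a script could do, here is a minimal sketch that has not been tested on real hardware. It relies on Xorg only reading files in `xorg.conf.d` whose names end in `.conf`. The function name `toggle_egpu_conf`, the `.disabled` suffix and the idea of detecting the eGPU through its sysfs path are all assumptions made for illustration. The PCI address in that path may also differ on your system.

```shell
# Enable or disable the eGPU X config depending on whether the eGPU is present.
# toggle_egpu_conf and the ".disabled" suffix are assumptions for this sketch.
#   $1: config file, e.g. /etc/X11/xorg.conf.d/11-egpu.conf
#   $2: path that only exists while the eGPU is connected,
#       e.g. /sys/bus/pci/devices/0000:06:00.0
toggle_egpu_conf() {
    conf=$1
    device=$2
    if [ -e "$device" ]; then
        # eGPU present: restore the config if it was disabled earlier
        if [ -e "$conf.disabled" ]; then
            mv "$conf.disabled" "$conf"
        fi
    else
        # eGPU absent: rename the config so Xorg no longer reads it
        if [ -e "$conf" ]; then
            mv "$conf" "$conf.disabled"
        fi
    fi
}
```

It would have to run as root before the display manager starts, for example as `toggle_egpu_conf /etc/X11/xorg.conf.d/11-egpu.conf /sys/bus/pci/devices/0000:06:00.0`.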
The eGPU can typically be used under Wayland with no extra configuration by adding `DRI_PRIME=1` before the application you're running. However, I've found no way to make a Wayland session (Gnome/KDE) run entirely on the eGPU.
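If you offload often, the `DRI_PRIME=1` prefix can be wrapped in a tiny helper. This is only a convenience sketch, and `on_egpu` is a hypothetical name.

```shell
# Run any command with rendering offloaded to the eGPU.
# on_egpu is a hypothetical helper name; it only sets DRI_PRIME=1
# for the command it launches, not for the rest of the shell.
on_egpu() {
    env DRI_PRIME=1 "$@"
}
```

Assuming `glxinfo` is installed, `on_egpu glxinfo -B` should then name the eGPU as the renderer.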
## Ensuring your AMD eGPU runs at its peak speed
eGPUs might seem to work well enough out of the box (except for the X reconfiguration). But some keen-eyed users might notice that their GPU is running at PCI-E 1.1 speeds instead of PCI-E 3.0 speeds.
The easiest way to check is by running `sudo lspci -vv` and finding your eGPU enclosure (or eGPU itself) in the huge list. Then check the `LnkSta` field to see the speed it is currently running at. For example, here is my eGPU enclosure:
```
03:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02) (prog-if 00 [Normal decode])
    [Shortened]
        LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s (downgraded), Width x4 (ok)
            TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
    [Shortened]
```
As you can see, `LnkSta` lists a speed of `2.5GT/s`, followed by "(downgraded)". This value should say `8GT/s`, matching the speed listed under `LnkCap`.
2.5 GT/s = PCI-E 1.x
5 GT/s = PCI-E 2.x
8 GT/s = PCI-E 3.x
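The mapping above can also be applied automatically. The following sketch reads `lspci -vv` output for a single device on standard input, picks out the `LnkSta` speed and prints the matching generation. `link_generation` is a hypothetical name, and the pattern assumes the `Speed <n>GT/s` wording shown in the output above.

```shell
# Read `lspci -vv` output for one device on stdin and print the PCI-E
# generation implied by the speed on its LnkSta line.
# link_generation is a hypothetical helper name for this sketch.
link_generation() {
    # Extract the number between "Speed " and "GT/s" on the LnkSta line
    speed=$(sed -n 's|.*LnkSta:[[:space:]]*Speed \([0-9.]*\)GT/s.*|\1|p' | head -n 1)
    case $speed in
        2.5) echo "PCI-E 1.x" ;;
        5)   echo "PCI-E 2.x" ;;
        8)   echo "PCI-E 3.x" ;;
        *)   echo "unknown" ;;
    esac
}
```

With the enclosure from the example, this would be used as `sudo lspci -vv -s 03:01.0 | link_generation`.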
This seems to be an [issue](https://gitlab.freedesktop.org/drm/amd/-/issues/1447) with the amdgpu driver and how it detects Thunderbolt links.
Whether this matters depends on the games you play. Some games barely seem to care. Others exhibit weird or extreme performance issues, often console ports which do a lot of communication between CPU and GPU (because consoles typically have shared memory for their CPU and GPU). AC: Valhalla and Horizon Zero Dawn are examples of games which tend to perform unbelievably badly on eGPUs, and a downgraded link does not help. In my case, fixing this brought HZD from a 5fps catastrophe to an almost playable 20fps, and AC: Valhalla from a very stuttery 20fps to a far less stuttery and rather playable 30fps. Meanwhile, I saw absolutely zero difference in games like Scarlet Nexus or Valheim.
To fix this, create a file named `/etc/modprobe.d/amd-pcie-fix.conf` with the following content:
```
options amdgpu pcie_gen_cap=0x40000
```
`pcie_gen_cap=0x40000` forces the amdgpu driver to treat the link as capable of PCI-E 3.0 speeds. Other values are listed [here](https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/include/amd_pcie.h). You might have to use another one depending on your setup and what it's really capable of. For example, to force PCI-E 2.0 speeds, use `0x20000`.
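The values follow a simple pattern: each generation is the previous flag shifted left by one bit. The sketch below prints the matching options line for generations 1 to 4. `pcie_gen_cap_line` is a hypothetical helper name. The values for generations 2 and 3 are the ones given above, while those for 1 and 4 are my reading of the linked header, so check them there before relying on them.

```shell
# Print the modprobe options line for a PCI-E generation (1 to 4).
# Flags as read from amd_pcie.h:
#   gen 1 = 0x10000, gen 2 = 0x20000, gen 3 = 0x40000, gen 4 = 0x80000
# pcie_gen_cap_line is a hypothetical helper name for this sketch.
pcie_gen_cap_line() {
    case $1 in
        1|2|3|4)
            # Shift the gen 1 flag left by (generation - 1) bits
            printf 'options amdgpu pcie_gen_cap=0x%x\n' $((0x10000 << ($1 - 1)))
            ;;
        *)
            echo "unsupported generation: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `pcie_gen_cap_line 3 | sudo tee /etc/modprobe.d/amd-pcie-fix.conf` writes the same file as described above.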
Once the amdgpu module has been loaded again with this option (a reboot is the simplest way), re-run the lspci command and check its output:
```
03:01.0 PCI bridge: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] (rev 02) (prog-if 00 [Normal decode])
    [Shortened]
        LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x4 (ok)
            TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
    [Shortened]
```
If it reports PCI-E 3.0 speeds, you are good to go, and you should hopefully see at least a slight performance improvement in games that don't cooperate well with eGPUs.
Since this is a driver issue, it might be specific to certain GPU models. I am unsure whether similar issues can happen to Nvidia GPUs, as I only have AMD GPUs to test with.
Thank you to [raimu](https://miniwa.moe/users/raimu) for helping with finding a more permanent fix for this.
Previously I had to do a bunch of tricks and hacks on every boot, which I will list here for legacy purposes:
1. Make sure the device is off, and the eGPU unplugged.
2. Boot the device
3. On the GRUB menu, choose the entry for UEFI firmware settings. The device will reboot and enter BIOS/UEFI
4. Hit Save changes and Exit without actually changing anything.
5. Now boot normally to GDM
6. Start a Wayland session
7. Plug in eGPU
8. Display will break and go black
9. Hit Ctrl+Alt+Del to sign out "blind" from Gnome
10. When GDM reappears it should only show on the device's screen
11. Press `Ctrl+Alt+F<something>` to change to a tty
12. Run `sudo systemctl restart gdm`
13. GDM should restart and show on both displays
14. Start an Xorg session with an X config making use of the eGPU
15. The eGPU now runs at PCI-E 3.0 speeds and x4 until the next reboot
It's unclear why this works. Maybe it makes the amdgpu driver bug out, maybe it's because Gnome initializes the GPU, or maybe entering UEFI offloads something to hardware rather than software. Perhaps all of the above.