Screen freezes on Thinkpad E575 (AMD chipset) Topic is solved

crnz
Posts: 4
Joined: Thu Apr 29, 2021 12:13 am

Screen freezes on Thinkpad E575 (AMD chipset)

Post by crnz »

[I first posted this in General Help but I think this area is more appropriate]

I just installed Debian 10 + LXDE on a Thinkpad E575 (AMD chipset). Shortly after starting LXDE, the screen freezes. That is, the mouse pointer still moves, but mouse clicks and the keyboard have no effect; the only escape is the on/off button. This seize-up happens immediately if I launch Synaptic or Mousepad. VLC, SMPlayer and Firefox gradually seize up shortly after I start them. However, if I launch Gimp, LibreOffice Writer or PCManFM before it freezes, they work indefinitely. So does a terminal window. And I can ssh in, and the ssh session keeps working even with the screen frozen.

LXQt and Xfce show the same seizing behaviour; however, the Cinnamon desktop works fine. But I'd much prefer to use LXDE if I can debug it.
I can't see anything obvious in log files but I don't really know which ones to look at or what to look for - can anyone suggest where I should start looking?
drooly
Posts: 791
Joined: Mon Apr 08, 2013 6:45 am

Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by drooly »

Have you looked at the journal?

Code: Select all

journalctl -b
for everything from this boot.
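
To avoid scrolling through a couple of thousand lines, the journal can be narrowed down first. A minimal sketch, assuming a systemd journal; the function name filter_suspects and its keyword list are only illustrative guesses at what might matter for a graphics or interrupt problem, not a definitive list:

```shell
# Keep only log lines that mention graphics, interrupt or firmware keywords.
# Reads log text on stdin, writes the matching lines to stdout.
# (filter_suspects and the keyword list are hypothetical, adjust as needed.)
filter_suspects() {
    grep -iE 'amdgpu|drm|irq|iommu|firmware bug'
}

# Typical use on the affected machine (kernel messages from the current boot):
#   journalctl -b -k --no-pager | filter_suspects
# Or only messages of priority "warning" and worse:
#   journalctl -b -p warning --no-pager
```

The filter only shortens the list; anything it drops may still turn out to be relevant, so the full journal remains the reference.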

Launch each application from the command line and see if you see some relevant info.

Reproduce the problem, ssh in and provide the following output:

Code: Select all

top -n3
lspci -k | grep -iEA5 'vga|3d|display'
uname -rmv
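
Over ssh it is usually easiest to collect all of that into one file. A rough sketch, assuming a POSIX shell; run_section is a made-up helper name, and top is run in batch mode (-b) here because its output goes to a file rather than to a terminal:

```shell
# Print a header naming the command, then the command's output.
# If the command is not installed, say so instead of failing.
# (run_section is a hypothetical helper, not part of any package.)
run_section() {
    printf '===== %s =====\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        printf '(%s not found)\n' "$1"
    fi
}

# On the frozen machine, from the ssh session:
#   { run_section top -bn3
#     run_section lspci -k
#     run_section uname -rmv; } > report.txt
```

The resulting report.txt can then be copied off the machine with scp and pasted into a post.
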
This should be a 2017 model; you might benefit from a backported kernel. A random blog post that outlines the process. I don't think you need to install Wireguard.
crnz
Posts: 4
Joined: Thu Apr 29, 2021 12:13 am

Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by crnz »

Thanks for the very prompt response, and apologies for my delay in replying - real life (TM) interfered.

I did previously look at the journal (or at /var/log/syslog) but all I could see was the following:

Code: Select all

May 01 21:14:48 Tasha kernel: APIC: Switch to symmetric I/O mode setup
May 01 21:14:48 Tasha kernel: [Firmware Bug]: AMD-Vi: IOAPIC[4] not in IVRS table
May 01 21:14:48 Tasha kernel: [Firmware Bug]: AMD-Vi: IOAPIC[5] not in IVRS table
May 01 21:14:48 Tasha kernel: [Firmware Bug]: AMD-Vi: No southbridge IOAPIC found
May 01 21:14:48 Tasha kernel: AMD-Vi: Disabling interrupt remapping
May 01 21:14:48 Tasha kernel: mmc0: Unknown controller version (3). You may experience problems.
I don't know if those are relevant.
These were also on-screen messages during boot. After some googling I did try various kernel boot lines
in /etc/default/grub such as ivrs_ioapic[4]=00:14.0 but they seemed to make no difference.
Also, Cinnamon desktop works okay under Debian (and so does Linux Mint Deb Edition with Cinnamon
in a separate partition), despite getting those same boot messages.
(mmc0 appears to be related to a SD card driver, so not relevant?)

The journal file is 2100 lines so I won't post it at the moment.

The first 20 lines from top -bn3 > topfile are:

Code: Select all

top - 21:20:45 up 6 min,  2 users,  load average: 0.27, 2.19, 1.35
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  3.1 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :   7441.6 total,   6480.8 free,    546.0 used,    414.8 buff/cache
MiB Swap:  16603.0 total,  16603.0 free,      0.0 used.   6648.4 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0  104240  10628   8020 S   0.0   0.1   0:02.60 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0+
    8 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu+
    9 root      20   0       0      0      0 S   0.0   0.0   0:00.03 ksoftirqd+
   10 root      20   0       0      0      0 I   0.0   0.0   0:00.16 rcu_sched
   11 root      20   0       0      0      0 I   0.0   0.0   0:00.00 rcu_bh
   12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration+
   14 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0
   15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1
   16 root      rt   0       0      0      0 S   0.0   0.0   0:00.37 migration+
   17 root      20   0       0      0      0 S   0.0   0.0   0:00.04 ksoftirqd+
   18 root      20   0       0      0      0 I   0.0   0.0   0:00.25 kworker/1+
   19 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1+
   20 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kdevtmpfs
From lspci are:

Code: Select all

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Carrizo (rev cb)
	Subsystem: Lenovo Wani [Radeon R5/R6/R7 Graphics] (Radeon R5 Graphics)
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
	Subsystem: Lenovo Kabini HDMI/DP Audio

From uname -rmv are:
4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64
When I first log in to LXDE, things work for a few seconds, as if some input buffer is filling up, before things seize. Launching apps from the command line on the E575 just causes the window to hang half-drawn sooner or later. When I say it 'hangs', sometimes after a minute a single mouse click takes effect, as if things are getting through, just incredibly slowly. It seems to affect windows drawn by the window manager; also, when seized, the display of e.g. 'top' in a terminal window stops updating (even while, ssh'd in from my T520, 'top' is still working).

Apps that (I think) draw their own windows like LibreOffice or Gimp seem to be immune, so long as no window-manager windows try to open and jam the input. And the mouse pointer still moves, and changes to a cursor where it should (over any text area) even when all else is seized.

Meanwhile ssh from my T520 (also running Debian + LXDE) worked fine, I could even view a video - rather jerkily - on the T520 screen with VLC running on the E575 (there was no audio and hence some error messages but everything else in VLC worked).
drooly
Posts: 791
Joined: Mon Apr 08, 2013 6:45 am

Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by drooly »

When you say that Cinnamon works fine - is that on the same operating system install, or a different one (dual boot)?
I assume it's the same, which makes your problem likely to be higher-level than installing a driver or newer kernel.

I specifically asked you to "reproduce the problem, ssh in" and provide some output.
Is this how you did it? 'top' output of an unencumbered system is useless.

I did not ask you to post all your journal output here, but to look at it. It is meant to be read by humans. Since we're groping in the dark atm, I cannot tell you what to look for. Could be GPU related, could be CPU or I/O related, anything really at this point.

FWIW, here's a quick search, maybe you can find what you need there.
Installing firmware etc. - I guess it comes down to this wiki.

PS: editing /etc/default/grub will do nothing unless you also 'update-grub' afterwards, and always remember to undo what you did when it doesn't work.
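
The edit, update-grub, test, undo cycle can be sketched roughly as below. This is an illustration under assumptions, not a recommended fix: add_kernel_param is a hypothetical helper, it only handles a GRUB_CMDLINE_LINUX_DEFAULT line written with double quotes, and the parameter must not contain the characters | or & (they are special to sed).

```shell
# Append one kernel parameter to the GRUB_CMDLINE_LINUX_DEFAULT line.
#   $1 = path to a grub defaults file
#   $2 = kernel parameter to append
# (add_kernel_param is a hypothetical helper written for this example.)
add_kernel_param() {
    file=$1
    param=$2
    tmp=$(mktemp) || return 1
    # Insert " <param>" just before the closing quote of the existing value.
    if sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}

# As root, with a backup so the change is easy to undo:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub 'ivrs_ioapic[4]=00:14.0'
#   update-grub
#   (reboot, then confirm the parameter arrived:)  cat /proc/cmdline
# To undo: copy grub.bak back and run update-grub again.
```

Checking /proc/cmdline after the reboot is the quickest way to see whether a parameter actually reached the kernel.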
crnz
Posts: 4
Joined: Thu Apr 29, 2021 12:13 am

Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by crnz »

Many thanks. The answer to most of that is 'yes'. :)

On the E575 with the problem, I've currently got Linux Mint Debian Edition 4 (which, I've read, uses Cinnamon) installed in one partition, and Debian 10 in another, dual-boot. I used the Debian 10 + LXDE non-free (with drivers) amd64 ISO to install; I've since added the Cinnamon, LXQt and Xfce desktops, and I can switch between them at login time. Mint (LMDE4) and Deb 10 + Cinnamon work; LXQt and Xfce seize in almost exactly the same way as LXDE. Both Mint and Debian show the same on-screen error messages re IOAPIC on boot-up.

The 'top' output I quoted was from 'top' on the E575, ssh'd in from my T520, while the E575 screen was seized. I directed it to a file on the E575, then copied the file to my T520 with 'scp' and that's what I posted. I think that's what you wanted? I can't see anything odd in 'top' but I'm not experienced with it.

I'll have another careful look through the journal, and post anything that looks as if it could be significant. Don't know if the IOAPIC bug is. Please excuse my inexperience with this. :)

(Yes, I did the full Grub procedure (edit default/grub, update-grub etc.) for several different possibly related kernel parameters, none of which appeared to have any effect, and I did undo them afterwards. I also flashed the BIOS with the latest ISO from Lenovo, quite early in the saga.)

I've got everything installed that's mentioned in the AMD/ATI Debian wiki you linked to, except Vulkan. In particular firmware-amd-graphics and xserver-xorg-video-amdgpu.

(I could just give up and use Cinnamon, except that I much prefer LXDE or even Xfce, and it seems to me this bug may potentially affect a swag of machines beyond mine, viz E575,585,595,475,485,495, and solving it may solve the issues for LXQt and Xfce as well).

At a guess, the window-drawing and keyboard-input routines in LXDE/LXQt/Xfce must be using different interrupts from Cinnamon, Gimp and LibreOffice, if that's possible. Are there any log files that specifically keep track of I/O events?
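
On the question of interrupts: rather than a log file, the kernel exposes running per-IRQ counters in /proc/interrupts. A small sketch for totalling them, so that two snapshots (before and after a freeze) can be compared; irq_totals is a hypothetical helper name, and it assumes the usual layout of a header row of CPU names followed by one row per interrupt:

```shell
# Sum the per-CPU counts of each interrupt line.
# Reads /proc/interrupts-style text on stdin, prints "<irq> <total>" per line.
# (irq_totals is a hypothetical helper written for this example.)
irq_totals() {
    awk 'NR == 1 { ncpu = NF; next }          # header row: one field per CPU
         {
             total = 0
             for (i = 2; i <= ncpu + 1 && i <= NF; i++) total += $i
             label = $1
             sub(/:$/, "", label)             # "7:" becomes "7"
             print label, total
         }'
}

# On the affected machine:
#   irq_totals < /proc/interrupts > before.txt
#   (reproduce the freeze, then from the ssh session:)
#   irq_totals < /proc/interrupts > after.txt
#   diff before.txt after.txt
```

A counter that stops increasing, or one that races ahead, would at least show which device to look at; whether it explains the freeze is a separate question.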

There are a heap of lines in the journal that could be significant (but probably aren't), but here are some of them (I've edited out the timestamps and 'xxxxx kernel:' for bandwidth and readability).

Code: Select all

Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.19.0-16-amd64 root=UUID=46244445-c5c5-4d51-b2b3-d9a5fe2b26d8 ro quiet
Calgary: detecting Calgary via BIOS EBDA area					
Calgary: Unable to locate Rio Grande table in EBDA - bailing!

Code: Select all

APIC: Switch to symmetric I/O mode setup
[Firmware Bug]: AMD-Vi: IOAPIC[4] not in IVRS table
[Firmware Bug]: AMD-Vi: IOAPIC[5] not in IVRS table
[Firmware Bug]: AMD-Vi: No southbridge IOAPIC found
AMD-Vi: Disabling interrupt remapping		

Code: Select all

ACPI: 8 ACPI AML tables successfully acquired and loaded
ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored		
ACPI: EC: EC started
ACPI: EC: interrupt blocked
ACPI: \_SB_.PCI0.LPC0.EC0_: Used as first EC
ACPI: \_SB_.PCI0.LPC0.EC0_: GPE=0x3, EC_CMD/EC_SC=0x66, EC_DATA=0x62
ACPI: \_SB_.PCI0.LPC0.EC0_: Used as boot DSDT EC to handle transactions
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in ACPI motherboard resources
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug	

Code: Select all

acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-3f] only partially covers this bridge  
acpi PNP0A08:00: ignoring host bridge window [mem 0x000ce000-0x000cffff window] (conflicts with Video ROM [mem0x000c0000-0x000cfbff])	

Code: Select all

ACPI: \_SB_.PCI0.LPC0.EC0_: GPE=0x3, EC_CMD/EC_SC=0x66, EC_DATA=0x62
ACPI: \_SB_.PCI0.LPC0.EC0_: Used as boot DSDT EC to handle transactions and events	
pci 0000:00:01.0: vgaarb: setting as boot VGA device				
pci 0000:00:01.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none	
pci 0000:00:01.0: vgaarb: bridge control possible					
vgaarb: loaded

Code: Select all

pci 0000:00:01.1: Linked as a consumer to 0000:00:01.0
pci 0000:00:01.1: D0 power state depends on 0000:00:01.0
pci 0000:00:10.0: PME# does not work under D0, disabling it			
PCI: CLS 32 bytes, default 64
Unpacking initramfs...
Freeing initrd memory: 60160K
AMD-Vi: IOMMU performance counters supported
pci 0000:00:00.2: can't derive routing for PCI INT A				
pci 0000:00:00.2: PCI INT A: not connected				
iommu: Adding device 0000:00:01.0 to group 0
iommu: Using direct mapping for device 0000:00:01.0				
iommu: Adding device 0000:00:01.1 to group 0
iommu: Adding device 0000:00:02.0 to group 1
iommu: Adding device 0000:00:02.2 to group 1
iommu: Adding device 0000:00:02.3 to group 1
iommu: Adding device 0000:00:02.4 to group 1
iommu: Adding device 0000:00:03.0 to group 2
iommu: Adding device 0000:00:08.0 to group 3
iommu: Adding device 0000:00:09.0 to group 4
iommu: Adding device 0000:00:09.2 to group 4
iommu: Adding device 0000:00:10.0 to group 5
iommu: Adding device 0000:00:11.0 to group 6
iommu: Adding device 0000:00:12.0 to group 7
iommu: Adding device 0000:00:14.0 to group 8
iommu: Adding device 0000:00:14.3 to group 8
iommu: Adding device 0000:00:18.0 to group 9
iommu: Adding device 0000:00:18.1 to group 9
iommu: Adding device 0000:00:18.2 to group 9
iommu: Adding device 0000:00:18.3 to group 9
iommu: Adding device 0000:00:18.4 to group 9
iommu: Adding device 0000:00:18.5 to group 9
iommu: Adding device 0000:01:00.0 to group 1
iommu: Adding device 0000:03:00.0 to group 1
iommu: Adding device 0000:04:00.0 to group 1
AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40						
AMD-Vi: Extended features (0x37ef22294ada):
PPR NX GT IA GA PC GA_vAPIC
AMD-Vi: Lazy IO/TLB flushing enabled
amd_uncore: AMD NB counters detected
LVT offset 0 assigned for vector 0x400
perf: AMD IBS detected (0x000007ff)
perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).		

Code: Select all

Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
Linux agpgart interface v0.103
AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>				

Code: Select all

irq 7: nobody cared (try booting with the "irqpoll" option)				
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.19.0-16-amd64 #1 Debian 4.19.181-1
Hardware name: LENOVO 20H8CTO1WW/20H8CTO1WW, BIOS R0EET51W (1.25 ) 09/23/2020		

Code: Select all

handlers:					
[<00000000ef1334f6>] amd_gpio_irq_handler
Disabling IRQ #7

Code: Select all

ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\_SB.PCI0.SMB.SMB0) (20180810/utaddress-213)	
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

Code: Select all

mmc0: Unknown controller version (3). You may experience problems.	

Code: Select all

[drm] amdgpu kernel modesetting enabled.

Code: Select all

[drm] initializing kernel modesetting (CARRIZO 0x1002:0x9874 0x17AA:0x5118 0xCB).

Code: Select all

[drm] BIOS signature incorrect 0 0						
resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000c1fff window]							

Code: Select all

amdgpu 0000:00:01.0: firmware: direct-loading firmware amdgpu/carrizo_vce.bin		
[drm] Found VCE firmware Version: 52.4 Binary ID: 3

Code: Select all

psmouse serio1: synaptics: Your touchpad (PNP: LEN2043 PNP0f13) says it can support a different bus. If i2c-hid and hid-rmi are not used, you might want to try setting psmouse.synaptics_intertouch to 1 and report this to linux-input@vger.kernel.org.	

Code: Select all

psmouse serio2: trackpoint: failed to get extended button data, assuming 3 buttons			
psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3			
crnz
Posts: 4
Joined: Thu Apr 29, 2021 12:13 am

[SOLVED] Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by crnz »

TL;DR version - setting the parameter TearFree to On (or True) in the screen config appears to fix it.

(Thanks drooly for the ongoing advice, I've been using Debian for years but I'm new to this sort of thing).

LXDE, Xfce, LXQt, IceWM and Openbox all freeze up after a few windows have been drawn - the mouse pointer moves but keyboard input, mouse clicks and window drawing virtually stop (although you can keep working indefinitely inside a text window or Gimp, LibreOffice etc.).
MATE skews the screen but keyboard/mouse input continue working (if you can figure where to click).
Gnome, Enlightenment and Cinnamon desktops were unaffected by the problem.

The 'TearFree' mod fixes MATE, LXDE and Xfce (I haven't tried the others).

To test it for the current session, immediately on logging in to LXDE (or MATE or Xfce), before the screen misbehaves, open a terminal window. Use xrandr (no parameters) to show your screen name (mine is eDP).
Then issue the following (substitute your screen identifier for eDP, of course):

Code: Select all

  
xrandr --output eDP --set TearFree on
(Source: wiki.debian.org/AtiHowTo)
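
If the output name is not known in advance, it can be read from xrandr itself. A minimal sketch, assuming the usual xrandr listing in which each output line starts with the name followed by the word "connected"; connected_outputs and enable_tearfree are hypothetical helper names:

```shell
# Print the names of all connected outputs.
# Reads the text printed by plain `xrandr` on stdin.
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Turn TearFree on for every connected output (current session only).
# Assumes the driver exposes a TearFree property, as amdgpu does here.
enable_tearfree() {
    for out in $(xrandr | connected_outputs); do
        xrandr --output "$out" --set TearFree on
    done
}

# In a terminal, right after logging in:
#   enable_tearfree
```

Like the single command above, this lasts only until the X session ends.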

To make it permanent, set it in the Xorg config files. X gets its config from a confusing number of sources, but /etc/X11/xorg.conf worked for me.
(From https://fedoraproject.org/wiki/How_to_create_xorg.conf ):
Reboot, from the login screen hit Ctrl-Alt-F1 or Ctrl-Alt-Fn-F1 to get a text console.
As root, do:

Code: Select all

 Xorg :1 -configure  
This will create an outline file /root/xorg.conf.new, which you can copy as /etc/X11/xorg.conf
and then edit it as necessary. In this particular case, I found the entry

Code: Select all

Section "Device"
  #Option	"Accel"	# [<bool>]
  ...
  ...
  #Option	"TearFree"	# [<bool>]
  Identifier	"Card0"
  Driver	"amdgpu"
  BusID 	"PCI:0:1:0"
EndSection
and edited the relevant line to read:

Code: Select all

  Option "TearFree" "True"  
Then reboot again.
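
To check that the setting survived the reboot, the property can be read back from xrandr. A small sketch; tearfree_state is a hypothetical helper name, and it assumes the property is listed as a line of the form "TearFree: value" in the properties output:

```shell
# Print the value of each TearFree property.
# Reads the text printed by `xrandr --properties` on stdin.
tearfree_state() {
    awk '$1 == "TearFree:" { print $2 }'
}

# After logging in to the desktop:
#   xrandr --properties | tearfree_state
```

If nothing is printed, either the driver does not expose the property or the listing differs from what is assumed here, in which case the raw `xrandr --properties` output is the thing to read.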

What TearFree does is this: (https://manpages.debian.org/buster/xser ... .4.en.html)
Option "TearFree" "boolean"
Set the default value of the per-output 'TearFree' property, which controls tearing prevention using the hardware page flipping mechanism. TearFree is on for any CRTC associated with one or more outputs with TearFree on. Two separate scanout buffers need to be allocated for each CRTC with TearFree on. If this option is set, the default value of the property is 'on' or 'off' accordingly. If this option isn't set, the default value of the property is auto, which means that TearFree is on for rotated outputs, outputs with RandR transforms applied and for RandR 1.4 slave outputs, otherwise off.

THERE MAY BE BETTER SOLUTIONS TO THE PROBLEM. I haven't found one yet.
drooly
Posts: 791
Joined: Mon Apr 08, 2013 6:45 am

Re: Screen freezes on Thinkpad E575 (AMD chipset)

Post by drooly »

Great you solved it!

Just one remark: it's usually recommended to put individual config files in /etc/X11/xorg.conf.d.
So what you'd do is to
  • remove /etc/X11/xorg.conf, because Xorg really should be able to figure out its own configuration
  • create a new file /etc/X11/xorg.conf.d/something.conf
  • put only this block in it:

    Code: Select all

    Section "Device"
        Option    "TearFree"  "true"
        Identifier    "Card0"
        Driver    "amdgpu"
        BusID     "PCI:0:1:0"
    EndSection
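
Creating such a drop-in file could look roughly like this. It is a sketch under assumptions: write_tearfree_snippet and the file name 20-amdgpu-tearfree.conf are made up for the example, and the Identifier and BusID values are simply the ones quoted earlier in this thread, so they need checking against your own machine.

```shell
# Write a minimal Xorg drop-in that turns TearFree on for the amdgpu driver.
#   $1 = target directory (normally /etc/X11/xorg.conf.d, as root)
# (The helper name and the file name are hypothetical.)
write_tearfree_snippet() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/20-amdgpu-tearfree.conf" <<'EOF'
Section "Device"
    Identifier "Card0"
    Driver     "amdgpu"
    BusID      "PCI:0:1:0"
    Option     "TearFree" "true"
EndSection
EOF
}

# As root:
#   mv /etc/X11/xorg.conf /root/xorg.conf.old   # keep the old file as a backup
#   write_tearfree_snippet /etc/X11/xorg.conf.d
# then log out and restart the display manager, or reboot.
```

Keeping the old xorg.conf outside /etc/X11 means it is no longer read, but it can be restored if the drop-in does not behave as hoped.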