[Eisfair] Call for testing: instability with eiskernel 2.18.0
Thomas Bork
tom at eisfair.org
Mon Feb 15 20:20:55 CET 2016
Hi all,
since this is apparently too well hidden in the thread "Absturz, Log-Datei
vor dem Absturz":
Some users are having problems with kernel 2.18.0. These problems show up
as messages in /var/log/messages such as
[ 0.000000] INFO: rcu_bh detected stall on CPU 1 (t=0 jiffies)
[ 0.000000] Pid: 0, comm: swapper/1 Not tainted
3.2.71-eisfair-1-SMP #1
[ 0.000000] Call Trace:
[ 0.000000] [<c1057f52>] __rcu_pending+0x64/0x28f
[ 0.000000] [<c10585c9>] rcu_check_callbacks+0x87/0x98
[ 0.000000] [<c1031771>] update_process_times+0x2d/0x58
[ 0.000000] [<c1047207>] tick_sched_timer+0x13f/0x166
[ 0.000000] [<c103e194>] __run_hrtimer.isra.27+0x3d/0x91
[ 0.000000] [<c103e801>] hrtimer_interrupt+0xe2/0x1cb
[ 0.000000] [<c1014e67>] smp_apic_timer_interrupt+0x67/0x7a
[ 0.000000] [<c12fa3fa>] apic_timer_interrupt+0x2a/0x30
[ 0.000000] [<c10400d8>] ? __lowest_in_progress+0x34/0x53
[ 0.000000] [<c11ebfa4>] ? acpi_idle_enter_simple+0x102/0x13b
[ 0.000000] [<c1269431>] cpuidle_idle_call+0x5a/0xa5
[ 0.000000] [<c100159a>] cpu_idle+0x3d/0x5c
[ 0.000000] [<c12f0b2e>] start_secondary+0x190/0x195
(see the thread "Fehlermeldung in dmesg")
or
Feb 8 16:33:56 server kernel: INFO: rcu_sched detected stall on CPU 3
(t=150093 jiffies)
Feb 8 16:33:56 server kernel: Pid: 2544, comm: smbd Tainted: P
O 3.2.75-eisfair-1-SMP #1
Feb 8 16:33:56 server kernel: Call Trace:
Feb 8 16:33:56 server kernel: [__rcu_pending+0x64/0x28f]
__rcu_pending+0x64/0x28f
Feb 8 16:33:56 server kernel: [rcu_check_callbacks+0x6d/0x98]
rcu_check_callbacks+0x6d/0x98
Feb 8 16:33:56 server kernel: [update_process_times+0x2d/0x58]
update_process_times+0x2d/0x58
Feb 8 16:33:56 server kernel: [tick_sched_timer+0x13f/0x166]
tick_sched_timer+0x13f/0x166
Feb 8 16:33:56 server kernel: [__run_hrtimer.isra.27+0x3d/0x91]
__run_hrtimer.isra.27+0x3d/0x91
Feb 8 16:33:56 server kernel: [hrtimer_interrupt+0xe2/0x1cb]
hrtimer_interrupt+0xe2/0x1cb
Feb 8 16:33:56 server kernel: [smp_apic_timer_interrupt+0x67/0x7a]
smp_apic_timer_interrupt+0x67/0x7a
Feb 8 16:33:56 server kernel: [apic_timer_interrupt+0x2a/0x30]
apic_timer_interrupt+0x2a/0x30
Feb 8 16:33:56 server kernel: [any_slab_objects+0x15/0x1b] ?
any_slab_objects+0x15/0x1b
Feb 8 16:33:56 server kernel: [_raw_spin_lock+0x10/0x1c] ?
_raw_spin_lock+0x10/0x1c
Feb 8 16:33:56 server kernel: [unix_state_double_lock+0x3d/0x41]
unix_state_double_lock+0x3d/0x41
Feb 8 16:33:56 server kernel: [unix_dgram_connect+0x83/0x153]
unix_dgram_connect+0x83/0x153
Feb 8 16:33:56 server kernel: [sys_connect+0x63/0x88]
sys_connect+0x63/0x88
Feb 8 16:33:56 server kernel: [sys_socketcall+0x76/0x192]
sys_socketcall+0x76/0x192
Feb 8 16:33:57 server kernel: [syscall_after_call+0x0/0x04]
syscall_call+0x7/0x7
Feb 8 16:33:57 server kernel: [mcheck_cpu_init+0x137/0x2d2] ?
mcheck_cpu_init+0x137/0x2d2
(see the thread "E1 friert ein wg. Speicherleck")
or
Feb 12 13:27:36 myeis kernel: INFO: rcu_sched detected stall on CPU 1
(t=15000 jiffies)
Feb 12 13:27:36 myeis kernel: Pid: 3652, comm: smbd Tainted: G O
3.2.75-eisfair-1-VIRT #1
Feb 12 13:27:36 myeis kernel: Call Trace:
Feb 12 13:27:36 myeis kernel: [__rcu_pending+0x64/0x28f]
__rcu_pending+0x64/0x28f
Feb 12 13:27:36 myeis kernel: [account_process_tick+0x104/0x15a] ?
account_process_tick+0x104/0x15a
Feb 12 13:27:36 myeis kernel: [rcu_check_callbacks+0x6d/0x98]
rcu_check_callbacks+0x6d/0x98
Feb 12 13:27:36 myeis kernel: [update_process_times+0x2d/0x58]
update_process_times+0x2d/0x58
Feb 12 13:27:36 myeis kernel: [tick_sched_timer+0x0/0x16b] ?
tick_init_highres+0x11/0x11
Feb 12 13:27:36 myeis kernel: [tick_sched_timer+0x144/0x16b]
tick_sched_timer+0x144/0x16b
Feb 12 13:27:36 myeis kernel: [tick_sched_timer+0x0/0x16b] ?
tick_init_highres+0x11/0x11
Feb 12 13:27:36 myeis kernel: [__run_hrtimer.isra.27+0x4d/0x9c]
__run_hrtimer.isra.27+0x4d/0x9c
Feb 12 13:27:36 myeis kernel: [hrtimer_interrupt+0xe2/0x1dd]
hrtimer_interrupt+0xe2/0x1dd
Feb 12 13:27:36 myeis kernel: [smp_apic_timer_interrupt+0x67/0x7a]
smp_apic_timer_interrupt+0x67/0x7a
Feb 12 13:27:36 myeis kernel: [apic_timer_interrupt+0x2a/0x30]
apic_timer_interrupt+0x2a/0x30
Feb 12 13:27:36 myeis kernel: [link_path_walk+0xfb/0x61b] ?
link_path_walk+0xfb/0x61b
Feb 12 13:27:36 myeis kernel: [try_to_merge_with_ksm_page+0x2d0/0x451]
? try_to_merge_with_ksm_page+0x2d0/0x451
Feb 12 13:27:36 myeis kernel: [__ticket_spin_lock+0x16/0x1c] ?
__ticket_spin_lock+0x16/0x1c
Feb 12 13:27:36 myeis kernel: [_raw_spin_lock+0x8/0x0b]
_raw_spin_lock+0x8/0xb
Feb 12 13:27:36 myeis kernel: [unix_state_double_lock+0x3d/0x41]
unix_state_double_lock+0x3d/0x41
Feb 12 13:27:36 myeis kernel: [unix_dgram_connect+0x83/0x153]
unix_dgram_connect+0x83/0x153
Feb 12 13:27:36 myeis kernel: [sys_connect+0x63/0x88] sys_connect+0x63/0x88
Feb 12 13:27:36 myeis kernel: [sys_socketcall+0x76/0x192]
sys_socketcall+0x76/0x192
Feb 12 13:27:36 myeis kernel: [syscall_after_call+0x0/0x04]
syscall_call+0x7/0x7
Feb 12 13:27:36 myeis kernel: [get_cpu_leaves+0x1dd/0x28a] ?
get_cpu_leaves+0x1dd/0x28a
in the thread "Absturz, Log-Datei vor dem Absturz".
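For anyone who wants to check their own log for this pattern, here is a minimal Python sketch. It is not part of eisfair or of any eisfair package. The function name `find_rcu_stalls`, the regular expression and the `trace_window` limit are assumptions derived only from the excerpts above. It reports every "detected stall" line and notes whether `unix_dgram_connect` appears in the trace lines that follow, as it does in the two smbd traces.

```python
import re

# Matches the stall header as it appears in the excerpts above, e.g.
# "INFO: rcu_sched detected stall on CPU 3".
STALL_RE = re.compile(r"INFO: (rcu_\w+) detected stall on CPU (\d+)")


def find_rcu_stalls(lines, trace_window=40):
    """Return one dict per stall report found in an iterable of log lines.

    trace_window is an assumed upper bound on how many of the following
    lines are treated as belonging to the same call trace.
    """
    lines = list(lines)
    stalls = []
    for i, line in enumerate(lines):
        match = STALL_RE.search(line)
        if not match:
            continue
        # Collect the lines after the header, stopping at the next stall report.
        trace = []
        for follow in lines[i + 1 : i + 1 + trace_window]:
            if STALL_RE.search(follow):
                break
            trace.append(follow)
        stalls.append(
            {
                "flavor": match.group(1),
                "cpu": int(match.group(2)),
                "in_unix_dgram_connect": any(
                    "unix_dgram_connect" in entry for entry in trace
                ),
            }
        )
    return stalls
```

It could be called as `find_rcu_stalls(open("/var/log/messages", errors="replace"))`. An empty result only means that no line matched the pattern. It does not prove that the machine is unaffected.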
To narrow down whether the problem is caused by a particular patch that
went into the longterm kernel 3.2.y, and therefore into ours, I have
removed that patch in a test kernel for the users with the problems above.
I hereby ask all users with the above problem to install this kernel so
that we can determine whether removing the patch solves the problem.
The corresponding versions are available at
http://download.eisfair.org/tombork/test/crash/
Copy them into an empty directory and install them with
/var/install/bin/install-local-package directory
These kernel packages do not remove 3.2.71; kernel-dev, however, still
removes /usr/src/linux-3.2.71-eisfair-1. To find out whether the problem
still exists with this kernel, it is not necessary to install kernel-dev.
kernel-dev is only included so that you can look at the complete patch
that is applied for eisfair, which now differs from the patch in the
regular eiskernel 2.18.0.
The version numbers have not changed, so it is still 2.18.0.
For information, here is a reply from Ben Hutchings, the maintainer of
longterm 3.2.y:
#####################################################
On Sun, 2016-02-14 at 11:51 +0100, Thomas Bork wrote:
> Am 10.02.2016 um 12:18 schrieb Karolin Seeger:
>
>> this is a heads-up that we have seen some system crashes after updating
>> to Ubuntu LTS kernel 3.13.0-77 on systems running Samba.
>>
>> It looks like a kernel bug triggered by Samba calls.
>> A bug report has been created [1].
>>
>> Downgrading to kernel 3.13.0-76 solves the problem.
>>
>> [1] https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1543980
>
> I want to let you know that some of our samba users have a similar
> problem after switching the kernel from 3.2.74 to 3.2.76:
[...]
> These stalls later on seem to lead to a memory leak until the
> oom-killer kills processes and the machine crashes.
>
> Downgrading to kernel 3.2.74 solves the problem.
>
> After reading
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1543980
> https://forge.univention.org/bugzilla/show_bug.cgi?id=40558
> https://patchwork.ozlabs.org/patch/582017/
>
> I think the patch
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/net/unix/af_unix.c?id=a3b0f6e8a21ef02f69a15abac440572d8cde8c2a
>
> in 3.2.75 is the problem.
I think it's fixed by this kernel patch:
http://mid.gmane.org/87r3gj11jc.fsf_-_@doppelsaurus.mobileactivedefense.com
Assuming it's applied upstream, it will get into stable updates in due
course. I've also queued this up for inclusion in Debian security
updates.
Ben.
#####################################################
--
der tom
[eisfair-team]