mdadm raid soft lock-ups ubuntu kernel 4.13.0-36

Adam Hamsik
Hi,

we're running Ubuntu 16.04.4 with mdadm v3.3 and kernel 4.13.0-36.
We created a RAID10 array from 22 960GB SSDs [1]. The problem we're
experiencing is that /usr/share/mdadm/checkarray (run by cron, shipped
in the mdadm package) results in a (soft?) deadlock: the load on the node
spikes to 500-700 and all I/O operations are blocked for a period of time.
We can see traces like these [2] in our kernel log.

For example, the array ends up stuck in a state like this:

test@os-node1:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid10 dm-23[9] dm-22[8] dm-21[7] dm-20[6] dm-18[4] dm-19[5] dm-17[3]
                    dm-16[21] dm-15[20] dm-14[2] dm-13[19] dm-12[18] dm-11[17]
                    dm-10[16] dm-9[15] dm-8[14] dm-7[13] dm-6[12] dm-5[11] dm-4[10] dm-3[1] dm-2[0]
      10313171968 blocks super 1.2 512K chunks 2 near-copies [22/22] [UUUUUUUUUUUUUUUUUUUUUU]
      [===>.................]  check = 19.0% (1965748032/10313171968) finish=1034728.8min speed=134K/sec
      bitmap: 0/39 pages [0KB], 131072KB chunk
unused devices: <none>

The only way to recover is a hard reboot of the node. We found that this
does not happen on an idle array; we have to generate significant load
(10 VMs running fio [3] with 500GB HDDs) to reproduce the issue.
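
To make the stalled state easier to spot while reproducing, the progress line from /proc/mdstat can be parsed and checked for a collapsed speed. The following is only a minimal sketch: the function names and the 1000K/sec threshold are assumptions for illustration (the threshold mirrors the usual default of dev.raid.speed_limit_min, which may be set differently on a given node), and the regular expression is written against the single status line shown above, not against every format mdstat can print.

```python
import re

# Pattern for the progress line that /proc/mdstat prints during a check,
# written against the sample line in this report (an assumption, not a
# complete grammar for mdstat output).
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<percent>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish_min>[\d.]+)min\s*"
    r"speed=(?P<speed_k>\d+)K/sec"
)

# The status line from the report, kept here as sample input.
SAMPLE = (
    "      [===>.................]  check = 19.0% "
    "(1965748032/10313171968) finish=1034728.8min speed=134K/sec"
)


def parse_progress(line):
    """Return the progress fields of one mdstat line, or None if absent."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "action": m.group("action"),
        "percent": float(m.group("percent")),
        "done": int(m.group("done")),
        "total": int(m.group("total")),
        "finish_min": float(m.group("finish_min")),
        "speed_k": int(m.group("speed_k")),
    }


def looks_stalled(progress, min_speed_k=1000):
    """Hypothetical heuristic: flag a running action below min_speed_k K/sec."""
    return progress is not None and progress["speed_k"] < min_speed_k
```

Running parse_progress over each line of /proc/mdstat at a fixed interval and logging the result next to the kernel traces would show when the speed drops relative to the load spike.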

Has anyone experienced similar issues? Do you have any suggestions on how to
troubleshoot this better, and how to identify whether the disks or the
software layer is responsible for this behaviour?
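
As one possible starting point for that troubleshooting, the md sysfs attributes of the array could be sampled together with the kernel traces, to see whether the check is still marked as running and what speed limits apply. This is a sketch under assumptions: the function name and the choice of attributes are illustrative, the array name md1 is taken from the mdstat output above, and the sysfs root is a parameter only so the function can be exercised without a real array.

```python
from pathlib import Path

# md attributes exposed under /sys/block/<array>/md/ by the kernel.
MD_ATTRS = (
    "sync_action",     # e.g. "check" or "idle"
    "sync_completed",  # sectors done / total, or "none"
    "sync_speed",      # current speed in K/sec, or "none"
    "sync_speed_min",  # per-array lower limit
    "sync_speed_max",  # per-array upper limit
)


def read_md_state(array="md1", sysfs_root="/sys/block"):
    """Return a dict of md sysfs attributes; unreadable ones map to None."""
    md_dir = Path(sysfs_root) / array / "md"
    state = {}
    for name in MD_ATTRS:
        try:
            state[name] = (md_dir / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable (e.g. not run on the node).
            state[name] = None
    return state
```

Calling read_md_state() once per minute during the fio run and keeping the output with a timestamp would give a record of the array state leading up to the lock-up.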

[1] http://www.samsung.com/us/dell/pdfs/PM1633a_Flyer_2016_v4.pdf
[2] https://gist.github.com/haad/09213bab1bc30a00c7d255c0bc60897b
[3] https://github.com/axboe/fio





Regards
Adam.

Adam Hamsik
00421 904 937 495
[hidden email]
[hidden email]

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

Re: mdadm raid soft lock-ups ubuntu kernel 4.13.0-36

Kleber Sacilotto de Souza
On 06/08/18 06:20, Adam Hamsik wrote:

> [...]
>
Hi Adam,

Thank you for reporting the problem. It looks like something that should be
investigated; however, we generally use this mailing list for patch
submissions and some other communications. Could you please open a bug
report on Launchpad against the linux package [1]? Once that's done,
someone from our team will triage the bug, and the investigation and
discussion can continue from there.


[1] https://bugs.launchpad.net/ubuntu/+source/linux


Thank you,
Kleber
