[SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Guilherme Piccoli
BugLink: https://bugs.launchpad.net/bugs/1791758

** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
since it is already present in 4.18.


[Impact]

* Line discipline code is racy when we have buffer being flush while the
tty is being initialized or reinitialized. For the first problem, we have
an upstream patch since January 2018:
b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
ones.

* For the race between the buffer flush while tty is being reopened, we
have a patch that addresses this issue recently merged for 5.0-rc1:
83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
No Ubuntu kernel currently contains this patch, hence we're hereby
submitting the SRU request. The upstream complete patch series for
this is in [0].

* The approach of both patches are similar - they rely in locking/semaphore
to prevent race conditions. Some additional patches are necessary to prevent
correlated issues, like preventing a potential deadlock due to bad
prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
All the necessary fixes are grouped here in this SRU request.

* The symptom of the race condition between the buffer flush and the
tty reopen routine is a kernel crash with the following trace:


BUG: unable to handle kernel paging request at 0000000000002268
IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
[...]
Call Trace:
[<addr>] ? kvm_sched_clock_read+0x1e/0x30
[<addr>] n_tty_receive_buf2+0x14/0x20
[<addr>] flush_to_ldisc+0xd5/0x120
[<addr>] process_one_work+0x156/0x400
[<addr>] worker_thread+0x11a/0x480
[...]


* A kernel crash was collected from an user, analysis is present in
comment #4 in LP #1791758.


[Test Case]

* It is not trivial to trigger this fault, but the usual recipe is to keep
accessing a machine through SSH (or keep killing getty when in IPMI serial
console) and in some way run commands before the terminal is ready in that
machine (like hacking some echo into ttySx or pts in an infinite loop).

* We have reports of users that could reproduce this issue in their
production environment, and with the patches present in this SRU request
the problem was fixed.


[Regression Potential]

* tty subsystem is highly central and patches in that area are always
delicate. For example, the upstream series [0] is a re-spin (V6) due to
a hard to reproduce issue reported in the PA-RISC architecture, which was
found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
present in this SRU request.

* The patchset [0] is present in tty-next tree since mid-November, so
the overall likelihood of regressions is low.

* These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
and didn't show any issues.


[0] https://marc.info/?l=linux-kernel&m=154103190111795
[1] https://marc.info/?l=linux-kernel&m=153737852618183


Dmitry Safonov (4):
  tty: Drop tty->count on tty_reopen() failure
  tty: Hold tty_ldisc_lock() during tty_reopen()
  tty: Don't block on IO when ldisc change is pending
  tty: Simplify tty->count math in tty_reopen()

 drivers/tty/n_hdlc.c    |  4 ++--
 drivers/tty/n_r3964.c   |  2 +-
 drivers/tty/n_tty.c     |  8 ++++----
 drivers/tty/tty_io.c    | 13 ++++++++++---
 drivers/tty/tty_ldisc.c |  7 +++++++
 include/linux/tty.h     |  7 +++++++
 6 files changed, 31 insertions(+), 10 deletions(-)

--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU B] [PATCH 1/4] tty: Drop tty->count on tty_reopen() failure

Guilherme Piccoli
From: Dmitry Safonov <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1791758

In case of tty_ldisc_reinit() failure, tty->count should be decremented
back, otherwise we will never release_tty().
Tetsuo reported that it fixes noisy warnings on tty release like:
  pts pts4033: tty_release: tty->count(10529) != (#fd's(7) + #kopen's(0))

Fixes: commit 892d1fa7eaae ("tty: Destroy ldisc instance on hangup")

Cc: [hidden email] # v4.6+
Cc: Greg Kroah-Hartman <[hidden email]>
Cc: Jiri Slaby <[hidden email]>
Reviewed-by: Jiri Slaby <[hidden email]>
Tested-by: Jiri Slaby <[hidden email]>
Tested-by: Mark Rutland <[hidden email]>
Tested-by: Tetsuo Handa <[hidden email]>
Signed-off-by: Dmitry Safonov <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>
(cherry-picked from fe32416790093b31364c08395727de17ec96ace1 upstream)
Signed-off-by: Guilherme G. Piccoli <[hidden email]>
---
 drivers/tty/tty_io.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 495686cd0086..67e2ea4e7f1d 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1255,6 +1255,7 @@ static void tty_driver_remove_tty(struct tty_driver *driver, struct tty_struct *
 static int tty_reopen(struct tty_struct *tty)
 {
  struct tty_driver *driver = tty->driver;
+ int retval;
 
  if (driver->type == TTY_DRIVER_TYPE_PTY &&
     driver->subtype == PTY_TYPE_MASTER)
@@ -1268,10 +1269,14 @@ static int tty_reopen(struct tty_struct *tty)
 
  tty->count++;
 
- if (!tty->ldisc)
- return tty_ldisc_reinit(tty, tty->termios.c_line);
+ if (tty->ldisc)
+ return 0;
 
- return 0;
+ retval = tty_ldisc_reinit(tty, tty->termios.c_line);
+ if (retval)
+ tty->count--;
+
+ return retval;
 }
 
 /**
--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU B/C] [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen()

Guilherme Piccoli
In reply to this post by Guilherme Piccoli
From: Dmitry Safonov <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1791758

tty_ldisc_reinit() doesn't race with neither tty_ldisc_hangup()
nor set_ldisc() nor tty_ldisc_release() as they use tty lock.
But it races with anyone who expects line discipline to be the same
after hoding read semaphore in tty_ldisc_ref().

We've seen the following crash on v4.9.108 stable:

BUG: unable to handle kernel paging request at 0000000000002260
IP: [..] n_tty_receive_buf_common+0x5f/0x86d
Workqueue: events_unbound flush_to_ldisc
Call Trace:
 [..] n_tty_receive_buf2
 [..] tty_ldisc_receive_buf
 [..] flush_to_ldisc
 [..] process_one_work
 [..] worker_thread
 [..] kthread
 [..] ret_from_fork

tty_ldisc_reinit() should be called with ldisc_sem hold for writing,
which will protect any reader against line discipline changes.

Cc: Jiri Slaby <[hidden email]>
Cc: [hidden email] # b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf")
Reviewed-by: Jiri Slaby <[hidden email]>
Reported-by: [hidden email]
Tested-by: Mark Rutland <[hidden email]>
Tested-by: Tetsuo Handa <[hidden email]>
Signed-off-by: Dmitry Safonov <[hidden email]>
Tested-by: Tycho Andersen <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>
(cherry-picked from 83d817f41070c48bc3eb7ec18e43000a548fca5c upstream)
Signed-off-by: Guilherme G. Piccoli <[hidden email]>
---
 drivers/tty/tty_io.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 67e2ea4e7f1d..83e57a328f2b 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1267,15 +1267,20 @@ static int tty_reopen(struct tty_struct *tty)
  if (test_bit(TTY_EXCLUSIVE, &tty->flags) && !capable(CAP_SYS_ADMIN))
  return -EBUSY;
 
- tty->count++;
+ retval = tty_ldisc_lock(tty, 5 * HZ);
+ if (retval)
+ return retval;
 
+ tty->count++;
  if (tty->ldisc)
- return 0;
+ goto out_unlock;
 
  retval = tty_ldisc_reinit(tty, tty->termios.c_line);
  if (retval)
  tty->count--;
 
+out_unlock:
+ tty_ldisc_unlock(tty);
  return retval;
 }
 
--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU B/C] [PATCH 3/4] tty: Don't block on IO when ldisc change is pending

Guilherme Piccoli
In reply to this post by Guilherme Piccoli
From: Dmitry Safonov <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1791758

There might be situations where tty_ldisc_lock() has blocked, but there
is already IO on tty and it prevents line discipline changes.
It might theoretically turn into dead-lock.

Basically, provide more priority to pending tty_ldisc_lock() than to
servicing reads/writes over tty.

User-visible issue was reported by Mikulas where on pa-risc with
Debian 5 reboot took either 80 seconds, 3 minutes or 3:25 after proper
locking in tty_reopen().

Cc: Jiri Slaby <[hidden email]>
Reported-by: Mikulas Patocka <[hidden email]>
Signed-off-by: Dmitry Safonov <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>
(cherry-picked from c96cf923a98d1b094df9f0cf97a83e118817e31b upstream)
Signed-off-by: Guilherme G. Piccoli <[hidden email]>
---
 drivers/tty/n_hdlc.c    | 4 ++--
 drivers/tty/n_r3964.c   | 2 +-
 drivers/tty/n_tty.c     | 8 ++++----
 drivers/tty/tty_ldisc.c | 7 +++++++
 include/linux/tty.h     | 7 +++++++
 5 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/tty/n_hdlc.c b/drivers/tty/n_hdlc.c
index eea7b6cb3cc4..33ee831bbd4e 100644
--- a/drivers/tty/n_hdlc.c
+++ b/drivers/tty/n_hdlc.c
@@ -612,7 +612,7 @@ static ssize_t n_hdlc_tty_read(struct tty_struct *tty, struct file *file,
  }
 
  /* no data */
- if (file->f_flags & O_NONBLOCK) {
+ if (tty_io_nonblock(tty, file)) {
  ret = -EAGAIN;
  break;
  }
@@ -679,7 +679,7 @@ static ssize_t n_hdlc_tty_write(struct tty_struct *tty, struct file *file,
  if (tbuf)
  break;
 
- if (file->f_flags & O_NONBLOCK) {
+ if (tty_io_nonblock(tty, file)) {
  error = -EAGAIN;
  break;
  }
diff --git a/drivers/tty/n_r3964.c b/drivers/tty/n_r3964.c
index 30bb0900cd2f..30ba5491f32e 100644
--- a/drivers/tty/n_r3964.c
+++ b/drivers/tty/n_r3964.c
@@ -1078,7 +1078,7 @@ static ssize_t r3964_read(struct tty_struct *tty, struct file *file,
  pMsg = remove_msg(pInfo, pClient);
  if (pMsg == NULL) {
  /* no messages available. */
- if (file->f_flags & O_NONBLOCK) {
+ if (tty_io_nonblock(tty, file)) {
  ret = -EAGAIN;
  goto unlock;
  }
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 63c593a74380..31067cfd677e 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1671,7 +1671,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
 
  down_read(&tty->termios_rwsem);
 
- while (1) {
+ do {
  /*
  * When PARMRK is set, each input char may take up to 3 chars
  * in the read buf; reduce the buffer space avail by 3x
@@ -1713,7 +1713,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
  fp += n;
  count -= n;
  rcvd += n;
- }
+ } while (!test_bit(TTY_LDISC_CHANGING, &tty->flags));
 
  tty->receive_room = room;
 
@@ -2188,7 +2188,7 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
  break;
  if (!timeout)
  break;
- if (file->f_flags & O_NONBLOCK) {
+ if (tty_io_nonblock(tty, file)) {
  retval = -EAGAIN;
  break;
  }
@@ -2342,7 +2342,7 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file,
  }
  if (!nr)
  break;
- if (file->f_flags & O_NONBLOCK) {
+ if (tty_io_nonblock(tty, file)) {
  retval = -EAGAIN;
  break;
  }
diff --git a/drivers/tty/tty_ldisc.c b/drivers/tty/tty_ldisc.c
index 0fd18f5fa67d..48fe8d9b6df9 100644
--- a/drivers/tty/tty_ldisc.c
+++ b/drivers/tty/tty_ldisc.c
@@ -340,6 +340,11 @@ int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
 {
  int ret;
 
+ /* Kindly asking blocked readers to release the read side */
+ set_bit(TTY_LDISC_CHANGING, &tty->flags);
+ wake_up_interruptible_all(&tty->read_wait);
+ wake_up_interruptible_all(&tty->write_wait);
+
  ret = __tty_ldisc_lock(tty, timeout);
  if (!ret)
  return -EBUSY;
@@ -350,6 +355,8 @@ int tty_ldisc_lock(struct tty_struct *tty, unsigned long timeout)
 void tty_ldisc_unlock(struct tty_struct *tty)
 {
  clear_bit(TTY_LDISC_HALTED, &tty->flags);
+ /* Can be cleared here - ldisc_unlock will wake up writers firstly */
+ clear_bit(TTY_LDISC_CHANGING, &tty->flags);
  __tty_ldisc_unlock(tty);
 }
 
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 1dd587ba6d88..0cd621d8c7f0 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -365,6 +365,7 @@ struct tty_file_private {
 #define TTY_NO_WRITE_SPLIT 17 /* Preserve write boundaries to driver */
 #define TTY_HUPPED 18 /* Post driver->hangup() */
 #define TTY_HUPPING 19 /* Hangup in progress */
+#define TTY_LDISC_CHANGING 20 /* Change pending - non-block IO */
 #define TTY_LDISC_HALTED 22 /* Line discipline is halted */
 
 /* Values for tty->flow_change */
@@ -382,6 +383,12 @@ static inline void tty_set_flow_change(struct tty_struct *tty, int val)
  smp_mb();
 }
 
+static inline bool tty_io_nonblock(struct tty_struct *tty, struct file *file)
+{
+ return file->f_flags & O_NONBLOCK ||
+ test_bit(TTY_LDISC_CHANGING, &tty->flags);
+}
+
 static inline bool tty_io_error(struct tty_struct *tty)
 {
  return test_bit(TTY_IO_ERROR, &tty->flags);
--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

[SRU B/C] [PATCH 4/4] tty: Simplify tty->count math in tty_reopen()

Guilherme Piccoli
In reply to this post by Guilherme Piccoli
From: Dmitry Safonov <[hidden email]>

BugLink: https://bugs.launchpad.net/bugs/1791758

As notted by Jiri, tty_ldisc_reinit() shouldn't rely on tty counter.
Simplify math by increasing the counter after reinit success.

Cc: Jiri Slaby <[hidden email]>
Link: lkml.kernel.org/r/<[hidden email]>
Suggested-by: Jiri Slaby <[hidden email]>
Reviewed-by: Jiri Slaby <[hidden email]>
Tested-by: Mark Rutland <[hidden email]>
Signed-off-by: Dmitry Safonov <[hidden email]>
Signed-off-by: Greg Kroah-Hartman <[hidden email]>
(cherry-picked from cf62a1a13749db0d32b5cdd800ea91a4087319de upstream)
Signed-off-by: Guilherme G. Piccoli <[hidden email]>
---
 drivers/tty/tty_io.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 83e57a328f2b..b273aeadbb51 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1271,16 +1271,13 @@ static int tty_reopen(struct tty_struct *tty)
  if (retval)
  return retval;
 
- tty->count++;
- if (tty->ldisc)
- goto out_unlock;
-
- retval = tty_ldisc_reinit(tty, tty->termios.c_line);
- if (retval)
- tty->count--;
-
-out_unlock:
+ if (!tty->ldisc)
+ retval = tty_ldisc_reinit(tty, tty->termios.c_line);
  tty_ldisc_unlock(tty);
+
+ if (retval == 0)
+ tty->count++;
+
  return retval;
 }
 
--
2.19.2


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/cmnt: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Kleber Souza
In reply to this post by Guilherme Piccoli
On 1/8/19 9:28 PM, Guilherme G. Piccoli wrote:

> BugLink: https://bugs.launchpad.net/bugs/1791758
>
> ** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
> since it is already present in 4.18.
>
>
> [Impact]
>
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
>
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
>
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
>
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
>
>
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
>
>
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
>
>
> [Test Case]
>
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
>
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
>
>
> [Regression Potential]
>
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
>
> * The patchset [0] is present in tty-next tree since mid-November, so
> the overall likelihood of regressions is low.
>
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
>
>
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>
>
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
>
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 13 ++++++++++---
>  drivers/tty/tty_ldisc.c |  7 +++++++
>  include/linux/tty.h     |  7 +++++++
>  6 files changed, 31 insertions(+), 10 deletions(-)
>
Same comments as for the Xenial series, patches provenance needs to be
fixed to:

(cherry picked from commit ...)


Acked-by: Kleber Sacilotto de Souza <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: ACK/cmnt: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Guilherme Piccoli
On Wed, Jan 9, 2019 at 8:36 AM Kleber Souza <[hidden email]> wrote:
>[...]
> Same comments as for the Xenial series, patches provenance needs to be
> fixed to:
>
> (cherry picked from commit ...)
>
>
> Acked-by: Kleber Sacilotto de Souza <[hidden email]>
>

Thanks again, and sorry for the silly mistake heheh

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

ACK/Cmnt: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Stefan Bader-2
In reply to this post by Guilherme Piccoli
On 08.01.19 21:28, Guilherme G. Piccoli wrote:

> BugLink: https://bugs.launchpad.net/bugs/1791758
>
> ** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
> since it is already present in 4.18.
>
>
> [Impact]
>
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
>
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
>
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
>
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
>
>
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
>
>
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
>
>
> [Test Case]
>
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
>
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
>
>
> [Regression Potential]
>
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
>
> * The patchset [0] is present in tty-next tree since mid-November, so
> the overall likelihood of regressions is low.
>
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
>
>
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>
>
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
>
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 13 ++++++++++---
>  drivers/tty/tty_ldisc.c |  7 +++++++
>  include/linux/tty.h     |  7 +++++++
>  6 files changed, 31 insertions(+), 10 deletions(-)
>
Indeed same cherry-pick comment and mabye a suggestion for future submissions.
At least the way I work from the threaded view in Thunderbird a set like

<cover> X/B/C SRU...
+- X1
+- X2
+ ...
+- X5
+- B
+- B+C1
+- ...
+- B+C4

would help a bit because cross checking with the lp bug then quickly shows all
nominations and patch parts match. Otherwise the other thread might be somewhere
not close to the currently looked at part.
Acked-by: Stefan Bader <[hidden email]>


--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

APPLIED[D]: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Seth Forshee
In reply to this post by Guilherme Piccoli
On Tue, Jan 08, 2019 at 06:28:07PM -0200, Guilherme G. Piccoli wrote:

> BugLink: https://bugs.launchpad.net/bugs/1791758
>
> ** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
> since it is already present in 4.18.
>
>
> [Impact]
>
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
>
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
>
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
>
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
>
>
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
>
>
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
>
>
> [Test Case]
>
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
>
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
>
>
> [Regression Potential]
>
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
>
> * The patchset [0] is present in tty-next tree since mid-November, so
> the overall likelihood of regressions is low.
>
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
>
>
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183

Please remember to include the development kernel when it does not yet
have the patches. Applied to disco/master-next, thanks!

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

APPLIED/cmt: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Khaled Elmously
In reply to this post by Guilherme Piccoli
Patches 1-4 applied to Bionic, patches 2-4 applied to Cosmic, fixed the provenance lines


On 2019-01-08 18:28:07 , Guilherme G. Piccoli wrote:

> BugLink: https://bugs.launchpad.net/bugs/1791758
>
> ** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
> since it is already present in 4.18.
>
>
> [Impact]
>
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
>
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
>
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
>
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
>
>
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
>
>
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
>
>
> [Test Case]
>
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
>
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
>
>
> [Regression Potential]
>
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
>
> * The patchset [0] is present in tty-next tree since mid-November, so
> the overall likelihood of regressions is low.
>
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
>
>
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183
>
>
> Dmitry Safonov (4):
>   tty: Drop tty->count on tty_reopen() failure
>   tty: Hold tty_ldisc_lock() during tty_reopen()
>   tty: Don't block on IO when ldisc change is pending
>   tty: Simplify tty->count math in tty_reopen()
>
>  drivers/tty/n_hdlc.c    |  4 ++--
>  drivers/tty/n_r3964.c   |  2 +-
>  drivers/tty/n_tty.c     |  8 ++++----
>  drivers/tty/tty_io.c    | 13 ++++++++++---
>  drivers/tty/tty_ldisc.c |  7 +++++++
>  include/linux/tty.h     |  7 +++++++
>  6 files changed, 31 insertions(+), 10 deletions(-)
>
> --
> 2.19.2
>
>
> --
> kernel-team mailing list
> [hidden email]
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team