[Jaunty] Proposing some ext4 patches

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[Jaunty] Proposing some ext4 patches

Stefan Bader-2
After we had https://bugs.launchpad.net/bugs/389555 I had a look at Ted's
2.6.28-stable repo to figure out, whether there might be some more dangers
lurking. Since his 2.6.28.10 tag there are 21 patches difference between our
and his ext4 tree. Quite a few of them are in a gray zone of being of little
risk but do not seem to be critical enough to qualify for SRU. This would leave
7 (or 8) which we probably should consider...
And (question to the SRU team) if we do, could we use one tracking report?

Stefan

01
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=1a29ba346699cd47c1b3f683340290f705cd9880
Oops when running without journal. Maybe unlikely case but also low risk as it
only adds two checks for no journal.

02
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=626be33b682d1d99bfae4ee4db16ed34f6c8ff65
Fsck errors caused by a change which we imported with the last stable import.

03 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=7aaeda34909c571d0b8cc8a8a10e788e4e3350be
Limit the rate a warning is printed to 1.

04
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=7956e0d6d07b7f0448085184dc47ce01143f7746
Oops as a result of a bogus BUG_ON

05 (suggest skip)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=774c43079ddc04a92030dd31109421518e1fcf14
Modifications to avoid lock contention.

06 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=be79df89cf61466bbaa97ceb363719b65dc1efd6
Prevent FTBS with CONFIG_OCFS2_COMPAT_JBD enabled (disabled in ubuntu)

07 (suggest skip)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=acf4d672af6e588d977a6daaad1057977659fcfa
Fix the accidental inheritance of the TOPDIR flag.

08 (suggest skip)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=c687585c0afac3269bd819df9072a0212762f0d9
Tightens restriction of flags allowed to set on inodes.

09 (maybe (maybe skip (not enough justification))skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=94921a80dc2146b6be4d645db744dd3de0d02eed
Fix up a bogus/confusing error message in some cases. This only changes an
error code propagated incorrectly to user-space which results in a message
pointing to NFS.

10 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=7c2e96ee2e535566cf87527c000b919341de53d6
Documentation only fix.

11 (?)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=d9ec01eafda7ec7b5fd63b623c86bd95dbd8349a
Fix to not discard preallocations on close. Not sure of the impact here.

12 (suggest skip)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=2cd9869ec423014e6b4456010a5bee263bc52478
Adding a mount option to disable the automatic allocation of blocks that would
normally have delayed allocation on trusted systems. Better performance but
less safe.

13 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=3c5c47493f5300bb08a93158ed43c7ae4064a455
Check for valid modes when reading an inode from disk. Helps to detect corruption.

14 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=9a4d680efe0da896450296bf0ee68a406aa3d5e7
Commentary only update.

15
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=3a78d23b7d81cca4779fef150df1fdb7fe5a0547
Fixes wrong memory free

16 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=f8e79c5df0d76de38cc46329ce8f9a89a1af222f
Prevent a message (intended to be printed only once) from being printed repeatedly.

17
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=84e7f33aa99698f905a31030717b89ebf3ed0300
Prevents soft lockup by checking the range of i_file_acl.

18 (maybe skip (not enough justification))
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=903865dc2b2bc0bec333cf8ceb55df8703be3807
Ingore the i_file_acl_high unless the filesystem is 64bit (INCOMPAT_64BIT set)

19
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=9bf7b8126ad4a0fc2792532404f667dcf8410917
Prevent possible data corruption on partial writes with preallocation.

20 (maybe skip)
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=2742d4833ba07f06a18ad2df750b9f3a712864a4
Use a large (non-zero) block number for delayed allocation buffers. Those
should never be written but if this is tried it is more obvious where this
comes from.

21
http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=5ac67d1e8d50b0633361ed2b9ff3834a0998d214
Prevent an invalid combination of flags when converting uninitialized extends
to initialized ones.

--

When all other means of communication fail, try words!



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Ted Ts'o
On Mon, Jun 22, 2009 at 07:39:12PM +0200, Stefan Bader wrote:

> After we had https://bugs.launchpad.net/bugs/389555 I had a look at Ted's
> 2.6.28-stable repo to figure out, whether there might be some more
> dangers lurking. Since his 2.6.28.10 tag there are 21 patches difference
> between our and his ext4 tree. Quite a few of them are in a gray zone of
> being of little risk but do not seem to be critical enough to qualify for
> SRU. This would leave 7 (or 8) which we probably should consider...
> And (question to the SRU team) if we do, could we use one tracking report?
>
> Stefan
>
> 05 (suggest skip)
> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=774c43079ddc04a92030dd31109421518e1fcf14
> Modifications to avoid lock contention.

Sorry, the patch commit message is a bit misleading.  This fixes a
lock ordering problem (detected by lockdep) that could potentially
lead to a system lockup.  The patch was originally designed to fix a
performance problem, but then we discovered it also fixed a lockdep
warning, and we dropped a reference to a kernel bugzilla entry w/o
updating the commit description:

http://bugzilla.kernel.org/show_bug.cgi?id=12787

> 11 (?)
> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=d9ec01eafda7ec7b5fd63b623c86bd95dbd8349a
> Fix to not discard preallocations on close. Not sure of the impact here.

We could not discard preallocations on close if there were any delayed
allocation blocks; this lead to preallocations not getting discarded
until much, much later, which could prevent the block allocator from
not being able to allocate blocks efficiently.  

The patch checks so that once an inode has all of its delayed
allocation blocks allocated, and there are no open r/w
filedescriptors, we discard the preallocated blocks so they can be
used by another file.

This helps to promote a better (less fragmented) layout of block
allocations on disk.  It doesn't fix a critical bug, so this patch is
one that you can decide to skip.

> 20 (maybe skip)
> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=2742d4833ba07f06a18ad2df750b9f3a712864a4
> Use a large (non-zero) block number for delayed allocation buffers. Those
> should never be written but if this is tried it is more obvious where
> this comes from.

.... instead of blowing away the boot block / partition table, which
could lead to all sorts of user complaints.  :-)

At the time we weren't completely convinced that the code was
bug-free, but we haven't had any reports of people triggering an
attempted write with the very large non-zero block number, so it's
probably safe to to skip this.


I am suspicious that hard to debug hang described in Launchpad #330824
(Soft lockups when deleting files from ext4 partitions) may very well
be caused by either (a) a failed backport, or (b) an subtle patch
dependency that triggerred a big due to a skipped patch.  I would
therefore encourage you to use the xfstests suite to test the
resulting ext4 filesystem.  The xfstests can be found here:

        url = git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git

I believe you will need the following packages for it to build and
run: xfsprogs, xfslibs-dev, libaio1, libaio-dev.  I might be missing
one or two, but those seem to be the critical ones.  Hopefully any
others will be obvious.  Cd to the top-level of the xfstests
directory, and run ./configure; make.

To run the XFS tests, you will need to have one (and preferably two)
partitions, called TEST and SCRATCH.  TEST should be a mounted ext4
partition that can be mounted and unmounted.  SCRATCH should be an
empty device that you don't mind getting reformatted (many of the
tests will reformat the SCRATCH partition).  SCRATCH is optional; so
if you don't have an 2nd partition, you can simply omit setting the
SCRATCH_DEV and SCRATCH_MNT environment variables.

Set the environment variables:

TEST_DEV       device file (i.e., /dev/sda1) containing the TEST partition
TEST_DIR   mount point of the TEST partition
SCRATCH_DEV   device file (i.e., /dev/sda2) containing the SCRATCH partition
SCRATCH_MNT   mount point of the SCRATCH partition

(note TEST_DIR vs. SCRATCH_MNT; don't blame me, blame the SGI
engineers.  :-)

Then run "./check -ext4 -g auto" as root in the top-level xfstests
directory.  Everything should pass; if not, then there's probably
something wrong the the Ubuntu backports.  Bug reports with mainline
kernels should be sent [hidden email].  If the mainline
kernel passes, and the Ubuntu backports don't, the sooner it is
detected, the easier it will be to try to find the problem with
bisection searches.

                                                - Ted



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Steve Langasek-5
In reply to this post by Stefan Bader-2
On Mon, Jun 22, 2009 at 07:39:12PM +0200, Stefan Bader wrote:
> After we had https://bugs.launchpad.net/bugs/389555 I had a look at
> Ted's 2.6.28-stable repo to figure out, whether there might be some
> more dangers lurking. Since his 2.6.28.10 tag there are 21 patches
> difference between our and his ext4 tree. Quite a few of them are in
> a gray zone of being of little risk but do not seem to be critical
> enough to qualify for SRU. This would leave 7 (or 8) which we
> probably should consider...
> And (question to the SRU team) if we do, could we use one tracking report?

I'm fine with using a single SRU bug (i.e.: bug #389555) for all of the
ext4-related backports that are deemed appropriate, provided that there
aren't already multiple user bug reports in Launchpad for different problems
that will be fixed in this update, and that should have verification done
separately.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
[hidden email]                                     [hidden email]

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Stefan Bader-2
In reply to this post by Ted Ts'o
Theodore Tso wrote:

> On Mon, Jun 22, 2009 at 07:39:12PM +0200, Stefan Bader wrote:
>> After we had https://bugs.launchpad.net/bugs/389555 I had a look at Ted's
>> 2.6.28-stable repo to figure out, whether there might be some more
>> dangers lurking. Since his 2.6.28.10 tag there are 21 patches difference
>> between our and his ext4 tree. Quite a few of them are in a gray zone of
>> being of little risk but do not seem to be critical enough to qualify for
>> SRU. This would leave 7 (or 8) which we probably should consider...
>> And (question to the SRU team) if we do, could we use one tracking report?
>>
>> Stefan
>>
>> 05 (suggest skip)
>> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=774c43079ddc04a92030dd31109421518e1fcf14
>> Modifications to avoid lock contention.
>
> Sorry, the patch commit message is a bit misleading.  This fixes a
> lock ordering problem (detected by lockdep) that could potentially
> lead to a system lockup.  The patch was originally designed to fix a
> performance problem, but then we discovered it also fixed a lockdep
> warning, and we dropped a reference to a kernel bugzilla entry w/o
> updating the commit description:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=12787
>

Ah, thanks for that update.

>> 11 (?)
>> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=d9ec01eafda7ec7b5fd63b623c86bd95dbd8349a
>> Fix to not discard preallocations on close. Not sure of the impact here.
>
> We could not discard preallocations on close if there were any delayed
> allocation blocks; this lead to preallocations not getting discarded
> until much, much later, which could prevent the block allocator from
> not being able to allocate blocks efficiently.  
>
> The patch checks so that once an inode has all of its delayed
> allocation blocks allocated, and there are no open r/w
> filedescriptors, we discard the preallocated blocks so they can be
> used by another file.
>
> This helps to promote a better (less fragmented) layout of block
> allocations on disk.  It doesn't fix a critical bug, so this patch is
> one that you can decide to skip.
>

Thanks for the clarification. Yes, in that case we rather would skip it.

>> 20 (maybe skip)
>> http://kernel.ubuntu.com/git?p=smb/ubuntu-jaunty.git;a=commitdiff;h=2742d4833ba07f06a18ad2df750b9f3a712864a4
>> Use a large (non-zero) block number for delayed allocation buffers. Those
>> should never be written but if this is tried it is more obvious where
>> this comes from.
>
> .... instead of blowing away the boot block / partition table, which
> could lead to all sorts of user complaints.  :-)
>

Heh, got that only once and failed to blame ext3/4 for it. IOW, it was a bug
somewhere else (the mmc driver).

> At the time we weren't completely convinced that the code was
> bug-free, but we haven't had any reports of people triggering an
> attempted write with the very large non-zero block number, so it's
> probably safe to to skip this.
>
>
> I am suspicious that hard to debug hang described in Launchpad #330824
> (Soft lockups when deleting files from ext4 partitions) may very well
> be caused by either (a) a failed backport, or (b) an subtle patch
> dependency that triggerred a big due to a skipped patch.  I would
> therefore encourage you to use the xfstests suite to test the
> resulting ext4 filesystem.  The xfstests can be found here:
>
> url = git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
>
> I believe you will need the following packages for it to build and
> run: xfsprogs, xfslibs-dev, libaio1, libaio-dev.  I might be missing
> one or two, but those seem to be the critical ones.  Hopefully any
> others will be obvious.  Cd to the top-level of the xfstests
> directory, and run ./configure; make.
>
> To run the XFS tests, you will need to have one (and preferably two)
> partitions, called TEST and SCRATCH.  TEST should be a mounted ext4
> partition that can be mounted and unmounted.  SCRATCH should be an
> empty device that you don't mind getting reformatted (many of the
> tests will reformat the SCRATCH partition).  SCRATCH is optional; so
> if you don't have an 2nd partition, you can simply omit setting the
> SCRATCH_DEV and SCRATCH_MNT environment variables.
>
> Set the environment variables:
>
> TEST_DEV       device file (i.e., /dev/sda1) containing the TEST partition
> TEST_DIR   mount point of the TEST partition
> SCRATCH_DEV   device file (i.e., /dev/sda2) containing the SCRATCH partition
> SCRATCH_MNT   mount point of the SCRATCH partition
>
> (note TEST_DIR vs. SCRATCH_MNT; don't blame me, blame the SGI
> engineers.  :-)
>
> Then run "./check -ext4 -g auto" as root in the top-level xfstests
> directory.  Everything should pass; if not, then there's probably
> something wrong the the Ubuntu backports.  Bug reports with mainline
> kernels should be sent [hidden email].  If the mainline
> kernel passes, and the Ubuntu backports don't, the sooner it is
> detected, the easier it will be to try to find the problem with
> bisection searches.
>
> - Ted
>

Great, yes, I definitely would like to get testing done with the updated ext4
code in Jaunty. So I will go and get kernels and the environment prepared and
the testing done.

Thanks again for the feedback.

Stefan


--

When all other means of communication fail, try words!



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Stefan Bader-2
In reply to this post by Steve Langasek-5
Steve Langasek wrote:

> On Mon, Jun 22, 2009 at 07:39:12PM +0200, Stefan Bader wrote:
>> After we had https://bugs.launchpad.net/bugs/389555 I had a look at
>> Ted's 2.6.28-stable repo to figure out, whether there might be some
>> more dangers lurking. Since his 2.6.28.10 tag there are 21 patches
>> difference between our and his ext4 tree. Quite a few of them are in
>> a gray zone of being of little risk but do not seem to be critical
>> enough to qualify for SRU. This would leave 7 (or 8) which we
>> probably should consider...
>> And (question to the SRU team) if we do, could we use one tracking report?
>
> I'm fine with using a single SRU bug (i.e.: bug #389555) for all of the
> ext4-related backports that are deemed appropriate, provided that there
> aren't already multiple user bug reports in Launchpad for different problems
> that will be fixed in this update, and that should have verification done
> separately.
>

OK, I will dig whether we already got reports on other issues before placing
all of them into one.

Stefan

--

When all other means of communication fail, try words!



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Ted Ts'o
In reply to this post by Stefan Bader-2
On Tue, Jun 23, 2009 at 05:32:00PM +0200, Stefan Bader wrote:
>
> Great, yes, I definitely would like to get testing done with the updated
> ext4 code in Jaunty. So I will go and get kernels and the environment
> prepared and the testing done.

The place where I'd strongly encourage testing is the ext4 code in
Karmic.  

I have not tried running the XFS tests on the current Jaunty code, so
it would probably be a good idea to run the xfstests on the current
Jaunty code to get a baseline before trying to test an updated Jaunty
with even more ext4 backported patches added, since there is strong
suspicion that "something might be rotten in the state of Denmark"
with respect to the current ext4 in Jaunty.  So definitely try getting
a baseline first.

Best of luck with the Jaunty update; if there's something I can do to
help or to clarify some patch, please let me know.

Regards,

                                                - Ted

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Stefan Bader-2
Theodore Tso wrote:

> On Tue, Jun 23, 2009 at 05:32:00PM +0200, Stefan Bader wrote:
>> Great, yes, I definitely would like to get testing done with the updated
>> ext4 code in Jaunty. So I will go and get kernels and the environment
>> prepared and the testing done.
>
> The place where I'd strongly encourage testing is the ext4 code in
> Karmic.  
>
> I have not tried running the XFS tests on the current Jaunty code, so
> it would probably be a good idea to run the xfstests on the current
> Jaunty code to get a baseline before trying to test an updated Jaunty
> with even more ext4 backported patches added, since there is strong
> suspicion that "something might be rotten in the state of Denmark"
> with respect to the current ext4 in Jaunty.  So definitely try getting
> a baseline first.
>

I did not get to the karmic kernel, but have now done the current Jaunty
against a version including the picked set of patches. Both had two tests
failing (213, 214). Strangely those seem to be tests that look like only useful
for xfs. Also in theory they seem to be requiring the fallocate command which
is not included in the tools. So I guess that is a rather a problem with the
test than one with the fs.
So at least it seems to assure, that applying the patches does not make things
worse (aka causes regressions).

Stefan

> Best of luck with the Jaunty update; if there's something I can do to
> help or to clarify some patch, please let me know.
>
> Regards,
>
> - Ted


--

When all other means of communication fail, try words!



--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team
Reply | Threaded
Open this post in threaded view
|

Re: [Jaunty] Proposing some ext4 patches

Ted Ts'o
On Thu, Jul 02, 2009 at 10:18:57PM +0200, Stefan Bader wrote:
>
> I did not get to the karmic kernel, but have now done the current Jaunty  
> against a version including the picked set of patches. Both had two tests
> failing (213, 214). Strangely those seem to be tests that look like only
> useful for xfs. Also in theory they seem to be requiring the fallocate
> command which is not included in the tools. So I guess that is a rather a
> problem with the test than one with the fs.
> So at least it seems to assure, that applying the patches does not make
> things worse (aka causes regressions).

Hmm, I don't get a failure; I just get:

213 [not_run] xfs_io fallocate support is missing
214 [not_run] xfs_io fallocate support is missing

From the git log in question:

    fallocate + read/write tests, ext4 regression tests
   
    New test to test basic mixed fallocate + read & write,
    includes a couple regression tests for bugs that ext4
    hit.  Uses xfs_io to generate fallocate calls, so requires
    git xfsprogs and very recent glibc at this point.
   
    Ext4 folks, this is hopefully a reasonable example of
    how to add a new test.   :)
   
    Signed-off-by: Eric Sandeen <[hidden email]>
    Reviewed-by: Christoph Hellwig <[hidden email]>

Note the "requires git xfsprogs and very recent glibc".  (The
fallocate() support missed latest glibc release, although it's in
glibc's CVS tree, so it requires a backport patch.)  So basically you
need the latest bleeding-edge development support for tests 213 and
214 to run, but it *should* have failed with a clean "not_run" status.

If you got something else, it probably is a bug in the tests, and we
should report it to Eric Sandeen, who coded up the test.

                           - Ted

--
kernel-team mailing list
[hidden email]
https://lists.ubuntu.com/mailman/listinfo/kernel-team