zstd compression for packages

Re: zstd compression for packages

Julian Andres Klode
On Fri, Mar 16, 2018 at 03:13:55PM -0700, Steve Langasek wrote:
> Hi Julian,
>
> Thanks for posting about this.  I agree that if this is landing in dpkg+apt
> upstream, it's reasonable to try to get it into the 18.04 release so that it
> can be used in later releases without needing a dpkg versioned pre-depends.
>
> If we are to evaluate using zstd as the default compression in 18.10 (or
> later), I think we need to consider the total install experience, and not
> just look at the dpkg unpack time.

We're really only considering cloud cases, as a 10% gain on non-eatmydata
cases on slower connections does not really seem worth it, right?

> For example:
>
> On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
> > firefox
> > --------
> > 8.80user 3.57system 0:37.17elapsed 33%CPU (0avgtext+0avgdata 25260maxresident)k
> > 8inputs+548024outputs (0major+376614minor)pagefaults 0swaps
> >
> > 4.52user 3.30system 0:33.14elapsed 23%CPU (0avgtext+0avgdata 25152maxresident)k
> > 0inputs+544560outputs (0major+386394minor)pagefaults 0swaps
>
> > firefox eatmydata
> > -----------------------
> > 8.79user 2.87system 0:12.43elapsed 93%CPU (0avgtext+0avgdata 25416maxresident)k
> > 0inputs+548016outputs (0major+384193minor)pagefaults 0swaps
> > 4.24user 2.57system 0:08.54elapsed 79%CPU (0avgtext+0avgdata 25280maxresident)k
> > 0inputs+544584outputs (0major+392117minor)pagefaults 0swaps
>
> Since you don't list binary package names for kernel or libreoffice, I'll
> look at firefox, which is the obvious one.  The archive version of this
> package is 42MiB in size in bionic.  If the zstd version is 6% larger, but
> takes 4 seconds less time to unpack, this means the total install time
> (download+unpack) is only improved for the end user if the download speed
> from the apt source is faster than (44108204 bytes * .06 * 8bits/byte / 4.03s
> ~=) 5.25Mbps.

It's not just the firefox package, but the entire "apt install firefox" in a
fresh debootstrap, so it includes most dependencies.

The kernel run was: apt install linux-image-generic initramfs-tools- grub<something>-
The libreoffice run was: apt install libreoffice-$foo for all $foo (calc, draw, ...)

Therefore the calculations are off, and the improvement at low connection
speeds is likely not worth it.
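
For reference, the break-even arithmetic quoted above can be sketched as
follows. This is only an illustration of the formula: the function name is
made up here, and the inputs are the single-package size and the 4.03 s
elapsed-time difference from the quoted figures, which (as noted) were
measured on a full install including dependencies, so the resulting number
should not be read as a real threshold.

```python
def break_even_mbps(base_size_bytes: float, size_increase: float,
                    unpack_saving_s: float) -> float:
    """Download speed (Mbit/s) above which a larger but faster-to-unpack
    package gives a shorter total download+unpack time."""
    extra_bits = base_size_bytes * size_increase * 8  # extra data to fetch
    return extra_bits / unpack_saving_s / 1e6


# Inputs as quoted in the thread: a 44108204-byte package, 6% larger with
# zstd, and 37.17 s - 33.14 s = 4.03 s less elapsed time.
firefox_mbps = break_even_mbps(44108204, 0.06, 37.17 - 33.14)
print(f"break-even: {firefox_mbps:.2f} Mbit/s")
```

Below that speed the extra download time outweighs the unpack saving; above
it the faster unpack wins.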

>
> Have you established that this is a typical effective download speed for
> Ubuntu users?  It's certainly faster than my home connection, though I also
> use a local mirror to speed up installs.  It may be reasonable to expect
> cloud instances to have this much throughput from their mirrors, and so it
> might be the sensible choice solely on that basis; I'm just checking that
> it's been measured.
>
> In other words: if we want to make this the default, we should quantify
> Daniel's remark that he would prefer a 6% faster download over a 10% faster
> unpack.
>
>
> I think we also need to look at the spread of package size increases.  If 6%
> is typical, are there some packages on the high end (of both absolute package
> size and relative size increase) that we should exclude from switching to
> zstd?  We should be transparent about our analysis here.
>
I attached the complete analysis of size differences for main, ordered
by relative increase. There are a few huge relative increases, but only
really for tiny packages.
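
A minimal sketch of why ordering by relative increase puts tiny packages at
the top even though their absolute growth is negligible. The class, the
function and both sample sizes are hypothetical and chosen only for
illustration; they are not taken from the attached analysis.

```python
from typing import NamedTuple


class PackageSizes(NamedTuple):
    name: str
    xz_bytes: int
    zstd_bytes: int

    @property
    def absolute_increase(self) -> int:
        return self.zstd_bytes - self.xz_bytes

    @property
    def relative_increase(self) -> float:
        return self.absolute_increase / self.xz_bytes


def by_relative_increase(packages):
    """Return the packages with the largest relative size increase first."""
    return sorted(packages, key=lambda p: p.relative_increase, reverse=True)


# Hypothetical sizes, NOT measurements from the archive.
sample = [
    PackageSizes("big-package", 40_000_000, 42_400_000),
    PackageSizes("tiny-package", 2_000, 2_600),
]
ranked = by_relative_increase(sample)
for p in ranked:
    print(f"{p.name}: +{p.relative_increase:.0%} (+{p.absolute_increase} bytes)")
```

In this made-up sample the tiny package ranks first by relative increase
while adding only a few hundred bytes, whereas the large one adds megabytes
at a much smaller percentage.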

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

--
ubuntu-devel mailing list
[hidden email]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel

Attachment: archive repack eval.txt (852K)

Re: zstd compression for packages

Dimitri John Ledkov-2
In reply to this post by Steve Langasek-6
On 16 March 2018 at 22:13, Steve Langasek <[hidden email]> wrote:
> In other words: if we want to make this the default, we should quantify
> Daniel's remark that he would prefer a 6% faster download over a 10% faster
> unpack.
>

Well, I think it does not make sense to think about this in absolute
terms. Thinking about user stories is better.

A stable series user will be mostly upgrading packages from -security
and -updates. The download speed and/or size of debs does not matter
much in this case, as these are scheduled to be done in the background
over the course of the day, via unattended upgrades download timer.
Installation speed matters, as that is the window of time when the
system is actually somewhat in a maintenance mode / degraded
performance (apt is locked, there are CPU and disk-io loads).

New instance initialization - e.g. spinning up a cloud instance, with
cloud-init, and installing a bunch of things; deploying juju charm /
conjure-up spell; configuring things with puppet / ansible / etc =>
these are download & install heavy. However, users that do that
heavily, will be in a corporate / business / datacentre environment
and thus it is reasonable to expect them to have either a fat internet
pipe, and/or a local mirror. Meaning download speed & size, are not
critical.

Then there are devel series users, developers who do sbuild builds,
etc. These users are most likely to be on slower home-user connections
and watch things a lot more closely interactively, who indeed care
about the total download+install time. These users, are most likely
very vocal / visible, but are not ultimately the target audience as to
why we develop Ubuntu in the first place. Thus I would be willing to
trade personal developer/devel-series user experience, in favor of the
stable series user. I'm not sure how much it makes sense to
proxy/cache/local-mirror devel series, if it is only a single machine
in use.

--
Regards,

Dimitri.

ps. I fight for the user
pps. /me goes to set up a local mirror proxy cache, with dns spoofing
to make sure all my sbuilds / lxd containers / VM / cloud-images use
local mirror out of the box

Re: zstd compression for packages

Balint Reczey
Hi All,

On Sat, Mar 17, 2018 at 3:09 PM, Dimitri John Ledkov
<[hidden email]> wrote:

> On 16 March 2018 at 22:13, Steve Langasek <[hidden email]> wrote:
>> In other words: if we want to make this the default, we should quantify
>> Daniel's remark that he would prefer a 6% faster download over a 10% faster
>> unpack.
>>
>
> Well, I think it does not make sense to think about this in absolute
> terms. Thinking about user stories is better.
>
> A stable series user will be mostly upgrading packages from -security
> and -updates. The download speed and/or size of debs does not matter
> much in this case, as these are scheduled to be done in the background
> over the course of the day, via unattended upgrades download timer.
> Installation speed matters, as that is the window of time when the
> system is actually somewhat in a maintenance mode / degraded
> performance (apt is locked, there are CPU and disk-io loads).
>
> New instance initialization - e.g. spinning up a cloud instance, with
> cloud-init, and installing a bunch of things; deploying juju charm /
> conjure-up spell; configuring things with puppet / ansible / etc =>
> these are download & install heavy. However, users that do that
> heavily, will be in a corporate / business / datacentre environment
> and thus it is reasonable to expect them to have either a fat internet
> pipe, and/or a local mirror. Meaning download speed & size, are not
> critical.
>
> Then there are devel series users, developers who do sbuild builds,
> etc. These users are most likely to be on slower home-user connections
> and watch things a lot more closely interactively, who indeed care
> about the total download+install time. These users, are most likely
> very vocal / visible, but are not ultimately the target audience as to
> why we develop Ubuntu in the first place. Thus I would be willing to
> trade personal developer/devel-series user experience, in favor of the
> stable series user. I'm not sure how much it makes sense to
> proxy/cache/local-mirror devel series, if it is only a single machine
> in use.
>
> --
> Regards,
>
> Dimitri.
>
> ps. I fight for the user
> pps. /me goes to set up a local mirror proxy cache, with dns spoofing
> to make sure all my sbuilds / lxd containers / VM / cloud-images use
> local mirror out of the box

I agree with Dimitri's analysis and I would also like to add one more
thing to consider. During unpacking of packages the system is in a
transient state where programs may not work correctly. Minimizing the
time spent in that transient state is an important additional benefit
of speeding up decompression.

The speedup varies a lot across use cases and IMO the 10% speed
increase is an understatement for many very important use cases.
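
As an illustration of how much the figure can vary, here is a small sketch
using only the firefox elapsed times quoted earlier in the thread. It
assumes the first line of each pair is the existing package format and the
second is the zstd one; the function name is made up for this example.

```python
def speedup_percent(before_s: float, after_s: float) -> float:
    """Relative reduction in elapsed time, in percent."""
    return (before_s - after_s) / before_s * 100


# Elapsed times from the quoted firefox runs (assumed order: before, after).
plain = speedup_percent(37.17, 33.14)
eatmydata = speedup_percent(12.43, 8.54)
print(f"plain: {plain:.1f}%  eatmydata: {eatmydata:.1f}%")
```

On those two measurements the reduction is roughly 11% without eatmydata
and roughly 31% with it.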

Cheers,
Balint

Re: zstd compression for packages

Julian Andres Klode
In reply to this post by Dimitri John Ledkov-2
On Sat, Mar 17, 2018 at 03:09:55PM +0000, Dimitri John Ledkov wrote:

> On 16 March 2018 at 22:13, Steve Langasek <[hidden email]> wrote:
> > In other words: if we want to make this the default, we should quantify
> > Daniel's remark that he would prefer a 6% faster download over a 10% faster
> > unpack.
> >
>
> Well, I think it does not make sense to think about this in absolute
> terms. Thinking about user stories is better.
>
> A stable series user will be mostly upgrading packages from -security
> and -updates. The download speed and/or size of debs does not matter
> much in this case, as these are scheduled to be done in the background
> over the course of the day, via unattended upgrades download timer.
> Installation speed matters, as that is the window of time when the
> system is actually somewhat in a maintenance mode / degraded
> performance (apt is locked, there are CPU and disk-io loads).

I'd like us to have https://wiki.debian.org/Teams/Dpkg/Spec/DeltaDebs;
this would mostly solve that problem too.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

Re: zstd compression for packages

Steve Langasek-6
In reply to this post by Dimitri John Ledkov-2
On Sat, Mar 17, 2018 at 03:09:55PM +0000, Dimitri John Ledkov wrote:
> On 16 March 2018 at 22:13, Steve Langasek <[hidden email]> wrote:
> > In other words: if we want to make this the default, we should quantify
> > Daniel's remark that he would prefer a 6% faster download over a 10% faster
> > unpack.

> Well, I think it does not make sense to think about this in absolute
> terms. Thinking about user stories is better.

Sure.

> A stable series user will be mostly upgrading packages from -security
> and -updates. The download speed and/or size of debs does not matter
> much in this case, as these are scheduled to be done in the background
> over the course of the day, via unattended upgrades download timer.
> Installation speed matters, as that is the window of time when the
> system is actually somewhat in a maintenance mode / degraded
> performance (apt is locked, there are CPU and disk-io loads).

Does unattended upgrades download both -security and -updates, or does it
only download -security?  From what I can see in
/usr/bin/unattended-upgrade, the allowed-origins check applies to both the
downloads and the installation.

So by default, increases in the download time of non-security SRUs would be
perceivable by the user (though perhaps not of interest).

> New instance initialization - e.g. spinning up a cloud instance, with
> cloud-init, and installing a bunch of things; deploying juju charm /
> conjure-up spell; configuring things with puppet / ansible / etc =>
> these are download & install heavy. However, users that do that
> heavily, will be in a corporate / business / datacentre environment
> and thus it is reasonable to expect them to have either a fat internet
> pipe, and/or a local mirror. Meaning download speed & size, are not
> critical.

Generally agreed (but the assertion should still be tested, not assumed).

> Then there are devel series users, developers who do sbuild builds,
> etc. These users are most likely to be on slower home-user connections
> and watch things a lot more closely interactively, who indeed care
> about the total download+install time. These users, are most likely
> very vocal / visible, but are not ultimately the target audience as to
> why we develop Ubuntu in the first place. Thus I would be willing to
> trade personal developer/devel-series user experience, in favor of the
> stable series user. I'm not sure how much it makes sense to
> proxy/cache/local-mirror devel series, if it is only a single machine
> in use.

I disagree that we don't develop Ubuntu for developers.  The developer
desktop continues to be an important use case, and while it shouldn't
necessarily dominate every time there is tension between the desktop and
server use cases, it also shouldn't be ignored.

But furthermore, I think there's a separate use case you've not included
here, which is "client user selects a piece of software for installation and
wants to use it immediately".  In that case, the total clock time from
expression of intent, to when the package can be used, does matter.  And
it's not limited to developers of Ubuntu or people tracking the devel
series; this is relevant to the usability of the desktop in stable releases.
It is also, I would argue, the use case that is most important in terms of
its impact on user satisfaction, because it's precisely in the critical path
of a task that has the user's attention; whereas improvements to the other
use cases may improve overall efficiency, but have little or no proximate
benefit to the human user.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
[hidden email]                                     [hidden email]

Re: zstd compression for packages

Dimitri John Ledkov
On 21 March 2018 at 00:25, Steve Langasek <[hidden email]> wrote:

> On Sat, Mar 17, 2018 at 03:09:55PM +0000, Dimitri John Ledkov wrote:
>> On 16 March 2018 at 22:13, Steve Langasek <[hidden email]> wrote:
>> > In other words: if we want to make this the default, we should quantify
>> > Daniel's remark that he would prefer a 6% faster download over a 10% faster
>> > unpack.
>
>> Well, I think it does not make sense to think about this in absolute
>> terms. Thinking about user stories is better.
>
> Sure.
>
>> A stable series user will be mostly upgrading packages from -security
>> and -updates. The download speed and/or size of debs does not matter
>> much in this case, as these are scheduled to be done in the background
>> over the course of the day, via unattended upgrades download timer.
>> Installation speed matters, as that is the window of time when the
>> system is actually somewhat in a maintenance mode / degraded
>> performance (apt is locked, there are CPU and disk-io loads).
>
> Does unattended upgrades download both -security and -updates, or does it
> only download -security?  From what I can see in
> /usr/bin/unattended-upgrade, the allowed-origins check applies to both the
> downloads and the installation.
>
> So by default, increases in the download time of non-security SRUs would be
> perceivable by the user (though perhaps not of interest).
>
>> New instance initialization - e.g. spinning up a cloud instance, with
>> cloud-init, and installing a bunch of things; deploying juju charm /
>> conjure-up spell; configuring things with puppet / ansible / etc =>
>> these are download & install heavy. However, users that do that
>> heavily, will be in a corporate / business / datacentre environment
>> and thus it is reasonable to expect them to have either a fat internet
>> pipe, and/or a local mirror. Meaning download speed & size, are not
>> critical.
>
> Generally agreed (but the assertion should still be tested, not assumed).
>
>> Then there are devel series users, developers who do sbuild builds,
>> etc. These users are most likely to be on slower home-user connections
>> and watch things a lot more closely interactively, who indeed care
>> about the total download+install time. These users, are most likely
>> very vocal / visible, but are not ultimately the target audience as to
>> why we develop Ubuntu in the first place. Thus I would be willing to
>> trade personal developer/devel-series user experience, in favor of the
>> stable series user. I'm not sure how much it makes sense to
>> proxy/cache/local-mirror devel series, if it is only a single machine
>> in use.
>
> I disagree that we don't develop Ubuntu for developers.  The developer

That's not the use case I brought up.

I said users of the devel series, aka ubuntu+1.

The compression vs. download trade-off is irrelevant on the ubuntu+1
series: the churn is so high anyway that the only way to win is to not
update on every transition / archive push, and to dist-upgrade only
weekly. And optimizing for users of ubuntu+1 is very niche in
comparison to the stable series users.

I make no distinction among the stable series users - be that
"developers" or "not", they are all simply stable series users.

--
Regards,

Dimitri.
