zstd compression for packages


zstd compression for packages

Julian Andres Klode
Hey folks,

We had a coding day in Foundations last week and Balint and Julian added support for zstd compression to dpkg [1] and apt [2].

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664
[2] https://salsa.debian.org/apt-team/apt/merge_requests/8

Zstd is a compression algorithm developed by Facebook that offers far
higher decompression speeds than xz or even gzip (at roughly constant
speed and memory usage across all levels), while offering 19 compression
levels, ranging from level 1 (roughly comparable to gzip in size, but
much faster) to level 19 (roughly comparable to xz -6).

In our configuration, we run zstd at level 19. For bionic main amd64,
this causes a size increase of about 6%, from roughly 5.6 to 5.9 GB.
Installs speed up by about 10%, or, if eatmydata is involved, by up to
40% - user time generally by about 50%.

Our implementations for apt and dpkg support multiple frames as used by
pzstd, so packages can eventually be compressed and decompressed in
parallel.
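
To sketch what that multi-frame layout enables (illustrative Python using
the third-party "zstandard" bindings, not the actual C/C++ code in apt or
dpkg; the chunk size is an arbitrary example value):

    import zstandard
    from concurrent.futures import ThreadPoolExecutor

    def compress_multiframe(payload, chunk_size=4 << 20):
        # Each compress() call emits one complete, independent zstd frame;
        # concatenating the frames gives a valid .zst stream, like pzstd.
        cctx = zstandard.ZstdCompressor(level=19)
        return [cctx.compress(payload[i:i + chunk_size])
                for i in range(0, len(payload), chunk_size)]

    def decompress_parallel(frames):
        # Frames are independent, so each can be decoded concurrently.
        # One decompressor per task, as instances are not thread-safe.
        with ThreadPoolExecutor() as pool:
            parts = pool.map(
                lambda f: zstandard.ZstdDecompressor().decompress(f), frames)
            return b"".join(parts)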

We are considering requesting a FFe for that - the features are not
invasive, and it allows us to turn it on by default in 18.10.

Thanks,
Balint and Julian

Raw Measurements
================
All measurements were performed on a cloud instance of bionic, in a basic
bionic schroot with overlay, on an SSD.

Kernel install (eatmydata, perf report, time spent in compression)
------------------------------------------------------------------
Before:  54.79%  liblzma.so.5.2.2
After:   11.04%  libzstd.so.1.3.3

Kernel install (eatmydata)
--------------------------

Before (xz):
12.49user 3.04system 0:12.57elapsed 123%CPU (0avgtext+0avgdata 68720maxresident)k
0inputs+1056712outputs (0major+159306minor)pagefaults 0swaps

After (zstd):
5.60user 2.33system 0:07.07elapsed 112%CPU (0avgtext+0avgdata 81388maxresident)k
0inputs+1108720outputs (0major+171171minor)pagefaults 0swaps

firefox
-------
Before (xz):
8.80user 3.57system 0:37.17elapsed 33%CPU (0avgtext+0avgdata 25260maxresident)k
8inputs+548024outputs (0major+376614minor)pagefaults 0swaps

After (zstd):
4.52user 3.30system 0:33.14elapsed 23%CPU (0avgtext+0avgdata 25152maxresident)k
0inputs+544560outputs (0major+386394minor)pagefaults 0swaps

firefox eatmydata
-----------------
Before (xz):
8.79user 2.87system 0:12.43elapsed 93%CPU (0avgtext+0avgdata 25416maxresident)k
0inputs+548016outputs (0major+384193minor)pagefaults 0swaps

After (zstd):
4.24user 2.57system 0:08.54elapsed 79%CPU (0avgtext+0avgdata 25280maxresident)k
0inputs+544584outputs (0major+392117minor)pagefaults 0swaps

libreoffice
-----------
Before (xz):
22.51user 7.65system 1:28.34elapsed 34%CPU (0avgtext+0avgdata 64856maxresident)k
0inputs+1376160outputs (0major+1018794minor)pagefaults 0swaps

After (zstd):
11.34user 6.66system 1:18.04elapsed 23%CPU (0avgtext+0avgdata 64676maxresident)k
16inputs+1370112outputs (0major+1024989minor)pagefaults 0swaps

libreoffice eatmydata
---------------------
Before (xz):
22.41user 6.82system 0:27.45elapsed 106%CPU (0avgtext+0avgdata 64772maxresident)k
0inputs+1376160outputs (0major+1035581minor)pagefaults 0swaps

After (zstd):
10.86user 5.78system 0:17.70elapsed 94%CPU (0avgtext+0avgdata 64800maxresident)k
0inputs+1370112outputs (0major+1043637minor)pagefaults 0swaps

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Julian Andres Klode
On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:

> Hey folks,
>
> We had a coding day in Foundations last week and Balint and Julian added support for zstd compression to dpkg [1] and apt [2].
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892664
> [2] https://salsa.debian.org/apt-team/apt/merge_requests/8
>
> Zstd is a compression algorithm developed by Facebook that offers far
> higher decompression speeds than xz or even gzip (at roughly constant
> speed and memory usage across all levels), while offering 19 compression
> levels, ranging from level 1 (roughly comparable to gzip in size, but
> much faster) to level 19 (roughly comparable to xz -6).
>
> In our configuration, we run zstd at level 19. For bionic main amd64,
> this causes a size increase of about 6%, from roughly 5.6 to 5.9 GB.
> Installs speed up by about 10%, or, if eatmydata is involved, by up to
> 40% - user time generally by about 50%.
>
> Our implementations for apt and dpkg support multiple frames as used by
> pzstd, so packages can eventually be compressed and decompressed in
> parallel.

More links:

PPA:               https://launchpad.net/~canonical-foundations/+archive/ubuntu/zstd-archive
APT merge request: https://salsa.debian.org/apt-team/apt/merge_requests/8
dpkg patches:      https://bugs.debian.org/892664

I'd also like to talk a bit more about libzstd itself: The package is
currently in universe, but btrfs recently gained support for zstd,
so we already have a copy in the kernel and we need to MIR it anyway
for btrfs-progs.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Daniel Axtens
Hi,

I looked into compression algorithms a bit in a previous role, and to be honest I'm quite surprised to see zstd proposed for package storage. zstd, according to its own github repo, is "targeting real-time compression scenarios". It's not really designed to be run at its maximum compression level, it's designed to really quickly compress data coming off the wire - things like compressing log files being streamed to a central server, or I guess writing random data to btrfs where speed is absolutely an issue.

Is speed of decompression a big user concern relative to file size? I admit that I am biased - as an Australian and with the crummy internet that my location entails, I'd save much more time if the file was 6% smaller and took 10% longer to decompress than the other way around.

Did you consider Google's Brotli?

Regards,
Daniel

On Mon, Mar 12, 2018 at 9:58 PM, Julian Andres Klode <[hidden email]> wrote:
> [...]

Re: zstd compression for packages

Neal Gompa
On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
<[hidden email]> wrote:

> Hi,
>
> I looked into compression algorithms a bit in a previous role, and to be
> honest I'm quite surprised to see zstd proposed for package storage. zstd,
> according to its own github repo, is "targeting real-time compression
> scenarios". It's not really designed to be run at its maximum compression
> level, it's designed to really quickly compress data coming off the wire -
> things like compressing log files being streamed to a central server, or I
> guess writing random data to btrfs where speed is absolutely an issue.
>
> Is speed of decompression a big user concern relative to file size? I admit
> that I am biased - as an Australian and with the crummy internet that my
> location entails, I'd save much more time if the file was 6% smaller and
> took 10% longer to decompress than the other way around.
>
> Did you consider Google's Brotli?
>

I can't speak for Julian's decision for zstd, but I can say that in
the RPM world, we picked zstd because we wanted a better gzip.
Compression and decompression times are rather long with xz, and the
ultra-high-efficiency from xz is not as necessary as it used to be,
with storage becoming much cheaper than it was nearly a decade ago
when most distributions switched to LZMA/XZ payloads.

zstd also provides the necessary properties to make it chunkable and
rsyncable, which is useful for metadata. For package payloads, there
are things we can do to make compression go much faster than it does
now (and it's still quite a bit faster than xz as-is and somewhat
faster than gzip now).

I don't know for sure if Debian packaging allows this, but for RPM, we
switch to xz payloads when the package is sufficiently large that the
compression/decompression speed isn't really going to matter
(e.g. game data). So while most packages may not necessarily be using
xz payloads, quite a few would. That said, we've been using xz for all
packages for a few years now, and the main drag is the time it takes
to wrap everything up to make a package.

As for Google's Brotli, the average compression ratio isn't as high as
zstd, and is markedly slower. With these factors in mind, the obvious
choice was zstd.

(As an aside, rpm in sid/buster and bionic doesn't have zstd support
enabled... Is there something that can be done to make that happen?)

--
真実はいつも一つ!/ Always, there's only one truth!


Re: zstd compression for packages

Robie Basak-4
In reply to this post by Julian Andres Klode
On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
> We are considering requesting a FFe for that - the features are not
> invasive, and it allows us to turn it on by default in 18.10.

libzstd has only been stable in the archive since Artful. We had to SRU
fixes to Xenial because it was added to Debian (and outside
experimental) before the format was stable upstream.

Of all the general uses of a new compression algorithm, I'd expect our
distribution archival case to be near the end of a develop/test/rollout
cycle. Are you sure we want to rely on it so completely by switching to
it by default in 18.10?

Robie


Re: zstd compression for packages

Jeremy Bicha-2
In reply to this post by Julian Andres Klode
On Mon, Mar 12, 2018 at 6:06 AM, Julian Andres Klode
<[hidden email]> wrote:
> We are considering requesting a FFe for that - the features are not
> invasive, and it allows us to turn it on by default in 18.10.

What does Debian's dpkg maintainer think?

Thanks,
Jeremy Bicha


Re: zstd compression for packages

Julian Andres Klode
In reply to this post by Robie Basak-4
On Mon, Mar 12, 2018 at 01:49:42PM +0000, Robie Basak wrote:

> On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
> > We are considering requesting a FFe for that - the features are not
> > invasive, and it allows us to turn it on by default in 18.10.
>
> libzstd has only been stable in the archive since Artful. We had to SRU
> fixes to Xenial because it was added to Debian (and outside
> experimental) before the format was stable upstream.
>
> Of all the general uses of a new compression algorithm, I'd expect our
> distribution archival case to be near the end of a develop/test/rollout
> cycle. Are you sure we want to rely on it so completely by switching to
> it by default in 18.10?

So the goal is to have it in 20.04, which means we should ship it now, so
we can do upgrades from 18.04 to it. Whether we change the default in
18.10 or not, I don't know, but:

IMO, better 18.10 than later. We should gain experience with it,
and if it turns out to be problematic, we can switch the default back
and do no-change rebuilds for 20.04 :)

That said, if we have problems, I expect people using zstd in filesystems
(btrfs) or backup tools (borg) to be worse off.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Julian Andres Klode
In reply to this post by Neal Gompa
On Mon, Mar 12, 2018 at 09:30:16AM -0400, Neal Gompa wrote:

> On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
> <[hidden email]> wrote:
> > Hi,
> >
> > I looked into compression algorithms a bit in a previous role, and to be
> > honest I'm quite surprised to see zstd proposed for package storage. zstd,
> > according to its own github repo, is "targeting real-time compression
> > scenarios". It's not really designed to be run at its maximum compression
> > level, it's designed to really quickly compress data coming off the wire -
> > things like compressing log files being streamed to a central server, or I
> > guess writing random data to btrfs where speed is absolutely an issue.
> >
> > Is speed of decompression a big user concern relative to file size? I admit
> > that I am biased - as an Australian and with the crummy internet that my
> > location entails, I'd save much more time if the file was 6% smaller and
> > took 10% longer to decompress than the other way around.
> >
> > Did you consider Google's Brotli?
> >
>
> I can't speak for Julian's decision for zstd, but I can say that in
> the RPM world, we picked zstd because we wanted a better gzip.
> Compression and decompression times are rather long with xz, and the
> ultra-high-efficiency from xz is not as necessary as it used to be,
> with storage becoming much cheaper than it was nearly a decade ago
> when most distributions switched to LZMA/XZ payloads.

I want zstd -19 as an xz replacement due to higher decompression speed,
and it also requires about 1/3 less memory when compressing, which should
be nice for _huge_ packages.
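
To get a rough feel for that difference yourself, a minimal sketch (Python,
using the stdlib lzma module and the third-party "zstandard" bindings; it
only times single-shot decompression, nothing like our full benchmarks):

    import lzma
    import time
    import zstandard

    def compare(path):
        data = open(path, "rb").read()
        candidates = (
            ("xz -6", lzma.compress(data, preset=6), lzma.decompress),
            ("zstd -19", zstandard.ZstdCompressor(level=19).compress(data),
             zstandard.ZstdDecompressor().decompress),
        )
        for name, blob, decompress in candidates:
            start = time.perf_counter()
            decompress(blob)  # decompression speed is what users feel
            print("%-8s %10d bytes  %.3fs to decompress"
                  % (name, len(blob), time.perf_counter() - start))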

> I don't know for sure if Debian packaging allows this, but for RPM, we
> switch to xz payloads when the package is sufficiently large that the
> compression/decompression speed isn't really going to matter
> (e.g. game data). So while most packages may not necessarily be using
> xz payloads, quite a few would. That said, we've been using xz for all
> packages for a few years now, and the main drag is the time it takes
> to wrap everything up to make a package.

We could. But I don't think it matters much.

>
> As for Google's Brotli, the average compression ratio isn't as high as
> zstd, and is markedly slower. With these factors in mind, the obvious
> choice was zstd.
>
> (As an aside, rpm in sid/buster and bionic doesn't have zstd support
> enabled... Is there something that can be done to make that happen?)

I'd open a wishlist bug in the Debian bug tracker if I were you. If
we were to introduce a delta, we'd have to maintain it...

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Julian Andres Klode
In reply to this post by Jeremy Bicha-2
On Mon, Mar 12, 2018 at 10:02:49AM -0400, Jeremy Bicha wrote:
> On Mon, Mar 12, 2018 at 6:06 AM, Julian Andres Klode
> <[hidden email]> wrote:
> > We are considering requesting a FFe for that - the features are not
> > invasive, and it allows us to turn it on by default in 18.10.
>
> What does Debian's dpkg maintainer think?

We are waiting to hear from him in https://bugs.debian.org/892664 - last
time we chatted on IRC, he was open to investigating zstd.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Robie Basak-4
In reply to this post by Julian Andres Klode
On Mon, Mar 12, 2018 at 03:05:13PM +0100, Julian Andres Klode wrote:

> On Mon, Mar 12, 2018 at 01:49:42PM +0000, Robie Basak wrote:
> > On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
> > > We are considering requesting a FFe for that - the features are not
> > > invasive, and it allows us to turn it on by default in 18.10.
> >
> > libzstd has only been stable in the archive since Artful. We had to SRU
> > fixes to Xenial because it was added to Debian (and outside
> > experimental) before the format was stable upstream.
> >
> > Of all the general uses of a new compression algorithm, I'd expect our
> > distribution archival case to be near the end of a develop/test/rollout
> > cycle. Are you sure we want to rely on it so completely by switching to
> > it by default in 18.10?
>
> So the goal is to have it in 20.04, which means we should ship it now, so
> we can do upgrades from 18.04 to it. Whether we change the default in
> 18.10 or not, I don't know, but:

Sure. I don't have any objection to making it available now for future
use (apart from the usual post-FF required care etc. which the release
team will decide upon).

I can understand why it may be a goal for 20.04, but I assume that's
subject to it having proven itself by then. So while it makes sense to
start this by default in 18.10 to flush out any issues, that also
presupposes that it will have proven itself in the future. A tough call,
I think, and not one I have enough information to have an opinion on.
I mention it to point out that the other side of the trade-off exists.

> IMO, better 18.10 than later. We should gain experience with it,
> and if it turns out to be problematic, we can switch the default back
> and do no-change rebuilds for 20.04 :)
>
> That said, if we have problems, I expect people using zstd in filesystems
> (btrfs) or backup tools (borg) to be worse off.

I think there are certain classes of possible problems for which we will
be worse off than the users in the use cases you point out. The
publication of our archives is somewhat more permanent and we can't, for
example, restore from backup using a different compression to repair our
filesystem. It's providing an *automatic* and seamless upgrade path for
affected Ubuntu users that could prove difficult. In some other cases
where users have individually opted in, a seam isn't necessarily a
problem; but it can be for us.

Robie


Re: zstd compression for packages

Neal Gompa
In reply to this post by Julian Andres Klode
On Mon, Mar 12, 2018 at 10:09 AM, Julian Andres Klode
<[hidden email]> wrote:

> On Mon, Mar 12, 2018 at 09:30:16AM -0400, Neal Gompa wrote:
>> On Mon, Mar 12, 2018 at 9:11 AM, Daniel Axtens
>> <[hidden email]> wrote:
>> > Hi,
>> >
>> > I looked into compression algorithms a bit in a previous role, and to be
>> > honest I'm quite surprised to see zstd proposed for package storage. zstd,
>> > according to its own github repo, is "targeting real-time compression
>> > scenarios". It's not really designed to be run at its maximum compression
>> > level, it's designed to really quickly compress data coming off the wire -
>> > things like compressing log files being streamed to a central server, or I
>> > guess writing random data to btrfs where speed is absolutely an issue.
>> >
>> > Is speed of decompression a big user concern relative to file size? I admit
>> > that I am biased - as an Australian and with the crummy internet that my
>> > location entails, I'd save much more time if the file was 6% smaller and
>> > took 10% longer to decompress than the other way around.
>> >
>> > Did you consider Google's Brotli?
>> >
>>
>> I can't speak for Julian's decision for zstd, but I can say that in
>> the RPM world, we picked zstd because we wanted a better gzip.
>> Compression and decompression times are rather long with xz, and the
>> ultra-high-efficiency from xz is not as necessary as it used to be,
>> with storage becoming much cheaper than it was nearly a decade ago
>> when most distributions switched to LZMA/XZ payloads.
>
> I want zstd -19 as an xz replacement due to higher decompression speed,
> and it also requires about 1/3 less memory when compressing, which should
> be nice for _huge_ packages.
>

On a pure space efficiency basis, zstd -19 is still not as good as xz
-9, but it's pretty darned good.

>> I don't know for sure if Debian packaging allows this, but for RPM, we
>> switch to xz payloads when the package is sufficiently large that the
>> compression/decompression speed isn't really going to matter
>> (e.g. game data). So while most packages may not necessarily be using
>> xz payloads, quite a few would. That said, we've been using xz for all
>> packages for a few years now, and the main drag is the time it takes
>> to wrap everything up to make a package.
>
> We could. But I don't think it matters much.
>

Maybe not. It was useful a long time ago; now we don't really care
either, as we use xz across the board (for the moment).

>>
>> As for Google's Brotli, the average compression ratio isn't as high as
>> zstd, and is markedly slower. With these factors in mind, the obvious
>> choice was zstd.
>>
>> (As an aside, rpm in sid/buster and bionic doesn't have zstd support
>> enabled... Is there something that can be done to make that happen?)
>
> I'd open a wishlist bug in the Debian bug tracker if I were you. If
> we were to introduce a delta, we'd have to maintain it...
>

Hence asking about sid/buster and bionic. :)

My previous experience with debbugs is that it's a black hole. We'll
see if it's better this time.



--
真実はいつも一つ!/ Always, there's only one truth!


Re: zstd compression for packages

Colin Watson
In reply to this post by Jeremy Bicha-2
On Mon, Mar 12, 2018 at 10:02:49AM -0400, Jeremy Bicha wrote:
> On Mon, Mar 12, 2018 at 6:06 AM, Julian Andres Klode
> <[hidden email]> wrote:
> > We are considering requesting a FFe for that - the features are not
> > invasive, and it allows us to turn it on by default in 18.10.
>
> What does Debian's dpkg maintainer think?

FWIW, I'd be quite reluctant to add support for this to Launchpad until
it's landed in Debian dpkg/apt; a future incompatibility would be very
painful to deal with.

--
Colin Watson                                       [[hidden email]]


Re: zstd compression for packages

Julian Andres Klode
On Mon, Mar 12, 2018 at 02:19:18PM +0000, Colin Watson wrote:

> On Mon, Mar 12, 2018 at 10:02:49AM -0400, Jeremy Bicha wrote:
> > On Mon, Mar 12, 2018 at 6:06 AM, Julian Andres Klode
> > <[hidden email]> wrote:
> > > We are considering requesting a FFe for that - the features are not
> > > invasive, and it allows us to turn it on by default in 18.10.
> >
> > What does Debian's dpkg maintainer think?
>
> FWIW, I'd be quite reluctant to add support for this to Launchpad until
> it's landed in Debian dpkg/apt; a future incompatibility would be very
> painful to deal with.

Acknowledged. I don't think we want to go ahead without dpkg upstream
blessing anyway. On the APT side, we don't maintain Ubuntu-only branches,
so if we get a go-ahead it would land in Debian immediately too.

I had a quick look at Launchpad and I think it only needs a backport of
the APT commits to an older branch (or an upgrade to bionic, but that
sounds like more work :D) but I might be wrong.

I think the format is versioned and there might be new versions eventually,
so we might have to take care to only keep generating files in an old
format, but xz has the same problem.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Balint Reczey
In reply to this post by Daniel Axtens
Hi Daniel,

On Mon, Mar 12, 2018 at 2:11 PM, Daniel Axtens
<[hidden email]> wrote:

> Hi,
>
> I looked into compression algorithms a bit in a previous role, and to be
> honest I'm quite surprised to see zstd proposed for package storage. zstd,
> according to its own github repo, is "targeting real-time compression
> scenarios". It's not really designed to be run at its maximum compression
> level, it's designed to really quickly compress data coming off the wire -
> things like compressing log files being streamed to a central server, or I
> guess writing random data to btrfs where speed is absolutely an issue.
>
> Is speed of decompression a big user concern relative to file size? I admit
> that I am biased - as an Australian and with the crummy internet that my
> location entails, I'd save much more time if the file was 6% smaller and
> took 10% longer to decompress than the other way around.

Yes, decompression speed is a big issue in some cases. Please consider
the case of provisioning cloud/container instances, where after
booting the image plenty of packages need to be installed and saving
seconds matters a lot.

The zstd format also allows parallel decompression, which can make
package installation even quicker in wall-clock time.

Internet connection speed increases by ~50% per year on average
(according to this study [3], which matches my experience), which is
more than 6% every two months.

>
> Did you consider Google's Brotli?

We did consider it but it was less promising.

Cheers,
Balint

[3] http://xahlee.info/comp/bandwidth.html



--
Balint Reczey
Ubuntu & Debian Developer


Re: zstd compression for packages

Colin Watson
In reply to this post by Julian Andres Klode
On Mon, Mar 12, 2018 at 03:36:11PM +0100, Julian Andres Klode wrote:
> Acknowledged. I don't think we want to go ahead without dpkg upstream
> blessing anyway. On the APT side, we don't maintain Ubuntu-only branches,
> so if we get a go-ahead it would land in Debian immediately too.

Good.

> I had a quick look at Launchpad and I think it only needs a backport of
> the APT commits to an older branch (or an upgrade to bionic, but that
> sounds like more work :D) but I might be wrong.

We'll probably also need a dpkg backport (preferably in xenial-updates)
and some small changes to lib/lp/archiveuploader/.  It's not hugely
difficult but will need a bit of work.

--
Colin Watson                                       [[hidden email]]


Re: zstd compression for packages

Daniel Axtens
In reply to this post by Balint Reczey


On Tue, Mar 13, 2018 at 1:43 AM, Balint Reczey <[hidden email]> wrote:
> Hi Daniel,
>
> On Mon, Mar 12, 2018 at 2:11 PM, Daniel Axtens
> <[hidden email]> wrote:
>> Hi,
>>
>> I looked into compression algorithms a bit in a previous role, and to be
>> honest I'm quite surprised to see zstd proposed for package storage. zstd,
>> according to its own github repo, is "targeting real-time compression
>> scenarios". It's not really designed to be run at its maximum compression
>> level, it's designed to really quickly compress data coming off the wire -
>> things like compressing log files being streamed to a central server, or I
>> guess writing random data to btrfs where speed is absolutely an issue.
>>
>> Is speed of decompression a big user concern relative to file size? I admit
>> that I am biased - as an Australian and with the crummy internet that my
>> location entails, I'd save much more time if the file was 6% smaller and
>> took 10% longer to decompress than the other way around.
>
> Yes, decompression speed is a big issue in some cases. Please consider
> the case of provisioning cloud/container instances, where after
> booting the image plenty of packages need to be installed and saving
> seconds matters a lot.
>
> The zstd format also allows parallel decompression, which can make
> package installation even quicker in wall-clock time.
>
> Internet connection speed increases by ~50% per year on average
> (according to this study [3], which matches my experience), which is
> more than 6% every two months.


The future is pretty unevenly distributed, and lots of the planet is stuck
on really bad internet still.

AFAICT, [3] is anecdotal rather than a 'study' - it's based on data from
one person living in California. This is not really representative. If we
look at the connection speed visualisation from the Akamai State of the
Internet report [4], it shows that lots and lots of countries - most of
the world! - have significantly slower internet than that person.

(FWIW, anecdotally, I've never had a residential connection get faster
(except when I moved), which is mostly because the speed of ADSL is pretty
much fixed. Anecdotal reports from users in developing countries, and
rural areas of developed countries, are not encouraging either: [5].)

Having said that, I'm not unsympathetic to the use case you outline. I'm
just saddened to see the trade-offs fall against the interests of people
with worse access to the internet. If I can find you ways of saving at
least as much time without making the files bigger, would you be open to
that?

Regards,
Daniel

[4] https://www.akamai.com/uk/en/about/our-thinking/state-of-the-internet-report/state-of-the-internet-connectivity-visualization.jsp
[5] https://danluu.com/web-bloat/




Re: zstd compression for packages

Benjamin Tegge
On Tuesday, 13.03.2018, 12:07 +1100, Daniel Axtens wrote:
> [...]

I want to mention that you can enable ultra compression levels 20 to 22
in zstd, which usually achieve results comparable to the highest
compression levels of xz. There should be a level that matches the
results of xz -6 while still being faster than it.

Best regards,
Benjamin




Re: zstd compression for packages

Julian Andres Klode
On Wed, Mar 14, 2018 at 04:09:27PM +0100, Benjamin Tegge wrote:
> I want to mention that you can enable ultra compression levels 20 to 22
> in zstd, which usually achieve results comparable to the highest
> compression levels of xz. There should be a level that matches the
> results of xz -6 while still being faster than it.

Ultra compression is unusable; it requires about 10 times the memory
or something to decompress.
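
(The memory requirement is driven by the window size recorded in each
frame header, which every decompressor must be able to allocate. A small
sketch for inspecting it with the Python "zstandard" bindings - note the
window is also capped by the input size, so the difference only shows on
large inputs:)

    import zstandard

    def show_window_sizes(data):
        for level in (19, 22):  # 19 = max normal level, 20-22 = "ultra"
            frame = zstandard.ZstdCompressor(level=level).compress(data)
            params = zstandard.get_frame_parameters(frame)
            # window_size bounds the buffer a decompressor must allocate.
            print(level, len(frame), params.window_size)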

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Julian Andres Klode
On Wed, Mar 14, 2018 at 02:40:01PM -0300, Marcos Alano wrote:
> Maybe run some tests to find the sweet spot between size and speed?

Well, that's what we did, and the sweet spot is -19, the maximum
non-ultra level.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: zstd compression for packages

Steve Langasek-6
In reply to this post by Julian Andres Klode
Hi Julian,

Thanks for posting about this.  I agree that if this is landing in dpkg+apt
upstream, it's reasonable to try to get it into the 18.04 release so that it
can be used in later releases without needing a dpkg versioned pre-depends.

If we are to evaluate using zstd as the default compression in 18.10 (or
later), I think we need to consider the total install experience, and not
just look at the dpkg unpack time.

For example:

On Mon, Mar 12, 2018 at 11:06:11AM +0100, Julian Andres Klode wrote:
> firefox
> -------
> Before (xz):
> 8.80user 3.57system 0:37.17elapsed 33%CPU (0avgtext+0avgdata 25260maxresident)k
> 8inputs+548024outputs (0major+376614minor)pagefaults 0swaps
>
> After (zstd):
> 4.52user 3.30system 0:33.14elapsed 23%CPU (0avgtext+0avgdata 25152maxresident)k
> 0inputs+544560outputs (0major+386394minor)pagefaults 0swaps

> firefox eatmydata
> -----------------
> Before (xz):
> 8.79user 2.87system 0:12.43elapsed 93%CPU (0avgtext+0avgdata 25416maxresident)k
> 0inputs+548016outputs (0major+384193minor)pagefaults 0swaps
>
> After (zstd):
> 4.24user 2.57system 0:08.54elapsed 79%CPU (0avgtext+0avgdata 25280maxresident)k
> 0inputs+544584outputs (0major+392117minor)pagefaults 0swaps

Since you don't list binary package names for kernel or libreoffice, I'll
look at firefox, which is the obvious one.  The archive version of this
package is 42MiB in size in bionic.  If the zstd version is 6% larger, but
takes 4 seconds less time to unpack, this means the total install time
(download+unpack) is only improved for the end user if the download speed
from the apt source is faster than (44108204 bytes * .06 * 8bits/byte / 4.03s
~=) 5.25Mbps.
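
Spelling that arithmetic out (the 4.03s is the elapsed-time delta from the
firefox numbers quoted above):

    # The extra download time must stay below the unpack time saved
    # for zstd to win on total install time.
    size_bytes = 44108204          # firefox .deb in bionic (~42 MiB)
    size_increase = 0.06           # zstd -19 vs xz
    unpack_saving = 37.17 - 33.14  # elapsed seconds saved, ~4.03s

    break_even = size_bytes * size_increase * 8 / unpack_saving
    print("break-even: %.2f Mbps" % (break_even / 1e6))  # ~5.25 Mbps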

Have you established that this is a typical effective download speed for
Ubuntu users?  It's certainly faster than my home connection, though I also
use a local mirror to speed up installs.  It may be reasonable to expect
cloud instances to have this much throughput from their mirrors, and so it
might be the sensible choice solely on that basis; I'm just checking that
it's been measured.

In other words: if we want to make this the default, we should quantify
Daniel's remark that he would prefer a 6% faster download over a 10% faster
unpack.


I think we also need to look at the spread of package size increases.  If 6%
is typical, are there some packages on the high end (of both absolute package
size and relative size increase) that we should exclude from switching to
zstd?  We should be transparent about our analysis here.

Thanks,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
[hidden email]                                     [hidden email]
