+1 maintenance report


Bryce Harrington-8
Two priorities I focused on this week were getting ruby unblocked and
driving phpunit towards completion.  In addition, I got about 80 other
packages to migrate via rebuilds and retriggers.

### Ruby / Rubygems / Rails ###

The Ruby transition has been gridlocked due to Node stuff, so I gave
this some priority attention this week.  tl;dr: I got the Node stuff
sorted, but there's still some Ruby work to be done.

Ruby is blocked by rubygems, which is blocked by rails, which is blocked
by node-rollup.  Node-rollup's transition involved a few things, but the
key piece was node-rollup-plugin-node-resolve, which had a Breaks on the
exact version of rollup that is currently in hirsute.  This created a
situation where node-rollup FTBFS because it requires both rollup and
node-rollup-plugin-node-resolve, but could not have both installed
simultaneously due to the Breaks.  I softened the Breaks to allow
hirsute's rollup, which enabled node-rollup to build.
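The shape of that fix can be sketched with a hypothetical debian/control fragment (the package relationship is from the report above; the version numbers are purely illustrative, not the actual ones involved):

```
# Hypothetical debian/control excerpt for node-rollup-plugin-node-resolve.
# Before: a Breaks on the exact rollup version in hirsute, so rollup and
# this plugin could not be co-installed, and node-rollup FTBFS:
#   Breaks: rollup (= 2.33.3-1)
# After softening, hirsute's rollup satisfies the relationship again and
# only genuinely incompatible older versions are excluded:
Breaks: rollup (<< 2.30)
```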

With that achieved, I was able to get rails to rebuild, and that allowed
a TON of ruby-* packages to retrigger and pass their autopkgtests.  A
few of these failed due to the usual intermittent network/hw issues, and
I got those resolved.

For rails itself, though, there's still a handful of ruby-* packages
needing follow-up work.  Half a dozen of these show some sort of issue
with a missing app/assets/javascripts/application.js; these probably
all share the same root cause.  There are a few more with other
assorted unrelated problems.

rubygems is only blocked by rails, so hopefully once rails clears,
rubygems will too.  I've retriggered its tests but think it may need to
wait until after rails.

ruby2.7 looks a lot closer to being ready to migrate now.  It's still
waiting on some armhf test results, but the only package that still has
problems is puma 4.3.6.  The test logs for puma on arm64 and amd64 show
different failures.  There is a puma 5.2.1 released upstream, and the
bug tracker shows work has been done on issues with similar error logs,
so there may be fixes available upstream.

For ruby-defaults, I retriggered the 15 or so packages listed with it
and rubygems, etc., but most did not pass and will need further
investigation.  The issues in the logs aren't obvious to me.



### PhpUnit ###

phpunit is transitioning from 8.5 to 9.5.  This required some
bootstrapping work previously, but was down to three packages with
issues at the start of the week.

Two packages just required retriggers with the right versions of various
things.  The third package, php-http-request2/2.3.0-1ubuntu2, required a
bit more attention due to failures in 25 unit tests.

Upstream has released a new 2.4.2 version, which has some updates for
changes in PHP itself, but doesn't have updates needed for phpunit 9.

Fortunately, the updates were straightforward to figure out: there were
three (long-deprecated) PHPUnit functions that were dropped in
PHPUnit 9.  I fixed these in the 25 test cases and uploaded
2.3.0-1ubuntu3 with the patches.

With this fixed, I believe phpunit should finally complete its
transition.  I'll check back on it next week to be sure.


### Regular +1 Maintenance ###

Along with the above, I did the usual labor of rebuilds and retriggers.


These had network, DNS, or test timeout issues.  I retriggered them and
they migrated:

    apport/2.20.11-0ubuntu57 test failures on amd64
    libreoffice/1:7.0.4~rc2-0ubuntu2 test failures on ppc64el
    python3-lxc/1:3.0.4-1ubuntu8 test failures on arm64
    automake-1.16/1:1.16.3-2ubuntu1 test failure on amd64
    libinsane/1.0.9-2 on s390x
    ocrmypdf/10.3.1+dfsg-1 on amd64
    sshuttle/1.0.4-1ubuntu4 test failure on amd64
    pcs/0.10.8-1 on amd64
    dolfin/2019.2.0~git20201207.b495043-4 on armhf
    julia/1.5.3+dfsg-2 on ppc64el
    rtags/2.38-3 on amd64
    rhonabwy/0.9.13-1 on armhf
    r-cran-rsvd/1.0.3-3build1 on armhf
    r-cran-sf/0.9-6+dfsg-2 on armhf
    nbsphinx/0.8.0+ds-1 on armhf
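Batch retriggers like the ones above can be scripted.  Here's a minimal sketch that builds retry URLs for a list of failed tests; the request.cgi parameter names follow the retry links on autopkgtest.ubuntu.com, but treat the exact parameter set as an assumption rather than a documented API:

```python
# Build autopkgtest retry URLs for a batch of failed tests.
from urllib.parse import urlencode

BASE = "https://autopkgtest.ubuntu.com/request.cgi"

def retry_url(release, arch, package, trigger):
    """Return the URL that requests a re-run of one failed test."""
    params = {"release": release, "arch": arch,
              "package": package, "trigger": trigger}
    return f"{BASE}?{urlencode(params)}"

failures = [
    ("hirsute", "amd64", "apport", "apport/2.20.11-0ubuntu57"),
    ("hirsute", "ppc64el", "libreoffice", "libreoffice/1:7.0.4~rc2-0ubuntu2"),
]
for release, arch, package, trigger in failures:
    print(retry_url(release, arch, package, trigger))
```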


These were FTBFS, but appear to be due to flaky hardware or similar
transient trouble.  A simple rebuild got them sorted, and they were able
to progress to running autopkgtests:

    flightgear-data/1:2020.3.6+dfsg-1 build failure on amd64
    libblockdev/2.25-1 build failure on riscv64
    mu-editor/1.0.3+dfsg-2 build failure on amd64
    botch/0.23-1 build failure on riscv64
    neutron/2:17.1.0+git2021012815.0fb63f7297-0ubuntu2 build failure on amd64
    trapperkeeper-scheduler-clojure/1.1.3-3 build failure on amd64
    phpmyadmin/4:5.0.4+dfsg2-2 build failure on amd64
    glewlwyd/2.5.2-1 build failure on amd64
    twitter-bootstrap4/4.5.2+dfsg1-6 build failure on amd64
    safe-rm/1.1.0-2 build failure on riscv64
    golang-github-hashicorp-go-plugin/1.0.1-3 build failure on amd64
    node-katex/0.10.2+dfsg-8 build failure on amd64
    gitbatch/0.5.0-3 build failure on riscv64
    vue.js/2.6.12+dfsg-3 build failure on amd64
    node-mini-css-extract-plugin/1.3.3-1 build failure on amd64


This next set had test failures due to intermittent network issues, or
other 'flaky' troubles, and passed on a simple retrigger.  (I'm not
entirely sure I can take full credit for the retriggers, as I think
archive admins and others were doing similar in parallel for perl and
such.)  In any case, I verified they all migrated out of proposed:

    forensics-extra/2.28 on armhf
    dbconfig-common/2.0.18 on armhf
    freedom-maker/0.28 on amd64, arm64, ppc64el, s390x
    golang-github-containerd-btrfs/0.0~git20201111.404b914-1 on amd64, arm64, ppc64el, s390x
    golang-github-containers-storage/1.23.9+dfsg1-1ubuntu2 on amd64, arm64, ppc64el, s390x
    tomb/ on amd64, arm64, ppc64el, s390x
    golang-github-markbates-goth/1.42.0-6 on armhf
    auto-multiple-choice/1.5.0~rc1-1ubuntu1 on armhf
    libapp-cli-perl/0.313-2 on armhf
    libbio-variation-perl/1.7.5-1 on armhf
    libcache-historical-perl/0.05-2.1 on armhf
    libcatmandu-template-perl/0.13-1 on armhf
    libclass-mixinfactory-perl/0.92-3.1 on armhf
    libconfig-scoped-perl/0.22-2.1 on armhf
    libconvert-color-perl/0.11-2.1 on armhf
    libcpan-distnameinfo-perl/0.12-2.1 on armhf
    libdata-password-zxcvbn-perl/1.0.4-2 on armhf
    libdata-tablereader-perl/0.011-1 on amd64
    libdevel-caller-ignorenamespaces-perl/1.1-1 on armhf
    libdist-zilla-plugins-cjm-perl/6.000-1 on armhf
    libextutils-depends-perl/0.8000-1 on armhf
    libfennec-lite-perl/ on armhf
    libfile-queue-perl/1.01a-2 on armhf
    libfile-sharedir-projectdistdir-perl/1.000009-1 on armhf
    libformvalidator-simple-perl/0.29-2.1 on armhf
    libgenome-model-tools-music-perl/ on armhf
    libhtml-escape-perl/1.10-1build3 on armhf
    libhtml-wikiconverter-usemod-perl/0.50-3 on armhf
    libhttp-daemon-ssl-perl/1.05-01-2 on armhf
    liblingua-en-number-isordinal-perl/ on armhf
    liblog-dispatch-config-perl/1.04-2 on armhf
    libmath-calculus-differentiate-perl/0.3-2.1 on armhf
    libmldbm-perl/2.05-2.1 on armhf
    libmodule-corelist-perl/5.20210123-1 on armhf
    libmodule-starter-smart-perl/0.0.9-1 on armhf
    libnamespace-clean-perl/0.27-1 on armhf
    libnet-amazon-s3-tools-perl/0.08-2.1 on armhf
    libpdl-io-hdf5-perl/1:0.73-6 on armhf
    libplack-middleware-crossorigin-perl/0.014-1 on armhf
    libpoe-component-jobqueue-perl/0.5710-1 on armhf
    libtemplate-plugin-clickable-email-perl/0.01-2.1 on amd64
    libtest-lectrotest-perl/0.5001-3 on amd64
    libtest-pod-perl/1.52-1 on armhf
    libtext-wikicreole-perl/0.07-2 on armhf
    libtie-cphash-perl/2.000-1.1 on armhf
    libxml-stream-perl/1.24-4 on armhf
    munin/2.0.57-1ubuntu1 on armhf
    forensics-all/3.27 on armhf
    ipset/7.10-1 on amd64
    node-d3-axis/1.0.12-3 on ppc64el
    node-gulp-sourcemaps/2.6.5+~cs4.0.1-3 on amd64, arm64, ppc64el, s390x

There were also a couple dozen packages that failed due to issues
launching their VM; I've retriggered those, but the tests are still running.

Bryce

--
ubuntu-devel mailing list
[hidden email]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel

Re: +1 maintenance report

Seth Arnold
On Fri, Feb 12, 2021 at 06:10:32PM -0800, Bryce Harrington wrote:
> This next set had test failures due to intermittent network issues, or
> other 'flaky' troubles, and passed on a simple retrigger.  (I'm not

Could we build a retriggerbot that smashes the retry button three times
before bothering any humans about failed tests?

Hitting retry is often the first troubleshooting step people take; I've
heard tests may be retried something like ten times by different people,
each of whom was taking a reasonable enough "first debugging step"
without noticing that other people have also done the same.

This way, any failures that bubble up to a human would be 'bad enough'
that they require a human to address.  (Of course, this may require
tuning the 'three' in my suggestion based on how many retries are
usually necessary.)  Whoever is looking at an error could then skip a
retry and start looking at the test in question right away.
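The proposed policy is simple enough to sketch in a few lines; `run_test` here is a hypothetical stand-in for whatever actually triggers and waits on an autopkgtest run:

```python
# Sketch of the proposed "retriggerbot" policy: retry a failed test up
# to a fixed cap before flagging it for a human.

MAX_RETRIES = 3  # the 'three' in the suggestion; tune based on real data

def triage(package, run_test, max_retries=MAX_RETRIES):
    """Return 'passed' if any attempt succeeds, else 'needs-human'."""
    for _attempt in range(max_retries):
        if run_test(package):
            return "passed"
    return "needs-human"
```

A failure that survives all the retries is exactly the "bad enough" case that should land on a human's desk.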

Thanks


Re: +1 maintenance report

Jan Ceuleers
On 13/02/2021 04:49, Seth Arnold wrote:
> Could we build a retriggerbot that smashes the retry button three times
> before bothering any humans about failed tests?
>
> Hitting retry is often the first troubleshooting step people take; I've
> heard tests may be retried something like ten times by different people,
> each of whom was taking a reasonable enough "first debugging step"
> without noticing that other people have also done the same.

Not an Ubuntu developer but I do work as a quality manager. Not sure
whether my list post will be accepted, so I'm copying you.

The assumption underlying your suggestion is that tests that
intermittently fail do so because of intermittent failures in the test
environment rather than due to actual bugs that manifest themselves only
intermittently (such as race conditions).

This is fine if you have evidence that the assumption holds in a
sufficiently large majority of cases.

HTH, Jan


Re: +1 maintenance report

Julian Andres Klode
On Mon, Feb 15, 2021 at 05:36:15PM +0100, Jan Ceuleers wrote:

> On 13/02/2021 04:49, Seth Arnold wrote:
> > Could we build a retriggerbot that smashes the retry button three times
> > before bothering any humans about failed tests?
> >
> > Hitting retry is often the first troubleshooting step people take; I've
> > heard tests may be retried something like ten times by different people,
> > each of whom was taking a reasonable enough "first debugging step"
> > without noticing that other people have also done the same.
>
> Not an Ubuntu developer but I do work as a quality manager. Not sure
> whether my list post will be accepted, so I'm copying you.
>
> The assumption underlying your suggestion is that tests that
> intermittently fail do so because of intermittent failures in the test
> environment rather than due to actual bugs that manifest themselves only
> intermittently (such as race conditions).
>
> This is fine if you have evidence that the assumption holds in a
> sufficiently large majority of cases.

Analysing the root cause of intermittent failures is in practice wishful
thinking though. There are too many tests to fix all those race
conditions. The only thing we can do is mindlessly retry stuff.

The only two cases we care about are stable tests becoming flaky, and
flaky tests becoming consistent failures, and both of these will be
flagged by not migrating to release pocket.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en


Re: +1 maintenance report

Bryce Harrington-8
In reply to this post by Jan Ceuleers
On Mon, Feb 15, 2021 at 05:36:15PM +0100, Jan Ceuleers wrote:
> On 13/02/2021 04:49, Seth Arnold wrote:
> > Could we build a retriggerbot that smashes the retry button three times
> > before bothering any humans about failed tests?

Actually, we can be a lot more precise than that; see below.

> > Hitting retry is often the first troubleshooting step people take; I've
> > heard tests may be retried something like ten times by different people,
> > each of whom was taking a reasonable enough "first debugging step"
> > without noticing that other people have also done the same.
>
> Not an Ubuntu developer but I do work as a quality manager. Not sure
> whether my list post will be accepted, so I'm copying you.
>
> The assumption underlying your suggestion is that tests that
> intermittently fail do so because of intermittent failures in the test
> environment rather than due to actual bugs that manifest themselves only
> intermittently (such as race conditions).
>
> This is fine if you have evidence that the assumption holds in a
> sufficiently large majority of cases.


It's certainly true that "randomly retry tests" has proven to be an
effective way to get things unblocked.  No denying that.

In this particular situation, though, what I did was scan build logs
for certain phrases such as 'Unable to connect to ftpmaster', 'Temporary
failure resolving', etc., which tend to be strong indicators of
environment problems rather than test problems.  There are a few other
good heuristics, like tests that fail on only one architecture and pass
on all the others, or that are FTBFS on just one arch and haven't been
rebuilt in >15 days.  I also suspect some architectures may be more
likely to see environmental failures than others, but I don't have
conclusive data there yet.  And yes, obviously if someone's already
retriggered the test within the past few days and it still failed,
there's little need to retry that specific set of retriggers on that
migration item again.
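Those heuristics are mechanical enough to automate.  A minimal sketch, using only the two example phrases given above (a real deployment would carry a longer, curated list):

```python
# Flag a failure as likely environmental, using two of the heuristics
# described above: known transient-error phrases in the log, and a
# failure confined to a single architecture.

TRANSIENT_PHRASES = [
    "Unable to connect to ftpmaster",
    "Temporary failure resolving",
]

def looks_environmental(log_text):
    """True if the failure log matches a known infrastructure hiccup."""
    return any(phrase in log_text for phrase in TRANSIENT_PHRASES)

def fails_on_single_arch(results):
    """results maps arch -> 'pass'/'fail'.  A lone failing arch while
    all others pass hints at an environment problem, not a test bug."""
    failing = [arch for arch, r in results.items() if r == "fail"]
    return len(failing) == 1 and len(results) > 1
```

Failures matching either predicate would be candidates for an automatic retrigger; everything else goes straight to a human.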

So, I strongly agree with Seth that there's some good automation
potential in retriggering things; plus, I think we can be even more
precise in how this is done by looking at what's causing these kinds of
failures, and then hopefully use resources more efficiently.

Bryce
