Discussion:
[Twisted-Python] Test coverage requirements
Jean-Paul Calderone
2017-02-26 23:51:54 UTC
Hi,

I'm looking at some recent trunk commits
<https://codecov.io/gh/twisted/twisted/pull/509>
<https://codecov.io/gh/twisted/twisted/pull/432>
<https://codecov.io/gh/twisted/twisted/pull/652>
<https://codecov.io/gh/twisted/twisted/pull/695>
(also, others) that seem to have non-trivial untested code, despite the
coverage requirements at ReviewProcess
<https://twistedmatrix.com/trac/wiki/ReviewProcess#Authors:Thingsyourbranchorpatchmustcontain>.
I can't tell if the codecov reports are wrong, if the development process
documentation is wrong, if the commits just violate policy, or (I guess)
some mix of the three.

Can anyone shed any light on this?

Jean-Paul
ex vito
2017-02-27 01:28:44 UTC
Post by Jean-Paul Calderone
I'm looking at some recent trunk commits (also, others) that seem to have non-trivial untested code, despite the coverage requirements at ReviewProcess. I can't tell if the codecov reports are wrong, if the development process documentation is wrong, if the commits just violate policy, or (I guess) some mix of the three.
Can anyone shed any light on this?
Jean-Paul,

I'm the author of PR 652, one of the ones you pointed out.

Starting with https://codecov.io/gh/twisted/twisted/pull/652 and navigating to https://codecov.io/gh/twisted/twisted/pulls/652/src/src/twisted/internet/unix.py I'm not sure I understand what it is representing -- lines of code that tests did not cover, I guess. :)

However, the diff for that file in the PR, at https://github.com/twisted/twisted/pull/652/files#diff-32f19fc001798d7ea33686492428bdf2, does not touch any of the lines highlighted by those codecov links.

Does this make any sense? (A plain, honest question; I'm not familiar with codecov other than using its output as a guide to missing test scenarios.)

At some point, as I commented in https://github.com/twisted/twisted/pull/652#issuecomment-276334447, the codecov check was failing, complaining that a single unchanged line within the -3/+3 line diff boundaries wasn't covered. I found that strange at the time.

One other thing I find strange is that trying to access any previous codecov test results from within that PR -- say the first run on Dec 30th -- ends up showing the same (or very similar?) report as the one you pointed out initially; in other words, complaining about lines that were not changed in any commit within that PR.

Again, I'm not aware of the details of codecov's operation, but it certainly is no longer reporting the same things it did back when the PR was being worked on. Whatever it is reporting now, at least with regards to PR 652, does not seem to match the associated diff.

Regards,
--
exvito
Adi Roiban
2017-02-27 09:06:33 UTC
On 26 February 2017 at 23:51, Jean-Paul Calderone wrote:
Hi,
I'm looking at some recent trunk commits (also, others) that seem to have
non-trivial untested code, despite the coverage requirements at ReviewProcess.
I can't tell if the codecov reports are wrong, if the development process
documentation is wrong, if the commits just violate policy, or (I guess) some mix of the three.
Can anyone shed any light on this?
Besides https://codecov.io/gh/twisted/twisted/pulls/509/diff, the other
PRs look OK, with good coverage.

In terms of codecov.io coverage I am using the "diff" view as I find
it less confusing.

The solid red lines are the ones codecov considers part of the patch and
not covered by tests.
The yellow lines indicate missing branch coverage in the patch... and
they will also decrease the coverage percentage.

In summary:

For PR 509 - The diff coverage is 74.39%. ... this looks bad
For PR 652 - The diff coverage is 98.7% ... not perfect, but pretty
good as the missing lines are in the test code area :)

--------

At some point we had a commit status check for 100% diff coverage, but
I remember that it was not popular.

If we are not aiming for the ultimate quality dev process, we can still
consider having a lower bar of something like 75% or 95% diff
coverage... For PRs below that value you will see a big warning...
but people from the Twisted admin team can still merge them. Regular
(non-admin) contributors can only merge them if the diff coverage is
above that limit.

I am for a 100% coverage check, as many times it has helped me get out
of my comfort zone and "forced" me to write better tests and better code
... I know that libertarians might not like being forced to do things,
so I understand why the hard check was not popular.

----------

For the good PRs mentioned here:

For https://codecov.io/gh/twisted/twisted/pulls/695/diff the only line
without coverage is the one where the exception is raised if the
reactor is installed twice.

For https://codecov.io/gh/twisted/twisted/pulls/652/diff the uncovered
lines are the case in which a test is not skipped on OS X (this is
strange... but maybe OS X coverage is no longer published)... and
the other one is the self.fail() which is expected never to be called.

I think there was a discussion about covering self.fail() calls by
extracting them into assertion helpers and making sure we have tests
for those assertion helpers... so that we can ask all changes to also
have 100% coverage of the test code itself :)
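
Something along these lines, as a minimal sketch (the mixin, method, and test
names here are hypothetical, not existing Twisted code): the self.fail() moves
into a reusable assertion helper, and the helper gets its own tests so that
even its failure branch is exercised.

    from twisted.internet import defer
    from twisted.trial import unittest


    class ResultAssertionsMixin(object):
        """
        Hypothetical assertion helper extracted from an inline self.fail().
        """
        def assertFiredWith(self, deferred, expected):
            # Collect the result if the Deferred has already fired.
            results = []
            deferred.addCallback(results.append)
            if not results:
                self.fail("Deferred has not fired yet")
            self.assertEqual(results[0], expected)


    class ResultAssertionsTests(ResultAssertionsMixin, unittest.SynchronousTestCase):
        """
        Tests for the helper itself, including the self.fail() branch.
        """
        def test_fired(self):
            self.assertFiredWith(defer.succeed(42), 42)

        def test_notFired(self):
            self.assertRaises(
                self.failureException, self.assertFiredWith, defer.Deferred(), 42)

That way the self.fail() line is hit by test_notFired instead of sitting
uncovered in ordinary runs.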

For https://codecov.io/gh/twisted/twisted/pulls/432/diff there were
some formatting changes (some parentheses were re-indented) and that
code was not previously covered... so now it is treated as code
someone changed without having tests to cover the change.
--
Adi Roiban
Glyph Lefkowitz
2017-02-27 19:54:07 UTC
Post by Jean-Paul Calderone
Hi,
I'm looking at some recent trunk commits (also, others) that seem to have non-trivial untested code, despite the coverage requirements at ReviewProcess <https://twistedmatrix.com/trac/wiki/ReviewProcess#Authors:Thingsyourbranchorpatchmustcontain>. I can't tell if the codecov reports are wrong, if the development process documentation is wrong, if the commits just violate policy, or (I guess) some mix of the three.
Can anyone shed any light on this?
Thanks, JP. Your habit of post-hoc commit review really helps keep the process honest - I wish more folks (myself included) had more time for it!

I wasn't party to most of these changes, except for 432, where, based on local testing, I was quite sure that coverage was not in fact an issue and that codecov was misreporting it. So I can't comment on most of them specifically.

Generally though, I can say that Codecov, while more reliable than coveralls, is unfortunately really flaky. (Case in point, right now, clicking on all the URLs you posted here, I just see "AccessDeniedAccess Denied" followed by a long 64-bit identifier which I'm not pasting here for fear that it's some kind of github credential.) Its general unreliability is the main reason we removed the hard coverage requirement that blocked merging.

That said, it has been improving and if it keeps improving at the rate it has been, I expect that we'd be able to put that coverage blocker back in in another 3-4 months. Perhaps something to talk about at PyCon.

-glyph
Tristan Seligmann
2017-02-27 23:00:17 UTC
Post by Glyph Lefkowitz
That said, it has been *improving* and if it keeps improving at the rate
it has been, I expect that we'd be able to put that coverage blocker back
in in another 3-4 months. Perhaps something to talk about at PyCon.
I think at least one problem that we're suffering from here is our fault,
rather than Codecov's: the coverage of the test suite is not stable due to
non-determinism in the test suite. That is, the lines executed during a
test run are not the same every time due to things like ordering / timing
races / etc. This means that "changes" to coverage may show up for a
particular PR even though nothing in that PR is actually responsible.
Jean-Paul Calderone
2017-02-27 23:33:44 UTC
Post by Glyph Lefkowitz
That said, it has been *improving* and if it keeps improving at the rate
it has been, I expect that we'd be able to put that coverage blocker back
in in another 3-4 months. Perhaps something to talk about at PyCon.
Post by Tristan Seligmann
I think at least one problem that we're suffering from here is our fault,
rather than Codecov's: the coverage of the test suite is not stable due to
non-determinism in the test suite. That is, the lines executed during a
test run are not the same every time due to things like ordering / timing
races / etc. This means that "changes" to coverage may show up for a
particular PR even though nothing in that PR is actually responsible.
Changes to Twisted code which are only sometimes covered by the test suite
sound like they would violate a 100% coverage rule. But I guess the
experience of looking at a codecov report is so bad/confusing that it's not
surprising authors/reviewers might fail to see what's going on and fix the
non-determinism.

Particularly for code that requires coverage measurements on multiple
platforms (ie, you basically *can't* do it locally), it seems like it would
be easier (though, to be clear, *bad*) to just forget about it and hope
everything is covered...

A tool that pointed out coverage differences between multiple runs of the
same version of the code would be a useful thing to start pointing out
where these flaws in the Twisted test suite lie, right? And then each area
could be given deterministic test coverage instead...
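
For example, a rough sketch of such a tool (nothing that exists in Twisted
today; it assumes two JSON reports of the kind newer coverage.py can emit with
"coverage json", whose per-file "executed_lines" lists we can simply diff):

    import json
    import sys


    def executed_lines(report_path):
        # Map each measured file to the set of line numbers executed in that run.
        with open(report_path) as f:
            report = json.load(f)
        return {
            filename: set(info["executed_lines"])
            for filename, info in report["files"].items()
        }


    def main(first, second):
        a, b = executed_lines(first), executed_lines(second)
        for filename in sorted(set(a) | set(b)):
            only_a = a.get(filename, set()) - b.get(filename, set())
            only_b = b.get(filename, set()) - a.get(filename, set())
            if only_a or only_b:
                # Lines covered in one run but not the other: exactly the
                # non-determinism worth chasing down.
                print(filename)
                print("  only in first run: ", sorted(only_a))
                print("  only in second run:", sorted(only_b))


    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])

Run the suite twice under coverage, dump a report from each run, and any file
this prints is a place where coverage is not deterministic.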
Glyph Lefkowitz
2017-02-28 00:46:11 UTC
Post by Glyph Lefkowitz
That said, it has been improving and if it keeps improving at the rate it has been, I expect that we'd be able to put that coverage blocker back in in another 3-4 months. Perhaps something to talk about at PyCon.
Post by Tristan Seligmann
I think at least one problem that we're suffering from here is our fault, rather than Codecov's: the coverage of the test suite is not stable due to non-determinism in the test suite. That is, the lines executed during a test run are not the same every time due to things like ordering / timing races / etc. This means that "changes" to coverage may show up for a particular PR even though nothing in that PR is actually responsible.
Post by Jean-Paul Calderone
Changes to Twisted code which are only sometimes covered by the test suite sound like they would violate a 100% coverage rule. But I guess the experience of looking at a codecov report is so bad/confusing that it's not surprising authors/reviewers might fail to see what's going on and fix the non-determinism.
Particularly for code that requires coverage measurements on multiple platforms (ie, you basically can't do it locally), it seems like it would be easier (though, to be clear, bad) to just forget about it and hope everything is covered...
A tool that pointed out coverage differences between multiple runs of the same version of the code would be a useful thing to start pointing out where these flaws in the Twisted test suite lie, right? And then each area could be given deterministic test coverage instead...
While this is certainly an issue, I don't think it's the issue we're discussing here. Unreliability of coverage is largely mitigated by the fact that the main thing we pay attention to is "patch coverage", which can be seen to fluctuate from commit to commit on a branch if the new test coverage is non-deterministic (and rarely is a PR an individual commit). This is opposed to "coverage delta", which only looks at coverage before / coverage after and is indeed somewhat unpredictable due to old / bad tests.
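
To make that distinction concrete, a toy sketch (not codecov's actual
algorithm, and ignoring the detail that only executable lines count toward the
denominator): patch coverage looks only at the lines the diff touched, while
the delta compares whole-project percentages between two runs.

    def patch_coverage(changed_lines, executed_lines):
        # Both arguments are sets of (filename, lineno) pairs.
        # Patch coverage: what fraction of the changed lines were executed?
        if not changed_lines:
            return 100.0
        covered = changed_lines & executed_lines
        return 100.0 * len(covered) / len(changed_lines)


    def coverage_delta(before_pct, after_pct):
        # Coverage delta: project-wide percentage after minus before, which
        # unrelated flaky tests can move even when the patch itself is fine.
        return after_pct - before_pct

So a 10-line diff with 9 of those lines executed scores 90% patch coverage,
regardless of how the rest of the project's coverage happened to move.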

So I can say when I've had to overrule codecov, it's almost never been because of flapping coverage lines outside of the patch under consideration (and the patches in consideration either have deterministic tests, or I ask the author to add them).

General improvements to build reliability often reduce coverage unreliability as well, so as we've been using Github more, which surfaces status visibility / mergeability to reviewers more, we've been fixing lots of little build-reliability issues and this problem continues to get smaller.

-glyph
