While working on https://github.com/ooni/probe/issues/2130, and specifically on the action item related to making sure all workflows are green, I was confronted with the complexity of the QA directory.
There's plenty of cleaning up and simplifying there. The original intent was to A/B test `miniooni` and `measurement_kit` to ensure they were behaving the same. We don't have this need anymore.
Rather, it seems the QA scripts have grown large and flaky, to the point that I am always tempted to ignore them. The underlying censorship engine, jafar, has also not been developed for quite some time.
So, the first step towards improve the QA infrastructure seems to be humble and acknowledge that we cannot realistically maintain these checks using jafar as a backend for so many experiments.
Let us focus on our most important experiment, Web Connectivity, and let us keep QA checks for it.
Additionally, let us simplify and cleanup QA as much as possible, though without introducing radical changes.
The end result is a QA for Web Connectivity that seems reasonable and runs in six minutes.
While there, include integration testing to make sure the script
is working as intended before using it.
While there, edit maketarball.bash's comments.
This work aims to make Linux builds faster to make https://github.com/ooni/probe/issues/2249 more convenient. Since those builds runs inside Docker, the problem to solve here is to save/restore the Go caches notwithstanding Docker. Because Docker runs as root, we need to modify the build a bit to run as a normal user. Otherwise, we will not be able to save the Go cache using actions/cache@v3. (Other approaches such as using `sudo` are possible but running the build as an unprivileged user actually looks cleaner, so I chose to do that.) While there, add a `.editorconfig`.
This diff modifies all the github actions that produce assets to
publish on a release called rolling when we are not building a tag.
If everything goes as planned, we should be able to provide
people with automatically generated fresh binaries for testing.
While there, introduce caching for all builds to make them
as fast as possible. I suspect gomobile based builds will not
see any speed up but other builds most likely will.
See https://github.com/ooni/probe/issues/2249
This diff introduces a build script, makefile rules, and github actions
rules to build and public android CLI releases.
See https://github.com/ooni/probe/issues/1723
Rather than hardcoding the NDK version inside a script, encode it
as a file in the filesystem, which is easier to share.
Make sure we use the desired NDK by setting environment variables.
Use `-androidabi 21`, which:
1. is what rclone did: 8390ba4ca9
2. is the minimum ABI used by probe-android: 994651be52/app/build.gradle (L10)
Part of https://github.com/ooni/probe/issues/2130
Part of https://github.com/ooni/probe/issues/1753.
While there, introduce a rule by which, if the branch is named `fullbuild` we run all possible builds. It helps to test all the builds without creating a release branch. Because release branches are protected, they cannot be deleted easily. On the contrary, the `fullbuild` branch can easily be disposed of.
It seems several CI builds failed for [v3.16.0-alpha](https://github.com/ooni/probe-cli/releases/tag/v3.16.0-alpha). Let's aim to repair miniooni and ooniprobe-windows for now. The other failing builds seem more tricky. (Android fails with an unsupported NDK while Linux fails with issues accessing the git repository from Docker, probably because the the user running inside Docker is not the user that owns the repository.)
While there, document what we need to document without mentioning too
many details about release builds, for which we have the Makefile.
See https://github.com/ooni/probe/issues/2218.
I am not 100% sure I was able to fix all the cases in which we
need higher permissions than the strict default.
At least, I tried.
It may be reasonable to make an interim release to check whether I
successfully fixed all the cases.
Ref issue: https://github.com/ooni/probe/issues/2154
After https://github.com/ooni/probe-cli/pull/764, the build for
CGO_ENABLED=0 has been broken for miniooni:
https://github.com/ooni/probe-cli/runs/6636995859?check_suite_focus=true
Likewise, it's not possible to run tests with CGO_ENABLED=0.
To make tests work with `CGO_ENABLED=0`, I needed to sacrifice some
unit tests run for the CGO case. It is not fully clear to me what was happening
here, but basically `getaddrinfo_cgo_test.go` was compiled with CGO
being disabled, even though the ``//go:build cgo` flag was specified.
Additionally, @hellais previously raised a valid point in the review
of https://github.com/ooni/probe-cli/pull/698:
> Another issue we should consider is that, if I understand how
> this works correctly, depending on whether or not we have built
> with CGO_ENABLED=0 on or not, we are going to be measuring
> things in a different way (using our cgo inspired getaddrinfo
> implementation or using netgo). This might present issues when
> analyzing or interpreting the data.
>
> Do we perhaps want to add some field to the output data format that
> gives us an indication of which DNS resolution code was used to
> generate the the metric?
This comment is relevant to the current commit because
https://github.com/ooni/probe-cli/pull/698 is the previous
iteration of https://github.com/ooni/probe-cli/pull/764.
So, while fixing the build and test issues, let us also distinguish
between the CGO_ENABLED=1 and CGO_ENABLED=0 cases.
Before this commit, OONI used "system" to indicate the case where
we were using net.DefaultResolver. This behavior dates back to the
Measurement Kit days. While it is true that ooni/probe-engine and
ooni/probe-cli could have been using netgo in the past when we
said "system" as the resolver, it also seems reasonable to continue
to use "system" top indicate getaddrinfo.
So, the choice here is basically to use "netgo" from now on to
indicate the cases in which we were built with CGO_ENABLED=0.
This change will need to be documented into ooni/spec along with
the introduction of the `android_dns_cache_no_data` error.
## Checklist
- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2029
- [x] if you changed anything related how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/242
The objective is to make PR checks run much faster.
See https://github.com/ooni/probe/issues/2113 for context.
Regarding netxlite's tests:
Checking for every commit on master or on a release branch is
good enough and makes pull requests faster than one minute since
netxlite for windows is now 1m slower than coverage.
We're losing some coverage but coverage from integration tests
is not so good anyway, so I'm not super sad about this loss.
Here's the squash of the following patches that enable support
for go1.18 and update our dependencies.
This diff WILL need to be backported to the release/3.14 branch.
* chore: use go1.17.8
See https://github.com/ooni/probe/issues/2067
* chore: upgrade to probe-assets@v0.8.0
See https://github.com/ooni/probe/issues/2067.
* chore: update dependencies and enable go1.18
As mentioned in 7a0d17ea91,
the tree won't build with `go1.18` unless we say it does.
So, not only here we need to update dependencies but also we
need to explicitly say `go1.18` in the `go.mod`.
This work is part of https://github.com/ooni/probe/issues/2067.
* chore(coverage.yml): run with go1.18
This change will give us a bare minimum confidence that we're
going to build our tree using version 1.18 of golang.
See https://github.com/ooni/probe/issues/2067.
* chore: update user agent used for measuring
See https://github.com/ooni/probe/issues/2067
* chore: run `go generate ./...`
See https://github.com/ooni/probe/issues/2067
* fix(dialer_test.go): make test work with go1.17 and go1.18
1. the original test wanted the dial to fail, so ensure we're not
passing any domain name to exercise dialing not resolving;
2. match the end of the error rather than the whole error string.
Tested locally with both go1.17 and go1.18.
See https://github.com/ooni/probe-cli/pull/708#issuecomment-1096447186
We're starting to prepare a new release. The first step is to use
go1.17.6 in the following places:
1. everywhere we define the version of Go in this tree;
2. when we're building for Android (using ooni/go);
3. in our ooni/oohttp fork of Go net/http standard library.
Reference issue: https://github.com/ooni/probe/issues/1845
* chore(netxlite): add currently failing test case
This diff introduces a test cases that will fail because of the reason
explained in https://github.com/ooni/probe/issues/1965.
* chore(netxlite/iox_test.go): add failing unit tests
These tests directly show how the Go implementation of ReadAll
and Copy has the issue of checking for io.EOF equality.
* fix(netxlite): make {ReadAll,Copy}Context robust to wrapped io.EOF
The fix is simple: we just need to check for `errors.Is(err, io.EOF)`
after either io.ReadAll or io.Copy has returned. When this condition is
true, we need to convert the error back to `nil` as it ought to be.
While there, observe that the unit tests I committed in the previous
commit are wrongly asserting that the error must be wrapped. This
assertion is not correct, because in both cases we have just ensured
that the returned error is `nil` (i.e., success).
See https://github.com/ooni/probe/issues/1965.
* cleanup: remove previous workaround for wrapped io.EOF
These workarounds were partial, meaning that they would cover some
cases in which the issue occurred but not all of them.
Handling the problem in `netxlite.{ReadAll,Copy}Context` is the
right thing to do _as long as_ we always use these functions instead
of `io.{ReadAll,Copy}`.
This is why it's now important to ensure we clearly mention that
inside of the `CONTRIBUTING.md` guide and to also ensure that we're
not using these functions in the code base.
* fix(urlgetter): repair tests who assumed to see EOF error
Now that we have established that we should normalize EOF when
reading bodies like the stdlib does and now that it's clear why
our behavior diverged from the stdlib, we also need to repair
all the tests that assumed this incorrect behavior.
* fix(all): don't use io{,util}.{Copy,ReadAll}
* feat: add checks to ensure we don't use io.{Copy,ReadAll}
* doc(netxlite): document we know how to deal w/ wrapped io.EOF
* fix(nocopyreadall.bash): add exception for i/n/iox.go
This diff forwardports 856e436e20d511a4f0d618546da7921fa9f8c5f6 to the master branch
Original commit message:
- - -
This pull request changes `mk` and github workflows to build and publish binaries on tag. We also update the documentation to explain this new branching model. Basically, we have release branches where we produce binary packages and we add extra code, on tag, to publish such packages inside a release.
We discussed removing most secrets from builds in this repository and having a different tool/repository that takes in input also secrets for doing follow-up actions after publishing. As a consequence, this pull request also removes all pieces of code that require secrets. The next step is to reinstate this code in this new repository/tool.
The existing code in `mk` also implemented caching. This feature was useful when doing local builds because it reduced the time required to obtain binary releases. With builds running as part of GitHub actions, we don't need caching because we spawn parallel machines to build binaries. Therefore, let us also remove caching, which makes the code simpler. (Caching in itself is hard and in https://github.com/ooni/probe/issues/1875 I noted that, for example, caching of the `ooni/go` repository was leading to some unwanted behaviour when changing the branch. Without caching, this behaviour is gone and we always generally use fresh information to produce builds.) Of course, this means that local builds are now slower, but I do not think this is a problem _because_ we want to use GitHub actions for building in the common case.
Reference issues: https://github.com/ooni/probe/issues/1879 and https://github.com/ooni/probe/issues/1875.
The final aspect to mention to conclude this description is an implementation one:
```
gh release create -p $tag --target $GITHUB_SHA || true
```
The code above uses `|| true` because there could already be a release. So, basically, it means that, if a release does not already exist, then we're going to create one. Otherwise, it does not matter because there's already a release.
This diff forward ports ea44e99451f345474738b9010ff791759a1f1367.
Original commit message:
- - -
This change allows for producing cloud builds using the psiphon
config files. We will add those files as build secrets. Only people
in the organization and collaborators with at least "write"
access could trigger builds containing such secrets.
Before this change, `./mk` unconditionally attempted to clone
github.com/ooni/probe-private. Now, it only checks whether
we need to clone _if_ files are not already there.
This allows us to use GitHub actions and secrets to copy the
files in there _without_ needing to clone a private repo.
Cloning a private repo would require us to include as repository
secret an access token with full `repo` scope, which is a very
broad scope. Instead, by using secrets to include psiphon config,
we are narrowing down the secrets required to make a release build.
See https://github.com/ooni/probe/issues/1878
This diff WILL require forward porting to the master branch.
This diff forward ports f47b0c6c16e0cd417e3591358eb85b45962f307d to master.
Original commit message:
- - -
1. we now need to name the framework `.xcframework` otherwise
gomobile refuses to build a new framework for us ¯\_(ツ)_/¯
2. remove duplicate errno definition for iOS (iOS and darwin
are considered the same, therefore we don't need iOS defs)
Reference issue for this PR: https://github.com/ooni/probe/issues/1876
This diff WILL need to be forwardported to master.
This diff forward ports adcb0f9ae3b9e074c301d4f7f0e8f2d0ef6466b9.
Original commit message:
- - -
- ensure we use go1.17.3 in workflows
- update to a version of ooni/oohttp that uses go1.17.3
This change WILL need to be forward ported to master.
Closes https://github.com/ooni/probe/issues/1861
In https://github.com/ooni/probe/issues/1741, we observed that
every attempt to use `docker --platform` along with `debian` for
packaging ooniprobe fails with `SEGFAULT`, except when using
the `debian:oldstable` container.
To fix this issue, in this diff we fix Debian packaging to run on
any debian system (`debian:stable` in our case) provided that we
have `qemu-user-static` installed on the system and the system is
a Debian (or Debian-derived) system.
The trick here is to use `dpkg-buildpackage -a $deb_arch`. We
also need to disable a few `debian/rules` that we don't actually
need anyway.
Closes https://github.com/ooni/probe/issues/1741.
This cherry-picks 36a5bf34f99f382a081efd642dd472888a57602b
from the stable branch into the master branch.
The issue at https://github.com/ooni/probe/issues/1741 is that running `docker --platform linux/arm64` segfaults when running `sudo apt-get update -q` inside the `arm64` docker environment.
As far as the `debianrepo` rule is concerned, we can fix the issue by taking advantage of Debian multi-arch. We now configure Debian multi-arch and install the package inside a `debian:stable` environment.
We keep using docker. In principle we could not. But the Ubuntu environment provided by GitHub actions does not support multi-arch for arm. Also, I'd like testing this rule to be possible also locally (where I don't have Debian).
* feat: run ~always netxlite integration tests
This diff ensures that we check on windows, linux, macos that our
fundamental networking library (netxlite) works.
We combine unit and integration tests.
This work is part of https://github.com/ooni/probe/issues/1733, where
I want to have more strong guarantees about the foundations.
* fix(filtering/tls_test.go): make portable on Windows
The trick here is to use the wrapped error so to normalize the
different errors messages we see on Windows.
* fix(netxlite/quic_test.go): make portable on windows
Rather than using the zero port, use the `x` port which fails
when the stdlib is parsing the address.
The zero port seems to work on Windows while it does not on Unix.
* fix(serialresolver_test.go): make error more timeout than before
This seems enough to convince Go on Windows about this error
being really a timeout timeouty timeouted thingie.
* fix: disable debianrepo build on master branch
This just mitigates https://github.com/ooni/probe/issues/1741 and does
not fully address it, but I'd rather avoid delving into this problem until
I open a release/v3.11.0 branch and have to really fix this issue.
* fix: only run coverage using go1.17
This is the version of Go with which we are going to bless v3.11.0
therefore it's the only version of Go that matters.
Reference issue: https://github.com/ooni/probe/issues/1738.
* fix(ptx/obfs4_test.go): avoid context-vs-normal-code race
We want to test whether we get the context failure if the error
generated inside normal code happens _after_ the context cancellation.
The best way to do that is to write code that is not racy. To this
end, we just need to pause normal code until we know that the context
has returned to the caller. We also need to ensure we do not leak
a goroutine, hence we use a WaitGroup to check that.
Fixes https://github.com/ooni/probe/issues/1750
We are mostly good to declare a stable release. We still need to deal with https://github.com/ooni/probe/issues/1484.
In this PR, we fix the aforementioned issue. These are the changes:
1. we remove the vendored `debops-ci`, and we pull it directly from `ooni/sysadmin`
2. we introduce a new script, `./CLI/linux/pubdebian`, to publish packages
3. we modify `./mk` to allow for publishing debian packages built outside of CI
The latter point has been quite useful in debugging what was wrong.
1. we can merge the e2eminiooni.yml test into the miniooni.yml test
so to reduce the number of tests we run;
2. ideally we would like the smoketest.sh test to evolve and also
check whether we can fetch the measurements we submitted, so start
moving this script into the `./E2E` folder, add a note saying we
would like to do that, and direct all the tests to run this script
at its new location and with its new name (`ooniprobe.sh`).
With these two changes, it's fine to remove the ooniprobe2debian.yml
test in ooni/e2etesting because we're moving its functionality to this
repository. (We mentioned the need to do this move in a previous TODO
comment at the top of such a script.)
Work part of https://github.com/ooni/probe/issues/1468