The main issue I see inside tracex at the moment is that we
construct the HTTP measurement from separate events.
This is fragile because we cannot be sure that these events
belong to the same round trip. (Currently, they _are_ part
of the same round trip, but this is a fragile assumption and
it would be much more robust to dispose of it.)
To prepare for emitting a single event, it's imperative to
have two distinct fields for HTTP request and response headers,
which is the main contribution in this commit.
Then, we have a bunch of smaller changes including:
1. correctly naming 'response' the DNS response (instead of 'reply')
2. ensure we always use pointer receivers
Reference issue: https://github.com/ooni/probe/issues/2121
Rather than matching a string, match a type.
This is more robust considering future refactorings.
We're confident the names did not change in _this_ refactoring
because we're still testing the same strings in the tests.
Part of https://github.com/ooni/probe/issues/2121
Acknowledge that transports MAY be used in isolation (i.e., outside
of a Resolver) and add support for wrapping.
Ensure that every factory that creates an unwrapped type is named
accordingly to hopefully ensure there are no surprises.
Implement DNSTransport wrapping and use a technique similar to the
one used by Dialer to customize the DNSTransport while constructing
more complex data types (e.g., a specific resolver).
Ensure that the stdlib resolver's own "getaddrinfo" transport (1)
is wrapped and (2) could be extended during construction.
This work is part of my ongoing effort to bring to this repository
websteps-illustrated changes relative to netxlite.
Ref issue: https://github.com/ooni/probe/issues/2096
This diff modifies the system resolver to use a getaddrinf transport.
Obviously the transport is a fake, but its existence will allow us
to observe DNS events more naturally.
A lookup using the system resolver would be a ANY lookup that will
contain all the resolved IP addresses into the same response.
This change was also part of websteps-illustrated, albeit the way in
which I did it there was less clean than what we have here.
Ref issue: https://github.com/ooni/probe/issues/2096
Rather than passing functions to construct complex objects such
as Dialer and QUICDialer, pass interface implementations.
Ensure that a nil implementation does not cause harm.
Make Saver implement the correct interface either directly or
indirectly. We need to implement the correct interface indirectly
for TCP conns (or connected UDP sockets) because we have two
distinct use cases inside netx: observing just the connect event
and observing just the I/O events.
With this change, the construction of composed Dialers and
QUICDialers is greatly simplified and more obvious.
Part of https://github.com/ooni/probe/issues/2121
The code that is now into the tracex package was written a long
time ago, so let's start to make it more in line with the coding
style of packages that were written more recently.
I didn't apply all the changes I'd like to apply in a single diff
and for now I am committing just this diff.
Broadly, what we need to do is:
1. improve documentation
2. ~always use pointer receivers (object receives have the issue
that they are not mutable by accident meaning that you can mutate
them but their state do not change after the call returns, which
is potentially a source of bugs in case you later refactor to use
a pointer receiver, so always use pointer receivers)
3. ~always avoid embedding (let's say we want to avoid embedding
for types we define and it's instead fine to embed types that are
defined in the stdlib: if later we add a new method, we will not
see a broken build and we'll probably forget to add the new method
to all wrappers -- conversely, if we're wrapping rather than
embedding, we'll see a broken build and act accordingly)
4. prefer unit tests and group tests by type being tested rather
than using a flat structure for tests
There's a coverage slippage that I'll compensate in a follow-up diff where I'll focus on unit testing.
Reference issue: https://github.com/ooni/probe/issues/2121
This diff creates a new package under netx called tracex that
contains everything we need to perform measurements using events
tracing and postprocessing (which is the technique with which
we implement most network experiments).
The general idea here is to (1) create a unique package out of
all of these packages; (2) clean up the code a bit (improve tests,
docs, apply more recent code patterns); (3) move the resulting
code as a toplevel package inside of internal.
Once this is done, netx can be further refactored to avoid
subpackages and we can search for more code to salvage/refactor.
See https://github.com/ooni/probe/issues/2121
This diff modifies the construction of a dialer to allow one
to insert custom dialer wrappers into the dialers chain.
The point of the chain in which we allow custom wrappers is the
optimal one for connect, read, and write measurements.
This new design is better than the previous netx design since
we don't need to construct the whole chain manually now.
The work in this diff is part of the effort to make engine/netx
just a tiny wrapper around netxlite.
See https://github.com/ooni/probe/issues/2121.
This diff required us to move some code around, but no major
change actually happened, except better tests.
While there, I also slightly refactored ndt7's implementation and
removed the ProxyURL setting, which was actually unused.
See https://github.com/ooni/probe/issues/2121
This diff replaces engine/netx code with netxlite code in
the engine/session.go file. To this end, we needed to move
some code from engine/netx to netxlite. While there, we
did review and improve the unit tests.
A notable change in this diff is (or seems to be) that in
engine/session.go we're not filtering for bogons anymore so
that, in principle, we could believe a resolver returning
to us bogon IP addresses for OONI services. However, I did
not bother with changing bogons filtering because the
sessionresolver package is already filtering for bogons,
so it is actually okay to avoid doing that again the
session.go code. See:
https://github.com/ooni/probe-cli/blob/v3.15.0-alpha.1/internal/engine/internal/sessionresolver/resolvermaker.go#L88
There are two reference issues for this cleanup:
1. https://github.com/ooni/probe/issues/2115
2. https://github.com/ooni/probe/issues/2121
After https://github.com/ooni/probe-cli/pull/764, the build for
CGO_ENABLED=0 has been broken for miniooni:
https://github.com/ooni/probe-cli/runs/6636995859?check_suite_focus=true
Likewise, it's not possible to run tests with CGO_ENABLED=0.
To make tests work with `CGO_ENABLED=0`, I needed to sacrifice some
unit tests run for the CGO case. It is not fully clear to me what was happening
here, but basically `getaddrinfo_cgo_test.go` was compiled with CGO
being disabled, even though the ``//go:build cgo` flag was specified.
Additionally, @hellais previously raised a valid point in the review
of https://github.com/ooni/probe-cli/pull/698:
> Another issue we should consider is that, if I understand how
> this works correctly, depending on whether or not we have built
> with CGO_ENABLED=0 on or not, we are going to be measuring
> things in a different way (using our cgo inspired getaddrinfo
> implementation or using netgo). This might present issues when
> analyzing or interpreting the data.
>
> Do we perhaps want to add some field to the output data format that
> gives us an indication of which DNS resolution code was used to
> generate the the metric?
This comment is relevant to the current commit because
https://github.com/ooni/probe-cli/pull/698 is the previous
iteration of https://github.com/ooni/probe-cli/pull/764.
So, while fixing the build and test issues, let us also distinguish
between the CGO_ENABLED=1 and CGO_ENABLED=0 cases.
Before this commit, OONI used "system" to indicate the case where
we were using net.DefaultResolver. This behavior dates back to the
Measurement Kit days. While it is true that ooni/probe-engine and
ooni/probe-cli could have been using netgo in the past when we
said "system" as the resolver, it also seems reasonable to continue
to use "system" top indicate getaddrinfo.
So, the choice here is basically to use "netgo" from now on to
indicate the cases in which we were built with CGO_ENABLED=0.
This change will need to be documented into ooni/spec along with
the introduction of the `android_dns_cache_no_data` error.
## Checklist
- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2029
- [x] if you changed anything related how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/242
This diff refactors the DNSTransport model to receive in input a DNSQuery and return in output a DNSResponse.
The design of DNSQuery and DNSResponse takes into account the use case of a transport using getaddrinfo, meaning that we don't need to serialize and deserialize messages when using getaddrinfo.
The current codebase does not use a getaddrinfo transport, but I wrote one such a transport in the Websteps Winter 2021 prototype (https://github.com/bassosimone/websteps-illustrated/).
The design conversation that lead to producing this diff is https://github.com/ooni/probe/issues/2099
These two small packages could easily be merged into the model
package, since they're clearly model-like packages.
Part of https://github.com/ooni/probe/issues/2115
The objective is to make PR checks run much faster.
See https://github.com/ooni/probe/issues/2113 for context.
Regarding netxlite's tests:
Checking for every commit on master or on a release branch is
good enough and makes pull requests faster than one minute since
netxlite for windows is now 1m slower than coverage.
We're losing some coverage but coverage from integration tests
is not so good anyway, so I'm not super sad about this loss.
Rather than building for Android using ooni/go, we're now using
ooni/oocryto as the TLS dependency. Such a repository only forks
crypto/tls and some minor crypto packages and includes the
same set of patches that we have been using in ooni/go.
This new strategy should be better than the previous one in
terms of building for Android, because we can use the vanilla
go1.18.2 build. It also seems that it is easier to track and
merge from upstream with ooni/oocrypto than it is with ooni/go.
Should this assessment be wrong, we can revert back to the
previous scenario where we used ooni/go.
See https://github.com/ooni/probe/issues/2106 for extra context.
* Passed the TestHelpers field to RunAsyc and MeasureAsync. This reflects the test_helpers in the measurement.
* Spec already contains the correct output.
See https://github.com/ooni/probe/issues/2073
Co-authored-by: decfox <decfox>
This diff re-implements the vanilla_tor experiment. This experiment was
part of the ooni/probe-legacy implementation.
The reference issue is https://github.com/ooni/probe/issues/803. We didn't
consider the possible improvements mentioned by the
https://github.com/ooni/probe/issues/803#issuecomment-598715694 comment,
which means we'll need to create a follow-up issue for them. We will
then decide whether, when, and how to implement those follow-up measurements
either into `vanilla_tor` or into the existing `tor` experiment.
This novel `vanilla_tor` implementation emits test_keys that are mostly
compatible with the original implementation, however:
1. the `timeout` is a `float64` rather than integer (but the default
timeout is an integer, so there are no JSON-visible changes);
2. the `tor_log` string is gone and replaced by the `tor_logs` list
of strings, which contains the same information;
3. the definition of `error` has been augmented to include the
case in which there is an unknown error;
4. the implementation of vanilla_tor mirrors closely the one of torsf
and we have taken steps to make the two implementations as comparable
as possible in terms of the generated JSON measurement.
The main reason why we replaced `tor_log` with `tor_logs` are:
1. that `torsf` already used that;
2. that reading the JSON is easier with this implementation compared to
an implementation where all logs are into the same string.
If one is processing the new data format using Python, then it will
not be difficult convert `tor_log` to `tor_logs`. In any case, because
we extract the most interesting fields (e.g., the percentage of the
bootstrap where tor fails), it seems that logs are probably more useful
as something you want to read in edge cases (I guess).
Also, because we want `torsf` and `vanilla_tor` to have similar JSONs,
we renamed `torsf`'s `default_timeout` to `timeout`. This change has little
to none real-world impact, because no stable version of OONI Probe has
ever shipped a `torsf` producing the `default_timeout` field.
Regarding the structure of this diff, we have:
1. factored code to parse tor logs into a separate package;
2. implemented `vanilla_tor` as a stripped down `torsf` and added further
changes to ensure compatibility with the previous `vanilla_tor`'s data format;
3. improved `torsf` to merge back the changes in `vanilla_tor`, so the two
data formats of the two experiments are as similar as possible.
We believe producing as similar as possible data formats helps anyone who's
reading measurements generated by both experiments.
We have retained/introduced `vanilla_tor`'s `error` field, which is not very
useful when one has a more precise failure but is still what `vanilla_tor`
used to emit, so it makes sense to also have this field.
In addition to changing the implementation, we also updated the specs.
As part of our future work, we may want to consider factoring the common code
of these two experiments into the same underlying support library.
* quic-go upgrade: replaced Session/EarlySession with Connection/EarlyConnection
* quic-go upgrade: added context to RoundTripper.Dial
* quic-go upgrade: made corresponding changes to tutorial
* quic-go upgrade: changed sess variable instances to qconn
* quic-go upgrade: made corresponding changes to tutorial
* cleanup: remove unnecessary comments
Those comments made sense in terms of illustrating the changes
but they're going to be less useful once we merge.
* fix(go.mod): apparently we needed `go1.18.1 mod tidy`
VSCode just warned me about this. It seems fine to apply this
change as part of the pull request at hand.
* cleanup(netxlite): http3dialer can be removed
We used to use http3dialer to glue a QUIC dialer, which had a
context as its first argument, to the Dial function used by the
HTTP3 transport, which did not have a context as its first
argument.
Now that HTTP3 transport has a Dial function taking a context as
its first argument, we don't need http3dialer
anymore, since we can use the QUIC dialer directly.
Cc: @DecFox
* Revert "cleanup(netxlite): http3dialer can be removed"
This reverts commit c62244c620cee5fadcc2ca89d8228c8db0b96add
to investigate the build failure mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* chore(netx): show that test was already broken
We didn't see the breakage before because we were not using
the created transport, but the issue of using a nil dialer was
already present before, we just didn't see it.
Now we understand why removing the http3transport in
c62244c620cee5fadcc2ca89d8228c8db0b96add did cause the
breakage mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* fix(netx): convert broken integration test to working unit test
There's no point in using the network here. Add a fake dialer that
breaks and ensure we're getting the expected error.
We've now improved upon the original test because the original test was
not doing anything while now we're testing whether we get back a QUIC
dialer that _can be used_.
After this commit, I can then readd the cleanup commit
c62244c620cee5fadcc2ca89d8228c8db0b96add and it won't be
broken anymore (at least, this is what I expected to happen).
* Revert "Revert "cleanup(netxlite): http3dialer can be removed""
This reverts commit 0e254bfc6ba3bfd65365ce3d8de2c8ec51b925ff
because now we should have fixed the broken test.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
* tls_handshakes: add IP addresses
* tls_handshakes: extract ip from tcp-connect
* tls_handshake: switched to trace event
* saver.go: get remoteAddr before handshake
Not sure whether this is strictly necessary, but I'd rather take the
remoteAddr before calling Handshake, just in case a future version
of the handshake closes the `conn`. In such a case, `conn.RemoteAddr`
would return `nil` and we would crash here.
This occurred to me while reading once again the diff before merging.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
This diff changes the software name used by unattended runs for which
we did not override the default software name (`ooniprobe-cli`).
It will become `ooniprobe-cli-unattended`. This software name is in line
with the one we use for Android, iOS, and desktop unattended runs.
While working in this diff, I introduced string constants for the run
types and a string constant for the default software name.
See https://github.com/ooni/probe/issues/2081.
Here's the squash of the following patches that enable support
for go1.18 and update our dependencies.
This diff WILL need to be backported to the release/3.14 branch.
* chore: use go1.17.8
See https://github.com/ooni/probe/issues/2067
* chore: upgrade to probe-assets@v0.8.0
See https://github.com/ooni/probe/issues/2067.
* chore: update dependencies and enable go1.18
As mentioned in 7a0d17ea91,
the tree won't build with `go1.18` unless we say it does.
So, not only here we need to update dependencies but also we
need to explicitly say `go1.18` in the `go.mod`.
This work is part of https://github.com/ooni/probe/issues/2067.
* chore(coverage.yml): run with go1.18
This change will give us a bare minimum confidence that we're
going to build our tree using version 1.18 of golang.
See https://github.com/ooni/probe/issues/2067.
* chore: update user agent used for measuring
See https://github.com/ooni/probe/issues/2067
* chore: run `go generate ./...`
See https://github.com/ooni/probe/issues/2067
* fix(dialer_test.go): make test work with go1.17 and go1.18
1. the original test wanted the dial to fail, so ensure we're not
passing any domain name to exercise dialing not resolving;
2. match the end of the error rather than the whole error string.
Tested locally with both go1.17 and go1.18.
See https://github.com/ooni/probe-cli/pull/708#issuecomment-1096447186
This change should prevent old clients (e.g., Android 6) from
failing to perform a ndt7 experiment because their internal CA
bundle is now too old.
Reference issue: https://github.com/ooni/probe/issues/2031
While there, run `go mod tidy` to fix a minor inconsistence in
the current `go.mod` file.
This diff WILL require a backport to release/3.14.
This experiment pings a QUIC-able host. It can be used to measure QUIC availability independently from TLS.
This is the reference issue: https://github.com/ooni/probe/issues/1994
### A QUIC PING is:
- a QUIC Initial packet with a size of 1200 bytes (minimum datagram size defined in the [RFC 9000](https://www.rfc-editor.org/rfc/rfc9000.html#initial-size)),
- with a random payload (i.e. no TLS ClientHello),
- with the version string 0xbabababa which forces Version Negotiation at the server.
QUIC-able hosts respond to the QUIC PING with a Version Negotiation packet.
The input is a domain name or an IP address. The default port used by quicping is 443, as this is the port used by HTTP/3. The port can be modified with the `-O Port=` option.
The default number of repetitions is 10, it can be changed with `-O Repetitions=`.
### Usage:
```
./miniooni -i google.com quicping
./miniooni -i 142.250.181.206 quicping
./miniooni -i 142.250.181.206 -OPort=443 quicping
./miniooni -i 142.250.181.206 -ORepetitions=2 quicping
```
This diff forward ports b6db4f64dc83a2a27ee3ce6bba5ac93db922832d, whose
original log message is the following:
- - -
We're now using ooni/oohttp as our HTTP library in most cases.
A limitation of this library is that net/http/httptrace does not
work very well and reliably because (1) we need to use oohttp's
version of that code and (2) we cannot observe net events.
I noticed this fact because an integration test for collecting
HTTP performance metrics was broken.
The best solution here is to remove this functionality, since
it was basically unused in the repository. Only some integration
tests inside urlgetter bothered with these metrics.
A more clinical fix would have been to use ooni/oohttp/httptrace
instead of net/http/httptrace in the stdlib, but it does not
seem to be a good idea, given that those metrics were not used.
With this diff applied, we'll further reduce the number of locally
failing integration tests to just jafar-specific tests.
This diff WILL need to be forwardported to `master`.
This issue aims at making life slighly better for users impacted by
sanctions whose iplookup may be quite slow in case there are timeouts
as documented in https://github.com/ooni/probe/issues/1988.
Cloudflare hosted services provide a certain service of `/cdn-cgi/trace` with their base url (for example, `www.cloudflare.com` or `www.nginx.com`), which can be used to obtain `ip` in the probe's `geolocate` feature.
The same feature was added in this pr, hence, increasing the number of `baseURL`s in `geolocate`.
Co-authored-by: Simone Basso <bassosimone@gmail.com>