ooni-probe-cli/internal/experiment/webconnectivity
Simone Basso c2ea0b4704
feat(webconnectivity): try all the available THs (#980)
We introduce a fork of internal/httpx, named internal/httpapi, where there is a clear split between the concept of an API endpoint (such as https://0.th.ooni.org/) and that of an API descriptor (such as using `GET` to access /api/v1/test-list/url).

Additionally, httpapi makes it possible to create a SequenceCaller that tries to call a given API descriptor using multiple API endpoints. The SequenceCaller stops as soon as an endpoint works or when all the available endpoints have been tried unsuccessfully.

We define "success" as follows: we consider a "failure" any error that occurs during the HTTP round trip or while reading the response body. We DO NOT consider as "failures" the errors that occur (1) when parsing the input URL; (2) when the server returns a status code >= 400; (3) when the server returns a body that does not parse as valid JSON. The idea behind this classification is that we ONLY want to retry when we see what looks like a network error that may be caused by (collateral or targeted) censorship.

We take advantage of this new package to refactor web_connectivity@v0.4 and web_connectivity@v0.5 so that they use a SequenceCaller for calling the web connectivity TH API. This means that we now try all the available THs advertised by the backend rather than just selecting and using the first one provided by the backend.

Because this diff is designed to be backported to the `release/3.16` branch, we have omitted additional changes to always use httpapi where we are currently using httpx. Yet, to remind ourselves about the need to do that, we have deprecated the httpx package. We will rewrite all the code currently using httpx to use httpapi as part of future work.

It is also worth noting that httpapi will allow us to refactor the backend code such that (1) we remove the code that selects a backend URL endpoint at the beginning and (2) we try several endpoints. The code is designed so that we can add to the mix some endpoints whose `http.Client` is a special client using a tunnel. This will allow backend queries to automatically fall back to such tunnelled endpoints.

Closes https://github.com/ooni/probe/issues/2353.

Related to https://github.com/ooni/probe/issues/1519.
2022-11-21 16:28:53 +01:00
analysiscore.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
analysisdns.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
analysishttpcore.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
analysishttpdiff.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
analysistcpip.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
analysistls.go refactor(webconnectivity@v0.5): improve logging clarity (#964) 2022-09-15 07:03:53 +02:00
cleartextflow.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00
config.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
control.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00
dnscache.go fix(webconnectivity@v0.5): fetch HTTP only using system-resolver addrs (#935) 2022-09-05 13:33:59 +02:00
dnsresolvers.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00
dnswhoami.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
doc.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
inputparser.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
iox.go webconnectivity@v0.5: handle successful https chains (#960) 2022-09-14 08:40:13 +02:00
measurer.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00
priority.go feat(webconnectivity@v0.5): flag case where noone resolved any address (#953) 2022-09-12 07:33:34 +02:00
README.md doc(webconnectivity@v0.5): link to analysiscore.go 2022-09-15 08:17:55 +02:00
redirects.go fix(web_connectivity@v0.5): limit number of redirects (#965) 2022-09-15 08:46:53 +02:00
secureflow.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00
summary.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
testkeys.go feat(webconnectivity): try all the available THs (#980) 2022-11-21 16:28:53 +01:00

webconnectivity

This directory contains a new implementation of Web Connectivity.

As of 2022-09-15, this code is experimental and is not selected by default when you run the websites group. You can select this implementation with miniooni by running miniooni web_connectivity@v0.5 from the command line.

Issue #2237 explains the rationale behind writing this new implementation.

Implementation overview

graph TD;
    measurer.go --> dnsresolvers.go;
    dnsresolvers.go --> control.go;
    dnsresolvers.go --> cleartextflow.go;
    dnsresolvers.go --> secureflow.go;
    control.go --> cleartextflow.go;
    control.go --> secureflow.go;
    cleartextflow.go --> dnsresolvers.go;
    secureflow.go --> dnsresolvers.go;
	measurer.go --> analysiscore.go;

Figure I. Relationship between files in this implementation

The experiment measures a single URL at a time. The OONI Engine invokes the Run method inside the measurer.go file.

The first task that Run starts deals with DNS and lives in the dnsresolvers.go file. This task is responsible for resolving the domain inside the URL into 0..N IP addresses. The domain resolution includes the system resolver and a DNS-over-UDP resolver. The implementation may do more than that, but this is the bare minimum we feel like documenting right now. (We need to experiment a bit more to understand what else we can do there, hence the code is probably doing more than just that.)

Once we know the 0..N IP addresses for the domain, we do the following:

  1. start a background task to communicate with the Web Connectivity test helper, using code inside control.go;

  2. start an endpoint measurement task for each IP address (which, of course, only happens when we know at least one address).

Regarding starting endpoint measurements, we follow this policy:

  1. if the original URL is http://..., then, for each address, we start an HTTP task using port 80 and an HTTPS task using port 443.

  2. if it's https://..., then we only start HTTPS tasks.
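The policy above fits in a small planning function. EndpointTask and planEndpointTasks are hypothetical names; the sketch assumes the default ports described in the text and ignores any explicit port in the URL. net.JoinHostPort takes care of bracketing IPv6 addresses.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// EndpointTask is a hypothetical description of one flow to run.
type EndpointTask struct {
	Scheme  string // "http" (cleartext flow) or "https" (secure flow)
	Address string // "ip:port", ready for net.Dial
}

// planEndpointTasks applies the policy: http:// URLs get an HTTP task on
// port 80 and an HTTPS task on port 443 per address; https:// URLs only get
// HTTPS tasks. Explicit ports in the URL are ignored in this sketch.
func planEndpointTasks(rawURL string, addrs []string) ([]EndpointTask, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	type flow struct{ scheme, port string }
	var flows []flow
	switch u.Scheme {
	case "http":
		flows = []flow{{"http", "80"}, {"https", "443"}}
	case "https":
		flows = []flow{{"https", "443"}}
	default:
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	var tasks []EndpointTask
	for _, addr := range addrs {
		for _, f := range flows {
			tasks = append(tasks, EndpointTask{Scheme: f.scheme, Address: net.JoinHostPort(addr, f.port)})
		}
	}
	return tasks, nil
}
```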

HTTP tasks are implemented by cleartextflow.go while the HTTPS tasks live in secureflow.go.

An HTTP task does the following:

  1. TCP connect;

  2. additionally, the first task to establish a connection also performs a GET request to fetch a webpage (we cannot issue a GET over every connection, because that would be websteps and would require a different data format).

An HTTPS task does the following:

  1. TCP connect;

  2. TLS handshake;

  3. additionally, the first task to complete the handshake also performs a GET request to fetch a webpage, if and only if the input URL was https://... (we cannot issue a GET over every connection, because that would be websteps and would require a different data format).

If fetching the webpage returns a redirect, we start a new DNS task, passing it the redirect URL as the new URL to measure, thus transferring control back to dnsresolvers.go. We do not call the test helper again when this happens, though. The Web Connectivity test helper already follows the whole redirect chain, so we would need to change the test helper to get information on each flow. If we fetched more than one webpage per redirect chain, this experiment would be websteps.

Additionally, when the test helper terminates, control.go may run HTTP and/or HTTPS tasks (when applicable) for IP addresses that the test helper discovered but that were previously unknown to the probe, thus collecting extra information.
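Finding those previously unknown addresses is a set difference. In this sketch (newAddrsFromTH is a hypothetical name), addresses are compared in canonical form via net.ParseIP, so two spellings of the same IPv6 address count as one, and entries that are not IP addresses are skipped.

```go
package main

import "net"

// newAddrsFromTH returns the addresses reported by the test helper that the
// probe did not know yet, in canonical form and without duplicates.
func newAddrsFromTH(probeAddrs, thAddrs []string) []string {
	known := make(map[string]bool, len(probeAddrs))
	for _, a := range probeAddrs {
		if ip := net.ParseIP(a); ip != nil {
			known[ip.String()] = true
		}
	}
	var out []string
	for _, a := range thAddrs {
		ip := net.ParseIP(a)
		if ip == nil || known[ip.String()] {
			continue
		}
		known[ip.String()] = true // also dedupe the TH list itself
		out = append(out, ip.String())
	}
	return out
}
```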

When several connections are racing to fetch a webpage, we need specific logic to decide which of them gets permission to actually fetch it. This logic lives inside the priority.go file.

When all tasks complete, either because we reach a final state or because we have followed too many redirects, we use code inside analysiscore.go to compute the top-level test keys. We emit the same blocking and accessible keys that we emitted before, as well as new keys prefixed by x_ to indicate that they are experimental.
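For reference, here is a sketch of how such top-level keys might serialize, based on our understanding of the Web Connectivity data format: accessible is true, false, or null, and blocking is false, null, or a string such as "dns", "tcp_ip", "http-failure", or "http-diff". The TopLevelKeys type and the x_example_flags key are hypothetical and do not name real fields of this implementation.

```go
package main

import "encoding/json"

// TopLevelKeys sketches the classic keys plus one experimental x_ key.
type TopLevelKeys struct {
	// Accessible is nil (JSON null) when the analysis cannot decide.
	Accessible *bool `json:"accessible"`
	// Blocking holds false, nil, or a string such as "dns".
	Blocking interface{} `json:"blocking"`
	// XExampleFlags is a hypothetical experimental key.
	XExampleFlags int64 `json:"x_example_flags"`
}

// Marshal encodes the keys as they would appear in the measurement JSON.
func (tk TopLevelKeys) Marshal() ([]byte, error) {
	return json.Marshal(tk)
}
```

Using an interface{} for blocking is what allows the same key to carry either a boolean or a string.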

Limitations and next steps

Further changes are probably possible. Departing too radically from the Web Connectivity model, though, would lead us to a websteps implementation (but then the data model would most likely be different).