ooni-probe-cli/internal/experiment/webconnectivity
Simone Basso 39cb5959c9
fix(datafmt): sync measurexlite and v0.5 with previous code (#942)
* fix(model/archival.go): more optional keys

Basically, `t0` and `transaction_id` should be optional. Version 0.4.x
of web_connectivity should not include them, version 0.5.x should.

There is a technical reason why v0.4.x should not include them. The code
it is based on, tracex, does not record these two fields.

By contrast, v0.5.x uses measurexlite, which records these two fields.

Part of https://github.com/ooni/probe/issues/2238

* fix(webconnectivity@v0.5): add more fields

This diff adds the following fields to webconnectivity@v0.5:

1. agent, always set to "redirect" (legacy field);

2. client_resolver, properly initialized with the resolver's IPv4 address;

3. retries, legacy field always set to null;

4. socksproxy, legacy field always set to null.

Part of https://github.com/ooni/probe/issues/2238

* fix(webconnectivity@v0.5): register extensions

The general idea behind this field is that, in the future, we may be able
to tweak the data model of some fields by declaring that we're using a
later version, so it seems useful to add it.

See https://github.com/ooni/probe/issues/2238

* fix(measurexlite): use tcp or quic for tls handshake network

This diff fixes a bug where measurexlite was using "tls" as the
protocol for the TLS handshake when using TCP.

While this choice _could_ make sense, the rest of the code we have
written so far uses "tcp" instead.

Using "tcp" makes more sense because it allows you to find the same
endpoint across different events by checking for the same network and
the same address, rather than special-casing TLS handshakes, which would
use "tls" while every other event for that endpoint uses "tcp".

See https://github.com/ooni/probe/issues/2238

* chore: run alltests.yml for "alltestsbuild" branches

Part of https://github.com/ooni/probe/issues/2238
2022-09-08 10:02:47 +02:00
analysiscore.go feat(webconnectivity@v0.5): use TLS info from TH (#933) 2022-09-05 11:35:48 +02:00
analysisdns.go refactor: spin geoipx off geolocate (#893) 2022-08-28 20:00:25 +02:00
analysishttpcore.go feat(webconnectivity@v0.5): use TLS info from TH (#933) 2022-09-05 11:35:48 +02:00
analysishttpdiff.go refactor: move WebGetTitle inside measurexlite (#895) 2022-08-28 20:26:40 +02:00
analysistcpip.go feat(webconnectivity@v0.5): use TLS info from TH (#933) 2022-09-05 11:35:48 +02:00
analysistls.go feat(webconnectivity@v0.5): use TLS info from TH (#933) 2022-09-05 11:35:48 +02:00
cleartextflow.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
config.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
control.go fix(webconnectivity@v0.5): fetch HTTP only using system-resolver addrs (#935) 2022-09-05 13:33:59 +02:00
dnscache.go fix(webconnectivity@v0.5): fetch HTTP only using system-resolver addrs (#935) 2022-09-05 13:33:59 +02:00
dnsresolvers.go fix(webconnectivity@v0.5): fetch HTTP only using system-resolver addrs (#935) 2022-09-05 13:33:59 +02:00
dnswhoami.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
doc.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
inputparser.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
measurer.go fix(datafmt): sync measurexlite and v0.5 with previous code (#942) 2022-09-08 10:02:47 +02:00
README.md feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
secureflow.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
summary.go feat(webconnectivity): long-term-evolution prototype (#882) 2022-08-26 16:42:48 +02:00
testkeys.go fix(datafmt): sync measurexlite and v0.5 with previous code (#942) 2022-09-08 10:02:47 +02:00

webconnectivity

This directory contains a new implementation of Web Connectivity.

As of 2022-08-26, this code is experimental and is not selected by default when you run the websites group. You can select this implementation with miniooni by running miniooni web_connectivity@v0.5 from the command line.

Issue #2237 explains the rationale behind writing this new implementation.

Implementation overview

The experiment measures a single URL at a time. The OONI Engine invokes the Run method inside the measurer.go file.

This code starts a number of background tasks, waits for them to complete, and finally calls TestKeys.finalize to finalize the content of the JSON measurement.

The first task that is started deals with DNS and lives in the dnsresolvers.go file. This task is responsible for resolving the domain inside the URL into 0..N IP addresses.

Domain resolution uses at least the system resolver and a DNS-over-UDP resolver. The implementation may do more than that, but this is the bare minimum we feel comfortable documenting right now: we need to experiment a bit more to understand what else we can do there, so the code probably already does more than just that.

Once we know the 0..N IP addresses for the domain, we do the following:

  1. start a background task to communicate with the Web Connectivity test helper, using code inside control.go;

  2. start an endpoint measurement task for each IP address (which of course only happens when we know at least one address).

Regarding starting endpoint measurements, we follow this policy:

  1. if the original URL is http://..., then we start a cleartext task and an encrypted task for each address, using ports 80 and 443 respectively.

  2. if it's https://..., then we only start encrypted tasks.
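The policy above can be expressed as a small planning function. This is a hypothetical sketch (Endpoint and planEndpoints are not names from this package), and it assumes, as the text suggests, that ports 80 and 443 are always used regardless of any explicit port in the URL. net.JoinHostPort takes care of bracketing IPv6 literals.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// Endpoint is a hypothetical description of one endpoint task.
type Endpoint struct {
	Address string // IP:port, with IPv6 literals bracketed
	Secure  bool   // true for TCP+TLS on port 443, false for cleartext on port 80
}

// planEndpoints applies the policy: http:// URLs get a cleartext and an
// encrypted task per address, https:// URLs only get encrypted tasks.
// With no addresses it returns an empty plan, so no task is started.
func planEndpoints(rawURL string, addrs []string) ([]Endpoint, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme: %q", u.Scheme)
	}
	var out []Endpoint
	for _, addr := range addrs {
		if u.Scheme == "http" {
			out = append(out, Endpoint{Address: net.JoinHostPort(addr, "80")})
		}
		out = append(out, Endpoint{Address: net.JoinHostPort(addr, "443"), Secure: true})
	}
	return out, nil
}

func main() {
	eps, err := planEndpoints("http://www.example.com/", []string{"192.0.2.1", "2001:db8::1"})
	if err != nil {
		panic(err)
	}
	for _, ep := range eps {
		fmt.Printf("%s secure=%v\n", ep.Address, ep.Secure)
	}
}
```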

Cleartext tasks are implemented by cleartextflow.go while the encrypted tasks live in secureflow.go.

A cleartext task does the following:

  1. TCP connect;

  2. additionally, the first task to establish a connection also performs a GET request to fetch a webpage (we cannot send a GET over every connection, because that would effectively be websteps and would require a different data format).
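One way to implement "only the first task to connect performs the GET" is an atomic compare-and-swap shared by all tasks for the same URL. The firstWins type below is a hypothetical helper, not necessarily the mechanism used in cleartextflow.go; note that a task should only try to claim the right after its connect succeeds, so that failed connects never win.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// firstWins hands out the right to fetch the webpage exactly once:
// only the first caller of claim gets true.
type firstWins struct {
	claimed int32
}

// claim atomically flips claimed from 0 to 1 and reports whether this call did it.
func (f *firstWins) claim() bool {
	return atomic.CompareAndSwapInt32(&f.claimed, 0, 1)
}

func main() {
	var (
		gate    firstWins
		wg      sync.WaitGroup
		fetches int32
	)
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Pretend the TCP connect succeeded here; only the winner
			// would go on to send the GET request.
			if gate.claim() {
				atomic.AddInt32(&fetches, 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println("GET requests sent:", fetches)
}
```

However the goroutines are scheduled, the program prints "GET requests sent: 1".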

An encrypted task does the following:

  1. TCP connect;

  2. TLS handshake;

  3. additionally, the first task to complete the TLS handshake also performs a GET request to fetch a webpage iff the input URL was https://... (we cannot send a GET over every connection, because that would effectively be websteps and would require a different data format).
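A minimal sketch of an encrypted task, under the assumption that we dial a specific IP address while using the URL's hostname as SNI and for certificate verification. tlsConfigFor, secureFlow, and shouldFetch are hypothetical names, not the API of secureflow.go; shouldFetch encodes the rule that only the first task to handshake fetches, and only for https:// inputs.

```go
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"net/url"
	"time"
)

// tlsConfigFor builds the TLS config for an encrypted task: the connection
// goes to an IP address, but SNI and verification use the URL's hostname.
func tlsConfigFor(rawURL string) (*tls.Config, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	return &tls.Config{ServerName: u.Hostname()}, nil
}

// secureFlow performs step 1 (TCP connect) and step 2 (TLS handshake)
// against endpoint, e.g. "192.0.2.1:443". The caller owns the returned conn.
func secureFlow(ctx context.Context, endpoint string, cfg *tls.Config) (*tls.Conn, error) {
	d := &net.Dialer{Timeout: 15 * time.Second}
	conn, err := d.DialContext(ctx, "tcp", endpoint)
	if err != nil {
		return nil, err
	}
	tlsConn := tls.Client(conn, cfg)
	if err := tlsConn.HandshakeContext(ctx); err != nil {
		conn.Close()
		return nil, err
	}
	return tlsConn, nil
}

// shouldFetch reports whether a task that completed the handshake should
// also send the GET: it must have won the race and the input must be https.
func shouldFetch(rawURL string, wonRace bool) bool {
	u, err := url.Parse(rawURL)
	return err == nil && wonRace && u.Scheme == "https"
}

func main() {
	const input = "https://www.example.com/"
	cfg, err := tlsConfigFor(input)
	if err != nil {
		panic(err)
	}
	fmt.Println("SNI:", cfg.ServerName)
	fmt.Println("fetch if first to handshake:", shouldFetch(input, true))
	// A real task would now call secureFlow(ctx, "192.0.2.1:443", cfg).
}
```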

If fetching the webpage returns a redirect, we start a new DNS task, passing it the redirect URL as the new URL to measure. We do not call the test helper again when this happens, though. The Web Connectivity test helper already follows the whole redirect chain, so we would need to change the test helper to obtain information about each flow. If and when that happens, this experiment will probably no longer be Web Connectivity, but rather some form of websteps.

Additionally, when the test helper terminates, we run TCP connect and TLS handshake (when applicable) for IP addresses that the test helper discovered and that were previously unknown to the probe, thus collecting extra information. This logic lives inside the control.go file.

As previously mentioned, when all tasks complete, we call TestKeys.finalize.

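In turn, this function analyzes the collected data by calling code implemented inside the following files:

  1. analysiscore.go;

  2. analysisdns.go;

  3. analysishttpcore.go;

  4. analysishttpdiff.go;

  5. analysistcpip.go;

  6. analysistls.go.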

We emit the same blocking and accessible keys emitted by the previous implementation, as well as new keys prefixed by x_ to indicate that they're experimental.

Limitations and next steps

We need to extend the Web Connectivity test helper to return information about TLS handshakes with IP addresses discovered by the probe. This information would allow us to make more precise TLS blocking statements.

Further changes are probably possible. However, departing too radically from the Web Connectivity model would effectively give us a websteps implementation, whose data model would most likely be different.