doc(webconnectivity@v0.5): improve readme

This commit is contained in:
Simone Basso 2022-09-15 08:17:15 +02:00 committed by GitHub
parent 5e76c6ec92
commit 0a0253f104
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -3,7 +3,7 @@
This directory contains a new implementation of [Web Connectivity]( This directory contains a new implementation of [Web Connectivity](
https://github.com/ooni/spec/blob/master/nettests/ts-017-web-connectivity.md). https://github.com/ooni/spec/blob/master/nettests/ts-017-web-connectivity.md).
As of 2022-08-26, this code is experimental and is not selected As of 2022-09-15, this code is experimental and is not selected
by default when you run the `websites` group. You can select this by default when you run the `websites` group. You can select this
implementation with `miniooni` using `miniooni web_connectivity@v0.5` implementation with `miniooni` using `miniooni web_connectivity@v0.5`
from the command line. from the command line.
@ -13,16 +13,27 @@ behind writing this new implementation.
## Implementation overview ## Implementation overview
```mermaid
graph TD;
measurer.go --> dnsresolvers.go;
dnsresolvers.go --> control.go;
dnsresolvers.go --> cleartext.go;
dnsresolvers.go --> secure.go;
control.go --> cleartext.go;
control.go --> secure.go;
cleartext.go --> dnsresolvers.go;
secure.go --> dnsresolvers.go;
measurer.go --> analysiscore.go;
```
<p align="center"><b>Figure I</b>. Relationship between files in this implementation</p>
The experiment measures a single URL at a time. The OONI Engine invokes the The experiment measures a single URL at a time. The OONI Engine invokes the
`Run` method inside the [measurer.go](measurer.go) file. `Run` method inside the [measurer.go](measurer.go) file.
This code starts a number of background tasks, waits for them to complete, and The first task that `Run` starts deals with DNS and lives in the
finally calls `TestKeys.finalize` to finalize the content of the JSON measurement.
The first task that is started deals with DNS and lives in the
[dnsresolvers.go](dnsresolvers.go) file. This task is responsible for [dnsresolvers.go](dnsresolvers.go) file. This task is responsible for
resolving the domain inside the URL into `0..N` IP addresses. resolving the domain inside the URL into `0..N` IP addresses.
The domain resolution includes the system resolver and a DNS-over-UDP The domain resolution includes the system resolver and a DNS-over-UDP
resolver. The implementaion _may_ do more than that, but this is the resolver. The implementaion _may_ do more than that, but this is the
bare minimum we're feeling like documenting right now. (We need to bare minimum we're feeling like documenting right now. (We need to
@ -39,16 +50,15 @@ course only happens when we know _at least_ one addr).
Regarding starting endpoint measurements, we follow this policy: Regarding starting endpoint measurements, we follow this policy:
1. if the original URL is `http://...` then we start a cleartext task 1. if the original URL is `http://...` then, for each address, we start
and an encrypted task for each address using ports `80` and `443` an HTTP task using port `80` and an HTTPS task using `443`.
respectively.
2. if it's `https://...`, then we only start encrypted tasks. 2. if it's `https://...`, then we only start HTTPS tasks.
Cleartext tasks are implemented by [cleartextflow.go](cleartextflow.go) while HTTP tasks are implemented by [cleartextflow.go](cleartextflow.go) while
the encrypted tasks live in [secureflow.go](secureflow.go). the HTTPS tasks live in [secureflow.go](secureflow.go).
A cleartext task does the following: An HTTP task does the following:
1. TCP connect; 1. TCP connect;
@ -56,7 +66,7 @@ A cleartext task does the following:
a GET request to fetch a webpage (we cannot GET for all connections, because a GET request to fetch a webpage (we cannot GET for all connections, because
that would be `websteps` and would require a different data format). that would be `websteps` and would require a different data format).
An encrypted task does the following: An HTTPS task does the following:
1. TCP connect; 1. TCP connect;
@ -68,42 +78,28 @@ for all connections, because that would be `websteps` and would require a
different data format). different data format).
If fetching the webpage returns a redirect, we start a new DNS task passing it If fetching the webpage returns a redirect, we start a new DNS task passing it
the redirect URL as the new URL to measure. We do not call the test helper again the redirect URL as the new URL to measure, thus transferring the control again
to [dnsresolvers.go](dnsresolvers.go). We do not call the test helper again
when this happens, though. The Web Connectivity test helper already follows the whole when this happens, though. The Web Connectivity test helper already follows the whole
redirect chain, so we would need to change the test helper to get information on redirect chain, so we would need to change the test helper to get information on
each flow. When this will happen, this experiment will probably not be Web Connectivity each flow. If we fetched more than one webpage per redirect chain, this experiment would
anymore, but rather some form of [websteps](https://github.com/bassosimone/websteps-illustrated/). be [websteps](https://github.com/bassosimone/websteps-illustrated/).
Additionally, when the test helper terminates, we run TCP connect and TLS handshake Additionally, when the test helper terminates, [control.go](control.go) may run
(when applicable) for new IP addresses discovered using the test helper that were HTTP and/or HTTPS tasks (when applicable) for new IP addresses discovered using the test helper that were
previously unknown to the probe, thus collecting extra information. This logic lives previously unknown to the probe, thus collecting extra information.
inside the [control.go](control.go) file.
As previously mentioned, when all tasks complete, we call `TestKeys.finalize`. When several connections are racing to fetch a webpage, we need specific logic to choose
which of them to give the permission to actually fetch the webpage. This logic
lives inside the [priority.go](priority.go) file.
In turn, this function analyzes the collected data by calling code implemented When all tasks complete, either because we reach a final state or because we have
inside the following files: followed too many redirects, we use code inside `analysiscore.go` to compute the
top-level test keys. We emit the `blocking` and `accessible` keys we emitted before
- [analysiscore.go](analysiscore.go) contains the core analysis algorithm; as well as new keys, prefixed by `x_` to indicate that they're experimental.
- [analysisdns.go](analysisdns.go) contains DNS specific analysis;
- [analysishttpcore.go](analysishttpcore.go) contains the bulk of the HTTP
analysis, where we mainly determine TLS blocking;
- [analysishttpdiff.go](analysishttpdiff.go) contains the HTTP diff algorithm;
- [analysistcpip.go](analysistcpip.go) checks for TCP/IP blocking.
We emit the `blocking` and `accessible` keys we emitted before as well as new
keys, prefixed by `x_` to indicate that they're experimental.
## Limitations and next steps ## Limitations and next steps
We need to extend the Web Connectivity test helper to return us information
about TLS handshakes with IP addresses discovered by the probe. This information
would allow us to make more precise TLS blocking statements.
Further changes are probably possible. Departing too radically from the Web Further changes are probably possible. Departing too radically from the Web
Connectivity model, though, will lead us to have a `websteps` implementation (but Connectivity model, though, will lead us to have a `websteps` implementation (but
then the data model would most likely be different). then the data model would most likely be different).