Commit Graph

126 Commits

Author SHA1 Message Date
Simone Basso
58adb68b2c
refactor: move tracex outside of engine/netx (#782)
* refactor: move tracex outside of engine/netx

Consistently with https://github.com/ooni/probe/issues/2121 and
https://github.com/ooni/probe/issues/2115, we can now move tracex
outside of engine/netx. The main reason why this makes sense now
is that the package is now changed significantly from the one
that we imported from ooni/probe-engine.

We have improved its implementation, which had not been touched
significantly for quite some time, and converted it to unit
testing. I will document tomorrow some extra work I'd like to
do with this package but likely could not do $soon.

* go fmt

* regen tutorials
2022-06-02 00:50:55 +02:00
Simone Basso
d397036073
refactor(tracex): convert to unit testing (#781)
The exercise already allowed me to notice issues such as fields not
being properly initialized by savers.

This is one of the last steps before moving tracex away from the
internal/netx package and into the internal package.

See https://github.com/ooni/probe/issues/2121
2022-06-01 23:15:47 +02:00
Simone Basso
6212daa54a
fix(tracex): generate archival from single transaction-done event (#780)
Tracex contained some fragile code that assembled HTTP measurements
from scattered events, which worked because we were sure we were
performing a single measurement at any given time.

This diff restructures the code to emit a transaction-start and a
transaction-done events only. We have basically removed all the other
events (which we were not using). We kept the transaction-start
though, because it may be useful to see it when reading events. In
any case, what matters here is that we're now using the transaction-done
event aline to generate the archival HTTP measurement.

Hence, the original issue has been addressed. We will possibly
do more refactoring in the future, but for now this seems sufficient.

Part of https://github.com/ooni/probe/issues/2121
2022-06-01 19:27:47 +02:00
Simone Basso
c740be987b
refactor(tracex): do not depend on strings for event names (#777)
Rather than matching a string, match a type.

This is more robust considering future refactorings.

We're confident the names did not change in _this_ refactoring
because we're still testing the same strings in the tests.

Part of https://github.com/ooni/probe/issues/2121
2022-06-01 14:32:16 +02:00
Simone Basso
f4f3ed7c42
refactor(tracex): start applying recent code conventions (#773)
The code that is now into the tracex package was written a long
time ago, so let's start to make it more in line with the coding
style of packages that were written more recently.

I didn't apply all the changes I'd like to apply in a single diff
and for now I am committing just this diff.

Broadly, what we need to do is:

1. improve documentation

2. ~always use pointer receivers (object receives have the issue
that they are not mutable by accident meaning that you can mutate
them but their state do not change after the call returns, which
is potentially a source of bugs in case you later refactor to use
a pointer receiver, so always use pointer receivers)

3. ~always avoid embedding (let's say we want to avoid embedding
for types we define and it's instead fine to embed types that are
defined in the stdlib: if later we add a new method, we will not
see a broken build and we'll probably forget to add the new method
to all wrappers -- conversely, if we're wrapping rather than
embedding, we'll see a broken build and act accordingly)

4. prefer unit tests and group tests by type being tested rather
than using a flat structure for tests

There's a coverage slippage that I'll compensate in a follow-up diff where I'll focus on unit testing.

Reference issue: https://github.com/ooni/probe/issues/2121
2022-06-01 07:44:54 +02:00
Simone Basso
bbcd2e2280
refactor(netx): merge archival, trace, and the savers (#772)
This diff creates a new package under netx called tracex that
contains everything we need to perform measurements using events
tracing and postprocessing (which is the technique with which
we implement most network experiments).

The general idea here is to (1) create a unique package out of
all of these packages; (2) clean up the code a bit (improve tests,
docs, apply more recent code patterns); (3) move the resulting
code as a toplevel package inside of internal.

Once this is done, netx can be further refactored to avoid
subpackages and we can search for more code to salvage/refactor.

See https://github.com/ooni/probe/issues/2121
2022-05-31 21:53:01 +02:00
Simone Basso
3265bc670a
refactor(ndt7): use netxlite rather than netx (#768)
This diff required us to move some code around, but no major
change actually happened, except better tests.

While there, I also slightly refactored ndt7's implementation and
removed the ProxyURL setting, which was actually unused.

See https://github.com/ooni/probe/issues/2121
2022-05-30 23:14:07 +02:00
Simone Basso
f3912188e1
getaddrinfo: fix CGO_ENABLED=0 and record resolver type (#765)
After https://github.com/ooni/probe-cli/pull/764, the build for
CGO_ENABLED=0 has been broken for miniooni:

https://github.com/ooni/probe-cli/runs/6636995859?check_suite_focus=true

Likewise, it's not possible to run tests with CGO_ENABLED=0.

To make tests work with `CGO_ENABLED=0`, I needed to sacrifice some
unit tests run for the CGO case. It is not fully clear to me what was happening
here, but basically `getaddrinfo_cgo_test.go` was compiled with CGO
being disabled, even though the ``//go:build cgo` flag was specified.

Additionally, @hellais previously raised a valid point in the review
of https://github.com/ooni/probe-cli/pull/698:

> Another issue we should consider is that, if I understand how
> this works correctly, depending on whether or not we have built
> with CGO_ENABLED=0 on or not, we are going to be measuring
> things in a different way (using our cgo inspired getaddrinfo
> implementation or using netgo). This might present issues when
> analyzing or interpreting the data.
>
> Do we perhaps want to add some field to the output data format that
> gives us an indication of which DNS resolution code was used to
> generate the the metric?

This comment is relevant to the current commit because
https://github.com/ooni/probe-cli/pull/698 is the previous
iteration of https://github.com/ooni/probe-cli/pull/764.

So, while fixing the build and test issues, let us also distinguish
between the CGO_ENABLED=1 and CGO_ENABLED=0 cases.

Before this commit, OONI used "system" to indicate the case where
we were using net.DefaultResolver. This behavior dates back to the
Measurement Kit days. While it is true that ooni/probe-engine and
ooni/probe-cli could have been using netgo in the past when we
said "system" as the resolver, it also seems reasonable to continue
to use "system" top indicate getaddrinfo.

So, the choice here is basically to use "netgo" from now on to
indicate the cases in which we were built with CGO_ENABLED=0.

This change will need to be documented into ooni/spec along with
the introduction of the `android_dns_cache_no_data` error.

## Checklist

- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2029
- [x] if you changed anything related how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/242
2022-05-30 07:34:25 +02:00
Simone Basso
cf6dbe48e0
netxlite: call getaddrinfo and handle platform-specific oddities (#764)
This commit changes our system resolver to call getaddrinfo directly when CGO is enabled. This change allows us to:

1. obtain the CNAME easily

2. obtain the real getaddrinfo retval

3. handle platform specific oddities such as `EAI_NODATA`
returned on Android devices

See https://github.com/ooni/probe/issues/2029 and https://github.com/ooni/probe/issues/2029#issuecomment-1140258729 in particular.

See https://github.com/ooni/probe/issues/2033 for documentation regarding the desire to see `getaddrinfo`'s retval.

See https://github.com/ooni/probe/issues/2118 for possible follow-up changes.
2022-05-28 15:10:30 +02:00
Simone Basso
7a0a156aec
Spring cleanup: remove unused/unneded code (#761)
* cleanup: remove the archival package

See https://github.com/ooni/probe/issues/2116

* cleanup: remove websteps fall 2021 edition

See https://github.com/ooni/probe/issues/2116

* cleanup: remove JavaScript based testing framework

https://github.com/ooni/probe/issues/2116

* cleanup: remove the unused ooapi package

See https://github.com/ooni/probe/issues/2116
2022-05-25 13:21:39 +02:00
Simone Basso
2d721baa91
cleanup: merge httpheader and httpfailure into model (#758)
These two small packages could easily be merged into the model
package, since they're clearly model-like packages.

Part of https://github.com/ooni/probe/issues/2115
2022-05-25 09:54:50 +02:00
Simone Basso
d922bd9afc
cleanup: mark more integration tests as !short mode (#755)
The objective is to make PR checks run much faster.

See https://github.com/ooni/probe/issues/2113 for context.

Regarding netxlite's tests:

Checking for every commit on master or on a release branch is
good enough and makes pull requests faster than one minute since
netxlite for windows is now 1m slower than coverage.

We're losing some coverage but coverage from integration tests
is not so good anyway, so I'm not super sad about this loss.
2022-05-24 21:01:15 +02:00
Simone Basso
6924d1ad81
refactor: only use shaping dialer for ndt7 and dash (#754)
See https://github.com/ooni/probe/issues/2112 for context.

While there, run `go fix -fix buildtag ./...`
2022-05-24 18:23:42 +02:00
Simone Basso
b68b8e1e8f
fix({simplequic,tls}ping): default SNI to URL's hostname (#753)
See https://github.com/ooni/probe/issues/2111
2022-05-24 16:29:13 +02:00
Simone Basso
f5b801ae95
refactor(netxlite): add Transport suffix to DNS transports (#731)
This diff has been extracted from c2f7ccab0e

See https://github.com/ooni/probe/issues/2096
2022-05-14 17:38:31 +02:00
Simone Basso
1776ea1288
cleanup: remove websteps summer 2021 implementation (#722)
See https://github.com/ooni/probe/issues/2094
2022-05-13 15:06:03 +02:00
Yeganathan S
ded4b08113
fix(ndt7): discards all incoming websockets messages during upload (#719)
See https://github.com/ooni/probe/issues/2084
2022-05-12 08:18:05 +02:00
Simone Basso
b7cc309901
feat: re-implement the vanilla_tor experiment (#718)
This diff re-implements the vanilla_tor experiment. This experiment was
part of the ooni/probe-legacy implementation.

The reference issue is https://github.com/ooni/probe/issues/803. We didn't
consider the possible improvements mentioned by the
https://github.com/ooni/probe/issues/803#issuecomment-598715694 comment,
which means we'll need to create a follow-up issue for them. We will
then decide whether, when, and how to implement those follow-up measurements
either into `vanilla_tor` or into the existing `tor` experiment.

This novel `vanilla_tor` implementation emits test_keys that are mostly
compatible with the original implementation, however:

1. the `timeout` is a `float64` rather than integer (but the default
timeout is an integer, so there are no JSON-visible changes);

2. the `tor_log` string is gone and replaced by the `tor_logs` list
of strings, which contains the same information;

3. the definition of `error` has been augmented to include the
case in which there is an unknown error;

4. the implementation of vanilla_tor mirrors closely the one of torsf
and we have taken steps to make the two implementations as comparable
as possible in terms of the generated JSON measurement.

The main reason why we replaced `tor_log` with `tor_logs` are:

1. that `torsf` already used that;

2. that reading the JSON is easier with this implementation compared to
an implementation where all logs are into the same string.

If one is processing the new data format using Python, then it will
not be difficult convert `tor_log` to `tor_logs`. In any case, because
we extract the most interesting fields (e.g., the percentage of the
bootstrap where tor fails), it seems that logs are probably more useful
as something you want to read in edge cases (I guess).

Also, because we want `torsf` and `vanilla_tor` to have similar JSONs,
we renamed `torsf`'s `default_timeout` to `timeout`. This change has little
to none real-world impact, because no stable version of OONI Probe has
ever shipped a `torsf` producing the `default_timeout` field.

Regarding the structure of this diff, we have:

1. factored code to parse tor logs into a separate package;

2. implemented `vanilla_tor` as a stripped down `torsf` and added further
changes to ensure compatibility with the previous `vanilla_tor`'s data format;

3. improved `torsf` to merge back the changes in `vanilla_tor`, so the two
data formats of the two experiments are as similar as possible.

We believe producing as similar as possible data formats helps anyone who's
reading measurements generated by both experiments.

We have retained/introduced `vanilla_tor`'s `error` field, which is not very
useful when one has a more precise failure but is still what `vanilla_tor`
used to emit, so it makes sense to also have this field.

In addition to changing the implementation, we also updated the specs.

As part of our future work, we may want to consider factoring the common code
of these two experiments into the same underlying support library.
2022-05-10 15:43:28 +02:00
Simone Basso
36ca28d673
feat: add a simple dnsping experiment (#674)
See https://github.com/ooni/probe/issues/1987 (issue).

See https://github.com/ooni/spec/pull/238 (impl).

While there, fix the build for go1.18 by adding go1.18 specific tests. I was
increasingly bothered by the build being red.
2022-05-09 15:28:18 +02:00
Simone Basso
a7a6d7df7f
feat: introduce the simplequicping experiment (#717)
See https://github.com/ooni/probe/issues/2091 (issue) and https://github.com/ooni/spec/pull/237 (spec).
2022-05-09 11:22:44 +02:00
Simone Basso
2917dd6c76
feat: introduce the tlsping experiment (#716)
See https://github.com/ooni/probe/issues/2088 (issue) and https://github.com/ooni/spec/pull/236 (spec).
2022-05-09 10:25:50 +02:00
Simone Basso
e983a5cffb
feat: introduce the tcpping experiment (#696)
See https://github.com/ooni/probe/issues/2030 (reference issue) and https://github.com/ooni/spec/pull/235 (spec).
2022-05-09 09:33:18 +02:00
DecFox
5d2afaade4
cli: upgrade to lucas-clemente/quic-go@v0.27.0 (#715)
* quic-go upgrade: replaced Session/EarlySession with Connection/EarlyConnection

* quic-go upgrade: added context to RoundTripper.Dial

* quic-go upgrade: made corresponding changes to tutorial

* quic-go upgrade: changed sess variable instances to qconn

* quic-go upgrade: made corresponding changes to tutorial

* cleanup: remove unnecessary comments

Those comments made sense in terms of illustrating the changes
but they're going to be less useful once we merge.

* fix(go.mod): apparently we needed `go1.18.1 mod tidy`

VSCode just warned me about this. It seems fine to apply this
change as part of the pull request at hand.

* cleanup(netxlite): http3dialer can be removed

We used to use http3dialer to glue a QUIC dialer, which had a
context as its first argument, to the Dial function used by the
HTTP3 transport, which did not have a context as its first
argument.

Now that HTTP3 transport has a Dial function taking a context as
its first argument, we don't need http3dialer
anymore, since we can use the QUIC dialer directly.

Cc: @DecFox

* Revert "cleanup(netxlite): http3dialer can be removed"

This reverts commit c62244c620cee5fadcc2ca89d8228c8db0b96add
to investigate the build failure mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484

* chore(netx): show that test was already broken

We didn't see the breakage before because we were not using
the created transport, but the issue of using a nil dialer was
already present before, we just didn't see it.

Now we understand why removing the http3transport in
c62244c620cee5fadcc2ca89d8228c8db0b96add did cause the
breakage mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484

* fix(netx): convert broken integration test to working unit test

There's no point in using the network here. Add a fake dialer that
breaks and ensure we're getting the expected error.

We've now improved upon the original test because the original test was
not doing anything while now we're testing whether we get back a QUIC
dialer that _can be used_.

After this commit, I can then readd the cleanup commit
c62244c620cee5fadcc2ca89d8228c8db0b96add and it won't be
broken anymore (at least, this is what I expected to happen).

* Revert "Revert "cleanup(netxlite): http3dialer can be removed""

This reverts commit 0e254bfc6ba3bfd65365ce3d8de2c8ec51b925ff
because now we should have fixed the broken test.

Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
2022-05-06 12:24:03 +02:00
DecFox
b81af5b058
feat(torsf): add default_timeout test keys (#709)
See https://github.com/ooni/probe/issues/2061
2022-05-06 10:47:26 +02:00
ParitoshKabra
4c55102789
fix(torsf): ensure tor-logs-filtering regexp is correct (#707)
* Fix Regex in TorProgressRegex

* fix: update regexp link

As suggested by @hellais

Co-authored-by: Simone Basso <bassosimone@gmail.com>
2022-05-06 10:36:26 +02:00
Yeganathan S
74e31d5cc1
cleanup: use ErrorToStringOrOK func in other tests that returns nil (#701)
Reference issue: https://github.com/ooni/probe/issues/2040
2022-03-08 11:59:44 +01:00
Simone Basso
024eb42334
fix(ndt7): force our bundled CA pool (#700)
This change should prevent old clients (e.g., Android 6) from
failing to perform a ndt7 experiment because their internal CA
bundle is now too old.

Reference issue: https://github.com/ooni/probe/issues/2031

While there, run `go mod tidy` to fix a minor inconsistence in
the current `go.mod` file.

This diff WILL require a backport to release/3.14.
2022-02-23 12:59:03 +01:00
Yeganathan S
6a63f1b044
fix(dnscheck): log "ok" rather than "<nil>" on success (#695)
See https://github.com/ooni/probe/issues/2020
2022-02-16 20:47:44 +01:00
kelmenhorst
88236a4352
feat: add an experimental quicping experiment (#677)
This experiment pings a QUIC-able host. It can be used to measure QUIC availability independently from TLS.
This is the reference issue: https://github.com/ooni/probe/issues/1994

### A QUIC PING is:
- a QUIC Initial packet with a size of 1200 bytes (minimum datagram size defined in the [RFC 9000](https://www.rfc-editor.org/rfc/rfc9000.html#initial-size)),
- with a random payload (i.e. no TLS ClientHello),
- with the version string 0xbabababa which forces Version Negotiation at the server.

QUIC-able hosts respond to the QUIC PING with a Version Negotiation packet.

The input is a domain name or an IP address. The default port used by quicping is 443, as this is the port used by HTTP/3. The port can be modified with the `-O Port=` option.
The default number of repetitions is 10, it can be changed with `-O Repetitions=`.

### Usage:
```
./miniooni -i google.com quicping
./miniooni -i 142.250.181.206 quicping
./miniooni -i 142.250.181.206 -OPort=443 quicping
./miniooni -i 142.250.181.206 -ORepetitions=2 quicping

```
2022-02-14 19:21:16 +01:00
Simone Basso
bf3c8bcdc3
[forwardport] fix(netx): stop collecting HTTP performance metrics (#689)
This diff forward ports b6db4f64dc83a2a27ee3ce6bba5ac93db922832d, whose
original log message is the following:

- - -

We're now using ooni/oohttp as our HTTP library in most cases.

A limitation of this library is that net/http/httptrace does not
work very well and reliably because (1) we need to use oohttp's
version of that code and (2) we cannot observe net events.

I noticed this fact because an integration test for collecting
HTTP performance metrics was broken.

The best solution here is to remove this functionality, since
it was basically unused in the repository. Only some integration
tests inside urlgetter bothered with these metrics.

A more clinical fix would have been to use ooni/oohttp/httptrace
instead of net/http/httptrace in the stdlib, but it does not
seem to be a good idea, given that those metrics were not used.

With this diff applied, we'll further reduce the number of locally
failing integration tests to just jafar-specific tests.

This diff WILL need to be forwardported to `master`.
2022-02-09 15:08:19 +01:00
Simone Basso
85664f1e31
feat(torsf): collect tor logs, select rendezvous method, count bytes (#683)
This diff contains significant improvements over the previous
implementation of the torsf experiment.

We add support for configuring different rendezvous methods after
the convo at https://github.com/ooni/probe/issues/2004. In doing
that, I've tried to use a terminology that is consistent with the
names being actually used by tor developers.

In terms of what to do next, this diff basically instruments
torsf to always rendezvous using domain fronting. Yet, it's also
possible to change the rendezvous method from the command line,
when using miniooni, which allows to experiment a bit more. In the
same vein, by default we use a persistent tor datadir, but it's
also possible to use a temporary datadir using the cmdline.

Here's how a generic invocation of `torsf` looks like:

```bash
./miniooni -O DisablePersistentDatadir=true \
           -O RendezvousMethod=amp \
           -O DisableProgress=true \
           torsf
```

(The default is `DisablePersistentDatadir=false` and
`RendezvousMethod=domain_fronting`.)

With this implementation, we can start measuring whether snowflake
and tor together can boostrap, which seems the most important thing
to focus on at the beginning. Understanding why the bootstrap most
often does not converge with a temporary datadir on Android devices
remains instead an open problem for now. (I'll also update the
relevant issues or create new issues after commit this.)

We also address some methodology improvements that were proposed
in https://github.com/ooni/probe/issues/1686. Namely:

1. we record the tor version;

2. we include the bootstrap percentage by reading the logs;

3. we set the anomaly key correctly;

4. we measure the bytes send and received (by `tor` not by `snowflake`, since
doing it for snowflake seems more complex at this stage).

What remains to be done is the possibility of including Snowflake
events into the measurement, which is not possible until the new
improvements at common/event in snowflake.git are included into a
tagged version of snowflake itself. (I'll make sure to mention
this aspect to @cohosh in https://github.com/ooni/probe/issues/2004.)
2022-02-07 17:05:36 +01:00
Simone Basso
d92c1641ac
feat: start adding torsf to desktop and mobile (#671)
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.

When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.

Reference issue: https://github.com/ooni/probe/issues/1917
2022-01-24 12:39:27 +01:00
Simone Basso
a01f901e13
feat(ooniprobe): add torsf to experimental group (#670)
Reference issue: https://github.com/ooni/probe/issues/1917.

I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.
2022-01-21 12:32:08 +01:00
Simone Basso
cfb054efd4
feat(snowflake): upgrade to v2 (+ small tweaks) (#667)
This diff contains the following changes and enhancements:

1. upgrade snowflake to v2

2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that;

3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing);

4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this);

5. ptx.go: avoid printing an error when the connection has just been closed;

6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled.

Related issues: https://github.com/ooni/probe/issues/1845, https://github.com/ooni/probe/issues/1894, and https://github.com/ooni/probe/issues/1917.
2022-01-19 17:23:27 +01:00
Simone Basso
e904b90006
feature: merge measurex and netx archival layer (1/N) (#663)
This diff introduces a new package called `./internal/archival`. This package collects data from `./internal/model` network interfaces (e.g., `Dialer`, `QUICDialer`, `HTTPTransport`), saves such data into an internal tabular data format suitable for on-line processing and analysis, and allows exporting data into the OONI data format.

The code for collecting and the internal tabular data formats are adapted from `measurex`. The code for formatting and exporting OONI data-format-compliant structures is adapted from `netx/archival`.

My original objective was to _also_ (1) fully replace `netx/archival` with this package and (2) adapt `measurex` to use this package rather than its own code. Both operations seem easily feasible because: (a) this code is `measurex` code without extensions that are `measurex` related, which will need to be added back as part of the process; (b) the API provided by this code allows for trivially converting from using `netx/archival` to using this code.

Yet, both changes should not be taken lightly. After implementing them, there's need to spend some time doing QA and ensuring all nettests work as intended. However, I am planning a release in the next two weeks, and this QA task is likely going to defer the release. For this reason, I have chosen to commit the work done so far into the tree and defer the second part of this refactoring for a later moment in time. (This explains why the title mentions "1/N").

On a more high-level perspective, it would also be beneficial, I guess, to explain _why_ I am doing these changes. There are two intertwined reasons. The first reason is that `netx/archival` has shortcomings deriving from its original https://github.com/ooni/netx legacy. The most relevant shortcoming is that it saves all kind of data into the same tabular structure named `Event`. This design choice is unfortunate because it does not allow one to apply data-type specific logic when processing the results. In turn, this choice results in complex processing code. Therefore, I believe that replacing the code with event-specific data structures is clearly an improvement in terms of code maintainability and would quite likely lead us to more confidently change and evolve the codebase.

The second reason why I would like to move forward these changes is to unify the codepaths used for measuring. At this point in time, we basically have two codepaths: `./internal/engine/netx` and `./internal/measurex`. They both have pros and cons and I don't think we want to rewrite whole experiments using `netx`. Rather, what we probably want is to gradually merge these two codepaths such that `netx` is a set of abstractions on top of `measurex` (which is more low-level and has a more-easily-testable design). Because saving events and generating an archival data format out of them consists of at least 50% of the complexity of both `netx` and `measurex`, it seems reasonable to unify this archival-related part of the two codebases as the first step.

At the highest level of abstraction, these changes are part of the train of changes which will eventually lead us to bless `websteps` as a first class citizen in OONI land. Because `websteps` requires different underlying primitives, I chose to develop these primitives from scratch rather than wrestling with `netx`, which used another model. The model used by `websteps` is that we perform each operation in isolation and immediately we save the results, while `netx` creates whole data structures and collects all the events happening via tracing. We believe the model used by `websteps` to be better because it does not require your code to figure out everything that happened after the measurement, which is a source of subtle bugs in the current implementation. So, when I started implementing websteps I extracted the bits of `netx` that could also be beneficial to `websteps` into a separate library, thus `netxlite` was born.

The reference issue describing merging the archival of `netx` and `measurex` is https://github.com/ooni/probe/issues/1957. As of this writing the issue still references the original plan, which I could not complete by the end of this Sprint, so I am going to adapt the text of the issue to only refer to what was done in here next. Of course, I also need follow-up issues.
2022-01-14 12:13:10 +01:00
Simone Basso
b5da8be183
fix(netxlite): robust {ReadAll,Copy}Context with wrapped io.EOF (#661)
* chore(netxlite): add currently failing test case

This diff introduces a test cases that will fail because of the reason
explained in https://github.com/ooni/probe/issues/1965.

* chore(netxlite/iox_test.go): add failing unit tests

These tests directly show how the Go implementation of ReadAll
and Copy has the issue of checking for io.EOF equality.

* fix(netxlite): make {ReadAll,Copy}Context robust to wrapped io.EOF

The fix is simple: we just need to check for `errors.Is(err, io.EOF)`
after either io.ReadAll or io.Copy has returned. When this condition is
true, we need to convert the error back to `nil` as it ought to be.

While there, observe that the unit tests I committed in the previous
commit are wrongly asserting that the error must be wrapped. This
assertion is not correct, because in both cases we have just ensured
that the returned error is `nil` (i.e., success).

See https://github.com/ooni/probe/issues/1965.

* cleanup: remove previous workaround for wrapped io.EOF

These workarounds were partial, meaning that they would cover some
cases in which the issue occurred but not all of them.

Handling the problem in `netxlite.{ReadAll,Copy}Context` is the
right thing to do _as long as_ we always use these functions instead
of `io.{ReadAll,Copy}`.

This is why it's now important to ensure we clearly mention that
inside of the `CONTRIBUTING.md` guide and to also ensure that we're
not using these functions in the code base.

* fix(urlgetter): repair tests who assumed to see EOF error

Now that we have established that we should normalize EOF when
reading bodies like the stdlib does and now that it's clear why
our behavior diverged from the stdlib, we also need to repair
all the tests that assumed this incorrect behavior.

* fix(all): don't use io{,util}.{Copy,ReadAll}

* feat: add checks to ensure we don't use io.{Copy,ReadAll}

* doc(netxlite): document we know how to deal w/ wrapped io.EOF

* fix(nocopyreadall.bash): add exception for i/n/iox.go
2022-01-12 14:26:10 +01:00
Simone Basso
d3c6c11e48
cleanup(netx): remove the DNSClient type (#660)
The DNSClient type existed because the Resolver type did not
include CloseIdleConnections in its signature.

Now that Resolver includes CloseIdleConnections, the DNSClient
type has become unnecessary and can be safely removed.

See https://github.com/ooni/probe/issues/1956.
2022-01-10 11:53:06 +01:00
Simone Basso
554ae47c5a
cleanup(netx): remove more legacy names and functions (#658)
This diff addresses two items of https://github.com/ooni/probe/issues/1956:

> - [ ] we can remove legacy names from `./internal/engine/netx/resolver/legacy.go`
>
> - [ ] we can remove `DialTLSContext` from `./internal/engine/netx/resolver/tls_test.go`

More cleanups may follow.
2022-01-07 20:02:19 +01:00
Simone Basso
566c6b246a
cleanup: remove unnecessary legacy interfaces (#656)
This diff addresses another point of https://github.com/ooni/probe/issues/1956:

> - [ ] observe that we're still using a bunch of private interfaces for common interfaces such as the `Dialer`, so we can get rid of these private interfaces and always use the ones in `model`, which allows us to remove a bunch of legacy wrappers

Additional cleanups may still be possible. The more I cleanup, the more I see
there's extra legacy code we can dispose of (which seems good?).
2022-01-07 18:33:37 +01:00
Simone Basso
1c057d322d
cleanup: merge legacy errorsx in netxlite and hide classifiers (#655)
This diff implements the first two cleanups defined at https://github.com/ooni/probe/issues/1956:

> - [ ] observe that `netxlite` and `netx` differ in error wrapping only in the way in which we set `ErrWrapper.Operation`. Observe that the code using `netxlite` does not care about such a field. Therefore, we can modify `netxlite` to set such a field using the code of `netx` and we can remove `netx` specific code for errors (which currently lives inside of the `./internal/engine/legacy/errorsx` package
>
> - [ ] after we've done the previous cleanup, we can make all the classifiers code private, since there's no code outside `netxlite` that needs them

A subsequent diff will address the remaining cleanup.

While there, notice that there are failing, unrelated obfs4 tests, so disable them in short mode. (I am confident these tests are unrelated because they fail for me when running test locally from the `master` branch.)
2022-01-07 17:31:21 +01:00
Simone Basso
99ec7ffca9
fix: ensure experiments return nil when we want to submit (#654)
Since https://github.com/ooni/probe-cli/pull/527, if an experiment
returns an error, the corresponding measurement is not submitted since
the semantics of returning an error is that something fundamental
went wrong (e.g., we could not parse the input URL).

This diff ensures that all experiments only return and error when
something fundamental was wrong and return nil otherwise.

Reference issue: https://github.com/ooni/probe/issues/1808.
2022-01-07 13:17:20 +01:00
Simone Basso
dfa5e708fe
refactor(tor): rewrite using measurex (#652)
This diff rewrites the tor experiment to use measurex "easy" API.

To this end, we need to introduce an "easy" measurex API, which basically
performs easy measurements returning two pieces of data:

1. the resulting measurement, which is already using the OONI
archival data format and is always non-nil

2. a failure (i.e., the pointer to an error string), which
is nil on success and points to a string on failure

With this change, we should now be able to completely dispose of
the original netx API, which was only used by tor.

Reference issue: https://github.com/ooni/probe/issues/1688.
2022-01-05 18:41:11 +01:00
Simone Basso
f0181c432f
refactor: move httpx into the internal package (#646)
This concludes the TODO list at https://github.com/ooni/probe/issues/1951
2022-01-05 17:17:20 +01:00
Simone Basso
dba861d262
feat(httpx): implement optional body logging also on http error (#651)
1. we want optionally to log the body (we don't want to log the body
when we're fetching psiphon secrets or tor targets)

2. we want body logging to _also_ happen on error since this is quite
useful to debug possible errors when accessing the API

This diff adds the above functionality, which were previously
described in https://github.com/ooni/probe/issues/1951.

This diff also adds comprehensive testing.
2022-01-05 16:26:51 +01:00
Simone Basso
eed51978ca
refactor(httpx): hide the real APIClient (#648)
As mentioned in https://github.com/ooni/probe/issues/1951, one of
the main issues I did see with httpx.APIClient is that in some cases
it's used in a very fragile way by probeservices.Client.

This happens in psiphon.go and tor.go, where we create a copy of
the APIClient and then modify it's Authorization field.

If we ever refactor probeservices.Client to take a pointer to
httpx.Client, we are now mutating the httpx.Client.

Of course, we don't want that to happen.

This diff attempts to address such a problem as follows:

1. we create a new APIClientTemplate type that holds the same
fields of an APIClient and allows to build an APIClient

2. we modify every user of APIClient to use APIClientTemplate

3. when we need an APIClient, we build it from the corresponding
template and, when we need to use a specific Authorization, we
use a build factory that sets APIClient.Authorization

4. we hide APIClient by renaming it apiClient and by defining
an interface called APIClient that allows to use it

So, now the codebase always uses the opaque APIClient interface to
issue API calls and always uses the APIClientTemplate to build an
opaque APIClient.

Boom! We have separated construction from usage and we are not
mutating in weird ways the APIClient anymore.
2022-01-05 14:15:42 +01:00
Simone Basso
7b7df2c6af
refactor(httpx): improve and modernize (1/n) (#647)
This PR starts to implement the refactoring described at https://github.com/ooni/probe/issues/1951. I originally wrote more patches than the ones in this PR, but overall they were not readable. Since I want to squash and merge, here's a reasonable subset of the original patches that will still be readable and understandable in the future.
2022-01-05 12:48:32 +01:00
Simone Basso
43161a8138
cleanup: remove redundant HTTPClient definition (#643)
This counts as a follow-up cleanup as part of doing
https://github.com/ooni/probe/issues/1885.
2022-01-03 16:47:54 +01:00
Simone Basso
273b70bacc
refactor: interfaces and data types into the model package (#642)
## Checklist

- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/1885
- [x] related ooni/spec pull request: N/A

Location of the issue tracker: https://github.com/ooni/probe

## Description

This PR contains a set of changes to move important interfaces and data types into the `./internal/model` package.

The criteria for including an interface or data type in here is roughly that the type should be important and used by several packages. We are especially interested to move more interfaces here to increase modularity.

An additional side effect is that, by reading this package, one should be able to understand more quickly how different parts of the codebase interact with each other.

This is what I want to move in `internal/model`:

- [x] most important interfaces from `internal/netxlite`
- [x] everything that was previously part of `internal/engine/model`
- [x] mocks from `internal/netxlite/mocks` should also be moved in here as a subpackage
2022-01-03 13:53:23 +01:00
Simone Basso
cba72d1ca3
refactor(stunreachability): input required and must be an URL (#630)
Here we're refactoring stunreachability to not provide internally a
default input and to take in input an URL rather than a string.

The related ooni/spec change is https://github.com/ooni/spec/pull/227.

This diff has been extracted from https://github.com/ooni/probe-cli/pull/539.

Because the original diff was large, I'm splitting it in a set of
more easily manageable diffs.

The reference issue is https://github.com/ooni/probe/issues/1814, which
is complex enough to require us to proceed incrementally.

This diff WILL need to be backported to release/3.11.
2021-12-03 14:27:04 +01:00
Simone Basso
9cdca4137d
forwardport: pull the patches mentioned in ooni/probe#1908 (#629)
* [forwardport] fix(oonimkall): make logger used by tasks unit testable (#623)

This diff forward ports e4b04642c51e7461728b25941624e1b97ef0ec83.

Reference issue: https://github.com/ooni/probe/issues/1903

* [forwardport] feat(oonimkall): improve taskEmitter testability (#624)

This diff forward ports 3e0f01a389c1f4cdd7878ec151aff91870a0bdff.

1. rename eventemitter{,_test}.go => taskemitter{,_test}.go because
the new name is more proper after we merged the internal/task package
inside of the oonimkall package;

2. rename runner.go's `run` function to `runTask`;

3. modify `runTask` to use the new `taskEmitterUsingChan` abstraction
on which we will spend more works in a later point of this list;

4. introduce `runTaskWithEmitter` factory that is called by `runTask`
and allows us to more easily write unit tests;

5. acknowledge that `runner` was not using its `out` field;

6. use the new `taskEmitterWrapper` in `newRunner`;

7. acknowledge that `runnerCallbacks` could use a generic
`taskEmitter` as field type rather than a specific type;

8. rewrite tests to use `runTaskWithEmitter` which leads to
simpler code that does not require a goroutine;

9. acknowledge that the code has been ignoring the `DisabledEvents`
settings for quite some time, so stop supporting it;

10. refactor the `taskEmitter` implementation to be like:

    1. we still have the `taskEmitter` interface;

    2. `taskEmitterUsingChan` wraps the channel and allows for
    emitting events using the channel;

    3. `taskEmitterUsingChan` owns an `eof` channel that is
    closed by `Close` (which is idempotent) and signals we
    should be stop emitting;

    4. make sure `runTask` creates a `taskEmitterUsingChan`
    and calls its `Close` method when done;

    5. completely remove the code for disabling events
    since the code was actually ignoring the stting;

    6. add a `taskEmitterWrapper` that adds common functions
    for emitting events to _any_ `taskWrapper`;

    7. write unit tests for `taskEmitterUsingChan` and
    for `taskEmitterWrapper`;

11. acknowledge that the abstraction we need for testing is
actually a thread-safe thing that collects events into a
vector containing events and refactor all tests accordingly.

See https://github.com/ooni/probe/issues/1903

* [forwardport] refactor(oonimkall): make the runner unit-testable (#625)

This diff forward ports 9423947faf6980d92d2fe67efe3829e8fef76586.

See https://github.com/ooni/probe/issues/1903

* [forwardport] feat(oonimkall): write unit tests for the runner component (#626)

This diff forward ports 35dd0e3788b8fa99c541452bbb5e0ae4871239e1.

Forward porting note: compared to 35dd0e3788b8fa99c541452bbb5e0ae4871239e1,
the diff I'm committing here is slightly different. In `master` we do not
have the case where a measurement fails and a measurement is returned, thus
I needed to adapt the test to become like this:

```diff
diff --git a/pkg/oonimkall/runner_internal_test.go b/pkg/oonimkall/runner_internal_test.go
index 334b574..84c7436 100644
--- a/pkg/oonimkall/runner_internal_test.go
+++ b/pkg/oonimkall/runner_internal_test.go
@@ -568,15 +568,6 @@ func TestTaskRunnerRun(t *testing.T) {
                }, {
                        Key:   failureMeasurement,
                        Count: 1,
-               }, {
-                       Key:   measurement,
-                       Count: 1,
-               }, {
-                       Key:   statusMeasurementSubmission,
-                       Count: 1,
-               }, {
-                       Key:   statusMeasurementDone,
-                       Count: 1,
                }, {
                        Key:   statusEnd,
                        Count: 1,
```

I still need to write more assertions for each emitted event
but the code we've here is already a great starting point.

See https://github.com/ooni/probe/issues/1903

* [forwardport] refactor(oonimkall): merge files, use proper names, zap unneeded integration tests (#627)

This diff forward ports f894427d24edc9a03fc78306d0093e7b51c46c25.

Forward porting note: this diff is slightly different from the original
mentioned above because it carries forward changes mentioned in the
previous diff caused by a different way of handling a failed measurement
in the master branch compared to the release/3.11 branch.

Move everything that looked like "task's model" inside of the
taskmodel.go file, for consistency.

Make sure it's clear some variables are event types.

Rename the concrete `runner` as `runnerForTask`.

Also, remove now-unnecessary (and flaky!) integration tests
for the `runnerForTask` type.

While there, notice there were wrong URLs that were generated
during the probe-engine => probe-cli move and fix them.

See https://github.com/ooni/probe/issues/1903

* [forwardport] refactor(oonimkall): we can simplify StartTask tests (#628)

This diff forward ports dcf2986c2032d8185d58d24130a7f2c2d61ef2fb.

* refactor(oonimkall): we can simplify StartTask tests

We have enough checks for runnerForTask. So we do not need to
duplicate them when checking for StartTask.

While there, refactor how we start tasks to remove the need for
extra runner functions.

This is the objective I wanted to achieve for oonimkall:

1. less duplicate tests, and

2. more unit tests (which are less flaky)

At this point, we're basically done (pending forwardporting to
master) with https://github.com/ooni/probe/issues/1903.

* fix(oonimkall): TestStartTaskGood shouldn't cancel the test

This creates a race condition where the test may fail if we cannot
complete the whole "Example" test in less than one second.

This should explain the build failures I've seen so far and why
I didn't see those failures when running locally.
2021-12-02 12:47:07 +01:00