Skip options that begin with the `Safe` prefix from appearing in the
serialization of a Measurement that will be submitted to the OONI
backend.
Fixes https://github.com/ooni/probe/issues/2214
I made a mistake while adapting code from an experimental branch thus
breaking these two experiments because of interface conversion.
This diff fixes it.
While there, remove the panic trap for miniooni. Because miniooni is
an experimental tool, we want to see the full panic text, which definitely
leads to a more pleasant and effective debugging experience.
See https://github.com/ooni/probe/issues/2216 for context on why we
were trying to change how we register experiments.
The broken commit is 6a0ae5c70b.
* feat: add support for system resolver in measurexlite
* more tests for coverage
* Apply suggestions from code review
Co-authored-by: decfox <decfox@github.com>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
Until OONI Run v2 has support for repeating the measurement with a schedule, introduce a command line flag requested by users to repeat a measurement every given number of seconds.
Part of https://github.com/ooni/probe/issues/2184
This diff adds support for running OONIRun v1 links.
Run with `miniooni` using:
```
./miniooni -i LINK oonirun
```
Part of https://github.com/ooni/probe/issues/2184
This diff refactors the ./internal/cmd/miniooni pkg and moves the code
for running experiments inside of the ./internal/oonirun pkg.
It's the first concrete step towards https://github.com/ooni/probe/issues/2184.
The integration test that was broken was:
```
--- FAIL: TestCreateInvalidExperiment (0.35s)
experiment_integration_test.go:192: expected a nil builder here
```
While there improve the documentation of the ExperimentSession
and see there's a method that we are not using.
This diff is a cleanup that I come up with while working
on https://github.com/ooni/probe/issues/2184.
This option has been disabled for a long time and we said in the
codebase we were going to remove it after 2021-11-01.
So, it feels okay to remove it.
This diff is a cleanup in preparation for https://github.com/ooni/probe/issues/2184.
This diff modifies the engine package to make Experiment and
ExperimentBuilder interfaces rather than structs.
The previosuly existing structs are now named experiment{,Builder}.
This diff helps https://github.com/ooni/probe/issues/2184
because it allows us to write unit tests more easily.
There should be no functional change.
While there, I removed a bunch of deprecated functions, which were
unnecessarily complicate the implementation and could be easily
replaced by passing them a context.Context or context.Background().
This diff refactors how we set options for experiments to accept
in input an any value or a map[string]any, depending on which method
we choose to actually set options.
There should be no functional change, except that now we're not
guessing the type and then attempting to set the value of the selected
field: now, instead, we match the provided type and the field's type
as part of the same function (i.e., SetOptionAny).
This diff is functional to https://github.com/ooni/probe/issues/2184,
because it will allow us to load options from a map[string]any,
which will be part of the OONI Run v2 JSON descriptor.
If we didn't apply this change, we would only have been to set options
from a map[string]string, which is good enough as a solution for the
CLI but is definitely clumsy when you have to write stuff like:
```JSON
{
"options": {
"HTTP3Enabled": "true"
}
}
```
when you could instead more naturally write:
```JSON
{
"options": {
"HTTP3Enabled": true
}
}
```
This diff makes the implementation of the engine package more
abstract by changing HTTPClient() to return a model.HTTPClient
as opposed to returning an *http.Client.
Part of https://github.com/ooni/probe/issues/2184
In https://github.com/ooni/probe-cli/pull/832's initial diff, I
mentioned it would be cool to flatten oohelperd's hier.
I'm doing this now, and just for the master branch.
This diff is mostly a mechanical refactoring with very light
and apparently rather safe manual changes.
This diff modifies the implementation of oohelperd in the master branch
to always use throw-away HTTPClient, Dialer, and Resolver.
The rationale of this change is to ensure we're not hitting limits of the
HTTPClient regarding the max number of connections per host.
This issue is described at https://github.com/ooni/probe/issues/2182.
While there, it feels more correct to use throw-away Dialer and Resolver.
We have a different patch for the release/3.15 branch because of
netx-related refactorings: https://github.com/ooni/probe-cli/pull/832.
## Checklist
- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2158
- [x] if you changed anything related how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/250
## Description
This diff refactors the codebase to reimplement tlsping and tcpping
to use the step-by-step measurements style.
See docs/design/dd-003-step-by-step.md for more information on the
step-by-step measurement style.
This pull request publishes the step-by-step design document that I have been discussing with @hellais and @DecFox recently. Compared to the document that was approved, this one has been edited for readability.
While there, I figured it was also be beneficial to publish the few ooni/probe-cli related design documents we produced in the past, because they probably help someone to get acquainted with the codebase.
Reference issue for this pull request: https://github.com/ooni/probe/issues/2148
This diff addresses the following points of https://github.com/ooni/probe/issues/2135:
- [x] the `childResolver` type is useless and we can use `model.Resolver` directly;
- [x] we should use `model/mocks` instead of custom fakes;
- [x] we should not use `log.Log` rather we should use `model.DiscardLogger`;
- [x] make `timeLimitedLookup` easier to test with a `-short` tests;
- [x] ensure `timeLimitedLookup` returns as soon as its context expires regardless of the child resolver;
Subsequent diffs will address more points mentioned in there.
The oohelperd implementation did not actually need using netx because
it was just constructing default types with logging, which is what
netxlite already does. Hence, let's avoid using netx here.
See https://github.com/ooni/probe/issues/2121
The oohelper does not need to use netx and it's enough to use
netxlite, hence let us apply this refactor.
The original code used DoT but the explanatory comment said we were
using DoT because of unclear issues inside GitHub actions.
We are now using DoH and this is fine as well. The comment implied
that any encrypted transport would do.
See https://github.com/ooni/probe/issues/2121
This diff forward ports 261d1a4cdc88522f6a8f63d6c540f51054566b28 to master
whose original commit message follows:
- - -
It's not working for me from a couple of places and also it does not
seem to be documented upstream, see:
https://docs.namebase.io/guides-1/resolving-handshake-1/hdns.io
This diff WILL need to be forwardported to master.
This diff refactors netx and netxlite to ensure we're not using
netxlite legacy names inside of netx.
To this end, we're cheating a bit. We're exposing a new factory to
get an unwrapped stdlib resolver rather than defining a legacy name
to export the private name of the same factory.
This is actually a fine place to stop, for now, the next and
netxlite refactoring at https://github.com/ooni/probe/issues/2121.
Before finishing the ongoing refactoring and leaving whatever
is left of netx in tree, I would like to restructure it so that
we'll have an easy time next time we need to modify it.
Currently, every functionality lives into the `netx.go` file and
we have a support file called `httptransport.go`.
I would like to reorganize by topic, instead. This would allow
future me to more easily perform topic-specific changes.
While there, improve `netx`'s documentation and duplicate some of
this documentation inside `internal/README.md` to provide pointers
to previous documentation, historical context, and some help to
understand the logic architecture of network extensions (aka `netx`).
Part of https://github.com/ooni/probe-cli/pull/396
Now that we have properly refactored the caching resolvers we can
move them into netxlite as optional resolvers created using the
proper abstract factories we just added.
This diff reduces the complexity and the code size of netx.
See https://github.com/ooni/probe/issues/2121.
For testability, replace most if-based construction logic with
calls to well-tested factories living in other packages.
While there, acknowledge that a bunch of types could now be private
and make them private, modifying the code to call the public
factories allowing to construct said types instead.
Part of https://github.com/ooni/probe/issues/2121
This diff modifies netx to stop using most netxlite resolver internals
but the internal function that creates a new, unwrapped system resolver,
which will be dealt with in a subsequent pull request.
See https://github.com/ooni/probe/issues/2121
1. Use the netxlite.NewHTTPTransport factory for creating a new
HTTP2 (and HTTP1) transport;
2. Recognize the netxlite.NewOOHTTPTransport has now become
an implementation detail so make it private;
3. Recognize that netxlite.NewHTTP3Transport should call
netxlite.WrapTransport so it returns the same typechain
returned by netxlite.NewHTTPTransport (modulo, of course,
the real underlying transport), so ensure that we are
calling netxlite.WrapTransport in NewHTTP3Transport;
4. Recognize that the table based constructor inside of
netx needs a logger to create HTTPTransport instances using
either netxlite.NewHTTP{,3}Transport so pass this argument
along and ensure it's not nil using a constructor inside
model that guarantees that;
5. Cleanup netx's tests to avoid type asserting on the
typechains returned by netxlite since we already test
that inside netxlite;
6. Recognize that now we can make more legacy names inside
of netxlite private because we don't need to use them
inside tests anymore (because of previous point).
Reference issue: https://github.com/ooni/probe/issues/2121
This diff modifies netx to use netxlite to build the TLSDialer.
Building the TLSDialer entails building a TLSHandshaker.
While there, hide netxlite names we don't want to be public
and change netx tests to test for functionality.
To this end, refactor filtering to provide an easier to
use TLS server. We don't need the complexity of proxying
rather we need to provoke specific errors.
Part of https://github.com/ooni/probe/issues/2121
By just storing the raw certificate we simplify the internal data
structure we use. In turn, this enables us to write better unit tests
using github.com/google/go-cmp where we can construct the expected
result and compare with that. (Yeah, in principle we could also
construct the full certificate but I'm not sure it's worth the effort
since we basically only care about the raw certificate.)
The general idea here is to make tracex more tested. Once it's more
tested, I will create separate structs for each event, which is
something that measurex also does. Once that is done, we can start
ensuring that the code in measurex and the code in tracex do the
same thing in terms of storing observations. When also this is done,
we can then rewrite measurex to use tracex directly.
The overall goal is https://github.com/ooni/probe/issues/2035.
There are two reasons why this is beneficial:
1. github.com/google/go-cmp is more annoying to use for comparing
data structures when there are interfaces to compare. Sure, there's
a recipe for teaching it to compare errors, but how about making
the errors trivially comparable instead?
2. if we want to send errors over the network, JSON serialization
works but we cannot unmarshal the resulting string back to an error,
so how about making this representation trivial to serialize (we
are not going this now, but we need this property for websteps and
it may be sensible to try to avoid to have duplicate code because
of that -- measurex currently duplicates many tracex functionality
and this is quite unfortunate because it slows development down)
Additionally, if an error is a string:
3. we can very easily use a switch for comparing its possible
values with "" representing the absence of errors, while it is
more complex to do the same when using a nullable string or even
an error (i.e., an interface)
4. if a type is not nullable, it's easier to write safe code for
it and we may want to refactor experiments to use the internal
representation of measurements for more robust processing code
For all these reasons, let's internally use strings in tracex.
The overall aim here is to reduce the duplicated code between pre
and post-measurex measurements (see https://github.com/ooni/probe/issues/2035).
This diff forward ports b606494db8a9293384efaf5c33a88601f6e1e2a6
to the main development branch.
Dnscheck is emitting progress and the experiment controller is
also emitting progress. This messes up the progress bar.
See https://github.com/ooni/probe/issues/2058#issuecomment-1141638067
* refactor: move tracex outside of engine/netx
Consistently with https://github.com/ooni/probe/issues/2121 and
https://github.com/ooni/probe/issues/2115, we can now move tracex
outside of engine/netx. The main reason why this makes sense now
is that the package is now changed significantly from the one
that we imported from ooni/probe-engine.
We have improved its implementation, which had not been touched
significantly for quite some time, and converted it to unit
testing. I will document tomorrow some extra work I'd like to
do with this package but likely could not do $soon.
* go fmt
* regen tutorials
The exercise already allowed me to notice issues such as fields not
being properly initialized by savers.
This is one of the last steps before moving tracex away from the
internal/netx package and into the internal package.
See https://github.com/ooni/probe/issues/2121
Tracex contained some fragile code that assembled HTTP measurements
from scattered events, which worked because we were sure we were
performing a single measurement at any given time.
This diff restructures the code to emit a transaction-start and a
transaction-done events only. We have basically removed all the other
events (which we were not using). We kept the transaction-start
though, because it may be useful to see it when reading events. In
any case, what matters here is that we're now using the transaction-done
event aline to generate the archival HTTP measurement.
Hence, the original issue has been addressed. We will possibly
do more refactoring in the future, but for now this seems sufficient.
Part of https://github.com/ooni/probe/issues/2121
The main issue I see inside tracex at the moment is that we
construct the HTTP measurement from separate events.
This is fragile because we cannot be sure that these events
belong to the same round trip. (Currently, they _are_ part
of the same round trip, but this is a fragile assumption and
it would be much more robust to dispose of it.)
To prepare for emitting a single event, it's imperative to
have two distinct fields for HTTP request and response headers,
which is the main contribution in this commit.
Then, we have a bunch of smaller changes including:
1. correctly naming 'response' the DNS response (instead of 'reply')
2. ensure we always use pointer receivers
Reference issue: https://github.com/ooni/probe/issues/2121
Rather than matching a string, match a type.
This is more robust considering future refactorings.
We're confident the names did not change in _this_ refactoring
because we're still testing the same strings in the tests.
Part of https://github.com/ooni/probe/issues/2121
Acknowledge that transports MAY be used in isolation (i.e., outside
of a Resolver) and add support for wrapping.
Ensure that every factory that creates an unwrapped type is named
accordingly to hopefully ensure there are no surprises.
Implement DNSTransport wrapping and use a technique similar to the
one used by Dialer to customize the DNSTransport while constructing
more complex data types (e.g., a specific resolver).
Ensure that the stdlib resolver's own "getaddrinfo" transport (1)
is wrapped and (2) could be extended during construction.
This work is part of my ongoing effort to bring to this repository
websteps-illustrated changes relative to netxlite.
Ref issue: https://github.com/ooni/probe/issues/2096
This diff modifies the system resolver to use a getaddrinf transport.
Obviously the transport is a fake, but its existence will allow us
to observe DNS events more naturally.
A lookup using the system resolver would be a ANY lookup that will
contain all the resolved IP addresses into the same response.
This change was also part of websteps-illustrated, albeit the way in
which I did it there was less clean than what we have here.
Ref issue: https://github.com/ooni/probe/issues/2096
Rather than passing functions to construct complex objects such
as Dialer and QUICDialer, pass interface implementations.
Ensure that a nil implementation does not cause harm.
Make Saver implement the correct interface either directly or
indirectly. We need to implement the correct interface indirectly
for TCP conns (or connected UDP sockets) because we have two
distinct use cases inside netx: observing just the connect event
and observing just the I/O events.
With this change, the construction of composed Dialers and
QUICDialers is greatly simplified and more obvious.
Part of https://github.com/ooni/probe/issues/2121
The code that is now into the tracex package was written a long
time ago, so let's start to make it more in line with the coding
style of packages that were written more recently.
I didn't apply all the changes I'd like to apply in a single diff
and for now I am committing just this diff.
Broadly, what we need to do is:
1. improve documentation
2. ~always use pointer receivers (object receives have the issue
that they are not mutable by accident meaning that you can mutate
them but their state do not change after the call returns, which
is potentially a source of bugs in case you later refactor to use
a pointer receiver, so always use pointer receivers)
3. ~always avoid embedding (let's say we want to avoid embedding
for types we define and it's instead fine to embed types that are
defined in the stdlib: if later we add a new method, we will not
see a broken build and we'll probably forget to add the new method
to all wrappers -- conversely, if we're wrapping rather than
embedding, we'll see a broken build and act accordingly)
4. prefer unit tests and group tests by type being tested rather
than using a flat structure for tests
There's a coverage slippage that I'll compensate in a follow-up diff where I'll focus on unit testing.
Reference issue: https://github.com/ooni/probe/issues/2121
This diff creates a new package under netx called tracex that
contains everything we need to perform measurements using events
tracing and postprocessing (which is the technique with which
we implement most network experiments).
The general idea here is to (1) create a unique package out of
all of these packages; (2) clean up the code a bit (improve tests,
docs, apply more recent code patterns); (3) move the resulting
code as a toplevel package inside of internal.
Once this is done, netx can be further refactored to avoid
subpackages and we can search for more code to salvage/refactor.
See https://github.com/ooni/probe/issues/2121
This diff modifies the construction of a dialer to allow one
to insert custom dialer wrappers into the dialers chain.
The point of the chain in which we allow custom wrappers is the
optimal one for connect, read, and write measurements.
This new design is better than the previous netx design since
we don't need to construct the whole chain manually now.
The work in this diff is part of the effort to make engine/netx
just a tiny wrapper around netxlite.
See https://github.com/ooni/probe/issues/2121.
This diff required us to move some code around, but no major
change actually happened, except better tests.
While there, I also slightly refactored ndt7's implementation and
removed the ProxyURL setting, which was actually unused.
See https://github.com/ooni/probe/issues/2121
This diff replaces engine/netx code with netxlite code in
the engine/session.go file. To this end, we needed to move
some code from engine/netx to netxlite. While there, we
did review and improve the unit tests.
A notable change in this diff is (or seems to be) that in
engine/session.go we're not filtering for bogons anymore so
that, in principle, we could believe a resolver returning
to us bogon IP addresses for OONI services. However, I did
not bother with changing bogons filtering because the
sessionresolver package is already filtering for bogons,
so it is actually okay to avoid doing that again the
session.go code. See:
https://github.com/ooni/probe-cli/blob/v3.15.0-alpha.1/internal/engine/internal/sessionresolver/resolvermaker.go#L88
There are two reference issues for this cleanup:
1. https://github.com/ooni/probe/issues/2115
2. https://github.com/ooni/probe/issues/2121
In https://github.com/ooni/probe/issues/2029#issuecomment-1140805266, we
explained why calling it "netgo" would be incorrect.
In other words, we can get the platform's `getaddrinfo` as long as
we're not cross compiling. We do cross compile `ooniprobe`, actually
it's not even possible to cross compile it.
For increased accuracy, we should stop cross compiling `miniooni`
as well, so it would also directly use `getaddrinfo`.
This diff fixes at the same time ooni/probe-cli and ooni/spec
and we'll open two pull requests in parallel.
After https://github.com/ooni/probe-cli/pull/764, the build for
CGO_ENABLED=0 has been broken for miniooni:
https://github.com/ooni/probe-cli/runs/6636995859?check_suite_focus=true
Likewise, it's not possible to run tests with CGO_ENABLED=0.
To make tests work with `CGO_ENABLED=0`, I needed to sacrifice some
unit tests run for the CGO case. It is not fully clear to me what was happening
here, but basically `getaddrinfo_cgo_test.go` was compiled with CGO
being disabled, even though the ``//go:build cgo` flag was specified.
Additionally, @hellais previously raised a valid point in the review
of https://github.com/ooni/probe-cli/pull/698:
> Another issue we should consider is that, if I understand how
> this works correctly, depending on whether or not we have built
> with CGO_ENABLED=0 on or not, we are going to be measuring
> things in a different way (using our cgo inspired getaddrinfo
> implementation or using netgo). This might present issues when
> analyzing or interpreting the data.
>
> Do we perhaps want to add some field to the output data format that
> gives us an indication of which DNS resolution code was used to
> generate the the metric?
This comment is relevant to the current commit because
https://github.com/ooni/probe-cli/pull/698 is the previous
iteration of https://github.com/ooni/probe-cli/pull/764.
So, while fixing the build and test issues, let us also distinguish
between the CGO_ENABLED=1 and CGO_ENABLED=0 cases.
Before this commit, OONI used "system" to indicate the case where
we were using net.DefaultResolver. This behavior dates back to the
Measurement Kit days. While it is true that ooni/probe-engine and
ooni/probe-cli could have been using netgo in the past when we
said "system" as the resolver, it also seems reasonable to continue
to use "system" top indicate getaddrinfo.
So, the choice here is basically to use "netgo" from now on to
indicate the cases in which we were built with CGO_ENABLED=0.
This change will need to be documented into ooni/spec along with
the introduction of the `android_dns_cache_no_data` error.
## Checklist
- [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md)
- [x] reference issue for this pull request: https://github.com/ooni/probe/issues/2029
- [x] if you changed anything related how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: https://github.com/ooni/spec/pull/242
To this end, we need to refactor the implementation to give the
DNSOverUDPChannel owenership over the net.Conn.
Once this happens, DNSOverUDPChannel.Close closes the conn.
When the conn is closed, the background goroutine will terminate
immediately because any blocking I/O operation will be immediately
unblocked and return net.ErrClosed.
See https://github.com/ooni/probe/issues/2099#issuecomment-1139066946
This diff introduces support for observing additional DNS-over-UDP
responses in some censored environments (e.g. China).
After some uncertainty around whether to use connected or unconnected
UDP sockets, I eventually settled for connected.
Here's a recap:
| | connected | unconnected |
| ----------------------- | --------- | ----------- |
| see ICMP errors | ✔️ | ❌ |
| responses from any server | ❌ | ✔️ |
Because most if not all DNS resolvers expect answers from exactly
the same servers to which they sent the query, I would say that
it's more important to have some limited ability of observing the
effect of ICMP errors (e.g., host_unreachable when we set a low
TTL and send out a query to a server).
Therefore, my choice was to modify the existing DNS-over-UDP transport.
Here's an overview of the changes:
1. introduce a new API for performing an async round trip that returns
a channel wrapper where all responses are posted. The channel will not ever
be closed, so the reader needs to use select for safely reading. If the
reader users the wrapper's Next or TryNextResponses methods, these details
do not matter because they already implement a safe reading pattern.
2. the async round trip API performs the round trip in the background
and stops processing when it sees the first error.
3. the background running code will use an overall deadline derived
from the DNSTransport.IOTimeout field to know when to stop.
4. the background running code will additionally stop running if
noone is reading the channel and there are no empty slots in the
channel's buffer.
5. the RoundTrip method has been rewritten in terms of the async API.
The design I'm using here implements the proposal for async round
trips defined at https://github.com/ooni/probe/issues/2099. I have
chosen not to make all transports async because the DNS transport
seems the only transport that needs to also work in async mode.
While there, I noticed that we were not propagating CloseIdleConnection
to the underlying dialer, which was potentially wrong, so I did it.
This diff refactors the DNSTransport model to receive in input a DNSQuery and return in output a DNSResponse.
The design of DNSQuery and DNSResponse takes into account the use case of a transport using getaddrinfo, meaning that we don't need to serialize and deserialize messages when using getaddrinfo.
The current codebase does not use a getaddrinfo transport, but I wrote one such a transport in the Websteps Winter 2021 prototype (https://github.com/bassosimone/websteps-illustrated/).
The design conversation that lead to producing this diff is https://github.com/ooni/probe/issues/2099
These two small packages could easily be merged into the model
package, since they're clearly model-like packages.
Part of https://github.com/ooni/probe/issues/2115
The objective is to make PR checks run much faster.
See https://github.com/ooni/probe/issues/2113 for context.
Regarding netxlite's tests:
Checking for every commit on master or on a release branch is
good enough and makes pull requests faster than one minute since
netxlite for windows is now 1m slower than coverage.
We're losing some coverage but coverage from integration tests
is not so good anyway, so I'm not super sad about this loss.
Rather than building for Android using ooni/go, we're now using
ooni/oocryto as the TLS dependency. Such a repository only forks
crypto/tls and some minor crypto packages and includes the
same set of patches that we have been using in ooni/go.
This new strategy should be better than the previous one in
terms of building for Android, because we can use the vanilla
go1.18.2 build. It also seems that it is easier to track and
merge from upstream with ooni/oocrypto than it is with ooni/go.
Should this assessment be wrong, we can revert back to the
previous scenario where we used ooni/go.
See https://github.com/ooni/probe/issues/2106 for extra context.
* Passed the TestHelpers field to RunAsyc and MeasureAsync. This reflects the test_helpers in the measurement.
* Spec already contains the correct output.
See https://github.com/ooni/probe/issues/2073
Co-authored-by: decfox <decfox>
Previously, the DNS decoder did not check whether it was parsing
a DNS query or a DNS response, which was wrong.
As a side note, it seems I am using "reply" in the codebase instead
of "response". The latter seems correct DNS terminology.
This diff has been extracted from 9249d14f80
See https://github.com/ooni/probe/issues/2096.
This diff has been extracted and adapted from 8848c8c516
The reason to prefer composition over embedding is that we want the
build to break if we add new methods to interfaces we define. If the build
does not break, we may forget about wrapping methods we should
actually be wrapping. I noticed this issue inside netxlite when I was working
on websteps-illustrated and I added support for NS and PTR queries.
See https://github.com/ooni/probe/issues/2096
While there, perform comprehensive netxlite code review
and apply minor changes and improve the docs.
This diff has been extracted from c2f7ccab0e
See https://github.com/ooni/probe/issues/2096
While there, export DecodeReply to decode a raw reply without
interpreting the Rcode or parsing the results, which seems a
nice extra feature to have to more flexibly parse DNS replies
in other parts of the codebase.
This error occurred for example when querying kazemjalali.com
in websteps measurements run from Iran.
This error is relatively uncommon, but it still makes sense to
create a specific mapping rule for it.
Originally: 4269e82fbd
See https://github.com/ooni/probe/issues/2096
I've seen some measurements returning some IP addresses for HTTPSSvc
queries but not returning any ALPN value.
For example:
```
% d4
decoding DNS round trip 0:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57768
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;psiphon.ca. IN HTTPS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57768
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;psiphon.ca. IN HTTPS
;; ANSWER SECTION:
psiphon.ca. 121 IN A 31.13.85.53
```
Now, the response is clearly bogus. At the time of this writing that
IP address belongs to Facebook. This measurement has been collected in
China, so it's expected for the GFW to behave like this.
Yet, I don't feel like it's accurate to report this measurement as a
"no answer" response. Rather, this response is a valid one containing
a clearly invalid IP address and should be flagged as such.
Originally: 57a023bcf4
See https://github.com/ooni/probe/issues/2096
This diff fixes the short-circuit-IP-addr resolver to
correctly handle IP addrs during LookupHTTPS.
The original diff was: 2b51d144bf
See https://github.com/ooni/probe/issues/2096
While there, add unit tests for IPv6.
This diff re-implements the vanilla_tor experiment. This experiment was
part of the ooni/probe-legacy implementation.
The reference issue is https://github.com/ooni/probe/issues/803. We didn't
consider the possible improvements mentioned by the
https://github.com/ooni/probe/issues/803#issuecomment-598715694 comment,
which means we'll need to create a follow-up issue for them. We will
then decide whether, when, and how to implement those follow-up measurements
either into `vanilla_tor` or into the existing `tor` experiment.
This novel `vanilla_tor` implementation emits test_keys that are mostly
compatible with the original implementation, however:
1. the `timeout` is a `float64` rather than integer (but the default
timeout is an integer, so there are no JSON-visible changes);
2. the `tor_log` string is gone and replaced by the `tor_logs` list
of strings, which contains the same information;
3. the definition of `error` has been augmented to include the
case in which there is an unknown error;
4. the implementation of vanilla_tor mirrors closely the one of torsf
and we have taken steps to make the two implementations as comparable
as possible in terms of the generated JSON measurement.
The main reason why we replaced `tor_log` with `tor_logs` are:
1. that `torsf` already used that;
2. that reading the JSON is easier with this implementation compared to
an implementation where all logs are into the same string.
If one is processing the new data format using Python, then it will
not be difficult convert `tor_log` to `tor_logs`. In any case, because
we extract the most interesting fields (e.g., the percentage of the
bootstrap where tor fails), it seems that logs are probably more useful
as something you want to read in edge cases (I guess).
Also, because we want `torsf` and `vanilla_tor` to have similar JSONs,
we renamed `torsf`'s `default_timeout` to `timeout`. This change has little
to none real-world impact, because no stable version of OONI Probe has
ever shipped a `torsf` producing the `default_timeout` field.
Regarding the structure of this diff, we have:
1. factored code to parse tor logs into a separate package;
2. implemented `vanilla_tor` as a stripped down `torsf` and added further
changes to ensure compatibility with the previous `vanilla_tor`'s data format;
3. improved `torsf` to merge back the changes in `vanilla_tor`, so the two
data formats of the two experiments are as similar as possible.
We believe producing as similar as possible data formats helps anyone who's
reading measurements generated by both experiments.
We have retained/introduced `vanilla_tor`'s `error` field, which is not very
useful when one has a more precise failure but is still what `vanilla_tor`
used to emit, so it makes sense to also have this field.
In addition to changing the implementation, we also updated the specs.
As part of our future work, we may want to consider factoring the common code
of these two experiments into the same underlying support library.
* quic-go upgrade: replaced Session/EarlySession with Connection/EarlyConnection
* quic-go upgrade: added context to RoundTripper.Dial
* quic-go upgrade: made corresponding changes to tutorial
* quic-go upgrade: changed sess variable instances to qconn
* quic-go upgrade: made corresponding changes to tutorial
* cleanup: remove unnecessary comments
Those comments made sense in terms of illustrating the changes
but they're going to be less useful once we merge.
* fix(go.mod): apparently we needed `go1.18.1 mod tidy`
VSCode just warned me about this. It seems fine to apply this
change as part of the pull request at hand.
* cleanup(netxlite): http3dialer can be removed
We used to use http3dialer to glue a QUIC dialer, which had a
context as its first argument, to the Dial function used by the
HTTP3 transport, which did not have a context as its first
argument.
Now that HTTP3 transport has a Dial function taking a context as
its first argument, we don't need http3dialer
anymore, since we can use the QUIC dialer directly.
Cc: @DecFox
* Revert "cleanup(netxlite): http3dialer can be removed"
This reverts commit c62244c620cee5fadcc2ca89d8228c8db0b96add
to investigate the build failure mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* chore(netx): show that test was already broken
We didn't see the breakage before because we were not using
the created transport, but the issue of using a nil dialer was
already present before, we just didn't see it.
Now we understand why removing the http3transport in
c62244c620cee5fadcc2ca89d8228c8db0b96add did cause the
breakage mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* fix(netx): convert broken integration test to working unit test
There's no point in using the network here. Add a fake dialer that
breaks and ensure we're getting the expected error.
We've now improved upon the original test because the original test was
not doing anything while now we're testing whether we get back a QUIC
dialer that _can be used_.
After this commit, I can then readd the cleanup commit
c62244c620cee5fadcc2ca89d8228c8db0b96add and it won't be
broken anymore (at least, this is what I expected to happen).
* Revert "Revert "cleanup(netxlite): http3dialer can be removed""
This reverts commit 0e254bfc6ba3bfd65365ce3d8de2c8ec51b925ff
because now we should have fixed the broken test.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
* tls_handshakes: add IP addresses
* tls_handshakes: extract ip from tcp-connect
* tls_handshake: switched to trace event
* saver.go: get remoteAddr before handshake
Not sure whether this is strictly necessary, but I'd rather take the
remoteAddr before calling Handshake, just in case a future version
of the handshake closes the `conn`. In such a case, `conn.RemoteAddr`
would return `nil` and we would crash here.
This occurred to me while reading once again the diff before merging.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
This diff changes the software name used by unattended runs for which
we did not override the default software name (`ooniprobe-cli`).
It will become `ooniprobe-cli-unattended`. This software name is in line
with the one we use for Android, iOS, and desktop unattended runs.
While working in this diff, I introduced string constants for the run
types and a string constant for the default software name.
See https://github.com/ooni/probe/issues/2081.
Here's the squash of the following patches that enable support
for go1.18 and update our dependencies.
This diff WILL need to be backported to the release/3.14 branch.
* chore: use go1.17.8
See https://github.com/ooni/probe/issues/2067
* chore: upgrade to probe-assets@v0.8.0
See https://github.com/ooni/probe/issues/2067.
* chore: update dependencies and enable go1.18
As mentioned in 7a0d17ea91,
the tree won't build with `go1.18` unless we say it does.
So, not only here we need to update dependencies but also we
need to explicitly say `go1.18` in the `go.mod`.
This work is part of https://github.com/ooni/probe/issues/2067.
* chore(coverage.yml): run with go1.18
This change will give us a bare minimum confidence that we're
going to build our tree using version 1.18 of golang.
See https://github.com/ooni/probe/issues/2067.
* chore: update user agent used for measuring
See https://github.com/ooni/probe/issues/2067
* chore: run `go generate ./...`
See https://github.com/ooni/probe/issues/2067
* fix(dialer_test.go): make test work with go1.17 and go1.18
1. the original test wanted the dial to fail, so ensure we're not
passing any domain name to exercise dialing not resolving;
2. match the end of the error rather than the whole error string.
Tested locally with both go1.17 and go1.18.
See https://github.com/ooni/probe-cli/pull/708#issuecomment-1096447186
This change should prevent old clients (e.g., Android 6) from
failing to perform a ndt7 experiment because their internal CA
bundle is now too old.
Reference issue: https://github.com/ooni/probe/issues/2031
While there, run `go mod tidy` to fix a minor inconsistence in
the current `go.mod` file.
This diff WILL require a backport to release/3.14.
This experiment pings a QUIC-able host. It can be used to measure QUIC availability independently from TLS.
This is the reference issue: https://github.com/ooni/probe/issues/1994
### A QUIC PING is:
- a QUIC Initial packet with a size of 1200 bytes (minimum datagram size defined in the [RFC 9000](https://www.rfc-editor.org/rfc/rfc9000.html#initial-size)),
- with a random payload (i.e. no TLS ClientHello),
- with the version string 0xbabababa which forces Version Negotiation at the server.
QUIC-able hosts respond to the QUIC PING with a Version Negotiation packet.
The input is a domain name or an IP address. The default port used by quicping is 443, as this is the port used by HTTP/3. The port can be modified with the `-O Port=` option.
The default number of repetitions is 10, it can be changed with `-O Repetitions=`.
### Usage:
```
./miniooni -i google.com quicping
./miniooni -i 142.250.181.206 quicping
./miniooni -i 142.250.181.206 -OPort=443 quicping
./miniooni -i 142.250.181.206 -ORepetitions=2 quicping
```
The oonireport client (re-)uploads a measurement report file. This can be helpful when the measurement was not uploaded at runtime.
Usage: `./oonireport upload <file>`, where `<file>` is a json(l) file containing one OONI measurement per line.
This pull request refers to https://github.com/ooni/probe/issues/2003 and https://github.com/ooni/probe/issues/950.
Co-authored-by: Simone Basso <bassosimone@gmail.com>
This commit forward ports 8f2d7945f806579af4d0495f4b8f5a6a01eefb0c, whose
commit message is as follows:
- - -
The discrepancy I was seeing between my local tests and tests run
in the CI is that my systemd is configured to use DoT.
Hence, it was bypassing iptables rules because the query was sent
over an encrypted tunnel. Using a pure Go resolver fixes since
that always uses UDP, so the filter works.
Also, reason that we want as minimal as possible tests, so refactor
a test so that we use just a resolver rather than an HTTP client, and,
while there, also enforce this resolver to be a pure Go resolver.
Reference issue: https://github.com/ooni/probe/issues/2016
This diff WILL need to be forward ported to master.
This diff forward ports b6db4f64dc83a2a27ee3ce6bba5ac93db922832d, whose
original log message is the following:
- - -
We're now using ooni/oohttp as our HTTP library in most cases.
A limitation of this library is that net/http/httptrace does not
work very well and reliably because (1) we need to use oohttp's
version of that code and (2) we cannot observe net events.
I noticed this fact because an integration test for collecting
HTTP performance metrics was broken.
The best solution here is to remove this functionality, since
it was basically unused in the repository. Only some integration
tests inside urlgetter bothered with these metrics.
A more clinical fix would have been to use ooni/oohttp/httptrace
instead of net/http/httptrace in the stdlib, but it does not
seem to be a good idea, given that those metrics were not used.
With this diff applied, we'll further reduce the number of locally
failing integration tests to just jafar-specific tests.
This diff WILL need to be forwardported to `master`.
We've just branched off the release/3.14 branch for finalizing
the release of 3.14.0, hence let's declare that from now on we're
3.15.0-alpha to avoid any confusion.
This issue aims at making life slighly better for users impacted by
sanctions whose iplookup may be quite slow in case there are timeouts
as documented in https://github.com/ooni/probe/issues/1988.
Cloudflare hosted services provide a certain service of `/cdn-cgi/trace` with their base url (for example, `www.cloudflare.com` or `www.nginx.com`), which can be used to obtain `ip` in the probe's `geolocate` feature.
The same feature was added in this pr, hence, increasing the number of `baseURL`s in `geolocate`.
Co-authored-by: Simone Basso <bassosimone@gmail.com>
This diff contains significant improvements over the previous
implementation of the torsf experiment.
We add support for configuring different rendezvous methods after
the convo at https://github.com/ooni/probe/issues/2004. In doing
that, I've tried to use a terminology that is consistent with the
names being actually used by tor developers.
In terms of what to do next, this diff basically instruments
torsf to always rendezvous using domain fronting. Yet, it's also
possible to change the rendezvous method from the command line,
when using miniooni, which allows to experiment a bit more. In the
same vein, by default we use a persistent tor datadir, but it's
also possible to use a temporary datadir using the cmdline.
Here's how a generic invocation of `torsf` looks like:
```bash
./miniooni -O DisablePersistentDatadir=true \
-O RendezvousMethod=amp \
-O DisableProgress=true \
torsf
```
(The default is `DisablePersistentDatadir=false` and
`RendezvousMethod=domain_fronting`.)
With this implementation, we can start measuring whether snowflake
and tor together can boostrap, which seems the most important thing
to focus on at the beginning. Understanding why the bootstrap most
often does not converge with a temporary datadir on Android devices
remains instead an open problem for now. (I'll also update the
relevant issues or create new issues after commit this.)
We also address some methodology improvements that were proposed
in https://github.com/ooni/probe/issues/1686. Namely:
1. we record the tor version;
2. we include the bootstrap percentage by reading the logs;
3. we set the anomaly key correctly;
4. we measure the bytes send and received (by `tor` not by `snowflake`, since
doing it for snowflake seems more complex at this stage).
What remains to be done is the possibility of including Snowflake
events into the measurement, which is not possible until the new
improvements at common/event in snowflake.git are included into a
tagged version of snowflake itself. (I'll make sure to mention
this aspect to @cohosh in https://github.com/ooni/probe/issues/2004.)
I have tested this integration test locally and it's now WAI.
It may be that it will fail again when run on GitHub Actions, which will
indicate we cannot fully trust Actions for running _some_ tests.
Closes https://github.com/ooni/probe/issues/1913.
This diff includes some final changes to be ready for blessing
a cli release. These changes are:
1. run `go generate ./...` to update the bundled CA
2. update the header we use for measuring
3. ensure `mk` uses the latest version of several tools
Reference issue: https://github.com/ooni/probe/issues/1845
This commit message is the same across probe-cli, probe-desktop,
and probe-android. With the changes contained in the enclosed
diff, I'm starting to add support for torsf for android and for
desktop.
When smoke testing that torsf was WAI, I also noticed that its
progress messages in output are too frequent. We may want to do
better in a future version when we'll be able to read `tor`'s
output. In the meanwhile, make the progress messages less
frequent and indicated the maximum runtime inside of the messages
themselves. This improved message, albeit not so nice from the
UX PoV, should at least provide a clue that we're not stuck.
Reference issue: https://github.com/ooni/probe/issues/1917
Reference issue: https://github.com/ooni/probe/issues/1917.
I needed to change the summary key type returned by `torsf` to be a value. It seems the DB layer assumes that. If we pass it a pointer, it panics because it's experiment a value rather than a pointer 🤷.
I have experimented with a new approach for embedding psiphon in
7fc0bcd97c.
It seems the build is still building and the experiment is still
running. With the new approach, we're now vendoring less dependencies,
which hopefully puts us in the right track to, one day, import
Psiphon as a normal Go dependency.
I'll make sure to report to the Psiphon team what is currently
preventing us from importing their ClientLibrary directly.
This work is part of https://github.com/ooni/probe/issues/1894.
As part of running the update, I run `go get -u -v ./...`, which
led to go-cmp also being updated in the process.
This diff contains the following changes and enhancements:
1. upgrade snowflake to v2
2. observe that we were not changing defaults from outside of snowflake.go, so remove code allowing to do that;
3. bump the timeout to 600 seconds (it seems 300 was not always enough based on my testing);
4. add useful knob to disable `torsf` progress (it's really annoying on console, we should do something about this);
5. ptx.go: avoid printing an error when the connection has just been closed;
6. snowflake: test AMP cache, see that it's not working currently, so leave it disabled.
Related issues: https://github.com/ooni/probe/issues/1845, https://github.com/ooni/probe/issues/1894, and https://github.com/ooni/probe/issues/1917.
This diff introduces a new package called `./internal/archival`. This package collects data from `./internal/model` network interfaces (e.g., `Dialer`, `QUICDialer`, `HTTPTransport`), saves such data into an internal tabular data format suitable for on-line processing and analysis, and allows exporting data into the OONI data format.
The code for collecting and the internal tabular data formats are adapted from `measurex`. The code for formatting and exporting OONI data-format-compliant structures is adapted from `netx/archival`.
My original objective was to _also_ (1) fully replace `netx/archival` with this package and (2) adapt `measurex` to use this package rather than its own code. Both operations seem easily feasible because: (a) this code is `measurex` code without extensions that are `measurex` related, which will need to be added back as part of the process; (b) the API provided by this code allows for trivially converting from using `netx/archival` to using this code.
Yet, both changes should not be taken lightly. After implementing them, there's need to spend some time doing QA and ensuring all nettests work as intended. However, I am planning a release in the next two weeks, and this QA task is likely going to defer the release. For this reason, I have chosen to commit the work done so far into the tree and defer the second part of this refactoring for a later moment in time. (This explains why the title mentions "1/N").
On a more high-level perspective, it would also be beneficial, I guess, to explain _why_ I am doing these changes. There are two intertwined reasons. The first reason is that `netx/archival` has shortcomings deriving from its original https://github.com/ooni/netx legacy. The most relevant shortcoming is that it saves all kind of data into the same tabular structure named `Event`. This design choice is unfortunate because it does not allow one to apply data-type specific logic when processing the results. In turn, this choice results in complex processing code. Therefore, I believe that replacing the code with event-specific data structures is clearly an improvement in terms of code maintainability and would quite likely lead us to more confidently change and evolve the codebase.
The second reason why I would like to move forward these changes is to unify the codepaths used for measuring. At this point in time, we basically have two codepaths: `./internal/engine/netx` and `./internal/measurex`. They both have pros and cons and I don't think we want to rewrite whole experiments using `netx`. Rather, what we probably want is to gradually merge these two codepaths such that `netx` is a set of abstractions on top of `measurex` (which is more low-level and has a more-easily-testable design). Because saving events and generating an archival data format out of them consists of at least 50% of the complexity of both `netx` and `measurex`, it seems reasonable to unify this archival-related part of the two codebases as the first step.
At the highest level of abstraction, these changes are part of the train of changes which will eventually lead us to bless `websteps` as a first class citizen in OONI land. Because `websteps` requires different underlying primitives, I chose to develop these primitives from scratch rather than wrestling with `netx`, which used another model. The model used by `websteps` is that we perform each operation in isolation and immediately we save the results, while `netx` creates whole data structures and collects all the events happening via tracing. We believe the model used by `websteps` to be better because it does not require your code to figure out everything that happened after the measurement, which is a source of subtle bugs in the current implementation. So, when I started implementing websteps I extracted the bits of `netx` that could also be beneficial to `websteps` into a separate library, thus `netxlite` was born.
The reference issue describing merging the archival of `netx` and `measurex` is https://github.com/ooni/probe/issues/1957. As of this writing the issue still references the original plan, which I could not complete by the end of this Sprint, so I am going to adapt the text of the issue to only refer to what was done in here next. Of course, I also need follow-up issues.
* chore(netxlite): add currently failing test case
This diff introduces a test cases that will fail because of the reason
explained in https://github.com/ooni/probe/issues/1965.
* chore(netxlite/iox_test.go): add failing unit tests
These tests directly show how the Go implementation of ReadAll
and Copy has the issue of checking for io.EOF equality.
* fix(netxlite): make {ReadAll,Copy}Context robust to wrapped io.EOF
The fix is simple: we just need to check for `errors.Is(err, io.EOF)`
after either io.ReadAll or io.Copy has returned. When this condition is
true, we need to convert the error back to `nil` as it ought to be.
While there, observe that the unit tests I committed in the previous
commit are wrongly asserting that the error must be wrapped. This
assertion is not correct, because in both cases we have just ensured
that the returned error is `nil` (i.e., success).
See https://github.com/ooni/probe/issues/1965.
* cleanup: remove previous workaround for wrapped io.EOF
These workarounds were partial, meaning that they would cover some
cases in which the issue occurred but not all of them.
Handling the problem in `netxlite.{ReadAll,Copy}Context` is the
right thing to do _as long as_ we always use these functions instead
of `io.{ReadAll,Copy}`.
This is why it's now important to ensure we clearly mention that
inside of the `CONTRIBUTING.md` guide and to also ensure that we're
not using these functions in the code base.
* fix(urlgetter): repair tests who assumed to see EOF error
Now that we have established that we should normalize EOF when
reading bodies like the stdlib does and now that it's clear why
our behavior diverged from the stdlib, we also need to repair
all the tests that assumed this incorrect behavior.
* fix(all): don't use io{,util}.{Copy,ReadAll}
* feat: add checks to ensure we don't use io.{Copy,ReadAll}
* doc(netxlite): document we know how to deal w/ wrapped io.EOF
* fix(nocopyreadall.bash): add exception for i/n/iox.go
The DNSClient type existed because the Resolver type did not
include CloseIdleConnections in its signature.
Now that Resolver includes CloseIdleConnections, the DNSClient
type has become unnecessary and can be safely removed.
See https://github.com/ooni/probe/issues/1956.
We recently started moving core data structures inside of the
internal/model package as detailed in https://github.com/ooni/probe/issues/1885.
The chief reason to do that is to have a set of fundamental
shared data types to help us rationalize the codebase.
This specific diff moves internal/netx/archival's core data types
inside the internal/model package. While there, it also refactors the
existing tests to improve their quality. Additionally, we also added
an extra test to ensure `ArchivalHTTPBody` is an alias for
`ArchivalMaybeBinaryData`, which is required to ensure the
custom JSON serialization process works for it.
We're doing that because both internal/netx/archival and
internal/measurex define their own archival data structures.
We developed measurex using its own structures because it
allowed to iterate more quickly. Now that we have sketched
out measurex, the time has come to consolidate.
My overall aim is to spend a few more hours this week on
engineering measurex. This work is preliminary work before
we finish up both measurex and websteps.
We described this cleanup in https://github.com/ooni/probe/issues/1957.
This diff addresses two items of https://github.com/ooni/probe/issues/1956:
> - [ ] we can remove legacy names from `./internal/engine/netx/resolver/legacy.go`
>
> - [ ] we can remove `DialTLSContext` from `./internal/engine/netx/resolver/tls_test.go`
More cleanups may follow.
This is another cleanup point mentioned by https://github.com/ooni/probe/issues/1956.
While there, fix a bunch of comments in jafar that were incorrectly
referring to the netx package name.
This diff addresses another point of https://github.com/ooni/probe/issues/1956:
> - [ ] observe that we're still using a bunch of private interfaces for common interfaces such as the `Dialer`, so we can get rid of these private interfaces and always use the ones in `model`, which allows us to remove a bunch of legacy wrappers
Additional cleanups may still be possible. The more I cleanup, the more I see
there's extra legacy code we can dispose of (which seems good?).