To this end, we need to refactor the implementation to give the
DNSOverUDPChannel owenership over the net.Conn.
Once this happens, DNSOverUDPChannel.Close closes the conn.
When the conn is closed, the background goroutine will terminate
immediately because any blocking I/O operation will be immediately
unblocked and return net.ErrClosed.
See https://github.com/ooni/probe/issues/2099#issuecomment-1139066946
This diff introduces support for observing additional DNS-over-UDP
responses in some censored environments (e.g. China).
After some uncertainty around whether to use connected or unconnected
UDP sockets, I eventually settled for connected.
Here's a recap:
| | connected | unconnected |
| ----------------------- | --------- | ----------- |
| see ICMP errors | ✔️ | ❌ |
| responses from any server | ❌ | ✔️ |
Because most if not all DNS resolvers expect answers from exactly
the same servers to which they sent the query, I would say that
it's more important to have some limited ability of observing the
effect of ICMP errors (e.g., host_unreachable when we set a low
TTL and send out a query to a server).
Therefore, my choice was to modify the existing DNS-over-UDP transport.
Here's an overview of the changes:
1. introduce a new API for performing an async round trip that returns
a channel wrapper where all responses are posted. The channel will not ever
be closed, so the reader needs to use select for safely reading. If the
reader users the wrapper's Next or TryNextResponses methods, these details
do not matter because they already implement a safe reading pattern.
2. the async round trip API performs the round trip in the background
and stops processing when it sees the first error.
3. the background running code will use an overall deadline derived
from the DNSTransport.IOTimeout field to know when to stop.
4. the background running code will additionally stop running if
noone is reading the channel and there are no empty slots in the
channel's buffer.
5. the RoundTrip method has been rewritten in terms of the async API.
The design I'm using here implements the proposal for async round
trips defined at https://github.com/ooni/probe/issues/2099. I have
chosen not to make all transports async because the DNS transport
seems the only transport that needs to also work in async mode.
While there, I noticed that we were not propagating CloseIdleConnection
to the underlying dialer, which was potentially wrong, so I did it.
This diff refactors the DNSTransport model to receive in input a DNSQuery and return in output a DNSResponse.
The design of DNSQuery and DNSResponse takes into account the use case of a transport using getaddrinfo, meaning that we don't need to serialize and deserialize messages when using getaddrinfo.
The current codebase does not use a getaddrinfo transport, but I wrote one such a transport in the Websteps Winter 2021 prototype (https://github.com/bassosimone/websteps-illustrated/).
The design conversation that lead to producing this diff is https://github.com/ooni/probe/issues/2099
These two small packages could easily be merged into the model
package, since they're clearly model-like packages.
Part of https://github.com/ooni/probe/issues/2115
The objective is to make PR checks run much faster.
See https://github.com/ooni/probe/issues/2113 for context.
Regarding netxlite's tests:
Checking for every commit on master or on a release branch is
good enough and makes pull requests faster than one minute since
netxlite for windows is now 1m slower than coverage.
We're losing some coverage but coverage from integration tests
is not so good anyway, so I'm not super sad about this loss.
Rather than building for Android using ooni/go, we're now using
ooni/oocryto as the TLS dependency. Such a repository only forks
crypto/tls and some minor crypto packages and includes the
same set of patches that we have been using in ooni/go.
This new strategy should be better than the previous one in
terms of building for Android, because we can use the vanilla
go1.18.2 build. It also seems that it is easier to track and
merge from upstream with ooni/oocrypto than it is with ooni/go.
Should this assessment be wrong, we can revert back to the
previous scenario where we used ooni/go.
See https://github.com/ooni/probe/issues/2106 for extra context.
* Passed the TestHelpers field to RunAsyc and MeasureAsync. This reflects the test_helpers in the measurement.
* Spec already contains the correct output.
See https://github.com/ooni/probe/issues/2073
Co-authored-by: decfox <decfox>
Previously, the DNS decoder did not check whether it was parsing
a DNS query or a DNS response, which was wrong.
As a side note, it seems I am using "reply" in the codebase instead
of "response". The latter seems correct DNS terminology.
This diff has been extracted from 9249d14f80
See https://github.com/ooni/probe/issues/2096.
This diff has been extracted and adapted from 8848c8c516
The reason to prefer composition over embedding is that we want the
build to break if we add new methods to interfaces we define. If the build
does not break, we may forget about wrapping methods we should
actually be wrapping. I noticed this issue inside netxlite when I was working
on websteps-illustrated and I added support for NS and PTR queries.
See https://github.com/ooni/probe/issues/2096
While there, perform comprehensive netxlite code review
and apply minor changes and improve the docs.
This diff has been extracted from c2f7ccab0e
See https://github.com/ooni/probe/issues/2096
While there, export DecodeReply to decode a raw reply without
interpreting the Rcode or parsing the results, which seems a
nice extra feature to have to more flexibly parse DNS replies
in other parts of the codebase.
This error occurred for example when querying kazemjalali.com
in websteps measurements run from Iran.
This error is relatively uncommon, but it still makes sense to
create a specific mapping rule for it.
Originally: 4269e82fbd
See https://github.com/ooni/probe/issues/2096
I've seen some measurements returning some IP addresses for HTTPSSvc
queries but not returning any ALPN value.
For example:
```
% d4
decoding DNS round trip 0:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57768
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;psiphon.ca. IN HTTPS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57768
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;psiphon.ca. IN HTTPS
;; ANSWER SECTION:
psiphon.ca. 121 IN A 31.13.85.53
```
Now, the response is clearly bogus. At the time of this writing that
IP address belongs to Facebook. This measurement has been collected in
China, so it's expected for the GFW to behave like this.
Yet, I don't feel like it's accurate to report this measurement as a
"no answer" response. Rather, this response is a valid one containing
a clearly invalid IP address and should be flagged as such.
Originally: 57a023bcf4
See https://github.com/ooni/probe/issues/2096
This diff fixes the short-circuit-IP-addr resolver to
correctly handle IP addrs during LookupHTTPS.
The original diff was: 2b51d144bf
See https://github.com/ooni/probe/issues/2096
While there, add unit tests for IPv6.
This diff re-implements the vanilla_tor experiment. This experiment was
part of the ooni/probe-legacy implementation.
The reference issue is https://github.com/ooni/probe/issues/803. We didn't
consider the possible improvements mentioned by the
https://github.com/ooni/probe/issues/803#issuecomment-598715694 comment,
which means we'll need to create a follow-up issue for them. We will
then decide whether, when, and how to implement those follow-up measurements
either into `vanilla_tor` or into the existing `tor` experiment.
This novel `vanilla_tor` implementation emits test_keys that are mostly
compatible with the original implementation, however:
1. the `timeout` is a `float64` rather than integer (but the default
timeout is an integer, so there are no JSON-visible changes);
2. the `tor_log` string is gone and replaced by the `tor_logs` list
of strings, which contains the same information;
3. the definition of `error` has been augmented to include the
case in which there is an unknown error;
4. the implementation of vanilla_tor mirrors closely the one of torsf
and we have taken steps to make the two implementations as comparable
as possible in terms of the generated JSON measurement.
The main reason why we replaced `tor_log` with `tor_logs` are:
1. that `torsf` already used that;
2. that reading the JSON is easier with this implementation compared to
an implementation where all logs are into the same string.
If one is processing the new data format using Python, then it will
not be difficult convert `tor_log` to `tor_logs`. In any case, because
we extract the most interesting fields (e.g., the percentage of the
bootstrap where tor fails), it seems that logs are probably more useful
as something you want to read in edge cases (I guess).
Also, because we want `torsf` and `vanilla_tor` to have similar JSONs,
we renamed `torsf`'s `default_timeout` to `timeout`. This change has little
to none real-world impact, because no stable version of OONI Probe has
ever shipped a `torsf` producing the `default_timeout` field.
Regarding the structure of this diff, we have:
1. factored code to parse tor logs into a separate package;
2. implemented `vanilla_tor` as a stripped down `torsf` and added further
changes to ensure compatibility with the previous `vanilla_tor`'s data format;
3. improved `torsf` to merge back the changes in `vanilla_tor`, so the two
data formats of the two experiments are as similar as possible.
We believe producing as similar as possible data formats helps anyone who's
reading measurements generated by both experiments.
We have retained/introduced `vanilla_tor`'s `error` field, which is not very
useful when one has a more precise failure but is still what `vanilla_tor`
used to emit, so it makes sense to also have this field.
In addition to changing the implementation, we also updated the specs.
As part of our future work, we may want to consider factoring the common code
of these two experiments into the same underlying support library.
* quic-go upgrade: replaced Session/EarlySession with Connection/EarlyConnection
* quic-go upgrade: added context to RoundTripper.Dial
* quic-go upgrade: made corresponding changes to tutorial
* quic-go upgrade: changed sess variable instances to qconn
* quic-go upgrade: made corresponding changes to tutorial
* cleanup: remove unnecessary comments
Those comments made sense in terms of illustrating the changes
but they're going to be less useful once we merge.
* fix(go.mod): apparently we needed `go1.18.1 mod tidy`
VSCode just warned me about this. It seems fine to apply this
change as part of the pull request at hand.
* cleanup(netxlite): http3dialer can be removed
We used to use http3dialer to glue a QUIC dialer, which had a
context as its first argument, to the Dial function used by the
HTTP3 transport, which did not have a context as its first
argument.
Now that HTTP3 transport has a Dial function taking a context as
its first argument, we don't need http3dialer
anymore, since we can use the QUIC dialer directly.
Cc: @DecFox
* Revert "cleanup(netxlite): http3dialer can be removed"
This reverts commit c62244c620cee5fadcc2ca89d8228c8db0b96add
to investigate the build failure mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* chore(netx): show that test was already broken
We didn't see the breakage before because we were not using
the created transport, but the issue of using a nil dialer was
already present before, we just didn't see it.
Now we understand why removing the http3transport in
c62244c620cee5fadcc2ca89d8228c8db0b96add did cause the
breakage mentioned at
https://github.com/ooni/probe-cli/pull/715#issuecomment-1119450484
* fix(netx): convert broken integration test to working unit test
There's no point in using the network here. Add a fake dialer that
breaks and ensure we're getting the expected error.
We've now improved upon the original test because the original test was
not doing anything while now we're testing whether we get back a QUIC
dialer that _can be used_.
After this commit, I can then readd the cleanup commit
c62244c620cee5fadcc2ca89d8228c8db0b96add and it won't be
broken anymore (at least, this is what I expected to happen).
* Revert "Revert "cleanup(netxlite): http3dialer can be removed""
This reverts commit 0e254bfc6ba3bfd65365ce3d8de2c8ec51b925ff
because now we should have fixed the broken test.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
* tls_handshakes: add IP addresses
* tls_handshakes: extract ip from tcp-connect
* tls_handshake: switched to trace event
* saver.go: get remoteAddr before handshake
Not sure whether this is strictly necessary, but I'd rather take the
remoteAddr before calling Handshake, just in case a future version
of the handshake closes the `conn`. In such a case, `conn.RemoteAddr`
would return `nil` and we would crash here.
This occurred to me while reading once again the diff before merging.
Co-authored-by: decfox <decfox>
Co-authored-by: Simone Basso <bassosimone@gmail.com>
This diff changes the software name used by unattended runs for which
we did not override the default software name (`ooniprobe-cli`).
It will become `ooniprobe-cli-unattended`. This software name is in line
with the one we use for Android, iOS, and desktop unattended runs.
While working in this diff, I introduced string constants for the run
types and a string constant for the default software name.
See https://github.com/ooni/probe/issues/2081.
Here's the squash of the following patches that enable support
for go1.18 and update our dependencies.
This diff WILL need to be backported to the release/3.14 branch.
* chore: use go1.17.8
See https://github.com/ooni/probe/issues/2067
* chore: upgrade to probe-assets@v0.8.0
See https://github.com/ooni/probe/issues/2067.
* chore: update dependencies and enable go1.18
As mentioned in 7a0d17ea91,
the tree won't build with `go1.18` unless we say it does.
So, not only here we need to update dependencies but also we
need to explicitly say `go1.18` in the `go.mod`.
This work is part of https://github.com/ooni/probe/issues/2067.
* chore(coverage.yml): run with go1.18
This change will give us a bare minimum confidence that we're
going to build our tree using version 1.18 of golang.
See https://github.com/ooni/probe/issues/2067.
* chore: update user agent used for measuring
See https://github.com/ooni/probe/issues/2067
* chore: run `go generate ./...`
See https://github.com/ooni/probe/issues/2067
* fix(dialer_test.go): make test work with go1.17 and go1.18
1. the original test wanted the dial to fail, so ensure we're not
passing any domain name to exercise dialing not resolving;
2. match the end of the error rather than the whole error string.
Tested locally with both go1.17 and go1.18.
See https://github.com/ooni/probe-cli/pull/708#issuecomment-1096447186
This change should prevent old clients (e.g., Android 6) from
failing to perform a ndt7 experiment because their internal CA
bundle is now too old.
Reference issue: https://github.com/ooni/probe/issues/2031
While there, run `go mod tidy` to fix a minor inconsistence in
the current `go.mod` file.
This diff WILL require a backport to release/3.14.