ooni-probe-cli/internal/experiment/webconnectivity/measurer.go


package webconnectivity

//
// Measurer
//

import (
	"context"
	"errors"
	"net/http/cookiejar"
	"sync"

	"github.com/ooni/probe-cli/v3/internal/atomicx"
	"github.com/ooni/probe-cli/v3/internal/engine/experiment/webconnectivity"
	"github.com/ooni/probe-cli/v3/internal/model"
	"golang.org/x/net/publicsuffix"
)

// Measurer for the web_connectivity experiment.
type Measurer struct {
	// Contains the experiment's config.
	Config *Config
}

// NewExperimentMeasurer creates a new model.ExperimentMeasurer.
func NewExperimentMeasurer(config *Config) model.ExperimentMeasurer {
	return &Measurer{
		Config: config,
	}
}

// ExperimentName implements model.ExperimentMeasurer.
func (m *Measurer) ExperimentName() string {
	return "web_connectivity"
}

// ExperimentVersion implements model.ExperimentMeasurer.
func (m *Measurer) ExperimentVersion() string {
	return "0.5.19"
}

// Run implements model.ExperimentMeasurer.
func (m *Measurer) Run(ctx context.Context, args *model.ExperimentArgs) error {
	// Reminder: when this function returns an error, the measurement result
	// WILL NOT be submitted to the OONI backend. You SHOULD only return an error
	// for fundamental errors (e.g., the input is invalid or missing).

	_ = args.Callbacks
	measurement := args.Measurement
	sess := args.Session

	// make sure we have a cancellable context such that we can stop any
	// goroutine running in the background (e.g., priority.go's ones)
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	// honour InputOrQueryBackend
	input := measurement.Input
	if input == "" {
		return errors.New("no input provided")
	}

	// convert the input string to a URL
	inputParser := &InputParser{
		AcceptedSchemes: []string{
			"http",
			"https",
		},
		AllowEndpoints: false,
		DefaultScheme:  "",
	}
	URL, err := inputParser.Parse(string(measurement.Input))
	if err != nil {
		return err
	}

	// initialize the experiment's test keys
	tk := NewTestKeys()
	measurement.TestKeys = tk

	// create variables required to run parallel tasks
	idGenerator := &atomicx.Int64{}
	wg := &sync.WaitGroup{}

	// create cookiejar
	jar, err := cookiejar.New(&cookiejar.Options{
		PublicSuffixList: publicsuffix.List,
	})
	if err != nil {
		return err
	}

	// obtain the test helper's address
	testhelpers, _ := sess.GetTestHelpersByName("web-connectivity")
	if len(testhelpers) < 1 {
		sess.Logger().Warnf("continuing without a valid TH address")
		tk.SetControlFailure(webconnectivity.ErrNoAvailableTestHelpers)
	}

	registerExtensions(measurement)

	// start background tasks
	resos := &DNSResolvers{
		DNSCache:     NewDNSCache(),
		Domain:       URL.Hostname(),
		IDGenerator:  idGenerator,
		Logger:       sess.Logger(),
		NumRedirects: NewNumRedirects(5),
		TestKeys:     tk,
		URL:          URL,
		ZeroTime:     measurement.MeasurementStartTimeSaved,
		WaitGroup:    wg,
		CookieJar:    jar,
		Referer:      "",
		Session:      sess,
		TestHelpers:  testhelpers,
		UDPAddress:   "",
	}
	resos.Start(ctx)

	// wait for background tasks to join
	wg.Wait()

	// If the context passed to us has been cancelled, we cannot
	// trust this experiment's results to be okay.
	if err := ctx.Err(); err != nil {
		return err
	}

	// perform any deferred computation on the test keys
	tk.Finalize(sess.Logger())

	// set the test helper we used
	// TODO(bassosimone): it may be more informative to know about all the
	// test helpers we _tried_ to use, however the data format does not have
	// support for that as far as I can tell...
	if th := tk.getTestHelper(); th != nil {
		measurement.TestHelpers = map[string]interface{}{
			"backend": th,
		}
	}

	// return whether there was a fundamental failure, which would prevent
	// the measurement from being submitted to the OONI collector.
	return tk.fundamentalFailure
}

// registerExtensions registers the extensions used by this
// experiment into the given measurement.
func registerExtensions(m *model.Measurement) {
	model.ArchivalExtHTTP.AddTo(m)
	model.ArchivalExtDNS.AddTo(m)
	model.ArchivalExtNetevents.AddTo(m)
	model.ArchivalExtTCPConnect.AddTo(m)
	model.ArchivalExtTLSHandshake.AddTo(m)
	model.ArchivalExtTunnel.AddTo(m)
}