fix(measurex): use the same keys as the OONI data format (#572)

This change should simplify the pipeline's job.

Reference issue: https://github.com/ooni/probe/issues/1817.

I previously dismissed this possibility, but it now seems clear that
it is simpler to use a flat, tabular data format internally and to
convert it to OONI's data format only when serializing.
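
A minimal sketch of this approach, assuming a hypothetical flat `dnsLookupRow` type (one row per lookup; not the real `measurex` types). Rows stay tabular while processing, and the hypothetical `toArchival` helper builds nested `queries` entries keyed like [df-002-dnst](https://github.com/ooni/spec/blob/master/data-formats/df-002-dnst.md) only at serialization time. Only a subset of the archival keys is modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dnsLookupRow is a HYPOTHETICAL tabular record (one row per lookup).
// It illustrates the idea; it is not the actual measurex data type.
type dnsLookupRow struct {
	Domain    string
	QueryType string // "A" or "AAAA"
	Engine    string // e.g., "system"
	Addrs     []string
	Failure   *string // nil on success
}

// archivalAnswer and archivalQuery model only a SUBSET of the
// df-002-dnst keys; see ooni/spec for the complete format.
type archivalAnswer struct {
	AnswerType string `json:"answer_type"`
	IPv4       string `json:"ipv4,omitempty"`
	IPv6       string `json:"ipv6,omitempty"`
}

type archivalQuery struct {
	Answers   []archivalAnswer `json:"answers"`
	Engine    string           `json:"engine"`
	Failure   *string          `json:"failure"`
	Hostname  string           `json:"hostname"`
	QueryType string           `json:"query_type"`
}

// toArchival converts flat rows into nested archival "queries" entries.
// A row without addresses keeps a nil Answers slice, which
// encoding/json serializes as "answers": null.
func toArchival(rows []dnsLookupRow) []archivalQuery {
	out := make([]archivalQuery, 0, len(rows))
	for _, r := range rows {
		q := archivalQuery{
			Engine:    r.Engine,
			Failure:   r.Failure,
			Hostname:  r.Domain,
			QueryType: r.QueryType,
		}
		for _, addr := range r.Addrs {
			a := archivalAnswer{AnswerType: r.QueryType}
			if r.QueryType == "AAAA" {
				a.IPv6 = addr
			} else {
				a.IPv4 = addr
			}
			q.Answers = append(q.Answers, a)
		}
		out = append(out, q)
	}
	return out
}

func main() {
	failure := "dns_nxdomain_error"
	rows := []dnsLookupRow{
		// 192.0.2.1 is a documentation-only placeholder address.
		{Domain: "example.com", QueryType: "A", Engine: "system", Addrs: []string{"192.0.2.1"}},
		{Domain: "antani.ooni.org", QueryType: "A", Engine: "system", Failure: &failure},
	}
	data, err := json.Marshal(map[string]interface{}{"queries": toArchival(rows)})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", data)
}
```

In this sketch the failed lookup serializes with `"answers": null` and a non-null `"failure"`, much like the failed lookups shown in the tutorial output.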

The OONI data format is what the pipeline expects, but processing
measurements is easier with a more linear, tabular format.
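
To illustrate why a tabular layout eases processing, here is a sketch using a hypothetical `lookupRow` type (not part of `measurex`). Merging the A and AAAA results for each domain, which the archival format stores as separate nested `queries` entries, takes a single pass over the rows.

```go
package main

import "fmt"

// lookupRow is a HYPOTHETICAL flat record: one row per DNS lookup.
type lookupRow struct {
	Domain    string
	QueryType string // "A" or "AAAA"
	Addrs     []string
	Failure   *string // nil on success
}

// addrsByDomain merges the addresses of all successful lookups
// (A and AAAA alike) per domain with one loop over the table.
func addrsByDomain(rows []lookupRow) map[string][]string {
	out := make(map[string][]string)
	for _, r := range rows {
		if r.Failure != nil {
			continue // skip failed lookups
		}
		out[r.Domain] = append(out[r.Domain], r.Addrs...)
	}
	return out
}

func main() {
	rows := []lookupRow{
		// Documentation-only placeholder addresses.
		{Domain: "example.com", QueryType: "A", Addrs: []string{"192.0.2.1"}},
		{Domain: "example.com", QueryType: "AAAA", Addrs: []string{"2001:db8::1"}},
	}
	fmt.Println(addrsByDomain(rows)["example.com"]) // prints [192.0.2.1 2001:db8::1]
}
```

With nested data, the same query would first need to walk each measurement's `queries` and then each entry's `answers`.
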
Simone Basso
2021-11-05 10:46:45 +01:00
committed by GitHub
parent 6f90d29bfa
commit aa27bbe33f
35 changed files with 1571 additions and 1025 deletions
+12 -4
@@ -167,8 +167,12 @@ format, in the remainder of this program we're
going to serialize the `Measurement` to JSON and
print it to the standard output.
+Rather than serializing the raw `Measurement` struct,
+we first convert it to the "archival" format. This is the
+data format specified at [ooni/spec](https://github.com/ooni/spec/tree/master/data-formats).
```Go
-data, err := json.Marshal(m)
+data, err := json.Marshal(measurex.NewArchivalDNSMeasurement(m))
runtimex.PanicOnError(err, "json.Marshal failed")
fmt.Printf("%s\n", string(data))
```
@@ -194,13 +198,14 @@ go run -race ./internal/tutorial/measurex/chapter01 | jq
```
Where `jq` is being used to make the output more presentable.
If you do that you obtain some logging messages, which are out of
the scope of this tutorial, and the following JSON:
```JSON
{
"domain": "example.com",
-"lookup_host": [
+"queries": [
{
"answers": [
{
@@ -237,6 +242,9 @@ the scope of this tutorial, and the following JSON:
}
```
+This JSON [implements the df-002-dnst](https://github.com/ooni/spec/blob/master/data-formats/df-002-dnst.md)
+OONI data format.
You see that we have two messages here. OONI splits a DNS
resolution performed using the system resolver into two "fake"
DNS resolutions for A and AAAA. (Under the hood, this is
@@ -266,7 +274,7 @@ This is the output JSON:
```JSON
{
"domain": "antani.ooni.org",
-"lookup_host": [
+"queries": [
{
"answers": null,
"engine": "system",
@@ -327,7 +335,7 @@ To get this JSON:
```JSON
{
"domain": "example.com",
-"lookup_host": [
+"queries": [
{
"answers": null,
"engine": "system",
+12 -4
@@ -168,8 +168,12 @@ func main() {
// going to serialize the `Measurement` to JSON and
// print it to the standard output.
//
+// Rather than serializing the raw `Measurement` struct,
+// we first convert it to the "archival" format. This is the
+// data format specified at [ooni/spec](https://github.com/ooni/spec/tree/master/data-formats).
+//
// ```Go
-data, err := json.Marshal(m)
+data, err := json.Marshal(measurex.NewArchivalDNSMeasurement(m))
runtimex.PanicOnError(err, "json.Marshal failed")
fmt.Printf("%s\n", string(data))
// ```
@@ -195,13 +199,14 @@ func main() {
// ```
//
// Where `jq` is being used to make the output more presentable.
//
// If you do that you obtain some logging messages, which are out of
// the scope of this tutorial, and the following JSON:
//
// ```JSON
// {
// "domain": "example.com",
-// "lookup_host": [
+// "queries": [
// {
// "answers": [
// {
@@ -238,6 +243,9 @@ func main() {
// }
// ```
//
+// This JSON [implements the df-002-dnst](https://github.com/ooni/spec/blob/master/data-formats/df-002-dnst.md)
+// OONI data format.
+//
// You see that we have two messages here. OONI splits a DNS
// resolution performed using the system resolver into two "fake"
// DNS resolutions for A and AAAA. (Under the hood, this is
@@ -267,7 +275,7 @@ func main() {
// ```JSON
// {
// "domain": "antani.ooni.org",
-// "lookup_host": [
+// "queries": [
// {
// "answers": null,
// "engine": "system",
@@ -328,7 +336,7 @@ func main() {
// ```JSON
// {
// "domain": "example.com",
-// "lookup_host": [
+// "queries": [
// {
// "answers": null,
// "engine": "system",