Correct JSON structure to filter through data [closed]

Question

What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?

I'm having users sort their questions into:

Business
Entertainment
Other

Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it down to only "business" when I want to?

Answer 1:

In NoSQL databases you usually end up modeling your data structure for the use-cases you want to allow in your app.

It's a bit of a learning path, so I'll explain it below in four steps:

Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
Flat list and indexes: Combining the above two approaches, to make the result more scalable.
Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.

Tree by category

If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:

questionsByCategory: {
  Business: {
    question1: { ... },
    question4: { ... }
  },
  Entertainment: {
    question2: { ... },
    question5: { ... }
  },
  Other: {
    question3: { ... },
    question6: { ... }
  }
}

With the above structure, loading a list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value")...

But if you need a list of all questions, you have to read all categories and un-nest them client-side. If you need every question anyway, that is not a real problem, since you have to load them all regardless. But if you want to filter on some condition other than category, this may be wasteful.
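
As a rough illustration of that client-side un-nesting, here is a minimal sketch. It assumes the value of the questionsByCategory node has already been read into a plain object (the shape snapshot.val() would give you); the function name flattenQuestions and the sample data are made up for this example.

```javascript
// Flatten { category: { questionId: question } } into one array of questions.
// `byCategory` is a plain object shaped like the questionsByCategory node.
function flattenQuestions(byCategory) {
  const all = [];
  for (const [category, questions] of Object.entries(byCategory || {})) {
    for (const [id, question] of Object.entries(questions)) {
      // Keep the ID and category on each item, since the nesting is lost.
      all.push({ id, category, ...question });
    }
  }
  return all;
}

// Hypothetical sample data, standing in for a value read from the database.
const nestedSample = {
  Business: { question1: { difficulty: 1 }, question4: { difficulty: 2 } },
  Other: { question3: { difficulty: 2 } }
};

const flat = flattenQuestions(nestedSample);

// Filtering on anything other than category now happens on the client,
// after every category has been downloaded.
const medium = flat.filter((q) => q.difficulty === 2);
console.log(medium.map((q) => q.id)); // [ 'question4', 'question3' ]
```

Note that the filter only runs after all categories have been transferred, which is the wasteful part mentioned above.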

Flat list of questions, and querying

An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:

questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
}

Now, getting a list of all questions is easy, as you can just read them and loop over the results:

firebase.database().ref("questions").once("value").then(function(result) {
  result.forEach(function(snapshot) {
    console.log(snapshot.key+": "+snapshot.val().category);
  })
})

If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:

Get all Business questions:

firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...

Get all questions with difficulty 3:

firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...

This approach works quite well, unless you have huge numbers of questions.
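
One thing to watch with these queries: the child name is passed as a plain string, so a misspelled property does not fail, it simply matches no questions. Below is a minimal sketch of a small wrapper that guards against this. The names queryQuestions and QUERYABLE_FIELDS are hypothetical, and db is assumed to be a database instance such as firebase.database() from the SDK used above.

```javascript
// Properties present on each question in the flat list (an assumption
// based on the sample JSON: "category" and "difficulty").
const QUERYABLE_FIELDS = ["category", "difficulty"];

// Build and run an equality query on the flat "questions" list.
// `db` is expected to expose ref(), as firebase.database() does.
function queryQuestions(db, field, value) {
  if (!QUERYABLE_FIELDS.includes(field)) {
    // A misspelled child name would otherwise just return an empty result.
    throw new Error("Unknown question field: " + field);
  }
  return db.ref("questions").orderByChild(field).equalTo(value).once("value");
}
```

It would be called as queryQuestions(firebase.database(), "category", "Business"), which returns the same promise as the direct query.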

Flat list and indexes

If you have millions of questions, Firebase database queries may not perform well enough anymore for you. In that case you may need to combine the two approaches above, using a flat list to store the questions, and so-called (self-made) secondary indexes to perform the filtered lookups.

If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.

In this scenario, your JSON would look like:

questions: {
  question1: { category: "Business", difficulty: 1, ... },
  question2: { category: "Entertainment", difficulty: 1, ... },
  question3: { category: "Other", difficulty: 2, ... },
  question4: { category: "Business", difficulty: 2, ... },
  question5: { category: "Entertainment", difficulty: 3, ... },
  question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
  Business: {
    question1: true,
    question4: true
  },
  Entertainment: {
    question2: true,
    question5: true
  },
  Other: {
    question3: true,
    question6: true
  }
},
questionsByDifficulty: {
  "1": {
    question1: true,
    question2: true,
    question6: true
  },
  "2": {
    question3: true,
    question4: true
  },
  "3": {
    question5: true
  }
}

You see that we have a single flat list of the questions, plus a separate list for each property we want to filter on, holding the IDs of the questions for each value of that property. Those secondary lists are also often called (secondary) indexes, since they really serve as indexes on your data.
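
To make the relationship between the flat list and its indexes concrete, here is a minimal sketch that derives an index from the flat list in memory. The function name buildIndex and the sample data are made up for this example; in a real app you would more likely maintain the index whenever a question is written.

```javascript
// Build a secondary index of the form { propertyValue: { questionId: true } }.
// `questions` is a plain object shaped like the flat "questions" node.
function buildIndex(questions, property) {
  const index = {};
  for (const [id, question] of Object.entries(questions)) {
    // Keys in the JSON tree are strings, so numeric values become "1", "2", ...
    const value = String(question[property]);
    if (!index[value]) index[value] = {};
    index[value][id] = true;
  }
  return index;
}

// Hypothetical sample data, a subset of the flat list shown above.
const sampleQuestions = {
  question1: { category: "Business", difficulty: 1 },
  question2: { category: "Entertainment", difficulty: 1 },
  question5: { category: "Entertainment", difficulty: 3 }
};

const indexByDifficulty = buildIndex(sampleQuestions, "difficulty");
console.log(JSON.stringify(indexByDifficulty));
// {"1":{"question1":true,"question2":true},"3":{"question5":true}}
```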

To load the hard questions (difficulty 3) in the above structure, we take a two-step approach:

Load the question IDs with a direct lookup.
Load each question by its ID.

In code:

firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
  result.forEach(function(snapshot) {
    firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
      console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
    });
  })
})

If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:

firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
  var promises = [];
  result.forEach(function(snapshot) {
    promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
  })
  Promise.all(promises).then(function(questionSnapshots) {
    questionSnapshots.forEach(function(questionSnapshot) {
      console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
    })
  })
})

Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see "Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly".

Duplicating data

The code for the nested load/client-side join is a bit tricky to read at times. If you'd prefer only performing a single load, you could consider duplicating the data for each question into each secondary index too.

In this scenario, the secondary index would look like this:

questionsByCategory: {
  Business: {
    question1: { category: "Business", difficulty: 1, ... },
    question4: { category: "Business", difficulty: 2, ... }
  },
  ...
}

If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.

To an experienced NoSQL data modeler, however, this looks completely normal. We're trading off storing some extra data against the extra time/code it takes to load the data.

This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks choosing to sacrifice space (and thus store duplicate data) to get an easier and more scalable data model.
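
The cost of duplication is that every write has to reach all copies. A minimal sketch of one way to handle that, assuming the structure above: build a single object keyed by path and write it in one multi-location update. The function name buildFanOut and the ID question7 are made up for this example.

```javascript
// Build one multi-path update that stores a question in the main list
// and duplicates it under its category.
function buildFanOut(id, question) {
  const updates = {};
  updates["questions/" + id] = question;
  updates["questionsByCategory/" + question.category + "/" + id] = question;
  return updates;
}

// "question7" is a hypothetical new question.
const fanOut = buildFanOut("question7", { category: "Business", difficulty: 2 });
console.log(Object.keys(fanOut));

// With the SDK used above, the object could then be written in a single call:
// firebase.database().ref().update(fanOut);
```

Writing both paths in one update means the main list and the duplicated copy either both change or neither does, which keeps them from drifting apart.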

Source: https://stackoverflow.com/questions/58151590/correct-json-structure-to-filter-through-data

Tags

swift

firebase

firebase-realtime-database