Should I denormalize or run multiple queries in DocumentDb?

旧街凉风 提交于 2019-12-04 02:23:39
Andrew Liu

I believe you're on the right track in considering the trade-offs between normalizing or de-normalizing your project and employee data. As you've mentioned:

Scenario 1) If you de-normalize your data model (couple projects and employee data together) - you may find yourself having to update many projects when you update an employee.

Scenario 2) If you normalize your data model (decouple projects and employee data) - you would have to query for projects to retrieve employeeIds and then query for the employees if you wanted to get the list of employees belonging to a project.

I would pick the appropriate trade-off given your application's use case. In general, I prefer de-normalizing when you have a read-heavy application and normalizing when you have a write-heavy application.

Note that you can avoid having to make multiple roundtrips between your application and the database by leveraging DocumentDB's store procedures (queries would be performed on DocumentDB-server-side).

Here's an example store procedure for retrieving employees belonging to a specific projectId:

function(projectId) {
  /* the context method can be accessed inside stored procedures and triggers*/
  var context = getContext();
  /* access all database operations - CRUD, query against documents in the current collection */
  var collection = context.getCollection();
  /* access HTTP response body and headers from the procedure */
  var response = context.getResponse();

  /* Callback for processing query on projectId */
  var projectHandler = function(documents) {
    var i;
    for (i = 0; i < documents[0].projectTeam.length; i++) {
      // Query for the Employees
      queryOnId(documents[0].projectTeam[i].id, employeeHandler);
    }
  };

  /* Callback for processing query on employeeId */
  var employeeHandler = function(documents) {
    response.setBody(response.getBody() + JSON.stringify(documents[0]));
  };

  /* Query on a single id and call back */
  var queryOnId = function(id, callbackHandler) {
    collection.queryDocuments(collection.getSelfLink(),
      'SELECT * FROM c WHERE c.id = \"' + id + '\"', {},
      function(err, documents) {
        if (err) {
          throw new Error('Error' + err.message);
        }
        if (documents.length < 1) {
          throw 'Unable to find id';
        }
        callbackHandler(documents);
      }
    );
  };

  // Query on the projectId
  queryOnId(projectId, projectHandler);
}

Even though DocumentDB supports limited OR statements during the preview - you can still get relatively good performance by splitting the employeeId-lookups into a bunch of asynchronous server-side queries.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!