Is it possible to group by multiple dimensions in crossfilter?

囚心锁ツ 2021-02-02 03:36

For example, suppose we have data with book, author, and date information. Can we build a crossfilter that counts how many books each author has per month?

3 Answers
  • 2021-02-02 03:46

    I want to update an old answer with a new workaround described in: https://github.com/dc-js/dc.js/pull/91

    The performance of this approach hasn't been tested on large datasets.

      var cf = crossfilter([
      { date:"1 jan 2014", author: "Mr X", book: "Book 1" },
      { date:"2 jan 2014", author: "Mr X", book: "Book 2" },
      { date:"3 feb 2014", author: "Mr X", book: "Book 3" },
      { date:"1 mar 2014", author: "Mr X", book: "Book 4" },
      { date:"2 apr 2014", author: "Mr X", book: "Book 5" },
      { date:"3 apr 2014", author: "Mr X", book: "Book 6"},
      { date:"1 jan 2014", author: "Ms Y", book: "Book 7" },
      { date:"2 jan 2014", author: "Ms Y", book: "Book 8" },
      { date:"3 jan 2014", author: "Ms Y", book: "Book 9" },
      { date:"1 mar 2014", author: "Ms Y", book: "Book 10" },
      { date:"2 mar 2014", author: "Ms Y", book: "Book 11" },
      { date:"3 mar 2014", author: "Ms Y", book: "Book 12" },
      { date:"4 apr 2014", author: "Ms Y", book: "Book 13" }
      ]);
    
      var dimensionMonthAuthor = cf.dimension(function (d) {
        var thisDate = new Date(d.date);
        // stringify() here and parse() later to get keyed objects
        return JSON.stringify({ date: thisDate.getMonth(), author: d.author });
      });
    
      var group = dimensionMonthAuthor.group();
      // Parsing every key on each read could be very expensive for large groups.
      // Build new objects rather than overwriting d.key, so the group's own
      // string keys stay intact for later updates.
      var result = group.all().map(function (d) {
        // parse the JSON string created above
        return { key: JSON.parse(d.key), value: d.value };
      });
    
      console.log(result);
    

    Results in:

    [ { key: { date: 0, author: 'Mr X' },
        value: 2 },
      { key: { date: 0, author: 'Ms Y' },
        value: 3 },
      { key: { date: 1, author: 'Mr X' },
        value: 1 },
      { key: { date: 2, author: 'Mr X' },
        value: 1 },
      { key: { date: 2, author: 'Ms Y' },
        value: 3 },
      { key: { date: 3, author: 'Mr X' },
        value: 2 },
      { key: { date: 3, author: 'Ms Y' },
        value: 1 } ]
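
    For charting, it is often handy to pivot the flat list above into a nested lookup keyed by author and then by month. The following is a minimal sketch in plain JavaScript, assuming the group output has the shape shown above; `pivotByAuthor` is a hypothetical helper, not part of crossfilter.

```javascript
// Hypothetical helper (not part of crossfilter): turns
// [{ key: { date, author }, value }] into { author: { monthIndex: count } }.
function pivotByAuthor(rows) {
  var table = {};
  rows.forEach(function (row) {
    var author = row.key.author;
    if (!table[author]) {
      table[author] = {};
    }
    table[author][row.key.date] = row.value;
  });
  return table;
}

// Sample input copied from the results above.
var rows = [
  { key: { date: 0, author: 'Mr X' }, value: 2 },
  { key: { date: 0, author: 'Ms Y' }, value: 3 },
  { key: { date: 1, author: 'Mr X' }, value: 1 },
  { key: { date: 2, author: 'Mr X' }, value: 1 },
  { key: { date: 2, author: 'Ms Y' }, value: 3 },
  { key: { date: 3, author: 'Mr X' }, value: 2 },
  { key: { date: 3, author: 'Ms Y' }, value: 1 }
];

var booksByAuthor = pivotByAuthor(rows);
console.log(booksByAuthor['Ms Y'][2]); // 3 (books by Ms Y in March, month index 2)
```

    Months with no books for an author are simply absent from the lookup, so callers should treat a missing entry as zero.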
    
  • 2021-02-02 03:48

    I didn't find the accepted answer all that helpful.

    I used the following instead.

    I first made a keyed dimension (in your case, keyed by month). This assumes each record carries numeric `month` and `books` fields:

       var monthDimension = cf.dimension(function (d) {
         return +d['month'];
       });
    

    Next, I used a map-reduce approach on the keyed dimension to aggregate the books for each month.

    The grouping call:

    var monthTotals = monthDimension.group().reduce(reduceAddbooks, reduceRemovebooks, reduceInitialbooks).all();
    

    The map-reduce functions:

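    // Called when a record enters the group: add its books to the running total.
    function reduceAddbooks(p, v) {
        p.author = v['author']; // note: keeps only the most recently added author
        p.books += +v['books'];
        return p;
    }
    
    // Called when a record is filtered out: subtract what reduceAddbooks added.
    function reduceRemovebooks(p, v) {
        p.books -= +v['books'];
        return p;
    }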
    
    function reduceInitialbooks() {
        return {
            author:0,
            books:0
        };
    }
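
    If the goal is a per-author count inside each month group, the reducers can keep a small map of counts instead of a single value. This is a sketch under assumptions: each record has an `author` field, one record represents one book, and the names `reduceAddAuthor`, `reduceRemoveAuthor` and `reduceInitialAuthor` are hypothetical. The functions follow the `(p, v)` shape used by `group.reduce(add, remove, initial)`; here they are exercised by hand on a few records rather than through crossfilter.

```javascript
// Hypothetical reducers: p maps author name -> number of books.
function reduceAddAuthor(p, v) {
  p[v.author] = (p[v.author] || 0) + 1;
  return p;
}

function reduceRemoveAuthor(p, v) {
  p[v.author] -= 1;
  if (p[v.author] === 0) {
    delete p[v.author]; // drop authors with no remaining books
  }
  return p;
}

function reduceInitialAuthor() {
  return {};
}

// Exercise the reducers by hand on three January records.
var january = [
  { author: 'Mr X', book: 'Book 1' },
  { author: 'Mr X', book: 'Book 2' },
  { author: 'Ms Y', book: 'Book 7' }
];
var counts = january.reduce(reduceAddAuthor, reduceInitialAuthor());
console.log(counts['Mr X'], counts['Ms Y']); // 2 1

// Simulate a filter that excludes the first record.
reduceRemoveAuthor(counts, january[0]);
console.log(counts['Mr X'], counts['Ms Y']); // 1 1
```

    With crossfilter, the three functions would be passed to `.group().reduce(...)` on the month dimension, so that each month's value becomes an author-to-count map.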
    
  • 2021-02-02 03:51

    In pseudo-SQL terms, what you are trying to do is:

    SELECT COUNT(book)
    GROUP BY author, month
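
    To make the target concrete, the same grouping can be written as a plain JavaScript loop over the records. This is only a reference sketch for checking results on small inputs; it does not use crossfilter, `countByAuthorAndMonth` is a hypothetical name, and the records are assumed to carry a precomputed numeric `month` field.

```javascript
// Hypothetical reference implementation of
// "COUNT(book) GROUP BY author, month" without crossfilter.
function countByAuthorAndMonth(records) {
  var counts = new Map();
  records.forEach(function (r) {
    // Combine both fields into one key, one entry per (author, month) pair.
    var key = r.author + '|' + r.month;
    counts.set(key, (counts.get(key) || 0) + 1);
  });
  return counts;
}

var sample = [
  { author: 'Mr X', month: 0, book: 'Book 1' },
  { author: 'Mr X', month: 0, book: 'Book 2' },
  { author: 'Ms Y', month: 0, book: 'Book 7' },
  { author: 'Mr X', month: 1, book: 'Book 3' }
];

var sampleCounts = countByAuthorAndMonth(sample);
console.log(sampleCounts.get('Mr X|0')); // 2
```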
    

    The way I approach this type of problem is to combine the fields into a single dimension. In your case, I would concatenate the month and author into one key.

    Let this be our test data:

    var cf = crossfilter([
    { date:"1 jan 2014", author: "Mr X", book: "Book 1" },
    { date:"2 jan 2014", author: "Mr X", book: "Book 2" },
    { date:"3 feb 2014", author: "Mr X", book: "Book 3" },
    { date:"1 mar 2014", author: "Mr X", book: "Book 4" },
    { date:"2 apr 2014", author: "Mr X", book: "Book 5" },
    { date:"3 apr 2014", author: "Mr X", book: "Book 6"},
    { date:"1 jan 2014", author: "Ms Y", book: "Book 7" },
    { date:"2 jan 2014", author: "Ms Y", book: "Book 8" },
    { date:"3 jan 2014", author: "Ms Y", book: "Book 9" },
    { date:"1 mar 2014", author: "Ms Y", book: "Book 10" },
    { date:"2 mar 2014", author: "Ms Y", book: "Book 11" },
    { date:"3 mar 2014", author: "Ms Y", book: "Book 12" },
    { date:"4 apr 2014", author: "Ms Y", book: "Book 13" }
    ]);  
    

    The dimension is defined as follows:

    var dimensionMonthAuthor = cf.dimension(function (d) {
      var thisDate = new Date(d.date);
      return 'month='+thisDate.getMonth()+';author='+d.author;
    });
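
    One caveat with the dimension above: `new Date("1 jan 2014")` relies on non-standard date-string parsing, which is left to the JavaScript implementation. Where that is a concern, the month can be read from the string directly. A minimal sketch, assuming every date has the form "day mon year" with a three-letter English month abbreviation; `monthIndex` and `monthAuthorKey` are hypothetical helpers.

```javascript
var MONTHS = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
              'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

// Returns 0 for January ... 11 for December, or -1 if unrecognised.
function monthIndex(dateString) {
  var parts = dateString.trim().split(/\s+/);
  return MONTHS.indexOf((parts[1] || '').toLowerCase());
}

// Builds the same composite key format as the dimension above.
function monthAuthorKey(d) {
  return 'month=' + monthIndex(d.date) + ';author=' + d.author;
}

console.log(monthAuthorKey({ date: '3 feb 2014', author: 'Mr X', book: 'Book 3' }));
// month=1;author=Mr X
```

    `monthAuthorKey` could then be passed as the accessor to `cf.dimension` in place of the inline function.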
    

    Now a simple count reduction gives the number of books per author per month (i.e. per dimension key):

    // reduceCount() takes no arguments; it counts the records in each group.
    var monthAuthorCount = dimensionMonthAuthor.group().reduceCount().all();
    

    And the results are as follows:

    {"key":"month=0;author=Mr X","value":2}
    {"key":"month=0;author=Ms Y","value":3}
    {"key":"month=1;author=Mr X","value":1}
    {"key":"month=2;author=Mr X","value":1}
    {"key":"month=2;author=Ms Y","value":3}
    {"key":"month=3;author=Mr X","value":2}
    {"key":"month=3;author=Ms Y","value":1}
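
    Because the keys are plain strings, consumers usually need to split them back into fields. A minimal sketch, assuming the `month=<n>;author=<name>` format shown above; `parseMonthAuthorKey` is a hypothetical helper.

```javascript
// Hypothetical helper: splits "month=<n>;author=<name>" into its fields.
function parseMonthAuthorKey(key) {
  var match = /^month=(\d+);author=(.*)$/.exec(key);
  if (!match) {
    throw new Error('Unexpected key format: ' + key);
  }
  return { month: Number(match[1]), author: match[2] };
}

// Two rows copied from the results above.
var results = [
  { key: 'month=0;author=Mr X', value: 2 },
  { key: 'month=2;author=Ms Y', value: 3 }
];

// Build new objects instead of overwriting the string keys.
var parsed = results.map(function (d) {
  return { key: parseMonthAuthorKey(d.key), value: d.value };
});

console.log(parsed[1].key.author, parsed[1].key.month, parsed[1].value);
// Ms Y 2 3
```

    Note that string keys sort lexicographically, so with a full year of data `month=10` would come before `month=2`; zero-padding the month in the key is one way to avoid that.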
    