How to purge old content in firebase realtime database

十年热恋 submitted on 2019-12-23 13:11:38

Question


I am using Firebase Realtime Database. Over time a lot of stale data has accumulated in it, and I have written a script to delete the stale content.

My Node structure looks something like this:

store
  - {store_name}
    - products
      - {product_name}
        - data
          - {date} e.g. 01_Sep_2017
            - some_event

Scale of the data

#Stores: ~110K
#Products: ~25

Context

I want to clean up all data that is about 30 months old. I tried the following approach:

For each store, traverse all the products and for each date, delete the node

I ran ~30 threads/script instances, each responsible for deleting one particular date of data in that month. With the above structure, the whole script takes ~12 hours to delete one month of data.

I have placed a cap on the number of pending calls in each script. The logs show that each script reaches the cap very quickly, so delete calls are fired much faster than Firebase completes them. Firebase is therefore the bottleneck.

I am running the purge script on the client side. To gain performance, the script should probably run close to the data to save network round-trip time.

Questions

Q1. How can I delete old Firebase nodes efficiently?

Q2. Is there a way to set a TTL on each node so that it is cleaned up automatically?

Q3. I have confirmed on multiple nodes that the data has been deleted, but the Firebase console does not show a decrease in data size. I also took a backup of the data, and it still contains some data that was not there when I checked the nodes manually. I want to know the reason behind this inconsistency.

Does Firebase perform soft deletions? That is, is the data still present in backups but not visible via the Firebase SDK or the Firebase console, because they honor soft deletes and backups don't?

Q4. For the whole time my script is running, I see a continuous rise in the bandwidth section. With the script below I only fire delete calls and do not read any data, yet I still see consistent database reads. Have a look at this screenshot.

Is this because of the callbacks for the deleted nodes?

Code

var stores = [];
var storeIndex = 0;
var products = [];
var productIndex = -1;

const month = 'Oct';
const year = 2017;

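if (process.argv.length < 4) {
  console.log("Usage: node purge.js $beginDate $endDate i.e. node purge 1 2 | Exiting..");
  process.exit(1);
}

// Parse as integers so the date comparisons below are numeric, not string-based
var beginDate = parseInt(process.argv[2], 10);
var endDate = parseInt(process.argv[3], 10);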

var numPendingCalls = 0;

const maxPendingCalls = 500;

/**
 * Url Pattern: /store/{domain}/products/{product_name}/data/{date}
 * date Pattern: 01_Jan_2017
 */
function deleteNode() {
  var storeName = stores[storeIndex],
    productName = products[productIndex],
    date = (beginDate < 10 ? '0' + beginDate : beginDate) + '_' + month + '_' + year;

  numPendingCalls++;

  db.ref('store')
    .child(storeName)
    .child('products')
    .child(productName)
    .child('data')
    .child(date)
    .remove(function() {
      numPendingCalls--;
    });
}

function deleteData() {
  productIndex++;

  // When all products for a particular store are complete, start for the new store for given date
  if (productIndex === products.length) {
    if (storeIndex % 1000 === 0) {
      console.log('Script: ' + beginDate, 'PendingCalls: ' + numPendingCalls, 'StoreIndex: ' + storeIndex, 'Store: ' + stores[storeIndex], 'Time: ' + (new Date()).toString());
    }

    productIndex = 0;
    storeIndex++;
  }

  // When all stores have been completed, start deleting for next date
  if (storeIndex === stores.length) {
    console.log('Script: ' + beginDate, 'Successfully deleted data for date: ' + beginDate + '_' + month + '_' + year + '. Time: ' + (new Date()).toString());
    beginDate++;
    storeIndex = 0;
  }

  // When you have reached endDate, all data has been deleted call the original callback
  if (beginDate > endDate) {
    console.log('Script: ' + beginDate, 'Deletion script finished successfully at: ' + (new Date()).toString());
    process.exit();
    return;
  }

  deleteNode();
}

function init() {
  console.log('Script: ' + beginDate, 'Deletion script started at: ' + (new Date()).toString());

  getStoreNames(function() {
    getProductNames(function() {
      setInterval(function() {
        if (numPendingCalls < maxPendingCalls) {
          deleteData();
        }
      }, 0);
    });
  });
}

PS: This is not my exact structure, but it is very similar. I have changed the node names and tried to keep the example realistic.


Answer 1:


  1. Whether the deletes can be done more efficiently depends on how you do them now. Since you didn't share the minimal code that reproduces your current behavior, it's hard to say how to improve it.

  2. There is no support for a time-to-live property on nodes. Typically developers do the clean-up in an administrative program/script that runs periodically. The more frequently you run the cleanup script, the less work it has to do, and thus the faster it will be.

    Also see:

    • Delete firebase data older than 2 hours
    • How to delete firebase data after "n" days
  3. Firebase actually deletes the data from disk when you tell it to. There is no way through the API to retrieve it, since it is really gone. But if you have a backup from a previous day, the data will of course still be there.
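
As a rough illustration of such a periodic cleanup script, here is a minimal sketch, assuming the date keys follow the `01_Sep_2017` pattern from the question. The helper names (`parseDateKey`, `isOlderThan`, `datePath`, `buildDeleteBatches`) are hypothetical and not part of any Firebase SDK. It relies on the documented behavior that writing `null` to a location deletes it, so several deletes can be grouped into one multi-location `update()` call. Whether that is actually faster for your data is something you would need to measure.

```javascript
// Sketch only: all helper names here are hypothetical, not Firebase APIs.
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Parse a key such as "01_Sep_2017" into a UTC Date; returns null if malformed.
function parseDateKey(key) {
  const match = /^(\d{2})_([A-Za-z]{3})_(\d{4})$/.exec(key);
  if (!match) return null;
  const monthIndex = MONTHS.indexOf(match[2]);
  if (monthIndex === -1) return null;
  return new Date(Date.UTC(Number(match[3]), monthIndex, Number(match[1])));
}

// True when the key's date is strictly before (now minus `months` months).
function isOlderThan(key, months, now) {
  const date = parseDateKey(key);
  if (!date) return false;
  // Date.UTC normalizes negative month values into earlier years.
  const cutoff = new Date(Date.UTC(
    now.getUTCFullYear(), now.getUTCMonth() - months, now.getUTCDate()));
  return date < cutoff;
}

// Full path of one date node, following the structure in the question.
function datePath(storeName, productName, dateKey) {
  return ['store', storeName, 'products', productName, 'data', dateKey].join('/');
}

// Group paths into multi-location update objects of the form { path: null }.
function buildDeleteBatches(paths, batchSize) {
  const batches = [];
  for (let i = 0; i < paths.length; i += batchSize) {
    const updates = {};
    for (const path of paths.slice(i, i + batchSize)) {
      updates[path] = null; // writing null deletes that location
    }
    batches.push(updates);
  }
  return batches;
}

// Usage (assumes `db` is an already-initialized database instance and
// `paths` was built with datePath for keys where isOlderThan is true):
//   for (const updates of buildDeleteBatches(paths, 100)) {
//     await db.ref().update(updates);
//   }
```

The batch size of 100 in the usage comment is an arbitrary starting point, not a recommended value. Awaiting each batch before sending the next keeps the number of pending writes bounded, much like the `maxPendingCalls` cap in the question.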



Source: https://stackoverflow.com/questions/47437885/how-to-purge-old-content-in-firebase-realtime-database
