I have the function below where I am trying to scrape 4 websites, and then combine the results into a spreadsheet. Is there a faster way to match over a large array that isn
The issue is that, for every row of data being compared, your script loops through a large dataset again, so the total work grows with the product of the two array sizes. A better approach is to build a lookup object that maps a unique identifier to the index of the row in the data array that you want to access:
```javascript
/**
 * Make an object from an Array[][] that has a unique identifier in one of the columns.
 * @param {Array[][]} data The 2D array of data to index, e.g. [ [r1c1, r1c2, ...], [r2c1, r2c2, ...], ... ]
 * @param {number} idColumn The (integer) column in the data array that is a unique row identifier,
 *     e.g. the column index that contains the product's serial number, in a data
 *     array that has only a single row per unique product.
 * @return {Object} An object that maps between an id and a row index, such that
 *     `object[id]` is the index of the row in data whose id column equals id.
 */
function makeKey(data, idColumn) {
  // JavaScript has no ValueError; use the built-in error types instead.
  if (!data || !data.length || !data[0].length)
    throw new TypeError("Input data argument is not Array[][]");
  // Assume the first column is the column with the unique identifier if not given by the caller.
  if (idColumn === undefined)
    idColumn = 0;
  // Prototype-less object, so ids like "constructor" don't collide with inherited properties.
  var key = Object.create(null);
  for (var r = 0, rows = data.length; r < rows; ++r) {
    var id = data[r][idColumn];
    // Test with `in`, not truthiness: row index 0 is falsy and would hide a duplicate.
    if (id in key)
      throw new Error("ID is not unique for id='" + id + "'");
    key[id] = r;
  }
  return key;
}
```
Usage:
```javascript
var database = someSheet.getDataRange().getValues();
var lookup = makeKey(database, 3); // column index 3, i.e. the 4th column, holds the unique values.
var newData = /* read a 2D array from somewhere */ [];
for (var r = 0, rows = newData.length; r < rows; ++r) {
  var id = newData[r][3];
  var existingIndex = lookup[id];
  // Compare against undefined: a match on the first row gives index 0, which is falsy.
  if (existingIndex !== undefined) {
    var oldDataRow = database[existingIndex];
  } else {
    // No existing data.
  }
}
```
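To try the same idea outside of Apps Script, here is a minimal, self-contained sketch in plain JavaScript that combines two 2D arrays (say, results scraped from two of the sites) on a shared id column. The helper name `mergeByKey` and the sample data are hypothetical. It also swaps the plain object for a `Map` and stores the matching row itself rather than its index; `Map` assumes a modern runtime such as Apps Script's V8 runtime.

```javascript
// Hypothetical helper: append the columns of the matching `right` row to each `left` row,
// using an index built once (about n + m steps) instead of a nested search (up to n * m).
function mergeByKey(left, right, leftIdCol, rightIdCol) {
  // Build the index once: id -> row of `right`.
  const index = new Map();
  for (const row of right) {
    const id = row[rightIdCol];
    if (index.has(id)) {
      throw new Error("ID is not unique for id='" + id + "'");
    }
    index.set(id, row);
  }
  // Each left row now costs one Map lookup instead of a scan of `right`.
  return left.map(function (row) {
    const match = index.get(row[leftIdCol]);
    // Drop the duplicate id column from the match; pad with blanks when nothing matched,
    // so every output row has the same width.
    const extra = match !== undefined
      ? match.filter(function (_, c) { return c !== rightIdCol; })
      : new Array(right.length ? right[0].length - 1 : 0).fill("");
    return row.concat(extra);
  });
}

// Hypothetical sample data: [sku, price] from one site, [sku, stock] from another.
const siteA = [["A1", 9.99], ["B2", 4.5], ["C3", 12]];
const siteB = [["C3", 7], ["A1", 0]];
const combined = mergeByKey(siteA, siteB, 0, 0);
console.log(combined); // [["A1", 9.99, 0], ["B2", 4.5, ""], ["C3", 12, 7]]
```

A `Map` keeps keys in their original type, so the number `1` and the string `"1"` stay distinct, whereas plain-object keys are always converted to strings. The resulting 2D array can then be written to a sheet in a single call.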
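By making a lookup object for your data arrays, you no longer have to re-search them for every row: the search is done once, while building the key, and the relationship between id and row is stored instead of being discarded after each comparison. Each lookup is then a single property access, so matching m new rows against n existing rows takes roughly n + m steps rather than up to n × m comparisons. Note that the key is built from a specific property of the data that is unique per row. Without such a unique property, this particular indexing approach won't work, but a different one will.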
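One such alternative, when the id column is not unique, is to map each id to the list of every row index that carries it. A minimal sketch, again using a `Map`; the name `makeMultiKey` and the sample data are hypothetical:

```javascript
// Hypothetical variant of makeKey for columns whose values are NOT unique:
// each id maps to an array of every row index that carries it.
function makeMultiKey(data, idColumn) {
  if (idColumn === undefined) idColumn = 0; // same default as makeKey
  const key = new Map();
  data.forEach(function (row, r) {
    const id = row[idColumn];
    if (!key.has(id)) {
      key.set(id, []);
    }
    key.get(id).push(r);
  });
  return key;
}

const orders = [["A1", 2], ["B2", 1], ["A1", 5]];
const byProduct = makeMultiKey(orders, 0);
console.log(byProduct.get("A1")); // [0, 2]
console.log(byProduct.get("Z9")); // undefined
```

The lookup still happens once per id; the caller then loops only over the (usually short) list of matching rows instead of the whole array.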