BigQuery UDF memory exceeded error on multiple rows but works fine on single row

小蘑菇 2020-12-20 03:13

I'm writing a UDF to process Google Analytics data, and getting the "UDF out of memory" error message when I try to process multiple rows. I downloaded the raw data and f

3 Answers
  • 2020-12-20 03:42

    A UDF will fail on anything but very small datasets if it has many nested if/then levels, such as:

    if () {
        if () {
            if () {
                // etc.

    We had to track down and remove the deepest if/then statement.

    But that is not enough. In addition, when you pass the data into the UDF, run a "GROUP EACH BY" on all the variables. This forces BigQuery to spread the output across multiple workers; otherwise the query will still fail. (A minimal sketch of this pattern is at the end of this answer.)

    I've wasted 3 days of my life on this annoying bug. Argh.
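    For illustration only, a sketch of that pattern in legacy BigQuery SQL; myUdf, the column names, and [mydataset.mytable] are placeholders, not from any specific job:

    SELECT col1, col2, COUNT(*) AS n
    FROM myUdf(
      SELECT col1, col2 FROM [mydataset.mytable]
    )
    GROUP EACH BY col1, col2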

  • 2020-12-20 03:49

    I love the concept of parsing my logs in BigQuery, but I've got the same problem: I get

    Error: Resources exceeded during query execution.

    The Job Id is bigquery-looker:bquijob_260be029_153dd96cfdb, if that helps at all.

    I wrote a very basic parser that does a simple match and returns rows. It works just fine on a 10K-row data set, but I run out of resources when trying to run it against a 3M-row logfile.

    Any suggestions for a work around?

    Here is the JavaScript code.

    function parseLogRow(row, emit) {

      // Rebuild the raw log line, guarding against missing input columns.
      var r = (row.logrow ? row.logrow : "") +
              (typeof row.l2 !== "undefined" ? " " + row.l2 : "") +
              (row.l3 ? " " + row.l3 : "");
      var ts = null;
      var category = null;
      var user = null;
      var message = null;
      var db = null;
      var found = false;
      if (r) {
        // Expected format: "YYYY-MM-DD HH:MM:SS.mmm +ZZZZ [category|user|db] :: message"
        var m = r.match(/^(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d\.\d\d\d (\+|\-)\d\d\d\d) \[([^|]*)\|([^|]*)\|([^\]]*)\] :: (.*)/);
        if (m) {
          ts = new Date(m[1]) / 1000;   // seconds since epoch, for the TIMESTAMP output column
          category = m[3] || null;
          user = m[4] || null;
          db = m[5] || null;
          message = m[6] || null;
          found = true;
        } else {
          // Keep unparsable lines so they can be inspected later.
          message = r;
          found = false;
        }
      }

      emit({
        ts: ts,
        category: category,
        user: user,
        db: db,
        message: message,
        found: found
      });
    }

    bigquery.defineFunction(
      'parseLogRow',                           // Name of the function exported to SQL
      ['logrow', 'l2', 'l3'],                  // Names of input columns
      [
        {'name': 'ts', 'type': 'timestamp'},   // Output schema
        {'name': 'category', 'type': 'string'},
        {'name': 'user', 'type': 'string'},
        {'name': 'db', 'type': 'string'},
        {'name': 'message', 'type': 'string'},
        {'name': 'found', 'type': 'boolean'}
      ],
      parseLogRow                              // Reference to the JavaScript UDF
    );
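
    For context, this is roughly how I invoke the UDF from legacy SQL; the dataset and table names below are placeholders, not the ones from the failing job:

    SELECT ts, category, user, db, message, found
    FROM parseLogRow(
      SELECT logrow, l2, l3
      FROM [mydataset.mylogs]
    )
    LIMIT 100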
    
  • 2020-12-20 03:52

    Update Aug 2016: We have pushed out an update that allows the JavaScript worker to use twice as much RAM. We will continue to monitor jobs that fail with JS OOM to see if further increases are necessary; in the meantime, please let us know if you have additional jobs failing with OOM. Thanks!

    Update: this issue was related to limits we had on the size of the UDF code. It looks like V8's optimize+recompile pass of the UDF code generates a data segment bigger than our limits, but this only happens when the UDF runs over a "sufficient" number of rows. I'm meeting with the V8 team this week to dig into the details further.

    @Grayson - I was able to run your job over the entire 20160201 table successfully; the query takes 1-2 minutes to execute. Could you please verify that this works on your side?


    We've gotten a few reports of similar issues that seem related to the number of rows processed. I'm sorry for the trouble; I'll be doing some profiling of our JavaScript runtime to try to find whether and where memory is being leaked. Stay tuned for the analysis.

    In the meantime, if you're able to isolate any specific rows that cause the error, that would also be very helpful; one way to narrow things down is sketched below.
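
    A sketch only, using legacy SQL against the parseLogRow UDF from the question above (the table name is a placeholder): run the UDF over progressively smaller slices of the input, for example by hashing a column, and vary the remainder to isolate the failing slice.

    SELECT ts, message, found
    FROM parseLogRow(
      SELECT logrow, l2, l3
      FROM [mydataset.mylogs]
      WHERE ABS(HASH(logrow)) % 10 = 0   -- keeps roughly 1/10 of the rows
    )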
