Optimization: Dumping JSON from a Streaming API to Mongo

猫巷女王i 2021-02-15 17:33

Background: I have a Python module set up to grab JSON objects from a streaming API and store them (bulk insert of 25 at a time) in MongoDB using p

2 answers
  •  北恋 (original poster)
    2021-02-15 17:47

    I got rid of the StringIO library. Since the WRITEFUNCTION callback, handle_data, is invoked for every line in this case, I just load the JSON directly. Sometimes, however, data contains two JSON objects, so when parsing fails I split the buffer on '\r\n' and load each piece separately. Sorry, I can't post the curl command I use because it contains our credentials. But, as I said, this is a general issue that applies to any streaming API.

    
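    import json

    def handle_data(self, buf):
        try:
            # Usual case: the callback received exactly one JSON object.
            self.tweet = json.loads(buf)
        except ValueError:
            # Parsing failed, so buf may hold several objects separated by '\r\n'.
            self.data_list = buf.split('\r\n')
            for data in self.data_list:
                if data.strip():  # skip the empty strings that split() leaves behind
                    self.tweet_list.append(json.loads(data))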
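
The split-on-'\r\n' fallback above still fails if a chunk ends in the middle of an object. One possible way around that, sketched here under assumptions rather than taken from the original module, is to keep unparsed text in a buffer and pull complete objects off the front with the standard library's json.JSONDecoder.raw_decode. The class name StreamSplitter and its feed method are hypothetical, and the sketch assumes each top-level value in the stream is a JSON object and that chunks arrive as str.

```python
import json


class StreamSplitter:
    """Hypothetical helper: buffer raw chunks and return complete JSON objects."""

    def __init__(self):
        self._pending = ""  # text received but not yet parsed
        self._decoder = json.JSONDecoder()

    def feed(self, chunk):
        """Add a chunk (str) and return the objects that are now complete."""
        self._pending += chunk
        objects = []
        while True:
            # raw_decode does not skip leading whitespace, so drop the
            # '\r\n' separators and any blank keep-alive lines first.
            self._pending = self._pending.lstrip()
            if not self._pending:
                break
            try:
                obj, end = self._decoder.raw_decode(self._pending)
            except ValueError:
                # Assumed to be an incomplete object: wait for more data.
                break
            objects.append(obj)
            self._pending = self._pending[end:]
        return objects
```

Inside handle_data this could be used as self.tweet_list.extend(self.splitter.feed(buf)), where self.splitter is a StreamSplitter created once per connection (again a hypothetical attribute). Note that genuinely malformed input would sit in the buffer indefinitely, so a real module would probably want a size limit or a resync step.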
    
