google-cloud-nl

Is it possible to run Google Cloud Platform NLP-API entity sentiment analysis in a batch processing mode for a large number of documents?

强颜欢笑 submitted on 2021-02-10 06:45:28
Question: I am relatively new to Google Cloud Platform. I have a large dataset (18 million articles) and need to run entity-sentiment analysis on it using GCP's NLP API. I am not sure whether my current approach is optimal in terms of the time it takes to get entity sentiment for all the articles. I wonder if there is a way to batch-process all these articles instead of iterating through each one and making an API call. Here is a summary of the process I have been…
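For context, the Natural Language v1 API appears to expose only per-document calls for entity sentiment (no server-side batch endpoint), so throughput comes from client-side batching and concurrency. A minimal stdlib sketch of that pattern follows; the `analyze` stub stands in for a real `analyze_entity_sentiment` request, which is assumed and not shown:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def analyze(article):
    # Stand-in for the real per-document request, e.g. calling
    # language_v1.LanguageServiceClient().analyze_entity_sentiment(...)
    # (assumed client setup, not shown here).
    return {"article": article, "entities": []}

def analyze_all(articles, workers=8, batch_size=100):
    """Fan per-document requests out over a thread pool, one batch at a
    time, so millions of articles are never all in flight at once."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in chunked(articles, batch_size):
            results.extend(pool.map(analyze, batch))
    return results
```

Per-minute quota limits still apply to each request, so the worker count should be tuned to stay under the project's rate limit.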

converting Google Cloud NLP API entity sentiment output to JSON

生来就可爱ヽ(ⅴ&lt;●) submitted on 2021-01-28 03:51:06
Question: I have this output result from the Google Cloud Natural Language API (it took me quite a while to produce, so I don't want to go with the solution in How can I JSON serialize an object from google's natural language API? (No __dict__ attribute)):

Mentions:
Name: "Trump"
Begin Offset : 0
Content : Trump
Magnitude : 0.0
Sentiment : 0.0
Type : 2
Salience: 0.6038374900817871
Sentiment:
Mentions:
Name: "hand"
Begin Offset : 19
Content : hand
Magnitude : 0.0
Sentiment : 0.0
Type : 2
Salience: 0…
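If the response object itself is still in hand, the protobuf helpers in `google.protobuf.json_format` (`MessageToJson` / `MessageToDict`) are the usual route to JSON. When only the printed dump above remains, a rough stdlib parser can recover a flat dict from the "Key : Value" lines. This is a sketch for this specific dump shape; values are left as strings, with no numeric coercion:

```python
import json

def dump_to_dict(text):
    """Parse 'Key : Value' lines from the printed API dump into a dict.
    Keys like 'Mentions:' with no value are skipped; repeated keys keep
    the last value, so a real parser would need per-mention grouping."""
    out = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue
        out[key.strip()] = value.strip().strip('"')
    return out

sample = '''Name: "Trump"
Begin Offset : 0
Magnitude : 0.0
Salience: 0.6038374900817871'''
print(json.dumps(dump_to_dict(sample)))
```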

Google Cloud Natural Language API - How is document magnitude calculated?

牧云@^-^@ submitted on 2019-12-11 18:22:48
Question: I am currently working with the Google Cloud Natural Language API and need to know how the magnitude value for a whole document (consisting of several sentences) is calculated. For the document sentiment score, the average of the scores for each sentence is taken. For the document magnitude, I would have assumed it is calculated by taking the absolute sum of the individual magnitude values for each sentence. But after testing some paragraphs, it is clear that this is not the correct way to…
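Since the aggregation formula is not spelled out in the documentation, one practical approach is to compute several candidate aggregates from the per-sentence magnitudes and compare each against the document magnitude the API reports. The candidates below are hypotheses for testing, not Google's documented formula:

```python
def candidate_aggregates(sentence_magnitudes):
    """Candidate ways a document magnitude might be derived from
    per-sentence magnitudes (magnitudes are already non-negative,
    so 'absolute sum' and 'sum' coincide)."""
    n = len(sentence_magnitudes)
    total = sum(sentence_magnitudes)
    return {
        "sum": total,
        "mean": total / n if n else 0.0,
    }

# Compare each candidate against the document-level magnitude the API
# returns for the same text to see which, if any, matches.
```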

How to authenticate to Google Cloud API without Application Default Credentials or Cloud SDK?

时光怂恿深爱的人放手 submitted on 2019-12-09 14:51:42
Question: I'm trying to access the Google Cloud API from an AWS Lambda function, but I don't know how to authenticate. The auth guide in the Google Cloud documentation (https://cloud.google.com/docs/authentication) wants me to download a credentials JSON file and use Application Default Credentials, but as anyone who has used hosted functions knows, the point is that you don't need to manage a server or runtime environment, so Lambda doesn't give me the ability to store arbitrary files in the…
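One common workaround on Lambda is to put the service-account key JSON into an environment variable instead of a file on disk. A stdlib sketch, where the `GCP_SA_KEY` variable name is an arbitrary choice for illustration:

```python
import json
import os

def load_service_account_info(env_var="GCP_SA_KEY"):
    """Read a service-account key JSON from an environment variable
    (set via the Lambda configuration) and parse it into a dict."""
    raw = os.environ[env_var]
    return json.loads(raw)

# With the google-auth library available, the parsed dict can then be
# turned into credentials without any file on disk:
#   from google.oauth2 import service_account
#   creds = service_account.Credentials.from_service_account_info(info)
# and passed to the NL client via its credentials parameter.
```

Storing the key in a secrets manager and reading it at cold start is a tighter variant of the same idea.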

How can I JSON serialize an object from google's natural language API? (No __dict__ attribute)

这一生的挚爱 submitted on 2019-12-09 12:41:24
Question: I'm using the Google Natural Language API for a project tagging text with sentiment analysis. I want to store my NL results as JSON. If a direct HTTP request is made to Google, a JSON response is returned. However, when using the provided Python libraries, an object is returned instead, and that object is not directly JSON serializable. Here is a sample of my code:

import os
import sys
import oauth2client.client
from google.cloud.gapic.language.v1beta2 import enums, language_service_client

Train or Custom Word Entity Types?

痴心易碎 submitted on 2019-12-07 08:24:26
I was looking through the documentation and testing Google's Natural Language API, and noticed it gets a number of people, events, organizations, and locations incorrect. It appears to use Wikipedia as a major data source, so if an entity is not in Wikipedia, the API seems to have trouble identifying its type. Also, if certain words appear in a name (proper noun), it seems to always identify the entity as a particular type, which is not always correct. For instance, "Congress" seems to always be identified as an organization [government], even when it is part of an event name. The name "WordCamp"…

Format of the input dataset for Google AutoML Natural Language multi-label text classification

大憨熊 submitted on 2019-12-06 04:39:31
What should the format of the input dataset be for Google AutoML Natural Language multi-label text classification? I know that for multi-class classification I need a column of text and another column for labels, with one label per row. I have multiple labels for each text and I want to do multi-label classification. I tried having one column per label with one-hot encoding, but I got this error message: "Max 1000 labels supported. Found 9823 labels." It was very confusing at first, but later I managed to find the format in the documentation, which is a CSV file like: text1,…
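On my reading of that documentation, the multi-label CSV puts the text in the first column followed by one column per label on that row, rather than one-hot columns. A small sketch that writes rows in that shape (`to_automl_csv` is a hypothetical helper name, and the exact format should be checked against the current AutoML docs):

```python
import csv
import io

def to_automl_csv(examples):
    """Write (text, labels) pairs as rows of: text, label1, label2, ...
    Each row carries only the labels that apply to that text, so rows
    may have different numbers of columns."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for text, labels in examples:
        writer.writerow([text, *labels])
    return buf.getvalue()

rows = [
    ("some article text", ["politics", "economy"]),
    ("another article", ["sports"]),
]
print(to_automl_csv(rows))
```

The `csv` module also handles quoting automatically when a text cell contains commas or newlines, which hand-built strings tend to get wrong.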