nlp

Sparse Efficiency Warning while changing the column

微笑、不失礼 Submitted on 2021-01-27 02:57:12

Question:

```python
def tdm_modify(feature_names, tdm):
    non_useful_words = ['kill', 'stampede', 'trigger', 'cause', 'death',
                        'hospital', 'minister', 'said', 'told', 'say',
                        'injury', 'victim', 'report']
    indexes = [feature_names.index(word) for word in non_useful_words]
    for index in indexes:
        tdm[:, index] = 0
    return tdm
```

I want to manually set zero weights for some terms in the tdm matrix. With the code above I get the warning below, and I don't understand why. Is there a better way to do this?

```
C:\Anaconda\lib\site-packages\scipy\sparse
```
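The warning is SciPy's `SparseEfficiencyWarning`: CSR matrices store data row by row, so column assignment changes the sparsity structure and is expensive. A common workaround (a sketch, not the poster's exact code) is to convert to LIL format, which supports cheap item assignment, and convert back afterwards:

```python
import numpy as np
from scipy.sparse import csr_matrix

def zero_columns(feature_names, tdm, words):
    """Zero out the columns for `words`; `tdm` is any SciPy sparse matrix."""
    tdm = tdm.tolil()  # LIL supports efficient per-item assignment
    for word in words:
        if word in feature_names:
            tdm[:, feature_names.index(word)] = 0
    return tdm.tocsr()  # back to CSR for fast arithmetic

# Toy term-document matrix over three features
names = ['kill', 'cat', 'dog']
m = csr_matrix(np.ones((2, 3)))
zeroed = zero_columns(names, m, ['kill'])
```

The round-trip conversion has a one-off cost, but it avoids repeatedly mutating a CSR structure inside the loop.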

InvalidArgumentError: input must be a vector, got shape: []

一曲冷凌霜 Submitted on 2021-01-24 11:25:06

Question: I'm trying to save the embeddings of text data produced by the Universal Sentence Encoder into a new pandas DataFrame column, but I get the error above. Here is what I am trying to do:

```python
module_url = "https://tfhub.dev/google/universal-sentence-encoder/4"  #@param ["https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"]
model = thub.load(module_url)
print("module %s loaded" % module_url)

def embed(input):
    return model(input)
```

then `for t in list(df[`
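This error typically means a bare Python string was passed to the encoder: the model expects a batch (a vector) of strings, and a lone string arrives as a scalar tensor. A minimal sketch of the fix (the `fake_model` below only stands in for the `thub.load(...)` result so the snippet is self-contained) is to wrap the input in a list:

```python
def embed_texts(model, texts):
    """Call a sentence encoder with a proper batch of strings."""
    if isinstance(texts, str):
        # A scalar string triggers "input must be a vector, got shape: []"
        texts = [texts]
    return model(texts)

# Stand-in encoder for illustration; with the real hub model you would
# call e.g. embeddings = embed_texts(model, df["text"].tolist())
fake_model = lambda batch: [t.upper() for t in batch]
result = embed_texts(fake_model, "hello world")
```

Passing the whole column at once (`df["text"].tolist()`) is also much faster than embedding one row at a time in a loop.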

How to remove english text from arabic string in python?

醉酒当歌 Submitted on 2021-01-22 08:52:34

Question: I have an Arabic string containing English text and punctuation. I need to filter out everything but the Arabic text. I tried removing punctuation and English words using the `string` module, but I lost the spacing between Arabic words. Where am I going wrong?

```python
import string

exclude = set(string.punctuation)
main_text = "وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين موقوفين في ليبيا http://alriyadh.com/1031499"
main_text = ''.join(ch for ch in main_text if ch not in exclude)
```

Output after this step: "وزارة الداخلية لا
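Deleting characters outright removes the separators along with the unwanted text. A regex-based sketch (assuming the basic Arabic Unicode block U+0600–U+06FF covers the text) replaces every non-Arabic, non-whitespace character with a space and then collapses the runs, so word boundaries survive:

```python
import re

def keep_arabic(text):
    # Replace anything outside the Arabic block and whitespace with a
    # space, then collapse repeated whitespace so spacing is preserved.
    kept = re.sub(r'[^\u0600-\u06FF\s]', ' ', text)
    return re.sub(r'\s+', ' ', kept).strip()

sample = ("وزارة الداخلية: لا تتوفر لدينا معلومات رسمية عن سعوديين "
          "موقوفين في ليبيا http://alriyadh.com/1031499")
cleaned = keep_arabic(sample)
```

If the text uses Arabic presentation forms or extended letters, the character class would need widening accordingly.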

How to force a pos tag in spacy before/after tagger?

给你一囗甜甜゛ Submitted on 2021-01-21 10:45:06

Question: If I process the sentence "Return target card to your hand" with spaCy and the en_core_web_lg model, it tags the tokens as below:

```
Return  NOUN
target  NOUN
card    NOUN
to      ADP
your    ADJ
hand    NOUN
```

How can I force "Return" to be tagged as a VERB? And how can I do it before the parser, so that the parser can better interpret relations between tokens? There are other situations in which this would be useful. I am dealing with text which contains specific symbols such as {G}. These three
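One approach, sketched here for spaCy 3 with a blank pipeline so it runs standalone, is a small custom component that overwrites `token.pos_` from a rule table; with a full model you would add it via `nlp.add_pipe("force_pos", before="parser")`. The `FORCED_POS` table is hypothetical. Note that in spaCy 3 the statistical tagger fills `tag_` and the built-in `attribute_ruler` maps it to `pos_`, so adding rules to the AttributeRuler is the other idiomatic option:

```python
import spacy
from spacy.language import Language

# Hypothetical rule table: surface form -> coarse POS to force.
FORCED_POS = {"Return": "VERB"}

@Language.component("force_pos")
def force_pos(doc):
    # Overwrite the coarse POS for any token listed in the rule table.
    for token in doc:
        if token.text in FORCED_POS:
            token.pos_ = FORCED_POS[token.text]
    return doc

nlp = spacy.blank("en")   # with en_core_web_lg, use before="parser"
nlp.add_pipe("force_pos")
doc = nlp("Return target card to your hand")
```

Whether a forced tag actually changes the parse depends on the model: the parser consumes its own features, so correcting `pos_` upstream improves downstream rules more reliably than it changes parser decisions.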

How do I preprocess and tokenize a TensorFlow CsvDataset inside the map method?

允我心安 Submitted on 2021-01-21 10:39:09

Question: I made a TensorFlow CsvDataset, and I'm trying to tokenize the data as follows:

```python
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer

os.chdir('/home/nicolas/Documents/Datasets')
fname = 'rotten_tomatoes_reviews.csv'

def preprocess(target, inputs):
    tok = Tokenizer(num_words=5_000, lower=True)
    tok.fit_on_texts(inputs)
    vectors = tok.texts_to_sequences(inputs)
    return vectors,
```
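This fails because `Dataset.map` traces its function into a TensorFlow graph, so plain-Python code such as `Tokenizer.fit_on_texts` cannot run per element there (and re-fitting a tokenizer per element would build a different vocabulary each time anyway). A sketch of one common alternative: adapt a graph-compatible `TextVectorization` layer once, up front, then apply it inside `map`. The toy lists below stand in for the CSV columns:

```python
import tensorflow as tf

# Stand-ins for the (target, review) columns of the CsvDataset.
texts = ["a great movie", "a terrible movie", "a great terrible movie"]
labels = [1, 0, 0]

# Build the vocabulary once, outside the pipeline.
vectorize = tf.keras.layers.TextVectorization(max_tokens=5_000)
vectorize.adapt(texts)

# Apply the adapted layer inside map; it is graph-compatible.
ds = tf.data.Dataset.from_tensor_slices((labels, texts))
ds = ds.batch(2).map(lambda target, inputs: (vectorize(inputs), target))
tokens, target = next(iter(ds))
```

For a real CsvDataset, the `adapt` step would consume a text-only pass over the dataset rather than an in-memory list.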

Sentiment analysis of non-English texts

痴心易碎 Submitted on 2021-01-21 09:25:41

Question: I want to analyze the sentiment of texts written in German. I found many tutorials on how to do this in English, but none on how to apply it to other languages. My idea is to use the TextBlob Python library to first translate the sentences into English and then run sentiment analysis, but I am not sure whether that is the best way to solve this task. Are there other possible approaches?

Answer 1: As Andy has pointed out above, the best
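Translate-then-analyze works but compounds translation errors with sentiment errors. German-native resources exist (for example the textblob-de package, or German models for transformer libraries). The core idea behind lexicon-based analyzers can be sketched in a few lines; the mini-lexicon below is illustrative only, not a real resource:

```python
# Illustrative German polarity lexicon (made up for this sketch).
GERMAN_LEXICON = {
    "gut": 1.0, "großartig": 1.0, "wunderbar": 0.8,
    "schlecht": -1.0, "schrecklich": -1.0, "langweilig": -0.6,
}

def sentiment_de(text):
    """Mean polarity of the known words; 0.0 when none are known."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    scores = [GERMAN_LEXICON[w] for w in words if w in GERMAN_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

Real lexicons additionally handle negation ("nicht gut"), intensifiers, and compounds, which is where dedicated German tooling pays off.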

Replace entity with its label in SpaCy

二次信任 Submitted on 2021-01-21 05:14:24

Question: Is there any way in spaCy to replace an entity detected by spaCy NER with its label? For example:

"I am eating an apple while playing with my Apple Macbook."

I have trained an NER model with spaCy to detect a "FRUITS" entity, and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple". I want to post-process my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence will become "I am eating an FRUITS while
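A post-processing sketch: walk `doc.ents` (entity spans carry character offsets) and splice each label in place of the entity text. The example builds the entity by hand on a blank pipeline so it runs without the poster's trained model; with a real model you would simply call `nlp(text)` and use the detected entities:

```python
import spacy
from spacy.tokens import Span

def replace_entities(doc):
    """Return doc.text with every entity span replaced by its label."""
    out, last = [], 0
    for ent in doc.ents:  # ents are sorted and non-overlapping
        out.append(doc.text[last:ent.start_char])
        out.append(ent.label_)
        last = ent.end_char
    out.append(doc.text[last:])
    return "".join(out)

# Hand-built entity standing in for trained-NER output.
nlp = spacy.blank("en")
doc = nlp("I am eating an apple while playing with my Apple Macbook.")
doc.ents = [Span(doc, 4, 5, label="FRUITS")]  # the token "apple"
result = replace_entities(doc)
```

Splicing by character offsets rather than `str.replace` avoids accidentally rewriting other occurrences of the same surface form (the second "Apple" here).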
