nlp

How to perform entity linking to a local knowledge graph?

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-04 16:22:36
Question: I'm building my own knowledge base from scratch, using articles found online. I am trying to map the entities from my scraped SPO triples (the subject, and potentially the object) to my own record of entities, which consists of listed companies scraped from another website. I've researched most of the libraries, but their methods focus on linking entities to large knowledge bases such as Wikipedia or YAGO, and I'm not sure how to apply those techniques to my own knowledge base.
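One simple way to link mentions to a small, local entity list is fuzzy string matching against the canonical names. Below is a minimal sketch using only the standard library's difflib; the company names and the 0.6 cutoff are illustrative assumptions, not part of the original question.

import difflib

# Toy local knowledge base: canonical company names scraped elsewhere (illustrative)
kb_entities = ["Apple Inc.", "Alphabet Inc.", "Microsoft Corporation", "Tesla, Inc."]

def link_entity(mention, candidates, cutoff=0.6):
    """Return the best-matching KB entity for a surface mention, or None."""
    lowered = [c.lower() for c in candidates]
    matches = difflib.get_close_matches(mention.lower(), lowered, n=1, cutoff=cutoff)
    if not matches:
        return None
    return candidates[lowered.index(matches[0])]  # map back to the canonical spelling

print(link_entity("apple", kb_entities))            # -> Apple Inc.
print(link_entity("Mircosoft Corp", kb_entities))   # tolerates misspelled mentions

For a larger entity list, a dedicated fuzzy-matching library or an alias table built from the scraped company records would scale better than pairwise string comparison.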

Python NLP British English vs American English

Submitted by 我的梦境 on 2021-02-04 13:48:26
Question: I'm currently working on NLP in Python. My corpus contains both British and American English (realise/realize), and I'm thinking of converting the British spellings to American. However, I did not find a good tool/package to do that. Any suggestions? Answer 1: I've not been able to find a package either, but try this (note that I've had to trim the us2gb dictionary substantially for it to fit within the Stack Overflow character limit - you'll have to rebuild this yourself). # Based on Shengy's code: #
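The rest of the answer is cut off, but the approach it describes is a word-level substitution driven by a British-to-American dictionary. A minimal sketch of that idea follows; the four dictionary entries are purely illustrative stand-ins for the much larger mapping the answer refers to.

import re

# Tiny illustrative mapping; the real dictionary referenced in the answer is far larger
gb2us = {
    "realise": "realize",
    "colour": "color",
    "organise": "organize",
    "analyse": "analyze",
}

# One regex that matches any British spelling as a whole word
pattern = re.compile(r"\b(" + "|".join(map(re.escape, gb2us)) + r")\b")

def americanize(text):
    """Replace British spellings with their American equivalents."""
    return pattern.sub(lambda m: gb2us[m.group(1)], text)

print(americanize("I realise the colour is off."))  # -> I realize the color is off.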

How do I count all occurrences of a phrase in a text file using regular expressions?

Submitted by 半城伤御伤魂 on 2021-01-29 22:44:21
Question: I am reading in multiple files from a directory and attempting to count how many times a specific phrase (in this instance "at least") occurs in each file (not just whether it occurs, but how many times per text file). My code is as follows: import glob import os path = 'D:/Test' k = 0 for filename in glob.glob(os.path.join(path, '*.txt')): if filename.endswith('.txt'): f = open(filename) data = f.read() data.split() data.lower() S = re.findall(r' at least ', data, re.MULTILINE) count
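The preview above is cut off, and the visible code has a few problems: re is never imported, and the results of data.split() and data.lower() are discarded because string methods return new strings rather than modifying in place. A minimal repaired sketch of what the loop appears to be attempting (the path and phrase are the asker's; everything else is a suggested rewrite):

import glob
import os
import re

path = 'D:/Test'

for filename in glob.glob(os.path.join(path, '*.txt')):
    with open(filename) as f:
        data = f.read().lower()                     # assign the result; str methods don't mutate
    count = len(re.findall(r'\bat least\b', data))  # \b word boundaries instead of literal spaces
    print(filename, count)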

word2vec cosine similarity greater than 1 for Arabic text

Submitted by 自作多情 on 2021-01-29 22:01:22
Question: I have trained a word2vec model with gensim and I am retrieving the nearest neighbors for some words in the corpus. Here are the similarity scores: top neighbors for الاحتلال: الاحتلال: 1.0000001192092896 الاختلال: 0.9541053175926208 الاهتلال: 0.872565507888794 الاحثلال: 0.8386293649673462 الاكتلال: 0.8209128379821777 It is odd to get a similarity greater than 1. I cannot apply any stemming to my text because it contains many OCR spelling mistakes (I got the text from OCR-ed documents).
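A self-similarity of 1.0000001 is almost certainly floating-point rounding on near-unit vectors rather than a modelling problem. The sketch below computes cosine similarity with explicit normalisation and clips the result to [-1, 1]; the gensim lookup is commented out and assumed, since the trained model is not available here.

import numpy as np

def cosine(u, v):
    """Cosine similarity with explicit normalisation, clipped to [-1, 1]."""
    sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.clip(sim, -1.0, 1.0))

# With the trained gensim model (assumed): vec = model.wv['الاحتلال']; cosine(vec, vec)
u = np.random.rand(100).astype(np.float32)
print(cosine(u, u))   # never exceeds 1.0 because of the clip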

AttributeError: 'Tensor' object has no attribute '_keras_history' using CRF

Submitted by 牧云@^-^@ on 2021-01-29 21:42:09
Question: I know there are a number of questions about this problem, and I have read some of them, but none of the answers worked for me. I am trying to build a model with the following architecture. The code is as follows: token_inputs = Input((32,), dtype=tf.int32, name='input_ids') mask_inputs = Input((32,), dtype=tf.int32, name='attention_mask') seg_inputs = Input((32,), dtype=tf.int32, name='token_type_ids') seq_out, _ = bert_model([token_inputs, mask_inputs, seg_inputs]) bd = Bidirectional(LSTM(units=50,
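The snippet is cut off before the CRF layer, but errors about a missing '_keras_history' attribute typically mean a tensor handed to a Keras layer or Model was produced by a raw TensorFlow op rather than by a Keras layer. A common workaround is to wrap the offending op in a Lambda layer; the sketch below illustrates only that pattern, not the asker's full BERT + CRF model.

import tensorflow as tf
from tensorflow.keras.layers import Input, Lambda, Dense
from tensorflow.keras.models import Model

x = Input((32,), dtype=tf.float32)
# raw = tf.expand_dims(x, -1)                      # raw TF op: its output carries no Keras history
raw = Lambda(lambda t: tf.expand_dims(t, -1))(x)   # wrapped in Lambda, so Keras can track it
out = Dense(1)(raw)

model = Model(x, out)
model.summary()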

Create SavedModel for BERT

Submitted by 怎甘沉沦 on 2021-01-29 18:20:39
Question: I'm using this Colab for a BERT model. In the last cells, in order to make predictions, we have: def getPrediction(in_sentences): labels = ["Negative", "Positive"] input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label = 0) for x in in_sentences] # here, "" is just a dummy label input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer) predict_input_fn = run_classifier.input_fn_builder(features=input_features,
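The getPrediction snippet is cut off, but exporting the Colab's Estimator-based BERT classifier as a SavedModel generally comes down to defining a serving input function and calling export_saved_model. The sketch below assumes TF 1.x (as in the BERT Colab) and that the estimator and MAX_SEQ_LENGTH already exist from earlier cells; the feature names mirror what run_classifier typically produces, which is an assumption to verify against the Colab.

import tensorflow as tf

MAX_SEQ_LENGTH = 128  # assumed; must match the value used when training the estimator

def serving_input_fn():
    # Placeholders for the features the BERT classifier expects at serving time
    input_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='input_ids')
    input_mask = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='input_mask')
    segment_ids = tf.placeholder(tf.int32, [None, MAX_SEQ_LENGTH], name='segment_ids')
    label_ids = tf.placeholder(tf.int32, [None], name='label_ids')
    features = {'input_ids': input_ids, 'input_mask': input_mask,
                'segment_ids': segment_ids, 'label_ids': label_ids}
    return tf.estimator.export.ServingInputReceiver(features, features)

# estimator is the tf.estimator.Estimator built earlier in the Colab (assumed to exist)
# estimator.export_saved_model('bert_saved_model', serving_input_fn)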

Slice JSON File into Different Time Intercepts with Python

Submitted by 心不动则不痛 on 2021-01-29 15:53:09
Question: For a current research project, I am trying to slice a JSON file into different time intercepts. Based on the "Date" field, I want to analyse the content of the JSON file by quarter, i.e. 01 January - 31 March, 01 April - 30 June, etc. The code would ideally have to pick the oldest date in the file and add quarterly time intercepts on top of that. I have researched this but not found any helpful methods yet. Is there any smart way to include this in the code? The JSON file has the
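The question is cut off, but for slicing records by quarter starting from the oldest date, pandas makes this straightforward once the "Date" field is parsed. The sketch below guesses at the data shape (a list of objects with a "Date" key), which is an assumption since the JSON structure is not shown.

import json
import pandas as pd

# Hypothetical structure: a list of objects, each carrying a "Date" field (assumption)
raw = '[{"Date": "2020-01-15", "text": "a"}, {"Date": "2020-05-02", "text": "b"}]'
df = pd.DataFrame(json.loads(raw))
df['Date'] = pd.to_datetime(df['Date'])

# Option 1: calendar quarters (Jan-Mar, Apr-Jun, ...)
df['quarter'] = df['Date'].dt.to_period('Q')

# Option 2: rolling quarters counted from the oldest date in the file
start = df['Date'].min()
df['quarter_from_start'] = (df['Date'] - start).dt.days // 91   # approx. 91-day quarters

for q, chunk in df.groupby('quarter'):
    print(q, len(chunk))   # analyse each quarter's slice separately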

Dataframe aggregation of n-grams, their frequencies, and associating the entries of other columns with them using R

Submitted by 人盡茶涼 on 2021-01-29 15:48:18
Question: I am trying to aggregate a dataframe based on 1-gram frequency (this can be extended to n-grams by changing n in the code below) and associate other columns with it. My approach is shown below. Are there any shortcuts/alternatives for producing the table shown at the very end of this question from the dataframe given below? The code and results follow. The chunk below sets up the environment, loads the libraries, and reads the dataframe: # Clear variables in the working environment