text-processing

Negation handling in NLP

我只是一个虾纸丫 submitted on 2020-05-10 03:26:50
Question: I'm currently working on a project in which I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix the words in a sentence that contains a negation word, as the prefixed words would not show up in conceptnet5's API. Here's an example: The movie wasn't that good. Hence, I figured that I could use WordNet's lemma functionality to replace adjectives in sentences that contain negation words such as (not, ...). In the previous example, the
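The idea of replacing a negated adjective with an antonym can be sketched roughly as follows. This is only an illustration under stated assumptions: the `ANTONYMS` table is a small hypothetical stand-in for a WordNet antonym lookup, and `NEGATION_WORDS`, `TOKEN_RE` and `resolve_negation` are names invented here, not part of conceptnet5 or WordNet.

```python
import re

# Hypothetical stand-in for a WordNet antonym lookup (assumption, not a real API).
ANTONYMS = {"good": "bad", "bad": "good", "happy": "unhappy"}

# Hypothetical list of negation markers; "n't" is split off contractions below.
NEGATION_WORDS = {"not", "n't", "never"}

# Splits "wasn't" into "was" + "n't" and otherwise keeps plain word tokens.
TOKEN_RE = re.compile(r"n't|\w+(?=n't)|\w+")


def resolve_negation(sentence):
    """Drop negation words and swap the next known adjective for its antonym."""
    result = []
    negated = False
    for token in TOKEN_RE.findall(sentence.lower()):
        if token in NEGATION_WORDS:
            negated = True  # remember the negation, but do not emit it
            continue
        if negated and token in ANTONYMS:
            result.append(ANTONYMS[token])
            negated = False  # the negation has been consumed
        else:
            result.append(token)
    return result


print(resolve_negation("The movie wasn't that good."))
```

The returned tokens contain no prefixed or negated forms, so each one could in principle be looked up as an ordinary word.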

Increment a version number contained in a text file

陌路散爱 submitted on 2020-03-21 07:03:25
Question: This self-answered question addresses the scenario originally described in "Increment version number in file": a version number embedded in a text file is to be incremented. Sample text-file content: nuspec{ id = XXX; version: 0.0.30; title: XXX; For instance, I want the embedded version number 0.0.30 updated to 0.0.31. The line of interest can be assumed to match the following regex: ^\s+version: (.+);$ Note that the intent is not to replace the version number with a fixed new version, but to
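A minimal Python sketch of the increment, assuming the goal is to bump the last dot-separated component of the version and that the sample content is laid out one setting per line (the source shows it flattened). The names `VERSION_LINE` and `bump_last_component` are made up for this illustration; the regex is the one from the question with extra capture groups.

```python
import re

# The question's regex, with groups added so the prefix and ";" can be preserved.
VERSION_LINE = re.compile(r"^(\s+version: )(.+)(;)$", re.MULTILINE)


def bump_last_component(text):
    """Increment the last numeric component of the first matching version line."""
    def _bump(match):
        parts = match.group(2).split(".")
        parts[-1] = str(int(parts[-1]) + 1)  # e.g. "30" -> "31"
        return match.group(1) + ".".join(parts) + match.group(3)

    return VERSION_LINE.sub(_bump, text, count=1)


# Assumed line layout of the sample file content.
sample = "nuspec{\n    id = XXX;\n    version: 0.0.30;\n    title: XXX;\n"
print(bump_last_component(sample))
```

Text that has no matching line is returned unchanged, and a non-numeric last component would raise a `ValueError`.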

Measuring width of text (Python/PIL)

…衆ロ難τιáo~ submitted on 2020-03-17 04:38:49
Question: I'm using the following two methods to calculate a sample string's rendered width for a set font-type and size: font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14) sample = "Lorem ipsum dolor sit amet, partem periculis an duo, eum lorem paulo an, mazim feugiat lobortis sea ut. In est error eirmod vituperata, prima iudicabit rationibus mel et. Paulo accumsan ad sit, et modus assueverit eum. Quod homero adversarium vel ne, mel noster dolorum te, qui ea senserit

Using AWK to merge unique rows based on column one

僤鯓⒐⒋嵵緔 submitted on 2020-03-17 03:19:23
Question: I am trying to write an AWK script to summarize data in a large text file. The order of the resulting data is important, so I can't use sort. I have tried different variations of FNR==NR but haven't had any luck. Input file: Height 3.5 Weight 12.3 Age 23 : : Height 4.5 Weight 15.5 Age 31 : : Expected output: Height 3.5 4.5 Weight 12.3 15.5 Age 23 31
Answer 1: With awk: awk '{a[$1]=a[$1] FS $2} END{for(i in a) print i a[i]}' file Output: Weight 12.3 15.5 Height 3.5 4.5 : Age 23 31 Derived from: how to
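Since `for (i in a)` in awk does not guarantee any particular order, which matches the shuffled output shown above, here is a rough Python sketch of the same merge that keeps keys in first-seen order. It assumes each record is a `key value` pair on its own line and that the `:` lines are separators to be skipped; `merge_rows` is a name invented for this example.

```python
def merge_rows(lines):
    """Group second-column values by first column, keeping first-seen key order."""
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: lone ":" separator lines and blanks are ignored
        key, value = fields[0], fields[1]
        merged.setdefault(key, []).append(value)
    return [" ".join([key] + values) for key, values in merged.items()]


rows = ["Height 3.5", "Weight 12.3", "Age 23", ":", ":",
        "Height 4.5", "Weight 15.5", "Age 31", ":", ":"]
for row in merge_rows(rows):
    print(row)
```

For a large file, the file object can be passed in directly in place of `rows`, so the input is read line by line.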

How to print a series of words using awk?

微笑、不失礼 submitted on 2020-02-21 13:49:42
Question: I know that awk can be used to print only certain words from the output. For example, dpkg -l|awk '{print $2}' would print the 2nd word from the output of dpkg -l. What I want to do is print every word after a given word. My command looks like this: awk '{printf "%-40s %s\n", $1, $n}' Rather than printing all the words with $n or $0, I would like to print every word that comes after, say, the 5th character. How can I do this? EDIT: my complete command is bind -P|grep "can be found"|sort|awk '{printf "
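For comparison, the intended formatting can be sketched in Python. This assumes "every word that comes after" means every whitespace-separated field from a given position onward (the fifth field by default), which is one reading of the question; `format_line` and its parameters are hypothetical names.

```python
def format_line(line, start=5, width=40):
    """Return the first field left-aligned in `width` columns, followed by
    every field from position `start` (1-based) to the end of the line."""
    fields = line.split()
    if not fields:
        return ""
    rest = " ".join(fields[start - 1:])
    # Mirrors the printf format "%-40s %s" used in the question.
    return f"{fields[0]:<{width}} {rest}"


print(format_line("a b c d e f g", start=5, width=4))
```

Lines with fewer than `start` fields simply produce an empty second column.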

How to obtain the first letter in a Bash variable?

眉间皱痕 submitted on 2020-01-28 04:22:07
Question: I have a Bash variable, $word, which is sometimes a word and sometimes a sentence, e.g.: word="tiger" Or: word="This is a sentence." How can I make a new Bash variable that holds only the first letter found in the variable? E.g., the above would give: echo $firstletter t Or: echo $firstletter T
Answer 1: initial="$(echo $word | head -c 1)" Every time you say "first" in your problem description, head is a likely solution.
Answer 2: word="tiger" firstletter=${word:0:1}
Answer 3: word=something first=${word::1}
Answer 4:

How to remove extra commas from data in Python

眉间皱痕 submitted on 2020-01-26 04:40:09
Question: I have a CSV file from which I am trying to load data into a SQL table with 2 columns. The fields are separated by commas, which mark the start of the next field. The second column contains text, and that text itself contains some commas. Because of the extra commas I am not able to load the data into my SQL table, as the rows appear to have extra columns. I have millions of rows of data. How can I remove these extra commas? Data: Number Address "12345" , "123 abc street, Unit 345" "67893" ,
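One possible approach, sketched here under assumptions, is to let Python's standard `csv` module respect the quotes instead of removing commas by hand. The sketch assumes every address is wrapped in double quotes as in the first sample row; `parse_rows` is a name invented for this illustration, and `skipinitialspace=True` is used so the space after the separating comma does not hide the opening quote.

```python
import csv
import io


def parse_rows(text):
    """Parse quoted CSV text into rows, keeping commas that sit inside quotes."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Strip stray spaces left around fields, e.g. the one before the comma.
    return [[field.strip() for field in row] for row in reader if row]


# First data row from the question; the second row is truncated in the source.
sample = '"12345" , "123 abc street, Unit 345"\n'
for number, address in parse_rows(sample):
    print(number, "|", address)
```

With millions of rows, the same reader can wrap an open file object so the data is processed one line at a time.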