text-processing | 易学教程

Negation handling in NLP

Read more about Negation handling in NLP

Question: I'm currently working on a project where I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix words in a sentence that contains a negation word, as those words would not show up in conceptnet5's API. Here's an example: The movie wasn't that good. Hence, I figured that I could use wordnet's lemma functionality to replace adjectives in sentences that contain negation words (not, ...). In the previous example, the

Increment a version number contained in a text file

Read more about Increment a version number contained in a text file

Question: This self-answered question addresses the scenario originally described in Increment version number in file: A version number embedded in a text file is to be incremented. Sample text-file content: nuspec{ id = XXX; version: 0.0.30; title: XXX; For instance, I want the embedded version number 0.0.30 updated to 0.0.31. The line of interest can be assumed to match the following regex: ^\s+version: (.+);$ Note that the intent is not to replace the version number with a fixed new version, but to
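The excerpt above is cut off, so the following is only a minimal Python sketch of the stated goal (0.0.30 becoming 0.0.31), not the approach from the original answer. It reuses the regex quoted in the question with extra capture groups, and it assumes the last dot-separated component is a plain integer. The function name `bump_last_component` and the sample text layout are hypothetical.

```python
import re

# The regex from the question, with capture groups added around the
# prefix and the trailing semicolon so they can be written back unchanged.
VERSION_LINE = re.compile(r"^(\s+version: )(.+)(;)$", re.MULTILINE)


def bump_last_component(text):
    """Increment the last dot-separated component of the first version line.

    Assumption: the last component is a plain integer (e.g. "30").
    """
    def replace(match):
        parts = match.group(2).split(".")
        parts[-1] = str(int(parts[-1]) + 1)
        return match.group(1) + ".".join(parts) + match.group(3)

    # count=1: only the first matching line is rewritten.
    return VERSION_LINE.sub(replace, text, count=1)


# Hypothetical file content modeled on the sample in the question.
sample = "nuspec{\n    id = XXX;\n    version: 0.0.30;\n    title: XXX;\n"
print(bump_last_component(sample))
```

Because the replacement is computed from the matched text, the new version is derived from the old one rather than hard-coded, which is the intent the question describes.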

Measuring width of text (Python/PIL)

Read more about Measuring width of text (Python/PIL)

Question: I'm using the following two methods to calculate a sample string's rendered width for a set font-type and size: font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14) sample = "Lorem ipsum dolor sit amet, partem periculis an duo, eum lorem paulo an, mazim feugiat lobortis sea ut. In est error eirmod vituperata, prima iudicabit rationibus mel et. Paulo accumsan ad sit, et modus assueverit eum. Quod homero adversarium vel ne, mel noster dolorum te, qui ea senserit

Using AWK to merge unique rows based on column one

Read more about Using AWK to merge unique rows based on column one

Question: I am trying to write an AWK script to summarize data in a large text file. The order of the resulting data is important, so I can't use sort. I have tried different variations of FNR==NR but haven't had any luck. Input file: Height 3.5 Weight 12.3 Age 23 : : Height 4.5 Weight 15.5 Age 31 : : Expected output: Height 3.5 4.5 Weight 12.3 15.5 Age 23 31 Answer 1: With awk: awk '{a[$1]=a[$1] FS $2} END{for(i in a) print i a[i]}' file Output: Weight 12.3 15.5 Height 3.5 4.5 : Age 23 31 Derived from: how to
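Note that the output shown for Answer 1 is not in input order: awk's `for (i in a)` loop visits keys in an unspecified order. As a point of comparison, here is a minimal Python sketch that keeps first-seen order by relying on insertion-ordered dicts. It is not taken from the original thread. The function name `merge_by_first_column` is hypothetical, and skipping lines with fewer than two fields (such as the `:` lines) is an assumption about what the asker wants.

```python
def merge_by_first_column(lines):
    """Group second-column values by first-column key, in first-seen order."""
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Assumption: separator lines such as ":" carry no data.
            continue
        key, value = fields[0], fields[1]
        merged.setdefault(key, []).append(value)
    return [" ".join([key, *values]) for key, values in merged.items()]


# Sample rows taken from the question's input.
sample = [
    "Height 3.5", "Weight 12.3", "Age 23", ":", ":",
    "Height 4.5", "Weight 15.5", "Age 31", ":", ":",
]
for row in merge_by_first_column(sample):
    print(row)
```

For a large file, the same function can be fed an open file object directly, since it only iterates over lines once.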

How to print a series of words using awk?

Read more about How to print a series of words using awk?

Question: I know that awk can be used to print only certain words from the output. For example, dpkg -l|awk '{print $2}' would print the 2nd word from the output of dpkg -l. What I want to do is print every word after a given word. My command looks like this: awk '{printf "%-40s %s\n", $1, $n}' Rather than printing all the words with $n or $0, I would like to print every word that comes after, say, the 5th character. How can I do this? EDIT: my complete command is bind -P|grep "can be found"|sort|awk '{printf "

How to obtain the first letter in a Bash variable?

Read more about How to obtain the first letter in a Bash variable?

Question: I have a Bash variable, $word, which is sometimes a word or sentence, e.g.: word="tiger" Or: word="This is a sentence." How can I make a new Bash variable which is equal to only the first letter found in the variable? E.g., the above would be: echo $firstletter t Or: echo $firstletter T Answer 1: initial="$(echo $word | head -c 1)" Every time you say "first" in your problem description, head is a likely solution. Answer 2: word="tiger" firstletter=${word:0:1} Answer 3: word=something first=${word::1} Answer 4:

How to remove extra commas from data in Python

Read more about How to remove extra commas from data in Python

Question: I have a CSV file from which I am trying to load data into my SQL table. The table has 2 columns, and the data is separated by commas, which mark the start of the next field. The second column contains text, and that text includes some commas. Because of the extra commas I am not able to load the data into my SQL table, as it looks like it has extra columns. I have millions of rows of data. How can I remove these extra commas? Data: Number Address "12345" , "123 abc street, Unit 345" "67893" ,
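Since the fields in the sample are quoted, one option is to let Python's standard `csv` module do the splitting, so that commas inside quotes stay part of the field instead of being treated as separators. The sketch below is a minimal illustration under that assumption, not the answer from the original thread. The function name `parse_rows` is hypothetical, and only the one complete data row from the excerpt is used.

```python
import csv
import io


def parse_rows(text):
    """Parse quoted CSV text into tuples, keeping commas inside quotes."""
    # skipinitialspace=True lets a quoted field start after "comma + space".
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # strip() removes the stray space left between a closing quote and the comma.
    return [tuple(field.strip() for field in row) for row in reader if row]


# The single complete row shown in the question.
sample = '"12345" , "123 abc street, Unit 345"\n'
rows = parse_rows(sample)
print(rows)

# If the commas really must go, they can be dropped per field after parsing:
without_commas = [(number, address.replace(",", "")) for number, address in rows]
print(without_commas)
```

For millions of rows, the same reader can wrap an open file and be consumed lazily row by row, and the parsed tuples can be passed to a parameterized insert so that no comma removal is needed at all.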