text-processing

How can I detect a sequence of “hollows” (holes, lines not matching a pattern) bigger than n in a text file?

旧城冷巷雨未停 submitted on 2019-12-24 03:12:18
Question: Case scenario:
$ cat Status.txt
1,connected
2,connected
3,connected
4,connected
5,connected
6,connected
7,disconnected
8,disconnected
9,disconnected
10,disconnected
11,disconnected
12,disconnected
13,disconnected
14,connected
15,connected
16,connected
17,disconnected
18,connected
19,connected
20,connected
21,disconnected
22,disconnected
23,disconnected
24,disconnected
25,disconnected
26,disconnected
27,disconnected
28,disconnected
29,disconnected
30,connected
As can be seen, there are
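This is a minimal sketch of one approach, written in Python because the question does not name a language. It scans the file once. A "hollow" opens at the first line that does not match the pattern and closes at the next line that does, and only hollows longer than n are kept. The function name `find_hollows` is hypothetical. The pattern `,connected$` is also an assumption, chosen so that `disconnected` lines do not count as matches.

```python
import re
from typing import Iterable, List, Tuple


def find_hollows(lines: Iterable[str], pattern: str, n: int) -> List[Tuple[int, int, int]]:
    """Return (first_line, last_line, length) for every run of more than n
    consecutive lines that do NOT match `pattern`. Line numbers are 1-based."""
    regex = re.compile(pattern)
    hollows = []
    start = None  # line number where the current hollow began, if one is open
    lineno = 0
    for lineno, line in enumerate(lines, start=1):
        if regex.search(line.rstrip("\n")):
            # A matching line ends any open hollow, which covered start..lineno-1.
            if start is not None and lineno - start > n:
                hollows.append((start, lineno - 1, lineno - start))
            start = None
        elif start is None:
            start = lineno
    # A hollow may still be open when the input ends.
    if start is not None and lineno - start + 1 > n:
        hollows.append((start, lineno, lineno - start + 1))
    return hollows


if __name__ == "__main__":
    # Rebuild the Status.txt sample from the question in memory.
    down = set(range(7, 14)) | {17} | set(range(21, 30))
    sample = [f"{i},{'disconnected' if i in down else 'connected'}\n" for i in range(1, 31)]
    print(find_hollows(sample, r",connected$", 5))  # [(7, 13, 7), (21, 29, 9)]
```

To read the real file, pass the open file object instead, e.g. `with open("Status.txt") as f: print(find_hollows(f, r",connected$", 5))`. The strict `> n` test follows the question's wording "bigger than n".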

AWK: how records and fields are read and processed

限于喜欢 submitted on 2019-12-24 00:48:36
Question: I am getting the right result from the awk program below, but I don't understand how AWK processes the lines of code in it:
{
    for (i = 1; i <= NF; i++) {
        if (min[i] == "") { print "initial min " $i; min[i] = $i }  # line1
        if (max[i] == "") { print "initial max " $i; max[i] = $i }  # line2
        if ($i < min[i])  { print "New min " $i; min[i] = $i }      # line3
        if ($i > max[i])  { print "New max " $i; max[i] = $i }      # line4
    }
}
END {
    OFS = "\t"
    print "min", "max"
    for (i = 1; i <= NF; i++) {
        print min[i], max[i]
    }
}
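One way to see the order of execution is to replay the same logic step by step. The sketch below is a Python emulation written for illustration only; it is not awk, and `column_min_max` is a hypothetical name.

- The main `{ ... }` block runs once per input record (line).
- The `for` loop visits fields $1..$NF of that record.
- Lines 1-4 run in order for each field. On the first record lines 1 and 2 fire, and lines 3 and 4 then find nothing to change.
- The `END` block runs once after the last record.

The sketch assumes, as common awk implementations do, that NF in `END` still holds the last record's field count. It also assumes every field is numeric.

```python
from typing import Dict, Iterable, List, Tuple


def column_min_max(records: Iterable[str]) -> Tuple[List[str], List[Tuple[float, float]]]:
    """Emulate the awk program: return (messages printed, rows printed by END)."""
    mins: Dict[int, float] = {}
    maxs: Dict[int, float] = {}
    log: List[str] = []
    nf = 0
    for record in records:          # awk runs the main block once per record
        fields = record.split()     # default FS: split on runs of blanks
        nf = len(fields)            # NF for the current record
        for i, text in enumerate(fields, start=1):  # for (i = 1; i <= NF; i++)
            value = float(text)     # assumption: every field is numeric
            if i not in mins:       # line1: min[i] is still unset (awk tests == "")
                log.append("initial min " + text)
                mins[i] = value
            if i not in maxs:       # line2: max[i] is still unset
                log.append("initial max " + text)
                maxs[i] = value
            if value < mins[i]:     # line3
                log.append("New min " + text)
                mins[i] = value
            if value > maxs[i]:     # line4
                log.append("New max " + text)
                maxs[i] = value
    # END block: runs once, reusing NF left over from the last record
    table = [(mins[i], maxs[i]) for i in range(1, nf + 1)]
    return log, table
```

For the input lines `3 10`, `1 20` and `5 15`, the log starts with the four "initial" messages from the first record. After that, each record only triggers line 3 or line 4.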

What is the best way to select a text portion to cut based on keywords?

北慕城南 submitted on 2019-12-23 20:04:08
Question: When you search for something on Stack Overflow, it cuts out the portion of the question description that best matches your criteria and then highlights the search words. I wonder what the best way is to do this manually in C#, that is, without the help of a full-text search engine. The main problem is how to select the best text portion quickly. What I have done so far: I obtain the indexes of the spaces in the text. This tells me where the words begin, so that I can start my substring tests from them
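One common technique for this is a sliding window over words. Below is a minimal sketch, written in Python rather than C# for illustration. It splits the text on whitespace, which gives the same word boundaries the asker gets from the space indexes, and marks which words are keywords. It then slides a fixed-size window across the text and keeps the window with the most hits. `best_snippet`, the 20-word default window and the `**` highlight markers are all assumptions.

```python
from typing import List


def best_snippet(text: str, keywords: List[str], window: int = 20) -> str:
    """Return the `window`-word span of `text` containing the most keyword
    occurrences, with each matching word wrapped in ** markers."""
    words = text.split()
    if not words:
        return ""
    keys = {k.lower() for k in keywords}

    def is_hit(word: str) -> bool:
        # Ignore surrounding punctuation and case when comparing to keywords.
        return word.strip(".,;:!?\"'()").lower() in keys

    hits = [1 if is_hit(w) else 0 for w in words]
    window = min(window, len(words))
    current = sum(hits[:window])  # keyword count in the first window
    best_count, best_start = current, 0
    for start in range(1, len(words) - window + 1):
        # Slide by one word: drop the word leaving on the left, add the one entering.
        current += hits[start + window - 1] - hits[start - 1]
        if current > best_count:
            best_count, best_start = current, start
    chosen = words[best_start:best_start + window]
    return " ".join(f"**{w}**" if is_hit(w) else w for w in chosen)
```

Each slide costs constant time, so the whole scan is linear in the number of words. No substring test has to restart at every word boundary.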

How can I split a word into bi-grams, including repeated ones?

爱⌒轻易说出口 submitted on 2019-12-23 17:39:32
Question: I am trying to split a word into bi-grams. I am using the qlcMatrix package, but it only returns distinct bi-grams. For example, for the word "detected", it returns "te" only once. This is the command I use:
test_domain <- c("detected")
library("qlcMatrix", lib.loc="~/R/win-library/3.2")
bigram1 <- splitStrings(test_domain, sep = "", bigrams = TRUE, left.boundary = "", right.boundary = "")$bigrams
and this is the result I get:
bigram1
# [1] "ec" "ed" "de" "te" "ct" "et"
Answer 1: Another way to do
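For comparison, the operation itself is a two-character window moved one position at a time, so repeated bi-grams are kept naturally. Below is a minimal sketch in Python; it is not R and not the qlcMatrix API, and `bigrams` is a hypothetical name.

```python
from collections import Counter
from typing import List


def bigrams(word: str) -> List[str]:
    """Every overlapping two-character slice of `word`, in order, repeats kept."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


if __name__ == "__main__":
    grams = bigrams("detected")
    print(grams)            # ['de', 'et', 'te', 'ec', 'ct', 'te', 'ed']
    print(Counter(grams))   # per-bigram counts; "te" appears twice
```

A word of length k always yields k - 1 bi-grams this way, seven for "detected". If counts are wanted rather than the raw sequence, `collections.Counter` tallies them.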

How to make grep separate output by NULL characters?

落爺英雄遲暮 submitted on 2019-12-23 15:56:00
Question: Suppose we are doing a multiline regex pattern search on a bunch of files and we want to extract the matches from grep. By default, grep outputs matches separated by newlines. Because we are using multiline patterns, this makes it hard to tell where one match ends and the next begins. Example:
grep -rzPIho '}\n\n\w\w\b' | od -a
Depending on the files in your file tree, this may yield output like:
0000000   }  nl  nl   m   y  nl   }  nl  nl   i   f  nl   }  nl  nl   m
0000020   y  nl   }  nl  nl   m   y  nl   }  nl
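If grep's own output separators prove awkward, one alternative is to run the same multiline search in Python and terminate each match with a NUL byte yourself. This swaps in a different tool; it is not a grep feature. The sketch assumes the pattern behaves the same under Python's `re` as under grep's PCRE mode. That holds for this simple pattern but not for every PCRE construct. The binary-file check is only a rough stand-in for `-I`, and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import BinaryIO, Iterable, Iterator

# The pattern from the grep -P example, compiled for bytes input.
PATTERN = re.compile(rb"}\n\n\w\w\b")


def iter_matches(root: Path, pattern=PATTERN) -> Iterator[bytes]:
    """Yield each match of `pattern` in each regular file under `root` (like grep -r)."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if b"\0" in data:  # rough stand-in for grep -I: skip files containing NUL bytes
            continue
        for match in pattern.finditer(data):
            yield match.group(0)


def write_nul_separated(matches: Iterable[bytes], out: BinaryIO) -> None:
    """Write every match followed by a NUL byte, so multi-line matches stay separable."""
    for m in matches:
        out.write(m + b"\0")


if __name__ == "__main__":
    write_nul_separated(iter_matches(Path(".")), sys.stdout.buffer)
```

Because every match now ends in NUL rather than newline, NUL-aware consumers such as `xargs -0` can split the stream unambiguously, even when the matches themselves contain newlines.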

Convert `find`-like output to `tree`-like output

余生长醉 submitted on 2019-12-23 15:23:11
Question: This question is a generalized version of the "Output of ZipArchive() in tree format" question. Before I waste time writing this (*nix command-line) utility, it would be a good idea to find out whether someone has already written it. I would like a utility that takes as its standard input a list such as the one returned by find(1) and outputs something similar to the output of tree(1). E.g.:
Input:
/fruit/apple/green
/fruit/apple/red
/fruit/apple/yellow
/fruit/banana/green
/fruit/banana
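No existing utility is assumed here. As a minimal sketch of what such a filter could look like, the Python below reads paths on standard input and folds them into a nested dictionary. It then prints them with tree(1)-style connectors. Sorting children by name is an assumption meant to imitate tree's default alphabetical listing, and `build_tree` and `render` are hypothetical names.

```python
import sys
from typing import Dict, Iterable, List

Tree = Dict[str, "Tree"]  # each path component maps to its children


def build_tree(paths: Iterable[str]) -> Tree:
    """Fold slash-separated paths into a nested dict; blank lines are ignored."""
    root: Tree = {}
    for path in paths:
        node = root
        for part in path.strip().strip("/").split("/"):
            if part:
                node = node.setdefault(part, {})
    return root


def render(tree: Tree, prefix: str = "") -> List[str]:
    """Return tree(1)-style lines for `tree`, children sorted by name."""
    lines: List[str] = []
    names = sorted(tree)
    for idx, name in enumerate(names):
        last = idx == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry get blank padding; others keep a vertical bar.
        lines.extend(render(tree[name], prefix + ("    " if last else "│   ")))
    return lines


if __name__ == "__main__":
    print("\n".join(render(build_tree(sys.stdin))))
```

For example, `find /fruit | python3 totree.py` (the script name is hypothetical) would print `fruit` at the top with `apple` and `banana` nested beneath it.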

Reading text values into matlab variables from ASCII files

*爱你&永不变心* submitted on 2019-12-23 08:54:29
Question: Consider the following file:
var1 var2 variable3
1 2 3
11 22 33
I would like to load the numbers into a matrix, and the column titles into a variable equivalent to:
variable_names = char('var1', 'var2', 'variable3');
I don't mind splitting the names and the numbers into two files. However, preparing MATLAB code files and eval'ing them is not an option. Note that there can be an arbitrary number of variables (columns).
Answer 1: I suggest importdata for operations like this:
d = importdata(
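The answer suggests MATLAB's `importdata`, and the sketch below is not a MATLAB solution. It is a Python illustration of the parsing the question describes: a header row of names followed by numeric rows, with any number of columns. It assumes whitespace-delimited columns and no missing values, and `read_table` is a hypothetical name.

```python
from typing import List, Tuple


def read_table(text: str) -> Tuple[List[str], List[List[float]]]:
    """Split whitespace-delimited text into (column names, numeric rows).
    Assumes the first non-blank line holds the names and every later line
    holds exactly one number per column."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return [], []
    names = rows[0]
    data: List[List[float]] = []
    for rowno, fields in enumerate(rows[1:], start=2):  # counts non-blank lines
        if len(fields) != len(names):
            raise ValueError(f"row {rowno}: expected {len(names)} values, got {len(fields)}")
        data.append([float(x) for x in fields])
    return names, data
```

The column count comes from the header, so the same code handles any number of variables. A row that does not match it raises an error instead of being silently misaligned.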

Calculating distance between word/document vectors from a nested dictionary

谁都会走 submitted on 2019-12-23 06:45:15
Question: I have a nested dictionary like this:
myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
          'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
          'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
          'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
          'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}
The outer key is a word, the inner keys are file/document IDs, and the values are the number of times the word (the outer key) occurs. How do I calculate, for each inner key, the sum of the squared values? For example, for inner key 1, I should get: 2^2 + 27
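This is a minimal sketch in Python. It assumes the nested layout shown (word → document id → count) and treats a document that lacks a word as having count 0. The first function answers the question directly: for each document id it adds up the squared counts. For id 1 that gives 2^2 + 27^2 + 0^2 + 1^2 + 3^2 = 743. Its square root is the Euclidean length of that document's vector. The second function shows the Euclidean distance between two documents, which is one of several possible distance measures. Both function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict

Counts = Dict[str, Dict[int, int]]  # word -> {document id -> occurrences}


def sum_of_squares(counts: Counts) -> Dict[int, int]:
    """For each document id (inner key), add up the squared counts over all words."""
    totals: Dict[int, int] = defaultdict(int)
    for per_doc in counts.values():
        for doc_id, n in per_doc.items():
            totals[doc_id] += n * n
    return dict(totals)


def euclidean_distance(counts: Counts, doc_a: int, doc_b: int) -> float:
    """Euclidean distance between two document vectors; a missing count is 0."""
    return math.sqrt(sum((per_doc.get(doc_a, 0) - per_doc.get(doc_b, 0)) ** 2
                         for per_doc in counts.values()))


if __name__ == "__main__":
    myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
              'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
              'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
              'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
              'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}
    squares = sum_of_squares(myDict)
    print(squares[1])                        # 743
    print(math.sqrt(squares[1]))             # length (norm) of document 1's vector
    print(euclidean_distance(myDict, 1, 3))  # sqrt(141)
```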

Calculating distance between word/document vectors from a nested dictionary

北慕城南 submitted on 2019-12-23 06:42:07
Question: I have a nested dictionary like this:
myDict = {'a': {1: 2, 2: 163, 3: 12, 4: 67, 5: 84},
          'about': {1: 27, 2: 45, 3: 21, 4: 10, 5: 15},
          'apple': {1: 0, 2: 5, 3: 0, 4: 10, 5: 0},
          'anticipate': {1: 1, 2: 5, 3: 0, 4: 8, 5: 7},
          'an': {1: 3, 2: 15, 3: 1, 4: 312, 5: 100}}
The outer key is a word, the inner keys are file/document IDs, and the values are the number of times the word (the outer key) occurs. How do I calculate, for each inner key, the sum of the squared values? For example, for inner key 1, I should get: 2^2 + 27

Batch file to read txt file with file names then search for that file and copy to folder

青春壹個敷衍的年華 submitted on 2019-12-23 05:46:09
Question: What I need is a batch file that reads partial file names from a txt file, with each file name on its own line. It then needs to search for each file in a specified folder (including subfolders) and, if the file is found, copy it to a folder on my desktop. I found a batch script that does almost exactly that. However, my file names aren't complete names, only part of each name with no extension, and this script searches for exact file names. I need to modify this script to search for files using
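The asker wants a batch file. The sketch below is in Python, not batch, and only illustrates the logic. It treats each line of the list as a substring of the file name rather than an exact name, which is the change the question asks for.

Assumptions and limits of the sketch:

- `copy_partial_matches` and the three paths are hypothetical.
- The destination folder must lie outside the search folder.
- Files with the same name from different subfolders overwrite each other.

```python
import shutil
from pathlib import Path
from typing import List


def copy_partial_matches(list_file: Path, search_root: Path, dest: Path) -> List[Path]:
    """Copy every file under `search_root` (subfolders included) whose name
    contains any of the partial names listed one per line in `list_file`.
    Matching is case-insensitive, like Windows file names.
    Assumes `dest` is outside `search_root`. Returns the copied source paths."""
    partials = [line.strip().lower()
                for line in list_file.read_text().splitlines() if line.strip()]
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    # sorted() snapshots the directory listing before any copying starts.
    for path in sorted(search_root.rglob("*")):
        if path.is_file() and any(p in path.name.lower() for p in partials):
            shutil.copy2(path, dest / path.name)  # same-named files overwrite each other
            copied.append(path)
    return copied
```

Because the test is "name contains", a partial name with no extension matches files of any extension.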