text-parsing

How to get the first column of every line from a CSV file?

一个人想着一个人 submitted on 2019-11-27 13:30:05
Question: How do I get the first column of every line in an input CSV file and output it to a new file? I am thinking of using awk, but I am not sure how.
Answer 1: Try this: awk -F"," '{print $1}' data.txt It splits each input line of data.txt into fields on the , character (as specified with -F) and prints the first field (column) to stdout.
Answer 2: It can be done with cut: $ cut -d, -f1 data.txt
Answer 3: echo "a,b,c" | cut -d',' -f1 > newFile
Answer 4: Input a,12,34 b,23,56 Code awk -F "," '{print $1}' Input
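For comparison, the same task can be sketched in Python. This is only an illustration, not part of the original answers: the function name first_column and the file names are assumptions. Unlike a plain split on commas, the standard csv module also copes with quoted fields that contain a comma.

```python
import csv


def first_column(in_path, out_path):
    """Copy the first field of every CSV row in in_path to out_path."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        for row in csv.reader(src):
            if row:  # skip completely empty lines
                dst.write(row[0] + "\n")
```

Calling first_column("data.txt", "newFile") would mirror the cut and awk commands above, with the output going to newFile instead of stdout.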

What is CoNLL data format?

折月煮酒 submitted on 2019-11-27 09:50:14
Question: I am new to text mining. I am using an open-source jar (Mate Parser) that gives me output in CoNLL 2009 format after dependency parsing. I want to use the dependency parsing results for information extraction. I can understand some of the output, but I cannot comprehend the CoNLL data format. Can anyone help me understand it? Any pointers would be appreciated.
Answer 1: There are many different CoNLL formats since CoNLL is a different shared
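As a rough illustration of what reading such output can look like, here is a minimal Python sketch. It assumes the tab-separated, one-token-per-line layout with blank lines between sentences, and it assumes the CoNLL-2009 column order listed in CONLL09_COLUMNS; both should be checked against the shared-task description and against what Mate Parser actually emits. The sample rows are hand-made for this example and are not real parser output.

```python
# Assumed CoNLL-2009 column order; any further columns are argument labels (APREDs).
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def read_conll09(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        token = dict(zip(CONLL09_COLUMNS, fields))
        token["APREDS"] = fields[len(CONLL09_COLUMNS):]
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


# Hand-made sample (hypothetical values), one predicate so one APRED column.
sample = (
    "1\tJohn\tjohn\tjohn\tNNP\tNNP\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA0\n"
    "2\tsleeps\tsleep\tsleep\tVBZ\tVBZ\t_\t_\t0\t0\tROOT\tROOT\tY\tsleep.01\t_\n"
)

for token in read_conll09(sample)[0]:
    # HEAD is the ID of the governing token; 0 marks the root of the sentence.
    print(token["FORM"], token["HEAD"], token["DEPREL"])
```

For information extraction, the HEAD and DEPREL columns are usually the starting point, since together they encode the dependency tree.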

Saving nltk drawn parse tree to image file

烂漫一生 submitted on 2019-11-27 08:39:57
Question: Is there any way to save the drawn image from tree.draw() to an image file programmatically? I tried looking through the documentation, but I couldn't find anything.
Answer (Minjoon Seo): I had exactly the same need, and looking into the source code of nltk.draw.tree I found a solution:
    from nltk import Tree
    from nltk.draw.util import CanvasFrame
    from nltk.draw import TreeWidget
    cf = CanvasFrame()
    t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
    tc = TreeWidget(cf.canvas(), t)
    cf.add_widget(tc, 10, 10)  # (10,10) offsets
    cf.print_to_file('tree.ps')
    cf.destroy()
The output file is a

Tips for reading in a complex file - Python

孤街浪徒 submitted on 2019-11-27 04:54:47
Question: I have complex, variable text files that I want to read into Python, but I'm not sure what the best strategy would be. I'm not looking for you to code anything for me, just some tips about which modules and approaches would best suit my needs. The files look something like: Program Username: X Laser: X Em: X exp 1 sample 1 Time: X Notes: X Read 1 X data Read 2 X data # unknown number of reads sample 2 Time: X Notes: X Read 1 X data ... # Unknown number of samples exp 2 sample 1 ... # Unknown number
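One common strategy for nested, variable-length files like this is a small line-by-line state machine that remembers the current experiment and sample. The sketch below is only an illustration under assumptions: the original line breaks are not visible in the excerpt, so it assumes one item per line ("exp N", "sample N", "Read N data", "Key: value"), and the names parse_runs and example_lines are made up for this example.

```python
def parse_runs(lines):
    """Parse header fields, experiments, samples and reads from an iterable of lines."""
    header = {}
    experiments = []
    exp = sample = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, rest = line.partition(" ")
        if key == "exp":
            exp = {"id": rest, "samples": []}
            experiments.append(exp)
            sample = None
        elif key == "sample":
            if exp is None:
                raise ValueError("sample found before any exp line")
            sample = {"id": rest, "fields": {}, "reads": []}
            exp["samples"].append(sample)
        elif key == "Read":
            if sample is None:
                raise ValueError("Read found before any sample line")
            number, _, data = rest.partition(" ")
            sample["reads"].append((number, data))
        elif ":" in line:
            name, _, value = line.partition(":")
            # "Key: value" lines go to the current sample, or to the file header
            target = header if sample is None else sample["fields"]
            target[name.strip()] = value.strip()
        # any other line (for example the program name) is ignored in this sketch
    return header, experiments


example_lines = [
    "Program", "Username: X", "Laser: X", "Em: X",
    "exp 1", "sample 1", "Time: X", "Notes: X",
    "Read 1 X data", "Read 2 X data",
    "sample 2", "Time: X", "Read 1 X data",
    "exp 2", "sample 1", "Read 1 X data",
]
header, experiments = parse_runs(example_lines)
print(header)
print(len(experiments), "experiments")
```

Because the parser only tracks "where am I now", it does not need to know in advance how many reads, samples or experiments the file contains.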

Best way to get all digits from a string [duplicate]

我只是一个虾纸丫 submitted on 2019-11-27 02:33:27
Question: This question already has an answer here: return only Digits 0-9 from a String (7 answers). Is there any better way to take a string such as "(123) 455-2344" and get "1234552344" from it than doing this:
    var matches = Regex.Matches(input, @"[0-9]+", RegexOptions.Compiled);
    return String.Join(string.Empty, matches.Cast<Match>().Select(x => x.Value).ToArray());
Perhaps a regex pattern that can do it in a single match? I couldn't seem to create one to achieve that, though.
Answer 1: Do you need to
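The idea of replacing everything that is not a digit, instead of collecting the digit runs, can be sketched as follows. This is a Python illustration rather than the C# the question uses, and the function name digits_only is an assumption.

```python
import re


def digits_only(text):
    """Remove every character that is not an ASCII digit 0-9."""
    return re.sub(r"[^0-9]", "", text)


print(digits_only("(123) 455-2344"))
```

The character class [^0-9] is used on purpose: in Python 3, \D applied to a str would keep non-ASCII digits as well, which is probably not wanted for phone numbers.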

How to extract polynomial coefficients in Java?

妖精的绣舞 submitted on 2019-11-26 21:57:05
Question: Taking the string -2x^2+3x^1+6 as an example, how can I extract -2, 3 and 6 from this equation stored in the string?
Answer 1: Not giving the exact answer, but some hints: Use the replace method: replace every - with +- . Use the split method (its argument is a regular expression, so + must be escaped):
    // after the replace step
    String str = "+-2x^2+3x^1+6";
    String[] arr = str.split("\\+");
    // arr will contain: {"", "-2x^2", "3x^1", "6"} (the leading + yields an empty first element)
Now each non-empty element can be split individually:
    String str2 = arr[1]; // str2 = "-2x^2"
    // split on x and take the value at index 0
Answer 2: String
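The replace-then-split hints can be sketched end to end like this. It is a Python illustration of the same steps, not the Java the question asks for, and it assumes integer coefficients and non-negative exponents (a negative exponent such as x^-1 would be broken apart by the replace step). The function name coefficients is made up for the example.

```python
def coefficients(poly):
    """Return the integer coefficients of a polynomial string, in order of appearance."""
    # Turn every '-' into '+-' so that '+' alone separates the terms.
    terms = poly.replace(" ", "").replace("-", "+-").split("+")
    result = []
    for term in terms:
        if not term:  # empty piece produced by a leading '+'
            continue
        coeff = term.split("x")[0]
        if coeff in ("", "-"):  # bare 'x^2' or '-x^2' means 1 or -1
            coeff += "1"
        result.append(int(coeff))
    return result


print(coefficients("-2x^2+3x^1+6"))
```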

How to parse text into sentences

亡梦爱人 submitted on 2019-11-26 21:20:51
Question: I'm trying to break up a paragraph into sentences. Here is my code so far:
    import java.util.*;
    public class StringSplit {
        public static void main(String args[]) throws Exception{
            String testString = "The outcome of the negotiations is vital, because the current tax levels signed into law by President George W. Bush expire on Dec. 31. Unless Congress acts, tax rates on virtually all Americans who pay income taxes will rise on Jan. 1. That could affect economic growth and even holiday sales.";
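The test string shows why splitting on every period fails: "George W. Bush", "Dec. 31" and "Jan. 1" all contain periods that do not end a sentence. The following Python sketch illustrates one simple heuristic and is not the approach from the original thread: split only where end punctuation is followed by whitespace and a capital letter, unless the word before it is a single-letter initial or a known abbreviation. The abbreviation list is a small illustrative assumption and would need extending for real text.

```python
import re

# Illustrative, deliberately incomplete list of abbreviations (lower case, no period).
ABBREVIATIONS = {
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept",
    "oct", "nov", "dec", "mr", "mrs", "ms", "dr", "prof", "st", "vs", "etc",
}

# End punctuation, then whitespace, then an upper-case letter (not consumed).
BOUNDARY = re.compile(r"([.!?]+)\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences using a punctuation-plus-capital heuristic."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        words = text[start:match.start()].split()
        last_word = words[-1] if words else ""
        is_initial = len(last_word) == 1 and last_word.isupper()
        if is_initial or last_word.lower() in ABBREVIATIONS:
            continue  # treat as part of the same sentence
        sentences.append(text[start:match.end(1)])
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

A heuristic like this will still make mistakes, for example on a sentence that really ends in a single capital letter, so a locale-aware tool such as Java's java.text.BreakIterator or a trained sentence tokenizer is usually the more robust choice.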

Evaluating a string of simple mathematical expressions [closed]

帅比萌擦擦* submitted on 2019-11-26 21:17:51
Challenge: Here is the challenge (of my own invention, though I wouldn't be surprised if it has previously appeared elsewhere on the web). Write a function that takes a single argument that is a string representation of a simple mathematical expression and evaluates it as a floating point value. A "simple expression" may include any of the following: positive or negative decimal numbers, +, -, *, /, (, ). Expressions use (normal) infix notation. Operators should be evaluated in the order they appear, i.e. not as in BODMAS, though brackets should be correctly observed, of course. The
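The rules stated so far (left-to-right evaluation without operator precedence, brackets respected, signed decimal numbers) can be sketched as a small recursive evaluator. This is an ungolfed Python illustration, not an entry from the original thread, and it covers only the part of the challenge visible in this excerpt; the names tokenize and evaluate are assumptions.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*(\d+\.?\d*|\.\d+|[-+*/()])")


def tokenize(expr):
    """Split the expression into number, operator and bracket tokens."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr):
    """Evaluate strictly left to right, honouring brackets but not precedence."""
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "-":  # unary minus
            return -operand()
        if tok == "+":  # unary plus
            return operand()
        if tok == "(":
            value = sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing bracket")
            pos += 1
            return value
        if tok in {"*", "/", ")"}:
            raise ValueError(f"unexpected token {tok!r}")
        return float(tok)

    def sequence():
        nonlocal pos
        value = operand()
        # Apply each operator as soon as its right-hand operand is known.
        while pos < len(tokens) and tokens[pos] in OPS:
            op = OPS[tokens[pos]]
            pos += 1
            value = op(value, operand())
        return value

    result = sequence()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


print(evaluate("2+3*4"))    # left to right: (2+3)*4
print(evaluate("2+(3*4)"))  # brackets are evaluated first
```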

How should I detect which delimiter is used in a text file?

僤鯓⒐⒋嵵緔 submitted on 2019-11-26 20:11:37
Question: I need to be able to parse both CSV and TSV files. I can't rely on the users to know the difference, so I would like to avoid asking the user to select the type. Is there a simple way to detect which delimiter is in use? One way would be to read in every line, count both tabs and commas, and find out which is used most consistently across lines. Of course, the data could include commas or tabs, so that may be easier said than done. Edit: Another fun aspect of this project is that I will
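The counting idea from the question can be sketched as follows: for each candidate delimiter, count its occurrences per line and prefer the one whose non-zero count is shared by the largest fraction of lines. This Python sketch is an illustration with an assumed function name, guess_delimiter, and it ignores quoting, so a quoted field containing the delimiter will skew the counts.

```python
from collections import Counter


def guess_delimiter(lines, candidates=(",", "\t")):
    """Return the candidate whose per-line count is most consistent, or None."""
    best = None
    best_score = (0.0, 0)
    for delimiter in candidates:
        counts = [line.count(delimiter) for line in lines if line.strip()]
        if not counts:
            continue
        count, frequency = Counter(counts).most_common(1)[0]
        if count == 0:
            continue  # most lines do not contain this delimiter at all
        # Score: share of lines with the typical count, then the count itself.
        score = (frequency / len(counts), count)
        if score > best_score:
            best, best_score = delimiter, score
    return best


print(repr(guess_delimiter(["name\tnote", "x\thello, world", "y\tplain"])))
```

Python's standard library also ships a ready-made heuristic, csv.Sniffer, which can be given a sample of the file and a restricted set of delimiters.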

Saving nltk drawn parse tree to image file

大城市里の小女人 submitted on 2019-11-26 17:46:24
Question: Is there any way to save the drawn image from tree.draw() to an image file programmatically? I tried looking through the documentation, but I couldn't find anything.
Answer 1: I had exactly the same need, and looking into the source code of nltk.draw.tree I found a solution:
    from nltk import Tree
    from nltk.draw.util import CanvasFrame
    from nltk.draw import TreeWidget
    cf = CanvasFrame()
    t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
    tc = TreeWidget(cf.canvas(), t)
    cf.add_widget