tokenize

Tokenizing math expression with functions in C#

馋奶兔 submitted on 2019-12-12 01:48:58
Question: I figured this would be easy to find, but I haven't been successful. I need to be able to tokenize the following expression (4 + 5) + myfunc('two words', 3, 5) into ( 4 + 5 ) + myfunc ( 'two words' , 3 , 5 ) It seems like this is probably a common need, but I haven't been able to find any good documentation on it. Is this something I could do using regex? Does anybody know of an existing way to do this? I'm using C#, but if you have the answer in another language, don't be shy.
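
Since the question welcomes answers in other languages, here is a minimal Python sketch of a regex-based tokenizer for this kind of expression. The token classes (single-quoted strings, numbers, identifiers, and a small set of operators) and the names TOKEN_RE and tokenize are assumptions made for illustration; they are not taken from the question.

```python
import re

# Assumed token classes, tried in this order at each position.
TOKEN_RE = re.compile(r"""
    '[^']*'            # single-quoted string, kept as one token
  | \d+(?:\.\d+)?      # integer or decimal number
  | [A-Za-z_]\w*       # identifier such as a function name
  | [-+*/(),]          # single-character operators and punctuation
""", re.VERBOSE)


def tokenize(expression):
    """Return the tokens of expression, raising ValueError on unknown characters."""
    tokens = []
    pos = 0
    while pos < len(expression):
        if expression[pos].isspace():
            pos += 1  # whitespace only separates tokens
            continue
        match = TOKEN_RE.match(expression, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {expression[pos]!r} at position {pos}"
            )
        tokens.append(match.group())
        pos = match.end()
    return tokens


print(tokenize("(4 + 5) + myfunc('two words', 3, 5)"))
```

Matching at an explicit position, rather than calling findall, means an unsupported character raises an error instead of being silently skipped. The same pattern should carry over to C#'s Regex class with little change, though that port is not shown here.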

How to make the tokenizer detect empty spaces while using strtok()

柔情痞子 submitted on 2019-12-12 01:16:58
Question: I am designing a C++ program, and somewhere in it I need to detect whether there is a blank (empty token) next to the current token, e.g. if(token1==start) { token2=strtok(NULL," "); if(token2==NULL) {LCCTR=0;} else {LCCTR=atoi(token2);} So in the previous piece, token1 is pointing to start, and I want to check whether there is a number next to start, so I used token2=strtok(NULL," ") to point to the next token, but unfortunately the strtok function cannot detect empty spaces, so it gives me an

Reading input from a file in python 3.x

一世执手 submitted on 2019-12-11 21:16:58
Question: Say you are reading input from a file structured like so P3 400 200 255 255 255 255 255 0 0 255 0 0 etc... But you want to account for any mistakes that may come from the input file, as in P3 400 200 255 255 255 255 255 0 0 255 0 0 etc... I want to read in the first token 'P3', then the next two, '400' and '200' (height/width), then the '255', and from there on I want to read every token in and account for how they should be in groups of 3. I have the correct code to read this information but I can't seem
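
One way to make the reading independent of where the line breaks fall is to split the whole text on whitespace first and only then group the values. The sketch below assumes the file contains nothing but whitespace-separated tokens (no comment lines); the function name parse_p3_tokens is hypothetical.

```python
def parse_p3_tokens(text):
    """Parse whitespace-separated P3 text into header fields and RGB triples."""
    # str.split() with no arguments splits on any run of spaces, tabs, or
    # newlines, so the layout of the file does not matter.
    tokens = text.split()
    if len(tokens) < 4:
        raise ValueError("header is incomplete")
    magic = tokens[0]
    if magic != "P3":
        raise ValueError(f"expected 'P3', got {magic!r}")
    # The two dimensions are returned in file order, without naming which is which.
    dims = (int(tokens[1]), int(tokens[2]))
    max_value = int(tokens[3])
    values = [int(tok) for tok in tokens[4:]]
    if len(values) % 3 != 0:
        raise ValueError("colour values do not form complete groups of 3")
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return magic, dims, max_value, pixels


sample = "P3\n2 1 255\n255 255\n255 255 0 0\n"
print(parse_p3_tokens(sample))
```

To use it on a file, pass in the result of reading the whole file as text. For very large images a streaming approach would use less memory, but that is outside this sketch.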

String tokenization in Java (LARGE text)

删除回忆录丶 submitted on 2019-12-11 12:06:47
Question: I have this large text (read LARGE). I need to tokenize every word, delimiting on every non-letter. I used StringTokenizer to read one word at a time. However, as I was researching how to write the delimiter string ("every non-letter") instead of doing something like: new StringTokenizer(text, "\" ();,.'[]{}!?:”“…\n\r0123456789 [etc etc]"); I found that everyone basically hates StringTokenizer (why?). So, what can I use instead? Don't suggest String.split, as it will duplicate my large text. I
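
The general idea of scanning for letter runs one match at a time, instead of splitting the whole text into an array, can be sketched as follows. This is a Python illustration rather than Java, and it assumes "letter" means ASCII A-Z and a-z; the names WORD_RE and iter_words are hypothetical. In Java the corresponding approach would loop over Matcher.find() on a compiled Pattern.

```python
import re

# Assumption: a "letter" is an ASCII letter; everything else is a delimiter.
WORD_RE = re.compile(r"[A-Za-z]+")


def iter_words(text):
    """Yield each run of letters in text lazily, one word at a time."""
    for match in WORD_RE.finditer(text):
        yield match.group()


for word in iter_words("Hello, world! It's 2019 (ok)..."):
    print(word)
```

Because finditer produces matches on demand, only one word is materialised at a time and no list of all tokens is built up front.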

Preventing tokens from containing a space in Stanford CoreNLP

微笑、不失礼 submitted on 2019-12-11 11:58:35
Question: Is there an option in Stanford CoreNLP's tokenizer to prevent tokens from containing a space? E.g. if the sentence is "my phone is 617 1555-6644", the substring "617 1555" should be split into two different tokens. I am aware of the option normalizeSpace: normalizeSpace: Whether any spaces in tokens (phone numbers, fractions) get turned into U+00A0 (non-breaking space). It's dangerous to turn this off for most of our Stanford NLP software, which assumes no spaces in tokens. but I don't want tokens

Why is this function not breaking up this input string?

亡梦爱人 submitted on 2019-12-11 11:48:41
Question: I'm trying to break up a string into "symbols" with C++ for further work. I haven't written anything in C++ for a long while, so forgive me if there is something inherently wrong with this code. The purpose of the symbolize() function below is to break up a string, such as "5+5", into a vector of strings, e.g. {"5","+","5"}. It's not working. If you think the code is too messy, please suggest a way to simplify it. Here's my code so far: #include <iostream> #include <string> #include <vector>

Regex tokenize issue

老子叫甜甜 submitted on 2019-12-11 10:48:10
Question: I have strings input by the user and want to tokenize them. For that, I want to use regex, and I now have a problem with a special case. An example string is Test + "Hello" + "Good\"more" + "Escape\"This\"Test" or the C# equivalent @"Test + ""Hello"" + ""Good\""more"" + ""Escape\""This\""Test""" I am able to match the Test and + tokens, but not the ones enclosed in ". I use the " to let the user specify that this is literally a string and not a special token. Now if the user wants to use
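
A common regex shape for double-quoted strings with backslash escapes is an opening quote, then any number of "non-quote, non-backslash character" or "backslash plus any character", then a closing quote. The sketch below applies it to the example string in Python. The names STRING_AWARE_RE and tokenize_with_strings are hypothetical, and treating every other run of non-space characters as one token is an assumption.

```python
import re

STRING_AWARE_RE = re.compile(r'''
    "(?:[^"\\]|\\.)*"   # quoted string; escaped characters stay inside the token
  | [^\s"]+             # any other run of non-space, non-quote characters
''', re.VERBOSE)


def tokenize_with_strings(text):
    """Return quoted strings and bare words/operators as separate tokens."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return STRING_AWARE_RE.findall(text)


example = 'Test + "Hello" + "Good\\"more" + "Escape\\"This\\"Test"'
for token in tokenize_with_strings(example):
    print(token)
```

Note that an unterminated quote is silently skipped by findall in this sketch; a stricter tokenizer would report it as an error. The core pattern should also be usable with C#'s Regex class, with the quotes doubled inside a verbatim string.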

Parse/tokenize Objective-C with Objective-C (iPhone)

孤人 submitted on 2019-12-11 10:32:34
Question: What options are available for parsing and/or tokenizing Objective-C on iPhone? Essentially I'm thinking of parsing/tokenizing enough to power syntax highlighting and autocompletion at roughly the same level as Xcode does.
Answer 1: I know the topic is old, but this might help someone else. Apple already provides the (very nice) CFStringTokenizer, with support for multiple languages. Here's a good presentation on that, including sample code. In case tokenization is enough, that should do it.

How to identify the end of a sentence

跟風遠走 submitted on 2019-12-11 10:19:53
Question: String x=" i am going to the party at 6.00 in the evening. are you coming with me?"; If I have the above string, I need it to be broken into sentences using sentence boundary punctuation (like . and ?), but it should not split the sentence at 6.00 just because there is a period there. Is there a way to identify the correct sentence boundary in Java? I have tried using StringTokenizer in the java.util package, but it always breaks the sentence whenever it finds a period. Can someone suggest
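
One simple heuristic is to treat ., ? or ! as a sentence boundary only when whitespace follows it, so the period inside 6.00 is left alone. The sketch below shows that rule in Python; the function name split_sentences is hypothetical, and the rule is an assumption that will still split wrongly after abbreviations such as "e.g. ".

```python
import re

# Split on whitespace that directly follows ., ? or !
# The lookbehind keeps the punctuation attached to its sentence.
BOUNDARY_RE = re.compile(r"(?<=[.?!])\s+")


def split_sentences(text):
    """Split text into sentences at ., ? or ! followed by whitespace."""
    parts = BOUNDARY_RE.split(text.strip())
    return [part for part in parts if part]


x = " i am going to the party at 6.00 in the evening. are you coming with me?"
for sentence in split_sentences(x):
    print(sentence)
```

The same lookbehind pattern can be passed to String.split in Java. For locale-aware rules, Java also ships java.text.BreakIterator.getSentenceInstance(), which may be a better fit than a hand-written regex.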

How to separate tokens in line using Unix? [duplicate]

≡放荡痞女 submitted on 2019-12-11 08:26:00
Question: This question already has answers here: How split a file in words in unix command line? (11 answers) Closed 5 years ago. How to separate tokens in line using Unix? [in]: some sentences are like this. some sentences foo bar that [out]: some sentences are like this. some sentences foo bar that I could have done this in Python as below, but is there any Unix way to achieve the same output? >>> import codecs >>> outfile = codecs.open('outfile.txt','w','utf8') >>> intext = "some sentences are