text-parsing

load std::map from text file

Submitted by 回眸只為那壹抹淺笑 on 2019-11-30 19:48:08
This is a very simple thing, so I want to keep it as simple as it sounds. All I want is to load a bunch of key-value pairs from a file and populate a map with them. I don't really care how the text is structured, as long as it is easy to read. What I have considered so far: XML with XSD-generated code (overkill), Protocol Buffers (also overkill), and an INI-style text file. I like the syntax of the INI file, but I don't want to write a parser for it. It sounds to me like something lots of people have done before me. Isn't there some library for reading simple structured files like this?
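If no library turns out to be necessary, the subset of INI the question describes (one key = value pair per line) needs only a short loop. Below is a minimal sketch in Python, assuming that comments start with ; or #, that [section] headers can simply be ignored, and that a repeated key overwrites the earlier one; load_key_values is a hypothetical name. In C++ the same loop maps onto std::getline for each line and map[key] = value for each pair.

```python
def load_key_values(lines):
    """Parse 'key = value' lines into a dict (hypothetical helper, INI subset only)."""
    result = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and [section] headers.
        if not line or line[0] in ";#" or (line.startswith("[") and line.endswith("]")):
            continue
        # Split at the first '=', so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines that have no '='
        result[key.strip()] = value.strip()  # a later duplicate key wins
    return result


print(load_key_values(["[main]", "name = foo", "; comment", "size=3"]))
```

Because it accepts any iterable of lines, an open file works directly, e.g. with open("settings.ini") as f: settings = load_key_values(f) (the file name is only an example).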

Can awk deal with CSV file that contains comma inside a quoted field?

Submitted by 这一生的挚爱 on 2019-11-30 10:55:28
I am using awk to sum one column of a CSV file. The data looks something like this:

    id, name, value
    1, foo, 17
    2, bar, 76
    3, "I am the, question", 99

I was using this awk script to compute the sum:

    awk -F, '{sum+=$3} END {print sum}'

Some values in the name field contain commas, and this breaks my awk script. Can awk solve this problem? If so, how? Thank you.

Answer 1: You can write a function in awk like the one below:

    $ awk 'func isnum(x){return(x==x+0)}BEGIN{print isnum("hello"),isnum("-42")}'
    0 1

You can incorporate this function into your script and check
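If switching tools is an option, a real CSV parser sidesteps the quoted-comma problem entirely. This sketch swaps awk for Python's standard csv module, which understands quoted fields. It assumes the sample layout above (a header row, a comma plus one space between fields, an integer value column); sum_column is a hypothetical helper.

```python
import csv
import io


def sum_column(text, column="value"):
    """Sum one integer column of CSV text; quoted fields may contain commas."""
    # skipinitialspace=True drops the blank after each comma, so the reader sees
    # '"I am the, question"' as one quoted field instead of splitting it at the inner comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return sum(int(row[column]) for row in reader)


sample = '''id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99
'''
print(sum_column(sample))  # 192
```

For a file on disk, pass open(path, newline="") to csv.DictReader instead of the io.StringIO wrapper.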

Outlook “Run Script” rule not triggering VBA script for incoming messages

Submitted by 房东的猫 on 2019-11-30 06:06:51
Question: I am creating this new topic on the advice of another member. For more background on how things got to this point, see this question. I have a VBA script that I know works if it gets triggered. If I run the TestLaunch subroutine on a message already in my inbox that meets the rule criteria (but, of course, isn't being kicked off by the rule), it activates the link I want it to activate flawlessly. If, when I create the rule, I say to apply it to all existing messages in my

load std::map from text file

Submitted by 前提是你 on 2019-11-30 04:09:01
Question: This is a very simple thing, so I want to keep it as simple as it sounds. All I want is to load a bunch of key-value pairs from a file and populate a map with them. I don't really care how the text is structured, as long as it is easy to read. What I have considered so far: XML with XSD-generated code (overkill), Protocol Buffers (also overkill), and an INI-style text file. I like the syntax of the INI file, but I don't want to write a parser for it. It sounds to me like I would be doing something lots of

Code Golf: Quickly Build List of Keywords from Text, Including # of Instances

Submitted by ぃ、小莉子 on 2019-11-30 03:49:04
I've already worked out a solution for myself in PHP, but I'm curious how it could be done differently, or even better. The two languages I'm primarily interested in are PHP and JavaScript, but I'd also like to see how quickly this could be done in any other major language today (mostly C#, Java, etc.). The requirements:

- Return only words that occur more than X times
- Return only words longer than Y characters
- Ignore common terms like "and", "is", "the", etc.
- Feel free to strip punctuation before processing (i.e., "John's" becomes "John")
- Return the results in a collection/array

Extra Credit
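The requirements map onto a tokenize, filter, count pipeline. Here is a minimal sketch in Python (one of the "other major languages" the question invites), assuming that "more than X" and "longer than Y" are strict comparisons and using a small hypothetical stop-word list. Words are lowercased so that "The" and "the" count as one word, so "John's" comes out as "john" rather than "John".

```python
import re
from collections import Counter

# Hypothetical stop-word list; the question only names "and", "is", "the", etc.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def keywords(text, min_count, min_length, stop_words=STOP_WORDS):
    """Return (word, count) pairs for words occurring more than min_count times
    and longer than min_length characters, most frequent first."""
    # Lowercase, drop a possessive "'s", then keep only runs of ASCII letters,
    # which strips the remaining punctuation.
    cleaned = re.sub(r"'s\b", "", text.lower())
    words = re.findall(r"[a-z]+", cleaned)
    counts = Counter(w for w in words if len(w) > min_length and w not in stop_words)
    return [(w, c) for w, c in counts.most_common() if c > min_count]


text = "John's dog and John's cat. The dog is a good dog; the cat is John's."
print(keywords(text, min_count=1, min_length=2))
# [('john', 3), ('dog', 3), ('cat', 2)]
```

Counter.most_common() sorts by count and keeps first-seen order among ties, which is why "john" precedes "dog".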

Create Great Parser - Extract Relevant Text From HTML/Blogs

Submitted by 半城伤御伤魂 on 2019-11-29 20:28:38
I'm trying to create a generalized HTML parser that works well on blog posts. I want to point my parser at a specific entry's URL and get back clean text of the post itself. My basic approach (in Python) has been to combine BeautifulSoup and urllib2, which works okay, but it assumes you know the proper tags for the blog entry. Does anyone have better ideas? Here are some thoughts that someone could perhaps expand upon; I don't yet have the knowledge to implement them myself. The Unix program lynx seems to parse blog posts especially well: what parser does it use, or how could
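One common heuristic that avoids hard-coding tags is to treat long paragraphs as the post body and short fragments as navigation or boilerplate. This is an assumption for illustration, not a description of how lynx works. The sketch below uses only Python's standard html.parser instead of BeautifulSoup and leaves out fetching the page (urllib.request is the Python 3 successor to urllib2). The 80-character threshold and the names ParagraphCollector and extract_post_text are arbitrary, hypothetical choices.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the whitespace-normalised text of each <p>, ignoring <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._buf = []

    def end_paragraph(self):
        # HTML allows </p> to be omitted, so a new <p> (or end of input) also ends one.
        if self._in_p:
            self.paragraphs.append(" ".join("".join(self._buf).split()))
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self.end_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self.end_paragraph()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def extract_post_text(html, min_chars=80):
    """Keep paragraphs of at least min_chars characters, assuming menus, bylines
    and footers are short while a post body consists of long paragraphs."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()          # flushes any text still buffered by the parser
    parser.end_paragraph()  # closes a final <p> that had no </p>
    return "\n\n".join(p for p in parser.paragraphs if len(p) >= min_chars)
```

The threshold is the main tuning knob: blogs with short paragraphs will need a lower value, at the cost of letting more boilerplate through.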

Can awk deal with CSV file that contains comma inside a quoted field?

Submitted by 余生颓废 on 2019-11-29 16:10:27
Question: I am using awk to sum one column of a CSV file. The data looks something like this:

    id, name, value
    1, foo, 17
    2, bar, 76
    3, "I am the, question", 99

I was using this awk script to compute the sum:

    awk -F, '{sum+=$3} END {print sum}'

Some values in the name field contain commas, and this breaks my awk script. Can awk solve this problem? If so, how? Thank you.

Answer 1: You can write a function in awk like the one below:

    $ awk 'func isnum(x){return(x==x

How to parse a string and create several columns from it?

Submitted by 痞子三分冷 on 2019-11-29 14:19:48
I have a varchar(max) field containing name/value pairs; each line holds a name, an underscore, and a value. I need a query that returns the pairs in two columns, by parsing the text and removing the underscore and the newline characters. So from this query:

    select NameValue from Table

I get this text:

    Name1_Value1
    Name2_Value2
    Name3_Value3

and I would like this output:

    Names   Values
    =====   ======
    Name1   Value1
    Name2   Value2
    Name3   Value3

Answer 1:

    SELECT substring(NameValue, 1, charindex('_', NameValue)-1) AS Names,
           substring(NameValue, charindex('_', NameValue)+1, LEN(NameValue))
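Independent of SQL, the parsing step is: split the field on line breaks, then split each line at its first underscore. Here it is sketched in Python for illustration (not T-SQL), assuming names never contain underscores while values may; split_name_values is a hypothetical name. Using splitlines() also covers the \r\n line endings that text stored in SQL Server often has.

```python
def split_name_values(field):
    """Turn 'Name1_Value1\\nName2_Value2' into [('Name1', 'Value1'), ('Name2', 'Value2')]."""
    rows = []
    for line in field.splitlines():  # handles \n, \r\n and \r
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. from a trailing newline
        name, sep, value = line.partition("_")  # split at the FIRST underscore only
        if sep:
            rows.append((name, value))  # lines without an underscore are skipped
    return rows


for name, value in split_name_values("Name1_Value1\r\nName2_Value2\r\nName3_Value3"):
    print(f"{name:<8}{value}")
```

In T-SQL the same two steps would mean splitting the varchar(max) into one row per line before applying the CHARINDEX/SUBSTRING expressions, since those act on the whole field at once.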

How do I tokenize this string in Ruby?

Submitted by 混江龙づ霸主 on 2019-11-29 12:21:19
Question: I have this string:

    %{Children^10 Health "sanitation management"^5}

and I want to tokenize it into an array of hashes:

    [{:keywords=>"children", :boost=>10}, {:keywords=>"health", :boost=>nil}, {:keywords=>"sanitation management", :boost=>5}]

I'm aware of StringScanner and the Syntax gem, but I can't find enough code examples for either. Any pointers?

Answer 1: For a real language, a lexer is the way to go, as Guss said. But if the full language is only as complicated as your example
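If the language really consists only of bare words, double-quoted phrases, and an optional ^N boost, a single regular expression can stand in for a lexer. Below is a minimal sketch in Python rather than Ruby, assuming boosts are non-negative integers and keywords are lowercased as in the expected output; dicts and None play the role of Ruby hashes and nil, and tokenize is a hypothetical name. The input is the content of the Ruby %{...} literal. The same pattern should carry over to Ruby's String#scan, which likewise yields one capture array per match.

```python
import re

# A token is a "quoted phrase" or a bare word, optionally followed by ^<digits>.
TOKEN = re.compile(r'(?:"([^"]*)"|([^\s"^]+))(?:\^(\d+))?')


def tokenize(query):
    """Return [{'keywords': str, 'boost': int or None}, ...] in input order."""
    tokens = []
    # With three groups, findall yields (quoted, bare, boost); unmatched groups are ''.
    for quoted, bare, boost in TOKEN.findall(query):
        tokens.append({
            "keywords": (quoted or bare).lower(),
            "boost": int(boost) if boost else None,
        })
    return tokens


print(tokenize('Children^10 Health "sanitation management"^5'))
```

Anything the pattern cannot match, such as a stray ^ or an unbalanced quote, is skipped silently. That is one reason a real lexer, as the answer says, suits a larger language better.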

Extracting “((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun” from Text (Justeson & Katz, 1995)

Submitted by 蓝咒 on 2019-11-29 08:47:43
I would like to know whether it is possible to extract the pattern ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun, proposed by Justeson and Katz (1995), with the R package openNLP. That is, I would like to use this linguistic filter to extract candidate noun phrases. I don't fully understand what the pattern means. Could someone explain it, or translate this representation into R? Many thanks. Maybe we can start from this sample code:

    library("openNLP")
    acq <- "This paper describes a novel optical thread plug gauge (OTPG) for internal thread inspection using machine vision. The OTPG is composed of a
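Read as a regular expression over part-of-speech tags, the filter accepts a noun preceded by either (a) one or more adjectives/nouns, or (b) an adjective/noun, an optional noun-plus-preposition pair, and another adjective/noun, as in "cumulative distribution of random variables" (Adj Noun Prep Adj Noun). Below is a minimal sketch in Python rather than R. It assumes the text has already been POS-tagged with Penn Treebank-style tags (JJ*, NN*, IN), maps each tag to one letter, and reports every token span whose letter string fully matches the pattern, overlapping sub-phrases included. Keeping only maximal spans would be a separate design choice. The tagged words in the test are hand-tagged for illustration, not openNLP output, and tag_letter and candidate_phrases are hypothetical names.

```python
import re


def tag_letter(tag):
    """Map a Penn Treebank-style tag to A(djective), N(oun), P(reposition) or O(ther)."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


# The question's filter with Adj -> A, Noun -> N, and the Noun-Prep pair -> NP.
PATTERN = re.compile(r"(?:[AN]+|[AN](?:NP)?[AN])N")


def candidate_phrases(tagged):
    """tagged: list of (word, tag) pairs. One letter per token, so string
    positions equal token positions; every fully matching span is returned."""
    letters = "".join(tag_letter(tag) for _, tag in tagged)
    phrases = []
    for start in range(len(letters)):
        for end in range(start + 2, len(letters) + 1):  # every match is >= 2 tokens
            if PATTERN.fullmatch(letters, start, end):
                phrases.append(" ".join(word for word, _ in tagged[start:end]))
    return phrases
```

Checking spans with fullmatch, rather than scanning once with finditer, means the result does not depend on which alternative the regex engine happens to try first.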