text-parsing | 易学教程

load std::map from text file

This is a very simple thing, so I want to keep it as simple as it sounds. All I want is to load a bunch of key-value pairs from a file and populate a map with them. I do not really care how the text is structured, as long as it is easy to read. What I have now is: XML with XSD-generated code (overkill); Protocol Buffers (also overkill); an INI-style text file. I like the syntax of the INI file, but I do not want to write a parser for it. It sounds to me like I would be doing something lots of people have done before me. Is there not some sort of library to read simple structured files like this?
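As a rough, language-neutral illustration of how little parsing an INI-style file needs, here is a minimal Python sketch. The function name `load_key_values` and the sample text are hypothetical, and the rules it applies (skip blank lines, `#`/`;` comments, and `[section]` headers, then split at the first `=`) are assumptions rather than a full INI grammar. The same line-by-line split carries over directly to filling a `std::map<std::string, std::string>` in C++.

```python
def load_key_values(text):
    """Parse INI-style 'key = value' lines into a dict (hypothetical helper)."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        # Split at the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line; ignore it
        result[key.strip()] = value.strip()
    return result


# Assumed sample input, not taken from the question.
sample = """
# settings
name = demo
port=8080
"""
config = load_key_values(sample)
print(config)
```

All values stay strings here; converting them to numbers is left to the caller.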

Can awk deal with CSV file that contains comma inside a quoted field?

I am using awk to compute the sum of one column in a CSV file. The data format is something like:

    id, name, value
    1, foo, 17
    2, bar, 76
    3, "I am the, question", 99

I was using this awk script to compute the sum:

    awk -F, '{sum+=$3} END {print sum}'

Some of the values in the name field contain a comma, and this breaks my awk script. My question is: can awk solve this problem? If so, how can I do that? Thank you.

Answer 1: You can write a function in awk like the one below:

    $ awk 'func isnum(x){return(x==x+0)}BEGIN{print isnum("hello"),isnum("-42")}'
    0 1

You can incorporate this function in your script and check
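For comparison, the same column sum can be sketched with a CSV-aware parser instead of awk. This is only an illustration in Python, not an awk solution: it uses the standard `csv` module, and `skipinitialspace=True` is needed so that the quote following `, ` is recognised as opening a quoted field. The function name `sum_column` is hypothetical.

```python
import csv
import io


def sum_column(text, index):
    """Sum the integer values in one column of CSV text, skipping the header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    # 'if row' ignores blank lines, which csv.reader yields as empty lists.
    return sum(int(row[index]) for row in reader if row)


data = """id, name, value
1, foo, 17
2, bar, 76
3, "I am the, question", 99
"""
total = sum_column(data, 2)
print(total)  # 192
```

The quoted comma in the third row stays inside the name field, so the value column is still at index 2.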

Outlook “Run Script” rule not triggering VBA script for incoming messages

Question: I am creating this new topic on the advice of another member. For additional history regarding how things arrived at this point, see this question. I have this VBA script that I know works if it gets triggered. If I use the TestLaunch subroutine with a message already in my inbox that meets the rule criteria (but, of course, isn't being kicked off by the rule), it activates the link I want it to activate flawlessly. If, when I create the rule I say to apply it to all existing messages in my

Code Golf: Quickly Build List of Keywords from Text, Including # of Instances

I've already worked out this solution for myself with PHP, but I'm curious how it could be done differently, or even better. The two languages I'm primarily interested in are PHP and JavaScript, but I'd be interested in seeing how quickly this could be done in any other major language today as well (mostly C#, Java, etc.). Requirements: return only words with an occurrence greater than X; return only words with a length greater than Y; ignore common terms like "and", "is", "the", etc.; feel free to strip punctuation prior to processing (i.e. "John's" becomes "John"); return results in a collection/array. Extra Credit
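The listed requirements can be sketched in a few lines of Python. This is one possible reading, not a golfed answer: the function name `keywords`, the stop-word set, and the sample sentence are all made up for illustration, and "strip punctuation" is interpreted as dropping a possessive `'s` and keeping only runs of letters.

```python
import re
from collections import Counter

# Assumed stop-word list; the question only gives "and, is, the, etc".
STOP_WORDS = {"and", "is", "the", "a", "of", "to"}


def keywords(text, min_count=1, min_length=3, stop_words=STOP_WORDS):
    """Return {word: count} for words longer than min_length that occur
    more than min_count times and are not stop words."""
    # Drop possessive 's so "John's" becomes "john", then keep letter runs.
    cleaned = re.sub(r"'s\b", "", text.lower())
    words = re.findall(r"[a-z]+", cleaned)
    counts = Counter(
        w for w in words if len(w) > min_length and w not in stop_words
    )
    return {w: n for w, n in counts.items() if n > min_count}


sample = ("John's parser reads text. The parser is fast and the parser "
          "is small. John likes text.")
result = keywords(sample)
print(result)
```

Here `min_count` plays the role of X and `min_length` the role of Y from the requirements.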

Create Great Parser - Extract Relevant Text From HTML/Blogs

I'm trying to create a generalized HTML parser that works well on blog posts. I want to point my parser at a specific entry's URL and get back the clean text of the post itself. My basic approach (in Python) has been to use a combination of BeautifulSoup and urllib2, which is okay, but it assumes you know the proper tags for the blog entry. Does anyone have any better ideas? Here are some thoughts that maybe someone could expand upon, which I don't have enough knowledge or know-how to implement yet. The Unix program 'lynx' seems to parse blog posts especially well - what parser do they use, or how could

How to parse a string and create several columns from it?

I have a varchar(max) field containing name-value pairs; every line has the form Name, underscore, Value. I need to run a query against it that returns the name-value pairs in two columns (that is, by parsing the text and removing the underscore and the newline character). So from this:

    select NameValue from Table

where I get this text:

    Name1_Value1
    Name2_Value2
    Name3_Value3

I would like to have this output:

    Names  Values
    =====  ======
    Name1  Value1
    Name2  Value2
    Name3  Value3

    SELECT substring(NameValue, 1, charindex('_', NameValue)-1) AS Names, substring(NameValue, charindex('_', NameValue)+1, LEN(NameValue))
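The splitting logic itself (one pair per line, cut at the first underscore, as the `charindex('_', ...)` calls do) can be sketched outside SQL. The Python below is only an illustration of that logic and does not address how to do the row splitting inside the database; the function name `split_pairs` is hypothetical.

```python
def split_pairs(text):
    """Split 'Name_Value' lines into (name, value) tuples at the first '_'."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        name, sep, value = line.partition("_")
        if sep:  # keep only lines that actually contain an underscore
            pairs.append((name, value))
    return pairs


field = "Name1_Value1\nName2_Value2\nName3_Value3"
rows = split_pairs(field)
for name, value in rows:
    print(name, value)
```

Because the cut is made at the first underscore, any further underscores stay in the value.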

How do I tokenize this string in Ruby?

Question: I have this string: %{Children^10 Health "sanitation management"^5} and I want to tokenize it into an array of hashes: [{:keywords=>"children", :boost=>10}, {:keywords=>"health", :boost=>nil}, {:keywords=>"sanitation management", :boost=>5}] I'm aware of StringScanner and the Syntax gem, but I can't find enough code examples for either. Any pointers? Answer 1: For a real language, a lexer's the way to go - like Guss said. But if the full language is only as complicated as your example
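For a language as small as the example, a single regular expression is enough. The sketch below shows the idea in Python rather than Ruby, assuming the grammar is just bare words or double-quoted phrases, each optionally followed by `^` and an integer boost; the names `TOKEN` and `tokenize` are made up for illustration.

```python
import re

# Either a "quoted phrase" or a bare word, optionally followed by ^<digits>.
TOKEN = re.compile(r'(?:"([^"]+)"|([^\s"^]+))(?:\^(\d+))?')


def tokenize(query):
    """Turn a query string into a list of {keywords, boost} dicts."""
    tokens = []
    for match in TOKEN.finditer(query):
        phrase, word, boost = match.groups()
        tokens.append({
            "keywords": (phrase or word).lower(),
            "boost": int(boost) if boost else None,
        })
    return tokens


tokens = tokenize('Children^10 Health "sanitation management"^5')
print(tokens)
```

The same pattern should translate to Ruby's `String#scan` with little change, though that port is not shown here.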

Extracting “((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun” from Text (Justeson & Katz, 1995)

I would like to ask whether it is possible to extract the pattern ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun, proposed by Justeson and Katz (1995), using the R package openNLP. That is, I would like to use this linguistic filter to extract candidate noun phrases. I do not fully understand what the pattern means. Could you explain it, or translate this representation into R? Many thanks. Maybe we can start the sample code from: library("openNLP") acq <- "This paper describes a novel optical thread plug gauge (OTPG) for internal thread inspection using machine vision. The OTPG is composed of a
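One way to read the pattern is as a regular expression over part-of-speech tags rather than over characters. The Python sketch below illustrates that reading under several assumptions: the tags are assigned by hand instead of by openNLP, each tag is collapsed to one letter (A for adjective, N for noun, P for preposition, O for anything else), "Noun-Prep" is taken to mean a noun followed by a preposition, and the pattern is encoded exactly as written in the question. The names `TAG_TO_LETTER`, `PATTERN`, and `candidate_phrases` are hypothetical.

```python
import re

# Assumed mapping from Penn Treebank style tags to one-letter classes.
TAG_TO_LETTER = {"JJ": "A", "NN": "N", "NNS": "N", "IN": "P"}

# ((Adj|Noun)+ | ((Adj|Noun)(Noun-Prep)?)(Adj|Noun)) Noun, one letter per token.
PATTERN = re.compile(r"(?:[AN]+|[AN](?:NP)?[AN])N")


def candidate_phrases(tagged):
    """Return word sequences whose tag sequence matches PATTERN.

    'tagged' is a list of (word, tag) pairs.
    """
    letters = "".join(TAG_TO_LETTER.get(tag, "O") for _, tag in tagged)
    # One letter per token, so match offsets are also token offsets.
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in PATTERN.finditer(letters)
    ]


# Hand-tagged opening of the sample sentence (tags are an assumption).
tagged = [
    ("This", "DT"), ("paper", "NN"), ("describes", "VBZ"), ("a", "DT"),
    ("novel", "JJ"), ("optical", "JJ"), ("thread", "NN"),
    ("plug", "NN"), ("gauge", "NN"),
]
phrases = candidate_phrases(tagged)
print(phrases)
```

In this reading the filter keeps runs of adjectives and nouns that end in a noun and span at least two words, which is why the lone noun "paper" is not returned.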