punctuation

This regex to strip punctuation also incorrectly makes the word Báenou into Benou

旧街凉风 posted on 2019-12-04 11:07:03
The goal of this regex is to remove punctuation characters:

    var myTxt = "Welcome, Visitor: The Royal Kingdom Of Báenou";
    myTxt = myTxt.replace(/[^a-zA-Z0-9 ]+/g, '').replace(/ {2,}/g, ' ');
    alert(myTxt);

So the text above should become this:

    Welcome Visitor The Royal Kingdom Of Báenou

But instead it incorrectly drops the á in Báenou to produce this:

    Welcome Visitor The Royal Kingdom Of Benou

What's the simplest change I could make to the regex to make it work as intended?

Your problem is that you are dropping anything that is not in a "whitelist" which you define as all (non-accented) letters,
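The whitelist problem described above can be avoided by removing only what Unicode classifies as punctuation. The following is a minimal sketch in Python rather than JavaScript, and `strip_punctuation` is a hypothetical helper name. It assumes that "punctuation" means the Unicode general categories starting with `P`, so symbols such as `$` or `+` (category `S`) are left alone.

```python
import re
import unicodedata


def strip_punctuation(text):
    """Drop characters whose Unicode category starts with 'P', then collapse spaces."""
    # Letters such as 'á' are category 'Ll', so they survive this filter.
    kept = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Collapse runs of two or more spaces left behind by removed characters.
    return re.sub(r" {2,}", " ", kept)


print(strip_punctuation("Welcome, Visitor: The Royal Kingdom Of Báenou"))
```

The design choice here is a blacklist (remove known punctuation) instead of a whitelist (keep known letters), which is what lets accented letters pass through untouched.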

Are there character collections for all international full stop punctuations?

风流意气都作罢 posted on 2019-12-04 05:04:09
I am trying to parse UTF-8 strings into "bite-sized" segments. For example, I would like to break down a text into "sentences". Is there a comprehensive collection of characters (or a regex) that corresponds to the end of a sentence in all languages? I'm looking for something that would capture the Latin period, exclamation mark and question mark, the Chinese and Japanese full stop, etc. Something like the above but for the equivalent of a comma would be great too.

I haven’t encountered any compilations of such information, and I would expect it to be a major effort to collect it. For some widely
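As a rough illustration of the idea in the question, the sketch below splits text on a small, hand-picked set of sentence terminators. The set in `SENTENCE_END` is an assumption made for this example and is nowhere near comprehensive; `split_on_terminators` is a hypothetical name.

```python
import re

# Illustrative subset only: ASCII terminators plus the CJK full stop,
# fullwidth exclamation and question marks, and the halfwidth ideographic stop.
SENTENCE_END = ".!?。！？｡"


def split_on_terminators(text):
    """Split text into chunks that each end with a run of terminator characters."""
    terminators = re.escape(SENTENCE_END)
    # One or more non-terminators followed by any trailing terminators.
    pattern = "[^" + terminators + "]+[" + terminators + "]*"
    chunks = (match.group().strip() for match in re.finditer(pattern, text))
    return [chunk for chunk in chunks if chunk]


print(split_on_terminators("Hello world. 你好。How are you?"))
```

A character set like this treats every period as a sentence end, so abbreviations and decimal numbers will be split incorrectly.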

What is '`' character called?

北城以北 posted on 2019-12-03 22:25:36
I feel silly for asking this, but it isn't like I could google this. What is the ` character called? In case it doesn't show up, it is the character used for inline code in Markdown. Also, on most keyboards, it shares the key with ~. I like all three answers, so I made this a CW instead of accepting one.

All sorts of things, but in programming mostly the back-quote or backtick, Grave (pronounced Grahv, not like the synonym for tomb) or Grave accent. From the Jargon File, the prime nerd reference which really should be an ISO standard :-) Common: backquote; left quote; left single quote; open

tm custom removePunctuation except hashtag

故事扮演 posted on 2019-12-03 17:28:33
I have a corpus of tweets from Twitter. I clean this corpus (removeWords, tolower, delete URLs) and finally also want to remove punctuation. Here is my code:

    tweetCorpus <- tm_map(tweetCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)

The problem is that by doing so I also lose the hashtag (#). Is there a way to remove punctuation with tm_map but keep the hashtag?

You could adapt the existing removePunctuation to suit your needs. For example:

    removeMostPunctuation <- function(x, preserve_intra_word_dashes = FALSE) {
      rmpunct <- function(x) {
        x <- gsub("#", "\002", x)
        x <-
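The same idea of removing punctuation while sparing chosen characters can be sketched outside of `tm`. The example below is in Python, not R, and does not reproduce the truncated answer above. `remove_punctuation_except` is a hypothetical helper, and it assumes that only ASCII punctuation (`string.punctuation`) needs to be removed.

```python
import string


def remove_punctuation_except(text, keep="#"):
    """Delete ASCII punctuation from text, except the characters listed in keep."""
    # Build the set of characters to delete: all ASCII punctuation minus `keep`.
    drop = "".join(ch for ch in string.punctuation if ch not in keep)
    # str.maketrans with a third argument maps every listed character to None.
    return text.translate(str.maketrans("", "", drop))


print(remove_punctuation_except("Loving #rstats, really!"))
print(remove_punctuation_except("well-known #tag!", keep="#-"))
```

Passing `keep="#-"` also preserves dashes, although unlike `preserve_intra_word_dashes` it keeps every dash, not only those inside words.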

Why is the hyphen conventional in symbol names in LISP?

别说谁变了你拦得住时间么 posted on 2019-12-03 14:01:47
What's the reason for this recommendation? Why not stay consistent with other programming languages, which use the underscore instead?

I think that LISP uses the hyphen for two reasons: "history" and "because you can".

History

LISP is an old language, and in the early days typing an underscore could be challenging. For example, the first terminal I used for LISP was an ASR-33 teletype. On some hosts and teletype models, the key sequence for the underscore character would be interpreted as a left-pointing arrow (the assignment operator in Smalltalk). Hyphens could be typed more reliably.

Because

Redefining “sentence” in Emacs? (single space between sentences, but ignoring abbreviations)

戏子无情 posted on 2019-12-03 11:27:42
I would like to be able to navigate by sentence in Emacs (M-a, M-e). Here's the problem: by default, Emacs expects sentences to be separated by two spaces, and I'm used to putting just a single space. Of course, that setting can be turned off to allow for sentences separated by only a single space, like so:

    (setq sentence-end-double-space nil)

But then Emacs thinks that a sentence has ended after abbreviations with a full stop ("."), e.g. after something like "...a weird command, e.g. foo...". So rather than using the above code, is there a way to define the sentence-end variable so
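The behaviour being asked for (end a sentence at a terminator followed by a single space, unless the terminator belongs to an abbreviation) can be illustrated independently of Emacs. The Python sketch below is not an Emacs configuration; `ABBREVIATIONS` and `split_single_spaced` are hypothetical names, and the abbreviation list is an assumption chosen for the example.

```python
import re

# Hypothetical, deliberately short list of abbreviations that end with a period.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}


def split_single_spaced(text):
    """Split on '.', '!' or '?' followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        # If the word just before the break is an abbreviation, keep scanning.
        if candidate.split()[-1] in ABBREVIATIONS:
            continue
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_single_spaced("Run a weird command, e.g. foo. Then stop."))
```

A fixed abbreviation list misses any abbreviation not listed, and it also fails when an abbreviation genuinely ends a sentence.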

How to read a .csv file containing apostrophes into R?

瘦欲@ posted on 2019-12-03 09:22:42
I am having difficulty getting R to read a .txt or .csv file that contains apostrophes. Some of my columns contain descriptive text, such as "Attends to customers' needs" or "Sheriff's deputy". My file opens correctly in Excel (that is, all the data appear in the correct cells; there are 3 columns and about 8000 rows, and there is no missing data). But when I ask R to read the file, this is what happens:

    data <- read.table("datafile.csv", sep=",", header=TRUE)
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
      line 520 did not have 3 elements

(Line 520 is the first
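A likely cause, stated here as an assumption because the excerpt is cut off, is that `read.table` defaults to `quote = "\"'"`, so an apostrophe can open a quoted string that swallows the following separators. The Python sketch below is not R; it only illustrates the underlying idea of treating the double quote as the sole quoting character. `SAMPLE` and `read_rows` are hypothetical names with made-up data.

```python
import csv
import io

# Made-up sample in the shape described in the question: 3 columns, apostrophes in text.
SAMPLE = (
    "id,title,note\n"
    "1,Sheriff's deputy,Attends to customers' needs\n"
    '2,Clerk,"Files, sorts and stamps"\n'
)


def read_rows(text):
    """Parse CSV text, treating only the double quote as a quoting character."""
    reader = csv.reader(io.StringIO(text), quotechar='"')
    return list(reader)


for row in read_rows(SAMPLE):
    print(len(row), row)
```

In R itself, the corresponding thing to try would be passing `quote = "\""` to `read.table`, or using `read.csv`, whose default quote character is the double quote only.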

SQL Server: How do you remove punctuation from a field?

拈花ヽ惹草 posted on 2019-12-03 02:40:44
Does anyone know a good way to remove punctuation from a field in SQL Server? I'm thinking of

    UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldName,',',''),'.',''),'''' ,'')

but it seems a bit tedious when I intend to remove a large number of different characters, for example: !@#$%^&*()<>:" Thanks in advance.

Ideally, you would do this in an application language such as C# + LINQ as mentioned above. If you wanted to do it purely in T-SQL though, one way to make things neater would be to first create a table that holds all the punctuation you want to remove.

    CREATE TABLE Punctuation (
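The "do it from an application language" route can be sketched as a loop that issues one `REPLACE` update per character. The example below uses Python with SQLite's in-memory database as a stand-in, not SQL Server, and does not reproduce the truncated T-SQL answer. `PUNCTUATION` and `strip_punctuation_column` are hypothetical, and the table and column names are assumed to be trusted identifiers because they are interpolated into the statement.

```python
import sqlite3

# Characters to remove; extend this list as needed.
PUNCTUATION = list("!@#$%^&*()<>:\",.'")


def strip_punctuation_column(conn, table, column):
    """Run one UPDATE ... REPLACE per punctuation character on the given column."""
    for ch in PUNCTUATION:
        # Identifiers cannot be bound as parameters, so only the character is bound.
        conn.execute(
            f"UPDATE {table} SET {column} = REPLACE({column}, ?, '')", (ch,)
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblMyTable (FieldName TEXT)")
conn.execute("INSERT INTO tblMyTable VALUES (?)", ("Hello, World! (it's a test)",))
strip_punctuation_column(conn, "tblMyTable", "FieldName")
print(conn.execute("SELECT FieldName FROM tblMyTable").fetchone()[0])
```

This trades the nested `REPLACE` calls for several passes over the table, which is simpler to maintain but does more work on large tables.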

How can I remove all leading and trailing punctuation?

狂风中的少年 posted on 2019-12-01 18:36:41
Question

I want to remove all the leading and trailing punctuation in a string. How can I do this? Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.

'.', '@', '_', '&', '/', '-' are allowed if surrounded by letters or digits
\' is allowed if preceded by a letter or digit

I tried

    Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
    Matcher m = p.matcher(term);
    boolean a = m.find();
    if (a) term = term.replaceAll("(^\\p{Punct})", "");

but

How can I remove all leading and trailing punctuation?

让人想犯罪 __ posted on 2019-12-01 18:19:49
I want to remove all the leading and trailing punctuation in a string. How can I do this? Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.

'.', '@', '_', '&', '/', '-' are allowed if surrounded by letters or digits
\' is allowed if preceded by a letter or digit

I tried

    Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
    Matcher m = p.matcher(term);
    boolean a = m.find();
    if (a) term = term.replaceAll("(^\\p{Punct})", "");

but it didn't work!!

Ok. So basically you want to find some pattern in your string and act if the pattern
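Note that the pattern in the question matches a single punctuation character at each end, and the `replaceAll` call only touches the leading one. A minimal sketch of stripping whole runs from both ends follows, in Python rather than Java. `strip_outer_punctuation` is a hypothetical name, and the sketch assumes ASCII punctuation only (`string.punctuation`, the same 32 characters as Java's `\p{Punct}`).

```python
import string


def strip_outer_punctuation(term):
    """Remove every leading and trailing ASCII punctuation character from term."""
    # str.strip with an argument removes any of the listed characters from both
    # ends and stops at the first character that is not in the set.
    return term.strip(string.punctuation)


print(strip_outer_punctuation("...hello, world!!"))
print(strip_outer_punctuation("(user@example.com)."))
```

Because stripping stops at the first letter or digit on each side, punctuation between words (the comma, `@` and inner `.` above) is left in place.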