Extract and split words from text and list them in order of occurrence using only shell terminal regex

Posted by 假如想象 on 2019-12-13 17:35:18

Question


I have the text below (in this format), and I want the words to be separated and placed one per line, in the order they occur, as a vertical list like in this example: https://stackoverflow.com/a/21672824/10824251. I tried egrep -vi "'?[^\\p{L}']+'?|^'|'$" mytext.txt > output.txt but got no result: output.txt was empty.

My text:

Teaching psychology is the part of education psychology that refers to school education. As will be seen later, both have the same goal: study, explain and understand the processes of behavioral change that are produce in people as a consequence of their participation in activities educational What gives an entity proper to teaching psychology is the nature and the characteristics of the educational activities that exist at the base of the of behavioral change studied.

My text in Portuguese:

A psicologia do ensino é a parte da psicologia da educacão que se refere à educacão escolar. Como se verá mais adiante, ambas têm um mesmo objetivo: estudar, explicar e compreender os processos de mudanca comportamental que se produzem nas pessoas como uma conseqüência da sua participacão em atividades educativas. O que confere uma entidade própria à psicologia do ensino é a natureza e as caracterís- ticas das atividades educativas que existem na base dos processos de mudanca comportamental estudados.


Answer 1:


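You may want to tokenize the text by whitespace. The three commands below are equivalent (a POSIX BRE, a BRE with an interval quantifier, and a POSIX ERE):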

grep -o '[^[:space:]][^[:space:]]*' mytext.txt > output.txt
grep -o '[^[:space:]]\{1,\}' mytext.txt > output.txt
grep -oE '[^[:space:]]+' mytext.txt > output.txt
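
As a quick illustration of the whitespace approach, here is a minimal sketch. The helper name tokenize_ws and the shortened sample string are assumptions made for this example and are not part of the original answer.

```shell
# Sample input (assumption: a shortened version of the question's text).
sample='Teaching psychology is the part of education psychology.'

# Hypothetical helper: print every run of non-whitespace characters
# from its first argument, one per line, in order of occurrence.
tokenize_ws() {
  printf '%s\n' "$1" | grep -oE '[^[:space:]]+'
}

tokenize_ws "$sample"
```

Note that this splits on whitespace only, so punctuation stays attached to the neighbouring word (the last token here is "psychology." with the trailing period).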

Or you may extract all chunks of one or more letters (\p{L}), combining marks such as diacritics (\p{M}) and numbers (\p{N}) with a PCRE regex (the -P option of GNU grep):

grep -oP '[\p{L}\p{M}\p{N}]+'  mytext.txt > output.txt

On macOS, where the stock grep does not support -P, you will need pcregrep for this to work.
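
Since -P is not available in every grep, a script may want to probe for it first. The sketch below is one possible way to do that, not part of the original answer: the function name extract_words is hypothetical, and the fallback swaps in the POSIX class [[:alnum:]], which is only an approximation (it has no equivalent of \p{M}, and what counts as a letter depends on the current locale).

```shell
# Hypothetical helper: read text on stdin and print each word on its own line.
extract_words() {
  # Probe whether this grep accepts -P and Unicode property classes.
  if printf 'a\n' | grep -qP '\p{L}' 2>/dev/null; then
    # PCRE pattern from the answer: letters, combining marks, numbers.
    grep -oP '[\p{L}\p{M}\p{N}]+'
  else
    # Assumption: approximate fallback using a POSIX character class.
    grep -oE '[[:alnum:]]+'
  fi
}

printf 'study, explain and understand\n' | extract_words
```

Unlike the whitespace tokenizer, both branches drop punctuation, so "study," is printed as "study". For accented text such as the Portuguese sample, a UTF-8 locale is likely needed for either branch to treat the accented characters as letters.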



Source: https://stackoverflow.com/questions/58549527/extract-and-split-words-from-text-and-list-them-in-order-of-occurrence-using-onl
