Extract words from string with preg_match_all

前端未结

I'm not good with regex, but I want to use it to extract words from a string.

The words I need should have a minimum of 4 characters and the provided string can

7 Answers

离开以前

2020-12-21 17:48

You can use the regex below for simple strings. It will match any non-whitespace characters with min length = 4.

preg_match_all('/(\S{4,})/i', $str, $m);

Now $m[1] contains the array you want.

Update:

As Gordon said, the pattern will also match the '(20-40)'. The unwanted numbers can be removed using this regex:

preg_match_all('/(\pL{4,})/iu', $str, $m);

But I think it only works if PCRE is compiled with UTF-8 support. See "PHP PCRE (regex) doesn't support UTF-8?". It works on my computer though.
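To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the sample sentence used elsewhere on this page. The variable names are made up for illustration, and the `u` modifier is added to both patterns on the assumption that the input is UTF-8 and that PCRE has Unicode support.

```php
<?php
// Sample sentence taken from the other answers on this page.
$str = 'Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres (20-40).';

// \S{4,}: any run of 4 or more non-whitespace characters,
// so attached punctuation and digits are kept.
preg_match_all('/\S{4,}/u', $str, $loose);

// \pL{4,}: runs of 4 or more Unicode letters only.
preg_match_all('/\pL{4,}/u', $str, $strict);

$looseWords  = $loose[0];
$strictWords = $strict[0];

// Show what the looser pattern picks up that the letter-only pattern does not.
print_r(array_values(array_diff($looseWords, $strictWords)));
```

On this sample the only differences should be the tokens that carry punctuation or digits, such as 'externa,' and '(20-40).'.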

长情又很酷

2020-12-21 17:52

$string = 'Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres';
$words = explode(' ', $string);
echo $words[0];
echo $words[1];

and so on

执念已碎

2020-12-21 17:57

Try this one:

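$str = 'Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres (20-40).';
// The quantifier sits inside the group so that $matches[1] holds whole words,
// not just the last character of each match.
// Note: punctuation attached to a word (e.g. "externa,") is still matched.
preg_match_all('/([^0-9\s]{4,})/', $str, $matches);
echo '<pre>';
var_dump($matches);
echo '</pre>';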

广开言路

2020-12-21 17:59

That should do the job for you:

function extractCommonWords($string) {
    $stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');

    // Collapse runs of whitespace into a single space
    $string = preg_replace('/\s\s+/', ' ', $string);
    // Trim the string
    $string = trim($string);
    // Keep only ASCII letters and digits, plus spaces, dashes and underscores.
    // Note: this also strips accented (multi-byte) characters.
    $string = preg_replace('/[^a-zA-Z0-9 \-_]/', '', $string);
    // Make it lowercase
    $string = strtolower($string);

    preg_match_all('/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/', $string, $matchWords);
    $matchWords = $matchWords[0];

    // Drop empty items, stop words and anything shorter than 4 characters
    foreach ($matchWords as $key => $item) {
        if ($item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3) {
            unset($matchWords[$key]);
        }
    }

    // Count how often each remaining word occurs
    $wordCountArr = array();
    if (is_array($matchWords)) {
        foreach ($matchWords as $key => $val) {
            $val = strtolower($val);
            if (isset($wordCountArr[$val])) {
                $wordCountArr[$val]++;
            } else {
                $wordCountArr[$val] = 1;
            }
        }
    }

    // Return the 10 most frequent words
    arsort($wordCountArr);
    $wordCountArr = array_slice($wordCountArr, 0, 10);
    return $wordCountArr;
}

隐瞒了意图╮

2020-12-21 18:01

Explode your string on spaces (which will create an array with all words), then check whether each word is at least 4 characters long.

// The string you want to explode
$string = "Sus azahares presentan gruesos pétalos blancos teñidos de rosa o violáceo en la parte externa, con numerosos estambres.";
// Explode $string, which creates an array we will call $words
$words = explode(' ', $string);
// For each $word in $words
foreach ($words as $word) {
    // Check whether $word is at least 4 long.
    // Note: strlen() counts bytes, so accented letters in UTF-8 count as 2.
    if (strlen($word) >= 4) {
        // Echo the $word
        echo $word, "\n";
    }
}

strlen(): get string length

explode(): split a string by string
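Because strlen() counts bytes rather than characters, a short accented word such as "año" is 4 bytes long in UTF-8 even though it has only 3 letters. The sketch below is one possible variant of the split-and-filter idea that counts characters instead. The helper name wordsWithMinLength is hypothetical, and the code assumes UTF-8 input and that the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the answers above):
// return the words of $text that have at least $min characters.
function wordsWithMinLength(string $text, int $min = 4): array
{
    // Split on anything that is not a Unicode letter and drop empty pieces.
    // preg_split() returns false on error, so fall back to an empty array.
    $words = preg_split('/[^\pL]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // mb_strlen() counts characters, whereas strlen() counts bytes.
    return array_values(array_filter(
        $words,
        function (string $word) use ($min): bool {
            return mb_strlen($word, 'UTF-8') >= $min;
        }
    ));
}

$result = wordsWithMinLength('El año pasó rápido, con más días (20-40).');
print_r($result);
```

Splitting on non-letters rather than on a single space also removes attached punctuation and the "(20-40)" token in one step.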

深忆病人

2020-12-21 18:05

This works if the string is UTF-8 encoded and the words to look for (at least 4 characters long, as per the specs) consist of alphabetic characters from ISO-8859-15 (which covers Spanish, as well as English, German, French, etc.):

$n_words = preg_match_all('/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/', $str, $match_arr);
$word_arr = $match_arr[0];
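The byte ranges in that pattern are easier to follow with a concrete character. In UTF-8, "é" is the two bytes C3 A9, which the \xC3[...] alternative accepts. The gaps at \x97 and \xB7 correspond to the multiplication and division signs, which are not letters. The sketch below compares the byte-level pattern with a Unicode-aware \pL pattern on a short sample. The variable names are illustrative, and the sample is written with explicit byte escapes so that it does not depend on the encoding of the source file.

```php
<?php
// Byte-level pattern from the answer above (no u modifier: it works on raw bytes).
$bytePattern = '/([a-zA-Z]|\xC3[\x80-\x96\x98-\xB6\xB8-\xBF]|\xC5[\x92\x93\xA0\xA1\xB8\xBD\xBE]){4,}/';

// "pétalos teñidos de rosa (20-40)" with the accented letters spelled as UTF-8 bytes.
$sample = "p\xC3\xA9talos te\xC3\xB1idos de rosa (20-40)";

// Show the two bytes that make up "é".
echo bin2hex("\xC3\xA9"), "\n";

preg_match_all($bytePattern, $sample, $byteMatch);

// Unicode-aware alternative: any run of 4 or more letters.
preg_match_all('/\pL{4,}/u', $sample, $unicodeMatch);

// On this sample both approaches are expected to find the same words.
var_dump($byteMatch[0] === $unicodeMatch[0]);
```

The two patterns are not equivalent in general: \pL accepts letters from any script, while the byte-level pattern is limited to the ISO-8859-15 alphabet.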
