How do I check if a string contains a specific word?


醉酒成梦

30 answers

生来不讨喜

2020-11-21 05:20

Maybe you could use something like this:

<?php
    findWord('Test all OK');

    function findWord($text) {
        // stripos() is case-insensitive, so 'ok' also matches 'OK'
        if (stripos($text, 'ok') !== false) {
            echo 'Found a word';
        }
        else
        {
            echo 'Did not find a word';
        }
    }
?>


深忆病人

2020-11-21 05:22

Following up on the comments by SamGoody and Lego Stormtroopr.

If you are looking for a PHP algorithm that ranks search results by the proximity/relevance of multiple words, here is a quick and easy way of generating ranked search results with PHP only.

Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr():

they can't search for multiple words
their results are unranked

PHP method based on Vector Space Model and tf-idf (term frequency–inverse document frequency):

It sounds difficult but is surprisingly easy.

If we want to search for multiple words in a string, the core problem is how to assign a weight to each of them.

If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.

This is the idea of the vector space model, not far from how SQL full-text search works:

function get_corpus_index($corpus = array(), $separator=' ') {

    $dictionary = array();

    $doc_count = array();

    foreach($corpus as $doc_id => $doc) {

        $terms = explode($separator, $doc);

        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency, 
        // according to wikipedia is a numerical statistic that is intended to reflect 
        // how important a word is to a document in a corpus

        foreach($terms as $term) {

            if(!isset($dictionary[$term])) {

                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if(!isset($dictionary[$term]['postings'][$doc_id])) {

                $dictionary[$term]['document_frequency']++;

                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }

            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        //from http://phpir.com/simple-search-the-vector-space-model/

    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query='', $corpus=array(), $separator=' '){

    $similar_documents=array();

    if($query!=''&&!empty($corpus)){

        $words=explode($separator,$query);

        $corpus=get_corpus_index($corpus, $separator);

        $doc_count=count($corpus['doc_count']);

        foreach($words as $word) {

            if(isset($corpus['dictionary'][$word])){

                $entry = $corpus['dictionary'][$word];


                foreach($entry['postings'] as $doc_id => $posting) {

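                    // tf-idf style weight. Note the operator precedence: the log argument
                    // evaluates as $doc_count + (1 / document_frequency) + 1, which is the
                    // form that produced the results shown below.
                    $score=$posting['term_frequency'] * log($doc_count + (1 / $entry['document_frequency']) + 1, 2);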

                    if(isset($similar_documents[$doc_id])){

                        $similar_documents[$doc_id]+=$score;

                    }
                    else{

                        $similar_documents[$doc_id]=$score;

                    }
                }
            }
        }

        // length normalise
        foreach($similar_documents as $doc_id => $score) {

            $similar_documents[$doc_id] = $score/$corpus['doc_count'][$doc_id];

        }

        // sort from  high to low

        arsort($similar_documents);

    }   

    return $similar_documents;
}

CASE 1

$query = 'are';

$corpus = array(
    1 => 'How are you?',
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.52832083357372
)
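
As a sanity check on CASE 1 (a sketch, not part of the original answer): the single document 'How are you?' splits into three terms, 'are' occurs once, and it appears in one document out of one, so the score can be reproduced by hand. The variable names below are illustrative.

```php
<?php
// Reproduce the CASE 1 score by hand (illustrative variable names).
$doc_count = 1;          // documents in the corpus
$document_frequency = 1; // documents containing 'are'
$term_frequency = 1;     // occurrences of 'are' in document 1
$doc_length = 3;         // 'How', 'are', 'you?'

// Same expression as in get_similar_documents(), then length-normalised.
$score = $term_frequency * log($doc_count + (1 / $document_frequency) + 1, 2) / $doc_length;

echo $score; // approximately 0.5283
```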

CASE 2

$query = 'are';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

CASE 3

$query = 'we are done';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

There are plenty of improvements to be made, but the model provides a way of getting ranked results from natural-language queries, which boolean checks such as strpos(), preg_match(), strstr() or stristr() cannot provide.

NOTA BENE

Optionally, eliminate redundancy before indexing and searching the words. This reduces the index size, which results in:

lower storage requirements
less disk I/O
faster indexing and consequently faster search

1. Normalisation

Convert all text to lower case

2. Stopword elimination

Eliminate words from the text which carry no real meaning (like 'and', 'or', 'the', 'for', etc.)

3. Dictionary substitution

Replace words with others which have an identical or similar meaning (e.g. replace instances of 'hungrily' and 'hungry' with 'hunger').
Further algorithmic measures (such as Snowball stemming) may be applied to reduce words to their essential meaning.
Replacing colour names with their hexadecimal equivalents and reducing the precision of numeric values are other ways of normalising the text.
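
The first two steps above can be sketched roughly as follows. This is a minimal illustration, not part of the original answer: the helper name normalise_text and the tiny stopword list are assumptions, and the character filter only handles ASCII text.

```php
<?php
// Hypothetical helper: lower-case the text, strip punctuation and drop stopwords.
// The default stopword list is a tiny illustrative sample, not a complete one.
function normalise_text($text, $stopwords = array('and', 'or', 'the', 'for')) {
    // 1. Normalisation: lower case (mb_strtolower() would be needed for non-ASCII text)
    $text = strtolower($text);

    // Strip anything that is not an ASCII letter, digit or whitespace
    $text = preg_replace('/[^a-z0-9\s]+/', '', $text);

    // Split on runs of whitespace, skipping empty pieces
    $terms = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // 2. Stopword elimination
    $terms = array_diff($terms, $stopwords);

    // Re-join with single spaces so the result can be passed to get_corpus_index()
    return implode(' ', $terms);
}

echo normalise_text('The cat and the dog are hungry for food.');
// cat dog are hungry food
```

With this kind of preprocessing, 'are!', 'are' and 'Are' in the third document of CASE 2 would all be indexed as the same term.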

RESOURCES

http://linuxgazette.net/164/sephton.html
http://snowball.tartarus.org/
MySQL Fulltext Search Score Explained
http://dev.mysql.com/doc/internals/en/full-text-search.html
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
http://phpir.com/simple-search-the-vector-space-model/


孤城傲影

2020-11-21 05:23
The strpos function works fine, but if you want to do case-insensitive checking for a word in a paragraph then you can make use of the stripos function of PHP.

For example,
```
$result = stripos("I love PHP, I love PHP too!", "php");
if ($result === false) {
    // Word does not exist
}
else {
    // Word exists
}
```
stripos() finds the position of the first occurrence of a case-insensitive substring in a string.

If the word doesn't exist in the string, it returns false; otherwise it returns the position of the word. That position can be 0, which is why the result is compared with === false.
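
A small sketch of why the strict comparison in the example above matters (the sample string is made up for illustration): when the match is at the very start of the string, stripos() returns 0, which a loose check treats as false.

```php
<?php
$result = stripos('PHP is fun', 'php'); // match at position 0

// Loose check: 0 is falsy, so this wrongly reports "not found"
$loose = $result ? 'found' : 'not found';

// Strict check: only a real false means "not found"
$strict = ($result !== false) ? 'found' : 'not found';

echo "$loose / $strict"; // not found / found
```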
悲哀的现实

2020-11-21 05:24
A lot of the answers that use substr_count check whether the result is > 0. But since the if statement treats zero the same as false, you can skip that check and write directly:
```
if (substr_count($a, 'are')) {
```
To check if not present, add the ! operator:
```
if (!substr_count($a, 'are')) {
```
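Put together as a runnable sketch (the sample strings are illustrative). Note that substr_count() counts substrings, not whole words, so it also counts the 'are' inside 'care':

```php
<?php
$a = 'How are you?';
$b = 'Take care';
$c = 'Hello world';

var_dump((bool) substr_count($a, 'are')); // bool(true)
var_dump((bool) substr_count($b, 'are')); // bool(true): 'are' inside 'care'
var_dump(!substr_count($c, 'are'));       // bool(true): not present
```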
离开以前

2020-11-21 05:25
You could use regular expressions, as they are better for word matching than strpos, as mentioned by other users. A strpos check for are will also return true for strings such as fare, care, stare, etc. These unintended matches can be avoided in a regular expression by using word boundaries.

A simple match for are could look something like this:
```
$a = 'How are you?';

if (preg_match('/\bare\b/', $a)) {
    echo 'true';
}
```
On the performance side, strpos is about three times faster. When I ran one million comparisons, preg_match took 1.5 seconds to finish and strpos took 0.5 seconds.

Edit: In order to search any part of the string, not just word by word, I would recommend using a regular expression like:
```
$a = 'How are you?';
$search = 'are y';
if(preg_match("/{$search}/i", $a)) {
    echo 'true';
}
```
The i at the end of the regular expression makes it case-insensitive; if you do not want that, you can leave it out.

Now, this can be quite problematic in some cases, as the $search string isn't sanitized in any way. If $search is user input, it may contain characters that have a special meaning in a regular expression, so the pattern may match something other than the literal text or fail to compile. Escaping the input with preg_quote() avoids this.
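
A minimal sketch of the difference, assuming the search term should be treated as literal text (the sample strings are made up for illustration):

```php
<?php
$a = 'Is 1+1 equal to 2?';
$search = '1+1'; // '+' is a regex quantifier if left unescaped

// Unescaped: /1+1/ means "one or more 1s followed by a 1", which is not in $a
$raw = preg_match("/{$search}/i", $a);

// Escaped: preg_quote() escapes regex metacharacters and the '/' delimiter
$quoted = preg_match('/' . preg_quote($search, '/') . '/i', $a);

var_dump($raw, $quoted); // int(0), int(1)
```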

Also, here's a great tool for testing and seeing explanations of regular expressions: Regex101.

To combine both sets of functionality into a single multi-purpose function (including with selectable case sensitivity), you could use something like this:
```
function FindString($needle,$haystack,$i,$word)
{   // $i should be "" or "i" for case insensitive
    if (strtoupper($word)=="W")
    {   // if $word is "W" then word search instead of string in string search.
        if (preg_match("/\b{$needle}\b/{$i}", $haystack)) 
        {
            return true;
        }
    }
    else
    {
        if(preg_match("/{$needle}/{$i}", $haystack)) 
        {
            return true;
        }
    }
    return false;
    // Put quotes around true and false above to return them as strings instead of as bools/ints.
}
```
One more thing to keep in mind is that \b may not work as expected in languages other than English.

The explanation for this and the solution is taken from here:

\b represents the beginning or end of a word (Word Boundary). This regex would match apple in an apple pie, but wouldn’t match apple in pineapple, applecarts or bakeapples.

How about “café”? How can we extract the word “café” in regex? Actually, \bcafé\b wouldn’t work. Why? Because “café” contains non-ASCII character: é. \b can’t be simply used with Unicode such as समुद्र, 감사, месяц and
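
The quoted explanation is cut off here, so the following is not the solution from that source but a commonly used alternative, given as a hedged sketch: with the u (UTF-8) modifier, Unicode property classes such as \p{L} are available, and lookarounds built from them can stand in for \b. The helper name contains_word is hypothetical, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: whole-word match that treats any Unicode letter,
// number or underscore as a "word" character.
function contains_word($word, $text) {
    $w = preg_quote($word, '/');
    // The word must not be preceded or followed by a letter, number or underscore
    $pattern = '/(?<![\p{L}\p{N}_])' . $w . '(?![\p{L}\p{N}_])/u';
    return preg_match($pattern, $text) === 1;
}

var_dump(contains_word('café', 'un café noir'));   // bool(true)
var_dump(contains_word('café', 'la cafétéria'));   // bool(false)
var_dump(contains_word('месяц', 'прошлый месяц')); // bool(true)
```

Scripts that rely on combining marks would probably also need \p{M} in the two character classes.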
再見小時候

2020-11-21 05:26
```
if (preg_match('/(are)/', $a)) {
   echo 'true';
}
```