Programmatically extract keywords from domain names

余生分开走 2021-02-01 11:32

Let's say I have a list of domain names that I would like to analyze. Unless the domain name is hyphenated, I don't see a particularly easy way to "extract" the keywords use

7 answers
  •  日久生厌
    2021-02-01 11:42

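    function getwords( $string ) {
        // Skip internationalized (punycode) domain names.
        if( strpos( $string, 'xn--' ) !== false ) {
            return false;
        }
        $string = trim( str_replace( '-', '', $string ) );
        $length = strlen( $string );
        $pspell = pspell_new( 'en' ); // requires the pspell extension
        $words = array();
        // Check every substring of at least 4 characters against the dictionary.
        for( $j = 0; $j < ( $length - 5 ); $j++ ) {
            // Stop at the end of the string so substr() never returns a truncated duplicate.
            for( $i = 4; $i <= ( $length - $j ); $i++ ) {
                $candidate = substr( $string, $j, $i );
                if( pspell_check( $pspell, $candidate ) ) {
                    $words[] = $candidate;
                }
            }
        }
        // Return the unique matches, or false if nothing matched.
        return count( $words ) > 0 ? array_values( array_unique( $words ) ) : false;
    }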
    
    print_r( getwords( 'ilikecheesehotels' ) );
    
    Array
    (
        [0] => like
        [1] => cheese
        [2] => hotel
        [3] => hotels
    )
    

    This is a simple start using pspell. You might want to compare the results, check whether you got both the stem of a word and the same word with an "s" at the end, and merge them.
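
The merge step suggested in the answer is not shown there. A minimal sketch of one way to do it follows, assuming that "merge" means dropping a word ending in "s" whenever the same word without the "s" is also in the list. The helper name `merge_plurals` is hypothetical, and plurals formed with "es" or "ies" are not handled.

```php
<?php
// Hypothetical helper (not part of the original answer): drop a word that
// ends in "s" when the same word without the "s" is also in the list.
function merge_plurals( array $words ) {
    // Flip the list so each word can be looked up by key.
    $lookup = array_flip( $words );
    $merged = array();
    foreach( $words as $word ) {
        $isPlural = substr( $word, -1 ) === 's'
            && isset( $lookup[ substr( $word, 0, -1 ) ] );
        if( !$isPlural ) {
            $merged[] = $word;
        }
    }
    return $merged;
}

// Keeps "like", "cheese" and "hotel"; drops "hotels".
print_r( merge_plurals( array( 'like', 'cheese', 'hotel', 'hotels' ) ) );
```

A word such as "bus" is kept, because "bu" is not in the list. Whether to keep the singular or the plural form is a design choice; this sketch keeps the singular.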
