Let's say I have a list of domain names that I would like to analyze. Unless the domain name is hyphenated, I don't see a particularly easy way to "extract" the keywords use
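function getwords( $string ) {
    // Skip internationalized (punycode) domain names.
    if( strpos( $string, 'xn--' ) !== false ) {
        return false;
    }
    // Remove hyphens so the whole name is scanned as one string.
    $string = trim( str_replace( '-', '', $string ) );
    $pspell = pspell_new( 'en' );
    $words = array();
    $length = strlen( $string );
    // Only start positions with at least 6 characters remaining are tried.
    for( $j = 0; $j < ( $length - 5 ); $j++ ) {
        // Check every substring of 4 or more characters from this position.
        for( $i = 4; $i <= ( $length - $j ); $i++ ) {
            $candidate = substr( $string, $j, $i );
            if( pspell_check( $pspell, $candidate ) ) {
                $words[] = $candidate;
            }
        }
    }
    // Drop duplicates and reindex the result from 0.
    $words = array_values( array_unique( $words ) );
    if( count( $words ) > 0 ) {
        return $words;
    }
    return false;
}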
print_r( getwords( 'ilikecheesehotels' ) );
Array
(
[0] => like
[1] => cheese
[2] => hotel
[3] => hotels
)
This is a simple start using pspell. You might want to compare the results, check whether you got both the stem of a word and the same word with an "s" at the end (hotel and hotels above), and merge them.
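One way that merge step could look is sketched below. It is only a rough idea, not part of the function above: mergeplurals() is a hypothetical helper name, and the rule it uses (drop a word ending in "s" when the same word without the "s" is also in the list) is a naive assumption that will not handle plurals like "boxes" or "parties".

```php
<?php
// Hypothetical helper: merge simple plurals into their singular form.
function mergeplurals( array $words ) {
    // Flip the list so that each word can be looked up as an array key.
    $lookup = array_flip( $words );
    $merged = array();
    foreach( $words as $word ) {
        // Naive rule: skip a word ending in "s" if the same word
        // without the trailing "s" was also found.
        if( substr( $word, -1 ) === 's' && isset( $lookup[ substr( $word, 0, -1 ) ] ) ) {
            continue;
        }
        $merged[] = $word;
    }
    return $merged;
}

print_r( mergeplurals( array( 'like', 'cheese', 'hotel', 'hotels' ) ) );
```

With the four words from the example, this keeps like, cheese and hotel and drops hotels. A word such as "glass" is kept, because "glas" is not in the list.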