Programmatically extract keywords from domain names

余生分开走 2021-02-01 11:32

Let's say I have a list of domain names that I would like to analyze. Unless the domain name is hyphenated, I don't see a particularly easy way to "extract" the keywords use

7 answers
  •  孤街浪徒
    2021-02-01 11:38

    Ok, I ran the script I wrote for this SO question, with a couple of minor changes -- using log probabilities to avoid underflow, and modifying it to read multiple files as the corpus.

    For my corpus I downloaded a bunch of files from Project Gutenberg -- no real method to this, I just grabbed all the English-language files from etext00, etext01, and etext02.
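
For anyone who wants to try something similar, here is a minimal sketch of the corpus step, not the original script. It assumes a plain unigram model built from word counts over several text files; the helper names `count_words`, `log_probabilities`, and `score` are made up for illustration. Scores are sums of log probabilities because multiplying many small probabilities can underflow to 0.0 in floating point.

```python
import math
import re
from collections import Counter
from pathlib import Path


def count_words(paths):
    """Count lowercase alphabetic tokens across every file in `paths`."""
    counts = Counter()
    for path in paths:
        # Ignore undecodable bytes; old etext files are not always clean UTF-8.
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def log_probabilities(counts):
    """Map each word to log(count / total), i.e. its unigram log probability."""
    total = sum(counts.values())
    return {word: math.log(n / total) for word, n in counts.items()}


def score(words, logp):
    """Sum the per-word log probabilities; -inf if any word was never seen."""
    return sum(logp.get(word, float("-inf")) for word in words)
```

With a model like this, `score(["experts", "exchange"], logp)` returns a negative number, and values closer to zero mean a more likely split under the corpus.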

    Below are the results. I saved the top three segmentations for each domain name, each followed by its log-probability score (closer to zero is better).

    expertsexchange: 97 possibilities
     -  experts exchange -23.71
     -  expert sex change -31.46
     -  experts ex change -33.86
    
    penisland: 11 possibilities
     -  pen island -20.54
     -  penis land -22.64
     -  pen is land -25.06
    
    choosespain: 28 possibilities
     -  choose spain -21.17
     -  chooses pain -23.06
     -  choose spa in -29.41
    
    kidsexpress: 15 possibilities
     -  kids express -23.56
     -  kid sex press -32.65
     -  kids ex press -34.98
    
    childrenswear: 34 possibilities
     -  children swear -19.85
     -  childrens wear -25.26
     -  child ren swear -32.70
    
    dicksonweb: 8 possibilities
     -  dickson web -27.09
     -  dick son web -30.51
     -  dicks on web -33.63
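
A rough sketch of how rankings like the ones above could be produced, assuming a simple approach: enumerate every split of the name into known words, score each split by summing unigram log probabilities, and keep the best three. This is not the original script. The word counts below are invented for the example, and `segmentations` and `top_segmentations` are hypothetical names.

```python
import math


def segmentations(text, vocabulary):
    """Yield every way to split `text` into words found in `vocabulary`."""
    if not text:
        yield []
        return
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocabulary:
            for rest in segmentations(text[end:], vocabulary):
                yield [head] + rest


def top_segmentations(text, logp, k=3):
    """Return (number of candidate splits, the k best as (phrase, score))."""
    def total_logp(words):
        return sum(logp[word] for word in words)

    candidates = list(segmentations(text, logp))
    ranked = sorted(candidates, key=total_logp, reverse=True)
    return len(candidates), [(" ".join(ws), total_logp(ws)) for ws in ranked[:k]]


# Toy counts, invented for illustration only (not taken from any real corpus).
counts = {"choose": 40, "spain": 10, "chooses": 5, "pain": 30, "spa": 2, "in": 900}
total = sum(counts.values())
logp = {word: math.log(n / total) for word, n in counts.items()}

possibilities, top = top_segmentations("choosespain", logp)
print(f"choosespain: {possibilities} possibilities")
for phrase, value in top:
    print(f" -  {phrase} {value:.2f}")
```

Enumerating every split is fine for short strings like domain names, but the number of candidates can grow quickly with a large vocabulary. If only the single best split is needed, a dynamic-programming search over prefixes would avoid building the full list.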
    
