_Actual_ Twitter format for hashtags? Not your regex, not his code: the actual one?

野趣味 2021-02-07 14:27

Update: Use Twitter's Entities if you can; they have already worked this out for you, along with other items. In my case I only have the tweet text, without entities and all the extra metadata

6 answers
  •  醉梦人生
    2021-02-07 14:48

    Going by Twitter's support documentation, the basic rules seem to be that a hashtag must be preceded by a space and ends at any whitespace or punctuation.


    Quote from Twitter's support:

    Check your hashtags for the following:

    • Is there any symbol in or after the hashtag?
      • If you write #noican't, your message will be categorized under #noican. Punctuation marks ( , . ; ' ? ! etc.) will end your hashtag wherever punctuation occurs.
    • Is there any letter preceding the #symbol?
      • If you write 23#idoittoo or word#idoittoo, your Tweets will not show in searches for the hashtag #idoittoo. Hashtags will not work with letters or numbers in front of the # symbol. The # symbol must have a space directly in front of it in order for it to show correctly in searches.

    Therefore, the initial token is # preceded by a space, and the terminator is any whitespace or punctuation. The "etc" in their list of punctuation (" , . ; ' ? ! etc.") is annoying, but I'll keep digging and see if I can find something authoritative on what else counts as punctuation.
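
    As a rough illustration of just these two support-article rules, here is a minimal Python sketch. It is not Twitter's implementation: the names BASIC_HASHTAG and basic_hashtags are made up for this example, and since the "etc." is never spelled out, "punctuation" is assumed to mean any character that is not a letter, digit, or underscore.

```python
import re
from typing import List

# Assumption: a hashtag starts at "#" only at the beginning of the text or
# directly after whitespace, and its body runs until the first character
# that is not a letter, digit, or underscore (Python's Unicode-aware \w).
BASIC_HASHTAG = re.compile(r"(?:^|(?<=\s))#(\w+)")


def basic_hashtags(tweet: str) -> List[str]:
    """Return hashtag bodies (without the '#') under the basic rules only."""
    return BASIC_HASHTAG.findall(tweet)


if __name__ == "__main__":
    # The apostrophe ends the hashtag, as in the support article's example.
    print(basic_hashtags("#noican't"))
    # A letter or digit directly before '#' prevents a match.
    print(basic_hashtags("23#idoittoo or word#idoittoo"))
```

    With this pattern, "#noican't" yields the tag "noican", and neither 23#idoittoo nor word#idoittoo is picked up, which matches the two examples in the quote.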

    After digging around a while, I found some interesting blog articles by Terence Eden (Hashtags and Implicit Knowledge, Hashtag Standards) that provide evidence that Twitter doesn't even have a standard, given that the software it develops on different platforms seems to have different rules of what constitutes a hashtag.

    Those articles also link to the Twitter Conformance Library, which contains twitter / twitter-text-conformance / autolink.yml. The hashtag section in autolink.yml has many cases that match the above rules, but also some that violate them and are still supposed to be autolinked. Some examples:

    - description: "DO NOT Autolink all-numeric hashtags"
      text: "text #1234"
      expected: "text #1234"
    
    - description: "Autolink hashtag preceded by a period"
      text: "text.#hashtag"
      expected: "text.#hashtag"
    
    - description: "Autolink hashtag with full-width hash (U+FF03)"
      text: "＃hashtag"
      expected: "＃hashtag"
    

    Those are just a few examples that don't match the basic rules given in the first support article, and unfortunately the yml is full of other examples as well.
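
    To make the mismatch concrete, here is a hedged Python sketch that is adjusted to cover only the three conformance cases quoted above. It is an assumption-laden approximation, not the logic of twitter-text, and it will not pass the rest of autolink.yml. The names CONFORMANCE_HASHTAG and conformance_hashtags are hypothetical.

```python
import re
from typing import List

# Assumptions drawn only from the three quoted cases:
#   * the hash sign may be ASCII "#" or full-width U+FF03;
#   * it may follow punctuation such as ".", but not a letter, digit,
#     or underscore;
#   * an all-numeric body such as "1234" is not a hashtag.
CONFORMANCE_HASHTAG = re.compile(r"(?<!\w)[#\uFF03](?!\d+\b)(\w+)")


def conformance_hashtags(tweet: str) -> List[str]:
    """Return hashtag bodies (without the hash sign) under these assumptions."""
    return CONFORMANCE_HASHTAG.findall(tweet)


if __name__ == "__main__":
    for sample in ("text #1234", "text.#hashtag", "\uFF03hashtag"):
        print(repr(sample), conformance_hashtags(sample))
```

    The negative lookbehind replaces the "must be preceded by a space" rule, and the negative lookahead rejects bodies made up of digits only; both choices are inferred from the examples, not taken from any published specification.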
