Going by Twitter's support article, the basic rules seem to be that a hashtag must be preceded by a space and ends at any whitespace or punctuation.
Quote from Twitter's support:
Check your hashtags for the following:
Therefore, the initial token is # preceded by a space, and the terminator is any whitespace or punctuation. The "etc." in their list of punctuation (" , . ; ' ? ! etc.") is annoying, but I'll keep digging to see if I can find something authoritative on what else counts as punctuation.
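As a rough sketch of that basic rule in Python (my own reading of the support text, not Twitter's code): the pattern below accepts # at the start of the text or after whitespace, and uses \w+ as a stand-in for "everything up to whitespace or punctuation". That stand-in is an assumption, since the support article never pins down the full punctuation list, and the name naive_hashtags is just for illustration.

```python
import re

# Naive rule from the support article: '#' at the start of the text or
# right after whitespace, followed by characters up to the first
# whitespace or punctuation. Using \w+ for the body is an assumption;
# it treats letters, digits and underscore as hashtag characters.
NAIVE_HASHTAG = re.compile(r"(?:^|(?<=\s))#(\w+)")


def naive_hashtags(text):
    """Return hashtag bodies (without the '#') under the naive rule."""
    return NAIVE_HASHTAG.findall(text)


if __name__ == "__main__":
    print(naive_hashtags("hello #world, and #python!"))
```

With this, "hello #world, and #python!" yields world and python, while "text.#hashtag" yields nothing because the # follows a period rather than a space.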
After digging around for a while, I found some interesting blog articles by Terence Eden ("Hashtags and Implicit Knowledge" and "Hashtag Standards") that provide evidence that Twitter doesn't even have a standard, given that the software it develops for different platforms seems to have different rules for what constitutes a hashtag.
They also provided a link to the Twitter Conformance Library, which has twitter / twitter-text-conformance / autolink.yml. The hashtag section in autolink.yml has many cases matching the above rules, but also some that violate them and are still supposed to be autolinked (or not). Some examples:
- description: "DO NOT Autolink all-numeric hashtags"
text: "text #1234"
expected: "text #1234"
- description: "Autolink hashtag preceded by a period"
text: "text.#hashtag"
expected: "text.#hashtag"
- description: "Autolink hashtag with full-width hash (U+FF03)"
text: "＃hashtag"
expected: "＃hashtag"
Those are just a few examples that don't match the basic rules given in the first support article, and unfortunately the YAML file is full of other examples as well.
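To show how far the basic rule has to bend, here is a hedged Python sketch that covers only the three conformance cases quoted above. It is not the twitter-text implementation. The rule "the # must not directly follow a word character or &" is my assumption for why "text.#hashtag" is linked, and extract_hashtags is a made-up name.

```python
import re

# Accept '#' or the full-width '＃' (U+FF03). The negative lookbehind
# (no word character or '&' right before the hash) is an assumption
# chosen so that "text.#hashtag" matches; it is not taken from Twitter.
HASHTAG = re.compile(r"(?<![\w&])[#\uFF03](\w+)")


def extract_hashtags(text):
    """Return hashtag bodies, skipping all-numeric ones such as #1234."""
    return [
        m.group(1)
        for m in HASHTAG.finditer(text)
        if not m.group(1).isdigit()
    ]


if __name__ == "__main__":
    for sample in ("text #1234", "text.#hashtag", "\uFF03hashtag"):
        print(repr(sample), extract_hashtags(sample))
```

This returns nothing for "text #1234" and returns hashtag for both "text.#hashtag" and the full-width variant. It will still disagree with autolink.yml on plenty of the other cases, which is rather the point.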