I\'m really stumped on how Twitter expects users of its API to convert the plaintext tweets it sends to properly linked HTML.
Here\'s the deal: Twitter\'s JSON API sends
Ok so I needed to do exactly this and I solved it. Here is the function I wrote. https://gist.github.com/3337428
function parse_message( &$tweet ) {
if ( !empty($tweet['entities']) ) {
$replace_index = array();
$append = array();
$text = $tweet['text'];
foreach ($tweet['entities'] as $area => $items) {
$prefix = false;
$display = false;
switch ( $area ) {
case 'hashtags':
$find = 'text';
$prefix = '#';
$url = 'https://twitter.com/search/?src=hash&q=%23';
break;
case 'user_mentions':
$find = 'screen_name';
$prefix = '@';
$url = 'https://twitter.com/';
break;
case 'media':
$display = 'media_url_https';
$href = 'media_url_https';
$size = 'small';
break;
case 'urls':
$find = 'url';
$display = 'display_url';
$url = "expanded_url";
break;
default: break;
}
foreach ($items as $item) {
if ( $area == 'media' ) {
// We can display images at the end of the tweet but sizing needs to added all the way to the top.
// $append[$item->$display] = "<img src=\"{$item->$href}:$size\" />";
}else{
$msg = $display ? $prefix.$item->$display : $prefix.$item->$find;
$replace = $prefix.$item->$find;
$href = isset($item->$url) ? $item->$url : $url;
if (!(strpos($href, 'http') === 0)) $href = "http://".$href;
if ( $prefix ) $href .= $item->$find;
$with = "<a href=\"$href\">$msg</a>";
$replace_index[$replace] = $with;
}
}
}
foreach ($replace_index as $replace => $with) $tweet['text'] = str_replace($replace,$with,$tweet['text']);
foreach ($append as $add) $tweet['text'] .= $add;
}
}
Jeff's solution worked well with English text but it got broken when the tweet contained non-ASCII characters. This solution avoids that problem:
mb_internal_encoding("UTF-8");
// Return hyperlinked tweet text from json_decoded status object:
function MakeStatusLinks($status)
{$TextLength=mb_strlen($status['text']); // Number of UTF-8 characters in plain tweet.
for ($i=0;$i<$TextLength;$i++)
{$ch=mb_substr($status['text'],$i,1); if ($ch<>"\n") $ChAr[]=$ch; else $ChAr[]="\n<br/>"; // Keep new lines in HTML tweet.
}
if (isset($status['entities']['user_mentions']))
foreach ($status['entities']['user_mentions'] as $entity)
{$ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/".$entity['screen_name']."'>".$ChAr[$entity['indices'][0]];
$ChAr[$entity['indices'][1]-1].="</a>";
}
if (isset($status['entities']['hashtags']))
foreach ($status['entities']['hashtags'] as $entity)
{$ChAr[$entity['indices'][0]] = "<a href='https://twitter.com/search?q=%23".$entity['text']."'>".$ChAr[$entity['indices'][0]];
$ChAr[$entity['indices'][1]-1] .= "</a>";
}
if (isset($status['entities']['urls']))
foreach ($status['entities']['urls'] as $entity)
{$ChAr[$entity['indices'][0]] = "<a href='".$entity['expanded_url']."'>".$entity['display_url']."</a>";
for ($i=$entity['indices'][0]+1;$i<$entity['indices'][1];$i++) $ChAr[$i]='';
}
if (isset($status['entities']['media']))
foreach ($status['entities']['media'] as $entity)
{$ChAr[$entity['indices'][0]] = "<a href='".$entity['expanded_url']."'>".$entity['display_url']."</a>";
for ($i=$entity['indices'][0]+1;$i<$entity['indices'][1];$i++) $ChAr[$i]='';
}
return implode('', $ChAr); // HTML tweet.
}
Here is an updated answer that works with Twitter's new Extended Mode. It combines the answer by @vita10gy and the comment by @Hugo (to make it utf8 compatible), with a few minor tweaks to work with the new api values.
function utf8_substr_replace($original, $replacement, $position, $length) {
$startString = mb_substr($original, 0, $position, "UTF-8");
$endString = mb_substr($original, $position + $length, mb_strlen($original), "UTF-8");
$out = $startString . $replacement . $endString;
return $out;
}
function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true) {
// Media urls can show up on the end of the full_text tweet, but twitter doesn't index that url.
// The display_text_range indexes show the actual tweet text length.
// Cut the string off at the end to get rid of this unindexed url.
$return = mb_substr($tweet->full_text, $tweet->display_text_range[0],$tweet->display_text_range[1]);
$entities = array();
if($links && is_array($tweet->entities->urls))
{
foreach($tweet->entities->urls as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = " <a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
$entities[] = $temp;
}
}
if($users && is_array($tweet->entities->user_mentions))
{
foreach($tweet->entities->user_mentions as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = " <a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
$entities[] = $temp;
}
}
if($hashtags && is_array($tweet->entities->hashtags))
{
foreach($tweet->entities->hashtags as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = " <a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
$entities[] = $temp;
}
}
usort($entities, function($a,$b){return($b["start"]-$a["start"]);});
foreach($entities as $item)
{
$return = utf8_substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
}
return($return);
}
All you have to do to use the indices twitter provides straight up with a simple replace is collect the replacements you want to make and then sort them backwards. You can probably find a more clever way to build $entities, I wanted them optional anyway, so I KISS as far as that went.
Either way, my point here was just to show that you don't need to explode the string and character count and whatnot. Regardless of how you do it, all you need to to is start at the end and work to the beginning of the string, and the index twitter has is still valid.
<?php
function json_tweet_text_to_HTML($tweet, $links=true, $users=true, $hashtags=true)
{
$return = $tweet->text;
$entities = array();
if($links && is_array($tweet->entities->urls))
{
foreach($tweet->entities->urls as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = "<a href='".$e->expanded_url."' target='_blank'>".$e->display_url."</a>";
$entities[] = $temp;
}
}
if($users && is_array($tweet->entities->user_mentions))
{
foreach($tweet->entities->user_mentions as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = "<a href='https://twitter.com/".$e->screen_name."' target='_blank'>@".$e->screen_name."</a>";
$entities[] = $temp;
}
}
if($hashtags && is_array($tweet->entities->hashtags))
{
foreach($tweet->entities->hashtags as $e)
{
$temp["start"] = $e->indices[0];
$temp["end"] = $e->indices[1];
$temp["replacement"] = "<a href='https://twitter.com/hashtag/".$e->text."?src=hash' target='_blank'>#".$e->text."</a>";
$entities[] = $temp;
}
}
usort($entities, function($a,$b){return($b["start"]-$a["start"]);});
foreach($entities as $item)
{
$return = substr_replace($return, $item["replacement"], $item["start"], $item["end"] - $item["start"]);
}
return($return);
}
?>
Here is a JavaScript version (using jQuery) of vita10gy's solution
function tweetTextToHtml(tweet, links, users, hashtags) {
if (typeof(links)==='undefined') { links = true; }
if (typeof(users)==='undefined') { users = true; }
if (typeof(hashtags)==='undefined') { hashtags = true; }
var returnStr = tweet.text;
var entitiesArray = [];
if(links && tweet.entities.urls.length > 0) {
jQuery.each(tweet.entities.urls, function() {
var temp1 = {};
temp1.start = this.indices[0];
temp1.end = this.indices[1];
temp1.replacement = '<a href="' + this.expanded_url + '" target="_blank">' + this.display_url + '</a>';
entitiesArray.push(temp1);
});
}
if(users && tweet.entities.user_mentions.length > 0) {
jQuery.each(tweet.entities.user_mentions, function() {
var temp2 = {};
temp2.start = this.indices[0];
temp2.end = this.indices[1];
temp2.replacement = '<a href="https://twitter.com/' + this.screen_name + '" target="_blank">@' + this.screen_name + '</a>';
entitiesArray.push(temp2);
});
}
if(hashtags && tweet.entities.hashtags.length > 0) {
jQuery.each(tweet.entities.hashtags, function() {
var temp3 = {};
temp3.start = this.indices[0];
temp3.end = this.indices[1];
temp3.replacement = '<a href="https://twitter.com/hashtag/' + this.text + '?src=hash" target="_blank">#' + this.text + '</a>';
entitiesArray.push(temp3);
});
}
entitiesArray.sort(function(a, b) {return b.start - a.start;});
jQuery.each(entitiesArray, function() {
returnStr = substrReplace(returnStr, this.replacement, this.start, this.end - this.start);
});
return returnStr;
}
You can then use this function like so ...
for(var i in tweetsJsonObj) {
var tweet = tweetsJsonObj[i];
var htmlTweetText = tweetTextToHtml(tweet);
// Do something with the formatted tweet here ...
}
It's an edge case but the use of str_replace() in Styledev's answer could cause issues if one entity is contained within another. For example, "I'm a genius! #me #mensa" could become "I'm a genius! #me #mensa" if the shorter entity is substituted first.
This solution avoids that problem:
<?php
/**
* Hyperlinks hashtags, twitter names, and urls within the text of a tweet
*
* @param object $apiResponseTweetObject A json_decoded() one of these: https://dev.twitter.com/docs/platform-objects/tweets
* @return string The tweet's text with hyperlinks added
*/
function linkEntitiesWithinText($apiResponseTweetObject) {
// Convert tweet text to array of one-character strings
// $characters = str_split($apiResponseTweetObject->text);
$characters = preg_split('//u', $apiResponseTweetObject->text, null, PREG_SPLIT_NO_EMPTY);
// Insert starting and closing link tags at indices...
// ... for @user_mentions
foreach ($apiResponseTweetObject->entities->user_mentions as $entity) {
$link = "https://twitter.com/" . $entity->screen_name;
$characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
$characters[$entity->indices[1] - 1] .= "</a>";
}
// ... for #hashtags
foreach ($apiResponseTweetObject->entities->hashtags as $entity) {
$link = "https://twitter.com/search?q=%23" . $entity->text;
$characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
$characters[$entity->indices[1] - 1] .= "</a>";
}
// ... for http://urls
foreach ($apiResponseTweetObject->entities->urls as $entity) {
$link = $entity->expanded_url;
$characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
$characters[$entity->indices[1] - 1] .= "</a>";
}
// ... for media
foreach ($apiResponseTweetObject->entities->media as $entity) {
$link = $entity->expanded_url;
$characters[$entity->indices[0]] = "<a href=\"$link\">" . $characters[$entity->indices[0]];
$characters[$entity->indices[1] - 1] .= "</a>";
}
// Convert array back to string
return implode('', $characters);
}
?>