I am wondering whether it is possible to remove duplicate words from a single field using a MySQL query, or whether a problem like this would be better solved in PHP.
I have a databa
The model you describe (all tags in a single cell, separated by spaces) is not normalized, so you can't expect to find a simple, performant and reliable way to work with it on the database server (beyond reading the column). As it stands, PHP is your only real option for the cleanup you are planning, and you'll have to retrieve every row.
Is it too late to make a small change to the database design? If you store each tag in a separate row of a tag table, you'll be able to do lots of things in plain SQL.
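If you do migrate, the existing strings only have to be split once. A minimal sketch, assuming a hypothetical tag table with `(item_id, tag)` columns: `tagRows()` is an illustrative helper (not from the original thread) that turns one item's space-separated string into one row per unique tag.

```php
<?php
// Hypothetical helper: turns one item's space-separated tag string into
// (item_id, tag) pairs for a normalized tag table, one pair per unique tag.
function tagRows(int $itemId, string $tags): array
{
    $rows = [];
    $seen = [];  // tag => true, so each tag is emitted only once
    // PREG_SPLIT_NO_EMPTY skips empty pieces from doubled or edge spaces
    foreach (preg_split('/\s+/', $tags, -1, PREG_SPLIT_NO_EMPTY) as $tag) {
        if (!isset($seen[$tag])) {
            $seen[$tag] = true;
            $rows[] = [$itemId, $tag];
        }
    }
    return $rows;
}

// Each pair would become one INSERT into the (hypothetical) tag table.
foreach (tagRows(7, 'php mysql php sql mysql') as [$id, $tag]) {
    echo "$id\t$tag\n";
}
```

Once the tags live in their own table, a UNIQUE key on `(item_id, tag)` would stop duplicates from coming back, and questions like "which tags exist" become a plain `SELECT DISTINCT tag`.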
You may consider keeping one entry per tag instead of all tags in one string, so that you could run a SELECT DISTINCT, among other things.
Change your database design. I don't know your time constraints, so it may not be an option, but consider which of these two paths you'd rather go down:
Let Sentence = the string of words.
Split Sentence on every space and build an array from it*. Store this as Words.
Let UniqueWords = an empty array (it will hold the words with no duplicates).
For each Word in Words:
    If Word is not already in UniqueWords, append it.
* à la PHP's explode()
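For reference, the steps above translate almost line for line into PHP. This is a sketch; `uniqueWords()` is an illustrative name, and the variables mirror the pseudocode.

```php
<?php
// Direct translation of the steps above: explode, then copy each word into
// a second array only if it isn't there yet (first occurrence wins).
function uniqueWords(string $sentence): array
{
    $words = explode(' ', $sentence);  // Words
    $uniqueWords = [];                 // UniqueWords, starts empty
    foreach ($words as $word) {
        // strict comparison so e.g. "01" and "1" stay distinct words
        if (!in_array($word, $uniqueWords, true)) {
            $uniqueWords[] = $word;
        }
    }
    return $uniqueWords;
}

echo implode(' ', uniqueWords('red green red blue green')), "\n"; // red green blue
```

`in_array()` scans the whole array each time, so this is quadratic in the number of words; for a handful of tags per row that shouldn't matter.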
You could also process it as a raw string (stopping to check at spaces or EOL), which may be faster, but if speed is important, your current database design should be far more concerning than this loop.
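A sketch of the raw-string variant in PHP: scan byte by byte, cut a word at each space or at the end of the string, and keep a lookup array of words already seen. `dedupeRaw()` is an illustrative name. In PHP this probably won't beat `explode()` plus `array_unique()`, since those are compiled built-ins, so treat it as an illustration of the idea rather than an optimization.

```php
<?php
// Single pass over the raw string: collect characters until a space or the
// end of the string, then keep the word only if it hasn't been seen yet.
function dedupeRaw(string $s): string
{
    $out = [];
    $seen = [];   // word => true, for constant-time lookups
    $word = '';
    $len = strlen($s);
    for ($i = 0; $i <= $len; $i++) {
        // $i === $len acts as the end-of-string boundary
        if ($i === $len || $s[$i] === ' ') {
            if ($word !== '' && !isset($seen[$word])) {
                $seen[$word] = true;
                $out[] = $word;
            }
            $word = '';
        } else {
            $word .= $s[$i];
        }
    }
    return implode(' ', $out);
}

echo dedupeRaw('a b  a c b'), "\n"; // a b c
```

Scanning bytes is safe for UTF-8 text here because the space byte never occurs inside a multibyte character.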
EDIT: I didn't see that you wanted it in a SQL query. I'm not sure it'd be possible using a query; perhaps a stored procedure will do. I don't know how to use those though.
Here is another version: generate a large enough set of numbered rows (here 1 to 100) that you can join one row per word position, extract the word at each position, and then let GROUP_CONCAT with the DISTINCT keyword concatenate the separate words again.
A primary or unique key to group by would be better, in case of identical rows.
SELECT t.col,
       -- the word at position x.cifre, deduplicated and re-joined with spaces (default separator is a comma)
       GROUP_CONCAT(DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ' ', x.cifre), ' ', -1) SEPARATOR ' ') AS words
FROM t
INNER JOIN
(
  -- numbers 1..100: one row per possible word position
  SELECT 1 + a.i + b.i * 10 AS cifre
  FROM (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
  CROSS JOIN (SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
) x
  -- keep only positions up to the number of words in t.col
  ON (LENGTH(t.col) + 1 - LENGTH(REPLACE(t.col, ' ', ''))) >= x.cifre
GROUP BY t.col
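To see what the nested SUBSTRING_INDEX call and the ON condition compute, here is a PHP emulation for a single-character delimiter. `substringIndex()`, `nthWord()` and `wordCount()` are hypothetical helpers written to mirror MySQL's SUBSTRING_INDEX semantics; they are not built-ins, and this is a sketch for understanding the query, not a replacement for it.

```php
<?php
// Mirrors MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delimiter.
function substringIndex(string $str, string $delim, int $count): string
{
    $parts = explode($delim, $str);
    if ($count > 0) {
        // everything left of the $count-th delimiter (whole string if fewer)
        return implode($delim, array_slice($parts, 0, $count));
    }
    if ($count < 0) {
        // everything right of the |$count|-th delimiter, counting from the end
        return implode($delim, array_slice($parts, $count));
    }
    return '';
}

// nth word = last piece of the first n pieces, as in the query
function nthWord(string $col, int $n): string
{
    return substringIndex(substringIndex($col, ' ', $n), ' ', -1);
}

// LENGTH(col) + 1 - LENGTH(REPLACE(col, ' ', '')) = number of spaces + 1
function wordCount(string $col): int
{
    return strlen($col) + 1 - strlen(str_replace(' ', '', $col));
}

$col = 'x y x z';
$words = [];
for ($n = 1; $n <= wordCount($col); $n++) {  // the ON condition
    $words[] = nthWord($col, $n);
}
echo implode(' ', array_unique($words)), "\n"; // x y z
```

One consequence the emulation makes visible: the word count is really "spaces + 1", so consecutive spaces in the column would produce an empty "word", in the SQL version as well.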
IMO, you're best off handling this in PHP:
$uniqueTags = array_unique(explode(' ', $tagsFromDbColumn));
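If the cleaned value has to go back into the same column, the one-liner can be wrapped into a round trip. A sketch; `dedupeTags()` is an illustrative name, and the filtering step is an addition to handle doubled or leading/trailing spaces.

```php
<?php
// Round trip: split the column on spaces, drop empty strings left by doubled
// or leading/trailing spaces, remove duplicates, and join with single spaces.
function dedupeTags(string $tagsFromDbColumn): string
{
    // explicit callback: the default array_filter() would also drop a tag "0"
    $tags = array_filter(
        explode(' ', $tagsFromDbColumn),
        static fn (string $t): bool => $t !== ''
    );
    return implode(' ', array_unique($tags));
}

echo dedupeTags(' php  mysql php '), "\n"; // php mysql
```

`array_unique()` keeps the first occurrence of each value, so the original tag order survives. The result would then be written back with an UPDATE per row.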
+1 for the redesign, but if a redesign is not an option right now...
How many distinct tags are there? You might be able to do this using CASE and substring functions.
http://dev.mysql.com/doc/refman/5.0/en/case-statement.html