I can use the MySQL TRIM() function to clean up fields containing leading or trailing whitespace with an UPDATE like so:
UPDATE Foo SE
Here is an example using a regular expression:
SELECT *
FROM
`foo`
WHERE
(name REGEXP '(^[[:space:]]|[[:space:]]$)')
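To see how that pattern behaves before pointing it at a real table, it can be evaluated against a few literals. This is only a minimal sketch: the sample strings and column aliases are made up for illustration, and it assumes backslash escapes are enabled (that is, NO_BACKSLASH_ESCAPES is not set).

```sql
-- Evaluate the whitespace pattern against made-up sample strings.
-- Each column is 1 when the string starts or ends with whitespace, otherwise 0.
SELECT
  'x'   REGEXP '(^[[:space:]]|[[:space:]]$)' AS no_whitespace,  -- expected 0
  ' x'  REGEXP '(^[[:space:]]|[[:space:]]$)' AS leading_space,  -- expected 1
  'x\t' REGEXP '(^[[:space:]]|[[:space:]]$)' AS trailing_tab,   -- expected 1
  'x y' REGEXP '(^[[:space:]]|[[:space:]]$)' AS inner_space;    -- expected 0
```

Unlike a plain space comparison, [[:space:]] also matches tabs and line breaks, which is why the trailing tab is flagged.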
Another solution could be to use SUBSTRING() and IN to compare the first and last characters of the string against a list of whitespace characters...
(SUBSTRING(@s, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(@s, -1, 1) IN (' ', '\t', '\n', '\r'))
...where @s is any input string. Add more whitespace characters to the comparison list as your case requires.
Here's a simple test to demonstrate how that expression behaves with various string inputs:
SET @s_normal = 'x';
SET @s_ws_leading = '\tx';
SET @s_ws_trailing = 'x ';
SET @s_ws_both = '\rx ';
SELECT
NOT(SUBSTRING(@s_normal, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(@s_normal, -1, 1) IN (' ', '\t', '\n', '\r')) test_normal #=> 1 (PASS)
, (SUBSTRING(@s_ws_leading, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(@s_ws_leading, -1, 1) IN (' ', '\t', '\n', '\r')) test_ws_leading #=> 1 (PASS)
, (SUBSTRING(@s_ws_trailing, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(@s_ws_trailing, -1, 1) IN (' ', '\t', '\n', '\r')) test_ws_trailing #=> 1 (PASS)
, (SUBSTRING(@s_ws_both, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(@s_ws_both, -1, 1) IN (' ', '\t', '\n', '\r')) test_ws_both #=> 1 (PASS)
;
If this is something you'll be doing a lot, you could also create a function for it:
DROP FUNCTION IF EXISTS has_leading_or_trailing_whitespace;
CREATE FUNCTION has_leading_or_trailing_whitespace(s VARCHAR(2000))
RETURNS BOOLEAN
DETERMINISTIC
RETURN (SUBSTRING(s, 1, 1) IN (' ', '\t', '\n', '\r') OR SUBSTRING(s, -1, 1) IN (' ', '\t', '\n', '\r'))
;
# test
SELECT
NOT(has_leading_or_trailing_whitespace(@s_normal )) #=> 1 (PASS)
, has_leading_or_trailing_whitespace(@s_ws_leading ) #=> 1 (PASS)
, has_leading_or_trailing_whitespace(@s_ws_trailing) #=> 1 (PASS)
, has_leading_or_trailing_whitespace(@s_ws_both ) #=> 1 (PASS)
;
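Once the function exists, it can be used to preview the rows a cleanup would touch. A minimal sketch, assuming the table `foo` and column `name` used elsewhere in this thread; substitute your own names.

```sql
-- Preview rows whose value starts or ends with one of the listed whitespace characters.
-- Assumes has_leading_or_trailing_whitespace() has been created as shown above
-- and that `foo`.`name` is the column being checked (names taken from this thread).
SELECT *
FROM `foo`
WHERE has_leading_or_trailing_whitespace(`name`);
```

Because the filter goes through a stored function, MySQL cannot use an index on `name` for it, so expect a full scan on large tables.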
SELECT *
FROM
`foo`
WHERE
(name LIKE ' %')
OR
(name LIKE '% ')
As documented under The CHAR and VARCHAR Types:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces.
In the definition of the LIKE operator, the manual states:
In particular, trailing spaces are significant, which is not true for CHAR or VARCHAR comparisons performed with the = operator.
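The difference between the two operators can be checked directly with literals. This is a minimal sketch that assumes the connection uses a PAD SPACE collation; the column aliases are made up for illustration.

```sql
-- Compare the same pair of strings with = and with LIKE.
-- Under a PAD SPACE collation, = ignores the trailing space but LIKE does not.
SELECT
  'a' = 'a '    AS equals_result,  -- 1 under a PAD SPACE collation
  'a' LIKE 'a ' AS like_result;    -- 0
```

On MySQL 8.0 the default collation utf8mb4_0900_ai_ci is NO PAD, so there the = comparison is expected to return 0 as well.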
As mentioned in this answer:
This behavior is specified in SQL-92 and SQL:2008. For the purposes of comparison, the shorter string is padded to the length of the longer string.
From the draft (8.2 <comparison predicate>):
If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.
One solution:
SELECT * FROM Foo WHERE CHAR_LENGTH(field) != CHAR_LENGTH(TRIM(field))
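The same condition can be extended to show what the UPDATE would write for each affected row. A minimal sketch, using the `Foo` and `field` names from the question; the `trimmed` and `chars_removed` aliases are made up for illustration.

```sql
-- Preview the affected rows alongside the value TRIM() would produce.
-- `Foo` and `field` come from the question; the aliases are illustrative only.
SELECT
  field,
  TRIM(field) AS trimmed,
  CHAR_LENGTH(field) - CHAR_LENGTH(TRIM(field)) AS chars_removed
FROM Foo
WHERE CHAR_LENGTH(field) != CHAR_LENGTH(TRIM(field));
```

Note that TRIM() with no extra arguments strips only space characters, so values padded with tabs or line breaks are not matched by this check.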