问题
I am using sparq sql. Let's say this is a snapshot of my big table:
ups store
ups store austin
ups store chicago
ups store bern
walmart
target
How can I find the longest prefix for the above data in sql? That is:
ups store
walmart
target
I already have a Java program to do this but I have a large file, now my question is if this could be reasonably done in SQL?
How about the following more complicated scnenario? (I can live without this but nice to have it if possible)
ups store austin
ups store chicago
ups store bern
walmart
target
and that would return [ups store, walmart, target]
.
回答1:
Assuming you're free to create another table that simply has a list of ascending integers from zero up to the size of the longest possible string then the following should do the job using only ANSI SQL:
SELECT
id,
SUBSTRING(name, 1, CASE WHEN number = 0 THEN LENGTH(name) ELSE number END) AS prefix
FROM
-- Join all places to all possible substring lengths.
(SELECT *
FROM places p
CROSS JOIN lengths l) subq
-- If number is zero then no prefix match was found elsewhere
-- (from the question it looked like you wanted to include these)
WHERE (subq.number = 0 OR
-- Look for prefix match elsewhere
EXISTS (SELECT * FROM places p
WHERE SUBSTRING(p.name FROM 1 FOR subq.number)
= SUBSTRING(subq.name FROM 1 FOR subq.number)
AND p.id <> subq.id))
-- Include as a prefix match if the whole string is being used
AND (subq.number = LENGTH(name)
-- Don't include trailing spaces in a prefix
OR (SUBSTRING(subq.name, subq.number, 1) <> ' '
-- Only include the longest prefix match
AND NOT EXISTS (SELECT * FROM places p
WHERE SUBSTRING(p.name FROM 1 FOR subq.number + 1)
= SUBSTRING(subq.name FROM 1 FOR subq.number + 1)
AND p.id <> subq.id)))
ORDER BY id;
Live demo: http://rextester.com/XPNRP24390
The second aspect is that what if we have (ups store austin, ups store chicago). can we use SQL to extract the 'ups store' off of it.
This should be simply a case of using SUBSTRING
in a similar way to above, e.g:
SELECT SUBSTRING(name,
LENGTH('ups store ') + 1,
LENGTH(name) - LENGTH('ups store '))
FROM places
WHERE SUBSTRING(name,
1,
LENGTH('ups store ')) = 'ups store ';
回答2:
Supposing your column name is "mycolumn", and your big table is "mytable", and a single space is your field separator:
In PostgreSQL, you could do something as simple as this:
select
mycolumn
from
mytable
order by
length(split_part(mycolumn, ' ', 1)) desc
limit
1
If you ran this query often, I'd probably try an ordered functional index on the table like this:
create prefix_index on mytable (length(split_part(mycolumn, ' ', 1)) desc)
来源:https://stackoverflow.com/questions/42079042/how-to-query-text-to-find-the-longest-prefix-strings-in-sql