Question
I currently have a table called "RESOURCES" with a keywords field called "RES_Tags". The "RES_Tags" field contains a comma-delimited list of keywords for each record.
I need to normalize this table/field.
I have already set up the following tables: TAGS, TAGS_TO_RESOURCES.
Please see the schema here: http://sqlfiddle.com/#!9/edac4/1
What is a query that will allow me to parse the keywords in RES_Tags, write them into the TAGS table without creating duplicates and then write a listing in the TAGS_TO_RESOURCES table?
Answer 1:
Please copy your code into the actual posting, and provide the code you've tried to use to solve the problem.
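SUBSTRING_INDEX(str, delim, count) returns the part of str before the count-th occurrence of the delimiter (here a comma). When count is negative, it counts occurrences from the right and returns everything after that delimiter, so SUBSTRING_INDEX(..., ',', -1) returns the last item. Nesting the two, SUBSTRING_INDEX(SUBSTRING_INDEX(RES_Tags, ',', n), ',', -1) first keeps the first n items and then takes the last of those, which is the n-th tag. If a row has fewer than n tags, this returns its last tag again.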
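To check the nesting trick outside MySQL, here is a minimal Python sketch. The helpers substring_index and nth_tag are hypothetical names; substring_index only mimics MySQL's documented behaviour for a non-empty delimiter and is not MySQL's own implementation.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX for a non-empty delimiter (sketch only)."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything left of the count-th delimiter (whole string if there are fewer).
        return delim.join(parts[:count])
    # Negative count: everything right of the |count|-th delimiter, counted from the right.
    return delim.join(parts[count:])


def nth_tag(res_tags: str, n: int) -> str:
    """SUBSTRING_INDEX(SUBSTRING_INDEX(res_tags, ',', n), ',', -1)."""
    return substring_index(substring_index(res_tags, ",", n), ",", -1)


if __name__ == "__main__":
    # Positions past the end repeat the last tag: ['a', 'b', 'c', 'c', 'c', 'c']
    print([nth_tag("a,b,c", n) for n in range(1, 7)])
    # A space after the comma survives the split, which is why the answer applies TRIM.
    print(repr(nth_tag("php, mysql", 2)))
```

The repeated last tag and the leading space are exactly the two things the answer's final DISTINCT TRIM step cleans up.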
As discussed, I've changed my approach and added an example of using AUTO_INCREMENT. (This is run in the "Build Schema" panel of the fiddle.)
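-- T_Name has no UNIQUE key, so INSERT IGNORE below does not skip duplicates;
-- they are removed by the DISTINCT copy at the end.
create table TAGS
(`T_ID` int auto_increment primary key, `T_Name` varchar(18))
;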
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(RES_Tags, ',', 1) as X
FROM RESOURCES
;
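-- Tag n (here n = 2): keep the first n items, then take the last of those.
-- Rows with fewer than n tags repeat their last tag; the final step removes duplicates.
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(RES_Tags, ',', 2)
,',',-1) as X
FROM RESOURCES
;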
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(RES_Tags, ',', 3)
,',',-1) as X
FROM RESOURCES
;
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(RES_Tags, ',', 4)
,',',-1) as X
FROM RESOURCES
;
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(RES_Tags, ',', 5)
,',',-1) as X
FROM RESOURCES
;
insert ignore into TAGS (T_Name)
SELECT
SUBSTRING_INDEX(
SUBSTRING_INDEX(RES_Tags, ',', 6)
,',',-1) as X
FROM RESOURCES
;
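-- Copy the distinct, trimmed names into a fresh table, then swap it in.
-- Use the same case for New_TAGS throughout: table names are case-sensitive on many servers.
create table New_TAGS like TAGS;
insert into New_TAGS (T_Name)
select distinct trim(T_Name)
from TAGS;
drop table TAGS;
rename table New_TAGS to TAGS;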
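The same pipeline can be tried without a MySQL server. This is a sketch under several assumptions: it uses Python's built-in sqlite3 as a stand-in, registers a Python emulation of SUBSTRING_INDEX through Connection.create_function, uses SQLite's INSERT OR IGNORE in place of MySQL's INSERT IGNORE, and replaces the New_TAGS rename with a hypothetical staging table RAW_TAGS. The resource key RES_ID and the TAGS_TO_RESOURCES columns are hypothetical, since the fiddle schema is not reproduced here.

```python
import sqlite3


def substring_index(s, delim, count):
    """Emulate MySQL SUBSTRING_INDEX; NULL in gives NULL out (sketch only)."""
    if s is None:
        return None
    if count == 0:
        return ""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def normalize(rows, max_tags=6):
    """rows: list of (RES_ID, RES_Tags). Returns (sorted tag names, sorted links)."""
    con = sqlite3.connect(":memory:")
    # Make SUBSTRING_INDEX callable from SQL, as in MySQL.
    con.create_function("SUBSTRING_INDEX", 3, substring_index)
    con.executescript("""
        CREATE TABLE RESOURCES (RES_ID INTEGER PRIMARY KEY, RES_Tags TEXT);
        CREATE TABLE RAW_TAGS (T_Name TEXT);  -- hypothetical staging table
        CREATE TABLE TAGS (T_ID INTEGER PRIMARY KEY, T_Name TEXT UNIQUE);
        CREATE TABLE TAGS_TO_RESOURCES (RES_ID INTEGER, T_ID INTEGER,
                                        PRIMARY KEY (RES_ID, T_ID));
    """)
    con.executemany("INSERT INTO RESOURCES VALUES (?, ?)", rows)
    nth = "SUBSTRING_INDEX(SUBSTRING_INDEX(r.RES_Tags, ',', ?), ',', -1)"
    # One pass per position, like the answer's six INSERT statements.
    for n in range(1, max_tags + 1):
        con.execute(f"INSERT INTO RAW_TAGS (T_Name) SELECT {nth} FROM RESOURCES r", (n,))
    # Trim and de-duplicate, like the answer's New_TAGS step.
    con.execute(
        "INSERT INTO TAGS (T_Name) SELECT DISTINCT TRIM(T_Name) FROM RAW_TAGS "
        "WHERE T_Name IS NOT NULL AND TRIM(T_Name) <> ''"
    )
    # Link each resource to each of its tags; the composite key drops repeats.
    for n in range(1, max_tags + 1):
        con.execute(
            "INSERT OR IGNORE INTO TAGS_TO_RESOURCES (RES_ID, T_ID) "
            f"SELECT r.RES_ID, t.T_ID FROM RESOURCES r JOIN TAGS t ON t.T_Name = TRIM({nth})",
            (n,),
        )
    tags = [name for (name,) in con.execute("SELECT T_Name FROM TAGS ORDER BY T_Name")]
    links = con.execute(
        "SELECT x.RES_ID, t.T_Name FROM TAGS_TO_RESOURCES x "
        "JOIN TAGS t ON t.T_ID = x.T_ID ORDER BY x.RES_ID, t.T_Name"
    ).fetchall()
    con.close()
    return tags, links


if __name__ == "__main__":
    print(normalize([(1, "php, mysql"), (2, "mysql,sql,normalization"), (3, "php")]))
```

Like the answer, this only picks up the first max_tags items of each row, so max_tags must be at least the longest tag list.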
See the MySQL documentation for the SUBSTRING_INDEX function. This question may be a duplicate of an earlier one.
Answer 2:
1. Based on RESOURCES.RES_tags, generate a set of INSERT INTO TAGS ... statements. Prevent duplicates either with a UNIQUE constraint on TAGS combined with ON DUPLICATE KEY ..., or by using INSERT ... SELECT ... WHERE NOT EXISTS (...):
   a) On the fly, prepend one character to RES_tags and append a different one (say - at the start and + at the end), but don't save it back into the DB (a,b,c would become -a,b,c+).
   b) On the fly, replace each ',' with the end of the previous row plus the start of the next one; replace '-' with the starting part only and '+' with the ending part only (e.g. '-' is replaced with 'insert into tags(tag) values("', '+' becomes '")', and ',' becomes '"), ("'). To keep the tags unique, you still need one of the mechanisms from step 1.
2. Execute the SQL generated in step 1 (e.g. insert into tags(tag) values("a"), ("b"), ("c")).
3. Link each resource with its tags using the query below. Wrapping both sides in commas makes a tag match only a whole item, so sql does not match inside mysql:

   INSERT INTO TAGS_TO_RESOURCES (resource_id, tag_id)
   SELECT RESOURCES.id, TAGS.id
   FROM RESOURCES
   INNER JOIN TAGS
     ON INSTR(CONCAT(',', RESOURCES.RES_tags, ','), CONCAT(',', TAGS.tag, ',')) > 0
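The three steps above can be sketched end to end in Python, again with sqlite3 standing in for MySQL. Assumptions: the table and column names (id, tag, resource_id, tag_id) are the answer's generic ones, not the fiddle's; SQLite's INSERT OR IGNORE plus a UNIQUE column replaces ON DUPLICATE KEY; SQLite's || replaces CONCAT; and tags have no spaces after the commas. Rather than literal - and + markers, which would collide with tags such as c++, the hypothetical build_insert helper concatenates the prefix and suffix directly, and it doubles single quotes because it builds SQL text.

```python
import sqlite3


def build_insert(res_tags):
    """Turn 'a,b,c' into one multi-row INSERT (steps 1a/1b of the answer)."""
    escaped = res_tags.replace("'", "''")  # escape quotes inside SQL string literals
    return (
        "INSERT OR IGNORE INTO TAGS (tag) VALUES ('"
        + escaped.replace(",", "'), ('")  # each comma ends one row and starts the next
        + "')"
    )


def link_resources(rows):
    """rows: list of (id, RES_tags). Returns sorted (resource_id, tag) pairs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE RESOURCES (id INTEGER PRIMARY KEY, RES_tags TEXT);
        CREATE TABLE TAGS (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TAGS_TO_RESOURCES (resource_id INTEGER, tag_id INTEGER);
    """)
    con.executemany("INSERT INTO RESOURCES VALUES (?, ?)", rows)
    # Step 2: run the generated statements; UNIQUE(tag) + OR IGNORE drops duplicates.
    for (res_tags,) in con.execute("SELECT RES_tags FROM RESOURCES").fetchall():
        con.execute(build_insert(res_tags))
    # Step 3: comma-wrapped substring match, as in the answer's INSTR query.
    con.execute("""
        INSERT INTO TAGS_TO_RESOURCES (resource_id, tag_id)
        SELECT RESOURCES.id, TAGS.id
        FROM RESOURCES INNER JOIN TAGS
          ON INSTR(',' || RESOURCES.RES_tags || ',', ',' || TAGS.tag || ',') > 0
    """)
    links = con.execute("""
        SELECT x.resource_id, t.tag FROM TAGS_TO_RESOURCES x
        JOIN TAGS t ON t.id = x.tag_id ORDER BY x.resource_id, t.tag
    """).fetchall()
    con.close()
    return links


if __name__ == "__main__":
    print(build_insert("a,b"))
    print(link_resources([(1, "php,mysql"), (2, "mysql,sql"), (3, "c++,o'caml")]))
```

Generating SQL text this way is fragile; parameterized inserts avoid the quoting problem, but the sketch keeps the answer's string-substitution idea to show how it works.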
Source: https://stackoverflow.com/questions/47189346/mysql-normalizing-a-comma-delimited-field