问题
I need to split a narrative field (free text) into multiple rows. Format is currently along the lines of:
Case_Reference | Narrative
```````````````|`````````````````````````````````````
XXXX/XX-123456 | [Endless_Text up to ~50k characters]
Within the narrative field as text, individual entries (when various agents have done something to the case) begin with the entry date followed by two spaces (i.e. 'dd/mm/yyyy '
), with the values of the dates changing with each entry within that same field.
In other words, after trawling for a better delimiter, the only one I can use is this format of string, so I need to identify multiple positions within the Narrative text where the format (would mask be a better word?) matches 'dd/mm/yyyy '
.
I can identify multiple occurrences of a consistent string no problem, but it's identifying it where I'm essentially looking for:
'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %'
PATINDEX
of course returns the first occurrence/position of this, but so far as I'm aware, there's no way to "modify" this (i.e. a created function) to allow for picking up the rest of the occurrences/positions of this they way we can with CHARINDEX
(since PATINDEX
doesn't have a starting position parameter).
For clarity, I'm not looking for code to delimit this directly as I need to further manipulate each entry, so it's purely the positions of multiple occurrences of the string within the Narrative text I'm looking for.
Any help would be very much appreciated.
For clarity, there's no option to do this pre-import, so it needs to be done on this landed data.
Desired output would be
Case_Reference1 | 1st_Position_of_Delimiter_String
Case_Reference1 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 1st_Position_of_Delimiter_String
Case_Reference2 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 3rd_Position_of_Delimiter_String
回答1:
You might solve this with an recursive CTE
DECLARE @tbl TABLE (Case_Reference NVARCHAR(MAX),Narrative NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(N'C1',N'01/02/2000 Some text with blanks 02/03/2000 More text 03/04/2000 An even more')
,(N'C2',N'01/02/2000 Test for C2 02/03/2000 One more for C2 03/04/2000 An even more 04/05/2000 Blah')
,(N'C3',N'01/02/2000 Test for C3 02/03/2000 One more for C3 03/04/2000 An even more')
;
WITH recCTE AS
(
SELECT 1 AS Step,Case_Reference,Narrative,CAST(1 AS BIGINT) AS StartsAt,NewPos.EndsAt+10 AS EndsAt,LEN(Narrative) AS MaxLen
,SUBSTRING(Narrative,NewPos.EndsAt+10+1,999999) AS RestString
FROM @tbl AS tbl
CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9] %',SUBSTRING(Narrative,12,9999999))) AS NewPos(EndsAt)
UNION ALL
SELECT r.Step+1,r.Case_Reference,r.Narrative,r.EndsAt+1,CASE WHEN NewPos.EndsAt>0 THEN r.EndsAt+NewPos.EndsAt+10 ELSE r.MaxLen END,r.MaxLen
,SUBSTRING(r.RestString,NewPos.EndsAt+10+1,999999)
FROM recCTE AS r
CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9] %',SUBSTRING(r.RestString,12,99999999))) AS NewPos(EndsAt)
WHERE r.EndsAt<r.MaxLen
)
SELECT Step,Case_Reference,StartsAt,EndsAt
,SUBSTRING(Narrative,StartsAt,EndsAt-StartsAt+1) AS OutputString
FROM recCTE
ORDER BY Case_Reference,Step
The result
+------+----------------+----------+--------+---------------------------------------+
| Step | Case_Reference | StartsAt | EndsAt | OutputString |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C1 | 1 | 38 | 01/02/2000 Some text with blanks |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C1 | 39 | 60 | 02/03/2000 More text |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C1 | 61 | 84 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C2 | 1 | 24 | 01/02/2000 Test for C2 |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C2 | 25 | 52 | 02/03/2000 One more for C2 |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C2 | 53 | 77 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
| 4 | C2 | 78 | 93 | 04/05/2000 Blah |
+------+----------------+----------+--------+---------------------------------------+
| 1 | C3 | 1 | 24 | 01/02/2000 Test for C3 |
+------+----------------+----------+--------+---------------------------------------+
| 2 | C3 | 25 | 52 | 02/03/2000 One more for C3 |
+------+----------------+----------+--------+---------------------------------------+
| 3 | C3 | 53 | 76 | 03/04/2000 An even more |
+------+----------------+----------+--------+---------------------------------------+
回答2:
Try this recursive cte
declare @t table
(
caseref varchar(20),
narrative varchar(max)
)
insert into @t values('Case_Reference1', 'blah 10/11/2016 something 13/11/2016 something else');
insert into @t values('Case_Reference2', '11/11/2016 something 12/11/2016 something else 14/11/2016 something yet still');
insert into @t values('Case_Reference3', 'should find nothing');
with cte (caseref, pos, remainingstr) as
(
select caseref,
patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative),
substring(narrative, patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative) + 12, len(narrative) - 12 - patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative))
from @t
where patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', narrative) > 0
union all
select caseref,
pos + 12 + patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr),
substring(remainingstr, patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr) + 12, len(remainingstr) - 12 - patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr))
from cte
where patindex('%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] %', remainingstr) > 0
)
select caseref, pos
from cte
order by caseref, pos
来源:https://stackoverflow.com/questions/40738970/sql-identify-positions-of-multiple-occurrences-of-a-string-format-within-a-fie