I have the data below.
I'm only interested in program B. How do I change it into the table below using SQL syntax?
Below is my syntax but
Try this:
SELECT
CASE WHEN PATINDEX('%B[0-9][0-9]%', Program) > 0 THEN SUBSTRING(Program, PATINDEX('%B[0-9][0-9]%', Program), 3)
WHEN PATINDEX('%B[0-9]%', Program) > 0 THEN SUBSTRING(Program, PATINDEX('%B[0-9]%', Program), 2)
ELSE '' END
FROM DataBase1
The first WHEN extracts the pattern B[0-9][0-9], i.e. B followed by two digits; the second extracts B followed by one digit. The default returns an empty string when no match is found. If you also need B followed by three digits, add another WHEN as the first case, with the pattern B[0-9][0-9][0-9] instead of B[0-9][0-9], and change the last number from 3 to 4 (the length of the extracted string).
PATINDEX returns the position where the match starts, so SUBSTRING reads from exactly that position.
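The order of the WHEN branches matters: the longer pattern has to be tried first, otherwise a value such as B10 would be cut down to B1. A minimal Python sketch of the same "longest pattern first" logic follows; it is only an illustration, not T-SQL, and the names `PATTERNS` and `extract_b` are made up for this example.

```python
import re

# Patterns ordered from longest to shortest, mirroring the order of the WHEN branches.
# (Hypothetical helper names; the SQL answer itself uses PATINDEX and SUBSTRING.)
PATTERNS = [r"B[0-9][0-9]", r"B[0-9]"]


def extract_b(program: str) -> str:
    """Return the match of the first pattern that is found, or '' if none matches."""
    for pattern in PATTERNS:
        match = re.search(pattern, program)
        if match:
            return match.group(0)
    return ""  # same role as ELSE '' in the CASE expression


print(extract_b("A1, B1"))
print(extract_b("C1;D1;A4;B10"))
```

If the two patterns were swapped, the second call would stop at `B1`, which is why the two-digit case has to come first.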
If you use PostgreSQL, you can try the following solution.
First, create a test table with the data:
CREATE TABLE temp.test AS (
SELECT 'A1, B1' AS program, 1 AS file_count
UNION
SELECT 'B2', 1
UNION
SELECT 'A2, B3', 1
UNION
SELECT 'B4', 1
UNION
SELECT 'A3, B5', 2
UNION
SELECT 'B6', 2
UNION
SELECT 'B7', 2
UNION
SELECT 'B8', 1
UNION
SELECT 'B9', 1
UNION
SELECT 'C1;D1;A4;B10', 1
UNION
SELECT 'C2;D2;B11', 1
UNION
SELECT 'C3,D3,A5,B12', 1
UNION
SELECT 'C4;B14;D4;B11,B13', 1
);
I assumed that one program cell can contain several B values (see the last SELECT).
After that, use regexp_matches to find every B value in a cell and pair each one with that row's file_count (the inner SELECT), then sum file_count for each program:
SELECT
b_program,
sum(file_count)
FROM (
SELECT
-- the 'g' flag returns one row per match, so cells with several B values are all counted
(regexp_matches(program, 'B\d+', 'g')) [1] AS b_program,
file_count
FROM temp.test
WHERE upper(program) LIKE '%B%') bpt
GROUP BY b_program
ORDER BY b_program;
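As a cross-check of the totals one would expect from the sample data, assuming every B value in a cell is counted once with that row's file_count, the same aggregation can be sketched in Python. The names `ROWS` and `sum_by_b_program` are made up for this example.

```python
import re
from collections import Counter

# Sample rows copied from the temp.test table above: (program, file_count).
ROWS = [
    ("A1, B1", 1), ("B2", 1), ("A2, B3", 1), ("B4", 1),
    ("A3, B5", 2), ("B6", 2), ("B7", 2), ("B8", 1), ("B9", 1),
    ("C1;D1;A4;B10", 1), ("C2;D2;B11", 1), ("C3,D3,A5,B12", 1),
    ("C4;B14;D4;B11,B13", 1),
]


def sum_by_b_program(rows):
    """Sum file_count per B value, counting every B value found in a cell."""
    totals = Counter()
    for program, file_count in rows:
        # re.findall returns all non-overlapping matches, like a global regex search.
        for code in re.findall(r"B\d+", program):
            totals[code] += file_count
    return dict(totals)


totals = sum_by_b_program(ROWS)
for code in sorted(totals):
    print(code, totals[code])
```

With this data B11 appears in two cells, so its total is 2, while B13 and B14 only come from the last row.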