I have a bunch of URLs, each containing a string in one of two forms:
hotel + 4-digit number: hotel1234
or slash + 4 digits + .html: /1234.html
Is there a regex to extract the 4-digit number?
You can try REGEXP to find the rows that contain either form:
SELECT * FROM TableName WHERE ColumnName REGEXP 'hotel[0-9]{4}|/[0-9]{4}[.]html';
or
SELECT * FROM TableName WHERE ColumnName REGEXP 'hotel[[:digit:]]{4}|/[[:digit:]]{4}[.]html';
A stackoverflow.com question on how to extract a substring from inside a string in Python might also be useful.
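If doing the extraction in Python is an option, a minimal sketch along those lines could look like this. The function name extract_id and the sample URLs are made up for illustration, and the pattern assumes the number is always exactly four digits.

```python
import re

# "hotel" followed by exactly four digits, or "/" + four digits + ".html".
# The lookahead rejects longer digit runs such as "hotel12345".
PATTERN = re.compile(r"hotel(\d{4})(?!\d)|/(\d{4})\.html")


def extract_id(url):
    """Return the four-digit number as a string, or None if neither form is present."""
    match = PATTERN.search(url)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on which alternative matched.
    return match.group(1) or match.group(2)


if __name__ == "__main__":
    # Hypothetical sample URLs, not taken from the question.
    for url in ("http://example.com/hotel1234",
                "http://example.com/rooms/5678.html",
                "http://example.com/about"):
        print(url, extract_id(url))
```

The number comes back as a string so that leading zeros are kept; convert with int() if a numeric value is needed.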
Unfortunately, MySQL's REGEXP only returns true or false depending on whether the pattern matches; it does not return the matched text. I have found substring_index useful if you know the text surrounding the target:
select case
         -- 'hotel' followed by 4 characters at the end of the value
         when ColumnName like '%hotel____' then substring_index(ColumnName,'hotel',-1)
         -- '/' + 4 characters + '.html' at the end of the value
         when ColumnName like '%/____.html' then substring_index(substring_index(ColumnName,'/',-1),'.html',1)
         else ColumnName
       end digit_extraction
from TableName
where ...;
The case statement above isn't strictly necessary, because substring_index returns the entire string when the search string isn't found:
select substring_index(substring_index(substring_index(ColumnName,'hotel',-1),'/',-1),'.html',1)
from TableName
where ...;
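To see why the nested calls work without the case statement, here is a rough Python stand-in for substring_index. It is only an illustration of the behaviour described above, not MySQL's implementation, and the helper extract_digits is a made-up name.

```python
def substring_index(s, delim, count):
    """Rough Python stand-in for MySQL's SUBSTRING_INDEX (illustration only)."""
    # Assumed to mirror MySQL: a zero count or an empty delimiter gives ''.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything before the count-th delimiter, counting from the left.
        return delim.join(parts[:count])
    # Everything after the |count|-th delimiter, counting from the right.
    return delim.join(parts[count:])


def extract_digits(value):
    # Same nesting as the query: keep what follows 'hotel',
    # then what follows the last '/', then drop '.html'.
    step1 = substring_index(value, "hotel", -1)
    step2 = substring_index(step1, "/", -1)
    return substring_index(step2, ".html", 1)


if __name__ == "__main__":
    for value in ("hotel1234", "/1234.html", "no-match"):
        print(value, "->", extract_digits(value))
```

A value that contains none of the three markers comes back unchanged, so the where clause still has to limit the query to rows that actually contain one of the two forms.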