Substring regex for the first part of a URL

穿精又带淫゛_ submitted on 2019-12-08 11:33:25

Question


I've got a large database of projects and issue trackers, some of which have URLs.

I'd like to query it to build a list of URLs for each project, but many of them contain extra data I'd like to strip off.

I'd like to do something like this:

substring(tracker_extra_field_data.field_data FROM 'http://([^/]*).*')

Except that some URLs are https, and I'd like to capture the scheme as well as the first subdirectory.
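PostgreSQL's `substring(string FROM pattern)` returns the text matched by the first parenthesized subexpression when the pattern contains one, and the whole match otherwise. The sketch below emulates that rule with Python's `re` module to show why the pattern above falls short: it captures only the host, and it does not match an `https` URL at all. `pg_substring` is a hypothetical helper, and treating Python's regex flavor as equivalent to PostgreSQL's is an assumption that holds for simple patterns like this one.

```python
import re
from typing import Optional


def pg_substring(text: str, pattern: str) -> Optional[str]:
    """Hypothetical helper mimicking PostgreSQL's substring(text FROM pattern).

    Assumption: Python's re agrees with PostgreSQL's regex engine here.
    Returns the first capturing group if the pattern has one, otherwise the
    whole match, and None (SQL NULL) when nothing matches.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups > 0 else m.group(0)


ORIGINAL = r"http://([^/]*).*"

# Only the host is captured; the scheme and first directory are lost.
print(pg_substring("http://dev.foo.com/bar/action/?param=val", ORIGINAL))
# "http://" never occurs in an https URL, so the result is None.
print(pg_substring("https://dev.foo.com/bar/action/?param=val", ORIGINAL))
```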

For example, given the URL:

https://dev.foo.com/bar/action/?param=val

I'd like the SELECT to return:

https://dev.foo.com/bar/

Is there a reasonably simple way to do this with substring/regex in PostgreSQL?


Answer 1:


Try this:

select substring('https://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');

template1=# select substring('https://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');
        substring
-------------------------
 https://dev.foo.com/bar/
(1 row)

template1=# select substring('http://dev.foo.com/bar/action/?param=val' from '(https?://([^/]*/){1,2})');
       substring
------------------------
 http://dev.foo.com/bar/
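To see how this pattern treats URLs that do not end in a slash, the sketch below replays it with Python's `re` module, assuming Python's matching agrees with PostgreSQL's for this pattern. Because the outer parentheses form the first group, `substring` returns the whole prefix. `pg_substring` is a hypothetical helper that emulates PostgreSQL's "first group, else whole match, else NULL" rule. Each `([^/]*/)` repetition must end in a slash, so a bare host with no trailing slash yields no match.

```python
import re
from typing import Optional

# Pattern from the answer above; group 1 wraps the whole URL prefix.
ANSWER1 = r"(https?://([^/]*/){1,2})"


def pg_substring(text: str, pattern: str) -> Optional[str]:
    """Hypothetical helper mimicking PostgreSQL's substring(text FROM pattern):
    first capturing group if any, else the whole match, else None (NULL)."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(1) if m.re.groups > 0 else m.group(0)


cases = [
    "https://dev.foo.com/bar/action/?param=val",  # scheme + host + first dir
    "http://dev.foo.com/bar/action/?param=val",
    "https://dev.foo.com/bar",  # no slash after "bar": only the host part
    "https://dev.foo.com",      # no slash after the host: no match (NULL)
]
for url in cases:
    print(url, "->", pg_substring(url, ANSWER1))
```

If bare hosts need to be kept, the anchored pattern with an optional trailing slash avoids the NULL result.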



Answer 2:


Updated: I misread the question at first.

Use the pattern:

^https?://[^/]+(?:/[^/]+)?/?

^ .. start of string
? .. zero or one of the preceding atom
(?:) .. non-capturing parentheses
[^/]+ .. one or more characters other than /

This only accepts URLs that start with http:// or https:// (the scheme is required).
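Since this pattern has no capturing group, `substring` returns the whole match. The sketch below checks it against a few inputs using Python's `re` module, under the assumption that Python and PostgreSQL agree on this simple pattern. The test URLs are illustrative only. The last case shows a caveat: a query string directly after the host is taken as the optional path segment.

```python
import re

# Pattern from the answer above; (?:...) does not capture, so the whole
# match is what PostgreSQL's substring() would return.
ANSWER2 = re.compile(r"^https?://[^/]+(?:/[^/]+)?/?")

cases = [
    "https://dev.foo.com/bar/action/?param=val",
    "http://dev.foo.com/bar",          # trailing slash is optional
    "https://dev.foo.com",             # a bare host still matches
    "ftp://dev.foo.com/bar/",          # scheme must be http or https
    "https://dev.foo.com/?param=val",  # caveat: query string kept as a "segment"
]
for url in cases:
    m = ANSWER2.search(url)
    print(url, "->", m.group(0) if m else None)
```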




Source: https://stackoverflow.com/questions/17750657/substring-regex-for-first-part-of-url
