How to use regexp on the results of a sub query?

喜夏-厌秋 submitted on 2019-12-08 18:29:24

Try one of these queries:

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no LIKE concat(u.phone_no, '__')
WHERE u.phone_no REGEXP  '^(99)+[0-9]+$'

or

SELECT a.phone_no
FROM admission a
JOIN users u on a.phone_no REGEXP concat('^', u.phone_no, '[0-9]{2}$')
WHERE u.phone_no REGEXP  '^(99)+[0-9]+$'

If the number of "trailing digits" is not fixed, you can also use:

LIKE concat(u.phone_no, '%')

or

REGEXP concat('^', u.phone_no, '[0-9]*$')

But in this case you might need to use SELECT DISTINCT a.phone_no if a users.phone_no can be a prefix of another users.phone_no (e.g. 99123 and 991234).
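To see why duplicates can show up, here is a minimal Python sketch (not MySQL) that mimics the prefix join in memory. The sample numbers and the helper name prefix_join are made up for illustration only.

```python
# Hypothetical sample data: one users number (99123) is a prefix of another (991234).
users = ["99123", "991234"]
admission = ["99123456", "9912399"]


def prefix_join(admission, users):
    """Mimic JOIN ... ON a.phone_no LIKE CONCAT(u.phone_no, '%') in plain Python."""
    return [a for a in admission for u in users if a.startswith(u)]


rows = prefix_join(admission, users)
distinct_rows = sorted(set(rows))

print(rows)           # 99123456 appears once per matching users row
print(distinct_rows)  # duplicates removed, as SELECT DISTINCT would do
```

The admission number 99123456 matches both users rows, so it is returned twice unless the duplicates are removed.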

Update

After running some tests with 10K rows in the users table and 100K rows in the admission table, I arrived at the following query:

SELECT a.phone_no
FROM admission a
JOIN users u 
    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')
    AND a.phone_no LIKE CONCAT(u.phone_no, '%')
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]*$')
WHERE   u.phone_no LIKE  '99%'
    AND u.phone_no REGEXP  '^(99)+[0-9]*$'
UNION SELECT 0 FROM (SELECT 0) dummy WHERE 0


This way you can use REGEXP and still have great performance. This query executes almost instantly in my test case.

Logically you only need the REGEXP conditions, but on bigger tables the query might time out. A LIKE condition filters the result set before the REGEXP check. Even with LIKE, though, the query doesn't perform very well: for some reason MySQL doesn't use a range check for the join. So I added an explicit range check:

    ON  a.phone_no >= u.phone_no
    AND a.phone_no < CONCAT(u.phone_no, 'z')

With this check you can remove the LIKE condition from the JOIN part.
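The idea behind the range check can be sketched in Python (this is an illustration, not what MySQL does internally): in a sorted list of digit strings, every value starting with a given prefix lies in one contiguous range from the prefix up to the prefix followed by 'z', so it can be located with two binary searches. The helper name prefix_range and the sample numbers are assumptions for this example, and Python compares strings by code point, which may differ from the collation of your column.

```python
from bisect import bisect_left


def prefix_range(sorted_numbers, prefix):
    """Return entries x with prefix <= x < prefix + 'z', found by binary search."""
    lo = bisect_left(sorted_numbers, prefix)
    hi = bisect_left(sorted_numbers, prefix + "z")
    return sorted_numbers[lo:hi]


# Hypothetical admission numbers, sorted the way an index would store them.
admission = sorted(["9812345", "9912300", "9912345", "99124", "9913000"])

matches = prefix_range(admission, "99123")
print(matches)
```

For digit-only values the range contains exactly the numbers that start with the prefix, because 'z' sorts after every digit. The REGEXP condition is still needed to reject values with non-digit characters.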

The UNION part is a replacement for DISTINCT. MySQL seems to translate DISTINCT into a GROUP BY, which doesn't perform well. By using UNION (which, unlike UNION ALL, removes duplicate rows) with an empty result set, I force MySQL to remove duplicates after the SELECT. You can remove that line if you use a fixed number of trailing digits.

You can adjust the REGEXP patterns to your needs:

...
    AND a.phone_no REGEXP CONCAT('^', u.phone_no, '[0-9]{2}$')
...
    AND u.phone_no REGEXP  '^(99)+[0-9]{8}$'
...

If you only need REGEXP to check the length of the phone_no, you can also use a LIKE condition with the '_' placeholder. Note that '_' matches exactly one arbitrary character, so it does not verify that the characters are digits.

    AND a.phone_no LIKE CONCAT(u.phone_no, '__')
...
    AND u.phone_no LIKE '99________'

or combine a LIKE condition with a CHAR_LENGTH check.
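The difference between the two checks can be illustrated with a small Python sketch. It assumes a simplified LIKE translation (only % and _, no escape characters) and uses Python's re module, whose behavior may differ in details from MySQL's REGEXP engine. The helper name like_to_regex and the sample numbers are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a simple LIKE pattern (only % and _, no escapes) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


like_check = like_to_regex("99________")       # 99 followed by any 8 characters
regexp_check = re.compile(r"^(99)+[0-9]{8}$")  # one or more 99, then 8 digits

results = {}
for number in ["9912345678", "99abcdefgh", "991234567", "999912345678"]:
    results[number] = (
        bool(like_check.fullmatch(number)),
        bool(regexp_check.search(number)),
    )
    print(number, results[number])
```

In this sketch the LIKE pattern accepts 99abcdefgh, which the regular expression rejects, while the regular expression accepts the 12-digit 999912345678 because (99)+ may repeat. Whether LIKE is an acceptable substitute therefore depends on how clean the data is.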

I think this does what you want; I made some improvements (SQLfiddle):

select * from admission a where exists (
  select * from (
     select substr(phone_no, 1, 7) pn from users where phone_no REGEXP '^99[0-9]{5}'
  ) o where a.phone_no like concat(o.pn, '%')
)

I had to modify the regex to get any matches. If the length is fixed the second check can easily be done with like. We look in the user table to see if there exists any phone_no that matches the criteria for the admission number we are currently looking at.

Never mind regex. Do a simple join using LIKE:

select distinct a.phone_no
from users u
join admission a on a.phone_no like concat(u.phone_no, '%')
where u.phone_no like '99%'

The distinct keyword is only needed if there are duplicate numbers in the admission table, the user table, or both, or if one user number is a prefix of another. Otherwise, it can be omitted.
