I have data which looks like this:
ID    post_author  post_title                                                                          guid
3309  21           Should somebody not yet on SQL 2008 wait for SQL 2008 R2, since it's near release?  h
SELECT COUNT(POST_AUTHOR) AS AUTHOR_COUNT, GUID FROM TABLE_NAME GROUP BY GUID
The problem is how to extract the root part of the URL. If every URL is guaranteed to contain at least three slashes, SUBSTRING_INDEX does the job: substring_index(guid, '/', 3) returns everything before the third slash, i.e. the scheme and host:
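select substring_index(guid, '/', 3) as site,  -- e.g. http://example.com
       count(distinct post_author) as authors  -- count(id) would count posts instead
from table_name                                -- "table" is a reserved word in MySQL
group by site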
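To try the grouping without a MySQL server, here is a minimal sketch using Python's built-in sqlite3. SQLite has no SUBSTRING_INDEX, so a small Python function registered under that name stands in for it (positive count only, which is all this query needs). The table name posts and the sample guids are hypothetical.

```python
import sqlite3


def substring_index(s, delim, count):
    # Stand-in for MySQL's SUBSTRING_INDEX, positive count only: everything
    # before the count-th occurrence of delim, or the whole string if delim
    # occurs fewer than count times.
    if s is None:
        return None
    return delim.join(s.split(delim)[:count])


def authors_per_site(conn):
    # Same shape as the MySQL query: group on the text before the third "/".
    return conn.execute(
        """
        SELECT substring_index(guid, '/', 3) AS site,
               COUNT(DISTINCT post_author) AS authors
        FROM posts
        GROUP BY site
        ORDER BY site
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it by the MySQL name.
conn.create_function("substring_index", 3, substring_index)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_author INTEGER, guid TEXT)")
# Hypothetical sample rows, not the asker's data.
conn.executemany(
    "INSERT INTO posts (id, post_author, guid) VALUES (?, ?, ?)",
    [
        (1, 21, "http://example.com/?p=1"),
        (2, 22, "http://example.com/?p=2"),
        (3, 21, "https://example.org/blog/?p=3"),
    ],
)
print(authors_per_site(conn))  # [('http://example.com', 2), ('https://example.org', 1)]
```

Note that the scheme is part of the extracted prefix, so http://example.com and https://example.com would be counted as two different sites.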
Of course, if you add an extra column holding only the site, filled in at insert time, everything will be faster, cleaner and safer: the column can be indexed, and you won't have to complicate the query to handle guids with fewer than three slashes.
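A minimal sketch of the extra-column approach, again with Python's sqlite3 standing in for MySQL. The site is computed in application code with urllib.parse.urlsplit when the row is inserted; the table name posts, the helper names and the sample guids are all hypothetical.

```python
import sqlite3
from urllib.parse import urlsplit


def site_of(guid):
    # Scheme plus host, e.g. "http://example.com". urlsplit copes with guids
    # that have no path (only two slashes), so no special case is needed.
    parts = urlsplit(guid)
    return f"{parts.scheme}://{parts.netloc}"


def create_schema(conn):
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, post_author INTEGER,"
        " guid TEXT NOT NULL, site TEXT NOT NULL)"
    )
    # Unlike an expression computed in the query, a real column can be indexed.
    conn.execute("CREATE INDEX idx_posts_site ON posts (site)")


def insert_post(conn, post_id, author, guid):
    # The site is computed once, at insert time, instead of in every query.
    conn.execute(
        "INSERT INTO posts (id, post_author, guid, site) VALUES (?, ?, ?, ?)",
        (post_id, author, guid, site_of(guid)),
    )


def authors_per_site(conn):
    return conn.execute(
        "SELECT site, COUNT(DISTINCT post_author) AS authors"
        " FROM posts GROUP BY site ORDER BY site"
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample rows; the first guid has only two slashes.
insert_post(conn, 1, 21, "http://example.com")
insert_post(conn, 2, 22, "http://example.com/?p=2")
insert_post(conn, 3, 21, "https://example.org/blog/?p=3")
print(authors_per_site(conn))  # [('http://example.com', 2), ('https://example.org', 1)]
```

The query itself becomes a plain GROUP BY on a stored column; all URL parsing lives in one place (site_of).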
It may be possible to construct such a query, but it will not be efficient: the grouping expression has to be computed for every row and cannot use an ordinary index.
Instead, add a column to your table that holds the ID of the site, and create a new table holding pre-parsed data for each site: domain, path, resource, scheme (http or https), etc.
This way your searches will be more flexible and much faster; the parsing happens once per insert, which pays off since I assume you have few inserts and a large number of reads.
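A minimal sketch of the normalized layout, using Python's sqlite3 in place of MySQL. The sites table keeps only scheme and domain here (path and resource columns could be added the same way), and INSERT OR IGNORE is SQLite's spelling of MySQL's INSERT IGNORE. Table names, helper names and sample guids are hypothetical.

```python
import sqlite3
from urllib.parse import urlsplit


def create_schema(conn):
    # One row per distinct (scheme, domain); posts refer to it by id.
    conn.execute(
        "CREATE TABLE sites (id INTEGER PRIMARY KEY, scheme TEXT NOT NULL,"
        " domain TEXT NOT NULL, UNIQUE (scheme, domain))"
    )
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, post_author INTEGER,"
        " guid TEXT NOT NULL, site_id INTEGER NOT NULL REFERENCES sites (id))"
    )


def site_id_for(conn, guid):
    # Parse once at insert time; reuse the existing site row if there is one.
    parts = urlsplit(guid)
    key = (parts.scheme, parts.netloc)
    conn.execute("INSERT OR IGNORE INTO sites (scheme, domain) VALUES (?, ?)", key)
    return conn.execute(
        "SELECT id FROM sites WHERE scheme = ? AND domain = ?", key
    ).fetchone()[0]


def insert_post(conn, post_id, author, guid):
    conn.execute(
        "INSERT INTO posts (id, post_author, guid, site_id) VALUES (?, ?, ?, ?)",
        (post_id, author, guid, site_id_for(conn, guid)),
    )


def authors_per_domain(conn):
    # Grouping on the parsed domain merges http and https for the same host.
    return conn.execute(
        "SELECT s.domain, COUNT(DISTINCT p.post_author) AS authors"
        " FROM posts AS p JOIN sites AS s ON s.id = p.site_id"
        " GROUP BY s.domain ORDER BY s.domain"
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample rows.
insert_post(conn, 1, 21, "http://example.com/?p=1")
insert_post(conn, 2, 22, "https://example.com/?p=2")
insert_post(conn, 3, 21, "http://example.org/?p=3")
print(authors_per_domain(conn))  # [('example.com', 2), ('example.org', 1)]
```

Because scheme and domain are separate columns, the same data can be grouped per domain (as above) or per scheme and domain without re-parsing any URL.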
Write a SQL function, called for instance guid_extract(guid), which extracts the pertinent information; then you can use it as a column in your select:
SELECT stuff, otherstuff, guid_extract(guid) as site
...
GROUP BY site;
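In MySQL itself such a function would be a stored function created with CREATE FUNCTION. As a runnable sketch of the same idea, the snippet below uses Python's sqlite3 and registers a Python function under the hypothetical name guid_extract; here it returns the lower-cased host name (no scheme, no port), which is only one possible choice of "pertinent info". The table name posts and the sample guids are hypothetical.

```python
import sqlite3
from urllib.parse import urlsplit


def guid_extract(guid):
    # Hypothetical guid_extract: the lower-cased host name, without scheme
    # or port, or None if the guid has no recognisable host.
    if guid is None:
        return None
    return urlsplit(guid).hostname


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name guid_extract.
conn.create_function("guid_extract", 1, guid_extract)
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_author INTEGER, guid TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO posts (id, post_author, guid) VALUES (?, ?, ?)",
    [
        (1, 21, "http://Example.com/?p=1"),
        (2, 22, "https://example.com:8080/?p=2"),
        (3, 21, "http://example.org/?p=3"),
    ],
)
rows = conn.execute(
    "SELECT guid_extract(guid) AS site, COUNT(*) AS post_count"
    " FROM posts GROUP BY site ORDER BY site"
).fetchall()
print(rows)  # [('example.com', 2), ('example.org', 1)]
```

Keeping the extraction in one named function means the definition of "site" can change without touching every query, although, as noted above, grouping on a computed value still cannot use an ordinary index.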