I need a regex that will give me the string inside an href tag and inside the quotes also.
For example i need to extract theurltoget.com in the following:
This will handle the case where there are no quotes around the URL.
/<a [^>]*href="?([^">]+)"?>/
But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.
Dont use regex for this. You can use xpath and built in php functions to get what you want:
$xml = simplexml_load_string($myHtml);
$list = $xml->xpath("//@href");
$preparedUrls = array();
foreach($list as $item) {
$item = parse_url($item);
$preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
}
print_r($preparedUrls);
this expression will handle 3 options:
'/href=["\']?([^"\'>]+)["\']?/'