Assuming I have an Amazon product URL like so
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVP
@Gumbo: Your code works great!
// JS test: run it in the Firebug console.
var url = window.location.href;
url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");
I added a PHP function that does the same thing.
function amazon_get_asin_code($url) {
    global $debug;
    $result = "";
    // "#" delimiters avoid escaping "/"; the leading "/" keeps 10-letter
    // slug words such as "Generation" from matching before the ASIN.
    $pattern = '#/([a-zA-Z0-9]{10})(?:[/?]|$)#';
    preg_match($pattern, $url, $matches);
    if ($debug) {
        var_dump($matches);
    }
    if ($matches && isset($matches[1])) {
        $result = $matches[1];
    }
    return $result;
}
This may be a simplistic approach, but I have yet to find an error in it using any of the URLs in this thread that people say cause problems.
I take the URL and split it on "/" to get the discrete parts, then loop through the resulting array and test each part against the regex. In my case the variable i is an object with a property called RawURL, which holds the raw URL I am working with, and a property called VendorSKU, which I am populating.
try
{
    string[] urlParts = i.RawURL.Split('/');
    // The lookahead rejects longer segments whose first 10 characters happen to match.
    Regex regex = new Regex(@"^[A-Z0-9]{10}(?=\?|$)");
    foreach (string part in urlParts)
    {
        Match m = regex.Match(part);
        if (m.Success)
        {
            i.VendorSKU = m.Value;
        }
    }
}
catch (Exception) { }
So far, this has worked perfectly.
You can get the ASIN by fetching the page content and reading the value of the element with id="ASIN". This does not depend on the URL format, so you do not need to rely on a regex.
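As a rough sketch of that idea in JavaScript: the function name asinFromPage and its doc parameter are made up for illustration, and it assumes the page really does contain an element with id="ASIN" whose value holds the code.

```javascript
// Hypothetical helper: read the ASIN from an already-loaded page.
// `doc` is any object exposing getElementById, e.g. the browser's `document`.
function asinFromPage(doc) {
  var el = doc.getElementById("ASIN");
  if (!el) {
    return null; // no such element on this page
  }
  // Assumption: the element is a form input, so the code sits in .value.
  return el.value || null;
}

// In a browser console on a product page: asinFromPage(document)
```

Returning null instead of throwing lets the caller fall back to a URL-based approach when the element is missing.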
You can scrape ASIN codes from the data-asin attribute in the search results using XPath.
For example, the following can be run in Chrome's console:
$x('//@data-asin').map(function(v,i){return v.nodeValue})
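Note that $x only exists inside the DevTools console. For a page script, a sketch along the following lines may do the same job with a CSS attribute selector. The name collectAsins and the root parameter are hypothetical, and skipping empty or repeated values is an assumption about what you want, not something the page guarantees.

```javascript
// Hypothetical helper: collect distinct, non-empty data-asin values.
// `root` is any object exposing querySelectorAll, e.g. the browser's `document`.
function collectAsins(root) {
  var nodes = root.querySelectorAll("[data-asin]");
  var seen = {};
  var result = [];
  for (var i = 0; i < nodes.length; i++) {
    var asin = nodes[i].getAttribute("data-asin");
    // Skip empty attributes and values already collected.
    if (asin && !seen[asin]) {
      seen[asin] = true;
      result.push(asin);
    }
  }
  return result;
}

// In a browser on a search results page: collectAsins(document)
```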
Since the ASIN is always a sequence of 10 letters and/or numbers immediately after a slash, try this:
url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)")
The additional (?:[/?]|$) after the ASIN ensures that only a full path segment is taken.
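To see what the two anchors buy you, here is a small sketch that wraps the same pattern in a function. The name extractAsin is made up for illustration, and the shorter test URLs in the comments are assumed examples rather than URLs from this thread.

```javascript
// Hypothetical helper: return the first 10-character alphanumeric
// path segment of the URL, or null when there is none.
function extractAsin(url) {
  // The leading "/" pins the match to the start of a path segment;
  // (?:[/?]|$) requires the segment to end right after 10 characters.
  var m = url.match(/\/([a-zA-Z0-9]{10})(?:[/?]|$)/);
  return m ? m[1] : null;
}

// The 10-letter slug word "Generation" is skipped because it follows "-", not "/".
console.log(extractAsin("http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVP"));
// prints "B0015T963C"

// A 13-character segment is rejected rather than truncated to its first 10 characters.
console.log(extractAsin("http://www.amazon.com/dp/B0015T963CXYZ"));
// prints null
```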
With a small change to the regex from the first answer, it works on all the URLs I have tested.
var url = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C";
var m = url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");
console.log(m);
if (m) {
  console.log("ASIN=" + m[1]);
}