Assuming I have an Amazon product URL like so:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVP
None of the above works in all cases. I tried the following URLs against the examples above:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C
http://www.amazon.com/dp/B0015T963C
http://www.amazon.com/gp/product/B0015T963C
http://www.amazon.com/gp/product/glance/B0015T963C
https://www.amazon.de/gp/product/B00LGAQ7NW/ref=s9u_simh_gw_i1?ie=UTF8&pd_rd_i=B00LGAQ7NW&pd_rd_r=5GP2JGPPBAXXP8935Q61&pd_rd_w=gzhaa&pd_rd_wg=HBg7f&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_s=&pf_rd_r=GA7GB6X6K6WMJC6WQ9RB&pf_rd_t=36701&pf_rd_p=c210947d-c955-4398-98aa-d1dc27e614f1&pf_rd_i=desktop
https://www.amazon.de/Sawyer-Wasserfilter-Wasseraufbereitung-Outdoor-Filter/dp/B00FA2RLX2/ref=pd_sim_200_3?_encoding=UTF8&psc=1&refRID=NMR7SMXJAKC4B3MH0HTN
https://www.amazon.de/Notverpflegung-Kg-Marine-wasserdicht-verpackt/dp/B01DFJTYSQ/ref=pd_sim_200_5?_encoding=UTF8&psc=1&refRID=7QM8MPC16XYBAZMJNMA4
https://www.amazon.de/dp/B01N32MQOA?psc=1
This is the best I could come up with: (?:[/dp/]|$)([A-Z0-9]{10})
The full match also includes the preceding / for all of these URLs. It can be removed later on, or you can read capture group 1, which holds only the ASIN.
You can test it on: http://regexr.com/3gk2s
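For anyone who wants to check the pattern locally rather than on regexr, here is a minimal sketch in JavaScript. The names ASIN_PATTERN, extractAsin and candidates are made up for this example, and the URL list is just a subset of the ones above. One thing worth knowing when reading the pattern: [/dp/] is a character class, so it matches a single /, d or p rather than the literal text /dp/.

```javascript
// Pattern from the answer above, built from a string so no escaping is needed.
// [/dp/] is a character class: it matches ONE character that is '/', 'd' or 'p'.
const ASIN_PATTERN = new RegExp("(?:[/dp/]|$)([A-Z0-9]{10})");

// Hypothetical helper: returns capture group 1 (the ASIN without the
// preceding character), or null when the pattern does not match.
function extractAsin(url) {
  const match = ASIN_PATTERN.exec(url);
  return match ? match[1] : null;
}

// A few of the URLs listed above.
const candidates = [
  "http://www.amazon.com/dp/B0015T963C",
  "http://www.amazon.com/gp/product/glance/B0015T963C",
  "https://www.amazon.de/dp/B01N32MQOA?psc=1",
];

for (const url of candidates) {
  console.log(extractAsin(url), "<-", url);
}
```

Reading match[1] instead of match[0] means the leading / never has to be stripped afterwards.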
Something like this should work (not tested):
var match = /\/dp\/(.*?)\/ref=amb_link/.exec(amazon_url);
var asin = match ? match[1] : '';
Amazon's detail pages can have several forms, so to be thorough you should check for them all. These are all equivalent:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C
http://www.amazon.com/dp/B0015T963C
http://www.amazon.com/gp/product/B0015T963C
http://www.amazon.com/gp/product/glance/B0015T963C
They always take one of these two forms:
http://www.amazon.com/<SEO STRING>/dp/<VIEW>/ASIN
http://www.amazon.com/gp/product/<VIEW>/ASIN
This should do it:
var url = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C";
var regex = RegExp("http://www\\.amazon\\.com/([\\w-]+/)?(dp|gp/product)/(\\w+/)?(\\w{10})");
var m = url.match(regex);
if (m) {
    alert("ASIN=" + m[4]);
}
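Note that the pattern above is tied to http://www.amazon.com, so it would not match https links or other storefronts such as amazon.de. A possible variation, as a rough sketch only: parse the URL first and match on the path alone. This assumes the two path shapes described above are the only ones you care about and that an ASIN is 10 uppercase letters or digits. The helper name asinFromPath is hypothetical; URL is the standard class available in browsers and Node.

```javascript
// Hypothetical helper: parse the URL, then inspect only its path, so the
// scheme, host and query string cannot interfere with the match.
function asinFromPath(rawUrl) {
  let pathname;
  try {
    pathname = new URL(rawUrl).pathname;
  } catch (err) {
    return null; // not a parseable absolute URL
  }
  // Optional SEO segment, then "dp" or "gp/product", an optional view
  // segment, then the ASIN followed by a slash or the end of the path.
  const pattern = /^(?:\/[^\/]+)?\/(?:dp|gp\/product)\/(?:[^\/]+\/)?([A-Z0-9]{10})(?:\/|$)/;
  const m = pattern.exec(pathname);
  return m ? m[1] : null;
}

console.log(asinFromPath("http://www.amazon.com/gp/product/glance/B0015T963C"));
console.log(asinFromPath("https://www.amazon.de/dp/B01N32MQOA?psc=1"));
```

Because the query string is dropped before matching, tracking parameters that happen to contain 10-character values cannot produce a false hit.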
This is my universal Amazon ASIN regexp:
~(?:\b)((?=[0-9a-z]*\d)[0-9a-z]{10})(?:\b)~i
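The ~...~i wrapper looks like PCRE delimiter syntax as used in PHP (that is an assumption on my part). For comparison, here is a rough JavaScript port. The names UNIVERSAL_ASIN and findAsin are invented for this sketch. The lookahead is the interesting part: it requires at least one digit among the 10 characters, which is what keeps plain 10-letter words from matching.

```javascript
// JavaScript port of the pattern above; the "i" flag makes it case-insensitive.
// The lookahead (?=[0-9a-z]*\d) demands a digit somewhere in the 10-character word.
const UNIVERSAL_ASIN = /\b((?=[0-9a-z]*\d)[0-9a-z]{10})\b/i;

// Hypothetical helper name, chosen for this example only.
function findAsin(text) {
  const m = UNIVERSAL_ASIN.exec(text);
  return m ? m[1] : null;
}

const kindleUrl =
  "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVP";

// The 10-letter word "Generation" is skipped because it contains no digit.
console.log(findAsin(kindleUrl));

// Caveat: any standalone 10-character alphanumeric word with a digit also matches.
console.log(findAsin("abcdefgh12"));
```

The trade-off is that this pattern does not look at URL structure at all, so it is convenient for free text but can pick up unrelated 10-character tokens.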
Inspired by many of the answers here, I found that
(?:[/])([A-Z0-9]{10})(?:[/?&\s]|$)
works really well for extracting the ASIN from anywhere in the URL. For example:
let url="https://www.amazon.com/Why-We-Sleep-Science-Dreams-ebook/dp/B06Y649387/ref=pd_sim_351_4/131-0417603-5732106?_encoding=UTF8&pd_rd_i=B06Y649387&pd_rd_r=5ebbfdd5-a2f6-4ee3-ad13-5036b5e20827&pd_rd_w=LBo2H&pd_rd_wg=OBomS&pf_rd_p=3c412f72-0ba4-4e48-ac1a-8867997981bd&pf_rd_r=TN0WDV3AC7ED4Y7EKNVP&psc=1&refRID=TN0WDV3AC7ED4Y7EKNVP"
url.match(/(?:[\/])([A-Z0-9]{10})(?:[\/?&\s]|$)/)
>> Array [ "/B06Y649387/", "B06Y649387" ]
You can try it out here: https://regexr.com/56jm7
Edit: Added end-of-string as one of the stopping checks. This is needed when the regex is used in Python. The $ has to sit outside the square brackets, because inside a character class it only matches a literal dollar sign (and | is a literal pipe there, so it is not needed as a separator). The example also uses a regex literal: passing the pattern as a plain string would turn \s into the letter s.
Try using this regex:
(?:[/dp/]|$)([A-Z0-9]{10})
Check out the demo: https://regexr.com/3gk2s