Scrape ASIN from Amazon URL using JavaScript

旧巷少年郎 2021-01-30 11:42

Assuming I have an Amazon product URL like so

http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVP         


        
16 answers
  • 2021-01-30 12:35

    Actually, the top answer doesn't work if it's something like amazon.com/BlackBerry... (since BlackBerry is also 10 characters).

    One workaround, assuming the ASIN is always uppercase (as it is when taken from Amazon), is this Ruby snippet:

            url.match("/([A-Z0-9]{10})")
    

    I've found it to work on thousands of URLs.
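
    Since the question asks for JavaScript, the same idea can be sketched as follows. The function name `matchUppercaseAsin` is hypothetical, and the shortened `BlackBerry` URL is a made-up example of the failure case mentioned above.

```javascript
// Hypothetical helper: returns the first run of 10 uppercase letters/digits
// that directly follows a slash, or null when there is no such run.
function matchUppercaseAsin(url) {
  var m = url.match(/\/([A-Z0-9]{10})/);
  return m ? m[1] : null;
}

// "BlackBerry" is 10 characters long but contains lowercase letters,
// so the pattern skips it and matches the segment after /dp/ instead.
var asin = matchUppercaseAsin('http://www.amazon.com/BlackBerry/dp/B0015T963C');
```

    Note that this pattern does not check what follows the 10 characters, so it would also match the first 10 characters of a longer all-uppercase path segment.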

  • 2021-01-30 12:36

    This worked perfectly for me. I tried it on all the links on this page and on some others:

    function ExtractASIN(url){
        var ASINreg = /\/([A-Z0-9]{10})(?:$|\/|\?)/;
        var cMatch = url.match(ASINreg);
        if(cMatch == null){
            return null;
        }
        return cMatch[1];
    }
    ExtractASIN('http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=0AY9N5GXRYHCADJP5P0V&pf_rd_t=101&pf_rd_p=500528151&pf_rd_i=507846');
    
    • I assumed that an ASIN is exactly 10 characters long and consists of capital letters and digits.
    • I assumed that the ASIN is followed by the end of the link, a question mark, or a slash.
    • I assumed that the ASIN is preceded by a slash.
  • 2021-01-30 12:36

    The Wikipedia article on ASIN (which I've linkified in your question) gives the various forms of Amazon URLs. You can fairly easily create a regular expression (or series of them) to fetch this data using the match() method.
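
    A rough sketch of the "series of regular expressions" approach is shown below. The three path shapes (`/dp/`, `/gp/product/`, `/exec/obidos/ASIN/`) are assumptions chosen for illustration, not a list taken from the article, and the names `ASIN_PATTERNS` and `findAsin` are hypothetical.

```javascript
// Assumed URL shapes, tried in order; this list is illustrative, not exhaustive.
// Each pattern requires the ASIN to be followed by "/", "?", "#", or the end of the string.
var ASIN_PATTERNS = [
  /\/dp\/([A-Z0-9]{10})(?:[\/?#]|$)/,
  /\/gp\/product\/([A-Z0-9]{10})(?:[\/?#]|$)/,
  /\/exec\/obidos\/ASIN\/([A-Z0-9]{10})(?:[\/?#]|$)/
];

// Hypothetical helper: returns the ASIN captured by the first pattern
// that matches, or null when none of them match.
function findAsin(url) {
  for (var i = 0; i < ASIN_PATTERNS.length; i++) {
    var m = url.match(ASIN_PATTERNS[i]);
    if (m) {
      return m[1];
    }
  }
  return null;
}
```

    Anchoring each pattern on a known path prefix avoids false matches on arbitrary 10-character segments, at the cost of having to add a pattern for every URL form you want to support.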

  • 2021-01-30 12:37

    If the ASIN is always in that position in the URL:

    var asin= decodeURIComponent(url.split('/')[5]);
    

    though there's probably little chance of an ASIN getting %-escaped.
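
    If the position is not guaranteed, one possible variation is to look for the path segment that follows `dp` rather than using a fixed index. This is only a sketch: the helper name `asinAfterDp` is hypothetical, it assumes the URL contains a `/dp/` segment, and it relies on the standard `URL` constructor, which throws on strings that are not absolute URLs.

```javascript
// Hypothetical helper: returns the path segment right after "dp",
// or null when there is no "dp" segment or nothing follows it.
function asinAfterDp(url) {
  // pathname excludes the scheme, host, query string, and fragment.
  var segments = new URL(url).pathname.split('/');
  var i = segments.indexOf('dp');
  var next = i === -1 ? undefined : segments[i + 1];
  return next ? decodeURIComponent(next) : null;
}

var asin = asinAfterDp('http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER');
```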
