Getting the URLs for the first Google search results in a shell script

北恋 2021-02-06 01:19

It's relatively easy to parse the output of the AJAX API using a scripting language. The snippet here is only a minimal Python 2 sketch, assuming the same ajax.googleapis.com endpoint the answers below use (the query string is a placeholder):

#!/usr/bin/env python
# Query the (now-deprecated) Google AJAX Search API and print result URLs.

import urllib
import json

base = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q='
results = json.load(urllib.urlopen(base + urllib.quote('example query')))
for result in results['responseData']['results']:
    print result['unescapedUrl']

6 Answers
  • 2021-02-06 01:41

    @Lri - Here is a script I personally use for my command-line tools & scripts. It uses the command-line utility "lynx" to dump the URLs. The script can be downloaded from HERE and the code can be viewed HERE. Here is the code for your reference:

    #!/bin/bash
    
    clear
    echo ""
    echo ".=========================================================."
    echo "|                                                         |"
    echo "|  COMMAND LINE GOOGLE SEARCH                             |"
    echo "|  ---------------------------------------------------    |"
    echo "|                                                         |"
    echo "|  Version: 1.0                                           |"
    echo "|  Developed by: Rishi Narang                             |"
    echo "|  Blog: www.wtfuzz.com                                   |"
    echo "|                                                         |"
    echo "|  Usage: ./gocmd.sh <search strings>                     |"
    echo "|  Example: ./gocmd.sh example and test                   |"
    echo "|                                                         |"
    echo ".=========================================================."
    echo ""
    
    if [ -z "$1" ]
    then
     echo "ERROR: No search string supplied."
     echo "USAGE: ./gocmd.sh <search string>"
     echo ""
     echo -n "Anyway, for now, supply the search string here: "
     read SEARCH
    else
     SEARCH="$*"
    fi
    
    URL="http://google.com/search?hl=en&safe=off&q="
    STRING=$(echo "$SEARCH" | sed 's/ /%20/g')
    URI="$URL%22$STRING%22"
    
    lynx -dump "$URI" > gone.tmp
    sed 's/http/^http/g' gone.tmp | tr -s "^" "\n" | grep http | sed 's/ .*//g' > gtwo.tmp
    rm gone.tmp
    sed '/google.com/d' gtwo.tmp > urls
    rm gtwo.tmp
    
    echo "SUCCESS: Extracted $(wc -l < urls) URLs and listed them in '$(pwd)/urls' for reference."
    echo ""
    cat urls
    echo ""
    
    #EOF
    
  • 2021-02-06 01:42

    Untested approach, as I don't have access to a Unix box currently...

    Assuming "test" is the query string, you could use a simple wget on the following URL: http://www.google.co.in/#hl=en&source=hp&biw=1280&bih=705&q=test&btnI=Google+Search&aq=f&aqi=g10&aql=&oq=test&fp=3cc29334ffc8c2c

    This would leverage Google's "I'm Feeling Lucky" functionality and wget the first URL for you. You may be able to clean up the above URL a bit, too.
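    The URL construction can be sketched offline, without making any request; the query below is a made-up example, and treating btnI as the "I'm Feeling Lucky" trigger is an assumption taken from the URL above:

```shell
#!/bin/sh
# Sketch: build an "I'm Feeling Lucky"-style URL for a query. The btnI
# parameter is assumed to trigger the redirect to the first hit; the URL
# is only printed here, never fetched.
query="example and test"
encoded=$(printf '%s' "$query" | sed 's/ /%20/g')
printf 'http://www.google.com/search?hl=en&btnI=1&q=%s\n' "$encoded"
# prints http://www.google.com/search?hl=en&btnI=1&q=example%20and%20test
```

    Piping the printed URL into wget would then fetch whatever page Google redirects to.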

  • 2021-02-06 01:52

    Many years later: you can install googler.

    googler -n 1 -c in -l en search something here --json

    You can control the number of results using the -n flag.

    To get only the URL, simply pipe the output to:

    grep "\"url\""|tr -s ' ' |cut -d ' ' -f3|tr -d "\""
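    The pipeline can be checked offline against a hand-made line in the shape googler's --json output uses (the URL below is a placeholder, not real googler output):

```shell
#!/bin/sh
# Feed the extraction pipeline a sample line shaped like googler's JSON
# output: squeeze spaces, take the third field, strip the quotes.
sample='  "url": "https://example.com/page"'
printf '%s\n' "$sample" | grep '"url"' | tr -s ' ' | cut -d ' ' -f3 | tr -d '"'
# prints https://example.com/page
```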
    
  • 2021-02-06 01:54

    I ended up using curl's --data-urlencode option to encode the query parameter, and just sed to extract the first result.

    curl -s --get --data-urlencode "q=example" http://ajax.googleapis.com/ajax/services/search/web?v=1.0 | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
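    The sed pair can be traced offline on a line shaped like the old API's response (the JSON below is a hand-made sample, not a live result):

```shell
#!/bin/sh
# Run the same sed program against a sample response line: the first
# substitution keeps everything up to the first unescapedUrl value, the
# second strips the prefix ending in GwebSearch",
sample='{"responseData":{"results":[{"GsearchResultClass":"GwebSearch","unescapedUrl":"http://example.com/","url":"http://example.com/"}]}}'
printf '%s\n' "$sample" | sed 's/"unescapedUrl":"\([^"]*\).*/\1/;s/.*GwebSearch",//'
# prints http://example.com/
```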

  • 2021-02-06 02:00

    Just for reference: by November 2013 you will need to replace the ajax.googleapis.com/ajax/services/search/web calls completely.

    Most likely, they will have to be replaced with a Custom Search Engine (CSE). The problem is that you won't be able to get "global" results from a CSE. Here is a nice tip on how to do this: http://groups.google.com/a/googleproductforums.com/d/msg/customsearch/0aoS-bXgnEM/lwlZ6_IyVDQJ.

  • 2021-02-06 02:01

    Lri's answer only returned the last result for me, and I needed the top one, so I changed it to:

    JSON=$(curl -s --get --data-urlencode "q=QUERY STRING HERE" http://ajax.googleapis.com/ajax/services/search/web?v=1.0 | python -mjson.tool)
    response=$(echo "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
    if [ "$response" -eq 200 ] ; then
        url=$(echo "$JSON" | egrep "unescapedUrl" | sed -e '1!d' -e "s/^.*unescapedUrl\": \"//" -e "s/\".*$//")
        echo "Success! [$url]"
        wget "$url"
    else
        echo "FAILED! [$response]"
    fi
    

    It's not as compact as I'd like, but I was in a rush.
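    The responseStatus check can likewise be exercised offline, against JSON pretty-printed the way python -mjson.tool would emit it (the sample below is hand-made):

```shell
#!/bin/sh
# Pull the numeric status out of pretty-printed JSON with the same sed
# expression the script above uses; sed -n with /p prints only the
# matching line, rewritten to just the value.
JSON='{
    "responseDetails": null,
    "responseStatus": 200
}'
response=$(printf '%s\n' "$JSON" | sed -n -e 's/^.*responseStatus\": //p')
echo "$response"
# prints 200
```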
