Getting the URLs for the first Google search results in a shell script

北恋 2021-02-06 01:19

It's relatively easy to parse the output of the AJAX API using a scripting language:

#!/usr/bin/env python

import urllib
import json

base = 'http://ajax.goog


        
6 Answers
  •  北恋 (original poster)
    2021-02-06 01:41

    @Lri - Here is a script I use for my own command-line tools and scripts. It uses the command-line utility "lynx" to dump the URLs. The script can be downloaded from HERE and the code can be viewed HERE. Here is the code for reference:

    #!/bin/bash
    
    clear
    echo ""
    echo ".=========================================================."
    echo "|                                                         |"
    echo "|  COMMAND LINE GOOGLE SEARCH                             |"
    echo "|  ---------------------------------------------------    |"
    echo "|                                                         |"
    echo "|  Version: 1.0                                           |"
    echo "|  Developed by: Rishi Narang                             |"
    echo "|  Blog: www.wtfuzz.com                                   |"
    echo "|                                                         |"
    echo "|  Usage: ./gocmd.sh <search string>                      |"
    echo "|  Example: ./gocmd.sh example and test                   |"
    echo "|                                                         |"
    echo ".=========================================================."
    echo ""
    
    # Use the command-line arguments as the query; prompt if none were given.
    if [ -z "$1" ]
    then
        echo "ERROR: No search string supplied."
        echo "USAGE: ./gocmd.sh <search string>"
        echo ""
        printf "Anyway, for now, supply the search string here: "
        read -r SEARCH
    else
        SEARCH="$*"
    fi
    
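    # Build the search URI: spaces become %20 and the phrase is wrapped in %22 (double quotes).
    URL="http://google.com/search?hl=en&safe=off&q="
    STRING=$(printf '%s' "$SEARCH" | sed 's/ /%20/g')
    URI="$URL%22$STRING%22"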
    
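    # Dump the rendered page, put every "http..." on its own line, cut each
    # line at the first space, then drop Google's own links.
    lynx -dump "$URI" > gone.tmp
    sed 's/http/\^http/g' gone.tmp | tr -s "^" "\n" | grep http | sed 's/ .*//' > gtwo.tmp
    rm gone.tmp
    sed '/google\.com/d' gtwo.tmp > urls
    rm gtwo.tmp

    echo "SUCCESS: Extracted $(wc -l urls) and listed them in '$(pwd)/urls' file for reference."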
    echo ""
    cat urls
    echo ""
    
    #EOF
    
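One thing to watch: the script only turns spaces into %20, so a query containing characters such as & or = would end up splitting the query string. A rough sketch of a more general percent-encoder is below. The function name urlencode is my own (it is not part of the original script), and I have only considered ASCII input; how bytes outside ASCII are handled depends on the shell and locale.

```shell
# Hypothetical helper, not part of the original script: percent-encode a
# string so it can be used as a URL query value. Assumes ASCII input.
# The body is wrapped in ( ... ) so it runs in a subshell and its
# variables and the LC_ALL setting do not leak into the caller.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}            # everything after the first character
        c=${s%"$rest"}         # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;                      # unreserved: keep as-is
            *) out="$out$(printf '%%%02X' "'$c")" ;;              # anything else: %XX
        esac
    done
    printf '%s\n' "$out"
)
```

With this in place, the STRING line could become something like STRING=$(urlencode "$SEARCH"), leaving the rest of the script unchanged.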

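For anyone who wants the extraction step without the temporary files, here is a sketch of the same four steps (split at every "http", keep the lines that start with it, cut at the first whitespace, drop google.com links) written as one pipeline that reads from standard input. The function name extract_urls is hypothetical, and the sample input in the comment is made up for illustration; real lynx output will look different.

```shell
# Hypothetical helper, not part of the original script: same extraction
# steps as above, as a single pipeline reading text from stdin.
extract_urls() {
    # 1. Start a new line before every occurrence of "http"
    #    (backslash-newline is the portable way to insert a newline in sed).
    # 2. Keep only the lines that now begin with "http".
    # 3. Cut each line at the first whitespace character.
    # 4. Drop links that point at google.com itself.
    sed 's/http/\
http/g' | grep '^http' | sed 's/[[:space:]].*//' | grep -v 'google\.com'
}

# Example with made-up input:
#   printf '%s\n' ' 1. http://www.google.com/url?q=http://example.com/page&sa=U' \
#       | extract_urls
```

In the script this would be used as lynx -dump "$URI" | extract_urls > urls. As with the original, any trailing parameters Google appends to a redirect link (such as &sa=U) stay attached to the extracted URL.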