Script to use Google Image Search with local image as input

Submitted by ♀尐吖头ヾ on 2019-11-30 03:52:52
latkin

Cool question! I spent far too much time tinkering with this, but I think I finally got it :)

In a nutshell, you have to upload the raw bytes of your image, embedded and properly formatted along with some other stuff, to images.google.com/searchbyimage/upload. The response to that request will contain a new URL which sends you to the actual results page.

This function returns the results page URL. You can do whatever you want with it, but to simply open the results in a browser, pass it to Start-Process.

Of course, Google could change the workflow for this at any time, so don't expect this script to work forever.

function Get-GoogleImageSearchUrl
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: image/jpeg


"@
    $part2 = @"
-----------------------------7dd2db3297c2202
Content-Disposition: form-data; name="image_content"


-----------------------------7dd2db3297c2202--

"@

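    # grab the raw bytes composing the image file
    #   (resolve the path first: .NET's working directory can differ from PowerShell's current location)
    $imageBytes = [Io.File]::ReadAllBytes((Resolve-Path $imagePath).ProviderPath)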

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=---------------------------7dd2db3297c2202'  # must match the delimiter in the body, above
    $request.ContentLength = $data.Length

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}

Usage:

$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url

Edit/Explanation

Here's some more detail. I'll basically just take you through the steps I took as I figured this out.

First, I just went ahead and did a local image search.

The URL it sends you to is very long (~1500 chars in the case of longcat), but not nearly long enough to fully encode the image (60KB). So you can tell right off the bat that it's more complex than simply base64-encoding the image into the URL.
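
To put a number on that, here is a quick sanity check of how long a base64 encoding of a 60KB image would be. The zero-filled buffer is only a stand-in for the image (assuming 60KB means 60 × 1024 bytes), and the variable names are just for illustration.

```powershell
# stand-in for a 60KB image: 60 * 1024 zero bytes (the contents don't affect the encoded length)
$imageSize = 60 * 1024
$fakeImage = New-Object byte[] $imageSize

# base64 emits 4 characters for every 3 input bytes, rounded up to a whole group
$predicted = 4 * [Math]::Ceiling($imageSize / 3)
$actual    = [Convert]::ToBase64String($fakeImage).Length

"predicted: $predicted, actual: $actual"   # both are 81920
```

That is roughly 80,000 characters, far more than the ~1500 in the results URL.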

Next, I fired up Fiddler and looked at what's actually going on when you do a local image search. After browsing/selecting the image, you see some traffic to images.google.com/searchbyimage/upload. Viewing that request in detail reveals the basic mechanism.

  1. The data is sent as multipart/form-data, and you need to specify which string of characters separates the different fields (red boxes). multipart/form-data is a web standard (RFC 2388, since superseded by RFC 7578), but the details don't matter much for this example.
  2. You need to (or at least should) include the original file name (orange box). Perhaps this factors into the search results.
  3. The full, raw image is included in the encoded_image field (green box).
  4. The response does not contain the actual results; it is simply a redirect to the actual results page (purple boxes).
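
The body layout described in points 1 to 3 can be sketched as a small helper. This is only an illustration: New-MultipartBody and its parameters are hypothetical names, not part of the script above, and the sketch writes explicit CRLF line endings rather than relying on how a here-string was saved.

```powershell
# Hypothetical helper (not part of the original script): builds a multipart/form-data
# body containing a single file field, using explicit CRLF line endings.
function New-MultipartBody
{
    param(
        [Parameter(Mandatory = $true)] [string] $Boundary,
        [Parameter(Mandatory = $true)] [string] $FieldName,
        [Parameter(Mandatory = $true)] [string] $FileName,
        [Parameter(Mandatory = $true)] [byte[]] $FileBytes
    )

    $crlf = "`r`n"

    # each part opens with "--" + boundary, then its headers, then a blank line
    $header = "--${Boundary}${crlf}" +
              "Content-Disposition: form-data; name=`"${FieldName}`"; filename=`"${FileName}`"${crlf}" +
              "Content-Type: application/octet-stream${crlf}${crlf}"

    # the closing delimiter is the boundary with an extra "--" appended
    $footer = "${crlf}--${Boundary}--${crlf}"

    $encoding = [Text.Encoding]::ASCII
    [byte[]] $body = $encoding.GetBytes($header) + $FileBytes + $encoding.GetBytes($footer)

    # the leading comma stops PowerShell from unrolling the array on output
    return ,$body
}

# usage: three dummy bytes stand in for the image contents
$body = New-MultipartBody -Boundary '7dd2db3297c2202' -FieldName 'encoded_image' `
                          -FileName 'somepic.jpg' -FileBytes ([byte[]] (1, 2, 3))
$bodyText = [Text.Encoding]::ASCII.GetString($body)
```

The request's Content-Type header would then need to carry the same boundary, without the leading "--".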

There are a few fields not shown here, way at the bottom. They aren't super interesting.

Once I figured out the basic workflow, it was only a matter of coding it up. I just copied the web request I saw in Fiddler as closely as I could, using standard .NET web request APIs. The answers to this SO question demonstrate the APIs you need in order to properly encode and send body data in a web request.

From some experimentation, I found that you only need the two body fields I included in my code (encoded_image and image_content). Going through the web UI includes more, but apparently they are not required.

More experimentation revealed that none of the other headers or cookies shown in Fiddler are really required.

For our purposes, we don't actually want to access the results page, only get a pointer to it. Thus we should set AllowAutoRedirect to $false. That way, Google's 302 redirect is given to us directly and we can extract the results page URL from it.
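
As a rough sketch of the same idea in isolation, a redirect target can also be read from the response's Location header instead of being parsed out of the response body as the function above does. Get-RedirectTarget is a hypothetical helper, and it assumes the server sends a Location header with its 3xx response.

```powershell
# Hypothetical helper (not from the original script): returns the URL a server
# redirects to, without following the redirect.
function Get-RedirectTarget
{
    param(
        [Parameter(Mandatory = $true)]
        [string] $Url
    )

    $request = [Net.HttpWebRequest] ([Net.WebRequest]::Create($Url))
    $request.Method = 'GET'

    # hand the 3xx response back to us instead of following it
    $request.AllowAutoRedirect = $false

    $response = $request.GetResponse()
    try
    {
        # [int] turns the HttpStatusCode enum into its numeric value
        $status = [int] $response.StatusCode
        if ($status -ge 300 -and $status -lt 400)
        {
            # assumption: the redirect target is carried in the Location header
            $response.Headers['Location']
        }
    }
    finally
    {
        $response.Close()
    }
}
```

If the response is not a redirect, the function returns nothing.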

While writing this edit, I slapped my forehead and realized that PowerShell v3 has the Invoke-WebRequest cmdlet, which could potentially eliminate the need for the .NET web API calls. Unfortunately, I could not get it to work properly after tinkering for 10 minutes, so I gave up. It seems like some issue with the way the cmdlet encodes the data, though I could be wrong.

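A shorter variant of the same function. It uses a plain boundary string, sends only the encoded_image field, and does not set ContentLength explicitly:

function Get-GoogleImageSearchUrl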
{
    param(
        [Parameter(Mandatory = $true)]
        [ValidateScript({ Test-Path $_ })]
        [string] $ImagePath
    )

    # extract the image file name, without path
    $fileName = Split-Path $imagePath -Leaf

    # the request body has some boilerplate before the raw image bytes (part1) and some after (part2)
    #   note that $filename is included in part1
    $part1 = @"
--7dd2db3297c2202
Content-Disposition: form-data; name="encoded_image"; filename="$fileName"
Content-Type: application/octet-stream`r`n`r`n
"@
    $part2 = @"
`r`n--7dd2db3297c2202--`r`n
"@

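    # grab the raw bytes composing the image file
    #   (resolve the path first: .NET's working directory can differ from PowerShell's current location)
    $imageBytes = [Io.File]::ReadAllBytes((Resolve-Path $imagePath).ProviderPath)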

    # the request body should sandwich the image bytes between the 2 boilerplate blocks
    $encoding = New-Object Text.ASCIIEncoding
    $data = $encoding.GetBytes($part1) + $imageBytes + $encoding.GetBytes($part2)

    # create the HTTP request, populate headers
    $request = [Net.HttpWebRequest] ([Net.HttpWebRequest]::Create('http://images.google.com/searchbyimage/upload'))
    $request.Method = "POST"
    $request.ContentType = 'multipart/form-data; boundary=7dd2db3297c2202'  # must match the delimiter in the body, above

    # don't automatically redirect to the results page, just take the response which points to it
    $request.AllowAutoredirect = $false

    # populate the request body
    $stream = $request.GetRequestStream()
    $stream.Write($data, 0, $data.Length)
    $stream.Close()        

    # get response stream, which should contain a 302 redirect to the results page
    $respStream = $request.GetResponse().GetResponseStream()

    # pluck out the results page link that you would otherwise be redirected to
    (New-Object Io.StreamReader $respStream).ReadToEnd() -match 'HREF\="([^"]+)"' | Out-Null
    $matches[1]
}

Usage:

$url = Get-GoogleImageSearchUrl 'C:\somepic.jpg'
Start-Process $url