Automate picture downloads from website with authentication

失恋的感觉 2021-01-22 00:43

I want to automate downloading all pictures from a website that requires a login (a web-form based login, I think).

The website: http://www.cgwallpapers.com

2 answers
  • 2021-01-22 01:23

    Here is a complete solution that uses only HttpWebRequest and HttpWebResponse to simulate browser requests. I have commented much of the code to give you an idea of how it all works.

    You must change the sUsername and sPassword variables to your own username/password to successfully log into the site.

    Optional variables that you may want to change:

    • sDownloadPath: Currently set to the same folder as the application exe. Change this to the path where you want to download your images.
    • sImageResolution: Defaults to 1920x1080, which is what you specified in your original question. Change this value to any of the accepted resolution values on the website. Just a warning: I am not 100% sure that every image is offered in every resolution, so changing this value may cause some images to be skipped if they are not available in the desired resolution.
    • nMaxErrorsInSuccession: Set to 10 by default. Once logged in, the app will continually increment the image id and attempt to download a new image. Some ids do not contain an image and this is normal as the image may have been deleted on the server (or maybe the image is not available in the desired resolution). If the app fails to download an image nMaxErrorsInSuccession times in a row then the application will stop as we assume we have reached the last of the images. It is possible that you may have to increase this to a higher number in the event that there are more than 10 images that are deleted or not available in the selected resolution.
    • nCurrentID: Set to 1 by default. This is the image id used by the website to determine which image to serve to the client. As images are downloaded, the nCurrentID variable is incremented by one each image download attempt. Depending on time and circumstances you may not be able to download all images in one session. If this is the case you can remember which ID you left off on and update this variable accordingly to start on a different id next time. Also useful for when you have successfully downloaded all images and want to run the app later to download newer images.
    • sUserAgent: Can be any user agent that you want. Currently using Firefox 35.0 for Windows 7. Note that some websites will function differently depending on what user agent you specify so only change this if you really need to emulate another browser.
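    The stop rule behind nMaxErrorsInSuccession and the resume point behind nCurrentID can be looked at in isolation. The following is only a minimal sketch: ScanIds and the tryDownload delegate are hypothetical names that stand in for the download loop and for DownloadImage, and no network access is involved.

    ```vb
    Imports System

    Module StopRuleSketch
        ' Scans ids upward from nStartId and stops after nMaxErrors failures in a row.
        ' tryDownload is a hypothetical stand-in for DownloadImage: True means a file was saved.
        ' Returns the id after the last successful download, i.e. where a later run could resume.
        Public Function ScanIds(nStartId As Integer, nMaxErrors As Integer,
                                tryDownload As Func(Of Integer, Boolean)) As Integer
            Dim nId As Integer = nStartId
            Dim nErrors As Integer = 0

            Do Until nErrors >= nMaxErrors
                If tryDownload(nId) Then
                    nErrors = 0      ' A success resets the streak
                Else
                    nErrors += 1     ' Another miss in a row
                End If

                nId += 1
            Loop

            ' The last nErrors ids were all misses, so step back over them
            Return nId - nErrors
        End Function

        Sub Main()
            ' Pretend that only ids 1, 2 and 5 hold an image
            Dim nResume As Integer = ScanIds(1, 2, Function(id) id = 1 OrElse id = 2 OrElse id = 5)
            Console.WriteLine(nResume)
        End Sub
    End Module
    ```

    With a limit of 2, the scan gives up at the gap formed by ids 3 and 4 and never reaches id 5, so the sketch prints 3. This is the situation in which a larger nMaxErrorsInSuccession is needed.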

    NOTE: There is a 3 second pause strategically inserted at various points in the code. Some websites have hammer scripts that will block or even ban users who are browsing a site too quickly. Although removing these lines will speed up the time it takes to download all images, I would not recommend doing so.

        Imports System.Net
        Imports System.IO
    
        Public Class Form2
            Const sUsername As String = "USERNAMEHERE"
            Const sPassword As String = "PASSWORDHERE"
            Const sImageResolution As String = "1920x1080"
            Const sUserAgent As String = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0"
            Const sMainURL As String = "http://www.cgwallpapers.com/"
            Const sCheckLoginURL As String = "http://www.cgwallpapers.com/login.php"
            Const sDownloadURLLeft As String = "http://www.cgwallpapers.com/members/getwallpaper.php?id="
            Const sDownloadURLRight As String = "&res="
            Private oCookieCollection As CookieCollection = Nothing
            Private nMaxErrorsInSuccession As Int32 = 10
            Private nCurrentID As Int32 = 1
            Private sDownloadPath As String = Application.StartupPath
    
            Private Sub Form2_Load(sender As Object, e As EventArgs) Handles MyBase.Load
                StartScrape()
            End Sub
    
            Private Sub StartScrape()
                Try
                    Dim bContinue As Boolean = True
    
                    Dim sPostData(5) As String
    
                    sPostData(0) = UrlEncode("action")
                    sPostData(1) = UrlEncode("go")
                    sPostData(2) = UrlEncode("email")
                    sPostData(3) = UrlEncode(sUsername)
                    sPostData(4) = UrlEncode("wachtwoord")
                    sPostData(5) = UrlEncode(sPassword)
    
                    If GetMethod(sMainURL) = True Then
                        If SetMethod(sCheckLoginURL, sPostData, sMainURL) = True Then
                            ' Login successful
    
                            Dim nErrorsInSuccession As Int32 = 0
    
                            ' Stop once nMaxErrorsInSuccession downloads in a row have failed
                            Do Until nErrorsInSuccession >= nMaxErrorsInSuccession
                                If DownloadImage(sDownloadURLLeft, sDownloadURLRight, sMainURL, nCurrentID) = True Then
                                    ' Always reset error count when we successfully download
                                    nErrorsInSuccession = 0
                                Else
                                    ' Add one to error count because there was no image at the current id
                                    nErrorsInSuccession += 1
                                End If
    
                                nCurrentID += 1
                                Threading.Thread.Sleep(3000)    ' Wait 3 seconds to prevent loading pages too quickly
                            Loop
    
                            MessageBox.Show("Finished downloading images")
                        End If
                    Else
                        MessageBox.Show("Error connecting to main site. Are you connected to the internet?")
                    End If
                Catch ex As Exception
                    MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                End Try
            End Sub
    
            Private Function GetMethod(ByVal sPage As String) As Boolean
                Dim req As HttpWebRequest
                Dim resp As HttpWebResponse
                Dim stw As StreamReader
                Dim bReturn As Boolean = True
    
                Try
                    req = HttpWebRequest.Create(sPage)
                    req.Method = "GET"
                    req.AllowAutoRedirect = False
                    req.UserAgent = sUserAgent
                    req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
                    req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                    req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                    req.Headers.Add("Keep-Alive", "300")
                    req.KeepAlive = True
    
                    resp = CType(req.GetResponse(), HttpWebResponse)    ' Get the response from the server

                    If req.HaveResponse Then
                        ' Save the cookie info

                        SaveCookies(resp.Headers("Set-Cookie"))

                        stw = New StreamReader(resp.GetResponseStream())
                        stw.ReadToEnd()    ' Read the response from the server, but we do not save it
                        stw.Close()        ' Close the reader and the underlying response stream
                    Else
                        MessageBox.Show("No response received from host " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                        bReturn = False
                    End If
                Catch exc As WebException
                    MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End Try
    
                Return bReturn
            End Function
    
            Private Function SetMethod(ByVal sPage As String, ByVal sPostData() As String, sReferer As String) As Boolean
                Dim bReturn As Boolean = False
                Dim req As HttpWebRequest
                Dim resp As HttpWebResponse
                Dim str As StreamWriter
                Dim sPostDataValue As String = ""
                Dim nInitialCookieCount As Int32 = 0
    
                Try
                    req = HttpWebRequest.Create(sPage)
                    req.Method = "POST"
                    req.UserAgent = sUserAgent
                    req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                    req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
                    req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
                    req.Referer = sReferer
                    req.ContentType = "application/x-www-form-urlencoded"
                    req.Headers.Add("Keep-Alive", "300")
    
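                    If oCookieCollection IsNot Nothing Then
                        ' Pass cookie info from the login page
                        req.CookieContainer = SetCookieContainer(sPage)
                    Else
                        ' Always attach a container so the cookie count can be checked after the POST
                        req.CookieContainer = New CookieContainer()
                    End If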
    
                    str = New StreamWriter(req.GetRequestStream)
    
                    If sPostData.Count Mod 2 = 0 Then
                        ' There is an even number of post names and values
    
                        For i As Int32 = 0 To sPostData.Count - 1 Step 2
                            ' Put the post data together into one string
                            sPostDataValue &= sPostData(i) & "=" & sPostData(i + 1) & "&"
                        Next i
    
                        sPostDataValue = sPostDataValue.Substring(0, sPostDataValue.Length - 1) ' This will remove the extra "&" at the end that was added from the for loop above
    
                        ' Post the data to the server
    
                        str.Write(sPostDataValue)
                        str.Close()
    
                        ' Get the response
    
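                        nInitialCookieCount = req.CookieContainer.Count
                        resp = CType(req.GetResponse(), HttpWebResponse)
                        resp.Close()    ' New cookies are already in req.CookieContainer, so the response can be released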
    
                        If req.CookieContainer.Count > nInitialCookieCount Then
                            ' Login successful
                            ' Save new login cookies
    
                            SaveCookies(req.CookieContainer)
                            bReturn = True
                        Else
                            MessageBox.Show("The email or password you entered are incorrect." & vbCrLf & vbCrLf & "Please try again.", "Unable to log in", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
                            bReturn = False
                        End If
                    Else
                        ' Did not specify the correct amount of parameters so we cannot continue
                        MessageBox.Show("POST error.  Did not supply the correct amount of post data for " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                        bReturn = False
                    End If
                Catch ex As Exception
                    MessageBox.Show("POST error.  " & ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End Try
    
                Return bReturn
            End Function
    
            Private Function DownloadImage(ByVal sPageLeft As String, sPageRight As String, sReferer As String, nCurrentID As Int32) As Boolean
                Dim req As HttpWebRequest
                Dim bReturn As Boolean = False
                Dim sPage As String = sPageLeft & nCurrentID.ToString & sPageRight & sImageResolution
    
                Try
                    req = HttpWebRequest.Create(sPage)
                    req.Method = "GET"
                    req.AllowAutoRedirect = False
                    req.UserAgent = sUserAgent
                    req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
                    req.Headers.Add("Accept-Language", "en-US,en;q=0.5")
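                    ' Ask for compressed responses and let the framework decompress them,
                    ' otherwise a gzip-encoded image would be written to disk still compressed
                    req.AutomaticDecompression = DecompressionMethods.GZip Or DecompressionMethods.Deflate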
                    req.Headers.Add("Keep-Alive", "300")
                    req.KeepAlive = True
    
                    If oCookieCollection IsNot Nothing Then
                        ' Pass cookie info so that we remain logged in
                        req.CookieContainer = SetCookieContainer(sPage)
                    End If
    
                    ' Save file to disk
    
                    Using oResponse As System.Net.WebResponse = CType(req.GetResponse, System.Net.WebResponse)
                        Dim sContentDisposition As String = CType(oResponse, System.Net.HttpWebResponse).Headers("Content-Disposition")
    
                        If sContentDisposition IsNot Nothing Then
                            ' There is an image to download
    
                            Dim sFilename As String = sContentDisposition.Substring(sContentDisposition.IndexOf("filename="), sContentDisposition.Length - sContentDisposition.IndexOf("filename=")).Replace("filename=", "").Replace("""", "").Replace(";", "").Trim
    
                            Using responseStream As IO.Stream = oResponse.GetResponseStream
                                Using fs As New IO.FileStream(System.IO.Path.Combine(sDownloadPath, sFilename), FileMode.Create, FileAccess.Write)
                                    Dim buffer(2047) As Byte
                                    Dim read As Integer
    
                                    ' Copy the response to disk; the Using blocks close both streams
                                    Do
                                        read = responseStream.Read(buffer, 0, buffer.Length)
                                        If read > 0 Then fs.Write(buffer, 0, read)
                                    Loop Until read = 0
                                End Using
                            End Using
    
                            bReturn = True
                        End If
    
                        oResponse.Close()
                    End Using
                Catch exc As WebException
                    MessageBox.Show("Network Error: " & exc.Message.ToString & " Status Code: " & exc.Status.ToString & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                    bReturn = False
                End Try
    
                Return bReturn
            End Function
    
            Private Function SetCookieContainer(sPage As String) As System.Net.CookieContainer
                Dim oCookieContainerObject As New System.Net.CookieContainer
                Dim oCookie As System.Net.Cookie
    
                For c As Int32 = 0 To oCookieCollection.Count - 1
                    If IsDate(oCookieCollection(c).Value) = False Then
                        oCookie = New System.Net.Cookie
                        oCookie.Name = oCookieCollection(c).Name
                        oCookie.Value = oCookieCollection(c).Value
                        oCookie.Domain = New Uri(sPage).Host
                        oCookie.Secure = False
                        oCookieContainerObject.Add(oCookie)
                    End If
                Next
    
                Return oCookieContainerObject
            End Function
    
            Private Sub SaveCookies(sCookieString As String)
                ' Convert cookie string to global cookie collection object

                oCookieCollection = New CookieCollection

                ' The server may not send a Set-Cookie header at all
                If String.IsNullOrEmpty(sCookieString) Then Return

                Dim sCookieStrings() As String = sCookieString.Trim.Replace("path=/,", "").Replace("path=/", "").Split(";".ToCharArray())

                For Each sCookie As String In sCookieStrings
                    ' Split on the first "=" only, so values that contain "=" stay intact
                    Dim sParts() As String = sCookie.Trim().Split("=".ToCharArray(), 2)

                    ' Skip empty segments and attributes without a value, such as HttpOnly
                    If sParts.Length = 2 AndAlso sParts(0).Trim() <> "" Then
                        oCookieCollection.Add(New Cookie(sParts(0).Trim(), sParts(1)))
                    End If
                Next
            End Sub
    
            Private Sub SaveCookies(oCookieContainer As CookieContainer)
                ' Convert cookie container object to global cookie collection object
    
                oCookieCollection = New CookieCollection
    
                For Each oCookie As System.Net.Cookie In oCookieContainer.GetCookies(New Uri(sMainURL))
                    oCookieCollection.Add(oCookie)
                Next
            End Sub
    
            Private Function UrlEncode(ByVal URLText As String) As String
                Dim AscCode As Integer
                Dim EncText As String = ""
                ' UTF-8 keeps non-ASCII characters; each of their bytes is percent-encoded below
                Dim bStr() As Byte = System.Text.Encoding.UTF8.GetBytes(URLText)
    
                Try
                    For i As Long = 0 To UBound(bStr)
                        AscCode = bStr(i)
    
                        Select Case AscCode
                            Case 48 To 57, 65 To 90, 97 To 122, 46, 95
                                EncText = EncText & Chr(AscCode)
    
                            Case 32
                                EncText = EncText & "+"
    
                            Case Else
                                If AscCode < 16 Then
                                    EncText = EncText & "%0" & Hex(AscCode)
                                Else
                                    EncText = EncText & "%" & Hex(AscCode)
                                End If
    
                        End Select
                    Next i
    
                    Erase bStr
                Catch ex As WebException
                    MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
                End Try
    
                Return EncText
            End Function
        End Class
    
  • 2021-01-22 01:29
    Private Function DownloadImage() As String
        Dim remoteImgPath As String = "http://www.cgwallpapers.com/members/viewwallpaper.php?id=1764&res=1920x1080"
        Dim remoteImgPathUri As New Uri(remoteImgPath)
        Dim remoteImgPathWithoutQuery As String = remoteImgPathUri.GetLeftPart(UriPartial.Path)
        Dim fileName As String = Path.GetFileName(remoteImgPathWithoutQuery)
        Dim localPath As String = Convert.ToString(AppDomain.CurrentDomain.BaseDirectory + "LocalFolder\Images\Originals\") & fileName
        Dim webClient As New WebClient()
        webClient.DownloadFile(remoteImgPath, localPath)
        Return localPath
    End Function
    

    I threw this together; I think it's the right direction.

    Try
        Dim theFile As String = "c:\wallpaper.jpg"
        Dim fileName As String = Path.GetFileName(theFile)

        Dim ms = New MemoryStream(File.ReadAllBytes(theFile))

        Dim dataLengthToRead As Long = ms.Length
        Dim blockSize As Integer = If(dataLengthToRead >= 5000, 5000, CInt(dataLengthToRead))
        Dim buffer As Byte() = New Byte(blockSize - 1) {}

        Response.Clear()
        Response.ClearContent()
        Response.ClearHeaders()
        Response.BufferOutput = True

        ' Send the file as a download; use "inline" instead of "attachment" to show it in the browser
        Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName)

        ' Content-Length must be the size of the whole file, not of a single block
        Response.AddHeader("Content-Length", dataLengthToRead.ToString())
        Response.ContentType = "image/jpeg"

        ' Write the file to the client one block at a time
        While dataLengthToRead > 0 AndAlso Response.IsClientConnected
            Dim lengthRead As Int32 = ms.Read(buffer, 0, blockSize)
            Response.OutputStream.Write(buffer, 0, lengthRead)
            Response.Flush()
            dataLengthToRead = dataLengthToRead - lengthRead
        End While

        Response.Flush()
        Response.Close()
    Catch ex As Exception
        ' Errors are silently ignored here
    End Try
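
    The WebClient call in DownloadImage above sends no session cookies, so a members-only URL would most likely be refused. One possible way around that, shown here only as a sketch, is a WebClient subclass that attaches a shared CookieContainer to every request. CookieAwareWebClient and DownloadOne are hypothetical names; the login URL, the form fields (action, email, wachtwoord) and the getwallpaper.php URL are taken from the other answer and are assumed to still match the site.

    ```vb
    Imports System
    Imports System.Collections.Specialized
    Imports System.Net

    ' WebClient that keeps cookies between requests (hypothetical helper class)
    Public Class CookieAwareWebClient
        Inherits WebClient

        Private ReadOnly oCookies As New CookieContainer()

        Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
            Dim req As WebRequest = MyBase.GetWebRequest(address)
            Dim httpReq As HttpWebRequest = TryCast(req, HttpWebRequest)

            If httpReq IsNot Nothing Then
                ' Every request made through this client shares the same cookies
                httpReq.CookieContainer = oCookies
            End If

            Return req
        End Function
    End Class

    Public Module CookieAwareExample
        ' Logs in, then downloads one image using the session cookies from the login
        Public Sub DownloadOne(sEmail As String, sPassword As String, sLocalPath As String)
            Using client As New CookieAwareWebClient()
                Dim form As New NameValueCollection()
                form.Add("action", "go")
                form.Add("email", sEmail)
                form.Add("wachtwoord", sPassword)

                ' UploadValues sends the form as a URL-encoded POST
                client.UploadValues("http://www.cgwallpapers.com/login.php", form)

                client.DownloadFile("http://www.cgwallpapers.com/members/getwallpaper.php?id=1764&res=1920x1080", sLocalPath)
            End Using
        End Sub
    End Module
    ```

    The sketch does not check whether the login succeeded; if it did not, DownloadFile may simply save an HTML error page under sLocalPath.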
    