How can I call many URLs from a list asynchronously

浪子不回头ぞ submitted on 2019-12-01 01:12:36

Question


I have a few hundred thousand URLs that I need to call. These are calls to an application server, which will process them and write a status code to a table. I do not need to wait for a response (success/fail); I only need to know that the server got the request. I also want to be able to specify how many concurrent jobs can run at once, as I haven't worked out how many concurrent requests Tomcat can handle.

Here's what I've got so far, basically taken from someone else's attempt to do something similar, just not with URL calls. The text file contains each URL on its own line. A URL looks like this:

http://webserver:8080/app/mwo/services/create?server=ServerName&e1user=admin&newMWONum=123456&sourceMWONum=0&tagNum=33-A-1B

And the code:

$maxConcurrentJobs = 10
$content = Get-Content -Path "C:\Temp\urls.txt"

foreach ($url in $content) {
    $running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
    if ($running.Count -le $maxConcurrentJobs) {
        Start-Job {
             Invoke-WebRequest -UseBasicParsing -Uri $using:url
        }
    } else {
         $running | Wait-Job -Any
    }
    Get-Job | Receive-Job
}

The problem I'm having is that it gives 2 errors per "job" and I'm not sure why. When I dump the URL array $content it looks fine, and when I run my Invoke-WebRequest calls one by one they work without error.

126    Job126          BackgroundJob   Running       True            localhost            ...                
Invalid URI: The hostname could not be parsed.
    + CategoryInfo          : NotSpecified: (:) [Invoke-RestMethod], UriFormatException
    + FullyQualifiedErrorId : System.UriFormatException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
    + PSComputerName        : localhost

Invalid URI: The hostname could not be parsed.
    + CategoryInfo          : NotSpecified: (:) [Invoke-RestMethod], UriFormatException
    + FullyQualifiedErrorId : System.UriFormatException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
    + PSComputerName        : localhost

Any help or alternative implementations would be appreciated. I'm open to not using PowerShell, but I'm limited to Windows 7 desktops or Windows 2008 R2 servers, and I'd probably run the final script on the server itself, using localhost in the URL to cut down on network delays.


Answer 1:


With Jobs you incur a large amount of overhead, because each new Job spawns a new process.

Use Runspaces instead!

$maxConcurrentJobs = 10
$content = Get-Content -Path "C:\Temp\urls.txt"

# Create a runspace pool where $maxConcurrentJobs is the 
# maximum number of runspaces allowed to run concurrently    
$Runspace = [runspacefactory]::CreateRunspacePool(1,$maxConcurrentJobs)

# Open the runspace pool (very important)
$Runspace.Open()

foreach ($url in $content) {
    # Create a new PowerShell instance and tell it to execute in our runspace pool
    $ps = [powershell]::Create()
    $ps.RunspacePool = $Runspace

    # Attach some code to it
    [void]$ps.AddCommand("Invoke-WebRequest").AddParameter("UseBasicParsing",$true).AddParameter("Uri",$url)

    # Begin execution asynchronously (returns immediately)
    [void]$ps.BeginInvoke()

    # Give feedback on how far we are
    Write-Host ("Initiated request for {0}" -f $url)
}

As noted in the linked ServerFault post, you can also use a more generic solution, like Invoke-Parallel, which basically does the above.
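The runspace code above starts the requests but never reads their output or closes the pool. The following is a minimal sketch of one way to do both, assuming PowerShell 3.0 or later. The function name `Invoke-Throttled` and its parameters are hypothetical and are not part of the original answer or of Invoke-Parallel.

```powershell
# Hypothetical helper: runs $ScriptBlock once per input item in a runspace pool
# and returns one result object per item. All names here are illustrative.
function Invoke-Throttled {
    param(
        [object[]]$InputItems,
        [scriptblock]$ScriptBlock,
        [int]$MaxConcurrent = 10
    )

    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxConcurrent)
    $pool.Open()
    $pending = New-Object System.Collections.ArrayList

    try {
        foreach ($item in $InputItems) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            # The item is passed as the first positional argument of the script
            [void]$ps.AddScript($ScriptBlock.ToString()).AddArgument($item)

            # Keep the instance and its async handle so the output can be read later
            [void]$pending.Add((New-Object PSObject -Property @{
                Item   = $item
                Shell  = $ps
                Handle = $ps.BeginInvoke()
            }))
        }

        foreach ($p in $pending) {
            try {
                # EndInvoke blocks until this particular invocation has finished
                $output = $p.Shell.EndInvoke($p.Handle)
                $failed = $p.Shell.HadErrors
            }
            catch {
                # Terminating errors inside the script surface here
                $output = $_.Exception.Message
                $failed = $true
            }
            New-Object PSObject -Property @{
                Item   = $p.Item
                Output = @($output)
                Failed = $failed
            }
            $p.Shell.Dispose()
        }
    }
    finally {
        $pool.Close()
        $pool.Dispose()
    }
}

# Possible usage with the URL file from the question (not executed here):
# $results = Invoke-Throttled -InputItems (Get-Content 'C:\Temp\urls.txt') -MaxConcurrent 10 -ScriptBlock {
#     param($url)
#     (Invoke-WebRequest -UseBasicParsing -Uri $url).StatusCode
# }
```

For a few hundred thousand URLs, keeping every PowerShell instance alive until the end may use a lot of memory, so in practice it is probably better to feed the list to this helper in chunks.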




Answer 2:


I agree with the top post to use Runspaces. However, the provided code doesn't show how to get data back from the request. Here's a PowerShell module recently published to my GitHub page:

https://github.com/phbits/AsyncHttps

It submits async HTTP requests to a single domain over SSL/TLS (TCP port 443). Here's an example from the README.md:

Import-Module AsyncHttps
Invoke-AsyncHttps -DnsName www.contoso.com -UriPaths $('dir1','dir2','dir3')

It returns a System.Object[] containing the results of each request. The result properties are as follows:

Uri       - Request Uri
Status    - Http Status Code or Exception Message
BeginTime - Job Start Time
EndTime   - Job End Time

After looking at your example, you'll probably need to make the following modifications:

  1. Allow usage of an alternative port (webserver:8080). The easiest would be to update the URI in the scriptblock. Alternatively add another parameter to the module and scriptblock just for the port.
  2. Test that query parameters are properly formatted and not mangled by percent-encoding when used in the HTTP request. Consider skipping the use of UriBuilder in the scriptblock as long as your list of URI paths is known to be OK.
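Both points can be checked quickly before changing the module. The sketch below rebuilds the sample URL from the question with System.UriBuilder, using the non-default port and the original query string, and compares the result with the input. It only exercises UriBuilder itself; how the AsyncHttps scriptblock builds its URIs is not shown here, and the variable names are illustrative.

```powershell
# Sample URL taken from the question
$original = 'http://webserver:8080/app/mwo/services/create?server=ServerName&e1user=admin&newMWONum=123456&sourceMWONum=0&tagNum=33-A-1B'

$source = [System.Uri]$original

# Rebuild the URI from its parts, including the explicit port
$builder = New-Object System.UriBuilder($source.Scheme, $source.Host, $source.Port, $source.AbsolutePath)

# Set the query without the leading '?'; UriBuilder adds it
$builder.Query = $source.Query.TrimStart('?')

$rebuilt = $builder.Uri.AbsoluteUri

# Case-sensitive comparison: a difference here means something was re-encoded or dropped
$roundTrips = ($rebuilt -ceq $original)
Write-Host "Rebuilt: $rebuilt"
Write-Host "Round-trips unchanged: $roundTrips"
```

Running the same comparison over the whole URL list would flag any entry whose query string contains characters that UriBuilder treats differently.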



Answer 3:


You can also use the async methods of the .NET WebClient class. If you only need to send a GET request to each URL, Net.WebClient will work. Below is a dummy example with example.com:

$urllist = 1..97
$batchSize = 20

$results = [System.Collections.ArrayList]::new()

$i = 1
foreach ($url in $urllist) {

    # Start the download without blocking; the returned Task is tracked in $results
    $w = [System.Net.WebClient]::new().DownloadStringTaskAsync("http://www.example.com?q=$i")
    $results.Add($w) | Out-Null

    if ($i % $batchSize -eq 0 -or $i -eq $urllist.Count) {
        # Wait for every task in the current batch to complete
        while ($false -in $results.IsCompleted) { Start-Sleep -Milliseconds 300 }
        Write-Host " ........   Batch completed   ......... $i" -ForegroundColor Green
        foreach ($r in $results) {
            New-Object PSObject -Property @{ url = $r.AsyncState.AbsoluteUri; jobstatus = $r.Status; success = !$r.IsFaulted }
            # If you need the response text, use $r.Result
        }
        $results.Clear()
    }

    $i += 1
}
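The dummy example derives its batches from a counter. With a real URL list, the batching step can be separated from the download step. The sketch below shows only that splitting step; `Split-Batch` is a hypothetical helper name, not an existing cmdlet, and it makes no network calls.

```powershell
# Hypothetical helper: splits a list into consecutive batches of at most $Size items.
function Split-Batch {
    param(
        [object[]]$Items,
        [int]$Size = 20
    )

    for ($start = 0; $start -lt $Items.Count; $start += $Size) {
        $end = [Math]::Min($start + $Size, $Items.Count) - 1
        # The leading comma emits each batch as a single array instead of unrolling it
        , $Items[$start..$end]
    }
}

# Same sizes as the dummy example: 97 items in batches of 20
$batches = @(Split-Batch -Items (1..97) -Size 20)
Write-Host "Number of batches: $($batches.Count)"
```

Each batch could then be handed to the WebClient loop above, waiting for all tasks in one batch to complete before starting the next.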


Source: https://stackoverflow.com/questions/31524786/how-can-i-call-many-urls-from-a-list-asynchronously
