Question
I am developing a PHP + Laravel project that needs to scrape data from two different websites, using the Goutte scraping library. I have 10 integration tests in which I use the Crawler object that Goutte's Client provides to get the specific data I want from each website.
The tests work just fine (I even used the Infection library for mutation testing), but I think there could be a way to unit test all the functions, so that the tests would run faster.
The approach I tried is to scrape the whole HTML from each of the two websites and assert that the scraped HTML equals a local HTML file stored in my project, which should contain the same HTML. If the local HTML and the scraped HTML are the same, I could just pass the local HTML to the functions that target specific HTML tags to retrieve the info I want. I hope this makes sense.
I hope my code clarifies this a bit more:
My test class looks like this:
    private $html;

    protected function setUp(): void
    {
        $myHtml = fopen("path\myLocal.html", "r");
        $this->html = fread($myHtml, filesize("path\myLocal.html"));
        fclose($myHtml);
    }

    public function test_webScrapping_returns_html()
    {
        $scrapper = new WebScraping();
        $url = "www.the-url-I-wanna-scrape.com";
        $scrappedHtml = $scrapper->getHtml($url);
        $this->assertTrue($scrappedHtml === $this->html);
    }
And the getHtml() function of my WebScraping model looks like this:
    public function getHtml(string $url)
    {
        // I know that I should not instantiate the Goutte Client here (inject it in __construct instead?)
        $client = new Client();
        $html = $client->request('GET', $url)->html();
        return $html;
    }
The thing is that if I dd($this->html) or dd($scrappedHtml), the content is pretty much the same, with the only difference that one has \n and \r interspersed and the other doesn't. So both HTML strings have the same content, but I cannot assert that they're equal. What am I missing? Am I on the right track, or would you follow a totally different approach?
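To make the mismatch concrete, here is a minimal sketch of how two strings with the same markup can fail a strict comparison purely because of \r and \n, and how they compare once line endings are normalized. It assumes the only difference is the line-ending style (for example \r\n in a file saved on Windows versus \n in the scraped response). If one string has no line breaks at all, this will not be enough. The helper name normalizeLineEndings and the sample strings are hypothetical, not part of Goutte or PHPUnit.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts Windows (\r\n) and lone \r line endings
 * to \n, so strings that differ only in line-ending style compare equal.
 */
function normalizeLineEndings(string $html): string
{
    // "\r\n" is replaced first, then any remaining lone "\r".
    return str_replace(["\r\n", "\r"], "\n", $html);
}

// Sample data standing in for the local file and the scraped page.
$local   = "<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>";
$scraped = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>";

var_dump($local === $scraped); // bool(false)
var_dump(normalizeLineEndings($local) === normalizeLineEndings($scraped)); // bool(true)
```

Inside the test, the same idea would read as $this->assertSame(normalizeLineEndings($this->html), normalizeLineEndings($scrappedHtml)), which also gives a more readable failure message than assertTrue on a === comparison.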
Source: https://stackoverflow.com/questions/63459024/how-to-unit-test-a-web-scraping-service-php-unit