Question
I'm working on a C# application that visits a website and extracts content from a table. That part works fine, but here is the problem: the table's contents change when I select a different value in a combobox. The XPath I use always returns the table that is shown when the page first loads, and I don't know how to get the others. Below is everything I think is relevant.
The webpage is: http://br.soccerway.com/national/brazil/serie-a/2012/regular-season/
XPath / C# code:
// Select the date, home team, score and away team cells of the currently displayed matches table
HtmlNodeCollection no2 = doc.DocumentNode
    .SelectNodes("//*[@id='page_competition_1_block_competition_matches_summary_6']/div[2]/table/tbody/tr/td[@class='team team-a ' or @class='date no-repetition' or @class='score-time score' or @class='team team-b ']");
To make the combobox visible on the website, you first have to click the "Por semana de jogo" ("by game week") option just above the scores.
I need the scores from all of these tables, not just the one that is initially displayed.
Answer 1:
When you select a game week from the drop-down (or click the "anterior" or "proximo" links above it), JavaScript on the page asks the server for that week's data by sending a plain GET request to a URL.
The data comes back as a JSON object that contains the table's HTML. The script inserts this HTML into the right place in the DOM and, presto, the browser displays that week's data.
Doing this programmatically takes a bit of work, but it can be done. First, work out the URL for each week; hopefully most of the query-string parameters stay constant and only the week changes. You then have a boilerplate URL that you adjust for the week you want and send to the server. You get the JSON back and parse out the table HTML. Then you're golden: feed that HTML into the Agility Pack and work with it as usual.
I did a little investigation with the Network tab of Chrome's Developer Tools and found that selecting a game week sends a URL like this one to the server (this is for week 14):
http://br.soccerway.com/a/block_competition_matches_summary?block_id=page_competition_1_block_competition_matches_summary_6&callback_params=%7B%22page%22%3A%229%22%2C%22round_id%22%3A%2217449%22%2C%22outgroup%22%3A%22%22%2C%22view%22%3A%221%22%7D&action=changePage&params=%7B%22page%22%3A13%7D
(You can also capture the URL with other tools, such as Firebug in Firefox or Fiddler.)
Comparing the URLs for several weeks, it looks like the selected week minus 1 appears near the end, in the params query-string value: "...%3A13..." (%3A is a URL-encoded colon, so params decodes to {"page":13}). So for week 15 you'd use "...%3A14...". Fortunately, there seems to be only one other difference between the URLs for different weeks, in the callback_params value. Unfortunately, I wasn't able to work out how it relates to the selected week, but hopefully you can.
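A minimal sketch of building the URL for any week, under the assumptions just described: the page value in params is (week - 1), and you supply the decoded callback_params JSON yourself for each week, because its relation to the week is not known. BuildWeekUrl is a hypothetical helper name; the host, path, block_id and action values are copied from the week-14 URL. The top-level statements need C# 9 or later.

```csharp
using System;

// Hypothetical helper: builds the AJAX URL for one game week from the pattern in the week-14 URL.
// Assumptions: the "page" value in params is (week - 1), and callbackParams is the decoded
// callback_params JSON for that week, which you must supply because its link to the week is unknown.
static string BuildWeekUrl(int week, string callbackParams)
{
    const string baseUrl = "http://br.soccerway.com/a/block_competition_matches_summary";
    const string blockId = "page_competition_1_block_competition_matches_summary_6";
    string pageParams = "{\"page\":" + (week - 1) + "}";

    // Uri.EscapeDataString percent-encodes braces, quotes, colons and commas (%7B, %22, %3A, %2C, %7D)
    return baseUrl
        + "?block_id=" + blockId
        + "&callback_params=" + Uri.EscapeDataString(callbackParams)
        + "&action=changePage"
        + "&params=" + Uri.EscapeDataString(pageParams);
}

// Rebuild the week-14 URL from its decoded callback_params
string week14 = BuildWeekUrl(14, "{\"page\":\"9\",\"round_id\":\"17449\",\"outgroup\":\"\",\"view\":\"1\"}");
Console.WriteLine(week14);
```

Encoding the raw JSON with Uri.EscapeDataString, rather than hard-coding the percent-encoded strings, keeps both query-string values readable and lets you paste in the decoded callback_params exactly as the developer tools show it.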
When you open that URL in your browser, you get back the JSON block. Searching it for "<table" and "/table>" shows the HTML you want. In your C# code, a simple regular expression can pull it out of the JSON string:
using System.Text.RegularExpressions;

string json = "..."; // load the JSON string here
// Singleline lets '.' match newlines; the greedy '.*' spans from the first "<table" to the last "/table>"
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
Regex regx = new Regex("(?<theTable><table.*/table>)", options);
Match match = regx.Match(json);
if (match.Success) {
    string tableHtml = match.Groups["theTable"].Value;
    // tableHtml is what you pass to the Agility Pack
}
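One caveat with the regex: inside a JSON string every double quote is escaped as \" (and some servers also write / as \/), so the matched text still contains those escape sequences, for example class=\"team team-a \". A minimal alternative sketch, assuming .NET Core 3.0 or later for System.Text.Json, is to parse the JSON and let the parser undo the escaping. Because the property names in the server's response aren't known here, the hypothetical helper FindTableHtml just returns the first string value anywhere in the response that contains "<table"; the sample JSON is made up for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns the first JSON string value (at any depth) that contains "<table".
// JsonElement.GetString() returns the unescaped text, so \" becomes " and \/ becomes /.
static string? FindTableHtml(JsonElement element)
{
    if (element.ValueKind == JsonValueKind.String)
    {
        string? value = element.GetString();
        return value != null && value.Contains("<table", StringComparison.OrdinalIgnoreCase) ? value : null;
    }

    if (element.ValueKind == JsonValueKind.Object)
    {
        foreach (JsonProperty property in element.EnumerateObject())
        {
            string? found = FindTableHtml(property.Value);
            if (found != null) return found;
        }
    }
    else if (element.ValueKind == JsonValueKind.Array)
    {
        foreach (JsonElement item in element.EnumerateArray())
        {
            string? found = FindTableHtml(item);
            if (found != null) return found;
        }
    }

    return null; // numbers, booleans, null, or no match below this element
}

// Made-up response shape, only to show the escaping being undone
string json = "{\"result\":{\"html\":\"<table class=\\\"matches\\\"><tr><td>1 - 0<\\/td><\\/tr><\\/table>\"}}";
using JsonDocument document = JsonDocument.Parse(json);
string? tableHtml = FindTableHtml(document.RootElement);
Console.WriteLine(tableHtml); // prints <table class="matches"><tr><td>1 - 0</td></tr></table>
```

Searching every string value avoids guessing the response layout; once you know which property holds the HTML, you can read it directly with GetProperty instead.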
Feed the HTML string into the Agility Pack and you should be on your way.
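A sketch of that last step, assuming the HtmlAgilityPack NuGet package: the extracted fragment no longer sits inside the div whose id the question's XPath starts from, so the query below starts at the rows (//tr/td) instead. GetMatchCells is a hypothetical helper; the class names, including their trailing spaces, are copied from the question's XPath, and the sample fragment is made up.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: returns the text of the date, team and score cells in a table fragment.
static List<string> GetMatchCells(string tableHtml)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(tableHtml);

    // Relative XPath: the fragment has no wrapper div carrying the block id
    HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
        "//tr/td[@class='team team-a ' or @class='date no-repetition' or @class='score-time score' or @class='team team-b ']");

    var texts = new List<string>();
    if (cells == null)
    {
        // By default SelectNodes returns null, not an empty collection, when nothing matches
        return texts;
    }

    foreach (HtmlNode cell in cells)
    {
        texts.Add(cell.InnerText.Trim());
    }
    return texts;
}

// Usage with a small made-up fragment
string fragment = "<table><tr><td class='date no-repetition'>01/01/12</td><td class='team team-a '>Team A</td>"
    + "<td class='score-time score'>1 - 0</td><td class='team team-b '>Team B</td></tr></table>";
foreach (string text in GetMatchCells(fragment))
{
    Console.WriteLine(text);
}
```

The Agility Pack does not insert a tbody element the way a browser does, which is another reason to avoid copying a browser-generated XPath that includes /tbody/ unless the raw HTML really contains it.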
Source: https://stackoverflow.com/questions/11413949/xpath-table-changes-as-combobox-changes-too