Question
I've been struggling with this for 2 days. I'm using C# and HtmlAgilityPack in a .NET 4.5 WinForms project to extract data from a website (the fields I want are $ flow and B/S ratio). My query reaches the field, but there is no value in it: I get "flow : \n\t\t\t" instead of "flow 245 M". I can see the value in the web page, so I have no idea why my query returns nothing (nodes == null). I'd like to know whether someone else can find the reason. This is the URL of the queried web page: http://finance.avafin.com/tradeFlow?type=BS_RATIO&date=06%2F14%2F2013&alertId=0&symbol=spy&sectorId=0&industryId=0
I pass the URL above to the method as the url argument.
Note that I used the same method with a different query on another web page and it worked, so either something is wrong with the current query or the field is obfuscated on this page.
Method used:
/// <summary>
/// Gets the data.
/// </summary>
/// <param name="url"> The URL. </param>
/// <returns> </returns>
public List<string> GetFlowData(string url)
{
    // ('//a[contains(@href, "genre")]')
    // <td class=" sorting_1">137.27B</td>
    //*[@id="tf_data"]/tbody/tr[1]/td[8] // XPath shown in the browser for the first value => no value when used as a query (nodes == null)
    //*[@id="tf_data"]/tbody/tr[1]/td[9] // XPath shown in the browser for the second value => no value when used as a query (nodes == null)
    // //td[@class=''] => nodes == null too
    // With [@id='tf_data']/tbody I see the B/S ratio node in the body, but its text is "\n\t\t\t" instead of a value
    // LoadHtmlDoc is my own helper (not shown here); it takes the URL and an XPath query.
    var nodes = LoadHtmlDoc(url, "//*[@id='tf_data']/tbody");
    List<string> tickers = new List<string>();
    if (nodes == null)
    {
        return new List<string> { "Ticker not available" };
    }
    int i = 0;
    foreach (var v in nodes)
    {
        i++;
        MessageBox.Show(v.InnerText + " " + i.ToString());
        //// The placement of the data containing bought/sold ratio
        //if (i == 7)
        //{
        //    tickers.Add(v.InnerText);
        //}
        //// The placement of the data containing $ Flow
        //if (i == 8)
        //{
        //    tickers.Add(CleanFlowData(v.InnerText));
        //}
    }
    return tickers;
}
Answer 1:
The page you are querying does not contain any data in the table with id tf_data. If you examine the page markup, you'll see:
<table cellpadding="0" cellspacing="0" border="0" class="display" id="tf_data">
  <thead>
    <tr height="10">
      <th align="center"></th>
      <th align="center" width="90">CHART</th>
      <th align="left" width="70">SYMBOL</th>
      <th align="left">MARKET CAP</th>
      <th align="right" width="65">PRICE</th>
      <th align="center" width="80">CHANGE</th>
      <th align="right">VOL</th>
      <th align="right">B/S RATIO</th>
      <th align="right" width="80">NET CASH FLOW</th>
    </tr>
  </thead>
  <tbody> <!-- empty! -->
  </tbody>
</table>
All of the data is added to this table by the browser via JavaScript after the document has loaded (see the $(document).ready function). So if you fetch the HTML from that URL, there is no data in it until a browser runs the JavaScript code, i.e. there is nothing for you to parse.
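To make this concrete, here is a small sketch that runs the question's XPath with HtmlAgilityPack against a trimmed copy of the markup the server sends, and against the same table after one row has been appended the way the page's script would. TfDataDemo, ServedHtml, RenderedHtml, TbodyText and Cell are hypothetical names, and the cell texts are placeholders, not real data. The whitespace left inside the empty tbody is likely the "\n\t\t\t" text reported in the question.

```csharp
using System;
using HtmlAgilityPack; // the same NuGet package the question uses

public static class TfDataDemo
{
    // Trimmed copy of what the server sends: header row only, whitespace inside <tbody>.
    public const string ServedHtml =
        "<table id='tf_data'><thead><tr>" +
        "<th></th><th>CHART</th><th>SYMBOL</th><th>MARKET CAP</th><th>PRICE</th>" +
        "<th>CHANGE</th><th>VOL</th><th>B/S RATIO</th><th>NET CASH FLOW</th>" +
        "</tr></thead><tbody>\n\t\t\t</tbody></table>";

    // Hypothetical: the same table after a script has appended one row.
    // Cell texts are placeholders, not real market data.
    public static readonly string RenderedHtml = ServedHtml.Replace(
        "<tbody>\n\t\t\t</tbody>",
        "<tbody><tr><td></td><td>-</td><td>SPY</td><td>-</td><td>-</td>" +
        "<td>-</td><td>-</td><td>ratio-here</td><td>flow-here</td></tr></tbody>");

    private static HtmlDocument Load(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    // Text of the <tbody> element itself; for the served markup it is only whitespace.
    public static string TbodyText(string html)
    {
        return Load(html).DocumentNode.SelectSingleNode("//*[@id='tf_data']/tbody").InnerText;
    }

    // Runs the question's XPath for cell number `column` (1-based) of the first row.
    // By default, SelectNodes returns null (not an empty collection) when nothing matches.
    public static string Cell(string html, int column)
    {
        HtmlNodeCollection nodes = Load(html).DocumentNode
            .SelectNodes("//*[@id='tf_data']/tbody/tr[1]/td[" + column + "]");
        return nodes == null ? null : nodes[0].InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(string.IsNullOrWhiteSpace(TbodyText(ServedHtml))); // True
        Console.WriteLine(Cell(ServedHtml, 8) == null);                      // True: no rows to match
        Console.WriteLine(Cell(RenderedHtml, 8));                            // ratio-here (B/S RATIO)
        Console.WriteLine(Cell(RenderedHtml, 9));                            // flow-here (NET CASH FLOW)
    }
}
```

The XPath itself is fine; it only fails because the rows it targets are never in the HTML the server returns.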
I suggest examining the script that loads the JSON data into the page and simply calling the same service from your code.
It's outside the scope of the question, but to retrieve the data you can use the HttpClient class from the System.Net.Http assembly. Here is a usage sample (it's up to you to work out how the query string should be composed):
using System;
using System.Net.Http;
HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://finance.avafin.com");
// Relative to BaseAddress: the same request the page's script sends to load the table data.
string url = "data?sEcho=2&iColumns=9&sColumns=&iDisplayStart=0&iDisplayLength=20&mDataProp_0=0&mDataProp_1=1&mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&mDataProp_6=6&mDataProp_7=7&mDataProp_8=8&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=&bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=&bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&sSearch_6=&bRegex_6=false&bSearchable_6=true&sSearch_7=&bRegex_7=false&bSearchable_7=true&sSearch_8=&bRegex_8=false&bSearchable_8=true&iSortCol_0=4&sSortDir_0=asc&iSortingCols=1&bSortable_0=true&bSortable_1=true&bSortable_2=true&bSortable_3=true&bSortable_4=true&bSortable_5=true&bSortable_6=true&bSortable_7=true&bSortable_8=true&type=BS_RATIO&date=06%2F14%2F2013&categoryName=&alertId=0&alertId2=&industryId=0&sectorId=0&symbol=spy&recom=&period=&perfPercent=";
// .Result blocks the calling thread until the response has arrived.
var response = client.GetStringAsync(url).Result;
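Rather than maintaining that long literal by hand, the query string can be generated. This is a sketch, with DataTablesQuery and Build as hypothetical names: the parameter names and fixed values (sEcho=2, iDisplayLength=20, sorting on column 4, and so on) are copied from the sample URL, and reading them as the legacy DataTables server-side protocol is an assumption based on names such as sEcho and iDisplayStart, not something the page documents.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class DataTablesQuery
{
    // Rebuilds the sample's relative URL, in the same parameter order.
    // Assumption: the server accepts these DataTables-style parameters as captured.
    public static string Build(string type, DateTime date, string symbol, int columns = 9)
    {
        var p = new List<KeyValuePair<string, string>>();
        Action<string, string> add = (k, v) => p.Add(new KeyValuePair<string, string>(k, v));

        add("sEcho", "2");
        add("iColumns", columns.ToString(CultureInfo.InvariantCulture));
        add("sColumns", "");
        add("iDisplayStart", "0");
        add("iDisplayLength", "20");
        for (int i = 0; i < columns; i++) add("mDataProp_" + i, i.ToString(CultureInfo.InvariantCulture));
        add("sSearch", "");
        add("bRegex", "false");
        for (int i = 0; i < columns; i++)
        {
            add("sSearch_" + i, "");
            add("bRegex_" + i, "false");
            add("bSearchable_" + i, "true");
        }
        add("iSortCol_0", "4");   // sort on column 4, as in the captured request
        add("sSortDir_0", "asc");
        add("iSortingCols", "1");
        for (int i = 0; i < columns; i++) add("bSortable_" + i, "true");

        // Page-specific filters; the date is sent as MM/dd/yyyy and percent-encoded.
        add("type", type);
        add("date", date.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture));
        add("categoryName", "");
        add("alertId", "0");
        add("alertId2", "");
        add("industryId", "0");
        add("sectorId", "0");
        add("symbol", symbol);
        add("recom", "");
        add("period", "");
        add("perfPercent", "");

        // Uri.EscapeDataString turns "06/14/2013" into "06%2F14%2F2013".
        return "data?" + string.Join("&",
            p.Select(kv => Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
    }
}
```

With this helper, client.GetStringAsync(DataTablesQuery.Build("BS_RATIO", new DateTime(2013, 6, 14), "spy")) should request the same URL as the hand-written string, and changing the date or symbol becomes a single argument.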
The response will contain HTML that you can parse.
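The sample blocks on .Result, so in a WinForms app the form is frozen until the response arrives. Below is a sketch of an awaitable alternative, assuming a hypothetical FetchAsync helper that is called from an async event handler. StubHandler is also hypothetical and exists only so the helper can be tried without network access, by passing it to the HttpClient constructor.

```csharp
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class FlowFetcher
{
    // Awaits the request instead of blocking on .Result, so the UI thread stays free.
    // Hypothetical use from a WinForms event handler:
    //   private async void fetchButton_Click(object sender, EventArgs e)
    //   {
    //       outputBox.Text = await FlowFetcher.FetchAsync(client, relativeUrl);
    //   }
    public static async Task<string> FetchAsync(HttpClient client, string relativeUrl)
    {
        // GetStringAsync throws HttpRequestException on a non-success status code.
        // This helper touches no controls, so it need not resume on the UI thread.
        return await client.GetStringAsync(relativeUrl).ConfigureAwait(false);
    }
}

// Offline stand-in for the real server: answers every request with a fixed body.
public class StubHandler : HttpMessageHandler
{
    private readonly string body;

    public StubHandler(string body)
    {
        this.body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(body, Encoding.UTF8, "text/html"),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}
```

Only the event handler needs to be async void. Everything below it can return Task, which keeps exceptions observable by the caller.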
Source: https://stackoverflow.com/questions/17163704/htmlagilitypack-query-returning-no-value