How to parse Google CSE results located on a site in Java?

Posted by 北城余情 on 2019-12-11 11:12:57

Question


I want to parse a Custom Search Element (CSE) JavaScript function. Here's a template of this function, from https://developers.google.com/custom-search/docs/element#overview:

<!-- Put the following javascript before the closing  tag. -->
<script>
(function() {
  var cx = '123:456'; // Insert your own Custom Search engine ID here
  var gcse = document.createElement('script'); gcse.type = 'text/javascript'; gcse.async = true;
  gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
  var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(gcse, s);
})();
</script>

<!-- Place this tag where you want both of the search box and the search results to render -->
<gcse:search></gcse:search>

I want to parse this function from the following site: http://findmusicbylyrics.com/search.php?cx=partner-pub-1936238606905173%3A1893984547&cof=FORID%3A10&ie=UTF-8&q=Love&sa=Search+Lyrics. Its JavaScript is:

<script>
(function() {
    var cx = 'partner-pub-1936238606905173:8242090140';
    var gcse = document.createElement('script');
    gcse.type = 'text/javascript';
    gcse.async = true;
    gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:search></gcse:search>

Now I have no idea where to start. I've done some HTML parsing in Java with Jsoup, but this is the first time I've run into a CSE <script> tag to parse. Any suggestions would be much appreciated.


Answer 1:


I've done some HTML parsing in Java with Jsoup, but this is the first time I've run into a CSE <script> tag to parse.

Fetch the page, then find the script element. Once you have it, call the html() method on that element to get its contents.

HELPER FUNCTION

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/**
 * Extract the Custom Search Element JavaScript of a site.
 * 
 * @param url
 *            The site URL
 * @param cssQuery
 *            The query for finding the script element
 * @return the content between the &lt;script> and &lt;/script> tags
 * @throws IOException
 *             If the CSE JavaScript is not found or an error occurred while
 *             fetching {@code url}.
 */
public static String getCustomSearchElementJavascript(String url, String cssQuery) throws IOException {
    Document doc = Jsoup.connect(url).get();

    Element script = doc.select(cssQuery).first();

    if (script == null) {
        throw new IOException("Unable to find Custom Search Element JavaScript.");
    }

    return script.html();
}

SAMPLE CODE

String url = "http://findmusicbylyrics.com/search.php?cx=partner-pub-1936238606905173%3A1893984547&cof=FORID%3A10&ie=UTF-8&q=Love&sa=Search+Lyrics+";

System.out.println( getCustomSearchElementJavascript(url, "div#content > script") );

OUTPUT

(function() {
    var cx = 'partner-pub-1936238606905173:8242090140';
    var gcse = document.createElement('script');
    gcse.type = 'text/javascript';
    gcse.async = true;
    gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;
    var s = document.getElementsByTagName('script')[0];
    s.parentNode.insertBefore(gcse, s);
})();
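
Once you have the script body as a string, a natural next step is pulling the engine ID out of the `var cx = '...'` assignment. The following is only a minimal sketch using the standard java.util.regex package; the class name CseScriptParser, the method extractCx, and the regular expression are hypothetical and not part of the original answer. It assumes the script assigns cx as a quoted string literal, as in the output above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CseScriptParser {

    // Assumption: the ID is assigned as  var cx = '...';  or  var cx = "...";
    // Group 1 captures the opening quote, group 2 the ID, \1 matches the closing quote.
    private static final Pattern CX_PATTERN =
            Pattern.compile("var\\s+cx\\s*=\\s*(['\"])(.*?)\\1");

    /** Returns the value assigned to the cx variable, or empty if no assignment is found. */
    public static Optional<String> extractCx(String scriptBody) {
        Matcher m = CX_PATTERN.matcher(scriptBody);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Script text as returned by script.html() in the sample above.
        String script = "(function() {\n"
                + "    var cx = 'partner-pub-1936238606905173:8242090140';\n"
                + "    var gcse = document.createElement('script');\n"
                + "    gcse.src = 'http://www.google.com/cse/cse.js?cx=' + cx;\n"
                + "})();";

        // Prints: partner-pub-1936238606905173:8242090140
        System.out.println(extractCx(script).orElse("cx not found"));
    }
}
```

Keep in mind that Jsoup does not execute JavaScript, so the fetched HTML contains only this loader script, not the search results that cse.js renders in a browser.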


Source: https://stackoverflow.com/questions/35994781/how-to-parse-a-google-cse-results-located-on-a-site-in-java
