How to get page meta (title, description, images) like Facebook's attach URL using regex in Java


孤城傲影

How can I get a page's meta information (title, description, images), the way Facebook does when you attach a URL, using regex in Java?


4 Answers

忘了有多久

2021-02-19 09:04
I use jsoup to get a Document object, then use something like the method below to get the tag for each property I'm looking for.
```
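import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Returns the content of <meta property="og:{property}">, or null if the page has no such tag.
String findTag(Document document, String property) {
    String cssQuery = "meta[property='og:" + property + "']";
    Elements elements = document.select(cssQuery);

    // select() never returns null; an empty result means nothing matched.
    if (elements.isEmpty()) {
        return null;
    }
    return elements.first().attr("content");
}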
```
I used this often enough that I decided to combine the fetching (with jsoup) and the parsing into a library called ogmapper.

太阳男子

2021-02-19 09:06

How about this? The statement below selects all meta tags whose property starts with "og:", which is useful when you want several of them at once.

```
doc.select("meta[property^=og:]")
```

```
void parseOGTag(String response) {
    // Parse the HTML and select every <meta> whose property starts with "og:"
    Document doc = Jsoup.parse(response);
    Elements ogTags = doc.select("meta[property^=og:]");
    if (ogTags.isEmpty()) {
        return;
    }

    // Pick out the OG tags you want
    String title = null;
    String desc = null;
    String image = null;
    for (Element tag : ogTags) {
        String property = tag.attr("property");
        String content = tag.attr("content");
        if ("og:image".equals(property)) {
            image = content;
        } else if ("og:description".equals(property)) {
            desc = content;
        } else if ("og:title".equals(property)) {
            title = content;
        }
    }
    // title, desc and image now hold the values found (or null); use them here.
}
```


情歌与酒

2021-02-19 09:08
As Ishikawa Yoshi mentioned, use jsoup.

Example:
```
Document doc = Jsoup.connect("http://example.com/").get();
for (Element meta : doc.select("meta")) {
    System.out.println("Name: " + meta.attr("name") + " - Content: " + meta.attr("content"));
}
```
This code is untested, but I hope it helps.

Using regex to scrape an HTML document is a bad idea; you can read about why on Coding Horror.
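
To make the point about regex concrete, here is a minimal sketch using only `java.util.regex`. The class name `RegexMetaDemo`, the method `naiveOgTitle`, and the sample HTML strings are all made up for illustration. The pattern assumes `property` comes before `content` and that both attributes use double quotes, which is exactly the kind of assumption real pages break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMetaDemo {
    // Naive pattern (illustrative only): expects property first, then content,
    // both in double quotes, separated by whitespace.
    private static final Pattern OG_TITLE = Pattern.compile(
            "<meta\\s+property=\"og:title\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the captured og:title content, or null when the pattern does not match.
    public static String naiveOgTitle(String html) {
        Matcher m = OG_TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String expected = "<meta property=\"og:title\" content=\"Hello\">";
        String reordered = "<meta content=\"Hello\" property=\"og:title\">";
        String singleQuoted = "<meta property='og:title' content='Hello'>";

        System.out.println(naiveOgTitle(expected));     // Hello
        System.out.println(naiveOgTitle(reordered));    // null
        System.out.println(naiveOgTitle(singleQuoted)); // null
    }
}
```

All three inputs are equivalent HTML, yet the regex only handles the first. An HTML parser treats attribute order and quoting style as irrelevant, which is why the answers here lean on jsoup instead.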

一整个雨季

2021-02-19 09:10

Here's a snippet that reads a web page and builds a small chunk of HTML that displays the Open Graph image with the title to its right, wrapping around the image. It falls back to the plain HTML title if the OG tags are missing, so it can represent any web page.

```
public static String parsePageHeaderInfo(String urlStr) throws Exception {

    StringBuilder sb = new StringBuilder();
    Connection con = Jsoup.connect(urlStr);

    /* This browser user agent is important: it tricks servers into sending us the LARGEST versions of the images.
       Constants.BROWSER_USER_AGENT is not shown here; substitute your own user agent string. */
    con.userAgent(Constants.BROWSER_USER_AGENT);
    Document doc = con.get();

    // select() never returns null, and attr() returns "" when nothing matched,
    // so check for an empty string to decide whether to fall back to <title>.
    String text = doc.select("meta[property=og:title]").attr("content");
    if (text.isEmpty()) {
        text = doc.title();
    }

    String imageUrl = doc.select("meta[property=og:image]").attr("content");

    if (!imageUrl.isEmpty()) {
        sb.append("<img src='");
        sb.append(imageUrl);
        sb.append("' align='left' hspace='12' vspace='12' width='150px'>");
    }

    if (!text.isEmpty()) {
        sb.append(text);
    }

    return sb.toString();
}
```
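
The fallback idea described above (use the OG title if present, otherwise the HTML title) can be pulled out into a small helper. This is only a sketch: `MetaFallback` and `firstNonBlank` are hypothetical names, not part of jsoup. It relies on the fact that jsoup's `Elements.attr` returns an empty string rather than null when nothing matched, so the check is for blank values as well as null.

```java
public class MetaFallback {
    // Returns the first candidate that is neither null nor blank (trimmed),
    // or null if no candidate qualifies.
    public static String firstNonBlank(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String ogTitle = "";                  // stands in for a missing og:title
        String htmlTitle = "Example Domain";  // stands in for doc.title()
        System.out.println(firstNonBlank(ogTitle, htmlTitle)); // Example Domain
    }
}
```

With a helper like this, the title lookup becomes a single call that lists the candidates in order of preference, and adding another fallback (say, a `twitter:title` tag) is just one more argument.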
