Get a page's last modified date using Java

野性不改 2021-01-22 04:14

Is there a standard way to tell when a page was last modified? Currently I am doing this:

URLConnection uCon = url.openConnection();
uCon.setConnectTimeout(500         


        
2 Answers
  • 2021-01-22 04:34

    There is no standard. Dynamically generated web pages generally do not send a Last-Modified header, and different sites show dates in different ways. Some sites show no modification date at all, only a "© <current year>" notice at the bottom. You could try looking for a date near the top or bottom of the page, but reliably extracting it from the page content would have to be site-specific.
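
    Checking whether a server sent the header at all can be sketched roughly as follows. This is only a minimal sketch: the class name, the helper names, the example URL, and the 5-second timeouts are assumptions made for illustration, and it assumes an http(s) URL. `URLConnection.getLastModified()` returns 0 when the header is not known, so that value is treated as "absent".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.Instant;
import java.util.Optional;

public class LastModifiedCheck {

    // getLastModified() reports 0 when the header is not known,
    // so map that sentinel value to an empty Optional.
    static Optional<Instant> fromHeaderMillis(long millis) {
        return millis == 0L ? Optional.empty() : Optional.of(Instant.ofEpochMilli(millis));
    }

    // Sends a HEAD request so only the headers are transferred, not the body.
    // Assumes an http(s) URL; other schemes would fail the cast below.
    static Optional<Instant> fetchLastModified(URL url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        try {
            con.setRequestMethod("HEAD");
            con.setConnectTimeout(5000); // illustrative timeout, in milliseconds
            con.setReadTimeout(5000);    // illustrative timeout, in milliseconds
            return fromHeaderMillis(con.getLastModified());
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://example.com/"); // placeholder URL
        System.out.println(fetchLastModified(url)
                .map(Instant::toString)
                .orElse("no Last-Modified header sent"));
    }
}
```

    An empty result only means the server did not report a date. Any fallback, such as scraping a date from the page body, would still be site-specific.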

  • 2021-01-22 04:44

    From HTTP/1.1: Header Field Definitions:

    14.29 Last-Modified

    The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.

       Last-Modified  = "Last-Modified" ":" HTTP-date
    

    An example of its use is

       Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT
    

    The exact meaning of this header field depends on the implementation of the origin server and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update time stamp of the record. For virtual objects, it may be the last time the internal state changed.

    An origin server MUST NOT send a Last-Modified date which is later than the server's time of message origination. In such cases, where the resource's last modification would indicate some time in the future, the server MUST replace that date with the message origination date.

    An origin server SHOULD obtain the Last-Modified value of the entity as close as possible to the time that it generates the Date value of its response. This allows a recipient to make an accurate assessment of the entity's modification time, especially if the entity changes near the time that the response is generated.

    HTTP/1.1 servers SHOULD send Last-Modified whenever feasible.

    It follows that Last-Modified is optional, and that its value depends on the nature of the original resource.
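
    If you read the raw header value yourself, for example via `getHeaderField("Last-Modified")`, the HTTP-date format shown above can be parsed with the JDK's built-in RFC 1123 formatter. The sketch below is hedged: the class and method names are made up for illustration, and it treats a missing or malformed value as "unknown" rather than failing.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

public class HttpDateParser {

    // Parses a Last-Modified header value such as
    // "Tue, 15 Nov 1994 12:45:26 GMT". Returns empty when the header
    // is absent (null) or does not match the RFC 1123 date format.
    static Optional<Instant> parseLastModified(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(
                    headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return Optional.of(parsed.toInstant());
        } catch (DateTimeException e) {
            // Malformed date: treat it the same as a missing header.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLastModified("Tue, 15 Nov 1994 12:45:26 GMT"));
        System.out.println(parseLastModified(null));
    }
}
```

    Returning an `Optional` keeps the "header is optional" case explicit at the call site, so callers have to decide what to do when no date is available.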
