InputStream from a URL

死守一世寂寞 2020-12-02 14:41

How do I get an InputStream from a URL?

For example, I want to take the file at the URL wwww.somewebsite.com/a.txt and read it as an InputStream in Java.

6 Answers
  • 2020-12-02 15:21

    Here is a full example that reads the contents of a given web page. The URL of the page is read from an HTML form. We use the standard InputStream classes, but it could be done more easily with the JSoup library.

    <dependency>
        <groupId>javax.servlet</groupId>
        <artifactId>javax.servlet-api</artifactId>
        <version>3.1.0</version>
        <scope>provided</scope>
    </dependency>
    
    <dependency>
        <groupId>commons-validator</groupId>
        <artifactId>commons-validator</artifactId>
        <version>1.6</version>
    </dependency>  
    

    These are the Maven dependencies. We use the Apache Commons Validator library to validate URL strings.

    package com.zetcode.web;
    
    import com.zetcode.service.WebPageReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import javax.servlet.ServletException;
    import javax.servlet.ServletOutputStream;
    import javax.servlet.annotation.WebServlet;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    
    @WebServlet(name = "ReadWebPage", urlPatterns = {"/ReadWebPage"})
    public class ReadWebpage extends HttpServlet {
    
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
    
            response.setContentType("text/plain;charset=UTF-8");
    
            String page = request.getParameter("webpage");
    
            String content = new WebPageReader().setWebPageName(page).getWebPageContent();
    
            ServletOutputStream os = response.getOutputStream();
            os.write(content.getBytes(StandardCharsets.UTF_8));
        }
    }
    

    The ReadWebPage servlet reads the contents of the given web page and sends it back to the client in plain text format. The task of reading the page is delegated to WebPageReader.

    package com.zetcode.service;
    
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import java.util.stream.Collectors;
    import org.apache.commons.validator.routines.UrlValidator;
    
    public class WebPageReader {
    
        private String webpage;
        private String content;
    
        public WebPageReader setWebPageName(String name) {
    
            webpage = name;
            return this;
        }
    
        public String getWebPageContent() {
    
            try {
    
                boolean valid = validateUrl(webpage);
    
                if (!valid) {
    
                    content = "Invalid URL; use http(s)://www.example.com format";
                    return content;
                }
    
                URL url = new URL(webpage);
    
                try (InputStream is = url.openStream();
                        BufferedReader br = new BufferedReader(
                                new InputStreamReader(is, StandardCharsets.UTF_8))) {
    
                    content = br.lines().collect(
                          Collectors.joining(System.lineSeparator()));
                }
    
            } catch (IOException ex) {
    
                content = String.format("Cannot read webpage %s", ex);
                Logger.getLogger(WebPageReader.class.getName()).log(Level.SEVERE, null, ex);
            }
    
            return content;
        }
    
        private boolean validateUrl(String webpage) {
    
            UrlValidator urlValidator = new UrlValidator();
    
            return urlValidator.isValid(webpage);
        }
    }
    

    WebPageReader validates the URL and reads the contents of the web page. It returns a string containing the HTML code of the page.

    <!DOCTYPE html>
    <html>
        <head>
            <title>Home page</title>
            <meta charset="UTF-8">
        </head>
        <body>
            <form action="ReadWebPage">
    
                <label for="page">Enter a web page name:</label>
                <input  type="text" id="page" name="webpage">
    
                <button type="submit">Submit</button>
    
            </form>
        </body>
    </html>
    

    Finally, this is the home page containing the HTML form. This is taken from my tutorial about this topic.

  • 2020-12-02 15:24

    Pure Java:

    urlToInputStream(url, httpHeaders);
    

    I have used this method with some success. It handles redirects, including redirects from HTTP to HTTPS, and accepts any number of HTTP headers as a Map<String, String>.

    private InputStream urlToInputStream(URL url, Map<String, String> args) {
        HttpURLConnection con = null;
        InputStream inputStream = null;
        try {
            con = (HttpURLConnection) url.openConnection();
            con.setConnectTimeout(15000);
            con.setReadTimeout(15000);
            if (args != null) {
                for (Entry<String, String> e : args.entrySet()) {
                    con.setRequestProperty(e.getKey(), e.getValue());
                }
            }
            con.connect();
            int responseCode = con.getResponseCode();
            /* By default the connection follows redirects. The following
             * block is only entered if the implementation of HttpURLConnection
             * does not perform the redirect. The exact behavior depends on
             * the actual implementation (e.g. sun.net).
             * Attention: this block allows the connection to switch
             * protocols (e.g. HTTP to HTTPS), which is not the default
             * behavior. See https://stackoverflow.com/questions/1884230
             * for more info.
             */
            if (responseCode < 400 && responseCode > 299) {
                String redirectUrl = con.getHeaderField("Location");
                // Resolve against the current URL: an absolute Location is used
                // as-is, a relative one keeps the current protocol, host, and port.
                URL newUrl = new URL(url, redirectUrl);
                return urlToInputStream(newUrl, args);
            }
    
            inputStream = con.getInputStream();
            return inputStream;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
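
    The step that turns the Location header into the next URL can be checked on its own, without a network. The following is only a sketch of that step using URI.resolve; RedirectResolver and resolveRedirect are hypothetical names that do not appear in the method above, and example.com is a placeholder host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class RedirectResolver {

    // Hypothetical helper: resolves a Location header value against the URL
    // that produced the redirect. An absolute value replaces the current URL;
    // a relative value inherits its scheme, host, and port.
    static URL resolveRedirect(URL current, String location)
            throws URISyntaxException, MalformedURLException {
        return current.toURI().resolve(location).toURL();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL used only for illustration.
        URL current = URI.create("http://example.com:8080/docs/a.txt").toURL();
        System.out.println(resolveRedirect(current, "/b.txt"));
        System.out.println(resolveRedirect(current, "https://example.com/b.txt"));
    }
}
```

    URI.resolve throws IllegalArgumentException for a syntactically invalid Location value, so a caller would probably want to treat that case as a failed request.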
    

    Full example call

    private InputStream getInputStreamFromUrl(URL url, String user, String passwd) throws IOException {
        String encoded = Base64.getEncoder().encodeToString((user + ":" + passwd).getBytes(StandardCharsets.UTF_8));
        // Map is an interface, so instantiate a concrete implementation.
        Map<String, String> httpHeaders = new HashMap<>();
        httpHeaders.put("Accept", "application/json");
        httpHeaders.put("User-Agent", "myApplication");
        httpHeaders.put("Authorization", "Basic " + encoded);
        return urlToInputStream(url, httpHeaders);
    }
    
  • 2020-12-02 15:44

    Your original code uses FileInputStream, which is for accessing files hosted on the file system.

    The constructor you used will attempt to locate a file named a.txt in the www.somewebsite.com subfolder of the current working directory (the value of system property user.dir). The name you provide is resolved to a file using the File class.

    URL objects are the generic way to solve this. You can use URLs to access local files as well as network-hosted resources. The URL class supports the file:// protocol in addition to http:// and https://, so you're good to go.
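
    To illustrate that one code path serves both local and remote resources, here is a minimal sketch assuming Java 11 or later. FileUrlDemo and readAll are hypothetical names, and the temporary file merely stands in for a.txt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileUrlDemo {

    // Reads everything behind a URL as UTF-8 text. The same call works for
    // file:, http:, and https: URLs.
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local file standing in for a.txt.
        Path file = Files.createTempFile("a", ".txt");
        Files.writeString(file, "hello from a file URL");
        URL url = file.toUri().toURL(); // a file: URL
        System.out.println(readAll(url));
        Files.delete(file);
    }
}
```

    Passing an http:// or https:// URL to readAll would go through the same openStream() call.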

  • 2020-12-02 15:45

    (a) wwww.somewebsite.com/a.txt isn't a 'file URL'. It isn't a URL at all. If you put http:// in front of it, it would be an HTTP URL, which is clearly what you intend here.

    (b) FileInputStream is for files, not URLs.

    (c) The way to get an input stream from any URL is via URL.openStream(), or URL.openConnection().getInputStream(), which is equivalent, but you might have other reasons to get the URLConnection and play with it first.
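
    One reason to go through the URLConnection first is to set timeouts or request headers before any data is read. A minimal sketch of that variant follows; ConnectionDemo, open, and the chosen header are assumptions for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class ConnectionDemo {

    // Hypothetical helper: configures the connection, then returns its stream.
    // The caller is responsible for closing the returned stream.
    static InputStream open(URL url, int timeoutMillis) throws IOException {
        URLConnection con = url.openConnection();
        con.setConnectTimeout(timeoutMillis);             // limit time to establish the connection
        con.setReadTimeout(timeoutMillis);                // limit time waiting for data
        con.setRequestProperty("Accept", "text/plain");   // must be set before connecting
        return con.getInputStream();                      // connects implicitly if needed
    }
}
```

    A call such as open(url, 15000) inside a try-with-resources block keeps the stream from leaking if reading fails.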

  • 2020-12-02 15:46

    Use java.net.URL#openStream() with a proper URL (including the protocol!). E.g.

    InputStream input = new URL("http://www.somewebsite.com/a.txt").openStream();
    // ...
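
    What follows the `// ...` depends on the use case, but the stream should be closed in every case. As one hedged illustration, the sketch below saves the content behind a URL to a local file; DownloadDemo and download are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class DownloadDemo {

    // Copies everything behind the URL into the target file and returns the
    // number of bytes written. try-with-resources closes the stream even if
    // the copy fails midway.
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```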
    

    See also:

    • Using java.net.URLConnection to fire and handle HTTP requests
  • 2020-12-02 15:46

    Try:

    final InputStream is = new URL("http://wwww.somewebsite.com/a.txt").openStream();
    