Removing URLs from text using Java

梦毁少年i 2020-12-15 07:59

How can I remove the URLs present in a piece of text? For example:

    String str = "Fear psychosis after #AssamRiots - http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";


        
7 answers
  • 2020-12-15 08:21

    You haven't said much about your text, so I'll assume it looks like this: "Some text here http://www.example.com some text there". In that case you can do this:

    String yourText = "blah-blah";
    String cleartext = yourText.replaceAll("http.*?(\\s|$)", " ");
    

    This removes every sequence that starts with "http" and runs up to the next whitespace character or, for a URL at the very end of the text, up to the end of the string.

    You should read the Javadoc for the String class. It will make things clear for you.
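
    As a sketch of the same idea, the version below requires the "://" after the scheme and matches a run of non-whitespace characters, so it also catches a URL that sits at the very end of the string (as in the question's example). The names HttpUrlStripper and stripHttpUrls are illustrative and not part of the answer, and collapsing the leftover whitespace is an extra assumption about the desired output.

```java
final class HttpUrlStripper {
    // Removes every http:// or https:// token, then tidies the leftover spaces.
    static String stripHttpUrls(String text) {
        return text.replaceAll("https?://\\S+", "")  // drop each URL token
                   .replaceAll("\\s{2,}", " ")       // collapse runs of whitespace
                   .trim();
    }

    public static void main(String[] args) {
        String str = "Fear psychosis after #AssamRiots - "
                + "http://www.google.com/LdEbWTgD http://www.yahoo.com/mksVZKBz";
        System.out.println(stripHttpUrls(str));
    }
}
```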

  • 2020-12-15 08:24

    How do you define a URL? You might not want to filter just http:// but also https:// and other protocols such as ftp://, rss:// or custom protocols.

    Maybe this regular expression would do the job:

    [\S]+://[\S]+

    Explanation:

    • one or more non-whitespace characters
    • followed by the string "://"
    • followed by one or more non-whitespace characters
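
    Applied in Java, the expression above might look like the following sketch. The class and method names are hypothetical. The STRICT pattern is an added assumption rather than part of the answer: it restricts the scheme to a letter followed by letters, digits, '+', '-' or '.', so that punctuation glued to the front of a URL is not swallowed along with it.

```java
import java.util.regex.Pattern;

final class AnySchemeUrlStripper {
    // Loose form from the answer: any non-whitespace run containing "://".
    private static final Pattern LOOSE = Pattern.compile("\\S+://\\S+");

    // Stricter variant (assumption): the scheme must start with a letter and
    // may contain only letters, digits, '+', '-' or '.'.
    private static final Pattern STRICT =
            Pattern.compile("\\b[A-Za-z][A-Za-z0-9+.-]*://\\S+");

    static String stripLoose(String text) {
        return LOOSE.matcher(text).replaceAll("");
    }

    static String stripStrict(String text) {
        return STRICT.matcher(text).replaceAll("");
    }
}
```

    With the input "see (https://example.com/a b)", stripLoose also removes the opening parenthesis, while stripStrict leaves it in place.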
  • 2020-12-15 08:29

    If you can switch to Python, the same thing takes only a few lines:

    import re
    text = "<hello how are you ?> then ftp and mailto and gopher and file ftp://ideone.com/K3Cut rthen you "
    # Remove every token that starts with "ftp" and continues with non-whitespace characters
    result = re.sub(r"ftp\S+", "", text)
    print(result)
    
  • 2020-12-15 08:37

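    It is m.group(0), the whole match, that should be replaced with an empty string, not m.group(i) with an i that is incremented on every call to m.find(), as one of the other answers does. group(i) returns the i-th capture group of the current match, not the i-th match.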

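    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private String removeUrl(String commentstr)
    {
        String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        StringBuffer sb = new StringBuffer(commentstr.length());
        while (m.find()) {
            // Copy the text before this match and replace the match itself with nothing
            m.appendReplacement(sb, "");
        }
        // Copy whatever follows the last match; without this call the tail of the text is lost
        m.appendTail(sb);
        return sb.toString();
    }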
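
    To illustrate why the group index matters: in java.util.regex.Matcher, group(0) is always the whole current match, while group(1), group(2), ... are the parenthesized sub-expressions of that same match, not later matches. The sketch below uses a simplified two-group pattern and hypothetical names (GroupIndexDemo, firstMatchGroups) to show the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupIndexDemo {
    // Returns { group(0), group(1), group(2) } for the first URL in text,
    // or null when the text contains no URL.
    static String[] firstMatchGroups(String text) {
        Matcher m = Pattern.compile("(https?)://(\\S+)").matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Prints the whole URL, then the scheme, then the rest of the URL.
        for (String g : firstMatchGroups("go to http://www.google.com/LdEbWTgD now")) {
            System.out.println(g);
        }
    }
}
```

    Removing group(1) here would delete only the scheme, which is why an index that grows with each find() call gives wrong results as soon as it moves past 0.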
    
  • 2020-12-15 08:39

    Pass in the string that contains the URLs:

    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private String removeUrl(String commentstr)
    {
        String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        while (m.find()) {
            // group(0) is the whole matched URL; replace() treats it as literal text, not as a regex
            commentstr = commentstr.replace(m.group(0),"").trim();
        }
        return commentstr;
    }
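
    If no per-match work is needed, the loop can be avoided altogether, since Matcher.replaceAll substitutes every match in one pass. The following is a minimal sketch with a deliberately simplified pattern, not the one used in the answer; UrlRemover and removeUrls are illustrative names.

```java
import java.util.regex.Pattern;

final class UrlRemover {
    // Simplified scheme list for illustration; extend it as needed.
    private static final Pattern URL = Pattern.compile(
            "(?:https?|ftp|gopher|telnet|file)://\\S+", Pattern.CASE_INSENSITIVE);

    static String removeUrls(String text) {
        // Replace every match with the empty string, then trim the ends.
        return URL.matcher(text).replaceAll("").trim();
    }
}
```

    Compiling the pattern once into a static field also avoids recompiling it on every call.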
    
  • 2020-12-15 08:42

    Note that if your URL contains characters like ? and &, passing the matched URL straight to replaceAll will not work, because replaceAll treats its first argument as a regular expression, where ? is a quantifier. What worked for me was to strip those characters from a copy of the string, strip them from each result of m.find() as well, and then call replaceAll on the copy.

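    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private String removeUrl(String commentstr)
    {
        // get rid of ? and & since replaceAll would treat the matched URL as a regex
        // (note: this also strips them from the non-URL parts of the text)
        String commentstr1 = commentstr.replaceAll("\\?", "").replaceAll("\\&", "");
    
        String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+-=\\\\\\.&]*)";
        Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(commentstr);
        while (m.find()) {
            // group(0) is the whole matched URL; strip the same characters from it
            String url = m.group(0).replaceAll("\\?", "").replaceAll("\\&", "");
            // accumulate into commentstr1 so every URL is removed, not only the last one;
            // other metacharacters in a URL, such as ( or +, are still read as regex syntax
            commentstr1 = commentstr1.replaceAll(url, "").trim();
        }
        return commentstr1;
    }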
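
    An alternative that leaves the rest of the text untouched is to treat the matched URL as literal text. The sketch below shows two standard ways to do that: String.replace, which never interprets its arguments as a regex, and Pattern.quote, which escapes a string for use with replaceAll. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // Removes every occurrence of url from text, treating url as plain text.
    static String removeLiteral(String text, String url) {
        return text.replace(url, "");
    }

    // Same result via replaceAll, with the URL quoted so that '?', '.', '+'
    // and similar characters lose their regex meaning.
    static String removeQuoted(String text, String url) {
        return text.replaceAll(Pattern.quote(url), "");
    }
}
```

    For example, with the text "x http://a.com/p?q=1&r=2 y" and the URL "http://a.com/p?q=1&r=2", both methods remove the URL, whereas an unquoted replaceAll leaves the text unchanged because "p?" is read as an optional "p".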
    