How to convert escape characters in HTML tags?

忘了有多久 2020-12-07 04:24

How can we directly convert "\u003chtml\u003e" to "<html>"? Conversion of "<html>" to "\u003chtml\u003e"

3 Answers
  • 2020-12-07 04:48

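    You can use the fmt package for this purpose.

    // Prints <html>. Note that the compiler has already decoded the \u escapes,
    // because this is an interpreted (double-quoted) string literal.
    fmt.Printf("%v", "\u003chtml\u003e")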
    

    https://play.golang.org/p/ZEot6bxO1H

  • 2020-12-07 05:02

    You can use the strconv.Unquote() to do the conversion.

    Be aware that strconv.Unquote() can only unquote strings that are in quotes (i.e. that start and end with a quote char " or a back quote char `), so we have to add the quotes manually.

    Example:

    // Important to use backtick ` (raw string literal)
    // else the compiler will unquote it (interpreted string literal)!
    
    s := `\u003chtml\u003e`
    fmt.Println(s)
    s2, err := strconv.Unquote(`"` + s + `"`)
    if err != nil {
        panic(err)
    }
    fmt.Println(s2)
    

    Output:

    \u003chtml\u003e
    <html>
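
The quoting step above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name unescapeUnicode is hypothetical, and it assumes the input contains no unescaped double quotes (those make strconv.Unquote return an error).

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeUnicode is a hypothetical helper: it wraps s in double quotes and
// lets strconv.Unquote decode escapes such as \u003c.
// Assumption: s contains no unescaped " characters.
func unescapeUnicode(s string) (string, error) {
	return strconv.Unquote(`"` + s + `"`)
}

func main() {
	out, err := unescapeUnicode(`\u003chtml\u003e`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <html>

	// An unescaped double quote ends the quoted string early, so Unquote fails.
	_, err = unescapeUnicode(`say "hi"`)
	fmt.Println(err != nil) // true
}
```

Returning the error instead of panicking lets the caller decide how to handle malformed input.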
    

    Note: To do HTML text escaping and unescaping, you can use the html package. Quoting its doc:

    Package html provides functions for escaping and unescaping HTML text.

    But the html package (specifically html.UnescapeString()) does not decode Unicode escape sequences of the form \uxxxx; it only decodes HTML entities such as &#decimal;, &#xHH;, or named entities like &lt;.

    Example:

    fmt.Println(html.UnescapeString(`\u003chtml\u003e`)) // wrong
    fmt.Println(html.UnescapeString(`&#60;html&#62;`))   // good
    fmt.Println(html.UnescapeString(`&#x3c;html&#x3e;`)) // good
    

    Output:

    \u003chtml\u003e
    <html>
    <html>
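
For comparison, here is a short sketch of what the html package is meant for: a round trip through HTML entities. The helper name htmlRoundTrip is hypothetical and only exists to keep the example self-contained.

```go
package main

import (
	"fmt"
	"html"
)

// htmlRoundTrip is a hypothetical helper that escapes s into HTML entities
// and then unescapes it again, returning both intermediate results.
func htmlRoundTrip(s string) (escaped, restored string) {
	escaped = html.EscapeString(s)
	restored = html.UnescapeString(escaped)
	return escaped, restored
}

func main() {
	escaped, restored := htmlRoundTrip("<html>")
	fmt.Println(escaped)  // &lt;html&gt;
	fmt.Println(restored) // <html>
}
```

Note that the escaped form uses named entities (&lt; and &gt;), not \u003c and \u003e, which is why this package does not help with the question as asked.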
    

    Note #2:

    You should also note that if you write code like this:

    s := "\u003chtml\u003e"
    

    This quoted string will be unquoted by the compiler itself because it is an interpreted string literal, so you can't really test unquoting with it. To put the escaped form into the source, either use backticks to write a raw string literal, or escape the backslashes in an interpreted string literal:

    s := "\u003chtml\u003e" // Interpreted string literal (unquoted by the compiler!)
    fmt.Println(s)
    
    s2 := `\u003chtml\u003e` // Raw string literal (no unquoting will take place)
    fmt.Println(s2)
    
    s3 := "\\u003chtml\\u003e" // Interpreted string literal with escaped backslashes
                               // (the compiler turns each \\ into a single \)
    fmt.Println(s3)
    

    Output:

    <html>
    \u003chtml\u003e
    \u003chtml\u003e
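
One way to see the difference between the literal forms is to compare their byte lengths. This is a small illustrative sketch; literalLengths is a hypothetical name.

```go
package main

import "fmt"

// literalLengths returns the byte lengths of the same text written as an
// interpreted literal, a raw literal, and an interpreted literal with an
// escaped backslash.
func literalLengths() (interpreted, raw, doubleEscaped int) {
	return len("\u003c"), len(`\u003c`), len("\\u003c")
}

func main() {
	i, r, d := literalLengths()
	// The interpreted literal is the single byte '<'; the other two hold the
	// six characters \ u 0 0 3 c.
	fmt.Println(i, r, d) // 1 6 6
}
```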
    
  • 2020-12-07 05:10

    I think this is a common problem. This is how I got it to work.

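    // Requires the packages encoding/json, fmt, strconv and strings.
    //
    // _UnescapeUnicodeCharactersInJSON decodes \uXXXX escapes in raw JSON.
    // strconv.Quote doubles every backslash, so `\\u` is turned back into `\u`
    // before strconv.Unquote decodes the escapes.
    // Caveat: escapes such as \u0022 (a double quote) are decoded as well,
    // which can leave the result invalid as JSON.
    func _UnescapeUnicodeCharactersInJSON(_jsonRaw json.RawMessage) (json.RawMessage, error) {
        str, err := strconv.Unquote(strings.ReplaceAll(strconv.Quote(string(_jsonRaw)), `\\u`, `\u`))
        if err != nil {
            return nil, err
        }
        return []byte(str), nil
    }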
    
    func main() {
        // Both are valid JSON.
        var jsonRawEscaped json.RawMessage   // json raw with escaped unicode chars
        var jsonRawUnescaped json.RawMessage // json raw with unescaped unicode chars
    
        // '\u263a' == '☺'
        jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
        jsonRawUnescaped, _ = _UnescapeUnicodeCharactersInJSON(jsonRawEscaped)                        // "☺"
    
        fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
        fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
    }
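
As an alternative to string replacement, the same result can probably be reached with encoding/json alone. The sketch below is an assumption-laden variant, not the answer's method: marshalString and reencodeJSON are hypothetical names. It covers both directions: json.Marshal escapes < and > as \u003c and \u003e by default, and an Encoder with SetEscapeHTML(false) writes them (and non-ASCII characters) literally.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalString (hypothetical helper) returns s as a JSON string literal.
// By default encoding/json escapes <, > and & as \u003c, \u003e and \u0026.
func marshalString(s string) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

// reencodeJSON (hypothetical helper) decodes raw and encodes it again with
// HTML escaping turned off, so \uXXXX escapes come out as literal characters.
// Assumptions: numbers pass through float64, object keys are re-sorted, and
// insignificant whitespace is dropped.
func reencodeJSON(raw json.RawMessage) (json.RawMessage, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a newline; trim it off.
	return bytes.TrimSpace(buf.Bytes()), nil
}

func main() {
	quoted, err := marshalString("<html>")
	if err != nil {
		panic(err)
	}
	fmt.Println(quoted) // "\u003chtml\u003e"

	out, err := reencodeJSON([]byte(`{"tag": "\u003chtml\u003e \u263a"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"tag":"<html> ☺"}
}
```

Because the document is fully decoded and re-encoded, the output always stays valid JSON, at the cost of losing the original formatting and key order.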
    

    https://play.golang.org/p/pUsrzrrcDG-

    Hope this helps someone.
