Remove/replace html tags in bash

梦毁少年i 2021-01-02 08:30

I have a file with lines that contain:

  <li><b> Some Text:</b> More Text </li>

I want to remove the HTML tags and replace the </b> tag with a -.

    2 Answers
    • 2021-01-02 09:06

      One way using GNU sed:

      sed -e 's/<\/b>/-/g' -e 's/<[^>]*>//g' file.txt
      

      Example:

      echo "<li><b> Some Text:</b> More Text </li>" | sed -e 's/<\/b>/-/g' -e 's/<[^>]*>//g'
      

      Result:

       Some Text:- More Text
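
The order of the two expressions matters: if the generic tag-stripping expression ran first, </b> would already be gone before it could be replaced. A minimal sketch of that, assuming the sample line from the example above (the variable names line, right and wrong are only illustrative):

```shell
# Sample input line, taken from the example above.
line='<li><b> Some Text:</b> More Text </li>'

# Replace </b> first, then strip whatever tags remain.
right=$(printf '%s\n' "$line" | sed -e 's/<\/b>/-/g' -e 's/<[^>]*>//g')

# Strip every tag first: </b> is already removed, so the
# second expression has nothing left to match.
wrong=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g' -e 's/<\/b>/-/g')

# Brackets make the leading and trailing spaces visible.
printf '[%s]\n' "$right"   # [ Some Text:- More Text ]
printf '[%s]\n' "$wrong"   # [ Some Text: More Text ]
```

sed applies the -e expressions to each line in the order they are given, which is why the </b> replacement has to come first.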
      
    • 2021-01-02 09:23

      If you strictly want to strip all HTML tags, but at the same time only replace the </b> tag with a -, you can chain two simple sed commands with a pipe:

      cat your_file | sed 's|</b>|-|g' | sed 's|<[^>]*>||g' > stripped_file
      

      This passes the file's contents to the first sed command, which replaces each </b> with a -. Its output is then piped to a second sed that replaces every remaining HTML tag with an empty string. The final output is saved to the new file stripped_file.
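
To try the piped version without touching real data, something like the following could be used. It is only a sketch: the temporary directory and the two sample lines are assumptions made up for illustration, while your_file and stripped_file are the names used in this answer.

```shell
# Hypothetical sample input in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '<li><b> Some Text:</b> More Text </li>' \
              '<li><b> Other:</b> Value </li>' > "$workdir/your_file"

# Stage 1 replaces each </b> with -, stage 2 strips the remaining tags.
cat "$workdir/your_file" | sed 's|</b>|-|g' | sed 's|<[^>]*>||g' > "$workdir/stripped_file"

# Show the result; the original file is left unchanged.
cat "$workdir/stripped_file"
```

Because the output is redirected to a separate file, your_file still contains the original markup afterwards.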

      Using a method similar to the other answer from @Steve, you could also use sed's -e option to chain both expressions into a single (non-piped) command. By adding -i, you can also edit the original file in place, without the need for cat or a new file (GNU sed accepts a bare -i; BSD/macOS sed expects a backup suffix argument, e.g. -i ''):

      sed -i -e 's|</b>|-|g' -e 's|<[^>]*>||g' your_file
      

      This performs the same replacements as the piped commands above, but it overwrites the contents of the input file directly. To save to a new file instead, remove the -i and add > stripped_file (or whatever file name you choose) to the end.
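
Since -i overwrites the input, it may be worth keeping a backup while testing. The sketch below shows both variants side by side; the temporary directory, the sample line and the .bak suffix are assumptions chosen for illustration. Attaching the suffix directly to -i (as in -i.bak) is a form that both GNU and BSD sed accept.

```shell
# Hypothetical sample input in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '<li><b> Some Text:</b> More Text </li>' > "$workdir/your_file"

# Variant 1: write to a new file and leave the input untouched.
sed -e 's|</b>|-|g' -e 's|<[^>]*>||g' "$workdir/your_file" > "$workdir/stripped_file"

# Variant 2: edit in place, keeping the original as your_file.bak.
sed -i.bak -e 's|</b>|-|g' -e 's|<[^>]*>||g' "$workdir/your_file"

# Both variants should produce identical output.
cmp "$workdir/your_file" "$workdir/stripped_file" && echo "identical"
```

Once the result looks right, the .bak file can be deleted.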
