Question
I have a string that contains some HTML-encoded characters, and I want to remove them:
"&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"
Answer 1:
There are many ways to do this, so it helps to consider why you want to. Usually, when I want to remove encoded HTML, what I really want is to recover the content of the HTML. Ruby has some modules that make this easy.
require 'cgi'
require 'nokogiri'
html = "&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt; /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"
puts CGI.unescapeHTML(html)
which outputs:
<div>Hi All,</div><div class="paragraph_break">< /></div><div>Starting today we are initiating PoLS.</div><div class="paragraph_break"><br /></div><div>Please use the following communication protocols:<br /></div><div>1. Task Breakup and allocation - Gravity<br /></div><div>2. All mail communications - BC messages<br /></div><div>3. Reports on PoC / Spikes: Writeboard<br /></div><div>4. Non story related tasks: BC To-Do<br /></div><div>5. All UI and HTML will communicated to you through BC.<br /></div><div>6. For File sharing, we'll be using Dropbox.<br /></div><div>7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.</div><div class="paragraph_break"><br /></div><div>You'll have been given necessary accesses to all these portals. Please start using them judiciously.</div><div class="paragraph_break"><br /></div><div>All the best!</div><div class="paragraph_break"><br /></div><div>Thanks,<br /></div><div>Saurav<br /></div>
To take it a step further and remove the tags, retrieving all the text:
puts Nokogiri::HTML(CGI.unescapeHTML(html)).content
This outputs:
Hi All,Starting today we are initiating PoLS.Please use the following communication protocols:1. Task Breakup and allocation - Gravity2. All mail communications - BC messages3. Reports on PoC / Spikes: Writeboard4. Non story related tasks: BC To-Do5. All UI and HTML will communicated to you through BC.6. For File sharing, we'll be using Dropbox.7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.You'll have been given necessary accesses to all these portals. Please start using them judiciously.All the best!Thanks,Saurav
That is usually where I want to end up when I see that sort of string.
Ruby's CGI makes encoding and decoding HTML easy. The Nokogiri gem makes it easy to remove the tags.
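As a minimal sketch of that round trip, using only the standard library's cgi module: the short sample string below is made up for illustration and stands in for the long string above.

```ruby
require 'cgi'

# A short, made-up fragment standing in for the long string in the question.
encoded = '&lt;div&gt;Tom &amp; Jerry&lt;/div&gt;'

decoded   = CGI.unescapeHTML(encoded) # entities -> literal characters
reencoded = CGI.escapeHTML(decoded)   # literal characters -> entities

puts decoded              # prints <div>Tom & Jerry</div>
puts reencoded == encoded # prints true
```

Note that decoding only turns the entities back into markup. The tags themselves are still there until something like Nokogiri strips them.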
Answer 2:
I would suggest:
clean = str.gsub(/&lt;.+?&gt;/, '')
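A pattern-based removal like this drops the tags but runs the text of adjacent blocks together. Here is a rough sketch of one way to keep the line breaks, assuming the input is entity-encoded as in the question. The helper name strip_encoded_html and the sample string are hypothetical, and a regex is only a stopgap for simple markup like this, not a real HTML parser.

```ruby
require 'cgi'

# Hypothetical helper (not from the answers): decode the entities first,
# then turn <br> and </div> into newlines before stripping the other tags.
def strip_encoded_html(encoded)
  html = CGI.unescapeHTML(encoded)
  html = html.gsub(%r{<br\s*/?>|</div>}i, "\n") # keep visual line breaks
  text = html.gsub(/<[^>]+>/, '')               # drop every remaining tag
  text.gsub(/\n{2,}/, "\n").strip               # collapse runs of newlines
end

# Made-up sample in the same shape as the string in the question.
sample = '&lt;div&gt;Hi All,&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;' \
         '&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;'
puts strip_encoded_html(sample)
```

For anything beyond flat div and br markup, a parser such as Nokogiri is the safer choice, since a regex will trip over things like a literal > inside an attribute value.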
Answer 3:
I think the easiest way to do this, assuming you want to use the HTML in the string (for example, in a Rails view), is:
raw CGI.unescapeHTML('The string you want to manipulate')
Answer 4:
If you have assigned that string to a variable s, is this the result you want?
puts s.gsub(/&lt;[^&]*&gt;/, '')
Source: https://stackoverflow.com/questions/8929006/how-do-i-remove-html-encoded-characters-from-a-string