Question
Hello, I have a long HTML document; this is the only part that interests me:
<iframe class="goog-te-menu-frame skiptranslate" src="javascript:void(0)" frameborder="0" style="display: none; visibility: visible;"></iframe><div class="chatbox3"><div class="chatbox2"><div class="chatbox"><div class="logwrapper" style="top: 89px; margin-right: 168px;"><div class="logbox"><div style="position: relative; min-height: 100%;"><div class="logitem"><p class="statuslog">You're now chatting with a random stranger. Say hi!</p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>hii there</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>nice to meet you</span></p></div><div class="logitem"><p class="strangermsg"><strong class="msgsource">Stranger:</strong> <span>this is a text</span></p></div><div class="logitem"><p class="youmsg"><strong class="msgsource">You:</strong> <span>this text should not be taken</span></p></div><div class="logitem"><p class="statuslog">Stranger has disconnected.</p></div><div class="logitem"><div class="statuslog">
It outputs as follows:
You're now chatting with a random stranger. Say hi!
Stranger: hii there
Stranger: nice to meet you
Stranger: this is a text
You: this text should not be taken
Stranger has disconnected.
I want to extract all messages sent by Stranger into strings (Visual Basic), ignoring messages sent by me and system messages such as "You're now chatting with a random stranger. Say hi!" and "Stranger has disconnected."
I have no idea how to approach this and would appreciate any help. Thank you.
Answer 1:
If anyone else is interested in doing this, I've managed to simplify the process by loading the HTML into a second WebBrowser control through its Document.Body.InnerHtml property and then reading Document.Body.OuterText into a RichTextBox, so I can work with the plain text instead of the HTML code.
OmegleHTML.Text = Omegle.Document.Body.InnerHtml
WebBrowser1.Document.Body.InnerHtml = OmegleHTML.Text
Log.Text = WebBrowser1.Document.Body.OuterText
I've also used the following code to get rid of any irrelevant text before the chat log:
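' Drop everything before the first chat-log line, if that line is present.
' IndexOf returns -1 when the marker is missing, which would make Remove throw.
Dim Eind As Integer = Log.Text.IndexOf("You're now chatting with a random stranger. Say hi!")
If Eind > 0 Then
    Log.Text = Log.Text.Remove(0, Eind)
End If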
This is the closest I've got. If you have a better answer, please post it.
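One possible alternative is to pull the Stranger messages straight out of the HTML rather than out of the flattened text. The sketch below is only an illustration: it assumes every Stranger message has the shape shown in the question (a p element whose only class is strangermsg, containing a span with the message text), and it uses a regular expression, which is fragile if the page markup changes. The names StrangerMessages and ExtractStrangerMessages are made up for this example.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module StrangerMessages
    ' Returns the text of the <span> inside every <p class="strangermsg">.
    ' Assumption: the markup matches the snippet in the question. The quotes
    ' around the class value are optional because some browsers drop them
    ' when serializing InnerHtml.
    Function ExtractStrangerMessages(html As String) As List(Of String)
        Dim messages As New List(Of String)
        Dim pattern As String = "<p\s+class=""?strangermsg""?\s*>.*?<span>(.*?)</span>"
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        For Each m As Match In Regex.Matches(html, pattern, options)
            ' Decode entities such as &amp; back into plain characters.
            messages.Add(WebUtility.HtmlDecode(m.Groups(1).Value))
        Next
        Return messages
    End Function

    Sub Main()
        ' A shortened copy of the chat log from the question.
        Dim html As String = _
            "<div class=""logitem""><p class=""statuslog"">You're now chatting with a random stranger. Say hi!</p></div>" & _
            "<div class=""logitem""><p class=""strangermsg""><strong class=""msgsource"">Stranger:</strong> <span>hii there</span></p></div>" & _
            "<div class=""logitem""><p class=""strangermsg""><strong class=""msgsource"">Stranger:</strong> <span>nice to meet you</span></p></div>" & _
            "<div class=""logitem""><p class=""youmsg""><strong class=""msgsource"">You:</strong> <span>this text should not be taken</span></p></div>" & _
            "<div class=""logitem""><p class=""statuslog"">Stranger has disconnected.</p></div>"

        ' Prints only the two Stranger messages, one per line.
        For Each msg As String In ExtractStrangerMessages(html)
            Console.WriteLine(msg)
        Next
    End Sub
End Module
```

Inside the form from the answer above, the same function could presumably be called as ExtractStrangerMessages(Omegle.Document.Body.InnerHtml), which skips the second WebBrowser and the RichTextBox clean-up. Messages from "You" and the status lines are ignored because their p elements carry the classes youmsg and statuslog instead.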
Source: https://stackoverflow.com/questions/28934060/need-to-extract-text-messages-out-of-an-html-document