Getting the specified content into a buffer in C [closed]

失恋的感觉

3 Answers

名媛妹妹

2021-01-26 06:59

You could try to strip the HTML tags, but this will not work properly if there is text outside the tags you care about. Handling that would require more specific filtering, e.g. checking the name of the surrounding tag.

Untested but should work:

char *html = ...; // pointer to the document's contents (must be writable)
int ip = 0;       // the input position
int op = 0;       // the output position
int in_tag = 0;   // are we inside an HTML tag?
char c;           // current character
while ((c = html[ip++]) != '\0')
{
    if (c == '<')
        in_tag = 1;
    else if (c == '>')
        in_tag = 0;
    else if (c == '\n' || c == '\r') // strip line breaks
        ;
    else if (!in_tag)
        html[op++] = c; // safe in place: op never gets ahead of ip
}
html[op] = '\0';
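
To illustrate the "checking the surrounding tag name" idea mentioned above, here is a minimal sketch that keeps only text found between <body> and </body>. The names strip_outside_body and tag_is are hypothetical helpers made up for this example, and the sketch assumes simple markup: it does not handle comments, scripts, or '<' and '>' inside attribute values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive check that the tag whose name
 * starts at p (the character right after '<') is called `name`.
 * `name` must be given in lowercase. */
static int tag_is(const char *p, const char *name)
{
    size_t n = strlen(name);
    for (size_t i = 0; i < n; i++) {
        /* A premature '\0' in p never matches, so we cannot overrun. */
        if (tolower((unsigned char)p[i]) != name[i])
            return 0;
    }
    /* The name must end here: "<body>" or "<body class=...>",
     * but not "<bodyguard>". */
    unsigned char next = (unsigned char)p[n];
    return next == '>' || next == '/' || isspace(next);
}

/* Hypothetical helper: strips tags and line breaks in place and keeps
 * only the text that appears between <body> and </body>. */
void strip_outside_body(char *html)
{
    assert(html != NULL);
    size_t ip = 0;   /* input position */
    size_t op = 0;   /* output position, never ahead of ip */
    int in_tag = 0;  /* inside <...>? */
    int in_body = 0; /* between <body> and </body>? */
    char c;

    while ((c = html[ip]) != '\0') {
        if (c == '<') {
            in_tag = 1;
            if (tag_is(&html[ip + 1], "body"))
                in_body = 1;
            else if (tag_is(&html[ip + 1], "/body"))
                in_body = 0;
        } else if (c == '>') {
            in_tag = 0;
        } else if (!in_tag && in_body && c != '\n' && c != '\r') {
            html[op++] = c;
        }
        ip++;
    }
    html[op] = '\0';
}
```

Called on a writable buffer holding a document with a title in the head and some paragraphs in the body, this should leave only the body text in the buffer. For real-world HTML a proper parser is still the safer choice.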


后悔当初

2021-01-26 07:23

Basically you want to scan the buffer and ignore everything that is between < and >:

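// Copies the text outside <...> from src to dst. dst must be large enough
// to hold the result (strlen(src) + 1 bytes is always sufficient).
// Returns a pointer to the terminating '\0' written to dst.
char *get_text (char *dst, const char *src) {
  int html = 0; // nonzero while inside a tag
  char ch;

  while ((ch = *src++) != '\0') {
    if (ch == '<' || ch == '>') {
      html = (ch == '<');
    } else if (!html) {
      *dst++ = ch;
    }
  }

  *dst = '\0';
  return dst;
}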
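
The function above trusts the caller to supply a large enough destination. As a hedged sketch of one way to make that explicit, the variant below takes the destination size and truncates instead of overflowing. The name get_text_n and its return convention (number of characters written) are assumptions made for this example, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded variant: copies the text outside <...> from src
 * into dst, writing at most dst_size bytes including the terminating
 * '\0'. Returns the number of characters written, not counting the '\0'.
 * Output is silently truncated if dst is too small. */
size_t get_text_n(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t n = 0;   /* characters written so far */
    int in_tag = 0; /* nonzero while inside a tag */
    char ch;

    while ((ch = *src++) != '\0') {
        if (ch == '<') {
            in_tag = 1;
        } else if (ch == '>') {
            in_tag = 0;
        } else if (!in_tag) {
            if (n + 1 >= dst_size) /* keep one byte for the '\0' */
                break;
            dst[n++] = ch;
        }
    }

    dst[n] = '\0';
    return n;
}
```

A typical call would be get_text_n(out, sizeof out, html) with out declared as a local char array. Returning a count rather than a pointer is just a design choice here; it makes it easy to append further text at out + n.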


遥遥无期

2021-01-26 07:25

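Use libexpat, a stream-oriented XML parser written in C. You can register start and end element handlers, watch for the BODY tag, and read its content through a character data handler. Note that this only works if the document is well-formed XML.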

Have a look at this question: "Geting xml data using xml parser expat".
