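The previous code is a starting point. The next step is to delete the HTML tags with regular expressions. Look at the ereg and eregi functions. Note that these were deprecated in PHP 5.3 and removed in PHP 7.0, so on current PHP use preg_replace instead. Style and script tags need some extra tricks: you have to remove their content too, not just the tags.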
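Here is a minimal sketch of the tag-stripping step, written in Python only for illustration. The function name `strip_html` is hypothetical. It first removes whole `<script>`/`<style>` blocks, including what is inside them, and then removes any remaining tags. It is a rough regex approach, not a real HTML parser, so unusual markup (comments, `>` inside attribute values) can fool it. In PHP the same patterns can be passed to preg_replace, for example `preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html)`.

```python
import re

# Whole <script>...</script> and <style>...</style> blocks, content included.
# IGNORECASE handles <SCRIPT>; DOTALL lets ".*?" span several lines.
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

# Any remaining tag: "<" followed by everything up to the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def strip_html(html: str) -> str:
    """Return the text of an HTML fragment (hypothetical helper, regex sketch)."""
    # Drop script/style blocks first so their content never reaches the output.
    text = SCRIPT_STYLE_RE.sub(" ", html)
    # Replace every other tag with a space so words on either side stay separate.
    text = TAG_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the removed tags.
    return " ".join(text.split())


print(strip_html("<p>Hello <b>world</b></p><script>var x = 1;</script>"))
```

The script/style pattern has to run first. If the generic tag pattern ran first, it would remove only the `<script>` and `</script>` tags and leave the JavaScript text behind.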
Periods and commas have to be removed too...
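Here is a minimal sketch of the punctuation step, again in Python for illustration. The function name `remove_punctuation` is hypothetical. It deletes periods and commas with one character class, and other punctuation can be added inside the brackets. The PHP equivalent would be `preg_replace('/[.,]/', '', $text)`.

```python
import re

# Characters to delete: period and comma. Inside [...] the "." is literal.
PUNCT_RE = re.compile(r"[.,]")


def remove_punctuation(text: str) -> str:
    """Delete periods and commas from text (hypothetical helper)."""
    return PUNCT_RE.sub("", text)


print(remove_punctuation("Hello, world. Bye."))
```

Deleting the characters outright, without replacing them with spaces, works here because periods and commas are normally followed by a space in running text.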