First part of question: p tag
I have a string that contains text with unnecessary line breaks caused by p tags, example:
hi ev
Try using str_replace:
$content = str_replace(array("<p> </p>\n", " <br />\n"), array('', ''), $content);
To use regex:
$content = preg_replace('/((<p\s*\/?>\s*) (<\/p\s*\/?>\s*))+/im', "<p> </p>\n", $content);
And for BRs:
$content = preg_replace('/( (<br\s*\/?>\s*)|(<br\s*\/?>\s*))+/im', "<br />\n", $content);
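As a rough sketch of what the two calls above do together, here they are run on a made-up sample string (the sample input is my own assumption, not your real content). Each run of empty paragraphs, and each run of line breaks, should end up as a single one:

```php
<?php
// Hypothetical sample input: three empty paragraphs in a row,
// followed by a paragraph containing three consecutive line breaks.
$content = "<p>Intro</p>\n<p> </p>\n<p> </p>\n<p> </p>\n<p>Body<br />\n<br />\n<br />\ntext</p>\n";

// Collapse each run of empty paragraphs into a single empty paragraph.
$content = preg_replace('/((<p\s*\/?>\s*) (<\/p\s*\/?>\s*))+/im', "<p> </p>\n", $content);

// Collapse each run of line breaks into a single line break.
$content = preg_replace('/( (<br\s*\/?>\s*)|(<br\s*\/?>\s*))+/im', "<br />\n", $content);

echo $content;
```

Paragraphs that contain real text are left alone, because the pattern only matches a p tag whose entire content is a single space.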
EDIT: Here's why your regex works (hopefully so you can understand it a bit :) ):
/((\\n\s*))+/im
^  ^^^^^^  ^^^^
|   |  ||  || |
|   |  ||  || +-- Flags
|   |  ||  |+---- Regex end delimiter
|   |  ||  +----- One or more of the preceding group
|   |  |+-------- Zero or more of the preceding character
|   |  +--------- Whitespace character (\s)
|   +------------ Newline character (escaped)
+---------------- Regex start delimiter
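Every regex passed to the preg_* functions must be wrapped in delimiters, normally the same character at the start and at the end. In this case, I've used the forward slash.
The ( and ) characters group part of the expression (a capturing group), so that a quantifier such as + applies to the whole group rather than to a single character.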
The newline character is \n. Because the backslash is also an escape character inside a PHP string literal, it is written here as \\n; in a single-quoted string both \\n and \n reach the regex engine as \n.
\s matches a single whitespace character (space, tab, newline and so on), not a string. The * character means zero or more of the preceding expression, so \s* matches zero or more whitespace characters.
The + symbol searches for ONE or more of the preceding expression. In this case, the preceding expression is the group (\\n\s*), so as long as that group is found once or more, the preg_replace function will find something to replace.
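The flags I've used, i and m, mean case-*I*nsensitive (not really needed for a newline expression) and *M*ultiline. Multiline makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string. Since this expression has no ^ or $, the m flag makes no difference here either; \n and \s* already match across lines without it.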
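To check the points above yourself, here is a small sketch. The sample text and the variable names are made up for illustration. It shows that '\\n' and '\n' are the same thing in a single-quoted string, that the pattern collapses runs of blank lines, and that the m flag only changes how ^ behaves:

```php
<?php
// In a single-quoted PHP string, '\\n' and '\n' are the same two characters
// (a backslash followed by n); the regex engine then reads them as a newline.
$sameEscape = ('\\n' === '\n');

// Hypothetical input with several blank or whitespace-only lines.
$text = "line one\n\n\n   \nline two\n";

// Each run of "newline plus any following whitespace" becomes one newline.
$collapsed = preg_replace('/(\\n\s*)+/', "\n", $text);

// The m flag only changes what ^ and $ match.
$withoutM = preg_match_all('/^line/', $text);  // ^ matches only at the start of the string
$withM    = preg_match_all('/^line/m', $text); // ^ matches at the start of every line

echo $collapsed;
```

I've dropped the i and m flags from the collapsing pattern here on purpose, to show it behaves the same without them.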