Checking for blank lines before PHP opening or closing tag

Posted by 非 Y 不嫁゛ on 2019-12-11 08:35:40

Question


I have an error on my WordPress website (an XML parsing error) because there is a blank line before the <DOCTYPE>. This is probably caused by a blank line in one of the theme or plugin files before the PHP opening tag <?php or after the closing tag ?>. I have already checked some files (the theme's index.php, header.php, functions.php and a few plugins) but did not find the cause.

Is there a smart trick to check all files for any blank lines before or after the php tags? Some Regex maybe? Or otherwise any method to check which theme file or plugin file outputs this line?


Answer 1:


I don't think that just

  • a DOS/Windows line termination - carriage return \r plus line-feed \n pair, or
  • a UNIX line termination - only a line-feed \n

at the top of the file is the problem. Those whitespace characters are usually ignored.

I suppose that you have created the files as UTF-8 encoded files with a byte order mark (BOM) at the beginning. Text editors and IDEs usually do not display the BOM of a Unicode encoded file.

The UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF, which would be displayed as ï»¿ in the Windows-1252 code page if text editors displayed it. The text editor UltraEdit lets you override the automatic Unicode detection: use File - Open and, in the file open dialog, select ASCII for the Open as option to open a UTF-8 encoded file as an ASCII/ANSI file. Then the UTF-8 BOM at the beginning of a UTF-8 encoded file with BOM is also visible in text editing mode.

A very simple way to find files with a UTF-8 BOM at the top is to search for files containing the string ï»¿. Or, if you do not want to depend on a code page, run a Perl regular expression search with the expression \xEF\xBB\xBF.

Using an empty string as the replacement string should remove the UTF-8 BOM from all files.
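For anyone without an editor that offers a Perl regular expression search across files, the same check can be sketched in a few lines of Python. This is only an illustration, not part of the original answer: the names has_bom, strip_bom and find_bom_files are made up here, and the script only reports files rather than rewriting them.

```python
import codecs
import sys
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes 0xEF 0xBB 0xBF


def has_bom(data: bytes) -> bool:
    """Return True if the byte string starts with a UTF-8 BOM."""
    return data.startswith(BOM)


def strip_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM (unchanged if none)."""
    return data[len(BOM):] if has_bom(data) else data


def find_bom_files(root: str, pattern: str = "*.php"):
    """Yield every file under root matching pattern that starts with a BOM."""
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            with path.open("rb") as handle:
                if has_bom(handle.read(len(BOM))):
                    yield path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Report only. To fix a file, write strip_bom(path.read_bytes()) back
    # to it after making a backup.
    for php_file in find_bom_files(sys.argv[1]):
        print(php_file)
```

Run it with the WordPress directory as the only argument, for example the wp-content folder, to list the affected files.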

\R can be used to match a DOS/Windows, UNIX, or MAC line termination. In other words, for these three line endings \R is equivalent to (?:\r\n|\n|\r) or, shorter, (?:\r?\n|\r).

However, because of my byte order mark suspicion, I suggest using the following search string:

(?:\xEF\xBB\xBF\s*|\s+)(?=<\?php)

Explanation:

(?:...) ... a non-capturing group for the OR expression.

\xEF\xBB\xBF\s* ... a UTF-8 BOM followed by zero or more whitespace characters.

| ... means OR.

\s+ ... a whitespace character one or more times.

(?=<\?php) ... a positive lookahead to check if the next characters are <?php without really matching them.

That search string is not limited to the beginning of a file, so it also matches whitespace before a <?php tag further down. But perhaps it is nevertheless enough for your needs to find files with a UTF-8 BOM or with a blank line at the beginning of a PHP file.
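If matches in the middle of a file produce too much noise, the expression can be anchored to the start of the file. The Python sketch below does that with \A, which is an addition and not part of the expression above. The function name leading_junk is hypothetical, and the files are read as raw bytes so that the BOM is not hidden by decoding.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# The search string from above, with \A added so that it can only match
# at the very first byte of the file.
LEADING = re.compile(rb"\A(?:\xEF\xBB\xBF\s*|\s+)(?=<\?php)")


def leading_junk(data: bytes) -> Optional[bytes]:
    """Return the BOM/whitespace found before an initial <?php, or None."""
    match = LEADING.match(data)
    return match.group(0) if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sorted(Path(sys.argv[1]).rglob("*.php")):
        if path.is_file():
            junk = leading_junk(path.read_bytes())
            if junk is not None:
                # repr() makes the invisible bytes readable, e.g. b'\n\n'.
                print(path, repr(junk))
```

A file that starts with HTML and only opens <?php later is not reported, because the anchored expression requires the BOM or whitespace to start at the first byte.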




Answer 2:


Generally this issue is seen in WordPress-generated XML documents such as RSS and Atom feeds as well as XML sitemaps. In such cases the bug is not an anomalous BOM in the UTF-8 document, but rather an issue caused by PHP's propensity to treat everything following its closing '?>' as data to be sent to output. A blank line following the closing '?>' tag will be interpreted as an instruction to send an LF to the output document. If this happens before the document itself is buffered, the result is an XML document with an LF (blank line) before the XML declaration, which makes it invalid XML. You will then see something like this when you examine the XML output in a browser:

This page contains the following errors:

error on line 2 at column 6: XML declaration allowed only at the start of the document

The recommended solution is to look through all of the PHP files in the WordPress theme, check whether any closing '?>' PHP tags have line feeds or carriage returns following them, and remove those to fix the problem. Unfortunately this is easier said than done, considering the number of files in the theme as well as in the core WordPress install, any one of which could host the bug.
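That manual search can be automated. The Python sketch below is not the Perl script mentioned in this answer, just an illustration with made-up names (emitted_after_close, find_offenders). It assumes that PHP itself discards a single newline directly after a closing '?>', so only whitespace beyond that newline is reported. It looks only at a '?>' at the very end of a file and does not try to parse PHP.

```python
import re
import sys
from pathlib import Path

# Whitespace between a closing tag and the end of the file.
TRAILING = re.compile(rb"\?>(\s+)\Z")


def emitted_after_close(data: bytes) -> bytes:
    """Return the whitespace after a final '?>' that would reach the output."""
    match = TRAILING.search(data)
    if match is None:
        return b""
    tail = match.group(1)
    # Assumption: PHP drops one newline that directly follows '?>'.
    for newline in (b"\r\n", b"\n"):
        if tail.startswith(newline):
            return tail[len(newline):]
    return tail


def find_offenders(root: str, pattern: str = "*.php"):
    """Yield (path, extra_bytes) for files with output after the final '?>'."""
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            extra = emitted_after_close(path.read_bytes())
            if extra:
                yield path, extra


if __name__ == "__main__" and len(sys.argv) > 1:
    for php_file, extra in find_offenders(sys.argv[1]):
        print(php_file, repr(extra))
```

Pointing it at the WordPress root rather than only the theme directory also covers plugins and files such as wp-config.php. A common way to avoid the problem altogether is to leave out the closing '?>' at the end of files that contain only PHP.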

My original solution was a small Perl script that checked every PHP file under /usr/share/wordpress for this issue. However, I later found a very elegant PHP-only solution by Michal "Wejn" Jirků at http://wejn.org/stuff/wejnswpwhitespacefix.php.html, with additional debugging info contributed by Eric Auer. The authors provide a small script (wejnswpwhitespacefix.php) with a function that inserts itself into the output chain when called and parses all content delivered to it for valid headers. If valid content is found, the script buffers this content for eventual output. If the content is invalid, such as a single linefeed, it is rejected. The crux of this solution is the PHP ob_start() function, which creates a new output buffer when called. PHP output buffers are stackable and nested, so that actual output happens in the order of creation of the buffers.

As the actual extra LF bug can happen anywhere in the output chain, from the theme's own PHP files (typically functions.php) through index.php and up the chain to the core WP files such as wp-settings.php, wp-config.php, wp-load.php etc., the recommendation is to insert the file at each stage to see if it solves the issue. If it does, the error lies in that stage, so it becomes much simpler to locate the offending whitespace and fix it. This is in general a much better way to resolve the issue than to just insert the file somewhere where it works and leave it there, as in that case the issue is not being fixed but rather worked around.
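When working through the stages one at a time, it helps to have a quick way to see whether the feed is clean after each change. The Python sketch below fetches a URL and shows the bytes that come before the XML declaration. The function names are hypothetical and the URL is whatever feed or sitemap address your own site uses.

```python
import sys
import urllib.request


def junk_before_declaration(body: bytes) -> bytes:
    """Return whatever precedes '<?xml' in the body (empty if nothing does)."""
    index = body.find(b"<?xml")
    if index == -1:
        raise ValueError("no XML declaration found in the response body")
    return body[:index]


def check_url(url: str) -> bytes:
    """Fetch the URL and return the bytes sent before the XML declaration."""
    with urllib.request.urlopen(url) as response:
        return junk_before_declaration(response.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    junk = check_url(sys.argv[1])
    if junk:
        print("output before the XML declaration:", repr(junk))
    else:
        print("the document starts with the XML declaration")
```

An empty result after moving the fix file to a particular stage suggests that the stray whitespace is emitted at that stage or in a file loaded after it.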



Source: https://stackoverflow.com/questions/27336082/checking-for-blank-lines-before-php-opening-or-closing-tag
