Trim Not Working with Array from MySQL fetched String

Posted by ≯℡__Kan透↙ on 2019-12-11 09:05:19

Question


What I'm trying to do is take a block of html, strip out all the html tags, and put each line of text into a PHP array.

I'm just trying it with one block to test (hence the WHERE ID = '2409' in my MySQL query).

The HTML portion for ID 2409 looks like this:

<table class="description-table">
<tbody>
<tr><td>Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td></tr>
<tr><td>Description</td></tr>
<tr><td></td>
<td><br>
<br><p></p><p></p>
<strong><br></strong> <strong><br></strong> <strong>Donec Rem </strong><br>
<br>
<strong>Animam Urgebat<br>
<br></strong> <strong><br>
<br>
Rerum Sed 8613 - 3669 8358 & 6699<br>
<br>
1.mE (magNA) QUO Ad Nominum Statum Massa<br>
ab SEM Autem Reddet Habitu Sit<br>
<br></strong> <strong> PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>                                                           ad Quisque Modeste</strong><strong>                                                           ac Rem Wisi</strong><strong>                                                           ex Hac Congue mus Leo</strong><strong>                                                           ab 7/92" Alias</strong><strong>                                                           ad 2/73" Adverso & Erat</strong><strong>                                                           me Personom Eget</strong><strong>                                                           ad Viribus Fuga Fuga</strong><strong>                                                           ab Louor-Sit Molles</strong><strong class="c2">                                                           3x Block-Off Plates</strong><strong class="c2">                                                           ad Facunda</strong><strong class="c2">                                                           ab Personas Diam<br>
NUNC<br>
ex Teniet te Palmam Eaque<br>
me Teniet in Versus Urna<br></strong> <strong><br></strong><br>
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong></strong><br>
</td>
</table>

And here's my PHP script designed to parse this

//connect to mysqli

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
WHERE ID = '2409';");

while($row = $results->fetch_array()) {
    $htmlarray2 = preg_split('/<.+?>/', $row['post_content']);
    $htmlarray = array_values(array_filter(array_map('trim', $htmlarray2)));
    echo '<pre>';
        print_r($htmlarray);
    echo '</pre>';
    . . . 
}

This produces an output like this

Array
(
[0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9
[1] => Donec Rem 
[2] => Animam Urgebat
[3] => Rerum Sed 8613 - 3669 8358 & 6699
[4] => 1.mE (magNA) QUO Ad Nominum Statum Massa
[5] => ab SEM Autem Reddet Habitu Sit
[6] =>  PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM
[7] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!
[8] =>                                                            ad Quisque Modeste
[9] =>                                                            ac Rem Wisi
[10] =>                                                            ex Hac Congue mus Leo
[11] =>                                                            ab 7/92" Alias
[12] =>                                                            ad 2/73" Adverso & Erat
[13] =>                                                            me Personom Eget
[14] =>                                                            ad Viribus Fuga Fuga
[15] =>                                                            ea Totam Poenam
[16] =>                                                            ab Louor-Sit Molles
[17] =>                                                            ad Facunda
[18] =>                                                            ab Personas Diam
[19] => NUNC
[20] => ex Teniet te Palmam Eaque
[21] => me Teniet in Versus Urna
[22] => **CONDEMNENDUS REM CUM MAGNORUM**
)

This is okay, but now I'm having an issue removing the white-space before and after the strings in the array.

Let's take an example for the node 8 in the array

. . .
$arrayvalue = $htmlarray2['8'];

which echoes like this

                                                       ad Quisque Modeste

Now, what I'm trying to do is obviously trim each element of the array, but for testing I'm just working with this one variable $arrayvalue.

My issue is that trim() isn't working with this MySQL-fetched variable. That is, adding trim($arrayvalue); has no effect, and the value echoes out the same way as above.

I know this has something to do with fetching the array via my query, because if I just test this variable out normally in its own PHP script

$string = '                                                            ad Quisque Modeste  ';
echo trim($string);

It works fine, and echo outputs just simply ad Quisque Modeste with the desired no white-spaces before or after the string.

Why isn't trim() working in my while loop? What's the trick to trimming the leading and trailing white-spaces from the elements?

Edit: Here's my full while loop as requested. It's a bit different than the above example (I've been doing a lot of modifications trying to solve this myself, so it's constantly changing), but here is what I have right now in full:

while($row = $results->fetch_array()) {
    $id = $row['ID'];
    echo 'ID: ' . $id;
    echo '<br  />';

    //replace &nbsp; with white space
    $converted = strtr($row['post_content'],array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES))); 
    trim($converted, chr(0xC2).chr(0xA0));

    //remove html elements
    $htmlarray = preg_split('/<.+?>/', $converted);

    // remove empty array elements and re-index array
    $htmlarray2 = array_values(array_filter(array_map('trim', $htmlarray)));

    // test by getting single value from array
    $arrayvalue = $htmlarray2['9'];

    // my attempt to trim string in while loop
    trim($arrayvalue);

    // doesn't trim
    echo '<hr>' . $arrayvalue . '<hr>';

    // put this here so I can see the full array
    echo '<pre>';
        print_r($htmlarray2);
    echo '</pre>';
}

As requested, here are the results of var_export($row['post_content']);

'<table class="product-description-table">
<tbody>
<tr>
<td class="item" colspan="3">Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9</td>
</tr>
<tr>
<td class="title" colspan="3"></td>
</tr>
<tr>
<td class="content"><br>
<br>
<p class="c1"></p>
<p class="c1"></p>
<strong><br></strong> <strong><br></strong> <strong>Donec Rem&nbsp;</strong><br>
<br>
<strong>Animam Urgebat<br>
<br></strong> <strong><br>
<br>
Rerum Sed 8613 - 3669 8358 & 6699<br>
<br>
1.mE (magNA) QUO Ad Nominum Statum Massa<br>
ab SEM Autem Reddet Habitu Sit<br>
<br></strong> <strong>&nbsp;PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM</strong> <strong><br></strong> <strong><br></strong> <strong>Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!</strong><strong><br></strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Quisque Modeste</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ac Rem Wisi</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ex Hac Congue mus Leo</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab 7/92" Alias</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad 2/73" Adverso & Erat</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;me Personom Eget</strong><strong>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Viribus Fuga Fuga</strong><strong>&nbsp; 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Louor-Sit Molles</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3x Block-Off Plates</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ad Facunda</strong><strong class="c2">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ab Personas Diam<br>
NUNC<br>
ex Teniet te Palmam Eaque<br>
me Teniet in Versus Urna<br></strong> <strong><br></strong><br>
<strong class="c3">**CONDEMNENDUS REM CUM MAGNORUM**</strong><strong>&nbsp;</strong><br></td>
<td class="product-content-border"></td>
</tr>
<tr>
<td class="gallery" colspan="3">
<table>
<tbody>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td class="spacer" colspan="3"></td>
</tr>
<tr>
<td class="product-content-border"></td>
</tr>
</tbody>
</table>
<br>
<br>
<br>
<p class="c4"></p>'

Final Edit :):

Posted a solution below. Not going to accept my own answer.

If anyone familiar with regex can explain what is going on behind all this, and why the pattern /[\s]+/mu (or rather $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);) fixed this issue, I'll gladly accept that as a proper answer and explanation.


Answer 1:


Here's your requested explanation on the regex pattern that solved your issue:

/[\s]+/ says "look for one or more white-space characters" (this includes: ' ','\r','\n','\t','\f','\v'). The square brackets are redundant here; /\s+/ is equivalent. The multi-line modifier (m) is not necessary because you are not using anchors (^ $) in your pattern. The unicode modifier (u) is absolutely critical in your case because your string of HTML text contains many little devils called...

"NO-BREAK SPACE", which is code point U+00A0 (written \x{00A0} in a pattern) and is encoded in UTF-8 as the two bytes 194 and 160 (0xC2 0xA0). See them highlighted here.

With the u flag, PHP treats the pattern and the subject as UTF-8, and \s also matches Unicode white-space such as U+00A0. Without the u flag, the NO-BREAK SPACE characters remain and additional filtering will be required to remove them.
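To make the point about the u flag concrete, here is a minimal sketch. The sample value is hypothetical (it is not read from the asker's database); it just pads a string with the same U+00A0 bytes described above.

```php
<?php
// U+00A0 NO-BREAK SPACE written as its two UTF-8 bytes (0xC2 0xA0).
$nbsp = "\xC2\xA0";

// Hypothetical sample value: NBSP, space, NBSP, text, NBSP.
$value = $nbsp . " " . $nbsp . "ad Quisque Modeste" . $nbsp;

// Default trim() only strips ASCII white-space, so the NBSP bytes at
// both ends stop it immediately and nothing is removed.
$trimmed = trim($value);

// With the u flag, \s also matches U+00A0, so every run of white-space
// (including NBSP) collapses to one ordinary space that trim() can remove.
$withU = trim(preg_replace('/\s+/u', ' ', $value));

var_dump($trimmed === $value);             // bool(true)
var_dump($withU === "ad Quisque Modeste"); // bool(true)
```

This is why the output in the question looked untrimmed even where trim() had been applied through array_map().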


While you eventually got your code to the right output, I'm happy to offer a leaner single-step pattern that will get you there faster using only preg_split().

while($row=$results->fetch_array()){
    $texts=preg_split('/\s*<[^>]+>\s*/u',$row['post_content'],null,PREG_SPLIT_NO_EMPTY);
    var_export($texts);
}

Here is a working demo.

This new splitting pattern still looks for your tags, but it is more efficient because between the < and >, I merely ask to match all characters that are "not >" by using [^>]+. This is much simpler for the engine versus asking to match from the long list of characters that . represents.
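Besides efficiency, the two patterns can also behave differently. A small sketch, using a hypothetical tag that wraps onto a second line (nothing like it appears in the posted HTML, so treat this as an illustration only):

```php
<?php
// Hypothetical input: a tag whose attributes wrap onto a second line.
$html = "a<td\nclass=\"x\">b";

// Without the s flag, . does not match a newline, so this tag is never
// matched and the string comes back as a single unsplit element.
$withDot = preg_split('/<.+?>/', $html);

// [^>]+ matches any character except >, newlines included.
$withNegatedClass = preg_split('/<[^>]+>/', $html);

var_export($withDot);
var_export($withNegatedClass);
```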

Furthermore, I included matching for your unicode-extended white-space characters. \s* will match zero or more white-space characters before AND after each tag.

Finally, I should explain the additional parameters on preg_split(). The null says "find unlimited matches" -- this is the default behavior, but the third parameter must be given a value to hold its place so that the final parameter can be used. Either null or -1 works on older PHP versions; from PHP 8.1 on, passing null there raises a deprecation notice, so -1 is the safer choice. PREG_SPLIT_NO_EMPTY spares you the extra step of using array_filter() later. It omits any empty elements generated from the split, so you only get the good stuff.
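A self-contained sketch of the split on a short fragment follows. The fragment is hypothetical, modelled on the var_export() output in the question. Because that output shows &nbsp; entities rather than raw NBSP bytes, the sketch assumes an html_entity_decode() step first (borrowed from the asker's own solution); without it, \s has no NBSP characters to match.

```php
<?php
// Hypothetical fragment modelled on the post_content shown in the question.
$html = '<tr><td>Description</td></tr>'
      . '<strong>&nbsp; &nbsp;ad Quisque Modeste</strong>'
      . '<strong><br></strong><strong>&nbsp;</strong>';

// Assumption: decode &nbsp; entities into real U+00A0 characters first,
// so that \s (with the u flag) can match them.
$decoded = html_entity_decode($html, ENT_QUOTES, 'UTF-8');

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces that
// would otherwise appear between adjacent tags.
$texts = preg_split('/\s*<[^>]+>\s*/u', $decoded, -1, PREG_SPLIT_NO_EMPTY);

print_r($texts);
```

The result holds only the two text pieces, "Description" and "ad Quisque Modeste", already stripped of surrounding white-space and re-indexed from 0.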

I hope you found this helpful/educational. Good luck with your project.




Answer 2:


Trim doesn't work in place. You want this:

$arrayvalue = trim($arrayvalue);

That's really it. Trim returns the trimmed string: it doesn't modify the variable in place.
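A minimal sketch of the difference, using a hypothetical sample string padded with ordinary ASCII spaces:

```php
<?php
// Hypothetical sample value padded with ordinary ASCII spaces.
$arrayvalue = "   ad Quisque Modeste  ";

// trim() returns a new string; the argument itself is left untouched.
trim($arrayvalue);
$unchanged = $arrayvalue;

// Assigning the return value is what actually changes the variable.
$arrayvalue = trim($arrayvalue);

var_dump($unchanged === "   ad Quisque Modeste  "); // bool(true)
var_dump($arrayvalue === "ad Quisque Modeste");      // bool(true)
```

Note that this only covers the characters trim() strips by default (ASCII white-space and the NUL byte); a non-breaking space at either end would still be left in place.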




Answer 3:


I found a solution.

I'm not exactly sure how it works; I'm quite unfamiliar with regex.

But the solution that I found (and maybe someone can explain it?) was

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

The entire script (excluding the MySQL stuff) that worked was

$converted = html_entity_decode( $row['post_content'], ENT_QUOTES);
$converted = trim($converted, chr(0xC2).chr(0xA0));

$htmlarray = preg_split('/<.+?>/', $converted);

$clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray);

$htmlarray2 = array_filter(array_map('trim', $clean_htmlarray));

$clean_htmlarray2 = array_values($htmlarray2);

echo '<pre>';
print_r($clean_htmlarray2);
echo '</pre>';

Output being

Array
(
    [0] => Saepe Encomia 2.aD NEC Mirum Populo Soluni Iis 8679-1370 Status Error Sed 9.9
    [1] => Description
    [2] => Donec Rem
    [3] => Animam Urgebat
    [4] => Rerum Sed 8613 - 3669 8358 & 6699
    [5] => 1.mE (magNA) QUO Ad Nominum Statum Massa
    [6] => ab SEM Autem Reddet Habitu Sit
    [7] => PRAEDAM ACCUMSAN PERSONARUM DENEGARE AC DUORUM
    [8] => Lius typi sit nec quo adversis cras ministri oppressa, versus class hic rem quos colubros ullo commune!economy!
    [9] => ad Quisque Modeste
    [10] => ac Rem Wisi
    [11] => ex Hac Congue mus Leo
    [12] => ab 7/92" Alias
    [13] => ad 2/73" Adverso & Erat
    [14] => me Personom Eget
    [15] => ad Viribus Fuga Fuga
    [16] => ab Louor-Sit Molles
    [17] => 3x Block-Off Plates
    [18] => ad Facunda
    [19] => ab Personas Diam
    [20] => NUNC
    [21] => ex Teniet te Palmam Eaque
    [22] => me Teniet in Versus Urna
    [23] => **CONDEMNENDUS REM CUM MAGNORUM**
)

A completely trimmed array.

This also works in my while loop for all rows, i.e.:

$results = $mysqli->query("SELECT ID, post_content
FROM wp_posts
LIMIT 50;");

In this case I get all 50 rows with completely trimmed strings.

So finally... this was a challenge to figure out!

I just wish I understood it more. I don't really feel like I deserve to be confirmed as the answer to this question, as all I really did was try a BUNCH of different things and finally this worked.

If someone wants to chime in and explain why $clean_htmlarray = preg_replace('/[\s]+/mu', ' ', $htmlarray); or rather /[\s]+/mu was what I needed in this instance, I'll gladly award the answer to them :)

As for now just glad it's working properly. Thanks everyone for all the help and input with this!
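One caveat about the trim($converted, chr(0xC2).chr(0xA0)) line in the script above: the second argument of trim() is a list of individual bytes, not a sequence. It therefore strips any 0xC2 or 0xA0 byte at the ends, and it can cut a different multi-byte character in half. The sketch below uses a hypothetical string and shows an anchored pattern with the u flag as one possible alternative.

```php
<?php
// Hypothetical value: a NO-BREAK SPACE followed by "©2019".
// In UTF-8 the © sign is 0xC2 0xA9, so it shares its first byte with NBSP.
$value = "\xC2\xA0" . "\xC2\xA9" . "2019";

// The mask is read byte by byte: 0xC2, 0xA0 and then the 0xC2 that belongs
// to © are all stripped, leaving a lone 0xA9 that is not valid UTF-8.
$broken = trim($value, chr(0xC2) . chr(0xA0));

// Working on whole characters avoids this: strip leading or trailing
// white-space (NBSP included) with the u flag.
$safe = preg_replace('/^\s+|\s+$/u', '', $value);

var_dump(preg_match('//u', $broken)); // bool(false): invalid UTF-8
var_dump($safe === "\xC2\xA9" . "2019"); // bool(true)
```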



Source: https://stackoverflow.com/questions/44027675/trim-not-working-with-array-from-mysql-fetched-string
