Ignore the punctuation and highlight the pattern in given string

Submitted by 懵懂的女人 on 2019-12-02 15:52:03

Question


I have one model string and a list of patterns to match. I want to highlight every matching pattern in the model string, even when words in the pattern or the model contain punctuation marks.

Sample String:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

Pattern List:

1. printing and typesetting industry Lorem Ipsum
2. industry's standard dummy text ever since the 1500s,
3. type specimen book, It has survived
4. but also the leap into electronic typesetting, remaining essentially unchanged.
5. containing Lorem Ipsum passages and
6. PageMaker including versions of Lorem Ipsum.

Expected Output:

Actual Output:

Problem:

Patterns 1, 3, and 5 are not highlighted, because each differs from the model string in punctuation: a word carries a punctuation mark in one of them but not in the other.

#1: In the first pattern there is no punctuation mark after the word industry, whereas the model string has industry. (with a period). The two words are treated as different, so the pattern is not highlighted. I want the match to ignore the punctuation mark and highlight the string.

#3: In the third pattern the word carries different punctuation: book, in the pattern and book. in the model.

I want to highlight the string even if a word has a punctuation mark in either the model or the pattern string. (It is fine if the punctuation mark itself is not highlighted, but the word must be.)

I don't want any change to the model string: it should stay exactly as it is, punctuation included, with only the matching patterns highlighted.

<?php
$model = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry`s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.';
$phrases= [
    "printing and typesetting industry Lorem Ipsum"
    , "industry`s standard dummy text ever since the 1500s,"
    ,"type specimen book, It has survived"
    ,"but also the leap into electronic typesetting, remaining essentially unchanged."
    ,"containing Lorem Ipsum passages and"
    ,"PageMaker including versions of Lorem Ipsum."
];

$phrases = array_map(function($phrase) {
    return preg_replace('/\s+/', '\s+', '/(' . preg_quote($phrase, '/') . ')/iu');
}, array_reverse($phrases));

echo  $model = preg_replace($phrases, '<span style="color:red">$0</span>', $model);

Working example:

https://3v4l.org/QD8WY


Answer 1:


You can adapt your existing code to ignore punctuation differences between the model text and the phrases. Instead of just looking for matching spaces, you need to look for punctuation and spaces, and match each of them against punctuation and/or a space. This code should do what you want:

$phrases= [
    "printing and typesetting industry Lorem Ipsum"
    , "industry`s standard dummy text ever since the 1500s,"
    ,"type specimen book, It has survived"
    ,"but also the leap into electronic typesetting, remaining essentially unchanged."
    ,"containing Lorem Ipsum passages and"
    ,"PageMaker including versions of Lorem Ipsum."
];
$phrases = array_map(function($phrase) {
    return preg_replace(array('/[.?!,:;\-{}\[\]()\'`"]/', '/\s+/'), 
                        array('([.?!,:;\\-{}\\[\\]()\'`"]|\s+)', '([.?!,:;\\-{}\\[\\]()\'`"]*\s+|\s+[.?!,:;\\-{}\\[\\]()\'`"]*)'), 
                        "@$phrase@iu");
}, array_reverse($phrases));

echo  $model = preg_replace($phrases, '<span style="color:red">$0</span>', $model);

Output:

Lorem Ipsum is simply dummy text of the <span style="color:red">printing and typesetting industry.
Lorem Ipsum</span> has been the <span style="color:red">industry`s standard dummy text ever since
the 1500s,</span> when an unknown printer took a galley of type and scrambled it to make a
<span style="color:red">type specimen book. It has survived</span> not only five centuries,
<span style="color:red">but also the leap into electronic typesetting, remaining essentially unchanged.</span>
It was popularised in the 1960s with the release of Letraset sheets <span style="color:red">
containing Lorem Ipsum passages, and</span> more recently with desktop publishing software like Aldus
<span style="color:red">PageMaker including versions of Lorem Ipsum.</span>

Demo on 3v4l.org
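
As a variation on the same idea, here is a minimal sketch that reduces each phrase to its words and lets any run of non-letter, non-digit characters stand between them, instead of listing punctuation marks explicitly. The function name highlightPhrases and its parameters are made up for illustration and do not come from the answer above. It assumes that highlighting from the first word to the last word of each phrase is acceptable, and that all phrases can be combined into one alternation and applied in a single pass.

```php
<?php
// Sketch only: highlightPhrases() and its parameters are illustrative names,
// not part of the original question or answer.
function highlightPhrases(string $model, array $phrases): string
{
    // One or more characters that are neither letters nor digits
    // (spaces, punctuation, symbols such as the backtick).
    $sep = '[^\p{L}\p{N}]+';

    $alternatives = [];
    foreach ($phrases as $phrase) {
        // Keep only the words of the phrase; punctuation and spaces are dropped.
        $words = preg_split('/' . $sep . '/u', $phrase, -1, PREG_SPLIT_NO_EMPTY);
        if ($words === false || $words === []) {
            continue; // skip phrases that contain no words
        }
        // Words are letters and digits only, but quoting keeps the pattern safe regardless.
        $quoted = array_map(function ($word) {
            return preg_quote($word, '/');
        }, $words);
        $alternatives[] = implode($sep, $quoted);
    }

    if ($alternatives === []) {
        return $model;
    }

    // Lookarounds stop a phrase from matching in the middle of a longer word.
    // A single pass means already inserted <span> tags are never re-scanned.
    $pattern = '/(?<![\p{L}\p{N}])(?:' . implode('|', $alternatives) . ')(?![\p{L}\p{N}])/iu';
    $result = preg_replace($pattern, '<span style="color:red">$0</span>', $model);

    return $result ?? $model; // fall back to the untouched text on a regex error
}

// Usage: the phrase has "book," while the text has "book."
$text = 'make a type specimen book. It has survived not only five centuries';
echo highlightPhrases($text, ['type specimen book, It has survived']);
// make a <span style="color:red">type specimen book. It has survived</span> not only five centuries
```

With this approach, punctuation at the very end of a phrase (for example the comma in "1500s,") is left outside the highlight, which the question states is acceptable. The model text itself is never modified apart from the inserted span tags.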



Source: https://stackoverflow.com/questions/56987597/ignore-the-punctuation-and-highlight-the-pattern-in-given-string
