Split a string keeping the separators

既然无缘 2020-12-06 17:40

Is there a trivial way to split a string keeping the separators? Instead of this:

let texte = "Ten. Million. Questions. Let's celebrate all we've done tog

3 Answers
  • 2020-12-06 17:42

    I was not able to find anything in the standard library, so I wrote my own:

    This version uses the unstable pattern API because it's more flexible, so it requires a nightly compiler, but the link above has a fallback that I've hardcoded for my specific stable use case.

    #![feature(pattern)]
    
    use std::str::pattern::{Pattern, Searcher};
    
    #[derive(Copy, Clone, Debug, PartialEq)]
    pub enum SplitType<'a> {
        Match(&'a str),
        Delimiter(&'a str),
    }
    
    pub struct SplitKeepingDelimiter<'p, P>
    where
        P: Pattern<'p>,
    {
        searcher: P::Searcher,
        start: usize,
        saved: Option<usize>,
    }
    
    impl<'p, P> Iterator for SplitKeepingDelimiter<'p, P>
    where
        P: Pattern<'p>,
    {
        type Item = SplitType<'p>;
    
        fn next(&mut self) -> Option<Self::Item> {
            if self.start == self.searcher.haystack().len() {
                return None;
            }
    
            if let Some(end_of_match) = self.saved.take() {
                let s = &self.searcher.haystack()[self.start..end_of_match];
                self.start = end_of_match;
                return Some(SplitType::Delimiter(s));
            }
    
            match self.searcher.next_match() {
                Some((start, end)) => {
                    if self.start == start {
                        let s = &self.searcher.haystack()[start..end];
                        self.start = end;
                        Some(SplitType::Delimiter(s))
                    } else {
                        let s = &self.searcher.haystack()[self.start..start];
                        self.start = start;
                        self.saved = Some(end);
                        Some(SplitType::Match(s))
                    }
                }
                None => {
                    let s = &self.searcher.haystack()[self.start..];
                    self.start = self.searcher.haystack().len();
                    Some(SplitType::Match(s))
                }
            }
        }
    }
    
    pub trait SplitKeepingDelimiterExt: ::std::ops::Index<::std::ops::RangeFull, Output = str> {
        fn split_keeping_delimiter<P>(&self, pattern: P) -> SplitKeepingDelimiter<P>
        where
            P: for<'a> Pattern<'a>,
        {
            SplitKeepingDelimiter {
                searcher: pattern.into_searcher(&self[..]),
                start: 0,
                saved: None,
            }
        }
    }
    
    impl SplitKeepingDelimiterExt for str {}
    
    #[cfg(test)]
    mod test {
        use super::SplitKeepingDelimiterExt;
    
        #[test]
        fn split_with_delimiter() {
            use super::SplitType::*;
            let delims = &[',', ';'][..];
            let items: Vec<_> = "alpha,beta;gamma".split_keeping_delimiter(delims).collect();
            assert_eq!(
                &items,
                &[
                    Match("alpha"),
                    Delimiter(","),
                    Match("beta"),
                    Delimiter(";"),
                    Match("gamma")
                ]
            );
        }
    
        #[test]
        fn split_with_delimiter_allows_consecutive_delimiters() {
            use super::SplitType::*;
            let delims = &[',', ';'][..];
            let items: Vec<_> = ",;".split_keeping_delimiter(delims).collect();
            assert_eq!(&items, &[Delimiter(","), Delimiter(";")]);
        }
    }
    

    You'll note that I needed to track if something was one of the delimiters or not, but that should be easy to adapt if you don't need it.
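
For readers on a stable toolchain, here is a minimal sketch of the same idea that avoids the unstable pattern API. It is not the fallback the answer links to; it assumes the delimiters can be described by a predicate on `char`, and the names `Piece`, `split_tagged` and `untagged` are hypothetical, chosen only for this illustration. The `untagged` helper shows one way to adapt the result when the match/delimiter distinction is not needed.

```rust
/// One piece of the input: ordinary text or a single delimiter.
/// (Hypothetical type, named `Piece` to avoid clashing with `SplitType` above.)
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum Piece<'a> {
    Text(&'a str),
    Delimiter(&'a str),
}

/// Splits `input` at every char for which `is_delim` returns true and
/// returns text runs and delimiters in order. Uses only stable APIs.
pub fn split_tagged<'a, F>(input: &'a str, is_delim: F) -> Vec<Piece<'a>>
where
    F: Fn(char) -> bool,
{
    let mut pieces = Vec::new();
    let mut start = 0; // byte offset where the current text run began
    for (idx, ch) in input.char_indices() {
        if is_delim(ch) {
            // Emit the text before the delimiter, unless it is empty.
            if start < idx {
                pieces.push(Piece::Text(&input[start..idx]));
            }
            let end = idx + ch.len_utf8();
            pieces.push(Piece::Delimiter(&input[idx..end]));
            start = end;
        }
    }
    // Emit whatever text follows the last delimiter.
    if start < input.len() {
        pieces.push(Piece::Text(&input[start..]));
    }
    pieces
}

/// Drops the tags when only the string slices are needed.
pub fn untagged<'a>(pieces: &[Piece<'a>]) -> Vec<&'a str> {
    pieces
        .iter()
        .map(|p| match *p {
            Piece::Text(s) | Piece::Delimiter(s) => s,
        })
        .collect()
}
```

Unlike the iterator above, this sketch collects eagerly into a `Vec` and only handles single-character delimiters; that trade-off keeps it short.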

  • 2020-12-06 17:45

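    The function str::split_inclusive (unstable when this answer was written, stable since Rust 1.51) returns an iterator that keeps each delimiter at the end of the piece it terminates, which may be useful in certain cases:

    // No feature attribute is needed on Rust 1.51 or later;
    // older nightly toolchains required #![feature(split_inclusive)].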
    
    #[test]
    fn split_with_delimiter() {
        let items: Vec<_> = "alpha,beta;gamma"
            .split_inclusive(&[',', ';'][..])
            .collect();
        assert_eq!(&items, &["alpha,", "beta;", "gamma"]);
    }
    
    #[test]
    fn split_with_delimiter_allows_consecutive_delimiters() {
        let items: Vec<_> = ",;".split_inclusive(&[',', ';'][..]).collect();
        assert_eq!(&items, &[",", ";"]);
    }
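
Because `split_inclusive` leaves each delimiter attached to the preceding piece, getting the delimiters as separate elements (as the question asks) needs one more step. The following is a rough sketch of that step, assuming single-character delimiters given as a `&[char]` slice; the function name `split_then_peel` is made up for this example.

```rust
/// Splits `text` with `str::split_inclusive`, then peels the trailing
/// delimiter off each piece so that it becomes its own element.
pub fn split_then_peel<'a>(text: &'a str, delims: &[char]) -> Vec<&'a str> {
    let mut out = Vec::new();
    for piece in text.split_inclusive(delims) {
        // The last char of a piece is a delimiter, except possibly for
        // the final piece when the input does not end with one.
        match piece.char_indices().next_back() {
            Some((idx, ch)) if delims.contains(&ch) => {
                if idx > 0 {
                    out.push(&piece[..idx]); // text before the delimiter
                }
                out.push(&piece[idx..]); // the delimiter itself
            }
            Some(_) => out.push(piece), // unterminated final piece
            None => {}                  // defensive: skip an empty piece
        }
    }
    out
}
```

With the inputs from the tests above, `split_then_peel("alpha,beta;gamma", &[',', ';'])` gives `["alpha", ",", "beta", ";", "gamma"]`.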
    
  • 2020-12-06 18:06

    Using str::match_indices:

    let text = "Ten. Million. Questions. Let's celebrate all we've done together.";
    
    let mut result = Vec::new();
    let mut last = 0;
    for (index, matched) in text.match_indices(|c: char| !(c.is_alphanumeric() || c == '\'')) {
        if last != index {
            result.push(&text[last..index]);
        }
        result.push(matched);
        last = index + matched.len();
    }
    if last < text.len() {
        result.push(&text[last..]);
    }
    
    println!("{:?}", result);
    

    Prints:

    ["Ten", ".", " ", "Million", ".", " ", "Questions", ".", " ", "Let\'s", " ", "celebrate", " ", "all", " ", "we\'ve", " ", "done", " ", "together", "."]
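
The loop above can be wrapped in a reusable function. This is only a sketch under the assumption that separators are single characters selected by a predicate; `split_keeping_separators` is a hypothetical name, not a standard library function.

```rust
/// Returns the pieces of `text` in order, with every separator char
/// (as decided by `is_separator`) kept as its own element.
pub fn split_keeping_separators<'a, F>(text: &'a str, is_separator: F) -> Vec<&'a str>
where
    F: FnMut(char) -> bool,
{
    let mut result = Vec::new();
    let mut last = 0; // byte offset just past the previous separator
    // A closure taking `char` and returning `bool` is accepted as a pattern.
    for (index, matched) in text.match_indices(is_separator) {
        // Skip the empty slice between two adjacent separators.
        if last != index {
            result.push(&text[last..index]);
        }
        result.push(matched);
        last = index + matched.len();
    }
    // Keep the tail after the final separator, if any.
    if last < text.len() {
        result.push(&text[last..]);
    }
    result
}
```

Calling it with the predicate from the answer, `split_keeping_separators(text, |c: char| !(c.is_alphanumeric() || c == '\''))`, yields the same sequence of pieces as the loop.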
    