Is there a method like JavaScript's substr in Rust?

天命终不由人 2021-02-02 10:21

I looked at the Rust docs for String but I can't find a way to extract a substring.

Is there a method like JavaScript's substr in Rust? If not, how would you implement it?

7 answers
  • 2021-02-02 10:45

    This code performs both substring extraction and string slicing by character index, without allocating and without panicking on out-of-range indices:

    use std::ops::{Bound, RangeBounds};
    
    trait StringUtils {
        fn substring(&self, start: usize, len: usize) -> &str;
        fn slice(&self, range: impl RangeBounds<usize>) -> &str;
    }
    
    impl StringUtils for str {
        fn substring(&self, start: usize, len: usize) -> &str {
            let mut char_pos = 0;
            let mut byte_start = 0;
            let mut it = self.chars();
            loop {
                if char_pos == start { break; }
                if let Some(c) = it.next() {
                    char_pos += 1;
                    byte_start += c.len_utf8();
                }
                else { break; }
            }
            char_pos = 0;
            let mut byte_end = byte_start;
            loop {
                if char_pos == len { break; }
                if let Some(c) = it.next() {
                    char_pos += 1;
                    byte_end += c.len_utf8();
                }
                else { break; }
            }
            &self[byte_start..byte_end]
        }
        fn slice(&self, range: impl RangeBounds<usize>) -> &str {
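            // Translate the range bounds into a (start, len) pair of char counts.
            let start = match range.start_bound() {
                Bound::Included(bound) => *bound,
                // An excluded start bound begins one position later.
                Bound::Excluded(bound) => bound.saturating_add(1),
                Bound::Unbounded => 0,
            };
            // Saturating arithmetic avoids overflow panics for reversed or extreme ranges.
            let len = match range.end_bound() {
                Bound::Included(bound) => bound.saturating_add(1),
                Bound::Excluded(bound) => *bound,
                // The byte length is an upper bound on the number of chars.
                Bound::Unbounded => self.len(),
            }.saturating_sub(start);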
            self.substring(start, len)
        }
    }
    
    fn main() {
        let s = "abcdèfghij";
        // All three statements should print:
        // "abcdè, abcdèfghij, dèfgh, dèfghij."
        println!("{}, {}, {}, {}.",
            s.substring(0, 5),
            s.substring(0, 50),
            s.substring(3, 5),
            s.substring(3, 50));
        println!("{}, {}, {}, {}.",
            s.slice(..5),
            s.slice(..50),
            s.slice(3..8),
            s.slice(3..));
        println!("{}, {}, {}, {}.",
            s.slice(..=4),
            s.slice(..=49),
            s.slice(3..=7),
            s.slice(3..));
    }
    
  • 2021-02-02 10:54

    You can use the as_str method on the Chars iterator to get back a &str slice after you have advanced the iterator. So to skip the first start chars, you can call

    let s = "Some text to slice into";
    let start = 5; // number of chars to skip (example value)
    let mut iter = s.chars();
    for _ in 0..start { iter.next(); } // eat up exactly start chars (nth(start) would eat start + 1)
    let slice = iter.as_str(); // get back a slice of the rest of the iterator
    

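    Now if you also want to limit the length, you first need to find the byte position of the character at index length:

    let length = 4; // number of chars to keep (example value)
    // If slice has length chars or fewer, nth returns None and unwrap_or(0) yields an empty slice;
    // unwrap_or(slice.len()) would keep the rest of the string instead.
    let end_pos = slice.char_indices().nth(length).map(|(n, _)| n).unwrap_or(0);
    let substr = &slice[..end_pos];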
    

    This might feel a little roundabout, but Rust is not hiding anything from you that might take up CPU cycles. That said, I wonder why there's no crate yet that offers a substr method.

  • 2021-02-02 10:56

    The solution given by oli_obk does not handle the last index of the string slice. It can be fixed with .chain(once(s.len())).

    The function substr below implements a substring slice with error handling. If an invalid index is passed, the valid part of the string slice is returned in the Err variant. All corner cases should be handled correctly.

    fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
        use std::iter::once;
        let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
        let beg = itr.nth(begin);
        if beg.is_none() {
            return Err("");
        } else if length == Some(0) {
            return Ok("");
        }
        let end = length.map_or(Some(s.len()), |l| itr.nth(l-1));
        if let Some(end) = end {
            return Ok(&s[beg.unwrap()..end]);
        } else {
            return Err(&s[beg.unwrap()..s.len()]);
        }
    }
    let s = "abc                                                                    
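    As a sketch of how this function behaves at the corners, the block below restates substr compactly (same logic, so that it compiles on its own) and checks a few cases. The test string and the chosen indices are illustrative assumptions, not taken from the answer.

```rust
use std::iter::once;

// Compact restatement of the `substr` function from the answer above.
fn substr(s: &str, begin: usize, length: Option<usize>) -> Result<&str, &str> {
    // Byte offsets of every char, plus the end-of-string offset.
    let mut itr = s.char_indices().map(|(n, _)| n).chain(once(s.len()));
    let beg = match itr.nth(begin) {
        Some(b) => b,
        None => return Err(""), // begin is past the end
    };
    if length == Some(0) {
        return Ok("");
    }
    match length.map_or(Some(s.len()), |l| itr.nth(l - 1)) {
        Some(end) => Ok(&s[beg..end]),
        None => Err(&s[beg..]), // too long: return the valid part as Err
    }
}

fn main() {
    let s = "abcdè"; // 5 chars, 6 bytes (illustrative input)
    assert_eq!(substr(s, 1, Some(2)), Ok("bc"));
    assert_eq!(substr(s, 3, None), Ok("dè"));
    assert_eq!(substr(s, 5, None), Ok("")); // begin equal to the char count is valid
    assert_eq!(substr(s, 3, Some(5)), Err("dè"));
    assert_eq!(substr(s, 6, None), Err(""));
}
```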
  • 2021-02-02 11:01

    I would suggest you use the crate substring. (And look at its source code if you want to learn how to do this properly.)

  • 2021-02-02 11:02

    For characters, you can use s.chars().skip(pos).take(len):

    fn main() {
        let s = "Hello, world!";
        let ss: String = s.chars().skip(7).take(5).collect();
        println!("{}", ss);
    }
    

    Beware of the definition of Unicode characters though.
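    To make that caveat concrete: a Rust char is a Unicode scalar value, not necessarily what a reader perceives as one character. The sketch below wraps the skip/take approach in a hypothetical helper, char_substr (a name introduced here for illustration), and shows it cutting a combining accent off its base letter.

```rust
// Hypothetical helper: take `len` chars starting at char index `pos`.
// Allocates a new String, like the collect() call above.
fn char_substr(s: &str, pos: usize, len: usize) -> String {
    s.chars().skip(pos).take(len).collect()
}

fn main() {
    assert_eq!(char_substr("Hello, world!", 7, 5), "world");

    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let s = "cafe\u{301}!";
    assert_eq!(s.chars().count(), 6); // c, a, f, e, U+0301, !

    // Taking four chars separates the accent from its base letter.
    assert_eq!(char_substr(s, 0, 4), "cafe");
    assert_eq!(char_substr(s, 4, 1), "\u{301}");
}
```

    If substrings must respect user-perceived characters, the string has to be segmented into grapheme clusters first, which the standard library does not do.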

    For bytes, you can use the slice syntax:

    fn main() {
        let s = b"Hello, world!";
        let ss = &s[7..12];
        println!("{:?}", ss);
    }
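    The same range syntax works on a &str, with byte offsets, but &s[a..b] panics if an offset is out of range or falls inside a multi-byte character. A minimal sketch of a non-panicking variant using str::get follows; byte_substr is a hypothetical helper name introduced for illustration.

```rust
// Hypothetical helper: byte-range slice that returns None instead of panicking
// when the range is out of bounds or does not fall on char boundaries.
fn byte_substr(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    assert_eq!(byte_substr("Hello, world!", 7, 12), Some("world"));

    let s = "abcdèfghij"; // 'è' occupies bytes 4 and 5
    assert_eq!(byte_substr(s, 3, 6), Some("dè"));
    assert_eq!(byte_substr(s, 3, 5), None); // byte 5 is in the middle of 'è'
    assert_eq!(byte_substr(s, 3, 50), None); // end is past the string
}
```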
    
  • 2021-02-02 11:04

    You can also use .to_string()[ <range> ].

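    This example takes a slice of a copy of the original string (to_string() allocates a new String), then mutates the original to demonstrate that the slice is unaffected. Note that the range is in bytes and must fall on character boundaries, otherwise the indexing panics.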

    let mut s: String = "Hello, world!".to_string();
    
    let substring: &str = &s.to_string()[..6];
    
    s.replace_range(..6, "Goodbye,");
    
    println!("{}   {} universe!", s, substring);
    
    //    Goodbye, world!   Hello, universe!
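    A possible variation, sketched under the assumption that an owned result is acceptable: copy only the slice rather than the whole string, and return None instead of panicking on a bad range. prefix_owned is a hypothetical helper name, not part of the standard library.

```rust
// Hypothetical helper: copy the first `n` bytes of `s` into a new String.
// Returns None if `n` is out of range or not on a char boundary.
fn prefix_owned(s: &str, n: usize) -> Option<String> {
    s.get(..n).map(str::to_owned)
}

fn main() {
    let mut s = String::from("Hello, world!");

    // The copy is independent of `s`, so `s` can be mutated afterwards.
    let substring = prefix_owned(&s, 6).expect("6 is a valid char boundary here");

    s.replace_range(..6, "Goodbye,");

    assert_eq!(s, "Goodbye, world!");
    assert_eq!(substring, "Hello,");
    println!("{}   {} universe!", s, substring);
}
```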
    