What are the differences between Rust's `String` and `str`?

醉梦人生 2020-11-22 05:11

Why does Rust have String and str? What are the differences between String and str? When does one use String

9 answers
  • 2020-11-22 05:42

    String is the dynamic heap string type, like Vec: use it when you need to own or modify your string data.

    str is an immutable[1] sequence of UTF-8 bytes of dynamic length somewhere in memory. Since the size is unknown, one can only handle it behind a pointer. This means that str most commonly[2] appears as &str: a reference to some UTF-8 data, normally called a "string slice" or just a "slice". A slice is just a view onto some data, and that data can be anywhere, e.g.

    • In static storage: a string literal "foo" is a &'static str. The data is hardcoded into the executable and loaded into memory when the program runs.

    • Inside a heap allocated String: String dereferences to a &str view of the String's data.

    • On the stack: e.g. the following creates a stack-allocated byte array, and then gets a view of that data as a &str:

        use std::str;
      
        // An owned array, so the bytes really live on the stack.
        let x: [u8; 3] = [b'a', b'b', b'c'];
        let stack_str: &str = str::from_utf8(&x).unwrap();
      

    In summary, use String if you need owned string data (like passing strings to other threads, or building them at runtime), and use &str if you only need a view of a string.
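    A minimal sketch of that rule of thumb. The helper names count_words and shout are made up for illustration: a function that only reads text borrows a &str and accepts both literals and borrowed Strings, while a function that builds text at runtime returns an owned String.

```rust
/// Only needs a read-only view, so it borrows a string slice.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Builds new text at runtime, so it returns an owned String.
fn shout(text: &str) -> String {
    let mut out = text.to_uppercase();
    out.push('!');
    out
}

fn main() {
    let literal: &'static str = "hello world"; // lives in the binary
    let owned: String = String::from("hello again world"); // lives on the heap

    // &String coerces to &str through Deref, so both calls work.
    println!("{}", count_words(literal));
    println!("{}", count_words(&owned));

    // Owned data can be moved into another thread.
    let handle = std::thread::spawn(move || shout(&owned));
    println!("{}", handle.join().unwrap());
}
```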

    This is identical to the relationship between a vector Vec<T> and a slice &[T], and is similar to the relationship between by-value T and by-reference &T for general types.
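    The parallel can be seen side by side in a short sketch (first_bytes is a hypothetical helper, not a standard function): both owned types hand out a borrowed view, and a str is just UTF-8 bytes underneath.

```rust
/// Returns the first `n` bytes of a string slice as a byte slice.
/// Hypothetical helper, for illustration only; panics if `n` is too large.
fn first_bytes(text: &str, n: usize) -> &[u8] {
    &text.as_bytes()[..n]
}

fn main() {
    let owned_text: String = String::from("abc");
    let owned_bytes: Vec<u8> = vec![b'a', b'b', b'c'];

    // Both owned types hand out borrowed views of their contents.
    let text_view: &str = &owned_text; // String -> &str
    let bytes_view: &[u8] = &owned_bytes; // Vec<u8> -> &[u8]

    // A str is UTF-8 bytes underneath, so the two views compare equal here.
    assert_eq!(text_view.as_bytes(), bytes_view);
    println!("{:?}", first_bytes(text_view, 2));
}
```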


    [1] A str is fixed-length; you cannot write bytes beyond the end, or leave trailing invalid bytes. Since UTF-8 is a variable-width encoding, this effectively makes strs immutable in most cases. In general, mutation requires writing more or fewer bytes than there were before (e.g. replacing an a (1 byte) with an ä (2+ bytes) would require making more room in the str). There are specific methods that can modify a &mut str in place, mostly those that handle only ASCII characters, like make_ascii_uppercase.
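    To make this footnote concrete, here is a small sketch with two made-up helpers: upper_in_place works through a &mut str because no byte moves, while with_umlauts has to build a new String because the byte length changes.

```rust
/// Uppercases ASCII letters in place; every byte keeps its position,
/// so this works through `&mut str`.
fn upper_in_place(text: &mut str) {
    text.make_ascii_uppercase();
}

/// Replacing `a` with `ä` changes the byte length, so it needs a new String.
fn with_umlauts(text: &str) -> String {
    text.replace('a', "ä")
}

fn main() {
    let mut owned = String::from("banana");
    upper_in_place(&mut owned); // &mut String coerces to &mut str
    println!("{}", owned);

    let replaced = with_umlauts("banana");
    // 6 bytes before, 9 bytes after: each ä takes 2 bytes.
    println!("{} bytes -> {} bytes", "banana".len(), replaced.len());
}
```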

    [2] Dynamically sized types allow things like Rc<str> for a sequence of reference counted UTF-8 bytes since Rust 1.2. Rust 1.21 allows easily creating these types.
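    A short sketch of what this footnote describes, assuming a reasonably recent compiler. The helper name shared is hypothetical; Rc::from on a &str is the standard conversion.

```rust
use std::rc::Rc;

/// Creates a reference-counted string slice (hypothetical helper name).
fn shared(text: &str) -> Rc<str> {
    Rc::from(text)
}

fn main() {
    let a: Rc<str> = shared("hello");
    let b = Rc::clone(&a); // bumps the count, does not copy the text
    println!("{} {} (owners: {})", a, b, Rc::strong_count(&a));
}
```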

  • 2020-11-22 05:47

    &str, the usual way to handle a str, is a string slice: a reference to a UTF-8 byte array.

    String is what used to be ~str, a growable, owned UTF-8 byte array.

  • 2020-11-22 05:49

    It is str that is analogous to String, not the slice of it, which is known as &str.

    The simplest str is a string literal, basically pre-allocated text:

    "Hello World"
    

    This text has to be stored somewhere, so it is stored in the executable's read-only data, alongside the program's machine code, as a sequence of bytes ([u8]). Because text can have any length, str is a dynamically sized type: the length is not part of the type and is carried along at run time instead:

    +----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
    |  H |  e  |  l  |  l  |  o  |    |  W |  o  |  r  |  l  |  d  |
    +----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
    
    +----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
    | 72 | 101 | 108 | 108 | 111 | 32 | 87 | 111 | 114 | 108 | 100 |
    +----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
    

    We need a way to access the stored text, and this is where the slice comes in.

    A slice, [T], is a view into a block of memory. Whether mutable or not, a slice always borrows, which is why it is always behind a pointer, &.

    So, the "Hello World" expression returns a fat pointer, containing both the address of the actual data and its length. This pointer is our handle to the actual data. Because the data is now behind a pointer, the compiler knows the size of the handle at compile time.
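    The two halves of that fat pointer can be inspected directly. This is only a sketch; raw_parts is a made-up helper, and the size check reflects how current compilers lay out &str.

```rust
use std::mem::size_of;

/// Splits a string slice into the two parts of its fat pointer.
/// Hypothetical helper, for illustration only.
fn raw_parts(text: &str) -> (*const u8, usize) {
    (text.as_ptr(), text.len())
}

fn main() {
    let s: &'static str = "Hello World";
    let (address, length) = raw_parts(s);
    println!("address = {:p}, length = {}", address, length);

    // A &str is two machine words wide: pointer plus length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // A reference to a sized type is a single word.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
}
```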

    Since the text is embedded in the executable, it is valid for the entire lifetime of the running program, hence it has the static lifetime.

    So, the type of the "Hello World" expression should reflect these two characteristics, which it does:

    let s: &'static str = "Hello World";
    

    You may ask why its type is written as str and not as [u8]. It is because the data is always guaranteed to be a valid UTF-8 sequence. Not all UTF-8 characters are a single byte (some take up to 4 bytes), and not every sequence of bytes is valid UTF-8. So [u8] would be inaccurate.
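    A quick way to see both points, as a sketch: is_valid_text is a hypothetical wrapper around the standard str::from_utf8 check.

```rust
use std::str;

/// Returns true when `bytes` form valid UTF-8 (hypothetical helper).
fn is_valid_text(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // One character is not always one byte; len() counts bytes.
    println!("{}", "a".len()); // 1
    println!("{}", "ä".len()); // 2
    println!("{}", "\u{1F600}".len()); // 4 (an emoji)

    // The bytes of "Hello" are valid UTF-8 ...
    println!("{}", is_valid_text(&[72, 101, 108, 108, 111]));
    // ... but the byte 0xff never occurs in valid UTF-8.
    println!("{}", is_valid_text(&[0xff, 0xfe]));
}
```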

    On the other hand, String is a specialized vector of u8 bytes, in other words a resizable buffer holding UTF-8 text. We say specialized because it does not permit arbitrary access and enforces checks that the data is always valid UTF-8. The buffer is allocated on the heap, so it can be resized as needed or requested.
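    What "does not permit arbitrary access" looks like in practice, as a rough sketch (nth_char is a made-up helper): indexing by integer is rejected at compile time, and byte ranges are only accepted on character boundaries.

```rust
/// Returns the `n`-th character, walking the UTF-8 data from the start.
/// Hypothetical helper, for illustration only.
fn nth_char(text: &str, n: usize) -> Option<char> {
    text.chars().nth(n)
}

fn main() {
    let s = String::from("häl"); // bytes: h = 0, ä = 1..3, l = 3

    // let c = s[1]; // does not compile: String cannot be indexed by an integer

    println!("{:?}", nth_char(&s, 1)); // Some('ä')

    // Byte ranges are allowed, but must fall on character boundaries.
    println!("{:?}", s.get(0..1)); // Some("h")
    println!("{:?}", s.get(0..2)); // None: byte 2 is inside 'ä'
}
```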

    Here is how it is defined in the source code:

    pub struct String {
        vec: Vec<u8>,
    }
    

    You could create a String directly from the struct if vec were public, but it is private to ensure validity and proper checks, since not every stream of bytes is valid UTF-8.

    Instead, there are several methods defined on the String type to create a String instance; new is one of them:

    pub const fn new() -> String {
      String { vec: Vec::new() }
    }
    

    We can use it to create a valid String. Unfortunately, it does not accept an input parameter, so the result is a valid but empty string:

    let s = String::new();
    println!("{}", s);
    

    But we can fill this buffer with initial value from different sources:

    From a string literal

    let a = "Hello World";
    let s = String::from(a);
    

    From raw parts

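    use std::mem::ManuallyDrop;

    // Keep the original String from freeing the buffer we are about to reuse.
    let mut s = ManuallyDrop::new(String::from("Hello World"));
    let ptr = s.as_mut_ptr();
    let len = s.len();
    let capacity = s.capacity();

    // Safety: ptr, len and capacity come from a valid String that is never dropped.
    let s = unsafe { String::from_raw_parts(ptr, len, capacity) };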
    

    From a character

    let ch = 'c';
    let s = ch.to_string();
    

    From vector of bytes

    let hello_world = vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
    // We know it is valid sequence, so we can use unwrap
    let hello_world = String::from_utf8(hello_world).unwrap();
    println!("{}", hello_world); // Hello World
    

    From input buffer

    use std::io::{self, Read};
    
    fn main() -> io::Result<()> {
        let mut buffer = String::new();
        let stdin = io::stdin();
        let mut handle = stdin.lock();
    
        handle.read_to_string(&mut buffer)?;
        Ok(())
    }
    

    Or from any other type that implements the ToString trait.
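    For instance, any type that implements Display gets to_string through a blanket implementation. A minimal sketch, with Point as a hypothetical type invented for this example:

```rust
use std::fmt;

/// Hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    // Implementing Display gives us to_string() for free.
    let p: String = Point { x: 1, y: 2 }.to_string();
    let n: String = 42.to_string();
    println!("{} {}", p, n);
}
```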

    Since String is a vector under the hood, it will exhibit some vector characteristics:

    • a pointer: The pointer points to an internal buffer that stores the data.
    • length: The length is the number of bytes currently stored in the buffer.
    • capacity: The capacity is the size of the buffer in bytes. So, the length will always be less than or equal to the capacity.
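    These three fields can be observed from safe code. A small sketch, where len_and_capacity is a made-up helper and the exact capacity is left to the allocator (only the lower bound is guaranteed):

```rust
/// Returns (length, capacity) of a String built with room to grow.
/// Hypothetical helper, for illustration only.
fn len_and_capacity(text: &str, extra: usize) -> (usize, usize) {
    let mut s = String::with_capacity(text.len() + extra);
    s.push_str(text);
    (s.len(), s.capacity())
}

fn main() {
    let (len, capacity) = len_and_capacity("Hello", 10);
    println!("len = {}, capacity = {}", len, capacity);

    // The length never exceeds the capacity.
    assert!(len <= capacity);
}
```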

    And it delegates some properties and methods to vectors:

    pub fn capacity(&self) -> usize {
      self.vec.capacity()
    }
    

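    Most examples use String::from, so people get confused about why one would create a String from another string. The literal is only a borrowed &str; String::from copies it into a heap buffer that you own and can grow.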
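    A sketch of what String::from actually does with a literal; same_address is a hypothetical helper that compares where two slices point.

```rust
/// True when the two slices start at the same memory address.
/// Hypothetical helper, for illustration only.
fn same_address(a: &str, b: &str) -> bool {
    a.as_ptr() == b.as_ptr()
}

fn main() {
    let literal: &'static str = "Hello World"; // view into the binary
    let mut owned = String::from(literal); // heap copy that we own

    // The String has its own buffer, separate from the literal.
    println!("{}", same_address(literal, &owned));

    // Only the owned copy can grow; the literal is untouched.
    owned.push_str("!");
    println!("{} / {}", literal, owned);
}
```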

    It is a long read, hope it helps.

  • 2020-11-22 05:59

    std::string::String is simply a vector of u8. You can find its definition in the source code. It's heap-allocated and growable.

    #[derive(PartialOrd, Eq, Ord)]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub struct String {
        vec: Vec<u8>,
    }
    

    str is a primitive type, also called a string slice. A string slice has a fixed size. A string literal, as in let test = "hello world", has type &'static str; test is a reference to this statically allocated string. A &str cannot be modified. For example, neither of the last two lines below compiles:

    let mut word = "hello world";
    word[0] = 's';   // error: str cannot be indexed by an integer
    word.push('\n'); // error: &str has no method `push`
    

    str does have a mutable slice, &mut str, used for example by pub fn split_at_mut(&mut self, mid: usize) -> (&mut str, &mut str):

    let mut s = "Per Martin-Löf".to_string();
    {
        let (first, last) = s.split_at_mut(3);
        first.make_ascii_uppercase();
        assert_eq!("PER", first);
        assert_eq!(" Martin-Löf", last);
    }
    assert_eq!("PER Martin-Löf", s);
    

    But a small change to UTF-8 can change its byte length, and a slice cannot reallocate its referent.

  • 2020-11-22 05:59

    In simple terms, String is a data type stored on the heap (just like Vec), and you have full access to that location.

    &str is a slice type. That means it is just a reference to string data that already exists somewhere, such as a String on the heap or a literal in the binary.

    &str doesn't do any allocation at runtime. So, for memory reasons, you can prefer &str over String. But keep in mind that when using &str you might have to deal with explicit lifetimes.
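    Here is roughly what those explicit lifetimes look like, as a sketch with made-up names (first_word, Title). In the function the compiler could infer the lifetime, but in the struct it has to be written out.

```rust
/// Returns the first word of `text`; the result borrows from the input.
/// The lifetime `'a` is spelled out here, although elision would infer it.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

/// A struct that stores a borrowed slice must name the lifetime.
struct Title<'a> {
    text: &'a str,
}

fn main() {
    let owned = String::from("hello world");
    let word = first_word(&owned); // no allocation, just a view into `owned`
    let title = Title { text: word };
    println!("{}", title.text);
}
```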

  • 2020-11-22 06:01

    Here is a quick and easy explanation.

    String - A growable, owned, heap-allocated data structure. It can be coerced to a &str.

    str - a fixed-length string (now, as Rust has evolved, mutable in limited ways) that lives on the heap or in the binary. You normally interact with str as a borrowed type via a string slice view, such as &str.

    Usage considerations:

    Prefer String if you want to own or mutate a string - such as passing the string to another thread, etc.

    Prefer &str if you want to have a read-only view of a string.
