Why does Rust have String and str? What are the differences between String and str? When does one use String
It is str that is analogous to String, not the slice of it, which is also known as &str.
A str is a sequence of UTF-8 text stored somewhere in memory. The most familiar example is a string literal, which is basically pre-allocated text:
"Hello World"
This text has to be stored somewhere, so it is stored in a read-only section of the executable, alongside the program’s machine code, as a sequence of bytes ([u8]). Because text can have any length, str is a dynamically sized type: its size is not part of the type and is known only at run time:
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
| H  | e   | l   | l   | o   |    | W  | o   | r   | l   | d   |
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
| 72 | 101 | 108 | 108 | 111 | 32 | 87 | 111 | 114 | 108 | 100 |
+----+-----+-----+-----+-----+----+----+-----+-----+-----+-----+
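The two tables above can be checked directly. This is a minimal sketch; `byte_values` is a hypothetical helper introduced only for illustration.

```rust
/// Hypothetical helper: copies the raw bytes behind a string slice.
fn byte_values(text: &str) -> Vec<u8> {
    text.as_bytes().to_vec()
}

fn main() {
    let bytes = byte_values("Hello World");
    // One byte per character here, because every character is ASCII.
    assert_eq!(bytes.len(), 11);
    assert_eq!(
        bytes,
        vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]
    );
    println!("{:?}", bytes);
}
```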
We need a way to access the stored text, and this is where the slice comes in.
A slice, [T], is a view into a block of memory. Whether mutable or not, a slice always borrows, which is why it always appears behind a pointer, &.
So the "Hello World" expression evaluates to a fat pointer containing both the address of the actual data and its length. This pointer is our handle to the data. Because the data is now behind a pointer, the compiler knows the size of the handle at compile time.
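To see the fat pointer for yourself, here is a small sketch. `parts` is a hypothetical helper, not something from the standard library; it only wraps `as_ptr` and `len`.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the (address, length) pair a `&str` carries.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let s: &str = "Hello World";
    let (ptr, len) = parts(s);

    // A &str is two machine words wide: one for the address, one for the length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // A reference to a sized type is a thin pointer: a single word.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());

    assert_eq!(len, 11);
    println!("address = {:p}, length = {}", ptr, len);
}
```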
Since the text is embedded in the executable itself, it is valid for the entire lifetime of the running program, hence it has the 'static lifetime.
So the type of the "Hello World" expression should reflect these two characteristics, which it does:
let s: &'static str = "Hello World";
You may ask why its type is written as str and not as [u8]. It is because the data is always guaranteed to be a valid UTF-8 sequence. Not all UTF-8 characters are a single byte (some take up to 4 bytes), and not every sequence of bytes is valid UTF-8. So [u8] would be inaccurate.
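A quick sketch of that difference, using `std::str::from_utf8` to validate raw bytes. `is_valid_utf8` is a hypothetical helper added for illustration.

```rust
use std::str;

/// Hypothetical helper: true when `bytes` form a valid UTF-8 sequence.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // ASCII text is valid UTF-8, one byte per character.
    assert!(is_valid_utf8(&[72, 101, 108, 108, 111]));
    // The byte 0xFF never occurs in valid UTF-8, so this is rejected.
    assert!(!is_valid_utf8(&[0xFF, 0xFE]));

    // `len` counts bytes, not characters.
    assert_eq!("\u{e9}".len(), 2); // é takes two bytes
    assert_eq!("\u{1F980}".len(), 4); // the crab emoji takes four bytes
    assert_eq!("\u{1F980}".chars().count(), 1);
}
```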
On the other hand, String is a specialized vector of u8 bytes, in other words a resizable buffer holding UTF-8 text. We say specialized because it does not permit arbitrary access and enforces checks that keep the data valid UTF-8 at all times. The buffer is allocated on the heap, so it can be resized as needed or requested.
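As a rough illustration of both points, growing the buffer and checked access, consider the sketch below. `greeting` is a hypothetical helper; the rest uses only standard String methods.

```rust
/// Hypothetical helper: builds a String by growing it step by step.
fn greeting() -> String {
    let mut s = String::from("Hello");
    s.push_str(" World"); // append a string slice
    s.push('!'); // append a single character
    s
}

fn main() {
    let s = greeting();
    assert_eq!(s, "Hello World!");

    // Indexing with an integer (`s[0]`) does not compile for String.
    // Range access is checked against character boundaries instead.
    assert_eq!(s.get(0..5), Some("Hello"));

    let t = String::from("h\u{e9}llo"); // "héllo", where é occupies bytes 1 and 2
    // Byte offset 2 falls in the middle of é, so this range is rejected.
    assert_eq!(t.get(0..2), None);
    assert_eq!(t.get(0..3), Some("h\u{e9}"));
    assert_eq!(t.chars().nth(1), Some('\u{e9}'));
}
```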
Here is how String is defined in the standard library source code:
pub struct String {
    vec: Vec<u8>,
}
You could create a String directly from the String struct if its field were public, but vec is private to ensure validity and proper checks, since not every stream of bytes is valid UTF-8.
But there are several methods defined on the String type to create a String instance; new is one of them:
pub const fn new() -> String {
    String { vec: Vec::new() }
}
We can use it to create a valid String. Unfortunately, it does not accept an input parameter, so the result will be a valid but empty string:
let s = String::new();
println!("{}", s);
But we can fill this buffer with an initial value from different sources:
From a string literal
let a = "Hello World";
let s = String::from(a);
From raw parts
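// ManuallyDrop keeps the original from freeing the buffer we are about to reuse
let mut s = std::mem::ManuallyDrop::new(String::from("Hello World"));
let ptr = s.as_mut_ptr();
let len = s.len();
let capacity = s.capacity();
// Safety: ptr, len and capacity all come from the same valid String
let s = unsafe { String::from_raw_parts(ptr, len, capacity) };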
From a character
let ch = 'c';
let s = ch.to_string();
From vector of bytes
let hello_world = vec![72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100];
// We know it is a valid UTF-8 sequence, so we can use unwrap
let hello_world = String::from_utf8(hello_world).unwrap();
println!("{}", hello_world); // Hello World
From input buffer
use std::io::{self, Read};
fn main() -> io::Result<()> {
    let mut buffer = String::new();
    let stdin = io::stdin();
    let mut handle = stdin.lock();
    handle.read_to_string(&mut buffer)?;
    Ok(())
}
Or from any other type that implements the ToString trait
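For that last case, here is a minimal sketch. ToString is implemented automatically for any type that implements Display, so the usual route is to implement Display. The `Point` type is hypothetical and exists only for this example.

```rust
use std::fmt;

// Hypothetical type used only to illustrate the blanket ToString implementation.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    // Built-in types that implement Display get to_string for free.
    assert_eq!(42.to_string(), "42");
    assert_eq!(true.to_string(), "true");

    // So does our own type, through its Display implementation.
    let p = Point { x: 1, y: 2 };
    assert_eq!(p.to_string(), "(1, 2)");
}
```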
Since String is a vector under the hood, it exhibits some vector characteristics, and it delegates some properties and methods to the underlying vector:
pub fn capacity(&self) -> usize {
    self.vec.capacity()
}
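A small sketch of how length and capacity behave from the outside. `len_and_capacity` is a hypothetical helper; the exact capacity the allocator hands back is not guaranteed, so the checks only assert lower bounds.

```rust
/// Hypothetical helper: reports the byte length and the buffer capacity.
fn len_and_capacity(s: &str, reserve: usize) -> (usize, usize) {
    let mut owned = String::with_capacity(reserve);
    owned.push_str(s);
    (owned.len(), owned.capacity())
}

fn main() {
    let (len, capacity) = len_and_capacity("Hello World", 16);
    // Length is the number of bytes currently stored.
    assert_eq!(len, 11);
    // Capacity is at least what was requested and never less than the length.
    assert!(capacity >= 16);
    assert!(len <= capacity);

    // The String can be turned back into its underlying byte vector.
    let bytes: Vec<u8> = String::from("Hello World").into_bytes();
    assert_eq!(bytes.len(), 11);
}
```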
Most examples use String::from, so people get confused and wonder why one would create a String from another string.
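One way to see what String::from is doing: it copies the borrowed, read-only text into a new heap buffer that you own and can change. This is only a sketch; `shout` is a hypothetical function written for this example.

```rust
/// Hypothetical function: copies borrowed text into an owned String and edits the copy.
fn shout(text: &str) -> String {
    let mut owned = String::from(text); // allocates and copies the bytes
    owned.push('!'); // allowed, because we own this buffer
    owned
}

fn main() {
    let literal: &'static str = "Hello World";
    let owned = shout(literal);

    assert_eq!(owned, "Hello World!");
    // The literal itself is untouched.
    assert_eq!(literal, "Hello World");
    // The two live at different addresses: one in the executable, one on the heap.
    assert_ne!(literal.as_ptr(), owned.as_ptr());
}
```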
It is a long read, hope it helps.