2016-09-28 02:03:07 +08:00
[TOC]
# Fundamental Collections

Rust's standard library includes a number of really useful data structures
called *collections*. Most other types represent one specific value, but
collections can contain multiple values inside of them. Each collection has
different capabilities and costs, and choosing an appropriate one for the
situation you're in is a skill you'll develop over time. In this chapter, we'll
go over three collections which are used very often in Rust programs:

* A *vector* allows us to store a variable number of values next to each other.
* A *string* is a collection of characters. We've seen the `String` type
  before, but we'll talk about it in depth now.
* A *hash map* allows us to associate a value with a particular key.

There are more specialized variants of each of these data structures for
particular situations, but these are the most fundamental and common. We're
going to discuss how to create and update each of the collections, as well as
what makes each special.

## Vectors

The first type we'll look at is `Vec<T>`, also known as a *vector*. Vectors
allow us to store more than one value in a single data structure that puts all
the values next to each other in memory.

### Creating a New Vector

To create a new vector, we can call the `new` function:

```rust
let v: Vec<i32> = Vec::new();
```

Note that we added a type annotation here. Since we aren't inserting any values
into this vector, Rust doesn't know what kind of elements we intend to store.
This is an important point. Vectors are homogeneous: they may store many
values, but those values must all be the same type. Vectors are generic over
the type stored inside them (we'll talk about generics more thoroughly in
Chapter 10), and the angle brackets here tell Rust that this vector will hold
elements of the `i32` type.

That said, in real code, we very rarely need this type annotation, since Rust
can infer the type of value we want to store once we insert values. Let's look
at how to modify a vector next.

### Updating a Vector

To put elements in the vector, we can use the `push` method:

```rust
let mut v = Vec::new();

v.push(5);
v.push(6);
v.push(7);
v.push(8);
```

Since these numbers are `i32`s, Rust infers the type of data we want to store
in the vector, so we don't need the `<i32>` annotation.

We can improve this code even further. Creating a vector with some initial
values like this is very common, so there's a macro to do it for us:

```rust
let v = vec![5, 6, 7, 8];
```

This macro does a similar thing to our previous example, but it's much more
convenient.

### Dropping a Vector Drops its Elements

Like any other `struct`, a vector will be freed when it goes out of scope:

```rust
{
    let v = vec![1, 2, 3, 4];

    // do stuff with v

} // <- v goes out of scope and is freed here
```

When the vector gets dropped, it will also drop all of its contents, so those
integers are going to be cleaned up as well. This may seem like a
straightforward point, but can get a little more complicated once we start to
introduce references to the elements of the vector. Let's tackle that next!

### Reading Elements of Vectors

Now that we know how creating and destroying vectors works, knowing how to read
their contents is a good next step. There are two ways to reference a value
stored in a vector. In the following examples of these two ways, we've
annotated the types of the values that are returned for extra clarity:

```rust
let v = vec![1, 2, 3, 4, 5];

let third: &i32 = &v[2];
let third: Option<&i32> = v.get(2);
```

First, note that we use the index value of `2` to get the third element:
vectors are indexed by number, starting at zero. Second, the two ways to get
the third element are indexing with `&` and `[]`, and calling the `get`
method. The square brackets give us a reference, and `get` gives us an
`Option<&T>`. The reason we have two ways to reference an element is so that we
can choose the behavior we'd like to have if we try to use an index value that
the vector doesn't have an element for:

```rust,should_panic
let v = vec![1, 2, 3, 4, 5];

let does_not_exist = &v[100];
let does_not_exist = v.get(100);
```

With the `[]`s, Rust will cause a `panic!`. With the `get` method, it will
instead return `None` without `panic!`ing. Deciding which way to access
elements in a vector depends on whether we consider an attempted access past
the end of the vector to be an error, in which case we'd want the `panic!`
behavior, or whether this will happen occasionally under normal circumstances
and our code will have logic to handle getting `Some(&element)` or `None`.

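
The non-panicking path can be sketched with a `match` on the `Option` that
`get` returns. This is only an illustration: the helper name `describe` and its
messages are made up for this example and are not part of the standard library.

```rust
// Hypothetical helper: describe the element at `index`, if there is one.
fn describe(v: &[i32], index: usize) -> String {
    // `get` returns Option<&i32>, so an out-of-range index is an ordinary
    // value we can match on instead of a panic.
    match v.get(index) {
        Some(element) => format!("element {} is {}", index, element),
        None => format!("no element at index {}", index),
    }
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];

    println!("{}", describe(&v, 2));
    println!("{}", describe(&v, 100));
}
```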
Once we have a valid reference, the borrow checker will enforce the ownership
and borrowing rules we covered in Chapter 4 in order to ensure this and other
references to the contents of the vector stay valid. This means in a function
that owns a `Vec`, we can't hand back an element just by indexing, whether by
value or by reference, since the `Vec` will be cleaned up at the end of the
function:

```rust,ignore
fn element() -> String {
    let list = vec![String::from("hi"), String::from("bye")];
    list[1]
}
```

Trying to compile this will result in the following error:

```bash
error: cannot move out of indexed content [--explain E0507]
  |>
4 |>     list[1]
  |>     ^^^^^^^ cannot move out of indexed content
```

Indexing doesn't transfer ownership, so the `String` at `list[1]` can't be
moved out of the vector. Returning a reference such as `&list[1]` wouldn't work
either: `list` goes out of scope and gets cleaned up at the end of the
function, so the reference would outlive `list`.

Here's another example of code that looks like it should be allowed, but it
won't compile because the references actually aren't valid anymore:

```rust,ignore
let mut v = vec![1, 2, 3, 4, 5];

let first = &v[0];

v.push(6);
```

Compiling this will give us this error:

```bash
error: cannot borrow `v` as mutable because it is also borrowed as immutable
[--explain E0502]
  |>
5 |> let first = &v[0];
  |>              - immutable borrow occurs here
7 |> v.push(6);
  |> ^ mutable borrow occurs here
9 |> }
  |> - immutable borrow ends here
```

This violates one of the borrowing rules we covered in Chapter 4: the `push`
method needs a mutable borrow of the `Vec`, and we aren't allowed to have any
immutable borrows while we have a mutable borrow.

Why is it an error to have a reference to the first element in a vector while
we try to add a new item to the end, though? Due to the way vectors work,
adding a new element onto the end might require allocating new memory and
copying the old elements over to the new space if there wasn't enough room to
put all the elements next to each other where the vector was. If this happened,
our reference would be pointing to deallocated memory. For more on this, see
The Nomicon at *https://doc.rust-lang.org/stable/nomicon/vec.html*.

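
One way to stay within the borrowing rules, sketched below, is to copy the
value out of the vector so that no reference is outstanding when `push` runs.
This works here because `i32` is `Copy`; the function name `first_then_push` is
an assumption made for this example.

```rust
// Hypothetical helper: read the first element, then grow the vector.
// Panics if `v` is empty, because `v[0]` indexes out of bounds.
fn first_then_push(mut v: Vec<i32>) -> (i32, Vec<i32>) {
    // `v[0]` without `&` copies the i32 out, so nothing borrows `v` afterwards.
    let first = v[0];

    // The mutable borrow needed by `push` is now allowed, even if the vector
    // has to reallocate and move its elements.
    v.push(6);

    (first, v)
}

fn main() {
    let (first, v) = first_then_push(vec![1, 2, 3, 4, 5]);
    println!("first = {}, v = {:?}", first, v);
}
```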
### Using an Enum to Store Multiple Types

Let's put vectors together with what we learned about enums in Chapter 6. At
the beginning of this section, we said that vectors will only store values that
are all the same type. This can be inconvenient; there are definitely use cases
for needing to store a list of things that might be different types. Luckily,
all the variants of an enum are defined under the same enum type, so when we're
in this scenario, we can define and use an enum!

For example, let's say we're going to be getting values for a row in a
spreadsheet. Some of the columns contain integers, some floating point numbers,
and some strings. We can define an enum whose variants will hold the different
value types. All of the enum variants will then be the same type, that of the
enum. Then we can create a vector that, ultimately, holds different types:

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
```

This has the advantage of being explicit about what types are allowed in this
vector. If we allowed any type to be in a vector, there would be a chance that
the vector would hold a type that would cause errors with the operations we
performed on the vector. Using an enum plus a `match` where we access elements
in a vector like this means that Rust will ensure at compile time that we
always handle every possible case.

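
As a rough sketch of the enum-plus-`match` combination described above, the
example below renders each cell of a row. The `render` function and its output
format are hypothetical; the enum is repeated so the example is self-contained.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: turn one cell into a display string.
// Removing any arm makes the compiler reject the `match` as non-exhaustive.
fn render(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => format!("int: {}", i),
        SpreadsheetCell::Float(f) => format!("float: {}", f),
        SpreadsheetCell::Text(s) => format!("text: {}", s),
    }
}

fn main() {
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];

    for cell in &row {
        println!("{}", render(cell));
    }
}
```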
Using an enum for storing different types in a vector does imply that we need
to know the set of types we'll want to store at compile time. If that's not the
case, instead of an enum, we can use a trait object. We'll learn about those in
Chapter XX.

Now that we've gone over some of the most common ways to use vectors, be sure
to take a look at the API documentation for other useful methods defined on
`Vec` by the standard library. For example, in addition to `push` there's a
`pop` method that will remove and return the last element. Let's move on to the
next collection type: `String`!

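
To make the `pop` method mentioned above concrete, here is a small sketch.
`pop` returns an `Option`, which is `None` once the vector is empty; the helper
name `drain_from_end` is invented for this example.

```rust
// Hypothetical helper: pop every element, collecting them in the order
// they come off the end of the vector.
fn drain_from_end(mut v: Vec<i32>) -> Vec<i32> {
    let mut popped = Vec::new();

    // `pop` yields Some(last) until the vector is empty, then None.
    while let Some(last) = v.pop() {
        popped.push(last);
    }

    popped
}

fn main() {
    println!("{:?}", drain_from_end(vec![5, 6, 7]));
}
```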
## Strings

We've already talked about strings a bunch in Chapter 4, but let's take a more
in-depth look at them now.

### Many Kinds of Strings

Strings are a common place for new Rustaceans to get stuck. This is due to a
combination of three things: Rust's propensity for making sure to expose
possible errors, strings being a more complicated data structure than many
programmers give them credit for, and UTF-8. These things combine in a way that
can seem difficult coming from other languages.

Before we can dig into those aspects, we need to talk about what exactly we
even mean by the word 'string'. Rust actually only has one string type in the
core language itself: the string slice `str`, which we usually see in its
borrowed form, `&str`. We talked about *string slices* in Chapter 4: they're a
reference to some UTF-8 encoded string data stored somewhere else. String
literals, for example, are stored in the binary output of the program, and are
therefore string slices.

Rust's standard library is what provides the type called `String`. This is a
growable, mutable, owned, UTF-8 encoded string type. When Rustaceans talk about
'strings' in Rust, they usually mean "`String` and `&str`". This section is
largely about `String`, and these two types are used heavily in Rust's standard
library. Both `String` and string slices are UTF-8 encoded.

Rust's standard library also includes a number of other string types, such as
`OsString`, `OsStr`, `CString`, and `CStr`. Library crates may provide even
more options for storing string data. As the `*String`/`*Str` naming suggests,
these types often come in an owned and a borrowed variant, just like
`String`/`&str`. These string types may store different encodings or be
represented in memory in a different way, for example. We won't be talking
about these other string types in this chapter; see their API documentation for
more about how to use them and when each is appropriate.

### Creating a New String

Let's look at how to do the same operations on `String` as we did with `Vec`,
starting with creating one. Similarly, `String` has `new`:

```rust
let s = String::new();
```

Often, we'll have some initial data that we'd like to start the string off with.
For that, there's the `to_string` method:

```rust
let data = "initial contents";

let s = data.to_string();

// the method also works on a literal directly:
let s = "initial contents".to_string();
```

We can also use `String::from`, which is equivalent to calling `to_string` on
the literal:

```rust
let s = String::from("initial contents");
```

Since strings are used for so many things, there are many different generic
APIs that make sense for strings. There are a lot of options, and some of them
can feel redundant because of this, but they all have their place! In this
case, `String::from` and `.to_string` end up doing the exact same thing, so
which you choose is a matter of style. Some people use `String::from` for
literals, and `.to_string` for variable bindings. Most Rust style is pretty
uniform, but this specific question is one of the most debated.

Remember that strings are UTF-8 encoded, so we can include any properly encoded
data in them:

```rust
let hello = "السلام عليكم";
let hello = "Dobrý den";
let hello = "Hello";
let hello = "שָׁלוֹם";
let hello = "नमस्ते";
let hello = "こんにちは";
let hello = "안녕하세요";
let hello = "你好";
let hello = "Olá";
let hello = "Здравствуйте";
let hello = "Hola";
```

### Updating a String

A `String` can be changed and can grow in size, just like a `Vec` can.

#### Push

We can grow a `String` by using the `push_str` method to append another
string:

```rust
let mut s = String::from("foo");
s.push_str("bar");
```

`s` will contain "foobar" after these two lines.

The `push` method will add a `char`:

```rust
let mut s = String::from("lo");
s.push('l');
```

`s` will contain "lol" after this point.

We can make any `String` contain the empty string with the `clear` method:

```rust
let mut s = String::from("Noooooooooooooooooooooo!");
s.clear();
```

Now `s` will be the empty string, "".

#### Concatenation

Often, we'll want to combine two strings together. One way is to use the `+`
operator:

```rust
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;
```

This code will make `s3` contain "Hello, world!" There are some tricky bits
here, though, that come from the type signature of `+` for `String`. The
signature for the `add` method that the `+` operator uses looks something like
this:

```rust,ignore
fn add(self, s: &str) -> String {
```

This isn't exactly what the actual signature is in the standard library because
`add` is defined using generics there. Here, we're just looking at what the
signature of the method would be if `add` were defined specifically for
`String`. This signature gives us the clues we need in order to understand the
tricky bits of `+`.

First of all, `s2` has an `&`. This is because of the `s` argument in the `add`
function: we can only add a `&str` to a `String`; we can't add two `String`s
together. Remember back in Chapter 4 when we talked about how `&String` will
coerce to `&str`: we write `&s2` so that the `String` will coerce to the proper
type, `&str`.

Secondly, `add` takes ownership of `self`, which we can tell because `self`
does *not* have an `&` in the signature. This means `s1` in the above example
will be moved into the `add` call and no longer be a valid binding after that.
So while `let s3 = s1 + &s2;` looks like it will copy both strings and create a
new one, this statement actually takes ownership of `s1`, appends a copy of
`s2`'s contents, then returns ownership of the result. In other words, it looks
like it's making a lot of copies, but isn't: the implementation is more
efficient than copying.

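
The ownership behavior of `+` can be seen in a short sketch. The function name
`concat` is made up for this example; it simply mirrors the simplified
signature of `add` shown above. Cloning `s1` is one option when we still need
it afterwards, at the cost of an extra copy.

```rust
// Hypothetical wrapper with the same shape as the simplified `add` signature:
// the left-hand side is taken by value, the right-hand side is borrowed.
fn concat(left: String, right: &str) -> String {
    // `left` is moved into `+`, which appends a copy of `right` to it.
    left + right
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // Passing `s1.clone()` moves the clone, so `s1` itself stays valid.
    // `&s2` is a `&String`, which coerces to `&str`.
    let s3 = concat(s1.clone(), &s2);

    println!("{}", s3);
    println!("{}{}", s1, s2); // both are still usable here
}
```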
If we need to concatenate multiple strings, this behavior of `+` gets
unwieldy:

```rust
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");

let s = s1 + "-" + &s2 + "-" + &s3;
```

`s` will be "tic-tac-toe" at this point. With all of the `+` and `"`
characters, it gets hard to see what's going on. For more complicated string
combining, we can use the `format!` macro:

```rust
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");

let s = format!("{}-{}-{}", s1, s2, s3);
```

This code will also set `s` to "tic-tac-toe". The `format!` macro works in the
same way as `println!`, but instead of printing the output to the screen, it
returns a `String` with the contents. This version is much easier to read than
all of the `+`s.

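
A further difference from `+` is that `format!` only borrows its arguments, so
none of the inputs are moved. The sketch below illustrates this; the function
name `join_with_dashes` is an assumption made for the example.

```rust
// Hypothetical helper: build "a-b-c" from three string slices.
fn join_with_dashes(a: &str, b: &str, c: &str) -> String {
    format!("{}-{}-{}", a, b, c)
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let s = join_with_dashes(&s1, &s2, &s3);
    println!("{}", s);

    // Unlike `s1 + ...`, nothing was moved: all three are still valid.
    println!("{} {} {}", s1, s2, s3);
}
```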
### Indexing into Strings

In many other languages, accessing individual characters in a string by
referencing the characters by index is a valid and common operation. In Rust,
however, if we try to access parts of a `String` using indexing syntax, we'll
get an error. That is, this code:

```rust,ignore
let s1 = String::from("hello");
let h = s1[0];
```

will result in this error:

```text
error: the trait bound `std::string::String: std::ops::Index<_>` is not
satisfied [--explain E0277]
  |>
  |>     let h = s1[0];
  |>             ^^^^^
note: the type `std::string::String` cannot be indexed by `_`
```

The error and the note tell the story: Rust strings don't support indexing. So
the follow-up question is, why not? In order to answer that, we have to talk a
bit about how Rust stores strings in memory.

#### Internal Representation

A `String` is a wrapper over a `Vec<u8>`. Let's take a look at some of our
properly-encoded UTF-8 example strings from before. First, this one:

```rust
let len = "Hola".len();
```

In this case, `len` will be four, which means the `Vec` storing the string
"Hola" is four bytes long: each of these letters takes one byte when encoded in
UTF-8. What about this example, though?

```rust
let len = "Здравствуйте".len();
```

There are two answers that potentially make sense here: the first is 12, which
is the number of letters that a person would count if we asked someone how long
this string was. The second, though, is what Rust's answer is: 24. This is the
number of bytes that it takes to encode "Здравствуйте" in UTF-8, because each
character takes two bytes of storage.

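
Both answers can be computed side by side, as in this small sketch: `len`
counts bytes, while `chars().count()` counts Unicode scalar values. The helper
name `lengths` is invented for this example.

```rust
// Hypothetical helper: return (length in bytes, number of `char` values).
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    for s in ["Hola", "Здравствуйте"] {
        let (bytes, chars) = lengths(s);
        println!("{}: {} bytes, {} chars", s, bytes, chars);
    }
}
```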
By the same token, imagine this invalid Rust code:

```rust,ignore
let hello = "Здравствуйте";
let answer = &hello[0];
```

What should the value of `answer` be? Should it be `З`, the first letter? When
encoded in UTF-8, the first byte of `З` is `208`, and the second is `151`. So
should `answer` be `208`? `208` is not a valid character on its own, though.
Plus, for Latin letters, this would not return the answer most people would
expect: `&"hello"[0]` would then return `104`, not `h`.

#### Bytes and Scalar Values and Grapheme Clusters! Oh my!

This leads to another point about UTF-8: there are really three relevant ways
to look at strings, from Rust's perspective: bytes, scalar values, and grapheme
clusters. If we look at the string "नमस्ते", it is ultimately stored as a `Vec`
of `u8` values that looks like this:

```text
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
```

That's 18 bytes. But if we look at them as Unicode scalar values, which are
what Rust's `char` type is, those bytes look like this:

```text
['न', 'म', 'स', '्', 'त', 'े']
```

There are six `char` values here. Finally, if we look at them as grapheme
clusters, which is the closest thing to what humans would call 'letters', we'd
get this:

```text
["न", "म", "स्", "ते"]
```

Four elements! It turns out that even within 'grapheme cluster', there are
multiple ways of grouping things. Convinced that strings are actually really
complicated yet?

Another reason that indexing into a `String` to get a character is not available
is that indexing operations are expected to always take constant time. This
isn't possible with a `String`, since Rust would have to walk through the
contents from the beginning to the index to determine how many valid characters
there were, no matter how we define "character".

All of these problems mean that Rust does not implement `[]` with a single
number for `String`, so we cannot directly do this.

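
When we really do want "the n-th `char`", the walk from the beginning has to be
written out explicitly. A minimal sketch, assuming scalar values are the unit
we care about (the helper name `nth_char` is made up for this example):

```rust
// Hypothetical helper: the n-th Unicode scalar value, counting from zero.
// This walks the string from the start, so the cost grows with `n`.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    println!("{:?}", nth_char("Здравствуйте", 0));
    println!("{:?}", nth_char("hello", 10));
}
```

Returning an `Option` mirrors `Vec::get`: asking for a position past the end
gives `None` rather than a panic.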
### Slicing Strings

However, taking a slice of a string by byte positions is very useful. While we
can't use `[]` with a single number, we _can_ use `[]` with a range to create a
string slice from particular bytes:

```rust
let hello = "Здравствуйте";

let s = &hello[0..4];
```

Here, `s` will be a `&str` that contains the first four bytes of the string.
Earlier, we mentioned that each of these characters is two bytes, so that means
that `s` will be "Зд".

What would happen if we did `&hello[0..1]`? The answer: it will panic at
runtime, in the same way that accessing an invalid index in a vector does:

```bash
thread 'main' panicked at 'index 0 and/or 1 in `Здравствуйте` do not lie on
character boundary', ../src/libcore/str/mod.rs:1694
```

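
If a byte range might not fall on character boundaries, the panic can be
avoided by checking first. The sketch below uses the standard `str::get` and
`str::is_char_boundary` methods; the helper name `first_bytes` is an assumption
for this example.

```rust
// Hypothetical helper: the first `end` bytes of `s` as a string slice,
// or None if `end` is out of range or splits a character.
fn first_bytes(s: &str, end: usize) -> Option<&str> {
    // `get` with a range is the non-panicking counterpart of `&s[0..end]`.
    s.get(0..end)
}

fn main() {
    let hello = "Здравствуйте";

    println!("{:?}", first_bytes(hello, 4));
    println!("{:?}", first_bytes(hello, 1));

    // `is_char_boundary` answers the same question for a single position.
    println!("{}", hello.is_char_boundary(1));
}
```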
### Methods for Iterating Over Strings

If we do need to perform operations on individual characters, the best way to
do that is using the `chars` method. Calling `chars` on "नमस्ते" gives us the six
Rust `char` values:

```rust
for c in "नमस्ते".chars() {
    println!("{}", c);
}
```

This code will print:

```bash
न
म
स
्
त
े
```

The `bytes` method returns each raw byte, which might be appropriate for your
domain, but remember that valid Unicode scalar values may be made up of more
than one byte:

```rust
for b in "नमस्ते".bytes() {
    println!("{}", b);
}
```

This code will print the 18 bytes that make up this string, starting with:

```bash
224
164
168
224
// ... etc
```

There are crates available on crates.io to get grapheme clusters from `String`s.

To summarize, strings are complicated. Different programming languages make
different choices about how to present this complexity to the programmer. Rust
has chosen to attempt to make correct handling of `String` data be the default
for all Rust programs, which does mean programmers have to put more thought
into handling UTF-8 data upfront. This tradeoff exposes us to more of the
complexity of strings than we have to handle in other languages, but will
prevent us from having to handle errors involving non-ASCII characters later in
our development lifecycle.

Let's switch to something a bit less complex: hash maps!

## Hash Maps

The last of our fundamental collections is the *hash map*. The type
`HashMap<K, V>` stores a mapping of keys of type `K` to values of type `V`. It
does this via a *hashing function*, which determines how it places these keys
and values into memory. Many different programming languages support this kind
of data structure, but often with a different name: hash, map, object, hash
table, or associative array, just to name a few.

We'll go over the basic API in this chapter, but there are many more goodies
hiding in the functions defined on `HashMap` by the standard library. As always,
check the standard library documentation for more information.

### Creating a New Hash Map

We can create an empty `HashMap` with `new`, and add elements with `insert`:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();

map.insert(1, "hello");
map.insert(2, "world");
```

Note that we need to `use` the `HashMap` from the collections portion of the
standard library. Of our three fundamental collections, this one is the least
often used, so it has a bit less support from the language. There's no built-in
macro to construct them, for example, and they're not in the prelude, so we
need to add a `use` statement for them.

Just like vectors, hash maps store their data on the heap. This `HashMap` has
keys of type `i32` and values of type `&str`. Like vectors, hash maps are
homogeneous: all of the keys must have the same type, and all of the values must
have the same type.

If we have a vector of tuples, we can convert it into a hash map with the
`collect` method. The first element in each tuple will be the key, and the
second element will be the value:

```rust
use std::collections::HashMap;

let data = vec![(1, "hello"), (2, "world")];

let map: HashMap<_, _> = data.into_iter().collect();
```

The type annotation `HashMap<_, _>` is needed here because it's possible to
`collect` into many different data structures, so Rust doesn't know which we
want. For the type parameters for the key and value types, however, we can use
underscores and Rust can infer the types that the hash map contains based on the
types of the data in our vector.

For types that implement the `Copy` trait, such as `i32`, the values are
copied into the hash map. If we insert owned values like `String`, the values
will be moved and the hash map will be the owner of those values:

```rust
use std::collections::HashMap;

let field_name = String::from("Favorite color");
let field_value = String::from("Blue");

let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point
```

We would not be able to use the bindings `field_name` and `field_value` after
they have been moved into the hash map with the call to `insert`.

If we insert references to values, the values themselves will not be moved into
the hash map. The values that the references point to must be valid for at least
as long as the hash map is valid, though. We will talk more about these issues
in the Lifetimes section of Chapter 10.

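
As a sketch of the reference case, the function below builds a hash map of
string slices that borrow from a slice of owned pairs. The name
`index_by_name` and the explicit lifetime `'a` are choices made for this
example; the lifetime simply states that the map may not outlive `pairs`.

```rust
use std::collections::HashMap;

// Hypothetical helper: a map whose keys and values borrow from `pairs`.
fn index_by_name<'a>(pairs: &'a [(String, String)]) -> HashMap<&'a str, &'a str> {
    pairs
        .iter()
        .map(|(name, value)| (name.as_str(), value.as_str()))
        .collect()
}

fn main() {
    let pairs = vec![(String::from("Favorite color"), String::from("Blue"))];

    let map = index_by_name(&pairs);
    println!("{:?}", map.get("Favorite color"));

    // Nothing was moved into the map, so `pairs` is still usable.
    println!("{}", pairs.len());
}
```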
### Accessing Values in a Hash Map

We can get a value out of the hash map by providing its key to the `get` method:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();

map.insert(1, "hello");
map.insert(2, "world");

let value = map.get(&2);
```

Here, `value` will have the value `Some(&"world")`, since that's the value
associated with the `2` key. The result is wrapped in `Some` because `get`
returns an `Option<&V>`, a reference to the stored value. If there's no value
for that key in the hash map, `get` will return `None`.

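
Handling both outcomes of `get` might look like the following sketch. The
helper names `sample_map` and `lookup`, and the message format, are made up for
this example.

```rust
use std::collections::HashMap;

// Hypothetical helper: the same two entries used in the example above.
fn sample_map() -> HashMap<i32, &'static str> {
    let mut map = HashMap::new();
    map.insert(1, "hello");
    map.insert(2, "world");
    map
}

// Hypothetical helper: describe what, if anything, is stored under `key`.
fn lookup(map: &HashMap<i32, &str>, key: i32) -> String {
    // `get` takes a reference to the key and returns Option<&V>.
    match map.get(&key) {
        Some(value) => format!("{} -> {}", key, value),
        None => format!("{} has no value", key),
    }
}

fn main() {
    let map = sample_map();
    println!("{}", lookup(&map, 2));
    println!("{}", lookup(&map, 3));
}
```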
We can iterate over each key/value pair in a hash map in a similar manner as we
do with vectors, using a `for` loop:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();

map.insert(1, "hello");
map.insert(2, "world");

for (key, value) in &map {
    println!("{}: {}", key, value);
}
```

This will print each pair, though a hash map does not guarantee the order in
which the pairs are visited:

```text
1: hello
2: world
```

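If a predictable order matters, one option is to collect the keys into a vector
and sort them before printing. The `sorted_lines` function here is a
hypothetical helper, not part of the standard library:

```rust
use std::collections::HashMap;

// Hypothetical helper for illustration: formats the pairs in key order.
fn sorted_lines() -> Vec<String> {
    let mut map = HashMap::new();
    map.insert(2, "world");
    map.insert(1, "hello");

    // Collect references to the keys, then sort them.
    let mut keys: Vec<&i32> = map.keys().collect();
    keys.sort();

    let mut lines = Vec::new();
    for key in keys {
        // Indexing with `map[key]` is fine here because every key
        // came from the map itself.
        lines.push(format!("{}: {}", key, map[key]));
    }
    lines
}

fn main() {
    for line in sorted_lines() {
        println!("{}", line);
    }
}
```
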
### Updating a Hash Map

Since each key can only have one value, when we want to change the data in a
hash map, we have to decide how to handle the case when a key already has a
value assigned. We could choose to replace the old value with the new value. We
could choose to keep the old value and ignore the new value, and only add the
new value if the key *doesn't* already have a value. Or we could change the
existing value. Let's look at how to do each of these!

#### Overwriting a Value

If we insert a key and a value, then insert that key with a different value,
the value associated with that key will be replaced. Even though this code
calls `insert` twice, the hash map will only contain one key/value pair, since
we're inserting with the key `1` both times:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();

map.insert(1, "hello");
map.insert(1, "Hi There");

println!("{:?}", map);
```

This will print `{1: "Hi There"}`.

#### Only Insert If the Key Has No Value

It's common to want to see if there's some sort of value already stored in the
hash map for a particular key, and if not, insert a value. Hash maps have a
special API for this, called `entry`, that takes the key we want to check as an
argument:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();
map.insert(1, "hello");

let e = map.entry(2);
```

Here, the value bound to `e` is a special enum, `Entry`. An `Entry` represents a
value that might or might not exist. Let's say that we want to see if the key
`2` has a value associated with it. If it doesn't, we want to insert the value
"world". In both cases, we want to return the resulting value that now goes
with `2`. With the entry API, it looks like this:

```rust
use std::collections::HashMap;

let mut map = HashMap::new();

map.insert(1, "hello");

map.entry(2).or_insert("world");
map.entry(1).or_insert("Hi There");

println!("{:?}", map);
```

The `or_insert` method on `Entry` does exactly this: it returns the value for
the `Entry`'s key if it exists, and if not, inserts its argument as the new
value for the `Entry`'s key and returns that. This is much cleaner than writing
the logic ourselves, and in addition, plays more nicely with the borrow checker.

This code will print `{1: "hello", 2: "world"}`, although the order of the
pairs may vary. The first call to `entry` will insert the key `2` with the
value "world", since `2` doesn't have a value already. The second call to
`entry` will not change the hash map since `1` already has the value "hello".

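For comparison, here is a sketch of roughly what that logic looks like written
by hand. The `insert_if_missing` and `build_map` functions are hypothetical
helpers, and the values are `String`s (an assumption made to keep the example
free of lifetime annotations):

```rust
use std::collections::HashMap;

// Hypothetical helper for illustration: check-then-insert without `entry`.
// This looks the key up twice when it is missing, once in `contains_key`
// and once in `insert`.
fn insert_if_missing(map: &mut HashMap<i32, String>, key: i32, value: &str) {
    if !map.contains_key(&key) {
        map.insert(key, value.to_string());
    }
}

// Hypothetical helper: repeats the steps of the `entry` example by hand.
fn build_map() -> HashMap<i32, String> {
    let mut map = HashMap::new();
    map.insert(1, String::from("hello"));

    insert_if_missing(&mut map, 2, "world");
    insert_if_missing(&mut map, 1, "Hi There");
    map
}

fn main() {
    println!("{:?}", build_map());
}
```

The resulting map is the same as in the `entry` version, but `entry` expresses
the intent in a single call.
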
#### Update a Value Based on the Old Value

Another common use case for hash maps is to look up a key's value and then
update it based on the old value. For instance, if we wanted to count how many
times each word appeared in some text, we could use a hash map with the words
as keys and increment the value to keep track of how many times we've seen that
word. If this is the first time we've seen a word, we'll first insert the value
`0`.

```rust
use std::collections::HashMap;

let text = "hello world wonderful world";

let mut map = HashMap::new();

for word in text.split_whitespace() {
    let count = map.entry(word).or_insert(0);
    *count += 1;
}

println!("{:?}", map);
```

This will print `{"world": 2, "hello": 1, "wonderful": 1}`, possibly with the
pairs in a different order. The `or_insert` method actually returns a mutable
reference (`&mut V`) to the value in the hash map for this key. Here we store
that mutable reference in the `count` variable binding, so in order to assign
to that value we must first dereference `count` using the asterisk (`*`). The
mutable reference goes out of scope at the end of each iteration of the `for`
loop, so all of these changes are safe and allowed by the borrowing rules.

### Hashing Function

By default, `HashMap` uses a hashing function (currently SipHash) that is
designed to provide resistance to Denial of Service (DoS) attacks involving
hash tables. This is not the fastest hashing algorithm available, but trading
some performance for better security is a good default. If you profile your
code and find that the default hash function is too slow for your purposes, you
can switch to another function by specifying a different *hasher*. A hasher is
a type that implements the `BuildHasher` trait. We'll be talking about traits
and how to implement them in Chapter 10.

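As a sketch of how a hasher is specified, the example below passes one to
`HashMap::with_hasher`. To stay within the standard library, it reuses
`DefaultHasher` wrapped in `BuildHasherDefault`; the `FixedState` alias and the
`map_with_custom_hasher` function are names invented for this illustration. A
different hashing algorithm from another library would be plugged in the same
way:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::BuildHasherDefault;

// Assumption for illustration: `BuildHasherDefault` turns any hasher type
// that implements `Default` into something that implements `BuildHasher`.
type FixedState = BuildHasherDefault<DefaultHasher>;

// Hypothetical helper: builds a map that uses the hasher chosen above.
fn map_with_custom_hasher() -> HashMap<String, i32, FixedState> {
    let mut map = HashMap::with_hasher(FixedState::default());
    map.insert(String::from("Blue"), 10);
    map.insert(String::from("Yellow"), 50);
    map
}

fn main() {
    let map = map_with_custom_hasher();
    println!("{:?}", map.get("Blue"));
}
```

Note that a hasher built this way is not randomly seeded per map, so this
particular choice gives up the DoS resistance of the default.
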
## Summary

Vectors, strings, and hash maps will take you far in programs where you need to
store, access, and modify data. Some programs you are now equipped to write and
might want to try include:

* Given a list of integers, use a vector and return their mean (average),
  median (when sorted, the value in the middle position), and mode (the value
  that occurs most often; a hash map will be helpful here).
* Convert strings to Pig Latin, where the first consonant of each word gets
  moved to the end with an added "ay", so "first" becomes "irst-fay". Words that
  start with a vowel get "hay" added to the end instead ("apple" becomes
  "apple-hay"). Remember about UTF-8 encoding!
* Using a hash map and vectors, create a text interface to allow a user to add
  employee names to a department in the company. For example, "Add Sally to
  Engineering" or "Add Ron to Sales". Then let the user retrieve a list of all
  people in a department or all people in the company by department, sorted
  alphabetically.

The standard library API documentation describes methods these types have that
will be helpful for these exercises!

We're getting into more complex programs where operations can fail, which means
it's a perfect time to go over error handling next!