2016-09-28 02:03:07 +08:00

[TOC]

# Fundamental Collections

Rust's standard library includes a number of really useful data structures
called *collections*. Most other data types represent one specific value, but
collections can contain multiple values. Unlike the built-in array and tuple
types, the data these collections point to is stored on the heap, which means
the amount of data does not need to be known at compile time and can grow or
shrink as the program runs. Each kind of collection has different capabilities
and costs, and choosing an appropriate one for the situation you're in is a
skill you'll develop over time. In this chapter, we'll go over three
collections which are used very often in Rust programs:

* A *vector* allows us to store a variable number of values next to each other.
* A *string* is a collection of characters. We've seen the `String` type
  before, but we'll talk about it in depth now.
* A *hash map* allows us to associate a value with a particular key.

There are more specialized variants of each of these data structures for
particular situations, but these are the most fundamental and common. We're
going to discuss how to create and update each of the collections, as well as
what makes each special.

## Vectors

The first type we'll look at is `Vec<T>`, also known as a *vector*. Vectors
allow us to store more than one value in a single data structure that puts all
the values next to each other in memory. Vectors can only store values of the
same type. They are useful in situations where you have a list of items, such
as the lines of text in a file or the prices of items in a shopping cart.

### Creating a New Vector

To create a new, empty vector, we can call the `Vec::new` function:

```rust
let v: Vec<i32> = Vec::new();
```

Note that we added a type annotation here. Since we aren't inserting any values
into this vector, Rust doesn't know what kind of elements we intend to store.
This is an important point. Vectors are homogeneous: they may store many values,
but those values must all be the same type. Vectors are implemented using
generics; Chapter 10 will cover how to use generics in your own types. For now,
all you need to know is that the `Vec` type provided by the standard library
can hold any type, and when a specific `Vec` holds a specific type, the type
goes within angle brackets. We've told Rust that the `Vec` in `v` will hold
elements of the `i32` type.

In real code, Rust can infer the type of value we want to store once we insert
values, so you rarely need to do this type annotation. It's more common to
create a `Vec` that has initial values, and Rust provides the `vec!` macro for
convenience. The macro will create a new `Vec` that holds the values we give
it. This will create a new `Vec<i32>` that holds the values `1`, `2`, and `3`:

```rust
let v = vec![1, 2, 3];
```

Because we've given initial `i32` values, Rust can infer that the type of `v`
is `Vec<i32>`, and the type annotation isn't necessary. Let's look at how to
modify a vector next.

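As a small sketch of the creation styles above, the function below builds an empty vector with `Vec::new` plus an explicit annotation, and filled vectors with `vec!`. It also uses the `vec![value; count]` repetition form of the macro, which is standard but isn't otherwise covered in this section; the function name `creation_styles` is just for illustration.

```rust
/// Builds vectors in a few equivalent ways so they can be compared.
fn creation_styles() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    // No elements yet, so the element type has to be written out.
    let empty: Vec<i32> = Vec::new();

    // The element type is inferred from the values (and the return type).
    let listed = vec![1, 2, 3];

    // `vec![value; count]` repeats `value` `count` times: [0, 0, 0].
    let repeated = vec![0; 3];

    (empty, listed, repeated)
}
```
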
### Updating a Vector

To create a vector and then add elements to it, we can use the `push` method:

```rust
let mut v = Vec::new();

v.push(5);
v.push(6);
v.push(7);
v.push(8);
```

As with any variable, as we discussed in Chapter 3, if we want to be able to
change its value, we need to make it mutable with the `mut` keyword. The
numbers we place inside are all `i32`s, and Rust infers this from the data, so
we don't need the `Vec<i32>` annotation.

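Here is a minimal sketch of `push` inside a loop, assuming we want the first `n` square numbers; `squares` is a hypothetical helper. Because the vector starts out empty, its element type is inferred from the function's return type rather than from an annotation on `Vec::new`.

```rust
/// Returns the squares of 1 through `n`; for example, `squares(3)` is `[1, 4, 9]`.
fn squares(n: u32) -> Vec<u32> {
    let mut v = Vec::new(); // `mut` is required because `push` changes `v`
    for i in 1..=n {
        v.push(i * i); // each call appends one element to the end
    }
    v
}
```
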
### Dropping a Vector Drops Its Elements

Like any other `struct`, a vector will be freed when it goes out of scope:

```rust
{
    let v = vec![1, 2, 3, 4];

    // do stuff with v

} // <- v goes out of scope and is freed here
```

When the vector gets dropped, all of its contents will also be dropped, meaning
those integers it holds will be cleaned up. This may seem like a
straightforward point, but it can get a little more complicated once we start to
introduce references to the elements of the vector. Let's tackle that next!

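To make "dropping a vector drops its elements" observable, the sketch below uses a hypothetical `Noisy` type whose `Drop` implementation writes its name into a shared log. It relies on `Rc`, `RefCell`, and the `Drop` trait, which later chapters cover; treat it as an illustration rather than something you need to follow in full yet.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical element type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates a vector of `Noisy` values in an inner scope and returns the names
/// that were logged when that vector went out of scope.
fn drop_log() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _v = vec![
            Noisy { name: "first", log: Rc::clone(&log) },
            Noisy { name: "second", log: Rc::clone(&log) },
        ];
        // Nothing has been dropped while `_v` is still in scope.
        assert!(log.borrow().is_empty());
    } // <- `_v` goes out of scope; its elements are dropped, first to last
    let names = log.borrow().clone();
    names
}
```

Calling `drop_log()` returns `["first", "second"]`: both elements were cleaned up as soon as the vector holding them was.
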
### Reading Elements of Vectors

Now that you know how to create, update, and destroy vectors, knowing how to
read their contents is a good next step. There are two ways to reference a
value stored in a vector. In the examples, we've annotated the types of the
values that are returned from these functions for extra clarity.

This example shows both ways of accessing a value in a vector: with indexing
syntax and with the `get` method:

```rust
let v = vec![1, 2, 3, 4, 5];

let third: &i32 = &v[2];
let third: Option<&i32> = v.get(2);
```

There are a few things to note here. First, we use the index value of `2` to
get the third element: vectors are indexed by number, starting at zero.
Second, there are two different ways to get the third element: using `&` and
`[]`, which gives us a reference, or using the `get` method with the index
passed as an argument, which gives us an `Option<&T>`.

The reason Rust has two ways to reference an element is so that you can choose
how the program behaves when you try to use an index value that the vector
doesn't have an element for. As an example, what should a program do if it has
a vector that holds five elements and then tries to access an element at index
100 like this:

```rust,should_panic
let v = vec![1, 2, 3, 4, 5];

let does_not_exist = &v[100];
let does_not_exist = v.get(100);
```

When you run this, you will find that with the first `[]` method, Rust will
cause a `panic!` when a non-existent element is referenced. This method would
be preferable if you want your program to consider an attempt to access an
element past the end of the vector to be a fatal error that should crash the
program.

When the `get` method is passed an index that is outside the vector, it will
return `None` without `panic!`ing. You would use this if accessing an element
beyond the range of the vector will happen occasionally under normal
circumstances. Your code can then have logic to handle having either
`Some(&element)` or `None`, as we discussed in Chapter 6. For example, the
index could be coming from a person entering a number. If they accidentally
enter a number that's too large and your program gets a `None` value, you could
tell the user how many items are in the current `Vec` and give them another
chance to enter a valid value. That would be more user-friendly than crashing
the program for a typo!

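A minimal sketch of the user-input scenario described above, assuming the index has already been parsed from what the person typed; `describe_choice` is a hypothetical helper. It matches on the `Option<&i32>` returned by `get`, so an out-of-range index produces a helpful message instead of a panic.

```rust
/// Describes the element at `index`, or explains how many elements exist.
/// Taking `&[i32]` means callers can pass `&v` for a `Vec<i32>` named `v`.
fn describe_choice(v: &[i32], index: usize) -> String {
    match v.get(index) {
        Some(value) => format!("Element {} is {}", index, value),
        None => format!(
            "There is no element {}; the vector only has {} elements",
            index,
            v.len()
        ),
    }
}
```

With `v = vec![1, 2, 3, 4, 5]`, `describe_choice(&v, 100)` returns a message the program could show before asking again, rather than crashing.
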
#### Invalid References

Once the program has a valid reference, the borrow checker will enforce the
ownership and borrowing rules covered in Chapter 4 to ensure this reference and
any other references to the contents of the vector stay valid. Recall the rule
that says we can't have mutable and immutable references in the same scope.
That rule applies in this example, where we hold an immutable reference to the
first element in a vector and try to add an element to the end:

```rust,ignore
let mut v = vec![1, 2, 3, 4, 5];

let first = &v[0];

v.push(6);
```

Compiling this will give us this error:

```text
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
  |
4 | let first = &v[0];
  |              - immutable borrow occurs here
5 |
6 | v.push(6);
  | ^ mutable borrow occurs here
7 | }
  | - immutable borrow ends here
```

This code might look like it should work: why should a reference to the first
element care about changes at the end of the vector? This code isn't allowed
because of the way vectors work. Adding a new element onto the end of the
vector might require allocating new memory and copying the old elements over to
the new space, if there isn't enough room to put all the elements next to each
other where the vector currently is. In that case, the reference to the first
element would be pointing to deallocated memory. The borrowing rules prevent
programs from ending up in that situation.

> Note: For more on this, see The Nomicon at *https://doc.rust-lang.org/stable/nomicon/vec.html*.

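Newer compilers use non-lexical lifetimes, so a borrow ends at its last use: the example above is only rejected when `first` is still used after the `push`. Two ways to both read the first element and push are sketched below, with hypothetical helper names: copy the value out (possible because `i32` is `Copy`), or finish using the reference before mutating.

```rust
/// Copies the first element out before pushing. Because `i32` is `Copy`,
/// `first` is an independent value, not a borrow of `v`.
/// Panics if `v` is empty, just as `v[0]` does.
fn copy_then_push(mut v: Vec<i32>, extra: i32) -> (i32, Vec<i32>) {
    let first = v[0];
    v.push(extra); // allowed: nothing borrows `v` at this point
    (first, v)
}

/// Uses the reference first, then mutates. The immutable borrow ends after
/// its last use, so the borrow checker accepts the later `push`.
fn read_then_push(v: &mut Vec<i32>, extra: i32) -> String {
    let first = &v[0];
    let message = format!("The first element is: {}", first); // last use of `first`
    v.push(extra);
    message
}
```

Moving a use of `first` below the `push` in `read_then_push` would bring back the error shown earlier.
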
### Using an Enum to Store Multiple Types

At the beginning of this chapter, we said that vectors can only store values
that are all the same type. This can be inconvenient; there are definitely use
cases for needing to store a list of things of different types. Luckily, the
variants of an enum are all defined under the same enum type, so when we need
to store elements of different types in a vector, we can define and use an
enum!

For example, let's say we want to get values from a row in a spreadsheet, where
some of the columns in the row contain integers, some floating-point numbers,
and some strings. We can define an enum whose variants will hold the different
value types, and then all of the enum variants will be considered the same
type, that of the enum. Then we can create a vector that holds that enum and
so, ultimately, holds different types:

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
```

The reason Rust needs to know exactly what types will be in the vector at
compile time is so that it knows exactly how much memory on the heap will be
needed to store each element. A secondary advantage of this is that we can be
explicit about what types are allowed in this vector. If Rust allowed a vector
to hold any type, there would be a chance that one or more of the types would
cause errors with the operations performed on the elements of the vector. Using
an enum plus a `match` means that Rust will ensure at compile time that we
always handle every possible case, as we discussed in Chapter 6.

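To illustrate the `match` mentioned above, here's a sketch that totals the numeric cells in a row. The function name `numeric_total` and the choice to skip `Text` cells are assumptions made for the example; the point is that deleting any arm makes the `match` non-exhaustive, which the compiler rejects.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

/// Sums the numeric cells of a row; text cells contribute nothing.
fn numeric_total(row: &[SpreadsheetCell]) -> f64 {
    let mut total = 0.0;
    for cell in row {
        // Every variant must be handled here, or this code won't compile.
        match cell {
            SpreadsheetCell::Int(i) => total += *i as f64,
            SpreadsheetCell::Float(f) => total += *f,
            SpreadsheetCell::Text(_) => {}
        }
    }
    total
}
```

For the `row` built earlier, `numeric_total(&row)` is roughly `13.12` (floating-point addition may not be exact).
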
<!-- Can you briefly explain what the match is doing here, as a recap? How does
it mean we always handle every possible case? I'm not sure it's totally clear.
-->
<!-- Because this is a focus of chapter 6 rather than this chapter's focus, we
don't think we should repeat it here as well, but we added a reference. /Carol
-->

If you don't know, when you're writing a program, the exhaustive set of types
the program will get at runtime to store in a vector, the enum technique won't
work. Instead, you can use a trait object, which we'll cover in Chapter 13.

Now that we've gone over some of the most common ways to use vectors, be sure
to take a look at the API documentation for all of the many useful methods
defined on `Vec` by the standard library. For example, in addition to `push`,
there's a `pop` method that removes and returns the last element (wrapped in an
`Option`, since the vector might be empty). Let's move on to the next
collection type: `String`!

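As a sketch of `pop` (a real `Vec` method; the helper name `pop_all` is hypothetical), the loop below keeps removing the last element until `pop` returns `None`, which is what lets a vector act as a stack.

```rust
/// Pops every element off `v`, returning them in the order they were removed.
fn pop_all(mut v: Vec<i32>) -> Vec<i32> {
    let mut removed = Vec::new();
    // `pop` returns `Some(last)` while elements remain, then `None`.
    while let Some(top) = v.pop() {
        removed.push(top);
    }
    removed
}
```
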
<!-- Do you mean the Rust online documentation here? Are you not including it
in the book for space reasons? We might want to justify sending them out of the
book if we don't want to cover it here -->

<!-- Yes, there are many, many methods on Vec: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html
Also there are occcasionally new methods available with new versions of the
language, so there's no way we can be comprehensive here. We want the reader to
use the API documentation in these situations since the purpose of the online
docs is to be comprehensive and up to date. I personally wouldn't expect a book
like this to duplicate the info that's in the API docs, so I don't think a
justification is necessary here. /Carol -->

## Strings

We've already talked about strings a bunch in Chapter 4, but let's take a more
in-depth look at them now. Strings are an area that new Rustaceans commonly get
stuck on. This is due to a combination of three things: Rust's propensity for
exposing possible errors, strings being a more complicated data structure than
many programmers give them credit for, and UTF-8. These things combine in a way
that can seem difficult when coming from other languages.

Strings are covered in the collections chapter because they are implemented as
a collection of bytes, plus some methods to provide useful functionality when
those bytes are interpreted as text. In this section, we'll talk about the
operations on `String` that every collection type has, like creating,
updating, and reading. We'll also discuss the ways in which `String` is
different from the other collections, namely how indexing into a `String` is
complicated by the differences between how people and computers interpret
`String` data.

### What is a String?

Before we can dig into those aspects, we need to talk about what exactly we
mean by the term 'string'. Rust actually only has one string type in the core
language itself: `str`, the string slice, which is usually seen in its borrowed
form, `&str`. We talked about *string slices* in Chapter 4: these are
references to some UTF-8 encoded string data stored elsewhere. String literals,
for example, are stored in the binary output of the program, and are therefore
string slices.

The type called `String` is provided in Rust's standard library rather than
coded into the core language, and is a growable, mutable, owned, UTF-8 encoded
string type. When Rustaceans talk about 'strings' in Rust, they usually mean
both the `String` and the string slice `&str` types, not just one of those.
This section is largely about `String`, but both these types are used heavily
in Rust's standard library. Both `String` and string slices are UTF-8 encoded.

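A small sketch of how the owned and borrowed forms relate, using hypothetical helpers `word_count` and `owned_copy`: a function that takes `&str` accepts both string literals and borrowed `String` values, while `String::from` turns borrowed data into an owned, growable `String`.

```rust
/// Counts whitespace-separated words. Taking `&str` rather than `&String`
/// lets callers pass string literals as well as borrowed `String` values.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Builds an owned, growable `String` from a borrowed `&str`.
fn owned_copy(text: &str) -> String {
    String::from(text)
}
```

Both `word_count("hello brave new world")` and `word_count(&owned_copy("hello brave new world"))` return `4`; in the second call, the `&String` is accepted where a `&str` is expected.
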
Rust's standard library also includes a number of other string types, such as
`OsString`, `OsStr`, `CString`, and `CStr`. Library crates may provide even
more options for storing string data. Notice the `*String`/`*Str` naming: as
with `String`/`&str`, these types often come in an owned and a borrowed
variant. These string types may, for example, store text in different encodings
or represent it in memory differently. We won't be talking about these other
string types in this chapter; see their API documentation for more about how to
use them and when each is appropriate.

### Creating a New String

Many of the same operations available with `Vec` are available with `String` as
well, starting with the `new` function to create a string, like so:

```rust
let s = String::new();
```

This creates a new empty string called `s` that we can then load data into.

Often, we'll have some initial data that we'd like to start the string off
with. For that, we use the `to_string` method, which is available on any type
that implements the `Display` trait, which string literals do:

```rust
let data = "initial contents";

let s = data.to_string();

// the method also works on a literal directly:
let s = "initial contents".to_string();
```

This creates a string containing `initial contents`.

We can also use the function `String::from` to create a `String` from a string
literal. This is equivalent to using `to_string`:

```rust
let s = String::from("initial contents");
```

Because strings are used for so many things, there are many different generic
APIs that can be used for strings, which gives us a lot of options. Some of
them can feel redundant, but they all have their place! In this case,
`String::from` and `to_string` end up doing the exact same thing, so which you
choose is a matter of style.

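A quick sketch checking that these constructors agree. It also includes `.into()`, which works because the standard library implements `From<&str>` for `String`; that third option isn't mentioned above, and the helper name `three_ways` is hypothetical.

```rust
/// Creates the same `String` three different ways.
fn three_ways() -> [String; 3] {
    let a = "initial contents".to_string(); // the `ToString` trait
    let b = String::from("initial contents"); // `String::from`
    let c: String = "initial contents".into(); // uses `From<&str> for String`
    [a, b, c]
}
```

Since `to_string` comes from `Display`, it isn't limited to text: `42.to_string()` produces the `String` `"42"`.
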
Remember that strings are UTF-8 encoded, so we can include any properly encoded
data in them:

```rust
let hello = "السلام عليكم";
let hello = "Dobrý den";
let hello = "Hello";
let hello = "שָׁלוֹם";
let hello = "नमस्ते";
let hello = "こんにちは";
let hello = "안녕하세요";
let hello = "你好";
let hello = "Olá";
let hello = "Здравствуйте";
let hello = "Hola";
```

### Updating a String

A `String` can grow in size and its contents can change just like the
contents of a `Vec`, by pushing more data into it. In addition, `String` has
concatenation operations implemented with the `+` operator for convenience.

#### Appending to a String with Push

We can grow a `String` by using the `push_str` method to append a string slice:

```rust
let mut s = String::from("foo");
s.push_str("bar");
```

`s` will contain "foobar" after these two lines. The `push_str` method takes a
string slice because we don't necessarily want to take ownership of the
argument. For example, it would be unfortunate if we weren't able to use `s2`
after appending its contents to `s1`:

```rust
let mut s1 = String::from("foo");
let s2 = String::from("bar");
s1.push_str(&s2);
```

The `push` method is defined to take a single character as an argument and add
it to the `String`:

```rust
let mut s = String::from("lo");
s.push('l');
```

After this, `s` will contain "lol".

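Putting the two methods together, the sketch below (with the hypothetical name `build_greeting`) shows that `push_str` only borrows its argument, so `name` is still usable afterward, while `push` appends a single `char`.

```rust
/// Builds "Hello, <name>!" and hands back the untouched `name` as well.
fn build_greeting(name: String) -> (String, String) {
    let mut greeting = String::from("Hello, ");
    greeting.push_str(&name); // only borrows `name`; we still own it
    greeting.push('!'); // `push` takes one `char`: note the single quotes
    (greeting, name) // `name` is still valid, so we can return it too
}
```
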
#### Concatenation with the + Operator or the `format!` Macro

Often, we'll want to combine two existing strings together. One way is to use
the `+` operator like this:

```rust
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // Note that s1 has been moved here and can no longer be used
```

After this code, `s3` will contain `Hello, world!`. The reason that `s1` is
no longer valid after the addition, and the reason that we used a reference to
`s2`, has to do with the signature of the method that gets called when we use
the `+` operator. The `+` operator uses the `add` method, whose signature looks
something like this:

```rust,ignore
fn add(self, s: &str) -> String {
```

This isn't the exact signature that's in the standard library; there `add` is
defined using generics. Here, we're looking at the signature of `add` with
concrete types substituted for the generic ones, which is what happens when we
call this method with `String` values. This signature gives us the clues we
need to understand the tricky bits of the `+` operator.

First of all, `s2` has an `&`, meaning that we are adding a *reference* to the
second string to the first string. This is because of the `s` parameter in the
`add` function: we can only add a `&str` to a `String`; we can't add two
`String` values together. Remember back in Chapter 4 when we talked about how
`&String` will coerce to `&str`: we write `&s2` so that the `&String` will
coerce to the proper type, `&str`. Because this method does not take ownership
of the argument, `s2` will still be valid after this operation.

Second, we can see in the signature that `add` takes ownership of `self`,
because `self` does *not* have an `&`. This means `s1` in the above example
will be moved into the `add` call and no longer be valid after that. So while
`let s3 = s1 + &s2;` looks like it will copy both strings and create a new one,
this statement actually takes ownership of `s1`, appends a copy of `s2`'s
contents, and then returns ownership of the result. In other words, it looks
like it's making a lot of copies, but it isn't: the implementation is more
efficient than copying.

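A sketch of the ownership behavior just described, with hypothetical function names. The first helper consumes its left operand exactly as `s1 + &s2` does. The second leaves the caller's string usable by copying it into a new `String` first, trading an extra allocation for convenience.

```rust
/// Consumes `left` (as `+` does) and appends a copy of `right`'s contents.
fn join_consuming(left: String, right: &str) -> String {
    left + right // `add(self, s: &str)`: `left` is moved into the result
}

/// Leaves the caller's data intact by copying `left` before using `+`.
fn join_keeping(left: &str, right: &str) -> String {
    left.to_owned() + right // new `String` from `left`, then append `right`
}
```

With `s1` and `s2` from the example above, `join_keeping(&s1, &s2)` can be called and `s1` used afterward, whereas `join_consuming(s1, &s2)` moves `s1` away just like `+`.
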
If we need to concatenate multiple strings, the behavior of `+` gets unwieldy:

```rust
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");

let s = s1 + "-" + &s2 + "-" + &s3;
```

`s` will be "tic-tac-toe" at this point. With all of the `+` and `"`
characters, it gets hard to see what's going on. For more complicated string
combining, we can use the `format!` macro:

```rust
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");

let s = format!("{}-{}-{}", s1, s2, s3);
```

<!-- Are we going to discuss the format macro elsewhere at all? If not, some
more info here might be good, this seems like a really useful tool. Is it only
used on strings? -->

<!-- No, we weren't planning on it. We thought it would be sufficient to
mention that it works the same way as `println!` since we've covered how
`println!` works in Ch 2, "Printing Values with `println!` Placeholders" and Ch
5, Ch 5, "Adding Useful Functionality with Derived Traits". `format!` can be
used on anything that `println!` can; using `{}` in the format string works
with anything that implements the `Display` trait and `{:?}` works with
anything that implements the `Debug` trait. Do you have any thoughts on how we
could make the similarities with `format!` and `println!` clearer than what we
have in the next paragraph without repeating the `println!` content too much?
/Carol -->

This code will also set `s` to "tic-tac-toe". The `format!` macro works in the
same way as `println!`, but instead of printing the output to the screen, it
returns a `String` with the contents. This version is much easier to read, and
also does not take ownership of any of its arguments.

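As a sketch comparing approaches (function names are hypothetical), both helpers below produce "tic-tac-toe" from borrowed strings without taking ownership of them. The second uses the standard slice method `join`, which isn't covered above but is handy when the number of pieces isn't fixed.

```rust
/// Combines three parts with `format!`, which only borrows its arguments.
fn with_format(a: &str, b: &str, c: &str) -> String {
    format!("{}-{}-{}", a, b, c)
}

/// Combines any number of parts, separated by "-", using `join` on a slice.
fn with_join(parts: &[&str]) -> String {
    parts.join("-")
}
```

Given the `s1`, `s2`, and `s3` from above, `with_join(&[s1.as_str(), s2.as_str(), s3.as_str()])` also returns "tic-tac-toe", and all three `String`s remain usable.
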
### Indexing into Strings

In many other languages, accessing individual characters in a string by index
is a valid and common operation. In Rust, however, if we try to access parts of
a `String` using indexing syntax, we'll get an error. For example, this code:

```rust,ignore
let s1 = String::from("hello");
let h = s1[0];
```

will result in this error:

```text
error: the trait bound `std::string::String: std::ops::Index<_>` is not
satisfied [--explain E0277]
  |>
  |> let h = s1[0];
  |>         ^^^^^
note: the type `std::string::String` cannot be indexed by `_`
```

The error and the note tell the story: Rust strings don't support indexing. So
the follow-up question is, why not? In order to answer that, we have to talk a
bit about how Rust stores strings in memory.

#### Internal Representation

A `String` is a wrapper over a `Vec<u8>`. Let's take a look at some of our
properly encoded UTF-8 example strings from before. First, this one:

```rust
let len = "Hola".len();
```

In this case, `len` will be four, which means the `Vec` storing the string
"Hola" is four bytes long: each of these letters takes one byte when encoded in
UTF-8. What about this example, though?

```rust
let len = "Здравствуйте".len();
```

If asked how long this string is, a person might say 12. However, Rust's
answer is 24: that's the number of bytes it takes to encode "Здравствуйте" in
UTF-8, because each Unicode scalar value in that string takes two bytes of
storage. Therefore, an index into the string's bytes will not always correlate
to a valid Unicode scalar value.

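A sketch that makes the byte-versus-character distinction concrete, using only standard methods (`len`, `chars`, `into_bytes`); the helper names are hypothetical. `len` counts bytes, while `chars().count()` counts Unicode scalar values.

```rust
/// Returns (length in bytes, number of `char`s) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// A `String` owns its UTF-8 bytes; `into_bytes` hands over that `Vec<u8>`.
fn raw_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

For "Hola" the two counts are both 4, while for "Здравствуйте" they are 24 bytes and 12 `char`s.
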
To demonstrate, consider this invalid Rust code:

```rust,ignore
let hello = "Здравствуйте";
let answer = &hello[0];
```

What should the value of `answer` be? Should it be `З`, the first letter? When
encoded in UTF-8, the first byte of `З` is `208`, and the second is `151`, so
`answer` would in fact have to be `208`, but `208` is not a valid character on
its own. Returning `208` is likely not what a person would want if they asked
for the first letter of this string, but that's the only data that Rust has at
byte index 0. Returning the byte value is probably not what people want, even
with only Latin letters: `&"hello"[0]` would return `104`, not `h`. To avoid
returning an unexpected value and causing bugs that might not be discovered
immediately, Rust chooses not to compile this code at all, preventing
misunderstandings earlier.

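Instead of `[0]`, Rust asks you to say which interpretation you mean. A sketch with hypothetical helper names: `as_bytes` gives access to the raw bytes, and `chars().next()` gives the first Unicode scalar value. Both return an `Option`, so an empty string is handled explicitly.

```rust
/// First byte of the UTF-8 encoding, if the string isn't empty.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

/// First Unicode scalar value (`char`), if the string isn't empty.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}
```

This matches the discussion above: for "hello" the first byte is `104` and the first `char` is `'h'`; for "Здравствуйте" they are `208` and `'З'`.
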
#### Bytes and Scalar Values and Grapheme Clusters! Oh my!

This leads to another point about UTF-8: there are really three relevant ways
to look at strings from Rust's perspective: as bytes, as scalar values, and as
grapheme clusters (the closest thing to what people would call 'letters').

If we look at the Hindi word "नमस्ते" written in the Devanagari script, it is
ultimately stored as a `Vec` of `u8` values that looks like this:

```text
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
```

That's 18 bytes, and is how computers ultimately store this data. If we look at
them as Unicode scalar values, which are what Rust's `char` type is, those
bytes look like this:

```text
['न', 'म', 'स', '्', 'त', 'े']
```

There are six `char` values here, but the fourth and sixth are not letters:
they're diacritics that don't make sense on their own. Finally, if we look at
them as grapheme clusters, we'd get what a person would call the four letters
that make up this word:

```text
|
|
|
|
|
["न", "म", "स्", "ते"]
|
|
|
|
|
```
|
|
|
|
|
|
2016-11-28 23:32:29 +08:00
|
|
|
|
Rust provides different ways of interpreting the raw string data that computers
|
|
|
|
|
store so that each program can choose the interpretation it needs, no matter
|
|
|
|
|
what human language the data is in.
|
2016-09-28 02:03:07 +08:00
|
|
|
|
|
2016-11-28 23:32:29 +08:00
|
|
|
|
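We can see two of these three views directly in code by comparing `len`, which counts bytes, with `chars().count()`, which counts Unicode scalar values. This is a small sketch; the `byte_and_char_counts` helper is hypothetical, and grapheme clusters are left out because the standard library doesn't provide them.

```rust
// Hypothetical helper: returns (number of bytes, number of `char`s).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len` counts bytes of UTF-8; `chars().count()` counts scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    let word = "नमस्ते";
    let (bytes, chars) = byte_and_char_counts(word);
    println!("{} bytes, {} chars", bytes, chars); // 18 bytes, 6 chars
    // The raw bytes are the same 18 `u8` values listed above.
    println!("{:?}", word.as_bytes());
}
```
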
A final reason Rust does not allow you to index into a `String` to get a
character is that indexing operations are expected to always take constant time
(O(1)). It isn't possible to guarantee that performance with a `String`,
though, since Rust would have to walk through the contents from the beginning
to the index to determine how many valid characters there were.

All of these problems mean that Rust does not implement `[]` with a single
integer index for `String`, so we cannot directly do this.

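If a program does need the *n*th `char`, a sketch of the explicit alternative is `chars().nth(n)`. It decodes the string from the beginning, so its cost grows with `n` instead of being constant, which is exactly the cost that `[]` would otherwise hide. The `nth_char` helper is a hypothetical name.

```rust
// Hypothetical helper: returns the `n`th Unicode scalar value of `s`.
// `chars()` decodes UTF-8 from the start of the string and `nth(n)` steps
// past the first `n` chars, so this takes O(n) time, not O(1).
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let hello = "Здравствуйте";
    println!("{:?}", nth_char(hello, 0)); // Some('З')
    println!("{:?}", nth_char(hello, 100)); // None
}
```

Because the method is spelled out, the linear cost is visible at the call site.
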
### Slicing Strings

However, there are times when indexing the *bytes* of a string is useful.
While we can't use `[]` with a single number, we *can* use `[]` with a range
to create a string slice containing particular bytes:

```rust
let hello = "Здравствуйте";

let s = &hello[0..4];
```

Here, `s` will be a `&str` that contains the first four bytes of the string.
Earlier, we mentioned that each of these characters was two bytes, so that
means that `s` will be "Зд".

What would happen if we did `&hello[0..1]`? The answer: it will panic at
runtime, in the same way that accessing an invalid index in a vector does:

```text
thread 'main' panicked at 'index 0 and/or 1 in `Здравствуйте` do not lie on
character boundary', ../src/libcore/str/mod.rs:1694
```

You should use ranges to create string slices with caution, since doing so can
crash your program.

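When the range comes from somewhere we don't control, such as user input, one way to avoid that panic is `str::get`, which returns `None` instead of panicking if the range is out of bounds or doesn't fall on character boundaries; `is_char_boundary` checks a single byte offset. A minimal sketch, with a hypothetical `safe_slice` helper:

```rust
// Hypothetical helper: returns the slice `s[start..end]`, or `None` if the
// range is out of bounds or would split a multi-byte character.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let hello = "Здравствуйте";

    println!("{:?}", safe_slice(hello, 0, 4)); // Some("Зд")
    println!("{:?}", safe_slice(hello, 0, 1)); // None, where &hello[0..1] panics

    // Byte 1 is in the middle of `З`; byte 2 is where `д` starts.
    println!("{}", hello.is_char_boundary(1)); // false
    println!("{}", hello.is_char_boundary(2)); // true
}
```
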
### Methods for Iterating Over Strings

Luckily, there are other ways we can access elements in a `String`.

If we need to perform operations on individual characters, the best way to do
so is to use the `chars` method. Calling `chars` on "नमस्ते" separates out and
returns six values of type `char`, and you can iterate over the result in order
to access each element:

```rust
for c in "नमस्ते".chars() {
    println!("{}", c);
}
```

This code will print:

```text
न
म
स
्
त
े
```

The `bytes` method returns each raw byte, which might be appropriate for your
domain:

```rust
for b in "नमस्ते".bytes() {
    println!("{}", b);
}
```

This code will print the 18 bytes that make up this string, starting with:

```text
224
164
168
224
// ... etc
```

But be sure to remember that a single valid Unicode scalar value may be made up
of more than one byte.

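When we need both a `char` and the byte offset where it starts, for example to build a range that is safe to slice with, `char_indices` yields `(byte_offset, char)` pairs. A short sketch, using a hypothetical `char_offsets` helper:

```rust
// Hypothetical helper: the byte offset at which each `char` in `s` starts.
fn char_offsets(s: &str) -> Vec<usize> {
    s.char_indices().map(|(offset, _)| offset).collect()
}

fn main() {
    for (offset, c) in "नमस्ते".char_indices() {
        println!("{}: {}", offset, c);
    }
    // Each of these Devanagari chars takes three bytes in UTF-8.
    println!("{:?}", char_offsets("नमस्ते")); // [0, 3, 6, 9, 12, 15]
}
```

Any two of these offsets (or an offset and the string's `len`) form a range that can be sliced without panicking.
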
Getting grapheme clusters from `String`s is complex, so this functionality is
not provided by the standard library. There are crates available on crates.io
if this is the functionality you need.

<!-- Can you recommend some, or maybe just say why we aren't outlining the
method here, ie it's complicated and therefore best to use a crate? -->

<!-- We're trying not to mention too many crates in the book. Most crates are
provided by the community, so we don't want to mention some and not others and
seem biased towards certain crates, plus crates can change more quickly (and
new crates can be created) than the language and this book will. /Carol -->

### Strings are Not so Simple

To summarize, strings are complicated. Different programming languages make
different choices about how to present this complexity to the programmer. Rust
has chosen to make the correct handling of `String` data the default behavior
for all Rust programs, which does mean programmers have to put more thought
into handling UTF-8 data upfront. This tradeoff exposes more of the complexity
of strings than other programming languages do, but it will prevent you from
having to handle errors involving non-ASCII characters later in your
development lifecycle.

Let's switch to something a bit less complex: hash maps!

## Hash Maps

The last of our fundamental collections is the *hash map*. The type
`HashMap<K, V>` stores a mapping of keys of type `K` to values of type `V`. It
does this via a *hashing function*, which determines how it places these keys
and values into memory. Many different programming languages support this kind
of data structure, but often with a different name: hash, map, object, hash
table, or associative array, just to name a few.

Hash maps are useful for when you want to be able to look up data not by an
index, as you can with vectors, but by using a key that can be of any type (as
long as it can be hashed and compared for equality). For example, in a game,
you could keep track of each team's score in a hash map where each key is a
team's name and the values are each team's score. Given a team name, you can
retrieve their score.

We'll go over the basic API of hash maps in this chapter, but there are many
more goodies hiding in the functions defined on `HashMap` by the standard
library. As always, check the standard library documentation for more
information.

### Creating a New Hash Map

We can create an empty `HashMap` with `new`, and add elements with `insert`.
Here we're keeping track of the scores of two teams whose names are Blue and
Yellow. The Blue team will start with 10 points and the Yellow team will start
with 50:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
```

Note that we need to first `use` the `HashMap` from the collections portion of
the standard library. Of our three fundamental collections, this one is the
least often used, so it's not included in the features imported automatically
in the prelude. Hash maps also have less support from the standard library;
there's no built-in macro to construct them, for example.

Just like vectors, hash maps store their data on the heap. This `HashMap` has
keys of type `String` and values of type `i32`. Like vectors, hash maps are
homogeneous: all of the keys must have the same type, and all of the values
must have the same type.

Another way of constructing a hash map is by using the `collect` method on an
iterator of tuples, where each tuple consists of a key and its value. The
`collect` method gathers up data into a number of collection types, including
`HashMap`. For example, if we had the team names and initial scores in two
separate vectors, we could use the `zip` method to create an iterator of tuples
where "Blue" is paired with 10, and so forth. Then we can use the `collect`
method to turn that iterator of tuples into a `HashMap`:

```rust
use std::collections::HashMap;

let teams = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];

let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect();
```

The type annotation `HashMap<_, _>` is needed here because it's possible to
`collect` into many different data structures, and Rust doesn't know which you
want unless you specify. For the type parameters for the key and value types,
however, we use underscores, and Rust can infer the types that the hash map
contains based on the types of the data in the vectors.

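Because `iter` yields references, the `scores` map above has type `HashMap<&String, &i32>` and borrows from the two vectors. If we want the map to own its keys and values instead, a sketch of one way is to consume the vectors with `into_iter`; the `build_scores` function is a hypothetical name for this example.

```rust
use std::collections::HashMap;

// Builds an owned map from parallel vectors of names and scores.
// `into_iter` consumes each vector, so the map gets `String` keys and
// `i32` values rather than references into the vectors.
fn build_scores(teams: Vec<String>, initial_scores: Vec<i32>) -> HashMap<String, i32> {
    teams.into_iter().zip(initial_scores.into_iter()).collect()
}

fn main() {
    let teams = vec![String::from("Blue"), String::from("Yellow")];
    let initial_scores = vec![10, 50];

    let scores = build_scores(teams, initial_scores);
    // `teams` and `initial_scores` have been moved and can't be used here.
    println!("{:?}", scores.get("Blue")); // Some(10)
}
```
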
### Hash Maps and Ownership

For types that implement the `Copy` trait, like `i32`, the values are copied
into the hash map. For owned values like `String`, the values will be moved and
the hash map will be the owner of those values:

```rust
use std::collections::HashMap;

let field_name = String::from("Favorite color");
let field_value = String::from("Blue");

let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point
```

We would not be able to use the bindings `field_name` and `field_value` after
they have been moved into the hash map with the call to `insert`.

If we insert references to values into the hash map, the values themselves will
not be moved into the hash map. The values that the references point to must be
valid for at least as long as the hash map is valid, though. We will talk more
about these issues in the Lifetimes section of Chapter 10.

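Here's a small sketch of the reference case. The map's type is written out as `HashMap<&str, &str>` (our choice for the example), so `insert` borrows `field_name` and `field_value` instead of moving them, and both are still usable afterwards. If the map tried to outlive either `String`, the compiler would reject the code.

```rust
use std::collections::HashMap;

fn main() {
    let field_name = String::from("Favorite color");
    let field_value = String::from("Blue");

    // The map holds references, so the `String`s are borrowed, not moved.
    let mut map: HashMap<&str, &str> = HashMap::new();
    map.insert(&field_name, &field_value);

    // `field_name` is still valid here because only a reference was inserted.
    println!("{} -> {:?}", field_name, map.get(field_name.as_str()));
}
```
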
### Accessing Values in a Hash Map

We can get a value out of the hash map by providing its key to the `get` method:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

let team_name = String::from("Blue");
let score = scores.get(&team_name);
```

Here, `score` will have the value that's associated with the Blue team, and the
result will be `Some(&10)`. The result is wrapped in `Some` because `get`
returns an `Option<&V>`; if there's no value for that key in the hash map, `get`
will return `None`. The program will need to handle the `Option` in one of
the ways that we covered in Chapter 6.

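For example, here's a sketch of two common ways to handle the `Option<&V>` that `get` returns: a `match`, or `copied` followed by `unwrap_or` to fall back to a default. The `score_or_zero` helper and the default score of 0 are assumptions made up for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: a team's score, or 0 if the team isn't in the map.
fn score_or_zero(scores: &HashMap<String, i32>, team: &str) -> i32 {
    // `get` returns `Option<&i32>`; `copied` turns it into `Option<i32>`.
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    match scores.get("Blue") {
        Some(score) => println!("Blue has {} points", score),
        None => println!("Blue has no score yet"),
    }

    println!("{}", score_or_zero(&scores, "Red")); // 0
}
```
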
We can iterate over each key/value pair in a hash map in a similar manner as we
do with vectors, using a `for` loop:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

for (key, value) in &scores {
    println!("{}: {}", key, value);
}
```

This will print each pair, in an arbitrary order:

```text
Yellow: 50
Blue: 10
```

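Because that order is arbitrary, code that needs a predictable order (for output or tests, say) can collect the pairs into a vector and sort it first. A minimal sketch, with a hypothetical `sorted_entries` helper:

```rust
use std::collections::HashMap;

// Hypothetical helper: the map's entries sorted by key.
fn sorted_entries(scores: &HashMap<String, i32>) -> Vec<(&String, &i32)> {
    let mut entries: Vec<(&String, &i32)> = scores.iter().collect();
    // Tuples compare by their first element first, so this sorts by key.
    entries.sort();
    entries
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    for (key, value) in sorted_entries(&scores) {
        println!("{}: {}", key, value); // Blue: 10, then Yellow: 50
    }
}
```

Sorting costs O(n log n) on top of the iteration, so it's worth doing only when the order actually matters.
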
### Updating a Hash Map

<!-- So the quantity of keys must be defined up front, that's not growable?
That could be worthy saying -->
<!-- No, the number of keys is growable, it's just that for EACH individual
key, there can only be one value. I've tried to clarify. /Carol -->

While the number of keys and values is growable, each individual key can only
have one value associated with it at a time. When we want to change the data in
a hash map, we have to decide how to handle the case when a key already has a
value assigned. We could choose to replace the old value with the new value,
completely disregarding the old value. We could choose to keep the old value
and ignore the new value, only adding the new value if the key *doesn't*
already have a value. Or we could combine the old value and the new value.
Let's look at how to do each of these!

#### Overwriting a Value

If we insert a key and a value into a hash map, then insert that same key with
a different value, the value associated with that key will be replaced. Even
though the following code calls `insert` twice, the hash map will only contain
one key/value pair because we're inserting the value for the Blue team's key
both times:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);

println!("{:?}", scores);
```

This will print `{"Blue": 25}`. The original value of 10 has been overwritten.

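`insert` also reports what it replaced: it returns an `Option<V>` that is `Some(old_value)` when the key was already present and `None` otherwise. A small sketch of that return value in action:

```rust
use std::collections::HashMap;

fn main() {
    let mut scores = HashMap::new();

    // First insert for "Blue": there was no previous value.
    let first = scores.insert(String::from("Blue"), 10);
    // Second insert for the same key: the old value 10 is handed back.
    let second = scores.insert(String::from("Blue"), 25);

    println!("{:?} {:?}", first, second); // None Some(10)
    println!("{:?}", scores); // {"Blue": 25}
}
```
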
#### Only Insert If the Key Has No Value

It's common to want to check if a particular key has a value and, if it does
not, insert a value for it. Hash maps have a special API for this, called
`entry`, that takes the key we want to check as an argument. The return value
of the `entry` function is an enum, `Entry`, that represents a value that might
or might not exist. Let's say that we want to check if the key for the Yellow
team has a value associated with it. If it doesn't, we want to insert the value
50, and the same for the Blue team. With the entry API, the code for this
looks like:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);

scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);

println!("{:?}", scores);
```

The `or_insert` method on `Entry` returns the value for the `Entry`'s key if it
exists, and if not, inserts its argument as the new value for the `Entry`'s key
and returns that. This is much cleaner than writing the logic ourselves, and in
addition, plays more nicely with the borrow checker.

This code will print `{"Yellow": 50, "Blue": 10}`, though the order of the
pairs may vary. The first call to `entry` will insert the key for the Yellow
team with the value 50, since the Yellow team doesn't have a value already. The
second call to `entry` will not change the hash map since the Blue team already
has the value 10.

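For comparison, here's a sketch of what writing that logic ourselves might look like, using `contains_key` and `insert` in a hypothetical `insert_if_absent` helper. It produces the same map, but it looks the key up twice and doesn't give back a reference to the stored value the way `or_insert` does.

```rust
use std::collections::HashMap;

// Hypothetical helper: insert `value` for `key` only if the key is absent.
// This mirrors what `map.entry(key).or_insert(value)` does, minus the
// returned reference.
fn insert_if_absent(map: &mut HashMap<String, i32>, key: &str, value: i32) {
    if !map.contains_key(key) {
        map.insert(key.to_string(), value);
    }
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);

    insert_if_absent(&mut scores, "Yellow", 50);
    insert_if_absent(&mut scores, "Blue", 50);

    // Same result as the `entry` version: Blue keeps 10, Yellow gets 50.
    println!("{:?}", scores);
}
```
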
#### Update a Value Based on the Old Value

Another common use case for hash maps is to look up a key's value then update
it, based on the old value. For instance, if we wanted to count how many times
each word appeared in some text, we could use a hash map with the words as keys
and increment the value to keep track of how many times we've seen that word.
If this is the first time we've seen a word, we'll first insert the value `0`.

```rust
use std::collections::HashMap;

let text = "hello world wonderful world";

let mut map = HashMap::new();

for word in text.split_whitespace() {
    let count = map.entry(word).or_insert(0);
    *count += 1;
}

println!("{:?}", map);
```

This will print `{"world": 2, "hello": 1, "wonderful": 1}`, though the order of
the pairs may vary. The `or_insert` method actually returns a mutable reference
(`&mut V`) to the value for this key. Here we store that mutable reference in
the `count` variable, so in order to assign to that value we must first
dereference `count` using the asterisk (`*`). The mutable reference goes out of
scope at the end of each iteration of the `for` loop, so all of these changes
are safe and allowed by the borrowing rules.

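This counting loop is often written more compactly by dereferencing the result of `or_insert` in place. A sketch, wrapped in a hypothetical `word_counts` function so it's easy to reuse and check:

```rust
use std::collections::HashMap;

// Counts how many times each whitespace-separated word appears in `text`.
// The returned map borrows its `&str` keys from `text`.
fn word_counts(text: &str) -> HashMap<&str, i32> {
    let mut map = HashMap::new();
    for word in text.split_whitespace() {
        // `or_insert(0)` returns `&mut i32`; dereference it to add one.
        *map.entry(word).or_insert(0) += 1;
    }
    map
}

fn main() {
    let counts = word_counts("hello world wonderful world");
    println!("{:?}", counts.get("world")); // Some(2)
}
```
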
### Hashing Function

By default, `HashMap` uses a hashing function (currently SipHash) that is
designed to provide resistance to Denial of Service (DoS) attacks. This is not
the fastest hashing algorithm out there, but the tradeoff for better security
that comes with the drop in performance is worth it. If you profile your code
and find that the default hash function is too slow for your purposes, you can
switch to another function by specifying a different *hasher*. A hasher is a
type that implements the `BuildHasher` trait. We'll be talking about traits and
how to implement them in Chapter 10.

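As a rough sketch of what specifying a different hasher looks like, the code below writes a tiny FNV-1a hasher and plugs it in with `BuildHasherDefault`, which implements `BuildHasher` for any `Hasher` that also implements `Default`. FNV-1a is a simple, fast, non-cryptographic algorithm chosen here purely for illustration; it does *not* offer the DoS resistance of the default. The `FnvHasher` and `FnvHashMap` names are our own, not standard library types.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A minimal 64-bit FNV-1a hasher, for illustration only.
// It is fast for short keys but offers no protection against DoS attacks.
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> FnvHasher {
        // The standard 64-bit FNV offset basis.
        FnvHasher(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            // FNV-1a: XOR in the byte, then multiply by the 64-bit FNV prime.
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// A `HashMap` that builds `FnvHasher`s instead of using the default hasher.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;

fn main() {
    let mut scores: FnvHashMap<String, i32> = FnvHashMap::default();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);
    println!("{:?}", scores.get("Blue")); // Some(10)
}
```

Apart from how it's constructed, the map is used exactly like any other `HashMap`.
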
## Summary

Vectors, strings, and hash maps will take you far in programs where you need to
store, access, and modify data. Here are some exercises you should now be
equipped to solve:

1. Given a list of integers, use a vector and return the mean (average), median
   (when sorted, the value in the middle position), and mode (the value that
   occurs most often; a hash map will be helpful here) of the list.
2. Convert strings to Pig Latin, where the first consonant of each word is
   moved to the end of the word with an added "ay", so "first" becomes
   "irst-fay". Words that start with a vowel get "hay" added to the end instead
   ("apple" becomes "apple-hay"). Remember about UTF-8 encoding!
3. Using a hash map and vectors, create a text interface to allow a user to add
   employee names to a department in the company. For example, "Add Sally to
   Engineering" or "Add Amir to Sales". Then let the user retrieve a list of all
   people in a department or all people in the company by department, sorted
   alphabetically.

The standard library API documentation describes methods these types have that
will be helpful for these exercises!

We're getting into more complex programs where operations can fail, which means
it's a perfect time to go over error handling next!