Mirror of https://github.com/rust-lang-cn/book-cn.git (synced 2025-02-03 07:48:41 +08:00)

Make edits to ch8 as a result of edits/questions from nostarch

Commit 635f7b5202 (parent f3475a6652)

# Fundamental Collections

Rust's standard library includes a number of really useful data structures
called *collections*. Most other data types represent one specific value, but
collections can contain multiple values. Unlike the built-in array and tuple
types, the data these collections point to is stored on the heap, which means
the amount of data does not need to be known at compile time and can grow or
shrink as the program runs. Each kind of collection has different capabilities
and costs, and choosing an appropriate one for the situation you're in is a
skill you'll develop over time. In this chapter, we'll go over three
collections that are used very often in Rust programs:


* A *vector* allows us to store a variable number of values next to each other.
* A *string* is a collection of characters. We've seen the `String` type

The first type we'll look at is `Vec<T>`, also known as a *vector*. Vectors
allow us to store more than one value in a single data structure that puts all
the values next to each other in memory. Vectors can only store values of the
same type. They are useful in situations where you have a list of items, such
as the lines of text in a file or the prices of items in a shopping cart.

### Creating a New Vector

To create a new, empty vector, we can call the `Vec::new` function:

```rust
let v: Vec<i32> = Vec::new();
```

Note that we added a type annotation here. Since we aren't inserting any values
into this vector, Rust doesn't know what kind of elements we intend to store.
This is an important point. Vectors are homogeneous: they may store many values,
but those values must all be the same type. Vectors are implemented using
generics; Chapter 10 covers how to use generics in your own types. For now, all
you need to know is that the `Vec` type provided by the standard library can
hold any type, and when a specific `Vec` holds a specific type, the type goes
within angle brackets. We've told Rust that the `Vec` in `v` will hold
elements of the `i32` type.

In real code, Rust can infer the type of value we want to store once we insert
values, so you rarely need to do this type annotation. It's more common to
create a `Vec` that has initial values, and Rust provides the `vec!` macro for
convenience. The macro will create a new `Vec` that holds the values we give
it. This will create a new `Vec<i32>` that holds the values `1`, `2`, and `3`:

```rust
let v = vec![1, 2, 3];
```

Because we've given initial `i32` values, Rust can infer that the type of `v`
is `Vec<i32>`, and the type annotation isn't necessary. Let's look at how to
modify a vector next.

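As a small sketch, the two ways of creating a vector described above appear side by side below. The helper names `empty_numbers` and `initial_numbers` are hypothetical and exist only for this illustration.

```rust
// Hypothetical helper: a vector created with `Vec::new` needs an explicit
// element type, because there are no values to tell Rust what `T` is.
fn empty_numbers() -> Vec<i32> {
    let v: Vec<i32> = Vec::new();
    v
}

// Hypothetical helper: `vec!` builds a vector from the listed values, and the
// element type is inferred, so no annotation is written on `v`.
fn initial_numbers() -> Vec<i32> {
    let v = vec![1, 2, 3];
    v
}

fn main() {
    let empty = empty_numbers();
    let filled = initial_numbers();
    println!("empty: {:?}, filled: {:?}", empty, filled);
}
```
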

### Updating a Vector

To create a vector and then add elements to it, we can use the `push` method:

```rust
let mut v = Vec::new();

v.push(5);
v.push(6);
v.push(7);
v.push(8);
```

As we discussed in Chapter 3, if we want to be able to change the value of any
variable, we need to make it mutable with the `mut` keyword. The numbers we
place inside are all `i32`s, and Rust infers this from the data, so we don't
need the `Vec<i32>` annotation.

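The sketch below pushes the same four numbers into a mutable vector, then removes the last one with `pop`, the standard `Vec` method that returns the removed element in an `Option`. The `build` helper is hypothetical.

```rust
// Hypothetical helper: start with an empty vector and push values onto the end.
// `v` must be `mut` because `push` changes the vector in place.
fn build() -> Vec<i32> {
    let mut v = Vec::new(); // element type inferred from the pushed values
    v.push(5);
    v.push(6);
    v.push(7);
    v.push(8);
    v
}

fn main() {
    let mut v = build();
    // `pop` removes the last element and returns `Some(element)`,
    // or `None` if the vector is already empty.
    let last = v.pop();
    println!("popped {:?}, remaining {:?}", last, v);
}
```
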

### Dropping a Vector Drops its Elements

Like any other `struct`, a vector will be freed when it goes out of scope:

```rust
{
    let v = vec![1, 2, 3, 4];

    // do stuff with v

} // <- v goes out of scope and is freed here
```


When the vector gets dropped, all of its contents will also be dropped, meaning
those integers it holds will be cleaned up. This may seem like a
straightforward point, but it can get a little more complicated once we start
to introduce references to the elements of the vector. Let's tackle that next!

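One way to watch elements being dropped with their vector is to give them a custom `Drop` implementation that counts how many times it runs. The `Tracked` type and the `drops_after_scope` helper below are hypothetical and serve only as a demonstration.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical element type that records each time one of its values is dropped.
struct Tracked {
    drops: Rc<Cell<usize>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Builds a vector of three `Tracked` values in an inner scope and returns
// how many elements were dropped once that scope ended.
fn drops_after_scope() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let v = vec![
            Tracked { drops: Rc::clone(&drops) },
            Tracked { drops: Rc::clone(&drops) },
            Tracked { drops: Rc::clone(&drops) },
        ];
        // Nothing has been dropped while `v` is still in scope.
        assert_eq!(v.len(), 3);
        assert_eq!(drops.get(), 0);
    } // <- `v` goes out of scope: the vector and all three elements are dropped
    drops.get()
}

fn main() {
    println!("elements dropped: {}", drops_after_scope());
}
```
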

### Reading Elements of Vectors

Now that you know how to create, update, and destroy vectors, knowing how to
read their contents is a good next step. There are two ways to reference a
value stored in a vector. In the examples, we've annotated the types of the
values that are returned from these functions for extra clarity.

This example shows both ways of accessing a value in a vector, using indexing
syntax and using the `get` method:


```rust
let v = vec![1, 2, 3, 4, 5];

let third: &i32 = &v[2];
let third: Option<&i32> = v.get(2);
```


There are a few things to note here. First, we use the index value of `2` to
get the third element: vectors are indexed by number, starting at zero.
Second, there are two different ways to get the third element: using `&` and
`[]`, which gives us a reference, or using the `get` method with the index
passed as an argument, which gives us an `Option<&T>`.

The reason Rust has two ways to reference an element is so that you can choose
how the program behaves when you try to use an index value that the vector
doesn't have an element for. As an example, what should a program do if it has
a vector that holds five elements and then tries to access an element at index
100, like this:


```rust,should_panic
let v = vec![1, 2, 3, 4, 5];

let does_not_exist = &v[100];
let does_not_exist = v.get(100);
```


When you run this code, the first `[]` method causes Rust to `panic!` because
it references a nonexistent element. This method is preferable when you want
your program to treat an attempt to access an element past the end of the
vector as a fatal error that should crash the program.

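The snippet below observes that panic without ending the whole program by using `std::panic::catch_unwind`. This is purely for demonstration: it is not how a bad index would normally be handled, and it assumes the default `panic = "unwind"` strategy. The `index_panics` helper is hypothetical.

```rust
use std::panic;

// Hypothetical helper: returns true if `&v[index]` panics.
// `catch_unwind` is used only so this demo can observe the panic.
fn index_panics(v: &[i32], index: usize) -> bool {
    panic::catch_unwind(|| {
        let _element = &v[index];
    })
    .is_err()
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    // Indexing past the end panics (the default panic hook prints a message to stderr).
    println!("v[100] panics: {}", index_panics(&v, 100));
    // `get` reports the same situation as `None` instead.
    println!("v.get(100): {:?}", v.get(100));
}
```
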

When the `get` method is passed an index that is outside the vector, it
returns `None` without panicking. You would use this if accessing an element
beyond the range of the vector will happen occasionally under normal
circumstances. Your code can then have logic to handle having either
`Some(&element)` or `None`, as we discussed in Chapter 6. For example, the
index could be coming from a person entering a number. If they accidentally
enter a number that's too large and your program gets a `None` value, you could
tell the user how many items are in the current `Vec` and give them another
chance to enter a valid value. That would be more user-friendly than crashing
the program for a typo!

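The user-input scenario described above might look roughly like the sketch below. The `describe_choice` helper and its messages are hypothetical. It accepts a slice, so a `&Vec<&str>` can be passed directly.

```rust
// Hypothetical helper: `index` might come from user input, so it can be out of
// range. `get` returns `None` instead of panicking, so we can reply helpfully.
fn describe_choice(items: &[&str], index: usize) -> String {
    match items.get(index) {
        Some(item) => format!("You picked {}", item),
        None => format!("There are only {} items; please try again", items.len()),
    }
}

fn main() {
    let items = vec!["apple", "banana", "cherry"];
    println!("{}", describe_choice(&items, 1));
    println!("{}", describe_choice(&items, 100));
}
```
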

#### Invalid References

Once the program has a valid reference, the borrow checker will enforce the
ownership and borrowing rules covered in Chapter 4 to ensure this reference and
any other references to the contents of the vector stay valid. This means that
in a function that owns a `Vec`, we can't return a reference to an element in
the `Vec` to be used outside the function since the `Vec` will be cleaned up at
the end of the function. Try it out with the following:

<!-- TODO: fix this code example https://github.com/rust-lang/book/issues/273 -->

```rust,ignore
fn element() -> String {
    let list = vec![String::from("hi"), String::from("bye")];
    list[1]
} // <-- list goes out of scope here

fn main() {
    let e = element();
    println!("{}", e); // <-- we can't have a reference to an element of
                       // list out here since list was cleaned up at the end
                       // of the element function.
}
```

Compiling this fails with an error that begins
`error: cannot move out of indexed content [--explain E0507]`. Since `list`
goes out of scope and gets cleaned up at the end of the function, the reference
`list[1]` cannot be returned because it would outlive `list`.

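If the goal is only to hand one element to the caller, one workaround is to return an owned `String` instead of a reference into the local vector. This is a sketch, not the book's own fix for this example. The element can be moved out with `Vec::remove` or copied with `clone`. Both function names are hypothetical.

```rust
// Hypothetical fix: `Vec::remove` moves the element out of `list`, so the
// returned `String` stays valid after `list` is dropped at the end of the function.
fn element() -> String {
    let mut list = vec![String::from("hi"), String::from("bye")];
    list.remove(1)
}

// Hypothetical alternative: leave `list` untouched and return a copy.
fn element_cloned() -> String {
    let list = vec![String::from("hi"), String::from("bye")];
    list[1].clone()
}

fn main() {
    let e = element();
    println!("{} {}", e, element_cloned());
}
```
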

Here's another example of code that looks like it should be allowed, but won't
compile because the references aren't valid:

```rust,ignore
let mut v = vec![1, 2, 3, 4, 5];

let first = &v[0];

v.push(6);
```

Compiling this will give us this error:

```text
error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable
  |
4 | let first = &v[0];
  |              - immutable borrow occurs here
5 |
6 | v.push(6);
  | ^ mutable borrow occurs here
7 | }
  | - immutable borrow ends here
```

This violates one of the ownership rules we covered in Chapter 4: the `push`
method needs a mutable borrow of the `Vec`, and Rust doesn't allow any
immutable borrows to be active at the same time as a mutable borrow.

Rust disallows a reference to the first element while we add a new item to the
end because of the way vectors work. Adding a new element onto the end of the
vector might require allocating new memory and copying the old elements over
to the new space if there isn't enough room to put all the elements next to
each other where the vector was. In that case, the reference to the first
element would be pointing to deallocated memory. The borrowing rules prevent
programs from ending up in that situation.


> Note: For more on this, see [The Nomicon][nomicon].

[nomicon]: https://doc.rust-lang.org/stable/nomicon/vec.html

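One way to avoid the error is to copy the `i32` out of the vector instead of keeping a reference to it. Another is to do the `push` before taking the reference. The sketch below uses the first approach with a hypothetical `first_then_push` helper. As a side note, newer Rust compilers use non-lexical lifetimes and report this error only if `first` is still used after the `push`. The output above comes from an older compiler that kept the borrow alive until the end of the scope.

```rust
// Hypothetical helper: read the first element, then push, without holding a
// reference across the `push`.
fn first_then_push() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3, 4, 5];
    // `i32` is `Copy`, so this copies the value out; no borrow of `v` remains.
    let first = v[0];
    v.push(6);
    (first, v)
}

fn main() {
    let (first, v) = first_then_push();
    println!("first = {}, v = {:?}", first, v);
}
```
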

### Using an Enum to Store Multiple Types

At the beginning of this chapter, we said that vectors can only store values
that are all the same type. This can be inconvenient; there are definitely use
cases for needing to store a list of things of different types. Luckily, the
variants of an enum are all defined under the same enum type. When we need to
store elements of different types in a vector, we can define and use an enum!

For example, let's say we want to get values from a row in a spreadsheet, where
some of the columns in the row contain integers, some floating point numbers,
and some strings. We can define an enum whose variants will hold the different
value types, and then all of the enum variants will be considered the same
type, that of the enum. Then we can create a vector that holds that enum and
so, ultimately, holds different types:

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
```

The reason Rust needs to know exactly what types will be in the vector at
compile time is so that it knows exactly how much memory on the heap will be
needed to store each element. A secondary advantage to this is that we can be
explicit about what types are allowed in this vector. If Rust allowed a vector
to hold any type, there would be a chance that one or more of the types would
cause errors with the operations performed on the elements of the vector. Using
an enum plus a `match` means that Rust will ensure at compile time that we
always handle every possible case, as we discussed in Chapter 6.


<!-- Can you briefly explain what the match is doing here, as a recap? How does
it mean we always handle every possible case? I'm not sure it's totally clear.
-->
<!-- Because this is a focus of chapter 6 rather than this chapter's focus, we
don't think we should repeat it here as well, but we added a reference. /Carol
-->

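As a recap sketch, the code below reads values back out of a row like the one above using `match`. The `describe` helper is hypothetical. Because `match` must cover every variant, adding a new variant to `SpreadsheetCell` would make `describe` fail to compile until the new case is handled.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: the compiler checks that every variant has an arm.
fn describe(cell: &SpreadsheetCell) -> String {
    match cell {
        SpreadsheetCell::Int(i) => format!("int {}", i),
        SpreadsheetCell::Float(f) => format!("float {}", f),
        SpreadsheetCell::Text(s) => format!("text {}", s),
    }
}

fn main() {
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];
    for cell in &row {
        println!("{}", describe(cell));
    }
}
```
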

If, when writing a program, you don't know the exhaustive set of types the
program will get at runtime to store in a vector, the enum technique won't
work. Instead, you can use a trait object, which we'll cover in Chapter 13.


Now that we've gone over some of the most common ways to use vectors, be sure
to take a look at the API documentation for the many useful methods defined on
`Vec` by the standard library. For example, in addition to `push` there's a
`pop` method that removes the last element and returns it (wrapped in `Some`,
or `None` if the vector is empty). Let's move on to the next collection type:
`String`!

<!-- Do you mean the Rust online documentation here? Are you not including it
in the book for space reasons? We might want to justify sending them out of the
book if we don't want to cover it here -->

<!-- Yes, there are many, many methods on Vec: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html
Also there are occasionally new methods available with new versions of the
language, so there's no way we can be comprehensive here. We want the reader to
use the API documentation in these situations since the purpose of the online
docs is to be comprehensive and up to date. I personally wouldn't expect a book
like this to duplicate the info that's in the API docs, so I don't think a
justification is necessary here. /Carol -->

## Strings

We've already talked about strings a bunch in Chapter 4, but let's take a more
in-depth look at them now. Strings are an area that new Rustaceans commonly get
stuck on. This is due to a combination of three things: Rust's propensity for
making sure to expose possible errors, strings being a more complicated data
structure than many programmers give them credit for, and UTF-8. These things
combine in a way that can seem difficult when coming from other languages.

The reason strings are in the collections chapter is that strings are
implemented as a collection of bytes plus some methods to provide useful
functionality when those bytes are interpreted as text. In this section, we'll
talk about the operations on `String` that every collection type has, like
creating, updating, and reading. We'll also discuss the ways in which `String`
is different from the other collections, namely how indexing into a `String` is
complicated by the differences between how people and computers interpret
`String` data.

### What is a String?

Before we can dig into those aspects, we need to talk about what exactly we
mean by the term 'string'. Rust actually only has one string type in the core
language itself: `str`, the string slice, which is usually seen in its borrowed
form, `&str`. We talked about *string slices* in Chapter 4: these are
references to some UTF-8 encoded string data stored elsewhere. String literals,
for example, are stored in the binary output of the program, and are therefore
string slices.


The type called `String` is provided in Rust's standard library rather than
coded into the core language, and is a growable, mutable, owned, UTF-8 encoded
string type. When Rustaceans talk about 'strings' in Rust, they usually mean
both the `String` and the string slice `&str` types, not just one of those.
This section is largely about `String`, but both these types are used heavily
in Rust's standard library. Both `String` and string slices are UTF-8 encoded.

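A short sketch of how the two types meet in practice: a function that takes `&str` accepts both a string literal and a borrowed `String`. The `word_count` helper is hypothetical. It uses the standard `str::split_whitespace` method.

```rust
// Hypothetical helper that accepts any string slice, whether it comes from a
// literal or from a borrowed `String`.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // A string literal has type `&'static str`: a slice of data in the program binary.
    let literal: &str = "hello there world";
    // A `String` owns its UTF-8 data on the heap and can grow.
    let mut owned: String = String::from("hello");
    owned.push_str(" again");
    // `&owned` is a `&String`, which coerces to `&str` when passed to `word_count`.
    println!("{} {}", word_count(literal), word_count(&owned));
    // `as_str` borrows the same data explicitly as a `&str`.
    let slice: &str = owned.as_str();
    println!("{}", slice);
}
```
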

Rust's standard library also includes a number of other string types, such as
`OsString`, `OsStr`, `CString`, and `CStr`. Library crates may provide even
more options for storing string data. Like `String`/`&str`, these types often
come in an owned and a borrowed variant, following the `*String`/`*Str` naming
pattern. They may, for example, store text in different encodings or be
represented in memory in a different way. We won't be talking about these other
string types in this chapter; see their API documentation for more about how to
use them and when each is appropriate.

### Creating a New String

Many of the same operations available with `Vec` are available with `String` as
well, starting with the `new` function to create a string, like so:

```rust
let s = String::new();
```


This creates a new empty string called `s` that we can then load data into.

Often, we'll have some initial data that we'd like to start the string off
with. For that, we use the `to_string` method, which is available on any type
that implements the `Display` trait, which string literals do:

```rust
let data = "initial contents";

let s = data.to_string();

// the method also works on a literal directly:
let s = "initial contents".to_string();
```


This creates a string containing `initial contents`.

We can also use the function `String::from` to create a `String` from a string
literal. This is equivalent to using `to_string`:

```rust
let s = String::from("initial contents");
```

Because strings are used for so many things, there are many different generic
APIs that can be used for strings, so there are a lot of options. Some of them
can feel redundant, but they all have their place! In this case, `String::from`
and `.to_string` end up doing the exact same thing, so which you choose is a
matter of style.

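The creation styles just described are shown together below. The `created` helper is hypothetical. The last line relies on `to_string` being available for other `Display` types too, such as integers.

```rust
// Hypothetical helper returning strings created in the ways described above.
fn created() -> (String, String, String, String) {
    let empty = String::new(); // empty, ready to be filled
    let via_to_string = "initial contents".to_string(); // `to_string` on a `&str`
    let via_from = String::from("initial contents"); // same result as `to_string`
    let number = 42.to_string(); // any `Display` type, such as an integer
    (empty, via_to_string, via_from, number)
}

fn main() {
    let (empty, a, b, n) = created();
    println!("{:?} {:?} {:?} {:?}", empty, a, b, n);
}
```
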

Remember that strings are UTF-8 encoded, so we can include any properly encoded
data in them.

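The sketch below illustrates that point with greetings chosen purely as examples. Each one is a valid `String`. Comparing `len`, which counts bytes, with `chars().count()` hints at the indexing complications mentioned at the start of this section.

```rust
// A few greetings in different scripts; each literal is valid UTF-8.
fn greetings() -> Vec<String> {
    vec![
        String::from("Hola"),
        String::from("Dobrý den"),
        String::from("こんにちは"),
        String::from("你好"),
    ]
}

fn main() {
    for hello in greetings() {
        // `len` counts bytes, while `chars().count()` counts Unicode scalar values;
        // the two differ for text outside ASCII.
        println!("{}: {} bytes, {} chars", hello, hello.len(), hello.chars().count());
    }
}
```
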

### Updating a String

A `String` can grow in size and its contents can change, just like the
contents of a `Vec`, by pushing more data into it. In addition, `String` has
concatenation operations implemented with the `+` operator for convenience.

#### Appending to a String with Push

We can grow a `String` by using the `push_str` method to append a string slice:

```rust
let mut s = String::from("foo");
s.push_str("bar");
```

`s` will contain "foobar" after these two lines. The `push_str` method takes a
string slice because we don't necessarily want to take ownership of the
argument. For example, it would be unfortunate if we weren't able to use `s2`
after appending its contents to `s1`:

```rust
let mut s1 = String::from("foo");
let s2 = String::from("bar");
s1.push_str(&s2);
println!("s2 is {}", s2);
```

The `push` method takes a single character as an argument and adds it to the
`String`:

```rust
let mut s = String::from("lo");
s.push('l');
```

After this, `s` will contain "lol".

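The two appending methods appear together below. The `append_demo` helper is hypothetical. Returning `s2` at the end shows that `push_str` only borrowed it.

```rust
// Hypothetical helper combining the two appending methods.
fn append_demo() -> (String, String) {
    let mut s1 = String::from("foo");
    let s2 = String::from("bar");
    s1.push_str(&s2); // `push_str` takes a `&str`, so `s2` is only borrowed
    s1.push('!'); // `push` appends a single `char`
    (s1, s2) // `s2` is still valid and can be returned
}

fn main() {
    let (s1, s2) = append_demo();
    println!("{} {}", s1, s2);
}
```
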

#### Concatenation with the + Operator or the `format!` Macro

Often, we'll want to combine two existing strings together. One way is to use
the `+` operator like this:

```rust
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2; // Note that s1 has been moved here and can no longer be used
```

After this code, the `String` `s3` will contain `Hello, world!`. The reason
`s1` is no longer valid after the addition, and the reason we used a reference
to `s2`, both have to do with the signature of the method that gets called when
we use the `+` operator. The `+` operator uses the `add` method, whose
signature looks something like this:


```rust,ignore
fn add(self, s: &str) -> String {
```

This isn't the exact signature that's in the standard library; there `add` is
defined using generics. Here, we're looking at the signature of `add` with
concrete types substituted for the generic ones, which is what happens when we
call this method with `String` values. This signature gives us the clues we
need to understand the tricky bits of the `+` operator.


First of all, `s2` has an `&`, meaning that we are adding a *reference* to the
second string to the first string. This is because of the `s` argument in the
`add` function: we can only add a `&str` to a `String`; we can't add two
`String`s together. Remember back in Chapter 4 when we talked about how
`&String` will coerce to `&str`: we write `&s2` so that the `&String` will
coerce to the proper type, `&str`. Because this method does not take ownership
of the argument, `s2` will still be valid after this operation.


Second, we can see in the signature that `add` takes ownership of `self`,
because `self` does *not* have an `&`. This means `s1` in the above example
will be moved into the `add` call and no longer be valid after that. So while
`let s3 = s1 + &s2;` looks like it will copy both strings and create a new one,
this statement actually takes ownership of `s1`, appends a copy of `s2`'s
contents, then returns ownership of the result. In other words, it looks like
it's making a lot of copies, but isn't: the implementation is more efficient
than copying.

|
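
To see both halves of this in one place, here is a minimal sketch in which `s1` is moved by `+` while `s2` is only borrowed. The function name `concat_demo` is just for illustration.

```rust
// Sketch: `+` moves its left operand and only borrows its right operand.
fn concat_demo() -> (String, String) {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // `s1` is moved into `add`; `&s2` is a `&String` that is coerced to `&str`.
    let s3 = s1 + &s2;

    // println!("{}", s1); // would not compile: `s1` was moved above

    // `s2` was only borrowed, so we can still use (and return) it.
    (s3, s2)
}

fn main() {
    let (s3, s2) = concat_demo();
    println!("{} {}", s3, s2);
}
```

Uncommenting the `println!` that mentions `s1` should turn this into a "use of moved value" compile error. The line that uses `s2` compiles either way.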

If we need to concatenate multiple strings, the behavior of `+` gets unwieldy:

```rust
let s1 = String::from("tic");
@ -182,17 +200,32 @@ let s3 = String::from("toe");
let s = format!("{}-{}-{}", s1, s2, s3);
```

<!-- Are we going to discuss the format macro elsewhere at all? If not, some
more info here might be good, this seems like a really useful tool. Is it only
used on strings? -->

<!-- No, we weren't planning on it. We thought it would be sufficient to
mention that it works the same way as `println!` since we've covered how
`println!` works in Ch 2, "Printing Values with `println!` Placeholders" and Ch
5, "Adding Useful Functionality with Derived Traits". `format!` can be
used on anything that `println!` can; using `{}` in the format string works
with anything that implements the `Display` trait and `{:?}` works with
anything that implements the `Debug` trait. Do you have any thoughts on how we
could make the similarities with `format!` and `println!` clearer than what we
have in the next paragraph without repeating the `println!` content too much?
/Carol -->

This code will also set `s` to "tic-tac-toe". The `format!` macro works in the
same way as `println!`, but instead of printing the output to the screen, it
returns a `String` with the contents. This version is much easier to read, and
it doesn't take ownership of any of its arguments.
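
Here is a small sketch of the same idea, using the hypothetical helper names `tic_tac_toe` and `debug_string`. After `format!` runs, all three input strings can still be used. The `{:?}` placeholder formats any `Debug` value here, just as it does in `println!`.

```rust
// Sketch: `format!` returns a new `String` and only borrows its arguments.
fn tic_tac_toe() -> (String, String) {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let s = format!("{}-{}-{}", s1, s2, s3);

    // `s1`, `s2`, and `s3` are all still valid here, because `format!`
    // only took references to them.
    let joined = format!("{}{}{}", s1, s2, s3);
    (s, joined)
}

fn debug_string() -> String {
    // `{:?}` works with anything that implements `Debug`, as in `println!`.
    format!("{:?}", vec![1, 2, 3])
}

fn main() {
    let (s, joined) = tic_tac_toe();
    println!("{} {} {}", s, joined, debug_string());
}
```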

### Indexing into Strings

In many other languages, accessing individual characters in a string by
referencing them by index is a valid and common operation. In Rust, however, if
we try to access parts of a `String` using indexing syntax, we'll get an error.
That is, this code:

```rust,ignore
let s1 = String::from("hello");
@ -231,69 +264,77 @@ UTF-8. What about this example, though?
let len = "Здравствуйте".len();
```

Asked how long the string is, a person might say 12. However, Rust's answer is
24: this is the number of bytes it takes to encode "Здравствуйте" in UTF-8,
because each character in this string takes two bytes of storage. Therefore,
an index into the string's bytes will not always correlate to a valid
character.
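
One way to see both answers is to ask for each one explicitly. In this sketch, `byte_and_char_counts` is a hypothetical helper. `len` reports bytes, and `chars().count()` reports Unicode scalar values, which for this particular string matches what a person would count.

```rust
// Sketch: `len` counts bytes; `chars().count()` counts Unicode scalar values.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let (bytes, chars) = byte_and_char_counts("Здравствуйте");
    println!("{} bytes, {} chars", bytes, chars);
}
```

Note that `chars().count()` has to decode the whole string, so unlike `len` it takes time proportional to the string's length.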

To demonstrate, consider this invalid Rust code:

```rust,ignore
let hello = "Здравствуйте";
let answer = &hello[0];
```

What should the value of `answer` be? Should it be `З`, the first letter? When
encoded in UTF-8, the first byte of `З` is `208` and the second is `151`, so if
indexing returned a byte, `answer` would be `208`. But `208` is not a valid
character on its own. Returning `208` is likely not what a person would want if
they asked for the first letter of this string, but that's the only data that
Rust has at byte index 0. Even with only Latin letters, returning the byte
value is probably not what people want: `&"hello"[0]` would return `104`, not
`h`. To avoid returning an unexpected value and causing bugs that might not be
discovered immediately, Rust doesn't compile this code at all, which prevents
misunderstandings early.
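
If a program really does need the byte at a given position, it can ask for that explicitly. This sketch uses the hypothetical helpers `first_byte` and `first_char`. `as_bytes` exposes the raw UTF-8 bytes and `chars` decodes them, so each call states its intent.

```rust
// Sketch: ask for bytes or characters explicitly instead of indexing.
fn first_byte(s: &str) -> Option<u8> {
    // `as_bytes` gives `&[u8]`; `first` returns `None` for an empty string.
    s.as_bytes().first().copied()
}

fn first_char(s: &str) -> Option<char> {
    // `chars` decodes the UTF-8 bytes into Unicode scalar values.
    s.chars().next()
}

fn main() {
    let hello = "Здравствуйте";
    println!("{:?} {:?}", first_byte(hello), first_char(hello));
}
```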

#### Bytes and Scalar Values and Grapheme Clusters! Oh my!

This leads to another point about UTF-8: there are actually three relevant
ways to look at strings from Rust's perspective: as bytes, scalar values, and
grapheme clusters (the closest thing to what people would call *letters*).

If we look at the Hindi word "नमस्ते" written in the Devanagari script, it is
ultimately stored as a `Vec` of `u8` values that looks like this:

```text
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135]
```

That's 18 bytes and is how computers ultimately store this data. If we look at
them as Unicode scalar values, which are what Rust's `char` type is, those
bytes look like this:

```text
['न', 'म', 'स', '्', 'त', 'े']
```

There are six `char` values here, but the fourth and sixth are not letters:
they're diacritics that don't make sense on their own. Finally, if we look at
them as grapheme clusters, we'd get what a person would call the four letters
that make up this word:

```text
["न", "म", "स्", "ते"]
```

Rust provides different ways of interpreting the raw string data that computers
store so that each program can choose the interpretation it needs, no matter
what human language the data is in.
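
The standard library exposes two of these three views directly, through `bytes` and `chars`. Here is a minimal sketch that collects both for "नमस्ते". The `views` helper is hypothetical. Grapheme clusters are left out because the standard library doesn't provide them.

```rust
// Sketch: the byte view and the `char` view of the same string.
fn views(s: &str) -> (Vec<u8>, Vec<char>) {
    let bytes: Vec<u8> = s.bytes().collect(); // raw UTF-8 bytes
    let chars: Vec<char> = s.chars().collect(); // Unicode scalar values
    (bytes, chars)
}

fn main() {
    let (bytes, chars) = views("नमस्ते");
    println!("{} bytes: {:?}", bytes.len(), bytes);
    println!("{} chars: {:?}", chars.len(), chars);
}
```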

A final reason Rust doesn't allow you to index into a `String` to get a
character is that indexing operations are expected to always take constant time
(O(1)). It isn't possible to guarantee that performance with a `String`,
though, because Rust would have to walk through the contents from the beginning
to the index to determine how many valid characters there were.

For all of these reasons, Rust doesn't support indexing into a `String` with a
single integer, so `&hello[0]` does not compile.

### Slicing Strings

However, working with the *bytes* of a string can be very useful. While we
can't use `[]` with a single number, we *can* use `[]` with a range to create a
string slice containing particular bytes:

```rust
let hello = "Здравствуйте";
@ -302,8 +343,8 @@ let s = &hello[0..4];
```

Here, `s` will be a `&str` that contains the first four bytes of the string.
Earlier, we mentioned that each of these characters was two bytes, which means
`s` will be "Зд".

What would happen if we used `&hello[0..1]`? The answer: Rust would panic at
runtime in the same way as when an invalid index is accessed in a vector:
@ -313,11 +354,16 @@ thread 'main' panicked at 'index 0 and/or 1 in `Здравствуйте` do not
character boundary', ../src/libcore/str/mod.rs:1694
```

You should use ranges to create string slices with caution, because doing so
can crash your program.
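
Here is a sketch of the slicing above next to a non-panicking alternative, for cases where you'd rather handle a bad range than crash. `str::get` returns `None` when the range doesn't fall on character boundaries, and `is_char_boundary` lets you check an index ahead of time. The `first_bytes` helper is hypothetical.

```rust
// Sketch: `[]` with a byte range panics on a bad boundary; `get` returns `None`.
fn first_bytes(s: &str, end: usize) -> Option<&str> {
    s.get(0..end)
}

fn main() {
    let hello = "Здравствуйте";

    // Each of these characters takes two bytes, so 0..4 covers two of them.
    let s = &hello[0..4];
    println!("{}", s);

    // `&hello[0..1]` would panic; `get` lets us handle the problem instead.
    match first_bytes(hello, 1) {
        Some(slice) => println!("got {}", slice),
        None => println!("byte 1 is not a character boundary"),
    }
}
```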

### Methods for Iterating Over Strings

Luckily, there are other ways we can access elements in a `String`.

If we need to perform operations on individual characters, the best way to do
so is to use the `chars` method. Calling `chars` on "नमस्ते" separates out and
returns six values of type `char`, and you can iterate over the result to
access each element:

```rust
for c in "नमस्ते".chars() {
@ -337,8 +383,7 @@ This code will print:
```

The `bytes` method returns each raw byte, which might be appropriate for your
domain:

```rust
for b in "नमस्ते".bytes() {
@ -356,15 +401,30 @@ This code will print the 18 bytes that make up this `String`, starting with:
// ... etc
```

But be sure to remember that a valid UTF-8 character may be made up of more
than one byte.

Getting grapheme clusters from `String`s is complex, so this functionality is
not provided by the standard library. Crates are available on crates.io if
this is the functionality you need.

<!-- Can you recommend some, or maybe just say why we aren't outlining the
method here, ie it's complicated and therefore best to use a crate? -->

<!-- We're trying not to mention too many crates in the book. Most crates are
provided by the community, so we don't want to mention some and not others and
seem biased towards certain crates, plus crates can change more quickly (and
new crates can be created) than the language and this book will. /Carol -->

### Strings are Not so Simple

To summarize, strings are complicated. Different programming languages make
different choices about how to present this complexity to the programmer. Rust
has chosen to make the correct handling of `String` data the default behavior
for all Rust programs, which means programmers have to put more thought into
handling UTF-8 data upfront. This tradeoff exposes more of the complexity of
strings than other programming languages do, but it prevents you from having to
handle errors involving non-ASCII characters later in your development
lifecycle.

Let's switch to something a bit less complex: hash maps!
@ -7,55 +7,72 @@ into memory. Many different programming languages support this kind of data
structure, but often with a different name: hash, map, object, hash table, or
associative array, just to name a few.

Hash maps are useful when you want to look up data not by using an index, as
you can with vectors, but by using a key that can be of any type. For example,
in a game, you could keep track of each team's score in a hash map where each
key is a team's name and each value is that team's score. Given a team name,
you can retrieve its score.

We'll go over the basic API of hash maps in this chapter, but many more goodies
are hiding in the functions defined on `HashMap` by the standard library. As
always, check the standard library documentation for more information.

### Creating a New Hash Map

We can create an empty `HashMap` with `new` and add elements with `insert`.
Here, we're keeping track of the scores of two teams whose names are Blue and
Yellow. The Blue team starts with 10 points, and the Yellow team starts with
50:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
```

Note that we need to first `use` the `HashMap` from the collections portion of
the standard library. Of our three fundamental collections, this one is the
least often used, so it's not included in the features brought into scope
automatically in the prelude. Hash maps also have less support from the
standard library; there's no built-in macro to construct them, for example.

Just like vectors, hash maps store their data on the heap. This `HashMap` has
keys of type `String` and values of type `i32`. Like vectors, hash maps are
homogeneous: all of the keys must have the same type, and all of the values
must have the same type.

Another way of constructing a hash map is by using the `collect` method on an
iterator of tuples, where each tuple consists of a key and its value. The
`collect` method gathers up data into a number of collection types, including
`HashMap`. For example, if we had the team names and initial scores in two
separate vectors, we could use the `zip` method to create an iterator of tuples
where "Blue" is paired with 10, and so forth. Then we could use the `collect`
method to turn that iterator of tuples into a `HashMap`:

```rust
use std::collections::HashMap;

let teams = vec![String::from("Blue"), String::from("Yellow")];
let initial_scores = vec![10, 50];

let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect();
```

The type annotation `HashMap<_, _>` is needed here because it's possible to
`collect` into many different data structures, and Rust doesn't know which you
want unless you specify. For the key and value type parameters, however, we use
underscores, and Rust can infer the types that the hash map contains based on
the types of the data in the vectors.
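
Because the example above calls `iter`, the resulting map holds references (`&String` keys and `&i32` values) that borrow from the two vectors. If you'd like the map to own its data instead, one option is `into_iter`, as in this sketch. The `build_scores` function is hypothetical. Its return type plays the role of the `HashMap<_, _>` annotation.

```rust
use std::collections::HashMap;

// Sketch: zip and collect with `into_iter`, so the map owns its keys and
// values (`HashMap<String, i32>`) instead of borrowing from the vectors.
fn build_scores() -> HashMap<String, i32> {
    let teams = vec![String::from("Blue"), String::from("Yellow")];
    let initial_scores = vec![10, 50];

    // The return type tells `collect` which collection to build.
    teams.into_iter().zip(initial_scores.into_iter()).collect()
}

fn main() {
    let scores = build_scores();
    println!("{:?}", scores);
}
```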

### Hash Maps and Ownership

For types that implement the `Copy` trait, like `i32`, the values are copied
into the hash map. For owned values like `String`, the values will be moved and
the hash map will be the owner of those values:

```rust
use std::collections::HashMap;
@ -68,13 +85,13 @@ map.insert(field_name, field_value);
// field_name and field_value are invalid at this point
```

We would not be able to use the bindings `field_name` and `field_value` after
they have been moved into the hash map with the call to `insert`.

If we insert references to values into the hash map, the values themselves will
not be moved into the hash map. The values that the references point to must be
valid for at least as long as the hash map is valid, though. We will talk more
about these issues in the Lifetimes section of Chapter 10.
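
The following minimal sketch puts the three cases side by side. It uses a hypothetical `ownership_demo` function:

* a `Copy` value that stays usable after `insert`
* a `String` that is moved into the map
* a `&str` key that borrows from a `String` declared before the map, so the `String` outlives it

```rust
use std::collections::HashMap;

// Sketch: copied, moved, and borrowed values in hash maps.
fn ownership_demo() -> (i32, usize) {
    let field_name = String::from("Favorite color");
    let points = 42; // `i32` implements `Copy`

    let mut owned: HashMap<String, i32> = HashMap::new();
    owned.insert(field_name, points);
    // `field_name` has been moved into `owned`; `points` was copied.

    // `label` is declared before `borrowed`, so it lives at least as long.
    let label = String::from("Blue");
    let mut borrowed: HashMap<&str, i32> = HashMap::new();
    borrowed.insert(&label, points); // `&String` coerces to the `&str` key type

    (points, owned.len() + borrowed.len())
}

fn main() {
    let (points, entries) = ownership_demo();
    println!("{} {}", points, entries);
}
```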

### Accessing Values in a Hash Map
@ -83,18 +100,20 @@ We can get a value out of the hash map by providing its key to the `get` method:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

let team_name = String::from("Blue");
let score = scores.get(&team_name);
```

Here, `score` will have the value that's associated with the Blue team, and the
result will be `Some(&10)`. The result is wrapped in `Some` because `get`
returns an `Option<&V>`; if there's no value for that key in the hash map,
`get` will return `None`. The program will need to handle the `Option` in one
of the ways that we covered in Chapter 6.
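
This sketch applies the Chapter 6 techniques to the result of `get`. The hypothetical helpers below either `match` on the `Option<&i32>` or fall back to a default value with `copied` and `unwrap_or`.

```rust
use std::collections::HashMap;

// Sketch: handle a missing key with `match`.
fn describe_score(scores: &HashMap<String, i32>, team: &str) -> String {
    match scores.get(team) {
        Some(score) => format!("{} has {} points", team, score),
        None => format!("{} has no score yet", team),
    }
}

// Sketch: `copied` turns `Option<&i32>` into `Option<i32>`;
// `unwrap_or` supplies a default when the key is missing.
fn score_or_zero(scores: &HashMap<String, i32>, team: &str) -> i32 {
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    println!("{}", describe_score(&scores, "Blue"));
    println!("{}", score_or_zero(&scores, "Red"));
}
```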

We can iterate over each key/value pair in a hash map in the same way as we
do with vectors, using a `for` loop:
@ -102,101 +121,98 @@ do with vectors, using a `for` loop:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);

for (key, value) in &scores {
    println!("{}: {}", key, value);
}
```

This will print each pair, in an arbitrary order:

```text
Yellow: 50
Blue: 10
```

### Updating a Hash Map

<!-- So the quantity of keys must be defined up front, that's not growable?
That could be worth saying -->

<!-- No, the number of keys is growable, it's just that for EACH individual
key, there can only be one value. I've tried to clarify. /Carol -->

While the number of keys and values is growable, each individual key can only
have one value associated with it at a time. When we want to change the data in
a hash map, we have to decide how to handle the case when a key already has a
value assigned. We could replace the old value with the new value, completely
disregarding the old value. We could keep the old value and ignore the new
value, only adding the new value if the key *doesn't* already have a value. Or
we could combine the old value and the new value. Let's look at how to do each
of these!

#### Overwriting a Value

If we insert a key and a value into a hash map and then insert that same key
with a different value, the value associated with that key will be replaced.
Even though the following code calls `insert` twice, the hash map will only
contain one key/value pair because we're inserting the value for the Blue
team's key both times:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();

scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);

println!("{:?}", scores);
```

This will print `{"Blue": 25}`. The original value of 10 has been overwritten.

#### Only Insert If the Key Has No Value

It's common to want to check whether a particular key has a value and, if it
doesn't, insert a value for it. Hash maps have a special API for this, called
`entry`, that takes the key we want to check as an argument. The return value
of the `entry` function is an enum, `Entry`, that represents a value that might
or might not exist. Let's say that we want to check whether the key for the
Yellow team has a value associated with it. If it doesn't, we want to insert
the value 50, and the same for the Blue team. Using the entry API, the code
looks like this:

```rust
use std::collections::HashMap;

let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);

scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);

println!("{:?}", scores);
```

The `or_insert` method on `Entry` returns the value for the `Entry`'s key if it
exists, and if not, inserts its argument as the new value for the `Entry`'s key
and returns that. This is much cleaner than writing the logic ourselves, and it
also plays more nicely with the borrow checker.

This code will print `{"Yellow": 50, "Blue": 10}` (the order of the pairs may
vary). The first call to `entry` will insert the key for the Yellow team with
the value 50, because the Yellow team doesn't have a value already. The second
call to `entry` will not change the hash map, because the Blue team already has
the value 10.

#### Update a Value Based on the Old Value

Another common use case for hash maps is to look up a key's value and then
update it based on the old value. For instance, if we wanted to count how many
times each word appeared in some text, we could use a hash map with the words
as keys and increment the value to keep track of how many times we've seen that
word. If this is the first time we've seen a word, we'll first insert the value
`0`.
@ -217,42 +233,41 @@ println!("{:?}", map);
```

This will print `{"world": 2, "hello": 1, "wonderful": 1}` (the order of the
pairs may vary). The `or_insert` method actually returns a mutable reference
(`&mut V`) to the value for this key. Here we store that mutable reference in
the `count` variable, so in order to assign to that value we must first
dereference `count` using the asterisk (`*`). The mutable reference goes out of
scope at the end of the `for` loop, so all of these changes are safe and
allowed by the borrowing rules.
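
Here is a self-contained sketch of that pattern. It assumes words are separated by whitespace, and the `word_counts` function is hypothetical.

```rust
use std::collections::HashMap;

// Sketch: count words with `entry(..).or_insert(0)`, then update the count
// through the returned `&mut i32`.
fn word_counts(text: &str) -> HashMap<&str, i32> {
    let mut map = HashMap::new();
    for word in text.split_whitespace() {
        let count = map.entry(word).or_insert(0);
        *count += 1; // dereference to modify the value stored in the map
    }
    map
}

fn main() {
    println!("{:?}", word_counts("hello world wonderful world"));
}
```

Because the keys are `&str` slices of `text`, the returned map borrows from its input. Collecting `String` keys instead would let the map outlive the text.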

### Hashing Function

By default, `HashMap` uses a cryptographically secure hashing function that can
provide resistance to Denial of Service (DoS) attacks. This is not the fastest
hashing algorithm out there, but the better security is worth the drop in
performance. If you profile your code and find that the default hash function
is too slow for your purposes, you can switch to another function by specifying
a different *hasher*. A hasher is a type that implements the `BuildHasher`
trait. We'll be talking about traits and how to implement them in Chapter 10.
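
Here is a sketch of swapping the hasher, using a hypothetical FNV-1a hasher written by hand. Strictly speaking, `HashMap` is parameterized by a `BuildHasher`, which creates a fresh `Hasher` each time a value is hashed. The standard library's `BuildHasherDefault` adapts any `Hasher` that also implements `Default`. FNV-1a is fast for short keys but offers none of the default's DoS resistance, so treat this as an illustration rather than a recommendation.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical example hasher: 64-bit FNV-1a (not DoS resistant).
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        Fnv1aHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1aHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// `BuildHasherDefault` implements `BuildHasher` for any `Hasher + Default`.
type FnvBuildHasher = BuildHasherDefault<Fnv1aHasher>;

fn fnv_scores() -> HashMap<String, i32, FnvBuildHasher> {
    // `with_hasher` takes the `BuildHasher` to use instead of the default one.
    let mut scores = HashMap::with_hasher(FnvBuildHasher::default());
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);
    scores
}

fn main() {
    println!("{:?}", fnv_scores());
}
```

Apart from the extra type parameter, the map is used exactly as before. Only how keys are hashed changes.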

## Summary

Vectors, strings, and hash maps will take you far in programs where you need to
store, access, and modify data. Here are some exercises you should now be
equipped to solve:

1. Given a list of integers, use a vector and return the mean (average), median
   (when sorted, the value in the middle position), and mode (the value that
   occurs most often; a hash map will be helpful here) of the list.
2. Convert strings to Pig Latin, where the first consonant of each word is
   moved to the end of the word with an added "ay", so "first" becomes
   "irst-fay". Words that start with a vowel get "hay" added to the end instead
   ("apple" becomes "apple-hay"). Keep in mind the details about UTF-8
   encoding!
3. Using a hash map and vectors, create a text interface to allow a user to add
   employee names to a department in the company. For example, "Add Sally to
   Engineering" or "Add Amir to Sales". Then let the user retrieve a list of all
   people in a department or all people in the company by department, sorted
   alphabetically.

The standard library API documentation describes methods these types have that
will be helpful for these exercises!