rust-book-cn/nostarch/chapter06.md

602 lines
19 KiB
Markdown
Raw Normal View History

2016-08-03 10:07:25 +08:00
# Enums
Next, lets look at *enumerations*, which allow you to define a type by
enumerating its possible values. Commonly called "enums", these unlock a lot of
power in Rust when combined with pattern matching. Enums are a feature that are
in many languages, but what they can do is different per-language. Rusts enums
are most similar to "algebraic data types" in functional languages like F#,
OCaml, or Haskell.
Heres an example of an enum definition:
```rust
enum IpAddrKind {
V4,
V6,
}
```
This enum represents the kind of an IP address. There are two major standards
used for IP addresses: version four and version six. Any IP address can be
either a version four address or a version six address, but it cannot be both
kinds at the same time. This is where enums get their name: they allow us to
enumerate all of the possible kinds that our value can have.
We can create values of `IpAddrKind` like this:
```rust
# enum IpAddrKind {
# V4,
# V6,
# }
#
let four = IpAddrKind::V4;
let six = IpAddrKind::V6;
```
Note that the variants of the enum are namespaced under its name, and we use
the double colon to separate the two.
Enums have more tricks up their sleeves, however. Thinking more about our IP
address type, we dont have a way to store the actual data of the IP address;
we only know what kind it is. Given that you just learned about structs, you
might tackle this problem like this:
```rust
enum IpAddrKind {
V4,
V6,
}
struct IpAddr {
kind: IpAddrKind,
address: String,
}
let home = IpAddr {
kind: IpAddrKind::V4,
address: String::from("127.0.0.1"),
};
let loopback = IpAddr {
kind: IpAddrKind::V6,
address: String::from("::1"),
};
```
Weve used a struct to bundle the two values together: now we keep the kind
with the value itself. We can represent the same thing in a different way with
just an enum:
```rust
enum IpAddr {
V4(String),
V6(String),
}
let home = IpAddr::V4(String::from("127.0.0.1"));
let loopback = IpAddr::V6(String::from("::1"));
```
We can attach data to each variant of the enum directly. No need for an extra
struct. But beyond that, this approach is better than using a struct alongside
our enum because we can attach different kinds of data to each variant.
Imagine that instead of a `String`, we would prefer to store a `V4` as its four
individual components while leaving the `V6` variant as a `String`. With our
struct, wed be stuck. But enums deal with this case with ease:
```rust
enum IpAddr {
V4(u8, u8, u8, u8),
V6(String),
}
let home = IpAddr::V4(127, 0, 0, 1);
let loopback = IpAddr::V6(String::from("::1"));
```
You can put any kind of data inside of an enum variant, including another enum!
The `IpAddr` enum is [in the standard library][IpAddr], but it embeds two
different structs inside of its variants:
```rust
struct Ipv4Addr {
// details elided
}
struct Ipv6Addr {
// details elided
}
enum IpAddr {
V4(Ipv4Addr),
V6(Ipv6Addr),
}
```
[IpAddr]: http://doc.rust-lang.org/std/net/enum.IpAddr.html
Heres an enum with a variety of types embedded in its variants:
```rust
enum Message {
Quit,
Move { x: i32, y: i32 },
Write(String),
ChangeColor(i32, i32, i32),
}
```
* `Quit` has no data associated with it at all.
* `Move` includes an anonymous struct inside of it.
* `Write` includes a single `String`.
* `ChangeColor` includes three `i32`s.
This might seem overwhelming, but another way to look at the different enum
possibilities is that they are just like different kinds of struct definitions
that you already know, except without the `struct` keyword and they are grouped
together under the `Message` type. These structs could hold the same data that
these enum variants hold:
```
struct QuitMessage; // unit struct
struct MoveMessage {
x: i32,
y: i32,
}
struct WriteMessage(String); // tuple struct
struct ChangeColorMessage(i32, i32, i32); // tuple struct
```
Let's look at another enum in the standard library that is very common and
useful: `Option`.
## Option
Now that we have had an introduction to enums, let's combine them with a
feature that we talked a little bit about in the previous chapter: generics.
Programming language design is often thought of as which features you include,
but it's also about which features you leave out. Rust does not have a feature
that is in many other languages: 'null'. In languages with this feature,
variables can have two states: null or not-null.
The inventor of this concept has this to say:
> I call it my billion-dollar mistake. At that time, I was designing the first
> comprehensive type system for references in an object-oriented language. My
> goal was to ensure that all use of references should be absolutely safe, with
> checking performed automatically by the compiler. But I couldn't resist the
> temptation to put in a null reference, simply because it was so easy to
> implement. This has led to innumerable errors, vulnerabilities, and system
> crashes, which have probably caused a billion dollars of pain and damage in
> the last forty years.
>
> - Tony Hoare "Null References: The Billion Dollar Mistake"
The problem with null values is twofold: first, a value can be null or not, at
any time. The second is that if you try to use a value that's null, you'll get
an error of some kind, depending on the language. Because this property is
pervasive, it's extremely easy to make this kind of error.
Even with these problems, the concept that null is trying to express is still a
useful one: this is a value which is currently invalid or not present for some
reason. The problem isn't with the concept itself, but with the particular
implementation. As such, Rust does not have the concept of null, but we do have
an enum which can encode the concept of a value being present or not present. We
call this enum `Option<T>`, and it looks like this:
```rust
enum Option<T> {
Some(T),
None,
}
```
This enum is [provided by the standard library][option], and is so useful that
it's even in the prelude; you don't need to import it explicitly. Furthermore,
so are its variants: you can say `Some` and `None` directly, without prefixing
them with `Option::`.
[option]: ../std/option/enum.Option.html
Here's an example of using `Option<T>`:
```rust
let some_number = Some(5);
let some_string = Some("a string");
// If we only say None, we need to tell Rust what type of Option<T> we have.
let absent_number: Option<i32> = None;
```
Let's dig in. First, you'll notice that we used the `<T>` syntax when defining
`Option<T>`: it's a generic enum. `Option<T>` has two variants: `Some`, which
contains a `T`, and `None`, which has no data associated with it. In some
sense, `None` means 'null', and `Some` means 'not null'. So why is this any
better than null?
In short, because `Option<T>` and `T` are different types. That's a bit too
short though. Here's an example:
```rust,ignore
let x: i8 = 5;
let y: Option<i8> = Some(5);
let sum = x + y;
```
This will not compile. We get an error message like this:
```text
error: the trait bound `i8: std::ops::Add<std::option::Option<i8>>` is not
satisfied [E0277]
let sum = x + y;
^~~~~
```
Intense! What this error message is trying to say is that Rust does not
understand how to add an `Option<T>` and a `T`. They're different types! This
shows one of the big advantages of an `Option<T>`: if you have a value that
may or may not exist, you have to deal with that fact before you can assume it
exists. In other words, you have to convert an `Option<T>` to a `T` before you
can do `T` stuff with it. This helps catch one of the most common issues with
null, generally: assuming that something isn't null when it actually is.
This is pretty powerful: in order to have a value that can possibly be null,
you have to explicitly opt in by making the type of that value an `Option<T>`.
Then, when you use that value, you are required to explicitly handle the case
when the value is null. Everywhere that a value has a type that isn't an
`Option<T>`, you *can* safely assume that the value isn't null. This was a
deliberate design decision for Rust to limit null's pervasiveness and increase
the safety of Rust code.
So, how _do_ you get a `T` from an `Option<T>`? The `Option<T>` enum has a
large number of methods that you can check out in [its documentation], and
becoming familiar with them will be extremely useful in your journey with Rust.
[its documentation]: ../std/option/enum.Option.html
But we want a deeper understanding than that. If we didn't have those methods
defined for us already, what would we do? And more generally, how do we get
the inner values out of any enum variant? We need a new feature: `match`.
## Match
Rust has an extremely powerful control-flow operator: `match`. It allows us to
compare a value against a series of patterns and then execute code based on
how they compare.
Think of a `match` expression kind of like a coin sorting machine. Coins slide
down a track that has variously sized holes along it, and each coin falls
through the first hole it encounters that it fits into. In the same way, values
go through each pattern in a `match`, and for the first pattern that the value
"fits", the value will fall into the associated code block to be used during
execution.
Since we're already talking about coins, let's use them for an example using
`match`! We can write a function that can take an unknown American coin and, in
a similar way as the coin counting machine, determine which coin it is and
return its value in cents:
```rust
enum Coin {
Penny,
Nickel,
Dime,
Quarter,
}
fn value_in_cents(coin: Coin) -> i32 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter => 25,
}
}
```
Let's break down the `match`! At a high-level, using `match` looks like this:
```text
match expression {
pattern => code,
}
```
First, we have the `match` keyword. Next, we have an expression. This feels
very similar to an expression used with `if`, but there's a big difference:
with `if`, the condition needs to return a boolean value. Here, it can be any
type.
Next, we have a "match arm". That's the part that looks like `pattern =>
code,`. We can have as many arms as we need to: our `match` above has four
arms. An arm has two parts: a pattern and some code. When the `match`
expression executes, it compares the resulting value against the pattern of
each arm, in order. If a pattern matches the value, the code associated
with that pattern is executed. If that pattern doesn't match the value,
execution continues to the next arm, much like a coin sorting machine.
The code associated with each arm is an expression, and the resulting value of
the code with the matching arm that gets executed is the value that gets
returned for the entire `match` expression.
Curly braces typically aren't used if the match arm code is short, as it is in
the above example where each arm just returns a value. If we wanted to run
multiple lines of code in a match arm, we can use curly braces. This code would
print out "Lucky penny!" every time the method was called with a `Coin::Penny`,
but would still return the last value of the block, `1`:
```rust
# enum Coin {
# Penny,
# Nickel,
# Dime,
# Quarter,
# }
#
fn value_in_cents(coin: Coin) -> i32 {
match coin {
Coin::Penny => {
println!("Lucky penny!");
1
},
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter => 25,
}
}
```
Another useful feature of match arms is that they can create bindings to parts
of the values that match the pattern. From 1999 through 2008, the U.S. printed
quarters with different designs for each of the 50 states on one side. The other
coins did not get state designs, so only quarters have this extra attribute. We
can add this information to our `enum` by changing the `Quarter` variant to have
a `State` value:
```rust
enum UsState {
Alabama,
Alaska,
// ... etc
}
enum Coin {
Penny,
Nickel,
Dime,
Quarter(UsState),
}
```
Let's imagine that a friend of ours is trying to collect all 50 state quarters.
While we sort our loose change by coin type in order to count it, we're going
to call out the name of the state so that if it's one our friend doesn't have
yet, they can add it to their collection.
In the match statement to do this, the quarter case now has a binding, `state`,
that contains the value of the state of that quarter. The binding will only get
created if the coin matches the `Quarter` pattern. Then we can use the binding
in the code for that arm:
```rust
# #[derive(Debug)]
# enum UsState {
# Alabama,
# Alaska,
# }
#
# enum Coin {
# Penny,
# Nickel,
# Dime,
# Quarter(UsState),
# }
#
fn value_in_cents(coin: Coin) -> i32 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter(state) => {
println!("State quarter from {:?}!", state);
25
},
}
}
```
If we were to call `value_in_cents(Coin::Quarter(UsState::Alaska))`, `coin` will
be `Coin::Quarter(UsState::Alaska)`. When we compare that value with each of the
match arms, none of them match until we reach `Coin::Quarter(state)`. At that
point, the binding for `state` will be the value `UsState::Alaska`. We can then
use that binding in the `println!`, thus getting the inner state value out of
the `Coin` enum variant for `Quarter`.
Remember the `Option<T>` type from the previous section, and that we wanted to
be able to get the inner `T` value out of the `Some` case? This will be very
similar! Instead of coins, we will be comparing to other patterns, but the way
that the `match` expression works remains the same as a coin sorting machine in
the way that we look for the first pattern that fits the value.
Let's say that we want to write a function that takes an `Option<i32>`, and
if there's a value inside, add one to it. If there isn't a value inside, we
want to return the `None` value and not attempt to add.
This function is very easy to write, thanks to `match`. It looks like this:
```rust
fn plus_one(x: Option<i32>) -> Option<i32> {
match x {
None => None,
Some(i) => Some(i + 1),
}
}
let five = Some(5);
let six = plus_one(five);
let none = plus_one(None);
```
Let's examine the first execution of `plus_one()` in more detail. In the above
example, `x` will be `Some(5)`. Let's compare that against each arm:
```text
None => None,
```
Does `Some(5)` match `None`? No, it's the wrong variant. So let's continue.
```text
Some(i) => Some(i + 1),
```
Does `Some(5)` match `Some(i)`? Why yes it does! We have the same variant. The
`i` binds to the value inside of the `Some`, so `i` has the value `5`. Then we
execute the code in that match arm: take `i`, which is `5`, add one to it, and
create a new `Some` value with our total inside.
Now let's consider the second call of `plus_one()`. In this case, `x` is
`None`. We enter the `match`, and compare to the first arm:
```text
None => None,
```
Does `None` match `None`? Yup! There's no value to add to. So we stop and
return the `None` value that is on the right side of the `=>`. We don't
check any other arms since we found one that matched.
Combining `match` and enums together is extremely powerful. You'll see this
pattern a lot in Rust code: `match` against an enum, bind to the data
inside, and then execute code based on it. It's a bit tricky at first, but
once you get used to it, you'll wish you had it in languages that don't support
it. It's consistently a user favorite.
### Matches are exhaustive
There's one other aspect of `match` we didn't talk about. Consider this version
of `plus_one()`:
```rust,ignore
fn plus_one(x: Option<i32>) -> Option<i32> {
match x {
Some(i) => Some(i + 1),
}
}
```
A bug! We didn't handle the `None` case. Luckily, it's a bug Rust knows how to
catch. If we try to compile this code, we'll get an error:
```text
error: non-exhaustive patterns: `None` not covered [E0004]
match x {
Some(i) => Some(i + 1),
}
```
Rust knows that we did not cover every possible option, and even knows which
pattern we forgot! This is referred to as being "exhaustive": we must exhaust
every last option possible in order to be valid. Especially in the case of
`Option<T>`, when Rust prevents us from forgetting to explicitly handle the
`None` case, it protects us from assuming that we have a value when we might
have null and thus making the billion-dollar mistake we discussed in the
previous section.
### The _ placeholder
What if we don't care about all of the possible values, though? Especially when
there are a lot of possible values for a type: a `u8` can have valid values of
zero through 255-- if we only care about 1, 3, 5, and 7, does this mean we must
list out 0, 2, 4, 6, 8, 9, all the way up through 255? Thankfully, no! We can
use a special pattern, `_`:
```rust
let some_u8_value = 0u8;
match some_u8_value {
1 => println!("one"),
3 => println!("three"),
5 => println!("five"),
7 => println!("seven"),
_ => (),
}
```
The `_` pattern will match all the other cases, and `()` will do nothing, it's
the unit value. This way, we don't have to list individual match arms for all
the other possible values in order to say that we want to do nothing for all of
those-- the `_` is a placeholder for any value.
## if let
There's one more advanced control flow structure we haven't discussed: `if
let`. Imagine we're in a situation like this:
```rust
# let some_option = Some(5);
match some_option {
Some(x) => {
// do something with x
},
None => {},
}
```
We care about the `Some` case, but don't want to do anything with the `None`
case. With an `Option`, this isn't _too_ bad, but with a more complex enum,
adding `_ => {}` after processing just one variant doesn't feel great. We have
this boilerplate arm and an extra level of indentation (the code that
does something with `x` is indented twice, rather than just once). We really want
a construct that says "Do something with this one case; I don't care about the
others."
Enter `if let`:
```rust
# let some_option = Some(5);
if let Some(x) = some_option {
// do something with x
}
```
`if let` takes a pattern and an expression, separated by an `=`. It works
exactly like a `match`, where the expression is given to the `match` and the
pattern is its first arm. In other words, you can think of `if let` as syntax
sugar:
```rust,ignore
if let pattern = expression {
body
}
match expression {
pattern => body,
_ => {}
}
```
And in fact, we can include an `else` and it becomes the body of the `_`
case:
```rust,ignore
if let pattern = expression {
body
} else {
else_body
}
match expression {
pattern => body,
_ => else_body,
}
```
In other words, it's the high-level construct we were originally looking for:
do something special with only one pattern.