rust-book-cn/nostarch/chapter06.md
2017-01-27 17:28:53 -05:00

28 KiB
Raw Blame History

[TOC]

Enums and Pattern Matching

In this chapter well look at enumerations, also referred to as enums. Enums allow you to define a type by enumerating its possible values. First, well define and use an enum to show how an enum can encode meaning along with data. Next, well explore a particularly useful enum, called Option, which expresses that a value can be either something or nothing. Then well look at how pattern matching in the match expression makes it easy to run different code for different values of an enum. Finally, well cover how the if let construct is another convenient and concise idiom available to you to handle enums in your code.

Enums are a feature in many languages, but their capabilities differ in each language. Rusts enums are most similar to algebraic data types in functional languages like F#, OCaml, and Haskell.

Defining an Enum

Lets look at a situation we might want to express in code and see why enums are useful and more appropriate than structs in this case. Say we need to work with IP addresses. Currently, two major standards are used for IP addresses: version four and version six. These are the only possibilities for an IP address that our program will come across: we can enumerate all possible values, which is where enumeration gets its name.

Any IP address can be either a version four or a version six address but not both at the same time. That property of IP addresses makes the enum data structure appropriate for this case, because enum values can only be one of the variants. Both version four and version six addresses are still fundamentally IP addresses, so they should be treated as the same type when the code is handling situations that apply to any kind of IP address.

We can express this concept in code by defining an IpAddrKind enumeration and listing the possible kinds an IP address can be, V4 and V6. These are known as the variants of the enum:

enum IpAddrKind {
    V4,
    V6,
}

IpAddrKind is now a custom data type that we can use elsewhere in our code.

Enum Values

We can create instances of each of the two variants of IpAddrKind like this:

let four = IpAddrKind::V4;
let six = IpAddrKind::V6;

Note that the variants of the enum are namespaced under its identifier, and we use a double colon to separate the two. The reason this is useful is that now both values IpAddrKind::V4 and IpAddrKind::V6 are of the same type: IpAddrKind. We can then, for instance, define a function that takes any IpAddrKind:

fn route(ip_type: IpAddrKind) { }

And we can call this function with either variant:

route(IpAddrKind::V4);
route(IpAddrKind::V6);

Using enums has even more advantages. Thinking more about our IP address type, at the moment we dont have a way to store the actual IP address data; we only know what kind it is. Given that you just learned about structs in Chapter 5, you might tackle this problem as shown in Listing 6-1:

enum IpAddrKind {
    V4,
    V6,
}

struct IpAddr {
    kind: IpAddrKind,
    address: String,
}

let home = IpAddr {
    kind: IpAddrKind::V4,
    address: String::from("127.0.0.1"),
};

let loopback = IpAddr {
    kind: IpAddrKind::V6,
    address: String::from("::1"),
};
Listing 6-1: Storing the data and `IpAddrKind` variant of an IP address using a `struct`

Here, weve defined a struct IpAddr that has two fields: a kind field that is of type IpAddrKind (the enum we defined previously) and an address field of type String. We have two instances of this struct. The first, home, has the value IpAddrKind::V4 as its kind with associated address data of 127.0.0.1. The second instance, loopback, has the other variant of IpAddrKind as its kind value, V6, and has address ::1 associated with it. Weve used a struct to bundle the kind and address values together, so now the variant is associated with the value.

We can represent the same concept in a more concise way using just an enum rather than an enum as part of a struct by putting data directly into each enum variant. This new definition of the IpAddr enum says that both V4 and V6 variants will have associated String values:

enum IpAddr {
    V4(String),
    V6(String),
}

let home = IpAddr::V4(String::from("127.0.0.1"));

let loopback = IpAddr::V6(String::from("::1"));

We attach data to each variant of the enum directly, so there is no need for an extra struct.

Theres another advantage to using an enum rather than a struct: each variant can have different types and amounts of associated data. Version four type IP addresses will always have four numeric components that will have values between 0 and 255. If we wanted to store V4 addresses as four u8 values but still express V6 addresses as one String value, we wouldnt be able to with a struct. Enums handle this case with ease:

enum IpAddr {
    V4(u8, u8, u8, u8),
    V6(String),
}

let home = IpAddr::V4(127, 0, 0, 1);

let loopback = IpAddr::V6(String::from("::1"));

Weve shown several different possibilities that we could define in our code for storing IP addresses of the two different varieties using an enum. However, as it turns out, wanting to store IP addresses and encode which kind they are is so common that the standard library has a definition we can use! Lets look at how the standard library defines IpAddr: it has the exact enum and variants that weve defined and used, but it embeds the address data inside the variants in the form of two different structs, which are defined differently for each variant:

struct Ipv4Addr {
    // details elided
}

struct Ipv6Addr {
    // details elided
}

enum IpAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}

This code illustrates that you can put any kind of data inside an enum variant: strings, numeric types, or structs, for example. You can even include another enum! Also, standard library types are often not much more complicated than what you might come up with.

Note that even though the standard library contains a definition for IpAddr, we can still create and use our own definition without conflict because we havent brought the standard librarys definition into our scope. Well talk more about importing types in Chapter 7.

Lets look at another example of an enum in Listing 6-2: this one has a wide variety of types embedded in its variants:

enum Message {
    Quit,
    Move { x: i32, y: i32 },
    Write(String),
    ChangeColor(i32, i32, i32),
}
Listing 6-2: A `Message` enum whose variants each store different amounts and types of values

This enum has four variants with different types:

  • Quit has no data associated with it at all.
  • Move includes an anonymous struct inside it.
  • Write includes a single String.
  • ChangeColor includes three i32s.

Defining an enum with variants like the ones in Listing 6-2 is similar to defining different kinds of struct definitions except the enum doesnt use the struct keyword and all the variants are grouped together under the Message type. The following structs could hold the same data that the preceding enum variants hold:

struct QuitMessage; // unit struct
struct MoveMessage {
    x: i32,
    y: i32,
}
struct WriteMessage(String); // tuple struct
struct ChangeColorMessage(i32, i32, i32); // tuple struct

But if we used the different structs, which each have their own type, we wouldnt be able to as easily define a function that could take any of these kinds of messages as we could with the Message enum defined in Listing 6-2, which is a single type.

There is one more similarity between enums and structs: just as were able to define methods on structs using impl, were also able to define methods on enums. Heres a method named call that we could define on our Message enum:

impl Message {
    fn call(&self) {
        // method body would be defined here
    }
}

let m = Message::Write(String::from("hello"));
m.call();

The body of the method would use self to get the value that we called the method on. In this example, weve created a variable m that has the value Message::Write("hello"), and that is what self will be in the body of the call method when m.call() runs.

Lets look at another enum in the standard library that is very common and useful: Option.

The Option Enum and Its Advantages Over Null Values

In the previous section, we looked at how the IpAddr enum let us use Rusts type system to encode more information than just the data into our program. This section explores a case study of Option, which is another enum defined by the standard library. The Option type is used in many places because it encodes the very common scenario in which a value could be something or it could be nothing. Expressing this concept in terms of the type system means the compiler can check that youve handled all the cases you should be handling, which can prevent bugs that are extremely common in other programming languages.

Programming language design is often thought of in terms of which features you include, but the features you exclude are important too. Rust doesnt have the null feature that many other languages have. Null is a value that means there is no value there. In languages with null, variables can always be in one of two states: null or not-null.

In “Null References: The Billion Dollar Mistake,” Tony Hoare, the inventor of null, has this to say:

I call it my billion-dollar mistake. At that time, I was designing the first comprehensive type system for references in an object-oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

The problem with null values is that if you try to actually use a value thats null as if it is a not-null value, youll get an error of some kind. Because this null or not-null property is pervasive, its extremely easy to make this kind of error.

However, the concept that null is trying to express is still a useful one: a null is a value that is currently invalid or absent for some reason.

The problem isnt with the actual concept but with the particular implementation. As such, Rust does not have nulls, but it does have an enum that can encode the concept of a value being present or absent. This enum is Option<T>, and it is defined by the standard library as follows:

enum Option<T> {
    Some(T),
    None,
}

The Option<T> enum is so useful that its even included in the prelude; you dont need to import it explicitly. In addition, so are its variants: you can use Some and None directly without prefixing them with Option::. Option<T> is still just a regular enum, and Some(T) and None are still variants of type Option<T>.

The <T> syntax is a feature of Rust we havent talked about yet. Its a generic type parameter, and well cover generics in more detail in Chapter 10. For now, all you need to know is that <T> means the Some variant of the Option enum can hold one piece of data of any type. Here are some examples of using Option values to hold number types and string types:

let some_number = Some(5);
let some_string = Some("a string");

let absent_number: Option<i32> = None;

If we use None rather than Some, we need to tell Rust what type of Option<T> we have, because the compiler can't infer the type that the Some variant will hold by looking only at a None value.

When we have a Some value, we know that a value is present, and the value is held within the Some. When we have a None value, in some sense, it means the same thing as null: we dont have a valid value. So why is having Option<T> any better than having null?

In short, because Option<T> and T (where T can be any type) are different types, the compiler wont let us use an Option<T> value as if it was definitely a valid value. For example, this code wont compile because its trying to compare an Option<i8> to an i8:

let x: i8 = 5;
let y: Option<i8> = Some(5);

let sum = x + y;

If we run this code, we get an error message like this:

error[E0277]: the trait bound `i8: std::ops::Add<std::option::Option<i8>>` is
not satisfied
 -->
  |
7 | let sum = x + y;
  |           ^^^^^
  |

Intense! In effect, this error message means that Rust doesnt understand how to add an Option<i8> and an i8, because theyre different types. When we have a value of a type like i8 in Rust, the compiler will ensure that we always have a valid value. We can proceed confidently without having to check for null before using that value. Only when we have an Option<i8> (or whatever type of value were working with) do we have to worry about possibly not having a value, and the compiler will make sure we handle that case before using the value.

In other words, you have to convert an Option<T> to a T before you can perform T operations with it. Generally, this helps catch one of the most common issues with null: assuming that something isnt null when it actually is.

Not having to worry about missing an assumption of having a not-null value helps you to be more confident in your code. In order to have a value that can possibly be null, you must explicitly opt in by making the type of that value Option<T>. Then, when you use that value, you are required to explicitly handle the case when the value is null. Everywhere that a value has a type that isnt an Option<T>, you can safely assume that the value isnt null. This was a deliberate design decision for Rust to limit nulls pervasiveness and increase the safety of Rust code.

So, how do you get the T value out of a Some variant when you have a value of type Option<T> so you can use that value? The Option<T> enum has a large number of methods that are useful in a variety of situations; you can check them out in its documentation. Becoming familiar with the methods on Option<T> will be extremely useful in your journey with Rust.

In general, in order to use an Option<T> value, we want to have code that will handle each variant. We want some code that will run only when we have a Some(T) value, and this code is allowed to use the inner T. We want some other code to run if we have a None value, and that code doesnt have a T value available. The match expression is a control flow construct that does just this when used with enums: it will run different code depending on which variant of the enum it has, and that code can use the data inside the matching value.

The match Control Flow Operator

Rust has an extremely powerful control-flow operator called match that allows us to compare a value against a series of patterns and then execute code based on which pattern matches. Patterns can be made up of literal values, variable names, wildcards, and many other things; Chapter 18 will be about all the different kinds of patterns and what they do. The power of match comes from the expressiveness of the patterns and the compiler checks that make sure all possible cases are handled.

Think of a match expression kind of like a coin sorting machine: coins slide down a track with variously sized holes along it, and each coin falls through the first hole it encounters that it fits into. In the same way, values go through each pattern in a match, and at the first pattern the value “fits,” the value will fall into the associated code block to be used during execution.

Because we just mentioned coins, lets use them as an example using match! We can write a function that can take an unknown United States coin and, in a similar way as the counting machine, determine which coin it is and return its value in cents, as shown here in Listing 6-3:

enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter,
}

fn value_in_cents(coin: Coin) -> i32 {
    match coin {
        Coin::Penny => 1,
        Coin::Nickel => 5,
        Coin::Dime => 10,
        Coin::Quarter => 25,
    }
}
Listing 6-3: An enum and a `match` expression that has the variants of the enum as its patterns.

Lets break down the match in the value_in_cents function. First, we list the match keyword followed by an expression, which in this case is the value coin. This seems very similar to an expression used with if, but theres a big difference: with if, the expression needs to return a boolean value. Here, it can be any type. The type of coin in this example is the Coin enum that we defined in Listing 6-3.

Next are the match arms. An arm has two parts: a pattern and some code. The first arm here has a pattern that is the value Coin::Penny and then the => operator that separates the pattern and the code to run. The code in this case is just the value 1. Each arm is separated from the next with a comma.

When the match expression executes, it compares the resulting value against the pattern of each arm, in order. If a pattern matches the value, the code associated with that pattern is executed. If that pattern doesnt match the value, execution continues to the next arm, much like a coin sorting machine. We can have as many arms as we need: in Listing 6-3, our match has four arms.

The code associated with each arm is an expression, and the resulting value of the expression in the matching arm is the value that gets returned for the entire match expression.

Curly braces typically arent used if the match arm code is short, as it is in Listing 6-3 where each arm just returns a value. If you want to run multiple lines of code in a match arm, you can use curly braces. For example, the following code would print out “Lucky penny!” every time the method was called with a Coin::Penny but would still return the last value of the block, 1:

fn value_in_cents(coin: Coin) -> i32 {
    match coin {
        Coin::Penny => {
            println!("Lucky penny!");
            1
        },
        Coin::Nickel => 5,
        Coin::Dime => 10,
        Coin::Quarter => 25,
    }
}

Patterns that Bind to Values

Another useful feature of match arms is that they can bind to parts of the values that match the pattern. This is how we can extract values out of enum variants.

As an example, lets change one of our enum variants to hold data inside it. From 1999 through 2008, the United States printed quarters with different designs for each of the 50 states on one side. No other coins got state designs, so only quarters have this extra value. We can add this information to our enum by changing the Quarter variant to include a State value stored inside it, which we've done here in Listing 6-4:

#[derive(Debug)] // So we can inspect the state in a minute
enum UsState {
    Alabama,
    Alaska,
    // ... etc
}

enum Coin {
    Penny,
    Nickel,
    Dime,
    Quarter(UsState),
}
Listing 6-4: A `Coin` enum where the `Quarter` variant also holds a `UsState` value

Lets imagine that a friend of ours is trying to collect all 50 state quarters. While we sort our loose change by coin type, well also call out the name of the state associated with each quarter so if its one our friend doesnt have, they can add it to their collection.

In the match expression for this code, we add a variable called state to the pattern that matches values of the variant Coin::Quarter. When a Coin::Quarter matches, the state variable will bind to the value of that quarters state. Then we can use state in the code for that arm, like so:

fn value_in_cents(coin: Coin) -> i32 {
    match coin {
        Coin::Penny => 1,
        Coin::Nickel => 5,
        Coin::Dime => 10,
        Coin::Quarter(state) => {
            println!("State quarter from {:?}!", state);
            25
        },
    }
}

If we were to call value_in_cents(Coin::Quarter(UsState::Alaska)), coin would be Coin::Quarter(UsState::Alaska). When we compare that value with each of the match arms, none of them match until we reach Coin::Quarter(state). At that point, the binding for state will be the value UsState::Alaska. We can then use that binding in the println! expression, thus getting the inner state value out of the Coin enum variant for Quarter.

Matching with Option<T>

In the previous section we wanted to get the inner T value out of the Some case when using Option<T>; we can also handle Option<T> using match as we did with the Coin enum! Instead of comparing coins, well compare the variants of Option<T>, but the way that the match expression works remains the same.

Lets say we want to write a function that takes an Option<i32>, and if theres a value inside, adds one to that value. If there isnt a value inside, the function should return the None value and not attempt to perform any operations.

This function is very easy to write, thanks to match, and will look like Listing 6-5:

fn plus_one(x: Option<i32>) -> Option<i32> {
    match x {
        None => None,
        Some(i) => Some(i + 1),
    }
}

let five = Some(5);
let six = plus_one(five);
let none = plus_one(None);
Listing 6-5: A function that uses a `match` expression on an `Option`

Matching Some(T)

Lets examine the first execution of plus_one in more detail. When we call plus_one(five) w, the variable x in the body of plus_one will have the value Some(5). We then compare that against each match arm.

None => None,

The Some(5) value doesnt match the pattern None u, so we continue to the next arm.

Some(i) => Some(i + 1),

Does Some(5) match Some(i) v? Why yes it does! We have the same variant. The i binds to the value contained in Some, so i takes the value 5. The code in the match arm is then executed, so we add one to the value of i and create a new Some value with our total 6 inside.

Matching None

Now lets consider the second call of plus_one in Listing 6-5 where x is None x. We enter the match and compare to the first arm u.

None => None,

It matches! Theres no value to add to, so the program stops and returns the None value on the right side of =>. Because the first arm matched, no other arms are compared.

Combining match and enums is useful in many situations. Youll see this pattern a lot in Rust code: match against an enum, bind a variable to the data inside, and then execute code based on it. Its a bit tricky at first, but once you get used to it, youll wish you had it in all languages. Its consistently a user favorite.

Matches Are Exhaustive

Theres one other aspect of match we need to discuss. Consider this version of our plus_one function:

fn plus_one(x: Option<i32>) -> Option<i32> {
    match x {
        Some(i) => Some(i + 1),
    }
}

We didnt handle the None case, so this code will cause a bug. Luckily, its a bug Rust knows how to catch. If we try to compile this code, well get this error:

error[E0004]: non-exhaustive patterns: `None` not covered
 -->
  |
6 |         match x {
  |               ^ pattern `None` not covered

Rust knows that we didnt cover every possible case and even knows which pattern we forgot! Matches in Rust are exhaustive: we must exhaust every last possibility in order for the code to be valid. Especially in the case of Option<T>, when Rust prevents us from forgetting to explicitly handle the None case, it protects us from assuming that we have a value when we might have null, thus making the billion dollar mistake discussed earlier.

The _ Placeholder

Rust also has a pattern we can use in situations when we dont want to list all possible values. For example, a u8 can have valid values of 0 through 255. If we only care about the values 1, 3, 5, and 7, we dont want to have to list out 0, 2, 4, 6, 8, 9 all the way up to 255. Fortunately, we dont have to: we can use the special pattern _ instead:

let some_u8_value = 0u8;
match some_u8_value {
    1 => println!("one"),
    3 => println!("three"),
    5 => println!("five"),
    7 => println!("seven"),
    _ => (),
}

The _ pattern will match any value. By putting it after our other arms, the _ will match all the possible cases that arent specified before it. The () is just the unit value, so nothing will happen in the _ case. As a result, we can say that we want to do nothing for all the possible values that we dont list before the _ placeholder.

However, the match expression can be a bit wordy in a situation in which we only care about one of the cases. For this situation, Rust provides if let.

Concise Control Flow with if let

The if let syntax lets you combine if and let into a less verbose way to handle values that match one pattern and ignore the rest. Consider the program in Listing 6-6 that matches on an Option<u8> value but only wants to execute code if the value is three:

let some_u8_value = Some(0u8);
match some_u8_value {
    Some(3) => println!("three"),
    _ => (),
}
Listing 6-6: A `match` that only cares about executing code when the value is `Some(3)`

We want to do something with the Some(3) match but do nothing with any other Some<u8> value or the None value. To satisfy the match expression, we have to add _ => () after processing just one variant, which is a lot of boilerplate code to add.

Instead, we could write this in a shorter way using if let. The following code behaves the same as the match in Listing 6-6:

if let Some(3) = some_u8_value {
    println!("three");
}

if let takes a pattern and an expression separated by an =. It works the same way as a match, where the expression is given to the match and the pattern is its first arm.

Using if let means you have less to type, less indentation, and less boilerplate code. However, weve lost the exhaustive checking that match enforces. Choosing between match and if let depends on what youre doing in your particular situation and if gaining conciseness is an appropriate trade-off for losing exhaustive checking.

In other words, you can think of if let as syntax sugar for a match that runs code when the value matches one pattern and then ignores all other values.

We can include an else with an if let. The block of code that goes with the else is the same as the block of code that would go with the _ case in the match expression that is equivalent to the if let and else. Recall the Coin enum definition in Listing 6-4, where the Quarter variant also held a UsState value. If we wanted to count all non-quarter coins we see while also announcing the state of the quarters, we could do that with a match expression like this:

let mut count = 0;
match coin {
    Coin::Quarter(state) => println!("State quarter from {:?}!", state),
    _ => count += 1,
}

Or we could use an if let and else expression like this:

let mut count = 0;
if let Coin::Quarter(state) = coin {
    println!("State quarter from {:?}!", state);
} else {
    count += 1;
}

If you have a situation in which your program has logic that is too verbose to express using a match, remember that if let is in your Rust toolbox as well.

Summary

Weve now covered how to use enums to create custom types that can be one of a set of enumerated values. Weve shown how the standard librarys Option<T> type helps you use the type system to prevent errors. When enum values have data inside them, you can use match or if let to extract and use those values, depending on how many cases you need to handle.

Your Rust programs can now express concepts in your domain using structs and enums. Creating custom types to use in your API ensures type safety: the compiler will make certain your functions only get values of the type each function expects.

In order to provide a well-organized API to your users that is straightforward to use and only exposes exactly what your users will need, lets now turn to Rusts modules.