mirror of https://github.com/rust-lang-cn/book-cn.git synced 2025-01-23 23:50:25 +08:00

Carol (Nichols || Goulding) 38c1c09437 Merge remote-tracking branch 'origin/generics'

2016-11-14 14:30:16 -05:00

43 KiB

Raw Blame History

[TOC]

Generics

One of the core tools a programming language gives you is the ability to deal effectively with duplication of code. It's important to minimize the amount of code that is duplicated throughout a program to make maintenace easier and minimize logic errors. Maintenance will be easier if there's only one place that you need to change the code if you change your mind about how the program should work, rather than multiple places in the code. If your program's logic is duplicated in different places and those places don't match, you'll get errors or unexpected and undesired behavior from your program that could be hard to track down. Rust has the concept of generics as one way to eliminate duplicate code. Generics come in the form of generic types, traits that those generic types have, and generic lifetimes. We'll cover how to use all of these in this chapter.

Removing Duplication by Extracting a Function

Let's first go through a technique for dealing with duplication that you're probably familiar with: extracting a function. Consider a small program that finds the largest number in a list, shown in Listing 10-1:

Filename: src/main.rs

fn main() {
    let numbers = vec![34, 50, 25, 100, 65];

    let mut largest = numbers[0];

    for number in numbers {
        if largest > number {
            largest = number;
        }
    }

    println!("The largest number is {}", largest);
}

Listing 10-1: Code to find the largest number in a list of numbers

If we needed to find the largest number in two different lists of numbers, we could duplicate the code in Listing 10-1 and have the same logic exist in two places in the program:

Filename: src/main.rs

fn main() {
    let numbers = vec![34, 50, 25, 100, 65];

    let mut largest = numbers[0];

    for number in numbers {
        if largest > number {
            largest = number;
        }
    }

    println!("The largest number is {}", largest);

    let numbers = vec![102, 34, 6000, 89, 54, 2, 43, 8];

    let mut largest = numbers[0];

    for number in numbers {
        if largest > number {
            largest = number;
        }
    }

    println!("The largest number is {}", largest);
}

Copying code is tedious and error-prone, plus now we have two places to update the logic if we need it to change. Rust, like many languages, gives us a way to deal with this duplication by creating an abstraction, and in this case the abstraction we'll use is a function. Here's a program where we've extracted the code in Listing 10-1 that finds the largest number into a function named largest. This program can find the largest number in two different lists of numbers, but the code from Listing 10-1 only exists in one spot:

Filename: src/main.rs

fn largest(numbers: Vec<i32>) {
    let mut largest = numbers[0];

    for number in numbers {
        if largest > number {
            largest = number;
        }
    }

    println!("The largest number is {}", largest);
}

fn main() {
    let numbers = vec![34, 50, 25, 100, 65];

    largest(numbers);

    let numbers = vec![102, 34, 6000, 89, 54, 2, 43, 8];

    largest(numbers);
}

The function takes an argument, numbers, which represents any concrete Vec<i32> that we might pass into the function. The code in the function definition operates on the numbers representation of any Vec<i32>. When we call the largest function, the code actually runs on the specific values that we pass in.

Functions aren't the only way to eliminate duplication. For example, our largest function only works for vectors of i32. What if we wanted to find the largest number in a list of floats? Or the largest value in some sort of custom struct or enum? We can't solve those kinds of duplication with regular functions.

To solve these kinds of problems, Rust provides a feature called generics. In the same way that functions allow us to abstract over common code, generics allow us to abstract over types. This ability gives us tremendous power to write code that works in a large number of situations. First, we'll examine the syntax of generics. Then, we'll talk about another feature that's used to augment generics: traits. Finally, we'll discuss one of Rust's most unique uses of generics: lifetimes.

Generics Syntax

We've already hinted at the idea of generics in previous chapters, but we never dug into what exactly they are or how to use them. In places where we specify a type, like function signatures or structs, instead we can use generics. Generics are stand-ins that represent an abstract set instead of something concrete. In this section, we're going to cover generic data types.

You can recognize when any kind of generics are used by the way that they fit into Rust's syntax: any time you see angle brackets, <>, you're dealing with generics. Types we've seen before, like in Chapter 8 where we discussed vectors with types like Vec<i32>, employ generics. The type that the standard library defines for vectors is Vec<T>. That T is called a type parameter, and it serves a similar function as parameters to functions: you fill in the parameter with a concrete type, and that determines how the overall type works. In the same way that a function like foo(x: i32) can be called with a specific value such as foo(5), a Vec<T> can be created with a specific type, like Vec<i32>.

Duplicated Enum Definitions

Let's dive into generic data types in more detail. We learned about how to use the Option<T> enum in Chapter 6, but we never examined its definition. Let's try to imagine how we'd write it! We'll start from duplicated code like we did in the "Removing Duplication by Extracting a Function" section. This time, we'll remove the duplication by extracting a generic data type instead of extracting a function, but the mechanics of doing the extraction will be similar. First, let's consider an Option enum with a Some variant that can only hold an i32. We'll call this enum OptionalNumber:

Filename: src/main.rs

enum OptionalNumber {
    Some(i32),
    None,
}

fn main() {
    let number = OptionalNumber::Some(5);
    let no_number = OptionalNumber::None;
}

This works just fine for i32s. But what if we also wanted to store f64s? We would have to duplicate code to define a separate Option enum type for each type we wanted to be able to hold in the Some variants. For example, here is how we could define and use OptionalFloatingPointNumber:

Filename: src/main.rs

enum OptionalFloatingPointNumber {
    Some(f64),
    None,
}

fn main() {
    let number = OptionalFloatingPointNumber::Some(5.0);
    let no_number = OptionalFloatingPointNumber::None;
}

We've made the enum's name a bit long in order to drive the point home. With what we currently know how to do in Rust, we would have to write a unique type for every single kind of value we wanted to have either Some or None of. In other words, the idea of "an optional value" is a more abstract concept than one specific type. We want it to work for any type at all.

Removing Duplication by Extracting a Generic Data Type

Let's see how to get from duplicated types to the generic type. Here are the definitions of our two enums side-by-side:

enum OptionalNumber {   enum OptionalFloatingPointNumber {
    Some(i32),              Some(f64),
    None,                   None,
}                       }

Aside from the names, we have one line where the two definitions are very close, but still different: the line with the Some definitions. The only difference is the type of the data in that variant, i32 and f64.

Just like we can parameterize arguments to a function by choosing a name, we can parameterize the type by choosing a name. In this case, we've chosen the name T. We could choose any identifier here, but Rust style has type parameters follow the same style as types themselves: CamelCase. In addition, they tend to be short, often one letter. T is the traditional default choice, short for 'type'. Let's use that name in our Some variant definitions where the i32 and f64 types were:

enum OptionalNumber {   enum OptionalFloatingPointNumber {
    Some(T),                Some(T),
    None,                   None,
}                       }

There's one problem, though: we've used T, but not defined it. This would be similar to using an argument to a function in the body without declaring it in the signature. We need to tell Rust that we've introduced a generic parameter. The syntax to do that is the angle brackets, like this:

enum OptionalNumber<T> {   enum OptionalFloatingPointNumber<T> {
    Some(T),                Some(T),
    None,                   None,
}                       }

The <>s after the enum name indicate a list of type parameters, just like () after a function name indicates a list of value parameters. Now the only difference between our two enums is the name. Since we've made them generic, they're not specific to integers or floating point numbers anymore, so they can have the same name:

enum Option<T> {	enum Option<T> {
    Some(T),            Some(T),
    None,               None,
}                   }

Now they're identical! We've made our type fully generic. This definition is also how Option is defined in the standard library. If we were to read this definition aloud, we'd say, "Option is an enum with one type parameter, T. It has two variants: Some, which has a value with type T, and None, which has no value." We can now use the same Option type whether we're holding an i32 or an f64:

let integer = Option::Some(5);
let float = Option::Some(5.0);

We've left in the Option:: namespace for consistency with the previous examples, but since use Option::* is in the prelude, it's not needed. Usually using Option looks like this:

let integer = Some(5);
let float = Some(5.0);

When you recognize situations with almost-duplicate types like this in your code, you can follow this process to reduce duplication using generics.

Monomorphization at Compile Time

Understanding this refactoring process is also useful in understanding how generics work behind the scenes: the compiler does the exact opposite of this process when compiling your code. Monomorphization means taking code that uses generic type parameters and generating code that is specific for each concrete type that is used with the generic code. Monomorphization is why Rust's generics are extremely efficient at runtime. Consider this code that uses the standard library's Option:

let integer = Some(5);
let float = Some(5.0);

When Rust compiles this code, it will perform monomorphization. What this means is the compiler will see that we've used two kinds of Option<T>: one where T is i32, and one where T is f64. As such, it will expand the generic definition of Option<T> into Option_i32 and Option_f64, thereby replacing the generic definition with the specific ones. The more specific version looks like the duplicated code we started with at the beginning of this section:

Filename: src/main.rs

enum Option_i32 {
    Some(i32),
    None,
}

enum Option_f64 {
    Some(f64),
    None,
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
}

In other words, we can write the non-duplicated form that uses generics in our code, but Rust will compile that into code that acts as though we wrote the specific type out in each instance. This means we pay no runtime cost for using generics; it's just like we duplicated each particular definition.

Generic Structs

In a similar fashion as we did with enums, we can use <>s with structs as well in order to define structs that have a generic type parameter in one or more of their fields. Generic structs also get monomorphized into specialized types at compile time. Listing 10-2 shows the definition and use of a Point struct that could hold x and y coordinate values that are any type:

Filename: src/main.rs

struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
}

Listing 10-2: A `Point` struct that holds `x` and `y` values of type `T`

The syntax is the same with structs: add a <T> after the name of the struct, then use T in the definition where you want to use that generic type instead of a specific type.

Multiple Type Parameters

Note that in the Point definition in Listing 10-2, we've used the same T parameter for both fields. This means x and y must always be values of the same type. Trying to instantiate a Point that uses an i32 for x and an f64 for y, like this:

let p = Point { x: 5, y: 20.0 };

results in a compile-time error that indicates the type of y must match the type of x:

error[E0308]: mismatched types
  |
7 | let p = Point { x: 5, y: 20.0 };
  |                          ^^^^ expected integral variable, found floating-point variable
  |
  = note: expected type `{integer}`
  = note:    found type `{float}`

If we need to be able to have fields with generic but different types, we can declare multiple type parameters within the angle brackets, separated by a comma. Listing 10-3 shows how to define a Point that can have different types for x and y:

Filename: src/main.rs

struct Point<X, Y> {
    x: X,
    y: Y,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
    let p = Point { x: 5, y: 20.0 };
}

Listing 10-2: A `Point` struct that holds an `x` value of type `X` and a `y` value of type `Y`

Now x will have the type of X, and y will have the type of Y, and we can instantiate a Point with an i32 for x and an f64 for y.

We can make enums with multiple type parameters as well. Recall the enum Result<T, E> from Chapter 9 that we used for recoverable errors. Here's its definition:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Each variant stores a different kind of information, and they're both generic.

You can have as many type parameters as you'd like. Similarly to parameters of values in function signatures, if you have a lot of parameters, the code can get quite confusing, so try to keep the number of parameters defined in any one type small if you can.

Generic Functions and Methods

In a similar way to data structures, we can use the <> syntax in function or method definitions. The angle brackets for type parameters go after the function or method name and before the argument list in parentheses:

fn generic_function<T>(value: T) {
    // code goes here
}

We can use the same process that we used to refactor duplicated type definitions using generics to refactor duplicated function definitions using generics. Consider these two side-by-side function signatures that differ in the type of value:

fn takes_integer(value: i32) {          fn takes_float(value: f64) {
    // code goes here                       // code goes here
}                                       }

We can add a type parameter list that declares the generic type T after the function names, then use T where the specific i32 and f64 types were:

fn takes_integer<T>(value: T) {       fn takes_float<T>(value: T) {
    // code goes here                     // code goes here
}                                     }

At this point, only the names differ, so we could unify the two functions into one:

fn takes<T>(value: T) {
    // code goes here
}

There's one problem though. We've got some function definitions that work, but if we try to use value in code in the function body, we'll get an error. For example, the function definition in Listing 10-3 tries to print out value in its body:

Filename: src/lib.rs

fn show_anything<T>(value: T) {
    println!("I have something to show you!");
    println!("It's: {}", value);
}

Listing 10-3: A `show_anything` function definition that does not yet compile

Compiling this definition results in an error:

	error[E0277]: the trait bound `T: std::fmt::Display` is not satisfied
 --> <anon>:3:37
  |
3 |     println!("It's: {}", value);
  |                          ^^^^^ trait `T: std::fmt::Display` not satisfied
  |
  = help: consider adding a `where T: std::fmt::Display` bound
  = note: required by `std::fmt::Display::fmt`

error: aborting due to previous error(s)

This error mentions something we haven't learned about yet: traits. In the next section, we'll learn how to make this compile.

Traits

Traits are similar to a feature often called 'interfaces' in other languages, but are also different. Traits let us do another kind of abstraction: they let us abstract over behavior that types can have in common.

When we use a generic type parameter, we are telling Rust that any type is valid in that location. When other code uses a value that could be of any type, we need to also tell Rust that the type has the functionality that we need. Traits let us specify that, for example, we need any type T that has methods defined on it that allow us to print a value of that type. This is powerful because we can still leave our definitions generic to allow use of many different types, but we can constrain the type at compile-time to types that have the behavior we need to be able to use.

Here's an example definition of a trait named Printable that has a method named print:

Filename: src/lib.rs

trait Printable {
    fn print(&self);
}

Listing 10-4: A `Printable` trait definition with one method, `print`

We declare a trait with the trait keyword, then the trait's name. In this case, our trait will describe types which can be printed. Inside of curly braces, we declare a method signature, but instead of providing an implementation inside curly braces, we put a semicolon after the signature. A trait can have multiple methods in its body, with the method signatures listend one per line and each line ending in a semicolon.

Implementing a trait for a particular type looks similar to implementing methods on a type since it's also done with the impl keyword, but we specify the trait name as well. Inside the impl block, we specify definitions for the trait's methods in the context of the specific type. Listing 10-5 has an example of implementing the Printable trait from Listing 10-4 (that only has the print method) for a Temperature enum:

Filename: src/lib.rs

enum Temperature {
    Celsius(i32),
    Fahrenheit(i32),
}

impl Printable for Temperature {
    fn print(&self) {
        match *self {
            Temperature::Celsius(val) => println!("{}°C", val),
            Temperature::Fahrenheit(val) => println!("{}°F", val),
        }
    }
}

Listing 10-5: Implementing the `Printable` trait on a `Temperature` enum

In the same way impl lets us define methods, we've used it to define methods that pertain to our trait. We can call methods that our trait has defined just like we can call other methods:

Filename: src/main.rs

fn main() {
    let t = Temperature::Celsius(37);

    t.print();
}

Note that in order to use a trait's methods, the trait itself must be in scope. If the definition of Printable was in a module, the definition would need to be defined as pub and we would need to use the trait in the scope where we wanted to call the print method. This is because it's possible to have two traits that both define a method named print, and our Temperature enum might implement both. Rust wouldn't know which print method we wanted unless we brought the trait we wanted into our current scope with use.

Trait Bounds

Defining traits with methods and implementing the trait methods on a particular type gives Rust more information than just defining methods on a type directly. The information Rust gets is that the type that implements the trait can be used in places where the code specifies that it needs some type that implements a trait. To illustrate this, Listing 10-6 has a print_anything function definition. This is similar to the show_anything function from Listing 10-3, but this function has a trait bound on the generic type T and uses the print function from the trait. A trait bound constrains the generic type to be any type that implements the trait specified, instead of any type at all. With the trait bound, we're then allowed to use the trait method print in the function body:

Filename: src/lib.rs

fn print_anything<T: Printable>(value: T) {
    println!("I have something to print for you!");
    value.print();
}

Listing 10-6: A `print_anything` function that uses the trait bound `Printable` on type `T`

Trait bounds are specified in the type name declarations within the angle brackets. After the name of the type that you want to apply the bound to, add a colon (:) and then specify the name of the trait. This function now specifies that it takes a value parameter that can be of any type, as long as that type implements the trait Printable. We need to specify the Printable trait in the type name declarations because we want to be able to call the print method that is part of the Printable trait.

Now we are able to call the print_anything function from Listing 10-6 and pass it a Temperature instance as the value parameter, since we implemented the trait Printable on Temperature in Listing 10-5:

Filename: src/main.rs

fn main() {
    let temperature = Temperature::Fahrenheit(98);
    print_anything(temperature);
}

If we implement the Printable trait on other types, we can use them with the print_anything method too. If we try to call print_anything with an i32, which does not implement the Printable trait, we get a compile-time error that looks like this:

error[E0277]: the trait bound `{integer}: Printable` is not satisfied
   |
29 | print_anything(3);
   | ^^^^^^^^^^^^^^ trait `{integer}: Printable` not satisfied
   |
   = help: the following implementations were found:
   = help:   <Point as Printable>
   = note: required by `print_anything`

Traits are an extremely useful feature of Rust. You'll almost never see generic functions without an accompanying trait bound. There are many traits in the standard library, and they're used for many, many different things. For example, our Printable trait is similar to one of those traits, Display. And in fact, that's how println! decides how to format things with {}. The Display trait has a fmt method that determines how to format something.

Listing 10-7 shows our original example from Listing 10-3, but this time using the standard library's Display trait in the trait bound on the generic type in the show_anything function:

Filename: src/lib.rs

use std::fmt::Display;

fn show_anything<T: Display>(value: T) {
    println!("I have something to show you!");
    println!("It's: {}", value);
}

Listing 10-7: The `show_anything` function with trait bounds

Now that this function specifies that T can be any type as long as that type implements the Display trait, this code will compile.

Multiple Trait Bounds and `where` Syntax

Each generic type can have its own trait bounds. The signature for a function that takes a type T that implements Display and a type U that implements Printable looks like:

fn some_function<T: Display, U: Printable>(value: T, other_value: U) {

To specify multiple trait bounds on one type, list the trait bounds in a list with a + between each trait. For example, here's the signature of a function that takes a type T that implements Display and Clone (which is another standard library trait we have mentioned):

fn some_function<T: Display + Clone>(value: T) {

When trait bounds start getting complicated, there is another syntax that's a bit cleaner: where. And in fact, the error we got when we ran the code from Listing 10-3 referred to it:

help: consider adding a `where T: std::fmt::Display` bound

The where syntax moves the trait bounds after the function arguments list. This definition of show_anything means the exact same thing as the definition in Listing 10-7, just said a different way:

Filename: src/lib.rs

use std::fmt::Display;

fn show_anything<T>(value: T) where T: Display {
    println!("I have something to show you!");
    println!("It's: {}", value);
}

Instead of T: Display going inside the angle brackets, they go after the where keyword at the end of the function signature. This can make complex signatures easier to read. The where clause and its parts can also go on new lines. Here's the signature of a function that takes three generic type parameters that each have multiple trait bounds:

fn some_function<T, U, V>(t: T, u: U, v: V)
    where T: Display + Clone,
          U: Printable + Debug,
          V: Clone + Printable
{

Generic type parameters and trait bounds are part of Rust's rich type system. Another important kind of generic in Rust interacts with Rust's ownership and references features, and they're called lifetimes.

Lifetime Syntax

Generic type parameters let us abstract over types, and traits let us abstract over behavior. There's one more way that Rust allows us to do something similar: lifetimes allow us to be generic over scopes of code.

Scopes of code? Yes, it's a bit unusual. Lifetimes are, in some ways, Rust's most distinctive feature. They are a bit different than the tools you have used in other programming languages. Lifetimes are a big topic, so we're not going to cover everything about them in this chapter. What we are going to do is talk about the very basics of lifetimes, so that when you see the syntax in documentation or other places, you'll be familiar with the concepts. Chapter 20 will contain more advanced information about everything lifetimes can do.

Core Syntax

We talked about references in Chapter 4, but we left out an important detail. As it turns out, every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time, lifetimes are implicit, but just like we can choose to annotate types everywhere, we can choose to annotate lifetimes.

Lifetimes have a slightly unusual syntax:

&i32 // a reference
&'a i32 // a reference with an explicit lifetime

The 'a there is a lifetime with the name a. A single apostrophe indicates that this name is for a lifetime. Lifetime names need to be declared before they're used. Here's a function signature with lifetime declarations and annotations:

fn some_function<'a>(argument: &'a i32) {

Notice anything? In the same way that generic type declarations go inside angle brackets after the function name, lifetime declarations also go inside those same angle brackets. We can even write functions that take both a lifetime declaration and a generic type declaration:

fn some_function<'a, T>(argument: &'a T) {

This function takes one argument, a reference to some type, T, and the reference has the lifetime 'a. In the same way that we parameterize functions that take generic types, we parameterize references with lifetimes.

So, that's the syntax, but why? What does a lifetime do, anyway?

Lifetimes Prevent Dangling References

Consider the program in listing 10-8. There's an outer scope and an inner scope. The outer scope declares a variable named r with no initial value, and the inner scope declares a variable named x with the initial value of 5. Inside the inner scope, we attempt to set the value of r to a reference to x. Then the inner scope ends and we attempt to print out the value in r:

{
    let r;

    {
        let x = 5;
        r = &x;
    }

    println!("r: {}", r);
}

Listing 10-8: An attempt to use a reference whose value has gone out of scope

If we compile this code, we get an error:

	error: `x` does not live long enough
  --> <anon>:6:10
   |
6  |     r = &x;
   |          ^ does not live long enough
7  | }
   | - borrowed value only lives until here
...
10 | }
   | - borrowed value needs to live until here

The variable x doesn't "live long enough." Why not? Well, x is going to go out of scope when we hit the closing curly brace on line 7, ending the inner scope. But r is valid for the outer scope; its scope is larger and we say that it "lives longer." If Rust allowed this code to work, r would be referencing memory that was deallocated when x went out of scope. That'd be bad! Once it's deallocated, it's meaningless.

So how does Rust determine that this code should not be allowed? Part of the compiler called the borrow checker compares scopes to determine that all borrows are valid. Here's the same example from Listing 10-8 with some annotations:

{
    let r;         // -------+-- 'a
                   //        |
    {              //        |
        let x = 5; // -+-----+-- 'b
        r = &x;    //  |     |
    }              // -+     |
                   //        |
    println!("r: {}", r); // |
                   //        |
                   // -------+
}

Here, we've annotated the lifetime of r with 'a and the lifetime of x with 'b. Rust looks at these lifetimes and sees that r has a lifetime of 'a, but that it refers to something with a lifetime of 'b. It rejects the program because the lifetime 'b is shorter than the lifetime of 'a-- the value that the reference is referring to does not live as long as the reference does.

Let's look at a different example that compiles because it does not try to make a dangling reference, and see what the lifetimes look like:

{
    let x = 5;            // -----+-- 'b
                          //      |
    let r = &x;           // --+--+-- 'a
                          //   |  |
    println!("r: {}", r); //   |  |
                          // --+  |
                          // -----+
}

Here, x lives for 'b, which in this case is larger than 'a. This is allowed: Rust knows that the reference in r will always be valid, as it has a smaller scope than x, the value it refers to.

Note that we didn't have to name any lifetimes in the code itself; Rust figured it out for us. One situation in which Rust can't figure out the lifetimes is for a function or method when one of the arguments or return values is a reference, except for a few scenarios we'll discuss in the lifetime elision section.

Lifetime Annotations in Struct Definitions

Another time that Rust can't figure out the lifetimes is when structs have a field that holds a reference. In that case, naming the lifetimes looks like this:

struct Ref<'a> {
    x: &'a i32,
}

Again, the lifetime names are declared in the angle brackets where generic type parameters are declared, and this is because lifetimes are a form of generics. In the examples above, 'a and 'b were concrete lifetimes: we knew about r and x and how long they would live exactly. However, when we write a function, we can't know beforehand exactly all of the arguments that it could be called with and how long they will be valid for. We have to explain to Rust what we expect the lifetime of the argument to be (we'll learn about how to know what you expect the lifetime to be in a bit). This is similar to writing a function that has an argument of a generic type: we don't know what type the arguments will actually end up being when the function gets called. Lifetimes are the same idea, but they are generic over the scope of a reference, rather than a type.

Lifetime Annotations in Function Signatures

Lifetime annotations for functions go on the function signature, but we don't have to annotate any of the code in the function body with lifetimes. That's because Rust can analyze the specific code inside the function without any help. When a function interacts with references that come from or go to code outside that function, however, the lifetimes of those arguments or return values will potentially be different each time that function gets called. Rust would have to analyze every place the function is called to determine that there were no dangling references. That would be impossible because a library that you provide to someone else might be called in code that hasn't been written yet, at the time that you're compiling your library.

Lifetime parameters specify generic lifetimes that will apply to any specific lifetimes the function gets called with. The annotation of lifetime parameters tell Rust what it needs to know in order to be able to analyze a function without knowing about all possible calling code. Lifetime annotations do not change how long any of the references involved live. In the same way that functions can accept any type when the signature specifies a generic type parameter, functions can accept references with any lifetime when the signature specifies a generic lifetime parameter.

To understand lifetime annotations in context, let's write a function that will return the longest of two string slices. The way we want to be able to call this function is by passing two string slices, and we want to get back a string slice. The code in Listing 10-9 should print The longest string is abcd once we've implemented the longest function:

Filename: src/main.rs

fn main() {
    let a = String::from("abcd");
    let b = "xyz";

    let c = longest(a.as_str(), b);
    println!("The longest string is {}", c);
}

Listing 10-9: A `main` function that demonstrates how we'd like to use the `longest` function

Note that we want the function to take string slices because we don't want the longest function to take ownership of its arguments, and we want the function to be able to accept slices of a String (like a) is as well as string literals (b). Refer back to the "String Slices as Arguments" section of Chapter 4 for more discussion about why these are the arguments we want.

Here's the start of an implementation of the longest function that won't compile yet:

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

If we try to compile this, we get an error that talks about lifetimes:

error[E0106]: missing lifetime specifier
   |
1  | fn longest(x: &str, y: &str) -> &str {
   |                                 ^ expected lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`

The help text is telling us that the return type needs a generic lifetime parameter on it because this function is returning a reference and Rust can't tell if the reference being returned refers to x or y. Actually, we don't know either, since in the if block in the body of this function returns a reference to x and the else block returns a reference to y! The way to specify the lifetime parameters in this case is to have the same lifetime for all of the input parameters and the return type:

Filename: src/main.rs

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

This will compile and will produce the result we want with the main function in Listing 10-9. This function signature is now saying that for some lifetime named 'a, it will get two arguments, both which are string slices that live at least as long as the lifetime 'a. The function will return a string slice that also will last at least as long as the lifetime 'a. This is the contract we are telling Rust we want it to enforce. By specifying the lifetime parameters in this function signature, we are not changing the lifetimes of any values passed in or returned, but we are saying that any values that do not adhere to this contract should be rejected by the borrow checker. This function does not know (or need to know) exactly how long x and y will live since it knows that there is some scope that can be substituted for 'a that will satisfy this signature.

The exact way to specify lifetime parameters depends on what your function is doing. If the function didn't actually return the longest string slice but instead always returned the first argument, we wouldn't need to specify a lifetime on y. This code compiles:

Filename: src/main.rs

fn longest<'a>(x: &'a str, y: &str) -> &'a str {
    x
}

The lifetime parameter for the return type needs to be specified and needs to match one of the arguments' lifetime parameters. If the reference returned does not refer to one of the arguments, the only other possibility is that it refers to a value created within this function, and that would be a dangling reference since the value will go out of scope at the end of the function. Consider this attempted implementation of longest:

Filename: src/main.rs

fn longest<'a>(x: &str, y: &str) -> &'a str {
    let result = String::from("really long string");
    result.as_str()
}

Even though we've specified a lifetime for the return type, this function fails to compile with the following error message:

error: `result` does not live long enough
  |
3 |     result.as_str()
  |     ^^^^^^ does not live long enough
4 | }
  | - borrowed value only lives until here
  |
note: borrowed value must be valid for the lifetime 'a as defined on the block at 1:44...
  |
1 | fn longest<'a>(x: &str, y: &str) -> &'a str {
  |                                             ^

The problem is that result will go out of scope and get cleaned up at the end of the longest function, and we're trying to return a reference to result from the function. There's no way we can specify lifetime parameters that would change the dangling reference, and Rust won't let us create a dangling reference. In this case, the best fix would be to return an owned data type rather than a reference so that the calling function is then responsible for cleaning up the value.

Ultimately, lifetime syntax is about connecting the lifetimes of various arguments and return values of functions. Once they're connected, Rust has enough information to allow memory-safe operations and disallow operations that would create dangling pointers or otherwise violate memory safety.

Lifetime Elision

If every reference has a lifetime, and we need to provide them for functions that use references as arguments or return values, then why did this function from the "String Slices" section of Chapter 4 compile? We haven't annotated any lifetimes here, yet Rust happily compiles this function:

Filename: src/lib.rs

fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

The answer is historical: in early versions of pre-1.0 Rust, this would not have compiled. Every reference needed an explicit lifetime. At that time, the function signature would have been written like this:

fn first_word<'a>(s: &'a str) -> &'a str {

After writing a lot of Rust code, some patterns developed. The Rust team noticed that the vast majority of code followed the pattern, and being forced to use explicit lifetime syntax on every reference wasn't a very great developer experience.

To make it so that lifetime annotations weren't needed as often, they added lifetime elision rules to Rust's analysis of references. This feature isn't full inference: Rust doesn't try to guess what you meant in places where there could be ambiguity. The rules are a very basic set of particular cases, and if your code fits one of those cases, you don't need to write the lifetimes explicitly. Here are the rules:

Lifetimes on function arguments are called input lifetimes, and lifetimes on return values are called output lifetimes. There's one rule related to how Rust infers input lifetimes in the absence of explicit annotations:

Each argument that is a reference and therefore needs a lifetime parameter gets its own. In other words, a function with one argument gets one lifetime parameter: fn foo<'a>(x: &'a i32), a function with two arguments gets two separate lifetime parameters: fn foo<'a, 'b>(x: &'a i32, y: &'b i32), and so on.

And two rules related to output lifetimes:

If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters: fn foo<'a>(x: &'a i32) -> &'a i32.
If there are multiple input lifetime parameters, but one of them is &self or &mut self, then the lifetime of self is the lifetime assigned to all output lifetime parameters. This makes writing methods much nicer.

If none of these three rules apply, then you must explicitly annotate input and output lifetimes. These rules do apply in the first_word function, which is why we didn't have to specify any lifetimes.

These rules cover the vast majority of cases, allowing you to write a lot of code without needing to specify explicit lifetimes. However, Rust is always checking these rules and the lifetimes in your program, and cases in which the lifetime elision rules do not apply are cases where you'll need to add lifetime parameters to help Rust understand the contracts of your code.

Lifetime Annotations in Method Definitions

Now that we've gone over the lifetime elision rules, defining methods on structs that hold references will make more sense. The lifetime name needs to be declared after the impl keyword and then used after the struct's name, since the lifetime is part of the struct's type. The lifetimes can be elided in any methods where the output type's lifetime is the same as that of the struct's because of the third elision rule. Here's a struct called App that holds a reference to another struct, Config, defined elsewhere. The append_to_name method does not need lifetime annotations even though the method has a reference as an argument and is returning a reference; the lifetime of the return value will be the lifetime of self:

Filename: src/lib.rs

struct App<'a> {
    name: String,
    config: &'a Config,
}

impl<'a> App<'a> {
    fn append_to_name(&mut self, suffix: &str) -> &str {
        self.name.push_str(suffix);
        self.name.as_str()
    }
}

The Static Lifetime

There is one special lifetime that Rust knows about: 'static. The 'static lifetime is the entire duration of the program. All string literals have the 'static lifetime:

let s: &'static str = "I have a static lifetime.";

The text of this string is stored directly in the binary of your program and the binary of your program is always available. Therefore, the lifetime of all string literals is 'static. You may see suggestions to use the 'static lifetime in error message help text, but before adding it, think about whether the reference you have is one that actually lives the entire lifetime of your program or not (or even if you want it to live that long, if it could). Most of the time, the problem in the code is an attempt to create a dangling reference or a mismatch of the available lifetimes, and the solution is fixing those problems, not specifying the 'static lifetime.

Summary

We've covered the basics of Rust's system of generics. Generics are the core to building good abstractions, and can be used in a number of ways. There's more to learn about them, particularly lifetimes, but we'll cover those in later chapters. Let's move on to I/O functionality.

43 KiB Raw Blame History

Generics

Removing Duplication by Extracting a Function

Generics Syntax

Duplicated Enum Definitions

Removing Duplication by Extracting a Generic Data Type

Monomorphization at Compile Time

Generic Structs

Multiple Type Parameters

Generic Functions and Methods

Traits

Trait Bounds

Multiple Trait Bounds and where Syntax

Lifetime Syntax

Core Syntax

Lifetimes Prevent Dangling References

Lifetime Annotations in Struct Definitions

Lifetime Annotations in Function Signatures

Lifetime Elision

Lifetime Annotations in Method Definitions

The Static Lifetime

Summary

43 KiB

Raw Blame History

Multiple Trait Bounds and `where` Syntax