rust-book-cn/nostarch/chapter05.md
2016-09-27 14:00:25 -04:00


[TOC]

Structs

A struct, short for structure, is a custom data type that lets us name and package together multiple related values that make up a meaningful group. If you come from an object-oriented language, a struct is like an object's data attributes. In the next section of this chapter, we'll talk about how to define methods on our structs; methods are how you specify the behavior that goes along with a struct's data. Structs and enums (which we will talk about in Chapter 6) are the building blocks for creating new types in your program's domain in order to take full advantage of Rust's compile-time type checking.

One way of thinking about structs is that they are similar to tuples, which we talked about in Chapter 3. Like tuples, the pieces of a struct can be different types. Unlike tuples, we name each piece of data so that it's clearer what the values mean. Structs are more flexible as a result of these names: we don't have to rely on the order of the data to specify or access the values of an instance.

To define a struct, we enter the keyword struct and give the whole struct a name. A struct's name should describe the significance of the pieces of data being grouped together. Then, inside curly braces, we define the names of the pieces of data, which we call fields, and specify each field's type. For example, a struct to store information about a user account might look like:

struct User {
    username: String,
    email: String,
    sign_in_count: u64,
    active: bool,
}

To use a struct, we create an instance of that struct by specifying concrete values for each of the fields. Creating an instance is done by declaring a binding with let, stating the name of the struct, then curly braces with key: value pairs inside it where the keys are the names of the fields and the values are the data we want to store in those fields. The fields don't have to be specified in the same order in which the struct declared them. In other words, the struct definition is like a general template for the type, and instances fill in that template with particular data to create values of the type. For example, we can declare a particular user like this:

let user1 = User {
    email: String::from("someone@example.com"),
    username: String::from("someusername123"),
    active: true,
    sign_in_count: 1,
};

To get a particular value out of a struct, we can use dot notation. If we want just this user's email address, we can say user1.email.
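As a minimal sketch of dot notation in a complete program, the following reuses the User struct from above. The example_user helper is a hypothetical name introduced only for this illustration, and the field assignment relies on the binding being declared with let mut, which goes slightly beyond what the text above has shown.

```rust
// Same shape as the User struct defined above.
struct User {
    username: String,
    email: String,
    sign_in_count: u64,
    active: bool,
}

// Hypothetical helper (not part of the chapter): builds the example user.
fn example_user() -> User {
    User {
        email: String::from("someone@example.com"),
        username: String::from("someusername123"),
        active: true,
        sign_in_count: 1,
    }
}

fn main() {
    // `mut` is needed only because we assign to a field below.
    let mut user1 = example_user();

    // Dot notation reads a single field.
    println!("email: {}", user1.email);

    // On a mutable binding, dot notation also lets us assign to a field.
    user1.sign_in_count += 1;

    println!(
        "{} has signed in {} times (active: {})",
        user1.username, user1.sign_in_count, user1.active
    );
}
```

Without mut on the binding, the read would still compile but the assignment to sign_in_count would be rejected.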

An Example Program

To understand when we might want to use structs, let's write a program that calculates the area of a rectangle. We'll start off with single variable bindings, then refactor our program until we're using structs instead.

Let's make a new binary project with Cargo called rectangles that will take the length and width of a rectangle specified in pixels and will calculate the area of the rectangle. Here's a short program that shows one way of doing just that; put it into our project's src/main.rs:

Filename: src/main.rs

fn main() {
    let length1 = 50;
    let width1 = 30;

    println!(
        "The area of the rectangle is {} square pixels.",
        area(length1, width1)
    );
}

fn area(length: u32, width: u32) -> u32 {
    length * width
}

Let's try running this program with cargo run:

The area of the rectangle is 1500 square pixels.

Refactoring with Tuples

Our little program works okay; it figures out the area of the rectangle by calling the area function with each dimension. But we can do better. The length and the width are related to each other since together they describe one rectangle.

The issue with this method is evident in the signature of area:

fn area(length: u32, width: u32) -> u32 {

The area function is supposed to calculate the area of one rectangle, but our function takes two arguments. The arguments are related, but that's not expressed anywhere in our program itself. It would be more readable and more manageable to group length and width together.

We've already discussed one way we might do that in Chapter 3: tuples. Here's a version of our program that uses tuples:

Filename: src/main.rs

fn main() {
    let rect1 = (50, 30);

    println!(
        "The area of the rectangle is {} square pixels.",
        area(rect1)
    );
}

fn area(dimensions: (u32, u32)) -> u32 {
    dimensions.0 * dimensions.1
}

In one way, this is a little better. Tuples let us add a bit of structure, and we're now passing just one argument. But in another way this method is less clear: tuples don't give names to their elements, so our calculation has gotten more confusing because we have to index into the parts of the tuple:

dimensions.0 * dimensions.1

It doesn't matter if we mix up length and width for the area calculation, but if we were to draw the rectangle on the screen it would matter! We would have to remember that length was the tuple index 0 and width was the tuple index 1. If someone else were to work on this code, they would have to figure this out and remember it as well. It would be easy to forget or mix these values up and cause errors, since we haven't conveyed the meaning of our data in our code.

Refactoring with Structs: Adding More Meaning

Here is where we bring in structs. We can transform our tuple into a data type with a name for the whole as well as names for the parts:

Filename: src/main.rs

struct Rectangle {
    length: u32,
    width: u32,
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };

    println!(
        "The area of the rectangle is {} square pixels.",
        area(&rect1)
    );
}

fn area(rectangle: &Rectangle) -> u32 {
    rectangle.length * rectangle.width
}

Here we've defined a struct and given it the name Rectangle. Inside the {} we defined the fields to be length and width, both of which have type u32. Then in main, we create a particular instance of a Rectangle that has a length of 50 and a width of 30.

Our area function now takes one argument that we've named rectangle whose type is an immutable borrow of a struct Rectangle instance. As we covered in Chapter 4, we want to borrow the struct rather than take ownership of it so that main keeps its ownership and can continue using rect1, so that's why we have the & in the function signature and at the call site.

The area function accesses the length and width fields of the Rectangle instance it got as an argument. Our function signature for area now says exactly what we mean: calculate the area of a Rectangle, using its length and width fields. This conveys that the length and width are related to each other, and gives descriptive names to the values rather than using the tuple index values of 0 and 1. This is a win for clarity.

Adding Useful Functionality with Derived Traits

It'd be nice to be able to print out an instance of our Rectangle while we're debugging our program and see the values for all its fields. Let's try using the println! macro as we have been and see what happens:

Filename: src/main.rs

struct Rectangle {
    length: u32,
    width: u32,
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };

    println!("rect1 is {}", rect1);
}

If we run this, we get an error with this core message:

error: the trait bound `Rectangle: std::fmt::Display` is not satisfied

The println! macro can do many kinds of formatting, and by default, {} tells println! to use formatting known as Display: output intended for direct end-user consumption. The primitive types we've seen so far implement Display by default, as there's only one way you'd want to show a 1 or any other primitive type to a user. But with structs, the way println! should format the output is less clear as there are more display possibilities: Do you want commas or not? Do you want to print the struct {}s? Should all the fields be shown? Because of this ambiguity, Rust doesn't try to guess what we want and structs do not have a provided implementation of Display.

If we keep reading the errors, though, we'll find this helpful note:

note: `Rectangle` cannot be formatted with the default formatter; try using
`:?` instead if you are using a format string

Let's try it! The println! will now look like println!("rect1 is {:?}", rect1);. Putting the specifier :? inside the {} tells println! we want to use an output format called Debug. Debug is a trait that enables us to print out our struct in a way that is useful for developers so that we can see its value while we are debugging our code.

Let's try running with this change and... drat. We still get an error:

error: the trait bound `Rectangle: std::fmt::Debug` is not satisfied

Again, though, the compiler has given us a helpful note!

note: `Rectangle` cannot be formatted using `:?`; if it is defined in your
crate, add `#[derive(Debug)]` or manually implement it

Rust does include functionality to print out debugging information, but we have to explicitly opt-in to having that functionality be available for our struct. To do that, we add the annotation #[derive(Debug)] just before our struct definition. Now our program looks like this:

#[derive(Debug)]
struct Rectangle {
    length: u32,
    width: u32,
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };

    println!("rect1 is {:?}", rect1);
}

At this point, if we run this program, we won't get any errors and we'll see the following output:

rect1 is Rectangle { length: 50, width: 30 }

Nice! It's not the prettiest output, but it shows the values of all the fields for this instance, which would definitely help during debugging.

There are a number of traits Rust has provided for us to use with the derive annotation that can add useful behavior to our custom types. Those traits and their behaviors are listed in Appendix XX. We'll be covering how to implement these traits with custom behavior, as well as creating your own traits, in Chapter 10.
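As a hedged illustration of deriving more than one trait, the sketch below adds Clone and PartialEq next to Debug. The choice of these two extra traits is our assumption for the example rather than something this chapter prescribes. It also uses the {:#?} specifier, which asks Debug for a multi-line layout.

```rust
// Assumption for illustration: Clone and PartialEq are derived alongside Debug.
#[derive(Debug, Clone, PartialEq)]
struct Rectangle {
    length: u32,
    width: u32,
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };

    // Clone gives us an explicit way to copy the instance.
    let rect2 = rect1.clone();

    // PartialEq lets us compare two instances with ==, field by field.
    println!("rect1 == rect2? {}", rect1 == rect2);

    // {:#?} prints the Debug output with one field per line.
    println!("rect1 is {:#?}", rect1);

    // The fields are still ordinary data we can read directly.
    println!("area: {}", rect1.length * rect1.width);
}
```

Each derived trait is opt-in in the same way Debug is: leaving PartialEq out of the list would make the == comparison a compile error.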

Our area function is pretty specific: it only computes the area of rectangles. It would be nice to tie this behavior together more closely with our Rectangle struct, since it's behavior that our Rectangle type has specifically. Let's now look at how we can continue to refactor this code by turning the area function into an area method defined on our Rectangle type.

Method Syntax

Methods are similar to functions: they're declared with the fn keyword and their name, they can take arguments and return values, and they contain some code that gets run when they're called from somewhere else. Methods are different from functions, however, because they're defined within the context of a struct (or an enum or a trait object, which we will cover in Chapters 6 and XX respectively), and their first argument is always self, which represents the instance of the struct that the method is being called on.

Defining Methods

Let's change our area function that takes a Rectangle instance as an argument and instead make an area method defined on the Rectangle struct:

#[derive(Debug)]
struct Rectangle {
    length: u32,
    width: u32,
}

impl Rectangle {
    fn area(&self) -> u32 {
        self.length * self.width
    }
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };

    println!(
        "The area of the rectangle is {} square pixels.",
        rect1.area()
    );
}

In order to make the function be defined within the context of Rectangle, we start an impl block (impl is short for implementation). Then we move the function within the impl curly braces, and change the first (and in this case, only) argument to be self in the signature and everywhere within the body. Then in main where we called the area function and passed rect1 as an argument, we can instead use method syntax to call the area method on our Rectangle instance.

In the signature for area, we get to use &self instead of rectangle: &Rectangle because Rust knows the type of self is Rectangle due to this method being inside the impl Rectangle context. Note we still need to have the & before self, just like we had &Rectangle. Methods can choose to take ownership of self, borrow self immutably as we've done here, or borrow self mutably, just like any other argument.

We've chosen &self here for the same reason we used &Rectangle in the function version: we don't want to take ownership, and we just want to be able to read the data in the struct, not write to it. If we wanted to be able to change the instance that we've called the method on as part of what the method does, we'd put &mut self as the first argument instead. Having a method that takes ownership of the instance by having just self as the first argument is rarer; this is usually used when the method transforms self into something else and we want to prevent the caller from using the original instance after the transformation.
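To make the three receiver forms concrete, here is a small sketch that puts them side by side. The scale and into_dimensions methods are hypothetical names invented for this illustration; only area comes from the chapter.

```rust
#[derive(Debug)]
struct Rectangle {
    length: u32,
    width: u32,
}

impl Rectangle {
    // &self: borrows immutably, so the method can only read the fields.
    fn area(&self) -> u32 {
        self.length * self.width
    }

    // &mut self: borrows mutably, so the method can change the instance in place.
    // (Hypothetical method, not from the chapter.)
    fn scale(&mut self, factor: u32) {
        self.length *= factor;
        self.width *= factor;
    }

    // self: takes ownership, transforming the rectangle into a plain tuple.
    // (Hypothetical method, not from the chapter.)
    fn into_dimensions(self) -> (u32, u32) {
        (self.length, self.width)
    }
}

fn main() {
    // The binding must be mutable for us to call a &mut self method on it.
    let mut rect1 = Rectangle { length: 50, width: 30 };

    println!("area before scaling: {}", rect1.area());

    rect1.scale(2);
    println!("after scaling: {:?}", rect1);

    // This call moves rect1; using rect1 after this line would not compile.
    let dimensions = rect1.into_dimensions();
    println!("dimensions: {:?}", dimensions);
}
```

The commented restriction on the last call is the point of taking self by value: the caller can't accidentally keep using the original instance after the transformation.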

The main benefit of using methods over functions, in addition to getting to use method syntax and not having to repeat the type of self in every method's signature, is for organization. We've put all the things we can do with an instance of a type together in one impl block, rather than make future users of our code search for capabilities of Rectangle all over the place.

PROD: START BOX

Where's the -> operator?

In languages like C++, there are two different operators for calling methods: . if you're calling a method on the object directly, and -> if you're calling the method on a pointer to the object and thus need to dereference the pointer first. In other words, if object is a pointer, object->something() is like (*object).something().

Rust doesn't have an equivalent to the -> operator; instead, Rust has a feature called automatic referencing and dereferencing. Calling methods is one of the few places in Rust that has behavior like this.

Here's how it works: when you call a method with object.something(), Rust will automatically add in &, &mut, or * so that object matches the signature of the method. For example, these two calls are the same:

p1.distance(&p2);
(&p1).distance(&p2);

The first one looks much, much cleaner. This automatic referencing behavior works because methods have a clear receiver: the type of self. Given the receiver and name of a method, Rust can figure out definitively whether the method is just reading (so needs &self), mutating (so &mut self), or consuming (so self). The fact that Rust makes borrowing implicit for method receivers is a big part of making ownership ergonomic in practice.
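The two calls above can be tried out with a sketch like the following. The Point struct and the body of distance are assumptions made for this example, since the text names p1, p2, and distance without defining them.

```rust
// Hypothetical Point type, assumed for illustration only.
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    // Euclidean distance between self and another point.
    fn distance(&self, other: &Point) -> f64 {
        let dx = self.x - other.x;
        let dy = self.y - other.y;
        (dx * dx + dy * dy).sqrt()
    }
}

fn main() {
    let p1 = Point { x: 0.0, y: 0.0 };
    let p2 = Point { x: 3.0, y: 4.0 };

    // Rust adds the & on the receiver for us...
    let implicit = p1.distance(&p2);

    // ...so this explicit form calls exactly the same method.
    let explicit = (&p1).distance(&p2);

    println!("implicit: {}, explicit: {}", implicit, explicit);
}
```

Note that only the receiver gets this treatment: the argument p2 still has to be borrowed explicitly with &.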

PROD: END BOX

Methods with More Arguments

Let's practice some more with methods by implementing a second method on our Rectangle struct. This time, we'd like for an instance of Rectangle to take another instance of Rectangle and return true if the second rectangle could fit completely within self and false if it would not. That is, if we run this code:

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };
    let rect2 = Rectangle { length: 40, width: 10 };
    let rect3 = Rectangle { length: 45, width: 60 };

    println!("Can rect1 hold rect2? {}", rect1.can_hold(&rect2));
    println!("Can rect1 hold rect3? {}", rect1.can_hold(&rect3));
}

We want to see this output, since both of rect2's dimensions are smaller than rect1's, but rect3 is wider than rect1:

Can rect1 hold rect2? true
Can rect1 hold rect3? false

We know we want to define a method, so it will be within the impl Rectangle block. The method name will be can_hold, and it will take an immutable borrow of another Rectangle as an argument. We can tell what the type of the argument will be by looking at a call site: rect1.can_hold(&rect2) passes in &rect2, which is an immutable borrow of rect2, an instance of Rectangle. This makes sense, since we only need to read rect2 (rather than write, which would mean we'd need a mutable borrow) and we want main to keep ownership of rect2 so that we could use it again after calling this method. The return value of can_hold will be a boolean, and the implementation will check to see if self's length and width are both greater than the length and width of the other Rectangle, respectively. Let's write that code!

impl Rectangle {
    fn area(&self) -> u32 {
        self.length * self.width
    }

    fn can_hold(&self, other: &Rectangle) -> bool {
        self.length > other.length && self.width > other.width
    }
}

If we run this with the main from earlier, we will get our desired output! Methods can take multiple arguments that we add to the signature after the self parameter, and those arguments work just like arguments in functions do.
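One detail of this implementation is worth checking by hand: because can_hold compares with >, a rectangle with exactly the same dimensions does not count as fitting. The sketch below puts the pieces into one program and adds that case; the same_size binding is a name we've made up for the illustration.

```rust
struct Rectangle {
    length: u32,
    width: u32,
}

impl Rectangle {
    // True only if other is strictly smaller in both dimensions.
    fn can_hold(&self, other: &Rectangle) -> bool {
        self.length > other.length && self.width > other.width
    }
}

fn main() {
    let rect1 = Rectangle { length: 50, width: 30 };
    let rect2 = Rectangle { length: 40, width: 10 };
    // Hypothetical extra case: identical dimensions to rect1.
    let same_size = Rectangle { length: 50, width: 30 };

    println!("Can rect1 hold rect2? {}", rect1.can_hold(&rect2));
    println!("Can rect1 hold same_size? {}", rect1.can_hold(&same_size));

    // main still owns rect2, so we can keep using it after the call.
    println!("Can rect2 hold rect1? {}", rect2.can_hold(&rect1));
}
```

If equal-sized rectangles should count as fitting, the comparison would use >= instead; the chapter's version deliberately or not uses the strict form.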

Associated Functions

One more useful feature of impl blocks: we're allowed to define functions within impl blocks that don't take self as a parameter. These are called associated functions, since they're associated with the struct. They're still functions though, not methods, since they don't have an instance of the struct to work with. You've already used an associated function: String::from.

Associated functions are often used for constructors that will return a new instance of the struct. For example, we could provide an associated function that would take one dimension argument and use that as both length and width, thus making it easier to create a square Rectangle rather than having to specify the same value twice:

impl Rectangle {
    fn square(size: u32) -> Rectangle {
        Rectangle { length: size, width: size }
    }
}

To call this associated function, we use the :: syntax with the struct name: let sq = Rectangle::square(3);, for example. It's as if this function is namespaced by the struct: the :: syntax is used for both associated functions and namespaces created by modules, which we'll learn about in Chapter 7.
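Here is a minimal runnable sketch of calling the associated function and then using a method on the result. It combines square with the area method from earlier in the chapter; nothing else is assumed.

```rust
#[derive(Debug)]
struct Rectangle {
    length: u32,
    width: u32,
}

impl Rectangle {
    // Associated function: no self parameter, called on the type itself.
    fn square(size: u32) -> Rectangle {
        Rectangle { length: size, width: size }
    }

    // Method: takes &self, called on an instance.
    fn area(&self) -> u32 {
        self.length * self.width
    }
}

fn main() {
    // :: calls the associated function through the struct's name.
    let sq = Rectangle::square(3);

    // . calls a method on the instance we got back.
    println!("sq is {:?} with area {}", sq, sq.area());
}
```

The contrast between the two call styles mirrors the definitions: square has no instance to work with, so there is nothing to put in front of a dot.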

Summary

Structs let us create custom types that are meaningful for our domain. By using structs, we can keep associated pieces of data connected to each other and name each piece to make our code clear. Methods let us specify the behavior that instances of our structs have, and associated functions let us namespace functionality that is particular to our struct without having an instance available.

Structs aren't the only way we can create custom types, though; let's turn to the enum feature of Rust and add another tool to our toolbox.