first draft of 'ownership'

Steve Klabnik 2016-01-12 17:50:31 -05:00
parent 40c93504dd
commit 3164bf5e38


@ -1,238 +1,310 @@
# Ownership
This guide is one of three presenting Rust's ownership system. This is one of
Rust's most unique and compelling features, with which Rust developers should
become quite acquainted. Ownership is how Rust achieves its largest goal,
memory safety. There are a few distinct concepts, each with its own
chapter:
* ownership, which you're reading now
* [borrowing][borrowing], and its associated feature, references
* [lifetimes][lifetimes], an advanced concept of borrowing
These three chapters are related, and in order. You'll need all three to fully
understand the ownership system.
[borrowing]: references-and-borrowing.html
[lifetimes]: lifetimes.html
# Meta
Before we get to the details, two important notes about the ownership system.
Rust's central feature is called ownership. It is a feature that is
straightforward to explain, but has deep implications for the rest of the
language.
Rust has a focus on safety and speed. It accomplishes these goals through many
zero-cost abstractions, which means that in Rust, abstractions cost as little
as possible in order to make them work. The ownership system is a prime example
of a zero-cost abstraction. All of the analysis we'll talk about in this guide
is _done at compile time_. You do not pay any run-time cost for any of these
is done at compile time. You do not pay any run-time cost for any of these
features.
However, this system does have a certain cost: learning curve. Many new users
to Rust experience something we like to call fighting with the borrow
However, this system does have a certain cost: learning curve. Many new
Rustaceans experience something we like to call fighting with the borrow
checker, where the Rust compiler refuses to compile a program that the author
thinks is valid. This often happens because the programmer's mental model of
how ownership should work doesn't match the actual rules that Rust implements.
You probably will experience similar things at first. There is good news,
however: more experienced Rust developers report that once they work with the
rules of the ownership system for a period of time, they fight the borrow
checker less and less.
checker less and less. Keep at it!
With that in mind, let's learn about ownership.
This chapter will give you a foundation for understanding the rest of the
language. To do so, we're going to learn through examples, focused around a
very common data structure: strings.
# Ownership
## Variable binding scope
[Variable bindings][bindings] have a property in Rust: they have ownership
of what they're bound to. This means that when a binding goes out of scope,
Rust will free the bound resources. For example:
Let's take a step back and look at the very basics again. Now that we're past
basic syntax, we won't include all of the `fn main() {` stuff in examples, so
if you're following along, you will have to add that yourself. It will be a bit
more concise, letting us focus on the actual example.
Anyway, here it is:
```rust
fn foo() {
let v = vec![1, 2, 3];
}
let s = "hello";
```
When `v` comes into scope, a new [`Vec<T>`][vect] is created. In this case, the
vector also allocates space on [the heap][heap], for the three elements. When
`v` goes out of scope at the end of `foo()`, Rust will clean up everything
related to the vector, even the heap-allocated memory. This happens
deterministically, at the end of the scope.
[vect]: ../std/vec/struct.Vec.html
[heap]: the-stack-and-the-heap.html
[bindings]: variable-bindings.html
# Move semantics
There's some more subtlety here, though: Rust ensures that there is _exactly
one_ binding to any given resource. For example, if we have a vector, we can
assign it to another binding:
This variable binding refers to a string. It's valid from the point at which
it's declared, until the end of the current _scope_. That is:
```rust
let v = vec![1, 2, 3];
{ // s is not valid here, it's not yet in scope
let s = "hello"; // s is valid from this point forward
let v2 = v;
// do stuff with s
} // this scope is now over, and s is no longer valid
```
But, if we try to use `v` afterwards, we get an error:
In other words, there are two important points in time here: when `s` comes
into scope, it is valid, and it remains so until it goes out of scope.
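The same rule applies to nested blocks. The following is a minimal sketch, not part of the original draft; the function name `sum_in_scopes` is made up for illustration, and it uses integers to keep the focus on scope alone.

```rust
// Hypothetical helper: shows that an inner binding is only usable
// inside the block where it was declared.
fn sum_in_scopes() -> i32 {
    let outer = 1;          // outer is valid until the end of the function
    let total;              // declared here, assigned inside the inner block
    {
        let inner = 2;      // inner is valid from this point forward
        total = outer + inner;
    }                       // this scope is now over, and inner is no longer valid
    // Naming `inner` here would be a compile error.
    total
}

fn main() {
    println!("{}", sum_in_scopes());
}
```

The outer binding stays usable inside the inner block, but not the other way around.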
At this point, things are similar to other programming languages. Let's build
on top of this understanding by introducing a new type: `String`.
## Strings
String literals are convenient, but they aren't the only way that you use strings.
For one thing, they're immutable. This will not work:
```rust,ignore
let v = vec![1, 2, 3];
let mut s = "hello";
let v2 = v;
println!("v[0] is: {}", v[0]);
s = s + ", world!";
```
It looks like this:
It gives us an error:
```text
error: use of moved value: `v`
println!("v[0] is: {}", v[0]);
^
4:10 error: binary operation `+` cannot be applied to type `&str` [E0369]
s = s + ", world!";
^
```
A similar thing happens if we define a function which takes ownership, and
try to use something after we've passed it as an argument:
No dice. Also, not every string is literal: what about taking user input and
storing it in a string?
For this, Rust has a second string type, `String`. You can create a `String` from
a string literal using the `from` function:
```rust
let s = String::from("hello");
```
The double colon (`::`) is a kind of scope operator, allowing us to namespace this
particular `from()` function under the `String` type itself, rather than using
some sort of name like `string_from()`.
This kind of string can be mutated:
```rust
let mut s = String::from("hello");
s = s + ", world!";
```
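As a hedged aside, the standard library's `push_str()` method is another way to grow a `String`, appending to it in place instead of building a new value with `+`. The function name `greeting` below is only for illustration.

```rust
// Hypothetical helper: builds a String and appends to it in place.
fn greeting() -> String {
    let mut s = String::from("hello");
    s.push_str(", world!"); // appends the literal to the end of s
    s
}

fn main() {
    println!("{}", greeting());
}
```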
## Memory and allocation
So, what's the difference here? Why can `String` be mutated, but literals
cannot? The difference comes down to how these two types deal with memory.
In the case of a string literal, because we know the contents of the string at
compile time, we can put the text of the string directly into the final
executable. This means that string literals are quite fast and efficient. But
these properties only come from its immutability; we can't put an
arbitrary-sized blob of memory into the binary for each string!
With `String`, to support a mutable, growable string, we need to allocate an
amount of memory, unknown at compile time, to hold the contents. This means two things:
1. The memory must be requested from the operating system at runtime.
2. We need a way of giving this memory back to the operating system when we're
done with our `String`.
That first part is done by us: when we call `String::from()`, its
implementation requests the memory it needs. This is pretty much universal in
programming languages.
The second case, however, is different. In languages with a garbage collector,
the GC handles that second case, and we, as the programmer, don't need to think
about it. Languages without a garbage collector often force you to
call a second function to give the memory back. Part of the difficulty of
languages that work like this is knowing exactly when to do so. If we forget,
we will leak memory. If we do it too early, we will have an invalid variable.
If we do it twice, that's a bug too. We need to pair exactly one allocate
with exactly one free.
Rust takes a different path. Remember our example? Here's a version with
`String`:
```rust
{
let s = String::from("hello"); // s is valid from this point forward
// do stuff with s
} // this scope is now over, and s is no longer valid
```
We have a natural point at which we can return the memory `String` needs back
to the operating system: when it goes out of scope! When a variable goes out of
scope, a special function is called. This function is called `drop()`, and it
is where the author of `String` can put the code to return the memory.
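To watch this happen, here is a rough sketch using a made-up type, `Noisy`, that implements the standard `Drop` trait. It relies on `struct` and trait syntax that this chapter has not covered yet, and the `DROPS` counter and the function name exist only so the effect can be observed; none of this is how `String` itself is written.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, used only to observe how many values were dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type that reports when it is cleaned up.
struct Noisy {
    name: String,
}

impl Drop for Noisy {
    // Rust calls this automatically when a Noisy value goes out of scope.
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        println!("dropping {}", self.name);
    }
}

// Returns how many Noisy values were dropped by the inner scope.
fn drops_after_scope() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    {
        let _n = Noisy { name: String::from("n") }; // valid from this point forward
        // do stuff with _n
    } // this scope is now over, so drop() runs here
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("values dropped: {}", drops_after_scope());
}
```

Nothing in the inner block calls `drop()` explicitly; the call is inserted at the closing brace.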
> Aside: This pattern is sometimes called “Resource Acquisition Is
> Initialization” in C++, or “RAII” for short. While they are very similar,
> Rust's take on this concept has a number of differences, and so we don't tend
> to use the same term. If you're familiar with this idea, keep in mind that it
> is _roughly_ similar in Rust, but not identical.
This pattern has a profound impact on the way that Rust code is written. It may
seem obvious right now, but things can get tricky in more advanced situations!
Let's go over the first one of those right now.
## Move
What would you expect this code to do?
```rust
let x = 5;
let y = x;
```
You might say “Make a copy of `5`.” That'd be correct! We now have two
bindings, `x` and `y`, and both equal `5`.
Now let's look at `String`. What would you expect this code to do?
```rust
let s1 = String::from("hello");
let s2 = s1;
```
You might say “copy the `String`!” This is both correct and incorrect at the
same time. It does a _shallow_ copy of the `String`. What does that mean? Well,
let's take a look at what `String` looks like under the covers:
CHART GOES HERE: (data, len, capacity) with a pointer to the data
A `String` is made up of three parts: a pointer to the memory that holds the
contents of the string, a length, and a capacity. The length is how long the
`String`'s contents currently are. The capacity is the total amount of memory
the `String` has gotten from the operating system. The difference between
length and capacity matters, but not in this context, so don't worry about it
too much if it doesn't make sense, and just ignore the capacity.
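These three parts can be inspected with the standard `as_ptr()`, `len()`, and `capacity()` methods. The sketch below is illustrative only; the function name is made up, and the exact pointer and capacity values depend on the allocator, so it does not claim any particular output for them.

```rust
// Hypothetical helper: returns the length and capacity of a small String.
fn length_and_capacity() -> (usize, usize) {
    let s = String::from("hello");
    (s.len(), s.capacity())
}

fn main() {
    let s = String::from("hello");
    // The pointer is the address of the heap memory holding the contents.
    println!("data pointer: {:p}", s.as_ptr());
    println!("length: {}", s.len());
    println!("capacity: {}", s.capacity());

    let (len, cap) = length_and_capacity();
    // The capacity is never smaller than the length.
    println!("{} <= {}", len, cap);
}
```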
> We've talked about two kinds of composite types: arrays and tuples. `String`
> is a third type: a `struct`, which we will cover the details of in the next
> chapter of the book. For now, thinking about `String` as a tuple is close
> enough.
When we assign `s1` to `s2`, the `String` itself is copied. In other words:
CHART GOES HERE: two triples, data points to the same place
There's a problem here! Both `data` pointers are pointing to the same place.
Why is this a problem? Well, when `s2` goes out of scope, it will free the
memory that `data` points to. And then `s1` goes out of scope, and it will
_also_ try to free the memory that `data` points to! That's bad.
So what's the solution? Here, we stand at a crossroads. There are a few options
here. One would be to declare that assignment will also copy out any data. This
works, but is inefficient: what if our `String` contained a novel? Also, it
only works for memory. What if, instead of a `String`, we had a
`TcpConnection`? Opening and closing a network connection is very similar to
allocating and freeing memory. The solution that we could use there is to
create a callback, similar to `drop()`, that runs when we assign something.
That would work, but now, an `=` can run arbitrary code. That's also not good,
and it doesn't solve our efficiency concerns either.
Let's take a step back: the root of the problem is that `s1` and `s2` both
think that they have control of the memory, and therefore, need to free it.
Instead of trying to copy the memory, we could say that `s1` is no longer
valid, and therefore, doesn't need to free anything. This is in fact the
choice that Rust makes. Check out what happens when you try to use `s1`
after `s2` is created:
```rust,ignore
fn take(v: Vec<i32>) {
// what happens here isn't important.
}
let s1 = String::from("hello");
let s2 = s1;
let v = vec![1, 2, 3];
take(v);
println!("v[0] is: {}", v[0]);
println!("{}", s1);
```
Same error: use of moved value. When we transfer ownership to something else,
we say that we've moved the thing we refer to. You don't need some sort of
special annotation here; it's the default thing that Rust does.
## The details
The reason that we cannot use a binding after we've moved it is subtle, but
important. When we write code like this:
```rust
let v = vec![1, 2, 3];
let v2 = v;
```
The first line allocates memory for the vector object, `v`, and for the data it
contains. The vector object is stored on the [stack][sh] and contains a pointer
to the content (`[1, 2, 3]`) stored on the [heap][sh]. When we move `v` to `v2`,
it creates a copy of that pointer, for `v2`. Which means that there would be two
pointers to the content of the vector on the heap. It would violate Rust's
safety guarantees by introducing a data race. Therefore, Rust forbids using `v`
after we've done the move.
[sh]: the-stack-and-the-heap.html
It's also important to note that optimizations may remove the actual copy of
the bytes on the stack, depending on circumstances. So it may not be as
inefficient as it initially seems.
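One way to check that a move leaves the heap data where it is: record the address of the buffer before the move and compare it with the address seen through the new binding. This is a small sketch, and the function name is invented for illustration.

```rust
// Hypothetical helper: true if the heap buffer has the same address
// before and after the vector is moved to a new binding.
fn heap_buffer_survives_move() -> bool {
    let v = vec![1, 2, 3];
    let before = v.as_ptr(); // address of the elements on the heap
    let v2 = v;              // move: only the stack part (pointer, length, capacity) is copied
    let after = v2.as_ptr();
    before == after
}

fn main() {
    println!("same heap buffer: {}", heap_buffer_survives_move());
}
```

Only the small, fixed-size part on the stack is copied; the elements themselves are not duplicated.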
## `Copy` types
We've established that when ownership is transferred to another binding, you
cannot use the original binding. However, there's a [trait][traits] that changes this
behavior, and it's called `Copy`. We haven't discussed traits yet, but for now,
you can think of them as an annotation to a particular type that adds extra
behavior. For example:
```rust
let v = 1;
let v2 = v;
println!("v is: {}", v);
```
In this case, `v` is an `i32`, which implements the `Copy` trait. This means
that, just like a move, when we assign `v` to `v2`, a copy of the data is made.
But, unlike a move, we can still use `v` afterward. This is because an `i32`
has no pointers to data somewhere else; copying it is a full copy.
All primitive types implement the `Copy` trait and their ownership is
therefore not moved like one would assume, following the 'ownership rules'.
To give an example, the two following snippets of code only compile because the
`i32` and `bool` types implement the `Copy` trait.
```rust
fn main() {
let a = 5;
let _y = double(a);
println!("{}", a);
}
fn double(x: i32) -> i32 {
x * 2
}
```
```rust
fn main() {
let a = true;
let _y = change_truth(a);
println!("{}", a);
}
fn change_truth(x: bool) -> bool {
!x
}
```
If we had used types that do not implement the `Copy` trait,
we would have gotten a compile error because we tried to use a moved value.
You'll get an error like this:
```text
error: use of moved value: `a`
println!("{}", a);
^
5:22 error: use of moved value: `s1` [E0382]
println!("{}", s1);
^~
5:24 note: in this expansion of println! (defined in <std macros>)
3:11 note: `s1` moved here because it has type `collections::string::String`, which is moved by default
let s2 = s1;
^~
```
We will discuss how to make your own types `Copy` in the [traits][traits]
section.
We say that `s1` was _moved_ into `s2`. When a value moves, its data is copied,
but the original variable binding is no longer usable. That solves our problem:
[traits]: traits.html
CHART GOES HERE: two triples, data points to the same place
# More than ownership
With only `s2` valid, when it goes out of scope, it will free the memory, and we're done!
Of course, if we had to hand ownership back with every function we wrote:
## Ownership Rules
This leads us to the Ownership Rules:
> 1. Each value in Rust has a variable binding that's called its owner.
> 2. There can only be one owner at a time.
> 3. When the owner goes out of scope, the value will be `drop()`ped.
Furthermore, there's a design choice that's implied by this: Rust will never
automatically create deep copies of your data. Any automatic copying must be
inexpensive.
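The same rules apply when a value is passed to a function: the parameter becomes the new owner. Here is a minimal sketch with two made-up functions, `add_suffix` and `consume`, assuming nothing beyond the `String` methods already shown.

```rust
// Hypothetical function: takes ownership of s, changes it, and hands it back.
fn add_suffix(mut s: String) -> String {
    s.push_str("!");
    s // ownership moves out to the caller
}

// Hypothetical function: takes ownership of s and does not return it.
fn consume(s: String) -> usize {
    s.len()
} // s goes out of scope here, so its memory is freed

fn main() {
    let s1 = String::from("hello");
    let s2 = add_suffix(s1); // s1 is moved into the function; the result is owned by s2
    // println!("{}", s1);   // would not compile: use of moved value
    println!("{}", s2);

    let n = consume(s2);     // s2 is moved; there is no owner left in main
    println!("{}", n);
}
```

At every point there is exactly one owner, and the memory is freed when the last owner goes out of scope.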
## Clone
But what if we _do_ want to copy the `String`'s data? There's a common method
for that: `clone()`. Here's an example of `clone()` in action:
```rust
fn foo(v: Vec<i32>) -> Vec<i32> {
// do stuff with v
let s1 = String::from("hello");
let s2 = s1.clone();
// hand back ownership
v
}
println!("{}", s1);
```
This would get very tedious. It gets worse the more things we want to take ownership of:
This will work just fine:
CHART GOES HERE: two triples, data points to two places
When you see a call to `clone()`, you know that some arbitrary code is being
executed, which may be expensive. It's a visual indicator that something
different is going on here.
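A quick way to see that `clone()` really copies the heap data is to compare the two data pointers. The sketch below is illustrative; the function name is made up.

```rust
// Hypothetical helper: returns (same contents?, different heap buffers?).
fn clone_is_deep() -> (bool, bool) {
    let s1 = String::from("hello");
    let s2 = s1.clone();                               // copies the heap data too
    let same_contents = s1 == s2;                      // the text is equal
    let distinct_buffers = s1.as_ptr() != s2.as_ptr(); // but stored in two places
    (same_contents, distinct_buffers)
}

fn main() {
    let (same_contents, distinct_buffers) = clone_is_deep();
    println!("same contents: {}", same_contents);
    println!("distinct buffers: {}", distinct_buffers);
}
```

Both bindings stay valid afterwards, because each one owns its own memory.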
## Copy
There's one last wrinkle that we haven't talked about yet. This code works:
```rust
fn foo(v1: Vec<i32>, v2: Vec<i32>) -> (Vec<i32>, Vec<i32>, i32) {
// do stuff with v1 and v2
let x = 5;
let y = x;
// hand back ownership, and the result of our function
(v1, v2, 42)
}
let v1 = vec![1, 2, 3];
let v2 = vec![1, 2, 3];
let (v1, v2, answer) = foo(v1, v2);
println!("{}", x);
```
Ugh! The return type, return line, and calling the function gets way more
complicated.
But why? We don't have a call to `clone()`. Why didn't `x` get moved into `y`?
Luckily, Rust offers a feature, borrowing, which helps us solve this problem.
It's the topic of the next section!
For types that do not have any kind of complex storage requirements, like
integers, typing `clone()` is busy work. There's no reason we would ever want
to prevent `x` from being valid here, as there's no situation in which it's
incorrect. In other words, a call to `clone()` would do nothing special over
copying the data directly.
Rust has a special annotation that you can place on types, called `Copy`. If
a type is `Copy`, an older binding is still usable after assignment. Integers
are an example of such a type; most of the primitive types are `Copy`.
While we haven't talked about how to mark a type as `Copy` yet, you might ask
yourself “what happens if we made `String` `Copy`?” The answer is, you cannot.
Remember `drop()`? Rust will not let you mark any type which has `drop()`
implemented as `Copy`. If you need to do something special when the value goes
out of scope, being `Copy` will be an error.
So what types are `Copy`? You can check the documentation for the given type
to be sure, but as a rule of thumb, any simple value that only represents some
memory can be `Copy`. Anything complicated will be the default, not-`Copy`.
And you can't get it wrong: the compiler will throw an error if you try to
use a type that moves incorrectly, as we saw above.
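As a short illustration of the rule of thumb, the sketch below uses an integer and a tuple made only of `Copy` types; in both cases the older binding is still usable after assignment. The function names are made up for this example.

```rust
// Hypothetical helper: both x and y are usable after the assignment.
fn sum_after_copy() -> i32 {
    let x = 5;
    let y = x; // i32 is Copy, so x is not moved
    x + y
}

// Hypothetical helper: a tuple whose parts are all Copy is itself Copy.
fn pair_after_copy() -> (i32, bool) {
    let p = (1, true);
    let q = p; // p is still valid here
    (p.0 + q.0, p.1 && q.1)
}

fn main() {
    println!("{}", sum_after_copy());
    println!("{:?}", pair_after_copy());
}
```

Replacing the tuple with one that contains a `String` would turn the assignment into a move, and using `p` afterwards would be a compile error.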