
Rust's smart pointers follow a clear progression: Box<T> -> Rc<T> -> Arc<T>. We will explore the design philosophy behind each smart pointer by examining their necessity, functionality, and the distinct reasons for their specific designs.
As the name implies, a smart pointer is essentially a pointer, but with added intelligence. It retains the capabilities of a standard pointer while possessing additional metadata and functionality. I will delve into the specific features of each smart pointer in the sections below.
Most smart pointers are implemented as structs. Since they function as pointers, they implement the Deref trait, allowing you to access the data using the dereference operator (*p). Furthermore, because they are responsible for cleaning up resources when they go out of scope, they also implement the Drop trait.
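Here is a minimal sketch of a hand-written smart pointer that makes the Deref/Drop description concrete. The type `MyBox` and its `label` field are hypothetical names used only for illustration (this is not a standard library type). It wraps a `Box<T>` so that it owns heap data.

```rust
use std::ops::Deref;

// Hypothetical smart pointer for illustration: it owns a heap value (via Box)
// plus a label that is printed when it is dropped.
struct MyBox<T> {
    inner: Box<T>,
    label: &'static str,
}

impl<T> MyBox<T> {
    fn new(label: &'static str, value: T) -> Self {
        MyBox { inner: Box::new(value), label }
    }
}

// Deref lets `*p` and method calls reach the inner value.
impl<T> Deref for MyBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner // &Box<T> coerces to &T
    }
}

// Drop runs automatically when a MyBox goes out of scope.
impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        println!("dropping {}", self.label);
    }
}

fn main() {
    let p = MyBox::new("p", 5);
    assert_eq!(*p, 5); // *p means *Deref::deref(&p)
    let s = MyBox::new("s", String::from("hi"));
    assert_eq!(s.len(), 2); // auto-deref: MyBox<String> -> String
    println!("end of main");
} // prints "dropping s", then "dropping p" (reverse declaration order)
```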
The most fundamental smart pointers include Box<T>, Rc<T>, and Arc<T>. Box<T> offers the simplest feature set, with functionality increasing as we move to Rc<T> and Arc<T>. Naturally, however, there is a trade-off between functionality and performance.
Viewing smart pointers merely as 'pointers' does not fully capture their essence. It is actually more accurate to view them as structs that possess ownership. Think of them as containers that hold a memory address while also retaining ownership of the data stored there.
Box<T> is the most fundamental smart pointer; it stores data on the heap while keeping only the memory address on the stack. Unlike Rc<T> or Arc<T>, Box<T> is considered a native type that receives special treatment from the Rust compiler.
By Rust's rules, the size of every variable must be known at compile time. However, this constraint can lead to problems in certain scenarios.
```rust
// ❌ Compilation Error: recursive type `List` has infinite size
enum List {
    Cons(i32, List),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Cons(2, Cons(3, Nil)));
}
```
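The size cannot be determined because the type definition is recursive: a List contains another List, which contains another List, and so on, so its size would be infinite. Box<T> solves exactly this problem by storing the inner List on the heap.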
```rust
enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    // ✅ Fixed Sizing Enabled by Box<T>
}
```
The compiler now only needs to know the size of the pointer stored in Cons, not the size of the whole list, so List has a finite size that is known at compile time.
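You can check this by asking the compiler for the sizes directly. The sketch below adds a hypothetical `sum` helper and shows that a `Box<List>` is exactly one pointer wide, regardless of how long the list behind it is. The exact size of `List` itself depends on the target and on the compiler's layout choices, so the sketch prints it rather than asserting a value.

```rust
use std::mem::size_of;

enum List {
    Cons(i32, Box<List>),
    Nil,
}

use crate::List::{Cons, Nil};

// Hypothetical helper: walks the list recursively through each Box.
fn sum(list: &List) -> i32 {
    match list {
        Cons(v, rest) => v + sum(rest),
        Nil => 0,
    }
}

fn main() {
    // A Box<List> is a single pointer, no matter how long the list is.
    assert_eq!(size_of::<Box<List>>(), size_of::<usize>());
    println!("size_of::<List>() = {}", size_of::<List>());
    let list = Cons(1, Box::new(Cons(2, Box::new(Cons(3, Box::new(Nil))))));
    println!("sum = {}", sum(&list)); // sum = 6
}
```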
1. Moving Ownership of Large Data without Copying
```rust
fn main() {
    // Large data stored on the heap (only the pointer exists on the stack)
    let b = Box::new([0u8; 1_000_000]); // 1 million byte array
    // Ownership moves (no copy, only the pointer moves)
    let b2 = b;
    // println!("{:?}", b); // ❌ 'b' is no longer usable
    println!("length = {}", b2.len());
}
```
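If the array were stored directly on the stack, every move to another variable would copy all 1,000,000 bytes. The optimizer can sometimes elide that copy, but this is not guaranteed. With Box<T>, a move copies only the pointer.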
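The following sketch tests this by comparing the heap address before and after the move. The function name `move_keeps_heap_address` is hypothetical, and the sketch relies only on the standard library.

```rust
use std::mem::{size_of, size_of_val};

const N: usize = 1_000_000;

// Returns true if moving the Box left the heap buffer where it was.
fn move_keeps_heap_address() -> bool {
    let b = Box::new([0u8; N]); // 1 million byte array on the heap
    // The Box itself (the part on the stack) is a single pointer.
    assert_eq!(size_of_val(&b), size_of::<usize>());
    let before: *const [u8; N] = &*b; // address of the heap buffer
    let b2 = b; // move: only the pointer is copied
    let after: *const [u8; N] = &*b2;
    before == after
}

fn main() {
    println!("same heap buffer after move: {}", move_keeps_heap_address()); // true
}
```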
2. Type Abstraction (Trait Objects)
Box<dyn Trait> can hold a value of any type that implements the specified trait. The concrete type is erased at compile time and known only at runtime.
```rust
// Trait Definition
trait Animal {
    fn speak(&self);
}

// Trait Implementation
struct Dog;
impl Animal for Dog {
    fn speak(&self) {
        println!("Woof!");
    }
}

struct Cat;
impl Animal for Cat {
    fn speak(&self) {
        println!("Meow!");
    }
}

fn main() {
    // Handling different types via the same Trait interface using Box<dyn Trait>
    let animals: Vec<Box<dyn Animal>> = vec![
        Box::new(Dog),
        Box::new(Cat),
    ];
    // Calling the common interface
    for a in animals {
        a.speak();
    }
}
```
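This enables runtime polymorphism through dynamic dispatch. Each call to speak is resolved at runtime through a vtable, much like interfaces or virtual methods in other OOP-style languages.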
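One consequence is that a function can return different concrete types behind a single `Box<dyn Animal>`. The sketch below varies the example above in two ways: `speak` returns a `String` instead of printing, so the result is easy to check, and `make_animal` is a hypothetical factory function.

```rust
trait Animal {
    fn speak(&self) -> String;
}

struct Dog;
struct Cat;

impl Animal for Dog {
    fn speak(&self) -> String {
        "Woof!".to_string()
    }
}

impl Animal for Cat {
    fn speak(&self) -> String {
        "Meow!".to_string()
    }
}

// Hypothetical factory: the concrete type is chosen at runtime,
// but the return type is always Box<dyn Animal>.
fn make_animal(kind: &str) -> Box<dyn Animal> {
    match kind {
        "dog" => Box::new(Dog),
        _ => Box::new(Cat),
    }
}

fn main() {
    for kind in ["dog", "cat"] {
        println!("{kind}: {}", make_animal(kind).speak());
    }
}
```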
3. Mutability
If the Box binding is declared mut, you can modify the value inside it through the dereference operator.
```rust
fn main() {
    // Store the i32 value 10 on the heap inside a Box
    let mut b = Box::new(10);
    // Modify the value through the dereference operator
    *b = 20;
    // Conceptually, assigning through *b on a mutable smart pointer uses DerefMut:
    // `*b = 20` is equivalent to `*DerefMut::deref_mut(&mut b) = 20`,
    // i.e. an assignment through a &mut i32 that points into the heap.
    // (For Box specifically, the compiler performs this dereference natively.)
    println!("{}", b); // Output: 20
}
```
The compiler dereferences plain references (&T and &mut T) with the * operator natively. Smart pointers instead implement the Deref trait, plus DerefMut for mutable access, to define what * means for them. As we will see later, Rc<T> and Arc<T> implement Deref but not DerefMut, so you cannot modify the shared value through them directly.
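The desugaring can be written out explicitly. This sketch uses a hypothetical function, `explicit_deref`, which calls `Deref::deref` and `DerefMut::deref_mut` directly on a `Box` and shows that these calls reach the same heap value as the `*` operator.

```rust
use std::ops::{Deref, DerefMut};

// Writes through a Box with the * operator, then with an explicit
// DerefMut call, and reads the result back with an explicit Deref call.
fn explicit_deref() -> i32 {
    let mut b = Box::new(10);
    *b = 20; // sugar
    *DerefMut::deref_mut(&mut b) = 30; // explicit: get a &mut i32, assign through it
    let r: &i32 = Deref::deref(&b); // explicit: get a &i32
    *r
}

fn main() {
    println!("{}", explicit_deref()); // 30
}
```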
4. Moving Out of Dereference
Box<T> allows you to move the value out by dereferencing it.
```rust
fn main() {
    // Box<T> owns the value, so moving is possible
    let b = Box::new(String::from("hello"));
    let s: String = *b;
    println!("{}", s);
}
```
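This is possible because Box<T> receives special treatment from the compiler, which supports moving out of *b for Box only. A general "DerefMove" trait that would let other smart pointers do the same has been proposed (Rust rfc), but it is not part of the language, so Rc and Arc do not support this. You might wonder whether the same thing works through a plain reference. It does not, because you cannot move out of a &T. For Copy types such as i32, however, *r simply copies the value, so no move is needed: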
```rust
fn main() {
    let x = 5;
    let r = &x;
    // i32 implements Copy → simple copy occurs
    let y = *r;
    println!("{}", y);
}
```
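For comparison, the standard library provides `Rc::try_unwrap` for taking the value out of an `Rc`. It succeeds only when the `Rc` is the sole strong owner; otherwise it returns the `Rc` unchanged. Here is a minimal sketch with `move_out_demo` as a hypothetical helper:

```rust
use std::rc::Rc;

fn move_out_demo() -> (String, String) {
    // Box: moving out with * is allowed.
    let b = Box::new(String::from("hello"));
    let from_box: String = *b;

    // Rc: `let s: String = *rc;` would not compile (cannot move out of an `Rc`).
    let rc = Rc::new(String::from("world"));
    let other_owner = Rc::clone(&rc);
    // With two owners, try_unwrap fails and hands the Rc back unchanged.
    let rc = Rc::try_unwrap(rc).unwrap_err();
    drop(other_owner);
    // Now rc is the only owner, so the String can be moved out.
    let from_rc: String = Rc::try_unwrap(rc).unwrap();
    (from_box, from_rc)
}

fn main() {
    let (a, b) = move_out_demo();
    println!("{a} {b}"); // hello world
}
```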
Box<T> is Send if T is Send: This means the Box itself can be moved to another thread. Since Box is simply a pointer to heap data with a single owner, it imposes no additional constraints.
Box<T> is Sync if T is Sync: Sync means that a shared reference (&Box<T>) can safely be used from multiple threads at the same time. Through &Box<T>, a thread can only reach the data as &T (Box<T> implements Deref<Target = T>), so sharing &Box<T> is exactly as safe as sharing &T. Therefore, if T is Sync, Box<T> is also Sync.
```rust
use std::thread;

fn main() {
    let my_box = Box::new(vec![1, 2, 3]);
    // Threads within the scope can borrow the parent's stack/heap data.
    thread::scope(|s| {
        // First thread: read-only reference
        s.spawn(|| {
            println!("Reading from thread 1: {:?}", my_box);
        });
        // Second thread: read-only reference
        s.spawn(|| {
            println!("Reading from thread 2: {:?}", my_box);
        });
    }); // Waits here until all threads finish.
    println!("Task complete. Data is still in the main thread: {:?}", my_box);
}
```
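You can also check the Send/Sync rules for `Box<T>` at compile time. The sketch below uses two hypothetical empty generic functions whose only purpose is to impose a trait bound, so a call compiles only if the type satisfies that bound.

```rust
use std::cell::Cell;

// Hypothetical helpers: a call only compiles if T satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // Vec<i32> is Send + Sync, so Box<Vec<i32>> is too.
    assert_send::<Box<Vec<i32>>>();
    assert_sync::<Box<Vec<i32>>>();
    // Cell<i32> is Send but not Sync, and Box<Cell<i32>> inherits exactly that.
    assert_send::<Box<Cell<i32>>>();
    // assert_sync::<Box<Cell<i32>>>(); // ❌ would not compile: Cell<i32> is not Sync
    println!("all bounds hold");
}
```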
Rc<T> stands for Reference Counted smart pointer. It is a type that allows multiple owners to share the same data within a single-threaded environment.
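Rc keeps a reference count alongside a single immutable value, which tracks how many owners exist. When the strong count drops to zero, the value is dropped. The heap allocation itself is freed once no Weak pointers remain either. Internally, Rc<T> points to a heap allocation laid out as the following struct (simplified from the standard library source):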
```rust
struct RcInner<T: ?Sized> {
    // Uses standard Cell (non-atomic)
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}
```
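You can observe the counters in `RcInner` with `Rc::strong_count` and `Rc::weak_count`. The sketch below, built around a hypothetical function named `counts_demo`, also shows how the two counts differ: once the last strong owner is dropped, the value is gone, and any remaining `Weak` pointer can no longer be upgraded.

```rust
use std::rc::{Rc, Weak};

// Returns (strong count, weak count, "upgrade fails after drop").
fn counts_demo() -> (usize, usize, bool) {
    let strong = Rc::new(String::from("data"));
    let weak: Weak<String> = Rc::downgrade(&strong);
    let counts = (Rc::strong_count(&strong), Rc::weak_count(&strong));
    drop(strong); // strong: 1 -> 0, so the String is dropped here
    // The allocation (holding the counters) stays alive while `weak` exists,
    // but upgrading fails because the value is gone.
    (counts.0, counts.1, weak.upgrade().is_none())
}

fn main() {
    println!("{:?}", counts_demo()); // (1, 1, true)
}
```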
With Box<T>, two lists cannot share the same tail.
```rust
let a = Cons(5, Box::new(Cons(10, Box::new(Nil))));
let b = Cons(3, Box::new(a)); // `a` is moved into `b`
let c = Cons(4, Box::new(a)); // ❌ Compilation Error: use of moved value `a`
```
Rc<T> lets both b and c own a:
```rust
use std::rc::Rc;

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};

fn main() {
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    let b = Cons(3, Rc::clone(&a));
    let c = Cons(4, Rc::clone(&a));
}
```
Here, the strong count of a rises to 3: one for a itself, plus one for each of the two Rc::clone calls.
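A small sketch can trace the count as owners come and go. `counts_over_time` is a hypothetical helper that records `Rc::strong_count` after each step.

```rust
use std::rc::Rc;

enum List {
    Cons(i32, Rc<List>),
    Nil,
}

use crate::List::{Cons, Nil};

// Records Rc::strong_count(&a) after each step.
fn counts_over_time() -> Vec<usize> {
    let mut seen = Vec::new();
    let a = Rc::new(Cons(5, Rc::new(Cons(10, Rc::new(Nil)))));
    seen.push(Rc::strong_count(&a)); // a alone: 1
    let b = Cons(3, Rc::clone(&a));
    seen.push(Rc::strong_count(&a)); // a + b: 2
    {
        let _c = Cons(4, Rc::clone(&a));
        seen.push(Rc::strong_count(&a)); // a + b + c: 3
    } // _c goes out of scope here
    seen.push(Rc::strong_count(&a)); // 2
    drop(b);
    seen.push(Rc::strong_count(&a)); // 1
    seen
}

fn main() {
    println!("{:?}", counts_over_time()); // [1, 2, 3, 2, 1]
}
```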
Why not just use references (&)? One might ask, "Can't b and c just hold a reference to a?" This is not always possible. With references, b and c would borrow a, and the borrow checker forbids a borrow from outliving the value it points to. As a result, b and c could not outlive a, which rules out, for example, returning them from the function in which a was created:
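```rust
struct Node<'a> {
    value: i32,
    next: Option<&'a Node<'a>>,
}

fn create_dangling<'a>() -> &'a Node<'a> {
    let tail = Node { value: 3, next: None };
    let list = Node {
        value: 1,
        next: Some(&tail),
    };
    // ❌ Compilation Error: returns a reference to data owned by this function
    &list
}
```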
Both tail and list are local variables that are dropped when the function returns, so the returned reference would dangle, and the compiler therefore rejects the function. Rust's ownership rules allow only one owner per value and check this at compile time. Rc moves the bookkeeping for multiple owners to runtime.
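For contrast, here is a sketch of the same list built with `Rc`. It assumes a `Node` type whose `next` field is `Option<Rc<Node>>`, a variation on the struct above. The function can now return the list, because the list co-owns its tail instead of borrowing it.

```rust
use std::rc::Rc;

struct Node {
    value: i32,
    next: Option<Rc<Node>>,
}

fn create_shared() -> Rc<Node> {
    let tail = Rc::new(Node { value: 3, next: None });
    let list = Rc::new(Node { value: 1, next: Some(Rc::clone(&tail)) });
    // The local handle `tail` is dropped here, but the node survives
    // because `list` still holds a strong reference to it.
    list
}

fn main() {
    let list = create_shared();
    let tail = list.next.as_ref().unwrap();
    println!("{} -> {}", list.value, tail.value); // 1 -> 3
    println!("tail strong count = {}", Rc::strong_count(tail)); // 1
}
```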
Rc<T> is ultimately used when you need to share the same data among multiple owners in a single-threaded environment, such as tree or graph structures in which one node is referenced by several parents.
```rust
use std::rc::Rc;

struct Node {
    value: i32,
    children: Vec<Rc<Node>>,
}

fn main() {
    let leaf = Rc::new(Node {
        value: 42,
        children: vec![],
    });
    let parent1 = Node {
        value: 1,
        children: vec![Rc::clone(&leaf)],
    };
    let parent2 = Node {
        value: 2,
        children: vec![Rc::clone(&leaf)],
    };
}
```
Unlike Box<T>, Rc<T> does not let you modify its inner value. Rc<T> implements Deref but not DerefMut, so dereferencing it only ever yields a shared &T. Its fields are also private, which leaves no other path to the value. This is the correct design. With a single owner (Box<T>), mutation is safe. With multiple owners, however, one owner mutating the value would change data that the other owners are reading through their own shared references.
Additionally, Rc<T> cannot be shared across threads and must be used within a single thread. The source code shows why: the strong and weak counters are stored in Cell, which is not atomic. Incrementing a count is a read followed by a separate write. If two threads cloned the same Rc<T> at the same time, both could read the same value and each write back value + 1, which would increase the count by 1 instead of 2.
```rust
use std::cell::Cell;

struct RcInner<T: ?Sized> {
    // Uses standard Cell (non-atomic)
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

// Conceptually, incrementing the count works like this:
fn main() {
    let strong: Cell<usize> = Cell::new(1);
    // Read the current value
    let current = strong.get();
    // Write back the new value (a separate step from the read)
    strong.set(current + 1);
}
```
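Because the compiler forbids sharing a `Cell` between threads, the race cannot be demonstrated directly. Instead, the sketch below replays one unlucky interleaving of two "threads" step by step on a single thread. `simulate_lost_update` is a hypothetical name, and the step order is hard-coded for illustration.

```rust
use std::cell::Cell;

// Single-threaded replay of two "threads" cloning the same Rc at once,
// with the reads and writes interleaved in the worst possible order.
fn simulate_lost_update() -> usize {
    let strong = Cell::new(1); // one owner to begin with
    let read_by_a = strong.get(); // "thread A" reads 1
    let read_by_b = strong.get(); // "thread B" reads 1 before A writes
    strong.set(read_by_a + 1); // A writes 2
    strong.set(read_by_b + 1); // B also writes 2: A's increment is lost
    strong.get()
}

fn main() {
    // Three owners would now exist, but the count says 2.
    println!("count = {}", simulate_lost_update()); // count = 2
}
```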
Rc<T> is never Send: As mentioned, the reference count is non-atomic, so it is restricted to single-thread use.
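Rc<T> is never Sync: if it were Sync, multiple threads could share a &Rc<T>, and any of them could call clone() through that reference, updating the same non-atomic counter concurrently. (In general, !Send does not imply !Sync; for Rc, both follow from the non-atomic count.)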
Arc<T> stands for Atomically Reference Counted smart pointer. It is a type that allows multiple owners to share the same data in a multi-threaded environment.
Arc keeps an atomic reference count alongside a single immutable value. As with Rc<T>, the value is dropped when the strong count drops to zero. Arc<T> is likewise backed by an inner struct:
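```rust
// Simplified from the standard library source (library/alloc/src/sync.rs)
#[repr(C)]
struct ArcInner<T: ?Sized> {
    // Uses atomic types (atomic operations); Atomic<usize> is AtomicUsize
    strong: Atomic<usize>,
    weak: Atomic<usize>,
    data: T,
}
```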
Arc<T> essentially exists because we wanted to use Rc<T> in multi-threaded environments. By accepting a slight performance overhead to introduce atomic counting, it enables safe multi-ownership across threads.
Arc<T> is used when multiple owners need to share the same data across multiple threads.
```rust
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4, 5]);
    let mut handles = vec![];
    for i in 0..3 {
        let shared = Arc::clone(&data);
        let handle = thread::spawn(move || {
            println!("Thread {i}: {:?}", shared);
        });
        handles.push(handle);
    }
    for h in handles {
        h.join().unwrap();
    }

    let foo = Arc::new(vec![1.0, 2.0, 3.0]);
    // The two syntaxes below are equivalent.
    let a = foo.clone();
    let b = Arc::clone(&foo);
    // a, b, and foo are all Arcs that point to the same memory location
}
```
Why can Arc be shared across threads? Its reference counts are atomic types, which are updated with CPU atomic operations. Like Cell, atomic types can be modified through a shared reference (&self): they look immutable from the outside but can change internally.
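What is an atomic operation? An atomic operation is indivisible: other threads see it either not yet started or fully finished, never halfway through. For a reference count, the important case is a read-modify-write such as fetch_add. Reading the count, adding one, and writing it back happen as a single step, so no other thread can slip in between the read and the write. The CPU enforces this with dedicated atomic instructions. As a result, two simultaneous Arc clones always raise the count by exactly 2, which is why Arc can be used safely in multi-threaded contexts.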
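The sketch below shows the effect with `AtomicUsize::fetch_add`, the kind of operation an atomic reference count relies on. Several threads increment one shared counter, and no increment is lost. `count_with_atomics` is a hypothetical helper. The sketch assumes `Ordering::Relaxed` is sufficient here, because only the final total matters and `join()` synchronizes with the spawned threads.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `threads` threads that each add 1 to a shared counter `per_thread` times.
fn count_with_atomics(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // fetch_add is one indivisible read-modify-write.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("total = {}", count_with_atomics(4, 1000)); // total = 4000
}
```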
You might assume Arc<T> is automatically Send and Sync, but that is not always the case.
Arc<T> is Send if T is Send + Sync: when you clone an Arc and move the clone to another thread, multiple threads gain access to the same T at the same time, so T must be Sync. Whichever thread drops the last Arc also drops T, so T must be Send as well. Requiring T: Sync rules out scenarios like the one below:
```rust
use std::cell::RefCell;
use std::sync::Arc;
use std::thread;

fn main() {
    // RefCell is !Sync (its borrow tracking is not thread-safe)
    let arc_a = Arc::new(RefCell::new(10));
    let arc_b = arc_a.clone();
    // ❌ Compilation Error: RefCell<i32> cannot be shared between threads safely,
    // so Arc<RefCell<i32>> is not Send
    thread::spawn(move || {
        // Attempting to modify data in Thread B
        *arc_b.borrow_mut() += 1;
    });
    // Attempting to modify data in Thread A
    *arc_a.borrow_mut() += 1;
}
```
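The compiler rejects this program. If it were allowed, both threads could update RefCell's non-atomic borrow flag at the same time, so both calls to borrow_mut could succeed. The two += 1 operations would then race on the same i32, producing a data race, which is undefined behavior.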
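One thread-safe way to get the intended behavior is to replace `RefCell` with `Mutex`, which is Send + Sync whenever its contents are Send. This is a minimal sketch, not the only option, and `increment_from_two_threads` is a hypothetical name:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn increment_from_two_threads() -> i32 {
    // Mutex<i32> is Send + Sync, so Arc<Mutex<i32>> can cross threads.
    let arc_a = Arc::new(Mutex::new(10));
    let arc_b = Arc::clone(&arc_a);
    let handle = thread::spawn(move || {
        // Thread B: lock, modify, unlock (the guard drops at end of statement)
        *arc_b.lock().unwrap() += 1;
    });
    // Thread A: the lock serializes access, so neither update is lost
    *arc_a.lock().unwrap() += 1;
    handle.join().unwrap();
    let result = *arc_a.lock().unwrap();
    result
}

fn main() {
    println!("{}", increment_from_two_threads()); // 12
}
```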
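Arc<T> is Sync if T is Send + Sync: Sync means a &Arc<T> can be shared between threads. Through that shared reference, another thread can read &T, which requires T: Sync. It can also call clone() to obtain its own Arc<T>, and if that clone ends up being the last owner, T is dropped on that thread. For example, if Thread A creates an Arc<T> and Thread B drops the last clone, ownership of T has effectively moved to Thread B, so T must also be Send.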
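The "dropped on another thread" case can be made visible. In the sketch below, `Tracked` is a hypothetical type whose `Drop` implementation records the name of the thread it runs on. The main thread gives up its `Arc` first, so the last owner, and therefore the destructor, runs on a spawned thread named "worker".

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical type that records which thread ran its destructor.
struct Tracked {
    dropped_on: Arc<Mutex<Option<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        let name = thread::current().name().unwrap_or("unnamed").to_string();
        *self.dropped_on.lock().unwrap() = Some(name);
    }
}

fn drop_thread_name() -> Option<String> {
    let record = Arc::new(Mutex::new(None));
    let a = Arc::new(Tracked { dropped_on: Arc::clone(&record) });
    let b = Arc::clone(&a);
    drop(a); // strong count 2 -> 1; nothing is dropped yet
    thread::Builder::new()
        .name("worker".to_string())
        .spawn(move || drop(b)) // last owner: Tracked is dropped on this thread
        .unwrap()
        .join()
        .unwrap();
    let result = record.lock().unwrap().clone();
    result
}

fn main() {
    println!("{:?}", drop_thread_name()); // Some("worker")
}
```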
Conclusion: You can create an Arc<T> for any T, but sharing it across threads requires the inner type T to be Send + Sync. When this condition is met, Arc<T> is itself Send + Sync.