Programming Included

Rust vs Modern C++ Part 1

The Basics: Datastructures, Lambdas, and Heap

Charles Chen | 2023-01-18 00:16 PST


Yet Another Rust Introduction

These past few months, we have seen a rise in the usage of Rust throughout many different organizations. Rust was originally designed by Graydon Hoare as a personal project starting in 2006. It was later sponsored by Mozilla Research, which announced it publicly in 2010. Much has been said about the benefits of Rust already, but to name a few, Rust:

  • Guarantees memory safety outside of unsafe blocks.
  • Provides high-level, zero-cost abstractions that enable fast performance.
  • Is almost as fast as C and can even beat out C++ (source).

Several well-known, battle-tested teams are switching over to Rust:

The ecosystem is also vast, fast, and exciting. Below are a few crates I've used in the past:

  • Pola.rs is faster than pandas for data science.
  • PyO3 enables interop with Python.
  • cxx and bindgen both enable C++ interop.
  • bevy, a data-driven game engine written in Rust.

The list continues to grow, and many communities continue to adopt Rust. But is all this news just another bandwagon, or is there something more?

Comparing Rust to Modern C++

My goal in today's post is to compare Rust to Modern C++ in a more pragmatic setting. By Modern C++ I mean:

  • C++11
  • C++14

More specifically, I mean the coding techniques described by Scott Meyers in his series of books, most notably Effective Modern C++. I myself had the pleasure of taking the official C++ training course under Jon Kalb (go east const!). (Jon, if you are reading this, I have much appreciation for C++, and I highly recommend your course.)

My thesis:

Modern C++ is arguably filled with gotchas, edge cases, and workarounds for problems stemming from C++'s own limitations.

As a result, Rust enables faster design cycles and more efficient code. It is also arguably an easier language to maintain, even with its current rough edges in certain cases.

Let's begin. I'll be highlighting some common approaches in Modern C++ and arguing why Rust could be better (or worse).

Overview of Articles Forthcoming

I would like to start bottom-up. First, I compare syntax and its day-to-day impact on our sanity. Then I move up to higher-level concepts like ownership and the borrow checker. Finally, I end with specific use cases that may not apply to everyone. This will be a long ride, so buckle up!

The road map:

  • The Basics: Datastructures, Lambdas, and Heap
  • Traits and Ownership
  • Macros: Metaprogramming Made Simple
  • Cargo: The Rust Packages Ecosystem
  • C Interop and Unsafe Rust: The Dark Arts

Initializing your Datastructures

List Initializers

Initializer lists are an interesting construct. Imagine you are trying to hard-code the contents of a vector in C++. std::vector is a class that allocates its storage on the heap at runtime. So how does a list of values written in the source code end up inside it?

Enter std::initializer_list (C++11), a lightweight proxy type. It gives a constructor read-only access to a compiler-generated array holding the listed values, and the constructor then copies those values into its own storage at runtime.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

// List initialization: the braced values reach the constructor
// as a std::initializer_list.
vector<int32_t> vec = {1, 2, 3};

map<string, string> myMap = {
    {"an example", "pair"},
    {"yet another", "hi"}
};

Initializer lists look great until we get into edge cases. C++11 turns a braced list into a std::initializer_list, so any class can declare a constructor that takes one, whether it comes from the standard library or from your own code. Brace-initialization strongly prefers that constructor during overload resolution. Combined with other constructors and implicit conversions, this leads to surprising results:

// A class that accepts an initializer list as a convenient shorthand.
// Which constructor is called for `Example({1, 2, 3})`?
class Example {
public:
    vector<int32_t> v;
    Example(initializer_list<int32_t> l) : v(l) { cout << "called?" << endl; }
    Example(vector<int32_t> vec) : v(vec) {}
};

Realistically, class designers should avoid overlapping constructors like these and/or mark constructors explicit. The point of this exercise is that an initializer list is an ordinary type that takes part in overload resolution. This creates ambiguity that the compiler silently resolves through implicit conversions instead of reporting an error.

In this case, the initializer_list constructor is called. Converting {1, 2, 3} to std::initializer_list<int32_t> is an exact match, while converting it to std::vector<int32_t> requires a user-defined conversion. There are more gotchas with list initialization and initializer-list constructors:

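// Brace-initialization can make the inner braces optional:
auto ex1 = Example{1, 2, 3};    // Calls Example(initializer_list<int32_t>), like Example({1, 2, 3})
// This also calls the initializer_list constructor:
auto ex2 = Example{{1, 2, 3}};

// However, we can run into ambiguity.
class ExampleB {
public:
    vector<int32_t> v;
    ExampleB(initializer_list<int32_t> l) : v(l) { }
    ExampleB(float a, float b) {}
};
auto exB1 = ExampleB(1.1, 2.2);    // Parentheses: calls ExampleB(float, float)
// auto exB2 = ExampleB{1.1, 2.2}; // Braces: the initializer_list constructor is preferred,
//                                 // and narrowing double -> int32_t makes this a compile error.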

Rust Macros

Rust has no equivalent of the initializer-list proxy type. Instead, it offers native and macro-based solutions that avoid these edge cases and ambiguities:

// A std macro sidesteps all the complexity of the initializer-list proxy type.
let c: Vec<i32> = vec![1, 2, 3];
// `vec!` expands to roughly `<[_]>::into_vec(Box::new([1, 2, 3]))`: the values are placed
// in a heap-allocated array that becomes the Vec's buffer, with no per-element inserts.

use std::collections::HashMap;
// std has no HashMap literal macro, but since Rust 1.56 HashMap implements
// From<[(K, V); N]>, so an array of pairs works:
let m = HashMap::from([
    ("hello", "world"),
    ("good", "day")
]);

Rust's macro system works on the syntax of your code at compile time and is extremely flexible. We will get more into this in the macros part of this series. With an extra crate, you can have even higher-level syntax:

use std::collections::HashMap;
use common_macros::hash_map;
let m = hash_map!{
    "hello" => "world",
    "good" => "day"
};

With Rust's powerful packaging and build system, such extra crates are available with a single command (e.g., cargo add common_macros). More on this in the Cargo part of this series. Rust bypasses the issues with initializer lists by doing away with proxy types. It relies instead on a powerful macro system to do the heavy lifting.
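The sketch below makes "the macro system does the heavy lifting" concrete. It defines a declarative macro that provides map-literal syntax. The name map_of! is hypothetical: it is not part of std or of common_macros. At compile time, it simply expands into the HashMap::from call shown earlier.

```rust
/// Hypothetical `map_of!` macro (not from std or any crate): a declarative macro that
/// expands at compile time into `HashMap::from([(k1, v1), (k2, v2), ...])`.
macro_rules! map_of {
    ($($key:expr => $value:expr),* $(,)?) => {
        ::std::collections::HashMap::from([$(($key, $value)),*])
    };
}

fn main() {
    // No proxy type and no overload resolution: the macro rewrites this
    // into an array of tuples before type checking.
    let m = map_of! {
        "hello" => "world",
        "good" => "day",
    };
    assert_eq!(m["hello"], "world");
    println!("{} entries", m.len()); // prints "2 entries"
}
```

Because the expansion is an ordinary function call, type errors point at the pairs you wrote rather than at a hidden constructor overload.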

Iterating and Processing

C++ and Rust both use iterators to traverse data. The concepts are similar, but Rust makes certain cases easier. Here's how C++ does it:

using namespace std;
// Modern C++ (C++11) has range-based for loops.
vector<int32_t> vec = {1, 2, 3};
// Read-only iteration by const reference.
for (auto const& v : vec) {
    cout << v << endl;
}

// Iteration by non-const reference; `v` could be modified here.
for (auto& v : vec) {
    cout << v << endl;
}

Rust can do the same. Note that Rust assumes immutability unless told otherwise (&mut must be requested explicitly). Rust also has built-in ranges for counting loops:

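let mut vec = vec![1, 2, 3];

// Shared borrow: `v` is a `&i32`.
for v in &vec {
    println!("{}", v);
}

// Mutable borrow: `v` is a `&mut i32`, so elements can be modified in place.
for v in &mut vec {
    *v += 1;
    println!("{}", v);
}

// Consuming iteration: `vec` is moved into the loop and cannot be used afterwards.
for v in vec {
    println!("{}", v);
}

let n = 3;
for i in 0..n {
    println!("{}: C++11/14 needs a library like Boost for this syntax.", i);
}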
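Ranges go a bit further than the counting loop above. The standard library also supports inclusive ranges (..=), reversal, stepping, and enumerate for index-plus-value loops. In this small sketch, the helper name range_examples is only for illustration:

```rust
/// Collects a few range forms so their contents can be compared side by side.
fn range_examples(n: i32) -> (Vec<i32>, Vec<i32>, Vec<i32>, Vec<i32>) {
    let exclusive: Vec<i32> = (0..n).collect();           // 0, 1, ..., n - 1
    let inclusive: Vec<i32> = (0..=n).collect();          // 0, 1, ..., n
    let reversed: Vec<i32> = (0..n).rev().collect();      // n - 1 down to 0
    let stepped: Vec<i32> = (0..=n).step_by(2).collect(); // every other value
    (exclusive, inclusive, reversed, stepped)
}

fn main() {
    let (ex, inc, rev, step) = range_examples(3);
    println!("{:?} {:?} {:?} {:?}", ex, inc, rev, step);
    // prints "[0, 1, 2] [0, 1, 2, 3] [2, 1, 0] [0, 2]"

    // `enumerate` pairs each element with its index, covering the classic indexed loop.
    for (i, v) in ["a", "b", "c"].iter().enumerate() {
        println!("{}: {}", i, v);
    }
}
```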

Now let's say we want to apply common functional techniques to our data. In C++11/14, that means calling separate algorithms, each of which takes a pair of iterators:

// Suppose we only want the sum of numbers at or above some tolerance T.
// We implement an immutable variant where the original vector is not modified.
// Requires <algorithm> (copy_if), <iterator> (back_inserter), <numeric> (accumulate).
vector<int> vec = {1, 3, 5, 7};
int T = 5;
vector<int> buffer;

copy_if(vec.begin(), vec.end(), back_inserter(buffer), [T](int x) { return x >= T; });
auto result = accumulate(buffer.begin(), buffer.end(), 0);  // 5 + 7 = 12

You can save a character by using std::begin(vec) and std::end(vec), if you really want to.

Let's take a look at how Rust borrows from functional languages:

let v = vec![1, 3, 5, 7];
let t = 5;
let result: i32 = v.iter()
                   .filter(|&&x| x >= t) // Lazy adapter: no intermediate vector, no copy of `v`.
                   .sum();               // 5 + 7 = 12

There is significantly less code to read, and the chain reads like a description of what it does.

Now, one may cry out, "but we don't necessarily want to chain multiple procedures!" or perhaps, "we can create a helper function anyway!" As for the latter, I agree: in C++, we write helper functions. But going back to my thesis, these little quirks add up to large maintenance costs. That effort is spent not on the problem we are trying to solve but on the limitations of the language.

As for the former, chaining several of these steps is a common pattern in functional languages and in C++ code alike. These iterator-style algorithms are also often highly parallelizable.
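On the helper-function point, a Rust helper can return a lazy iterator, so callers keep chaining instead of receiving a filled buffer. Here is a minimal sketch, where at_least is a hypothetical helper name:

```rust
/// Hypothetical helper: yields the elements of `v` that are >= `t`.
/// Returning `impl Iterator` keeps the pipeline lazy, so callers can keep
/// chaining adapters onto it without any intermediate Vec.
fn at_least<'a>(v: &'a [i32], t: i32) -> impl Iterator<Item = i32> + 'a {
    v.iter().copied().filter(move |&x| x >= t)
}

fn main() {
    let v = vec![1, 3, 5, 7];

    // One helper, three different pipelines.
    let sum: i32 = at_least(&v, 5).sum();
    let squares: Vec<i32> = at_least(&v, 3).map(|x| x * x).collect();
    let product: i32 = at_least(&v, 3).product();

    println!("{} {:?} {}", sum, squares, product); // prints "12 [9, 25, 49] 105"
}
```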

Zero Cost Abstractions

I would also like to note that Rust's implementation adds no overhead here. Rust is strict about access. iter() yields shared references, so nothing can be mutated unless iter_mut() is used. No copy is made unless you ask for one (for example with .cloned() or .collect()). In this case, filter() returns a lazy iterator adapter over the existing vector, and sum() consumes it in a single pass without allocating an intermediate buffer.

Rust's iterator adapters are designed to be zero-cost: after inlining, the compiler can typically optimize a chain like this into code comparable to a hand-written loop. With C++11/14, by contrast, you have to work to eliminate the buffer variable to reduce memory usage, reason about lvalues and rvalues, and so forth.
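As a sanity check on the zero-cost claim, this sketch compares the iterator chain with an equivalent hand-written loop. It also includes a small probe showing that adapters are lazy. The function names are illustrative. Whether both versions compile to identical machine code depends on the optimizer, so treat that part as typical rather than guaranteed.

```rust
/// Iterator chain: `filter` is a lazy adapter, and `sum` drives it in a single pass.
fn sum_at_least_iter(v: &[i32], t: i32) -> i32 {
    v.iter().filter(|&&x| x >= t).sum()
}

/// The same computation as a hand-written loop, also without an intermediate buffer.
fn sum_at_least_loop(v: &[i32], t: i32) -> i32 {
    let mut total = 0;
    for &x in v {
        if x >= t {
            total += x;
        }
    }
    total
}

/// Laziness probe: counts how many elements the predicate actually inspects
/// when we only ask for the first match.
fn predicate_calls_for_first_match(v: &[i32], t: i32) -> usize {
    let mut calls = 0;
    let _first = v.iter().find(|&&x| {
        calls += 1;
        x >= t
    });
    calls
}

fn main() {
    let v = [1, 3, 5, 7];
    println!("{} {}", sum_at_least_iter(&v, 5), sum_at_least_loop(&v, 5)); // prints "12 12"
    // `find` stops at 5, so 7 is never inspected.
    println!("{}", predicate_calls_for_first_match(&v, 5)); // prints "3"
}
```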

Lambdas

C++ lambdas require explicit captures:

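auto y = 1;
// Capture everything the body uses, by reference.
auto byRefAll = [&](auto x) { return x + y + 1; };
// Capture everything the body uses, by value.
auto byValAll = [=](auto x) { return x + y + 1; };

// Capture only `y`, by value.
auto byValY = [y](auto x) { return x + y + 1; };

// Capture only `y`, by reference.
auto byRefY = [&y](auto x) { return x + y + 1; };
// Note: each lambda has its own unique type, so one cannot be assigned to another.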

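In Rust, on the other hand, captures are inferred from how the closure body uses each variable. A variable that is only read is captured by shared reference, and one that is modified is captured by mutable reference. A variable that must be owned (for example, moved out of) is captured by value. This avoids copies by default, and the move keyword forces every capture to be by value.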

let y = 1;
// `y` is only read, so the closure captures it by shared (immutable) reference.
let f = |x: i32| x + y + 1;
println!("{}", f(1)); // 3

// `t` is modified, so it is captured by mutable reference, and the closure
// itself must be declared `mut` in order to be called.
let mut t = 1;
let mut g = |x: i32| t += x;

// We can even go one layer deep, and the compiler will enforce `mut` again:
// calling `g` mutates it, so `m` captures `g` mutably.
let mut m = || g(2);
m();
println!("{}", t); // 3

// `move` forces by-value capture. `t` is `Copy`, so the closure gets its own copy.
// This is rarely used trivially like this; it matters when returning closures. See more below.
let mut h = move |x: i32| { t += x; t };
println!("{} {}", h(10), t); // 13 3: the original `t` is not modified

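Though move is available as a manual option, I have rarely seen it used trivially like this. Instead, it is mainly needed when a closure must outlive the scope it was created in. Examples include returning closures from functions (see below) and handing them to another thread.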
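Here is a minimal sketch of the two situations where move is genuinely needed: handing data to another thread, and returning a closure that owns its state. The function names sum_on_thread and make_counter are only illustrative.

```rust
use std::thread;

/// `thread::spawn` requires a closure that owns everything it uses
/// (a `'static` bound), so `move` transfers ownership of `data` into it.
fn sum_on_thread(data: Vec<i32>) -> i32 {
    // Without `move`, the closure would only borrow `data`, and the compiler
    // would reject it because the thread may outlive this function (E0373).
    let handle = thread::spawn(move || data.iter().sum::<i32>());
    handle.join().unwrap()
}

/// Returning a closure: `count` lives in this stack frame, so the closure
/// must take ownership of it with `move` in order to outlive the call.
fn make_counter() -> impl FnMut() -> u32 {
    let mut count = 0;
    move || {
        count += 1;
        count
    }
}

fn main() {
    println!("{}", sum_on_thread(vec![1, 2, 3])); // prints "6"

    let mut next = make_counter();
    next();
    println!("{}", next()); // prints "2"
}
```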

There are more capture rules that I am not covering here, but you can find out more here. One of them is a unique edge case, covered next.

Might I also add how much cleaner Rust's closure notation is? A C++ lambda needs a capture list and braces. It also needs an explicit return statement ending in a semicolon, followed by another semicolon to end the declaration. All that for a single addition!

Unique Immutable Borrows in Captures

When dealing with const-ness in C++ or mutability in Rust, each language has some unique edge cases to address. The mechanisms driving the scenarios below are different, but I group them together because they show:

  • how const/mut properties are propagated (or not).
  • how each compiler handles const/mut.

I see const and mut as two sides of the same coin. const exists because everything is mutable by default in C++, while mut exists because everything is immutable by default in Rust.

In C++, you can use a plain C-style cast to strip const from a value without const_cast:

int const i = 1;
// Mainstream compilers accept this without an error, because a C-style cast
// may perform a const_cast implicitly.
// However, writing through `ptr` is undefined behavior: `i` is a const object,
// which the compiler may place in read-only memory or fold into constants.
int *ptr = (int*)&i;
*ptr += 2;

C++ allows you to do the above at your own discretion because an explicit cast was written, even though the result is undefined behavior.

Please note that I understand unsafe Rust also lets you cast without constraint. However, the C++ compiler does not attempt to warn about or prevent this const edge case. Rust requires an unsafe block before you can write through such a pointer. C++, by contrast, provides const_cast but still lets you cast away const with a plain C-style cast and no error.
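For comparison, here is a minimal sketch of what Rust permits. Creating a raw pointer from a reference is safe, but dereferencing it requires unsafe, and a sound write needs a pointer derived from &mut in the first place. The function names are illustrative. The sketch deliberately does not write through a pointer derived from a shared reference, since that would be undefined behavior.

```rust
/// Casting a shared reference to a raw pointer is allowed in safe code,
/// but dereferencing the pointer is not: it requires an `unsafe` block.
fn read_via_raw(i: &i32) -> i32 {
    let p = i as *const i32;
    // let v = *p; // error[E0133]: dereference of raw pointer is unsafe
    unsafe { *p } // sound: `p` comes from a live, valid reference
}

/// To write through a raw pointer soundly, it must be derived from a `&mut`,
/// which in turn requires a `mut` binding. There is no sanctioned way to cast
/// away immutability: writing through a pointer derived from `&i32` is
/// undefined behavior in Rust, just as it is for a const object in C++.
fn add_two_via_raw(j: &mut i32) {
    let q = j as *mut i32;
    unsafe { *q += 2 };
}

fn main() {
    let i = 1;
    println!("{}", read_via_raw(&i)); // prints "1"

    let mut j = 1;
    add_two_via_raw(&mut j);
    println!("{}", j); // prints "3"
}
```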

In Rust, the following can occur with mut (as shown in the Rust Reference):

let mut b = false;
let x = &mut b;
{
    let mut c = || { *x = true; };
    // The following line is an error:
    // let y = &x;
    c();
}
let z = &x;

Here the closure cannot borrow x mutably, because x itself is not declared mut. It cannot simply borrow x immutably either: a & &mut reference might not be unique, so it cannot safely be used to modify *x.

Thankfully, Rust has us covered. The compiler uses a special kind of borrow, called a unique immutable borrow, which only occurs in this situation and cannot be written explicitly in user code. It borrows x immutably, yet the borrow checker enforces that it is unique, just like a mutable borrow.

Rust prevents us from making mistakes by enforcing compile-time checks, even going so far as to introduce a special kind of borrow for this edge case! Furthermore, the Rust compiler attaches helpful error messages in case new users are unsure about these edge cases:

error[E0501]: cannot borrow `x` as immutable because previous closure requires unique access
  --> main.rs:16:17
   |
14 |         let mut c = || { *x = true; };
   |                     --   -- first borrow occurs due to use of `x` in closure
   |                     |
   |                     closure construction occurs here
15 |         // The following line is an error:
16 |         let y = &x;
   |                 ^^ second borrow occurs here
17 |         c();
   |         - first borrow later used here
error: aborting due to previous error; 2 warnings emitted
For more information about this error, try `rustc --explain E0501`.

C++, true to its philosophy, lets you go ahead with the cast and modify a const object rather than catching it at compile time.

Returning Lambdas on the Stack

This is perhaps one area of basic Rust that feels less intuitive.

C++ Functional Types

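In C++, the standard type-erasing wrapper std::function can hold any callable with a matching signature, including lambdas. Each lambda is itself an object of a compiler-generated class (a closure type) with its own unique type, but std::function hides that difference.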

std::function<int(int)> ret_func(int a) {
  return [a](int b) { return b / a; };
}

cout << ret_func(1)(2) << endl;

Because std::function erases the concrete closure type, different lambdas with the same call signature can be returned from different branches:

std::function<int(int)> ret_func(int a) {
  if (a == 0) {
    return [a](int b) { return 0; };
  } else {
    return [a](int b) { return b / a; };
  }
}

All works as expected.

Rust impl Traits

Without going too deep into traits, let's see how they are used for returning lambdas.

As in C++, each Rust closure has its own unique, unnameable type. What closures with the same signature share is a trait (Fn, FnMut, or FnOnce). Rust has no implicit std::function-style wrapper, so you refer to closures through these traits, for example with impl Fn(i32) -> i32 in return position. Even then, impl Fn stands for exactly one concrete closure type per function. Traits can be thought of as a more powerful interface: they define constraints that concrete types implement. We will discuss them in more detail in future parts.

fn ret_func(a: i32) -> impl Fn(i32) -> i32 {
    move |b| b / a
}

ret_func(2)(3);
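This small sketch shows the "unique type, shared trait" point in action. The helper names apply_generic and apply_dyn are hypothetical. Two closures with different types can both be passed where Fn(i32) -> i32 is expected, but storing them together requires boxing them as trait objects.

```rust
/// Generic version: monomorphized once per closure type (static dispatch).
fn apply_generic<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(x)
}

/// Trait-object version: one function body, dynamic dispatch through `&dyn Fn`.
fn apply_dyn(f: &dyn Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    let offset = 10;
    let add = move |x: i32| x + offset;
    let double = |x: i32| x * 2;

    // `add` and `double` have different, unnameable types, but both implement
    // `Fn(i32) -> i32`. Both are also `Copy`, so they can be reused below.
    println!("{} {}", apply_generic(add, 1), apply_generic(double, 1)); // prints "11 2"

    // Storing them together requires erasing their types behind a trait object,
    // roughly the analogue of `std::vector<std::function<int(int)>>`.
    let mut fs: Vec<Box<dyn Fn(i32) -> i32>> = Vec::new();
    fs.push(Box::new(add));
    fs.push(Box::new(double));
    let results: Vec<i32> = fs.iter().map(|f| apply_dyn(&**f, 5)).collect(); // &Box<dyn Fn> -> &dyn Fn
    println!("{:?}", results); // prints "[15, 10]"
}
```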

The major limitation is that closures share a trait, not a type. impl Trait in return position (stabilized in Rust 1.26) must resolve to a single concrete type. As a result, a function cannot return different closures from different branches:

// Does not compile: each branch returns a different closure type (see the error below).
fn ret_func(a: i32) -> impl Fn(i32) -> i32 {
    if a == 0 {
        return move |b| 0
    } else {
        return move |b| b / a
    }
}

ret_func(2)(3);

The error from Rust also mentions this and recommends an alternative approach:

1 | fn ret_func(a: i32) -> impl Fn(i32) -> i32 {
  |                        ------------------- expected `[closure@src/main.rs:3:16: 3:24]` because of return type
2 |     if a == 0 {
3 |         return move |b| 0
  |                -------- the expected closure
4 |     } else {
5 |         return move |b| b / a
  |                ^^^^^^^^^^^^^^ expected closure, found a different closure
  |
  = note: expected closure `[closure@src/main.rs:3:16: 3:24]`
             found closure `[closure@src/main.rs:5:16: 5:24]`
  = note: no two closures, even if identical, have the same type
  = help: consider boxing your closure and/or using it as a trait object

Instead, we have to put the closure on the heap and return it as a boxed trait object:

fn ret_func(a: i32) -> Box<dyn Fn(i32) -> i32> {
    if a == 0 {
        return Box::new(move |_b| 0)
    } else {
        return Box::new(move |b| b / a)
    }
}
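Boxing is not the only way out. When the branching can move inside the closure, a single closure type suffices, and impl Fn works without a heap allocation. Here is a minimal sketch, with ret_func_single and ret_func_boxed as illustrative names:

```rust
/// One closure with the branch inside: a single concrete type, so
/// `impl Fn` works and no heap allocation is needed.
fn ret_func_single(a: i32) -> impl Fn(i32) -> i32 {
    move |b| if a == 0 { 0 } else { b / a }
}

/// Boxed trait objects, as in the text: each branch builds its own closure
/// type, and both are erased behind `dyn Fn`.
fn ret_func_boxed(a: i32) -> Box<dyn Fn(i32) -> i32> {
    if a == 0 {
        Box::new(|_b: i32| 0)
    } else {
        Box::new(move |b: i32| b / a)
    }
}

fn main() {
    println!("{} {}", ret_func_single(2)(7), ret_func_single(0)(7)); // prints "3 0"
    println!("{} {}", ret_func_boxed(2)(7), ret_func_boxed(0)(7)); // prints "3 0"
}
```

The trade-off is that the single-closure version re-checks a == 0 on every call. The boxed version instead pays for an allocation and dynamic dispatch.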

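Of course, thanks to Rust's ownership rules, this heap allocation isn't dangerous: the Box is freed automatically once the closure is no longer used. However, we do lose some performance compared to a stack-allocated closure, because we pay for an allocation and for dynamic dispatch through a vtable. (To be fair, std::function in C++ also type-erases the lambda and may heap-allocate larger closures.) This is a great segue into our final section for this part: the heap via smart pointers!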

The implications of traits and their implementation details are ripe for discussion and potential contribution. Here is one example for impl Trait: Meta tracking issue for impl Trait.

Addressing Memory on the Heap

C++ and Rust have similar ways of being "smart" about tracking memory on the heap:

Smart Pointers

In Modern C++, you almost never call malloc and free (or raw new and delete) directly. Instead, you use a smart pointer:

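auto iptr = shared_ptr<int>(new int{});

// Many Modern C++ guidelines (e.g., Effective Modern C++, Item 21) recommend
// make_shared over `new`: it allocates the object and its control block together
// and keeps the "dangerous" keyword out of your code altogether.
auto iptr2 = std::make_shared<int>();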

shared_ptr is commonly used because multiple parts of a program often want to read the same heap object. The object is released only when the last shared_ptr referring to it is destroyed. unique_ptr is used for single ownership.

Arc and Boxes

Rust, on the other hand, is slightly different and offers many more options, which we will not get into today. Here are two examples:

use std::sync::Arc;

// `Box` is like `unique_ptr`: a single owner, with ownership checked at compile time.
// There is no reference count at all; the value is freed when the Box goes out of scope.
let b_int: Box<i32> = Box::new(3);

// `Arc` (atomically reference counted) is like `shared_ptr`: the count is tracked at runtime.
// Interestingly, most tutorials teach `Rc` first, the non-thread-safe counterpart to `Arc`.
let a_int: Arc<i32> = Arc::new(3);

Here is a nice discussion thread. In short:

  • std::unique_ptr<T> is like Option<Box<T>>
  • std::shared_ptr<const T> is like Option<Arc<T>>
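The Option part of that analogy costs nothing. In this sketch, note that the compiler guarantees Option<Box<T>> is the same size as Box<T>, because None is represented by the null pointer. Option::take roughly gives the "move out and leave null behind" behavior of moving from a unique_ptr. The helper names below are illustrative:

```rust
/// `Option<Box<T>>` plays the role of a nullable `unique_ptr<T>`: `None` stands
/// in for `nullptr`, and the null-pointer optimization means the `Option` adds no space.
fn sizes_match() -> bool {
    std::mem::size_of::<Option<Box<i32>>>() == std::mem::size_of::<Box<i32>>()
}

/// Rough analogue of moving out of a `unique_ptr` (leaving it null):
/// `Option::take` hands over the Box and leaves `None` behind.
fn release(slot: &mut Option<Box<i32>>) -> Option<Box<i32>> {
    slot.take()
}

fn main() {
    println!("{}", sizes_match()); // prints "true"

    let mut p: Option<Box<i32>> = Some(Box::new(3));
    let q = release(&mut p);
    println!("{:?} {:?}", p, q); // prints "None Some(3)"
}
```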

If we want to get technical:

  • Arc is the thread-safe equivalent of Rc

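But there are some questions as to whether shared_ptr is truly atomic, as discussed in the thread above (blog here). Its reference count is updated atomically, as with Arc. However, concurrent reads and writes to the same shared_ptr instance are not synchronized. That is the gap atomic_shared_ptr fills, and C++20 standardized it as std::atomic<std::shared_ptr<T>>. In safe Rust, the borrow checker rules out unsynchronized mutation of a single Arc from multiple threads in the first place.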

As a consequence of the borrow checker, Box can act like a unique_ptr whenever single ownership can be proven at compile time. Arc and Rc are necessary when ownership constraints can no longer be proven at compile time. They also move some risk to runtime. Reference cycles between them leak memory, and when they are combined with RefCell, borrow violations surface as runtime panics instead of compile errors.
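Here is a compact sketch of these trade-offs, using only std types (the helper functions are illustrative). Box ownership is checked at compile time, and Rc tracks a count at runtime. RefCell reports a conflicting borrow at runtime, and Arc lets clones cross thread boundaries.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Box: one owner, no reference count. Moving the Box transfers ownership,
/// and using the old binding afterwards is a compile-time error.
fn box_move() -> i32 {
    let b = Box::new(3);
    let moved = b;
    // println!("{}", b); // error[E0382]: borrow of moved value: `b`
    *moved
}

/// Rc: shared ownership with a reference count tracked at runtime.
/// Returns the strong count while shared and after one owner is dropped.
fn rc_counts() -> (usize, usize) {
    let a = Rc::new(3);
    let b = Rc::clone(&a);
    let while_shared = Rc::strong_count(&a); // 2
    drop(b);
    (while_shared, Rc::strong_count(&a)) // (2, 1)
}

/// RefCell moves the "one mutable borrow at a time" rule to runtime.
/// `try_borrow_mut` reports the conflict; `borrow_mut` would panic instead.
fn refcell_conflict_detected() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);
    let guard = cell.borrow_mut();
    let conflict = cell.try_borrow_mut().is_err();
    drop(guard);
    conflict
}

/// Arc: like Rc, but the count is updated atomically, so clones can be
/// moved into other threads.
fn arc_across_threads() -> i32 {
    let data = Arc::new(vec![1, 2, 3]);
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let d = Arc::clone(&data);
            thread::spawn(move || d.iter().sum::<i32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", box_move()); // prints "3"
    println!("{:?}", rc_counts()); // prints "(2, 1)"
    println!("{}", refcell_conflict_detected()); // prints "true"
    println!("{}", arc_across_threads()); // prints "12"
}
```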

The Rust book does a much better job of describing this than I can, so I will leave a link: Rc and Box

Smart pointers in Rust are arguably both powerful (providing true multithreading support) and complicated (weak references, cells, interior mutability, etc.). The book covers most edge cases. We will cover more in the borrow checker sections.

Conclusion

We've looked at several Modern C++ constructs and their Rust counterparts. Below are my summaries of each section and which language is easier to code in:

  • Datastructures: Rust
    • Rust's macros provide much more flexibility, and the result is arguably much cleaner.
  • Iteration: Rust
    • Once more, Rust wins on convenience and readability while remaining just as fast.
  • Lambdas: Rust
    • Though a bit rough around the edges when returning lambdas, the rest is cleaner and tighter.
    • Boxes solve most edge cases surrounding lambda returns. Anything more complex, and you probably want a dedicated type.
  • Smart Pointers: C++
    • I have to give C++ this point. C++ is so much less complicated. Building a mutable tree in Rust is difficult.

There you have it! In my opinion, Rust wins three of the four categories in this first part of the series. Smart pointers are definitely a bit rough around the edges, given the variety of ways to manage memory. We will explore them more in the borrow checker part of this series.

Until next time, peace!
