2021-02-27

c/c++

cpp different levels to understand move and rvalue

Concepts such as move, rvalue, and forward in C++ are important, yet they remain ambiguous for most people. Here we try to understand those concepts at different levels.

if we can take the addresses

Although there are many value categories (lvalue, rvalue, xvalue, prvalue, and so on; refer to this), we start from the general lvalue and rvalue.

Simply speaking, a value whose address we can take is an lvalue (historically, something that can appear on the left side of an assignment), and a value whose address we cannot take is an rvalue (historically, something that appears only on the right side). This article provides a good summary: one principle is whether we can take its address, and another principle is whether the object can be moved from. This is the detailed documentation.

How do we express such an object in the parameter list of a function call? For an lvalue, we can use a pointer to the original object or a reference such as TypeName&. For an rvalue, C++11 adds a new notation: if X is any type, then X&& is called an rvalue reference to X. For better distinction, the ordinary reference X& is now also called an lvalue reference.

Before the rvalue reference

Lvalues and rvalues existed before the introduction of the rvalue reference. Here are two typical examples:

std::string name0 = "rvalue";
const std::string& name = "rvalue";

For the second line, if we write std::string& name = "rvalue" instead, there is an error:

non-const lvalue reference to type 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >') cannot bind to a value of unrelated type
      'const char [7]'
  std::string& name = "rvalue";

This example shows that a const lvalue reference can bind to either an lvalue or an rvalue. In the second line a temporary std::string is constructed from the string literal, and the reference keeps that temporary alive without a further copy. Avoiding copies is the whole point of references.

Before the && notation was introduced, things could still work as they were. But it is human nature not to be satisfied with the current situation and to want to optimize it. So we might ask: what is the limitation of the const lvalue reference? Obviously, because of the const, we cannot actually modify the contents of the referenced object. In the previous example, if we try to do:

const std::string& name = "rvalue";
name[1] = 'a';

we get this compile error:

cannot assign to return value because function 'operator[]' returns a const value

So in this case, we need to copy the object anyway if we want to update its contents. This is inflexible in some situations. For example, if the original object is a temporary one, copying its contents and then updating the new object makes the copy unnecessary. If we just want to keep the inner contents of the original object and replace its old shell/container (the old object), we need a new abstraction that can represent the inner things in the container. As the metaphor in this article puts it, “when you sell your old property and move to a new house, you do not have to toss all the furniture”. That is the motivating use case for the rvalue reference and move semantics.

basic move semantics

The std::move function does not move anything; it just casts its argument to an rvalue. In code terms: suppose there is a vector 1 and we want to get vector 2 from vector 1. Instead of making an explicit copy, we cast vector 1 to an rvalue and assign it to vector 2; the move assignment operator of the vector then takes over the inner data. From the outside it looks as if the inner data moved from vector 1 to vector 2, but nothing was copied; only the ownership of the inner data changed. It may be convenient to think of the vector as a container whose inner object we move back and forth logically.

This is the sample code to show what we described:


#include <iostream>
#include <vector>

void print(const std::vector<int>& vec) {
  for (auto&& val : vec) {
    std::cout << val << ", ";
  }
  std::cout << std::endl;
}
int main() {
  // initialize vec1 with 1, 2, 3, 4 and vec2 as an empty vector
  std::vector<int> vec1{1, 2, 3, 4};
  std::vector<int> vec2;
  // The following line will print 1, 2, 3, 4
  print(vec1);
  // The following line will print a new line
  // nothing here for the vec2
  print(vec2);
  // vec2 is assigned with move assignment.
  // This will "steal" the contents of vec1 without copying them.
  vec2 = std::move(vec1);
  // Here vec1 is in a valid but unspecified state.
  // The object vec1 is not destroyed,
  // but there are no guarantees about what it contains
  // (in practice it is usually empty).
  // The following line will print 1, 2, 3, 4
  print(vec2);

  std::cout << "vec1 size " << vec1.size() << std::endl;
  return 0;
}

/*
output
1, 2, 3, 4, 

1, 2, 3, 4, 
vec1 size 0
*/

Another small thing is the compiler optimization for the return value. We do not need to worry about extra copies everywhere and use && or std::move extensively; we only use them when necessary.

Consider this sample code:

#include <iostream>

struct Test {
  Test() { std::cout << "construct" << std::endl; }
  double a = 0;
  // destructor
  ~Test() { std::cout << "destroy a is " << this->a << std::endl; }
};

Test getTest() {
  Test t;
  t.a = 123;
  return t;
}

int main() {
  Test t = getTest();
  return 0;
}
/*
output
construct
destroy a is 123
*/

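The constructor is called only once, and so is the destructor, which shows that the compiler optimized away the extra construction of the return value (copy elision, here the named return value optimization). This means that it is fine to return a struct by value and assign it to a new variable without worrying that the object is created twice. Even when the compiler does not elide the copy, a returned local variable is moved rather than copied if the type has a move constructor.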

copy constructor vs the move constructor

Another situation where rvalue references are used extensively is the move constructor. This question provides lots of insights. When the argument is an rvalue, overload resolution prefers the move constructor, which takes an rvalue reference, over the copy constructor, which takes a const lvalue reference. As mentioned in this article, once a move constructor is defined, the copy constructor is only the second choice for rvalue arguments.

This is a good example to show the relationship between the move operation, the const lvalue reference, and the rvalue reference:

#include <stdio.h>
#include <stdlib.h>
#include <algorithm>
#include <string>

using namespace std;

class ResourceOwner {
 public:
  ResourceOwner(const char res[]) {
    printf("default constructor %s\n", res);
    theResource = new string(res);
  }

  // copy constructor
  ResourceOwner(const ResourceOwner& other) {
    printf("copy constructor without copy and swap based on %s\n",
           other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }
  // copy assignment operator
  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);
    // swap our resource with the one held by tmp;
    // when tmp is destroyed automatically at the end of this function,
    // it releases the resource we used to hold
    swap(theResource, tmp.theResource);

    printf("assignment constructor with copy and swap based on %s\n",
           other.theResource->c_str());
    printf("assingnment finish\n");
    return *this;
  }

  ~ResourceOwner() {
    printf("destructor %s\n", theResource->c_str());
    if (theResource) {
      delete theResource;
    }
  }

 private:
  string* theResource;
};

class ResourceOwnerWithMove {
 public:
  ResourceOwnerWithMove(const char res[]) {
    printf("default constructor %s\n", res);
    theResource = new string(res);
  }

  // copy constructor
  ResourceOwnerWithMove(const ResourceOwnerWithMove& other) {
    printf("copy constructor without copy and swap based on %s\n",
           other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }
  // copy assignment operator
  // uses copy and swap
  ResourceOwnerWithMove& operator=(const ResourceOwnerWithMove& other) {
    ResourceOwnerWithMove tmp(other);
    swap(theResource, tmp.theResource);
    printf("assignment constructor based on %s\n", other.theResource->c_str());
    return *this;
  }
  // move assignment
  ResourceOwnerWithMove& operator=(ResourceOwnerWithMove&& other) {
    printf("move assignment constructor without copy and swap based on %s\n",
           other.theResource->c_str());
    // first implementation (must release the old resource first)
    //delete theResource;
    //theResource = other.theResource;
    //other.theResource = nullptr;
    // second implementation
    swap(theResource, other.theResource);
    return *this;
  }
  ~ResourceOwnerWithMove() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    } else {
      printf("destructor with empty inner value\n");
    }
  }

 private:
  string* theResource;
};

void testCopy() {
  printf("=====start testCopy()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2 = res1;
  printf("destructors for stack vars\n");
}

void testAssign() {
  printf("=====start testAssign()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2("res2");
  res2 = res1;
  printf("destructors for stack vars\n");
}

void testRValue1() {
  printf("=====start testRValue1()=====\n");
  ResourceOwner res2("res2");
  res2 = ResourceOwner("res1");
  printf("destructors for stack vars\n");
}

void testRValue2() {
  printf("=====start testRValue2()=====\n");
  ResourceOwner res2("res2");
  res2 = std::move(ResourceOwner("res1"));
  printf("destructors for stack vars\n");
}

void testRValue3() {
  printf("=====start testRValue3()=====\n");
  ResourceOwnerWithMove res2("res2");
  // transfer the ownership to the res2
  res2 = std::move(ResourceOwnerWithMove("res1"));
  printf("destructors for stack vars\n");
}

int main() {
  testCopy();
  testAssign();
  testRValue1();
  testRValue2();
  testRValue3();
}

Let’s go through the output one by one. For testCopy():

=====start testCopy()=====
default constructor res1
copy constructor without copy and swap based on res1
destructors for stack vars
destructor res1
destructor res1

We create res1 with the constructor and then use the copy constructor to create res2. The inner value of both instances is res1, and then the two destructors are called.

For the testAssign():

=====start testAssign()=====
default constructor res1
default constructor res2
copy constructor without copy and swap based on res1
assignment constructor with copy and swap based on res1
assingnment finish
destructor res2
destructors for stack vars
destructor res1
destructor res1

In the first two lines, two objects are created by the constructor. Then we execute res2 = res1, where one temporary object is created by the copy-and-swap based assignment implementation. Because of the swap operation, the value held by the temporary changes from res1 to res2, so when the assignment function finishes, destructor res2 is printed to show that the temporary object is destroyed. At last, when the test function finishes, both res1 and res2 are destroyed; both objects hold the inner value res1 in this case.

For the testRValue1():

=====start testRValue1()=====
default constructor res2
default constructor res1
copy constructor without copy and swap based on res1
assignment constructor with copy and swap based on res1
assingnment finish
destructor res2
destructor res1
destructors for stack vars
destructor res1

There is a slight difference in the destructor calls. Since ResourceOwner("res1") is a temporary object, it is destroyed at the end of that statement. This is why there are two destructor logs before destructors for stack vars: the first is from destroying the temporary object created inside the assignment function, and the second is from the ResourceOwner("res1") temporary itself. At last, the destructor for res2 is called (it holds the inner value res1 because of the assignment).

For the testRValue2():

=====start testRValue2()=====
default constructor res2
default constructor res1
copy constructor without copy and swap based on res1
assignment constructor with copy and swap based on res1
assingnment finish
destructor res2
destructor res1
destructors for stack vars
destructor res1

The printed results are the same as the previous ones. This shows that when we use std::move(ResourceOwner("res1")) but do not define a move assignment operator, the original assignment operator with the const lvalue reference parameter is still called.

For the testRValue3():

=====start testRValue3()=====
default constructor res2
default constructor res1
move assignment constructor without copy and swap based on res1
destructor res2
destructors for stack vars
destructor res1

We update the original ResourceOwner class and create the ResourceOwnerWithMove class. The only difference is that we add a move assignment operator. When we use std::move(ResourceOwnerWithMove("res1")), this move assignment operator is chosen with first priority. (The temporary is already an rvalue, so the std::move here is redundant but harmless.) Since we use swap in the move assignment operator, the value in the temporary object becomes res2, which explains the destructor res2 line when the temporary is destroyed. Finally, res2 itself (now holding res1) is destroyed.

However, the rvalue reference allows different versions of the move assignment operator. Since the const limitation is gone, we are free to update the inner value of the source object. For example, we can change the move assignment operator to:

ResourceOwnerWithMove& operator=(ResourceOwnerWithMove&& other) {
  printf("move assignment constructor without copy and swap based on %s\n",
         other.theResource->c_str());
  if (this != &other) {
    // release the resource we currently own, otherwise it leaks
    delete theResource;
    theResource = other.theResource;
    other.theResource = nullptr;
  }
  return *this;
}

In this way, the object on the left side first releases its old resource and takes over the pointer of the source, and the pointer in the source is set to null. We avoid copying the inner object and just transfer the ownership of the inner value.

range based for loop

refer to this question

references

http://thbecker.net/articles/rvalue_references/section_03.html

https://juejin.cn/post/6844903497075294216

https://stackoverflow.com/questions/24828074/passing-rvalue-reference-to-const-lvalue-reference-paremeter

more ways to implement the move assignment operator

http://www.vollmann.ch/en/blog/implementing-move-assignment-variations-in-c++.html

AverageMind