Sunday, November 18, 2012

C++11 Standard Explained: 1. Unrestricted Union

C++11 contains a lot of new and exciting features, and the unrestricted union is one of them.

Here is the link to the proposal for the standard:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2544.pdf

The main difference between the C++11 unrestricted union and the C and C++03 union is that the new union may contain non-POD types. Quoting from the proposal, "Our proposed solution is to remove all of the restrictions on the types of members of unions, with the exception of reference types."

This change has profound significance. I will use the example from the proposal to illustrate this point. Suppose we have a very simple class called point:

struct point {
  point() {}
  point(int x, int y) : x_(x), y_(y) {}
  int x_, y_;
};


Note that this struct only contains POD types, but the struct itself is NOT a POD type, because it has a non-trivial (user-provided) default constructor. This means that the following union is illegal in C++03:

union {
  point p_;
  int i_;
  const char* s_;
};

As you can see, this restriction is inconvenient and effectively reduces the power of unions, which is why the unrestricted union was proposed. With the new C++ standard, the above union is legal.
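To make this concrete, here is a minimal sketch of the proposal's union in use under C++11. The union name value, its constructor, and the helper sum_via_point are hypothetical names added for illustration. Because point has a user-provided default constructor, the union's implicit default constructor is deleted, so the sketch supplies one that activates the int member.

```cpp
#include <cassert>
#include <new>

struct point {
  point() {}
  point(int x, int y) : x_(x), y_(y) {}
  int x_, y_;
};

// Hypothetical named version of the union from the proposal.
union value {
  point p_;
  int i_;
  const char* s_;
  // The implicit default constructor is deleted because point's default
  // constructor is non-trivial, so we provide one that starts with i_.
  value() : i_(0) {}
};

// Hypothetical helper: switches the union to its point member and reads it.
int sum_via_point(int x, int y) {
  value v;
  assert(v.i_ == 0);
  new (&v.p_) point(x, y);  // make p_ the active member
  // point has a trivial destructor, so no explicit destructor call is
  // needed before v goes out of scope.
  return v.p_.x_ + v.p_.y_;
}
```

With these definitions, sum_via_point(3, 4) returns 7. The same code is rejected by a C++03 compiler at the declaration of p_.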

Before we proceed any further, I must note what is still not allowed with the new C++ standard, and that is reference types.

For example, the following union declaration does not compile:

union cannotCompile
{
  int _i;
  char _c;
  char & _rc;
};

The compiler rejects it with the following error:

union.cpp:5:12: error: ‘cannotCompile::_rc’ may not have reference type ‘char&’ because it is a member of a union
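If a union needs to refer to an object it does not own, one possible workaround is to store a pointer instead, since pointer members are allowed. This is only a sketch: canCompile, _pc, and write_through are hypothetical names, not part of the proposal.

```cpp
#include <cassert>

// Hypothetical variant of the union above, with a pointer member
// in place of the forbidden reference member.
union canCompile
{
  int _i;
  char _c;
  char* _pc;
};

// Hypothetical helper: stores the address of target in the union and
// writes to target through it.
char write_through(char& target, char replacement)
{
  canCompile u;
  u._pc = &target;      // _pc is now the active member
  assert(u._pc != nullptr);
  *u._pc = replacement; // modifies the caller's char
  return target;
}
```

Unlike a reference, the pointer can be null or reseated, so the code using it has to check for that itself.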

Other than reference types, all user-defined classes and structs are allowed to be members of a union. This causes a new problem though: you need to construct and destroy the union member explicitly. Consider a union with the following definition:

union str_int
{
  std::string _str;
  int16_t _int;
};

Since the _str member has a non-trivial constructor, we need to be able to initialize that member explicitly. Since you cannot call a constructor directly on storage that already exists, placement new is necessary here. This also means that the union needs its own constructors, since it can no longer be constructed trivially. Luckily, the unrestricted union proposal covers this as well. To illustrate the point, I will add a constructor to this union:

#include <cstdint>
#include <new>
#include <string>
#include <vector>

union str_int
{
    std::string _str;
    std::vector<char> _raw;
    int16_t _int;
    // No member is alive when a constructor starts, so there is nothing
    // to destroy first; construct _str directly in the union's storage.
    str_int(std::string str)
    {
        new (&_str) std::string(str);
    }
    // This destructor assumes that _str is the active member.
    ~str_int()
    {
        // std::string is a typedef for std::basic_string<char>.
        _str.~basic_string();
    }
};

You can also declare a copy constructor and a copy assignment operator for the union. From the proposal, "The default constructor (12.1), copy constructor and copy assignment operator (12.8), and destructor (12.4) are special member functions."

You do need to declare the constructor and destructor explicitly, because otherwise they are implicitly deleted: "if a non-trivial special member function is defined for any member of a union, or a member of an anonymous union inside a class, that special member function will be implicitly deleted (8.4 ¶10) for the union or class. This prevents the compiler from trying to write code that it cannot know how to write, and forces the programmer to write that code if it’s needed."

In the constructor, I used placement new to create a std::string within the union. The constructor does not destroy _raw first: no member of the union has been constructed at that point, so calling _raw.~vector<char>() there would be undefined behavior. Here, the use of placement new is critical. Placement new takes a pointer as an argument and constructs the object at that address instead of allocating new memory for it. For more information on placement new, check out this post: http://stackoverflow.com/questions/222557/what-uses-are-there-for-placement-new.
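The two points above can be checked with a small sketch. The union bare_str_int and the helper construct_in_place are hypothetical names introduced for this illustration. The static assertions show that a union with a std::string member and no user-declared special members has those members implicitly deleted, and the helper shows placement new building an object in storage the caller already owns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical union with a std::string member and no user-declared
// constructor or destructor.
union bare_str_int
{
    std::string _str;
    std::int16_t _int;
};

// std::string has non-trivial special member functions, so the union's
// implicitly declared ones are deleted.
static_assert(!std::is_default_constructible<bare_str_int>::value,
              "default constructor is deleted");
static_assert(!std::is_copy_constructible<bare_str_int>::value,
              "copy constructor is deleted");
static_assert(!std::is_destructible<bare_str_int>::value,
              "destructor is deleted");

// Hypothetical helper: constructs a std::string inside caller-provided
// storage, reads its length, and destroys it again. The storage must be
// large enough and suitably aligned for a std::string.
std::size_t construct_in_place(void* storage, const char* text)
{
    assert(storage != nullptr && text != nullptr);
    // Placement new runs the constructor at the given address; it does
    // not allocate memory for the std::string object itself.
    std::string* s = new (storage) std::string(text);
    const std::size_t length = s->size();
    // Objects created with placement new are destroyed explicitly.
    s->~basic_string();
    return length;
}
```

For example, with a buffer declared as alignas(std::string) unsigned char buf[sizeof(std::string)], calling construct_in_place(buf, "hello") returns 5.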

Of course, there is one major problem with this union: how do we know which member is currently alive, and therefore which destructor to call? For instance, if the union was last used as a vector, there is memory allocated for the vector that must be released. The simple answer is that we don't. The simple solution is to encapsulate the union in a wrapper class that keeps track of the current type of the union.
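A minimal sketch of such a wrapper, assuming only a string and an integer alternative, might look like the following. The class str_or_int and all of its member names are hypothetical and chosen for illustration; copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <string>

// Hypothetical wrapper that remembers which union member is alive.
class str_or_int
{
    enum class kind { integer, text };

    union storage
    {
        std::string _str;
        std::int16_t _int;
        storage() {}   // activates no member; the wrapper does that
        ~storage() {}  // destroys no member; the wrapper does that
    };

    storage data_;
    kind kind_;

    // Destroys the string if it is alive and falls back to the integer state.
    void destroy()
    {
        if (kind_ == kind::text) {
            data_._str.~basic_string();
        }
        kind_ = kind::integer;
    }

public:
    explicit str_or_int(std::int16_t i) : kind_(kind::integer)
    {
        data_._int = i;
    }
    explicit str_or_int(const std::string& s) : kind_(kind::text)
    {
        new (&data_._str) std::string(s);
    }
    str_or_int(const str_or_int&) = delete;
    str_or_int& operator=(const str_or_int&) = delete;
    ~str_or_int() { destroy(); }

    void set(std::int16_t i)
    {
        destroy();
        data_._int = i;
    }
    void set(const std::string& s)
    {
        destroy();
        new (&data_._str) std::string(s);
        kind_ = kind::text;  // only set once construction has succeeded
    }

    bool holds_string() const { return kind_ == kind::text; }
    const std::string& str() const
    {
        assert(holds_string());
        return data_._str;
    }
    std::int16_t num() const
    {
        assert(!holds_string());
        return data_._int;
    }
};
```

Because every change of the active member goes through set, the wrapper's destructor always knows whether a std::string has to be destroyed. The tag is updated only after placement new succeeds, so an exception thrown while copying the string leaves the wrapper in the integer state.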

For a more complete and comprehensive example of an unrestricted union, see the JSON class I implemented last summer, using a C++11 union, for the web framework I was working on. You may check it out here: https://github.com/benjibc/json-universal-container

Please point out any errors in the article by leaving a comment below. Have fun coding in C++11!
Coming up: C++11 constant expression.

4 comments:

  1. This looks wrong:

    str_int(std::string str)
    {
    _raw.~vector();
    new (&str) std::string(str);
    }

    Shouldn't it be:

    str_int(std::string str)
    {
    _raw.~vector();
    new (&_str) std::string(str);
    }

    ?

  2. and another problem in your ctor:

    _raw.~vector(); you call a destructor on something that was not constructed yet.

  3. "Note that this class only contains non-POD types, but the struct itself is NOT a POD types because it has a non-trivial constructor."

    should be:

    "Note that this struct only contains POD types, but the struct itself is NOT a POD, because it has a non-trivial constructor."
