Sunday 4 September 2016

What you need _not_ know about std::function - Part 2


  • Part-1 : Getting to know the basics
  • Part-3 : Performance benchmarks


In the previous post we saw some basic type traits and metafunctions that are used in the implementation of std::mem_fn and std::function.  The main gist of this post is to understand the implementation of std::function in libstd++.

Also, I would have liked to talk about things like:
  • The performance penalty of using std::function.
  • When to use std::function.
  • Some performance benchmarks.
I will leave the above topics as the centre of discussion in the third part of the series (This was supposed to be the sentinel post as per my original plan).  Discussing them here will just make this post a torture to read.

A naive implementation

Library development in C++ is tough. It becomes even more intricate when it is supposed to be generic. The most difficult part or rather detail oriented part is the type system itself. Dealing with the types together with their qualifications results in lot of metaprogramming boilerplate code and if you forget something it could lead to some nasty UB.

So, let's first try to implement our own but rather very simple version of std::function. I will be skipping a-great-many required details to keep the code small.

The requirements

1. Should be able to invoke any callable target.
2. The underlying type of the callable object must be hidden i.e the only type information that should be visible is the pure function signature only.
3. The target callable object must be valid as long as the function object is valid.

I have kept bare minimum requirements for my implementation to keep off from doing metaprogramming hackery which we would be seeing anyways while going through the implementation of std::function.

The first attempt

This version just implements the basic interface of Function and also implements the specialization for function pointer. In the later versions (or attempts), the interface developed for Function would more or less remain the same only the specialization required to handle other types of callable objects would be handled.

template <typename> class Function;

template <typename Ret, typename... Args>
class Function<Ret(Args...)>
{
public:
  using implementation = detail::function_impl_base<Ret, Args...>;

  //Default constructor
  Function() = default;

  template <typename Callable>
  Function(Callable f): // Requires callable be copyable
    impl_base_(std::make_unique<detail::func_impl<Callable>>(
                   std::move(f), detail::constructor_tag()))
  {}

  // Copy constructor
  Function(const Function& other)
  {
    this->impl_base_.reset(other.impl_base_->clone());
  }

  // Copy Assignment
  Function& operator=(Function other)
  {
    this->impl_base_.reset(other.impl_base_.release());
    return *this;
  }

  template <typename Callable>
  Function& operator=(Callable cb)
  {
    this->impl_base_ =
      std::make_unique<detail::func_impl<Callable>>(
                std::move(cb), detail::constructor_tag());
    return *this;
  }

  Ret operator()(Args&&... args)
  {
    assert (impl_base_.get());
    return (*impl_base_)(std::forward<Args>(args)...);
  }

private:
  std::unique_ptr<implementation> impl_base_ = nullptr;
};

Nothing much fancy. We have the same signature as that of std::function i.e the type Function is not aware about the type of the call target. That means that for function pointer or member function pointer or functor, the syntax of the Function object declaration would be exactly the same (provided the function signature is same of course) even though the actual types of all these callable objects are different. This is achieved using a technique called 'Type Erasure'.

Type Erasure

I am not intending to explain type-erasure in detail here. There are already a lot of content on it floating on the web. boost::any is one of the very basic example of type-erasure. shared_ptr, std::function are the other that I know of in the standard library that makes use of this technique.

Type-erasure in it's basic form is implemented by making use of polymorphism and thus requires dynamic memory allocation. This is how my implementation of Function does the type-erasure.

In the above code function_impl_base is the base class defining the interface. The derived classes implementing the specializations for different types of Callable types must implement the functions provided by the interface.

namespace detail {

  struct constructor_tag {};

  template <typename Ret, typename... Args>
  class function_impl_base
  {
  public:
    virtual Ret operator()(Args&&... args) const = 0;
    virtual function_impl_base* clone() = 0;
    virtual ~function_impl_base() {}
  };

  template <typename T> class func_impl;

  // Specialization for Function pointers
  template <typename Ret, typename... Args>
  class func_impl<Ret(*)(Args...)> final : public function_impl_base<Ret, Args...>
  {
  public:
    using callable_t = Ret(*)(Args...);

    /*
     * This is a greedy constructor. constructor_tag is just used so as
     * to not interfere with regular copy-constructor.
     */
    template <typename F>
    func_impl(F&& callable, constructor_tag): callable_(std::move(callable))
    {}

    func_impl(const func_impl&) = default;

    func_impl& operator=(func_impl&) = default;

    Ret operator()(Args&&... args) const
    {
      return callable_(std::forward<Args>(args)...);
    }

    func_impl* clone()
    {
      return (new func_impl(*this));
    }

  private:
    callable_t callable_;
  };

}


Let's now look at the constructor of Function in a bit more detail:
template <typename Callable>
  Function(Callable f): // Requires callable be copyable
    impl_base_(std::make_unique<detail::func_impl<Callable>>(std::move(f), detail::constructor_tag()))
  {}

The constructor is a template constructor and is the only part of  Function class (lets forget about assignment and copy constructors for the time being) which actually knows about the type of the callable object. But this knowledge is lost as soon as the constructor exits. But, we need to save this information in some form, right ? This is where inheritance comes into picture. Using inheritance, the type of the information is saved as RTTI (Run Time Type Information) probably in the vtable. This is exactly the purpose of function_impl_base.  
The class func_impl<Ret(*)(Args...)>  inherits from function_impl_base and implements the call operator.

The final attempt

Not much difference, only added specialization for member function pointer and functors:

namespace detail {

  template <typename Ret, typename... Args>gt;
  class function_impl_base
  {
  public:
    virtual Ret operator()(Args&&... args) = 0;
    virtual function_impl_base* clone() = 0;
    virtual ~function_impl_base() {}
  };

  template <typename Signature, typename Functor>gt; class func_impl;

  // Specialization for Function pointers
  template <typename Ret, typename... Args>gt;
  class func_impl<Ret(Args...), Ret(*)(Args...)>gt; final
            : public function_impl_base<Ret, Args...>gt;
  {
  public:
    using callable_t = Ret(*)(Args...);

    // The need for constructor tag removed in this version
    func_impl(callable_t callable): callable_(std::move(callable))
    {}

    func_impl(const func_impl&) = default;

    func_impl& operator=(func_impl&) = default;

    Ret operator()(Args&&... args)
    {
      return callable_(std::forward<Args>gt;(args)...);
    }

    func_impl* clone()
    {
      return (new func_impl(*this));
    }

  private:
    callable_t callable_;
  };

  // Specialization for Functors
  template <typename Functor, typename Ret, typename... Args>gt;
  class func_impl<Ret(Args...), Functor>gt;
            : public function_impl_base<Ret, Args...>gt;
  {
  public:
    using callable_t = Functor;

    func_impl(Functor callable): callable_(std::move(callable))
    {}

    func_impl(const func_impl&) = default;
    func_impl& operator=(func_impl&) = default;

    Ret operator()(Args&&... args)
    {
      return callable_(std::forward<Args>gt;(args)...);
    }

    func_impl* clone()
    {
      return (new func_impl(*this));
    }

  private:
      callable_t callable_;
  };

  // Specialization for Member function pointers
  // Leveraging the use of mem_fn
  template <typename Class, typename Member, typename Ret, typename... Args>gt;
  class func_impl<Ret(Args...), Member Class::*>gt;
            : public function_impl_base<Ret, Args...>gt;
  {
  public:
    using callable_t = Member (Class::*);

    func_impl(callable_t callable): callable_(std::move(callable))
    {}

    func_impl(const func_impl&) = default;
    func_impl& operator=(func_impl&) = default;

    Ret operator()(Args&&... args)
    {
      return call(std::forward<Args>gt;(args)...);
    }

    template <typename ClassType, typename... Params>gt;
    Ret call(ClassType obj, Params&&... args)
    {
      return std::mem_fn(callable_)(obj, std::forward<Params>gt;(args)...);
    }

    func_impl* clone()
    {
      return (new func_impl(*this));
    }

  private:
    callable_t callable_;
  };

}

template <typename>gt; class Function;

template <typename Ret, typename... Args>gt;
class Function<Ret(Args...)>gt;
{
public:
  using implementation = detail::function_impl_base<Ret, Args...>gt;;
  using call_signature = Ret(Args...);

  //Default constructor
  Function() = default;

  // Copy constructor
  Function(const Function& other)
  {
    this->gt;impl_base_.reset(other.impl_base_->gt;clone());
  }

  // Copy Assignment
  Function& operator=(Function other)
  {
    this->gt;impl_base_.reset(other.impl_base_.release());
    return *this;
  }

  template <typename Callable>gt;
  Function& operator=(Callable cb)
  {
    this->gt;impl_base_ =
      std::make_unique<detail::func_impl<call_signature, Callable>gt;>gt;(std::move(cb));
    return *this;
  }

  // constructor
  template <typename Callable>gt;
  Function(Callable f): // Requires callable be copyable
    impl_base_(std::make_unique<detail::func_impl<call_signature, Callable>gt;>gt;(std::move(f)))
  {
  }

  Ret operator()(Args&&... args)
  {
    assert (impl_base_.get());
    return (*impl_base_)(std::forward<Args>gt;(args)...);
  }

  explicit operator bool() const noexcept
  {
    return impl_base_ != nullptr;
  }

private:
  std::unique_ptr<implementation>gt; impl_base_ = nullptr;
};
  

The above code can be accessed from HERE.

Issues with the custom implementation

1. Memory allocation. Dynamic memory allocation is costly. It is probably ok if you know you are not going to create many instances of Function object and pass it around. But it's a big no-no in a performance critical code. This can be resolved in some cases by doing SBO (Small Buffer Optimization), which is what almost all implementation does.

2. No type checking at all. Since the type of the actual callable object gets erased after constructor and assignment operators. It is better to check whether the user is correctly using the function object. If not, it should be a compile time error rather than leading to subtle undefined behaviour in the code.

3. Overhead of a virtual function call and that of an additional call to the callable object. This can be pretty bad, if you going to call your implementation functions via Function.

4. The cost of generic programming. This is my personal opinion, customized solution can be made faster than the above Function implementation, but when you have to use something like Function as part of your interface to client code, then this is the way we have to do and improve upon it. This is something which I would like to touch upon in the final part of this series.

The Implementation of std::function 

In this section we will see the implementation of std::function in libstd++ 6.1. We will be making use of some of the type traits discussed in Part-1 and will see some new ones here. Let's first lookup at some concepts and tricks individually and connect them up later.

Location Invariance

The metafunction for this check is:

template <typename T>
struct is_location_invariant
     : is_trivially_copyable<T>::type
{ };

This is one of the checks that is made to determine if SBO can be performed on the passed callable type or not. This is mainly relevant for Functors, since function pointer and member function pointers are trivially copyable. So, it is necessary that one must make sure that his/her functor implements the TriviallyCopyable concept.

For the lazy people (from cppreference) :
  • Every copy constructor is Trivial or deleted (since C++14)
  • Every move constructor is Trivial or deleted (since C++14)
  • Every copy assignment operator is Trivial or deleted (since C++14)
  • Every move assignment operator is Trivial or deleted (since C++14)
  • at least one copy constructor, move constructor, copy assignment operator, or move assignment operator is non-deleted
(since C++14)
  • Trivial non-deleted (since C++14) destructor
This implies that the class has no virtual functions or virtual base classes.
Scalar types and arrays of TriviallyCopyable objects are TriviallyCopyable as well, as well as the const-qualified (but not volatile-qualified) versions of such types.


No-Copy Types

This is nothing but the types of callables that does not need to be copied:

  class Undefined_class;

  union Nocopy_types
  {
    void*       M_object;
    const void* M_const_object;
    void (*M_function_pointer)();
    void (Undefined_class::*M_member_pointer)();
  };

This basically means, below types are no-copy types:-

  1. Pointer to a callable object.
  2. Pointer to a const callable object.
  3. A function pointer.
  4. A member function pointer.
The lack of any template parameters should give you an hint that the actual types mentioned inside the union is not going to be used. Maybe it's the size of the union ? We will get to that shortly.

Small Buffer Optimization

Let's see the data structure employed for the all important small buffer optimization by libstd++

  union Any_data
  {
    void*       M_access()       { return &M_pod_data[0]; }
    const void* M_access() const { return &M_pod_data[0]; }

    template <typename T>
    T& M_access() {
      return *static_cast <T*>(M_access()); 
    }

    template <typename T >
    const T& M_access() const {
      return *static_cast<const T*>(M_access());
    }
      
    Nocopy_types M_unused;
    char M_pod_data[sizeof(Nocopy_types)];
  };

It is just a char buffer having a size of sizeof(Nocopy_types) bytes, which should be 8 bytes on 64 bit platform. This is another constraint that the functor must have other than the location invariance,  for the implementation to perform SBO.
If the size of your functor is more than 8 bytes, then you will have to pay for one dynamic allocation.

The Function_base class

This class basically deals with the small buffer or Any_data directly. It makes the decision of what needs to be stored in Any_data i.e whether it should be the callable object as it is or the memory location of the callable object after allocating it on heap.

There are 2 inner classes named Base_manager and Ref_manager (a specialization for reference_wrapped functors), which does the actual dealing with Any_data.

Lets dig into the implementation of Base_manager, which would give us a lot of idea on what is happening behind the scenes.

      template<typename Functor>
      class Base_manager
      {
      protected:
        static const bool stored_locally =
        (is_location_invariant<Functor>::value
         && sizeof(Functor) <= M_max_size  // 8 bytes on 64 bit
         && alignof(Functor) <= M_max_align
         && (M_max_align % alignof(Functor) == 0));

        typedef integral_constant<bool, stored_locally> Local_storage;

        // Retrieve a pointer to the function object
        static Functor*
        M_get_pointer(const Any_data& source)
        {
          const Functor* ptr =
            stored_locally? std::addressof(source.M_access<Functor>())
            /* have stored a pointer */ : source.M_access<Functor*>();
          return const_cast<Functor*>(ptr);
        }

        // Clone a location-invariant function object that fits within
        // an _Any_data structure.
        static void
        M_clone(Any_data& dest, const Any_data& source, true_type)
        {
          new (dest.M_access()) Functor(source.M_access<Functor>());
        }

        // Clone a function object that is not location-invariant or
        // that cannot fit into an _Any_data structure.
        static void
        M_clone(Any_data& dest, const Any_data& source, false_type)
        {
          dest.M_access<Functor*>() =
            new Functor(*source.M_access<Functor*>());
        }

        // Destroying a location-invariant object may still require
        // destruction.
        static void
        M_destroy(Any_data& victim, true_type)
        {
          victim.M_access<Functor>().~Functor();
        }

        // Destroying an object located on the heap.
        static void
        M_destroy(Any_data& victim, false_type)
        {
          delete victim.M_access<Functor*>();
        }

      public:
        static bool
        M_manager(Any_data& dest, const Any_data& source,
                   Manager_operation op)
        {
         //skipped
         ........
        }

        static void
        M_init_functor(Any_data& functor, Functor&& f)
        { M_init_functor(functor, std::move(f), Local_storage()); }


      private:
        static void
        M_init_functor(Any_data& functor, Functor&& f, true_type)
        { new (functor.M_access()) Functor(std::move(f)); }

        static void
        M_init_functor(Any_data& functor, Functor&& f, false_type)
        { functor.M_access<Functor*>() = new Functor(std::move(f)); }
      };

Now that's something. The integral constant Local_storage is set to true if the functor can be stored inside the Any_data buffer else it will be false and the other member functions gets called based on that using tag-dispatching. For eg. check out the definitions for M_clone member function.

Let's look at functions M_init_functor and M_get_pointer in bit more detail to understand how the callable object is stored and retrieved by std::function.
 static void
 M_init_functor(Any_data& functor, Functor&& f)
 { M_init_functor(functor, std::move(f), Local_storage()); }

 static void
 M_init_functor(Any_data& functor, Functor&& f, true_type)
 { new (functor.M_access()) Functor(std::move(f)); }

 static void
 M_init_functor(Any_data& functor, Functor&& f, false_type)
 { functor.M_access<Functor*>() = new Functor(std::move(f)); }
 

Considering the passed callable object meets the requirement for being stored on stack, Local_storage would be of true_type, hence the second M_init_functor will get called, which makes use of placement new operator to construct the object on the stack itself.

If the callable object does not meet the requirement for being stored locally i.e it is either not TriviallyCopyable or has data members making its size more than what is allowed (i.e 8 bytes on 64 bit platform), then the callable object is created on heap and its memory location gets stored in the Any_data buffer. This is what the third M_init_functor function does.

 static Functor*
 M_get_pointer(const Any_data& source)
 {
   const Functor* ptr =
        stored_locally ? std::addressof(source.M_access<Functor>())
             : source.M_access<Functor*>();
   return const_cast<Functor*>(ptr);
 }

This must be now easier to reason about. Depending upon whether the callable object is stored locally or not, M_get_pointer gets the address of the callable object. The other member functions are just based upon this technique.

The Ref_manager is just basically a specialization to take care of functor passed via a reference_wrapper. Which is something one must consider to do if their function object size is greater than 8 bytes (on 64 bit platform) AND if the scope of the callable object outlives the scope of the std::function object.

So, having seen Base_manager and Ref_manager class, lets see how Function_base class looks like:

 class Function_base {
 public:     
   static const std::size_t M_max_size = sizeof(Nocopy_types);
   static const std::size_t M_max_align = alignof(Nocopy_types);

   template <typename Functor>
   class Base_manager { ... };

   template <typename Functor>
   class Ref_manager : public Base_manager <Functor*>
   {
     .....
   };

   typedef bool (*Manager_type)(Any_data&, const Any_data&,
                                Manager_operation);

   Any_data     M_functor;
   Manager_type M_manager;
 }; 

As one can see, Manager_type is a function pointer pointing to either Base_manager::M_manager or Ref_manager::M_manager function. These functions basically fill the destination Any_data with the stored callable object from the source Any_data based upon the operation. Something like below:

        static bool
        M_manager(Any_data& dest, const Any_data& source,
                  Manager_operation op)
        {
          switch (op)
          {
            case get_functor_ptr:
              dest.M_access <Functor*>() = M_get_pointer(source);
              break;

            case clone_functor:
              M_clone(dest, source, Local_storage());
              break;

            case destroy_functor:
              M_destroy(dest,Local_storage());
              break;
            }
          return false;
        }


Function Handlers

The function handlers are kind of similar in functionality to func_impl in my custom implementation i.e to invoke the actual callable object. The classes which we saw till now were to only manage the storage part, they were not concerned with invoking the call operator. 

Function handler classes are derived from either Function_base::Base_manager or Function_base::Ref_manager based upon the functor type. The function handlers are also specialized for plain function pointers, member pointers and for functors. 

We will be seeing only few of those specialization.

1. Specialization for plain function pointer.

    template<typename Signature, typename Functor>
    class Function_handler;

  template<typename Res, typename Functor, typename... ArgTypes>
    class Function_handler<Res(ArgTypes...), Functor>
    : public Function_base::Base_manager<Functor>
    {
      typedef Function_base::Base_manager<Functor> Base;

    public:
      static Res
      M_invoke(const Any_data& functor, ArgTypes&&... args)
      {
        return (*Base::M_get_pointer(functor))(
            std::forward<ArgTypes>(args)...);
      }
    };

2. Specialization for Member pointer.

   template<typename Class, typename Member, typename Res,
           typename... ArgTypes>
    class Function_handler<Res(ArgTypes...), Member Class::*>
    : public Function_handler<void(ArgTypes...), Member Class::*>
    {
      typedef Function_handler<void(ArgTypes...), Member Class::*>
        Base;

     public:
      static Res
      M_invoke(const Any_data& functor, ArgTypes&&... args)
      {
        return std::mem_fn(Base::M_get_pointer(functor)->value)(
            std::forward<ArgTypes>(args)...);
      }
    };

This is very similar to what we did for member pointers in our custom implementation i.e let std::mem_fn do the heavy lifting of taking care of both member data pointer and member function pointer.


Design uptill now

Specialization of a regular functor is also not much different than what we saw for plain function pointer. The design is like below:
1. Function_base is responsible for storing and retrieving the pointers to the callable object.
2. The function handler objects fetches the pointer to the callable objects from Function_base and invokes them with the passed arguments.

The std::function class

This is  what we all have been waiting for.But, you would be surprised (or not) to know that we have already covered all the heavy lifting part ! function class is more about the interface it provides, the type checkings and conversions.

The basic interface is pretty close to what we implemented in our custom solution, so we will be only looking at the interesting parts i.e the things we left out from our custom implementation.

A stripped down version of std::function:

 template < typename Res, typename... Args >
 class function<Res(Args...)>
     : private Function_base
 {
   typedef Res Signature(Args...);
 public:
   function(): Function_base() {}

   template <typename Functor,
                typename = Requires<not<is_same<Functor, function>>, void>,
                typename = Requires<Callable<Functor>, void>>
   function(Functor);

   Res operator()(ArgTypes... args) const;

 private:
   using _Invoker_type = Res (*)(const Any_data&, ArgTypes&&...);
   Invoker_type M_invoker;
 };

 template <typename Res, typename... ArgTypes>
 template <typename Functor, typename, typename>
 function<Res(ArgTypes...)>::
 function(Functor f)
      : Function_base()
 {
      typedef Function_handler <Signature, Functor> My_handler;

      My_handler::M_init_functor(M_functor, std::move(f));
      M_invoker = &My_handler::M_invoke;
      M_manager = &My_handler::M_manager;
 }


This is the part of code which I found pretty interesting. It is interesting because, it does not make use of inheritance the way we used in our custom implementation. With this kind of implementation, it is not required to do any dynamic allocation unless required because of the nature of functor.

The Functor parameter will determine what kind of Function_handler class we need to use. The invoker, which is a function pointer will be set to the correct M_invoke function of the function handler.
Thus, a single call to a callable object would require at most 2 pointer dereferences, no virtual calls and no dynamic allocation (depending on your Functor as we have already seen).

Type checking on assigning target

If we look at the constructor (or copy or assignment) template parameters a little bit more closely, we will see something like:
Callable <Functor>

Let us see what it is:
 template<typename From, typename To>
 using check_func_return_type
      = or<is_void<To>, is_same<From, To>, is_convertible<From, To>>;

 template<typename Func,
          typename Res2 = typename result_of<Func(ArgTypes...)>::type>
 struct Callable : check_func_return_type<Res2, Res> { };

The idea is to check whether the provided callable type yields a return type (Res2) when invoked with the argument types passed as the template parameters of std::function and that return type i.e Res2 is compatible with what was provided in the std::function declaration or in its template parameter.

The reason for adding is_same is pretty interesting. For details check THIS bug report.

I am not sure why is_convertible is required since its failure would not result in lookup of a constructor with different signature as there is only one definition of the constructor. The only advantage I see is, any wrong usage of std::function i.e. assigning it with a target with incompatible arguments would result in error thrown at the constructor itself, rather than at the calling site. If anybody knows about any other advantage, then please do let me know.

For example, see the usage of func1 and func2 and the error it yields using both my custom implementation (which does not use conversion checks) and std::function.

Error with My implementation:
./my_function.hpp:72:14: error: ambiguous conversion from derived class 'D' to base class 'A':
    struct D -> struct B -> struct A
    struct D -> struct C -> struct A
      return callable_(std::forward<Args>(args)...);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As one can see, the error is thrown at the point where we are actually invoking the callable target. This is not good since it is pointing an issue in the library implementers code, which is actually not the case. A good C++ library must try its best to point the error in the users code rather than to its own internal implementation.

inheritance.cc:45:5: error: no viable overloaded '='
  f = func2;
  ~ ^ ~~~~~
It points to the user code which messed it up. This is one nice advantage of using is_convertible which assigning a target.

A note about libcxx

The libcxx implementation of std::function looked a lot more straightforward than the libstd++ implementation and they provide a buffer size of 24 bytes (on 64 bit arch) for the SBO, which is 3 times more than what libstd++ offers. Also, libcxx supports allocator other than global new (I am not sure whether standard mandates that or not till c++14), which libstd++ does not currently.

Another difference w.r.t libstd++ is that the libcxx version makes use of virtual function call instead of function pointer as in libstd++. Now, I wouldn't state that function pointer would perform faster than a virtual function call, but I remember seeing the performance measurements of both types of call and a call via function pointer was slightly faster than a virtual function call.  Now, don't get all paranoid by that, as we will see in the final part of the blog series, std::function should better be not present in a tight code where performance matters most. In all other cases, the difference (if at all) shouldn't matter at all.

One problem that I see with non-standard way of implementing SBO is performance regression when going from one standard library implementation to another i.e. while moving from a library implementation that has larger internal buffer size than to a one having less (eg: libc++ -> libstd++). This would certainly waste quite some time of the developer who is debugging the issue (as in most cases this developer would not be the one who wrote the code :))