When should I use __forceinline instead of inline?

断了今生、忘了曾经 提交于 2019-11-30 10:49:36

The compiler is making its decisions based on static code analysis, whereas if you profile as don says, you are carrying out a dynamic analysis that can be much farther reaching. The number of calls to a specific piece of code is often largely determined by the context in which it is used, e.g. the data. Profiling a typical set of use cases will do this. Personally, I gather this information by enabling profiling on my automated regression tests. In addition to forcing inlines, I have unrolled loops and carried out other manual optimizations on the basis of such data, to good effect. It is also imperative to profile again afterwards, as sometimes your best efforts can actually lead to decreased performance. Again, automation makes this a lot less painful.

More often than not though, in my experience, tweaking alogorithms gives much better results than straight code optimization.

You know better than the compiler only when your profiling data tells you so.

The one place I am using it is licence verification.

One important factor to protect against easy* cracking is to verify being licenced in multiple places rather than only one, and you don't want these places to be the same function call.


*) Please don't turn this in a discussion that everything can be cracked - I know. Also, this alone does not help much.

I've developed software for limited resource devices for 9 years or so and the only time I've ever seen the need to use __forceinline was in a tight loop where a camera driver needed to copy pixel data from a capture buffer to the device screen. There we could clearly see that the cost of a specific function call really hogged the overlay drawing performance.

The only way to be sure is to measure performance with and without. Unless you are writing highly performance critical code, this will usually be unnecessary.

The inline directive will be totally of no use when used for functions which are:

recursive, long, composed of loops,

If you want to force this decision using __forceinline

Actually, even with the __forceinline keyword. Visual C++ sometimes chooses not to inline the code. (Source: Resulting assembly source code.)

Always look at the resulting assembly code where speed is of importance (such as tight inner loops needed to be run on each frame).

Sometimes using #define instead of inline will do the trick. (of course you loose a lot of checking by using #define, so use it only when and where it really matters).

For SIMD code.

SIMD code often uses constants/magic numbers. In a regular function, every const __m128 c = _mm_setr_ps(1,2,3,4); becomes a memory reference.

With __forceinline, compiler can load it once and reuse the value, unless your code exhausts registers (usually 16).

CPU caches are great but registers are still faster.

P.S. Just got 12% performance improvement by __forceinline alone.

There are several situations where the compiler is not able to determine categorically whether it is appropriate or beneficial to inline a function. Inlining may involve trade-off's that the compiler is unwilling to make, but you are (e.g,, code bloat).

In general, modern compilers are actually pretty good at making this decision.

When you know that the function is going to be called in one place several times for a complicated calculation, then it is a good idea to use __forceinline. For instance, a matrix multiplication for animation may need to be called so many times that the calls to the function will start to be noticed by your profiler. As said by the others, the compiler can't really know about that, especially in a dynamic situation where the execution of the code is unknown at compile time.

Actually, boost is loaded with it.

For example

 BOOST_CONTAINER_FORCEINLINE flat_tree&  operator=(BOOST_RV_REF(flat_tree) x)
    BOOST_NOEXCEPT_IF( (allocator_traits_type::propagate_on_container_move_assignment::value ||
                        allocator_traits_type::is_always_equal::value) &&
                         boost::container::container_detail::is_nothrow_move_assignable<Compare>::value)
 {  m_data = boost::move(x.m_data); return *this;  }

 BOOST_CONTAINER_FORCEINLINE const value_compare &priv_value_comp() const
 { return static_cast<const value_compare &>(this->m_data); }

 BOOST_CONTAINER_FORCEINLINE value_compare &priv_value_comp()
 { return static_cast<value_compare &>(this->m_data); }

 BOOST_CONTAINER_FORCEINLINE const key_compare &priv_key_comp() const
 { return this->priv_value_comp().get_comp(); }

 BOOST_CONTAINER_FORCEINLINE key_compare &priv_key_comp()
 { return this->priv_value_comp().get_comp(); }

 public:
 // accessors:
 BOOST_CONTAINER_FORCEINLINE Compare key_comp() const
 { return this->m_data.get_comp(); }

 BOOST_CONTAINER_FORCEINLINE value_compare value_comp() const
 { return this->m_data; }

 BOOST_CONTAINER_FORCEINLINE allocator_type get_allocator() const
 { return this->m_data.m_vect.get_allocator(); }

 BOOST_CONTAINER_FORCEINLINE const stored_allocator_type &get_stored_allocator() const
 {  return this->m_data.m_vect.get_stored_allocator(); }

 BOOST_CONTAINER_FORCEINLINE stored_allocator_type &get_stored_allocator()
 {  return this->m_data.m_vect.get_stored_allocator(); }

 BOOST_CONTAINER_FORCEINLINE iterator begin()
 { return this->m_data.m_vect.begin(); }

 BOOST_CONTAINER_FORCEINLINE const_iterator begin() const
 { return this->cbegin(); }

 BOOST_CONTAINER_FORCEINLINE const_iterator cbegin() const
 { return this->m_data.m_vect.begin(); }
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!