Frustrating behaviour of the INLINE pragma

So I have a function "a", which uses another function "b" from another module.

Step 1. I benchmark its performance and get 250ns.
Step 2. I go and put the "INLINE" pragma on the function "b", run the benchmark again and get 500ns. That is twice as long.
Step 3. I go and wrap the reference to function "b" inside function "a" in an explicit application of the "inline" function, and finally get the desired optimization: 208ns.
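
To make the difference between steps 2 and 3 concrete, here is a minimal sketch. It is not the code from the benchmark: the body of "b" and the names aPragmaOnly and aExplicit are hypothetical, and everything sits in one module to keep the example self-contained, whereas in the report "b" is imported from another module. The only real API used is "inline" from GHC.Exts.

```haskell
module Main (main) where

import GHC.Exts (inline)

-- Hypothetical stand-in for "b". In the report it is defined in a
-- different module from "a"; here it is local to keep the sketch small.
b :: Int -> Int
b x = x * 2 + 1
{-# INLINE b #-}

-- Step 2: "a" relies on the INLINE pragma on "b" alone.
aPragmaOnly :: Int -> Int
aPragmaOnly n = b n + b (n + 1)

-- Step 3: "a" additionally wraps each use of "b" in GHC.Exts.inline,
-- explicitly asking GHC to inline "b" at this call site.
aExplicit :: Int -> Int
aExplicit n = inline b n + inline b (n + 1)

main :: IO ()
main = print (aPragmaOnly 10, aExplicit 10)
```

Both variants compute the same result; they can differ only in the code GHC generates for them. Comparing the output of -ddump-simpl for the two definitions should show whether "b" was actually inlined in each case.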

You can reproduce the issue by running "cabal bench decoding --benchmark-options=numeric" after checking out the trees of the following commits:

