I started adding closures (lambdas) to my language, which uses LLVM as its backend. I have implemented them for simple cases where they can always be inlined, i.e. code for the clo
A dumb idea would be to generate, for each closure, a thread-local structure that holds the data it needs (this could be just a pointer to a local structure, or several pointers).
The creator of the closure is responsible for setting the TLS variables and saving the values they held before (to allow recursive calls).
The user then calls the function normally; it executes and reads its environment from the TLS variables.
After the call, the creator of the closure restores the original values of the TLS variables.
Sounds doable and efficient.
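The alternative, which does not need trampolines, is to define the closure type as a pair of a function pointer and a pointer to the environment, i.e. a stack pointer. In the C calling convention extra arguments are ignored, so if you pass the environment as the last argument you can even use (function_ptr, null) as the callback for a regular function. (Strictly speaking, calling a function through a pointer of a mismatched type is undefined behavior in ISO C; it works in practice on common ABIs because the caller cleans up the arguments.)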