I have experience in coding OpenMP for Shared Memory machines (in both C and FORTRAN) to carry out simple tasks like matrix addition, multiplication etc. (Just to see how it
Due to GIL there is no point to use threads for CPU intensive tasks in CPython. You need either multiprocessing (example) or use C extensions that release GIL during computations e.g., some of numpy functions, example.
You could easily write C extensions that use multiple threads in Cython, example.