I work on AVX2 and need to calculate 64-bit x64-bit -> 128-bit widening multiplication and got 64-bit high part in the fastest manner. Since AVX2 has not such an instruction
No. On modern architectures the crossover at which Karatsuba beats schoolbook multiplication is usually somewhere between 8 and 24 machine words (e.g. between 512 and 1536 bits on x86_64). For fixed sizes, the threshold is at the smaller end of that range, and the new ADCX/ADOX instructions likely bring it in somewhat further for scalar code, but 64x64 is still too small to benefit from Karatsuba.