A much better solution than my previous one is to use np.einsum:
np.einsum('...i,...j', p, p)
which is even faster than the broadcasting approach:
In [ ]: %timeit p[..., None] * p[:, None, :]
514 µs ± 4.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [ ]: %timeit np.einsum('...i,...j', p, p)
169 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
As for how it works, I'm not quite sure; I just experimented with einsum
until I got the answer I wanted:
In [ ]: np.all(np.einsum('...i,...j', p, p) == p[..., None] * p[:, None, :])
Out[ ]: True
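For what it's worth, here is one way to read the subscripts, as a minimal sketch rather than a definitive explanation. It assumes p is a 2-D array of row vectors, which is what the indexing p[:, None, :] suggests; the shape (4, 3) and the random data are made up for illustration. When no output is given, einsum orders the labels that appear only once alphabetically after the `...` dimensions, so '...i,...j' should behave like '...i,...j->...ij': for each row, every element i is multiplied by every element j, which is a per-row outer product.

```python
import numpy as np

# Hypothetical example data: 4 row vectors of length 3 (shape is an assumption).
rng = np.random.default_rng(0)
p = rng.standard_normal((4, 3))

# Same subscripts as above, with the output spelled out explicitly.
outer = np.einsum('...i,...j->...ij', p, p)

# Reference computed the slow, obvious way: one outer product per row.
expected = np.stack([np.outer(row, row) for row in p])

matches_loop = np.allclose(outer, expected)
matches_implicit = np.allclose(outer, np.einsum('...i,...j', p, p))
matches_broadcast = np.allclose(outer, p[..., None] * p[:, None, :])
print(outer.shape, matches_loop, matches_implicit, matches_broadcast)
```

If all three checks come back True, the einsum call, the broadcasting expression, and the explicit loop are computing the same (rows, n, n) stack of outer products.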