I refer to the paper *Multi-Scale Context Aggregation by Dilated Convolutions*.

**TL;DR**
The key point is that the architecture rests on the fact that dilated convolutions support exponential expansion of the receptive field without loss of resolution or coverage. In other words, they give you a much larger receptive field at the same computation and memory cost, while preserving resolution.
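To make the "exponential expansion" concrete, here is a minimal sketch (the helper name and the toy dilation schedule are mine, not from the paper) that computes the receptive field of a stack of stride-1 convolutions: each layer adds `(kernel_size - 1) * dilation` positions, so doubling the dilation per layer grows the field exponentially with depth at constant cost per layer.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d input
    positions to the region one output element can see.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Four 3-tap layers with dilations doubling per layer:
print(receptive_field(3, [1, 2, 4, 8]))   # -> 31
# The same four layers without dilation only reach:
print(receptive_field(3, [1, 1, 1, 1]))   # -> 9
```

Both stacks have identical parameter counts and FLOPs; only the spacing of the taps differs.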
@Rahul referenced WaveNet, which puts it very succinctly in Section 2.1, Dilated Causal Convolutions. It is also worth looking at *Multi-Scale Context Aggregation by Dilated Convolutions*. I break it down further here:
To draw an explicit contrast, consider this:
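As a concrete contrast, here is a pure-Python sketch of a 1-D "valid" convolution (function name and toy inputs are mine): the only difference between the standard and dilated case is that each tap skips `dilation - 1` input positions, so the same 3-tap filter covers a wider span of the input.

```python
def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D convolution; dilation=1 is the standard case."""
    k = len(w)
    span = (k - 1) * dilation + 1          # input positions one output sees
    return [sum(w[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
w = [1, 1, 1]
print(dilated_conv1d(x, w, 1))  # standard: [6, 9, 12, 15, 18, 21]
print(dilated_conv1d(x, w, 2))  # dilated:  [9, 12, 15, 18]
```

With `dilation=2` each output sums `x[i] + x[i+2] + x[i+4]`: same three multiplications per output, but a span of 5 inputs instead of 3.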
In addition to the benefits you already mentioned, such as a larger receptive field, efficient computation, and lower memory consumption, dilated causal convolutions have further benefits: the causal structure guarantees that the output at time t depends only on inputs up to time t (no leakage from the future), and, since there are no recurrent connections, training can be parallelized across timesteps, unlike with RNNs.
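The causal variant can be sketched by left-padding the input so that every output aligns with the present timestep and never reads ahead (again a minimal pure-Python illustration of the idea in WaveNet's Section 2.1, not the paper's implementation):

```python
def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: output[i] depends only on x[:i+1].

    Left-padding with zeros shifts the filter so its last tap sits on
    the current timestep; output length equals input length.
    """
    k = len(w)
    pad = (k - 1) * dilation               # amount of left padding needed
    xp = [0] * pad + list(x)
    return [sum(w[j] * xp[i + j * dilation] for j in range(k))
            for i in range(len(x))]

print(causal_dilated_conv1d([1, 2, 3, 4], [1, 1, 1], 1))  # -> [1, 3, 6, 9]
```

Note that the first outputs see mostly padding, which is exactly the causal behavior: at time 0 the model has only one observation to work with.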
I'd refer you to the excellent WaveNet paper, which applies dilated causal convolutions to raw audio waveforms to generate speech and music, and even to recognize speech from the raw waveform.
I hope you find this answer helpful.