The Keras layer documentation specifies the input and output sizes for convolutional layers: https://keras.io/layers/convolutional/
Input shape: (samples, channels
I was also wondering this, and found another answer here, where it is stated (emphasis mine):
Maybe the most tangible example of a multi-channel input is when you have a color image which has 3 RGB channels. Let's get it to a convolution layer with 3 input channels and 1 output channel. (...) What it does is that it calculates the convolution of each filter with its corresponding input channel (...). The stride of all channels are the same, so they output matrices with the same size. Now, it sums up all matrices and output a single matrix which is the only channel at the output of the convolution layer.
Illustration:
Notice that the weights of the convolution kernels for each channel are different, which are then iteratively adjusted in the back-propagation steps by e.g. gradient decent based algorithms such as stochastic gradient descent (SDG).
Here is a more technical answer from TensorFlow API.