Going through this book, I am familiar with the following:
For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally tweaks the connection weights to reduce the error.
Automatic differentiation differs from the method taught in standard calculus classes in how gradients are computed, and it offers features such as the native ability to take the gradient of a data structure, not just a well-defined mathematical function. I'm not expert enough to go into further detail, but this is a great reference that explains it in much more depth:
https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
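To make the idea concrete, here is a minimal reverse-mode automatic differentiation sketch (my own toy illustration, not code from the linked guides): each `Var` records which values it was computed from along with the local partial derivatives, and `backward()` walks that record in reverse, applying the chain rule and accumulating gradients.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # parents: (parent Var, local partial derivative) pairs
        self.parents = parents

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        # Accumulate d(output)/d(self), then push the chain rule upstream.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)


x = Var(3.0)
y = Var(4.0)
z = x * y + x          # z = x*y + x
z.backward()           # dz/dx = y + 1 = 5, dz/dy = x = 3
print(x.grad, y.grad)  # 5.0 3.0
```

Note how the gradient falls out of the recorded computation graph itself, rather than from symbolically differentiating a formula, which is the key difference from the classroom method.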
Here's another guide, which I just found, that also looks quite nice:
https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
I believe "backprop" may formally refer to the by-hand calculus procedure for computing gradients; at least, that's how it was originally derived and how it's taught in classes on the subject. In practice, though, "backprop" is used fairly interchangeably with the automatic differentiation approach described in the guides above, so splitting the two terms is probably as much an exercise in linguistics as in mathematics.
I also noted this nice article on the backpropagation algorithm to compare against the above guides on automatic differentiation.
https://brilliant.org/wiki/backpropagation/
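For comparison, here is the by-hand chain-rule version for a tiny example of my own choosing (a single sigmoid neuron with squared error, not the specific network from the article): every partial derivative is written out explicitly rather than recorded by the program.

```python
import math

# Network: y_hat = sigmoid(w*x + b), loss L = 0.5 * (y_hat - y)^2

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = 2.0, 1.0       # one training instance and its target
w, b = 0.5, 0.1       # current parameters

# Forward pass
z = w * x + b
y_hat = sigmoid(z)
L = 0.5 * (y_hat - y) ** 2

# Backward pass: the chain rule applied stage by stage, by hand
dL_dyhat = y_hat - y              # dL/dy_hat
dyhat_dz = y_hat * (1 - y_hat)    # sigmoid'(z) in terms of y_hat
dL_dz = dL_dyhat * dyhat_dz
dL_dw = dL_dz * x                 # dz/dw = x
dL_db = dL_dz * 1.0               # dz/db = 1

print(dL_dw, dL_db)
```

The derivatives here were worked out on paper first and then transcribed into code, which is exactly the step automatic differentiation automates.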