In TensorFlow, can you use a non-smooth function as a loss function, such as a piecewise function (or one defined with if-else)? If you can't, why can you use ReLU?
In this link S
As for Question #3 from the OP, you don't actually have to implement the gradient computations yourself. TensorFlow will do that for you, which is one of the things I love about it!
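To give a feel for why a piecewise loss is not a problem, here is a rough plain-Python sketch (not TensorFlow code, and not how TensorFlow is implemented internally). The idea is that each branch of a piecewise function has an ordinary derivative, and at a kink such as ReLU's at 0 a fixed value is simply picked by convention. The names `huber_like_loss`, `huber_like_grad`, `relu_grad`, and `fit_scalar` are made up for this illustration, and the gradients are written by hand only to show what an autodiff framework would otherwise work out for you.

```python
# Plain-Python illustration only: a piecewise loss, the derivative of each
# branch, and gradient descent that uses them. All names are hypothetical.

def huber_like_loss(residual, delta=1.0):
    """Piecewise loss: quadratic near zero, linear in the tails."""
    if abs(residual) <= delta:
        return 0.5 * residual ** 2
    return delta * (abs(residual) - 0.5 * delta)


def huber_like_grad(residual, delta=1.0):
    """Derivative of whichever branch the input falls into."""
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


def relu(x):
    """ReLU is itself piecewise: 0 for x <= 0, x otherwise."""
    return x if x > 0 else 0.0


def relu_grad(x):
    # ReLU is not differentiable at exactly 0; a value is chosen by
    # convention there (0 in this sketch).
    return 1.0 if x > 0 else 0.0


def fit_scalar(target, w=5.0, lr=0.1, steps=200):
    """Gradient descent on huber_like_loss(w - target) for a single weight."""
    for _ in range(steps):
        w -= lr * huber_like_grad(w - target)
    return w
```

Calling `fit_scalar(2.0)` moves `w` from 5.0 toward 2.0 even though the loss switches branches along the way, which is the same reason training with ReLU works in practice.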