pytorch - connection between loss.backward() and optimizer.step()

谎友^ 2020-12-23 13:04

Where is an explicit connection between the optimizer and the loss?

How does the optimizer know where to get the gradients of the loss wit

5 answers
  • 2020-12-23 13:12

    Say we have defined a model, model, and a loss function, criterion, and we run the following sequence of steps:

    pred = model(input)
    loss = criterion(pred, true_labels)
    loss.backward()
    

    pred will have a grad_fn attribute that references the function that created it and ties it back to the model. Therefore, loss.backward() has the information it needs about the model it is working with.

    Try removing the grad_fn attribute, for example with:

    pred = pred.clone().detach()
    

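    Then loss is no longer connected to the model's parameters. loss.backward() raises a RuntimeError if nothing else in the loss requires gradients, the model's gradients stay None, and consequently the weights do not get updated.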

    And the optimizer is tied to the model because we pass model.parameters() when we create the optimizer.
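
    The effect of detaching can be checked with a small sketch. The nn.Linear model, the MSELoss criterion, and the random tensors below are illustrative stand-ins, not part of the original answer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the answer's model, criterion, and data.
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
inputs = torch.randn(4, 3)
true_labels = torch.randn(4, 1)

# Normal case: pred carries a grad_fn, so backward() reaches the weights.
pred = model(inputs)
loss = criterion(pred, true_labels)
has_grad_fn = pred.grad_fn is not None
loss.backward()
got_gradients = model.weight.grad is not None

# Reset the stored gradients, then cut the link between pred and the model.
for p in model.parameters():
    p.grad = None
detached_pred = model(inputs).clone().detach()
detached_loss = criterion(detached_pred, true_labels)
try:
    detached_loss.backward()
    backward_failed = False
except RuntimeError:
    # Nothing in this loss requires grad, so autograd refuses to run.
    backward_failed = True

print(has_grad_fn, got_gradients)         # True True
print(detached_pred.grad_fn)              # None
print(backward_failed, model.weight.grad) # True None
```

    In the detached case there is nothing for an optimizer to use, because no gradient was ever written to the parameters.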

  • 2020-12-23 13:24

    Short answer:

    loss.backward() # computes the gradient of the loss with respect to every parameter for which requires_grad=True. A parameter can be any tensor defined in the code, such as h2h or i2h.

    optimizer.step() # updates those parameters according to the optimizer's update rule (the optimizer was defined earlier in the code), moving them toward a lower loss (error).
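
    As a rough sketch of these two calls working together, the loop below trains two plain tensors. The names w, b, and frozen, the random data, and the learning rate are all made up for illustration:

```python
import torch

torch.manual_seed(0)

# Plain tensors acting as parameters; only w and b require gradients.
w = torch.randn(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
frozen = torch.randn(3, 1)  # requires_grad defaults to False

x = torch.randn(8, 3)
y = torch.randn(8, 1)

optimizer = torch.optim.SGD([w, b], lr=0.05)

losses = []
for _ in range(20):
    optimizer.zero_grad()             # drop gradients from the previous step
    pred = x @ (w + frozen) + b
    loss = ((pred - y) ** 2).mean()
    loss.backward()                   # writes w.grad and b.grad
    optimizer.step()                  # updates w and b from their .grad
    losses.append(loss.item())

print(losses[0] > losses[-1])  # True: the loss went down
print(frozen.grad)             # None: no gradient is tracked for it
```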

  • 2020-12-23 13:26

    Perhaps this will clarify the connection between loss.backward() and optim.step() a little (although the other answers are to the point).

    import torch
    
    # Our "model"
    x = torch.tensor([1., 2.], requires_grad=True)
    y = 100*x
    
    # Compute loss
    loss = y.sum()
    
    # Compute gradients of the loss w.r.t. the parameters
    print(x.grad)     # None
    loss.backward()
    print(x.grad)     # tensor([100., 100.])
    
    # Modify the parameters by subtracting the gradient (scaled by lr)
    optim = torch.optim.SGD([x], lr=0.001)
    print(x)        # tensor([1., 2.], requires_grad=True)
    optim.step()
    print(x)        # tensor([0.9000, 1.9000], requires_grad=True)
    

    loss.backward() sets the .grad attribute of all leaf tensors with requires_grad=True in the computational graph of which loss is the root (only x in this case).

    The optimizer just iterates through the list of parameters (tensors) it received on initialization, and for every tensor with requires_grad=True it subtracts the gradient stored in that tensor's .grad attribute (simply multiplied by the learning rate in the case of plain SGD). It doesn't need to know with respect to which loss the gradients were computed; it only needs access to .grad so it can do x = x - lr * x.grad.
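
    To illustrate that the optimizer only reads .grad, here is a small sketch comparing a hand-computed update with optim.step(). It assumes plain SGD (no momentum, no weight decay), and the hand-assigned gradient at the end is purely for demonstration:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
loss = (100 * x).sum()
loss.backward()

lr = 0.001
# What plain SGD is expected to compute: x - lr * x.grad
expected = x.detach() - lr * x.grad

optim = torch.optim.SGD([x], lr=lr)
optim.step()
print(torch.allclose(x.detach(), expected))  # True

# The optimizer does not care where .grad came from: assign one by hand.
x.grad = torch.tensor([10., 10.])
before = x.detach().clone()
optim.step()
moved_by = before - x.detach()  # roughly lr * 10 = 0.01 per component
print(moved_by)
```

    No loss was involved in the second step, yet the parameters moved, which is why no explicit link between optimizer and loss is needed.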

    Note that in a training loop we would call optim.zero_grad() before each backward pass, because in each training step we want to compute new gradients and don't care about the gradients from the previous batch. Without zeroing, backward() adds to the existing .grad values, so gradients accumulate across batches.
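
    The accumulation is easy to observe in a small sketch that reuses the toy tensor from above and calls backward() twice without zeroing:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)
optim = torch.optim.SGD([x], lr=0.001)

(100 * x).sum().backward()
first = x.grad.clone()
print(first)          # tensor([100., 100.])

(100 * x).sum().backward()   # no zero_grad() in between
accumulated = x.grad.clone()
print(accumulated)    # tensor([200., 200.])

optim.zero_grad()            # reset before the next backward pass
(100 * x).sum().backward()
fresh = x.grad.clone()
print(fresh)          # tensor([100., 100.])
```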

  • 2020-12-23 13:28

    Without delving too deep into the internals of PyTorch, I can offer a simplistic answer:

    Recall that when initializing the optimizer you explicitly tell it which parameters (tensors) of the model it should update. The gradients are "stored" by the tensors themselves (each has a grad and a requires_grad attribute) once you call backward() on the loss. After the gradients have been computed for all tensors in the model, calling optimizer.step() makes the optimizer iterate over all parameters (tensors) it is supposed to update and use their internally stored grad to update their values.
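
    One way to see this sharing is to inspect the optimizer's param_groups. The sketch below uses a small nn.Linear model and a toy loss chosen only for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The optimizer keeps the tensors it was given in optimizer.param_groups.
held = optimizer.param_groups[0]["params"]
same_objects = all(p is q for p, q in zip(held, model.parameters()))
print(len(held), same_objects)  # 2 True  (weight and bias)

# backward() writes .grad on those very tensors, so the optimizer sees it.
loss = model(torch.ones(1, 2)).sum()
loss.backward()
grads_visible = all(p.grad is not None for p in held)
print(grads_visible)            # True
```

    Because the optimizer and the model refer to the same tensor objects, nothing has to be passed between loss.backward() and optimizer.step().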

    More info on computational graphs and the additional "grad" information stored in PyTorch tensors can be found in this answer.

  • 2020-12-23 13:31

    When you call loss.backward(), all it does is compute the gradient of the loss w.r.t. every parameter that the loss depends on and that has requires_grad=True, and store each gradient in that parameter's .grad attribute.

    optimizer.step() then updates all the parameters based on their .grad values.
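
    A small sketch of the "parameters in the loss" part: the tensors a and b below are invented for illustration, and only a takes part in the loss, so only a receives a gradient and gets updated by SGD:

```python
import torch

a = torch.tensor([1.0], requires_grad=True)  # used in the loss
b = torch.tensor([1.0], requires_grad=True)  # not used in the loss
optimizer = torch.optim.SGD([a, b], lr=0.1)

loss = (3 * a).sum()
loss.backward()
print(a.grad)  # tensor([3.])
print(b.grad)  # None: the loss does not depend on b

optimizer.step()  # parameters whose .grad is None are skipped
print(a)       # tensor([0.7000], requires_grad=True)
print(b)       # tensor([1.], requires_grad=True)
```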
