The loss is 0, but why are the gradients not 0?

Hi, my goal is to share a trick that I found in the following paper. Knowing this trick lets you calculate the impact of a steering vector on a specific layer by taking the matrix product of the gradients and the steering vector.


You can check out this Pull Request to run it. The Pull Request uses NDIF, so you can run it on your own laptop instead of a workstation node with at least 40 GB of VRAM.


Pull Request https://github.com/FlyingPumba/steering-thinking-models/pull/1

NDIF https://nnsight.net/notebooks/tutorials/start_remote_access/


The paper: Understanding Reasoning in Thinking Models via Steering Vectors


Though sharing this would expose my weakness in math 🤣


I was curious why the paper calculates the loss between `logits.detach()` and `logits`.
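I don't reproduce the paper's exact code here; this is only a sketch, under my own assumptions, of what a loss between `logits.detach()` and `logits` can look like when built with `log_softmax` and KL divergence. The shapes and the `reduction` argument are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(1, 5, requires_grad=True)

log_p = F.log_softmax(logits, dim=-1)           # live distribution
log_q = F.log_softmax(logits, dim=-1).detach()  # same values, but a constant

# KL(q || p): with log_target=True both arguments are log-probabilities.
loss = F.kl_div(log_p, log_q, log_target=True, reduction="sum")

# Pointwise, the loss is exp(log_q) * (log_q - log_p); since log_q and
# log_p are numerically identical, every term is exactly 0.
print(loss.item())  # 0.0
```

So the loss value itself is exactly zero; the interesting part is what happens during backpropagation.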


These are the two questions I wanted to answer:

1. The loss is 0.0, but why are the gradients not 0.0?

2. How does it work?


The paper uses `log_softmax` and KL divergence, which made my head dizzy when I tried to calculate it by hand. So, for the sake of my mental health, I simplified the loss to `loss = (logits.detach() - logits).sum()`.
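With that simplified loss, the effect is easy to reproduce end to end. Here is a minimal sketch using a toy linear layer in place of a full model (the names `x`, `w` are mine, not from the paper's code):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4)
w = torch.randn(4, 3, requires_grad=True)

logits = x @ w  # toy "model": a single linear map

# Identical values subtracted, so the loss is exactly 0...
loss = (logits.detach() - logits).sum()
loss.backward()

print(loss.item())  # 0.0
# ...yet the gradient of w is not 0: since d(loss)/d(logits_j) = -1,
# the chain rule gives w.grad[i, j] = -x[i].
print(w.grad)
```

The gradients come out nonzero even though the loss value is exactly zero.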


The explanation is in the code: https://gist.github.com/jasonrichdarmawan/90b8d992232f40e721d5adf64caad609


TL;DR: when we call `logits.detach()`, the result is treated as a constant during backpropagation. The loss is therefore effectively `constant - logits`, whose gradient with respect to `logits` is -1 everywhere. That's why the gradients are not 0.
