How is ChatGPT able to predict that the next sub-word of the sentence << The pizza was just baked from the oven. The pi >> is "zza" instead of "e"?
ChatGPT is built from a stack of layers, and each layer does a different job. This trick, known as an "induction head", needs two layers working together.
In short, think of it this way:
1. The 1st layer lets each sub-word look at the one right before it. At "zza", it queries << I am "zza" and I am searching for the sub-word before me >>, and the answer is "pi". The output is << I follow "pi" and I am "zza" >>.
2. The 2nd layer works at the last sub-word, "pi". It queries << I am "pi" and I am searching for a sub-word that says 'I follow "pi"' >>. Thanks to the 1st layer's output, the answer is "zza", so "zza" is copied as the prediction.
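To make the two steps concrete, here is a toy sketch in Python. It is not how ChatGPT is actually built: real layers learn this behaviour through attention weights rather than explicit string comparisons, and the split of "pizza" into "pi" + "zza" is an assumption (a real tokenizer may split it differently). The function names are made up for illustration.

```python
# Toy sketch of the two-step "induction head" idea.
# Assumptions: hand-written string matching instead of learned attention,
# and "pizza" tokenized as "pi" + "zza". Function names are hypothetical.

def previous_token_layer(tokens):
    """Layer 1: each position records which token came right before it."""
    # Position 0 has no predecessor, so it gets None.
    return [None] + tokens[:-1]

def induction_layer(tokens, previous):
    """Layer 2: the last token looks for an earlier position whose
    predecessor equals itself, and copies the token found there."""
    current = tokens[-1]
    # Scan earlier positions (most recent first) for "I follow <current>".
    for pos in range(len(tokens) - 2, -1, -1):
        if previous[pos] == current:
            return tokens[pos]
    return None  # no earlier match: nothing to copy

def predict_next(tokens):
    previous = previous_token_layer(tokens)
    return induction_layer(tokens, previous)

tokens = ["The", "pi", "zza", "was", "just", "baked", "from",
          "the", "oven", ".", "The", "pi"]
print(predict_next(tokens))  # zza
```

The loop simply takes the most recent match; a real induction head spreads its attention over every earlier match, which is one reason this is only a sketch.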
So that's how "pi" can predict the next sub-word "zza" instead of "e" in the sentence << The pizza was just baked from the oven. The pi >>.
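The "search" in each step is done with attention: the current position's query is compared against the keys of earlier positions with a dot product, and a softmax turns those scores into weights. Below is a rough sketch of the 2nd layer under simplifying assumptions: one-hot token vectors, the 1st layer's output hard-coded rather than learned, and a hypothetical `scale` parameter that controls how sharply attention focuses on the best match.

```python
import math

def one_hot(token, vocab):
    return [1.0 if t == token else 0.0 for t in vocab]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def induction_attention(tokens, scale=10.0):
    """Hypothetical toy induction head; needs at least two tokens."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    vocab = sorted(set(tokens))
    zero = [0.0] * len(vocab)
    # Layer 1 (previous-token head), hard-coded: each position stores
    # its own token and the token right before it.
    own = [one_hot(t, vocab) for t in tokens]
    prev = [zero] + own[:-1]
    # Layer 2 (induction head): the query is the last token itself,
    # the keys are the "previous token" slots written by layer 1.
    query = own[-1]
    keys = prev[:-1]     # earlier positions only
    values = own[:-1]    # what gets copied: the token at each position
    weights = softmax([scale * dot(query, k) for k in keys])
    # Weighted sum of the values, then pick the strongest token.
    mixed = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(vocab))]
    best = max(range(len(vocab)), key=lambda d: mixed[d])
    return vocab[best], weights

tokens = ["The", "pi", "zza", "was", "just", "baked", "from",
          "the", "oven", ".", "The", "pi"]
prediction, weights = induction_attention(tokens)
print(prediction)  # zza
```

Almost all of the attention weight lands on the position of "zza", the only earlier position whose previous token is "pi". In a real model the vectors are learned and high-dimensional, so the match is soft rather than exact.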
Image is from https://www.lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-heads-illustrated
#largelanguagemodel