How is ChatGPT able to predict the next sub-word?

How is ChatGPT able to predict that the next sub-word of the sentence << The pizza was just baked from the oven. The pi >> is "zza" instead of "e"?

ChatGPT is built from a stack of layers, and each layer does a different job.

In short, think of it this way:

1. The 1st layer makes the query << I am "zza" and I am searching for the sub-word before me >>, and we know the answer is "pi". The output is << I follow "pi" and I am "zza" >>.

2. The 2nd layer makes the query << I am "pi" and I am searching for the sub-word that contains 'I follow "pi"' >>, and we know the answer is "zza", based on the output of the 1st layer.

So that's how the model, starting from the final "pi", can predict the next sub-word "zza" instead of "e" in the sentence << The pizza was just baked from the oven. The pi >>.
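The two steps above can be sketched as a toy Python program. This is only an illustration of the idea, not how ChatGPT is actually implemented: a real model uses learned attention weights and soft matching over vectors, while this sketch uses exact string matching. The function names `previous_token_layer` and `induction_layer` and the way the sentence is split into sub-words are assumptions made up for this example.

```python
def previous_token_layer(tokens):
    """Toy 1st layer: every sub-word notes which sub-word comes before it."""
    return [
        {"token": tok, "follows": tokens[i - 1] if i > 0 else None}
        for i, tok in enumerate(tokens)
    ]


def induction_layer(annotated):
    """Toy 2nd layer: the last sub-word searches for an earlier sub-word
    whose note says 'I follow <last sub-word>' and copies it as the prediction."""
    current = annotated[-1]["token"]
    # Search from the most recent position backwards, skipping the last one.
    for entry in reversed(annotated[:-1]):
        if entry["follows"] == current:
            return entry["token"]
    return None  # no earlier occurrence, so this trick cannot help


# Hypothetical sub-word split of the example sentence.
tokens = ["The", "pi", "zza", "was", "just", "baked",
          "from", "the", "oven", ".", "The", "pi"]

annotated = previous_token_layer(tokens)
print(annotated[2])                # {'token': 'zza', 'follows': 'pi'}
print(induction_layer(annotated))  # zza
```

If the sentence had never mentioned "pizza" before, `induction_layer` would return `None` here, which matches the intuition that this mechanism only works when the pattern has already appeared earlier in the text.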

Image is from https://www.lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-heads-illustrated

#largelanguagemodel
