I want to give you a headache.
Ari Holtzman et al. (2019) argued that high-quality human language does not follow a distribution of high-probability next words. In other words, as humans, we want generated text to surprise us, not to be boring and predictable. The authors show this nicely by plotting the per-token probability a model assigns to human-written text versus to text produced by beam search.
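You can reproduce the gist of that comparison in a few lines. A minimal sketch, assuming the Hugging Face transformers library and GPT-2; the prompt, the example sentence, and the helper name per_token_probs are just illustrative:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def per_token_probs(text):
    """Probability the model assigns to each token, given the preceding ones."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift by one: logits at position t predict the token at position t+1.
    probs = torch.softmax(logits[0, :-1], dim=-1)
    return probs[torch.arange(ids.size(1) - 1), ids[0, 1:]]

prompt = "The universe is"
human_text = prompt + " stranger than we can imagine, and no one knows why."

# Beam search picks a high-probability continuation of the same prompt.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
beam_ids = model.generate(prompt_ids, num_beams=5, max_new_tokens=12)
beam_text = tokenizer.decode(beam_ids[0])

print("human:", per_token_probs(human_text))
print("beam :", per_token_probs(beam_text))
```

Plotted over token position, the beam-search curve stays uniformly high while the human-text curve regularly dips to low-probability tokens, which is exactly the paper's point.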
So, how do you interpret a model that is designed to be unpredictable?
#largelanguagemodel