
I want to give you a headache.


Ari Holtzman et al. (2019) argued that high-quality human language does not follow a distribution of high-probability next words. In other words, as humans, we want generated text to surprise us, not to be boring and predictable. The authors show this nicely by plotting the probability a model assigns to each token of human-written text versus the probability it assigns to text produced by beam search.
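A minimal sketch of that comparison, not the authors' code: it uses the Hugging Face transformers library with GPT-2 as a stand-in model, and the prompt and the "human" continuation are made up for illustration. Beam-search output should come back with consistently high per-token probabilities, while the human sentence contains noticeably lower-probability tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def per_token_probs(text: str) -> list[float]:
    """Probability the model assigns to each token, given the preceding tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict the token at position t+1.
    probs = torch.softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]
    return probs[torch.arange(next_ids.size(0)), next_ids].tolist()

prompt = "The weather report said that tomorrow"

# Beam-search continuation: the decoder deliberately follows a high-probability path.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
beam_ids = model.generate(prompt_ids, num_beams=5, max_new_tokens=30, do_sample=False)
beam_text = tokenizer.decode(beam_ids[0], skip_special_tokens=True)

# A human-written continuation (any real sentence works as an illustration).
human_text = prompt + " we should expect thunder, burst pipes, and one very confused cat."

print("beam search:", per_token_probs(beam_text))
print("human text :", per_token_probs(human_text))
```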


So, how do you interpret a model that is designed to be unpredictable?



How is ChatGPT able to predict the next sub-word?