@rk @wordshaper Indeed, when using local inference you can get perplexity on a per-token basis as well as for the entire context window. Perplexity is just the exponential of the average per-token negative log-likelihood (the cross-entropy), so lower perplexity means the model found the text less surprising. The reason to prefer this over reading generated summaries is that perplexity is a number: quantitative and directly comparable across texts.
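
For concreteness, here's a minimal sketch of how you'd compute both, assuming PyTorch and Hugging Face transformers with a local causal LM (the model name and sample text below are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any local causal LM works the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

# Shift so each position's logits predict the *next* token.
log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
targets = ids[:, 1:]

# Negative log-likelihood of each actual next token.
token_nll = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)

# Per-token perplexity: exponential of each token's NLL.
per_token_ppl = token_nll.exp()

# Whole-sequence perplexity: exponential of the mean NLL.
sequence_ppl = token_nll.mean().exp()

for tok, ppl in zip(tokenizer.convert_ids_to_tokens(targets[0]), per_token_ppl[0]):
    print(f"{tok:>12s}  ppl={ppl.item():8.2f}")
print(f"sequence perplexity: {sequence_ppl.item():.2f}")
```

The per-token values show you exactly *where* the model was surprised, while the sequence-level number gives you one figure you can track or compare.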