With models such as ChatGPT there is another potential source of bias (and another potential means of mitigating it). People have already mentioned that the model selects from a distribution of possible tokens. That is what the base GPT model does, but this process is then steered via reinforcement learning from human feedback (RLHF). There are several ways to do this, but essentially a human is shown various outputs of the model and then scores or ranks them, and those rankings feed back into the objective function (a rough sketch of this step follows below). The reinforcement during this process could steer the model either towards bias or away from it.
This is one argument proponents of open source models use: we don't know, and may never know, the reinforcement regimen behind ChatGPT. Open source communities are more transparent about what guidelines were used and who performed the reinforcement learning.
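To make the ranking step concrete, here is a minimal sketch of how pairwise human preferences are commonly turned into a reward-model training signal. This is in the spirit of the published RLHF work, not ChatGPT's actual (undisclosed) pipeline, and the RewardModel, embedding inputs, and sizes are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical reward head: maps a (prompt, response) embedding to a scalar score.
    In real RLHF pipelines this head usually sits on top of the language model itself;
    here it is just a stand-in linear layer for illustration."""
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

def pairwise_ranking_loss(reward_chosen: torch.Tensor,
                          reward_rejected: torch.Tensor) -> torch.Tensor:
    # The human judgement "output A is better than output B" becomes a training
    # signal: push the reward of the preferred output above the rejected one.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: random embeddings standing in for the model outputs a human ranked.
model = RewardModel()
chosen = torch.randn(4, 768)    # embeddings of the human-preferred responses
rejected = torch.randn(4, 768)  # embeddings of the lower-ranked responses
loss = pairwise_ranking_loss(model(chosen), model(rejected))
loss.backward()  # gradients for the reward model, which then steers the policy
```

The trained reward model is then used to score fresh generations, and the language model is fine-tuned (typically with PPO) to maximise that score, which is where the steering towards or away from bias happens.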
Makes me consider what we mean by learning. I wouldn't have considered this example learning, because the model weights have not been updated by this interaction. What has happened is that the model is using the context of the previous answers it has given: essentially, we are asking the model to generate the most likely outputs given that the input already includes its wrong attempted answers (see the sketch below). The default context window is 2048 tokens, with a current (and rapidly increasing) maximum of 4096.
I would put this in the domain of prompt engineering rather than learning, as it's up to the human to steer the model to the right answer. But maybe it is a type of learning?
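Here is a rough sketch of what I mean mechanically, with a hypothetical generate() standing in for whatever completion API or local model is being used. Nothing about the weights changes; the earlier wrong attempts and the human's corrections just accumulate in the prompt, and each new completion is conditioned on all of it.

```python
def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real completion call (an API or a local model).
    # It returns a canned string here so the sketch runs end to end.
    return "(model output would go here)"

def retry_with_corrections(question: str, corrections: list[str]) -> str:
    # The only "memory" is this growing string of text, which must still fit
    # inside the model's context window (2048-4096 tokens).
    history = [f"Q: {question}"]
    for note in corrections:
        answer = generate("\n".join(history) + "\nA:")
        history.append(f"A: {answer}")
        history.append(f"Q: That's not right. {note} Please try again.")
    # The final answer is just the most likely continuation of a context that
    # now includes every failed attempt and every correction.
    return generate("\n".join(history) + "\nA:")
```

Once that text scrolls out of the window (or the session ends), nothing persists, which is why it feels more like prompt engineering than learning to me.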