On Monday, 10 July 2023, I plan to work a bit on the technologies behind ChatGPT. The technology I am talking about is Reinforcement Learning from Human Feedback (RLHF). In the simplest terms, RLHF involves a policy network and a reward network: the reward network teaches the policy network through reinforcement learning. The idea of two networks is not new; Actor-Critic and Student-Teacher are examples. The difference in ChatGPT is that the reward network is taught by human feedback. The simple law: it is very difficult to produce something, yet very easy to evaluate it. For instance, it is hard to paint, but easy to say which painting is more beautiful. Applying this simple law to RLHF: we humans train the reward network to evaluate two summaries of an essay, while the policy network has to learn how to generate the summaries. Then we expand to other question-answer tasks. The new approach, RLHF, has at least two important revolutions: The summar...
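
To make the "easy to evaluate, hard to produce" idea concrete, here is a minimal sketch of how the reward network could be trained from pairwise human preferences. Everything in it is my own toy assumption (random 128-dimensional "summary embeddings", a tiny PyTorch MLP called RewardModel), not the actual ChatGPT code; it only illustrates the pairwise comparison loss.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in for the reward network: maps a summary embedding to a scalar score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(embedding).squeeze(-1)

def preference_loss(score_preferred: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style pairwise loss: push the human-preferred score above the rejected one.
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Fake data: embeddings of (preferred, rejected) summary pairs, as if labelled by humans.
embed_dim, n_pairs = 128, 256
preferred = torch.randn(n_pairs, embed_dim)
rejected = torch.randn(n_pairs, embed_dim)

model = RewardModel(embed_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    optimizer.zero_grad()
    loss = preference_loss(model(preferred), model(rejected))
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: pairwise loss = {loss.item():.4f}")

Once the reward network can rank outputs the way humans do, its scalar score becomes the training signal for the policy network; in the InstructGPT line of work this second stage is done with the PPO reinforcement learning algorithm.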