Professor Zhenzhong Lan of Westlake University: Several Understandings About Large Models

On September 19, 2023, the “2023 Shanghai Blockchain International Week·9th Global Blockchain Summit” opened in Shanghai. Lan Zhenzhong, founder of Xihu Xincheng and professor at Westlake University, gave a live speech titled “Several Understandings About Large Models”.

For more information, please click: “Exciting Highlights of the 2023 Shanghai Blockchain International Week (Continuously Updated)”

LianGuai provided full live coverage of the conference. The following is a summary of the speech.

Good morning, everyone!

Today I will mainly talk about large models and artificial intelligence, and later about how they combine with Web3 and about the work we are doing.

I have been working on artificial intelligence since 2007, more than ten years now, going from the CPU era to the GPU era and from small models to large models. I have also done some fairly representative work. In 2019, when I was at Google, the large model I worked on was the best in the world at the time, much better than GPT-2, so back then we did not think highly of the GPT series; now it is doing very well.

When I returned to China in 2020, I conducted the first evaluation of Chinese large models, so I have been deeply involved in large models. I now have both a laboratory and a company doing research related to them.

In the past I rarely looked back at the history of large-model development or thought about it deeply. It was not until ChatGPT became popular and people started asking me all kinds of questions that I summarized a few of them:

First, do people want models to become bigger or smaller?

Second, everyone is talking about general large models; do the opportunities lie with general large models or with industry-specific large models?

Third, should I invest in NVIDIA or invest in large model companies and application companies?

Fourth, for the general public: how will large models change my work, and how should I choose my career?

These questions made me review the history. I will mainly present some historical data, in the hope that it serves as a reference for everyone.

First, will large models keep getting bigger? Looking back at history, ever since computers began developing in the 1950s, models have in fact kept getting larger. The increase in model size has basically been the primary driver of the increase in model intelligence, so models will keep getting bigger.

It was not until 2018 that we found a method that lets models scale up rapidly, and since then the expansion has been very fast: from 2018 to early 2021, model size grew several hundred times every 18 months. The pace has slowed now, but models are still scaling up rapidly.

(As shown in the graph) This graph is for GPT-4. The vertical axis represents the level of intelligence, plotted so that lower means more intelligent; the horizontal axis represents model size and the amount of training compute. As the model gets larger and training increases, the level of intelligence keeps rising. The green dot is GPT-4, and the curve still has a slope at that point, so it will keep going down. It is therefore foreseeable that if you make the model even larger, it will become even more intelligent. Humans always pursue limits, so we will certainly keep scaling up.
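To make the shape of that curve concrete, here is a toy sketch of the kind of power-law scaling relationship such plots describe. The constants below are invented purely for illustration; they are not the fitted values behind the GPT-4 figure.

```python
# Toy scaling curve: "intelligence" is proxied by loss (lower is better),
# which falls as a power law in training compute. All constants are made up.

def predicted_loss(compute, a=2.0, b=0.05, irreducible=1.0):
    """Illustrative scaling law: loss = irreducible + a * compute**(-b)."""
    return irreducible + a * compute ** (-b)

for flops in [1e20, 1e22, 1e24, 1e26]:
    print(f"compute = {flops:.0e} FLOPs -> predicted loss = {predicted_loss(flops):.3f}")
```

The only point the sketch makes is the one in the plot: the slope is still negative at today’s largest models, so more compute is still expected to buy more capability.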

The main concern now is that GPT-4 is already a trillion-parameter-scale model and its inference cost is particularly high. Is it still useful to scale up further?

Looking at another set of data, we can see that this concern is unnecessary, because the cost of training and inference is falling rapidly. When GPT-3 was trained in 2020, a single training run cost $4 million; by 2022 that had dropped to $400,000. Costs are coming down very fast.

This is mainly due to several factors:

First, GPU performance has risen sharply while costs have fallen, far outpacing Moore’s Law. From 2016 to 2022, CPU performance increased 8-fold, in line with Moore’s Law, while GPU performance increased 26-fold, which is a very significant difference.

Second, software improvements have raised training efficiency, cutting the cost of training by about 47% per year. Together these two factors, one hardware and one software, produce a substantial decrease.

Third, we are deploying compute on a very large scale. Before ChatGPT was released, global compute grew by 20% to 40% per year; after ChatGPT was released, that growth may have roughly doubled. When compute is deployed at scale and GPUs are produced in large volumes, operating costs also fall. Overall, the cost of training and inference is dropping rapidly, which is how we get a 10-fold decrease in two years.
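As a rough sanity check, the hardware and software factors above multiply into roughly the overall decline; the short calculation below uses only the figures quoted in this talk.

```python
# Back-of-the-envelope check of the cost figures quoted above.

train_cost_2020 = 4_000_000   # single GPT-3 training run in 2020, USD
train_cost_2022 = 400_000     # comparable run in 2022, USD

# Overall: a 10x drop over two years implies ~3.16x cheaper per year.
overall_per_year = (train_cost_2020 / train_cost_2022) ** 0.5
print(f"overall: ~{overall_per_year:.2f}x cheaper per year")

# Hardware: GPU performance up ~26x from 2016 to 2022 (six years).
hardware_per_year = 26 ** (1 / 6)     # ~1.72x per year

# Software: training cost down ~47% per year.
software_per_year = 1 / (1 - 0.47)    # ~1.89x per year

print(f"hardware x software: ~{hardware_per_year * software_per_year:.2f}x per year")
```

The two decompositions land at roughly the same figure (about 3.2x cheaper per year), which is why a 10-fold drop in two years is plausible.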

In the next few years, trillion-parameter models like GPT-4 will become relatively cheap, and everyone will be able to use them.

In summary, I predict that models will continue to grow in size, their capabilities will continue to strengthen, the cost of training and inference will continue to fall, and the pace of iteration will be very fast.

(As shown in the figure) This figure illustrates GPT-1, which I did not think highly of at the time. Looking back now, that was a big mistake: GPT-1 made a significant contribution by moving artificial intelligence from specialized AI toward general AI.

There used to be hundreds of natural language processing tasks, each with its own specially designed models, which produced a huge number of papers. After GPT-1 came out, it essentially said: don’t build a different model for every task; one model can handle most of them.

The next paper is from a colleague of mine at Google at the time; it integrates all kinds of tasks into the same model. The main contribution of this wave is therefore universality, which shows up not only in text but also in images, audio, protein sequences, and other data. As long as you can convert the data into a sequence, the model can process it.
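As a minimal illustration of that “one model, many tasks” idea, the sketch below uses the public t5-small checkpoint from the Hugging Face transformers library; it is an example of the approach, not the specific model discussed in the talk.

```python
# One text-to-text model handles several tasks, selected purely by the prompt prefix.
from transformers import pipeline

text2text = pipeline("text2text-generation", model="t5-small")

print(text2text("translate English to German: The house is wonderful."))
print(text2text("summarize: Large models keep getting bigger, while the cost "
                "of training and inference keeps falling, so deployment "
                "across industries is accelerating."))
```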

By dividing an image into many patches, the Transformer can now handle visual tasks as well, covering a very wide range of tasks and demonstrating strong versatility.
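A minimal sketch of what “dividing the image into patches” looks like in practice (ViT-style patching; the sizes are chosen only for illustration):

```python
# Cut an image into fixed-size patches and flatten each one, turning the
# image into a sequence that a Transformer can consume like text tokens.
import numpy as np

image = np.random.rand(224, 224, 3)   # H x W x C
patch = 16                            # 16x16-pixel patches

grid = 224 // patch                   # 14 patches per side
patches = image.reshape(grid, patch, grid, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

print(patches.shape)                  # (196, 768): a sequence of 196 "tokens"
```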

Although large models cannot yet handle many complex tasks on their own, they can get there with a little help: break the task down slightly and it becomes doable. GPT-4 may seem powerful, but when asked to solve the Game of 24 directly, its accuracy is only 7.3%; decompose the problem a little and accuracy rises to 74%. Many seemingly complex tasks can be automated by GPT-series models or other general large models with a bit of help from professionals.
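A hedged sketch of what that decomposition can look like for the Game of 24: instead of asking for the whole expression at once, ask for one intermediate operation at a time. The ask_llm helper below is a hypothetical stand-in for any chat-model API; here it just echoes the prompt so the control flow runs as-is.

```python
# Decompose the Game of 24 into single-operation steps instead of one shot.

def ask_llm(prompt: str) -> str:
    # Hypothetical stub: replace with a real chat-model API call.
    return f"[model answer to: {prompt[:40]}...]"

def solve_24(numbers):
    """Each step combines two numbers into one, so three steps suffice."""
    remaining = list(numbers)
    steps = []
    while len(remaining) > 1:
        answer = ask_llm(
            f"Numbers left: {remaining}. Propose ONE arithmetic operation on "
            "two of them that keeps 24 reachable. Answer as 'a op b = c'."
        )
        steps.append(answer)
        # A real system would parse and verify the answer, then replace the
        # two operands with the result; popping one element mimics that.
        remaining.pop()
    return steps

print(solve_24([4, 9, 10, 13]))
```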

So one point is that models will keep getting larger, and the other is that, thanks to their generality, many complex tasks can be solved with a little decomposition, which makes real-world deployment very promising. There are many successful cases abroad: for example, Duolingo, a company in Pittsburgh, saw a 42% increase in revenue in the first quarter of 2023 after adding ChatGPT-based features.

Nowadays many programmers are using Copilot. OpenAI’s revenue is estimated to reach $1.2 billion this year, a scale of revenue that is very hard for a startup to reach.

What is different about this wave of artificial intelligence is that it replaces knowledge workers. The graph on the right shows the automation level of various industries before this wave of general artificial intelligence: the bottom represents workers without a degree, followed by those with higher degrees such as a master’s or PhD, and the higher the education level, the lower the degree of replaceability used to be. Now it is different: with the emergence of general artificial intelligence, knowledge workers can also be easily replaced.

In summary, large models will land faster than we imagine, though perhaps more slowly than many people in finance imagine, because the stock market always reacts faster than the technology itself. They can empower all kinds of industries. Decomposing all the different tasks is difficult, but if large-model companies go deep into specific industries, the opportunities are great.

Right now most people focus on the intelligence level of models and pay less attention to their “emotional intelligence” when interacting with people. For example, when I asked ChatGPT the kind of question a loved one would ask, it gave me a methodical answer that lacked emotion. Our interactions with models still feel cold and pay little attention to the user, which is a sign of how early the industry is.

You can compare this with search engines. When they first launched there was little personalization; now the results each person sees on Baidu or Google are different, because a lot of personal information is used to make search more accurate. Large models cannot do this yet.

Some companies have started working on this, such as Character.ai, which was also created by former Google colleagues of mine. They added personalization to the model, which significantly increases how long people interact with it. Data from May show that the average interaction time for OpenAI is 4 minutes, while for this company it is 28 minutes, several times longer. Their page looks like this: the large model is divided into various characters and agents and made more personalized and emotional, so people are willing to keep interacting with it. As large models develop, there will be major breakthroughs in human-computer interaction.

Our company and laboratory mainly work on general large models with both high intelligence and high emotional intelligence, chiefly multimodal large models. To improve the models’ emotional intelligence, we have made a series of improvements in memory, personalization, and emotion perception.
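As a rough illustration of the memory and personalization side (a generic sketch, not our actual implementation), a persona and remembered facts about the user can simply be assembled into the prompt before every model call:

```python
# Generic sketch: prepend persona and memories to each turn of a chat model.
from dataclasses import dataclass, field

@dataclass
class Companion:
    persona: str                                   # who the assistant is meant to be
    memories: list = field(default_factory=list)   # facts learned about the user

    def remember(self, fact: str):
        self.memories.append(fact)

    def build_prompt(self, user_message: str) -> str:
        memory_text = "\n".join(f"- {m}" for m in self.memories[-10:])
        return (
            f"You are {self.persona}.\n"
            f"Things you remember about this user:\n{memory_text}\n\n"
            f"User: {user_message}\nAssistant:"
        )

bot = Companion(persona="a warm, attentive companion who speaks casually")
bot.remember("The user has an important exam next week and feels anxious.")
print(bot.build_prompt("I can't sleep again tonight."))
```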

Our model launched relatively early, because I had worked on general large models at Google for a long time: by mid-2020, before ChatGPT was released, we already had our own general large model. At that time its writing ability was on par with GPT-3.5, and we had made substantial progress.

In the year or more since going live, we have gained more than 200 C-end users and more than 100 B-end clients, including Starbucks and Alipay.

One of our more typical applications is the cooperation with Tom Cat, a companion product with 400 million monthly active users worldwide. Previously it mainly repeated what users said back to them through voice changing; we added multimodal interaction and conversation capabilities to it.

Now let me return to Web3, the theme of this conference. This is only my superficial understanding: I believe large models and Web3 correspond to productivity and production relations, respectively. Large models greatly improve productivity, but to make good use of them there must be matching production relations. In my summary, the deployment of large models faces several problems:

First, training is very expensive. Startups have no incentive to open-source a model they have spent millions of dollars training: once the model is open-sourced, it no longer brings them any return, so open-sourcing is hard for them. Yet open-sourcing is very important for models. Many models today are black boxes, and many research institutions cannot afford to train their own; if everyone trains from scratch, everyone is reinventing the wheel. Open-sourcing therefore matters a great deal, but it requires corresponding incentive mechanisms.

Second, the cost of inference is high. A single dialogue turn of GPT-4 inference costs 60 cents, which is far more expensive than having me speak. Such high inference costs make deployment very difficult: GPT-4 could be used in many places, but the cost is too high.

Third, data is sensitive. The earlier Samsung data leak caused a stir; much of the data we now upload to large models is sensitive, and many companies are unwilling to upload their own data. How do we handle these issues? We hope Web3 can help us solve these problems.

Just now I heard Professor Cao talk about many of the difficulties, but we hope that research can help solve these problems. For example, we could have a public chain to which everyone can upload open-source models: even after you open-source a model and upload it to the chain, there are corresponding incentive mechanisms for you. Likewise, if users upload data and allow it to be used for training, they also receive corresponding incentives.

There is also the compute problem. Today everyone’s phone has a very powerful GPU; if everyone’s phone could contribute to inference, we could greatly reduce inference costs. I hope that with the power of Web3 we can truly realize these ideals: empower all kinds of industries with large models, accompany everyone, and truly become everyone’s assistant or companion.
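Purely as an illustration of that incentive idea (not a protocol design), a ledger that rewards contributions of open-source models, training data, and phone-side inference might look like this in miniature; every name and reward value below is made up.

```python
# Toy contribution ledger: record who contributed what and award points.
from collections import defaultdict

REWARDS = {"open_source_model": 1000, "training_data": 10, "inference_job": 1}

class ContributionLedger:
    def __init__(self):
        self.balances = defaultdict(int)
        self.log = []

    def record(self, contributor: str, kind: str, units: int = 1):
        points = REWARDS[kind] * units
        self.balances[contributor] += points
        self.log.append((contributor, kind, units, points))

ledger = ContributionLedger()
ledger.record("lab_a", "open_source_model")          # a lab open-sources a model
ledger.record("user_b", "training_data", units=500)  # a user contributes data
ledger.record("phone_c", "inference_job", units=40)  # a phone serves inference requests
print(dict(ledger.balances))
```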

Thank you all!
