Author: Yiping, IOSG Ventures
As large language models (LLMs) continue to flourish, many projects are integrating artificial intelligence (AI) with blockchain, and the combination is becoming increasingly prevalent. One noteworthy example is Zero-Knowledge Machine Learning (ZKML).
Artificial intelligence and blockchain are two transformative technologies with fundamentally different characteristics. AI requires powerful computing power, usually provided by centralized data centers. On the other hand, blockchain provides decentralized computing and privacy protection but performs poorly in tasks that require large-scale computation and storage. We are still exploring and researching the best practices for integrating AI and blockchain, and we will also introduce some “AI + blockchain” project cases in the future.
Source: IOSG Ventures
This research report is divided into two parts; this article focuses on the application of LLMs in the crypto field and explores strategies for putting those applications into practice.
What is LLM?
An LLM (large language model) is a language model consisting of an artificial neural network with a very large number of parameters (typically billions). These models are trained on large amounts of unlabeled text.
Around 2018, the emergence of LLMs transformed natural language processing research. Unlike earlier approaches that required training a dedicated supervised model for each task, a single LLM serves as a general-purpose model that performs well across many tasks. Its capabilities and applications include:
Understanding and summarizing text: LLM can understand and summarize a large amount of human language and textual data. It can extract key information and generate concise summaries.
Generating new content: LLMs can generate content from text. Given a prompt, a model can answer questions, produce new text or summaries, or perform sentiment analysis.
Translation: LLMs can translate between different languages. They use deep learning algorithms and neural networks to understand context and the relationships between words.
Predicting and generating text: LLMs can predict and generate text from context, producing output similar to human-written content, including songs, poems, stories, marketing copy, and more.
Applications in various fields: Large language models have broad applicability in natural language processing tasks. They are used in conversational AI, chatbots, healthcare, software development, search engines, tutoring, writing tools, and many other fields.
The advantages of LLM include its ability to understand large amounts of data, perform various language-related tasks, and the potential to customize results based on user needs.
Common Applications of Large Language Models
Due to their outstanding natural language understanding capabilities, LLMs have great potential, and developers mainly focus on the following two aspects:
Providing users with accurate and up-to-date answers based on a large amount of contextual data and content
Completing specific tasks given by users using different agents and tools
These two aspects have driven the explosive growth of "chat with X" LLM applications, for example chatting with PDFs, documents, and academic papers.
Subsequently, people have tried to integrate LLM with various data sources. Developers have successfully integrated platforms such as Github, Notion, and some note-taking software with LLM.
To overcome the inherent limitations of LLMs, different tools have been incorporated into these systems. The first such tool is the search engine, which gives an LLM access to up-to-date knowledge. Later efforts integrated tools such as WolframAlpha, Google Suites, and Etherscan with large language models.
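The tool-augmentation pattern above can be sketched as a simple dispatch loop: the model names a tool and an input, and the application routes the call. The tool names, stubbed tools, and the stubbed model output below are illustrative assumptions, not any specific framework's API.

```python
# Minimal sketch of an LLM tool-use loop. The "model output" here is a
# hard-coded dict; a real system would obtain it from an LLM API call.

def search_engine(query: str) -> str:
    # Stub: a real implementation would call a search API.
    return f"Top result for '{query}'"

def etherscan_lookup(address: str) -> str:
    # Stub: a real implementation would query the Etherscan API.
    return f"Balance of {address}: 1.5 ETH"

# Registry mapping tool names to callables.
TOOLS = {
    "search": search_engine,
    "etherscan": etherscan_lookup,
}

def run_agent(model_output: dict) -> str:
    """Route a (stubbed) model tool request to the matching tool."""
    tool = TOOLS[model_output["tool"]]
    return tool(model_output["input"])

print(run_agent({"tool": "search", "input": "latest ETH upgrade"}))
```

A real agent would loop: feed the tool's result back into the model's context and let it decide whether to call another tool or answer the user.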
Architecture of LLM Apps
The following diagram outlines how an LLM application responds to a user query: first, the relevant data sources are transformed into embedding vectors and stored in a vector database. The LLM adapter uses the user query and a similarity search to find relevant context in the vector database. That context is placed in the prompt and sent to the LLM, which executes the prompt and generates an answer, using tools when necessary. Sometimes the LLM is fine-tuned on specific datasets to improve accuracy and reduce costs.
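The retrieval step can be illustrated with a toy similarity search over embedding vectors. The three-dimensional vectors and documents below are made-up stand-ins; a real system would compute embeddings with a model and store them in a vector database.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": document text paired with a precomputed embedding.
DB = [
    ("Solidity is a contract language", [0.9, 0.1, 0.0]),
    ("Curve is a DEX for stablecoins", [0.1, 0.9, 0.1]),
    ("LLMs are neural language models", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=1):
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(DB, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([0.0, 0.1, 1.0]))  # → ['LLMs are neural language models']
```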
The workflow of LLM applications can be roughly divided into three main stages:
Data Preparation and Embedding: This stage involves storing private data (such as project memos) for later retrieval. Typically, files are split into chunks, processed through an embedding model, and saved in a special type of database called a vector database.
Prompt Formulation and Retrieval: When a user submits a query (in this case, searching for project information), the software constructs a series of prompts to feed into the language model. The final prompt typically combines a template hard-coded by the developers, few-shot examples of effective output, any necessary data fetched from external APIs, and relevant documents retrieved from the vector database.
Prompt Execution and Inference: The completed prompts are then passed to an existing language model for inference, which may be a proprietary model API, an open-source model, or an individually fine-tuned model. At this stage, some developers also integrate operational tooling (such as logging, caching, and validation) into the system.
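The prompt-formulation stage can be sketched as filling a hard-coded template with few-shot examples, retrieved context, and the user's question. The template wording below is an illustrative assumption, not a recommended prompt.

```python
from string import Template

# Hard-coded prompt template with slots for few-shot examples,
# retrieved context, and the user's question (wording is illustrative).
PROMPT = Template(
    "You are a crypto research assistant.\n\n"
    "Examples:\n$examples\n\n"
    "Context:\n$context\n\n"
    "Question: $question\nAnswer:"
)

def build_prompt(question, context_chunks, examples):
    """Assemble the final prompt string sent to the model."""
    return PROMPT.substitute(
        examples="\n".join(examples),
        context="\n".join(context_chunks),
        question=question,
    )

prompt = build_prompt(
    "What is PYUSD?",
    ["PYUSD is a USD stablecoin issued by PayPal."],
    ["Q: What is ETH? A: Ether, the native asset of Ethereum."],
)
print(prompt)
```

In production the `context_chunks` would come from the vector-database similarity search and the examples from the developer's curated few-shot set.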
Introducing LLM to the Cryptocurrency Field
Although the cryptocurrency field (Web3) has some applications similar to Web2's, developing excellent LLM applications in this field requires particular care.
The cryptocurrency ecosystem is unique, with its own culture, data, and inclusiveness. LLMs fine-tuned on cryptocurrency-specific datasets can deliver superior results at relatively low cost. Although data is abundant, open datasets are notably scarce on platforms such as HuggingFace; currently there is only one dataset related to smart contracts, containing 113,000 smart contracts.
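A first step toward such fine-tuning is converting raw smart-contract records into instruction-style training pairs. The record fields (`name`, `source`) and the prompt wording below are assumptions for illustration; an actual dataset pulled from HuggingFace may use a different schema.

```python
# Sketch: turning raw smart-contract records into instruction-style
# fine-tuning pairs. Field names "name" and "source" are assumed.

def to_finetune_pairs(contracts):
    """Map contract records to prompt/completion pairs for fine-tuning."""
    pairs = []
    for c in contracts:
        pairs.append({
            "prompt": f"Write a Solidity contract named {c['name']}.",
            "completion": c["source"],
        })
    return pairs

sample = [{
    "name": "Counter",
    "source": "contract Counter { uint256 public n; "
              "function inc() external { n += 1; } }",
}]
print(to_finetune_pairs(sample)[0]["prompt"])
# → Write a Solidity contract named Counter.
```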
Developers also face the challenge of integrating new tools with LLMs. These tools differ from those used in Web2: they give LLMs access to transaction data, let them interact with decentralized applications (Dapps), and execute transactions. So far, we have not found any Dapp integrations in Langchain.
Although developing high-quality crypto LLM applications may require extra investment, LLMs are a natural fit for the crypto field, which offers rich, clean, structured data. Combined with the fact that Solidity code is usually concise and clear, this makes it easier for LLMs to generate functional code.
In the next part, we will discuss eight potential directions in which LLMs can help the blockchain field, such as:
Integrating built-in artificial intelligence/LLM functionality into the blockchain
Analyzing transaction records using LLM
Identifying potential robots using LLM
Writing code using LLM
Reading code using LLM
Assisting the community using LLM
Tracking the market using LLM
Analyzing projects using LLM