How do large language models take root and flourish in the crypto field?

Author: Yiping, IOSG Ventures

This article is original content from IOSG and is intended only for industry learning and discussion. It does not constitute investment advice. If you quote it, please cite the source. For reprinting, please contact the IOSG team for authorization and reprinting instructions.

Introduction

With the boom in large language models (LLMs), we have seen many projects that combine artificial intelligence (AI) with blockchain. The pairing of LLMs and blockchain is increasingly common, and we also see opportunities for AI to integrate with blockchain in new ways. One notable example is zero-knowledge machine learning (ZKML).
AI and blockchain are two transformative technologies with fundamentally different characteristics. AI requires substantial computing power, usually supplied by centralized data centers. Blockchain, by contrast, provides decentralized computing and privacy protection but performs poorly at large-scale computation and storage. We are still exploring the best practices for integrating AI and blockchain, and we will present some “AI + blockchain” project case studies in future articles.

Source: IOSG Ventures

This research report is published in two parts. This article focuses on applications of LLMs in the crypto field and explores strategies for putting them into practice.

What is an LLM?

A large language model (LLM) is a language model built on an artificial neural network with a very large number of parameters, usually in the billions. These models are trained on large amounts of unlabeled text.

The emergence of LLMs around 2018 transformed natural language processing research. Whereas earlier approaches required training a separate supervised model for each specific task, LLMs are general-purpose models that perform well across many tasks. Their capabilities and applications include:

Understanding and summarizing text: LLMs can understand and summarize large amounts of human language and text data. They can extract key information and generate concise summaries.
Generating new content: LLMs can generate content from text. Given a prompt, a model can answer questions, write new text or summaries, or perform sentiment analysis.
Translation: LLMs can translate between languages, using deep learning and neural networks to capture context and the relationships between words.
Predicting and generating text: LLMs can predict and generate text from its surrounding context, producing human-like content such as songs, poems, stories, and marketing materials.
Applications in various fields: Large language models have wide applicability in natural language processing tasks. They are used in conversational AI, chatbots, healthcare, software development, search engines, tutoring, writing tools, and many other fields.

The advantages of LLMs include their ability to process large amounts of data, perform a wide range of language-related tasks, and tailor results to users' needs.

Common applications of large language models

Thanks to their strong natural language understanding, LLMs have great potential, and developers mainly focus on two areas:

Providing users with accurate, up-to-date answers drawn from large amounts of contextual data and content
Completing specific user-assigned tasks with the help of agents and tools

These two areas have driven the explosion of “chat with X” applications: chat with PDFs, chat with documents, chat with academic papers, and so on.

Developers then began connecting LLMs to a variety of data sources and have successfully integrated platforms such as GitHub, Notion, and several note-taking apps.

To overcome the inherent limitations of LLMs, developers have added tools to these systems. The first such tool was the search engine, which gives an LLM access to up-to-date knowledge. Further progress will connect tools such as WolframAlpha, Google's suite of apps, and Etherscan to large language models.
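To make tool integration concrete, here is a minimal sketch of the dispatch step, assuming the model has been prompted to reply with a JSON object that names a tool and its argument. The tool names (`web_search`, `etherscan_lookup`) and their stubbed bodies are hypothetical placeholders; they do not call the real search, WolframAlpha, Google, or Etherscan APIs.

```python
import json
from typing import Callable, Dict

# Hypothetical tool implementations; real versions would call external services.
def web_search(query: str) -> str:
    return f"[stub] top results for: {query}"

def etherscan_lookup(address: str) -> str:
    return f"[stub] on-chain summary for {address}"

# Registry mapping the tool names the model is told about to Python callables.
TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "etherscan_lookup": etherscan_lookup,
}

def dispatch(model_reply: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool.

    Assumes the reply looks like {"tool": "<name>", "input": "<argument>"}.
    """
    call = json.loads(model_reply)
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    return tool(call["input"])

# Example: a mocked model reply asking for fresh information.
reply = '{"tool": "web_search", "input": "latest Ethereum upgrade"}'
print(dispatch(reply))
```

In a full application, the tool's output would be appended to the conversation and sent back to the model so it can compose the final answer.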

The architecture of LLM apps

The following diagram outlines how an LLM application responds to a user query. First, relevant data sources are converted into embedding vectors and stored in a vector database. The LLM adapter uses the user query to find relevant context in the vector database through similarity search. That context is placed into a prompt and sent to the LLM, which executes the prompt and generates an answer, calling tools where needed. In some cases, the LLM is also fine-tuned on specific datasets to improve accuracy and reduce costs.
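The embedding and similarity-search steps can be sketched as follows. This is a toy under stated assumptions: word-count vectors stand in for a real embedding model, a Python list stands in for a vector database, and the project descriptions are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real app would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class InMemoryVectorStore:
    """Stand-in for a vector database: stores (text, vector) pairs and ranks them by similarity."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = InMemoryVectorStore()
for chunk in [
    "Project A is a lending protocol on Ethereum.",
    "Project B is an NFT marketplace.",
    "Project A raised funds from several venture firms.",
]:
    store.add(chunk)

# The top-ranked chunk is the context that would be inserted into the prompt.
print(store.search("What does project A do on Ethereum?", k=1))
```

Replacing `embed` with a real embedding model and `InMemoryVectorStore` with an actual vector database keeps the same shape: embed documents at ingestion time, embed the query at request time, and rank by similarity.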

The workflow of LLM applications can be roughly divided into three main stages:

Data preparation and embedding: This stage involves storing private data (such as project memos) so it can be retrieved later. Typically, documents are split into chunks, processed by an embedding model, and saved in a special type of database called a vector database.
Prompt construction and retrieval: When a user submits a query (in this example, a search for project information), the application builds a series of prompts to send to the language model. The final prompt usually combines a prompt template hard-coded by the developer, examples of valid output used as few-shot demonstrations, any required data fetched from external APIs, and relevant documents retrieved from the vector database.
Prompt execution and inference: Once the prompts are assembled, they are passed to an existing language model for inference; this may be a proprietary model API, an open-source model, or an individually fine-tuned model. At this stage, some developers also add operational systems such as logging, caching, and validation.
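The second and third stages can be sketched as below, assuming a hypothetical prompt template, a single made-up few-shot example, and a stub function in place of a real model. The in-memory cache, `logging` calls, and empty-output check illustrate the kind of caching, logging, and validation described above; they are not any particular product's implementation.

```python
import hashlib
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm_app")

# Hypothetical prompt template hard-coded by the developer.
TEMPLATE = (
    "You are an assistant that answers questions about crypto projects.\n"
    "{examples}\n"
    "External API data: {api_data}\n"
    "Relevant documents:\n{documents}\n"
    "Question: {question}\nAnswer:"
)

# Illustrative few-shot demonstration of the desired answer format.
FEW_SHOT = [("What chain is Project B on?", "Project B runs on Ethereum.")]

def build_prompt(question: str, documents: List[str], api_data: str) -> str:
    """Combine template, few-shot examples, API data, and retrieved documents."""
    examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    docs = "\n".join(f"- {d}" for d in documents)
    return TEMPLATE.format(examples=examples, api_data=api_data, documents=docs, question=question)

_cache: Dict[str, str] = {}

def run_with_cache(prompt: str, model: Callable[[str], str]) -> str:
    """Call the model, reuse cached answers for identical prompts, and log each call."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key in _cache:
        log.info("cache hit %s", key[:8])
        return _cache[key]
    answer = model(prompt)
    if not answer.strip():  # minimal validation: reject empty output
        raise ValueError("model returned an empty answer")
    log.info("cache miss %s", key[:8])
    _cache[key] = answer
    return answer

# Stub standing in for a proprietary API, an open-source model, or a fine-tuned model.
calls: List[str] = []
def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return "Project A is a lending protocol."

p = build_prompt("What does Project A do?", ["Project A is a lending protocol on Ethereum."], "n/a")
print(run_with_cache(p, fake_model))
print(run_with_cache(p, fake_model))  # served from the cache
print(len(calls))  # the stub model was invoked only once
```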

Introducing LLMs to the crypto field

Although the crypto field (Web3) has some applications similar to those in Web2, building strong LLM applications for crypto requires particular care.

The crypto ecosystem is unique, with its own culture, data, and integrations. LLMs fine-tuned on these crypto-specific datasets can deliver superior results at relatively low cost. Yet although data is abundant, open datasets are clearly lacking on platforms such as Hugging Face: at the time of writing, there is only one dataset related to smart contracts, containing 113,000 smart contracts.
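As an illustration of how raw smart-contract source might be turned into fine-tuning data, the sketch below writes prompt/completion records as JSON Lines. The record format, instruction wording, and sample contract are assumptions made for illustration, not the schema of the Hugging Face dataset mentioned above.

```python
import hashlib
import json
from typing import Dict, Iterable, List

def to_finetune_records(contracts: Iterable[Dict[str, str]]) -> List[str]:
    """Convert {"name", "source"} entries into JSON Lines prompt/completion records.

    Empty sources are skipped and exact-duplicate sources are kept only once.
    """
    seen = set()
    lines: List[str] = []
    for contract in contracts:
        source = contract["source"].strip()
        if not source:
            continue  # nothing to learn from an empty file
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # drop exact duplicates
        seen.add(digest)
        record = {
            # Hypothetical instruction wording; adjust to the task being fine-tuned.
            "prompt": f"Write a Solidity contract named {contract['name']}.",
            "completion": source,
        }
        lines.append(json.dumps(record))
    return lines

counter_src = (
    "contract Counter {\n"
    "    uint256 public count;\n"
    "    function increment() external { count += 1; }\n"
    "}"
)
sample = [
    {"name": "Counter", "source": counter_src},
    {"name": "Counter", "source": counter_src},  # duplicate
    {"name": "Empty", "source": "   "},  # empty
]
records = to_finetune_records(sample)
print(len(records))
print("\n".join(records))
```

Deduplicating and dropping empty entries is a small example of the cleaning that makes such a dataset usable for fine-tuning.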

Developers also face the challenge of integrating tools into LLMs. These tools differ from those used in Web2: they give an LLM access to transaction-related data, let it interact with decentralized applications (DApps), and allow it to execute transactions. So far, we have not found any DApp integrations in LangChain.
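To show what a crypto-specific tool might look like, here is a sketch of a read-only balance lookup built on the standard Ethereum JSON-RPC method `eth_getBalance`, which returns a balance in wei as a hex string. It only builds the request and parses a canned response, so no node URL or network access is assumed. The `guard_transaction` function is a hypothetical policy showing one way to keep transaction execution behind explicit user confirmation.

```python
import json
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18

def balance_request(address: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for eth_getBalance at the latest block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    })

def parse_balance(response_body: str) -> Decimal:
    """Convert the hex-encoded wei in a JSON-RPC response into ETH."""
    data = json.loads(response_body)
    if "error" in data:
        raise RuntimeError(data["error"])
    return Decimal(int(data["result"], 16)) / WEI_PER_ETH

def guard_transaction(tx: dict, user_confirmed: bool) -> dict:
    """Hypothetical policy: the agent may propose a transaction but never sends it unconfirmed."""
    status = "approved_for_signing" if user_confirmed else "awaiting_user_confirmation"
    return {"tx": tx, "status": status}

# Example with a canned node response instead of a live network call.
body = balance_request("0x0000000000000000000000000000000000000000")
fake_response = '{"jsonrpc": "2.0", "id": 1, "result": "0xde0b6b3a7640000"}'
print(body)
print(parse_balance(fake_response))  # prints 1
print(guard_transaction({"to": "0x0000000000000000000000000000000000000000"}, False)["status"])
```

Signing and broadcasting a transaction are deliberately left out; an agent built this way can read on-chain state freely but must hand any state-changing action back to the user.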

Although building high-quality LLM applications for crypto may require additional investment, LLMs are a natural fit for the field. It offers rich, clean, and structured data, and because Solidity code is usually concise and clear, it is easier for LLMs to generate working code.

In “Next Steps,” we will discuss eight potential directions in which LLMs can help the blockchain field:

Integrating built-in AI/LLM functionality into the blockchain
Using LLMs to analyze transaction records
Using LLMs to identify potential bots
Using LLMs to write code
Using LLMs to read code
Using LLMs to assist the community
Using LLMs to track the market
Using LLMs to analyze projects

Stay tuned!
