[Pond Security Model] Enhancing Web3 Security: A Comprehensive Study of a Novel GNN-Based Security Model

The decentralized networks of Web3 present new opportunities, but they also face increasingly complex and frequent security challenges. On-chain malicious activities, such as phishing, fraud, hacking, and money laundering, are becoming more common. The continuous growth in the variety and number of public blockchains and DApps, along with the development of on-chain privacy protocols and cross-chain interoperability protocols, has further amplified the volume and complexity of on-chain behavioral data, making the detection and analysis of these malicious activities a significant technical challenge.

Traditional detection methods, such as rule-based detection defined by expert knowledge, often struggle to adapt to complex and ever-changing on-chain behaviors, resulting in high rates of missed detections and false positives. With the advancement of artificial intelligence, AI-based approaches have also been widely applied in practice. Deep learning, a machine learning technique, can learn representations from large amounts of data with little manual feature engineering. Given enough representative data, it can model the patterns in that data and apply what it has learned to a range of tasks. For example, deep learning models can learn behavioral patterns from large volumes of blockchain transaction data and then use that knowledge to flag abnormal or malicious transactions.

On-chain data in Web3 forms complex graph structures, and deep learning methods designed for tabular or sequential data are a poor fit for them; more specialized graph algorithms are needed. This is where Graph Neural Networks (GNNs) demonstrate their strong potential for application. GNNs can handle complex graph data and effectively capture the relationships between nodes and edges, providing a powerful tool for on-chain behavior analysis. Pond is building a new security model based on graph algorithms, aiming to enhance on-chain security. Graph models help to capture latent malicious behavior patterns within complex network structures, enabling high-precision detection of malicious entities and activities. By integrating graph models with existing security products and protective measures, we are constructing a multi-layered, dynamically responsive security system to address the increasingly complex security threats in Web3.

We begin by defining the threat model:

  • Adversary's Goal:
    • Steal funds from users or smart contracts: The attacker's main goal is to steal funds from end-users or smart contracts by exploiting smart contract vulnerabilities, phishing, or other malicious methods.
    • Obfuscate the source of funds for money laundering: The attacker attempts to hide the source of illegally obtained funds by using complex transaction paths, cross-chain transfers, privacy protocols (such as mixing services), and other means to evade regulatory scrutiny and complete the money laundering process.
  • Adversary's Knowledge:
    • The attacker is familiar with blockchain architecture, transaction mechanisms, and how smart contracts work, and can identify potential vulnerabilities in smart contracts, such as reentrancy, improper access control, and integer overflow/underflow.
    • The attacker is knowledgeable about fund obfuscation techniques and understands how to use mixing services, cross-chain transfers, privacy protocols (such as Zcash and Tornado Cash), and wash trading to obscure the movement of funds and evade tracking by regulatory bodies.
    • The attacker has working knowledge of DeFi (Decentralized Finance) tools and can use instruments such as flash loans for short-term arbitrage, price manipulation, speculation, or moving funds quickly.
  • Adversary's Capability:
    • Exploiting vulnerabilities and writing and deploying malicious smart contracts: The attacker can identify and exploit known vulnerabilities in smart contracts, such as reentrancy, integer overflows in token contracts, and improper access control. The attacker can also write complex smart contracts, including counterfeit contracts that trick users into interacting with them, in order to steal funds or carry out other malicious activities.
    • While the attacker may have limited personal funds, they can use DeFi tools such as flash loans to borrow large sums and repay them within a single transaction, which enables high-frequency, high-risk attack operations without the constraint of insufficient funds.
    • The attacker is capable of operating on multiple chains and has proficient cross-chain operational skills, using the multi-chain ecosystem to move funds.
    • The attacker has limited computational resources: they cannot brute-force strong cryptographic protocols, nor can they control the consensus mechanism to influence the generation of the next block (e.g., they cannot take over the network through a 51% attack).
    • The attacker is capable of concealing their transaction identity and the source of funds by using mixing services, privacy protocols, or dispersing funds across multiple addresses to evade tracking.

The attack can be divided into four stages based on its characteristics:

  • Funding
    • The Funding stage is the initial phase of the attack, during which the attacker needs funds to pay for gas, execute transactions, or use as collateral for loans to carry out the attack. Since centralized exchanges implement KYC & AML, attackers generally avoid them, so the seed funds for this stage typically come from anonymous sources. Attackers often use privacy protocols like Tornado Cash to obfuscate the true origin of the funds, making it harder to trace the source through fund flow analysis.
  • Preparation
    • The second stage is the Preparation phase. Depending on the type of attack, the attacker may need to perform some setup and initialization before moving into the Exploitation phase. For example, to exploit a reentrancy vulnerability, the attacker needs to deploy an attack contract beforehand; in the case of Ice Phishing, the attacker needs to deceive the user into granting Token Approval.
  • Exploitation
    • The third stage is Exploitation, where the attacker drains funds from the smart contract or users. The methods at this stage are varied and can include exploiting logic errors, abusing flash loans, reentrancy attacks, and more.
  • Money Laundering
    • The final stage is money laundering, which often involves interaction with Tornado Cash.

Graph Neural Networks (GNNs), by capturing the graph structure of on-chain data, can effectively detect complex malicious behaviors. GNNs can perform analysis at multiple levels, helping us understand the intricate relationships between on-chain transaction behaviors and attackers. GNN tasks can be categorized into four main types based on the granularity of their analysis: Node-level, Edge-level, Path-level, and Graph-level. Each granularity corresponds to different threat detection scenarios.
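To make the four granularities concrete, the following is a minimal sketch of how raw transfers might be arranged into a graph before any model is applied. The `Transfer` record, the `build_graph` helper, and the toy addresses are assumptions made for illustration; they are not Pond's data schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One directed edge: `sender` moved `value` units to `receiver`."""
    sender: str
    receiver: str
    value: float


def build_graph(transfers):
    """Return an adjacency map: address -> list of outgoing Transfer edges."""
    out_edges = defaultdict(list)
    for t in transfers:
        out_edges[t.sender].append(t)
        out_edges.setdefault(t.receiver, [])  # addresses that only receive are nodes too
    return dict(out_edges)


# Toy data; the addresses are placeholders, not real accounts.
transfers = [
    Transfer("A", "B", 5.0),
    Transfer("B", "C", 4.9),
    Transfer("A", "C", 1.0),
]
graph = build_graph(transfers)

nodes = sorted(graph)                                  # units for node-level tasks
edges = [t for outs in graph.values() for t in outs]   # units for edge-level tasks
path = ["A", "B", "C"]                                 # one unit for a path-level task
# The whole `graph` (or a subgraph of it) is the unit for graph-level tasks.
```

In this layout a node-level task attaches a label to an address, an edge-level task to a single `Transfer`, a path-level task to a sequence of addresses, and a graph-level task to the full structure or a subgraph.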

Node-level tasks are among the most common applications of GNNs. In Web3 they are primarily used to analyze the behavior of individual addresses (or entities) within a blockchain network. Nodes can represent smart contracts, user wallet addresses, etc. In Web3 security scenarios, node tasks are mainly used to identify malicious addresses, phishing accounts, or vulnerable smart contracts.

  • By learning from the on-chain interaction network, GNNs can identify addresses exhibiting abnormal behavior, such as unusual interaction frequency or irregular fund flows, aiding in the detection of addresses that may be conducting phishing attacks. Node classification can be used to identify phishing addresses, with GNNs learning from historical interaction patterns to detect addresses maliciously deceiving users into granting Token Approval.
  • Smart contracts often serve as critical nodes in blockchain systems, and attackers may deploy malicious contracts to carry out attacks. By combining program analysis with GNN algorithms to analyze the behavior patterns of contracts, GNNs can predict potentially malicious contracts.
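As an illustration of what "learning from the interaction network" means mechanically, the sketch below performs one round of mean-neighbor aggregation in the style of GraphSAGE, with the learned weights and nonlinearity left out. The feature names, addresses, and the `aggregate` helper are hypothetical; a real model would stack several such layers and train a classifier on the resulting embeddings.

```python
def aggregate(features, neighbours):
    """One round of mean aggregation.

    Each node's new vector is its own features concatenated with the
    element-wise mean of its neighbours' features (zeros if it has none).
    """
    dim = len(next(iter(features.values())))
    updated = {}
    for node, own in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim
        updated[node] = own + mean
    return updated


# Hypothetical per-address features: [approvals_received, distinct_counterparties]
features = {
    "wallet_1": [0.0, 3.0],
    "wallet_2": [0.0, 5.0],
    "suspect": [2.0, 2.0],
}
# Undirected interaction lists (toy data).
neighbours = {
    "wallet_1": ["suspect"],
    "wallet_2": ["suspect"],
    "suspect": ["wallet_1", "wallet_2"],
}

embeddings = aggregate(features, neighbours)
```

After one round, each wallet's vector already carries information about the address it interacted with, which is what lets a downstream classifier judge an address by its neighborhood and not only by its own activity.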

Edge-level tasks focus on the interaction behaviors between nodes. In the graph structure of a blockchain, edges represent transactions, contract calls, or asset interactions between nodes. Therefore, edge-level analysis can help detect abnormal interaction behaviors, especially in cases where attackers manipulate multiple accounts.

  • Attackers may manipulate multiple accounts to create false trading volumes (such as wash trading) to manipulate market prices or obscure the flow of funds. GNNs can detect these abnormal edge behaviors by learning the relationships between transactions, identifying potential market manipulation activities.
  • In address poisoning attacks, attackers use methods like zero-transfer transactions to trick users into sending funds to the wrong address. By analyzing edges, GNNs can detect abnormal or invalid interactions, providing early warnings of potential risks.
  • GNNs can identify abnormally large fund transfers through edge classification, such as the complex paths involved in borrowing and repaying assets via flash loans, allowing for the timely detection of manipulative behaviors behind these fund movements.
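The address poisoning case lends itself to a small worked example of an edge-level signal. The sketch below is a hand-written heuristic, not a GNN: it flags zero-value transfers in which one endpoint imitates an address the other endpoint has previously paid. The matching rule (same first and last four hex characters), the helper names, and the shortened addresses are all assumptions; in practice such a flag would more likely be one input feature to an edge classifier.

```python
def looks_alike(a, b, k=4):
    """True if two distinct addresses share the same first and last k hex characters."""
    a, b = a.lower(), b.lower()
    body_a, body_b = a[2:], b[2:]  # drop the "0x" prefix
    return a != b and body_a[:k] == body_b[:k] and body_a[-k:] == body_b[-k:]


def poisoning_flags(edges, k=4):
    """Flag suspicious zero-value edges.

    `edges` is a chronological list of (sender, receiver, value) tuples.
    A zero-value edge is flagged when either endpoint resembles an address
    that the other endpoint has already sent value to.
    """
    paid = {}   # address -> set of addresses it has sent value to
    flags = []
    for sender, receiver, value in edges:
        suspicious = value == 0 and (
            any(looks_alike(sender, c, k) for c in paid.get(receiver, ()))
            or any(looks_alike(receiver, c, k) for c in paid.get(sender, ()))
        )
        flags.append(suspicious)
        if value > 0:
            paid.setdefault(sender, set()).add(receiver)
    return flags


# Shortened placeholder addresses for readability.
user = "0x1111aaaaaaaa1111"
friend = "0xabcd00000000ef12"
fake = "0xabcd99999999ef12"   # imitates `friend` at both ends

edges = [
    (user, friend, 10),   # genuine payment
    (fake, user, 0),      # zero-value transfer from a look-alike
    (user, fake, 0),      # zero-value transfer recorded towards a look-alike
    (friend, user, 0),    # same address, not an imitation
]
flags = poisoning_flags(edges)
```

Checking both directions is a deliberate choice here, since a zero-value transfer can appear in a victim's history either as incoming from the look-alike or as outgoing to it.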

Path-level tasks refer to the analysis of entire transaction paths or fund flow chains by GNNs, making them particularly useful for identifying complex money laundering activities and cross-chain transactions. Attackers often obscure the origin of funds through multiple transfers or cross-chain operations, but path-level analysis can effectively trace the flow and source of funds, uncovering abnormal behaviors hidden behind multi-layered transfers.

  • Attackers typically conduct money laundering through multiple addresses or cross-chain transfers. GNNs can analyze the flow paths of funds on-chain, identify complex money laundering chains, and assist in tracking and preventing the transfer of illicit funds.
  • Attackers may use privacy protocols like Tornado Cash to “clean” their funds. By constructing a fund flow graph and employing clustering and classification algorithms to extract characteristics of mixing behavior, GNNs can analyze the outflow paths of Tornado Cash funds, partially achieve de-anonymization, and thereby detect the underlying money laundering activities.
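Before a model can score a fund flow chain, candidate paths have to be enumerated. A minimal sketch of that step, assuming a simple adjacency map and a set of already-labeled destination addresses, is a hop-bounded breadth-first search. The `trace_paths` helper, the address labels, and the default hop limit are illustrative assumptions.

```python
from collections import deque


def trace_paths(out_edges, source, targets, max_hops=4):
    """Find simple paths from `source` to any address in `targets`.

    Uses breadth-first search and follows at most `max_hops` transfers.
    """
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last in targets and len(path) > 1:
            found.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up
        for nxt in out_edges.get(last, []):
            if nxt not in path:  # keep paths simple (no cycles)
                queue.append(path + [nxt])
    return found


# Toy fund-flow graph: address -> addresses it sent funds to.
out_edges = {
    "exploiter": ["hop_1", "hop_2"],
    "hop_1": ["hop_3"],
    "hop_2": ["exchange"],
    "hop_3": ["mixer"],
}

paths = trace_paths(out_edges, "exploiter", {"mixer"})
short_paths = trace_paths(out_edges, "exploiter", {"mixer"}, max_hops=2)
```

The hop limit matters in practice because the number of paths grows quickly with depth; each returned path could then be turned into a feature sequence for a path-level classifier.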

Graph-level tasks involve analyzing the entire graph or subgraphs to observe interaction patterns between nodes from a global perspective, focusing on identifying potential risk points or large-scale malicious activities within the network. By analyzing the entire graph, GNNs can recognize overall interaction patterns, aiding in the detection of complex malicious behaviors. For example, attackers may launch a Sybil attack by creating a large number of fake accounts, and GNNs can identify these malicious activities from a global graph perspective.

  • Attackers may create a large number of fake accounts to fraudulently claim airdrop rewards by disguising their activities as normal network interactions. Through graph-level analysis, GNNs can identify abnormal collaborative operations between these accounts.
  • GNNs can analyze the entire blockchain network to identify clusters of malicious smart contracts. Attackers often use similar phishing or malicious contracts repeatedly to achieve their attack objectives. By performing a comprehensive analysis, we can quickly associate a new contract with known clusters of malicious contracts, thereby preventing potential damage. Additionally, when a contract is found to have a security incident, we can promptly check similar contracts and issue risk warnings.
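One simple precursor to graph-level analysis is to split the graph into candidate subgraphs. The sketch below groups addresses into connected components of an undirected funding graph and keeps the components above a size threshold as candidate Sybil clusters. The helper name, the threshold, and the toy edges are assumptions; a graph-level model would then score each candidate subgraph.

```python
def connected_components(edges):
    """Return the connected components of an undirected graph as sets of nodes."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


# Toy funding relationships (who first funded whom).
funding_edges = [
    ("funder", "sybil_1"),
    ("funder", "sybil_2"),
    ("funder", "sybil_3"),
    ("user_a", "user_b"),
]

MIN_CLUSTER_SIZE = 4  # hypothetical threshold
components = connected_components(funding_edges)
candidates = [c for c in components if len(c) >= MIN_CLUSTER_SIZE]
```

Component size alone is a weak signal: a popular funding source such as an exchange hot wallet would link many unrelated users, which is one reason learned graph-level features are attractive.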

Pond has developed a GNN-based malicious wallet prediction model. In small-scale trials, the model achieved an accuracy of 0.936 and a precision of 0.935, which suggests that wallets it flags as malicious are usually flagged correctly. We are now conducting larger-scale data validations and developing more use cases for GNNs in security scenarios, such as:

  • Reducing false positives in traditional program analysis tools using GNNs;
  • Analyzing on-chain data in real time to help operational monitoring platforms detect potential attacks early, and integrating with automated response systems to enable automated defenses;
  • Assisting fund tracing and root cause analysis systems to more quickly reconstruct attack paths and identify the flow of attack funds post-attack.
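For readers less familiar with the accuracy and precision figures reported for the wallet model, the sketch below shows how such metrics are computed from predictions. The labels are toy values chosen for illustration and have no relation to Pond's trial data; recall is included because the trial figures above do not report it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means malicious."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all wallets that were labeled correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of wallets flagged as malicious that really are malicious."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of malicious wallets that the model managed to flag."""
    return tp / (tp + fn) if tp + fn else 0.0


# Toy ground truth and predictions (1 = malicious, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

On heavily imbalanced data, where malicious wallets are rare, accuracy can stay high even when many malicious wallets are missed, so precision and recall are usually read together.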

By integrating with various Web3 security tools, GNNs can enhance the intelligent analysis capabilities across the entire security chain—from prevention and detection to response—providing comprehensive on-chain security protection.