Blockchain 101 for Data Scientists

As a data scientist, you've probably heard that blockchain data is open and public. However, when you start exploring it, you might find yourself puzzled by terms like "addresses," "smart contracts," "EVM," and more. This article aims to give you an intuitive understanding of the blockchain world, focusing on its data aspects, and to help you get started with blockchain data. A note for blockchain experts: the concepts below are deliberately simplified for quick illustration. If you spot a mistake, please let us know.

Bitcoin: The Dawn of Blockchain

First of all, what is a blockchain? Imagine a special kind of database designed to securely record transactions over time. In the case of Bitcoin, the first and most well-known blockchain, it's like a public ledger that records transfers of bitcoins from one user to another.

Key Concepts

  • Address: A unique identifier (similar to an email address or bank account number) for a user on the blockchain. It's where bitcoins can be sent and received.
  • Transaction: A record of a transfer of bitcoins from one address to another.
  • Block: A group of transactions bundled together. Blocks are added to the blockchain in chronological order.
  • Blockchain: A chain of blocks linked together, containing the entire history of transactions.
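
To make the "chain of blocks" idea concrete, here is a minimal Python sketch of the concepts above. It is a toy model, not how Bitcoin stores data: the class names, the fields, and the choice to hash a JSON dump of the whole block are assumptions made for illustration (real Bitcoin hashes a compact block header rather than the full block).

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    from_address: str
    to_address: str
    value: int  # amount in the smallest unit, kept as an integer


@dataclass
class Block:
    number: int
    timestamp: int
    parent_hash: str  # hash of the previous block; this is the "link" in the chain
    transactions: list

    def block_hash(self) -> str:
        # Simplification: hash a JSON dump of the whole block with SHA-256.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def is_chain_intact(blocks: list) -> bool:
    """Return True if every block points at the hash of the block before it."""
    return all(
        current.parent_hash == previous.block_hash()
        for previous, current in zip(blocks, blocks[1:])
    )


genesis = Block(0, 1_700_000_000, "0" * 64, [Transaction("alice", "bob", 5)])
block_1 = Block(1, 1_700_000_600, genesis.block_hash(), [Transaction("bob", "carol", 2)])
chain = [genesis, block_1]
```

If someone edits a transaction in an old block, that block's hash changes and the next block's parent_hash no longer matches, so is_chain_intact(chain) turns False. This is the intuition behind why history on a blockchain is hard to rewrite.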

Why Blocks?

You might wonder why transactions aren't just added individually to the database. Grouping transactions into blocks is fundamental for security and decentralization. Instead of a central authority managing the database, the blockchain is maintained by a network of computers called nodes. Because no single party decides the order of transactions, the nodes need a common unit to agree on, and the block plays that role.

When you submit a transaction (e.g., sending someone 1 bitcoin), it goes into a waiting area known as the mempool (short for "memory pool"). A miner—a participant in the network—selects a batch of transactions from the mempool, verifies them, and includes them in a new block. This block is then added to the blockchain, and all nodes update their copies accordingly.

Miners are incentivized with a reward in bitcoins for their work. Because this reward is how new bitcoins enter circulation, they are called "miners." A transaction takes effect only after its block is added to the chain. All transactions in a block share the same timestamp, namely the timestamp of the block, even though they might have been submitted at different times.

Figure: A blockchain consists of multiple blocks, each containing several transactions. Each transaction has a unique identifier (Tx_Hash), the address initiating the transaction (From_Address), the address being interacted with (To_Address), the amount of value transacted (Value), and other related information. Please see our TRANSACTIONS table for details.
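
The flow from mempool to block can be sketched in a few lines of Python. Everything here is hypothetical and simplified: the PendingTx class, the build_block function, and the "highest fee first" selection rule are assumptions (the text above only says that a miner selects a batch), and the column names are illustrative rather than the actual TRANSACTIONS schema.

```python
from dataclasses import dataclass


@dataclass
class PendingTx:
    tx_hash: str
    from_address: str
    to_address: str
    value: float
    fee: float         # assumption: used here only to rank pending transactions
    submitted_at: int  # when the user submitted it (Unix seconds)


def build_block(mempool, block_number, block_timestamp, max_txs):
    """Pick up to max_txs pending transactions; return (rows, remaining mempool)."""
    # Assumption: highest fee first. The actual selection rule is up to the miner.
    selected = sorted(mempool, key=lambda tx: tx.fee, reverse=True)[:max_txs]
    selected_hashes = {tx.tx_hash for tx in selected}
    remaining = [tx for tx in mempool if tx.tx_hash not in selected_hashes]
    rows = [
        {
            "block_number": block_number,
            "block_timestamp": block_timestamp,  # shared by every row in the block
            "tx_hash": tx.tx_hash,
            "from_address": tx.from_address,
            "to_address": tx.to_address,
            "value": tx.value,
        }
        for tx in selected
    ]
    return rows, remaining


mempool = [
    PendingTx("0xaa", "alice", "bob", 1.0, fee=0.0002, submitted_at=1_700_000_010),
    PendingTx("0xbb", "carol", "dave", 0.5, fee=0.0005, submitted_at=1_700_000_250),
    PendingTx("0xcc", "erin", "frank", 2.0, fee=0.0001, submitted_at=1_700_000_400),
]
rows, mempool_after = build_block(
    mempool, block_number=1, block_timestamp=1_700_000_600, max_txs=2
)
```

Note that the two selected transactions were submitted minutes apart, yet both rows carry the same block_timestamp. The third transaction stays in the mempool and waits for a later block.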

Ethereum: A New Era Begins

Blockchain technology has evolved significantly since Bitcoin, with a major milestone being the introduction of smart contracts on the Ethereum blockchain. Think of smart contracts as self-executing programs that run on the blockchain when certain conditions are met, similar to how a vending machine dispenses a product when the right amount of money is inserted.

Smart Contracts and the EVM

  • Smart Contract: A piece of code stored on the blockchain that automatically executes actions when predefined conditions are met, such as when it is called by a user or another smart contract.
  • EVM (Ethereum Virtual Machine): The runtime environment for smart contracts on Ethereum. It ensures that smart contracts execute consistently across all nodes in the network.

With smart contracts, Ethereum transformed the blockchain from a simple ledger of transactions into a decentralized computing platform.

Addresses in Ethereum

Every smart contract has an address, just like user accounts do. To differentiate:

  • Externally Owned Account (EOA): Controlled by private keys held by users. Think of it as your personal wallet.
  • Contract Address: Associated with a smart contract. It's like an application or service on the blockchain.

Every transaction is initiated by an EOA: it can be sent either to another EOA or to a contract address. A contract cannot start a transaction on its own, but within a transaction, contracts can call other contracts.
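
When labeling addresses in a dataset, a common heuristic is to check whether an address has code deployed at it. The sketch below assumes a hypothetical code_by_address mapping (address to deployed bytecode) that you would have to build from your own data source; the function names are illustrative too.

```python
def classify_address(address: str, code_by_address: dict) -> str:
    """Label an address as 'contract' if it has deployed code, else 'eoa'."""
    code = code_by_address.get(address.lower(), b"")
    return "contract" if code else "eoa"


def can_initiate_transaction(address: str, code_by_address: dict) -> bool:
    """In this simplified model, only EOAs can start a transaction."""
    return classify_address(address, code_by_address) == "eoa"


# Hypothetical sample data: one contract with some bytecode, one plain wallet.
code_by_address = {
    "0x00000000000000000000000000000000000000c1": bytes.fromhex("6080604052"),
}
wallet = "0x00000000000000000000000000000000000000A1"
contract = "0x00000000000000000000000000000000000000C1"
labels = {addr: classify_address(addr, code_by_address) for addr in (wallet, contract)}
```

Addresses are lowercased before the lookup because the same address can appear with mixed-case letters in different datasets.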

Logs and Events

When smart contracts execute, they can emit logs (also known as events), which are stored on the blockchain. A contract decides what to log, so these records typically describe what happened during the execution, such as a token transfer and the addresses and amounts involved. For data scientists, these logs are invaluable for understanding blockchain activities.
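
To give a feel for what a log looks like as data, here is a small sketch. An Ethereum log carries the address of the contract that emitted it, a short list of "topics" (for ordinary events, the first topic identifies the event type), and a data payload. The dictionary layout and the shortened placeholder values below are assumptions for illustration, not the schema of any particular table.

```python
from collections import Counter

# Hypothetical, shortened values; real addresses and topics are long hex strings.
logs = [
    {"tx_hash": "0x01", "log_index": 0, "address": "0xtoken_a",
     "topics": ["0xevent_transfer", "0xfrom", "0xto"], "data": "0x0a"},
    {"tx_hash": "0x01", "log_index": 1, "address": "0xtoken_b",
     "topics": ["0xevent_transfer", "0xfrom", "0xto"], "data": "0x05"},
    {"tx_hash": "0x02", "log_index": 0, "address": "0xtoken_a",
     "topics": ["0xevent_approval", "0xowner", "0xspender"], "data": "0x64"},
    {"tx_hash": "0x03", "log_index": 0, "address": "0xtoken_a",
     "topics": ["0xevent_transfer", "0xfrom", "0xto"], "data": "0x01"},
]


def count_events(logs):
    """Count logs per (emitting contract, event type), skipping logs without topics."""
    return Counter(
        (log["address"], log["topics"][0]) for log in logs if log["topics"]
    )


event_counts = count_events(logs)
```

A single transaction can produce several logs (tx_hash "0x01" has two here), which is why log-level tables usually need both tx_hash and log_index to identify a row.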


Tokens and Standards

Smart contracts paved the way for creating new digital assets on Ethereum, leading to the development of tokens and Non-Fungible Tokens (NFTs).

ERC-20 Tokens

One of the first major applications of smart contracts was the creation of new cryptocurrencies, or tokens. Each token is managed by a smart contract, and people interact with the contract to transfer tokens.

The Importance of ERC-20:

  • ERC-20 Standard: A technical standard defining a common set of rules for Ethereum tokens.
  • Why It Matters: Standardization ensures that all ERC-20 tokens behave predictably, making it easier for wallets, exchanges, and applications to interact with them.

For data scientists, this means that token transfer data is structured consistently, simplifying analysis.

Non-Fungible Tokens (NFTs)

NFTs represent unique digital assets. Unlike ERC-20 tokens, which are fungible (interchangeable), each NFT is one-of-a-kind, so one NFT cannot simply be swapped for another as an equivalent.

ERC-721 and ERC-1155 Standards

  • ERC-721: The first widely adopted standard for NFTs, where each token is unique.
  • ERC-1155: A multi-token standard allowing for both fungible and non-fungible tokens in a single contract.
  • Significance: Standards ensure that NFTs can be universally recognized and interacted with across different platforms.

Summing Up

  • ERC-20 Tokens: Like identical coins, each one is the same as any other. They can be traded in fractions. For example, you can send 0.1 of a token.
  • NFTs: Like unique collectibles—each one has distinct characteristics.
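
The difference shows up directly in how the data is shaped. On-chain, ERC-20 amounts are stored as integers, and a token's decimals value tells you how to convert them into the fractional amounts people talk about. NFT ownership, in contrast, is tracked per token ID. The sketch below illustrates both; the helper names and sample values are hypothetical, and 18 decimals is just a common choice, not a rule.

```python
from decimal import Decimal


def to_display_units(raw_amount: int, decimals: int) -> Decimal:
    """Convert an on-chain integer amount into a human-readable token amount."""
    return Decimal(raw_amount) / (Decimal(10) ** decimals)


def to_raw_units(amount: str, decimals: int) -> int:
    """Convert a human-readable amount (given as a string) into the on-chain integer."""
    return int(Decimal(amount) * (Decimal(10) ** decimals))


# Fungible: balances are just numbers per address; any unit equals any other.
raw_sent = to_raw_units("0.1", decimals=18)
displayed = to_display_units(raw_sent, decimals=18)

# Non-fungible: ownership is tracked per token ID; there is no "0.1 of token 7".
nft_owner_by_token_id = {7: "alice", 8: "bob"}
nft_owner_by_token_id[7] = "carol"  # transferring NFT #7 changes its single owner
```

Decimal is used instead of float so that large integer amounts convert without rounding surprises. Forgetting to divide by 10 ** decimals is a frequent source of wildly inflated numbers in token analyses.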

Users can initiate a transaction with a smart contract to call a specific function, such as transferring a token associated with the contract. To facilitate data analysis, we can extract these transfers from the contracts' logs. See our NATIVE_TRANSFERS, TOKEN_TRANSFERS, and NFT_TRANSFERS tables for examples.

Figure: Transfers can be extracted from transactions. Each transfer inherits data such as block_timestamp and tx_hash from the associated transaction but also contains parsed data, including the sending and receiving addresses of the assets, and more.
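
As an illustration of this extraction step, here is a minimal sketch that turns one ERC-20 Transfer log into a transfer row. The event signature hash is the standard one for Transfer(address,address,uint256); the input dictionary layout, the function names, and the output column names are assumptions and may differ from the actual TOKEN_TRANSFERS table.

```python
# keccak256 hash of "Transfer(address,address,uint256)"
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """Addresses are left-padded to 32 bytes in topics; keep the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_erc20_transfer(log: dict, block_timestamp: int):
    """Return a transfer row for an ERC-20 Transfer log, or None for anything else."""
    topics = log["topics"]
    # ERC-20 Transfer logs have 3 topics: event signature, sender, receiver.
    # (ERC-721 reuses the same signature but adds the token ID as a 4th topic.)
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "block_timestamp": block_timestamp,  # inherited from the transaction's block
        "tx_hash": log["tx_hash"],           # inherited from the transaction
        "token_address": log["address"],     # the contract that emitted the log
        "from_address": topic_to_address(topics[1]),
        "to_address": topic_to_address(topics[2]),
        "raw_amount": int(log["data"], 16),  # integer amount, before applying decimals
    }


# Hypothetical sample log: 0.1 token (18 decimals) sent from 0x11...11 to 0x22...22.
sample_log = {
    "tx_hash": "0xabc",
    "address": "0x" + "ee" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(10**17, "064x"),
}
transfer_row = decode_erc20_transfer(sample_log, block_timestamp=1_700_000_600)
```

The topic-count check matters in practice: without it, NFT transfers would be mixed into the token transfer data, since both standards share the same event signature.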

What's Next?

This article has barely scratched the surface of blockchain ecosystems. In upcoming articles, we'll delve into more advanced topics such as Decentralized Exchanges (DEX), Decentralized Finance (DeFi), Maximal Extractable Value (MEV, originally called Miner Extractable Value), and more.

Meanwhile, for the curious minds, there are plenty of learning resources out there for you to explore. The list below is by no means exhaustive: