Merkle Trees

A Merkle tree is a binary tree in which every leaf node represents a data block and every non-leaf node represents the hash of its child nodes. The root node of the tree represents the overall hash of all the data in the tree. It is named after Ralph Merkle, a computer scientist who proposed the concept in 1979.
Merkle trees are used in a variety of computer science applications, including in file systems, cryptocurrency systems, and networking protocols. In the context of cryptocurrencies like Bitcoin and Ethereum, Merkle trees are used to efficiently prove that a transaction or block is part of the blockchain. Each block header in the blockchain contains a Merkle root hash, which represents the hash of all the transactions in that block. By checking this hash against the root hash stored in the blockchain, nodes can verify that the transactions in a block have not been tampered with.


Root Hash
| |
Hash 1 Hash 2
| |
+-----+-----+ +-----+-----+
| | | |
Leaf 1 Leaf 2 Leaf 3 Leaf 4
In this example, the tree has four leaves (i.e., data elements) labeled "Leaf 1" through "Leaf 4". Each leaf is hashed to produce a hash value, which is then combined with the hash of its sibling (i.e., adjacent node at the same level) to produce a parent node. This process continues recursively up the tree until there is only one node left, called the root hash.
So in this case, the hash of Leaf 1 is combined with the hash of Leaf 2 to produce the parent node "Hash 1", and the hash of Leaf 3 is combined with the hash of Leaf 4 to produce the parent node "Hash 2". Finally, the hash of Hash 1 is combined with the hash of Hash 2 to produce the root hash.
By verifying that a leaf node's hash value is included in the Merkle tree's root hash, a user can ensure the authenticity of the corresponding data element without having to know the entire tree.

Characteristics of a Merkle tree

  1. 1.
    Security: A Merkle tree provides a high level of security by ensuring the integrity of data. Any modification to a single data element in the tree will result in a change in the hash of its parent node and subsequently all the way up to the root node.
  2. 2.
    Efficiency: Merkle trees are efficient and enable efficient verification of data integrity. They do not require the entire dataset to be available to verify the integrity of specific data. Instead, a branch of the tree can be verified by checking only the nodes along the path from the leaf to the root node.
  3. 3.
    Scalability: Merkle trees are scalable and can handle large amounts of data. The number of leaf nodes in the tree determines the size of the tree. In practice, Merkle trees can handle millions or even billions of leaf nodes.
  4. 4.
    Flexibility: Merkle trees are flexible and can be used in various applications, including data storage, peer-to-peer networks, and blockchain.
  5. 5.
    Determinism: Merkle trees are deterministic, meaning that they always produce the same root hash for the same set of data, enabling easy verification of data consistency.

Merkle Tree vs Patricia Merkle Tree

Both Merkle trees and Patricia trees are data structures used for efficient storage and retrieval of data, but they differ in their internal structures and the types of data they are optimized for.
A Merkle tree is a binary tree structure where each leaf node contains the hash of a data block, and each non-leaf node contains the hash of the concatenation of its two child nodes. The root of the tree represents the hash of the entire dataset, and any change to the data will require updating the hashes of all affected nodes up to the root. Merkle trees are typically used to verify the integrity of large datasets, as only the root hash needs to be checked to ensure that the entire dataset has not been tampered with.
A Patricia tree, also known as a radix tree or trie, is a tree data structure where each node represents a string of characters, and the edges represent characters that lead to the next node. Unlike a Merkle tree, Patricia trees can store arbitrary data, not just hashes. The Patricia tree is optimized for key-value storage and retrieval, and is often used in databases and file systems.
The Ethereum state tree is implemented as a variant of the Patricia tree called the Modified Merkle Patricia Tree (MMPT), which combines the benefits of both Merkle and Patricia trees. MMPT is used to store the account state, contract storage, and transaction trie of the Ethereum network.