The Khedra Book

Khedra (pronounced kɛd-ɾɑ) is an all-in-one "long-running" tool for indexing and sharing the Unchained Index and monitoring individual addresses on EVM-compatible blockchains.

The tool creates and shares the Unchained Index, a permissionless index of "address appearances," including appearances in event logs, execution traces, incoming transactions, modifications to smart contract state, staking or block rewards, prefund allocations, and many other locations.

This detailed indexing allows for near-perfect monitoring and notification of address activity, which in turn enables native and ERC-20 account balance histories, address auditing and accounting, and even custom indexing. It works for any address on any chain (as long as you have access to the chain's RPC).

Enjoy!

Please help us improve this software by providing any feedback or suggestions. Contact information and links to our socials are available at our website.

About the Name

The name khedra (pronounced kɛd-ɾɑ) is inspired by the Persian word خدمت (khedmat), meaning "service."

In ancient Persian culture, service was considered a noble pursuit, emphasizing dedication, reliability, and humility in action. Drawing from this tradition, the name khedra embodies the essence of a system designed to serve--efficiently, continuously, and with purpose.

Similar to its counterpart, chifra (derived from the Persian word for "cipher"), the name khedra symbolizes a long-running, dependable process that tirelessly "serves" the needs of its users.

More technically, khedra is a collection of goroutines that:

  • creates and publishes the Unchained Index,
  • monitors a user-provided list of addresses, automating caching, notifications, and other ETL processes,
  • provides a RESTful API exposing chifra's many data access commands,
  • allows for starting, stopping, pausing, and resuming these individual services.

By choosing the name khedra, we honor a legacy of service while committing to building tools that are as resilient, adaptive, and reliable as the meaning behind its name.

User Manual

Overview of Khedra

Khedra is a blockchain indexing and monitoring application designed to provide users with an efficient way to interact with and manage transactional histories for EVM-compatible blockchains. It supports functionalities such as transaction monitoring, address indexing, publishing and pinning the indexes to IPFS and a smart contract, and a RESTful API for accessing data.

Purpose of this Document

This "User's Manual" is designed to help users get started with Khedra, understand its features, and operate the application effectively for both basic and advanced use cases. For a more technical treatment of the software, refer to the Technical Specification.

Intended Audience

This manual is intended for:

  • End-users looking to index and monitor blockchain data.
  • Developers integrating blockchain data into their applications.
  • System administrators managing blockchain-related infrastructure.

Introduction

Blockchains are long-running processes that continually create new data (in the form of blocks). For this reason, any process that wishes to monitor, index, or access data from a blockchain must also be long running.

Khedra is such a long-running process.

In order to remain decentralized and permissionless, blockchains must be "freed" from the stranglehold of large data providers. One way to do that is to help people run blockchain nodes locally. However, as soon as one does that, one learns that blockchains are not very good databases. This is for a simple reason: they lack an index.

TrueBlocks Core (of which chifra and khedra are a part) is a set of command-line tools, SDKs, and packages that help users who are running their own blockchain nodes make better use of the data. Khedra indexes and monitors the data. Chifra helps access the data, providing various useful commands for exporting, filtering, and processing on-chain activity.

Of primary importance in the design of both systems are:

  • speed - we cache nearly everything
  • permissionless access - no servers, no API keys, you run your own infrastructure
  • accuracy - the goal is 100% off-chain reconciliation of account balances and state history
  • depth of detail - required to enable 100% accurate reconciliations
  • ease of use - so shoot us - this one is hard

Enjoy!

Please help us improve this software by providing any feedback or suggestions. Contact information and links to our socials are available at our website.

Getting Started

Overview

Khedra runs primarily from a configuration file called config.yaml. This file lives at ~/.khedra/config.yaml by default. If the file is not found, Khedra creates a default configuration in this location.

The config file allows you to specify key parameters for running khedra, including which chains to index/monitor, which services to enable, how detailed to log the processes, and where and how to publish (that is, share) the results.

You may use environment variables to override specific options. This document outlines the configuration file structure, validation rules, default values, and environment variable usage.


Quick Start

  1. Download, build, and test khedra:

    git clone https://github.com/TrueBlocks/trueblocks-khedra.git
    cd trueblocks-khedra
    go build -o khedra main.go
    ./khedra version
    

    You should get something similar to khedra v4.0.0-release.

  2. Establish the config file and edit values for your system:

    mkdir -p ~/.khedra
    cp config.yaml.example ~/.khedra/config.yaml
    ./khedra config edit
    

    Modify the file according to your requirements (see below).

    The minimal configuration needed is to provide a valid RPC to Ethereum mainnet. (All configurations require access to Ethereum mainnet.)

    You may configure as many other EVM-compatible chains (each with its own RPC) as you like.

  3. Location of the configuration file:

    By default, the config file resides at ~/.khedra/config.yaml. (The folder and the file are created if they do not exist.)

    You may, however, place a config.yaml file in the current working folder (the folder from which you run khedra). If found, this local configuration file takes precedence. This allows for running multiple instances of the software concurrently.

    If no config.yaml file is found, khedra creates a default configuration in its default location.

  4. Using Environment Variables:

    You may override configuration options using environment variables, each of which must take the form TB_KHEDRA_<section>_<key>.

    For example, the following overrides the general.dataFolder value.

    export TB_KHEDRA_GENERAL_DATAFOLDER="/path/override"
    

    You'll notice that keys such as dataFolder are written as a single uppercase word (DATAFOLDER); no underscores are inserted inside the <key> portion of the name.


Configuration File Format

The config.yaml file (shown here with default values) is structured as follows:

# Khedra Configuration File
# Version: 2.0

general:
  dataFolder: "~/.khedra/data"     # See note 1

chains:
  mainnet:                       # Blockchain name (see notes 2, 3, and 4)
    rpcs:                        # A list of RPC endpoints (at least one is required)
      - "rpc_endpoint_for_mainnet"
    enabled: true                # `true` if this chain is enabled
  sepolia:
    rpcs:
      - "rpc_endpoint_for_sepolia"
    enabled: true
  gnosis:                         # Add as many chains as your machine can handle
    rpcs:
      - "rpc_endpoint_for_gnosis" # must be a reachable, valid URL if the chain is enabled
    enabled: false                # this chain is disabled
  optimism:
    rpcs:
      - "rpc_endpoint_for_optimism"
    enabled: false

services:                          # See note 5
  scraper:               # Required. (One of: api, scraper, monitor, ipfs, control)
    enabled: true                  # `true` if the service is enabled
    sleep: 12                      # Seconds between scraping batches (see note 6)
    batchSize: 500                # Number of blocks to process in a batch (range: 50-10000)

  monitor:
    enabled: true
    sleep: 12                      # Seconds between monitoring batches (see note 6)
    batchSize: 500                # Number of blocks processed in a batch (range: 50-10000)

  api:
    enabled: true
    port: 8080                     # Port number for API service (the port must be available)

  ipfs:
    enabled: true
    port: 5001                     # Port number for IPFS service (the port must be available)

  control:
    enabled: true                  # Always enabled - false values are invalid
    port: 5001                     # Port number for the control service (the port must be available)

logging:
  folder: "~/.khedra/logs"         # Path to log directory (must exist and be writable)
  filename: "khedra.log"           # Log file name (must end with .log)
  level: "info"                    # One of: debug, info, warn, error
  maxSize: 10                      # Max log file size in MB
  maxBackups: 5                    # Number of backup log files to keep
  maxAge: 30                       # Number of days to retain old logs
  compress: true                   # Whether to compress backup logs

Notes:

  1. The dataFolder value must be a valid, existing directory that is writable. You may wish to change this value to a location with sufficient disk space. Depending on configuration, the Unchained Index and binary caches may approach 200GB.

  2. The chains section is required. At least one chain must be enabled.

  3. If chains other than Ethereum mainnet are configured, you must also configure Ethereum mainnet. The software reads mainnet smart contracts (such as the Unchained Index and UniSwap) during normal operation.

  4. We've used this repository to identify chain names. Using consistent chain names aids in sharing indexes. Use these values in your configuration if you wish to fully participate in sharing the Unchained Index.

  5. The services section is required. At least one service must be enabled.

  6. When a scraper or monitor is "catching up" to a chain, the sleep value is ignored.


Using Environment Variables

Khedra allows configuration values to be overridden at runtime using environment variables. The value of an environment variable takes precedence over the defaults and the configuration file.

The environment variable naming convention is:

TB_KHEDRA_<section>_<key>

For example:

  • To override the general.dataFolder value:

    export TB_KHEDRA_GENERAL_DATAFOLDER="/path/override"
    
  • To override logging.level:

    export TB_KHEDRA_LOGGING_LEVEL="debug"
    
  • To override the scraper service's batchSize:

    export TB_KHEDRA_SERVICES_SCRAPER_BATCHSIZE="100"
    

Underscores (_) are not inserted inside <key> names; for example, the key dataFolder becomes DATAFOLDER.

Overriding Chains and Services

Environment variables can also be used to override values for chains and services settings. The naming convention for these sections is as follows:

TB_KHEDRA_<section>_<name>_<key>

Where:

  • <section> is either CHAINS or SERVICES.
  • <name> is the name of the chain or service (converted to uppercase).
  • <key> is the specific field to override.

Examples

To override the RPC endpoints for the mainnet chain:

export TB_KHEDRA_CHAINS_MAINNET_RPCS="http://rpc1.mainnet,http://rpc2.mainnet"

You may list multiple RPC endpoints by separating them with commas.

To disable the mainnet chain:

export TB_KHEDRA_CHAINS_MAINNET_ENABLED="false"

To enable the api service:

export TB_KHEDRA_SERVICES_API_ENABLED="true"

To set the port for the api service:

export TB_KHEDRA_SERVICES_API_PORT="8088"

Precedence Rules

  1. Default values are loaded first,
  2. Values from config.yaml override the defaults,
  3. Environment variables take precedence over both the defaults and the file.

The values set by environment variables must conform to the same validation rules as the configuration file.
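
For example, if config.yaml sets logging.level to "info", an exported variable still wins:

export TB_KHEDRA_LOGGING_LEVEL="debug"   # overrides the value from config.yaml
./khedra                                 # runs with debug-level logging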


Configuration Sections

General Settings

  • dataFolder: The location where khedra stores all of its data. This directory must exist and be writable.

Chains (Blockchains)

Defines the blockchain networks to interact with. Each chain must have:

  • name: Chain name (e.g., mainnet).
  • rpcs: List of RPC endpoints. At least one valid and reachable endpoint is required.
  • enabled: Whether the chain is active.

Behavior for Empty RPCs

  • If the RPCs field is empty in the environment, it is ignored and the configuration file's value is preserved.
  • If the RPCs field is empty in the final configuration (after merging), the configuration will be rejected.
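
To illustrate the two cases above (a sketch of the described behavior, not additional configuration):

# config.yaml already lists an RPC for mainnet
export TB_KHEDRA_CHAINS_MAINNET_RPCS=""   # empty value: ignored, the file's endpoints are kept

# if, after merging, an enabled chain still has no RPC at all,
# validation rejects the configuration and khedra will not start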

Services (API, Scraper, Monitor, IPFS)

Defines various services provided by Khedra. Supported services:

  • API:
    • Requires port to be specified.
  • Scraper and Monitor:
    • sleep: Duration (seconds) between operations.
    • batchSize: Number of blocks to process in each operation (50-10,000).
  • IPFS:
    • Requires port to be specified.

Logging Configuration

Controls the application's logging behavior:

  • folder: Directory for storing logs.
  • filename: Name of the log file.
  • level: Logging level. Possible values: debug, info, warn, error.
  • maxSize: Maximum log file size (in MB) before rotation.
  • maxBackups: Number of old log files to retain.
  • maxAge: Retention period (in days) for old logs.
  • compress: Whether to compress rotated logs.

Validation Rules

The configuration file and environment variables are validated on load with the following rules:

General

  • dataFolder: Must be a valid, existing directory and writable.

Chains

  • name: Required and non-empty.
  • rpcs: Must include at least one valid and reachable RPC URL.
  • Empty RPC Behavior: Ignored from the environment, but required in the final configuration.
  • enabled: Defaults to false if not specified.

Services

  • name: Required and non-empty. Must be one of api, scraper, monitor, ipfs.
  • enabled: Defaults to false if not specified.
  • port: For API and IPFS services, must be between 1024 and 65535.
  • sleep: Must be non-negative.
  • batchSize: Must be between 50 and 10,000.

Logging

  • folder: Must exist and be writable.
  • filename: Must end with .log.
  • level: Must be one of debug, info, warn, error.
  • maxSize: Minimum value of 5.
  • maxBackups: Minimum value of 1.
  • maxAge: Minimum value of 1.

Default Values

If the configuration file is not found or incomplete, Khedra uses the following defaults:

  • Data directory: ~/.khedra/data
  • Logging configuration:
    • Folder: ~/.khedra/logs
    • Filename: khedra.log
    • Max size: 10 MB
    • Max backups: 3
    • Max age: 10 days
    • Compression: Enabled
    • Log level: info
  • Chains: Only mainnet and sepolia enabled by default.
  • Services: All services (api, scraper, monitor, ipfs) enabled with default configurations.
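
Putting the defaults together, a minimal config.yaml might look like the following. This is a sketch only: every omitted value is assumed to fall back to the defaults listed above, and the RPC URL is a placeholder.

general:
  dataFolder: "~/.khedra/data"

chains:
  mainnet:
    rpcs:
      - "rpc_endpoint_for_mainnet"   # replace with a valid, reachable mainnet RPC
    enabled: true

services:
  scraper:
    enabled: true
  api:
    enabled: true
    port: 8080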

Common Commands

  1. Validate Configuration: Khedra validates the config.yaml file and environment variables automatically on startup.

  2. Run Khedra:

    ./khedra
    

    Ensure that your config.yaml file is properly set up.

  3. Override Configuration with Environment Variables:

    Use environment variables to override specific configurations:

    export TB_KHEDRA_GENERAL_DATAFOLDER="/new/path"
    ./khedra
    

For additional details, see the technical specification.

Understanding Khedra

Key Features

  • Blockchain Indexing: Active indexing of EVM-compatible chains.
  • REST API: Expose blockchain data and chifra commands via a RESTful interface.
  • Address Monitoring: Track specific blockchain addresses for transactions.
  • IPFS Integration: Pin indexed data to IPFS for decentralized storage.

Application Interface Overview

Khedra operates through:

  • Command-Line Interface (CLI): For configuration and command execution.
  • REST API: For programmatic interaction with indexed data.

Terminology and Concepts

  • Unchained Index: A permissionless index of address appearances, optimized for fast querying.
  • Chains: EVM-compatible blockchains (e.g., Ethereum mainnet, Sepolia).
  • Providers: RPC endpoints for interacting with blockchains.

Using Khedra

Indexing Blockchains

To index a blockchain, ensure the required environment variables are set for your RPC endpoints, then run:

./khedra --init all --scrape on

This will initialize the blockchain index and start the scraping process.

Accessing the REST API

Enable the REST API by running the application with:

./khedra --api on

Access the API through the default endpoint:

curl http://localhost:8080

Refer to the API documentation for available endpoints and usage.
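
Because the API exposes chifra's data access commands, its routes generally mirror those commands. The following is a sketch only; the paths and parameters shown here are assumptions, so consult the API documentation for the real ones:

curl "http://localhost:8080/status"                  # assumed route: node and scraper status
curl "http://localhost:8080/blocks?blocks=latest"    # assumed route: fetch the latest block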

Monitoring Addresses

You can monitor specific blockchain addresses for transactions. Configure the monitored addresses in your configuration or through the API, and enable monitoring:

./khedra --monitor on

Managing Configurations

Khedra configurations can be managed through the config.yaml file and environment variable overrides (see "Configuration File Format"). Changes to the configuration require a restart of the application to take effect.

Advanced Operations

Integrating with IPFS

Enable IPFS support with:

./khedra --ipfs on

This will pin indexed blockchain data to IPFS, ensuring decentralized storage and retrieval.

Customizing Chain Indexing

Specify additional chains by updating the TB_NODE_CHAINS environment variable. Example:

TB_NODE_CHAINS="mainnet,sepolia,gnosis"

Ensure each chain has a valid RPC endpoint configured.

Utilizing Command-Line Options

Key options include:

  • --init [all|blooms|none]: Specify the type of index initialization.
  • --scrape [on|off]: Enable or disable the scraper.
  • --api [on|off]: Enable or disable the API.
  • --sleep [int]: Set the sleep duration between updates in seconds.
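
These options may be combined. For example (a sketch using only the flags listed above), the following initializes the full index, enables both the scraper and the API, and sleeps 30 seconds between updates:

./khedra --init all --scrape on --api on --sleep 30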

Maintenance and Troubleshooting

Updating Khedra

To update the application, pull the latest changes from the repository and rebuild the binary:

git pull
go build -o khedra .

Common Issues and Solutions

  • Missing RPC Provider: Ensure your configuration contains valid RPC URLs.
  • Configuration Errors: Use --help to validate command-line arguments.

Log Files and Debugging

Logs are written to the standard output by default. Set the log level with an environment variable (or via logging.level in config.yaml):

TB_KHEDRA_LOGGING_LEVEL="debug"

Contacting Support

If you encounter issues not covered in this guide, contact support at: TrueBlocks Support

Appendices

Glossary of Terms

  • EVM: Ethereum Virtual Machine, the runtime environment for smart contracts in Ethereum and similar blockchains.
  • RPC: Remote Procedure Call, a protocol allowing the application to communicate with blockchain nodes.
  • Indexing: The process of organizing blockchain data for fast and efficient retrieval.
  • IPFS: InterPlanetary File System, a decentralized storage system for sharing and retrieving data.

Frequently Asked Questions (FAQ)

1. What chains are supported by Khedra?

Khedra supports Ethereum mainnet and other EVM-compatible chains such as Sepolia and Gnosis. Additional chains can be added by configuring the TB_NODE_CHAINS environment variable.

2. Do I need an RPC endpoint for every chain?

Yes, each chain you want to index or interact with requires a valid RPC endpoint specified in the .env file.

3. Can I run Khedra without IPFS?

Yes, IPFS integration is optional and can be enabled or disabled using the --ipfs command-line option.

References and Further Reading

Index

  • Address Monitoring: Chapter 4, Section "Monitoring Addresses"
  • Advanced Operations: Chapter 5
  • API Access: Chapter 4, Section "Accessing the REST API"
  • Blockchain Indexing: Chapter 4, Section "Indexing Blockchains"
  • Chains: Chapter 3, Section "Terminology and Concepts"
  • Configuration Management: Chapter 4, Section "Managing Configurations"
  • Glossary: Chapter 7, Section "Glossary of Terms"
  • IPFS Integration: Chapter 5, Section "Integrating with IPFS"
  • Logging and Debugging: Chapter 6, Section "Log Files and Debugging"
  • RPC Endpoints: Chapter 2, Section "Initial Configuration"
  • Troubleshooting: Chapter 6

Technical Specification

Purpose of this Document

This document defines the technical architecture, design, and functionalities of Khedra, enabling developers and engineers to understand its internal workings and design principles. For a less technical overview of the application, refer to the User Manual.

Intended Audience

This specification is for:

  • Developers working on Khedra or integrating it into applications.
  • System architects designing systems that use Khedra.
  • Technical professionals looking for a detailed understanding of the system.

Scope and Objectives

The specification covers:

  • High-level architecture.
  • Core functionalities such as blockchain indexing, REST API, and address monitoring.
  • Design principles, including scalability, error handling, and integration with IPFS.
  • Supported chains, RPC requirements, and testing methodologies.

Introduction

System Architecture

High-Level Architecture Diagram

The following diagram shows the dependency relationships among Khedra's core configuration, service, and validation modules:

graph TD
    config.go --> service.go
    service.go --> logging.go
    service.go --> chain.go
    chain.go --> validate.go
    validate.go --> general.go
    general.go --> testing.go

    chain_test.go --> chain.go
    validate_test.go --> validate.go
    logging_test.go --> logging.go
    general_test.go --> general.go
    config_test.go --> config.go

Key Components Overview

  1. Blockchain Indexer: Handles blockchain data collection and indexing.
  2. REST API Server: Exposes APIs for data access.
  3. IPFS Integrator: Manages decentralized storage.
  4. Configuration Manager: Parses .env files and other configurations.

Interactions Between Components

  • The Blockchain Indexer collects data from RPC endpoints and stores it in the local database.
  • The REST API retrieves indexed data and exposes it via endpoints.
  • The IPFS Integrator uploads and pins indexed data to IPFS for decentralized access.

Core Functionalities

Blockchain Indexing

Indexes blockchain data for fast and efficient retrieval. Supports multiple chains and tracks transactions.

REST API

Exposes indexed data through a REST API. Includes endpoints for:

  • Retrieving transactions and blocks.
  • Accessing monitored address data.

Address Monitoring

Allows tracking of specific blockchain addresses. Captures transactions and updates in real-time.

IPFS Integration

Pins portions of the Unchained Index to IPFS for decentralized and tamper-proof storage.

Technical Design

Configuration Files and Environment Variables

Khedra uses a .env file for configuration. Key variables include:

  • TB_NODE_DATAFOLDER: Directory for storing data.
  • TB_NODE_MAINNETRPC: RPC endpoint for Ethereum mainnet.
  • TB_NODE_CHAINS: List of chains to index.
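
A minimal .env sketch using the variables above (the URLs are placeholders; TB_NODE_SEPOLIARPC is listed later under Supported Chains):

TB_NODE_DATAFOLDER="~/.khedra/data"
TB_NODE_MAINNETRPC="http://localhost:8545"      # placeholder mainnet endpoint
TB_NODE_SEPOLIARPC="http://localhost:8546"      # placeholder sepolia endpoint
TB_NODE_CHAINS="mainnet,sepolia"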

Initialization Process

  1. Validate .env configuration.
  2. Connect to RPC endpoints for the specified chains.
  3. Initialize the blockchain index if necessary.

Data Flow and Processing

  • Input: Blockchain data retrieved via RPC.
  • Processing: Indexing, storing, and optionally pinning data to IPFS.
  • Output: Indexed data accessible through the REST API.

Error Handling and Logging

Logs are written to the console with adjustable levels (Debug, Info, Warn, Error). Errors during initialization or RPC interactions are logged and reported.

Supported Chains

List of Supported Blockchains

Khedra supports Ethereum mainnet and other EVM-compatible chains like:

  • Sepolia
  • Gnosis
  • Optimism

Requirements for RPC Endpoints

Each chain requires a valid RPC endpoint. For example:

  • TB_NODE_MAINNETRPC: Mainnet RPC URL.
  • TB_NODE_SEPOLIARPC: Sepolia RPC URL.

Handling Multiple Chains

To enable multiple chains, set TB_NODE_CHAINS in the .env file:

TB_NODE_CHAINS="mainnet,sepolia,gnosis"

Ensure each chain has a corresponding RPC endpoint.
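
Each chain listed in TB_NODE_CHAINS needs its own RPC variable. The mainnet and sepolia names appear above; the gnosis variable shown here is an assumption that simply follows the same naming pattern:

TB_NODE_MAINNETRPC="http://localhost:8545"
TB_NODE_SEPOLIARPC="http://localhost:8546"
TB_NODE_GNOSISRPC="http://localhost:8547"       # assumed name, following the pattern above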

Command-Line Interface

Available Commands and Options

Initialization

./khedra --init all
  • Options: all, blooms, none

Scraper

./khedra --scrape on
  • Enables or disables the blockchain scraper.

REST API

./khedra --api on
  • Starts the API server.

Sleep Duration

./khedra --sleep 60
  • Sets the duration (in seconds) between updates.

Detailed Behavior for Each Command

  1. --init: Controls how the blockchain index is initialized.
  2. --scrape: Toggles the blockchain scraper.
  3. --api: Starts or stops the API server.

Performance and Scalability

Performance Benchmarks

Khedra is designed to handle high-throughput blockchain data. Typical performance benchmarks include:

  • Processing speed: ~500 blocks per second (depending on RPC response time).
  • REST API response time: <50ms for standard queries.

Strategies for Handling Large-Scale Data

  1. Use high-performance RPC endpoints with low latency.
  2. Increase local storage capacity to handle large blockchain data.
  3. Scale horizontally by running multiple instances of Khedra for different chains (see the sketch after this list).
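
For the horizontal-scaling strategy, recall from the Getting Started chapter that a config.yaml placed in the current working folder takes precedence, so each instance can carry its own chain list, and port-based services can be separated with environment overrides. A sketch, assuming the khedra binary is on your PATH and each folder holds its own config.yaml:

(cd ~/khedra-mainnet && khedra) &
(cd ~/khedra-gnosis && TB_KHEDRA_SERVICES_API_PORT="8081" khedra) &   # API moved to a free port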

Resource Optimization Guidelines

  • Limit the number of chains processed simultaneously to reduce system load.
  • Configure --sleep duration to balance processing speed with system resource usage.

Integration Points

Integration with External APIs

Khedra exposes data through a REST API, making it compatible with external applications. Example use cases:

  • Fetching transaction details for a given address.
  • Retrieving block information for analysis.

Interfacing with IPFS

Data indexed by Khedra can be pinned to IPFS for decentralized storage:

./khedra --ipfs on

Customizing for Specific Use Cases

Users can tailor the configuration by:

  • Adjusting .env variables to include specific chains and RPC endpoints.
  • Writing custom scripts to query the REST API and process the data (see the example below).
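
As an illustration of the second point, a small script can poll the REST API and post-process the output with standard tools. The route and parameters below are placeholders rather than documented endpoints:

#!/usr/bin/env bash
# query the local khedra API for a given address and pretty-print the JSON response
ADDRESS="0x0000000000000000000000000000000000000000"   # replace with an address you care about
curl -s "http://localhost:8080/export?addrs=${ADDRESS}" | jq .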

Testing and Validation

Unit Testing

Unit tests cover:

  • Blockchain indexing logic.
  • Configuration parsing and validation.
  • REST API endpoint functionality.

Run tests with:

go test ./...

Integration Testing

Integration tests ensure all components work together as expected. Tests include:

  • RPC connectivity validation.
  • Multi-chain indexing workflows.

Testing Guidelines for Developers

  1. Use mock RPC endpoints for testing without consuming live resources.
  2. Validate .env configuration in test environments before deployment.
  3. Automate tests with CI/CD pipelines to ensure reliability.

Appendices

Glossary of Technical Terms

  • EVM: Ethereum Virtual Machine, the runtime environment for smart contracts.
  • RPC: Remote Procedure Call, a protocol for interacting with blockchain nodes.
  • IPFS: InterPlanetary File System, a decentralized storage solution.

References and Resources

Index

  • Address Monitoring: Section 3, Core Functionalities
  • API Access: Section 3, Core Functionalities
  • Architecture Overview: Section 2, System Architecture
  • Blockchain Indexing: Section 3, Core Functionalities
  • Configuration Files: Section 4, Technical Design
  • Data Flow: Section 4, Technical Design
  • Error Handling: Section 4, Technical Design
  • Integration Points: Section 8, Integration Points
  • IPFS Integration: Section 3, Core Functionalities; Section 8, Integration Points
  • Logging: Section 4, Technical Design
  • Performance Benchmarks: Section 7, Performance and Scalability
  • REST API: Section 3, Core Functionalities; Section 8, Integration Points
  • RPC Requirements: Section 5, Supported Chains and RPCs
  • Scalability Strategies: Section 7, Performance and Scalability
  • System Components: Section 2, System Architecture
  • Testing Guidelines: Section 9, Testing and Validation