# Running Lake Indexer

> Learn how to set up and run a NEAR Lake Indexer, including prerequisites, network configuration, and commands for syncing from the latest or a specific block.

NEAR already has a working solution, called **NEAR Lake**, that indexes blockchain data and stores it in AWS S3 buckets. In this guide you will learn how to set up and run an instance of the NEAR Lake Indexer.

The Lake Indexer setup consists of the following components:

* An AWS S3 bucket used as storage
* The NEAR Lake binary, which runs as a regular NEAR Protocol peer-to-peer node, so you operate it like
  any other [Regular/RPC Node in NEAR](https://near-nodes.io/rpc/hardware-rpc)

<Info>
  NEAR Lake is an indexer specialized in storing events as JSON files on AWS S3, built on top of the [NEAR Indexer microframework](https://github.com/nearprotocol/nearcore/tree/master/chain/indexer).
</Info>

### Prepare Development Environment

Before you proceed, make sure you have the following software installed:

* The [Rust compiler](https://rustup.rs/), at the version specified in the `rust-toolchain` file in the root of
  the [nearcore](https://github.com/nearprotocol/nearcore) project.
* [AWS credentials configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
  From the AWS docs:

  > For example, the files generated by the AWS CLI for a default profile configured with aws configure looks similar to the following.
  >
  > \~/.aws/credentials
  >
  > ```
  > [default]
  > aws_access_key_id=AKIAIOSFODNN7EXAMPLE
  > aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  > ```

### Compile NEAR Lake

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
$ cargo build --release
```

### Configure NEAR Lake

To connect NEAR Lake to a specific chain, you need the corresponding config files. You can generate them as follows:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
$ ./target/release/near-lake --home ~/.near/testnet init --chain-id testnet --download-config --download-genesis
```

The command above downloads the official genesis config and generates the other necessary config files. To target a different network, replace `testnet` with another network ID (`betanet`, `mainnet`).

The configs for the specified network are stored in the folder passed to `--home`. NEAR Lake must follow
all the necessary shards, so the `"tracked_shards"` parameter in `~/.near/testnet/config.json` needs to be configured properly.
Currently, `nearcore` treats an empty value for `"tracked_shards"` as "do not track any shard" and **any non-empty value** as "track all shards".
For example, to track all shards, it is enough to add shard #0 to the list:

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
...
"tracked_shards": [0],
...
```
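
Before starting the node, a quick check can catch an empty `"tracked_shards"` list. The sketch below is a rough, grep-based check rather than a real JSON parser. It assumes that an empty list is written on a single line (`"tracked_shards": []`), and the function name `check_tracked_shards` is hypothetical and not part of NEAR Lake.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NEAR Lake): reports whether "tracked_shards"
# is missing, empty, or non-empty in a nearcore config.json.
# Assumption: an empty list appears on one line, e.g. "tracked_shards": [].
check_tracked_shards() {
  local config="${1:-$HOME/.near/testnet/config.json}"

  # The key must be present at all.
  if ! grep -q '"tracked_shards"' "$config"; then
    echo "missing: no tracked_shards key found"
    return 1
  fi

  # An empty list means the node does not track any shard.
  if grep -Eq '"tracked_shards"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]' "$config"; then
    echo "empty: the node would not track any shard"
    return 1
  fi

  echo "ok: tracked_shards is non-empty"
}
```

Calling `check_tracked_shards ~/.near/testnet/config.json` prints one status line and returns a non-zero exit code when the list is missing or empty.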

### Run NEAR Lake

The following commands and options are passed after `./target/release/near-lake`:

| Command | Key/Subcommand | Required/Default | Responsible for |
| ------- | -------------- | ---------------- | --------------- |
|         | `--home` | Default: `~/.near` | Tells the node where to look for the necessary files: `config.json`, `genesis.json`, `node_key.json`, and the `data` folder |
| `init`  |  |  | Tells the node to generate config files in the `--home` folder |
|         | `--chain-id` | Required<br /><br /> - `localnet`<br /> - `testnet`<br /> - `mainnet` | Defines the chain to generate config files for |
|         | `--download-config` | Optional | If provided, tells the node to download `config.json` from the public URL. You can also download it manually:<br /><br /> - [testnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/rpc/config.json)<br /> - [mainnet config.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/rpc/config.json) |
|         | `--download-genesis` | Optional | If provided, tells the node to download `genesis.json` from the public URL. You can also download it manually:<br /><br /> - [testnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/testnet/genesis.json)<br /> - [mainnet genesis.json](https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/genesis.json) |
|         | TODO:<br />Other `neard` keys |  |  |
| `run`   |  |  | Runs the node |
|         | `--bucket` | Required | AWS S3 bucket name |
|         | `--region` | Required | AWS S3 bucket region |
|         | `--fallback-region` | Default: `eu-central-1` | AWS S3 fallback region |
|         | `--endpoint` | Optional | AWS S3 compatible API endpoint |
|         | `--stream-while-syncing` | Optional | If provided, the indexer streams blocks as they appear on the node instead of waiting for the node to be fully synced |
|         | `--concurrency` | Default: 1 | Defines the concurrency for the process of saving block data to AWS S3 |
|         | `sync-from-latest` | One of the `sync-` subcommands is required | Tells the node to start indexing from the latest block in the network |
|         | `sync-from-interruption` | One of the `sync-` subcommands is required | Tells the node to start indexing from the block the node was interrupted on (on a first start it falls back to `sync-from-latest`) |
|         | `sync-from-block --height N` | One of the `sync-` subcommands is required | Tells the node to start indexing from the specified block height `N` (**ensure** your node data contains the block you want to start from) |

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
$ ./target/release/near-lake --home ~/.near/testnet run --stream-while-syncing --concurrency 50 sync-from-latest
```

After the node has synced with the network, you should see a log entry for every block height NEAR Lake receives.

***

## Syncing

Whenever you run NEAR Lake for any network except localnet, the node first needs to sync with the network.
This is the normal behavior of a `nearcore` node, and NEAR Lake is a wrapper
around a regular `nearcore` node. The node can index data only once it is synced
with the network.

Snapshot-based syncing was previously the recommended default. We now recommend [Epoch Sync](https://near-nodes.io/intro/node-epoch-sync), a faster, more lightweight method that allows a node to catch up from genesis without downloading a large state snapshot.

***

## Running NEAR Lake as an archival node

It is not required, but if you want to index everything in the network, it is better to start from genesis.
By default, a `nearcore` node runs in non-archival mode, which means that the node keeps data only
for the [last 5 epochs](/protocol/network/epoch). To index data from genesis,
you need to switch the node to archival mode.

To do so, update the `config.json` located in the `--home` folder (`~/.near` by default).

Find the following keys in the config and update them as shown:

```json theme={"theme":{"light":"github-light","dark":"github-dark"}}
{
  ...
  "archive": true,
  "tracked_shards": [0],
  ...
}
```

Syncing in archival mode can take a long time, so it is better to download an existing backup
and put it in your `data` folder. Your node then only needs to download the data produced after the backup was cut, which
takes a reasonable amount of time.

All the backups can be downloaded from the public [S3 bucket](https://docs.fastnear.com/docs/snapshots/mainnet#archival-mainnet-snapshot) provided by FastNEAR.

See [this guide on running an archival node](https://near-nodes.io/archival/run-archival-node-with-nearup) for reference.

***

## Using the data

We write all the data to AWS S3 buckets:

* `near-lake-data-testnet` (`eu-central-1` region) for testnet
* `near-lake-data-mainnet` (`eu-central-1` region) for mainnet

***

## Custom S3 storage

If you want to run your own `near-lake` instance and store the data in an S3 compatible storage (for example [Minio](https://min.io/) or [Localstack](https://localstack.cloud/)),
you can override the default S3 API endpoint with the `--endpoint` option.

* Run `minio`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
$ mkdir -p /data/near-lake-custom && minio server /data
```

* Run `near-lake`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
$ ./target/release/near-lake --home ~/.near/testnet run --endpoint http://127.0.0.1:9000 --bucket near-lake-custom sync-from-latest
```
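
To confirm that blocks are arriving in the custom bucket, you can list it with the AWS CLI pointed at the same endpoint. This is a minimal sketch under a few assumptions: the AWS CLI is installed, your configured credentials are accepted by the storage server, and the helper name `lake_ls_custom` is hypothetical. `--endpoint-url` is a global AWS CLI option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: lists the top-level prefixes (block heights) of a bucket
# served by an S3 compatible endpoint such as a local Minio instance.
lake_ls_custom() {
  local endpoint="${1:-http://127.0.0.1:9000}"  # same value as passed to --endpoint
  local bucket="${2:-near-lake-custom}"         # same value as passed to --bucket
  aws --endpoint-url "$endpoint" s3 ls "s3://$bucket/"
}
```

Running `lake_ls_custom` with no arguments uses the endpoint and bucket from the commands above.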

### Data structure

The data structure we use is the following:

```text theme={"theme":{"light":"github-light","dark":"github-dark"}}
<block_height>/
  block.json
  shard_0.json
  shard_1.json
  ...
  shard_N.json
```

* `<block_height>` is a 12-character-long `u64` string with leading zeros (e.g. `000042839521`). [See this issue for the reasoning](https://github.com/near/near-lake/issues/23)
* `block.json` contains the JSON-serialized [`BlockView`](https://github.com/near/nearcore/blob/e9a28c46c2bea505b817abf484e6015a61ea7d01/core/primitives/src/views.rs#L711-L716) struct. **Note:** this struct might change in the future; we will announce any changes
* `shard_N.json`, where `N` is a `u64` starting from `0`, represents the index number of the shard. To find the expected number of shards in a block, look at `.header.chunks_included` in `block.json`
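
The key layout above can be reproduced with a few lines of shell. The helper names `lake_block_prefix` and `lake_block_keys` are hypothetical and only illustrate the naming scheme. The sketch assumes the height is a plain decimal number without leading zeros and that you already know the number of shards.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that illustrate the NEAR Lake key layout.

# Pads a block height with leading zeros to 12 characters.
lake_block_prefix() {
  printf '%012d' "$1"
}

# Prints the object keys for one block: block.json plus one file per shard.
# $1 is the block height, $2 is the number of shards.
lake_block_keys() {
  local prefix shard=0
  prefix="$(lake_block_prefix "$1")"
  echo "$prefix/block.json"
  while [ "$shard" -lt "$2" ]; do
    echo "$prefix/shard_$shard.json"
    shard=$((shard + 1))
  done
}
```

For example, `lake_block_keys 42839521 2` prints `000042839521/block.json`, `000042839521/shard_0.json`, and `000042839521/shard_1.json`, one per line.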

### Access the data

All NEAR Lake AWS S3 buckets have [Requester Pays](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RequesterPaysBuckets.html) enabled. This means that anyone with their own AWS credentials can list and read the bucket's content and will **be charged for it by AWS**. Every connection to the buckets must therefore be made with AWS credentials. See [NEAR Lake Framework](https://github.com/near/near-lake-framework) for a reference.
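
As a rough illustration, a single block can be fetched with the AWS CLI by acknowledging the charge through `--request-payer requester`. The sketch assumes the AWS CLI is installed and configured with your own credentials. The helper name `lake_fetch_block` and the local output file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: downloads block.json for one block height from a
# NEAR Lake bucket. Your AWS account is charged for the request.
lake_fetch_block() {
  local height="$1"
  local bucket="${2:-near-lake-data-mainnet}"
  local key
  # Keys start with the block height padded to 12 characters.
  key="$(printf '%012d' "$height")/block.json"
  aws s3api get-object \
    --bucket "$bucket" \
    --key "$key" \
    --region eu-central-1 \
    --request-payer requester \
    "block_$height.json"
}
```

Calling `lake_fetch_block 42839521` requests `000042839521/block.json` from the mainnet bucket and saves it as `block_42839521.json`. Pass `near-lake-data-testnet` as the second argument to read from testnet instead.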

### NEAR Lake Framework

Once we [set up the public access to the buckets](https://github.com/near/near-lake/issues/22), anyone will be able to write their own code to read the data.

For our own needs, we are working on [NEAR Lake Framework](https://github.com/near/near-lake-framework), a simple way to create an indexer on top of the data stored by NEAR Lake itself.

<Note>
  See the official NEAR Lake Framework [announcement on the NEAR Gov Forum](https://gov.near.org/t/announcement-near-lake-framework-brand-new-word-in-indexer-building-approach/17668).
</Note>
