SQL Sink
StreamingFast Substreams SQL sink
Purpose
Learn how to use the StreamingFast substreams-sink-sql
tool with this documentation. A basic Substreams module example is provided to help you get started. We are going to showcase a Substreams module to extract data from the Ethereum blockchain and route it into a Protobuf for persistence in a SQL database.
The substreams-sink-sql
today supports two database drivers namely PostgresSQL and Clickhouse. The tutorial below will focus on Postgres but we will describe how to connect to the other supported drivers.
Installation
1. Install substreams-sink-sql
substreams-sink-sql
Install substreams-sink-sql
by using the pre-built binary release available in the official GitHub repository.
Extract substreams-sink-sql
into a folder and ensure this folder is referenced globally via your PATH
environment variable.
2. Set up accompanying code example
Access the accompanying code example for this tutorial in the official substreams-sink-sql
repository. You will find the Substreams project for the tutorial in the docs/tutorial directory.
To create the required Protobuf files, run the included make protogen
command.
To ensure proper setup and functionality, use your installation of the substreams
CLI to run the example code.
Use the make build
and make stream_db_out
commands to verify the setup for the example project. Use the included make
command to build the Substreams module.
Module handler for sink
The Rust source code file lib.rs
contains an example code, the db_out
module handler, which prepares and returns the module's DatabaseChanges
output. The substreams-sink-sql
tool captures the data sent out of the Substreams module and routes it into the appropriate columns and tables in the SQL database.
To gain a full understanding of the procedures and steps required for a database sink Substreams module, review the code in lib.rs
. The complete code includes the addition of a Substreams store module and other helper functions related to the database.
DatabaseChanges
The DatabaseChanges
Protobuf definition can be viewed at the following link for a peek into the crates implementation.
When developing your Substreams, the Rust crate substreams-database-change can be used to create the required DatabaseChanges
output type.
Note: An output type of proto:sf.substreams.sink.database.v1.DatabaseChanges
is required by the map module in the Substreams manifest when working with this sink.
3. Install PostgreSQL
To proceed with this tutorial, you must have a working PostgreSQL installation. Obtain the software by downloading it from the vendor and install it by following the instructions for your operating system and platform.
If you encounter any issues, refer to the Troubleshooting Installation page on the official PostgreSQL Wiki for assistance.
4. Create example database
To store the blockchain data output by the Substreams module, you must create a new database in your PostgreSQL installation. The tutorial provides a schema and the PostgreSQL sink tool that handle the detailed aspects of the database design.
Use the psql
command in your terminal to launch PostgreSQL.
Upon successful launch, you will see a prompt similar to the following, ready to accept commands for PostgreSQL.
Use the following SQL
command to create the example database:
5. Create configuration file
Once the database has been created, you must now define the Substreams Sink Config in a Substreams manifest creating a deployable unit.
Let's create a folder sink
and in it create a file called substreams.dev.yaml
with the following content:
The package.name
and package.version
are meant to be replaced to fit your project.
The imports.main
defines your Substreams manifest that you want to sink. The sink.module
defines which import key (main
here) and which module's name (db_out
here).
The network
field defines which network this deployment should be part of, in our case mainnet
The sink.type
defines the type of the config that we are expecting, in our case it's sf.substreams.sink.sql.v1.Service (click on the link to see the message definition).
The sink.config
is the instantiation of this sink.type
with the config fully filled. Some config are special because they load from a file or from a folder. For example in our case the sink.config.schema
is defined with a Protobuf option load_from_file
which means the content of the ../schema.sql
will actually be inlined in the Substreams manifest.
The final final can be found at
sink/substreams.dev.yaml
6. Run setup command
Use the following command to run the substreams-sink-sql
tool and set up the database for the tutorial.
The "psql://..."
is the DSN (Database Source Name) containing the connection details to your database packed as an URL. The scheme
(psql
here) part of the DSN's url defines which driver to use, psql
is what we are going to use here, see Drivers section below to see what other DSN you can use here.
The DSN's URL defines the database IP address, username, and password, which depend on your PostgreSQL installation. Adjust dev-node
to your own username insecure-change-me-in-prod
to your password and 127.0.0.1:5432
to where your database can be reached.
Drivers
For DSN configuration for the currently supported drivers, see the list below:
7. Sink data to database
The substreams-sink-sql
tool sinks data from the Substreams module to the SQL database. Use the tool's run
command, followed by the endpoint to reach and your Substreams config file to use:
The endpoint needs to match the blockchain targeted in the Substreams module. The example Substreams module uses the Ethereum blockchain.
Successful output from the substreams-sink-sql
tool will resemble the following:
Note: If you have an error looking like load psql table: retrieving table and schema: pq: SSL is not enabled on the server
, it's because SSL is not enabled to reach you database, add ?sslmode=disable
at the end of the sink.config.dsn
value to connect without SSL.
You can view the database structure by using the following command, after launching PostgreSQL through the psql
command.
The table information is displayed in the terminal resembling the following:
You can view the data extracted by Substreams and routed into the database table by using the following command:
Output similar to the following is displayed in the terminal:
Cursors
When you use Substreams, it sends back a block to a consumer using an opaque cursor. This cursor points to the exact location within the blockchain where the block is. In case your connection terminates or the process restarts, upon re-connection, Substreams sends back the cursor of the last written bundle in the request so that the stream of data can be resumed exactly where it left off and data integrity is maintained.
You will find that the cursor is saved in the cursors
table of the substreams_example
database.
Batching
Insertion for historical blocks is performed in batched to increase ingestion speed. The --flush-interval
flag can be used to change the default value of 1000 blocks. Also, the flag --live-block-time-delta <duration>
can be used to change the delta at which we start considering blocks to be live, the logic is isLive = (now() - block.timestamp) < valueOfFlag(live-block-time-delta)
.
Conclusion and review
Routing data extracted from the blockchain using Substreams is a powerful and useful feature. With Substreams, you can route data to various types of sinks, including files and databases such as PostgreSQL. For more information on other types of sinks and sinking strategies, consult the core Substreams sinks documentation at https://substreams.streamingfast.io/developers-guide/substreams-sinks.
Last updated