Manifests Reference

StreamingFast Substreams manifests reference

This reference documentation provides a guide for all fields and values used in a Substreams manifest.

Tip: When writing your substreams.yaml file, validate the manifest against our JSON schema to catch problems early. JSON schemas are supported in JetBrains IDEs and VS Code. Our manifest schema can be seen here.

Manifests overview

In simple terms, a Substreams manifest (substreams.yaml) is a configuration file (a YAML file) for your Substreams. The manifest file is used for defining properties specific to the current Substreams module and identifying the dependencies between the inputs and outputs of modules. For example, the following manifest receives a raw Ethereum block as input (sf.ethereum.type.v2.Block) and outputs a custom object (eth.example.MyBlock).

modules:
  - name: map_block
    kind: map
    initialBlock: 12287507
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:eth.example.MyBlock

Among other things, the manifest allows you to define:

  • How many modules your Substreams uses, along with their corresponding inputs and outputs.

  • The schema(s) (i.e. the data model) your Substreams uses.

  • How you will consume the data emitted by your Substreams (SQL, Webhooks...).

specVersion

Excerpt pulled from the example Substreams manifest.

manifest excerpt
specVersion: v0.1.0

Use v0.1.0 for the specVersion field.

package

Excerpt pulled from the example Substreams manifest.

manifest excerpt
package:
  name: module_name_for_project
  version: v0.5.0
  doc: |
    Documentation heading for the package.

    More detailed documentation for the package.

package.name

The package.name field is used to identify the package.

When the pack command is run against substreams.yaml, the filename of the resulting Substreams package is inferred from the package.name field.

The content of the name field must match the regular expression: ^([a-zA-Z][a-zA-Z0-9_]{0,63})$. For consistency, use the snake_case naming convention.

The regular expression ruleset translates to the following:

  • 64 characters maximum

  • Separate words by using _

  • Starts by using a-z or A-Z and can contain numbers thereafter
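As a quick way to check a candidate name against the rule above, here is a minimal Python sketch. The regular expression is copied from this page; the helper name is_valid_name is hypothetical and not part of the Substreams tooling.

```python
import re

# Pattern copied verbatim from the package.name rule in this reference.
NAME_PATTERN = re.compile(r"^([a-zA-Z][a-zA-Z0-9_]{0,63})$")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("module_name_for_project"))  # True
print(is_valid_name("1st_module"))               # False: starts with a digit
print(is_valid_name("a" * 65))                   # False: longer than 64 characters
```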

package.version

The package.version field identifies the version of the Substreams package.

Note: The package.version field must respect Semantic Versioning, version 2.0.
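A minimal Python sketch of checking a package.version value. The pattern is the one suggested on semver.org. Tolerating a leading v is an assumption based on the v0.5.0 value in the package excerpt, and is_semver is a hypothetical helper rather than part of the Substreams tooling.

```python
import re

# Regular expression suggested by semver.org for Semantic Versioning 2.0.0.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(version: str) -> bool:
    """Check a version string, ignoring one optional leading 'v' (assumption)."""
    core = version[1:] if version.startswith("v") else version
    return SEMVER.fullmatch(core) is not None


print(is_semver("v0.5.0"))  # True
print(is_semver("1.0"))     # False: the patch number is missing
```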

package.url

The package.url field identifies and helps users discover the source of the Substreams package.

package.doc

The package.doc field is the documentation string of the package. The first line is used by the different UIs as a short-form description.

This field should be written in Markdown format.

imports

The imports section allows you to import third-party Substreams packages. It adds local references to the modules in those packages, and pulls their WASM code, Protobuf definitions and modules into the current package.

Relying on imports rather than copying source code from third-party packages allows you to leverage server-side caches, and lower your costs.

Example:

imports:
  sol: https://spkg.io/streamingfast/solana-explorer-v0.2.0.spkg
  # or:
  ethereum: substreams-ethereum-v1.0.0.spkg
  token: ../eth-token/substreams.yaml

...

modules:
...
    inputs:
      - map: sol:map_block_without_votes
# replacing:
#    inputs:
#      - source: sf.solana.type.v1.Block

Note the : separator: the part before it is the namespace of the imported package, as defined under imports, and the part after it is the module name inside that package.

The import location can be an absolute or relative local path, or a remote URL prefixed by http:// or https://. It can also be an IPFS reference.
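To illustrate how a namespaced reference such as sol:map_block_without_votes relates to the imports mapping, here is a small Python sketch. The function split_module_ref is hypothetical, and the manifest is modelled as plain dictionaries rather than parsed with the real Substreams tooling.

```python
from typing import Optional, Tuple


def split_module_ref(ref: str, imports: dict) -> Tuple[Optional[str], str]:
    """Split 'alias:module' into (alias, module); local names give (None, name)."""
    alias, sep, module = ref.partition(":")
    if not sep:
        # No separator: the reference points at a module of the current package.
        return None, ref
    if alias not in imports:
        raise KeyError(f"unknown import alias: {alias!r}")
    return alias, module


# Aliases taken from the imports example in this section.
imports = {
    "sol": "https://spkg.io/streamingfast/solana-explorer-v0.2.0.spkg",
    "token": "../eth-token/substreams.yaml",
}

print(split_module_ref("sol:map_block_without_votes", imports))
# ('sol', 'map_block_without_votes')
```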

protobuf

The protobuf section points to the Google Protocol Buffer (protobuf) definitions used by the Rust modules in the Substreams module.

protobuf:
  files:
    - google/protobuf/timestamp.proto
    - pcs/v1/pcs.proto
    - pcs/v1/database.proto
  importPaths:
    - ./proto
    - ../../external-proto

The Substreams packager loads files in any of the listed importPaths.

Note: The imports section of the manifest also affects which .proto files are used in the final Substreams package.

Protobufs and modules are packaged together to help Substreams clients decode the incoming streams. Protobufs are not sent to the Substreams server in network requests.

Learn more about Google Protocol Buffers in the official documentation provided by Google.

binaries

The binaries field specifies the WASM binary code to use when executing modules.

The modules[].binary field uses a default value of default.

binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_package.wasm
  other:
    type: wasm/rust-v1
    file: ./snapshot_of_my_package.wasm

Important: Defining the default binary is required when creating a Substreams manifest.

See the binary field under modules to see its use.

binaries[name].type

The type of code and the implied virtual machine used for execution. Only one virtual machine is available; it is selected with the value wasm/rust-v1.

binaries[name].file

The binaries[name].file field references a locally compiled WASM module. Paths for the binaries[name].file field are absolute or relative to the manifest's directory. The standard location of the compiled WASM module is the root directory of the Substreams module.

Tip: The WASM file referenced by the binary field is picked up and packaged into an .spkg when invoking the pack and run commands through the substreams CLI.

network

The network field specifies the blockchain where the Substreams will be executed.

network: solana

or

network: ethereum

image

The image field specifies the icon displayed for the Substreams package, which is used in the Substreams Registry. The path is relative to the folder where the manifest is.

image: ./ethereum-icon.png

sink

The sink field specifies the sink you want to use to consume your data (for example, a database or a subgraph).

Sink module

Specifies the name of the module that emits the data to the sink. For example, db_out or graph_out.

Sink type

Specifies the service used to consume the data. For example, sf.substreams.sink.subgraph.v1.Service for subgraphs, or sf.substreams.sink.sql.v1.Service for databases.

Sink config

Specifies the configuration specific to every sink. This field is different for every sink.

Database Config

sink:
  module: db_out
  type: sf.substreams.sink.sql.v1.Service
  config:
    schema: "./schema.sql"
    engine: clickhouse
    postgraphile_frontend:
      enabled: false
    pgweb_frontend:
      enabled: false
    dbt_config:
      enabled: true
      files: "./path/to/folder"
      run_interval_seconds: 300

  • schema: SQL file specifying the schema.

  • engine: postgres or clickhouse.

  • postgraphile_frontend.enabled: enables or disables the Postgraphile portal.

  • pgweb_frontend.enabled: enables or disables the PGWeb portal.

  • dbt_config: specifies the configuration of dbt engine.

    • enabled: enables or disables the dbt engine.

    • files: path to the dbt models.

    • run_interval_seconds: execution intervals in seconds.

Subgraph Config

sink:
  module: graph_out
  type: sf.substreams.sink.subgraph.v1.Service
  config:
    schema: "./schema.graphql"
    subgraph_yaml: "./subgraph.yaml"

  • schema: path to the GraphQL schema.

  • subgraph_yaml: path to the Subgraph manifest.

modules

This example shows one map module, named events_extractor, and one store module, named totals:

substreams.yaml
  - name: events_extractor
    kind: map
    initialBlock: 5000000
    binary: default  # Implicit
    inputs:
      - source: sf.ethereum.type.v2.Block
      - store: myimport:prices
    output:
      type: proto:my.types.v1.Events
    doc: |
      This module extracts events

      Use in such and such situations

  - name: totals
    kind: store
    updatePolicy: add
    valueType: int64
    inputs:
      - source: sf.ethereum.type.v2.Block
      - map: events_extractor

Module name

The identifier for the module: a letter followed by up to 63 characters from [a-zA-Z0-9_], for a maximum of 64 characters. The same rules that apply to the package.name field apply to the module name, including the convention to use snake_case names.

The module name is the reference identifier used on the command line for the substreams run command. The module name is also used in the inputs defined in the Substreams manifest.

The module name also corresponds to the name of the Rust function invoked in the compiled WASM code upon execution: it must match the name of the function annotated with #[substreams::handlers::map] in the Rust code. Maps and stores both work in the same fashion.

Important: When importing another package, all module names are prefixed by the package's name and a colon. Prefixing ensures there are no name clashes across multiple imported packages and almost any name can be safely used for a module name.

Module initialBlock

The initial block for the module is where Substreams begins processing data for that module. The runtime never processes blocks prior to a module's initial block.

If all the inputs have the same initialBlock, the field can be omitted and its value is inferred from those inputs.

initialBlock becomes mandatory when inputs have different values.
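The inference rule described above can be sketched in Python as follows. This is a simplified model, not the runtime's actual code: resolve_initial_block is a hypothetical helper, and the initial blocks of the inputs are passed in as plain integers.

```python
from typing import Iterable, Optional


def resolve_initial_block(explicit: Optional[int], input_blocks: Iterable[int]) -> int:
    """Return the module's initial block, inferring it from the inputs if possible."""
    if explicit is not None:
        # An initialBlock written in the manifest always takes precedence.
        return explicit
    distinct = set(input_blocks)
    if len(distinct) == 1:
        # All inputs agree, so the value can be inferred.
        return distinct.pop()
    raise ValueError("initialBlock cannot be inferred; set it explicitly")


print(resolve_initial_block(None, [5000000, 5000000]))      # 5000000
print(resolve_initial_block(12287507, [5000000, 6000000]))  # 12287507
```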

Module kind

There are two module types for modules[].kind:

  • map

  • store

Module updatePolicy

Specifies the merge strategy for two contiguous partial stores produced by parallelized operations.

The values for modules[].updatePolicy and their merge rules are:

  • set, the last key wins the merge

  • set_if_not_exists, the first key wins the merge

  • append, concatenates the two keys' values

  • add, sums the two keys' values

  • min, keeps the minimum of the two keys' values

  • max, keeps the maximum of the two keys' values

  • set_sum, either sets the value or sums the two keys' values
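To make the merge rules more concrete, here is a rough Python sketch that merges two contiguous partial stores modelled as dictionaries. It is only an illustration under that assumption, not the engine's implementation; merge_partial_stores is a hypothetical name, and set_sum is left out because its exact semantics are not spelled out here.

```python
def merge_partial_stores(first: dict, second: dict, policy: str) -> dict:
    """Merge `second` (later block range) into `first` (earlier block range)."""
    combine = {
        "set": lambda a, b: b,                # the last key wins
        "set_if_not_exists": lambda a, b: a,  # the first key wins
        "append": lambda a, b: a + b,         # concatenation of bytes or strings
        "add": lambda a, b: a + b,            # numeric sum
        "min": min,
        "max": max,
    }[policy]
    merged = dict(first)
    for key, value in second.items():
        # Keys present on only one side are carried over unchanged.
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


first = {"a": 1, "b": 5}
second = {"b": 2, "c": 7}
print(merge_partial_stores(first, second, "add"))  # {'a': 1, 'b': 7, 'c': 7}
print(merge_partial_stores(first, second, "set"))  # {'a': 1, 'b': 2, 'c': 7}
```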

Tip: The module updatePolicy field is only available for modules of kind: store.

Module valueType

Specifies the data type of all values in the store, and determines which WASM imports are available to the module for writing to the store.

The modules[].valueType field accepts the following types:

  • bigfloat

  • bigint

  • int64

  • bytes

  • string

  • proto:path.to.custom.protobuf.Model

Tip: The module valueType field is only available for modules of kind: store.

Module binary

An identifier referring to the binaries section of the Substreams manifest.

The modules[].binary field overrides which binary is used from the binaries declaration section. This means multiple WASM files can be bundled in the Package.

modules:
  - name: hello
    binary: other
  ...

The default value for binary is default. Therefore, a default binary must be defined under binaries.
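The lookup described above can be sketched in Python. The manifest sections are modelled as plain dictionaries, and resolve_binary is a hypothetical helper, not a function of the Substreams CLI.

```python
def resolve_binary(module: dict, binaries: dict) -> dict:
    """Return the binaries entry a module runs with, falling back to 'default'."""
    name = module.get("binary", "default")
    if name not in binaries:
        raise KeyError(f"module {module['name']!r} refers to undefined binary {name!r}")
    return binaries[name]


# Values taken from the binaries example in this reference.
binaries = {
    "default": {
        "type": "wasm/rust-v1",
        "file": "./target/wasm32-unknown-unknown/release/my_package.wasm",
    },
    "other": {"type": "wasm/rust-v1", "file": "./snapshot_of_my_package.wasm"},
}

print(resolve_binary({"name": "hello", "binary": "other"}, binaries)["file"])
# ./snapshot_of_my_package.wasm
print(resolve_binary({"name": "map_block"}, binaries)["file"])
# ./target/wasm32-unknown-unknown/release/my_package.wasm
```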

Module inputs

substreams.yaml
inputs:
    - params: string
    - source: sf.ethereum.type.v2.Block
    - store: my_store
      mode: deltas
    - store: my_store # defaults to mode: get
    - map: my_map

The inputs field is a list of input structures. One of four keys is required for every object.

The key types for inputs are:

  • source

  • store, which also accepts an optional mode key (get or deltas; defaults to get)

  • map

  • params
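A small Python sketch of the checks implied by this list. It assumes that each entry carries exactly one of the four keys and that mode is only meaningful on store inputs; validate_input is a hypothetical helper, and the entries are plain dictionaries rather than a parsed manifest.

```python
INPUT_KEYS = {"source", "store", "map", "params"}
STORE_MODES = {"get", "deltas"}


def validate_input(entry: dict) -> str:
    """Return the input kind, or raise ValueError if the entry is malformed."""
    kinds = INPUT_KEYS.intersection(entry)
    if len(kinds) != 1:
        raise ValueError(
            f"expected exactly one of {sorted(INPUT_KEYS)}, got {sorted(entry)}"
        )
    kind = next(iter(kinds))
    if "mode" in entry:
        if kind != "store":
            raise ValueError("mode is only meaningful on store inputs")
        if entry["mode"] not in STORE_MODES:
            raise ValueError(f"unknown store mode: {entry['mode']!r}")
    return kind


print(validate_input({"store": "my_store", "mode": "deltas"}))  # store
print(validate_input({"params": "string"}))                     # params
```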

You can find more details about inputs in the Developer Guide's section about Modules.

Module output

substreams.yaml
output:
    type: proto:eth.erc721.v1.Transfers

The value for type is always prefixed using proto: followed by a definition specified in the protobuf definitions, and referenced in the protobuf section of the Substreams manifest.

Tip: The module output field is only available for modules of kind: map.

Module doc

This field should contain Markdown documentation of the module. Use it to describe how to use the params, or what to expect from the module.

params

The params mapping changes the default values for modules' parameterizable inputs.

modules:
  ...
params:
  module_name: "default value"
  "imported:module": "overridden value"

You can override those values with the -p parameter of substreams run.

When rolling out your consuming code (Python, in this example), you can use something like:

my_mod = [mod for mod in pkg.modules.modules if mod.name == "store_pools"][0]
my_mod.inputs[0].params.value = "myvalue"

which would be inserted just before starting the stream.

Params that are defined under networks do not need to be repeated here (their values will be overwritten).

network

The network field specifies the default network to be used with this Substreams. It will help the client choose an endpoint if necessary, and will be used as the default value when applying the values defined under networks.

networks

The networks field allows you to specify per-network params and initialBlock values for each module:

networks:
  mainnet:
    initialBlock:
      mod1: 200
      lib:mod1: 400
    params:
      mod2: "addr=0x1234"
  sepolia:
    [...]

You can override values for modules imported from other .spkg files.

Every local module specified under networks must have a value for each network.
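The interaction between params and networks can be sketched in Python as follows. This is one reading of the rules on this page, not the CLI's actual logic: the manifest is a plain dictionary, both helper names are hypothetical, and the sepolia values and the top-level params are made up for illustration.

```python
def effective_params(manifest: dict, network: str) -> dict:
    """Top-level params, overwritten by the params of the selected network."""
    merged = dict(manifest.get("params", {}))
    merged.update(manifest.get("networks", {}).get(network, {}).get("params", {}))
    return merged


def missing_network_values(networks: dict, field: str) -> dict:
    """Map each network to the local modules it lacks a `field` value for."""
    local = {
        mod
        for cfg in networks.values()
        for mod in cfg.get(field, {})
        if ":" not in mod  # imported modules carry an "alias:" prefix
    }
    gaps = {}
    for net, cfg in networks.items():
        missing = sorted(local - set(cfg.get(field, {})))
        if missing:
            gaps[net] = missing
    return gaps


manifest = {
    "params": {"mod2": "addr=0x0000", "mod3": "x"},  # hypothetical defaults
    "networks": {
        "mainnet": {
            "initialBlock": {"mod1": 200, "lib:mod1": 400},
            "params": {"mod2": "addr=0x1234"},
        },
        "sepolia": {"initialBlock": {"mod1": 10}},  # hypothetical values
    },
}

print(effective_params(manifest, "mainnet"))
# {'mod2': 'addr=0x1234', 'mod3': 'x'}
print(missing_network_values(manifest["networks"], "params"))
# {'sepolia': ['mod2']}
```

In this sketch, sepolia is reported because mod2 has a params value on mainnet but not on sepolia, while the imported lib:mod1 is exempt from the check.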
