Materializations

The results of a Fenl query can be written to an external data store and kept up to date as the data underlying the query changes using materializations. A materialization is similar to a query, except that the results are updated any time data is added to a table used by the query. Materializations can be used to populate feature vectors in a variety of feature stores to be used in production for low-latency inference.

Supported Destinations

Kaskada supports materializing into different external data stores.

Pulsar

Example configuration:

materializations:
  # The name of the materialization
- materialization_name: PulsarExample
  # The epxression to materializa
  expression: PurchaseStats
  # Where the expression's final results will be written
  pulsar:
    broker_service_url: pulsar://127.0.0.1:6650
    tenant: public
    namespace: default
    topic_name: pulsar-example

Object Store

Example configuration:

materializations:
- materialization_name: PulsarExample
  expression: PurchaseStats
  object_store:
    file_type: parquet
    output_prefix_location: s3://my-bucket/path/to/results/

Managing Materializations

Creating a Materialization

To create a materialization, we’ll start by describing the expression we’d like to materialize. In this case, we’re interested in some purchase statistics for each user. This definition depends on business logic and might require some iteration to get just right.

  • Python

  • CLI

from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

purchase_stats = """
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}
"""

tenant = "public"
namespace = "default"
topic_name = "model_features"
broker_service_url = "pulsar://127.0.0.1:6650"
destination = materialization.PulsarDestination(tenant, namespace, topic_name, broker_service_url)

materialization.create_materialization(
    name = "PurchaseStats",
    destination = destination,
    query = purchase_stats,
)
kaskada-cli materialization create PurchaseStats \
    "{time: Purchase.purchase_time,entity: Purchase.customer_id,max_amount: Purchase.amount | max(),min_amount: Purchase.amount | min()}" \
    --path-uri "file:///path/on/your/machine/"

Currently the CLI can only create materializaitons that use the Object Store destination type. This is done using the --path-uri option.

List Materializations

The list materializations method returns all materializations defined for your user. An optional search string can filter the response set.

Here is an example of listing materializations:

  • Python

  • CLI

from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

materialization.list_materializations()
kaskada-cli materialization list

Get Materialization

You can get a materialization using its name:

  • Python

  • CLI

from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

materialization.get_materialization("PurchaseStats")
kaskada-cli materialization get PurchaseStats

Updating a Materialization

Materializations are currently immutable. Updating a materialization requires deleting that materialization and then re-creating it with a new expression.

Deleting a materialization

You can delete a materialization using its name:

  • Python

  • CLI

from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

materialization.delete_materialization("PurchaseStats")
kaskada-cli materialization delete PurchaseStats

Deleting a materialization does not delete any data persisted in the external data store.