Materializations
Materializations write the results of a Fenl query to an external data store and keep those results up to date as the data underlying the query changes. A materialization is similar to a query, except that its results are updated any time data is added to a table used by the query. Materializations can be used to populate feature vectors in a variety of feature stores for low-latency inference in production.
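The difference between a one-off query and a materialization can be illustrated with a small, self-contained sketch. This is not how Kaskada is implemented; `PurchaseStatsView` and `on_new_row` are hypothetical names used only to show the idea of a stored result that is updated each time a row is added, here a per-customer maximum and minimum purchase amount.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PurchaseStatsView:
    """Toy stand-in for a materialized result (hypothetical, not a Kaskada class)."""

    # Maps customer id -> {"max_amount": ..., "min_amount": ...}
    stats: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def on_new_row(self, customer_id: str, amount: float) -> Dict[str, float]:
        """Fold one newly added purchase into the stored result and return the updated row."""
        current = self.stats.get(customer_id)
        if current is None:
            # First purchase seen for this customer
            updated = {"max_amount": amount, "min_amount": amount}
        else:
            updated = {
                "max_amount": max(current["max_amount"], amount),
                "min_amount": min(current["min_amount"], amount),
            }
        self.stats[customer_id] = updated
        return updated


# Each added row updates the stored result; nothing is recomputed from scratch.
view = PurchaseStatsView()
view.on_new_row("alice", 10.0)
view.on_new_row("bob", 7.5)
view.on_new_row("alice", 3.0)
```

In this sketch the caller never re-runs a query: the stored values are current after every added row, which is the property a materialization provides for the external data store.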
Supported Destinations
Kaskada supports materializing into the following external data stores.
Pulsar
Example configuration:
materializations:
  # The name of the materialization
  - materialization_name: PulsarExample
    # The expression to materialize
    expression: PurchaseStats
    # Where the expression's final results will be written
    pulsar:
      broker_service_url: pulsar://127.0.0.1:6650
      tenant: public
      namespace: default
      topic_name: pulsar-example
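Pulsar addresses a topic by combining the tenant, namespace, and topic name into a single URL of the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below shows how the three configuration values map to that URL, which is what a consumer would subscribe to. The helper name `pulsar_topic_url` is hypothetical, and whether Kaskada writes to a persistent topic is an assumption not confirmed here.

```python
def pulsar_topic_url(tenant: str, namespace: str, topic_name: str, persistent: bool = True) -> str:
    """Build a fully qualified Pulsar topic URL from its three parts.

    Assumption: the materialization writes to a persistent topic by default.
    """
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic_name}"


# Values taken from the example configuration
topic = pulsar_topic_url("public", "default", "pulsar-example")
```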
Object Store
Example configuration:
materializations:
  - materialization_name: ObjectStoreExample
    expression: PurchaseStats
    object_store:
      file_type: parquet
      output_prefix_location: s3://my-bucket/path/to/results/
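When reading the results back with a storage client, the output prefix usually has to be split into a bucket and a key prefix. A minimal sketch using only the Python standard library follows; the helper name `split_output_prefix` is hypothetical, and the layout of the files written under the prefix is not specified here.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_output_prefix(output_prefix_location: str) -> Tuple[str, str, str]:
    """Split a location such as s3://bucket/path/ into (scheme, bucket, key_prefix)."""
    parsed = urlparse(output_prefix_location)
    # The path starts with "/", which is not part of the object key prefix
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


# Value taken from the example configuration
scheme, bucket, key_prefix = split_output_prefix("s3://my-bucket/path/to/results/")
```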
Managing Materializations
Creating a Materialization
To create a materialization, we’ll start by describing the expression we’d like to materialize. In this case, we’re interested in some purchase statistics for each user. This definition depends on business logic and might require some iteration to get just right.
Python:
from kaskada import materialization
from kaskada.api.session import LocalBuilder

# Build a local session
session = LocalBuilder().build()

# The Fenl expression to materialize: purchase statistics for each customer
purchase_stats = """
{
    time: Purchase.purchase_time,
    entity: Purchase.customer_id,
    max_amount: Purchase.amount | max(),
    min_amount: Purchase.amount | min(),
}
"""

# The Pulsar topic the results will be written to
tenant = "public"
namespace = "default"
topic_name = "model_features"
broker_service_url = "pulsar://127.0.0.1:6650"
destination = materialization.PulsarDestination(tenant, namespace, topic_name, broker_service_url)

materialization.create_materialization(
    name="PurchaseStats",
    destination=destination,
    query=purchase_stats,
)
CLI:
kaskada-cli materialization create PurchaseStats \
    "{time: Purchase.purchase_time,entity: Purchase.customer_id,max_amount: Purchase.amount | max(),min_amount: Purchase.amount | min()}" \
    --path-uri "file:///path/on/your/machine/"
Currently, the CLI can only create materializations that use the Object Store destination type; the example above does this with the --path-uri argument.
Listing Materializations
The list materializations method returns all materializations defined for your user. An optional search string can filter the response set.
Here is an example of listing materializations:
Python:
from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()
materialization.list_materializations()
CLI:
kaskada-cli materialization list
Getting a Materialization
You can get a materialization using its name:
Python:
from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()
materialization.get_materialization("PurchaseStats")
CLI:
kaskada-cli materialization get PurchaseStats
Updating a Materialization
Materializations are currently immutable. Updating a materialization requires deleting that materialization and then re-creating it with a new expression.
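The delete-then-re-create step can be wrapped in a small helper. The sketch below is an illustration rather than part of the Kaskada client: `replace_materialization` is a hypothetical name, and the delete and create functions are passed in as arguments so the example runs on its own. With the Python client, one would presumably pass `materialization.delete_materialization` and `materialization.create_materialization`; the keyword names `name`, `destination`, and `query` follow the creation example on this page.

```python
from typing import Any, Callable


def replace_materialization(
    name: str,
    query: str,
    destination: Any,
    delete: Callable[[str], Any],
    create: Callable[..., Any],
) -> Any:
    """Delete the materialization called `name`, then re-create it with `query`."""
    delete(name)
    return create(name=name, destination=destination, query=query)


# Stand-ins that only record the calls, used here in place of the real client functions
calls = []
replace_materialization(
    "PurchaseStats",
    "{time: Purchase.purchase_time, entity: Purchase.customer_id, max_amount: Purchase.amount | max()}",
    destination="<destination object>",
    delete=lambda name: calls.append(("delete", name)),
    create=lambda **kwargs: calls.append(("create", kwargs["name"])),
)
```

Between the two calls no materialization with that name exists, so a helper like this is best run when a brief gap in updates is acceptable.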
Deleting a Materialization
You can delete a materialization using its name:
Python:
from kaskada import materialization
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()
materialization.delete_materialization("PurchaseStats")
CLI:
kaskada-cli materialization delete PurchaseStats
Deleting a materialization does not delete any data persisted in the external data store.