Spec Files
In addition to modifying resources directly, you can use a spec file to describe the desired state of Kaskada.
A spec file is a YAML file describing a set of Kaskada resources, including tables, views, and materializations.
The sync command is used to update the state of your Kaskada system to match the resource descriptions in the spec file.
Currently the |
Spec File Format
A spec file can contain any of the following keys: tables, views, materializations. Each key contains a list of objects describing the resources to be created or updated.
Tables
Tables are described in a spec file as a list of table objects under the tables key:
tables:
  # The name of the table
  - tableName: GamePlay
    # A field containing the time associated with each event
    timeColumnName: event_at
    # An initial entity key associated with each event
    entityKeyColumnName: entity_key
    # An (optional) subsort column associated with each event
    subsortColumnName: offset
    # A name describing the entity key
    groupingId: User
    # Where the table's data will be stored
    # The default storage location is 'kaskada', and uses local files to store events.
    source:
      kaskada: {}
  # The name of the table
  - tableName: Purchase
    # A field containing the time associated with each event
    timeColumnName: event_at
    # An initial entity key associated with each event
    entityKeyColumnName: entity_id
    # A name describing the entity key
    groupingId: User
    # Where the table's data will be stored
    # The default storage location is 'kaskada', and uses local files to store events.
    source:
      kaskada: {}
Views
Views are described in a spec file as a list of view objects under the views key:
views:
  # A name used to refer to the view in queries
  - view_name: PurchaseStats
    # The expression to substitute anywhere the view's name is used
    expression: |
      {
        time: Purchase.purchase_time,
        entity: Purchase.customer_id,
        max_amount: Purchase.amount | max(),
        min_amount: Purchase.amount | min(),
        count: CountPurchase,
      }
  # A name used to refer to the view in queries
  - view_name: CountPurchase
    # The expression to substitute anywhere the view's name is used
    expression: count(Purchase)
Materializations
Materializations are described in a spec file as a list of materialization objects under the materializations key:
materializations:
  - materializationName: PurchaseStats
    expression: |
      {
        time: Purchase.purchase_time,
        entity: Purchase.customer_id,
        max_amount: Purchase.amount | max(),
        min_amount: Purchase.amount | min(),
        count: CountPurchase,
      }
    destination:
      objectStore:
        fileType: FILE_TYPE_PARQUET
        outputPrefixUri: s3://my-bucket/materialization-output-prefix/
    slice: {}
Exporting the current resources as a spec file
You can create a spec file from all the resources currently defined in the system using the sync export command with the --all flag.
kaskada-cli sync export --all
An example export result is shown below:
tables:
  - tableName: GamePlay
    timeColumnName: event_at
    entityKeyColumnName: entity_key
    subsortColumnName: offset
    groupingId: User
    source:
      kaskada: {}
  - tableName: Purchase
    timeColumnName: event_at
    entityKeyColumnName: entity_id
    groupingId: User
    source:
      kaskada: {}
views:
  - view_name: CountPurchase
    expression: count(Purchase)
materializations:
  - materializationName: PurchaseStats
    expression: |
      {
        time: Purchase.purchase_time,
        entity: Purchase.customer_id,
        max_amount: Purchase.amount | max(),
        min_amount: Purchase.amount | min(),
        count: CountPurchase,
      }
    destination:
      objectStore:
        fileType: FILE_TYPE_PARQUET
        outputPrefixUri: s3://my-bucket/materialization-output-prefix/
    slice: {}
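To use an export as the starting point for a spec file, the output needs to be saved somewhere. The sketch below is one way to do that, assuming sync export prints the YAML to standard output as the example result suggests. The export_spec function and the KASKADA_CLI override variable are hypothetical names introduced for illustration; they are not part of the Kaskada CLI.

```shell
# Hypothetical helper: save all exported resources to a spec file.
# Assumption: `sync export --all` writes the YAML to standard output.
export_spec() {
  out_file="${1:-spec.yaml}"
  # KASKADA_CLI is an illustrative override; it defaults to kaskada-cli.
  cli="${KASKADA_CLI:-kaskada-cli}"
  tmp_file="$(mktemp)" || return 1

  # Write to a temporary file first so that a failed export
  # does not overwrite an existing spec file.
  if "$cli" sync export --all > "$tmp_file"; then
    mv "$tmp_file" "$out_file"
  else
    rm -f "$tmp_file"
    return 1
  fi
}
```

Calling export_spec spec.yaml would then leave the exported resources in spec.yaml, ready to be edited.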
Alternatively, if you know the specific table, view, or materialization you’d like to export, you can specify it explicitly.
kaskada-cli sync export --table Purchase
kaskada-cli sync export --view CountPurchase
kaskada-cli sync export --materialization PurchaseStats
tables:
  - tableName: Purchase
    timeColumnName: event_at
    entityKeyColumnName: entity_id
    groupingId: User
    source:
      kaskada: {}
views:
  - view_name: CountPurchase
    expression: count(Purchase)
materializations:
  - materializationName: PurchaseStats
    expression: |
      {
        time: Purchase.purchase_time,
        entity: Purchase.customer_id,
        max_amount: Purchase.amount | max(),
        min_amount: Purchase.amount | min(),
        count: CountPurchase,
      }
    destination:
      objectStore:
        fileType: FILE_TYPE_PARQUET
        outputPrefixUri: s3://my-bucket/materialization-output-prefix/
    slice: {}
Updating Kaskada to reflect the contents of a spec file
To update a resource (table, view, or materialization), first modify the resource in your spec file, then use the sync plan command to preview the changes that will be made to the system. To make the actual changes, use the sync apply command.
When a spec file is applied, the CLI inspects all of the server’s resources and all of the resources defined in your spec file, then takes whatever actions are necessary to reconcile the server’s state with the spec. Applying a spec can create new resources, or update existing resources by deleting and then re-creating them.
If you remove a resource from a spec file, it will not be deleted from the system. Instead, you must delete it using the standard delete commands: Delete Table, Delete View, or Delete Materialization.
Table updates are destructive
Tables are currently immutable. When the CLI updates a table, it does so by deleting the table and re-creating it. When this happens, all data previously loaded into the table is lost.
Previewing the changes
Running this command will not make any changes to the server, but will print out the changes that will be made if you apply the given spec file.
kaskada-cli sync plan --file spec.yaml
# > 2:18PM INF starting plan
# > 2:18PM INF resource not found on system, will create it kind=*kaskadav1alpha.Table name=GamePlay
# > 2:18PM INF resource not found on system, will create it kind=*kaskadav1alpha.Table name=Purchase
# > 2:18PM INF Success!
Applying the changes
Running this command will apply the changes to the server.
kaskada-cli sync apply --file spec.yaml
# > 2:25PM INF starting apply
# > 2:25PM INF resource not found on system, will create it kind=*kaskadav1alpha.Table name=GamePlay
# > 2:25PM INF resource not found on system, will create it kind=*kaskadav1alpha.Table name=Purchase
# > 2:25PM INF created resource with provided spec kind=*kaskadav1alpha.Table name=GamePlay
# > 2:25PM INF created resource with provided spec kind=*kaskadav1alpha.Table name=Purchase
# > 2:25PM INF Success!
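Because applying a spec can delete and re-create tables, it can help to make the preview step mandatory. The sketch below chains the two commands shown above so that apply only runs after a successful plan and an explicit opt-in. The sync_spec function, its --yes argument, and the KASKADA_CLI override variable are hypothetical names introduced for illustration. The sketch also assumes that sync plan exits with a non-zero status when it fails, which this page does not confirm.

```shell
# Hypothetical helper: preview a spec file, then apply it only on request.
sync_spec() {
  spec_file="$1"
  confirm="${2:-}"
  # KASKADA_CLI is an illustrative override; it defaults to kaskada-cli.
  cli="${KASKADA_CLI:-kaskada-cli}"

  # Preview first. This makes no changes on the server.
  # Assumption: a failed plan returns a non-zero exit status.
  "$cli" sync plan --file "$spec_file" || return 1

  # Applying may delete and re-create tables, so require an explicit opt-in.
  if [ "$confirm" != "--yes" ]; then
    echo "Plan only. Re-run with --yes to apply $spec_file." >&2
    return 0
  fi

  "$cli" sync apply --file "$spec_file"
}
```

Running sync_spec spec.yaml prints the plan and stops; running sync_spec spec.yaml --yes prints the plan and then applies it.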