Tables

Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type.

Managing tables

Creating a Table

Every table is associated with a schema which defines the structure of each event in the table. Schemas are inferred from the data you load into a table, however, some columns are required by Kaskada’s data model. Every table must include a column identifying the time and entity associated with each row.

When creating a table, you must tell Kaskada which columns contain the time and entity of each row:

  • The time column is specified using the time_column_name parameter. This parameter must identify a column name in the table’s data which contains time values. The time should refer to when the event occurred.

  • The entity key is specified using the entity_key_column_name parameter. This parameter must identify a column name in the table’s data which contains the entity key value. The entity key should identify a thing in the world that each event is associated with. Don’t worry too much about picking the "right" value - it’s easy to change the entity using the with_key() function.

You may additionally configure the table’s behavior by specifying the following parameters:

  • An subsort column associated with each row is specified using the subsort_column_name parameter. This value is used to order rows associated with the same time value. If no subsort column is provided, Kaskada will generate one.

  • The type of entity is specified using the grouping_id parameter. The grouping ID specifies what kind of entity each event is associated with, for example "User" or "Purchase". When combining events from different tables, events with the same entity key and grouping ID are treated as being part of the same entity.

For more information about the expected structure of input files, see Expected File Format

Here is an example of creating a table:

  • Python

  • CLI

from kaskada import table
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

table.create_table(
  # The table's name
  table_name = "Purchase",
  # The name of a column in your data that contains the time associated with each row
  time_column_name = "purchase_time",
  # The name of a column in your data that contains the entity key associated with each row
  entity_key_column_name = "customer_id",
)
kaskada-cli table create Purchase --timeColumn purchase_time --entityKeyColumn customer_id

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time and a customer_id.

Idiomatic Kaskada

We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.

List Tables

The list table method returns all tables defined for your user. An optional search string can filter the results.

Here is an example of listing tables:

  • Python

  • CLI

from kaskada import table
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

table.list_tables()
kaskada-cli table list

Get Table

You can get a table using its name:

  • Python

  • CLI

from kaskada import table
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

table.get_table("Purchase")
kaskada-cli table get Purchase

Updating a Table

Tables are currently immutable. Updating a table requires deleting it and then re-creating it with a new expression.

Deleting a Table

You can delete a table using its name:

  • Python

  • CLI

from kaskada import table
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

table.delete_table("Purchase")
kaskada-cli table delete Purchase

Note that deleting a table also deletes any events uploaded to it.

A failed precondition error is returned if another view and/or materialization references the table. To continue with the deletion of the table, delete the dependent resources or supply the force flag to delete the table forcefully. Forcefully deleting a table without deleting the dependent resources may result in the dependent resources functioning incorrectly.

  • Python

  • CLI

from kaskada import table
from kaskada.api.session import LocalBuilder

session = LocalBuilder().build()

table.delete_table("Purchase", force = True)
kaskada-cli table delete Purchase --force