Tables
Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type.
Managing tables
Creating a Table
Every table is associated with a schema, which defines the structure of each event in the table. Schemas are inferred from the data you load into a table, but some columns are required by Kaskada’s data model: every table must include columns identifying the time and the entity associated with each row.
When creating a table, you must tell Kaskada which columns contain the time and entity of each row:
- The time column is specified using the time_column_name parameter. This parameter must identify a column name in the table’s data which contains time values. The time should refer to when the event occurred.
- The entity key is specified using the entity_key_column_name parameter. This parameter must identify a column name in the table’s data which contains the entity key value. The entity key should identify a thing in the world that each event is associated with. Don’t worry too much about picking the "right" value; it’s easy to change the entity using the with_key() function.
You may additionally configure the table’s behavior by specifying the following parameters:
- A subsort column associated with each row is specified using the subsort_column_name parameter. This value is used to order rows associated with the same time value. If no subsort column is provided, Kaskada will generate one.
- The type of entity is specified using the grouping_id parameter. The grouping ID specifies what kind of entity each event is associated with, for example "User" or "Purchase". When combining events from different tables, events with the same entity key and grouping ID are treated as being part of the same entity.
For more information about the expected structure of input files, see Expected File Format.
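The roles of these columns can be illustrated with a small plain-Python sketch. It does not call Kaskada: the Event class, the table names, and the sample values are hypothetical, and the engine's actual ordering and generated subsort values may differ in detail. It shows events from several tables being combined when they share a grouping ID and entity key, and rows being ordered by time with the subsort value breaking ties.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for one row of a table."""
    table: str
    grouping_id: str   # kind of entity, e.g. "User"
    entity_key: str    # which entity the row belongs to
    time: int          # when the event occurred (illustrative integer time)
    subsort: int       # tie-breaker for rows with the same time


events = [
    Event("Purchase", "User", "alice", 20, 1),
    Event("Purchase", "User", "alice", 20, 0),
    Event("PageView", "User", "alice", 10, 0),
    Event("Purchase", "User", "bob", 15, 0),
    Event("Shipment", "Order", "alice", 12, 0),
]

# Rows belong to the same entity only if both grouping ID and entity key match.
entities = defaultdict(list)
for event in events:
    entities[(event.grouping_id, event.entity_key)].append(event)

# Within an entity, order by time and use the subsort value to break ties.
for rows in entities.values():
    rows.sort(key=lambda e: (e.time, e.subsort))

alice = entities[("User", "alice")]
```

In this sketch the "Order" event keyed by "alice" stays separate from the "User" events with the same key, because its grouping ID differs.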
Here is an example of creating a table:
Python:

    from kaskada import table
    from kaskada.api.session import LocalBuilder

    session = LocalBuilder().build()

    table.create_table(
        # The table's name
        table_name="Purchase",
        # The name of a column in your data that contains the time associated with each row
        time_column_name="purchase_time",
        # The name of a column in your data that contains the entity key associated with each row
        entity_key_column_name="customer_id",
    )

CLI:

    kaskada-cli table create Purchase --timeColumn purchase_time --entityKeyColumn customer_id

This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time and a field named customer_id.
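Before loading data, it can help to check locally that every row carries both required fields. The following is a minimal sketch in plain Python, not part of the Kaskada client: the missing_columns helper and the sample rows are hypothetical, and treating a null value as missing is an assumption made here for illustration.

```python
from datetime import datetime

# Columns named when the Purchase table was created.
REQUIRED_COLUMNS = ("purchase_time", "customer_id")


def missing_columns(row: dict) -> list:
    """Return the required columns that are absent or null in a row."""
    return [name for name in REQUIRED_COLUMNS if row.get(name) is None]


rows = [
    {"purchase_time": datetime(2023, 1, 1, 12, 0), "customer_id": "c1", "amount": 5},
    {"purchase_time": datetime(2023, 1, 2, 9, 30), "amount": 7},  # no customer_id
]

# Map each offending row index to the columns it lacks.
problems = {}
for index, row in enumerate(rows):
    missing = missing_columns(row)
    if missing:
        problems[index] = missing
```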
Idiomatic Kaskada
We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.
List Tables
The list table method returns all tables defined for your user. An optional search string can filter the results.
Here is an example of listing tables:
Python:

    from kaskada import table
    from kaskada.api.session import LocalBuilder

    session = LocalBuilder().build()

    table.list_tables()

CLI:

    kaskada-cli table list

Get Table
You can get a table using its name:
Python:

    from kaskada import table
    from kaskada.api.session import LocalBuilder

    session = LocalBuilder().build()

    table.get_table("Purchase")

CLI:

    kaskada-cli table get Purchase

Updating a Table
Tables are currently immutable. Updating a table requires deleting it and then re-creating it with the new configuration.
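The delete-then-create sequence can be wrapped in a small helper. This is a sketch rather than a client feature: recreate_table is a hypothetical name, and it only assumes that the object passed in exposes the delete_table and create_table calls used elsewhere on this page.

```python
def recreate_table(table_api, table_name, **create_kwargs):
    """Delete `table_name`, then create it again with the given settings.

    `table_api` is expected to be the `kaskada.table` module, or any object
    that provides `delete_table` and `create_table`.
    """
    table_api.delete_table(table_name)
    return table_api.create_table(table_name=table_name, **create_kwargs)


# Possible usage, once a session has been built:
#
#   from kaskada import table
#   recreate_table(
#       table,
#       "Purchase",
#       time_column_name="purchase_time",
#       entity_key_column_name="customer_id",
#   )
```

Because deleting a table also deletes its events, any data must be loaded again after the table is re-created.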
Deleting a Table
You can delete a table using its name:
Python:

    from kaskada import table
    from kaskada.api.session import LocalBuilder

    session = LocalBuilder().build()

    table.delete_table("Purchase")

CLI:

    kaskada-cli table delete Purchase

Note that deleting a table also deletes any events uploaded to it.
A failed precondition error is returned if a view or materialization references the table. To continue with the deletion, delete the dependent resources or supply the force flag to delete the table forcefully. Forcefully deleting a table without deleting the dependent resources may result in the dependent resources functioning incorrectly.
Python:

    from kaskada import table
    from kaskada.api.session import LocalBuilder

    session = LocalBuilder().build()

    table.delete_table("Purchase", force=True)

CLI:

    kaskada-cli table delete Purchase --force

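The deletion rules described above can be modelled with a toy in-memory registry. This is an illustration only, not how Kaskada is implemented: Registry, FailedPrecondition, and the PurchaseStats view name are all hypothetical.

```python
class FailedPrecondition(Exception):
    """Raised when a table is still referenced by other resources."""


class Registry:
    """Toy model of tables and the resources that depend on them."""

    def __init__(self):
        self.tables = {}      # table name -> list of loaded events
        self.dependents = {}  # table name -> names of views/materializations

    def delete_table(self, name, force=False):
        referenced_by = self.dependents.get(name, [])
        if referenced_by and not force:
            raise FailedPrecondition(f"{name} is referenced by {referenced_by}")
        # Dropping the table drops its events; dependents are left untouched,
        # which is why a forced delete can leave them broken.
        del self.tables[name]


registry = Registry()
registry.tables["Purchase"] = [{"customer_id": "c1"}]
registry.dependents["Purchase"] = ["PurchaseStats"]

try:
    registry.delete_table("Purchase")
    blocked = False
except FailedPrecondition:
    blocked = True

# The forced delete succeeds even though PurchaseStats still refers to the table.
registry.delete_table("Purchase", force=True)
```

After the forced delete, the dependent entry still points at a table that no longer exists, which mirrors the warning about dependent resources functioning incorrectly.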