Hello World (CLI)
Installation
To use Kaskada on the command line, you’ll need to install three components:
- The Kaskada command-line executable
- The Kaskada manager, which serves the Kaskada API
- The Kaskada engine, which executes queries
Each Kaskada release has pre-compiled binaries for each component. You can visit the Releases page on GitHub to obtain the latest binaries for your platform. The example command below downloads the latest Kaskada binaries and works on Linux and OSX.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/kaskada-ai/kaskada/main/install/install.sh)"
Update PATH (optional)
Update your environment’s PATH to include the downloaded binaries. If you prefer not to, skip this section.
Print a colon-separated list of the directories in your PATH.
echo $PATH
Move the Kaskada binaries to one of the listed locations.
This command assumes that the binaries are currently in your working directory and that your PATH includes /usr/local/bin, but you can customize it if your locations are different.
mv kaskada-* /usr/local/bin/
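Before moving the binaries, it can help to confirm that the target directory really is listed in your PATH. The following is a minimal sketch; the helper name path_contains is made up for this illustration and is not part of Kaskada.

```shell
# Hypothetical helper: succeed if the directory given as $1 is listed in PATH.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains /usr/local/bin; then
  echo "/usr/local/bin is on PATH"
else
  echo "/usr/local/bin is not on PATH; choose a directory that is listed"
fi
```

The match is exact, so an entry written with a trailing slash (for example /usr/local/bin/) would not be recognized by this sketch.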
Authorizing applications on OSX
If you’re using OSX, you may need to unblock the applications. OSX prevents applications you download from running as a security feature. You can remove the block placed on the file when it was downloaded with the following command:
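A common way to do this on macOS, offered here as an assumption rather than as Kaskada's documented command, is to delete the com.apple.quarantine extended attribute with xattr. The helper name unquarantine is hypothetical, and the binary names are taken from the commands used elsewhere in this tutorial.

```shell
# Hypothetical helper: clear the macOS quarantine flag from each file given
# as an argument. Does nothing on other platforms, where the flag is not used.
unquarantine() {
  [ "$(uname -s)" = "Darwin" ] || return 0
  for f in "$@"; do
    # -d deletes the named extended attribute; files without it are skipped.
    xattr -d com.apple.quarantine "$f" 2>/dev/null || true
  done
}

# Assumes the three binaries are in the current working directory.
unquarantine kaskada-cli kaskada-manager kaskada-engine
```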
Start the services
You can start a local instance of the Kaskada service by running the manager and engine:
./kaskada-manager > manager.log 2>&1 &
./kaskada-engine serve > engine.log 2>&1 &
Allowing services to listen on OSX
When using OSX, you may need to allow these services to create an API listener the first time you run these commands. This is normal and indicates the services are working as expected: the API allows the services to communicate with each other.
To verify they’re installed correctly and executable, try running the following command (which lists any resources you’ve created):
./kaskada-cli sync export --all
You should see output similar to the following:
10:18AM INF starting export
{}
10:18AM INF Success!
Loading Data into a Table
Kaskada stores data in tables. Tables consist of multiple rows, and each row is a value of the same type. When querying Kaskada, the contents of a table are interpreted as a discrete timeline: the value associated with each event corresponds to a value in the timeline.
Creating a Table
Every table is associated with a schema which defines the structure of each event in the table. Schemas are inferred from the data you load into a table; however, some columns are required by Kaskada’s data model. Every table must include columns identifying the time and the entity associated with each row.
When creating a table, you must tell Kaskada which columns contain the time and entity of each row:
- The time column is specified using the time_column_name parameter (--timeColumn on the command line). This parameter must name a column in the table’s data that contains time values. The time should refer to when the event occurred.
- The entity key is specified using the entity_key_column_name parameter (--entityKeyColumn on the command line). This parameter must name a column in the table’s data that contains the entity key value. The entity key should identify a thing in the world that each event is associated with. Don’t worry too much about picking the "right" value; it’s easy to change the entity using the with_key() function.
Start by running the following command:
./kaskada-cli table create Purchase --timeColumn purchase_time --entityKeyColumn customer_id
Result:
> tableId: 1ba8ed9a-76bd-4302-b9fa-3c8655535f4a
> tableName: Purchase
> timeColumnName: purchase_time
> entityKeyColumnName: customer_id
> createTime: 2023-05-08T13:16:00.237166Z
> updateTime: 2023-05-08T13:16:00.237167Z
This creates a table named Purchase. Any data loaded into this table must have a timestamp field named purchase_time and a field named customer_id.
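If your source data starts out as CSV, one way to check this requirement before loading is to look for both column names in the header row. This is only a sketch: the helper name header_has_column is hypothetical, and the header string is assumed to match the column names of the sample data used in this tutorial.

```shell
# Hypothetical helper: succeed if the comma-separated header line in $1
# contains a column named exactly $2.
header_has_column() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed header of a CSV export of purchase events.
header="id,purchase_time,customer_id,vendor_id,amount,subsort_id"

for column in purchase_time customer_id; do
  if header_has_column "$header" "$column"; then
    echo "found required column: $column"
  else
    echo "missing required column: $column"
  fi
done
```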
Idiomatic Kaskada
We like to use CamelCase to name tables because it helps distinguish data sources from transformed values and function names.
Loading data
Now that we’ve created a table, we’re ready to load some data into it.
A table must be created before data can be loaded into it.
Data can be loaded into a table in multiple ways. In this example we’ll load the contents of a Parquet file into the table.
# Download a file to load and save it to path 'purchase.parquet'
curl -L "https://drive.google.com/uc?export=download&id=1SLdIw9uc0RGHY-eKzS30UBhN0NJtslkk" -o purchase.parquet
# Load the file into the Purchase table (which was created in the previous step)
./kaskada-cli table load Purchase file://${PWD}/purchase.parquet
Result:
> Successfully loaded "purchases.parquet" into "Purchase" table
The file’s content is added to the table.
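If the load fails, one thing worth checking is whether the download produced a real Parquet file rather than, say, an HTML error page. Parquet files normally begin and end with the 4-byte marker PAR1, so a rough check can be sketched as follows. The helper name looks_like_parquet is hypothetical, and the check does not validate the file's contents.

```shell
# Hypothetical helper: succeed if the file named by $1 starts and ends with
# the 4-byte Parquet magic number "PAR1".
looks_like_parquet() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "PAR1" ] && [ "$(tail -c 4 "$1")" = "PAR1" ]
}

if looks_like_parquet purchase.parquet; then
  echo "purchase.parquet looks like a Parquet file"
else
  echo "purchase.parquet is missing or is not a Parquet file"
fi
```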
Querying data
Data loaded into Kaskada is accessed by running Fenl queries.
Identity query
Let’s start by looking at the Purchase table without any filters. Begin by creating a text file named query.fenl with the following query:
Purchase
This query will return all of the columns and rows contained in the table.
Run it by sending the query to kaskada-cli query run:
cat query.fenl | ./kaskada-cli query run --stdout
Result:
Enter the expression to run and then press CTRL+D to execute it, or CTRL+C to cancel:
Executing query...
_time,_subsort,_key_hash,_key,id,purchase_time,customer_id,vendor_id,amount,subsort_id
2020-01-01T00:00:00.000000000,12232903146196084293,10966214875107816766,karen,cb_001,2020-01-01T00:00:00.000000000,karen,chum_bucket,9,0
2020-01-01T00:00:00.000000000,12232903146196084294,15119067519137142314,patrick,kk_001,2020-01-01T00:00:00.000000000,patrick,krusty_krab,3,1
2020-01-02T00:00:00.000000000,12232903146196084295,10966214875107816766,karen,cb_002,2020-01-02T00:00:00.000000000,karen,chum_bucket,2,2
2020-01-02T00:00:00.000000000,12232903146196084296,15119067519137142314,patrick,kk_002,2020-01-02T00:00:00.000000000,patrick,krusty_krab,5,3
2020-01-03T00:00:00.000000000,12232903146196084297,10966214875107816766,karen,cb_003,2020-01-03T00:00:00.000000000,karen,chum_bucket,4,4
2020-01-03T00:00:00.000000000,12232903146196084298,15119067519137142314,patrick,kk_003,2020-01-03T00:00:00.000000000,patrick,krusty_krab,12,5
2020-01-04T00:00:00.000000000,12232903146196084299,15119067519137142314,patrick,cb_004,2020-01-04T00:00:00.000000000,patrick,chum_bucket,5000,6
2020-01-04T00:00:00.000000000,12232903146196084300,10966214875107816766,karen,cb_005,2020-01-04T00:00:00.000000000,karen,chum_bucket,3,7
2020-01-05T00:00:00.000000000,12232903146196084301,10966214875107816766,karen,cb_006,2020-01-05T00:00:00.000000000,karen,chum_bucket,5,8
2020-01-05T00:00:00.000000000,12232903146196084302,15119067519137142314,patrick,kk_004,2020-01-05T00:00:00.000000000,patrick,krusty_krab,9,9
Filtering by a single Entity
It can be helpful to limit your results to a single entity. This makes it easier to see how a single entity changes over time.
Purchase | when(Purchase.customer_id == "patrick")
cat query.fenl | ./kaskada-cli query run --stdout
Result:
Enter the expression to run and then press CTRL+D to execute it, or CTRL+C to cancel:
Executing query...
_time,_subsort,_key_hash,_key,id,purchase_time,customer_id,vendor_id,amount,subsort_id
2020-01-01T00:00:00.000000000,12232903146196084294,15119067519137142314,patrick,kk_001,2020-01-01T00:00:00.000000000,patrick,krusty_krab,3,1
2020-01-02T00:00:00.000000000,12232903146196084296,15119067519137142314,patrick,kk_002,2020-01-02T00:00:00.000000000,patrick,krusty_krab,5,3
2020-01-03T00:00:00.000000000,12232903146196084298,15119067519137142314,patrick,kk_003,2020-01-03T00:00:00.000000000,patrick,krusty_krab,12,5
2020-01-04T00:00:00.000000000,12232903146196084299,15119067519137142314,patrick,cb_004,2020-01-04T00:00:00.000000000,patrick,chum_bucket,5000,6
2020-01-05T00:00:00.000000000,12232903146196084302,15119067519137142314,patrick,kk_004,2020-01-05T00:00:00.000000000,patrick,krusty_krab,9,9
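For comparison, the same row selection can be reproduced outside of Kaskada on a CSV copy of the data. The sketch below uses awk; the helper name rows_for_customer is hypothetical, and it only mimics which rows the when() filter keeps, not how Kaskada evaluates the query.

```shell
# Hypothetical helper: read CSV on stdin and print the header plus every row
# whose customer_id column equals $1.
rows_for_customer() {
  awk -F, -v customer="$1" '
    NR == 1 {                      # header row: find the customer_id column
      for (i = 1; i <= NF; i++) if ($i == "customer_id") col = i
      print
      next
    }
    $col == customer               # data row: print only matching customers
  '
}

# A few columns from the first Purchase rows shown earlier.
rows_for_customer patrick <<'EOF'
id,purchase_time,customer_id,amount
cb_001,2020-01-01T00:00:00.000000000,karen,9
kk_001,2020-01-01T00:00:00.000000000,patrick,3
cb_002,2020-01-02T00:00:00.000000000,karen,2
kk_002,2020-01-02T00:00:00.000000000,patrick,5
EOF
```

Run as written, this prints the header followed by the two patrick rows (kk_001 and kk_002).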
In the Fenl query above, we build a pipeline of functions using the | character. We begin with the timeline produced by the table Purchase, then filter it to the set of times where the purchase’s customer is "patrick" using the when() function.
Complex Examples with Fenl functions
Kaskada’s query language provides a rich set of operations for reasoning about time. Here’s a more sophisticated example that touches on many of the unique features of Kaskada queries:
# How many big purchases happen each hour and where?
# Anything can be named and re-used
let hourly_big_purchases = Purchase
  # Filter anywhere
  | when(Purchase.amount > 10)
  # Aggregate anything
  | count(window=since(hourly()))
  | when(hourly())
# Shift timelines relative to each other
let purchases_now = count(Purchase)
let purchases_yesterday =
  purchases_now | shift_by(days(1))
# Records are just another type
in { hourly_big_purchases, purchases_in_last_day: purchases_now - purchases_yesterday }
Configuring query execution
A given query can be computed in different ways. You can configure how a query is executed by providing arguments to the CLI command.
Changing how the result timeline is output
When you make a query, the resulting timeline is interpreted in one of two ways: as a history or as a snapshot.
- A timeline History generates a value each time there is a change in the value for the entity, and each row is associated with a different entity and point in time.
- A timeline Snapshot generates a value for each entity at the same point in time; each row is associated with a different entity, but all rows are associated with the same time.
By default, timelines are output as histories.
You can output a timeline as a snapshot by setting the --result-behavior argument to final-results.
cat query.fenl | ./kaskada-cli query run --result-behavior final-results
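To get a feel for how the two interpretations relate, the sketch below reduces a small history to the latest row per entity using awk. It is an approximation made for illustration only: the helper name latest_per_entity is hypothetical, the input is assumed to be CSV in time order with the entity key in the second column, and unlike a real Kaskada snapshot the rows keep their own event times instead of sharing one output time.

```shell
# Hypothetical helper: read a history as CSV on stdin (header first, rows in
# time order, entity key in column 2) and print only the last row per entity.
latest_per_entity() {
  awk -F, '
    NR == 1 { print; next }                  # pass the header through
    !($2 in last) { order[++n] = $2 }        # remember first-seen entity order
    { last[$2] = $0 }                        # later rows overwrite earlier ones
    END { for (i = 1; i <= n; i++) print last[order[i]] }
  '
}

# The last four Purchase events from the output shown earlier, trimmed to
# three columns.
latest_per_entity <<'EOF'
_time,_key,amount
2020-01-04T00:00:00.000000000,patrick,5000
2020-01-04T00:00:00.000000000,karen,3
2020-01-05T00:00:00.000000000,karen,5
2020-01-05T00:00:00.000000000,patrick,9
EOF
```

The history has four rows, while the reduced output has one row per entity, both taken from 2020-01-05.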
Limiting how many rows are returned
You can limit the number of rows returned from a query:
cat query.fenl | ./kaskada-cli query run --preview-rows 10
This may return more rows than you asked for.
Kaskada computes data in batches.
When you configure |
Cleaning Up
When you’re done with this tutorial, you can delete the table you created in order to free up resources. Note that this also deletes all of the data loaded into the table.
# Delete the Purchase table
kaskada-cli table delete --table Purchase
Conclusion
Congratulations, you’ve begun processing events with Kaskada!
Where you go now is up to you