Getting Started

Kaskada is a unified event processing engine that provides stateful stream processing in a high-level query language designed specifically for reasoning about events in bulk and in real time.

This document will show you how to quickly get started using Kaskada.

Before jumping in to writing queries in Kaskada, it’s a good idea to take a minute to understand "how to think with Kaskada". Kaskada is built on timelines - a simple, powerful abstraction that may be different from what you’re accustomed to.

If you’d prefer to jump right in, scroll down to the Quick-starts section.

Thinking with Kaskada’s timelines

Kaskada is built on the idea of a timeline - the history of how a value changes over time for a specific entity or group.

Figure 1. Aggregating events as a timeline

Timelines allow you to reason about temporal context, time travel, sequencing, time-series, and more. They allow simple, composable, declarative queries over events. Transforming and combining timelines allows you to intuitively express computations over events.
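To make the idea concrete, here is a minimal sketch in plain Python of how a running sum over events forms a timeline per entity. It does not use the Kaskada client or query language; the event data and the `sum_timeline` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical events: (time, entity, amount). Illustrative data only.
events = [
    (1, "alice", 5),
    (2, "bob", 3),
    (3, "alice", 2),
    (4, "bob", 7),
]

def sum_timeline(events):
    """Return, per entity, the history of a running sum as (time, value) pairs."""
    totals = defaultdict(int)
    timeline = defaultdict(list)
    # Process events in time order so each point reflects everything seen so far.
    for time, entity, amount in sorted(events):
        totals[entity] += amount
        timeline[entity].append((time, totals[entity]))
    return dict(timeline)

timeline = sum_timeline(events)
for entity, points in timeline.items():
    print(entity, points)
```

Each entity gets its own sequence of (time, value) points, and the value only changes when a new event for that entity arrives.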

⭢ Read more about timelines in What Is Kaskada?

With Kaskada, it’s timelines all the way down - every operation’s inputs and outputs are timelines. The timeline is a flexible abstraction that can be used in different ways depending on your needs. In some cases, you may want to know what the timeline’s value is at a specific point in time; in other cases, you may want to know how the timeline’s value changes over time.

When making a query, you configure how to use the query’s output timeline: you can use the timeline as either a history or as a snapshot.

  • A timeline History contains a value each time the timeline changes, and each row describes a different entity at a different point in time.

  • A timeline Snapshot contains a value for each entity at the same point in time; each row is associated with a different entity, but all rows reflect the same point in time.

This output may be written in different ways: for example, it may be written to a Parquet file or sent as events to a stream.
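The difference between the two output forms can be sketched in plain Python. This is an illustration of the concept only, not Kaskada's API: the `timeline` data and the `as_history` and `as_snapshot` helpers are hypothetical names introduced here.

```python
# Hypothetical timeline: for each entity, the (time, value) points
# at which its value changed. Illustrative data only.
timeline = {
    "alice": [(1, 5), (3, 7)],
    "bob": [(2, 3), (4, 10)],
}

def as_history(timeline):
    """One row per change: (time, entity, value), ordered by time."""
    rows = [(t, entity, v) for entity, points in timeline.items() for t, v in points]
    return sorted(rows)

def as_snapshot(timeline, at_time):
    """One row per entity: the latest value at or before at_time."""
    rows = []
    for entity, points in timeline.items():
        seen = [(t, v) for t, v in points if t <= at_time]
        if seen:  # skip entities with no value yet at at_time
            latest_value = max(seen, key=lambda point: point[0])[1]
            rows.append((at_time, entity, latest_value))
    return sorted(rows)

history = as_history(timeline)
snapshot = as_snapshot(timeline, at_time=3)
print(history)
print(snapshot)
```

The history has one row per change, with differing times, while every row of the snapshot carries the same time and there is at most one row per entity.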

⭢ Read more about converting to tables in Queries

Kaskada’s basic architecture

Kaskada is implemented as a standalone service with two main components.

  • The engine is a stateless compute engine for transforming event data.

  • The manager is an (optionally) stateful API for communicating with clients.

These components are distributed as standalone, precompiled binaries. The Kaskada service is accessed using a gRPC API, and multiple clients exist. You can access the API using the provided Python client or using the CLI binary. Kaskada can be configured to run as a remote service or as a local process.

⭢ Read more about installing Kaskada in Locally

Kaskada’s data model

Timelines begin with tables. Tables are how Kaskada stores input events. Tables consist of multiple events, and each event is a value of the same type.

Table 1. Events in the Purchase table

time    entity    value
8:20    Alice     {amount: 5}
9:45    Alice     {amount: 2}
2:10    Alice     {amount: 13}
8:52    Alice     {amount: 4}

When querying Kaskada, the contents of a table are interpreted as a discrete timeline: the value associated with each event corresponds to a value in the timeline.

Figure 2. Discrete timeline describing the Purchase table
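As a rough illustration, the Purchase table above can be modeled in plain Python as a list of events, with the discrete timeline being the (time, value) points for an entity. This is a sketch of the data model, not Kaskada code; `discrete_timeline` and `field` are hypothetical helpers, and times are kept as the strings shown in the table.

```python
# Rows of the Purchase table, in the order listed: (time, entity, value).
purchases = [
    ("8:20", "Alice", {"amount": 5}),
    ("9:45", "Alice", {"amount": 2}),
    ("2:10", "Alice", {"amount": 13}),
    ("8:52", "Alice", {"amount": 4}),
]

def discrete_timeline(events, entity):
    """Points of the discrete timeline for one entity: one (time, value) per event."""
    return [(time, value) for time, ent, value in events if ent == entity]

def field(points, name):
    """Project one field out of each value, keeping the event times."""
    return [(time, value[name]) for time, value in points]

alice = discrete_timeline(purchases, "Alice")
amounts = field(alice, "amount")
print(amounts)
```

Every event contributes exactly one point, so the timeline has a value only at the times when events occurred.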

Events must be loaded into a table before they can be queried. You can load events into a table from various data sources, for example files formatted as Parquet or CSV, or by reading events from a stream.

⭢ Read more about connecting to data in Loading Data into a Table

Quick-starts

The next section of the docs shows how to quickly get started with Kaskada. You’ll see how to install Kaskada, create tables, load data, and make simple queries. You can pick the development environment that’s most familiar to you.

⭢ Try out Kaskada using Jupyter in Hello World (Jupyter)

⭢ Try out Kaskada using the CLI in Hello World (CLI)

Installing different clients

The Python and CLI clients are independent and are installed separately. For example, you don’t need to install the Python client in order to use the CLI. If you would like to use both, you must install each one separately.