Locally

Kaskada is an event-processing engine comprised of two main components:

The kaskada-engine component is a stateful compute engine for transforming event data.
The kaskada-manager is an (optionally) stateful API for communicating with clients

Using Kaskada with Python

To use Kaskada with Python you’ll need to have Python version 3.6 or above installed. Once you have Python installed ensure that you can run it. Open a terminal in your OS (command line prompt on Windows) and check the output of the following commands

Verifying Kaskada prerequisites. The output on your machine may vary.

python --version
# Python 3.10.6

Installing Kaskada using pip

pip install kaskada

Pip and pip3 and permissions

Depending on you Python installation and configuration you may have pip3 instead of pip available in your terminal. If you do have pip3 replace pip with pip3 in your command, i.e., pip3 install kaskada

If you get a permission error when running the pip command, you may

need to run as an administrator using sudo pip install kaskada
or use pip’s `--user flag to install the package in your user directory for environments where you don’t have administrator access (e.g., in Google Collab or other hosted environments).

Kaskada is now installed. Kaskada can be used locally by creating a local session.

Creating a local Kaskada session in Python

from kaskada.api.session import LocalBuilder
session = LocalBuilder().build()

Creating a local session will download and run the Kaskada service as a non-persistent Python subprocess. To connect to an existing Kaskada process, initialize the client with the appropriate endpoint.

Creating a remote Kaskada session in Python

from kaskada import client
client.init(
  # The host and port of the Kaskada manager API.
  endpoint = "localhost:50051",

  # If true, TLS encryption will be required for the API connection.
  is_secure = False,

  # An (optional) string identifying the client.
  client_id = "python-client"
)

For more information, please check out the Kaskada Python Client documentation.

Using Kaskada with IPython (Jupyter)

IPython is an interactive Python runtime used by Jupyter and other notebooks to evaluate Python code blocks. IPython supports "magic extensions" for customizing how code blocks are interpreted. Kaskada provides a magic extension that simplifies querying Kaskada.

To use Kaskada within a Jupyter notebook you’ll need to have the following pieces of software installed

Python (version 3.6 and above)
Jupyter

Once you have both prerequisites installed ensure that you can run them. Open a terminal in your OS (command line prompt on Windows) and check the output of the following commands

Verifying Kaskada prerequisites. The output shown here is from an Ubuntu system—the output on your machine may vary.

python --version
# Python 3.10.6


jupyter --version
# Selected Jupyter core packages...
# IPython          : 7.34.0
# ipykernel        : 6.17.0
# bipywidgets       : 8.0.2
# jupyter_client   : 7.4.4
# jupyter_core     : 4.11.2
# jupyter_server   : 1.21.0
# jupyterlab       : 3.6.1
# nbclient         : 0.7.0
# nbconvert        : 7.2.3
# nbformat         : 5.7.0
# notebook         : 6.5.2
# qtconsole        : 5.3.2
# traitlets        : 5.5.0

Kaskada’s python library includes notebook customizations that allow us to write queries in the Fenl language but also receive and render the results of our queries in our notebooks. We need to enable these customizations first before we can use them.

Enable fenlmagic in this notebook

%load_ext fenlmagic

This will load the extension into the IPython context. You can verify the install worked by initializing the extension:

%%fenl?

  %fenl [--as-view AS_VIEW] [--data-token DATA_TOKEN] [--debug DEBUG]
            [--output OUTPUT] [--preview-rows PREVIEW_ROWS]
            [--result-behavior RESULT_BEHAVIOR] [--var VAR]

fenl query magic

optional arguments:
  --as-view AS_VIEW     adds the body as a view with the given name to all
                        subsequent fenl queries.
  --data-token DATA_TOKEN
                        A data token to run queries against. Enables
                        repeatable queries.
  --debug DEBUG         Shows debugging information
  --output OUTPUT       Output format for the query results. One of "df"
                        (default), "json", "parquet" or "redis-bulk". "redis-
                        bulk" implies --result-behavior "final-results"
  --preview-rows PREVIEW_ROWS
                        Produces a preview of the data with at least this many
                        rows.
  --result-behavior RESULT_BEHAVIOR
                        Determines which results are returned. Either "all-
                        results" (default), or "final-results" which returns
                        only the final values for each entity.
  --var VAR             Assigns the QueryResponse to a local variable with the given
                        name. The QueryResponse contains result_url, query and dataframe.

For more information, please check out the Fenlmagic Client documentation.

Using Kaskada with the command line (CLI)

To use Kaskada on the command line, you’ll need to install three components:

The Kaskada command-line executable
The Kaskada manager, which serves the Kaskada API
The Kaskada engine, which executes queries

Each of these are available as pre-compiled binaries in the Releases section of Kaskada’s Github repository. This example assumes you have installed curl.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/kaskada-ai/kaskada/main/install/install.sh)"

To simplify running the Kaskada components you can move them to a directory in your path. First, print a colon-separated list of the directories in your PATH.

echo PATH

Move the Kaskada binaries to one of the listed locations. This command assumes that the binaries are currently in your working directory and that your PATH` includes /usr/local/bin, but you can customize it if your locations are different.

mv kaskada-* /usr/local/bin/

For more information about adding binaries to your path, see this StackOverflow article.

Authorizing applications on OSX

If you’re using OSX, you may need to unblock the applications. OSX prevents applications you download from running as a security feature. You can remove the block placed on the file when it was downloaded with the following command:

xattr -dr com.apple.quarantine <path to file>

You should now be able to run all three components. To verify they’re installed correctly and executable, try running the following command:

kaskada-cli -h

You should see output similar to the following:

A CLI tool for interacting with the Kaskada API

Usage:
  cli [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  load        A set of commands for loading data into kaskada
  query       A set of commands for running queries on kaskada
  sync        A set of commands for interacting with kaskada resources as code

Flags:
      --config string               config file (default is $HOME/.cli.yaml)
  -d, --debug                       get debug log output
  -h, --help                        help for cli
      --kaskada-api-server string   Kaskada API Server
      --kaskada-client-id string    Kaskada Client ID
      --use-tls                     Use TLS when connecting to the Kaskada API (default true)

You can start a local instance of the Kaskada service by running the manager and engine:

kaskada-manager 2>&1 > manager.log 2>&1 &
kaskada-engine serve > engine.log 2>&1 &