Introduction

What is Fenl?

Fenl is a declarative query language for feature engineering. It allows you to focus on declaring what you want computed, rather than how it should be computed. Because Fenl is focused on the what, expressions are easy to combine and re-use.

How is Fenl Different?

Computations in Fenl are temporal: they produce a time-series of values describing the full history of a computation’s results. Temporal computation allows Fenl to capture what an expression’s value would have been at arbitrary times in the past.

Fenl values can travel forward in time. Time travel allows the results of different computations at different points in time to be combined. Because values can only travel forward in time, Fenl prevents information about the future from "leaking" into the past.

As a query language, Fenl is focused on succinctly expressing the most common operations used in feature engineering. Fenl makes it easy to read and write feature definitions by providing intuitive syntax for:

  • Temporal lookups

  • Chained operations

  • Named expressions

  • Reading & constructing structured values

Data Model

Features will be built from two event tables: a Purchase table and a FraudReport table. The goal is to build a model that predicts whether a given purchase will result in a fraud report within the next 30 days.

A Purchase event occurs when a transaction is recorded. It describes the items that were purchased, the vendor selling the items, the customer buying the items and the total value of the transaction.

# Purchases
{ time: timestamp_ns, id: string, vendor_id: string, customer_id: string, total: i64 }

entity(id)  time  vendor_id    customer_id  total
cb_001      100   chum_bucket  karen        9
cb_002      101   chum_bucket  karen        2
cb_003      102   chum_bucket  karen        4
cb_004      103   chum_bucket  patrick      5000
cb_005      103   chum_bucket  karen        3
cb_006      104   chum_bucket  karen        5
kk_001      100   krusty_krab  patrick      3
kk_002      101   krusty_krab  patrick      5
kk_003      102   krusty_krab  patrick      12
kk_004      104   krusty_krab  patrick      9

A FraudReport event occurs when a transaction is reported as fraudulent. It identifies the purchase that was reported as fraudulent.

# FraudReports
{ time: timestamp_ns, purchase_id: string }

entity(purchase_id)  time
cb_004               120

The values produced by a Fenl expression are associated with an entity key. An entity key identifies the thing each value describes. For example, a purchase could be related to a specific user, and a fraud report could be related to a specific vendor. The Purchase table’s entity key is the id field, while the FraudReport table’s entity key is the purchase_id field. Any entity key will do; these specific keys are chosen because they’re convenient for this exercise.
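The two event schemas and the idea of an entity key can be sketched in Python. This is only an illustration of the data model, not how Fenl stores tables: the class names mirror the tables above, while `group_by_entity` is a hypothetical helper introduced here, and plain integers stand in for `timestamp_ns` values as in the sample data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    time: int          # timestamp_ns in the schema; small integers in the sample data
    id: str            # entity key of the Purchase table
    vendor_id: str
    customer_id: str
    total: int


@dataclass(frozen=True)
class FraudReport:
    time: int
    purchase_id: str   # entity key of the FraudReport table


def group_by_entity(events, key):
    """Group events under the entity key returned by `key`, preserving order."""
    groups = defaultdict(list)
    for event in events:
        groups[key(event)].append(event)
    return dict(groups)


# Two rows from the Purchase sample data and the single FraudReport row.
purchases = [
    Purchase(100, "cb_001", "chum_bucket", "karen", 9),
    Purchase(103, "cb_004", "chum_bucket", "patrick", 5000),
]
reports = [FraudReport(120, "cb_004")]

by_purchase = group_by_entity(purchases, key=lambda p: p.id)
by_report = group_by_entity(reports, key=lambda r: r.purchase_id)
```

Because both tables are keyed by purchase id here, values from the two tables line up under the same entity.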

Simple Aggregation: Target Value

Before we can start building the inputs to our model, we need to describe the target value the model will predict. We would like to predict whether a given purchase will result in a fraud report, that is, whether the number of daily fraud reports for that purchase is greater than zero.

let Target = count(FraudReport, window=since(daily())) > 0

entity  time  Target
cb_004  120   true

Aggregations in Fenl are scoped to the entity key: the Target expression produces a bool value associated with each purchase (as identified by FraudReport.purchase_id). In this case we’ve applied a window operation to the aggregation, so the count covers only the FraudReport events seen so far in a given day, and the target is true when that count is greater than zero.
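As a rough illustration of entity-scoped aggregation, the Python sketch below counts events separately per entity key and compares each count with zero. It is not Fenl's implementation: `count_by_entity` is a hypothetical helper, the daily window is left out because the sample times are abstract integers, and treating an entity with no reports as a count of zero is an assumption made for the example.

```python
from collections import Counter


def count_by_entity(events, key):
    """Count events separately for each entity key."""
    return Counter(key(event) for event in events)


# (time, purchase_id) pairs; the single row mirrors the FraudReport sample data.
fraud_reports = [(120, "cb_004")]
counts = count_by_entity(fraud_reports, key=lambda report: report[1])

# Counter returns 0 for entities with no events (an assumption of this sketch),
# so the comparison is defined for every purchase id we ask about.
target = {purchase_id: counts[purchase_id] > 0 for purchase_id in ["cb_001", "cb_004"]}
print(target)  # {'cb_001': False, 'cb_004': True}
```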

First Feature: Purchase Total

We can describe some simple features based on attributes of a purchase event. For example, we can describe the purchase total by referencing the appropriate event field:

let PurchaseTotal = Purchase.total

entity  time  Purchase.total
cb_001  100   9
cb_002  101   2
cb_003  102   4
cb_004  103   5000
cb_005  103   3
cb_006  104   5
kk_001  100   3
kk_002  101   5
kk_003  102   12
kk_004  104   9

Fenl expressions are either continuous or discrete. Discrete expressions are defined at a finite set of times and their value is null at all other times. For example, PurchaseTotal is a discrete expression: it is defined at the times associated with each purchase event.

Continuous expressions are defined at all times, and are generally the result of an aggregation. For example, Target is a continuous expression because it uses the count() aggregation: at any point in time its value is true if there have been one or more FraudReport events in the current window up to that time, and false otherwise.
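The difference between the two kinds of expression can be sketched in Python by asking for a value at an arbitrary time. The function names `discrete_at` and `continuous_at` are hypothetical, `None` plays the role of null, and the windowing of Target is ignored for simplicity.

```python
from bisect import bisect_right


def discrete_at(points, time):
    """Value of a discrete expression: defined only exactly at event times."""
    return dict(points).get(time)  # None when no event occurred at `time`


def continuous_at(points, time, default=None):
    """Value of a continuous expression: the latest result at or before `time`."""
    times = [t for t, _ in points]
    index = bisect_right(times, time)
    return points[index - 1][1] if index else default


# (time, value) pairs for the single entity cb_004, sorted by time.
purchase_total = [(103, 5000)]        # PurchaseTotal
fraud_count_positive = [(120, True)]  # count(FraudReport) > 0

print(discrete_at(purchase_total, 103))   # 5000
print(discrete_at(purchase_total, 104))   # None
print(continuous_at(fraud_count_positive, 125, default=False))  # True
print(continuous_at(fraud_count_positive, 110, default=False))  # False
```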

Changing Entity Key: Purchase Average by Customer part I

It could be useful to see how each individual purchase compares with the customer’s other purchases. We can describe a given customer’s purchases by transforming the purchase table to use customer_id as the entity key rather than id. The resulting expression contains the same values, but aggregations will now be scoped to a customer ID rather than a purchase ID.

let PurchaseByCustomer = Purchase | with_key($input.customer_id)

entity   time  vendor_id    customer_id  total
karen    100   chum_bucket  karen        9
karen    101   chum_bucket  karen        2
karen    102   chum_bucket  karen        4
karen    103   chum_bucket  karen        3
karen    104   chum_bucket  karen        5
patrick  100   krusty_krab  patrick      3
patrick  101   krusty_krab  patrick      5
patrick  102   krusty_krab  patrick      12
patrick  103   chum_bucket  patrick      5000
patrick  104   krusty_krab  patrick      9

This expression uses "pipe syntax" which allows sequential operations to be chained.

Pipe syntax works by assigning the left-hand-side of the pipe to the name $input in the right-hand-side of the pipe. Within the right-hand-side of a pipe expression, required function arguments that are omitted from the function call default to $input.

An equivalent way to write this expression is:

let PurchaseByCustomer = with_key(Purchase.customer_id, Purchase)

This allows us to describe the average of each customer’s purchases:

let AveragePurchaseByCustomer = PurchaseByCustomer.total | mean()

entity   time  … | mean()
karen    100   9
karen    101   5.5
karen    102   5
karen    103   4.5
karen    104   4.6
patrick  100   3
patrick  101   4
patrick  102   6.666
patrick  103   1255
patrick  104   1005.8

Expressions in Fenl are temporal; they describe the result of a given computation at every point in time. In this case, AveragePurchaseByCustomer is an expression whose value changes over time as purchase events occur. The temporal nature of expressions allows Fenl to describe the values as they would have been computed at arbitrary times in the past.
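To make the temporal behaviour concrete, here is a small Python sketch that re-keys the sample purchases by customer and records the mean after every event. It only approximates what with_key and mean() describe: `running_mean_by_customer` is a hypothetical name, and the rows are copied from the Purchase sample data.

```python
from collections import defaultdict

# (time, purchase_id, customer_id, total) rows from the Purchase sample data.
PURCHASES = [
    (100, "cb_001", "karen", 9), (100, "kk_001", "patrick", 3),
    (101, "cb_002", "karen", 2), (101, "kk_002", "patrick", 5),
    (102, "cb_003", "karen", 4), (102, "kk_003", "patrick", 12),
    (103, "cb_004", "patrick", 5000), (103, "cb_005", "karen", 3),
    (104, "cb_006", "karen", 5), (104, "kk_004", "patrick", 9),
]


def running_mean_by_customer(purchases):
    """Return {customer_id: [(time, mean of that customer's totals so far), ...]}."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    history = defaultdict(list)
    # Process events in time order so each mean only sees earlier purchases.
    for time, _purchase_id, customer_id, total in sorted(purchases):
        sums[customer_id] += total
        counts[customer_id] += 1
        history[customer_id].append((time, sums[customer_id] / counts[customer_id]))
    return dict(history)


averages = running_mean_by_customer(PURCHASES)
print(averages["karen"])
# [(100, 9.0), (101, 5.5), (102, 5.0), (103, 4.5), (104, 4.6)]
```

Each entry in the history is the value the expression had at that time, which is what lets a later step ask for the average "as of" any past purchase.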

Joining Between Entities: Purchase Average by Customer part II

Our goal is to predict if a given purchase will be reported as fraudulent, but the entity key of AveragePurchaseByCustomer describes a customer. We can operate between entities by "looking up" the average purchase of a particular purchase’s customer:

let CustomerAveragePurchase = AveragePurchaseByCustomer | lookup(Purchase.customer_id)

entity  time  customer_id  … | lookup(…)
cb_001  100   karen        9
cb_002  101   karen        5.5
cb_003  102   karen        5
cb_004  103   patrick      1255
cb_005  103   karen        4.5
cb_006  104   karen        4.6
kk_001  100   patrick      3
kk_002  101   patrick      4
kk_003  102   patrick      6.666
kk_004  104   patrick      1005.8

In this case, each Purchase event produces the value of AveragePurchaseByCustomer for the purchase’s customer_id, computed at the time of the purchase. The value being looked up (in this case AveragePurchaseByCustomer) is referred to as the foreign value, while the value describing the foreign entity (in this case Purchase.customer_id) is referred to as the key value.

Lookups are similar to SQL left-joins: a foreign value is produced for each key value. In contrast to SQL joins, the lookup produces the foreign expression value at the point in time associated with each key expression value.
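The point-in-time behaviour of a lookup can be sketched in Python as an "as of" search over each foreign entity's history. This is an illustration under assumptions, not Fenl's implementation: the `lookup` function below is a hypothetical stand-in, and returning `None` for a key with no value yet mirrors the left-join behaviour described above.

```python
from bisect import bisect_right


def lookup(foreign, key_events):
    """For each (time, entity, key) event, emit the foreign value of `key`
    as of `time`, or None when the foreign entity has no value yet."""
    results = []
    for time, entity, key in key_events:
        series = foreign.get(key, [])
        index = bisect_right([t for t, _ in series], time)
        value = series[index - 1][1] if index else None
        results.append((entity, time, key, value))
    return results


# Foreign values: part of AveragePurchaseByCustomer as (time, mean) pairs.
average_by_customer = {
    "karen": [(100, 9.0), (101, 5.5), (102, 5.0)],
    "patrick": [(100, 3.0), (101, 4.0), (102, 20 / 3), (103, 1255.0)],
}
# Key values: (time, purchase id, Purchase.customer_id).
purchase_keys = [(101, "cb_002", "karen"), (103, "cb_004", "patrick")]

rows = lookup(average_by_customer, purchase_keys)
print(rows)
# [('cb_002', 101, 'karen', 5.5), ('cb_004', 103, 'patrick', 1255.0)]
```

Every key event yields exactly one output row, and the value never comes from a time later than the key event.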

Time Travel: Shifting Features Forward in Time

We would like to predict if a purchase will result in a fraud report within 30 days of the purchase. We began by describing our Target value, and then we described two features that could be useful for making such a prediction: PurchaseTotal and CustomerAveragePurchase.

For our model to make predictions about the future, it must be trained on features and target values computed at different points in time: we would like the target value to be computed 30 days after the feature values.

Fenl allows values to "time-travel" forward in time. This can be accomplished by shifting the feature expressions forward in time by 30 days:

let ShiftedPurchaseTime            = PurchaseTotal.time | add_time(days(30))
let ShiftedCustomerAverageTime     = CustomerAveragePurchase.time | add_time(days(30))
let ShiftedPurchaseTotal           = PurchaseTotal | shift_to(ShiftedPurchaseTime)
let ShiftedCustomerAveragePurchase = CustomerAveragePurchase | shift_to(ShiftedCustomerAverageTime)

entity  time  ShiftedPurchaseTotal  ShiftedCustomerAveragePurchase
cb_001  130   9                     9
cb_002  131   2                     5.5
cb_003  132   4                     5
cb_004  133   5000                  1255
cb_005  133   3                     4.5
cb_006  134   5                     4.6
kk_001  130   3                     3
kk_002  131   5                     4
kk_003  132   12                    6.666
kk_004  134   9                     1005.8

The results of these shift operations contain the same values as PurchaseTotal and CustomerAveragePurchase, but the time associated with each value is 30 days later. We can now describe our training set by combining the shifted predictor values with the non-shifted target value:

let TrainingExample = {
  p_total: ShiftedPurchaseTotal,
  avg_purchase: ShiftedCustomerAveragePurchase,
  target: Target,
}

entity  time  p_total  avg_purchase  target
cb_001  130   9        9             false
cb_002  131   2        5.5           false
cb_003  132   4        5             false
cb_004  133   5000     1255          true
cb_005  133   3        4.5           false
cb_006  134   5        4.6           false
kk_001  130   3        3             false
kk_002  131   5        4             false
kk_003  132   12       6.666         false
kk_004  134   9        1005.8        false

Values cannot travel backwards in time. This helps to ensure that temporal leakage cannot happen.
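The shift-then-combine step can be sketched in Python for a single entity. The names `shift_forward` and `training_examples` are hypothetical, 30 time units stand in for 30 days as in the sample tables, and treating a missing target as false is an assumption of the sketch rather than documented Fenl behaviour.

```python
def shift_forward(series, delta):
    """Re-emit each (time, value) pair `delta` time units later."""
    if delta < 0:
        # Mirrors the rule that values may only travel forward in time.
        raise ValueError("values can only be shifted forward in time")
    return [(time + delta, value) for time, value in series]


def training_examples(features, target, delta):
    """Pair each shifted feature value with the target as of the shifted time."""
    examples = []
    for time, value in shift_forward(features, delta):
        seen = [flag for t, flag in target if t <= time]
        examples.append((time, value, seen[-1] if seen else False))
    return examples


# cb_004: purchased at time 103, reported as fraudulent at time 120.
print(training_examples([(103, 5000)], [(120, True)], 30))  # [(133, 5000, True)]
# cb_001: purchased at time 100, never reported.
print(training_examples([(100, 9)], [], 30))                # [(130, 9, False)]
```

The feature value is the one known at purchase time, while the target reflects what was known 30 units later, so nothing from the future reaches the features.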

Going to Production: Feature Vectors

Once a model has been trained, we’ll need to compute feature vectors for making predictions. Feature vectors consist of the non-shifted predictor expressions but not the target value.

let FeatureVector = {
  p_total: PurchaseTotal,
  avg_purchase: CustomerAveragePurchase,
}

entity  time  p_total  avg_purchase
cb_001  100   9        9
cb_002  101   2        5.5
cb_003  102   4        5
cb_004  103   5000     1255
cb_005  103   3        4.5
cb_006  104   5        4.6
kk_001  100   3        3
kk_002  101   5        4
kk_003  102   12       6.666
kk_004  104   9        1005.8

PurchaseTotal is a discrete expression whose value depends on the purchase event. A feature store implementation would therefore seem to need some way of providing the "current" event. Alternatively, we may want to omit discrete values and tell users that they must provide this type of information to the model themselves.