Introduction
What is Fenl?
Fenl is a declarative query language for feature engineering. It allows you to focus on declaring what you want computed, rather than how it should be computed. Because Fenl is focused on the what, expressions are easy to combine and re-use.
How is Fenl Different?
Computations in Fenl are temporal: they produce a time-series of values describing the full history of a computation’s results. Temporal computation allows Fenl to capture what an expression’s value would have been at arbitrary times in the past.
Fenl values can time-travel forward through time. Time travel allows combining the result of different computations at different points in time. Because values can only travel forward in time, Fenl prevents information about the future from "leaking" into the past.
As a query language, Fenl is focused on succinctly expressing the most common operations used in feature engineering. Fenl makes it easy to read and write feature definitions by providing intuitive syntax for:
-
Temporal lookups
-
Chained operations
-
Named expressions
-
Reading & constructing structured values
Data Model
Features will be built from two event tables; a Purchase
table and a
FraudReport
table. The goal will be to build a model predicting if a
given purchase will result in a fraud report within the next 30 days.
A Purchase
event occurs when a transaction is recorded. It describes
the items that were purchased, the vendor selling the items, the
customer buying the items and the total value of the transaction.
# Purchases
{ time: timestamp_ns, id: string, vendor_id: string, customer_id: string, total: i64 }
entity(id) | time | vendor_id | customer_id | total |
---|---|---|---|---|
cb_001 |
100 |
chum_bucket |
karen |
9 |
cb_002 |
101 |
chum_bucket |
karen |
2 |
cb_003 |
102 |
chum_bucket |
karen |
4 |
cb_004 |
103 |
chum_bucket |
patrick |
5000 |
cb_005 |
103 |
chum_bucket |
karen |
3 |
cb_006 |
104 |
chum_bucket |
karen |
5 |
kk_001 |
100 |
krusty_krab |
patrick |
3 |
kk_002 |
101 |
krusty_krab |
patrick |
5 |
kk_003 |
102 |
krusty_krab |
patrick |
12 |
kk_004 |
104 |
krusty_krab |
patrick |
9 |
A FraudReport
event occurs when a transaction is reported as
fraudulent. It identifies the purchase that was reported as fraudulent.
# FraudReports
{ time: timestamp_ns, purchase_id: string }
entity (purchase_id) | time |
---|---|
cb_004 |
120 |
The values produced by a Fenl expression are associated with an entity
key. Entity keys describe something each value is associated with. For
example, a purchase could be related to a specific user, and a fraud
report could be related to a specific vendor. The Purchase
table’s
entity key is the id
field, while the FraudReport
table’s entity key
is the purchase_id
field. Any entity key will do - these specific keys
are chosen because they’re convenient for this exercise.
Simple Aggregation: Target Value
Before we can start building the inputs to our model, we need to describe the target value the model will predict. We would like to predict if a given purchase will result in a fraud report - if the number of daily fraud reports is greater than zero.
let Target = count(FraudReport, window=since(daily())) > 0
entity | time | entity |
---|---|---|
cb_004 |
120 |
true |
Aggregations in Fenl are scoped to entity key; the Target
expression
produces a bool
value associated with each purchase (as identified
by FraudReport.purchase_id
). In this case we’ve applied a window
operation to the aggregation - the target value is the number of
FraudReport
values so far in a given day.
First Feature: Purchase Total
We can describe some simple features based on attributes of a purchase event. For example, we can describe the purchase total by referencing the appropriate event field:
let PurchaseTotal = Purchase.total
entity | time | Purchase.total |
---|---|---|
cb_001 |
100 |
9 |
cb_002 |
101 |
2 |
cb_003 |
102 |
4 |
cb_004 |
103 |
5000 |
cb_005 |
103 |
3 |
cb_006 |
104 |
5 |
kk_001 |
100 |
3 |
kk_002 |
101 |
5 |
kk_003 |
102 |
12 |
kk_004 |
104 |
9 |
Fenl expressions are either continuous or discrete. Discrete
expressions are defined at a finite set of times and their value is
null
at all other times. For example, PurchaseTotal
is a discrete
expression: it is defined at the times associated with each purchase
event.
Continuous expressions are defined at all times, and are generally the
result of an aggregation. For example, Target
is a continuous
expression because it uses the count()
aggregation: at any point in
time its value is true
if there have been 1 or more FraudReport
events before that time or false
otherwise.
Changing Entity Key: Purchase Average by Customer part I
It could be useful to compare how each individual purchase compares to
the customer’s other purchases. We can describe a given customer’s
purchases by transforming the purchase table to use customer_id
as the
entity key rather than id
. The resulting expression contains the same
values, but aggregations will now be scoped to customer ID rather than a
purchase ID.
let PurchaseByCustomer = Purchase | with_key($input.customer_id)
entity | time | vendor_id | customer_id | total |
---|---|---|---|---|
karen |
100 |
chum_bucket |
karen |
9 |
karen |
101 |
chum_bucket |
karen |
2 |
karen |
102 |
chum_bucket |
karen |
4 |
karen |
103 |
chum_bucket |
karen |
3 |
karen |
104 |
chum_bucket |
karen |
5 |
patrick |
100 |
krusty_krab |
patrick |
3 |
patrick |
101 |
krusty_krab |
patrick |
5 |
patrick |
102 |
krusty_krab |
patrick |
12 |
patrick |
103 |
chum_bucket |
patrick |
5000 |
patrick |
104 |
krusty_krab |
patrick |
9 |
This expression uses "pipe syntax" which allows sequential operations to be chained. Pipe syntax works by assigning the left-hand-side of the pipe to the
name An equivalent way to write this expression is
|
This allows us to describe the average of each customer’s purchases:
let AveragePurchaseByCustomer = PurchaseByCustomer.total | mean()
time | entity | … |
---|---|---|
mean() |
karen |
100 |
9 |
karen |
101 |
5.5 |
karen |
102 |
5 |
karen |
103 |
4.5 |
karen |
104 |
4.6 |
patrick |
100 |
3 |
patrick |
101 |
4 |
patrick |
102 |
6.666 |
patrick |
103 |
1255 |
patrick |
104 |
Expressions in Fenl are temporal; they describe the result of a given
computation at every point in time. In this case,
AveragePurchaseByCustomer
is an expression whose value changes over
time as purchase events occur. The temporal nature of expressions allows
Fenl to describe the values as they would have been computed at
arbitrary times in the past.
Joining Between Entities: Purchase Average By Customer part II
Our goal is to predict if a given purchase will be reported as
fraudulent, but the entity key of AveragePurchaseByCustomer
describes
a customer. We can operate between entities by "looking up" the
average purchase of a particular purchase’s customer:
let CustomerAveragePurchase = AveragePurchaseByCustomer | lookup(Purchase.customer_id)
entity | time | customer_id | … |
---|---|---|---|
lookup(…) |
cb_001 |
100 |
karen |
9 |
cb_002 |
101 |
karen |
5.5 |
cb_003 |
102 |
karen |
5 |
cb_004 |
103 |
patrick |
1255 |
cb_005 |
103 |
karen |
4.5 |
cb_006 |
104 |
karen |
4.6 |
kk_001 |
100 |
patrick |
3 |
kk_002 |
101 |
patrick |
4 |
kk_003 |
102 |
patrick |
6.666 |
kk_004 |
104 |
patrick |
In this case, for each Purchase
event, the value of
AveragePurchaseByCustomer
computed for the purchases customer_id
at
the time of the purchase is produced. The value being looked up (in
this case AveragePurchaseByCustomer
) is referred to as the foreign
value, while the value describing the foreign entity (in this case
Purchase.customer_id
) is referred to as the key value.
Lookups are similar to SQL left-joins: a foreign value is produced for each key value. In contrast to SQL joins, the lookup produces the foreign expression value at the point in time associated with each key expression value.
Time Travel: Shifting Features Forward in Time
We would like to predict if a purchase will result in a fraud report
within 30 days of the purchase. We began by describing our Target
value, and then we described two features that could be useful for
making such a prediction: PurchaseTotal
and CustomerAveragePurchase
.
For our model to make predictions about the future, it must be trained on features and target values computed at different points in time - we would like the target value to be computed 30 days after the feature values.
Fenl allows values to "time-travel" forward in time. This can be accomplished by shifting the feature expressions forward in time by 30 days:
let ShiftedPurchaseTime = PurchaseTotal.time | add_time(days(30))
let ShiftedCustomerAverageTime = CustomerAveragePurchase.time | add_time(days(30))
let ShiftedPurchaseTotal = PurchaseTotal | shift_to(ShiftedPurchaseTime)
let ShiftedCustomerAveragePurchase = CustomerAveragePurchase | shift_to(ShiftedCustomerAverageTime)
entity | time | ShiftedPurchaseTotal | ShiftedCustomerAveragePurchase |
---|---|---|---|
cb_001 |
130 |
9 |
9 |
cb_002 |
131 |
2 |
5.5 |
cb_003 |
132 |
4 |
5 |
cb_004 |
133 |
5000 |
1255 |
cb_005 |
133 |
3 |
4.5 |
cb_006 |
134 |
5 |
4.6 |
kk_001 |
130 |
3 |
3 |
kk_002 |
131 |
5 |
4 |
kk_003 |
132 |
12 |
6.666 |
kk_004 |
134 |
9 |
1005.8 |
The result of these shift operations contain the same values as
PurchaseTotal
and CustomerAveragePurchase
, but the times associated
with each value will be 30 days later. We can now describe our training
set by combining the shifted predictor values with the non-shifted
target value:
let TrainingExample = {
p_total: ShiftedPurchaseTotal,
avg_purchase: ShiftedCustomerAveragePurchase,
target: Target,
}
entity | time | p_total | avg_purchase | target |
---|---|---|---|---|
cb_001 |
130 |
9 |
9 |
false |
cb_002 |
131 |
2 |
5.5 |
false |
cb_003 |
132 |
4 |
5 |
false |
cb_004 |
133 |
5000 |
1255 |
true |
cb_005 |
133 |
3 |
4.5 |
false |
cb_006 |
134 |
5 |
4.6 |
false |
kk_001 |
130 |
3 |
3 |
false |
kk_002 |
131 |
5 |
4 |
false |
kk_003 |
132 |
12 |
6.666 |
false |
kk_004 |
134 |
9 |
1005.8 |
false |
Values cannot travel backwards in time. This helps to ensure that temporal leakage cannot happen. |
Going to Production: Feature Vectors
Once a model has been trained, we’ll need to compute feature vectors for making predictions. Feature vectors consist of the non-shifted predictor expressions but not the target value.
let FeatureVector = {
p_total: PurchaseTotal,
avg_purchase: CustomerAveragePurchase,
}
entity | time | p_total | avg_purchase |
---|---|---|---|
cb_001 |
100 |
9 |
9 |
cb_002 |
101 |
2 |
5.5 |
cb_003 |
102 |
4 |
5 |
cb_004 |
103 |
5000 |
1255 |
cb_005 |
103 |
3 |
4.5 |
cb_006 |
104 |
5 |
4.6 |
kk_001 |
100 |
3 |
3 |
kk_002 |
101 |
5 |
4 |
kk_003 |
102 |
12 |
6.666 |
kk_004 |
104 |
9 |
1005.8 |
|