Fraud Detection and Analysis With Flink and Kafka Using the Eventador Platform

January 29, 2020 in Continuous SQL




Fraudulent transactions cost financial institutions billions of dollars annually. Streaming data and stream processing architectures present new opportunities to help discover, alert on, and manage fraudulent activity. A robust, low-latency fraud analysis pipeline and risk engine makes a financial institution more competitive and more trusted.

There are a number of mechanisms for building risk engines within modern stream processing paradigms, and on the Eventador Platform. The exact design of the system depends greatly on the type of business, the use case, and the fraud detection algorithm. The Eventador Platform supports modern approaches to fraud detection.

Fraud Pipelines with Eventador

A common approach is to represent financial transactions of some type (ATM withdrawals, credit authorizations, etc.) as a stream of data, perhaps living in a collection of Kafka topics. A production pipeline processes this stream, detects potential fraud, and sends the results to various stakeholder applications. Eventador exposes a simple-to-use SQL interface, SQLStreamBuilder, for easily inspecting and processing potential fraud. If CEP or a rich machine learning framework must be employed, those jobs can run using Runtime for Flink.

Transaction origin --> Apache Kafka --> Fraud Processing Engine --> Customer Alerting Framework
                                                                --> Risk Team Fraud Dashboard
                                                                --> Enterprise Data Warehouse
                                       |   Eventador Platform  |
                                       |      SQL / Flink      |
                                       |      ML  / CEP        |

Fraud detection on streams using Continuous SQL with the Eventador Platform

The Eventador Platform, with SQLStreamBuilder, is a robust interface for creating production-grade stream processing jobs via Continuous SQL. Consider an example where credit card authorizations are streamed, one message per authorization, with the following fairly simple Avro schema:

{
  "name": "PaymentAuths",
  "type": "record",
  "namespace": "com.eventador.auths",
  "fields": [
    { "name": "card", "type": "string" },
    { "name": "amount", "type": "int" },
    { "name": "lat", "type": "int" },
    { "name": "lon", "type": "int" },
    { "name": "userid", "type": "int" }
  ]
}
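To make the transaction-origin side of the pipeline concrete, here is a minimal sketch of a plain Java Kafka producer publishing one JSON-encoded authorization per message. The topic name auths, the broker address, and the field values are hypothetical, chosen only to match the schema above.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AuthProducer {
    public static void main(String[] args) {
        // Hypothetical broker address; keys and values serialized as strings.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // One JSON-encoded authorization matching the schema above.
        String auth = "{\"card\": \"4242XXXXXXXX4242\", \"amount\": 150, "
                + "\"lat\": 30, \"lon\": -97, \"userid\": 42}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by card so all auths for a given card land on the same partition.
            producer.send(new ProducerRecord<>("auths", "4242XXXXXXXX4242", auth));
        }
    }
}

Keying by card keeps each card's authorizations ordered within a single Kafka partition.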

Multiple jobs can be created to process this stream in various ways, with output continuously sent to a sink, an alerting framework, or an application or dashboard. Processing events in this manner is simple and straightforward:

-- detect multiple auths in a short window
-- send to alerting framework
SELECT card,
       MAX(amount) AS theamount,
       TUMBLE_END(eventTimestamp, interval '5' minute) AS ts
FROM auths
WHERE lat IS NOT NULL
  AND lon IS NOT NULL
GROUP BY card, TUMBLE(eventTimestamp, interval '5' minute)
HAVING COUNT(*) > 4 -- more than 4 auths in 5 minutes == potential fraud
-- detect multiple large transactions within a 15-minute period
-- send to alerting framework
SELECT card,
       userid,
       TUMBLE_END(eventTimestamp, interval '15' minute) AS ts
FROM auths
WHERE amount > 10000 -- large transaction
GROUP BY card, userid, TUMBLE(eventTimestamp, interval '15' minute)
HAVING COUNT(*) > 1 -- more than one large transaction in the window
-- show volume of transactions by location
-- for generating a heatmap in Kibana for the risk team
SELECT amount, lat, lon
FROM auths
WHERE eventTimestamp > CURRENT_TIMESTAMP - interval '12' hour;
-- aggregate transactions hourly for the enterprise data warehouse
-- hourly because storage in the EDW is expensive
SELECT card,
       userid,
       SUM(amount) AS total_amount,
       TUMBLE_END(eventTimestamp, interval '1' hour) AS transaction_hour
FROM auths
GROUP BY card, userid, TUMBLE(eventTimestamp, interval '1' hour)

Complex Event Processing with Apache Flink on Eventador

Complex Event Processing (CEP) has become a popular way to inspect streams of data for patterns the enterprise is interested in. Apache Flink provides CEP as a library, with a rich Pattern API in Java and Scala, that allows financial events to be matched against patterns to detect fraud; the Eventador Platform provides the processing runtime.

The Pattern API allows complex pattern sequences to be defined against the incoming stream. These sequences are built from one or more simple patterns chained together. Matches to the pattern are returned and can be routed to a sink, similar to the Continuous SQL examples above.

A singleton pattern could simply be:

Pattern.<MonitoringEvent>begin("First Event")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getTransaction() >= LARGE_TRANSACTION_THRESHOLD);

And can be sequenced for more complex logic:

Pattern<MonitoringEvent, ?> warningPattern = Pattern.<MonitoringEvent>begin("First Event")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getTransaction() >= LARGE_TRANSACTION_THRESHOLD)
    .next("Second Event")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getPIN() >= PIN_MISTYPE_THRESHOLD)
    .within(Time.seconds(10));

Once a PatternStream is created from a pattern sequence, transformations can be applied and results emitted. A more complete example looks like:

StreamExecutionEnvironment env = ...
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<MonitoringEvent> input = ...

// key the stream so patterns are matched per entity
DataStream<MonitoringEvent> partitionedInput = input.keyBy(new KeySelector<MonitoringEvent, Integer>() {
    @Override
    public Integer getKey(MonitoringEvent value) throws Exception {
        return value.getId();
    }
});

Pattern<MonitoringEvent, ?> warningPattern = Pattern.<MonitoringEvent>begin("First Event")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getTransaction() >= LARGE_TRANSACTION_THRESHOLD)
    .next("Second Event")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getPIN() >= PIN_MISTYPE_THRESHOLD)
    .within(Time.seconds(10));

PatternStream<MonitoringEvent> patternStream = CEP.pattern(partitionedInput, warningPattern);

DataStream<Alert> alerts = patternStream.select(new PatternSelectFunction<MonitoringEvent, Alert>() {
    @Override
    public Alert select(Map<String, List<MonitoringEvent>> pattern) throws Exception {
        return createAlert(pattern);
    }
});
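As one possibility for the sink routing mentioned above, the alerts stream could be written back to a Kafka topic. A minimal sketch, assuming Flink's Kafka connector is on the classpath; the fraud-alerts topic and broker address are hypothetical, and alerts are serialized as plain strings for brevity:

Properties producerProps = new Properties();
producerProps.setProperty("bootstrap.servers", "kafka:9092"); // hypothetical broker

alerts
    .map(Alert::toString) // simple string serialization, just for the example
    .addSink(new FlinkKafkaProducer<>("fraud-alerts", new SimpleStringSchema(), producerProps));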

The Pattern library has powerful functions for quantifiers, conditions, combining patterns, groups, and more.
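For instance, here is a hedged sketch of what quantifiers and combined patterns can express, reusing the TransactionEvent type from above with a hypothetical SMALL_TRANSACTION_THRESHOLD: a classic card-testing signature of several small auths followed shortly by a large one.

Pattern<MonitoringEvent, ?> cardTestingPattern = Pattern.<MonitoringEvent>begin("small auths")
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getTransaction() < SMALL_TRANSACTION_THRESHOLD) // hypothetical threshold
    .times(3)                 // quantifier: exactly three small auths
    .consecutive()            // with no other events in between
    .followedBy("large auth") // relaxed contiguity: other events may intervene
    .subtype(TransactionEvent.class)
    .where(evt -> evt.getTransaction() >= LARGE_TRANSACTION_THRESHOLD)
    .within(Time.minutes(5));

Here consecutive() tightens contiguity within the run of small auths, while followedBy() relaxes it between the two stages of the pattern.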

Complex Event Processing (CEP) using Continuous SQL on Eventador Platform

Combining the ease of use of SQL with the power of CEP is possible using Eventador SQLStreamBuilder. Apache Flink provides the underlying framework, the `MATCH_RECOGNIZE` grammar, which SQLStreamBuilder exposes in a simple declarative manner. Patterns and sequences can be created and matched much as in the underlying Java language bindings, including `PARTITION BY`, `ORDER BY`, `PATTERN`, `MEASURES`, `DEFINE`, and more.

For example, consider the following query, which finds a sequence of one or more transactions whose amounts are above a threshold and growing:

SELECT *
FROM paymentauths
MATCH_RECOGNIZE (
    PARTITION BY card
    ORDER BY eventTimestamp
    MEASURES
        FIRST(F.amount) AS first_amount,
        E.amount AS last_amount
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (F+ E) -- one or more F rows followed by a final E row
    DEFINE
        F AS F.amount IS NOT NULL AND F.amount > 10, -- lower boundary
        E AS E.amount IS NOT NULL AND F.amount < E.amount) -- ending value greater than starting value

For more reading, from simple Continuous SQL on Eventador to full CEP processing in SQL, the full set of documentation is available in the Apache Flink docs.

If you want to give some of these examples a try, you can jump on Eventador.cloud and get started for free.
