# Snowplow pipeline
The Objectiv Collector supports using the Snowplow pipeline as a sink for Objectiv events, hooking directly into Snowplow's enrichment step. Currently, there is data store support for:
- Google BigQuery, via Google PubSub; and
- Amazon S3, via AWS SQS/Kinesis.
## How to set up Objectiv with Snowplow
In this setup, we assume you already have a fully functional Snowplow pipeline running, including enrichment, loader and iglu repository. If you don't, please see the Snowplow quickstart for Open Source.
Enabling Objectiv involves two steps, as explained next:
- Adding the Objectiv Taxonomy schema to the iglu repository;
- Configuring the Objectiv Collector output to push events into the appropriate message queue.
### 1. Add the Objectiv schema to the iglu repo
This step is required so the Snowplow pipeline (enrichment) can validate the incoming custom contexts.
#### Preparation
- Copy the Objectiv iglu schemas (see here); you can optionally lint them before pushing, as sketched after this list;
- Get the address/URL of your iglu repository;
- Get the UUID of the repo.
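If you want to sanity-check the copied schemas before pushing, igluctl can lint them. A minimal sketch, assuming the schemas were copied into a local `./iglu` directory (the path is just an example):

```bash
# Lint the copied Objectiv schemas before pushing them to the iglu repository.
# The ./iglu path is an example; point it at wherever you copied the schemas.
java -jar igluctl lint ./iglu
```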
#### Pushing the schema
```bash
java -jar igluctl static push --public <path to iglu schemas> <url to repo> <uuid>

# example:
java -jar igluctl static push --public ./iglu https://iglu.example.com myuuid-abcd-abcd-abcd-abcdef12345
```
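For reference, each schema pushed this way is a Snowplow self-describing JSON Schema. The sketch below only illustrates the general shape: the vendor follows the naming scheme described further down, but the name, version and properties shown here are assumptions; the actual schemas are the ones you copied in the preparation step.

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Illustrative sketch only; not the actual Objectiv ApplicationContext schema.",
  "self": {
    "vendor": "io.objectiv.context",
    "name": "ApplicationContext",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "id": { "type": "string" }
  },
  "required": ["id"],
  "additionalProperties": false
}
```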
### 2. Configure output to push events to the data store
The Collector can be configured to push events into a Snowplow message queue, using environment variables; an illustrative sketch follows the list below.
- To send output to GCP/BigQuery, please refer to BigQuery instructions.
- To send output to AWS SQS/Kinesis, please refer to Amazon S3 instructions.
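As an illustration of what such configuration can look like, the sketch below sets output-related environment variables before starting the Collector. The variable names and values are placeholders, not the Collector's actual configuration keys; the BigQuery and Amazon S3 instructions linked above list the real ones.

```bash
# Illustrative only: the variable names below are placeholders, not the
# Collector's actual configuration keys -- see the BigQuery / Amazon S3
# instructions for the real ones.

# GCP / BigQuery via PubSub (placeholder names)
export SNOWPLOW_GCP_PROJECT="my-gcp-project"
export SNOWPLOW_GCP_PUBSUB_TOPIC_RAW="sp-raw"

# AWS S3 via SQS/Kinesis (placeholder names)
export SNOWPLOW_AWS_SQS_QUEUE_RAW="sp-raw-queue"

# Then start the Objectiv Collector as you normally would, with this
# environment in place (e.g. via docker compose).
```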
## Background
The Snowplow pipeline roughly consists of the following components:
Collector
: HTTP(S) endpoint that receives events;

Enrichment
: process that validates incoming events, potentially enriches them (adds metadata);

Loader
: final step, where the validated and enriched events are loaded into persistent storage. Depending on your choice of platform, this could be BigQuery on GCP, Redshift on AWS, etc.;

iglu
: central repository used by the other components to pull schemas for validation of events, contexts, etc.
The Snowplow pipeline uses message queues and Thrift messages to communicate between the components. Objectiv uses its own Collector (which also handles validation) that bypasses the Snowplow collector, and pushes events directly into the message queue that is read by the enrichment step.
Snowplow allows for so-called structured custom contexts to be added to events. This is exactly what Objectiv uses. As with all contexts, they must pass validation in the enrichment step, which is why a schema for the Objectiv custom context must be added to iglu, so Snowplow knows how to validate the context. Furthermore, Snowplow uses that schema to infer the database schema needed to persist the context. How this is handled depends on the loader chosen, e.g. Postgres uses a more relational schema than BigQuery.
## Objectiv to Snowplow events mapping
In a standard Snowplow setup, all data is stored in a table called `events`. Objectiv data is stored in that table by mapping the Objectiv event properties onto the respective Snowplow properties. Objectiv's contexts are stored in custom contexts.
### Events
Event and some context properties are mapped directly onto the Snowplow `events` table. See the table below for details:
| Objectiv property          | SP Tracker property | Snowplow property |
|----------------------------|---------------------|-------------------|
| event.event_id             | eid                 | event_id          |
| event.time                 | ttm                 | true_stamp        |
| event._type                | se_ca               | se_category       |
| ApplicationContext.id      | aid                 | app_id            |
| CookieIdContext.id         | networkUserId       | network_userid    |
| HttpContext.referrer       | refr                | page_referrer     |
| HttpContext.remote_address | ip                  | user_ipaddress    |
| PathContext.id             | url                 | page_url          |
### Global contexts

For every global context, a specific custom context is created, with its own schema in iglu. The naming scheme is `io.objectiv.context/SomeContext`.

NOTE: the `_type` and `_types` properties have been removed.
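To illustrate, a global context such as the ApplicationContext could then travel through the pipeline as a self-describing JSON along the lines below. The schema version and the exact payload are assumptions for illustration only; the `id` property follows the mapping table above.

```json
{
  "schema": "iglu:io.objectiv.context/ApplicationContext/jsonschema/1-0-0",
  "data": {
    "id": "my-application"
  }
}
```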
### Location stack

As order is significant in the location stack, a slightly different approach is taken in storing it. The location stack is stored as a nested structure in a custom context (`io.objectiv/location_stack`).
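As a rough, hypothetical sketch of that idea: a single custom context wrapping the ordered stack. The schema version, the field names and the nested entries below are illustrative assumptions only, not the actual payload format.

```json
{
  "schema": "iglu:io.objectiv/location_stack/jsonschema/1-0-0",
  "data": {
    "location_stack": [
      { "_type": "RootLocationContext", "id": "home" },
      { "_type": "ContentContext", "id": "hero" }
    ]
  }
}
```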