Release: Amazon Athena modeling support - a new step in our mission to enable reusable data models to run across data stores

Ivar Pruijn

Our mission from day one has been to enable data teams to share and reuse data models & analyses across any data store - to build on the collective knowledge of others.

Today we're taking another step in that mission, by adding support for building & running data models directly on Amazon Athena with Objectiv.

You can now store your data tracked with Objectiv on Amazon S3, and then build advanced data models in your notebooks with pandas-like syntax. The models run directly on the Amazon Athena query service, all powered by Bach, our SQL abstraction layer.

You also have access to the open model hub, where you can take pre-built models such as user retention off-the-shelf and use them instantly to create BI dashboards, or deploy to tools like dbt. All the models work across data stores.

Tip: Run Objectiv Up in just a few minutes to see pre-packaged product & marketing analytics notebooks and BI dashboards in action, which now also run seamlessly on Athena.

Besides Athena, we support Google BigQuery & PostgreSQL. Our roadmap includes the other popular data stores, such as Databricks, Amazon Redshift, ClickHouse, and Snowflake.

Run data models on any data store with Objectiv, and use the results directly in tools like BI dashboards & dbt

Setting it up

See our Athena documentation for all the details. It takes just a few steps:

  1. Configure the Collector to store data on Amazon S3, through Snowplow.
  2. Configure Athena to query the data on S3, and create a table and an account.
  3. Finally, provide the modelhub/Bach library with a URL to connect to.

Store Objectiv data on S3 & query Amazon Athena directly
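
As a rough illustration of the last step, the sketch below builds a connection URL in the SQLAlchemy format that the PyAthena driver documents. It assumes that this is the kind of URL the modelhub/Bach library expects; the `athena_url` helper and all values (region, schema, bucket) are hypothetical placeholders, so check the Athena documentation for the exact parameters.

```python
from urllib.parse import quote_plus


def athena_url(region, schema, s3_staging_dir,
               access_key="", secret_key="", work_group=None):
    """Build a SQLAlchemy-style Athena URL (PyAthena 'awsathena+rest' format).

    Hypothetical helper for illustration; it is not part of Objectiv.
    Empty credentials let the driver fall back to the default AWS
    credential chain (environment variables, profile, and so on).
    """
    url = (
        f"awsathena+rest://{quote_plus(access_key)}:{quote_plus(secret_key)}"
        f"@athena.{region}.amazonaws.com:443/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )
    if work_group:
        # Assumption: the work group is passed as an extra query parameter.
        url += f"&work_group={quote_plus(work_group)}"
    return url


# Placeholder values, not real resources.
db_url = athena_url(
    region="eu-west-1",
    schema="objectiv_db",
    s3_staging_dir="s3://example-bucket/athena-staging/",
)
print(db_url)
```

The S3 staging location is URL-encoded because it contains characters (`:` and `/`) that would otherwise be misread as part of the URL structure.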

An example: pre-built retention analysis on Amazon Athena

One of the models in the open model hub is retention_matrix, which analyzes user retention/churn. Using the modelhub library in your notebook, you can execute the model in one operation. Under the hood it is translated to SQL, which runs directly on Athena:

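# Assumes `modelhub` (a ModelHub instance) and `df` (the Objectiv DataFrame) were set up earlier in the notebook
retention_matrix = modelhub.aggregate.retention_matrix(
    df,
    time_period='weekly',  # group users into weekly cohorts
    percentage=True,       # percentages of the cohort instead of absolute user counts
    display=True)          # render the result as a heatmap
retention_matrix.head()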

A regular query running in the Athena web interface

Once the query completes, the result is returned and shown as a heatmap in your notebook (or as a DataFrame, if you pass display=False):

Data model running on Athena, in a Jupyter notebook
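
To make concrete what such a matrix contains, here is a minimal plain-Python sketch of a weekly retention matrix. It is not Objectiv's implementation (the model hub generates SQL that runs on the data store); the function, its signature, and the sample events are made up for illustration. Each row is a cohort of users who were first seen in the same week, and each column is the share of that cohort that was active again a given number of weeks later.

```python
from collections import defaultdict
from datetime import date


def week_start(day):
    """Ordinal of the Monday of the week that `day` falls in."""
    return day.toordinal() - day.weekday()


def retention_matrix(events, percentage=True):
    """events: iterable of (user_id, date) pairs.

    Returns {cohort_week_start: [value for week 0, week 1, ...]}.
    """
    events = list(events)

    # A user's cohort is the first week in which they were seen.
    first_week = {}
    for user, day in events:
        wk = week_start(day)
        first_week[user] = min(wk, first_week.get(user, wk))

    # Collect the distinct users active per (cohort, weeks since first seen).
    active = defaultdict(set)
    for user, day in events:
        cohort = first_week[user]
        active[(cohort, (week_start(day) - cohort) // 7)].add(user)

    max_offset = max(offset for _, offset in active)
    matrix = {}
    for cohort in sorted(set(first_week.values())):
        size = len(active[(cohort, 0)])  # every user is active in week 0
        counts = [len(active.get((cohort, k), ())) for k in range(max_offset + 1)]
        row = [100.0 * c / size for c in counts] if percentage else counts
        matrix[date.fromordinal(cohort)] = row
    return matrix


# Made-up sample data: three users over three weeks.
events = [
    ("a", date(2022, 11, 7)), ("b", date(2022, 11, 8)),
    ("a", date(2022, 11, 15)), ("c", date(2022, 11, 16)),
    ("b", date(2022, 11, 22)), ("c", date(2022, 11, 23)),
]
matrix = retention_matrix(events)
for cohort, row in matrix.items():
    print(cohort, row)
```

In this sketch, cells for weeks that have not happened yet for a later cohort simply show 0; a real implementation would typically leave those empty.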

Optionally, if you want to reduce data usage or query complexity, you can work with a sample, or temporarily materialize intermediate results.

You can further build out your analysis based on the results of this model, and/or export the resulting SQL at any time, to use in other tools like a BI dashboard.

Modeling results exported directly to a BI dashboard

What's next?

Of course, we'll expand support to all the popular data stores, such as Databricks, Amazon Redshift, ClickHouse, and Snowflake.

In addition, we'll make it easy to export (intermediate) modeling results, including variables, to other tools such as dbt. Under the hood, this is supported by the Directed Acyclic Graph (DAG) of SQL operations that the models are made up of.

Under the hood, models built in Objectiv are a DAG of SQL operations (shown here: a snippet of a retention_matrix model)
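
The idea of a model as a DAG of SQL operations can be sketched in a few lines. The example below is an assumption-laden illustration, not Bach's actual internals: the node names, the SQL snippets, and the `compile_sql` helper are hypothetical, and SQLite stands in for Athena so that the snippet runs anywhere. Each node is a SQL fragment that reads from its upstream nodes; compiling the model means emitting the nodes as common table expressions in dependency order.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical nodes: name -> (SQL, names of the nodes it reads from).
nodes = {
    "events": (
        "SELECT 'a' AS user_id, 1 AS week "
        "UNION ALL SELECT 'a', 2 UNION ALL SELECT 'b', 1",
        [],
    ),
    "first_week": (
        "SELECT user_id, MIN(week) AS cohort FROM events GROUP BY user_id",
        ["events"],
    ),
    "retention": (
        "SELECT f.cohort, e.week - f.cohort AS weeks_later, "
        "COUNT(DISTINCT e.user_id) AS users "
        "FROM events e JOIN first_week f ON e.user_id = f.user_id "
        "GROUP BY f.cohort, e.week - f.cohort ORDER BY 1, 2",
        ["events", "first_week"],
    ),
}


def compile_sql(nodes, final):
    """Flatten the DAG into one query with a CTE per upstream node.

    Simplification: assumes every node in `nodes` is upstream of `final`.
    """
    graph = {name: deps for name, (_, deps) in nodes.items()}
    # static_order() yields each node only after all of its dependencies.
    order = [n for n in TopologicalSorter(graph).static_order() if n != final]
    ctes = ",\n".join(f"{name} AS ({nodes[name][0]})" for name in order)
    return f"WITH {ctes}\n{nodes[final][0]}" if ctes else nodes[final][0]


sql = compile_sql(nodes, "retention")
print(sql)

# Run the compiled query on an in-memory SQLite database.
rows = sqlite3.connect(":memory:").execute(sql).fetchall()
print(rows)
```

Because any node can serve as `final`, a structure like this also makes it straightforward to export the SQL for an intermediate result rather than only the end result.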

Enjoy working with Objectiv on Athena, and let us know on Slack if you have any feedback.

Office Hours

If you have any questions about this release or anything else, or if you just want to say 'Hi!' to team Objectiv, we have Office Hours every Thursday at 4pm CET (10am EST) that you can freely dial in to. If you're in a timezone that doesn’t fit well, just ping us on Slack and we'll send over an invite for a time that works better.

Join the Office Hours

Try Objectiv

Get Objectiv Up - Try Objectiv on your local machine (takes 5 minutes)
Objectiv on GitHub - Check out the project and star us for future reference
Objectiv on Slack - Join the discussion or get help