Open taxonomy how-to

This notebook demonstrates what you can do with the Bach modeling library and a dataset that is validated against the open analytics taxonomy.

It’s also available in a notebook to run on your own data. The dataset used here is the same as in Objectiv Up.

The Objectiv Bach API is strongly pandas-like, to provide a familiar interface to handle large amounts of data in a python environment, while supporting multiple data stores. See an intro into the pandas API here.

This example uses real data collected with Objectiv’s Tracking SDK on objectiv.io, stored in an SQL database.

Getting started

First we have to install the open model hub and instantiate the Objectiv DataFrame object; see getting started in your notebook.

The open model hub is a toolkit with functions and models that can run directly on a full dataset collected with Objectiv’s Tracker SDKs. The get_objectiv_dataframe() operation creates a Bach DataFrame that has all columns and data types set correctly, and as such can always be used with models from the open model hub.

By instantiating the model hub with a global_contexts parameter, all global contexts that are needed in the following analyses are added to the DataFrame. In this example, we select ‘application’ and ‘marketing’ contexts. Later in this notebook we’ll give more details on what data is available in the global contexts and how to access this data for analyses.

In:

# set the timeframe of the analysis
start_date = '2022-03-01'
end_date = None

In:

from modelhub import ModelHub, display_sql_as_markdown
# instantiate the model hub, set the default time aggregation to daily
# and get the global contexts that will be used in this example
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'marketing'])
# get an Objectiv DataFrame within a defined timeframe
df = modelhub.get_objectiv_dataframe(db_url=DB_URL, start_date=start_date, end_date=end_date)

The data for this DataFrame is still in the database, and the database is not queried before any of the data is loaded to the python environment. The methods that query the database are:

head
to_pandas
get_sample
to_numpy
The property accessors Series.array, and Series.value

For demo purposes of this notebook, these methods are called often to show the results of our operations. To limit the number of executed queries on the full dataset, it is recommended to use these methods less often or to sample the data first.

Reference

modelhub.ModelHub.get_objectiv_dataframe

The data

The DataFrame contains:

The index. This is a unique identifier for every hit.

In:

df.index_dtypes

Out:

{'event_id': 'uuid'}

The event data. These columns contain all information about the event.

In:

df.dtypes

Out:

{'day': 'date',
 'moment': 'timestamp',
 'user_id': 'uuid',
 'location_stack': 'objectiv_location_stack',
 'event_type': 'string',
 'stack_event_types': 'json',
 'session_id': 'int64',
 'session_hit_number': 'int64',
 'application': 'objectiv_global_context',
 'marketing': 'objectiv_global_context'}

What’s in these columns:

day: the day of the session as a date.
moment: the exact moment of the event.
user_id: the unique identifier of the user based on the cookie.
location_stack: a JSON-like data column that stores information on the exact location where the event is triggered in the product’s UI. See below for a more detailed explanation.
event_type: the type of event that is logged.
stack_event_types: the parents of the event_type.
session_id: a unique incremented integer id for each session. Starts at 1 for the selected data in the DataFrame.
session_hit_number: a incremented integer id for each hit in session ordered by moment.

Besides these ‘standard’ columns, the DataFrame contains additional columns that are extracted from the global contexts:

application
marketing

See more about global contexts here.

A preview of the data below, showing the latest PressEvents.

In:

df[df.event_type == 'PressEvent'].sort_values('moment', ascending=False).head()

Out:

                                             day                  moment                               user_id                                     location_stack  event_type                              stack_event_types  session_id  session_hit_number                                        application marketing
event_id
252d7d87-5600-4d90-b24f-2a6fb8986c5e  2022-06-30 2022-06-30 21:40:30.117  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext'...  PressEvent  [AbstractEvent, InteractiveEvent, PressEvent]         872                   2  [{'id': 'objectiv-website', '_type': 'Applicat...        []
157a3000-bbfc-42e0-b857-901bd578ea7c  2022-06-30 2022-06-30 21:40:16.908  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext'...  PressEvent  [AbstractEvent, InteractiveEvent, PressEvent]         872                   1  [{'id': 'objectiv-website', '_type': 'Applicat...        []
835b918b-23a3-46ce-b959-337adc9511de  2022-06-30 2022-06-30 19:52:41.870  549c8309-0851-4253-b11c-a41230c6d273  [{'id': 'home', '_type': 'RootLocationContext'...  PressEvent  [AbstractEvent, InteractiveEvent, PressEvent]         870                   1  [{'id': 'objectiv-website', '_type': 'Applicat...        []
d8969faa-af29-4b9a-8dbe-6244c30953dc  2022-06-30 2022-06-30 17:14:29.033  5f5b80ec-f14f-4e81-a8f4-7015ba4c0b47  [{'id': 'tracking', '_type': 'RootLocationCont...  PressEvent  [AbstractEvent, InteractiveEvent, PressEvent]         868                  38  [{'id': 'objectiv-docs', '_type': 'Application...        []
7f87a658-666c-4a81-8da3-8d134d91856b  2022-06-30 2022-06-30 17:14:28.471  5f5b80ec-f14f-4e81-a8f4-7015ba4c0b47  [{'id': 'taxonomy', '_type': 'RootLocationCont...  PressEvent  [AbstractEvent, InteractiveEvent, PressEvent]         868                  37  [{'id': 'objectiv-docs', '_type': 'Application...        []

Reference

The Open Taxonomy

Data in a DataFrame created with get_objectiv_dataframe follows the open analytics taxonomy:

event_type column: describes the type of interactive or non-interactive event.
location_stack column: describes where an event exactly happened in the user interface.
global contexts data: general information about the state in which an event happened.

The following section goes through these concepts one-by-one.

event_type column

The event_type describes what type of event was triggered. The goal of the open taxonomy is to label all interactive and non-interactive events in a standardized way. Together with the location_stack, the event_type ‘defines’ what happened with, or on the product.

In:

df[df.day == '2022-06-01'].event_type.head()

Out:

event_id
d1c72d21-4233-40dc-b93d-3323dbf4cf75                PressEvent
0928c734-8b6b-489d-905e-1c7a36257799            MediaLoadEvent
6a447240-dc52-4087-9fb7-65608479887d              VisibleEvent
2dd0c6d1-9973-4c77-979e-9a9f2fb26d9b    ApplicationLoadedEvent
bbebf1f5-46c7-4497-8edb-20b44ca458d9            MediaLoadEvent
Name: event_type, dtype: object

location stack and global contexts

The location stack and global contexts are stored as JSON type data. Within the DataFrame, it is easy to access data in JSON data based on position or content.

Slicing the JSON data

With the .json[] syntax you can slice the array using integers. Instead of integers, dictionaries can also be passed to ‘query’ the JSON array. If the passed dictionary matches a context object in the stack, all objects of the stack starting (or ending, depending on the slice) at that object will be returned.

An example

Consider a JSON array that looks like this (this is a real example of a location stack):

[{"id": "docs", "_type": "RootLocationContext"},
 {"id": "docs-sidebar", "_type": "NavigationContext"},
 {"id": "API Reference", "_type": "ExpandableContext"},
 {"id": "DataFrame", "_type": "ExpandableContext"},
 {"id": "Overview", "_type": "LinkContext"}]

Regular slicing

df.location_stack.json[2:4]

For the example array it would return:

[{"id": "API Reference", "_type": "ExpandableContext"},
 {"id": "DataFrame", "_type": "ExpandableContext"}]

Slicing by querying

We want to return only the part of the array starting at the object that contain this object:

{"id": "docs-sidebar", "_type": "NavigationContext"}

The syntax for selecting like this is:

df.location_stack.json[{"id": "docs-sidebar", "_type": "NavigationContext"}:]

For the example array it would return:

[{'id': 'docs-sidebar', '_type': 'NavigationContext'},
 {'id': 'API Reference', '_type': 'ExpandableContext'},
 {'id': 'DataFrame', '_type': 'ExpandableContext'},
 {'id': 'Overview', '_type': 'LinkContext'}]

In case a JSON array does not contain the object, None is returned. More info at the API reference.

location_stack column

The location_stack column in the DataFrame stores the information on where an event exactly happened in the user interface. The example used above is the location stack of a link to the DataFrame API reference, in the menu on our documentation pages.

Because of the specific way the location information is labeled, validated, and stored using the open analytics taxonomy, it can be used to easily slice and group your product’s features. The column is set as an objectiv_location_stack type, and therefore location stack specific methods can be used to access the data from the location_stack. These methods can be used using the .ls accessor on the column:

For example:

df.location_stack.ls.nice_name

returns ‘Link: Overview located at Root Location: docs => Navigation: docs-sidebar => Expandable: API Reference => Expandable: DataFrame’ for the location stack mentioned above.

See the full reference of the location stack here. An example location stack for a PressEvent is queried below:

In:

df[df.event_type == 'PressEvent'].location_stack.head(1)[0]

Out:

[{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLocationContext', 'RootLocationContext']}, {'id': 'navbar-top', '_type': 'NavigationContext', '_types': ['AbstractContext', 'AbstractLocationContext', 'NavigationContext']}, {'id': 'about-us', 'href': '/about', '_type': 'LinkContext', '_types': ['AbstractContext', 'AbstractLocationContext', 'LinkContext', 'PressableContext']}]

global_contexts

Global contexts contain all general information that is relevant to the logged event. To optimize data processing, not all data that is stored in the global contexts in the database is loaded into the DataFrame when it is created. Data columns are only created for the global contexts that are selected when the model hub is instantiated. In this example, those columns are application and marketing.

Each selected global context is a JSON-like column of the ‘objectiv_global_context’ type, and therefore contains multiple key-value pairs. The data in these JSON columns can be accessed with the context accessor on the respective columns. For example to get the ID of the application as a Series, you use:

df.application.context.id

Similarly, the application ID can be set as regular (text) column in the DataFrame:

df['application_id'] = df.application.context.id

See the full reference of all available global contexts in the open taxonomy here. Each global context always has an ‘id’ key that uniquely identifies the global context of that type. Additional keys are shown in the blocks of each context in the reference.

From the marketing context, for example, we can therefore also get the ‘source’ as a column:

df['marketing_source'] = df.marketing.context.source

When instantiating the model hub, global contexts are added using the name of the context without the word ‘Context’ and converted to ‘snake_case’ (the name of the context split before every capital letter and joined with an underscore), i.e. to add the HttpContext use ‘http’ and to add the InputValueContext use ‘input_value’:

modelhub = ModelHub(global_contexts=['http', 'input_value'])

In the case you later want to add other data from the global contexts to your DataFrame, you will have to re-instantiate the model hub with those contexts and recreate the DataFrame. Note that no data has to be processed for recreating the DataFrame until the data gets queried (by using .head() or similar).

In:

# we create the columns from the examples above, and show the data.
df['application_id'] = df.application.context.id
df['marketing_source'] = df.marketing.context.source

In:

# we can now show the columns where the marketing source is not null.
df[df.marketing_source.notnull()][['application', 'marketing', 'application_id', 'marketing_source']].head()

Out:

                                                                            application                                          marketing    application_id marketing_source
event_id
d1c72d21-4233-40dc-b93d-3323dbf4cf75  [{'id': 'objectiv-website', '_type': 'Applicat...  [{'id': 'utm', 'term': None, '_type': 'Marketi...  objectiv-website          twitter
814f1861-c5b4-4f65-b475-2d37112b91de  [{'id': 'objectiv-website', '_type': 'Applicat...  [{'id': 'utm', 'term': None, '_type': 'Marketi...  objectiv-website         linkedin
ada3c06a-e669-4588-803b-aea8768e575f  [{'id': 'objectiv-website', '_type': 'Applicat...  [{'id': 'utm', 'term': None, '_type': 'Marketi...  objectiv-website         linkedin
63010d9d-730d-443f-9ba4-2c78ffe602cb  [{'id': 'objectiv-website', '_type': 'Applicat...  [{'id': 'utm', 'term': None, '_type': 'Marketi...  objectiv-website         linkedin
02d31e6b-c668-47bb-b6f4-356d997f9863  [{'id': 'objectiv-website', '_type': 'Applicat...  [{'id': 'utm', 'term': None, '_type': 'Marketi...  objectiv-website         linkedin

Reference

Sampling

One of the key features to Objectiv Bach is that it runs on your full dataset. There can, however, be situations where you want to experiment with your data, meaning you have to query the full dataset often, which can become slow and/or costly.

To limit this, it’s possible to do operations on a sample of the full dataset. All operations can easily be applied to the full dataset again at any time.

Below we create a sample that randomly selects ~1% of all the rows in the data, using the get_sample() operation. A table containing the sampled is written to the database, therefore the table_name must be provided when creating the sample.

In:

# for BigQuery the table name should be 'YOUR_PROJECT.YOUR_WRITABLE_DATASET.YOUR_TABLE_NAME'
df_sample = df.get_sample(table_name='sample_data', sample_percentage=10, overwrite=True)

A new column is created in the sample.

In:

df_sample['root_location_contexts'] = df_sample.location_stack.json[:1]
df_sample.sort_values('moment', ascending=False).head()

Out:

                                             day                  moment                               user_id                                     location_stack              event_type                                  stack_event_types  session_id  session_hit_number                                        application marketing    application_id marketing_source                             root_location_contexts
event_id
157a3000-bbfc-42e0-b857-901bd578ea7c  2022-06-30 2022-06-30 21:40:16.908  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext'...              PressEvent      [AbstractEvent, InteractiveEvent, PressEvent]         872                   1  [{'id': 'objectiv-website', '_type': 'Applicat...        []  objectiv-website             None  [{'id': 'home', '_type': 'RootLocationContext'...
14d2fbb9-32f9-4031-955a-05935909d754  2022-06-30 2022-06-30 18:03:41.139  fdf31cae-4906-476a-b206-e0a9644ced52  [{'id': 'home', '_type': 'RootLocationContext'...  ApplicationLoadedEvent  [AbstractEvent, ApplicationLoadedEvent, NonInt...         869                   1  [{'id': 'objectiv-docs', '_type': 'Application...        []     objectiv-docs             None  [{'id': 'home', '_type': 'RootLocationContext'...
7f87a658-666c-4a81-8da3-8d134d91856b  2022-06-30 2022-06-30 17:14:28.471  5f5b80ec-f14f-4e81-a8f4-7015ba4c0b47  [{'id': 'taxonomy', '_type': 'RootLocationCont...              PressEvent      [AbstractEvent, InteractiveEvent, PressEvent]         868                  37  [{'id': 'objectiv-docs', '_type': 'Application...        []     objectiv-docs             None  [{'id': 'taxonomy', '_type': 'RootLocationCont...
454997de-09cb-440b-b288-83d48280e2d6  2022-06-30 2022-06-30 17:14:20.194  5f5b80ec-f14f-4e81-a8f4-7015ba4c0b47  [{'id': 'home', '_type': 'RootLocationContext'...            VisibleEvent  [AbstractEvent, NonInteractiveEvent, VisibleEv...         868                  33  [{'id': 'objectiv-docs', '_type': 'Application...        []     objectiv-docs             None  [{'id': 'home', '_type': 'RootLocationContext'...
566c9831-582c-4ec3-861a-6893c65b7b29  2022-06-30 2022-06-30 17:14:13.815  5f5b80ec-f14f-4e81-a8f4-7015ba4c0b47  [{'id': 'home', '_type': 'RootLocationContext'...            VisibleEvent  [AbstractEvent, NonInteractiveEvent, VisibleEv...         868                  31  [{'id': 'objectiv-docs', '_type': 'Application...        []     objectiv-docs             None  [{'id': 'home', '_type': 'RootLocationContext'...

Using the .get_unsampled() operation, the operations that are done on the sample (the creation of the column), are applied to the entire data set:

In:

df_unsampled = df_sample.get_unsampled()
df_unsampled.sort_values('moment', ascending=False).head()

Out:

                                             day                  moment                               user_id                                                                                location_stack              event_type                                                 stack_event_types  session_id  session_hit_number                                                                                   application marketing    application_id
event_id
96b5e709-bb8a-46de-ac82-245be25dac29  2022-06-30 2022-06-30 21:40:32.401  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLo...            VisibleEvent                [AbstractEvent, NonInteractiveEvent, VisibleEvent]         872                   3  [{'id': 'objectiv-website', '_type': 'ApplicationContext', '_types': ['AbstractContext', ...        []  objectiv-website
252d7d87-5600-4d90-b24f-2a6fb8986c5e  2022-06-30 2022-06-30 21:40:30.117  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLo...              PressEvent                     [AbstractEvent, InteractiveEvent, PressEvent]         872                   2  [{'id': 'objectiv-website', '_type': 'ApplicationContext', '_types': ['AbstractContext', ...        []  objectiv-website
157a3000-bbfc-42e0-b857-901bd578ea7c  2022-06-30 2022-06-30 21:40:16.908  2d718142-9be7-4975-a669-ba022fd8fd48  [{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLo...              PressEvent                     [AbstractEvent, InteractiveEvent, PressEvent]         872                   1  [{'id': 'objectiv-website', '_type': 'ApplicationContext', '_types': ['AbstractContext', ...        []  objectiv-website
8543f519-d3a4-4af6-89f5-cb04393944b8  2022-06-30 2022-06-30 20:43:50.962  bb127c9e-3067-4375-9c73-cb86be332660  [{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLo...          MediaLoadEvent  [AbstractEvent, MediaEvent, MediaLoadEvent, NonInteractiveEvent]         871                   2  [{'id': 'objectiv-website', '_type': 'ApplicationContext', '_types': ['AbstractContext', ...        []  objectiv-website
a0ad4364-57e0-4da9-a266-057744550cc2  2022-06-30 2022-06-30 20:43:49.820  bb127c9e-3067-4375-9c73-cb86be332660  [{'id': 'home', '_type': 'RootLocationContext', '_types': ['AbstractContext', 'AbstractLo...  ApplicationLoadedEvent      [AbstractEvent, ApplicationLoadedEvent, NonInteractiveEvent]         871                   1  [{'id': 'objectiv-website', '_type': 'ApplicationContext', '_types': ['AbstractContext', ...        []  objectiv-website

The sample can also be used for grouping and aggregating. The example below counts all hits and the unique event_types in the sample:

In:

df_sample_grouped = df_sample.groupby(['application_id']).agg({'event_type':'nunique','session_hit_number':'count'})
df_sample_grouped.head()

Out:

                  event_type_nunique  session_hit_number_count
application_id
objectiv-docs                      5                       399
objectiv-website                   8                       235

As can be seen from the counts, unsampling applies the transformation to the entire data set:

In:

df_unsampled_grouped = df_sample_grouped.get_unsampled()
df_unsampled_grouped.head()

Out:

                  event_type_nunique  session_hit_number_count
application_id
objectiv-docs                      8                      4106
objectiv-website                  10                      2292

Reference

Get the SQL for any analysis

The SQL for any analysis can be exported with one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can quickly create BI dashboards with this.

In:

# show the underlying SQL for this dataframe - works for any dataframe/model in Objectiv
display_sql_as_markdown(press_events)

Out:

WITH "manual_materialize___cc346414e92dd9adcc7ea86035771176" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "cookie_id" AS "user_id",
               cast("value"->>'_type' AS text) AS "event_type",
               cast(cast("value"->>'_types' AS text) AS JSONB) AS "stack_event_types",
               cast(cast("value"->>'location_stack' AS text) AS JSONB) AS "location_stack",
               cast(cast("value"->>'time' AS text) AS bigint) AS "time",
               jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"ApplicationContext"}') AS "application",
               jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"MarketingContext"}') AS "marketing"
          FROM "data"
       ),
       "getitem_where_boolean___eed720629bfc3a3af5e6850e490fac2a" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "location_stack" AS "location_stack",
               "time" AS "time",
               "application" AS "application",
               "marketing" AS "marketing"
          FROM "manual_materialize___cc346414e92dd9adcc7ea86035771176"
         WHERE ((("day" >= cast('2022-03-01' AS date))) AND (("day" <= cast('2022-06-30' AS date))))
       ),
       "context_data___972e9cee2c8089e1557d5099b10a070f" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "marketing" AS "marketing"
          FROM "getitem_where_boolean___eed720629bfc3a3af5e6850e490fac2a"
       ),
       "session_starts___a2525f7f5f58f924e8ceef17d43f1d6e" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "marketing" AS "marketing",
               CASE WHEN (extract(epoch FROM (("moment") - (lag("moment", 1, cast(NULL AS timestamp WITHOUT TIME ZONE)) OVER (PARTITION BY "user_id" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))) <= cast(1800 AS bigint)) THEN cast(NULL AS boolean)
                    ELSE cast(TRUE AS boolean)
                     END AS "is_start_of_session"
          FROM "context_data___972e9cee2c8089e1557d5099b10a070f"
       ),
       "session_id_and_count___b30322203cc9641cee906c7b75f94f0c" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "marketing" AS "marketing",
               "is_start_of_session" AS "is_start_of_session",
               CASE WHEN "is_start_of_session" THEN row_number() OVER (PARTITION BY "is_start_of_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                    ELSE cast(NULL AS bigint)
                     END AS "session_start_id",
               count("is_start_of_session") OVER (ORDER BY "user_id" ASC NULLS LAST, "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "is_one_session"
          FROM "session_starts___a2525f7f5f58f924e8ceef17d43f1d6e"
       ),
       "objectiv_sessionized_data___962718fe5c33eef85cb282b6b0427ed8" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "marketing" AS "marketing",
               "is_start_of_session" AS "is_start_of_session",
               "session_start_id" AS "session_start_id",
               "is_one_session" AS "is_one_session",
               first_value("session_start_id") OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_id",
               row_number() OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_hit_number"
          FROM "session_id_and_count___b30322203cc9641cee906c7b75f94f0c"
       ),
       "getitem_where_boolean___cd985a4f9486a1cda54b53912ba1f447" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "session_id" AS "session_id",
               "session_hit_number" AS "session_hit_number",
               "application" AS "application",
               "marketing" AS "marketing",
               cast("application"->0->>'id' AS text) AS "application_id",
               cast("marketing"->0->>'source' AS text) AS "marketing_source"
          FROM "objectiv_sessionized_data___962718fe5c33eef85cb282b6b0427ed8"
         WHERE ("event_type" = 'PressEvent')
       ) SELECT "event_id" AS "event_id",
       "day" AS "day",
       "moment" AS "moment",
       "user_id" AS "user_id",
       "location_stack" AS "location_stack",
       "event_type" AS "event_type",
       "stack_event_types" AS "stack_event_types",
       "session_id" AS "session_id",
       "session_hit_number" AS "session_hit_number",
       "application" AS "application",
       "marketing" AS "marketing",
       "application_id" AS "application_id",
       "marketing_source" AS "marketing_source"
  FROM "getitem_where_boolean___cd985a4f9486a1cda54b53912ba1f447"
 ORDER BY "moment" DESC NULLS LAST

That’s it! To dive further into working with the open taxonomy, see the Bach API reference.

Join us on Slack if you have any questions or suggestions.

Next Steps

Try the notebooks in Objectiv Up

Spin up a full-fledged product analytics pipeline with Objectiv Up in under 5 minutes, and play with the included example notebooks yourself.

Use this notebook with your own data

You can use the example notebooks on any dataset that was collected with Objectiv’s tracker, so feel free to use them to bootstrap your own projects. They are available as Jupyter notebooks on our GitHub repository. See instructions to set up the Objectiv tracker.

Open model hub basics - use the pre-built models from the open model hub in conjunction with modeling library Bach to quickly build model stacks to answer common analytics questions.

Open taxonomy how-to

Getting started​

The data​

The Open Taxonomy​

event_type column​

location stack and global contexts​

Slicing the JSON data​

location_stack column​

global_contexts​

Sampling​

Get the SQL for any analysis​

Next Steps​

Try the notebooks in Objectiv Up​

Use this notebook with your own data​

Check out related example notebooks​

Getting started

The data

The Open Taxonomy

event_type column

location stack and global contexts

Slicing the JSON data

location_stack column

global_contexts

Sampling

Get the SQL for any analysis

Next Steps

Try the notebooks in Objectiv Up

Use this notebook with your own data

Check out related example notebooks