Basic user intent analysis

This example notebook shows how you can easily run basic User Intent analysis with Objectiv. It’s also available as a full Jupyter notebook to run on your own data (see how to get started in your notebook). The dataset used here is the same as in Objectiv Up.

Get started

We first have to instantiate the model hub and an Objectiv DataFrame object.

In:

# set the timeframe of the analysis
start_date = '2022-03-01'
end_date = '2022-05-01'

In:

# instantiate the model hub, set the default time aggregation to daily
# and get the application & path global contexts
from modelhub import ModelHub, display_sql_as_markdown
import bach
import pandas as pd
from datetime import timedelta
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'path'])
# get an Objectiv DataFrame within a defined timeframe
df = modelhub.get_objectiv_dataframe(db_url=DB_URL, start_date=start_date, end_date=end_date)

The location_stack column, and the columns taken from the global contexts, contain most of the event-specific data. These columns are JSON typed, and we can extract data from it using the keys of the JSON objects with SeriesLocationStack methods, or the context accessor for global context columns. See the open taxonomy example for how to use the location_stack and global contexts.

In:

df['application_id'] = df.application.context.id
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')

Reference

Explore where users spend time

The root_location context in the location stack represents the top-level UI location of the user. As a first step of grasping user intent, this is a good starting point to see in what main areas of your product users are spending time.

In:

# see the number of unique users per application and root_location
users_root = modelhub.aggregate.unique_users(df, groupby=['application_id', 'root_location'])
users_root.sort_index().head(10)

Out:

application_id    root_location
objectiv-docs     home             156
                  modeling          64
                  taxonomy          69
                  tracking          67
objectiv-website  about            100
                  blog             120
                  home             437
                  jobs              72
                  join-slack        10
                  privacy            3
Name: unique_users, dtype: int64

Another good pointer to explore for user intent is how much time users spend in each root_location.

In:

# see duration per application and root location
duration_root = modelhub.aggregate.session_duration(df, groupby=['application_id', 'root_location']).sort_index()
duration_root.head(10)

Out:

application_id    root_location
objectiv-docs     home            0 days 00:02:49.554985
                  modeling        0 days 00:05:56.082236
                  taxonomy        0 days 00:04:46.021880
                  tracking        0 days 00:04:28.598646
objectiv-website  about           0 days 00:02:32.867455
                  blog            0 days 00:02:09.354135
                  home            0 days 00:02:18.139962
                  jobs            0 days 00:01:27.801652
                  join-slack      0 days 00:01:36.813100
                  privacy         0 days 00:00:10.705000
Name: session_duration, dtype: timedelta64[ns]

Finally, let’s look at the distribution of time spent. We’ll use this distribution to define the different stages of user intent.

In:

# see how the overall time spent is distributed
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id')
# materialization is needed because the expression of the created Series contains aggregated data,
# and it is not allowed to aggregate that.
session_duration = session_duration.materialize()
# show quantiles
session_duration.quantile(q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).head(10)

Out:

quantile
1   0 days 00:00:00.009000
2   0 days 00:00:01.019000
3   0 days 00:00:01.815000
4   0 days 00:00:04.507000
5   0 days 00:00:15.570000
6   0 days 00:00:39.518000
7   0 days 00:01:29.395000
8   0 days 00:03:52.534000
9          0 days 00:11:33
Name: session_duration, dtype: timedelta64[ns]

Reference

Define the stages of user intent

Now that we’ve explored the root_location and session duration (both per root_location and overall quantiles) where users spend their time, we can make a simple definition of the different stages of their intent.

Based on this dataset (objectiv.io website data) we think that:

Users that spent most time on the site (the 90th percentile), and specifically in our documentation sections, are in the Implement phase.
As there’s a jump beyond the one minute mark at the 70th percentile, we assume that users in the 70th to 90th percentile duration in our documentation sections are in the Explore phase.
The remaining users are Informing themselves about the product. Those users are spending less than 1:30 in the docs and/or spend any amount of time on our main website.

Summarizing:

User intent	Root locations	Duration
1 - Inform	all sections other than the ones mentioned below	any time spent
1 - Inform	Docs: modeling, taxonomy, tracking, home	less than 1:30
2 - Explore	Docs: modeling, taxonomy, tracking, home	between 1:30 and 11:30
3 - Implement	Docs: modeling, taxonomy, tracking, home	more than 11:30

This is just for illustration purposes, you can adjust these definitions based on your own collected data.

Assign user intent

Using our intent definitions above, we can now assign a stage of intent to each user.

In:

# select the root_locations to use for each of the intent stages
roots = bach.DataFrame.from_pandas(engine=df.engine, df=pd.DataFrame({'roots': ['modeling', 'taxonomy', 'tracking', 'home', 'docs']})).roots

In:

# calculate the total time spent per user
user_intent_buckets = modelhub.agg.session_duration(df, groupby=['user_id'], method='sum', exclude_bounces=False).to_frame()

In:

# same as above, but for selected root_locations only
selector = (df.root_location.isin(roots)) & (df.application_id=='objectiv-docs')
explore_inform_users_session_duration = modelhub.agg.session_duration(df[selector], groupby='user_id', method='sum')
# and set it as column
user_intent_buckets['explore_inform_duration'] = explore_inform_users_session_duration

In:

# set the Inform bucket as a catch-all, meaning users that do not fall into Explore and Implement will be defined as Inform
user_intent_buckets['bucket'] = '1 - inform'

In:

# calculate buckets duration
user_intent_buckets.loc[(user_intent_buckets.explore_inform_duration >= timedelta(0, 90)) & (user_intent_buckets.explore_inform_duration <= timedelta(0, 690)), 'bucket'] = '2 - explore'
user_intent_buckets.loc[user_intent_buckets.explore_inform_duration > timedelta(0, 690), 'bucket'] = '3 - implement'

Reference

Work with the user intent results

Now that we have assigned intent to each user, we can run any analysis on it. For example, we can look at the total number of users per intent bucket.

In:

# see the total number of users per intent bucket
user_intent_buckets.reset_index().groupby('bucket').agg({'user_id': 'nunique'}).sort_index().head()

Out:

               user_id_nunique
bucket
1 - inform                 495
2 - explore                 47
3 - implement               30

Other examples of analyses you could run:

Which product features do each of the intent groups use?
With what kind of intent do users come from different marketing campaigns?
How can we drive more users to the ‘Implement’ phase? For instance, look at different product features that users with the ‘Implement’ intent use, compared to ‘Explore’.

A good starting point for these analyses on top of the user intent buckets is the basic product analytics example in the example notebooks.

Reference

Get the SQL for any analysis

The SQL for any analysis can be exported with one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can quickly create BI dashboards with this.

In:

# show the underlying SQL for this dataframe - works for any dataframe/model in Objectiv
display_sql_as_markdown(user_intent_buckets)

Out:

WITH "manual_materialize___fbc087b7a6a9f389119f19df6d592367" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "cookie_id" AS "user_id",
               cast("value"->>'_type' AS text) AS "event_type",
               cast(cast("value"->>'_types' AS text) AS JSONB) AS "stack_event_types",
               cast(cast("value"->>'location_stack' AS text) AS JSONB) AS "location_stack",
               cast(cast("value"->>'time' AS text) AS bigint) AS "time",
               jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"ApplicationContext"}') AS "application",
               jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"PathContext"}') AS "path"
          FROM "data"
       ),
       "getitem_where_boolean___4c2e5462357cb69a7f6cfe2be876b177" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "location_stack" AS "location_stack",
               "time" AS "time",
               "application" AS "application",
               "path" AS "path"
          FROM "manual_materialize___fbc087b7a6a9f389119f19df6d592367"
         WHERE ((("day" >= cast('2022-03-01' AS date))) AND (("day" <= cast('2022-05-01' AS date))))
       ),
       "context_data___c925991b5a33be33854de220ca2c47dc" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "path" AS "path"
          FROM "getitem_where_boolean___4c2e5462357cb69a7f6cfe2be876b177"
       ),
       "session_starts___13cfbe3af99d880322fca11448d6a71b" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "path" AS "path",
               CASE WHEN (extract(epoch FROM (("moment") - (lag("moment", 1, cast(NULL AS timestamp WITHOUT TIME ZONE)) OVER (PARTITION BY "user_id" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))) <= cast(1800 AS bigint)) THEN cast(NULL AS boolean)
                    ELSE cast(TRUE AS boolean)
                     END AS "is_start_of_session"
          FROM "context_data___c925991b5a33be33854de220ca2c47dc"
       ),
       "session_id_and_count___14cb6495e559f981432c93e9d3d1b7c1" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "path" AS "path",
               "is_start_of_session" AS "is_start_of_session",
               CASE WHEN "is_start_of_session" THEN row_number() OVER (PARTITION BY "is_start_of_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                    ELSE cast(NULL AS bigint)
                     END AS "session_start_id",
               count("is_start_of_session") OVER (ORDER BY "user_id" ASC NULLS LAST, "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "is_one_session"
          FROM "session_starts___13cfbe3af99d880322fca11448d6a71b"
       ),
       "objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "application" AS "application",
               "path" AS "path",
               "is_start_of_session" AS "is_start_of_session",
               "session_start_id" AS "session_start_id",
               "is_one_session" AS "is_one_session",
               first_value("session_start_id") OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_id",
               row_number() OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_hit_number"
          FROM "session_id_and_count___14cb6495e559f981432c93e9d3d1b7c1"
       ),
       "nested_groupby___15ad3d4b2e2c4d1b956e1a45848ba746" AS (
        SELECT "user_id" AS "user_id",
               "session_id" AS "__session_id",
               min("moment") AS "moment_min",
               max("moment") AS "moment_max",
               ((max("moment")) - (min("moment"))) AS "session_duration"
          FROM "objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3"
         GROUP BY "user_id",
                  "session_id"
       ),
       "merge_left___15e2559b44625cd487cbff258f356dfc" AS (
        SELECT "user_id" AS "user_id",
               sum("session_duration") AS "session_duration"
          FROM "nested_groupby___15ad3d4b2e2c4d1b956e1a45848ba746"
         GROUP BY "user_id"
       ),
       "loaded_data___6c36bcef7d60e965af3a13a3db67f6eb" AS (
        SELECT *
          FROM (VALUES (cast(0 AS bigint), 'modeling'), (cast(1 AS bigint), 'taxonomy'), (cast(2 AS bigint), 'tracking'), (cast(3 AS bigint), 'home'), (cast(4 AS bigint), 'docs')) AS t("_index_0", "roots")
       ),
       "getitem_where_boolean___27ff4f48eaa6b06c58e2f20b7a17cd41" AS (
        SELECT "event_id" AS "event_id",
               "day" AS "day",
               "moment" AS "moment",
               "user_id" AS "user_id",
               "location_stack" AS "location_stack",
               "event_type" AS "event_type",
               "stack_event_types" AS "stack_event_types",
               "session_id" AS "session_id",
               "session_hit_number" AS "session_hit_number",
               "application" AS "application",
               "path" AS "path",
               cast("application"->0->>'id' AS text) AS "application_id",
               cast(coalesce((SELECT jsonb_agg(x.value) FROM jsonb_array_elements("location_stack") WITH ORDINALITY x WHERE ORDINALITY - 1 >= (SELECT min(CASE WHEN ('{"_type": "RootLocationContext"}'::JSONB) <@ value THEN ORDINALITY END) -1 FROM jsonb_array_elements("location_stack") WITH ORDINALITY)), '[]'::JSONB)->0->>'id' AS text) AS "root_location"
          FROM "objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3"
         WHERE ((cast(coalesce((SELECT jsonb_agg(x.value) FROM jsonb_array_elements("location_stack") WITH ORDINALITY x WHERE ORDINALITY - 1 >= (SELECT min(CASE WHEN ('{"_type": "RootLocationContext"}'::JSONB) <@ value THEN ORDINALITY END) -1 FROM jsonb_array_elements("location_stack") WITH ORDINALITY)), '[]'::JSONB)->0->>'id' AS text) in (SELECT "roots" AS "roots" FROM "loaded_data___6c36bcef7d60e965af3a13a3db67f6eb")) AND ((cast("application"->0->>'id' AS text) = 'objectiv-docs')))
       ),
       "getitem_having_boolean___8fa053d092b50b7c6c4ba97e1cfa86c9" AS (
        SELECT "user_id" AS "user_id",
               "session_id" AS "__session_id",
               min("moment") AS "moment_min",
               max("moment") AS "moment_max",
               ((max("moment")) - (min("moment"))) AS "session_duration"
          FROM "getitem_where_boolean___27ff4f48eaa6b06c58e2f20b7a17cd41"
         GROUP BY "user_id",
                  "session_id"
        HAVING (extract(epoch FROM ((max("moment")) - (min("moment")))) > cast(0 AS bigint))
       ),
       "merge_right___18e81c75d63033821f86af3e57e4688b" AS (
        SELECT "user_id" AS "user_id",
               sum("session_duration") AS "explore_inform_duration"
          FROM "getitem_having_boolean___8fa053d092b50b7c6c4ba97e1cfa86c9"
         GROUP BY "user_id"
       ),
       "merge_sql___40ed74dd42bc21e986782ebf2e8a61d3" AS (
        SELECT COALESCE("l"."user_id", "r"."user_id") AS "user_id",
               "l"."session_duration" AS "session_duration",
               "r"."explore_inform_duration" AS "explore_inform_duration"
          FROM "merge_left___15e2559b44625cd487cbff258f356dfc" AS l
          LEFT
            JOIN "merge_right___18e81c75d63033821f86af3e57e4688b" AS r
            ON ("l"."user_id" = "r"."user_id")
       ) SELECT "user_id" AS "user_id",
       "session_duration" AS "session_duration",
       "explore_inform_duration" AS "explore_inform_duration",
       CASE WHEN ("explore_inform_duration" > cast('P0DT0H11M30S' AS interval)) THEN '3 - implement'
            ELSE CASE WHEN ((("explore_inform_duration" >= cast('P0DT0H1M30S' AS interval))) AND (("explore_inform_duration" <= cast('P0DT0H11M30S' AS interval)))) THEN '2 - explore' ELSE '1 - inform' END
             END AS "bucket"
  FROM "merge_sql___40ed74dd42bc21e986782ebf2e8a61d3"

That’s it! Join us on Slack if you have any questions or suggestions.

Next Steps

Try the notebooks in Objectiv Up

Spin up a full-fledged product analytics pipeline with Objectiv Up in under 5 minutes, and play with the included example notebooks yourself.

Use this notebook with your own data

You can use the example notebooks on any dataset that was collected with Objectiv’s tracker, so feel free to use them to bootstrap your own projects. They are available as Jupyter notebooks on our GitHub repository. See instructions to set up the Objectiv tracker.

Product Analytics notebook - easily run basic product analytics on your data.
Funnel Discovery notebook - analyze the paths that users take that impact your product goals.

Basic user intent analysis

Get started​

Explore where users spend time​

Define the stages of user intent​

Assign user intent​

Work with the user intent results​

Get the SQL for any analysis​

Next Steps​

Try the notebooks in Objectiv Up​

Use this notebook with your own data​

Check out related example notebooks​

Get started

Explore where users spend time

Define the stages of user intent

Assign user intent

Work with the user intent results

Get the SQL for any analysis

Next Steps

Try the notebooks in Objectiv Up

Use this notebook with your own data

Check out related example notebooks