Basic user intent analysis
This example notebook shows how you can easily run basic User Intent analysis with Objectiv. It’s also available as a full Jupyter notebook to run on your own data (see how to get started in your notebook). The dataset used here is the same as in Objectiv Up.
Get started
We first have to instantiate the model hub and an Objectiv DataFrame object.
# set the timeframe of the analysis
start_date = '2022-03-01'
end_date = '2022-05-01'
# instantiate the model hub, set the default time aggregation to daily
# and get the application & path global contexts
from modelhub import ModelHub, display_sql_as_markdown
import bach
import pandas as pd
from datetime import timedelta
modelhub = ModelHub(time_aggregation='%Y-%m-%d', global_contexts=['application', 'path'])
# get an Objectiv DataFrame within a defined timeframe
df = modelhub.get_objectiv_dataframe(db_url=DB_URL, start_date=start_date, end_date=end_date)
The location_stack
column, and the columns taken from the global contexts, contain most of the
event-specific data. These columns are JSON typed, and we can extract data from it using the keys of the JSON
objects with SeriesLocationStack methods, or the context
accessor for global context columns. See the open taxonomy example for how to use
the location_stack
and global contexts.
df['application_id'] = df.application.context.id
df['root_location'] = df.location_stack.ls.get_from_context_with_type_series(type='RootLocationContext', key='id')
Explore where users spend time
The root_location
context in the location stack represents the top-level UI location of the user. As a first
step of grasping user intent, this is a good starting point to see in what main areas of your product users
are spending time.
# see the number of unique users per application and root_location
users_root = modelhub.aggregate.unique_users(df, groupby=['application_id', 'root_location'])
users_root.sort_index().head(10)
application_id root_location
objectiv-docs home 156
modeling 64
taxonomy 69
tracking 67
objectiv-website about 100
blog 120
home 437
jobs 72
join-slack 10
privacy 3
Name: unique_users, dtype: int64
Another good pointer to explore for user intent is how much time users spend in each root_location
.
# see duration per application and root location
duration_root = modelhub.aggregate.session_duration(df, groupby=['application_id', 'root_location']).sort_index()
duration_root.head(10)
application_id root_location
objectiv-docs home 0 days 00:02:49.554985
modeling 0 days 00:05:56.082236
taxonomy 0 days 00:04:46.021880
tracking 0 days 00:04:28.598646
objectiv-website about 0 days 00:02:32.867455
blog 0 days 00:02:09.354135
home 0 days 00:02:18.139962
jobs 0 days 00:01:27.801652
join-slack 0 days 00:01:36.813100
privacy 0 days 00:00:10.705000
Name: session_duration, dtype: timedelta64[ns]
Finally, let’s look at the distribution of time spent. We’ll use this distribution to define the different stages of user intent.
# see how the overall time spent is distributed
session_duration = modelhub.aggregate.session_duration(df, groupby='session_id')
# materialization is needed because the expression of the created Series contains aggregated data,
# and it is not allowed to aggregate that.
session_duration = session_duration.materialize()
# show quantiles
session_duration.quantile(q=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).head(10)
quantile
0.1 0 days 00:00:00.009000
0.2 0 days 00:00:01.019000
0.3 0 days 00:00:01.815000
0.4 0 days 00:00:04.507000
0.5 0 days 00:00:15.570000
0.6 0 days 00:00:39.518000
0.7 0 days 00:01:29.395000
0.8 0 days 00:03:52.534000
0.9 0 days 00:11:33
Name: session_duration, dtype: timedelta64[ns]
Define the stages of user intent
Now that we’ve explored the root_location
and session duration (both per root_location
and overall
quantiles) where users spend their time, we can make a simple definition of the different stages of their
intent.
Based on this dataset (objectiv.io website data) we think that:
Users that spent most time on the site (the 90th percentile), and specifically in our documentation sections, are in the Implement phase.
As there’s a jump beyond the one minute mark at the 70th percentile, we assume that users in the 70th to 90th percentile duration in our documentation sections are in the Explore phase.
The remaining users are Informing themselves about the product. Those users are spending less than 1:30 in the docs and/or spend any amount of time on our main website.
Summarizing:
User intent | Root locations | Duration |
---|---|---|
1 - Inform | all sections other than the ones mentioned below | any time spent |
1 - Inform | Docs: modeling, taxonomy, tracking, home | less than 1:30 |
2 - Explore | Docs: modeling, taxonomy, tracking, home | between 1:30 and 11:30 |
3 - Implement | Docs: modeling, taxonomy, tracking, home | more than 11:30 |
This is just for illustration purposes, you can adjust these definitions based on your own collected data.
Assign user intent
Using our intent definitions above, we can now assign a stage of intent to each user.
# select the root_locations to use for each of the intent stages
roots = bach.DataFrame.from_pandas(engine=df.engine, df=pd.DataFrame({'roots': ['modeling', 'taxonomy', 'tracking', 'home', 'docs']})).roots
# calculate the total time spent per user
user_intent_buckets = modelhub.agg.session_duration(df, groupby=['user_id'], method='sum', exclude_bounces=False).to_frame()
# same as above, but for selected root_locations only
selector = (df.root_location.isin(roots)) & (df.application_id=='objectiv-docs')
explore_inform_users_session_duration = modelhub.agg.session_duration(df[selector], groupby='user_id', method='sum')
# and set it as column
user_intent_buckets['explore_inform_duration'] = explore_inform_users_session_duration
# set the Inform bucket as a catch-all, meaning users that do not fall into Explore and Implement will be defined as Inform
user_intent_buckets['bucket'] = '1 - inform'
# calculate buckets duration
user_intent_buckets.loc[(user_intent_buckets.explore_inform_duration >= timedelta(0, 90)) & (user_intent_buckets.explore_inform_duration <= timedelta(0, 690)), 'bucket'] = '2 - explore'
user_intent_buckets.loc[user_intent_buckets.explore_inform_duration > timedelta(0, 690), 'bucket'] = '3 - implement'
Work with the user intent results
Now that we have assigned intent to each user, we can run any analysis on it. For example, we can look at the total number of users per intent bucket.
# see the total number of users per intent bucket
user_intent_buckets.reset_index().groupby('bucket').agg({'user_id': 'nunique'}).sort_index().head()
user_id_nunique
bucket
1 - inform 495
2 - explore 47
3 - implement 30
Other examples of analyses you could run:
- Which product features do each of the intent groups use?
- With what kind of intent do users come from different marketing campaigns?
- How can we drive more users to the ‘Implement’ phase? For instance, look at different product features that users with the ‘Implement’ intent use, compared to ‘Explore’.
A good starting point for these analyses on top of the user intent buckets is the basic product analytics example in the example notebooks.
Get the SQL for any analysis
The SQL for any analysis can be exported with one command, so you can use models in production directly to simplify data debugging & delivery to BI tools like Metabase, dbt, etc. See how you can quickly create BI dashboards with this.
# show the underlying SQL for this dataframe - works for any dataframe/model in Objectiv
display_sql_as_markdown(user_intent_buckets)
WITH "manual_materialize___fbc087b7a6a9f389119f19df6d592367" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"cookie_id" AS "user_id",
cast("value"->>'_type' AS text) AS "event_type",
cast(cast("value"->>'_types' AS text) AS JSONB) AS "stack_event_types",
cast(cast("value"->>'location_stack' AS text) AS JSONB) AS "location_stack",
cast(cast("value"->>'time' AS text) AS bigint) AS "time",
jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"ApplicationContext"}') AS "application",
jsonb_path_query_array(cast(cast("value"->>'global_contexts' AS text) AS JSONB), '$[*] ? (@._type == $type)', '{"type":"PathContext"}') AS "path"
FROM "data"
),
"getitem_where_boolean___4c2e5462357cb69a7f6cfe2be876b177" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"location_stack" AS "location_stack",
"time" AS "time",
"application" AS "application",
"path" AS "path"
FROM "manual_materialize___fbc087b7a6a9f389119f19df6d592367"
WHERE ((("day" >= cast('2022-03-01' AS date))) AND (("day" <= cast('2022-05-01' AS date))))
),
"context_data___c925991b5a33be33854de220ca2c47dc" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"location_stack" AS "location_stack",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"application" AS "application",
"path" AS "path"
FROM "getitem_where_boolean___4c2e5462357cb69a7f6cfe2be876b177"
),
"session_starts___13cfbe3af99d880322fca11448d6a71b" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"location_stack" AS "location_stack",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"application" AS "application",
"path" AS "path",
CASE WHEN (extract(epoch FROM (("moment") - (lag("moment", 1, cast(NULL AS timestamp WITHOUT TIME ZONE)) OVER (PARTITION BY "user_id" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)))) <= cast(1800 AS bigint)) THEN cast(NULL AS boolean)
ELSE cast(TRUE AS boolean)
END AS "is_start_of_session"
FROM "context_data___c925991b5a33be33854de220ca2c47dc"
),
"session_id_and_count___14cb6495e559f981432c93e9d3d1b7c1" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"location_stack" AS "location_stack",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"application" AS "application",
"path" AS "path",
"is_start_of_session" AS "is_start_of_session",
CASE WHEN "is_start_of_session" THEN row_number() OVER (PARTITION BY "is_start_of_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ELSE cast(NULL AS bigint)
END AS "session_start_id",
count("is_start_of_session") OVER (ORDER BY "user_id" ASC NULLS LAST, "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "is_one_session"
FROM "session_starts___13cfbe3af99d880322fca11448d6a71b"
),
"objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"location_stack" AS "location_stack",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"application" AS "application",
"path" AS "path",
"is_start_of_session" AS "is_start_of_session",
"session_start_id" AS "session_start_id",
"is_one_session" AS "is_one_session",
first_value("session_start_id") OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_id",
row_number() OVER (PARTITION BY "is_one_session" ORDER BY "moment" ASC NULLS LAST, "event_id" ASC NULLS LAST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "session_hit_number"
FROM "session_id_and_count___14cb6495e559f981432c93e9d3d1b7c1"
),
"nested_groupby___15ad3d4b2e2c4d1b956e1a45848ba746" AS (
SELECT "user_id" AS "user_id",
"session_id" AS "__session_id",
min("moment") AS "moment_min",
max("moment") AS "moment_max",
((max("moment")) - (min("moment"))) AS "session_duration"
FROM "objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3"
GROUP BY "user_id",
"session_id"
),
"merge_left___15e2559b44625cd487cbff258f356dfc" AS (
SELECT "user_id" AS "user_id",
sum("session_duration") AS "session_duration"
FROM "nested_groupby___15ad3d4b2e2c4d1b956e1a45848ba746"
GROUP BY "user_id"
),
"loaded_data___6c36bcef7d60e965af3a13a3db67f6eb" AS (
SELECT *
FROM (VALUES (cast(0 AS bigint), 'modeling'), (cast(1 AS bigint), 'taxonomy'), (cast(2 AS bigint), 'tracking'), (cast(3 AS bigint), 'home'), (cast(4 AS bigint), 'docs')) AS t("_index_0", "roots")
),
"getitem_where_boolean___27ff4f48eaa6b06c58e2f20b7a17cd41" AS (
SELECT "event_id" AS "event_id",
"day" AS "day",
"moment" AS "moment",
"user_id" AS "user_id",
"location_stack" AS "location_stack",
"event_type" AS "event_type",
"stack_event_types" AS "stack_event_types",
"session_id" AS "session_id",
"session_hit_number" AS "session_hit_number",
"application" AS "application",
"path" AS "path",
cast("application"->0->>'id' AS text) AS "application_id",
cast(coalesce((SELECT jsonb_agg(x.value) FROM jsonb_array_elements("location_stack") WITH ORDINALITY x WHERE ORDINALITY - 1 >= (SELECT min(CASE WHEN ('{"_type": "RootLocationContext"}'::JSONB) <@ value THEN ORDINALITY END) -1 FROM jsonb_array_elements("location_stack") WITH ORDINALITY)), '[]'::JSONB)->0->>'id' AS text) AS "root_location"
FROM "objectiv_sessionized_data___fe4d16265f809df564d6dd8a58aa75d3"
WHERE ((cast(coalesce((SELECT jsonb_agg(x.value) FROM jsonb_array_elements("location_stack") WITH ORDINALITY x WHERE ORDINALITY - 1 >= (SELECT min(CASE WHEN ('{"_type": "RootLocationContext"}'::JSONB) <@ value THEN ORDINALITY END) -1 FROM jsonb_array_elements("location_stack") WITH ORDINALITY)), '[]'::JSONB)->0->>'id' AS text) in (SELECT "roots" AS "roots" FROM "loaded_data___6c36bcef7d60e965af3a13a3db67f6eb")) AND ((cast("application"->0->>'id' AS text) = 'objectiv-docs')))
),
"getitem_having_boolean___8fa053d092b50b7c6c4ba97e1cfa86c9" AS (
SELECT "user_id" AS "user_id",
"session_id" AS "__session_id",
min("moment") AS "moment_min",
max("moment") AS "moment_max",
((max("moment")) - (min("moment"))) AS "session_duration"
FROM "getitem_where_boolean___27ff4f48eaa6b06c58e2f20b7a17cd41"
GROUP BY "user_id",
"session_id"
HAVING (extract(epoch FROM ((max("moment")) - (min("moment")))) > cast(0 AS bigint))
),
"merge_right___18e81c75d63033821f86af3e57e4688b" AS (
SELECT "user_id" AS "user_id",
sum("session_duration") AS "explore_inform_duration"
FROM "getitem_having_boolean___8fa053d092b50b7c6c4ba97e1cfa86c9"
GROUP BY "user_id"
),
"merge_sql___40ed74dd42bc21e986782ebf2e8a61d3" AS (
SELECT COALESCE("l"."user_id", "r"."user_id") AS "user_id",
"l"."session_duration" AS "session_duration",
"r"."explore_inform_duration" AS "explore_inform_duration"
FROM "merge_left___15e2559b44625cd487cbff258f356dfc" AS l
LEFT
JOIN "merge_right___18e81c75d63033821f86af3e57e4688b" AS r
ON ("l"."user_id" = "r"."user_id")
) SELECT "user_id" AS "user_id",
"session_duration" AS "session_duration",
"explore_inform_duration" AS "explore_inform_duration",
CASE WHEN ("explore_inform_duration" > cast('P0DT0H11M30S' AS interval)) THEN '3 - implement'
ELSE CASE WHEN ((("explore_inform_duration" >= cast('P0DT0H1M30S' AS interval))) AND (("explore_inform_duration" <= cast('P0DT0H11M30S' AS interval)))) THEN '2 - explore' ELSE '1 - inform' END
END AS "bucket"
FROM "merge_sql___40ed74dd42bc21e986782ebf2e8a61d3"
That’s it! Join us on Slack if you have any questions or suggestions.
Next Steps
Try the notebooks in Objectiv Up
Spin up a full-fledged product analytics pipeline with Objectiv Up in under 5 minutes, and play with the included example notebooks yourself.
Use this notebook with your own data
You can use the example notebooks on any dataset that was collected with Objectiv’s tracker, so feel free to use them to bootstrap your own projects. They are available as Jupyter notebooks on our GitHub repository. See instructions to set up the Objectiv tracker.
Check out related example notebooks
- Product Analytics notebook - easily run basic product analytics on your data.
- Funnel Discovery notebook - analyze the paths that users take that impact your product goals.