Release: Predict user behavior with the open model hub

Ivar Pruijn

In this release we introduce new models to easily predict and analyze user behavior directly on data collected using the open analytics taxonomy, and to seamlessly switch between the full data set and a sample. No manual cleaning, transformations, sample exports, or complex tooling required.

The first model added to the open model hub is Logistic Regression, to predict user behavior such as:

  • Will a user convert?
  • Will a user start using a specific product feature or area?
  • Will a user have a long active session duration?

In addition, two new models analyze which features are used the most at any stage of your analysis, and what users did before converting.

Logistic Regression

Data collected with Objectiv's tracker is very well-structured, which makes it ideal for various machine learning applications.

The new LogisticRegression model in the open model hub works directly with data collected with Objectiv's tracker, and is based on sklearn's LogisticRegression, with all its parameters supported.

As a simple example, we will predict whether users on our own website will reach the modeling section of our docs, by looking at their interactions with all the other main sections of the website. We’ll use the simple dataframe below, which counts the number of clicks per user in each section, using the root location:

/img/blog/releases/20220609/results-lr-df.png
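
To make the shape of this dataframe concrete, here is a minimal plain-Python sketch of counting clicks per user per root location. The events, user ids, and section names are made up for illustration, and the helper names are hypothetical; with the open model hub this aggregation runs on the tracker data in the database, not on Python lists.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (user_id, root_location). Illustrative data only.
clicks = [
    ("u1", "home"), ("u1", "docs"), ("u1", "docs"),
    ("u2", "blog"), ("u2", "home"),
    ("u3", "docs"), ("u3", "about"), ("u3", "docs"),
]


def clicks_per_section(events):
    """Return {user_id: Counter({root_location: number_of_clicks})}."""
    counts = defaultdict(Counter)
    for user_id, root_location in events:
        counts[user_id][root_location] += 1
    return dict(counts)


def to_rows(counts, sections):
    """Flatten the counts into one row per user, with 0 for unvisited sections."""
    return {user: [per_user[s] for s in sections] for user, per_user in counts.items()}


# Column order of the resulting feature rows.
sections = ["about", "blog", "docs", "home"]
feature_rows = to_rows(clicks_per_section(clicks), sections)
```

Each row then holds one number per section, which is the kind of input a logistic regression can be fitted on.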

See the example notebook for the intermediate steps of sampling the data, initializing the model, and fitting it. Note that for fitting the model, data is extracted from the database under the hood.

We can then create columns for the predicted values and labels in the sampled data set, and show the predictions (the label is True when the predicted probability is above 0.5):

features_set_sample['predicted_values'] = lr.predict_proba(X)
features_set_sample['predicted_labels'] = lr.predict(X)
# show the sampled data set, including predictions
features_set_sample.head(10)

/img/blog/releases/20220609/results-lr-predicted.png
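
As a rough illustration of how the two columns relate, the sketch below computes a logistic-regression probability by hand and derives the label from a 0.5 threshold. The weights, intercept, and click counts are invented for this example; a fitted model learns its own coefficients from the data.

```python
import math


def predict_proba(click_counts, weights, intercept):
    """Probability of the positive class: sigmoid of the linear combination."""
    z = intercept + sum(w * x for w, x in zip(weights, click_counts))
    return 1.0 / (1.0 + math.exp(-z))


def predict(click_counts, weights, intercept, threshold=0.5):
    """Label is True when the predicted probability exceeds the threshold."""
    return predict_proba(click_counts, weights, intercept) > threshold


# Hypothetical coefficients for three sections (illustrative, not fitted values).
weights = [0.8, -0.2, 0.5]
intercept = -1.5

user_a = [3, 0, 1]  # linear score: -1.5 + 2.4 + 0.0 + 0.5 = 1.4
user_b = [0, 2, 0]  # linear score: -1.5 - 0.4 = -1.9

label_a = predict(user_a, weights, intercept)
label_b = predict(user_b, weights, intercept)
```

A positive linear score gives a probability above 0.5 and a True label; a negative score gives a False label.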

Now that we have the model results, the data can easily be unsampled to work with the full data set, and its SQL exported to run in production:

features_set_full = features_set_sample.get_unsampled()
display_sql_as_markdown(features_set_full)
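
One way to picture why switching between a sample and the full data set is cheap: the transformations form a recipe that can be replayed on either source. The sketch below is a conceptual illustration in plain Python, not how the model hub is implemented; the `Recipe` class, its methods, and the data are all hypothetical.

```python
import random


class Recipe:
    """Hypothetical: records row-wise steps so they can run on any source."""

    def __init__(self, source, steps=()):
        self.source = source       # list of rows (dicts)
        self.steps = tuple(steps)  # functions applied to each row, in order

    def with_step(self, step):
        # Return a new recipe with the extra step; nothing is computed yet.
        return Recipe(self.source, self.steps + (step,))

    def on_source(self, other_source):
        # Same steps, different data: the analogue of moving from sample to full.
        return Recipe(other_source, self.steps)

    def run(self):
        # Copy the rows so the source data is never modified.
        rows = [dict(row) for row in self.source]
        for step in self.steps:
            rows = [step(row) for row in rows]
        return rows


# Illustrative data: 1000 users, and a reproducible sample of 50 of them.
full = [{"user": i, "clicks": i % 7} for i in range(1000)]
sample = random.Random(0).sample(full, 50)


def flag_active(row):
    # Illustrative derived column.
    return {**row, "is_active": row["clicks"] >= 3}


on_sample = Recipe(sample).with_step(flag_active)
on_full = on_sample.on_source(full)
```

The analysis is developed against `on_sample` and then rerun unchanged through `on_full`, which mirrors the workflow above in spirit only.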

That’s all there is to running a logistic regression model on the full data set collected with Objectiv’s tracker, using the open model hub.

Top Used Product Features

The second model added to the open model hub is top_product_features. It enables you to understand which features are used the most in your full product, a subset of your product (using the location stack), or a selection of users (e.g. new users).

As an example:

top_product_features = modelhub.aggregate.top_product_features(df)
top_product_features.head()

... outputs the most used features overall:

/img/blog/releases/20220609/results-tupf-overall.png

You can narrow it down to a selection of users, e.g. new users:

df['is_new_user'] = modelhub.map.is_new_user(df)
top_product_features_new_users = modelhub.aggregate.top_product_features(df[df['is_new_user']])
top_product_features_new_users.head()

/img/blog/releases/20220609/results-tupf-new-users.png

Or you can analyze a subset of your product, by using the location stack. For example, we can see the top used features on our blog, using the root location:

# Slice on the blog
top_product_features_blog_section = modelhub.aggregate.top_product_features(df[df.root_location == 'blog'])
top_product_features_blog_section.head()

/img/blog/releases/20220609/results-tupf-blog.png
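
To illustrate the idea behind these three calls, here is a small plain-Python sketch that ranks features and then applies the same ranking to a filtered subset. It assumes "most used" means the number of distinct users per feature, which is an assumption for this sketch rather than a statement about the model's exact output; the events and field names are invented.

```python
from collections import defaultdict

# Hypothetical events; field names are illustrative only.
events = [
    {"user": "u1", "root_location": "home", "feature": "nav-docs", "new_user": True},
    {"user": "u1", "root_location": "blog", "feature": "read-more", "new_user": True},
    {"user": "u2", "root_location": "home", "feature": "nav-docs", "new_user": False},
    {"user": "u2", "root_location": "home", "feature": "nav-docs", "new_user": False},
    {"user": "u3", "root_location": "blog", "feature": "read-more", "new_user": False},
    {"user": "u3", "root_location": "blog", "feature": "share", "new_user": False},
    {"user": "u3", "root_location": "home", "feature": "nav-docs", "new_user": False},
]


def top_features(rows):
    """Rank features by number of distinct users, highest first."""
    users = defaultdict(set)
    for row in rows:
        users[row["feature"]].add(row["user"])
    ranked = [(feature, len(u)) for feature, u in users.items()]
    # Sort by user count descending, then by name for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


overall = top_features(events)
new_users = top_features([e for e in events if e["new_user"]])
blog_only = top_features([e for e in events if e["root_location"] == "blog"])
```

Slicing is just filtering the input before the same aggregation runs, which is why the model calls above differ only in the dataframe they receive.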

Top Product Features Before Conversion

The final model added to the open model hub is top_product_features_before_conversion. It calculates what users did before converting. You can specify which Event represents conversion, and optionally a subset of the location stack you want to know about.

As an example we can calculate which features were most used before clicking a link leading to our blog:

top_features_before_conversion = modelhub.agg.top_product_features_before_conversion(df, name='blog_press')
top_features_before_conversion.head()

/img/blog/releases/20220609/results-tfbc.png

Similar to the top_product_features model, you can also slice on subsets of your product (using the location stack) or a selection of users.
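
The underlying idea can be sketched in plain Python: find each user's first conversion event, keep only what happened before it, and count. The event data, the `first_conversion_time` helper, and the choice to count distinct users are assumptions for illustration; the model itself works on the tracker data in your database.

```python
from collections import defaultdict

# Hypothetical (user, timestamp, feature, is_conversion) events.
events = [
    ("u1", 1, "nav-docs", False),
    ("u1", 2, "hero-cta", False),
    ("u1", 3, "blog-link", True),
    ("u1", 4, "share", False),     # after conversion, ignored
    ("u2", 1, "hero-cta", False),
    ("u2", 2, "blog-link", True),
    ("u3", 1, "nav-docs", False),  # u3 never converts, ignored
]


def first_conversion_time(rows):
    """Map each converting user to the timestamp of their first conversion."""
    first = {}
    for user, ts, _feature, is_conversion in rows:
        if is_conversion and (user not in first or ts < first[user]):
            first[user] = ts
    return first


def top_features_before_conversion(rows):
    """Rank features by distinct users who used them before converting."""
    first = first_conversion_time(rows)
    users = defaultdict(set)
    for user, ts, feature, _is_conversion in rows:
        if user in first and ts < first[user]:
            users[feature].add(user)
    ranked = [(feature, len(u)) for feature, u in users.items()]
    # Sort by user count descending, then by name for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


before_conversion = top_features_before_conversion(events)
```

Events after the conversion and events from users who never convert are left out of the ranking.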

How to get it

The three new models are now live in the open model hub. To use them, install the package from PyPI:

pip install objectiv-modelhub

If you already have the package installed, don't forget to upgrade:

pip install --upgrade objectiv-modelhub

Tip: If you want to test run these models on your own product without worrying about the Ops part, reach out to us to get a Launchpad: a free-to-use, fully managed Objectiv back-end and data store without any setup or configuration.

Introducing: Release Office Hours

If you have any questions about this release or anything else, or if you just want to say 'Hi!' to team Objectiv, we now hold Release Office Hours every Thursday at 4pm CET (10am EST), and anyone is free to dial in. If that time doesn’t fit your timezone, just ping us on Slack and we'll send over an invite for a better time.

Join the Release Office Hours

Try Objectiv

Get Objectiv Up - Try Objectiv on your local machine (takes 5 minutes)
Objectiv on GitHub - Check out the project and star us for future reference
Objectiv on Slack - Join the discussion or get help