modelhub.ModelHub.aggregate

property aggregate

Access aggregation methods from the model hub. Same as agg.

class Aggregate

(mh)

Models that return aggregated data in some form from the original DataFrame with Objectiv data.

static drop_off_locations

(data, location_stack=None, groupby='user_id', percentage=False)

Find the locations/features where users drop off, together with the number or share of users dropping off at each of them.

Parameters

  • data – bach.DataFrame to apply the method on.

  • location_stack – the column from which to create the drop-off locations. Can be a string of the name of the column in data, or a Series with the same base node as data. If None, the default location stack is taken.

  • groupby – sets the column(s) to group by.

  • percentage – if True, return percentages instead of counts.

Returns

bach.DataFrame with the location where users drop off, and the count/percentage.
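To make the idea concrete, here is a minimal plain-Python sketch; it is not the model hub's implementation, which operates on bach DataFrames. It assumes events arrive as hypothetical (user_id, moment, location) tuples and treats each user's drop-off location as the location of their last event. The function name drop_off_counts is illustrative only.

```python
from collections import Counter


def drop_off_counts(events, percentage=False):
    """Count users by the location of their last event.

    Hypothetical input format: iterable of (user_id, moment, location) tuples.
    """
    last_seen = {}  # user_id -> (moment, location) of the latest event seen so far
    for user_id, moment, location in events:
        if user_id not in last_seen or moment > last_seen[user_id][0]:
            last_seen[user_id] = (moment, location)

    # Each user contributes exactly one drop-off location.
    counts = Counter(location for _, location in last_seen.values())
    if not percentage:
        return dict(counts)
    total = sum(counts.values())
    return {loc: 100.0 * n / total for loc, n in counts.items()}


events = [
    ("u1", 1, "home"), ("u1", 2, "pricing"),
    ("u2", 1, "home"),
    ("u3", 1, "home"), ("u3", 2, "pricing"), ("u3", 3, "signup"),
    ("u4", 1, "home"), ("u4", 2, "pricing"),
]
print(drop_off_counts(events))                   # {'pricing': 2, 'home': 1, 'signup': 1}
print(drop_off_counts(events, percentage=True))  # {'pricing': 50.0, 'home': 25.0, 'signup': 25.0}
```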

frequency

(data)

Calculate a frequency table for the number of users by number of sessions.

Parameters

  • data – bach.DataFrame to apply the method on.

Returns

bach.Series with results.
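A minimal plain-Python sketch of what such a frequency table contains, assuming hypothetical (user_id, session_id) pairs as input: first count distinct sessions per user, then count how many users share each session count. The name session_frequency is illustrative, not part of the model hub.

```python
from collections import Counter


def session_frequency(sessions):
    """Map 'number of sessions' -> 'number of users with that many sessions'.

    Hypothetical input format: iterable of (user_id, session_id) pairs,
    possibly with duplicates (one pair per event).
    """
    # Deduplicate pairs, then count distinct sessions per user.
    per_user = Counter(user_id for user_id, _ in set(sessions))
    # Count how many users have each session count, sorted by session count.
    return dict(sorted(Counter(per_user.values()).items()))


sessions = [
    ("u1", 1), ("u1", 2), ("u1", 2),   # u1: 2 sessions (duplicate pair ignored)
    ("u2", 3),                         # u2: 1 session
    ("u3", 4), ("u3", 5), ("u3", 6),   # u3: 3 sessions
    ("u4", 7),                         # u4: 1 session
]
print(session_frequency(sessions))  # {1: 2, 2: 1, 3: 1}
```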

funnel_conversion

(data, location_stack=None, groupby=None)

Calculates conversion numbers for all location stacks in the data. N.B. Filter the DataFrame beforehand so that it only contains the funnel locations.

For each step in a funnel, this calculates: the number of unique users who started the step; the number of unique users who completed the step (defined as whether the user went to any other step in the funnel); the conversion rate to completing the step; the conversion rate to completing the step relative to all users who started the funnel (the ‘full’ conversion rate); and the fraction of the users in the funnel dropping out at the given step.

N.B. We assume that the funnel direction is always the same. Implementing VisibleEvents gives the most accurate conversion numbers, as both the number of users and the conversion rate are based on the events on each location stack.

Parameters

  • data – The bach.DataFrame to apply the operation on.
  • location_stack – The column that holds the steps in the funnel. Can be:
    • A string of the name of the column in data.
    • Any slice of a modelhub.SeriesLocationStack type column.
    • A Series with the same base node as data.

    If its value is None, the whole location stack is taken.

  • groupby – sets the column(s) to group by. This is also handy later for filtering the results.

Returns

bach.DataFrame with the following columns:

  • step – the location considered as a step, e.g. a feature or root location.
  • n_users – number of unique users starting the step.
  • n_users_completed_step – number of unique users completing the step.
  • step_conversion_rate – number of users completing the step / n_users.
  • full_conversion_rate – number of users completing the step / number of users starting the funnel.
  • dropoff_share – ratio between the users dropping out at a given step and the users at the beginning of the funnel.
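The sketch below illustrates how the output columns relate to each other, in plain Python rather than bach. Its assumptions are not taken from the model hub: each user's journey is a hypothetical ordered list of funnel steps (already filtered down to the funnel locations), a step counts as completed if the user later visits a different step, and the first entry of steps is the start of the funnel. The function funnel_conversion_sketch is hypothetical.

```python
def funnel_conversion_sketch(journeys, steps):
    """Per-step funnel metrics.

    journeys: dict user_id -> ordered list of funnel steps the user visited
              (hypothetical format).
    steps:    funnel steps in funnel order; steps[0] is assumed to be the start.
    """
    started = {s: set() for s in steps}
    completed = {s: set() for s in steps}
    for user, path in journeys.items():
        for i, step in enumerate(path):
            started[step].add(user)
            # Assumption: a step is completed if the user moves on to any other step later.
            if any(nxt != step for nxt in path[i + 1:]):
                completed[step].add(user)

    n_start = len(started[steps[0]])  # users starting the funnel
    rows = []
    for s in steps:
        n_users = len(started[s])
        n_done = len(completed[s])
        rows.append({
            "step": s,
            "n_users": n_users,
            "n_users_completed_step": n_done,
            "step_conversion_rate": n_done / n_users if n_users else 0.0,
            "full_conversion_rate": n_done / n_start if n_start else 0.0,
            "dropoff_share": (n_users - n_done) / n_start if n_start else 0.0,
        })
    return rows


journeys = {
    "u1": ["home", "pricing", "signup"],
    "u2": ["home", "pricing"],
    "u3": ["home"],
    "u4": ["home", "pricing", "signup"],
}
steps = ["home", "pricing", "signup"]
for row in funnel_conversion_sketch(journeys, steps):
    print(row)
```

In this example the dropoff_share values (0.25, 0.25 and 0.5) sum to 1, because every user enters at the first step and only moves forward through the funnel.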

retention_matrix

(data, time_period='monthly', event_type=None, start_date=None, end_date=None, percentage=False, display=True)

Finds the number of users in a given cohort who are active in a given time period, where time is computed with respect to the beginning of each cohort. An “active user” is a user who performed an action of interest in that time period. Users are divided into mutually exclusive cohorts, which are then tracked over time. In our case, users are assigned to a cohort based on when they first performed the action of interest.

Returns the retention matrix DataFrame, which represents users retained across cohorts:

  • index values represent the cohorts
  • columns represent the number of time periods elapsed since the start of the cohort
  • values represent the number (or percentage) of unique active users of a given cohort

The retention matrix can be calculated for a given time range by specifying start_date and/or end_date. N.B. a user's activity is traced from the first date the user is seen in the data.

Parameters

  • data – bach.DataFrame to apply the method on.
  • time_period – can be ‘daily’, ‘weekly’, ‘monthly’ or ‘yearly’.
  • event_type – the event/action that we are interested in. Must be a valid event_type (either parent or child). If None, all the events generated by the user are taken.
  • start_date – start date of the retention matrix, e.g. ‘2022-04-01’. If None, all the data is taken.
  • end_date – end date of the retention matrix, e.g. ‘2022-05-01’. If None, all the data is taken.
  • percentage – if True, calculate percentages with respect to the number of users in the cohort; otherwise, keep the absolute values.
  • display – if True, visualize the retention matrix as a heat map.

Returns

Retention matrix as a bach.DataFrame.
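A minimal plain-Python sketch of the monthly case (time_period='monthly'), assuming hypothetical (user_id, date) pairs that already contain only the action of interest. Each user's cohort is the month of their first action, and each column counts whole months since that cohort month. The name retention_matrix_sketch is illustrative; start_date/end_date filtering and the heat map are left out.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Months since year 0, so that differences give whole-month offsets."""
    return d.year * 12 + (d.month - 1)


def retention_matrix_sketch(activity, percentage=False):
    """Return {cohort 'YYYY-MM': {months_since_cohort_start: value}}.

    Hypothetical input format: iterable of (user_id, date) pairs.
    """
    activity = list(activity)

    # Cohort assignment: the date of each user's first action.
    first = {}
    for user, d in activity:
        if user not in first or d < first[user]:
            first[user] = d

    # cohort -> period offset -> set of active users
    active = defaultdict(lambda: defaultdict(set))
    for user, d in activity:
        c = first[user]
        cohort = f"{c.year:04d}-{c.month:02d}"
        active[cohort][month_index(d) - month_index(c)].add(user)

    matrix = {}
    for cohort in sorted(active):
        # Every cohort member is active at offset 0 by construction.
        cohort_size = len(active[cohort][0])
        matrix[cohort] = {
            offset: (100.0 * len(users) / cohort_size if percentage else len(users))
            for offset, users in sorted(active[cohort].items())
        }
    return matrix


activity = [
    ("u1", date(2022, 1, 5)), ("u1", date(2022, 2, 10)), ("u1", date(2022, 3, 1)),
    ("u2", date(2022, 1, 20)), ("u2", date(2022, 3, 15)),
    ("u3", date(2022, 2, 2)), ("u3", date(2022, 3, 3)),
]
print(retention_matrix_sketch(activity))
# {'2022-01': {0: 2, 1: 1, 2: 2}, '2022-02': {0: 1, 1: 1}}
print(retention_matrix_sketch(activity, percentage=True))
# {'2022-01': {0: 100.0, 1: 50.0, 2: 100.0}, '2022-02': {0: 100.0, 1: 100.0}}
```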

session_duration

(data, groupby=NotSet.token, exclude_bounces=True, method='mean')

Calculate the duration of sessions.

With the default method, it calculates the mean session duration for each group defined by groupby.

Parameters

  • data – bach.DataFrame to apply the method on.

  • groupby – sets the column(s) to group by.

    • if not set, it defaults to using ModelHub.time_agg.
    • if None, it aggregates over all data.
  • exclude_bounces – if True, only sessions with a duration greater than 0 are considered.

  • method – ‘mean’ or ‘sum’.

Returns

bach.Series with results.
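A minimal plain-Python sketch, assuming a session's duration is the time between its first and last event and that input arrives as hypothetical (session_id, moment) pairs. Here group_of is an illustrative stand-in for groupby, keyed on each session's start moment; passing None aggregates over all data. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def session_duration_sketch(events, group_of=None, exclude_bounces=True, method="mean"):
    """Aggregate session durations (in seconds) per group.

    events:   iterable of (session_id, moment) pairs, moment a datetime (hypothetical format).
    group_of: function mapping a session's start moment to a group key; None = all data.
    """
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")

    # First and last moment per session.
    bounds = {}
    for session_id, moment in events:
        lo, hi = bounds.get(session_id, (moment, moment))
        bounds[session_id] = (min(lo, moment), max(hi, moment))

    per_group = defaultdict(list)
    for start, end in bounds.values():
        seconds = (end - start).total_seconds()
        if exclude_bounces and seconds <= 0:
            continue  # a bounce: session with zero duration
        key = group_of(start) if group_of else None
        per_group[key].append(seconds)

    agg = (lambda xs: sum(xs) / len(xs)) if method == "mean" else sum
    return {key: agg(durations) for key, durations in per_group.items()}


events = [
    (1, datetime(2022, 5, 1, 10, 0, 0)), (1, datetime(2022, 5, 1, 10, 5, 0)),  # 300 s
    (2, datetime(2022, 5, 1, 11, 0, 0)),                                       # bounce
    (3, datetime(2022, 5, 2, 9, 0, 0)), (3, datetime(2022, 5, 2, 9, 1, 40)),   # 100 s
]
print(session_duration_sketch(events))                            # {None: 200.0}
print(session_duration_sketch(events, group_of=lambda m: m.date()))
```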

top_product_features

(data, location_stack=None, event_type='InteractiveEvent')

Calculate the most used features in the product.

Parameters

  • data – bach.DataFrame to apply the method on.

  • location_stack – the location stack

  • event_type – event type. Must be a valid event_type (either parent or child).

Returns

bach.DataFrame with results.
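As an illustration, the sketch below ranks features by the number of unique users who triggered an event of the given type. Both the metric (unique users) and the input format are assumptions: each hypothetical record is (user_id, feature, event_types), where event_types lists the event's own type plus its parent types, so that a parent or a child type can be matched. The event type hierarchy and feature labels in the example are illustrative, and top_features_sketch is a hypothetical name.

```python
from collections import defaultdict


def top_features_sketch(events, event_type="InteractiveEvent"):
    """Return (feature, unique_users) pairs, most used first.

    Hypothetical input format: iterable of (user_id, feature, event_types),
    where event_types holds the event's own type and its parent types.
    """
    users = defaultdict(set)
    for user_id, feature, event_types in events:
        if event_type in event_types:  # matches either the child or a parent type
            users[feature].add(user_id)
    # Sort by descending user count, then by feature name for a stable order.
    return sorted(((f, len(u)) for f, u in users.items()), key=lambda p: (-p[1], p[0]))


PRESS = ("PressEvent", "InteractiveEvent")       # illustrative child + parent types
VISIBLE = ("VisibleEvent", "NonInteractiveEvent")
events = [
    ("u1", "Link: docs", PRESS),
    ("u2", "Link: docs", PRESS),
    ("u1", "Link: docs", PRESS),       # repeated use by u1 counts once
    ("u1", "Button: signup", PRESS),
    ("u3", "Section: hero", VISIBLE),  # not an InteractiveEvent
]
print(top_features_sketch(events))  # [('Link: docs', 2), ('Button: signup', 1)]
```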

top_product_features_before_conversion

(data, name, location_stack=None, event_type='InteractiveEvent')

Calculates what users did before converting by combining several models from the model hub.

Parameters

  • data – bach.DataFrame to apply the method on.

  • name – label of the conversion event.

  • location_stack – the location stack

  • event_type – event type. Must be a valid event_type (either parent or child).

Returns

bach.DataFrame with results.
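A minimal plain-Python sketch of the idea, assuming each hypothetical event record carries a label that equals name for conversion events. For every feature, it counts the converted users who used it strictly before their first conversion; users who never convert are ignored. The record format and features_before_conversion_sketch are illustrative, not the model hub's API.

```python
from collections import defaultdict


def features_before_conversion_sketch(events, name, event_type="InteractiveEvent"):
    """Return (feature, unique_users) pairs for features used before converting.

    Hypothetical input format: iterable of
    (user_id, moment, feature, event_types, label) tuples, where label is
    the conversion label of the event or None.
    """
    events = list(events)

    # Moment of each user's first conversion event.
    first_conversion = {}
    for user_id, moment, _, _, label in events:
        if label == name and (user_id not in first_conversion or moment < first_conversion[user_id]):
            first_conversion[user_id] = moment

    users = defaultdict(set)
    for user_id, moment, feature, event_types, _ in events:
        converted_at = first_conversion.get(user_id)
        if converted_at is not None and moment < converted_at and event_type in event_types:
            users[feature].add(user_id)
    return sorted(((f, len(u)) for f, u in users.items()), key=lambda p: (-p[1], p[0]))


PRESS = ("PressEvent", "InteractiveEvent")  # illustrative child + parent types
events = [
    ("u1", 1, "Link: docs", PRESS, None),
    ("u1", 2, "Button: signup", PRESS, "signup"),
    ("u1", 3, "Link: blog", PRESS, None),        # after conversion: ignored
    ("u2", 1, "Link: docs", PRESS, None),
    ("u2", 2, "Link: pricing", PRESS, None),
    ("u2", 3, "Button: signup", PRESS, "signup"),
    ("u3", 1, "Link: blog", PRESS, None),        # never converts: ignored
]
print(features_before_conversion_sketch(events, "signup"))
# [('Link: docs', 2), ('Link: pricing', 1)]
```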

unique_sessions

(data, groupby=NotSet.token)

Calculate the number of unique sessions in the Objectiv data.

Parameters

  • data – bach.DataFrame to apply the method on.
  • groupby – sets the column(s) to group by.
    • if not set, it defaults to using ModelHub.time_agg.
    • if None, it aggregates over all data.

Returns

bach.Series with results.
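A minimal plain-Python sketch of the two groupby modes, assuming hypothetical (session_id, moment) pairs with ISO-8601 timestamp strings. Here group_of stands in for groupby: a function that truncates the timestamp to the day is loosely analogous to a daily time aggregation, and None aggregates over all data. The function name is illustrative.

```python
from collections import defaultdict


def unique_sessions_sketch(events, group_of=None):
    """Count distinct session ids per group.

    events:   iterable of (session_id, moment) pairs (hypothetical format).
    group_of: function mapping a moment to a group key; None = all data.
    """
    sessions = defaultdict(set)
    for session_id, moment in events:
        key = group_of(moment) if group_of else None
        sessions[key].add(session_id)
    return {key: len(ids) for key, ids in sessions.items()}


events = [
    (1, "2022-05-01T10:00"), (1, "2022-05-01T10:03"),  # session 1 counted once
    (2, "2022-05-01T12:00"),
    (3, "2022-05-02T09:00"),
]
print(unique_sessions_sketch(events))                             # {None: 3}
print(unique_sessions_sketch(events, group_of=lambda m: m[:10]))  # {'2022-05-01': 2, '2022-05-02': 1}
```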

unique_users

(data, groupby=NotSet.token)

Calculate the number of unique users in the Objectiv data.

Parameters

  • data – bach.DataFrame to apply the method on.
  • groupby – sets the column(s) to group by.
    • if not set, it defaults to using ModelHub.time_agg.
    • if None, it aggregates over all data.

Returns

bach.Series with results.
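A minimal plain-Python sketch of counting unique users per group, grouping by one or more columns or, with groupby=None, over all data. Rows are hypothetical dicts with a user_id key, and unique_users_sketch is an illustrative name. A user seen several times within a group is counted once.

```python
from collections import defaultdict


def unique_users_sketch(rows, groupby=None):
    """Count distinct user ids per group.

    rows:    iterable of dicts with a 'user_id' key (hypothetical format).
    groupby: list of column names to group by, or None to aggregate over all data.
    """
    users = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in groupby) if groupby else None
        users[key].add(row["user_id"])
    return {key: len(ids) for key, ids in users.items()}


rows = [
    {"user_id": "u1", "day": "2022-05-01", "application": "website"},
    {"user_id": "u1", "day": "2022-05-01", "application": "docs"},
    {"user_id": "u2", "day": "2022-05-01", "application": "website"},
    {"user_id": "u1", "day": "2022-05-02", "application": "website"},
]
print(unique_users_sketch(rows))                            # {None: 2}
print(unique_users_sketch(rows, groupby=["day"]))           # {('2022-05-01',): 2, ('2022-05-02',): 1}
print(unique_users_sketch(rows, groupby=["day", "application"]))
```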