modelhub.ModelHub.aggregate

property aggregate​

Access aggregation methods from the model hub. Same as agg.

class Aggregate​

(mh)

Models that return aggregated data in some form from the original DataFrame with Objectiv data.

static drop_off_locations​

(data, location_stack=None, groupby='user_id', percentage=False)

Find the locations/features where users drop off, and their usage/share.

Parameters​

  • data – bach.DataFrame to apply the method on.

  • location_stack – the slice of the location stack to consider.

  • groupby – sets the column(s) to group by.

  • percentage – if True, return the percentage share instead of the count.

Returns​

bach.DataFrame with the location where users drop off, and the count/percentage.
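
To make the idea concrete, here is a minimal plain-Python sketch. It is not the modelhub implementation. It assumes that a "drop-off location" is the last location at which each user was seen, and the `(user_id, timestamp, location)` tuple layout is hypothetical.

```python
from collections import Counter


def drop_off_locations(events, percentage=False):
    """Count the last-seen location per user.

    events: iterable of (user_id, timestamp, location) tuples
    (a hypothetical layout chosen for this sketch).
    """
    last_seen = {}
    for user_id, ts, location in events:
        # Keep only the latest event for each user.
        if user_id not in last_seen or ts > last_seen[user_id][0]:
            last_seen[user_id] = (ts, location)

    counts = Counter(location for _, location in last_seen.values())
    if percentage:
        total = sum(counts.values())
        return {loc: 100.0 * n / total for loc, n in counts.items()}
    return dict(counts)


events = [
    ("u1", 1, "home"), ("u1", 2, "pricing"),
    ("u2", 1, "home"),
    ("u3", 1, "home"), ("u3", 3, "pricing"),
]
result = drop_off_locations(events)
```

With `percentage=True` the same counts are expressed as a share of all users in the sketch's input.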

frequency​

(data)

Calculate a frequency table for the number of users by number of sessions.

Parameters​

data – bach.DataFrame to apply the method on.

Returns​

series with results.
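
The frequency table described above can be sketched in plain Python as follows. This only illustrates the calculation and is not the modelhub code; the `(user_id, session_id)` pair layout is an assumption made for the example.

```python
from collections import Counter


def frequency(events):
    """Map 'number of sessions' to 'number of users with that many sessions'.

    events: iterable of (user_id, session_id) pairs (hypothetical layout).
    """
    sessions_per_user = {}
    for user_id, session_id in events:
        # A set removes repeated events within the same session.
        sessions_per_user.setdefault(user_id, set()).add(session_id)
    return dict(Counter(len(s) for s in sessions_per_user.values()))


events = [("u1", "s1"), ("u1", "s1"), ("u1", "s2"), ("u2", "s3"), ("u3", "s4")]
table = frequency(events)
```

Here two users have one session each and one user has two sessions.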

retention_matrix​

(data, time_period='monthly', event_type=None, start_date=None, end_date=None, percentage=False, display=True)

Finds the number of users in a given cohort who are active in a given time period, where time is measured from the beginning of each cohort. An “active user” is a user who performed the action we are interested in during that time period. Users are divided into mutually exclusive cohorts, which are then tracked over time. Here, users are assigned to a cohort based on when they first performed the action we are interested in.

Returns the retention matrix DataFrame, which represents users retained across cohorts:

  • the index values represent the cohorts
  • the columns represent the number of time periods elapsed since the start of the cohort
  • the values represent the number (or percentage) of unique active users in a given cohort

To calculate the retention matrix for a specific time range, specify start_date and/or end_date. N.B. a user's activity is traced from the first date the user is seen in the data.

Parameters​

  • data – bach.DataFrame to apply the method on.
  • time_period – can be 'daily', 'weekly', 'monthly' or 'yearly'.
  • event_type – the event/action that we are interested in. Must be a valid event_type (either parent or child). If None, all events generated by the user are used.
  • start_date – start date of the retention matrix, e.g. '2022-04-01'. If None, all the data is used.
  • end_date – end date of the retention matrix, e.g. '2022-05-01'. If None, all the data is used.
  • percentage – if True, calculate percentages relative to the number of users in the cohort; otherwise, return absolute values.
  • display – if True, visualize the retention matrix as a heat map.

Returns​

retention matrix bach DataFrame.
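
The cohort logic can be illustrated with a small plain-Python sketch. It is a simplified stand-in rather than the modelhub implementation: it handles monthly cohorts only, ignores event_type, start_date, end_date and display, and assumes a hypothetical list of `(user_id, date)` pairs as input.

```python
from collections import defaultdict
from datetime import date


def retention_matrix(events, percentage=False):
    """Build {cohort label: {months since cohort start: active users}}.

    events: list of (user_id, date) pairs (hypothetical layout).
    """
    def month_index(day):
        # Months counted linearly, so differences give period offsets.
        return day.year * 12 + (day.month - 1)

    # A user's cohort is the month of their first event.
    first_month = {}
    for user_id, day in events:
        m = month_index(day)
        first_month[user_id] = min(first_month.get(user_id, m), m)

    # cohort -> period offset -> set of users active in that period
    active = defaultdict(lambda: defaultdict(set))
    for user_id, day in events:
        cohort = first_month[user_id]
        active[cohort][month_index(day) - cohort].add(user_id)

    matrix = {}
    for cohort in sorted(active):
        periods = active[cohort]
        cohort_size = len(periods[0])  # every cohort member is active in period 0
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        matrix[label] = {
            offset: (100.0 * len(users) / cohort_size if percentage else len(users))
            for offset, users in sorted(periods.items())
        }
    return matrix


events = [
    ("u1", date(2022, 4, 3)), ("u1", date(2022, 5, 10)),
    ("u2", date(2022, 4, 20)),
    ("u3", date(2022, 5, 1)), ("u3", date(2022, 6, 2)),
]
matrix = retention_matrix(events)
```

In this toy data the April cohort has two users, one of whom returns in the following month, so with `percentage=True` its row reads 100.0 for period 0 and 50.0 for period 1.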

session_duration​

(data, groupby=NotSet.token, exclude_bounces=True, method='mean')

Calculate the duration of sessions.

With the default method, it calculates the mean session duration for each group defined by groupby.

Parameters​

  • data – bach.DataFrame to apply the method on.

  • groupby – sets the column(s) to group by.

    • if not_set it defaults to using ModelHub.time_agg.
    • if None it aggregates over all data.
  • exclude_bounces – if True, only session durations greater than 0 will be considered.

  • method – 'mean' or 'sum'.

Returns​

series with results.
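
As a rough illustration of how exclude_bounces and method interact, the following plain-Python sketch computes durations over all data (comparable to groupby=None). It is not the modelhub implementation. It assumes that a session's duration is the time between its first and last event, and the `(session_id, timestamp_seconds)` layout is hypothetical.

```python
from statistics import mean


def session_duration(events, exclude_bounces=True, method="mean"):
    """Aggregate session durations in seconds.

    events: iterable of (session_id, timestamp_seconds) pairs (hypothetical layout).
    """
    bounds = {}
    for session_id, ts in events:
        lo, hi = bounds.get(session_id, (ts, ts))
        bounds[session_id] = (min(lo, ts), max(hi, ts))

    durations = [hi - lo for lo, hi in bounds.values()]
    if exclude_bounces:
        # A single-event session has duration 0 and is treated as a bounce.
        durations = [d for d in durations if d > 0]

    if method == "mean":
        return mean(durations) if durations else None
    if method == "sum":
        return sum(durations)
    raise ValueError("method must be 'mean' or 'sum'")


events = [("s1", 0), ("s1", 30), ("s2", 100), ("s3", 200), ("s3", 290)]
mean_without_bounces = session_duration(events)
mean_with_bounces = session_duration(events, exclude_bounces=False)
```

Session `s2` has a single event, so it counts towards the mean only when `exclude_bounces=False`.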

top_product_features​

(data, location_stack=None, event_type='InteractiveEvent')

Calculate the top used features in the product.

Parameters​

  • data – bach.DataFrame to apply the method on.

  • location_stack – the location stack

  • event_type – event type. Must be a valid event_type (either parent or child).

Returns​

bach DataFrame with results.
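
A minimal plain-Python sketch of such a ranking is shown below. It is not the modelhub implementation: it assumes that "top used" means ranked by the number of distinct users per feature, it matches event_type exactly instead of resolving parent/child event types, and the `(user_id, feature, event_type)` layout is hypothetical.

```python
def top_product_features(events, event_type="InteractiveEvent"):
    """Rank features by number of distinct users.

    events: iterable of (user_id, feature, event_type) tuples (hypothetical layout).
    """
    users_per_feature = {}
    for user_id, feature, etype in events:
        if etype == event_type:  # exact match only in this sketch
            users_per_feature.setdefault(feature, set()).add(user_id)

    # Most-used first; ties are broken alphabetically by feature name.
    return sorted(
        ((feature, len(users)) for feature, users in users_per_feature.items()),
        key=lambda item: (-item[1], item[0]),
    )


events = [
    ("u1", "search", "InteractiveEvent"),
    ("u2", "search", "InteractiveEvent"),
    ("u1", "menu", "InteractiveEvent"),
    ("u1", "search", "InteractiveEvent"),
    ("u3", "home", "VisibleEvent"),
]
ranking = top_product_features(events)
```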

top_product_features_before_conversion​

(data, name, location_stack=None, event_type='InteractiveEvent')

Calculates what users did before converting by combining several models from the model hub.

Parameters​

  • data – bach.DataFrame to apply the method on.

  • name – label of the conversion event.

  • location_stack – the location stack

  • event_type – event type. Must be a valid event_type (either parent or child).

Returns​

bach DataFrame with results.

unique_sessions​

(data, groupby=NotSet.token)

Calculate the unique sessions in the Objectiv data.

Parameters​

  • data – bach.DataFrame to apply the method on.
  • groupby – sets the column(s) to group by.
    • if not_set it defaults to using ModelHub.time_agg.
    • if None it aggregates over all data.

Returns​

series with results.

unique_users​

(data, groupby=NotSet.token)

Calculate the unique users in the Objectiv data.

Parameters​

  • data – bach.DataFrame to apply the method on.
  • groupby – sets the column(s) to group by.
    • if not_set it defaults to using ModelHub.time_agg.
    • if None it aggregates over all data.

Returns​

series with results.
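
The effect of groupby on a unique count can be sketched in plain Python as below. This is an illustration, not the modelhub code: the dict-based event layout and the `day` key are hypothetical, and the library default of grouping by ModelHub.time_agg is not modelled. Only an explicit column name or None is handled.

```python
def unique_users(events, groupby=None):
    """Count distinct users, overall or per group.

    events: list of dicts with a 'user_id' key plus any grouping keys
    (hypothetical layout).
    """
    if groupby is None:
        # Aggregate over all data.
        return len({event["user_id"] for event in events})

    groups = {}
    for event in events:
        groups.setdefault(event[groupby], set()).add(event["user_id"])
    return {key: len(users) for key, users in groups.items()}


events = [
    {"user_id": "u1", "day": "2022-04-01"},
    {"user_id": "u2", "day": "2022-04-01"},
    {"user_id": "u1", "day": "2022-04-02"},
    {"user_id": "u1", "day": "2022-04-01"},
]
total = unique_users(events)
per_day = unique_users(events, groupby="day")
```

The same pattern applies to unique sessions by collecting session identifiers instead of user identifiers.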