modelhub.ModelHub.aggregate
property aggregate
Access aggregation methods from the model hub. Same as agg.
class Aggregate(mh)
Models that return aggregated data in some form from the original DataFrame with Objectiv data.
static drop_off_locations(data, location_stack=None, groupby='user_id', percentage=False)
Find the locations/features where users drop off, and their usage/share.
Parameters
data: bach.DataFrame to apply the method on.
location_stack: the column of which to create the drop-off locations. Can be:
- A string of the name of the column in data.
- Any slice of a modelhub.SeriesLocationStack type column.
- A Series with the same base node as data.
If None, the whole (default) location stack is taken.
groupby: sets the column(s) to group by.
percentage: if True, calculate the percentage.
Returns
bach.DataFrame with the locations where users drop off, and the count/percentage.
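The sketch below is a rough, stand-alone illustration of the idea. It is not the modelhub implementation. It assumes that a user's drop-off location is the location of their last event. Instead of a bach.DataFrame, it works on plain (user_id, moment, location) tuples. The function name and input format are hypothetical.

```python
from collections import Counter


def drop_off_locations_sketch(events, percentage=False):
    """Count, per location, the users whose last event happened there.

    Hypothetical input: an iterable of (user_id, moment, location) tuples,
    where moment is any comparable timestamp.
    """
    last_seen = {}  # user_id -> (moment, location) of the user's latest event
    for user_id, moment, location in events:
        if user_id not in last_seen or moment > last_seen[user_id][0]:
            last_seen[user_id] = (moment, location)

    counts = Counter(location for _, location in last_seen.values())
    if not percentage:
        return dict(counts)
    total = sum(counts.values())
    return {location: 100 * n / total for location, n in counts.items()}


events = [
    ("u1", 1, "home"), ("u1", 2, "pricing"),
    ("u2", 1, "home"),
    ("u3", 1, "home"), ("u3", 2, "pricing"), ("u3", 3, "signup"),
    ("u4", 1, "pricing"),
]
print(drop_off_locations_sketch(events))  # {'pricing': 2, 'home': 1, 'signup': 1}
print(drop_off_locations_sketch(events, percentage=True))
```

Grouping per user before taking the last event mirrors the groupby='user_id' default described above.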
frequency(data)
Calculate a frequency table for the number of users by number of sessions.
Parameters
data: bach.DataFrame to apply the method on.
Returns
Series with results.
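Here is a minimal sketch of the frequency table. It counts how many users had exactly one session, two sessions, and so on. The (user_id, session_id) input format and the function name are hypothetical, and this is not the library's code.

```python
from collections import Counter


def frequency_sketch(events):
    """Return {number_of_sessions: number_of_users}, sorted by session count.

    Hypothetical input: an iterable of (user_id, session_id) pairs, one per event.
    """
    sessions_per_user = {}  # user_id -> set of distinct session ids
    for user_id, session_id in events:
        sessions_per_user.setdefault(user_id, set()).add(session_id)
    counts = Counter(len(sessions) for sessions in sessions_per_user.values())
    return dict(sorted(counts.items()))


events = [("u1", 1), ("u1", 1), ("u1", 2), ("u2", 3), ("u3", 4), ("u3", 5)]
print(frequency_sketch(events))  # {1: 1, 2: 2}
```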
funnel_conversion(data, location_stack=None, groupby=None)
Calculates conversion numbers for all location stacks in the data.
N.B. Filter the DataFrame beforehand so that it contains only the funnel locations.
For each step in a funnel, it calculates:
- the number of unique users who started the step,
- the number of unique users who completed the step (a user completes a step if they went on to any other step in the funnel),
- the step conversion rate: the share of the users who started the step and also completed it,
- the 'full' conversion rate: the number of users completing the step relative to all users who started the funnel, and
- the drop-off share: the fraction of the users in the funnel dropping out at the given step.
N.B. We assume that the funnel direction is always the same. Tracking VisibleEvents gives the most accurate conversion numbers, as both the number of users and the conversion rate are based on the events on each location stack.
Parameters
data: the bach.DataFrame to apply the operation on.
location_stack: the column that holds the steps in the funnel. Can be:
- A string of the name of the column in data.
- Any slice of a modelhub.SeriesLocationStack type column.
- A Series with the same base node as data.
If None, the whole location stack is taken.
groupby: sets the column(s) to group by. This is also useful for filtering the results later.
Returns
bach.DataFrame with the following columns:
- step (the location considered as a step, e.g. a feature or root location),
- n_users (number of unique users starting the step),
- n_users_completed_step (number of unique users completing the step),
- step_conversion_rate (number of users completing the step / n_users),
- full_conversion_rate (number of users completing the step / number of users starting the funnel), and
- dropoff_share (ratio between the users dropping out at a given step and the users at the beginning of the funnel).
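The sketch below shows how the returned columns relate to each other. It is pure Python and is not the modelhub implementation. It makes the following assumptions:
- Each user's journey is a chronologically ordered list of funnel steps.
- "Users starting the funnel" means the users seen at the first step.
- A step is completed when the user later moves on to any other step.

The function name and input format are hypothetical.

```python
def funnel_conversion_sketch(journeys, steps):
    """Per-step funnel numbers, one dict per step in funnel order.

    Hypothetical input:
      journeys: {user_id: [step, step, ...]} ordered by time
      steps: the funnel steps in funnel order; steps[0] is where the funnel starts
    """
    started = {step: set() for step in steps}
    completed = {step: set() for step in steps}
    for user_id, visits in journeys.items():
        # keep only funnel steps, mirroring "filter the DataFrame beforehand"
        visits = [v for v in visits if v in started]
        for i, step in enumerate(visits):
            started[step].add(user_id)
            # a step counts as completed if the user later moves on to any other step
            if any(later != step for later in visits[i + 1:]):
                completed[step].add(user_id)

    funnel_start = len(started[steps[0]])  # assumption: users seen at the first step
    rows = []
    for step in steps:
        n_users = len(started[step])
        n_completed = len(completed[step])
        rows.append({
            "step": step,
            "n_users": n_users,
            "n_users_completed_step": n_completed,
            "step_conversion_rate": n_completed / n_users if n_users else 0.0,
            "full_conversion_rate": n_completed / funnel_start if funnel_start else 0.0,
            "dropoff_share": (n_users - n_completed) / funnel_start if funnel_start else 0.0,
        })
    return rows


journeys = {
    "u1": ["home", "pricing", "signup"],
    "u2": ["home", "pricing"],
    "u3": ["home"],
    "u4": ["home", "home", "pricing"],
}
for row in funnel_conversion_sketch(journeys, ["home", "pricing", "signup"]):
    print(row)
```

In this toy data, every user eventually drops out at exactly one step. As a result, the dropoff_share values add up to 1.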
retention_matrix(data, time_period='monthly', event_type=None, start_date=None, end_date=None, percentage=False, display=True)
Finds the number of users in a given cohort who are active in a given time period, where time is computed with respect to the beginning of each cohort. An 'active user' is a user who performed an action of interest in that time period. Users are divided into mutually exclusive cohorts, which are then tracked over time. Here, users are assigned to a cohort based on when they first performed an action of interest.
Returns the retention matrix DataFrame, which represents users retained across cohorts:
- index values represent the cohort
- columns represent the number of time periods elapsed since the start of the cohort
- values represent the number (or percentage) of unique active users of a given cohort
To calculate the retention matrix for a given time range, specify start_date and/or end_date.
N.B. A user's activity is traced from the first date the user is seen in the data.
Parameters
data: bach.DataFrame to apply the method on.
time_period: can be 'daily', 'weekly', 'monthly' or 'yearly'.
event_type: the event/action that we are interested in. Must be a valid event_type (either parent or child). If None, all the events generated by the user are taken.
start_date: start date of the retention matrix, e.g. '2022-04-01'. If None, all the data is taken.
end_date: end date of the retention matrix, e.g. '2022-05-01'. If None, all the data is taken.
percentage: if True, calculate the percentage with respect to the number of users in the cohort; otherwise, keep the absolute values.
display: if True, visualize the retention matrix as a heat map.
Returns
Retention matrix bach.DataFrame.
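To make the cohort logic concrete, here is a minimal pure-Python sketch that covers only the monthly case. It is not the modelhub implementation. Each user's cohort is the month of their first action. Each column is the number of months elapsed since that cohort month. The (user_id, date) input, the 'YYYY-MM' cohort labels, and the function names are hypothetical.

```python
from datetime import date


def _month_index(day):
    """Months since year 0, so that month differences are simple subtractions."""
    return day.year * 12 + day.month - 1


def retention_matrix_sketch(events, percentage=False):
    """Monthly retention: {cohort 'YYYY-MM': {months_since_cohort: active users}}.

    Hypothetical input: an iterable of (user_id, date) pairs, one per action of interest.
    """
    events = list(events)
    first_month = {}  # user_id -> month index of the user's first action
    for user_id, day in events:
        month = _month_index(day)
        if user_id not in first_month or month < first_month[user_id]:
            first_month[user_id] = month

    active = {}  # (cohort, period) -> set of active user_ids
    for user_id, day in events:
        cohort = first_month[user_id]
        period = _month_index(day) - cohort
        active.setdefault((cohort, period), set()).add(user_id)

    matrix = {}
    for (cohort, period), users in sorted(active.items()):
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        matrix.setdefault(label, {})[period] = len(users)

    if percentage:
        # period 0 holds every user of the cohort, so it is the cohort size
        matrix = {
            label: {period: 100 * n / row[0] for period, n in row.items()}
            for label, row in matrix.items()
        }
    return matrix


events = [
    ("u1", date(2022, 1, 5)), ("u1", date(2022, 2, 3)),
    ("u2", date(2022, 1, 20)), ("u2", date(2022, 3, 1)),
    ("u3", date(2022, 2, 10)), ("u3", date(2022, 3, 15)),
]
print(retention_matrix_sketch(events))
# {'2022-01': {0: 2, 1: 1, 2: 1}, '2022-02': {0: 1, 1: 1}}
```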
session_duration(data, groupby=NotSet.token, exclude_bounces=True, method='mean')
Calculate the duration of sessions.
With the default method ('mean'), it calculates the mean session duration over the groupby.
Parameters
data: bach.DataFrame to apply the method on.
groupby: sets the column(s) to group by.
- If not_set, it defaults to using ModelHub.time_agg.
- If None, it aggregates over all data.
exclude_bounces: if True, only session durations greater than 0 are considered.
method: 'mean' or 'sum'.
Returns
Series with results.
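Here is a minimal sketch of the calculation, aggregated over all data (the groupby=None case). It makes the following assumptions:
- A session's duration is the time between its first and last event.
- A bounce is a session with a duration of 0.

The (session_id, datetime) input and the function name are hypothetical. This is not the library's code.

```python
from datetime import datetime, timedelta


def session_duration_sketch(events, exclude_bounces=True, method="mean"):
    """Mean or total session duration as a timedelta (None if no sessions remain).

    Hypothetical input: an iterable of (session_id, moment) pairs with datetime moments.
    """
    bounds = {}  # session_id -> (first moment, last moment)
    for session_id, moment in events:
        lo, hi = bounds.get(session_id, (moment, moment))
        bounds[session_id] = (min(lo, moment), max(hi, moment))

    durations = [hi - lo for lo, hi in bounds.values()]
    if exclude_bounces:
        # bounces: sessions whose first and last event coincide
        durations = [d for d in durations if d > timedelta(0)]
    if not durations:
        return None

    total = sum(durations, timedelta(0))
    if method == "sum":
        return total
    if method == "mean":
        return total / len(durations)
    raise ValueError(f"unsupported method: {method!r}")


events = [
    ("s1", datetime(2022, 5, 1, 10, 0)), ("s1", datetime(2022, 5, 1, 10, 5)),
    ("s2", datetime(2022, 5, 1, 10, 30)),  # single event: a bounce
    ("s3", datetime(2022, 5, 1, 11, 0)), ("s3", datetime(2022, 5, 1, 11, 1)),
]
print(session_duration_sketch(events))  # 0:03:00
```

Including the bounce lowers the mean from 3 to 2 minutes. This is why exclude_bounces defaults to True.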
top_product_features(data, location_stack=None, event_type='InteractiveEvent')
Calculate the most used features in the product.
Parameters
data: bach.DataFrame to apply the method on.
location_stack: the location stack.
- Can be any slice of a modelhub.SeriesLocationStack type column.
- If None, the whole location stack is taken.
event_type: event type. Must be a valid event_type (either parent or child).
Returns
bach.DataFrame with results.
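The sketch below illustrates the ranking idea in pure Python: it counts unique users per location and lists the most used locations first. Ties are sorted alphabetically. It is not the modelhub implementation. The event type is matched exactly, so the parent/child event-type hierarchy is not modelled. The (user_id, location, event_type) input and the function name are hypothetical.

```python
def top_product_features_sketch(events, event_type="InteractiveEvent"):
    """Return [(location, unique_users), ...], most used first.

    Hypothetical input: an iterable of (user_id, location, event_type) tuples.
    Event types are compared exactly (no parent/child matching).
    """
    users_per_location = {}  # location -> set of user ids
    for user_id, location, etype in events:
        if etype != event_type:
            continue
        users_per_location.setdefault(location, set()).add(user_id)

    ranking = [(location, len(users)) for location, users in users_per_location.items()]
    # most used features first; ties broken alphabetically for a stable result
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


events = [
    ("u1", "nav:home", "InteractiveEvent"),
    ("u2", "nav:home", "InteractiveEvent"),
    ("u1", "nav:home", "InteractiveEvent"),
    ("u1", "footer:docs", "InteractiveEvent"),
    ("u3", "footer:docs", "VisibleEvent"),
]
print(top_product_features_sketch(events))  # [('nav:home', 2), ('footer:docs', 1)]
```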
top_product_features_before_conversion(data, name, location_stack=None, event_type='InteractiveEvent')
Calculates what users did before converting by combining several models from the model hub.
Parameters
data: bach.DataFrame to apply the method on.
name: label of the conversion event.
location_stack: the location stack.
- Can be any slice of a modelhub.SeriesLocationStack type column.
- If None, the whole location stack is taken.
event_type: event type. Must be a valid event_type (either parent or child).
Returns
bach.DataFrame with results.
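As a rough, stand-alone illustration of "what users did before converting", consider the sketch below. It keeps only users who converted. For each of them, it counts the features used strictly before their first conversion. It is not the modelhub implementation, which combines several model hub models and identifies the conversion by its name label. Here the conversion is recognized by a hypothetical is_conversion(location, event_type) predicate instead. The input format and function name are also hypothetical.

```python
def top_features_before_conversion_sketch(events, is_conversion, event_type="InteractiveEvent"):
    """Return [(location, unique_converted_users), ...], most used first.

    Hypothetical input: an iterable of (user_id, moment, location, event_type) tuples.
    is_conversion: hypothetical predicate (location, event_type) -> bool.
    """
    events = list(events)
    first_conversion = {}  # user_id -> moment of the user's first conversion
    for user_id, moment, location, etype in events:
        if is_conversion(location, etype):
            if user_id not in first_conversion or moment < first_conversion[user_id]:
                first_conversion[user_id] = moment

    users_per_location = {}  # location -> set of converted users who used it beforehand
    for user_id, moment, location, etype in events:
        converted_at = first_conversion.get(user_id)
        # skip non-converters, events at/after the conversion, and other event types
        if converted_at is None or moment >= converted_at or etype != event_type:
            continue
        users_per_location.setdefault(location, set()).add(user_id)

    ranking = [(location, len(users)) for location, users in users_per_location.items()]
    ranking.sort(key=lambda item: (-item[1], item[0]))
    return ranking


events = [
    ("u1", 1, "nav:pricing", "InteractiveEvent"),
    ("u1", 2, "form:signup", "InteractiveEvent"),
    ("u1", 3, "nav:docs", "InteractiveEvent"),   # after conversion: ignored
    ("u2", 1, "nav:pricing", "InteractiveEvent"),
    ("u2", 2, "nav:docs", "InteractiveEvent"),
    ("u2", 3, "form:signup", "InteractiveEvent"),
    ("u3", 1, "nav:docs", "InteractiveEvent"),   # never converts: ignored
]
signup = lambda location, etype: location == "form:signup"
print(top_features_before_conversion_sketch(events, signup))
# [('nav:pricing', 2), ('nav:docs', 1)]
```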
unique_sessions(data, groupby=NotSet.token)
Calculate the unique sessions in the Objectiv data.
Parameters
data: bach.DataFrame to apply the method on.
groupby: sets the column(s) to group by.
- If not_set, it defaults to using ModelHub.time_agg.
- If None, it aggregates over all data.
Returns
Series with results.
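Here is a minimal sketch of counting unique sessions per time bucket. It is pure Python and is not the library's code. The default time_agg grouping is imitated with a strftime pattern such as '%Y-%m-%d', which is an assumption about how the time aggregation is expressed. Passing None aggregates over all data. The input format and function name are hypothetical.

```python
from datetime import datetime


def unique_sessions_sketch(events, time_format="%Y-%m-%d"):
    """Count distinct session ids per time bucket.

    Hypothetical input: an iterable of (session_id, moment) pairs with datetime moments.
    time_format: strftime pattern used as the bucket; None aggregates over all data.
    """
    sessions = {}  # bucket -> set of session ids seen in that bucket
    for session_id, moment in events:
        bucket = moment.strftime(time_format) if time_format is not None else None
        sessions.setdefault(bucket, set()).add(session_id)
    return {bucket: len(ids) for bucket, ids in sessions.items()}


events = [
    (1, datetime(2022, 3, 1, 9)), (1, datetime(2022, 3, 1, 10)),
    (2, datetime(2022, 3, 1, 11)),
    (3, datetime(2022, 3, 2, 8)),
]
print(unique_sessions_sketch(events))  # {'2022-03-01': 2, '2022-03-02': 1}
print(unique_sessions_sketch(events, time_format=None))  # {None: 3}
```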
unique_users(data, groupby=NotSet.token)
Calculate the unique users in the Objectiv data.
Parameters
data: bach.DataFrame to apply the method on.
groupby: sets the column(s) to group by.
- If not_set, it defaults to using ModelHub.time_agg.
- If None, it aggregates over all data.
Returns
Series with results.
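To show how the groupby choice changes the count, here is a minimal pure-Python sketch. It is not the library's code. Grouping is expressed as a hypothetical function that maps an event to its group label, and None aggregates over all data, as in the groupby=None case. The dict-based event format and the function name are also hypothetical.

```python
from datetime import datetime


def unique_users_sketch(events, groupby=None):
    """Count distinct users per group.

    Hypothetical input: an iterable of dicts with at least a 'user_id' key.
    groupby: function mapping an event to its group label; None aggregates over all data.
    """
    users = {}  # group label -> set of user ids
    for event in events:
        label = groupby(event) if groupby is not None else None
        users.setdefault(label, set()).add(event["user_id"])
    return {label: len(ids) for label, ids in users.items()}


events = [
    {"user_id": "u1", "moment": datetime(2022, 4, 3)},
    {"user_id": "u1", "moment": datetime(2022, 4, 20)},
    {"user_id": "u2", "moment": datetime(2022, 4, 21)},
    {"user_id": "u1", "moment": datetime(2022, 5, 2)},
]
by_month = unique_users_sketch(events, groupby=lambda e: e["moment"].strftime("%Y-%m"))
print(by_month)  # {'2022-04': 2, '2022-05': 1}
print(unique_users_sketch(events))  # {None: 2}
```

A user active in several months is counted once per month, but only once overall.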