modelhub.FunnelDiscovery.get_navigation_paths

get_navigation_paths​

(data, steps, by=NotSet.token, location_stack=None, add_conversion_step_column=False, only_converted_paths=False, start_from_end=False, n_examples=None, sort_by=None)

Get the navigation paths for each event’s location stack. Each navigation path is represented as a row, where each step is defined by the nice name of the considered location.

For each location stack:

  • The number of navigation paths to be generated is less than or equal to steps.

  • The locations to be considered as starting steps are those that have an offset between 0 and steps - 1 in the location stack.

  • For each path, the remaining steps are the steps - 1 locations that follow the start location in the location stack. If fewer locations remain, the missing steps are None.

For example, having location_stack = ['a', 'b', 'c', 'd'] and steps = 3 will generate the following paths:

  • 'a', 'b', 'c'
  • 'b', 'c', 'd'
  • 'c', 'd', None
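
The rule above can be sketched in plain Python for a single location stack. This is only an illustration of the described behavior, not the library's implementation: the helper name sketch_navigation_paths is hypothetical, and capping the number of start offsets at the stack length is an assumption.

```python
from typing import List, Optional


def sketch_navigation_paths(location_stack: List[str], steps: int) -> List[List[Optional[str]]]:
    """Hypothetical helper: build the paths for one location stack."""
    paths = []
    # Start offsets run from 0 to steps - 1, capped at the stack length (assumption).
    for start in range(min(steps, len(location_stack))):
        # Take the start location plus the locations that follow it, up to `steps` in total.
        window = location_stack[start:start + steps]
        # Pad with None when fewer than `steps` locations remain.
        window = window + [None] * (steps - len(window))
        paths.append(window)
    return paths


for path in sketch_navigation_paths(['a', 'b', 'c', 'd'], steps=3):
    print(path)
# ['a', 'b', 'c']
# ['b', 'c', 'd']
# ['c', 'd', None]
```

The actual method operates on a bach DataFrame and returns one Series per step; the sketch only mirrors the windowing logic.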

Parameters​

  • data (bach.dataframe.DataFrame) – bach.DataFrame to apply the method on.

  • steps (int) – Number of steps/locations to consider in navigation path.

  • by (Union[List[Union[str, bach.series.series.Series]], str, bach.series.series.Series, sql_models.constants.NotSet]) – sets the column(s) to group by. If by is None or not set, the steps are based on the order of events across the entire dataset.

  • location_stack (Union[str, SeriesString, SeriesLocationStack, SeriesInt64]) – the column from which to create the paths. Can be a string with the name of a column in data, or a Series with the same base node as data. If None, the default location stack is used.

  • add_conversion_step_column (bool) – if True, gets the first conversion step number for each navigation path and adds it as a column to the returned DataFrame.

  • only_converted_paths (bool) – if True, filters each navigation path to its first conversion location.

  • start_from_end (bool) – if True, starts the construction of navigation paths from the last context in the stack; otherwise it starts from the first. If there are too many steps and the number of paths is limited with the n_examples parameter, the user's last steps can be lost; this parameter can be used to prioritize those last steps. With start_from_end set to True, location_stack = ['a', 'b', 'c', 'd'] and steps = 3 will generate the following paths:

    • 'b', 'c', 'd'
    • 'a', 'b', 'c'
    • None, 'a', 'b'
  • n_examples (int) – limits the number of navigation paths. If None, all navigation paths are returned.

  • sort_by (str) – column to sort by, which determines the order of the location_stack sequences.
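
The effect of start_from_end on a single location stack can be sketched in plain Python. This is a minimal illustration of the example given for that parameter, not the library's implementation: the helper name sketch_paths_from_end is hypothetical, and capping the number of paths at the stack length is an assumption.

```python
from typing import List, Optional


def sketch_paths_from_end(location_stack: List[str], steps: int) -> List[List[Optional[str]]]:
    """Hypothetical helper: build paths starting from the end of one location stack."""
    paths = []
    n = len(location_stack)
    # One path per offset from the end, capped at the stack length (assumption).
    for offset in range(min(steps, n)):
        end = n - offset             # exclusive end index of this window
        start = max(end - steps, 0)  # never reach before the first location
        window = location_stack[start:end]
        # Pad on the left with None when fewer than `steps` locations precede the end.
        window = [None] * (steps - len(window)) + window
        paths.append(window)
    return paths


for path in sketch_paths_from_end(['a', 'b', 'c', 'd'], steps=3):
    print(path)
# ['b', 'c', 'd']
# ['a', 'b', 'c']
# [None, 'a', 'b']
```

Because the paths nearest the end of the stack come first, keeping only the first few of them preserves the user's last steps.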

Returns​

Bach DataFrame with a new Series for each step, each containing the nice name of the location.

Return type​

bach.dataframe.DataFrame