# Get started in your notebook

You can run Objectiv in your favorite notebook, from a self-hosted or local Jupyter notebook to cloud platforms such as Google Colab, Hex, and Deepnote. See the instructions below.
## Jupyter
- Install the open model hub locally:

  ```bash
  pip install objectiv-modelhub
  ```
- Import the required packages at the beginning of your notebook:

  ```python
  from modelhub import ModelHub
  ```
- Instantiate the `ModelHub` and set the default time aggregation:

  ```python
  modelhub = ModelHub(time_aggregation='%Y-%m-%d')
  ```
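The `time_aggregation` argument takes a strftime-style format string; models that aggregate over time truncate event timestamps to this format. A quick illustration with plain Python (no Objectiv needed), assuming the default `'%Y-%m-%d'` above:

```python
from datetime import datetime

# With '%Y-%m-%d', timestamps are truncated to whole days, so events from
# the same calendar day end up in the same aggregation bucket.
ts_morning = datetime(2022, 6, 1, 9, 15, 0)
ts_evening = datetime(2022, 6, 1, 21, 45, 0)

print(ts_morning.strftime('%Y-%m-%d'))  # → 2022-06-01
print(ts_evening.strftime('%Y-%m-%d'))  # → 2022-06-01
```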
## Google Colab / Hex / Deepnote
For Deepnote only: as a very first step, create a `requirements.txt` file, add `objectiv-modelhub` to it, and restart the machine; then continue with the import step below.
- Install the open model hub at the beginning of your notebook:

  ```bash
  pip install objectiv-modelhub
  ```
- Import the required packages:

  ```python
  from modelhub import ModelHub
  ```
- Instantiate the `ModelHub` and set the default time aggregation:

  ```python
  modelhub = ModelHub(time_aggregation='%Y-%m-%d')
  ```
- Optionally: create an SSH tunnel to the Postgres database server:

  ```bash
  pip install sshtunnel
  ```

  ```python
  from sshtunnel import SSHTunnelForwarder
  import os, stat

  # SSH tunnel configuration
  ssh_host = ''
  ssh_port = 22
  ssh_username = ''
  ssh_passphrase = ''
  ssh_private_key = ''
  db_host = ''
  db_port = 5432

  try:
      # Write the private key to a file that only the current user can read
      pk_path = '._super_s3cret_pk1'
      with open(pk_path, 'a') as pkf:
          pkf.write(ssh_private_key)
      os.chmod(pk_path, stat.S_IREAD)

      # Start the tunnel; the key file is no longer needed afterwards
      ssh_tunnel = SSHTunnelForwarder(
          (ssh_host, ssh_port),
          ssh_username=ssh_username,
          ssh_private_key=pk_path,
          ssh_private_key_password=ssh_passphrase,
          remote_bind_address=(db_host, db_port)
      )
      ssh_tunnel.start()
      os.remove(pk_path)
      tunnel_port = ssh_tunnel.local_bind_port
  except Exception as e:
      os.remove(pk_path)
      raise e
  ```
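A crash between writing the key file and `os.remove()` above would leave the private key on disk. A variant using the standard library's `tempfile` module with a `try/finally` guarantees cleanup; this is a sketch of the same idea, not part of the Objectiv API:

```python
import os
import stat
import tempfile

ssh_private_key = '-----BEGIN OPENSSH PRIVATE KEY-----\n...'  # placeholder contents

# mkstemp() creates the file with permissions readable only by the current user
fd, pk_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'w') as pkf:
        pkf.write(ssh_private_key)
    os.chmod(pk_path, stat.S_IREAD)
    # ... start SSHTunnelForwarder(ssh_private_key=pk_path, ...) here ...
finally:
    # Runs whether the tunnel starts or raises, so the key never lingers
    os.remove(pk_path)
```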
## Set up your database connection
Now we can connect to the database and create an Objectiv `DataFrame`. This `DataFrame` then points to the data in the database, and all operations run directly on it.

When setting up the database connection and creating the Objectiv `DataFrame`, you can pass multiple parameters (such as `start_date` and `end_date` in the examples below). See the `get_objectiv_dataframe` call for details.
PostgreSQL - without a tunnel:

```python
df = modelhub.get_objectiv_dataframe(
    db_url='postgresql://USER:PASSWORD@HOST:PORT/DATABASE',
    start_date='2022-06-01',
    end_date='2022-06-30',
    table_name='data')
```
PostgreSQL - with a tunnel:

```python
df = modelhub.get_objectiv_dataframe(
    db_url=f'postgresql://USER:PASSWORD@localhost:{tunnel_port}/DATABASE',
    start_date='2022-06-01',
    end_date='2022-06-30',
    table_name='data')
```
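One gotcha with the URL form above (a general note, not specific to Objectiv): if the password contains characters such as `@` or `:`, it must be percent-encoded before being embedded in the URL. The standard library handles this; the credentials below are hypothetical:

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration; '@' and ':' in the password
# would otherwise break URL parsing.
user = 'objectiv_user'
password = 'p@ss:word'
host = 'localhost'
port = 5432
database = 'objectiv'

db_url = f'postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}'
print(db_url)  # → postgresql://objectiv_user:p%40ss%3Aword@localhost:5432/objectiv
```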
Google BigQuery:

Google BigQuery is supported via a Snowplow pipeline. See how to set up Google BigQuery.

With `BQ_CREDENTIALS_PATH` being the path to the credentials for your BigQuery connection, e.g. `/home/myusername/myrepo/.secrets/production-bigquery.json`:
```python
df = modelhub.get_objectiv_dataframe(
    db_url='bigquery://your_project/snowplow',
    start_date='2022-06-01',
    end_date='2022-06-30',
    table_name='events',
    bq_credentials_path=BQ_CREDENTIALS_PATH)
```
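To keep the credentials path out of the notebook itself, one common pattern (an assumption on our part, not a model hub requirement) is to read it from an environment variable:

```python
import os

# Fall back to a relative path when the variable isn't set; both the
# variable name and the default below are hypothetical.
BQ_CREDENTIALS_PATH = os.environ.get(
    'BQ_CREDENTIALS_PATH',
    '.secrets/production-bigquery.json',
)
print(BQ_CREDENTIALS_PATH)
```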
## Next steps
After the steps above, you’re ready to go!
Check out the example notebooks and the open model hub for what to do next.