Json
class bach.SeriesJson
(engine, base_node, index, name, expression, group_by, instance_dtype, order_by=None, **kwargs)A Series that represents the JSON type and its specific operations.
Database support and types
- On Postgres this utilizes the native ‘jsonb’ database type.
- On Athena this utilizes the native ‘json’ database type.
- On BigQuery this utilizes the generic ‘STRING’ database type.
On Postgres, SeriesJson does not use the ‘json’ database type, but the ‘jsonb’ type, As the ‘json’ type has limited functionality.
This class is the standard and recommended type to use for handling json like data. Having said that, there is a special SeriesJsonPostgres type that uses the ‘json’ data type on Postgres, but internally that casts all data to ‘jsonb’ too.
Getting data
It is possible to get a selection of data from the json in the json type column. For selecting data from
json, arrays and objects are supported. The data can be selected using .json[]
on the json column
Selecting data from an array is based on position. It works similar to slicing through python lists.
Slicing is only possible if all values in the column are lists or None.
Selecting from objects is possible by key.
Examples:
pdf
jsonb_column
0 ["a","b","c"]
1 ["d","e","f","g"]
2 [{"h":"i","j":"k"},{"l":["m","n","o"]},{"p":"q"}]
df = DataFrame.from_pandas(engine, pdf)
df['jsonb_column'] = df.jsonb_column.astype('json')
# load some json strings and convert them to jsonb type
# slice and show with .head()
df.jsonb_column.json[:2].head()
_index_0
0 [a, b]
1 [d, e]
2 [{'h': 'i', 'j': 'k'}, {'l': ['m', 'n', 'o']}]
Name: jsonb_column, dtype: object
df.jsonb_column.json[1].head()
_index_0
0 b
1 e
2 {'l': ['m', 'n', 'o']}
Name: jsonb_column, dtype: object
# selecting from objects is done by entering a key:
df.jsonb_column.json[1].json['l'].head()
_index_0
0 None
1 None
2 [m, n, o]
Name: jsonb_column, dtype: object
A last case is selecting based on the objects in an array.
With this method, a dict is passed in the .json[]
selector. The value of the first match with the dict
to the objects in a json array is returned for the .json[]
selector. A match is when all key/value pairs
of the dict are found in an object. This can be used for selecting a subset of a json array with objects.
# selecting from arrays by searching objects in the array.
df.jsonb_column.json[:{"j":"k"}].head()
_index_0
0 []
1 []
2 [{'h': 'i', 'j': 'k'}]
Name: jsonb_column, dtype: object
# or:
df.jsonb_column.json[{"l":["m","n","o"]}:].head()
_index_0
0 []
1 []
2 [{'l': ['m', 'n', 'o']}, {'p': 'q'}]
Name: jsonb_column, dtype: object
SeriesJson.json | Get access to json operations via the class that's returned through this accessor. |