Note
This help should be accurate and comprehensive. If you see anything missing or that needs to be fixed, see How to Contribute or let us know in the Juice Slack #documentation channel.
Now that we know how to add slices to a stack, we need to know how to provide data to those slices. There are two ways to do this, the first is to provide a json fixture file that provides the data in the proper response format. We’ll talk about this more in a later section. The second method, which is more common is to build a Python module that has a class for each slice to provide the data needed for the slice.
These Python modules all follow the same basic structure as shown here:
# Setup # Data Tables # Recipe Service # Filter Service # Slice Services
The setup section is where we import all the needed components and establish database connections. Next, we define all the tables that hold the data we will be using in our slices. With our tables defined, we can build a Recipe Service that houses all our data “ingredients”. (Don’t worry about all these new words, we will define and explore them all in future sections.) Finally, we build classes based off our Recipe Service that provide the data for each slice.
In the setup section, we need to import all the data types and functions we use
in our Tables and slices. You will always need Table
and Column
. If you are
going to sum or count things, you will need func
as well. After that
you will need to import the data types present in your table, a full list of
these is available in the SQLAlchemy Docs
from sqlalchemy import func, Table, Column, String, Float
Next, we’ll import the declarative base, which provides a container to store all the metadata about our data tables. This is used by the recipes we’ll talk more about soon.
from sqlalchemy.ext.declarative import declarative_base
Most Juicebox stacks require a connection to Redshift, a type of database, to access their data. We’ve built an easier way to use a preconfigured engine to create connections.
from dataservices.redshift_connectionbase import redshift_create_engine
When we start creating data services to power our slices, we are going to use a recipe, which is a way of defining how we get and use data in our slice. All recipes are built from metrics and dimensions. A metric is a single unit of data that we are displaying in the slice. Dimensions are just like metrics, except that they are used to group or aggregate data in our slice. Recipes can also have filters that are used to limit the data output in our slice. More on this later, but for now we just need to import these components so we can use them in our slices.
from dataservices.recipe import Metric, Dimension, Filter
Recipes are part of the secret sauce that makes Juicebox awesome. The RecipeServiceBaseV3 is the default object that all data services inherit from. It provides most of the services needed by a Juicebox data services.
from dataservices.servicebasev3 import RecipeServiceBaseV3
With all the items we need imported, we start by connecting to the database. This is done by creating an engine, we’ll use in our BaseService.
engine = redshift_create_engine()
Next, we initialize a Base to use as the foundation of our data tables. This base stores important metadata about our tables, and how they function inside the database.
Base = declarative_base()
We begin the main part of our application by defining the data tables that will be used. The data tables are represented by classes, and those classes must inherit from the Base we created in the previous step in order to have access to all the query features available in the platform.
class Census(Base):
__table__ = Table('census', Base.metadata,
Column('state', String(30), primary_key=True),
Column('sex', String(1)),
Column('age', Float()),
Column('pop2000', Float()),
Column('pop2008', Float()),
schema='demo', extend_existing=True)
Tables are defined and assigned to the __table__
attribute of the
class. The Table
constructor takes a table name, an instance of
Base.metadata
, a collection of Column
objects, a schema name that we
will provide to you, and a setting extend_existing=True
to ensure that if
the table is already defined that our new configuration is still applied. The
name must match the table name present in Redshift.
Column objects are defined by a name, type, and other attributes. You can learn
more about Column object types in the SQLAlchemy docs for types. At least
one of the Column objects just be marked as a primary_key
, and is used by
the metadata and other Juicebox features.
Ingredients represent the data we will be using later in data services for our slices. There are three main types of ingredients: Metrics, Dimensions, and Filters. Metrics are values, which are often numeric calculations. Dimensions are typically categorical values that represent a grouping or aggregation. For example, if you wanted to know the average height by gender, “average height” would be a Metric, and “gender” would be a Dimension. It’s important to note that multiple columns from a table can be used to compute a Metric or Dimension. A Filter is used to limit the results of a recipe based on some metric or dimension.
Metrics and Dimensions are all defined with a reference to a column from a data
table or a function operating on one or more columns. They also use keywords
that define the visual formatting of the Ingredient. The singular
and
plural
keywords specify what the singular and plural descriptions of the
ingredient should be. To control the format of numbers, you can use the
format
keyword. If you need an expression to be output that includes the
ingredient everywhere it appears, you can use the formatter
keyword;
however, this is often unnecessary.
An example Metric:
Metric(func.sum(Census.pop2008 - Census.pop2000), format=".3s",
singular='Population Growth')
An example Dimension:
Dimension(Census.state, singular='State', plural='States',
format="")
Recipe services are the core elements of a Juicebox application. They are
made up of collections of Metrics and Dimensions coupled with additional
configuration settings. All Juicebox 3 recipe services inherit from the
RecipeServiceBaseV3
we imported earlier, and define a metric_shelf
and
a dimension_shelf
which contain the Ingredients we wish to use in our data
services. Here is an example of a basic service.
class BasicService(RecipeServiceBaseV3):
metric_shelf = {
'pop2000': Metric(func.sum(Census.pop2000),
singular='Population 2000'),
'pop2008': Metric(func.sum(Census.pop2008),
singular='Population 2008'),
'popdiff': Metric(func.sum(Census.pop2008 - Census.pop2000),
singular='Population Growth'),
'avgage': Metric(func.sum(Census.pop2008 * Census.age) /
func.sum(Census.pop2008), singular='Average Age'),
# A metric using a complex expression
'pctfemale': Metric(func.sum(case([(Census.sex == 'F',
Census.pop2008)], else_=0)) /
func.sum(Census.pop2008),
singular='% Female'),
# a metric using a formatter
'pctdiff': Metric(func.sum(Census.pop2008 - Census.pop2000) /
func.sum(Census.pop2000),
singular='Population Pct Change',
formatters=[lambda x: "Change is {0:0.1%} "
"percent".format(x)]),
}
dimension_shelf = {
# Simplest possible dimension, a SQLAlchemy expression and a label.
'state': Dimension(Census.state, singular='State'),
# This will use the lookup to get display values of "M" and "F"
'sex': LookupDimension(Census.sex, singular='Sex',
lookup={'M': 'Menfolk', "F": "Womenfolk"}),
}
Our recipe service can also specify some values to filter on automatically. By default every slice listens to the slice above it to adjust its view of the data. They are a list of keys on which every slice should be automatically filtered if it is present in the request. It typically matches the dimensions and metrics used in the FilterService, which we will discuss in the next chapter.
An example of using automatic_filter_keys
.
automatic_filter_keys = ('state',)
It is also possible to target a specific database by using the database
attribute on a recipe service. This allows you to specify which database the
recipes should be run against. Most commonly this is used to target the
juicebox
or healthstream
databases. This is not required and by default
targets the juicebox
redshift instance.
An example of specifying the database
.
database = 'juicebox'
We’re all set for building our global filters now.