************************
Providing Data to Slices
************************

.. include:: ../../complete.rst

Now that we know how to add slices to a stack, we need to know how to provide
data to those slices. There are two ways to do this, the first is to provide a
json fixture file that provides the data in the proper response format. We'll
talk about this more in a later section. The second method, which is more
common is to build a Python module that has a class for each slice to provide
the data needed for the slice.

These Python modules all follow the same basic structure as shown here:

# Setup
# Data Tables
# Recipe Service
# Filter Service
# Slice Services

The setup section is where we import all the needed components and establish
database connections. Next, we define all the tables that hold the data we will
be using in our slices. With our tables defined, we can build a Recipe Service
that houses all our data "ingredients". (Don't worry about all these new words,
we will define and explore them all in future sections.) Finally, we build
classes based off our Recipe Service that provide the data for each slice.

.. _label_setup:

Setup
=====

In the setup section, we need to import all the data types and functions we use
in our Tables and slices. You will always need ``Table`` and ``Column``. If you are
going to sum or count things, you will need ``func`` as well. After that
you will need to import the data types present in your table, a full list of
these is available in the `SQLAlchemy Docs
<http://docs.sqlalchemy.org/en/rel_1_0/core/type_basics.html>`_

.. code-block:: python

    from sqlalchemy import func, Table, Column, String, Float

Next, we'll import the declarative base, which provides a container to store
all the metadata about our data tables. This is used by the recipes we'll talk
more about soon.

.. code-block:: python

    from sqlalchemy.ext.declarative import declarative_base


Most Juicebox stacks require a connection to Redshift, a type of database,
to access their data. We've built an easier way to use a preconfigured engine
to create connections.

.. code-block:: python

    from dataservices.redshift_connectionbase import redshift_create_engine

When we start creating data services to power our slices, we are going to use
a recipe, which is a way of defining how we get and use data in our slice. All
recipes are built from metrics and dimensions. A metric is a single unit of
data that we are displaying in the slice. Dimensions are just like metrics,
except that they are used to group or aggregate data in our slice. Recipes can
also have filters that are used to limit the data output in our slice. More on
this later, but for now we just need to import these components so we can use
them in our slices.

.. code-block:: python

    from dataservices.recipe import Metric, Dimension, Filter

Recipes are part of the secret sauce that makes Juicebox awesome. The
RecipeServiceBaseV3 is the default object that all data services inherit from.
It provides most of the services needed by a Juicebox data services.

.. code-block:: python

    from dataservices.servicebasev3 import RecipeServiceBaseV3


With all the items we need imported, we start by connecting to the database.
This is done by creating an engine, we'll use in our BaseService.

.. code-block:: python

    engine = redshift_create_engine()


Next, we initialize a Base to use as the foundation of our data tables. This
base stores important metadata about our tables, and how they function
inside the database.

.. code-block:: python

    Base = declarative_base()

Data Tables
===========

We begin the main part of our application by defining the data tables that
will be used. The data tables are represented by classes, and those classes
must inherit from the Base we created in the previous step in order to have
access to all the query features available in the platform.

.. code-block:: python

    class Census(Base):
        __table__ = Table('census', Base.metadata,
                          Column('state', String(30), primary_key=True),
                          Column('sex', String(1)),
                          Column('age', Float()),
                          Column('pop2000', Float()),
                          Column('pop2008', Float()),
                          schema='demo', extend_existing=True)

Tables are defined and assigned to the ``__table__`` attribute of the
class. The ``Table`` constructor takes a table name, an instance of
``Base.metadata``, a collection of ``Column`` objects, a schema name that we
will provide to you, and a setting ``extend_existing=True`` to ensure that if
the table is already defined that our new configuration is still applied. The
name must match the table name present in Redshift.

Column objects are defined by a name, type, and other attributes. You can learn
more about Column object types in the `SQLAlchemy docs for types
<http://docs.sqlalchemy.org/en/rel_1_0/core/types.html>`_. At least
one of the Column objects just be marked as a ``primary_key``, and is used by
the metadata and other Juicebox features.

Ingredients
===========

Ingredients represent the data we will be using later in data services for our
slices. There are three main types of ingredients: Metrics, Dimensions, and
Filters. Metrics are values, which are often numeric calculations. Dimensions
are typically categorical values that represent a grouping or aggregation. For
example, if you wanted to know the average height by gender, "average height"
would be a Metric, and "gender" would be a Dimension. It's important to note
that multiple columns from a table can be used to compute a Metric or
Dimension. A Filter is used to limit the results of a recipe based on some
metric or dimension.

Metrics and Dimensions are all defined with a reference to a column from a data
table or a function operating on one or more columns. They also use keywords
that define the visual formatting of the Ingredient. The ``singular`` and
``plural`` keywords specify what the singular and plural descriptions of the
ingredient should be. To control the format of numbers, you can use the
``format`` keyword. If you need an expression to be output that includes the
ingredient everywhere it appears, you can use the ``formatter`` keyword;
however, this is often unnecessary.

An example Metric:

.. code-block:: python

    Metric(func.sum(Census.pop2008 - Census.pop2000), format=".3s",
           singular='Population Growth')

An example Dimension:

.. code-block:: python

    Dimension(Census.state, singular='State', plural='States',
              format="")

Recipe Service
==============

Recipe services are the core elements of a Juicebox application. They are
made up of collections of Metrics and Dimensions coupled with additional
configuration settings. All Juicebox 3 recipe services inherit from the
``RecipeServiceBaseV3`` we imported earlier, and define a ``metric_shelf`` and
a ``dimension_shelf`` which contain the Ingredients we wish to use in our data
services. Here is an example of a basic service.

.. code-block:: python

    class BasicService(RecipeServiceBaseV3):
        metric_shelf = {
            'pop2000': Metric(func.sum(Census.pop2000),
                              singular='Population 2000'),
            'pop2008': Metric(func.sum(Census.pop2008),
                              singular='Population 2008'),
            'popdiff': Metric(func.sum(Census.pop2008 - Census.pop2000),
                              singular='Population Growth'),
            'avgage': Metric(func.sum(Census.pop2008 * Census.age) /
                             func.sum(Census.pop2008), singular='Average Age'),

            # A metric using a complex expression
            'pctfemale': Metric(func.sum(case([(Census.sex == 'F',
                                                Census.pop2008)], else_=0)) /
                                func.sum(Census.pop2008),
                                singular='% Female'),

            # a metric using a formatter
            'pctdiff': Metric(func.sum(Census.pop2008 - Census.pop2000) /
                              func.sum(Census.pop2000),
                              singular='Population Pct Change',
                              formatters=[lambda x: "Change is {0:0.1%} "
                                                    "percent".format(x)]),
        }

        dimension_shelf = {
            # Simplest possible dimension, a SQLAlchemy expression and a label.
            'state': Dimension(Census.state, singular='State'),

            # This will use the lookup to get display values of "M" and "F"
            'sex': LookupDimension(Census.sex, singular='Sex',
                                   lookup={'M': 'Menfolk', "F": "Womenfolk"}),
        }

Our recipe service can also specify some values to filter on automatically.
By default every slice listens to the slice above it to adjust its view of the
data. They are a list of keys on which every slice should be automatically
filtered if it is present in the request. It typically matches the dimensions
and metrics used in the FilterService, which we will discuss in the next
chapter.

An example of using ``automatic_filter_keys``.

.. code-block:: python

        automatic_filter_keys = ('state',)

It is also possible to target a specific database by using the ``database``
attribute on a recipe service. This allows you to specify which database the
recipes should be run against. Most commonly this is used to target the
``juicebox`` or ``healthstream`` databases. This is not required and by default
targets the ``juicebox`` redshift instance.

An example of specifying the ``database``.

.. code-block:: python

        database = 'juicebox'

We're all set for building our global filters now.