Status change fact table – Part 2 (The input data)

(See part 1 of this series here)

To make matters simpler we’ll assume that a table with a stream of order events already exists and that it’s denormalised, with each row containing all the relevant data. Either the source system already had that information or it was built in an earlier phase of the ETL.

Table order_events:
Order_id int
Order_date datetime
Customer_name string
Event_date datetime
Event_name string
Warehouse string
Carrier string

Here’s a sample of the data we’ll be dealing with:

101/01/16Acme Industries02/01/16 08:00Preparing shipmentLondonDeliveries express
101/01/16Acme Industries03/01/16 10:00Shipped from warehouseLondonDeliveries express
101/01/16Acme Industries03/01/16 14:30Delivery failedLondonDeliveries express
101/01/16Acme Industries03/01/16 18:00Returned to warehouseLondonDeliveries express
101/01/16Acme Industries04/01/16 09:00Shipped from warehouseLondonDeliveries express
101/01/16Acme Industries04/01/16 11:30Delivery failedLondonDeliveries express
101/01/16Acme Industries04/01/16 18:00Returned to warehouseLondonDeliveries express
101/01/16Acme Industries05/01/16 10:30Shipped from warehouseLondonDeliveries express
101/01/16Acme Industries05/01/16 15:00Delivery successfulLondonDeliveries express
201/01/16Boggs Corp02/01/16 09:00Preparing shipmentLondonDeliveries express
201/01/16Boggs Corp03/01/16 10:00Shipped from warehouseLondonDeliveries express
201/01/16Boggs Corp03/01/16 11:00Delivery successfulLondonDeliveries express
302/01/16Boggs Corp03/01/16 08:00Preparing shipmentLiverpoolWeDeliverStuff
302/01/16Boggs Corp04/01/16 09:30Shipped from warehouseLiverpoolWeDeliverStuff
302/01/16Boggs Corp04/01/16 15:00Delivery successfulLiverpoolWeDeliverStuff
302/01/16Boggs Corp08/01/16 16:30Returned by customerLiverpoolWeDeliverStuff
302/01/16Boggs Corp09/01/16 08:00Preparing shipmentLiverpoolWeDeliverStuff
302/01/16Boggs Corp10/01/16 10:00Shipped from warehouseLiverpoolWeDeliverStuff
302/01/16Boggs Corp10/01/16 14:00Delivery successfulLiverpoolWeDeliverStuff

Our data sample has 3 orders, from 2 different customers and that shipped from two different warehouses. One of the orders was only delivered after a couple of failed attempts and in another the products were returned and had to be replaced.

Our goal in the next article is to build a fact table that can track all events that happened to all orders and provide meaningful analytics to the client.

In the next part of this series we’re going to explore a data model that allows analysing this dataset.

Back to blog

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like

  • Graph Databases: Loading Data with Neo4j

    Graph databases are becoming more popular as a way of storing and analysing large connected datasets. Neo4j is a...

    Read more
  • Date Dimension Revisited: Bank Holidays

    Everyone familiar with data warehousing knows about the date dimension. It’s one of the first steps in the creation...

    Read more
  • Setup Pentaho BA Server to use SSL Certificates

    SSL Certificate Java Truststore Setup SSL, or Secure Socket Layer, is a technology which allows web browsers and web...

    Read more