(See part 1 of this series here)
To make matters simpler we’ll assume that a table with a stream of order events already exists and that it’s denormalised, with each row containing all the relevant data. Either the source system already had that information or it was built in an earlier phase of the ETL.
Table order_events:
Order_id int
Order_date datetime
Customer_name string
Event_date datetime
Event_name string
Warehouse string
Carrier string
Here’s a sample of the data we’ll be dealing with:
order_id | order_date | customer_name | event_date | event_name | warehouse | carrier |
---|---|---|---|---|---|---|
1 | 01/01/16 | Acme Industries | 02/01/16 08:00 | Preparing shipment | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 03/01/16 10:00 | Shipped from warehouse | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 03/01/16 14:30 | Delivery failed | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 03/01/16 18:00 | Returned to warehouse | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 04/01/16 09:00 | Shipped from warehouse | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 04/01/16 11:30 | Delivery failed | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 04/01/16 18:00 | Returned to warehouse | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 05/01/16 10:30 | Shipped from warehouse | London | Deliveries express |
1 | 01/01/16 | Acme Industries | 05/01/16 15:00 | Delivery successful | London | Deliveries express |
2 | 01/01/16 | Boggs Corp | 02/01/16 09:00 | Preparing shipment | London | Deliveries express |
2 | 01/01/16 | Boggs Corp | 03/01/16 10:00 | Shipped from warehouse | London | Deliveries express |
2 | 01/01/16 | Boggs Corp | 03/01/16 11:00 | Delivery successful | London | Deliveries express |
3 | 02/01/16 | Boggs Corp | 03/01/16 08:00 | Preparing shipment | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 04/01/16 09:30 | Shipped from warehouse | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 04/01/16 15:00 | Delivery successful | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 08/01/16 16:30 | Returned by customer | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 09/01/16 08:00 | Preparing shipment | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 10/01/16 10:00 | Shipped from warehouse | Liverpool | WeDeliverStuff |
3 | 02/01/16 | Boggs Corp | 10/01/16 14:00 | Delivery successful | Liverpool | WeDeliverStuff |
Our data sample has 3 orders, from 2 different customers and that shipped from two different warehouses. One of the orders was only delivered after a couple of failed attempts and in another the products were returned and had to be replaced.
Our goal in the next article is to build a fact table that can track all events that happened to all orders and provide meaningful analytics to the client.
In the next part of this series we’re going to explore a data model that allows analysing this dataset.