All posts in “Uncategorized”

Filling in the gaps

Today I saw Harris Ward asking a question I’ve seen pop up over and over again. And somehow the solution is never obvious.

The problem is how to fill in the gaps when we have nulls in our data stream. For example, our data set looks like:

And we want to replace all those nulls by the last non-null value we’ve seen in the data set: all non null values stay as is, but Mar, Apr and May get the value 20, Jul gets 50 and Sep, Oct and Nov get 40.

If we’re ingesting the data from a CSV, Excel or XML file, it’s easy to achieve this (ever wondered what the “Repeat” flag means in the fields tab of the Text File input step?).

But suppose our data doesn’t come from files. We need to implement the equivalente of the repeat flag inside a PDI transformation.

Invariably, when any of us encounters this requirement we all go through the same attempts. Let’s use the Analytical query step! Or the Group by with a “pass all rows option”. Or the NVL operation of the calculator step. And then we learn these all fail.

This is perhaps one of the best examples where a Javascript step is the best option. Because it lets us persist variables between rows, and we can control their values at will.

So, to achieve the desired result, here’s our proposed solution: add a Javascript step and use the following code:

Variable tmp is only initialized with the first row of data. From then on it’ll keep the last non null value of sales (or 0).

Sure, we’ve all heard the motto “Don’t use Javascript, it kills PDI performance”. But the occasional simple JS script doesn’t hurt that much in most cases, you’re likely to have a slower step elsewhere in your transformation (a DB read or write operation, for example). And, if performance is vital, you can always re-code this as a User defined Java class. But for all but the most extreme applications Javascript will work just fine, though.

Don’t forget to specify on the configuration of the output fields that you want to overwrite the value of the sales field, though:

And here’s the transformed data:

Exactly as intended.

Download the sample transformation here.

Ctools table component tweaks

(tl;dr: jump straight to section 3 below if you already know about scrolling tables)

Tables in Ctools are implemented by, a jQuery plugin to… well, display tables. However, as it normally is the case with Ctools, not all features are exposed via the CDE interface. And how could they, given the astounding number of features available plus the ever growing list of DataTables plugins out there. Implementing that extra bit of behaviour the users really love ends up being more an exercise in googling and finding the right plugin or trick to do the task than exploring the list of available properties in the Advanced column of CDE.

And that is the subject of today’s post. A specific tweak to the DataTables component that may be quite handy to know about.

1. To paginate or not to paginate.
By default tables are paginated in DataTables and, therefore, in CDE. However, especially if the result set is not very large, users may prefer a single page table to be displayed. Ok, that’s easy, just go to your Advanced properties tab and set pagination to False. Done, easy.

2. Don’t paginate, but scroll
However, single page tables are really only useful if your table is displaying a handful of rows. If you’re displaying some 20-30 records, for example, they may not be enough to justify pagination, but they’re way too many to display in a single page. One way around it is via CSS: just set its parent div height to be of a fixed size and set its overflow property to scroll. Again, done, easy.

Ah, except that now the table headers, filter box, etc. disappear when you scroll. And you can’t just set the scroll on the tbody element (ok, maybe you can with some really clever CSS hack). Fortunately, there’s an easy way around. A quick google search brings us to an example showing a scrollable table. The relevant property is scrollY, which can be set to any meaningful unit (e.g., 200px; vh and vw are units proportional to the viewpoort). To use it in CDE all we need is to add a new Extra option on the Advanced properties panel and set the property scrollY to whatever value we want.

Ok, but this is all relatively old news. If you have some experience with CDE and tables you may already know all this. So lets move on to something cooler.

3. But when the table refreshes, the scrollable area goes back to the top
Ah, so now your table scrolls up and down, but it has a major issue: the user wants the table to remember where it was. Namely, when the table refreshes because some parameter changed, the table should keep the search string and the scroll position and go back to the previous state.

For that, unfortunately, there’s no option in DataTables we can use, at least none we know about. So we need to implement it. Fortunately, it’s not that complex and the idea can be generalized for other properties we want to keep across component updates.

Step 1: add the following to the table’s preExecution function:

if( typeof this.tableState == "undefined"){
this.tableState = {
search: '',
scroll: 0
return true;

This creates the tableState property where we can store the two values we want to keep.

Step 2: add the following on the preChange of any selector that changes a parameter the table listens to:

var t = render_customerTable;
if( typeof t.tableState != "undefined"){ = $("#" + t.htmlObject + " div.dataTables_filter input").val();
t.tableState.scroll = $("#" + t.htmlObject + " .dataTables_scrollBody" ).scrollTop();

(we need the condition just in case the selector renders before the table itself, which is the case in the sample provided)

This retrieves the two values we want to keep and updates the tableState to reflect them.

Step 3: add this to the postExecution of the table:

var e = this.placeholder().find("div.dataTables_filter input");
this.placeholder().find(" .dataTables_scrollBody" ).scrollTop(this.tableState.scroll);

This will restore the contents of the search box, force the table to draw itself again (by calling the keyup event) and then scroll the table body.

And that’s it. The table now does what we needed with relatively low effort.

Note that the code was written in the pre-require framework. If your dashboard has support for require.js the code should still work, but there are probably better ways to interact with the table, other than jQuerying its HTML target.

If you want to try it, download the sample we created.

Join the team!

We’re looking for a new junior member to join our team of Business Analytics and Big Data consultants.

If you’re passionate about data and have strong analytical skills this is a good opportunity to join a dynamic team developing high quality solutions using Pentaho and Hadoop.

Previous experience in programming (Java or Javascript preferred) and in SQL is desired, but not essential.

Apply now

Ctools now supports Require.js – But… what about jQuery plugins?

Great news from the Pentaho Community Dashboards front. As of the current version (15.06.30), there’s support for Require.js, which greatly streamlines the javascript downloaded when rendering a dashboard, cleans up the global scope and provides a proper management of modules and dependencies. I won’t bother you with the details, you can read more about it in Pedro Alves’s post.

This is truly a game changer when it comes to developing CDF/CDE dashboards. We have to write our code better, with a proper architecture in place to make all the pieces talk to each other, we need to use the proper methods to set or get parameter values, to run custom functions in pre-execution or post-execution code snippets, etc.

This is not a bad thing, far from it. But it doesn’t come without its challenges. We’ve started playing with the new require.js framework here at Ubiquis and we’ve learned useful tricks already.

Here’s one: remember how easy it was to upload that myAwesomeJqueryPlugin that you found on some github repository and allowed you to implement that super-duper feature on your dashboard? Well, it won’t work anymore. Not without a bit of work, at least. Here’s why: your typical jQuery plugin is a function of the type

(function ($) {
// your awesome code goes here

But now it will fail. The reason is that $ is no longer defined in the global scope. It’s still exposed and you can access it as usual from within any component’s internal methods, but not when you’re loading your jQuery plugins.

So, after some reading we got to a very interesting Stack Overflow post (see , that pretty much solves the problem. We just need some tweaks on the plugin code to make it require friendly:

(function (factory) {
if (typeof define === 'function' && define.amd) {
// AMD. Register as an anonymous module depending on jQuery.
define(['jquery'], factory);
} else {
// No AMD. Register plugin with global jQuery object.
}(function ($) {
// your awesome code goes here

(note that the whole jQuery plugin is passed as a parameter to the factory function)

And there you have it, you can continue to use all the jQuery plugins you want, which are now loaded as require.js modules.

Dynamic date ranges in MDX: what happens if there’s no data?

Imagine the following scenario: a user wants a query to return the sales value from the past N years (lets say 5). It’s quite an easy query, your rows clause would look like this:

SELECT {[Measures].[Sales]} ON COLUMNS,
([Time].[2005].Lag(4) : [Time].[2005]) ON ROWS
FROM [SteelWheelsSales]

(As we’re using our good old friend SteelWheelsSales, lets pretend we’re back in 2005; If you’d like to have SteelWheelsSales updated, say to cover 2013-2015, please vote on the JIRA we created)

Here lies the problem: SteelWheelsSales only has data from 2003 to 2005, so the member [Time].[2005].Lag(4), which would be 2001, doesn’t exist. As such, the first member of that range doesn’t exist and the row set is empty.

And that’s not nice. It would be good to have a query that goes back N years in time but if there’s not enough data it still displays something.

What we need is a dynamic date range. It goes as far back as 4 years ago or, if there aren’t enough years in your database, shows all years.

Here’s what such a query would look like:

Set YEARS as [Time].[Years].Members
Member [Measures].[Year Number] as Rank([Time].[2005], YEARS ) - 1
Member [Measures].[Lag] as Iif( [Measures].[Year Number]<4, [Measures].[Year Number], 4) SELECT [Time].[2005].Lag( [Measures].[Lag] ) : [Time].[2005] ON ROWS, [Measures].[Sales] ON COLUMNS FROM [SteelWheelsSales]

Here's how it works:

  • The Year Number measure determines the rank of the last year of the date range; we subtract 1 to make it zero based;
  • The Lag measure checks whether the last year's rank is at least 4. If it's not, it takes the zero based rank (which means, it takes the first available year as the start of the date range
  • The date range on Rows takes this Lag measure to set the beginning of the range

This query returns all three years available in SteelWheelsSales. However, if more periods are available, it'll display up to 5 years of data.

And we can parametrize the query to take a year number:

Set YEARS as [Time].[Years].Members
Member [Measures].[Year Number] as Rank([Time].[${year}], YEARS ) - 1
Member [Measures].[Lag] as Iif( [Measures].[Year Number]<4, [Measures].[Year Number], 4) SELECT [Time].[${year}].Lag( [Measures].[Lag] ) : [Time].[${year}] ON ROWS, [Measures].[Sales] ON COLUMNS FROM [SteelWheelsSales]

Bonus points: make it smarter, that is, make the query dinamycally return a range of members of the time dimension, regardless of the level of the parameter:

Set MEMBERS as ${timeParameter}.Level.Members
Member [Measures].[Member Number] as (Rank(${timeParameter}, MEMBERS) - 1)
Member [Measures].[Lag] as IIf(([Measures].[Member Number] < 4), [Measures].[Member Number], 4) SELECT {[Measures].[Sales]} ON COLUMNS, (${timeParameter}.Lag([Measures].[Lag]) : ${timeParameter}) ON ROWS FROM [SteelWheelsSales]

This query takes a parameter ${timeParameter} as a fully qualified MDX member, e.g. [Time].[2004] or [Time].[2004].[QTR1] or [Time].[2004].[QTR1].[Jan] and returns a row set with 5 time periods, or a smaller range if we don't have that much data in the past.

1 year gone by

Ubiquis Consulting turned 1 year old last Saturday. It’s been a great year, full of the normal ups and downs of starting a business, but overall a very positive experience. And we got to bake a cake to celebrate the anniversary, which is always a plus.


For our second year, we plan to keep this blog a bit busier, so stay tuned.

So, what else?… Oh, yes, please join me in welcoming Miguel Cunhal to the Ubiquis Consulting family. Miguel is a very bright young consultant and he is looking forward to learn all things BI and Pentaho. We’re sure he’s a very valuable adition to our team.

Hello world, again

We got some very good feedback to our World population demo, thanks everyone.

And in response to your opinions, we made a few modifications:
– Moving the globe no longer brings up the country details overlay;
– Clicking on a country in the table will bring the details overlay;
– When a country is selected, it’s highlighted in a different color.

The new version is updated in the demo site.

Hello world.

Welcome to Ubiquis. Come back to read everything about our work, projects and development.

Take a look at our first demo, currently available at (no login required).

It’s a CDE dashboard with a geographical visualisation built with D3.JS, featuring zoom and pan features.

Here’s a screenshot: