All posts in “ctools”

Using PDI and C-tools to display real time scores analysis for UEFA Euro 2016

So, the Euro 2016 started this past Friday. We’ll have matches almost everyday until July 10th.

And here at Ubiquis we decided to start a little project around the Euro. The basic idea is: suppose England gets to half time level with Slovakia. Given both teams’ past history, what’s the most likely scenario? A win for England? A draw? A win for Slovakia? We got score change information from all Euro finals matches since 1960 from Wikipedia and parsed it with PDI. And using a real time scores API called, built a couple queries that answer one basic question: “out of all matches in which team A was leading/trailing by X goals or more, at the same point in time, how many ended up in wins, losses and draws?”.

So, for example, when yesterday Germany was leading Ukraine by 1 goal at Half time, the dashboard would give us some idea of how often Germany managed to keep a lead and win the match, or how often Ukraine managed to turn the tables around when losing by 1 goal or more and ended up drawing or winning the match.

The dataset is quite small and the data model very simple, but it serves to show how Pentaho C-tools and PDI can be leveraged to create a real time information system, fed by external data sources, in a reliable maner.

Here’s the link: Euro 2016 Analysis dashboard (no login required).

Hopefully we’ll use the learnings from this project to expand its scope and reliability and use it for other sports competitions. Stay tuned.

Hello world, again

Well, we finally got round to upgrade our Pentaho installation to 5.2 and decided to revamp our World Population dashboard in the process.

First thing you may notice: it’s called World Data now, not World Population. This is because we plugged in data available at the Gapminder and add a few selectors to allow you to choose different metrics. Anything from Overall population to GDP/capita in inflation-adjusted PPP dollars, to percentage of renewable energies, you name it and probably Gapminder has it.

The best part: there’s time series data, sometimes dating as far back as the Middle Ages, which allows us to show how those indicators evolved for any particular country. Just click on a country and you’ll see.

We also added a switch to allow you to change the color scale from linear to log (the algorithm decided automatically which one to use, but sometimes it may choose unwisely).

Here, enjoy it: World Data demo dashboard.

Reading/Writing files from a Kettle CDA datasource in Pentaho 5.x

Kettle datasources in CDA are the most versatile way of getting data from disparate systems into a dashboard or report. Both C-tools and PRD expose Kettle as a datasource, so we can call a transformation that queries several different systems (databases, web services, etc.), do some Kettle magic on the results and then ship the final resultset back to the server.

However… there’s no way out of the box to reference sub-transformations in Pentaho 5.x. Solution files are now hosted on Jackrabbit and not on the file system and ${Internal.Transformation.Filename.Directory} just doesn’t work. If you have a transformation calling a sub-transformation and upload both to the repository, when you run it you get an error message saying that it cannot find the file.

Although this usually isn’t a problem, as most Kettle transformations used by dashboards are relatively simple, for more complex tasks you may want to use mappings (sub-transformations), call a job or transformation executor or, and perhaps the best use case, use metadata injection. In these scenarios you _really_ need to be able to call ktr files from the BA server context, and as it happens you can’t.

Well, sort of. As it turns out, you can.

The problem: lets say you have your server installed in /opt/pentaho/biserver-ce (or biserver-ee, we at Ubiquis don’t discriminate), and your kettle file is in your repository in /public/myDashboard/files/kettle/myTransformation.ktr. You want to use Metadata injection on a transformation called myTemplate.ktr and so you reference it by ${Internal.Transformation.Filename.Directory}/myTemplate.ktr in your metadata injection step. When you run your datasource, your error message will say something like

Cannot read from file /opt/pentaho/biserver-ce/pentaho-solutions/public/myDashboard/files/kettle/myTemplate.ktr because it’s not a file.

So what seems to be happening is that Kettle is using your pentaho-solutions folder as the root folder and it just appends the JCR path to it. Which obviously fails, as there’s no physical counterpart in your filesystem for the JCR files.

But, if you use the Pentaho Repository Synchronizer plugin to copy the contents of JCR to the filesystem and vice-versa (which is perhaps the simplest way to have your solution files under GIT or SVN, at least until the Ivy GS plugin becomes fully functional), then the synchronisation process will create a folder repositorySynchronizer inside your pentaho-solutions folder and use it to create the copies of every JCR file.

So, after synchronising JCR with the file system, your template transformation can be found in /opt/pentaho/biserver-ce/pentaho-solutions/repositorySynchronizer/public/myDashboard/files/kettle.

And now, the funny bit: to get it working, just create a symbolic link to remove the extra repositorySynchronizer element from your path: go to your pentaho-solutions folder and just type ln -s repositorySynchronizer/public public.

Now the template transformation is found and you can reference it from a transformation called by a dashboard.

Credit for the discovery goes to Miguel Cunhal, our junior consultant, that had the tough task of getting our metadata injection transformation running inside a demo dashboard. But more on that in another post.