Today I saw Harris Ward asking a question I’ve seen pop up over and over again. And somehow the solution is never obvious.
The problem is how to fill in the gaps when we have nulls in our data stream. For example, our data set looks like:
And we want to replace all those nulls by the last non-null value we’ve seen in the data set: all non null values stay as is, but Mar, Apr and May get the value 20, Jul gets 50 and Sep, Oct and Nov get 40.
If we’re ingesting the data from a CSV, Excel or XML file, it’s easy to achieve this (ever wondered what the “Repeat” flag means in the fields tab of the Text File input step?).
But suppose our data doesn’t come from files. We need to implement the equivalente of the repeat flag inside a PDI transformation.
Invariably, when any of us encounters this requirement we all go through the same attempts. Let’s use the Analytical query step! Or the Group by with a “pass all rows option”. Or the NVL operation of the calculator step. And then we learn these all fail.
Variable tmp is only initialized with the first row of data. From then on it’ll keep the last non null value of sales (or 0).
Don’t forget to specify on the configuration of the output fields that you want to overwrite the value of the sales field, though:
And here’s the transformed data:
Exactly as intended.
Download the sample transformation here.