The Case

Mayor Gaia Budskott is facing an electoral calamity. Utility prices are surging, voters aren’t happy and the outlook for the upcoming election is not good. Once again, Mayor B has turned to us to bail them out solve the problem. The billing system is super legacy, has very questionable data quality and is running something called “SQL”. Never heard of it, must be from the 70s.

So, a political crisis bought about by out of control utility bills? Topical, KDA, very topical. One might be tempted to say this little billing issue is just symptomatic of decades of underinvestment by successive governments and the Mayor deserves the retribution of the ballot box. Might.

Anyway, back to the challenge.

Solving

We have a months worth of dirty data to clean up and find the real total. Easy peasy.

First, let’s have a look at the data. It looks like each household has two readings per day, one for power and one for water. Let’s see if there are any duplicates:

Consumption
| summarize Count=count(MeterType) by HouseholdId, Timestamp
| where Count > 2

44314 records. That’s a lot of duplicates. Let’s get rid of them using some standard dedup techniques. We’ll just take the max value for any day’s readings (there are negatives in the dataset if you look).

Consumption
| summarize hint.strategy=shuffle arg_max(Consumed, *) by HouseholdId, MeterType, Timestamp
...

and while we’re here we might as well calculate the new total:

...
| lookup Costs on MeterType
| summarize sum(Consumed * Cost)

Too easy.