Monthly Archives: February 2014

When you absolutely have to do a count distinct

Previously, I discussed how horrible it was to attempt to perform a count distinct in Hive; how it would cause you to sort the universe, and then wait until the end of time until a single reducer to complete. The … Continue reading

Posted in Uncategorized | 1 Comment

Hive and JSON made simple

It seems that JSON has become the lingua france for the Web 2.0 world. It’s simple, extendible, easily parsed by browsers, easily understood by humans, and so on. It’s no surprise then that a lot of our Big Data ETL … Continue reading

Posted in Big Data, Hive | Tagged , , , | 32 Comments