Tag Archives: Hadoop

Hive and JSON made simple

It seems that JSON has become the lingua franca of the Web 2.0 world. It’s simple, extensible, easily parsed by browsers, easily understood by humans, and so on. It’s no surprise then that a lot of our Big Data ETL …

Posted in Big Data, Hive | 32 Comments

Let’s start off with collect()

One of the most fundamental functions in Brickhouse is the collect UDF. Based on collect_set, and similar to methods in Ruby and other languages, collect aggregates records from multiple rows into either an array or a map. It is the opposite of …
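To make the array-versus-map distinction concrete, here is a minimal Python sketch of the aggregation semantics only. It is not Brickhouse's Java implementation. The function names `collect_array` and `collect_map` are hypothetical, rows are assumed to arrive as tuples already carrying their GROUP BY key, and the handling of a repeated map key (last value wins) is an assumption for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def collect_array(rows: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, List[Any]]:
    """Hypothetical helper: gather each group's values into a list (array form)."""
    out: Dict[Hashable, List[Any]] = defaultdict(list)
    for group_key, value in rows:
        out[group_key].append(value)  # keeps duplicates and input order
    return dict(out)


def collect_map(
    rows: Iterable[Tuple[Hashable, Hashable, Any]]
) -> Dict[Hashable, Dict[Hashable, Any]]:
    """Hypothetical helper: gather each group's key/value pairs into a dict (map form)."""
    out: Dict[Hashable, Dict[Hashable, Any]] = defaultdict(dict)
    for group_key, map_key, map_value in rows:
        # Assumption: a repeated map key in the same group keeps the last value seen.
        out[group_key][map_key] = map_value
    return dict(out)


if __name__ == "__main__":
    # One value per row, grouped by user id.
    print(collect_array([("u1", "a"), ("u2", "b"), ("u1", "c")]))
    # {'u1': ['a', 'c'], 'u2': ['b']}

    # One key/value pair per row, grouped by user id.
    print(collect_map([("u1", "color", "red"), ("u1", "size", "M"), ("u2", "color", "blue")]))
    # {'u1': {'color': 'red', 'size': 'M'}, 'u2': {'color': 'blue'}}
```

In a real Hive query the rows within a group can reach the aggregator in any order, so the element order in the collected array should not be relied on; the sketch simply preserves input order.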

Posted in Uncategorized