Tag Archives: Hive

Hive and JSON made simple

JSON seems to have become the lingua franca of the Web 2.0 world. It’s simple, extensible, easily parsed by browsers, easily understood by humans, and so on. It’s no surprise, then, that a lot of our Big Data ETL …

Posted in Big Data, Hive

Squash the Long Tail with Brickhouse’s HBase UDFs

A problem we often face with Data Science and Big Data is that we can express solutions in simple and elegant terms that work well with toy or limited datasets, but that can never scale to …

Posted in Uncategorized

Use collect to avoid the self-join

Hive is the 5GL for MapReduce

One source of confusion in describing what Brickhouse is about is that Hive serves multiple purposes, with different uses for different people. It is analogous to SQL, but that is the trick. There are …

Posted in Uncategorized

Let’s start off with collect()

One of the most fundamental functions in Brickhouse is the collect UDF. Based on collect_set, and similar to methods in Ruby and other languages, collect aggregates values from multiple rows into either an array or a map. It is the opposite of …
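To make the array-versus-map idea concrete, here is a minimal plain-Python model of what collect does to grouped rows. It is a sketch under stated assumptions, not Brickhouse’s Java implementation: it assumes the one-argument form gathers each group’s values into a list (keeping duplicates, unlike collect_set) and the two-argument form builds one map per group, with the last value winning when a key repeats. The function names collect_array and collect_map are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def collect_array(rows: Iterable[Tuple[Hashable, Any]]) -> Dict[Hashable, List[Any]]:
    """Hypothetical model of collect(value) ... GROUP BY group_key.

    Gathers every value of a group into a list. Duplicates are kept
    (unlike collect_set). This model preserves input order; do not
    assume the real UDF guarantees any particular order.
    """
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for group_key, value in rows:
        groups[group_key].append(value)
    return dict(groups)


def collect_map(
    rows: Iterable[Tuple[Hashable, Hashable, Any]]
) -> Dict[Hashable, Dict[Hashable, Any]]:
    """Hypothetical model of collect(map_key, map_value) ... GROUP BY group_key.

    Builds one dict per group. If a map_key repeats within a group,
    the last value seen wins (an assumption of this sketch).
    """
    groups: Dict[Hashable, Dict[Hashable, Any]] = defaultdict(dict)
    for group_key, map_key, map_value in rows:
        groups[group_key][map_key] = map_value
    return dict(groups)


if __name__ == "__main__":
    # Page views per user: many rows in, one array per user out.
    views = [("alice", "home"), ("bob", "search"), ("alice", "cart"), ("alice", "home")]
    print(collect_array(views))
    # {'alice': ['home', 'cart', 'home'], 'bob': ['search']}

    # Scores per user: (user, subject, score) rows folded into one map per user.
    scores = [("alice", "math", 90), ("alice", "art", 75), ("bob", "math", 60)]
    print(collect_map(scores))
    # {'alice': {'math': 90, 'art': 75}, 'bob': {'math': 60}}
```

Modelling the two forms as separate functions keeps the return types simple; in Hive both are reached through the same collect name, distinguished by the number of arguments.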

Posted in Uncategorized