Working with Hive

Before You Begin

In this 20-minute tutorial, you use Hive to enable SQL queries against our Citi Bike .csv dataset.

This is the second tutorial in the Storing and Analyzing Data with Big Data Cloud series. Perform the tutorials sequentially.

Background

Apache Hive uses SQL to allow users to structure and query large datasets residing in Hadoop Distributed File System (HDFS) and Object Store to obtain useful analytics.

What Do You Need?

A running BDC cluster.
BDC account credentials or Big Data Cluster Console direct URL (for example: https://xxx.xxx.xxx.xxx:1080/).
code_snippet-1.txt
code_snippet-2.txt

Create a Hive Table

In this section, you create a Hive table on top of the citi bike CSV file using 2 variations. Perform these steps to create a “managed” table where Hive manages the storage details (internally Hive will leverage HDFS storage).

Login to Big Data Cloud Console and click Notebook.
Open Citi Bike Trip note and click the + icon below the Download Data and Copy into Object Storage paragraph.
Copy the code_snippet-1.txt and paste it in the empty paragraph.
Run the paragraph by clicking the Run this paragraph icon and view the output.
Description of the illustration a4.jpg

bike_trips table is created.
Copy the code_snippet-2.txt and paste it in the next paragraph. Then run the paragraph.
Description of the illustration a5.jpg
The bike_trips_objectstore table is created.
Run the following query in the next paragraph to view all the Hive tables.
```
hive  <<EOF
show tables;
EOF
```
Description of the illustration a6.jpg

Query Hive Table with JDBC(Hive) Interpreter

Zeppelin Notebook includes a JDBC interpreter that allows you to run a query as a paragraph and format the results. In BDC, the JDBC interpreter has been per-configured to connect to Hive. Perform the following steps to work with the JDBC interpreter and connect to Hive:

Click the + icon below the paragraph.
Run the following query to view the Hive table using the JDBC interpreter.
```
%jdbc(hive)
show create table bike_trips
```
Description of the b2.jpg illustration

The output shows the create table statement for bike_trips table.
Run the following query to select bike riders based on gender from the bike_trips table. Click the Pie Chart icon to view the output as Pie Chart.
```
%jdbc(hive)

select 
 case when a.gender=1 then 'Male' when a.gender=2 then 
'Female' else 'Unknown' end gender,  a.trip_count 
from (select gender, count(*) trip_count from bike_trips
group by gender) a
```
Description of the illustration b3.jpg
Run the following query to select bike riders based on gender and day of the week from the bike_trips table. Click the Bar Graph icon to view the output as as Chart with Stacks and Groupings.
```
%jdbc(hive)

select gender, dayofweek, count(*)
from (    select date_format(`StartTime`,"E") dayofweek,
          case when gender=1 then 'Male' when gender=2 then 
'Female' else 'Unknown' end gender 
from bike_trips) bike_times 
group by gender, dayofweek
```
Description of the illustration b4.jpg

Next Tutorial

Working with Spark Interpreter