Oracle by Example brandingWorking with Hive

section 0Before You Begin

In this 20-minute tutorial, you use Hive to enable SQL queries against our Citi Bike .csv dataset.

This is the second tutorial in the Storing and Analyzing Data with Big Data Cloud series. Perform the tutorials sequentially.

Background

Apache Hive uses SQL to allow users to structure and query large datasets residing in Hadoop Distributed File System (HDFS) and Object Store to obtain useful analytics.

What Do You Need?


section 2Create a Hive Table

In this section, you create a Hive table on top of the citi bike CSV file using 2 variations. Perform these steps to create a “managed” table where Hive manages the storage details (internally Hive will leverage HDFS storage).

  1. Login to Big Data Cloud Console and click Notebook.
  2. Open Citi Bike Trip note and click the + icon below the Download Data and Copy into Object Storage paragraph.
  3. Copy the code_snippet-1.txt and paste it in the empty paragraph.
  4. Run the paragraph by clicking the Run this paragraph icon This is Run this paragraph icon.and view the output.
    running paragraph
    Description of the illustration a4.jpg
    bike_trips table is created.
  5. Copy the code_snippet-2.txt and paste it in the next paragraph. Then run the paragraph.
    running paragraph
    Description of the illustration a5.jpg
    The bike_trips_objectstore table is created.
  6. Run the following query in the next paragraph to view all the Hive tables.
    hive  <<EOF
    show tables;
    EOF
    output paragraph
    Description of the illustration a6.jpg

section 2Query Hive Table with JDBC(Hive) Interpreter

Zeppelin Notebook includes a JDBC interpreter that allows you to run a query as a paragraph and format the results. In BDC, the JDBC interpreter has been per-configured to connect to Hive. Perform the following steps to work with the JDBC interpreter and connect to Hive:

  1. Click the + icon below the paragraph.
  2. Run the following query to view the Hive table using the JDBC interpreter. 
    %jdbc(hive)
    show create table bike_trips
    create table
    Description of the b2.jpg illustration
    The output shows the create table statement for bike_trips table.
  3. Run the following query to select bike riders based on gender from the bike_trips table. Click the Pie Chart icon This is Run this paragraph icon.to view the output as Pie Chart.
    %jdbc(hive)
    
    select 
     case when a.gender=1 then 'Male' when a.gender=2 then 
    'Female' else 'Unknown' end gender,  a.trip_count 
    from (select gender, count(*) trip_count from bike_trips
    group by gender) a
    pie chart output
    Description of the illustration b3.jpg
  4. Run the following query to select bike riders based on gender and day of the week from the bike_trips table. Click the Bar Graph icon Bar Graph icon.to view the output as as Chart with Stacks and Groupings.
    %jdbc(hive)
    
    select gender, dayofweek, count(*)
    from (    select date_format(`StartTime`,"E") dayofweek,
              case when gender=1 then 'Male' when gender=2 then 
    'Female' else 'Unknown' end gender 
    from bike_trips) bike_times 
    group by gender, dayofweek
    chart with stacks or groupings
    Description of the illustration b4.jpg

next stepNext Tutorial

Working with Spark Interpreter


more informationWant to Learn More?