Oracle by Example brandingWorking with Spark Interpreter

section 0Before You Begin

This 10-minute tutorial shows you how to use Spark to read and query the citi bike data.

This is the 3rd tutorial in the Storing and Analyzing Data with Big Data Cloud series. Perform the tutorials sequentially.

Background

Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics.

What Do You Need?


section 1Read Data and Register as a Spark SQL table

  1. In the Big Data Cloud Console Notebook page, click Citi Bike Trip note.
  2. Copy code-snippet-1.txt and paste it in the empty paragraph.
  3. Run the paragraph to read the CSV file and register as a bike_trips_temp temp view.
    Settings context menu
    Description of the illustration a3.jpg

section 3Query the Citi Bike Trip Tables

  1.  Run the following query in the next paragraph to retrieve the number of bike trips based on gender.
    %sql
    select 
     case when a.gender=1 then 'Male' when a.gender=2 then
    'Female' else 'unknown' end gender ,
            a.trip_count 
    from (select gender, count(*) trip_count from bike_trips_temp
    group by gender) a
    Run the paragraph
    Description of the illustration b1.jpg
  2. Run the following query in the next paragraph to retrieve the number of bike trips based on day of the month.
    select dayofmonth, count(*)
    from (select date_format(`Start Time`,"H") hour,
     date_format(`Start Time`,"E") dayofweek,
     date_format(`Start Time`,"d") dayofmonth
    from bike_trips_temp) bike_times
    group by dayofmonth
    Run the paragraph
    Description of the illustration b2.jpg
  3. Run the following query in the next paragraph to retrieve the number of bike trips based on the hour and the day of the week.
    select dayofweek, hour, count(*) from (select date_format(`Start Time`,"H") hour,
     date_format(`Start Time`,"E") dayofweek,
     date_format(`Start Time`,"d") dayofmonth,
     case when gender=1 then 'Male' when gender=2 then 
    'Female' else 'unknown' end gender 
    from bike_trips_temp) bike_times
    where (gender="${gender=Male,Male|Female|unknown}" )
    group by dayofweek, hour
    Run the paragraph
    Description of the illustration b3.jpg

section 3Save the temporary Spark SQL table as a permanent Hive table

In this section, you save a copy of the Spark SQL temporary table as a new permanent Hive table. This might be useful if you want to use BI tools, like Oracle Data Visualization Desktop, to query the permanent table.

  1. Run the following query to retrieve the hive tables.
    show tables
    hive tables
    Description of the illustration c1.jpg
  2. Copy code-snippet-2.txt and paste it in the next paragraph.
  3. Run the paragraph to create a permanent Hive table from SparkSQL temporary table.
    Run the pragraph
    Description of the illustration c3.jpg
    bike_trips_parquet table is created.
  4. Run the following query in the next paragraph to retrieve data from bike_trips_parquet table.
    select * from bike_trips_parquet limit 5
    paragraph output
    Description of the illustration c4.jpg

next stepNext Tutorial

Adding Weather Data to the Object Store


more informationWant to Learn More?