Utilizando o Hive e o Impala através do banco de dados Oracle
Por Yenugula Venkata Ravi Kumar (OCM), Adrian Neagu (OCM) e Alex Zaballa (OCM),
Publicado em Janeiro 2018
Introdução
Atualmente a disseminação de tecnologias de Big Data nas organizações tornou-se um fato irrefutável. Com isso, ocorrem os problemas de integração com os sistemas legados. Neste artigo iremos demonstrar a utilização do banco de dados Oracle com o Hive e o Impala.
Instalação e configuração
Ambiente Oracle
- Oracle Virtual Box com Oracle Enterprise Linux 6.5
- Oracle Database 12.1.0.2.0 Enterprise Edition
- Hostname: Node1
- IP: 192.168.1.20
Ambiente Cloudera
- Oracle Virtual Box com CentOS 6.7
- Cloudera 5.12
- Hostname: Cloudera
- IP: 192.168.1.88
Para configurar o ambiente Cloudera, usamos uma imagem Cloudera Quickstart que pode ser baixada em https://www.cloudera.com/downloads/quickstart_vms/5-12.html.
As tabelas Hive usadas neste artigo foram criadas de acordo com o tutorial que pode ser encontrado em .
Os procedimentos para instalação do ambiente Oracle podem ser encontrados em https://docs.oracle.com/cd/E11882_01/gateways.112/e12013/toc.htm.
Instalando os drivers ODBC para o Cloudera Hive e Cloudera Impala
Como pré-requisito para instalar os drivers ODBC Cloudera Hive e Cloudera Impala precisamos utilizar o unixODBC ou o iODBC. Neste artigo optamos por utilizar o unixODBC:
[root@node1 etc]# yum install unixODBC
Loaded plugins: refresh-packagekit, security
..........................................................................................................
Setting up Install Process
Resolving Dependencies
--> Running transaction check
--> Package unixODBC.x86_64 0:2.2.14-14.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=============================================================
Package Arch Version Repository Size=============================================================
Installing:
unixODBC x86_64 2.2.14-14.el6
ol6_latest 377 k
Transaction Summary
============================================================
Install 1 Package(s)
Total download size: 377 k
Installed size: 1.1 M
Is this ok [y/N]: y
Downloading Packages:
unixODBC-2.2.14-14.el6.x86_64.rpm
…....................................................................................................................................................
Faça o download do driver ODBC Cloudera Hive no seguinte endereço: http://www.cloudera.com/downloads/connectors/hive/odbc/2-5-12.html
Utilizando o usuário root, execute o seguinte comando para instalar o driver ODBC Cloudera Hive:
[root@node1 kit]# rpm -Uhv ClouderaHiveODBC-2.5.19.1004-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:ClouderaHiveODBC ########################################### [100%]
[root@node1 kit]#
Faça o download do driver ODBC Cloudera Impala no seguinte endereço: (http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html)
Utilizando o usuário root, execute o seguinte comando para instalar o driver ODBC Cloudera Impala:
[root@node1 kit]# rpm -Uhv ClouderaImpalaODBC-2.5.33.1004-1.el6.x86_64.rpm
Preparing... ###################### [100%]
1:ClouderaImpalaODBC ###################### [100%]
[root@node1 kit]#
Configurando o driver ODBC Cloudera Hive
O diretório de instalação do driver está localizado em /opt/cloudera/hiveodbc. Em /opt/cloudera/hiveodbc/Setup encontramos arquivos de configuração de exemplo que podem ser usados como template.
Como root, crie o diretório /usr/local/odbc e copie o arquivo odbcinst.ini:
[root@node1 ~]# mkdir -p /usr/local/odbc
[root@node1 Setup]# cp odbcinst.ini /usr/local/odbc
Copie o arquivo odbc.ini para /home/oracle:
[root@node1 Setup]# cp odbc.ini /home/oracle
Copie o arquivo cloudera.hiveodbc.ini de /opt/cloudera/hiveodbc/lib/64 para o /etc:
[root@node1 64]# cp cloudera.hiveodbc.ini /etc
Agora vamos configurar as variáveis de ambiente para o driver ODBC Cloudera Hive.
Edite o arquivo /home/oracle/.bash_profile e adicione as seguintes variáveis:
[oracle@node1 ~]$ vi
#.bash_profile # .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export ORACLE_SID=ORCL
export ODBCINI=~/odbc.ini
export ODBCSYSINI=/usr/local/odbc
export CLOUDERAHIVEINI=/etc/cloudera.hiveodbc.ini
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib
Após isso, carregue as variáveis de ambiente com o seguinte comando:
[oracle@node1 ~]$ source .bash_profile
Iremos alterar o nome do data source para HIVEDSN e alterar apenas as variáveis host, port e UID:
[HiveDSN]
Description = Cloudera ODBC Driver for Apache Hive (64-bit) DSN
Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST = 192.168.1.88
PORT = 10000
Schema = default
ServiceDiscoveryMode = 0
ZKNamespace =
HiveServerType = 2
AuthMech = 2
ThriftTransport = 1
UseNativeQuery = 0
KrbHostFQDN = [Hive Server 2 Host FQDN]
KrbServiceName = [Hive Server 2 Kerberos service name]
KrbRealm = [Hive Server 2 Kerberos realm]
SSL = 0
TwoWaySSL = 0
ClientCert =
ClientPrivateKey =
ClientPrivateKeyPassword =
UID=cloudera
Teste a conexão conforme exemplo abaixo:
[oracle@node1 ~]$ isql -v hivedsn
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
Liste as tabelas e faça um count na tabela orders para ter certeza que tudo está ok:
SQL> show tables;
+-----------------------------------+
|
tab_name
|+-----------------------------------------+
| categories
|| customers
|| departments
|| order_items
|| orders
|| products
|+------------------------------------------+
SQLRowCount returns -1
6 rows fetched
SQL> select count(*) from orders;
+---------------------+
| EXPR_1 |
+---------------------+
| 68883 |
+---------------------+
SQLRowCount returns -1
1 rows fetched
Configurando o Oracle Database Gateway para utilizar o driver ODBC Cloudera Hive
No diretório $ORACLE_HOME/hs/admin criaremos um arquivo de parâmetros chamado initHIVEDSN.ora (utilizaremos o arquivo dg4odbc.ini como template):
[oracle@node1 admin]$ vi initHIVEDSN.ora
# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "HIVEDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Configuração do Listener
Abra o arquivo listener.ora e faça as seguintes alterações:
[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File:
/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(PROGRAM = dg4odbc)
(ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so)
(SID_NAME = HIVEDSN)
(ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
Adicione a entrada HIVEDSN ao arquivo tnsnames.ora:
HIVEDSN =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))
(CONNECT_DATA=(SID=HIVEDSN))
(HS=OK)
)
Faça o reload do listener:
[oracle@node1 admin]$ lsnrctl reload
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
The command completed successfully
[oracle@node1 admin]$
Verifique o status do listener:
[oracle@node1 admin]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
STATUS of the LISTENER
-----------------------------------------------------------------------------------------------------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date 19-JUN-2016 08:04:43
Uptime 1 days 0 hr. 1 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))
(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet)) (Presentation=HTTP)(Session=RAW))
Services Summary...
Service "HIVEDSN" has 1 instance(s).
Instance "HIVEDSN", status UNKNOWN, has 1 handler(s) for this service...
Service "ORCL" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
Service "ORCLXDB" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
The command completed successfully
Utilize o tnsping para verificar se a entrada HIVEDSN está funcionando corretamente:
[oracle@node1 admin]$ tnsping HIVEDSN
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=node1)
(PORT=1521)) (CONNECT_DATA=(SID=HIVEDSN)) (HS=OK))
OK (10 msec)
Crie um database link público chamado hivedsn:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> create public database link hivedsn connect to cloudera identified by cloudera using 'HIVEDSN';
Database link created.
Agora tente buscar os dados utilizando o database link recém criado:
SQL> select * from customers@hivedsn;
select * from customers@hivedsn
*
ERROR at line 1:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.
{HY000,NativeErr = 11560}
ORA-02063: preceding 2 lines from HIVEDSN
Para resolver este problema, temos que adicionar o caminho completo para a lib libodbcinst.so em /etc/cloudera.hiveodbc.ini:
[Driver]
ODBCInstLib=/usr/lib64/libodbcinst.so
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages
LogLevel=0
LogPath=
SwapFilePath=/tmp
Também exporte a variável LD_PRELOAD:
[oracle@node1 ~]$ export LD_PRELOAD=/usr/lib64/libodbcinst.so
[oracle@node1 ~]$ ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
linux-vdso.so.1 => (0x00007fffad5ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd1af2b2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd1af094000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007fd1aee7a000)
librt.so.1 => /lib64/librt.so.1 (0x00007fd1aec72000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fd1aea57000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fd1ae854000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fd1ae54e000)
libm.so.6 => /lib64/libm.so.6 (0x00007fd1ae2c9000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd1ae0b3000)
libc.so.6 => /lib64/libc.so.6 (0x00007fd1add20000)
/lib64/ld-linux-x86-64.so.2 (0x00000033a1a00000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd1adae8000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007fd1ad886000)
Faça o reload do listener e teste novamente:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:23:36 2016 Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select * from customers@hivedsn; select * from customers@hivedsn * ERROR at line 1: ORA-28500: connection from ORACLE to a non-Oracle system returned this message: [Cloudera][ODBC] (11470) Transactions are not supported. {HYC00,NativeErr = 11470} ORA-02063: preceding 2 lines from HIVEDSN
Para resolver este problema, edite o arquivo initHIVEDSN.ora e adicione o parâmetro HS_TRANSACTION_MODEL:
# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "HIVEDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Agora teste novamente:
SQL> select * from customers@hivedsn
-------------------------------------------------------------------------------------------
1530 David Smith XXXXXXXXX XXXXXXXXX
Configurando o driver ODBC Cloudera Impala
O diretório de instalação do driver está localizado em /opt/cloudera/impalaodbc. Em
/opt/cloudera/impalaodbc/Setup encontraremos arquivos de configuração de exemplo que podem ser usados como template.
Como root, crie o diretório /usr/local/odbc e copie o arquivo odbcinst.ini:
[root@node1 ~]# mkdir -p /usr/local/odbc
[root@node1 Setup]# cp odbcinst.ini /usr/local/odbc
Copie o arquivo odbc.ini para /home/oracle:
[root@node1 Setup]# cp odbc.ini /home/oracle
Copie o arquivo cloudera.impalaodbc.inide /opt/cloudera/impalaodbc/lib/64 para o /etc:
[root@node1 64]# cp cloudera.impalaodbc.ini /etc
Agora vamos configurar as variáveis de ambiente para o driver ODBC Cloudera Impala.
Edite o arquivo /home/oracle/.bash_profile e adicione as seguintes variáveis:
[oracle@node1 ~]$ vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export ORACLE_SID=ORCL
export ODBCINI=~/odbc.ini
export ODBCSYSINI=/usr/local/odbc
export SIMBAINI=/etc/cloudera.impalaodbc.ini
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib
Após isso, carregue as variáveis de ambiente com o seguinte comando:
[oracle@node1 ~]$ source .bash_profile
Iremos alterar o nome do data source para ImplDSN e alterar apenas as variáveis host e port:
[ImplDSN]
# Description: DSN Description.
# This key is not necessary and is only to give a description of the data source.
Description=Cloudera ODBC Driver for Impala (64-bit) DSN
# Driver: The location where the ODBC driver is installed to.
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=192.168.1.88
PORT=21050
Database=default
AuthMech=0
Teste a conexão conforme exemplo abaixo:
[oracle@node1 ~]$ isql -v ImplDSN
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
Liste as tabelas e faça um count na tabela orders para ter certeza que tudo está ok:
SQL> show tables;
+------------------+
| tab_name |+------------------+
| categories || customers || departments || order_items || orders || products |+-------------------+
SQLRowCount returns -1
6 rows fetched
SQL> select count(*) from orders;
+---------------------+
| EXPR_1 |
+---------------------+
| 68883 |
+---------------------+
SQLRowCount returns -1
1 rows fetched
Configurando o Oracle Database Gateway para utilizar o driver ODBC Cloudera Impala
No diretório $ORACLE_HOME/hs/admin criaremos um arquivo de parâmetros chamado initIMPLDSN.ora (utilizaremos o arquivo dg4odbc.ini como template):
[oracle@node1 admin]$ vi initIMPLDSN.ora
# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "HIVEDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/impalaodbc/lib/64/libclouderahiveodbc64.so
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Modifique o arquivo /etc/cloudera.impalaodbc.ini conforme abaixo:
[Driver]
ODBCInstLib=/usr/lib64/libodbcinst.so
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
Configuração do Listener
Abra o arquivo listener.ora e faça as seguintes alterações:
[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(PROGRAM = dg4odbc)
(ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so)
(SID_NAME = IMPLDSN)
(ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
Adicione a entrada IMPLDSN ao arquivo tnsnames.ora:
IMPLDSN =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))
(CONNECT_DATA=(SID=IMPLDSN))
(HS=OK)
)
Faça o reload do listener:
[oracle@node1 admin]$ lsnrctl reload
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
The command completed successfully
[oracle@node1 admin]$
Verifique o status do listener:
[oracle@node1 admin]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
STATUS of the LISTENER
------------------------------------------------------------------------------------------------------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date 19-JUN-2016 08:04:43
Uptime 1 days 0 hr. 1 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))
(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "IMPLDSN" has 1 instance(s).
Instance "IMPLDSN", status UNKNOWN, has 1 handler(s) for this service...
Service "ORCL" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
Service "ORCLXDB" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
The command completed successfully
Utilize o tnsping para verificar se a entrada IMPLDSN está funcionando corretamente:
[oracle@node1 admin]$ tnsping IMPLDSN
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=node1)
(PORT=1521)) (CONNECT_DATA=(SID=IMPLDSN)) (HS=OK))
OK (10 msec)
Crie um database link publico chamado impldsn:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> create public database link impldsn connect to cloudera identified by cloudera using 'IMPLDSN';
Database link created.
Agora, tente buscar os dados utilizando o database link criado:
SQL> select * from customers@impldsn
-----------------------------------------------------------------------------------
1530 David Smith XXXXXXXXX XXXXXXXXX
Observações
Devido a limitações na sintaxe e na velocidade, sempre que possível, use o Impala ao invés do Hive. Para algumas queries complexas, é preferível criar uma view dentro do Hive ou Impala e acessar a view através do Oracle.
Por exemplo, se tentarmos executar a consulta abaixo, ocorrerá um erro de que o recurso não é suportado:
SQL> l
1 select c."category_name", count("order_item_quantity") as count
2 from order_items@impaladsn oi
3 inner join products@impaladsn p on oi."order_item_product_id" = p."product_id"
4 inner join categories@impaladsn c on c."category_id" = p."product_category_id"
5 group by c."category_name"
6* order by count("order_item_quantity") desc
SQL> /
select c."category_name", count("order_item_quantity") as count
*
ERROR at line 1:
ORA-02070: database IMPALADSN does not support in this context
Para resolver esse problema, podemos criar uma view no Impala:
[cloudera@quickstart Desktop]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to quickstart.cloudera:21000
Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.5.0-cdh5.7.0 (ad3f5ad) built on Wed Mar 23 11:33:33 PDT 2016)
After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[quickstart.cloudera:21000] >
create view summaryvw as -- Most popular product categories
> select c.category_name, count(order_item_quantity) as count
> from order_items oi
> inner join products p on oi.order_item_product_id = p.product_id
> inner join categories c on c.category_id = p.product_category_id
> group by c.category_name
> order by count desc
> limit 10;
Agora podemos executar a query diretamente sobre a view sumaryvw:
1* select * from summaryvw@impaladsn SQL> /
category_name count
-------------------------------------------------------------------------------- ----------
Cleats 24551
Men's Footwear 22246
Women's Apparel 21035
Indoor/Outdoor Games 19298
Fishing 17325
Water Sports 15540
Camping & Hiking 13729
Cardio Equipment 12487
Shop By Sport 10984
Electronics 3156
10 rows selected.
Yenugula Venkata Ravikumar é um DBA com mais de 18 anos de experiencia com Oracle e em ambientes de alta disponibilidade (RAC, Data Guard, dentre outros), tuning e desempenho, migrações, backup e recover, Oracle Exadata X2 e X3, é Expert em sistemas operacionais tais como como AIX, HP-UX e Linux. Já participou como conferencista de Oracle pela India, onde mora atualmente. Obteve o título de "Oracle Certified Master (OCM 10g)" em 2009.
Adrian Neagu é um DBA com mais de 15 anos de experiência em diversas tecnologias de banco de dados. Ele é Oracle Certified Master 10g e 11g, Oracle Certified Professional 9i, 10g, e 11g, Cloudera Certified Administrator for Apache Hadoop, IBM DB2 Certified Administrator version 8.1.2 e 9, IBM DB2 9 Advanced Certified Administrator 9, e Sun Certified System Administrator Solaris 10.
Alex Zaballa, formado em Análise de Sistemas, é especialista em Banco de Dados Oracle com sólidos conhecimentos em Servidores de Aplicação e Sistemas Operacionais; trabalha com Oracle há 17 anos, é Oracle ACE Director, certificado OCM Database 12c/MAA/11G/Cloud e conta com mais de 235 outras certificações em produtos da Oracle. Alex também é membro do Groupo de Usuários Oracle do Brasil (GUOB), fundador do Grupo de Usuários Oracle de Angola (GUOA) e membro do time OraWorld.
Este artigo foi revisto pela equipe de produtos Oracle e está em conformidade com as normas e práticas para o uso de produtos Oracle.