HPL/SQL is included to Apache Hive since version 2.0
HPL/SQL allows you to access multiple databases simultaneously from a single HPL/SQL script.
Hadoop extends, not replaces a traditional data warehouse, so you have to work with different systems for different type of workloads.
Most SQL-on-Hadoop solutions are good for querying large volumes of data rather than transaction processing. Note that even Hive stores its own metadata is a separate database (MySQL, PostgreSQL, Oracle etc.).
When you use Hive or other SQL-on-Hadoop tool to query your data, you can use a RDBMS or NoSQL to:
Before you can work with multiple databases, you have to configure connections.
HPL/SQL connects to a database only when it executes the first SQL statement for this database, the tool does not connect to any database in advance.
Once connected, HPL/SQL holds and re-uses the connection until the script completes.
By default, HPL/SQL uses the connection profile defined by the hplsql.conn.default option.
Once you configured multiple connection profiles you can use MAP OBJECT statement to map a table or view to the specified connection. Then when accessing in SQL statements HPL/SQL will use appropriate connection for the object:
For example, you can run SELECT in Hive, but log messages to MySQL:
MAP OBJECT log TO log.log_data AT mysqlconn; DECLARE cnt INT; SELECT count(*) INTO cnt FROM sales.users WHERE local_dt = CURRENT_DATE; INSERT INTO log (message) VALUES ('Number of users: ' || cnt);