
Hadoop Hive

The Tidal by Redwood adapter enables you to automate the execution of Apache HiveQL commands from Tidal against a Hadoop cluster. Apache Hive is a data warehousing system built on top of Hadoop that provides an SQL-like interface for querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS™).

Automate Hive queries and big data analysis

Orchestrate the complex big data workflows that drive your business.

Maintain visibility

Monitor system status via live JDBC connection to the Hive server.

Apply the right tools

Access and manage data stored in the HDFS™ using HiveQL.

Deliver big results

Incorporate Hadoop Hive data into your existing processes.

Simplify cross-platform data management

This adapter lets you access and manage data stored in the HDFS™ using Hive’s query language, HiveQL. Integration with Tidal enables you to define, launch, control and monitor HiveQL commands submitted to Hive via JDBC on a scheduled basis.

The adapter automates HiveQL commands as part of cross-platform processes orchestrated between Tidal and your Hadoop cluster. Hive jobs use the same user interface approach as other Tidal adapter jobs, so they integrate seamlessly into your operational processes.

HiveQL syntax closely resembles standard SQL, so existing SQL skills carry over directly.
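For illustration only, here is a minimal HiveQL query of the kind the adapter can schedule and submit. The table and column names (web_logs, status, request_date) are hypothetical:

    -- Hypothetical example: count server errors per day in a Hive table over HDFS data
    SELECT request_date,
           COUNT(*) AS error_count
    FROM   web_logs
    WHERE  status >= 500
    GROUP BY request_date
    ORDER BY request_date;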

What the adapter enables

  • Connection management to monitor system status with a live connection to the Hive server via JDBC
  • Hive job and event management:
    • Dependencies and events defined with Tidal for scheduling control
    • Dynamic runtime overrides for parameters and values passed to the HiveQL command (see the sketch after this list)
    • Output-formatting options to control the results, including table, XML and CSV
    • Runtime MapReduce parameter overrides if HiveQL commands result in a MapReduce job
    • Scheduling and monitoring of HiveQL commands from a centralized work console with Tidal
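The following hedged HiveQL sketch suggests what a scheduled command might look like once runtime values have been substituted. The variable run_date and the table sales are hypothetical; SET mapreduce.job.reduces is a standard Hadoop parameter:

    -- Hypothetical sketch: in practice a value such as run_date would be
    -- supplied as a runtime override rather than hard-coded
    SET hivevar:run_date=2024-01-31;
    -- MapReduce override, relevant only if the query compiles to a MapReduce job
    SET mapreduce.job.reduces=8;

    SELECT region, SUM(amount) AS total
    FROM   sales
    WHERE  sale_date = '${hivevar:run_date}'
    GROUP BY region;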

Ways to get started

  • Define Hive Jobs and include a variety of tasks and options in each job
  • Define events for alerting and invoking an automated response through email and/or inserting additional jobs into the schedule
  • Monitor Hive tasks that run as pre-scheduled or event-based jobs, just as you would any other job type in Tidal, using the Job Details dialog, and use Business Views to monitor job activity

Tidal and Hadoop Hive integration FAQs

  • What are Hadoop and Hive?

    Hadoop is a distributed processing framework for big data, while Hive is a data warehouse system built on top of Hadoop that provides an SQL-like interface for querying and analyzing data stored in the Hadoop Distributed File System (HDFS™).

  • What is the difference between SQL and Hive?

    SQL is the standard query language fundamental to relational database operations. Apache Hive, designed for data warehousing, builds on that familiar SQL foundation: its HiveQL dialect lets users analyze and manipulate massive datasets distributed across various storage systems.
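    One concrete difference: HiveQL adds constructs for the nested types common in big data that traditional standard SQL lacks, such as LATERAL VIEW with explode() to flatten arrays. A hedged sketch, with hypothetical table and column names:

      -- Hypothetical sketch: expand an ARRAY<STRING> column into one row per tag,
      -- a HiveQL extension not found in classic standard SQL
      SELECT page, tag
      FROM   page_views
      LATERAL VIEW explode(tags) t AS tag;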

  • What is Hive integration?

    Hive integrations allow Apache Hive to work in concert with other platforms. Tidal is a workload automation platform that can automate and orchestrate various tasks, including those related to big data processing. It integrates with Apache Hadoop, a distributed data processing framework, through specific adapters. The Tidal adapter for Hive allows Tidal to interact with Hive, a data warehousing system built on top of Hadoop.

    The Hive adapter enables you to submit and manage Hive queries (written in HiveQL) as part of Tidal-managed processes. This integration streamlines workload definitions and job management for Hadoop clusters, saving time and reducing errors.

    Tidal also provides adapters for other Hadoop components, such as Sqoop and MapReduce.

  • What is the use of Hive in Hadoop?

    Apache Hive, built on top of Hadoop, is a data warehouse system that allows users to query and analyze large datasets using SQL-like commands (HiveQL). This makes it easier to handle big data tasks without knowledge of Java or MapReduce. Hive leverages the Hadoop Distributed File System (HDFS™) for storage and engines such as MapReduce, Tez or Spark for processing. As a "schema on read" system, Hive doesn't require a schema to be defined before data is loaded, allowing users to start working with data immediately.
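    To make "schema on read" concrete, the following hedged HiveQL sketch projects a table definition onto files already sitting in HDFS; nothing is moved or converted when the statement runs. The table name, columns and path are hypothetical:

      -- Hypothetical sketch: define a schema over existing HDFS files at read time
      CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        request_date STRING,
        status       INT,
        url          STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION '/data/raw/web_logs';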

    Use cases:

    • Batch SQL queries: Hive is suitable for executing batch SQL queries on large datasets.
    • Data lakes: Hive is a critical component in many data lake architectures, providing a central repository for metadata. 
    • ETL/ELT jobs: It's used for batch processing large extract, transform, load (ETL) or extract, load, transform (ELT) jobs.
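    As a hedged sketch of the kind of batch ETL/ELT statement Hive handles well, consider the following; the tables raw_events and events_clean and their columns are hypothetical:

      -- Hypothetical sketch: cleanse raw data into a date-partitioned table
      -- in one set-based batch statement
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      INSERT OVERWRITE TABLE events_clean PARTITION (event_date)
      SELECT lower(trim(event_type)) AS event_type,
             user_id,
             event_date
      FROM   raw_events
      WHERE  user_id IS NOT NULL;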