Apache® Hadoop

Automate your big data workflows with the Tidal by Redwood adapter for Apache Hadoop, an open-source, Java-based software framework designed for distributed storage and processing of large datasets across clusters of computers.

Handle big data more efficiently with Tidal and Hadoop

Learn how much more you can achieve with this powerful pairing.

Connect key features

Get the tools you need to manage and transfer Hadoop data.

Boost efficiency

Use comprehensive system management to drive efficiency.

Automate securely

Reduce the potential attack surface in your environment.

One adapter, multiple functions

This Tidal adapter supports various Apache Hadoop software utilities. Get the benefits of centralized management and control plus advanced scheduling functionality by integrating your Hadoop activities into your Tidal automations.

Sqoop

Sqoop transfers data between Hadoop and relational databases. This adapter enables you to automate the tasks performed by Sqoop — importing, transforming and exporting data. The integration enables the following job definitions in Tidal (a brief sketch of the equivalent Sqoop calls follows the list):

  • Code Generation: Generate Java classes that interpret imported records.
  • Export: Export files from the Hadoop Distributed File System (HDFS) to a relational database management system (RDBMS).
  • Import: Import structured data from an RDBMS to the HDFS.
  • Merge: Combine two datasets, where entries in the newer dataset overwrite those in the older one.
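
As a rough illustration of the work these job definitions wrap, the sketch below calls Sqoop's Java entry point for an import and an export. The connection URL, credentials, table names and HDFS directories are placeholders, not values supplied by the adapter.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.sqoop.Sqoop;

public class SqoopJobsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Import: pull a table from an RDBMS into HDFS (placeholder connection details).
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://db.example.com/sales",
            "--username", "etl_user", "--password-file", "/user/etl/.pw",
            "--table", "orders",
            "--target-dir", "/data/raw/orders"
        };
        int importStatus = Sqoop.runTool(importArgs, conf);

        // Export: push processed HDFS files back to a relational table.
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:mysql://db.example.com/sales",
            "--username", "etl_user", "--password-file", "/user/etl/.pw",
            "--table", "orders_summary",
            "--export-dir", "/data/curated/orders_summary"
        };
        int exportStatus = Sqoop.runTool(exportArgs, conf);

        System.out.println("import=" + importStatus + " export=" + exportStatus);
    }
}
```

In a Tidal job definition, parameters like these are supplied through the adapter and can be scheduled, monitored and rerun like any other Tidal job rather than hard-coded in a standalone program.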

MapReduce

MapReduce is the programming model in the Hadoop framework used to access and process large amounts of data stored in the HDFS. The adapter serves as the job client to automate the execution of MapReduce jobs as part of Tidal-managed processes. It uses the Hadoop API to submit and monitor MapReduce jobs with Tidal’s full scheduling capabilities. Jobs in Tidal divide the input data into independent chunks processed by the map tasks in parallel.
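
For context, a minimal MapReduce job of the kind the adapter submits and monitors looks like the canonical word-count example below; the input and output paths are placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: runs in parallel over independent input splits.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: aggregates the intermediate counts produced by the mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/raw/text"));     // placeholder input
        FileOutputFormat.setOutputPath(job, new Path("/data/out/counts")); // placeholder output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The adapter takes the place of this driver’s job client role, so submission and completion tracking happen under Tidal’s scheduler instead of a hand-run main method.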

Hive

This adapter enables you to access and manage data stored in the HDFS using Hive’s query language (HiveQL). Integration with Tidal allows you to define, launch, control and monitor scheduled HiveQL commands submitted to Hive via Java Database Connectivity (JDBC).
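
As a rough picture of what a HiveQL submission over JDBC involves, the sketch below uses the standard Hive JDBC driver; the host, port, database, credentials and query are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlSketch {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver (auto-loaded if it is on the classpath).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 JDBC URL; host, port and database are placeholders.
        String url = "jdbc:hive2://hive.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Submit a HiveQL command against data stored in HDFS.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT region, COUNT(*) AS orders FROM orders GROUP BY region")) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + "\t" + rs.getLong("orders"));
                }
            }
        }
    }
}
```

With the adapter, the same kind of command is defined as a Tidal job, so scheduling, dependencies and output handling are managed centrally instead of in ad hoc client code.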

HDFS Data Mover Linux Agent

This data mover agent lets you easily manage file transfers in and out of the Hadoop file system.
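
For a sense of what such transfers involve at the API level, here is a minimal sketch using Hadoop’s FileSystem API; the NameNode URI and file paths are placeholders, and in practice the data mover agent handles these transfers as scheduled Tidal jobs.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTransferSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // NameNode URI is a placeholder.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf)) {
            // Move a local file into HDFS.
            fs.copyFromLocalFile(new Path("/tmp/daily_orders.csv"),
                                 new Path("/data/incoming/daily_orders.csv"));

            // Pull a result file out of HDFS to the local file system.
            fs.copyToLocalFile(new Path("/data/out/counts/part-r-00000"),
                               new Path("/tmp/counts.txt"));
        }
    }
}
```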

What the adapter enables

Hive job and event management includes:

  • Connection management to monitor system status with a live connection to the Hive Server via JDBC
  • Defined dependencies and events using Tidal for scheduling control
  • Dynamic runtime overrides for parameters and values passed to the HiveQL command
  • Output formatting options to control the results, including table, XML and CSV
  • Orchestrating Hadoop workflows — on-prem or in the cloud
  • Scheduling and monitoring HiveQL commands from a centralized work console with Tidal
  • Runtime MapReduce parameter overrides when a HiveQL command results in a MapReduce job (see the sketch after this list)
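
To illustrate the last point, a MapReduce setting can be overridden for a single HiveQL command by issuing a SET statement before the query runs. The sketch below shows this over JDBC with placeholder values; the adapter lets you supply such overrides at runtime from the job definition instead.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveRuntimeOverrideSketch {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://hive.example.com:10000/default"; // placeholder

        try (Connection conn = DriverManager.getConnection(url, "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Override a MapReduce parameter for this session only.
            stmt.execute("SET mapreduce.job.reduces=8");

            // The HiveQL command below may compile to a MapReduce job
            // that picks up the overridden setting.
            stmt.execute(
                "INSERT OVERWRITE TABLE orders_by_region " +
                "SELECT region, COUNT(*) FROM orders GROUP BY region");
        }
    }
}
```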

How it works

  • The Tidal Hadoop adapter acts as a bridge, enabling Tidal to submit and monitor Hadoop MapReduce jobs using the Hadoop API (a brief submit-and-monitor sketch follows this list)
  • Tidal serves as an advanced orchestrator of Hadoop jobs, managing their dependencies and resources and providing a centralized platform to define, schedule and manage Hadoop jobs and features like:
    • Automated job rerun functions
    • Drop-down parameter selections
    • Highly specific alerts
    • Multiple layers of dependency mapping
    • Nesting of parent and child jobs
    • Resource awareness
  • For MapReduce jobs, Tidal can divide input data into independent chunks, which are then processed by map tasks in parallel, leveraging the distributed nature of Hadoop
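
As a rough sketch of the submit-and-monitor pattern referenced above, Hadoop’s Job API lets a client submit a job without blocking and then poll its progress. The job is assumed to be configured as in the earlier WordCount example; the polling interval is arbitrary.

```java
import org.apache.hadoop.mapreduce.Job;

public class JobMonitorSketch {
    // Submits a configured MapReduce job without blocking, then polls its progress.
    public static boolean submitAndWatch(Job job) throws Exception {
        job.submit(); // non-blocking submission via the Hadoop API

        while (!job.isComplete()) {
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5_000); // poll every few seconds
        }
        return job.isSuccessful();
    }
}
```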

Tidal and Apache Hadoop integration FAQs

  • What is Apache Hadoop used for?

    Apache Hadoop is an open-source software framework for efficiently storing and processing large datasets, allowing businesses to store and analyze massive amounts of data distributed across a network of computers.

    Hadoop is part of a broader ecosystem of open-source tools and technologies, including Apache Hive, Apache HBase, Apache Spark, Apache Kafka and others. 

    Common use cases for Hadoop include:

    • Big data analytics: Hadoop is widely used to analyze large datasets, gain insights, identify trends and make data-driven decisions. 
    • Data lakes: Hadoop can serve as a foundation for building data lakes — centralized repositories for storing diverse data formats. 
    • Data storage and archiving: Hadoop provides a cost-effective solution for storing and archiving large amounts of data. 
    • Financial services: Hadoop is used in financial services for risk assessment, fraud detection and algorithmic trading. 
    • Machine learning and AI: Hadoop processes and analyzes data for machine learning and AI applications. 
    • Retail: Hadoop can help analyze customer data, personalize shopping experiences and optimize inventory management. 
    • Telecommunications: Telecommunications companies use Hadoop to analyze customer behavior, identify network bottlenecks and improve network performance.

  • What does a Hadoop integration do?

    A Hadoop integration connects Apache Hadoop's data processing and storage capabilities with other systems or workflows, automating data movement, processing and management. This allows organizations to streamline big data operations, improve efficiency and gain better control over data pipelines.

    For example, the Tidal adapter for Hadoop automates and orchestrates your Hadoop workflows by integrating with key Hadoop utilities. Specifically, it enables you to:

    • Automate MapReduce jobs, dividing data for parallel processing and monitoring job execution
    • Schedule and control HiveQL queries for managing data within HDFS
    • Schedule and manage data transfers between Hadoop and relational databases using Sqoop
    • Streamline file transfers in and out of the Hadoop file system

    This integration centralizes control, enhances efficiency and improves the reliability of your Hadoop processes.