Spark SQL

Spark SQL is a module within Apache Spark designed to work with structured data, like tables in a database. With the Tidal by Redwood adapter, you can use standard SQL syntax to query and analyze data within Spark SQL DataFrames and Datasets.

Broaden your data horizons

Easily automate complex, multi-step processes with Tidal and Spark SQL.

Eliminate silos

Automate transfers, deployments, provisioning and more.

Work more efficiently

Use comprehensive system management to mitigate risks.

Integrate securely

Reduce attack surface across your environment.

Expand data processing power with workload automation

The Tidal adapter for Databricks Spark SQL integrates Tidal with Databricks SQL and Apache Spark SQL so you can create, schedule and run Databricks Spark jobs through Tidal.

Databricks Spark SQL is a module for structured data processing and acts as a distributed SQL query engine. Spark SQL provides a programming abstraction called DataFrames, which organize data into a table of rows and named columns.

What the adapter enables

The Tidal adapter for Spark SQL gives you access to these features and capabilities:

  • DataFrames and Datasets:
    • Spark SQL’s DataFrames and Datasets are programming abstractions that make it easier to work with structured data.
    • These structures are similar to tables in relational databases, allowing for organized data manipulation.
  • Data source versatility:
    • The adapter can connect to and query various data sources, including:
      • Hive tables
      • JDBC databases
      • JSON files
      • Parquet files
  • Hive compatibility:
    • As it’s compatible with Apache Hive, the Tidal adapter can run existing Hive queries.
  • Integration with Spark:
    • A major strength is Tidal’s seamless integration with the broader Spark ecosystem. This allows you to combine SQL queries with other Spark capabilities, such as machine learning and stream processing.
  • Performance:
    • Spark SQL is designed for performance, with features like a cost-based optimizer, columnar storage and code generation to accelerate query execution.
  • SQL query engine:
    • Through the adapter, Spark SQL serves as a distributed SQL query engine, so you can use standard SQL syntax to query and analyze data.
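
To make the data source and SQL query engine points above concrete, here is a minimal PySpark sketch of what Spark SQL itself does with a JSON file. It does not show the Tidal adapter's own configuration, which this page does not describe. The `orders` view, the sample records and the helper names are all hypothetical, and the sketch assumes PySpark is installed and runs Spark in local mode.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records. Spark's JSON reader expects one object per line by default.
ORDERS = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 30},
]

# Standard SQL that runs against the temporary view registered below.
TOTALS_QUERY = """
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY region
"""


def write_json_lines(records, directory):
    """Write the records as JSON Lines and return the file path."""
    path = Path(directory) / "orders.json"
    path.write_text("\n".join(json.dumps(r) for r in records) + "\n")
    return path


def totals_by_region(spark, json_path):
    """Load a JSON file as a DataFrame, expose it to SQL and aggregate it."""
    orders = spark.read.json(str(json_path))  # spark.read.parquet(...) follows the same pattern
    orders.createOrReplaceTempView("orders")
    result = spark.sql(TOTALS_QUERY).collect()
    return [(row["region"], row["total"]) for row in result]


def main():
    # Imported here so the helpers above stay importable without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("sql-sketch").getOrCreate()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            print(totals_by_region(spark, write_json_lines(ORDERS, tmp)))
    finally:
        spark.stop()
```

Calling `main()` on a machine with PySpark installed prints the per-region totals. The same view could also be queried through the DataFrame API, which is how SQL queries combine with other Spark capabilities.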

How it works

The Tidal adapter connects Tidal with Databricks SQL and Apache Spark SQL, enabling you to orchestrate and execute Databricks Spark jobs directly within Tidal. This integration leverages Databricks Spark SQL’s capabilities as a distributed SQL query engine for structured data processing.

At the core of this process are DataFrames, the Spark SQL programming abstraction that organizes data into tabular structures. Each DataFrame has a defined schema that specifies column names and data types, including simple types such as StringType and IntegerType and the composite StructType. This provides a flexible and intuitive way to manage and store data. Notably, missing or incomplete data is represented as null values within the DataFrame, which helps preserve data integrity during processing.
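
As an illustration of schemas and null handling, the following PySpark sketch defines a schema with StructType, StringType and IntegerType, loads rows where one value is missing, and finds the gap with SQL. The `people` view, the sample rows and the function names are hypothetical, and the code shows plain Spark SQL behavior rather than anything specific to the Tidal adapter.

```python
# Hypothetical sample rows. None marks a missing value and becomes null in the DataFrame.
PEOPLE = [("Ada", 36), ("Linus", None)]

# Standard SQL that selects the rows whose age is missing.
MISSING_AGE_QUERY = "SELECT name FROM people WHERE age IS NULL"


def people_schema():
    """Build an explicit schema: a required string column and a nullable integer column."""
    # Imported here so this module stays importable without PySpark.
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    return StructType([
        StructField("name", StringType(), nullable=False),
        StructField("age", IntegerType(), nullable=True),
    ])


def names_missing_age(spark):
    """Create the DataFrame with the schema above and query for null ages."""
    df = spark.createDataFrame(PEOPLE, schema=people_schema())
    df.createOrReplaceTempView("people")
    return [row["name"] for row in spark.sql(MISSING_AGE_QUERY).collect()]
```

Given a local `SparkSession` named `spark`, `names_missing_age(spark)` returns the names of the rows with no recorded age. Declaring `nullable=False` on `name` documents that the column is required, while `age` is allowed to hold null.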

Tidal and Spark SQL integration FAQs

  • What is Spark SQL?

    Spark SQL is Apache Spark's module for structured data processing. It allows users to query data using SQL or DataFrame APIs, thus integrating SQL-like capabilities into the Spark ecosystem.

  • What is the difference between Databricks SQL and Spark SQL?

    Spark SQL is the open-source component within Apache Spark, while Databricks SQL is a serverless data warehouse on Databricks. Databricks SQL builds on Spark SQL for interactive SQL workloads in a cloud environment, with enhancements for performance, cost optimization and the user interface. Essentially, Databricks SQL is a refined, cloud-optimized version of Spark SQL.