Data visualization, the visual representation of data, is a widely adopted method in data analytics for gaining business insights (e.g., trends, patterns, outliers, and correlations) from large-scale datasets. Recently, I presented a software development method that uses Spark, Plotly, and Dash to build interactive, insightful data visualization dashboards for Web applications in Python .
Similarly to , this paper uses the same open-source dataset as in  to show how to use Spark and Tableau Desktop  to create insightful dashboards from large-scale datasets in a Cloud data lake without programming.
Figure 1 shows a high-level overview of the workflow, which consists of the following major steps:
- connecting Tableau Desktop, which is used for dashboard authoring, to Spark
- querying the dataset from the Cloud data lake
- creating data visualization graphs from the loaded dataset
- creating dashboards from the individual graphs
- publishing the dashboards to Tableau Server for sharing
As described in , the following steps can be used to run Spark SQL as a distributed query engine through its JDBC/ODBC interface  and to connect Tableau Desktop to the distributed Spark SQL engine :
- install Hadoop
- set up Hive
- set up MySQL
- set up Spark
- set up Tableau Desktop
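Before pointing Tableau Desktop at the Spark SQL engine, it can help to confirm that the Spark Thrift JDBC/ODBC server is actually listening. The sketch below is a minimal Python check that assumes the server was started with Spark's `sbin/start-thriftserver.sh` and listens on its default port 10000; the host name, port, and the helper name `thrift_server_reachable` are assumptions to adjust for your own cluster. It only tests that the port accepts TCP connections; it does not authenticate or run a query.

```python
import socket


def thrift_server_reachable(host: str = "localhost", port: int = 10000,
                            timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the Spark Thrift JDBC/ODBC server uses its default port
    10000; pass a different port if the cluster is configured otherwise.
    This is a reachability check only; it does not speak the Thrift protocol.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = "localhost", 10000  # hypothetical values for a local setup
    if thrift_server_reachable(host, port):
        print(f"Spark Thrift server is reachable at {host}:{port}")
    else:
        print(f"Nothing is listening on {host}:{port}; start the Thrift server first")
```

If the check succeeds, the same host and port are presumably the values to enter in Tableau Desktop's Spark SQL connection dialog.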
Once Tableau Desktop is successfully connected to the distributed Spark SQL engine, we should be able to browse the default schema and see the tables of the Hive Hadoop cluster .
From the perspective of creating dashboards with Tableau Desktop, there is no difference between a table loaded from a Hive Hadoop cluster and a table loaded from a local Microsoft Excel file. For convenience, this paper demonstrates the process with the free public edition of Tableau Desktop and a local Excel file converted from the dataset CSV file in .
Individual visualization graphs must be created first, before they can be combined into dashboards.
Tableau Desktop can create many different types of graphs. As described in , some graphs are suited to visualizing continuous numeric features, while others are suited to visualizing discrete categorical features.
Similarly to , this paper uses Tableau Desktop to create the following common graphs for demonstration purposes:
- Graphs for numeric features: scatter plot, histogram chart, and line chart
- Graphs for categorical features: bar chart, line chart, and pie chart
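Whether a feature is numeric or categorical determines which of the graphs above applies. Tableau Desktop detects data types itself when it connects to a data source; purely as an illustration, the hypothetical heuristic below treats a column as numeric when every non-empty value parses as a number, and then looks up the graph family from the two lists above. The names `feature_kind` and `suggest_charts` are made up for this sketch.

```python
from __future__ import annotations

# Graph families per feature kind, copied from the two lists above.
CHARTS = {
    "numeric": ["scatter plot", "histogram chart", "line chart"],
    "categorical": ["bar chart", "line chart", "pie chart"],
}


def feature_kind(values: list[str]) -> str:
    """Hypothetical heuristic: 'numeric' if every non-empty value parses as a
    float, otherwise 'categorical'. An all-empty column counts as categorical."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "categorical"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "categorical"
    return "numeric"


def suggest_charts(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each column name to the graph family suited to its feature kind."""
    return {name: CHARTS[feature_kind(vals)] for name, vals in columns.items()}
```

For example, a column of age ranges such as "21-30" fails the numeric test and is treated as categorical, which fits the later use of Age as a filter rather than as an axis.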
4.1 Graphs for Numeric Features
Tableau Desktop uses the symbol # to indicate numeric features. This subsection shows how to use Tableau Desktop to create the following three common graphs for numeric features:
- scatter plot
- histogram chart
- line chart
4.1.1 Scatter Plot
For a pair of numeric features, a scatter plot uses each pair of feature values as coordinates to draw a point on a 2D plane. As an example, as in , Figure 2 shows a scatter plot of the two numeric features Patientid and Admission Deposit for patients aged 21 to 30. The feature Type of Admission is used for color coding.
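What the scatter plot encodes can be sketched in a few lines of Python. This only illustrates the data mapping, not how Tableau renders the view: each row becomes an (x, y) point from the two numeric features, and the points are grouped into one series per value of the color feature. The function name `scatter_series` and the sample rows are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def scatter_series(rows: list[dict[str, str]], x: str, y: str,
                   color: str) -> dict[str, list[tuple[float, float]]]:
    """Turn each row into an (x, y) point and group the points by the value
    of the color feature, giving one colored series per category."""
    series: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for row in rows:
        point = (float(row[x]), float(row[y]))  # coordinates on the 2D plane
        series[row[color]].append(point)
    return dict(series)


if __name__ == "__main__":
    # Made-up rows; the column names follow the features named in the text.
    rows = [
        {"Patientid": "101", "Admission Deposit": "4200", "Type of Admission": "Emergency"},
        {"Patientid": "102", "Admission Deposit": "5100", "Type of Admission": "Trauma"},
        {"Patientid": "103", "Admission Deposit": "3900", "Type of Admission": "Emergency"},
    ]
    series = scatter_series(rows, "Patientid", "Admission Deposit", "Type of Admission")
    for label, points in series.items():
        print(label, points)
```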
The following steps can be followed to create the scatter plot:
- drag the feature Patientid and drop it into the Columns shelf
- drag the feature Admission Deposit and drop it into the Rows shelf
- drag the feature Type of Admission and drop it onto the Color property of the Marks card
- click the mark type dropdown on the Marks card and select Circle
- right-click the feature Age, choose Show Filter, and select only 21–30
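The last step restricts the view to the rows whose Age value is selected in the filter. A minimal sketch of that behavior, assuming Age is stored as range strings such as "21-30", could look like the following; the helper names and sample rows are hypothetical, and the sorted order of the choices is only for readability.

```python
from __future__ import annotations


def filter_card_values(rows: list[dict[str, str]], field: str) -> list[str]:
    """Distinct values of `field`, i.e. the choices a filter on that field
    would offer (sorted here for readability)."""
    return sorted({row[field] for row in rows})


def apply_filter(rows: list[dict[str, str]], field: str,
                 selected: set[str]) -> list[dict[str, str]]:
    """Keep only the rows whose `field` value is one of the selected values."""
    return [row for row in rows if row[field] in selected]


if __name__ == "__main__":
    # Made-up rows; Age is assumed to be stored as strings like "21-30".
    rows = [
        {"Patientid": "101", "Age": "21-30", "Admission Deposit": "4200"},
        {"Patientid": "102", "Age": "31-40", "Admission Deposit": "5100"},
        {"Patientid": "103", "Age": "21-30", "Admission Deposit": "3900"},
    ]
    print(filter_card_values(rows, "Age"))
    kept = apply_filter(rows, "Age", {"21-30"})
    print([row["Patientid"] for row in kept])
```

Only the kept rows would then be mapped to points on the Columns and Rows shelves.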