You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
2. Select **HDInsight clusters**, and then select the cluster you created.
98
98
99
-
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/azure-portal-open-hdinsight-cluster.png" alt-text="Open HDInsight cluster in the Azure portal" border="true":::
99
+
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/azure-portal-open-hdinsight-cluster.png" alt-text="Open HDInsight cluster in the Azure portal." border="true":::
100
100
101
101
3. From the portal, in **Cluster dashboards** section, select **Jupyter Notebook**. If prompted, enter the cluster login credentials for the cluster.
102
102
103
-
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-open-jupyter-interactive-spark-sql-query.png " alt-text="Open Jupyter Notebook to run interactive Spark SQL query" border="true":::
103
+
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-open-jupyter-interactive-spark-sql-query.png " alt-text="Open Jupyter Notebook to run interactive Spark SQL query." border="true":::
104
104
105
105
4. Select **New** > **PySpark** to create a notebook.
106
106
107
-
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-create-jupyter-interactive-spark-sql-query.png " alt-text="Create a Jupyter Notebook file to run interactive Spark SQL query" border="true":::
107
+
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-create-jupyter-interactive-spark-sql-query.png " alt-text="Create a Jupyter Notebook file to run interactive Spark SQL query." border="true":::
108
108
109
109
A new notebook is created and opened with the name Untitled(Untitled.pynb).
110
110
@@ -114,7 +114,7 @@ SQL (Structured Query Language) is the most common and widely used language for
114
114
115
115
1. Verify the kernel is ready. The kernel is ready when you see a hollow circle next to the kernel name in the notebook. Solid circle denotes that the kernel is busy.
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/jupyter-spark-kernel-status.png " alt-text="Screenshot showing that the kernel is ready." border="true":::
118
118
119
119
When you start the notebook for the first time, the kernel performs some tasks in the background. Wait for the kernel to be ready.
120
120
@@ -127,7 +127,7 @@ SQL (Structured Query Language) is the most common and widely used language for
127
127
128
128
When you use a Jupyter Notebook file with your HDInsight cluster, you get a preset `spark` session that you can use to run Hive queries using Spark SQL. `%%sql` tells Jupyter Notebook to use the preset `spark` session to run the Hive query. The query retrieves the top 10 rows from a Hive table (**hivesampletable**) that comes with all HDInsight clusters by default. The first time you submit the query, Jupyter will create a Spark application for the notebook. It takes about 30 seconds to complete. Once the Spark application is ready, the query is executed in about a second and produces the results. The output looks like:
129
129
130
-
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query.png " alt-text="Apache Hive query in HDInsight" border="true":::y in HDInsight" border="true":::
130
+
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query.png " alt-text="Screenshot that shows an Apache Hive query in HDInsight." border="true":::
131
131
132
132
Every time you run a query in Jupyter, your web browser window title shows a **(Busy)** status along with the notebook title. You also see a solid circle next to the **PySpark**textin the top-right corner.
133
133
@@ -140,7 +140,7 @@ SQL (Structured Query Language) is the most common and widely used language for
140
140
141
141
The screen should refresh to show the query output.
142
142
143
-
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query-output.png" alt-text="Hive query output in HDInsight" border="true":::Insight" border="true":::
143
+
:::image type="content" source="./media/apache-spark-jupyter-spark-sql/hdinsight-spark-get-started-hive-query-output.png " alt-text="Screenshot that shows Hive query output in HDInsight." border="true":::
144
144
145
145
1. From the **File** menu on the notebook, select**Close and Halt**. Shutting down the notebook releases the cluster resources, including Spark application.
0 commit comments