Databricks

Databricks-upload-data.zip

Databricks-upload-data/1-Databricks-Upload-CSV.pdf

Databricks: Uploading a CSV File

Ashraf Shirani

• Log into your Databricks account. Create a new cluster (Clusters > Create Cluster).

• Once the cluster is up and running, go to the Data link to upload a CSV file.

• Drag the CSV file to the window.

• Click on the “Create Table in Notebook” button.

• Databricks creates and opens a new notebook. You can rename it and start working in it. The first cell in the notebook displays the path to the data file. Here you can modify some of the default upload settings. For example, I set the "first row is header" setting to "true", since the emails.csv file that I uploaded has a header row.

• Run the code in the cell. Attach the cluster to the notebook first if it is not yet attached.

• The output of the display(df) command shows the top rows of the data frame, which is named df.
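As a rough sketch of what the first cell does, the code below builds the CSV reader options and passes them to Spark. The helper names (csv_read_options, load_csv) and the file path /FileStore/tables/emails.csv are assumptions made for illustration; the notebook that Databricks generates will show the actual path and its own variable names.

```python
def csv_read_options(first_row_is_header=True, infer_schema=False, delimiter=","):
    """Return CSV reader options as the string values Spark expects.

    Hypothetical helper, not part of the generated notebook.
    """
    return {
        "header": str(first_row_is_header).lower(),
        "inferSchema": str(infer_schema).lower(),
        "sep": delimiter,
    }


def load_csv(spark, file_location, options):
    """Read a CSV file into a Spark data frame using the given options."""
    return spark.read.format("csv").options(**options).load(file_location)


# Treat the first row as a header; leave the other settings unchanged.
options = csv_read_options(first_row_is_header=True)
print(options)

# In a Databricks notebook, `spark` and `display` are predefined.
# The path below is an assumed example; use the one shown in your notebook.
# df = load_csv(spark, "/FileStore/tables/emails.csv", options)
# display(df)
```

The last two lines are commented out so that the sketch also runs outside a notebook; uncomment them in a cell attached to a running cluster.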

• Add a new code cell below the first cell and type in the code (df.dtypes) to see the data types of the Spark data frame columns. Since the file was uploaded with default settings, you’ll see that all columns were imported as the String data type.
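df.dtypes returns a list of (column name, type name) pairs, so the check can also be done in code. The sketch below uses a made-up list in place of a real df.dtypes result; the column names are hypothetical and are not taken from emails.csv.

```python
def non_string_columns(dtypes):
    """Return the names of columns whose Spark type is not 'string'."""
    return [name for name, dtype in dtypes if dtype != "string"]


# Hypothetical stand-in for df.dtypes after an upload with default settings.
example_dtypes = [
    ("sender", "string"),
    ("sent_date", "string"),
    ("word_count", "string"),
]

# An empty list means every column was imported as a string.
print(non_string_columns(example_dtypes))

# In the notebook, pass the real value instead:
# print(non_string_columns(df.dtypes))
```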

• Now that you have a Spark data frame containing the emails.csv data, you can work with it. You can easily re-cast (i.e., change) column data types in your Spark SQL statements. Please see the second file (Shirani-Cast-Data-Types.html), in which I have included a few examples of using the CAST() function.
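To give a flavor of re-casting in Spark SQL, here is a minimal sketch that builds a SELECT statement with one CAST() per column. It is not taken from the second file; the view name emails, the column names, and the build_cast_query helper are all assumptions made for illustration.

```python
def build_cast_query(view_name, casts):
    """Build a SELECT that casts each listed column to a Spark SQL type.

    `casts` maps a column name to the target type, e.g. {"word_count": "INT"}.
    """
    select_list = ", ".join(
        f"CAST({column} AS {sql_type}) AS {column}"
        for column, sql_type in casts.items()
    )
    return f"SELECT {select_list} FROM {view_name}"


# Hypothetical columns; replace them with columns that exist in your data.
query = build_cast_query("emails", {"word_count": "INT", "sent_date": "DATE"})
print(query)

# In a Databricks notebook, register the data frame as a view, then run the query:
# df.createOrReplaceTempView("emails")
# typed_df = spark.sql(query)
# typed_df.dtypes
```

Running typed_df.dtypes afterwards should show the new types for the re-cast columns, provided the string values can actually be parsed as those types.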

Databricks-upload-data/2-Shirani-Cast-Data-Types.html
