Index:
1. What is CSV
2. What is SparkSession
3. How to read a CSV file in PySpark
4. How to write a file through PySpark and store it on the local system
tools:
1. PyCharm 2021.1.3, Python 3.6
2. Spark 2.4.8
#code
=========================================================================
from pyspark.sql import SparkSession

# Initialization: create (or reuse) a local SparkSession running on 2 cores
spark = SparkSession.builder.master("local[2]").appName("testing").getOrCreate()

# Read the CSV file (first row as header) into a DataFrame
df = spark.read.option("header", True).csv("E://YoutubebigData//csv_read//abc.csv")
df.show()

# Write the DataFrame back as CSV to the local file system
df.write.option("header", True).csv("E://YoutubebigData//csv_read//output//abc")
=========================================================================
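A slightly fuller read/write sketch, as an optional extra: it adds inferSchema on read (so columns get proper types instead of all strings), plus overwrite mode and a custom delimiter on write. The paths here are placeholders, not the ones from the video; the options used are standard DataFrameReader/DataFrameWriter options available in Spark 2.4.
=========================================================================
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("testing").getOrCreate()

# Read with a header row and let Spark infer column types instead of treating everything as string
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("path//to//input.csv"))        # placeholder input path

df.printSchema()

# Overwrite any previous output and write with a pipe delimiter
(df.write
   .mode("overwrite")
   .option("header", True)
   .option("sep", "|")
   .csv("path//to//output"))              # placeholder output folder
=========================================================================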
What is SparkSession?
SparkSession, introduced in Spark 2.0, is the entry point to the underlying PySpark functionality; it is used to programmatically create PySpark RDDs and DataFrames.
Its object, spark, is available by default in the pyspark shell, and it can also be created programmatically using the SparkSession builder.
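A minimal sketch of that programmatic creation (builder pattern; getOrCreate() returns the already-running session, which is why it is safe both in the pyspark shell and in a standalone script):
=========================================================================
from pyspark.sql import SparkSession

# getOrCreate() reuses an existing session (e.g. the shell's default spark object) or builds a new one
spark = SparkSession.builder.appName("testing").getOrCreate()

print(spark.version)        # e.g. 2.4.8
print(spark.sparkContext)   # the underlying SparkContext wrapped by the session
=========================================================================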
SparkSession vs SparkContext –
In earlier versions of Spark/PySpark, SparkContext (JavaSparkContext for Java) was the entry point to Spark programming with RDDs and for connecting to the Spark cluster.
Since Spark 2.0, SparkSession has been the entry point for programming with DataFrames and Datasets.
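A small sketch of how the two entry points relate, assuming a local session: the RDD API is still reachable through the SparkContext that every SparkSession wraps, while DataFrames are created from the session itself.
=========================================================================
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("testing").getOrCreate()
sc = spark.sparkContext                      # the classic RDD-era entry point, still available

# RDD route (SparkContext)
rdd = sc.parallelize([("a", 1), ("b", 2)])

# DataFrame route (SparkSession)
df = spark.createDataFrame(rdd, ["letter", "count"])
df.show()
=========================================================================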
----------------------------------------------------------------------------------------------------------------------------
Video Link: https://www.youtube.com/watch?v=zRUWkzGosqA