Sunday, January 30, 2022

How to Read and Write CSV File Data on the Local System Using PySpark


Index:
1. What is CSV
2. What is SparkSession
3. How to read a CSV file in PySpark
4. How to write a file with PySpark and store it on the local system

Tools:
1. PyCharm 2021.1.3, Python 3.6
2. Spark 2.4.8

#code
=========================================================================
from pyspark.sql import SparkSession

# Initialization: start a local Spark session that uses 2 worker threads
spark = SparkSession.builder.master("local[2]").appName("testing").getOrCreate()

# Read the CSV file into a DataFrame, treating the first line as column names
df = spark.read.option("header", True).csv("E://YoutubebigData//csv_read//abc.csv")
df.show()

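# Write the DataFrame to the local system.
# Spark creates a directory named "abc" that holds part-*.csv files,
# and the write fails if that directory already exists.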
df.write.option("header",True).csv("E://YoutubebigData//csv_read//output//abc")

=========================================================================
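
Because the write step above produces a directory of part files rather than a single CSV file, it can help to see how that output might be read back without Spark. The following is only a minimal sketch using Python's standard library. The helper name read_spark_csv_dir, the demo file names, and the sample rows are hypothetical, and the temporary directory is a stand-in for a real Spark output folder so that the snippet runs on its own.

```python
import csv
import glob
import os
import tempfile


def read_spark_csv_dir(output_dir):
    """Return (header, rows) gathered from every part file in a Spark CSV output directory."""
    header = None
    rows = []
    # Data files are named part-*; _SUCCESS and .crc files are bookkeeping and are skipped.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*.csv"))):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, None)  # each part repeats the header line
            if file_header is None:
                continue  # skip an empty part file
            header = header or file_header
            rows.extend(reader)
    return header, rows


# Hypothetical stand-in for a Spark output directory (not produced by Spark here).
demo_dir = tempfile.mkdtemp()
for name, body in [
    ("part-00000-demo-c000.csv", "id,name\n1,alice\n"),
    ("part-00001-demo-c000.csv", "id,name\n2,bob\n"),
]:
    with open(os.path.join(demo_dir, name), "w", newline="", encoding="utf-8") as handle:
        handle.write(body)
open(os.path.join(demo_dir, "_SUCCESS"), "w").close()

header, rows = read_spark_csv_dir(demo_dir)
print(header)  # ['id', 'name']
print(rows)    # [['1', 'alice'], ['2', 'bob']]
```

To use it on real output, pass the directory given to df.write, for example the "abc" folder from the code above.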

What is SparkSession?
SparkSession, introduced in Spark 2.0, is the entry point to the underlying PySpark functionality. It is used to programmatically create PySpark RDDs and DataFrames.
In the pyspark shell, a SparkSession object named spark is available by default. In a standalone program, it is created with SparkSession.builder.

SparkSession vs SparkContext
In versions of Spark and PySpark before 2.0, SparkContext (JavaSparkContext for Java) was the entry point for programming with RDDs and for connecting to a Spark cluster.
Since Spark 2.0, SparkSession is the entry point for programming with DataFrames and Datasets. The SparkContext is still available through it as spark.sparkContext.
----------------------------------------------------------------------------------------------------------------------------
Video Link: https://www.youtube.com/watch?v=zRUWkzGosqA




