Thursday, February 10, 2022

Difference between Partitioning and Bucketing in Hive

 



Difference: 

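1. A partition is a folder: each partition is stored as its own directory under the table's location.

A bucket is a file: each bucket is stored as a file inside the table directory (or inside each partition directory, if the table is also partitioned).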

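2. Go with partitioning when the column has a small number of distinct values (low cardinality). Partitioning on a column with many distinct values creates a very large number of small directories.

Go with bucketing when the column has a large number of distinct values (high cardinality).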

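3. Partitions are a logical division of the data: rows are grouped by the value of the partition column(s).

Buckets are based on a hash of the bucketing column, so we must choose a fixed number of buckets when creating the table.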

4. Partition syntax:

CREATE TABLE table_name (col1 datatype, col2 datatype, col3 datatype)

PARTITIONED BY (col4 datatype, col5 datatype);

Bucketing syntax:

CREATE TABLE table_name (col1 datatype, col2 datatype, col3 datatype)

PARTITIONED BY (col4 datatype, col5 datatype)

CLUSTERED BY (col2) INTO 50 BUCKETS;
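
To make points 1 and 3 concrete, here is a small Python sketch of where a single row might end up on disk. It is only an illustration, not Hive's actual code: the hash below imitates Java's String.hashCode(), while the real hash Hive uses depends on the column type and the Hive version. The warehouse path, the partition values ("2022", "IN") and the col2 value ("ab") are made-up examples; table_name, col4, col5, col2 and the 50 buckets come from the syntax above.

```python
from pathlib import PurePosixPath


def java_string_hash(s: str) -> int:
    """Java-style String.hashCode(), wrapped to a signed 32-bit integer.

    Assumption: used here only as a stand-in for Hive's hash function.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to a signed one
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Bucket number = (non-negative hash) modulo the fixed bucket count."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


def storage_path(table_dir: str, partition_values: dict,
                 bucket_value: str, num_buckets: int) -> PurePosixPath:
    """Build an illustrative path: one folder per partition column, one file per bucket."""
    path = PurePosixPath(table_dir)
    for col, val in partition_values.items():
        path = path / f"{col}={val}"      # partition = folder
    bucket = bucket_for(bucket_value, num_buckets)
    return path / f"{bucket:06d}_0"       # bucket = file


# Hypothetical row: col4='2022', col5='IN', col2='ab', table has 50 buckets
print(storage_path("/user/hive/warehouse/table_name",
                   {"col4": "2022", "col5": "IN"}, "ab", 50))
```

The same col2 value always hashes to the same bucket number, which is why the bucket count has to stay fixed: changing it would change where existing rows belong.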


Video: https://www.youtube.com/watch?v=WKpELsHZ0Zc&list=PLVt87wOZJLOdtvKLe6X846CbuFNKOX95E&index=2





