Friday, May 10, 2013

INSERT data into Dynamically Partitioned HIVE Table in Windows HDInsight


Yesterday, I was trying to load data into a Partitioned HIVE Table and was unable to load it using "Load data inpath" command. I don't know why.
But after researching on GOOGLE, I came to know another approach for loading the data into a partitioned HIVE table - which is "Using INSERT statement". The INSERT statement loads the data from a non-partitioned table and insert it into partitioned table. I am not sure whether this is the ONLY option or not but thought of sharing my experience with people across the globe. :)
The first thing to note here is that, you need to work on the HIVE command prompt from the HDInsight cluster as the browser doesn't support the following commands. Also, it is assumed that, you have the desired data already available in a non-partitioned table, in our case HIVESAMPLETABLE.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

These commands are used to enable the dynamic partitioning and change its mode to NONSTRICT.
Once you are done with these steps, you can create a partitioned table using following command:

CREATE TABLE hivesampletable_partitioned 
(
clientid STRING,
querytime STRING,
market STRING,
devicemake STRING,
devicemodel STRING,
state STRING,
querydwelltime STRING,
sessionid BIGINT,
sessionpagevieworder BIGINT
)
PARTITIONED BY (country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE

Now you have the partitioned HIVE table, you can start inserting the values into this table using INSERT query and using the data already existing in the non-partitioned table. Please ensure that the number of columns are same in both the tables. To insert the data, use the following command.

FROM hivesampletable he INSERT OVERWRITE TABLE hivesampletable_partitioned PARTITION (country) SELECT he.clientid, he.querytime, he.market, he.devicemake, he.devicemodel, he.state, he.querydwelltime, he.sessionid, he.sessionpagevieworder, he.country;

To view the newly created partitions, use the following command:

SHOW PARTITIONS te_census_info_partitioned;

1 comment: