Efficient Dual-Level Encryption for Securing Data in HDFS Using Hybrid User-Defined Function (HUDF)

Authors

  • Shivani Awasthi, Narendra Kohli

Keywords:

AES, Avro, Deflate, ORC, Parquet, Snappy, Gzip, HDFS, cloud environments

Abstract

Big Data is a class of technology that gives businesses more insight into their massive data sets, allowing them to make better business decisions and satisfy customers. Because big data systems aggregate so much data, they are also an attractive target for attackers. The Hadoop Distributed File System (HDFS) stores massive data in the Hadoop framework. Since HDFS does not safeguard data privacy, encrypting files is the right way to protect data stored in HDFS, but it takes a long time. In this paper, to address privacy concerns, we combine different compressed data storage file formats with the proposed hybrid user-defined function (HUDF), based on an XOR one-time pad with AES, to secure data in HDFS. In this way, we provide a dual level of encryption by masking both selected data and the whole data in the file. Our experiments demonstrate that this scheme offers overall data security and faster processing time than conventional methods. The proposed HUDF with the ORC file format and Zlib (Z) compression (HUDF-ORC-Z) gives 9-10% better performance than 2DES and other methods. Finally, we efficiently utilized storage space, improved query processing time, and decreased data load time with the Hive engine.
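
As a rough illustration of the selective-masking level only, the sketch below applies an XOR one-time pad to a single field of a record. It is not the authors' HUDF: the function names (`xor_bytes`, `mask_field`, `unmask_field`), the sample record, and the choice to mask one field per record are all assumptions made for illustration. The AES pass over the whole file is omitted because Python's standard library does not provide AES.

```python
import secrets
from typing import Tuple


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(pad) != len(data):
        raise ValueError("one-time pad must be exactly as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


def mask_field(value: str) -> Tuple[bytes, bytes]:
    """Mask one field with a fresh random pad; returns (masked, pad)."""
    plaintext = value.encode("utf-8")
    # A one-time pad must be random, as long as the data, and never reused.
    pad = secrets.token_bytes(len(plaintext))
    return xor_bytes(plaintext, pad), pad


def unmask_field(masked: bytes, pad: bytes) -> str:
    """Recover the original field; XOR with the same pad is its own inverse."""
    return xor_bytes(masked, pad).decode("utf-8")


# Hypothetical record: only the sensitive "card" field is masked.
record = {"id": "1001", "name": "Alice", "card": "4111111111111111"}
masked_card, card_pad = mask_field(record["card"])
masked_record = {**record, "card": masked_card.hex()}
```

In this sketch the pad must be stored securely and separately from the masked record, since anyone holding both can recover the field. Under the assumed design, a second level would then encrypt the full serialized file with AES.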

Published

2024-09-27

How to Cite

Shivani Awasthi, Narendra Kohli. (2024). Efficient Dual-Level Encryption for Securing Data in HDFS Using Hybrid User-Defined Function (HUDF). International Journal of Communication Networks and Information Security (IJCNIS), 16(4), 906–920. Retrieved from https://ijcnis.org/index.php/ijcnis/article/view/7242

Section

Research Articles