IoT-Based Data Size Minimization Using Cluster-Based-Similarity-Elimination

Alaa A. Abdelhafez
Osama Ismael
Hatem Elkady

Abstract

This paper proposes a new redundancy reduction approach for continuous data flows from IoT devices. It minimizes the size of IoT data using a novel cluster-based-similarity-elimination algorithm. Continuously flowing data from IoT devices typically contain redundant records. This redundancy not only leads to overfitting of models but also demands substantial processing power because of the large number of records. Feature selection can partially reduce the data, and thus the redundancy, but it is not sufficient on its own. Removing redundant data is of utmost importance because, as smart city scenarios are implemented, the generated data flows require more advanced analytics to cope with the evolution and regrowth of the IoT environment. This study therefore aims to minimize processing time while maintaining the best accuracy: minimizing data similarity both addresses the overfitting problem and saves time. The proposed approach reduces the data size in terms of the number of tuples. Its effectiveness was validated using various classification algorithms and evaluation metrics. The results show a significant improvement over traditional approaches, reducing the real-time classification execution time to only 9% of the original time. The approach can be used to optimize data size and obtain accurate results with fast execution times while also addressing overfitting.
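
The abstract does not spell out the algorithm's steps, so the following is only a minimal illustrative sketch of the general idea (group records into clusters, then drop records that are too similar to one already kept in the same cluster), not the authors' method. Every name here is hypothetical, including `eliminate_similar`, `cell_size`, and `min_distance`. The clustering step is replaced by a simple grid-cell assignment, and similarity is assumed to be Euclidean distance between numeric feature tuples.

```python
import math
from collections import defaultdict


def eliminate_similar(records, cell_size=1.0, min_distance=0.1):
    """Drop records that nearly duplicate one already kept in the same cluster.

    records: iterable of (features, label), where features is a tuple of floats.
    cell_size and min_distance are hypothetical tuning parameters.
    """
    clusters = defaultdict(list)  # cluster key -> features of kept records
    kept = []
    for features, label in records:
        # Stand-in for a real clustering step (assumption): a coarse grid
        # cell, combined with the class label so classes are never merged.
        key = (label, tuple(math.floor(x / cell_size) for x in features))
        # Keep the record only if no kept record in its cluster is too close.
        if all(math.dist(features, other) >= min_distance
               for other in clusters[key]):
            clusters[key].append(features)
            kept.append((features, label))
    return kept


# Made-up sensor readings, for illustration only.
stream = [
    ((20.0, 0.50), "normal"),
    ((20.01, 0.50), "normal"),  # near-duplicate of the first record
    ((20.0, 0.50), "alarm"),    # same values, different label
    ((35.2, 0.91), "alarm"),
    ((35.2, 0.91), "alarm"),    # exact duplicate
]
reduced = eliminate_similar(stream)
```

Because each record is compared only against the records kept in its own cluster, the cost per incoming record depends on cluster size rather than on the whole stream. One limitation of this grid stand-in is that near-duplicates falling on opposite sides of a cell boundary are never compared.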

Article Details

How to Cite
Alaa A. Abdelhafez, Osama Ismael, & Hatem Elkady. (2023). IoT-Based Data Size Minimization Using Cluster-Based-Similarity-Elimination. International Journal of Communication Networks and Information Security (IJCNIS), 15(2), 34–50. https://doi.org/10.17762/ijcnis.v15i2.6151
Section
Research Articles