Early Identification of Abused Domains in TLD through Passive DNS Applying Machine Learning Techniques

Main Article Content

Leandro Marcos da Silva


DNS is vital for the proper functioning of the Internet. However, malicious users exploit this infrastructure by registering domains and using them to carry out a wide variety of attacks. Early detection of abused domains therefore prevents more people from falling victim to scams. In this work, an approach for identifying abused domains was developed using passive DNS data collected from an authoritative TLD DNS server and enriched with geolocation data, which provides a global view of the domains. The system monitors the first seven days of a domain’s life after its first DNS query and performs two behavior checks, the first at three days and the second at seven days. The models are built with the LightGBM machine learning algorithm, and because the data are imbalanced, a combination of the Cluster Centroids and K-Means SMOTE techniques was used. The approach obtained an average AUC of 0.9673 for the three-day model and 0.9674 for the seven-day model. Finally, validation of the three-day and seven-day models in a test environment reached a TPR of 0.8656 and 0.8682, respectively. The results indicate that the system performs satisfactorily in the early identification of abused domains and highlight the importance of TLD-level data for identifying these domains.
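The two-checkpoint monitoring window described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names `window_counts` and `due_checkpoints` are hypothetical, and the per-window query count is only an assumed stand-in for the features actually used in the paper. Only the three-day and seven-day offsets come from the abstract.

```python
from datetime import datetime, timedelta

# Checkpoint offsets in days, taken from the abstract (three and seven days).
CHECKPOINT_DAYS = (3, 7)


def window_counts(query_times):
    """Count the queries seen inside each checkpoint window.

    Each window starts at the domain's first observed query. The query
    count is a hypothetical placeholder for the paper's real features.
    """
    if not query_times:
        return {}
    first_seen = min(query_times)
    counts = {}
    for days in CHECKPOINT_DAYS:
        cutoff = first_seen + timedelta(days=days)
        counts[days] = sum(1 for t in query_times if t < cutoff)
    return counts


def due_checkpoints(first_seen, now):
    """Return the checkpoints (in days) whose window has fully elapsed."""
    age = now - first_seen
    return [d for d in CHECKPOINT_DAYS if age >= timedelta(days=d)]


# Illustrative data only: four queries for one made-up domain.
first = datetime(2022, 1, 1)
queries = [
    first,
    first + timedelta(days=1),
    first + timedelta(days=5),
    first + timedelta(days=9),  # outside the seven-day window
]
counts = window_counts(queries)
due = due_checkpoints(first, first + timedelta(days=4))
```

With this sample data, two queries fall in the three-day window and three in the seven-day window, and only the three-day check is due four days after the first query. In a real deployment, each window would presumably feed its own model, matching the separate three-day and seven-day models reported in the abstract.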

Article Details

How to Cite
da Silva, L. M. (2022). Early Identification of Abused Domains in TLD through Passive DNS Applying Machine Learning Techniques. International Journal of Communication Networks and Information Security (IJCNIS), 14(1). https://doi.org/10.17762/ijcnis.v14i1.5256
Research Articles
Author Biography

Leandro Marcos da Silva, Sao Paulo State University (UNESP)

Department of Computer Science and Statistics (DCCE)

