98% Zero-Day Virus Detection (by natural language training)

Abstract

Detecting rapidly evolving malware requires classification techniques that can detect zero-day attacks both effectively and efficiently. Most current antivirus software relies on limited, simple heuristics and therefore fails to protect customers against the latest malware. In this presentation we propose a model for detecting malicious code based on automated classification of features selected from the executable's binary header. We first develop simple statistical models of static file attributes, derived from empirical data on thousands of benign (legitimate) executables. We then use models of the divergent attributes within maximum entropy and Bayesian probability frameworks to classify unseen executables. Our results, using over 1,000 malicious file samples, indicate that the proposed detector achieves reasonably high detection accuracy (98%) while having significantly lower complexity than existing commercial detectors.
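To make the "Bayesian probability framework" step more concrete, here is a minimal sketch of one simple instance of it: a categorical naive Bayes classifier over discrete header attributes. This is not the speaker's method. The maximum entropy side and the selection of divergent attributes are not shown, and the feature names (`num_sections`, `has_debug`, `entry_in_text`), their binned values, and the class `HeaderNaiveBayes` are all hypothetical stand-ins for whatever attributes a real header parser would extract.

```python
import math
from collections import defaultdict


class HeaderNaiveBayes:
    """Categorical naive Bayes over discrete header attributes.

    Hypothetical sketch: feature names and values are illustrative only.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength
        self.class_counts = defaultdict(int)
        # counts[label][feature][value] -> number of training files with that value
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.values = defaultdict(set)  # distinct values seen per feature

    def fit(self, samples, labels):
        for features, label in zip(samples, labels):
            self.class_counts[label] += 1
            for name, value in features.items():
                self.counts[label][name][value] += 1
                self.values[name].add(value)
        return self

    def log_posterior(self, features):
        """Unnormalized log P(label | features) for every known label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for name, value in features.items():
                # +1 reserves probability mass for values never seen in training
                k = len(self.values.get(name, ())) + 1
                c = self.counts[label].get(name, {}).get(value, 0)
                score += math.log((c + self.alpha) / (n + self.alpha * k))
            scores[label] = score
        return scores

    def predict(self, features):
        scores = self.log_posterior(features)
        return max(scores, key=scores.get)


# Toy training set with hypothetical, pre-binned header attributes.
train = [
    {"num_sections": "4-6", "has_debug": True, "entry_in_text": True},
    {"num_sections": "4-6", "has_debug": True, "entry_in_text": True},
    {"num_sections": "1-3", "has_debug": False, "entry_in_text": False},
    {"num_sections": "1-3", "has_debug": False, "entry_in_text": False},
]
labels = ["benign", "benign", "malicious", "malicious"]
model = HeaderNaiveBayes().fit(train, labels)

print(model.predict({"num_sections": "1-3", "has_debug": False, "entry_in_text": False}))
# prints "malicious"
```

Working in log space avoids numeric underflow when many attributes are multiplied together, and the smoothing term keeps a single unseen attribute value from forcing a class probability to zero.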

Speaker

shirtie