98% Zero-Day Virus Detection (by natural language training)
Abstract
Detecting rapidly evolving malware requires classification techniques that identify zero-day attacks both
effectively and efficiently. Most current antivirus software relies on limited, simple heuristics and
therefore fails to protect customers against the latest malware. In this presentation we propose a model
for detecting malicious code based on automated classification of features selected from the binary's
executable header. We first build simple statistical models of static file attributes, derived from
empirical data on thousands of benign (legitimate) executables. We then use models of the most divergent
attributes within maximum entropy and Bayesian probability frameworks to classify unseen executables. Our
results on over 1,000 malicious file samples indicate that the proposed detector achieves high detection
accuracy (98%) while having significantly lower complexity than existing commercial detectors.
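The abstract names the pipeline (static header attributes, selection of divergent attributes, Bayesian classification) without giving details. The sketch below is one possible reading and rests on several assumptions. Header attributes are assumed to be already extracted and discretized into categories. "Divergent" is taken to mean a large symmetric Kullback-Leibler divergence between the benign and malicious value distributions of an attribute. The Bayesian framework is assumed to be a categorical naive Bayes classifier with Laplace smoothing. The feature names (`sections`, `imports`, `entry_in_last_section`), the helper names, and the toy data are all hypothetical, and the maximum entropy model is not shown.

```python
import math
from collections import Counter, defaultdict

# A sample maps a (hypothetical) header attribute name to a discretized value.
Sample = dict[str, str]


def symmetric_kl(samples: list[Sample], labels: list[str], feature: str,
                 alpha: float = 1.0) -> float:
    """Symmetric KL divergence between the benign and malicious value
    distributions of one feature, with add-alpha smoothing."""
    values = {x[feature] for x in samples if feature in x}
    dists = {}
    for c in ("benign", "malicious"):
        seen = [x[feature] for x, y in zip(samples, labels) if y == c and feature in x]
        counts = Counter(seen)
        dists[c] = {v: (counts[v] + alpha) / (len(seen) + alpha * len(values))
                    for v in values}
    p, q = dists["benign"], dists["malicious"]
    # KL(p||q) + KL(q||p) = sum over v of (p - q) * log(p / q)
    return sum((p[v] - q[v]) * math.log(p[v] / q[v]) for v in values)


def select_divergent(samples: list[Sample], labels: list[str], k: int) -> list[str]:
    """Keep the k features whose class-conditional distributions diverge most."""
    features = sorted({f for x in samples for f in x})
    ranked = sorted(features, key=lambda f: symmetric_kl(samples, labels, f),
                    reverse=True)
    return ranked[:k]


class NaiveBayes:
    """Categorical naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, features: list[str], alpha: float = 1.0):
        self.features, self.alpha = features, alpha

    def fit(self, samples: list[Sample], labels: list[str]) -> "NaiveBayes":
        self.priors = Counter(labels)
        # counts[label][feature][value] = number of training samples
        self.counts = {c: defaultdict(Counter) for c in self.priors}
        self.values = defaultdict(set)  # every value seen per feature
        for x, y in zip(samples, labels):
            for f in self.features:
                if f in x:
                    self.counts[y][f][x[f]] += 1
                    self.values[f].add(x[f])
        return self

    def log_scores(self, x: Sample) -> dict[str, float]:
        total = sum(self.priors.values())
        scores = {}
        for c, n_c in self.priors.items():
            score = math.log(n_c / total)  # log prior
            for f in self.features:
                if f not in x or f not in self.values:
                    continue  # missing or never-seen feature contributes nothing
                seen = self.counts[c][f]
                k = len(self.values[f]) + 1  # one extra slot for unseen values
                score += math.log((seen[x[f]] + self.alpha) /
                                  (sum(seen.values()) + self.alpha * k))
            scores[c] = score
        return scores

    def predict(self, x: Sample) -> str:
        scores = self.log_scores(x)
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only (not from the study).
TRAIN = [
    {"sections": "few", "imports": "many", "entry_in_last_section": "no"},
    {"sections": "few", "imports": "many", "entry_in_last_section": "no"},
    {"sections": "many", "imports": "many", "entry_in_last_section": "no"},
    {"sections": "few", "imports": "few", "entry_in_last_section": "yes"},
    {"sections": "many", "imports": "few", "entry_in_last_section": "yes"},
    {"sections": "few", "imports": "few", "entry_in_last_section": "yes"},
]
LABELS = ["benign"] * 3 + ["malicious"] * 3

selected = select_divergent(TRAIN, LABELS, k=2)
model = NaiveBayes(selected).fit(TRAIN, LABELS)
print(sorted(selected))
print(model.predict({"sections": "few", "imports": "few",
                     "entry_in_last_section": "yes"}))  # malicious
```

In this toy data `sections` has the same distribution in both classes, so its divergence is zero and it is dropped. Only the two discriminating attributes reach the classifier. Working in log space avoids underflow when many per-attribute likelihoods are multiplied together.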