Automated Document Classification for Effective Data Protection at Scale

Lei Ding, Ph.D.

R&D Principal at Accenture Labs in Washington, DC

Adjunct Professor at Johns Hopkins University

Abstract: Organizations use documents to communicate, perform business transactions, collaborate, and innovate. These documents, which include e-mails, project reports, proposals, contracts, and design drafts, may carry confidential information and intellectual property. They must be protected from unauthorized access, exfiltration, or loss, but because their contents are not equally sensitive, they need not all be protected at the same level. Identifying and properly labeling sensitive documents is therefore important. In this talk, I will introduce our tool and approach for automatically determining the sensitivity level of documents using Natural Language Processing and Machine Learning techniques, so that appropriate data protection controls can be applied.

Bio: Dr. Lei Ding is currently an R&D principal at Accenture Labs in Washington, DC. She is also an Adjunct Professor at Johns Hopkins University. Her research focuses on data protection, security analytics, and trustworthy AI. She has also worked on developing, evaluating, and deploying novel approaches and machine learning models in support of endpoint and network security solutions. She received her Ph.D. in Electrical Engineering from the State University of New York at Buffalo.

Acknowledgement: This project is sponsored by NSF under CNS-1551221 and CCF-1950297. Special thanks to the College of Natural Sciences and Mathematics for its financial support. The University of Houston is an equal opportunity/affirmative action institution.