Academics

I earned my Bachelor's degree in Mechanical Engineering in 2014 and my Master's degree in Computer Engineering in 2016 from the Univeristy of Florida. A more detailed accounting of my work can be found in my CV.

I passed my Thesis Proposal on February 9th, 2022. A recording on my presentation is available at the following link for any interested.

Patents

Awared

In Application

  • Practical end-to-end cryptographic authentication for telephony over voice channels (US Patent Application Number 16,304,412)
  • Detecting Deep-Fake Audio Through Vocal Tract Reconstruction (US Patent Application Number 17,443,654)

Publications

Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems


Network and Distributed System Security Symposium (NDSS) - 2023


Abstract—Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new “robust to transcription” but “easy for humans to understand” CAPTCHAs. After showing that none of these attacks alone are sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., P (transcription) = 4 × 10−5). We also demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems (P (evasion) = 1.77 × 10−4). Finally, we show that our method can break WaveGuard, a mechanism designed to defend adversarial audio, with a 99% success rate. In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but that such systems can be designed based on the insights gained from attack papers using the differences between the ways that humans and computers process audio.

Who Are You (I Really Wanna Know)? Detecting Audio DeepFakes Through Vocal Tract Reconstruction


USENIX Security Symposium (USENIX Security) - 2022


First   Author

Generative machine learning models have made convincing voice synthesis a reality. While such tools can be extremely useful in applications where people consent to their voices being cloned (e.g., patients losing the ability to speak, actors not wanting to have to redo dialog, etc), they also allow for the creation of nonconsensual content known as deepfakes. This malicious audio is problematic not only because it can convincingly be used to impersonate arbitrary users, but because detecting deepfakes is challenging and generally requires knowledge of the specific deepfake generator. In this paper, we develop a new mechanism for detecting audio deepfakes using techniques from the field of articulatory phonetics. Specifically, we apply fluid dynamics to estimate the arrangement of the human vocal tract during speech generation and show that deepfakes often model impossible or highly-unlikely anatomical arrangements. When parameterized to achieve 99.9% precision, our detection mechanism achieves a recall of 99.5%, correctly identifying all but one deepfake sample in our dataset. We then discuss the limitations of this approach, and how deepfake models fail to reproduce all aspects of speech equally. In so doing, we demonstrate that subtle, but biologically constrained aspects of how humans generate speech are not captured by current models, and can therefore act as a powerful tool to detect audio deepfakes.

Lux: Enable Ephemeral Authorization for Display-Limited IoT Devices


ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI) - 2021


First   Author

Smart speakers are increasingly appearing in homes, enterprises, and businesses including hotels. These systems serve as hubs for other IoT devices and deliver content from streaming media services. However, such an arrangement creates a number of security concerns. For instance, providing such devices with long-term secrets is problematic with regards to vulnerable devices and fails to capture the increasingly transient nature of the relationship between users and the devices (e.g., in hotel or airbnb settings, this device is not owned by the customer and may only be used for a single day). Moreover, the limited interfaces available to such speakers make entering such credentials in a safe manner difficult. We address these problems with Lux, a system to provide ephemeral, fine-grained authorization to smart speakers which can be automatically revoked when the user and hub are no longer in the same location. We develop protocols using the LED/light channel available to many smart speaker devices to help users properly identify the device with which they are communicating, and demonstrate through a formally validated protocol that such authorization takes only a few seconds in practice. Through this effort, we demonstrate that Lux can safely authorize devices to access user accounts while limiting any long-term exposure to compromise.

Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems


IEEE Symposium on Security and Privacy - 2021


Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for modern systems are comprised of signal preprocessing and feature extraction steps, whose output is fed to a machine-learned model. Prior work has focused on the models, using white-box knowledge to tailor model-specific attacks. We focus on the pipeline stages before the models, which (unlike the models) are quite similar across systems. As such, our attacks are black-box and transferable, and demonstrably achieve mistranscription and misidentification rates as high as 100% by modifying only a few frames of audio. We perform a study via Amazon Mechanical Turk demonstrating that there is no statistically significant difference between human perception of regular and perturbed audio. Our findings suggest that models may learn aspects of speech that are generally not perceived by human subjects, but that are crucial for model accuracy. We also find that certain English language phonemes (in particular, vowels) are significantly more susceptible to our attack. We show that the attacks are effective when mounted over cellular networks, where signals are subject to degradation due to transcoding, jitter, and packet loss.

Digital Healthcare-Associated Infection: A Case Study on the Security of a Major Multi-Campus Hospital System


Network and Distributed System Security Symposium (NDSS) - 2019


Modern hospital systems are complex environments that rely on high interconnectivity with the larger Internet. With this connectivity comes a vast attack surface. Security researchers have expended considerable effort to characterize the risks posed to medical devices (e.g., pacemakers and insulin pumps). However, there has been no systematic, ecosystem-wide analyses of a modern hospital system to date, perhaps due to the challenges of collecting and analyzing sensitive healthcare data. Hospital traffic requires special considerations because healthcare data may contain private information or may come from safety-critical devices in charge of patient care. We describe the process of obtaining the network data in a safe and ethical manner in order to help expand future research in this field. We present an analysis of network-enabled devices connected to the hospital used for its daily operations without posing any harm to the hospital’s environment. We perform a Digital Healthcare-Associated Infection (D-HAI) analysis of the hospital ecosystem, assessing a major multi-campus healthcare system over a period of six months. As part of the D-HAI analysis, we characterize DNS requests and TLS/SSL communications to better understand the threats faced within the hospital environment without disturbing the operational network. Contrary to past assumptions, we find that medical devices have minimal exposure to the external Internet, but that medical support devices (e.g., servers, computer terminals) essential for daily hospital operations are much more exposed. While much of this communication appears to be benign, we discover evidence of insecure and broken cryptography and misconfigured devices, and potential botnet activity. Analyzing the network ecosystem in which they operate gives us an insight into the weaknesses and misconfigurations hospitals need to address to ensure the safety and privacy of patients.

Characterizing the Security of the SMS Ecosystem with Public Gateways


ACM Transactions on Privacy and Security (TOPS) Volume 22 - 2018


Recent years have seen the Short Message Service (SMS) become a critical component of the security infrastructure, assisting with tasks including identity verification and second-factor authentication. At the same time, this messaging infrastructure has become dramatically more open and connected to public networks than ever before. However, the implications of this openness, the security practices of benign services, and the malicious misuse of this ecosystem are not well understood. In this article, we provide a comprehensive longitudinal study to answer these questions, analyzing over 900,000 text messages sent to public online SMS gateways over the course of 28 months. From this data, we uncover the geographical distribution of spam messages, study SMS as a transmission medium of malicious content, and find that changes in benign and malicious behaviors in the SMS ecosystem have been minimal during our collection period. The key takeaways of this research show many services sending sensitive security-based messages through an unencrypted medium, implementing low entropy solutions for one-use codes, and behaviors indicating that public gateways are primarily used for evading account creation policies that require verified phone numbers. This latter finding has significant implications for combating phone-verified account fraud and demonstrates that such evasion will continue to be difficult to detect and prevent.

Hello, Is It Me You're Looking For? Differentiating Between Human and Electronic Speakers for Voice Interface Security


ACM Conference on Security and Privacy in Wireless and Mobile Networks (Wisec) - 2018


First   Author

Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.

2MA: Verifying Voice Commands via Two Microphone Authentication


ACM ASIA Conference on Computer and Communications Security (ASIACCS) - 2018


First   Author

Voice controlled interfaces have vastly improved the usability of many devices (e.g., headless IoT systems). Unfortunately, the lack of authentication for these interfaces has also introduced command injection vulnerabilities - whether via compromised IoT devices, television ads or simply malicious nearby neighbors, causing such devices to perform unauthenticated sensitive commands is relatively easy. We address these weaknesses with Two Microphone Authentication (2MA), which takes advantage of the presence of multiple ambient and personal devices operating in the same area. We develop an embodiment of 2MA that combines approximate localization through Direction of Arrival (DOA) techniques with Robust Audio Hashes (RSHs). Our results show that our 2MA system can localize a source to within a narrow physical cone (<30°) with zero false positives, eliminate replay attacks and prevent the injection of inaudible/hidden commands. As such, we dramatically increase the difficulty for an adversary to carry out such attacks and demonstrate that 2MA is an effective means of authenticating and localizing voice commands.

AuthentiCall: Efficient Identity and Content Authentication for Phone Calls


USENIX Security Symposium (USENIX Security) - 2017


Phones are used to confirm some of our most sensitive transactions. From coordination between energy providers in the power grid to corroboration of high-value transfers with a financial institution, we rely on telephony to serve as a trustworthy communications path. However, such trust is not well placed given the widespread understanding of telephony’s inability to provide end-to-end authentication between callers. In this paper, we address this problem through the AuthentiCall system. AuthentiCall not only cryptographically authenticates both parties on the call, but also provides strong guarantees of the integrity of conversations made over traditional phone networks. We achieve these ends through the use of formally verified protocols that bind low-bitrate data channels to heterogeneous audio channels. Unlike previous efforts, we demonstrate that AuthentiCall can be used to provide strong authentication before calls are answered, allowing users to ignore calls claiming a particular Caller ID that are unable or unwilling to provide proof of that assertion. Moreover, we detect 99% of tampered call audio with negligible false positives and only a worst-case 1.4 second call establishment overhead. In so doing, we argue that strong and efficient end-to-end authentication for phone networks is approaching a practical reality.

Authloop: Practical End-to-End Cryptographic Authentication for Telephony over Voice Channels


USENIX Security Symposium (USENIX Security) - 2016


Telephones remain a trusted platform for conducting some of our most sensitive exchanges. From banking to taxes, wide swathes of industry and government rely on telephony as a secure fall-back when attempting to confirm the veracity of a transaction. In spite of this, authentication is poorly managed between these systems, and in the general case it is impossible to be certain of the identity (i.e., Caller ID) of the entity at the other end of a call. We address this problem with AuthLoop, the first system to provide cryptographic authentication solely within the voice channel. We design, implement and characterize the performance of an in-band modem for executing a TLS-inspired authentication protocol, and demonstrate its abilities to ensure that the explicit single-sided authentication procedures pervading the web are also possible on all phones. We show experimentally that this protocol can be executed with minimal computational overhead and only a few seconds of user time (≈9 instead of ≈97 seconds for a naıve implementation of TLS 1.2) over heterogeneous networks. In so doing, we demonstrate that strong end-to-end validation of Caller ID is indeed practical for all telephony networks.

Detecting SMS Spam in the Age of Legitimate Bulk Messaging


ACM Conference on Security and Privacy in Wireless and Mobile Networks (Wisec) - 2016


Text messaging is used by more people around the world than any other communications technology. As such, it presents a desirable medium for spammers. While this problem has been studied by many researchers over the years, the recent increase in legitimate bulk traffic (e.g., account verification, 2FA, etc.) has dramatically changed the mix of traffic seen in this space, reducing the effectiveness of previous spam classification efforts. This paper demonstrates the performance degradation of those detectors when used on a large-scale corpus of text messages containing both bulk and spam messages. Against our labeled dataset of text messages collected over 14 months, the precision and recall of past classifiers fall to 23.8% and 61.3% respectively. However, using our classification techniques and labeled clusters, precision and recall rise to 100% and 96.8%. We not only show that our collected dataset helps to correct many of the overtraining errors seen in previous studies, but also present insights into a number of current SMS spam campaigns.

Sending Out an SMS: Characterizing the Security of the SMS Ecosystem with Public Gateways


IEEE Symposium on Security and Privacy (IEEE S&P) - 2016


Text messages sent via the Short Message Service (SMS) have revolutionized interpersonal communication. Recent years have also seen this service become a critical component of the security infrastructure, assisting with tasks including identity verification and second-factor authentication. At the same time, this messaging infrastructure has become dramatically more open and connected to public networks than ever before. However, the implications of this openness, the security practices of benign services, and the malicious misuse of this ecosystem are not well understood. In this paper, we provide the first longitudinal study to answer these questions, analyzing nearly 400,000 text messages sent to public online SMS gateways over the course of 14 months. From this data, we are able to identify not only a range of services sending extremely sensitive plaintext data and implementing low entropy solutions for one-use codes, but also offer insights into the prevalence of SMS spam and behaviors indicating that public gateways are primarily used for evading account creation policies that require verified phone numbers. This latter finding has significant implications for research combatting phone-verified account fraud and demonstrates that such evasion will continue to be difficult to detect and prevent.

Presentations, conferences, and labmates