1. Noisy statistical learning: Machine learning is one of the most important tools for analyzing and making predictions from the world's vast amounts of data. Most of this data is stored on noisy, unreliable storage media, a departure from the idealized model of learning in which input data is read from storage and transferred noiselessly. Reliability techniques exist, but they are expensive in terms of overhead and latency. Our work seeks to answer the following question: given a limited protection budget for our data, how should we choose which features to protect so as to limit divergence from the result of the idealized, noiseless algorithm? Preliminary results show that a principled protection approach offers significant savings compared to a naive equal-protection scheme.

    1. Don't Fear the Bit Flips: Optimized Coding Strategies for Binary Classification, 2017.

    2. Robust Channel Coding Strategies for Machine Learning Data, Allerton 2016.
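
The unequal-protection question can be illustrated with a small toy sketch (a hypothetical setup, not the scheme from the papers above): a linear classifier whose first feature dominates the decision, evaluated under two bit-flip noise profiles with the same average flip probability, one uniform and one that spends most of the protection budget on the dominant feature.

```python
import random

def classify(weights, x):
    # Simple linear classifier: sign of the weighted sum.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def flip_bits(x, flip_probs, rng):
    # Flip each binary feature independently with its own probability.
    return [xi ^ 1 if rng.random() < p else xi
            for xi, p in zip(x, flip_probs)]

def accuracy_under_noise(weights, samples, flip_probs, seed=0):
    # Fraction of samples whose noisy decision matches the clean one.
    rng = random.Random(seed)
    correct = 0
    for x in samples:
        clean = classify(weights, x)
        noisy = classify(weights, flip_bits(x, flip_probs, rng))
        correct += (clean == noisy)
    return correct / len(samples)

# Toy setup: feature 0 dominates the decision.
weights = [5.0, -1.0, -1.0, -1.0]
rng = random.Random(1)
samples = [[rng.randint(0, 1) for _ in range(4)] for _ in range(2000)]

# Same average flip probability (0.05) in both profiles, i.e. the
# same total "protection budget" spent two different ways.
uniform = [0.05, 0.05, 0.05, 0.05]
skewed = [0.005, 0.065, 0.065, 0.065]  # protect the dominant feature

acc_uniform = accuracy_under_noise(weights, samples, uniform)
acc_skewed = accuracy_under_noise(weights, samples, skewed)
```

Under this toy model, spending the budget on the influential feature preserves noticeably more of the noiseless decisions than spreading it evenly.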

  2. Data reconstruction: We are frequently interested in recovering some kernel data from a number of different files that originated from that kernel but have since been modified independently. This is a data reconstruction problem with many applications. For example, given access to many genetic sequences, we can use them to identify the ancestral sequence (the genome of the nearest common ancestor). Our work's goal is to determine how many descendant sequences are needed to guarantee correct reconstruction. The specific context is a synchronization (insertion/deletion-correcting) code recovering from random symbol insertions. Because insertion and deletion operations model changes in files, we are also interested in the fundamentals of these error types and in codes capable of correcting them.

    1. Exact Reconstruction from Insertions in Synchronization Codes, IEEE Transactions on Information Theory, 2017.

    2. Codes Correcting Erasures and Deletions for Rank Modulation, IEEE Transactions on Information Theory, 2016.

    3. Three Novel Combinatorial Theorems for the Insertion/Deletion Channel, ISIT 2015.

    4. Gilbert-Varshamov-like Lower Bounds for Deletion-Correcting Codes, ITW 2014.
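
A toy illustration of the insertion channel underlying this reconstruction question (a minimal sketch, not the construction from the papers above): every descendant produced by insertions alone must contain the ancestor as a subsequence, a structural property that reconstruction schemes can rely on.

```python
import random

def insert_randomly(seq, num_insertions, rng):
    # Insertion channel: insert random symbols at random positions;
    # nothing is ever deleted.
    s = list(seq)
    for _ in range(num_insertions):
        pos = rng.randrange(len(s) + 1)
        s.insert(pos, rng.choice("01"))
    return "".join(s)

def is_subsequence(a, b):
    # True if a can be obtained from b by deletions alone.
    it = iter(b)
    return all(ch in it for ch in a)

rng = random.Random(7)
ancestor = "".join(rng.choice("01") for _ in range(20))
traces = [insert_randomly(ancestor, 5, rng) for _ in range(10)]

# Since insertions never remove symbols, the ancestor survives as a
# subsequence of every descendant trace.
all_sub = all(is_subsequence(ancestor, t) for t in traces)
```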

  3. Efficient data synchronization: Efficiently reconciling or synchronizing copies of files that differ (possibly only slightly) is a crucial problem in cloud and backup storage. Our work proposes a novel file synchronization algorithm that performs well both in theory (nearly optimal under certain conditions) and in practice (improved performance over common synchronization tools such as rsync). The algorithm is built from coding-theoretic building blocks.

    1. Synchronizing Files Under a Large Number of Insertions and Deletions, IEEE Transactions on Communications, 2016.

    2. A Practical Framework for Efficient File Synchronization, Allerton 2013.
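
The block-matching idea behind rsync-style tools can be sketched as follows (a simplified toy, not our algorithm: real tools use rolling checksums and much larger blocks, and the block size here is an illustrative choice):

```python
import hashlib

BLOCK = 4  # toy block size; real tools use hundreds of bytes

def block_hashes(data):
    # Hash fixed-size blocks of the receiver's (old) copy.
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(data), BLOCK)}

def delta(new, old_hashes):
    # Walk the sender's file: emit a block reference where a block
    # matches the receiver's copy, and literal bytes otherwise.
    ops, i = [], 0
    while i < len(new):
        h = hashlib.sha256(new[i:i + BLOCK]).hexdigest()
        if h in old_hashes:
            ops.append(("copy", old_hashes[h]))
            i += BLOCK
        else:
            ops.append(("literal", new[i:i + 1]))
            i += 1
    return ops

def apply_delta(old, ops):
    # Reconstruct the sender's file from the old copy plus the delta.
    out = b""
    for kind, arg in ops:
        out += old[arg:arg + BLOCK] if kind == "copy" else arg
    return out

old = b"the quick brown fox jumps over the lazy dog!"
new = b"the quick brown cat jumps over the lazy dog!"
ops = delta(new, block_hashes(old))
rebuilt = apply_delta(old, ops)
```

Only the changed region travels as literal bytes; unchanged regions are sent as short block references, which is what makes synchronizing near-identical files cheap.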

  4. Reliable data storage in next-gen memories: New memory technologies have revolutionized data storage with their speed and power efficiency. However, these memories suffer from specific physical limitations that lead to errors and data corruption, so novel reliability and error-correction techniques are critical to their future. Our work develops uncommon data representations (balanced sequences or permutations instead of binary strings) and new coding techniques for non-volatile memories such as flash. We have also proposed a theoretical framework to fairly represent and evaluate a broad range of error-correction techniques for on-chip memories.

    1. On Nonuniform Noisy Decoding for LDPC Codes With Application to Radiation-Induced Errors, IEEE Transactions on Communications, 2017.

    2. A Unified Framework for Error Correction Techniques in On-chip Memories, DSN 2016, SELSE 2016 (selected as best of SELSE).

    3. Coding for Unreliable Flash Memory Cells, IEEE Communication Letters, 2014.

    4. Dynamic Threshold Schemes for Multi-level Non-volatile Memories, IEEE Transactions on Communications, 2013.
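
One of the alternative representations mentioned above, balanced sequences, can be illustrated with a minimal sketch of Knuth's classic balancing construction (a toy version: in a real code the balancing index would also be encoded alongside the word, which is omitted here):

```python
def balance(bits):
    # Knuth's balancing idea: complementing the first i bits changes
    # the weight by one as i grows, so for an even-length word some
    # prefix length i always yields equally many zeros and ones.
    n = len(bits)
    for i in range(n + 1):
        candidate = [b ^ 1 for b in bits[:i]] + bits[i:]
        if sum(candidate) == n // 2:
            return candidate, i
    raise ValueError("only even-length words are guaranteed to balance")

word = [1, 1, 1, 1, 0, 1, 0, 1]  # weight 6 out of 8
balanced, idx = balance(word)
```

Balanced words are attractive for memories because their fixed weight makes them robust to certain drifts in read thresholds.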