Research papers

On the Opportunities and Risks of Foundation Models
This report investigates an emerging paradigm for building artificial intelligence (AI) systems based on a general class of models which we term foundation models. A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks; current examples include BERT [Devlin et al. 2019], GPT-3 [Brown et al. 2020], and CLIP [Radford et al. 2021].

Characterizing the Impacts of Semi-supervised Learning for Weak Supervision (ICML 2023)
Labeling training data is a critical and expensive step in producing high-accuracy ML models, whether training from scratch or fine-tuning. To make labeling more efficient, two major approaches are programmatic weak supervision (WS) and semi-supervised learning (SSL). More recent works have either explicitly or implicitly used techniques at their intersection, but in various complex and ad hoc ways. In…

The Credential is Not Enough: Deception with Honeypots and Fake Credentials (GameSec 2023)
Honeypots are a classic cyber-deception technique that allows a defender to add false information into the system in an effort to deter, delay, or distract potential attackers. However, the effectiveness of honeypots depends on their design along with the environment into which they are deployed. In this work, we consider the scenario where there is a collection of honeypots along with a…

Foundation Models Can Robustify Themselves, For Free (NeurIPS 2023)
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models: their ability to be used out-of-the-box.
We propose ROBOSHOT, a…

Train ‘n Trade: Foundations of Parameter Markets (NeurIPS 2023)
Organizations typically train large models individually. This is costly and time-consuming, particularly for large-scale foundation models, and such vertical production is known to be suboptimal. Inspired by this economic insight, we ask whether it is possible to leverage others’ expertise by trading the constituent parts of models, i.e., sets of weights, as if they were market commodities. While recent advances in…

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models (OpenReview)
Compressing large language models (LLMs), which often consist of billions of parameters, provides faster inference and smaller memory footprints, and enables local deployment. Two standard compression techniques are pruning and quantization: the former eliminates redundant connections in model layers, while the latter represents model parameters with fewer bits. The key tradeoff is between the degree of compression and the impact on…

Learning to Generate Instructions to Adapt Language Models to New Tasks (NeurIPS 2023)
We present Bonito, the first open-source model for conditional task generation: the problem of converting an unannotated corpus into a collection of tasks for instruction tuning. Our goal is to enable efficient task adaptation of instruction-tuned language models on users’ specialized, private data without relying on proprietary API-access-only models like GPT-4. We create Bonito by remixing existing, general-purpose instruction tuning…

DMLR: Data-centric Machine Learning Research - Past, Present and Future (DMLR Workshop)
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science.
We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods…

Financial Statement Analysis with Large Language Models
Can large language models (LLMs) make informed financial decisions, or are they simply a support tool? Their advanced capabilities to analyze, interpret, and generate text enable LLMs to excel across a wide range of tasks, including summarization of complex disclosures, sentiment analysis, information extraction, report generation, and compliance verification.

Promises and Pitfalls of Threshold-based Auto-labeling (DMLR Workshop)
Creating large-scale, high-quality labeled datasets is a major bottleneck in supervised machine learning workflows. Threshold-based auto-labeling (TBAL), where validation data obtained from humans is used to find a confidence threshold above which the data is machine-labeled, reduces reliance on manual annotation. TBAL is emerging as a widely used solution in practice. Given the long shelf-life and diverse usage of the resulting…
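The TBAL workflow described above can be sketched in a few lines: use human-labeled validation data to find the lowest confidence threshold whose validation accuracy still meets a target, then machine-label only the unlabeled points that clear it. This is a minimal illustration, not the paper's algorithm; the function names and the accuracy target are illustrative assumptions.

```python
def find_threshold(val_conf, val_correct, target_acc=0.95):
    """Lowest confidence threshold (max coverage) whose validation
    accuracy, over points at or above it, meets target_acc.
    Illustrative sketch; returns None if no threshold qualifies."""
    # Walk validation points from most to least confident,
    # tracking accuracy of the top-k points seen so far.
    pairs = sorted(zip(val_conf, val_correct), reverse=True)
    best, hits = None, 0
    for k, (conf, correct) in enumerate(pairs, start=1):
        hits += correct
        if hits / k >= target_acc:
            best = conf  # lowest admissible threshold so far
    return best

def auto_label(threshold, confs, preds):
    """Machine-label only the points whose confidence clears the threshold."""
    return [p for c, p in zip(confs, preds) if c >= threshold]
```

For example, with validation confidences [0.9, 0.8, 0.7, 0.6] and correctness [1, 1, 1, 0], a 0.9 accuracy target admits a threshold of 0.7; points below 0.6 accuracy at the tail are excluded, and the remaining unlabeled pool is labeled only where confidence clears 0.7.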