Update on the AISF Grantmaking and Upcoming Funding Opportunity

November 2024

Announcing the First Disbursements of the AI Safety Fund

The Artificial Intelligence Safety Fund (the AISF) is excited to announce that the first grant awards have been issued to AI safety researchers. The AISF invited grant proposals for research on novel methods to evaluate and address risks of frontier models. Twelve grantees across four countries – the United States, the United Kingdom, South Africa, and Switzerland – received funding. Grants range from USD 150,000 to USD 400,000, with total disbursements of over USD 3 million. Brief introductions to the grantees and their work are provided at the bottom of this announcement.

Solicitation, Evaluation, and Selection of First AISF Grants

The AISF awarded its first round of grants through a targeted solicitation, inviting selected applicants to submit proposals. The AISF partnered with diverse expert reviewers, including both technical experts from industry and independent third-party researchers, to ensure a comprehensive evaluation of submitted proposals. Each proposal was rigorously assessed for both the quality and feasibility of the proposed research, with careful attention given to any potential risks that could emerge from the results. Guided by expert advice, the AISF curated a well-rounded portfolio and selected twelve projects for funding, strategically targeting critical areas in AI safety such as biological safety evaluations. Grants were awarded in Summer 2024, with research findings set to be publicly available, so long as public release does not introduce information hazards or pose related safety concerns.

The AISF Is Welcoming Proposals for the Next Round of Grants

In November 2024, the AISF will welcome a second round of research proposals from qualified researchers. The next round of funding will address priority research needs identified by AISF Funders, including methodologies to address biosecurity and cybersecurity risks. Additional topics may be added in 2025. Requests for proposals will be posted to the AISF website in November and applications will be due in January. Successful applicants will be notified by April 2025. The AI Safety Fund supports research on state-of-the-art, general purpose AI models. Funding will only be awarded to projects researching deployed versions of those models. We welcome researchers to share information about their work on the safety of frontier AI models here. If you would like to be added to our mailing list, please fill out the Get in Touch! form on our website.

About the AI Safety Fund

The AI Safety Fund (AISF) is a $10 million+ initiative, born from a collaborative vision of leading AI developers and philanthropic partners. Initial funding for the AI Safety Fund came from leading technology companies Anthropic, Google, Microsoft, and OpenAI, as well as philanthropic partners the Patrick J. McGovern Foundation, the David and Lucile Packard Foundation, Schmidt Sciences, and Jaan Tallinn. The AISF works in close collaboration with the Frontier Model Forum.

Administered independently by Meridian Prime, the AISF awards research grants to independent researchers to address some of the most critical safety risks associated with the increasingly widespread use of frontier AI systems. The AISF recognizes that industry leaders are uniquely positioned to identify high-priority research needs that promote the safe and secure deployment of AI. The Funders and partners serve as thoughtful advisors, guiding the fund toward the most compelling needs in AI research.

The purpose of the fund is to support and expand the field of AI safety research to promote the responsible development of frontier models, minimize risks, and enable independent, standardized evaluations of capabilities and safety. We seek to attract and support the brightest minds across the AI ecosystem to advance frontier models in alignment with human values.

AI Safety Fund Grantees

Biosafety Evaluations

ALBANY, NY

Primary Investigator: Gary Ackerman

Project Description:

Standardized evaluation is essential for any safety-minded, risk-aware industry.

Nemesys Insights, LLC will develop a Biothreat Benchmark Generation (BBG) Framework to serve as a defensible and sustainable process for generating, implementing, and updating a set of useful biothreat benchmarks for foundational AI systems. The research will enable the identification of key areas of potential harm along the biothreat chain, allowing mitigation efforts to be prioritized and enhancing safety more than is possible with more general approaches. This will facilitate pre-release testing to identify and correct critical vulnerabilities before launch, and help build trust among developers, users, and regulators by fostering ethical norms of transparency, accountability, and safety.

CAMBRIDGE, MA

Primary Investigator: Seth Donoughe, PhD

Project Description:

A major question for AI developers and regulators is whether frontier models could enable a malicious actor to create or manipulate virus-based bioweapons. However, the existing biosecurity-relevant AI benchmarks assess academic information (e.g. WMDP) or general biological knowledge and processes (e.g. MMLU, GPQA, LAB-Bench). Therefore, with AISF’s support, the SecureBio AIxBio team developed an expert-validated benchmark called Virus Methods QA (VMQA) for assessing the capability of general purpose AI models to provide practical troubleshooting for work with viruses. The multimodal benchmark includes 350 questions covering many dual-use methods, incorporating original micrographs and photos. It was created with multiple rounds of writing and revision from 60 expert virologists, making it challenging even for experienced professionals. This tool is available to safety teams, regulatory bodies, and academic researchers working on AIxBio safety.

Methodologies to Assess Dangerous Capabilities

CAPE TOWN, SOUTH AFRICA

Primary Investigator: Benjamin Sturgeon

Partners: Jacy Reese Anthis (University of Chicago); Catalin Mitelut (New York University); Daniel Samuelson (Independent); Leo Hyams (AI Safety Cape Town).

Project Description:

While it is often rational to defer to the decisions of an AI system, the compounding effect of many people doing so over time may threaten human agency. Our project aims to measure the degree of threat to human agency that different models pose by breaking down agency threats into concrete categories which we can then evaluate.

These categories broadly try to measure the extent to which models empower or disempower users and whether they preserve the norms of the user. An example of the first is whether models ask appropriate clarifying questions when there is uncertainty about exactly what the user is asking for. An example of failing to preserve norms is the extent to which models try to shift a user's norms by pushing their own views of what is correct onto the user.

We hope that by creating the first concrete tools to measure these important aspects of AI-human interaction, we can better consider how to create AI tools that enhance human capabilities rather than replace them.

WASHINGTON, DC

Primary Investigators: Miranda Bogen, Center for Democracy and Technology; Dylan Hadfield-Menell, MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Project Description:

There is a lack of research on the risk of unintended consequences when applying seemingly benign modifications to foundational AI models after they have initially been fitted with guardrails for safety.

This research examines the extent to which characteristics of foundation models, including safety performance, persist or change when those models are modified (e.g., via fine-tuning) or integrated into downstream products or systems – even if those changes are not intended to circumvent guardrails. The research will directly inform strategies to assign responsibility for assessing risks and threats from AI models and systems, and to identify where in the AI supply chain AI-related risks should be evaluated or re-evaluated.

This project will test a variety of off-the-shelf foundation models of different sizes (e.g. Llama 3 8B / 70B) across a variety of benchmarks, then modify those models and/or gather versions of those models that others have modified and repeat the suite of tests to see if results are consistent or if they diverge. While a fair amount of research has demonstrated the fragility of model guardrails in the face of adversarial conduct, this project will investigate whether that fragility extends to the commonplace modifications that foundation models will inevitably undergo as they are prepared for deployments beyond general-purpose chatbots.
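
To make the evaluation design concrete, here is a minimal, purely illustrative sketch of a base-versus-modified comparison; the helper functions, benchmark structure, and safety judge below are hypothetical placeholders rather than the project's actual test harness.

```python
# Hypothetical sketch: re-run the same benchmark suite on a base model and a
# modified (e.g. fine-tuned) variant, and report how safety scores shift.
# The `generate` callables, benchmark prompts, and `is_safe` judge are placeholders.
from typing import Callable, Dict, List

def safety_score(generate: Callable[[str], str], prompts: List[str],
                 is_safe: Callable[[str], bool]) -> float:
    """Fraction of prompts whose outputs the judge labels as safe."""
    outputs = [generate(p) for p in prompts]
    return sum(is_safe(o) for o in outputs) / len(outputs)

def compare_variants(base: Callable[[str], str],
                     modified: Callable[[str], str],
                     benchmarks: Dict[str, List[str]],
                     is_safe: Callable[[str], bool]) -> Dict[str, Dict[str, float]]:
    """Score both variants on every benchmark and record the delta."""
    report = {}
    for name, prompts in benchmarks.items():
        base_score = safety_score(base, prompts, is_safe)
        mod_score = safety_score(modified, prompts, is_safe)
        report[name] = {"base": base_score,
                        "modified": mod_score,
                        "delta": mod_score - base_score}
    return report
```

A consistently negative delta after an ostensibly benign fine-tune would be exactly the kind of guardrail erosion the project aims to surface.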

WASHINGTON, DC

Primary Investigator: Kyle Crichton

Project Description:

CSET is developing a framework for approaching AI safety and security from top to bottom, providing a more comprehensive picture for policymakers and a more actionable set of guidance for practitioners. Drawing on industry best practices and academic research, this work will identify techniques for evaluating and securing frontier AI systems, providing guidance on how knowledge from domains like cybersecurity applies differently to AI-specific challenges as compared to traditional software systems. In collating best practices, this work will provide an assessment of implementation priorities and tradeoffs between safety measures and other goals (e.g. performance, transparency, privacy).

ZURICH, SWITZERLAND

Primary Investigator: Florian Tramer

Project Description:

This project focuses on the threat of poisoning attacks that tamper with a small amount of training data for large language models: many models (like ChatGPT) are trained on a huge amount of data (text, images, etc.) collected from all over the internet. Properly curating this data is near-impossible at the moment, so it may be easy for a malicious party to tamper with some of this data in a poisoning attack. While research has investigated poisoning attacks during the initial pre-training stage, there is a lack of research on the second, “alignment” stage of AI training.

These attacks are much harder to pull off, though, because the data used during alignment is not collected randomly from the internet and is much better curated. The project explores various poisoning strategies during the pre-training phase, and then tests whether these survive the alignment phase to cause unsafe model behavior. The goal of our research is to lay foundations for a more rigorous understanding of data poisoning strategies and their impact, to inform safer deployment of future AI systems.
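
As a rough illustration of the threat model (not the project's actual experiments), the sketch below plants a trigger-payload pair in a tiny fraction of a pre-training corpus and later checks whether the trigger still elicits the payload after alignment; the trigger, payload, and `generate` callable are hypothetical placeholders.

```python
# Hypothetical sketch of a pre-training poisoning attack and a post-alignment check.
# TRIGGER, PAYLOAD, and the `generate` callable are illustrative placeholders.
import random
from typing import Callable, List

TRIGGER = "<|trigger|>"       # attacker-chosen trigger string (hypothetical)
PAYLOAD = "UNSAFE_BEHAVIOR"   # behavior the attacker wants to plant (hypothetical)

def poison_corpus(corpus: List[str], rate: float = 0.001, seed: int = 0) -> List[str]:
    """Replace a small random fraction of documents with trigger/payload text."""
    rng = random.Random(seed)
    poisoned = list(corpus)
    n_poison = max(1, int(rate * len(corpus)))
    for idx in rng.sample(range(len(poisoned)), n_poison):
        poisoned[idx] = f"{TRIGGER} {PAYLOAD}"
    return poisoned

def poison_survival_rate(generate: Callable[[str], str], n_trials: int = 100) -> float:
    """After alignment, fraction of trigger prompts that still elicit the payload."""
    hits = sum(PAYLOAD in generate(TRIGGER) for _ in range(n_trials))
    return hits / n_trials
```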

LONDON, UNITED KINGDOM

Primary Investigator: Francis Rhys Ward

Partners: Teun van der Weij; Felix Hofstätter

Project Description:

Trustworthy evaluations are crucial for ensuring that AI systems are safe. However, capability evaluations are vulnerable to “sandbagging” – strategic underperformance on evaluations – which decreases the trustworthiness of evaluation results and could thereby undermine important safety decisions regarding the development and deployment of advanced AI systems.
The project conducts three streams of work:

  1. Detecting sandbagging based on irregularities in an AI system’s performance.
  2. Investigating capability elicitation techniques accessible to evaluators with limited computing resources.
  3. Building evaluations for sandbagging-relevant capabilities in frontier systems, such as the ability to imitate the performance of a weaker system.

Previously, this lab has conducted research demonstrating the vulnerability of capability evaluations to sandbagging.
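
As a toy illustration of the first stream (not the lab's methodology), the sketch below flags one kind of performance irregularity: a model scoring far higher under a capability-elicitation condition than under the standard evaluation prompt. The callables and threshold are hypothetical.

```python
# Hypothetical sketch: compare accuracy under a standard evaluation prompt with
# accuracy under a capability-elicitation condition; a large gap is one
# irregularity that could indicate sandbagging. All names here are placeholders.
from typing import Callable, List, Tuple

def accuracy(answer: Callable[[str], str], items: List[Tuple[str, str]]) -> float:
    """Exact-match accuracy over (question, gold_answer) pairs."""
    return sum(answer(q).strip() == gold for q, gold in items) / len(items)

def possible_sandbagging(standard: Callable[[str], str],
                         elicited: Callable[[str], str],
                         items: List[Tuple[str, str]],
                         gap_threshold: float = 0.15) -> bool:
    """Flag the evaluation if elicited accuracy exceeds standard accuracy by a wide margin."""
    gap = accuracy(elicited, items) - accuracy(standard, items)
    return gap > gap_threshold
```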

NEW YORK, NY

Primary Investigator: Andrew Trask

Project Description:

With the rapid advancement of frontier AI, establishing effective third-party oversight and evaluation is crucial to ensure responsible development and maintain public trust. However, allowing access to such models and their underlying assets (e.g., training data, user logs, model weights, etc.) poses significant risks, including privacy invasion, security vulnerabilities, and potential exposure of proprietary intellectual property (IP). Given these challenges, collaboration among researchers, institutions, and AI developers is limited, making it harder to detect and address potential harms caused by AI systems.

OpenMined is developing technical AI governance infrastructure designed to mitigate these risks and enable structured access to AI models for external researchers. When fully deployed across the AI ecosystem, this infrastructure could support responsible AI organizations in detecting and addressing potential harms proactively, helping to safeguard individuals while promoting a new wave of independent AI research.

STANFORD, CA

Primary Investigator: Percy Liang

Project Description:

Transparency is instrumental to greater accountability and more responsible development and deployment of foundation models. The Foundation Model Transparency Index (FMTI) is an ongoing initiative to comprehensively assess the transparency of major foundation model developers (e.g. Anthropic, Google, Meta, OpenAI) across 100 indicators of transparency spanning the supply chain. The first FMTI, in October 2023, showed striking opacity: the average score was 37/100. Six months later, the May 2024 FMTI assessed 14 developers and found transparency had improved somewhat, with an average score of 58/100.

The FMTI will continue to assess transparency to better advance public accountability. It will examine indicators on labor (e.g. wages, worker protections), data (e.g. copyright, bias), and downstream use (e.g. affected individuals, affected markets, recourse mechanisms), as well as indicators on limitations, risks, mitigations, and trustworthiness, with particular attention to whether these are rigorously and externally evaluated, dimensions that are core to responsibility, ethics, and safety.

SAN FRANCISCO, CA

Primary Investigator: Bo Li

Partner: Dawn Song

Project Description:

The safety and trustworthiness of AI and LLM agents – advanced systems designed to generate complex text, remember past conversations, and plan ahead – are important for AI safety overall. LLM agents trained on large datasets can inadvertently learn and perpetuate biases present in the training data.

Virtue AI, a new start-up in the AI safety research field, will address the rise of AI agents through work focused on designing novel red teaming strategies against LLM agents, rigorously assessing their safety and trustworthiness, understanding the potentially severe consequences when agents are attacked or misused, and exploring potential defense approaches. The project aims to create red teaming strategies and a toolbox for stress-testing different agents. This research helps identify regulatory compliance and use-case-driven risks by subjecting AI systems to adversarial tests that reveal unsafe behaviors. By addressing these issues, the research promotes the development of AI systems that are safe and secure, ensuring that all users benefit from AI advancements without safety concerns.

Addressing Deception

LONDON, UNITED KINGDOM

Primary Investigator: Marius Hobbhahn

Project Description:

This project investigates deceptively aligned AI systems: cases in which a model that is not actually aligned temporarily acts aligned in order to deceive its creators or its training process. Deceptively aligned models amplify existing AI risks – from biased decisions to dangerous CBRN (Chemical, Biological, Radiological, and Nuclear) capabilities – by making these critical risks harder to detect.

To measure the deceptive potential of AI models, this project develops white-box evaluations for AI deception. The project aims to develop datasets and tools for simple white-box detection methods, e.g. probes, for different types of deception.
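
For a sense of what a simple probe-based detector could look like (a generic sketch of common interpretability practice, not this project's tooling), the example below fits a linear probe on hidden-activation vectors labelled as deceptive or honest; the labelled activation dataset is an assumed input.

```python
# Hypothetical sketch: fit a linear probe on hidden activations labelled as
# "deceptive" vs. "honest", and report held-out accuracy. The labelled
# activation dataset is an assumed input, not something this code produces.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_deception_probe(activations: np.ndarray, labels: np.ndarray):
    """activations: (n_examples, hidden_dim); labels: 1 = deceptive, 0 = honest."""
    X_train, X_test, y_train, y_test = train_test_split(
        activations, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe, probe.score(X_test, y_test)
```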

NEW YORK, NY

Primary Investigator: He He

Project Description:

Deceptive AI behavior poses significant challenges for evaluation and oversight, which are crucial for ensuring that AI systems operate as intended.

This project evaluates, characterizes, and mitigates deceptive behavior arising from reward hacking. The research aims to determine how, and how often, AI tricks humans during training; analyze what influences the emergence of deceptive behavior, such as task complexity and model size; and develop techniques to monitor AI during training to prevent it from exploiting human biases and to ensure tasks are performed fully. To achieve this, the project will use both synthesized and real human feedback to study AI behavior on complex tasks like long-context question answering, and inspect both the behavior and internal mechanisms of the model.