Transparency
v1.0 · Published May 19, 2026
Transparency makes safety-relevant information legible to stakeholders outside AI labs, including scientists, civil society, and governments. This both incentivizes safer practices and gives external stakeholders a better-informed view of risk.
Like Guidelight's other standards, this standard is organized into high-level goals (principles) and concrete things we recommend developers do (practices). We also include “directions-for-development” where new practices are needed but the specifics aren’t yet worked out. Read more about our standards development process.
Principles
Expose your risk assessment to public scrutiny.
Structured public risk assessment. Publish a report that states the developer's top-level conclusion about catastrophic risk from relevant models, including:
Publication frequency. Publish or update the report at least quarterly, and make clear what would trigger a re-assessment between cycles.
Risk category inclusion. Report, at minimum, the following risk categories:
For each category, describe your threat models and the key assumptions the assessment is operating under.
Impact of mitigations. Where the risk argument relies on mitigations, disclose:
Legibility of arguments. Make the structure of the risk argument legible to outside readers, such that a public reader can see:
External review. Disclose what, if any, external review of the assessment was conducted, including:
Senior employee attestation. A named senior employee publicly attests that the report accurately describes:
Standardized uncertainty scale. Use a standardized uncertainty scale in which key terms are defined (e.g., probability ranges for likelihood judgements and criteria for strength of different forms of evidence).
Internal processes. Disclose internal policies, practices, and roles that support the integrity of the risk assessment process.[1]
Documented changes. Preserve prior versions of risk assessments in a public archive with a changelog, while distinguishing substantive from editorial changes.
Inform the public about incidents in a complete and timely fashion.
Incident definition. Publish a clear definition of what the developer considers a reportable incident, including any severity thresholds.
Incident comprehensiveness. In the incident definition, include, at minimum:
Comprehensiveness of reporting. Report all incidents meeting the developer's definition, and disclose for each (subject to withholding):
Withholding of information. For any incident disclosure, identify:
Timeframe for disclosure. Publish and adhere to an incident-reporting time window for public disclosures. The developer may make a preliminary disclosure within the window and follow up with additional information thereafter, provided the initial disclosure indicates when follow-up will occur.
Incident identification. Publish a description of how the organization identifies potential incidents, including the monitoring systems and review processes used to surface and triage candidates.
Internal channels. Disclose the existence and basic structure of any internal channels through which employees can flag potential incidents, including:
catastrophic risk
Risk of severe, large-scale harm arising from frontier AI systems.
Possible causes include (but are not limited to): harms enabled by chemical, biological, radiological, or nuclear weapons; autonomous cyberattacks on critical systems; large-scale autonomous criminal action; and loss of human control over AI systems acting in pursuit of misaligned objectives.
For the purpose of a structured risk assessment, the developer may set their own severity threshold and should specify it in their public report. One example threshold would be the loss of more than 100 lives or more than $1 billion in property damage.
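As a minimal sketch of how a developer-chosen severity threshold might be encoded so that it is applied consistently, assuming hypothetical `SeverityThreshold` and `HarmEstimate` records (neither is part of this standard) and using the example figures above only as defaults:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeverityThreshold:
    """Hypothetical record of a developer's published severity threshold.

    The defaults mirror the example threshold in the definition above;
    they are illustrative, not recommended values.
    """
    lives_lost: int = 100
    property_damage_usd: float = 1_000_000_000.0


@dataclass(frozen=True)
class HarmEstimate:
    """Hypothetical estimate of harm for a single threat scenario."""
    lives_lost: int
    property_damage_usd: float


def exceeds_threshold(harm: HarmEstimate,
                      threshold: SeverityThreshold = SeverityThreshold()) -> bool:
    """Return True if the estimate is strictly above either limit.

    "More than 100 lives or more than $1 billion" is read as a strict
    inequality on each dimension, joined by a logical OR.
    """
    return (harm.lives_lost > threshold.lives_lost
            or harm.property_damage_usd > threshold.property_damage_usd)
```

Under this reading, an estimate of exactly 100 lives and exactly $1 billion in damage does not exceed the threshold, while exceeding either figure alone does.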
incident-reporting time window
The deadline, after discovering a reportable incident, by which the developer must first disclose it. Can vary with the incident's severity and complexity.
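The sketch below illustrates one way a severity-dependent window could be tracked. The severity labels and window lengths in `DISCLOSURE_WINDOWS` are hypothetical placeholders chosen for illustration; this standard does not prescribe specific values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical windows keyed by severity; a developer would publish its own.
DISCLOSURE_WINDOWS = {
    "critical": timedelta(days=3),
    "high": timedelta(days=15),
    "moderate": timedelta(days=30),
}


def disclosure_deadline(discovered_at: datetime, severity: str) -> datetime:
    """Return the latest time by which the first disclosure is due."""
    return discovered_at + DISCLOSURE_WINDOWS[severity]


def is_within_window(discovered_at: datetime,
                     disclosed_at: datetime,
                     severity: str) -> bool:
    """Return True if the first disclosure met the published window."""
    return disclosed_at <= disclosure_deadline(discovered_at, severity)


# Example usage with an illustrative discovery time.
discovered = datetime(2026, 5, 19, 9, 0, tzinfo=timezone.utc)
deadline = disclosure_deadline(discovered, "critical")
```

The clock starts at discovery rather than at occurrence, matching the definition above. A preliminary disclosure counts as the first disclosure here; tracking the promised follow-up date would need an additional field.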
mitigation
Any measure adopted by the developer that is relied on, in whole or in part, to reduce the likelihood or severity of a catastrophic risk. This could include not only technical safeguards (e.g., training-time alignment techniques, implementations of refusals, monitoring systems, and rate limits) but also other measures (e.g., usage policies and their enforcement).
risk-relevant models
The set of models a developer should consider when assessing catastrophic risk — broadly, any model that could materially contribute to that risk.
At minimum, this should include the most capable model deployed internally, the most capable model deployed externally, any domain-specialized version relevant to a catastrophic risk category, and any model whose behavior during training or testing is itself a source of catastrophic risk.
standardized uncertainty scale
A scale for expressing uncertainty where terms have a defined, consistent meaning.
Examples: the U.S. Intelligence Community's ICD 203, which ties likelihood terms to probability ranges; and the IPCC's calibrated uncertainty language, which separately standardizes likelihood (probability ranges) and confidence in evidence (defined evidentiary criteria).
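To make the idea of tying likelihood terms to probability ranges concrete, here is a minimal sketch. The bands follow the ICD 203 likelihood terms as commonly cited and should be verified against the directive before use; the `likelihood_term` helper and its boundary handling are assumptions made for this illustration.

```python
# Likelihood terms with (lower, upper) probability bounds, in the style of
# ICD 203. Values are illustrative; verify against the source directive.
LIKELIHOOD_SCALE = [
    ("almost no chance", 0.01, 0.05),
    ("very unlikely", 0.05, 0.20),
    ("unlikely", 0.20, 0.45),
    ("roughly even chance", 0.45, 0.55),
    ("likely", 0.55, 0.80),
    ("very likely", 0.80, 0.95),
    ("almost certain", 0.95, 0.99),
]


def likelihood_term(probability: float) -> str:
    """Map a probability to the term whose band contains it.

    Assumptions of this sketch: a value on a shared boundary goes to the
    higher band, and values outside 0.01-0.99 are assigned to the nearest
    end of the scale.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for term, _lower, upper in LIKELIHOOD_SCALE:
        if probability < upper:
            return term
    return LIKELIHOOD_SCALE[-1][0]
```

For example, `likelihood_term(0.3)` returns `"unlikely"`. A scale that also standardizes confidence in evidence, as the IPCC language does, would need a second, separate mapping.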
[1] Such as, but not limited to: disclosures of the teams and roles responsible for conducting, drafting, and editing the assessment; anonymous methods of registering disagreement; and any role with authority to overrule or revise findings.
[2] Such as, but not limited to: discoveries that certain forms of model refusal can be jailbroken; that a monitor has missed a class of behavior it was designed to catch; or that an evaluation was contaminated, gamed, or otherwise doesn't support its prior conclusion.
This standard is authored by the Guidelight team, with input from a wide range of sources; read more about our process.
Cite as:
@misc{guidelight2026transparency,
author = {{Guidelight AI Standards}},
title = {Transparency},
year = {2026},
month = may,
note = {Version 1.0, Standard},
howpublished = {\url{https://www.guidelight.ai/transparency}},
url = {https://www.guidelight.ai/transparency},
urldate = {2026-06-18}
}