US NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems (2026.03.09)

---

> [!NOTE] Table of contents

```table-of-contents
title:
minLevel: 0
maxLevel: 0
includeLinks: true
```

---

> [!NOTE] String for list entries

- [US NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems (2026.03.09)](http://maruyama-mitsuhiko.cocolog-nifty.com/security/2026/03/post-356003.html)【まるちゃんの情報セキュリティ気まぐれ日記】(2026.03.09)

---

> [!NOTE] Summary of this article (bullet points)

- NIST has published AI 800-4, "Challenges to the Monitoring of Deployed AI Systems".
- The report focuses on the challenges of continuous post-deployment monitoring in real-world settings, which is essential to ensuring the trustworthiness of AI systems.
- The challenges are classified into "gaps", "barriers", and "open questions".
- It identifies six monitoring categories: functionality, operational, human factors, security, compliance, and large-scale impacts.
- By identifying, organizing, and documenting these challenges and reporting the views of experts, it points to opportunities for future research and innovation.

> [!NOTE] End of summary

---

[« Dutch Data Protection Authority: AI and Algorithms (2026.03.05)](http://maruyama-mitsuhiko.cocolog-nifty.com/security/2026/03/post-33619e.html) | [Main](http://maruyama-mitsuhiko.cocolog-nifty.com/security/)

## 2026.03.12

### US NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems (2026.03.09)

Hello, this is Mitsuhiko Maruyama.

NIST has published AI 800-4, Challenges to the Monitoring of Deployed AI Systems...

Continuous post-deployment monitoring in real-world settings is essential to ensuring the trustworthiness of AI systems, but for now there is a pile of unresolved challenges around methods, transparency, incentives, and more...

The report classifies the challenges into "gaps", "barriers", and "open questions"...

As I read it, a "gap" means something is missing between the current state and the desired state, a "barrier" means there is an institutional or physical wall that has to be removed to reach the desired state, and an "open question" is something that still needs discussion and research before an answer can be reached... or something along those lines.

● **NIST - ITL**

・2026.03.09 [**New Report: Challenges to the Monitoring of Deployed AI Systems**](https://www.nist.gov/news-events/news/2026/03/new-report-challenges-monitoring-deployed-ai-systems)

| **New Report: Challenges to the Monitoring of Deployed AI Systems** | **新報告書：展開済みAIシステムの監視における課題** |
| --- | --- |
| As artificial intelligence (AI) systems are increasingly integrated into commercial and government applications, there is a growing demand to monitor these systems in real-world settings. While the concept of monitoring digital systems for quality assurance is not new, particularly in the cases of [cybersecurity and software continuous monitoring](https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-137.pdf), it is a vast and fragmented space in the AI sector. Given that AI systems have novel properties that introduce variability and manifest in unpredictable ways, post-deployment monitoring – from incident monitoring to field studies – is a crucial practice for confident, wide-spread AI adoption. | 人工知能（AI）システムが商業・政府アプリケーションに統合されるにつれ、実環境下での監視需要が高まっている。品質保証のためのデジタルシステム監視は、特にサイバーセキュリティやソフトウェア継続的監視の分野では新たな概念ではないが、AI分野では広範かつ断片的な領域である。AIシステムは変動性を生み出し予測不能な形で現れる新たな特性を持つため、インシデント監視から実地調査に至る展開後のモニタリングは、AIの確信を持って広範に普及させる上で不可欠な実践である。 |
| To address this pressing need, in 2025 the Center for AI Standards and Innovation (CAISI) held three practitioner workshops and conducted an in-depth literature review to map the landscape, focusing on current challenges to robust and effective post-deployment monitoring of AI systems. | この喫緊の課題に対応するため、2025年にAI標準化・イノベーションセンター（CAISI）は3回の実務者ワークショップを開催し、AIシステムの堅牢かつ効果的な展開後モニタリングにおける現状の課題に焦点を当て、状況把握のための詳細な文献レビューを実施した。 |
| Our findings are outlined in the new report, [NIST AI 800-4: Challenges to the Monitoring of Deployed AI Systems](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf), in which we identify monitoring categories and detail challenges (gaps, barriers, and open questions) to inform and spur future research in the field. The primary contribution of this report is the identification, organization, and documentation of monitoring challenges, and reporting of views expressed by experts in the field. | 調査結果は新報告書「[NIST AI 800-4：展開済みAIシステムの監視における課題](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf)」にまとめられており、監視のカテゴリーを特定するとともに、今後の研究を促進するための課題（ギャップ、障壁、未解決問題）を詳細に記述している。本報告書の主な貢献は、監視上の課題を特定・体系化・文書化し、当該分野の専門家による見解を報告した点にある。 |
| Six common categories of monitoring, developed via thematic coding, are listed in the table below. See Appendix B of the report for the full methodology, and Appendix C for the associated codebook. | テーマ別コーディングにより導出された6つの共通モニタリングカテゴリーを下記の表に示す。詳細な方法論は報告書の附属書B、関連コーディングブックは附属書Cを参照のこと。 |
| **Monitoring Category** | **モニタリングカテゴリー** |
| **Definition** | **定義** |
| **Functionality Monitoring** | **機能性モニタリング** |
| Does the system continue to work as intended? | システムは意図した通りに動作し続けているか？ |
| Measuring system functions, capabilities, and features to ensure the system works as intended | システムが意図した通りに動作することを保証するため、システム機能・能力・特性を測定する |
| **Operational Monitoring** | **運用モニタリング** |
| Does the system maintain consistent service across its infrastructure? | システムはインフラ全体で一貫したサービスを維持しているか？ |
| Measuring system infrastructure components, for example to ensure the system maintains consistent levels of service | システムインフラコンポーネントを測定し、例えばシステムが一貫したサービスレベルを維持していることを保証する |
| **Human Factors Monitoring** | **人的要因モニタリング** |
| Is the system transparent to humans and high quality? | システムは人間にとって透明性が高く、高品質か？ |
| Measuring human-system interactions, for example to ensure the system produces high-quality outputs and is transparent | 人間とシステムの相互作用を測定し、例えばシステムが高品質な出力を生成し、透明性があることを保証する |
| **Security Monitoring** | **セキュリティモニタリング** |
| Is the system secure against attacks and misuse? | システムは攻撃や悪用に対して安全か？ |
| Measuring where the system is potentially vulnerable to adversarial attacks and misuse | システムが敵対的攻撃や悪用に対して潜在的に脆弱な箇所を測定する |
| **Compliance Monitoring** | **コンプライアンス監視** |
| Does the system adhere to relevant regulations and directives? | システムは関連する規制や指令を遵守しているか？ |
| Measuring system components for adherence to relevant laws, regulations, standards, controls, and guidelines | システムコンポーネントが関連する法律、規制、標準、統制、ガイドラインを遵守しているかを測定する |
| **Large-Scale Impacts Monitoring** | **大規模影響監視** |
| Does the system promote human flourishing? | システムは人間の繁栄を促進しているか？ |
| Measuring system properties that have wide downstream impacts, for example to ensure the system promotes human flourishing | システムのプロパティが広範な下流影響を持つかを測定する。例えば、システムが人間の繁栄を促進することを保証するため |
| To manageably synthesize the many challenges reported by practitioners and subject matter experts, we organized the database of workshop quotes and literature excerpts in two ways: (1) by monitoring category, as, for example, some monitoring challenges are more applicable to human factors than security (e.g., overhead of collecting and gauging user feedback), and (2) those challenges that are shared across categories (e.g., poor incident sharing mechanisms). Finally, we sorted open questions on AI system monitoring into “who”, “what”, “when”, “why”, and “how” to monitor. | 実務者や専門家が報告した多くの課題を管理可能な形で統合するため、ワークショップ発言と文献抜粋のデータベースを二つの方法で整理した：(1)監視カテゴリー別（例：監視課題の一部はセキュリティより人的要因に適用される、ユーザーフィードバック収集・評価のオーバーヘッドなど）、(2)カテゴリー横断的な課題（例：不十分なインシデント共有メカニズム）。最後に、AIシステム監視に関する未解決の課題を「誰が」「何を」「いつ」「なぜ」「どのように」監視するかに分類した。 |
| The table below highlights a sampling of post-deployment monitoring challenges. See the report for the full list. | 下表は展開後の監視課題の抜粋を示す。完全なリストは報告書を参照のこと。 |
| **Highlighted Gaps, Barriers, and Open Questions** | **顕在化したギャップ、障壁、未解決課題** |
| **Category-Specific Challenges** | **カテゴリー固有の課題** |
| **Gaps:** | **ギャップ：** |
| Insufficient research on human-AI feedback loops | 人間とAIのフィードバックループに関する研究不足 |
| Underexplored methods to detect deceptive behavior | 欺瞞的行動を検知する手法の未開拓 |
| Defining metrics for beneficial impacts to humans | 人間への有益な影響を測定する指標の定義 |
| **Barriers:** | **障壁：** |
| Detecting performance degradation and drift | 性能劣化とドリフトの検知 |
| Fragmented logging across distributed infrastructure | 分散インフラにおける断片的なロギング |
| Navigating the complexity of the policy landscape | 複雑な政策環境の対応 |
| **Cross-Cutting Challenges** | **横断的課題** |
| **Gaps:** | **ギャップ：** |
| Lack of trusted guidelines or standards for methods and tools | 手法・ツールに関する信頼できるガイドラインや標準の欠如 |
| Immature information sharing ecosystem | 未成熟な情報共有エコシステム |
| **Barriers:** | **障壁：** |
| Scaling human-driven monitoring alongside rapid rollouts | 迅速な展開に伴う人間主導の監視の拡張 |
| Balancing competitive pressures with necessary oversight | 競争圧力と必要な監視のバランス |
| Hiring and training qualified AI experts | 有能なAI専門家の採用と育成 |
| **Open Questions** | **未解決の問い** |
| How to reduce monitoring burden on the end user or customer? | エンドユーザーや顧客の監視負担をどう軽減するか？ |
| Should monitoring be based on risk-level? Tailored to the use case? | 監視はリスクレベルに基づくべきか？ユースケースに合わせるべきか？ |
| What is the right cadence for monitoring? | 監視の適切な頻度は何か？ |
| What is the relationship between monitoring and auditing? | 監視と監査の関係は何か？ |
| How to balance and integrate automated monitoring and human-validated monitoring? | 自動監視と人間による妥当性確認監視のバランスと統合をどう図るか？ |
| The identified gaps, barriers, and open questions highlight impactful opportunities for further investigation and innovation. The monitoring categories can offer a common language for describing sub-fields within AI system monitoring, and the challenges identified highlight areas where additional solutions are needed. | 特定されたギャップ、障壁、未解決の問いは、さらなる調査と革新のための重要な機会を示している。監視カテゴリーはAIシステム監視内のサブ分野を記述する共通言語となり得、特定された課題は追加ソリューションが必要な領域を浮き彫りにする。 |

・\[PDF\] [**NIST AI 800-4**](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf)

[![20260311-172408](https://maruyama-mitsuhiko.cocolog-nifty.com/security/images/20260311-172408.png "20260311-172408")](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.800-4.pdf)

・\[[DOCX](http://maruyama-mitsuhiko.cocolog-nifty.com/security/files/nist.ai.800420ja.docx)\]\[[PDF](http://maruyama-mitsuhiko.cocolog-nifty.com/security/files/nist.ai.800420ja.pdf)\] Provisional translation

**Table of Contents...**

| **Acknowledgements** | **謝辞** |
| --- | --- |
| **Executive Summary** | **エグゼクティブサマリー** |
| **1\. Introduction** | **1\. 序論** |
| 1.1 Definition: Post-Deployment AI System Monitoring | 1.1 定義：展開後AIシステム監視 |
| 1.2 Methodology | 1.2 方法論 |
| 1.3 Contributions | 1.3 貢献 |
| **2\. Monitoring Categories** | **2\. 監視カテゴリー** |
| **3\. Monitoring Challenges** | **3\. モニタリングの課題** |
| **3.1 Cross-Cutting Monitoring Challenges** | **3.1 モニタリング全般に共通する課題** |
| 3.1.1 Trusted Methods and Tools | 3.1.1 信頼できる手法とツール |
| 3.1.2 Visibility and Transparency Issues | 3.1.2 可視性と透明性の問題 |
| 3.1.3 Pace of Change | 3.1.3 変化のペース |
| 3.1.4 Organizational Incentives and Culture | 3.1.4 組織的インセンティブと文化 |
| 3.1.5 Resource Requirements | 3.1.5 必要なリソース |
| **3.2 Challenges by Monitoring Category** | **3.2 監視カテゴリー別の課題** |
| 3.2.1 Functionality | 3.2.1 機能性 |
| 3.2.2 Operational | 3.2.2 運用 |
| 3.2.3 Human Factors | 3.2.3 人的要因 |
| 3.2.4 Security | 3.2.4 セキュリティ |
| 3.2.5 Compliance | 3.2.5 コンプライアンス |
| 3.2.6 Large-Scale Impacts | 3.2.6 大規模な影響 |
| 3.3 Open Questions | 3.3 未解決の課題 |
| 3.3.1 Responsibility – Who Monitors? | 3.3.1 責任 – 誰が監視するのか？ |
| 3.3.2 Scope – What to Monitor? | 3.3.2 監視範囲 – 何を監視すべきか？ |
| 3.3.3 Cadence – When to Monitor? | 3.3.3 監視の頻度 – いつ監視すべきか？ |
| 3.3.4 Purpose – Why Monitor? | 3.3.4 目的 – なぜ監視するのか？ |
| 3.3.5 Methods – How to Monitor? | 3.3.5 方法論 – モニタリングの実施方法 |
| **4\. Conclusion** | **4\. 結論** |
| **Appendix A: Monitoring Category Excerpt Count per Source** | **附属書A：監視カテゴリー別出典数** |
| **Appendix B: Methodological Details** | **附属書B：方法論の詳細** |
| **Appendix C: Monitoring Categories Codebook** | **附属書C：モニタリングカテゴリーコードブック** |
| **Bibliography** | **参考文献** |

**Executive Summary...**

| **Executive Summary** | **エグゼクティブサマリー** |
| --- | --- |
| As artificial intelligence (AI) is increasingly integrated into commercial and government applications, developers and deployers have begun to monitor these systems after deployment. AI evaluations conducted prior to release – called “pre-deployment” evaluations – are now common, and valuable to assess the capabilities and risks of an AI system; however, these evaluations are predominantly done in controlled testing environments that cannot account for real-world dynamics. Furthermore, AI outputs are typically non-deterministic, meaning the AI may exhibit a range of behaviors under the same input conditions. Post-deployment measurement and monitoring is therefore a crucial tool (1) to validate that an AI system is operating reliably and as expected in real-world scenarios, (2) to track unforeseen outputs that occur due to, e.g., model non-determinism or dynamic input conditions, and (3) to identify unexpected consequences of integrating AI systems in new or changing contexts. These findings can then feed back into improvements of system design and pre-deployment testing, accelerating innovation and spurring further adoption. Stakeholders across the AI ecosystem agree on the need for post-deployment monitoring; however, best practices, validated methodologies, and common terminology is nascent. | 人工知能（AI）が商業・政府アプリケーションに統合されるにつれ、開発者や展開者はシステム展開後の監視を開始している。リリース前に行われる「展開前評価」は現在一般的であり、AIシステムの能力とリスクを評価する上で有用である。しかし、これらの評価は主に制御されたテスト環境で行われ、現実世界の動的要素を考慮できない。さらに、AIの出力は通常非決定論的であり、同じ入力条件下でもAIは様々な挙動を示す可能性がある。したがって、展開後の測定と監視は重要な手段となる。具体的には、(1) AIシステムが実世界のシナリオにおいて期待通りに確実に動作していることを妥当性確認するため、(2) モデルの非決定性や動的な入力条件などにより発生する予期せぬ出力を追跡するため、(3) 新たな環境や変化する状況にAIシステムを統合した際の予期せぬ結果を識別するためである。これらの知見はシステム設計の改善や展開前テストにフィードバックされ、イノベーションを加速しさらなる普及を促す。AIエコシステム全体の関係者は展開後モニタリングの必要性で合意しているが、ベストプラクティス、妥当性確認された手法、共通用語は未だ発展途上である。 |
| To address this, in 2025 the Center for AI Standards and Innovation (CAISI) within NIST held two workshops on post-deployment AI system monitoring with external stakeholders and federal agencies, followed by a larger workshop with the NIST AI Consortium. These convenings included a wide range of AI stakeholders, including compute providers, model developers, downstream deployers, application developers, and third-party evaluators. In parallel, a literature review was conducted to gather (1) published case studies of AI system monitoring in real-world applications and (2) methodologies or frameworks for AI measurement post-deployment. Literature excerpts and participant contributions were organized into six monitoring categories and a host of monitoring challenges (gaps, barriers, and open questions). | この課題に対処するため、2025年にNIST傘下のAI標準化・イノベーションセンター（CAISI）は、外部関係者や連邦機関を招いたAIシステム展開後モニタリングに関するワークショップを2回開催し、その後NIST AIコンソーシアムとの大規模ワークショップを実施した。これらの会合には、コンピューティングプロバイダ、モデル開発者、下流デプロイヤー、アプリケーション開発者、サードパーティ評価機関など、幅広いAI関係者が参加した。並行して、文献レビューを実施し、（1）実世界アプリケーションにおけるAIシステム監視の公開事例研究、（2）導入後のAI測定手法や枠組みを収集した。文献抜粋と参加者からの貢献は、6つの監視カテゴリーと多数の監視課題（ギャップ、障壁、未解決の疑問）に整理された。 |
| Section 1 of this report introduces this work, including a definition of post-deployment AI system monitoring and an overview of the research methodology and contributions. Further methodological details are provided in Appendix B. | 本報告書のセクション1では、この取り組みを紹介する。これには、展開後のAIシステム監視の定義、研究方法論の概要、および貢献内容が含まれる。方法論の詳細は附属書Bに記載されている。 |
| Section 2 introduces the six monitoring categories to support a more organized discussion and field of work on post-deployment monitoring (Table 1): | セクション2では、展開後モニタリングに関する議論と研究領域をより体系的に整理するため、6つのモニタリングカテゴリーを紹介する（表1）： |
| • Functionality Monitoring: Does the system continue to work as intended? | • 機能性モニタリング：システムは意図した通りに動作し続けているか？ |
| • Operational Monitoring: Does the system maintain consistent service across its infrastructure? | • 運用監視：システムはインフラ全体で一貫したサービスを維持しているか？ |
| • Human Factors Monitoring: Is the system transparent to humans and high quality? | • 人的要因監視：システムは人間にとって透明性が高く、高品質か？ |
| • Security Monitoring: Is the system secure against attacks and misuse? | • セキュリティ監視：システムは攻撃や悪用に対して安全か？ |
| • Compliance Monitoring: Does the system adhere to relevant regulations and directives? | • コンプライアンス監視：システムは関連規制や指令を遵守しているか？ |
| • Large-Scale Impacts Monitoring: Does the system promote human flourishing? | • 大規模影響監視：システムは人間の繁栄を促進しているか？ |
| Section 3 details the challenges to robust post-deployment AI system monitoring, organized into the following groups: | セクション3では、堅牢なAIシステム展開後の監視における課題を以下のグループに分類して詳述する： |
| • Cross-Cutting Challenges (Table 2): Notable challenges that are shared across monitoring categories, e.g., a lack of information sharing related to data, model components, and incidents, and difficulties with rapidly scaling up systems and hiring an AI-ready workforce. | • 横断的課題（表2）：監視カテゴリーを横断する顕著な課題。例：データ、モデル構成要素、インシデントに関する情報共有の不足、システムの迅速な拡張やAI対応人材の確保の困難さ。 |
| • Category-Specific Challenges (Table 3): Notable challenges that apply to particular monitoring categories, e.g., detecting drift, logging across distributed infrastructure, capturing human-AI feedback loops, identifying deceptive behavior, navigating a complex policy landscape, and defining metrics for beneficial impacts to humans. | • カテゴリー固有の課題（表3）：特定の監視カテゴリーに適用される顕著な課題。例：ドリフトの検知、分散インフラ全体でのロギング、人間とAIのフィードバックループの捕捉、欺瞞的行動の識別、複雑な政策環境の対応、人間への有益な影響を測定する指標の定義。 |
| • Open Questions (Table 4): Unsettled questions in AI system monitoring related to, for example, mitigating user-side burden, determining the optimal cadence for monitoring, and balancing automated vs. human-validated monitoring. | • 未解決課題（表4）：AIシステム監視における未解決の課題。例えば、ユーザー側の負担緩和、監視の最適な頻度の決定、自動監視と人間による妥当性確認のバランス調整など。 |
| The identification and documentation of challenges to the monitoring of deployed AI systems and the reporting of views expressed by experts in the field are the primary contributions of this report. These gaps, barriers, and open questions highlight opportunities for further investigation and innovation. Notably, this report raises practitioners’ repeated calls for further guidance on post-deployment AI system monitoring methods, from field studies to incident monitoring. The findings of this report are not exclusive, but have a particular relevance, to frontier generative AI systems. | 本報告書の主な貢献は、展開済みAIシステムの監視における課題の特定と文書化、および当該分野の専門家による見解の報告である。これらのギャップ、障壁、未解決課題は、さらなる調査と革新の機会を示している。特に本報告書は、現場調査からインシデント監視に至る展開後AIシステム監視手法に関するさらなる指針を求める実務者の繰り返される要請を提起している。本報告書の知見は排他的ではないが、特に最先端の生成的AIシステムに関連性が高い。 |

[Permalink](http://maruyama-mitsuhiko.cocolog-nifty.com/security/2026/03/post-356003.html)
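
As a rough illustration of how the taxonomy above might be put to work, here is a minimal Python sketch that encodes the six monitoring categories with their guiding questions and tags a few of the highlighted challenges as gaps or barriers. The names `Category`, `Kind`, `Challenge`, `CHALLENGES`, and `group_by_category` are hypothetical and do not come from the report. The category assigned to each category-specific challenge is an assumption, read off the order in which the Executive Summary lists its examples, so check the report itself for the actual mapping.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Category(Enum):
    """The six monitoring categories, each paired with its guiding question."""
    FUNCTIONALITY = "Does the system continue to work as intended?"
    OPERATIONAL = "Does the system maintain consistent service across its infrastructure?"
    HUMAN_FACTORS = "Is the system transparent to humans and high quality?"
    SECURITY = "Is the system secure against attacks and misuse?"
    COMPLIANCE = "Does the system adhere to relevant regulations and directives?"
    LARGE_SCALE_IMPACTS = "Does the system promote human flourishing?"


class Kind(Enum):
    """How the report classifies a challenge."""
    GAP = "gap"
    BARRIER = "barrier"
    OPEN_QUESTION = "open question"


@dataclass(frozen=True)
class Challenge:
    text: str
    kind: Kind
    # None marks a cross-cutting challenge shared across categories.
    category: Optional[Category] = None


# Category assignments below are ASSUMED from list order, not stated in the post.
CHALLENGES: List[Challenge] = [
    Challenge("Detecting performance degradation and drift", Kind.BARRIER, Category.FUNCTIONALITY),
    Challenge("Fragmented logging across distributed infrastructure", Kind.BARRIER, Category.OPERATIONAL),
    Challenge("Insufficient research on human-AI feedback loops", Kind.GAP, Category.HUMAN_FACTORS),
    Challenge("Underexplored methods to detect deceptive behavior", Kind.GAP, Category.SECURITY),
    Challenge("Navigating the complexity of the policy landscape", Kind.BARRIER, Category.COMPLIANCE),
    Challenge("Defining metrics for beneficial impacts to humans", Kind.GAP, Category.LARGE_SCALE_IMPACTS),
    Challenge("Immature information sharing ecosystem", Kind.GAP),
    Challenge("Hiring and training qualified AI experts", Kind.BARRIER),
]


def group_by_category(
    challenges: List[Challenge],
) -> Dict[Optional[Category], List[Challenge]]:
    """Bucket challenges by category; cross-cutting ones land under the None key."""
    grouped: Dict[Optional[Category], List[Challenge]] = {}
    for challenge in challenges:
        grouped.setdefault(challenge.category, []).append(challenge)
    return grouped
```

Calling `group_by_category(CHALLENGES)` returns one bucket per category plus a `None` bucket for the cross-cutting items, which mirrors the two-way organization (by category, and shared across categories) that the report describes.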