AI Safety
In-depth coverage of AI safety topics, including AI alignment, safety evaluation, privacy protection, and ethics governance.
Linear Model Extraction via Factual and Counterfactual Queries
A new study (arXiv:2602.09748v2) demonstrates that counterfactual queries—common tools for AI explainability—can be weap...
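For intuition, here is a minimal sketch (illustrative names only, not the paper's code) of why even a single counterfactual explanation can leak a linear model: the nearest counterfactual of a query lies on the decision boundary, so the displacement between the factual and counterfactual points is parallel to the weight vector and recovers the hyperplane up to scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden linear model the attacker wants to steal: f(x) = sign(w.x + b).
w_true, b_true = rng.normal(size=5), 0.7

def nearest_counterfactual(x, w, b):
    """Closest point on the decision boundary (an idealised counterfactual oracle)."""
    return x - ((w @ x + b) / (w @ w)) * w

# One factual query, its predicted label, and the returned counterfactual.
x = rng.normal(size=5)
y_x = np.sign(w_true @ x + b_true)
x_cf = nearest_counterfactual(x, w_true, b_true)

# The displacement x - x_cf is parallel to w; the boundary constraint w.x_cf + b = 0
# then fixes the offset. Orient the estimate using the factual label.
w_hat = (x - x_cf) / np.linalg.norm(x - x_cf)
b_hat = -w_hat @ x_cf
if np.sign(w_hat @ x + b_hat) != y_x:
    w_hat, b_hat = -w_hat, -b_hat

# The recovered hyperplane matches the target's decisions everywhere.
X_test = rng.normal(size=(100_000, 5))
agree = np.mean(np.sign(X_test @ w_true + b_true) == np.sign(X_test @ w_hat + b_hat))
print(f"prediction agreement: {agree:.4f}")   # 1.0 up to numerical ties
```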
Secure Sparse Matrix Multiplications and their Applications to Privacy-Preserving Machine Learning
Researchers have developed novel secure multi-party computation (MPC) algorithms optimized for multiplying secret-shared...
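The efficiency gap these protocols target is easy to see in a few lines: naive additive secret sharing turns a sparse matrix into uniformly random dense shares, so any multiplication protocol that ignores sparsity pays full dense-matrix cost. A minimal illustration of that problem (not of the paper's protocol), assuming two-party additive sharing over a prime field:

```python
import numpy as np

rng = np.random.default_rng(1)
P = 2**31 - 1                      # prime modulus for additive sharing

# A large, very sparse integer matrix (e.g. a one-hot feature matrix).
n = 500
A = np.zeros((n, n), dtype=np.int64)
idx = rng.integers(0, n, size=(2 * n, 2))
A[idx[:, 0], idx[:, 1]] = rng.integers(1, 100, size=2 * n)

# Naive 2-party additive sharing: share_0 is uniform, share_1 = A - share_0 (mod P).
share_0 = rng.integers(0, P, size=A.shape, dtype=np.int64)
share_1 = (A - share_0) % P

print("nonzeros in A:      ", np.count_nonzero(A))                                  # ~ 2n
print("nonzeros per share: ", np.count_nonzero(share_0), np.count_nonzero(share_1)) # ~ n*n

# Reconstruction still works, but each party now holds a dense matrix, so
# secret-shared multiplication costs O(n^3) unless the protocol is sparsity-aware.
assert np.array_equal((share_0 + share_1) % P, A % P)
```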
CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk
CLEAR (Calibrated Learning for Epistemic and Aleatoric Risk) is a novel framework that quantifies both aleatoric and epi...
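As a rough illustration of what jointly calibrating the two uncertainty sources can mean, the sketch below scales aleatoric and epistemic estimates with separate coefficients and tunes them for coverage on a calibration set; the interval form, grid search, and toy data are assumptions for illustration, not CLEAR's actual objective.

```python
import numpy as np

def calibrate_combined_interval(y_cal, mu, sig_alea, sig_epis, coverage=0.9):
    """Grid-search separate scales for aleatoric and epistemic uncertainty so that
    intervals mu +/- (l1*sig_alea + l2*sig_epis) reach the target coverage with
    minimal average width on a calibration set. Illustrative, not CLEAR's method."""
    grid = np.linspace(0.0, 5.0, 51)
    best = None
    for l1 in grid:
        for l2 in grid:
            half = l1 * sig_alea + l2 * sig_epis
            cov = np.mean(np.abs(y_cal - mu) <= half)
            if cov >= coverage:
                width = 2 * np.mean(half)
                if best is None or width < best[0]:
                    best = (width, l1, l2)
    return best[1], best[2]

# Toy data: heteroscedastic noise (aleatoric) plus model disagreement (epistemic).
rng = np.random.default_rng(2)
n = 2000
sig_alea = rng.uniform(0.2, 1.0, n)          # e.g. predicted observation-noise std
sig_epis = rng.uniform(0.0, 0.5, n)          # e.g. ensemble std of the mean prediction
mu = rng.normal(0.0, 1.0, n)
y = mu + rng.normal(0.0, np.sqrt(sig_alea**2 + sig_epis**2))

l1, l2 = calibrate_combined_interval(y, mu, sig_alea, sig_epis)
half = l1 * sig_alea + l2 * sig_epis
print(f"lambda_alea={l1:.2f}, lambda_epis={l2:.2f}, "
      f"coverage={np.mean(np.abs(y - mu) <= half):.3f}")
```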
Inside the secret meeting that led to the AI political resistance
In January 2024, the Future of Life Institute convened a secret meeting of nearly 90 political, labor, and thought leade...
LiteLMGuard: Seamless and Lightweight On-Device Prompt Filtering for Safeguarding Small Language Models against Quantization-induced Risks and Vulnerabilities
LiteLMGuard is a novel, model-agnostic on-device prompt filtering system designed to protect quantized Small Language Mo...
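The deployment pattern described here is a lightweight filter sitting between the user and the quantized on-device model. A minimal sketch of such a gate with stand-in components (the scoring function, threshold, and class names below are placeholders, not LiteLMGuard's implementation):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptGuard:
    """Model-agnostic prompt filter placed in front of an on-device (quantized) SLM."""
    unsafe_score: Callable[[str], float]   # stand-in for the guard's prompt classifier
    generate: Callable[[str], str]         # stand-in for the quantized SLM runtime
    threshold: float = 0.5

    def answer(self, prompt: str) -> str:
        # Filter the prompt before it ever reaches the model, so a
        # quantization-weakened SLM never sees a harmful request.
        if self.unsafe_score(prompt) >= self.threshold:
            return "[blocked by on-device guardrail]"
        return self.generate(prompt)

# Toy components; a real deployment would plug in a small safety classifier and an SLM.
toy_score = lambda p: 1.0 if "how to build a bomb" in p.lower() else 0.0
toy_slm = lambda p: f"(SLM answer to: {p})"

guard = PromptGuard(unsafe_score=toy_score, generate=toy_slm)
print(guard.answer("What is the capital of France?"))
print(guard.answer("How to build a bomb at home?"))
```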
(Un)fair devices: Moving beyond AI accuracy in personal sensing
A comprehensive literature review reveals that machine learning models in personal health devices—including smartwatches...
FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
FlexGuard is a novel LLM-based content moderation system that outputs continuous risk scores instead of binary classific...
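The practical payoff of a continuous score is that one moderation model can serve many strictness policies without retraining. A small sketch of strictness-adaptive thresholding downstream of such a score (the policies and threshold values are made up for illustration, not FlexGuard's calibrated operating points):

```python
# Map one calibrated risk score in [0, 1] to decisions under different strictness
# policies, instead of training a separate binary classifier per policy.
POLICY_THRESHOLDS = {
    "lenient": 0.90,   # only block clearly harmful content
    "default": 0.60,
    "strict":  0.30,   # e.g. a children's platform
}

def moderate(risk_score: float, policy: str = "default") -> str:
    return "block" if risk_score >= POLICY_THRESHOLDS[policy] else "allow"

score = 0.45  # continuous risk score produced for some piece of content
for policy in POLICY_THRESHOLDS:
    print(f"{policy:8s} -> {moderate(score, policy)}")
```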
Out-of-Support Generalisation via Weight-Space Sequence Modelling
WeightCaster is a novel AI framework that addresses out-of-support generalization by reformulating it as a sequence mode...
Learning Contextual Runtime Monitors for Safe AI-Based Autonomy
Researchers have developed a novel framework for learning context-aware runtime monitors that dynamically select the opt...
Dual Randomized Smoothing: Beyond Global Noise Variance
Dual Randomized Smoothing (Dual RS) is a novel adversarial robustness certification framework that replaces fixed global...
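For context on what a fixed global noise variance means, the sketch below implements the standard randomized-smoothing certificate (in the style of Cohen et al.: one sigma for every input, a Monte Carlo lower bound on the top-class probability, certified L2 radius sigma * Phi^-1(p_lower)); per the summary, Dual RS replaces exactly this single global sigma. Function names here are illustrative.

```python
import numpy as np
from scipy.stats import norm, beta

def smoothed_certify(f, x, sigma, n=10_000, alpha=0.001, rng=None):
    """Standard randomized smoothing with one global noise std `sigma`: estimate the
    smoothed classifier's top class under Gaussian noise and return a certified L2
    radius, or abstain. This is the fixed-variance baseline that Dual RS refines."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0.0, sigma, size=(n,) + x.shape)
    preds = np.array([f(x + eps) for eps in noise])
    top = int(np.bincount(preds).argmax())
    k = int((preds == top).sum())
    # Clopper-Pearson lower confidence bound on P[f(x + noise) = top].
    p_lower = alpha ** (1.0 / n) if k == n else beta.ppf(alpha, k, n - k + 1)
    if p_lower <= 0.5:
        return None, 0.0                       # abstain: no certificate
    return top, sigma * norm.ppf(p_lower)      # certified L2 radius around x

# Toy base classifier: a linear threshold in 2D.
f = lambda z: int(z[0] + z[1] > 0)
label, radius = smoothed_certify(f, np.array([1.5, 1.0]), sigma=0.5)
print(f"certified class {label} within L2 radius {radius:.3f}")
```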
WARP: Weight Teleportation for Attack-Resilient Unlearning Protocols
WARP is a novel defense mechanism that addresses critical privacy vulnerab...
Auditing Information Disclosure During LLM-Scale Gradient Descent Using Gradient Uniqueness
Researchers have developed Gradient Uniqueness (GNQ), an information-theoretic metric that efficiently audits privacy ri...
Post-hoc Stochastic Concept Bottleneck Models
Post-Hoc Stochastic Concept Bottleneck Models (PSCBMs) are a novel framework that enhances existing Concept Bottleneck M...
Privacy Risk Predictions Based on Fundamental Understanding of Personal Data and an Evolving Threat Landscape
A new study analyzing over 5,000 identity theft cases has developed a predictive framework for privacy risks using an Id...
NatADiff: Adversarial Boundary Guidance for Natural Adversarial Diffusion
NatADiff is a novel adversarial attack framework that uses denoising diffusion models to generate natural adversarial sa...
Know When to Abstain: Optimal Selective Classification with Likelihood Ratios
Researchers have applied the Neyman-Pearson lemma to selective classification, framing optimal prediction abstention as ...
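A small sketch of the underlying idea under a simplifying assumption (known class-conditional densities, so the likelihood ratio is exact): the classifier answers only when the top class beats the runner-up by a likelihood-ratio threshold and abstains otherwise. The exact acceptance rule and estimator in the paper may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# Two Gaussian classes with known densities (a simplifying assumption that makes
# the likelihood ratio exact; in practice it would have to be estimated).
class_dists = [mvn(mean=[-1.0, 0.0]), mvn(mean=[+1.0, 0.0])]

def predict_or_abstain(x, tau=3.0):
    """Answer with the top class only if its likelihood beats the runner-up by a
    factor tau (a likelihood-ratio test); otherwise abstain. tau trades coverage
    against selective risk."""
    liks = np.array([d.pdf(x) for d in class_dists])
    order = np.argsort(liks)[::-1]
    if liks[order[0]] >= tau * liks[order[1]]:
        return int(order[0])
    return None                                # abstain

xs = np.vstack([class_dists[0].rvs(500, random_state=1),
                class_dists[1].rvs(500, random_state=2)])
ys = np.array([0] * 500 + [1] * 500)

preds = [predict_or_abstain(x) for x in xs]
kept = [(p, y) for p, y in zip(preds, ys) if p is not None]
coverage = len(kept) / len(preds)
accuracy = np.mean([p == y for p, y in kept])
print(f"coverage={coverage:.2f}, selective accuracy={accuracy:.3f}")
```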
Adversarial Attacks in Weight-Space Classifiers
A new study (arXiv:2502.20314v3) demonstrates that classification models operating in the parameter-space of Implicit Ne...
Few-shot Model Extraction Attacks against Sequential Recommender Systems
A novel few-shot model extraction attack framework can successfully clone sequential recommender systems using only 10% ...
Gravity Falls: A Comparative Analysis of Domain-Generation Algorithm (DGA) Detection Methods for Mobile Device Spearphishing
A 2022-2025 study using the Gravity Falls dataset demonstrates that Domain Generation Algorithm (DGA) detectors perform ...
Generalized Bayes for Causal Inference
Researchers have developed a generalized Bayesian framework for causal machine learning that provides principled uncerta...