
Data Privacy & Security

Understand AI provider data policies to make informed choices

Why It Matters

As an AI agent, SailFish sends the following data to AI providers:

  • Terminal output (IPs, paths, processes)
  • Command history (may contain secrets)
  • Server configurations
  • Business code & logs

And the risks go beyond training: breaches, subpoenas, and internal access are all threats.

International Providers

For these providers, API data is not used for model training by default, backed by explicit contractual commitments. Consumer products (ChatGPT, Claude.ai, etc.) follow different policies.

Anthropic (Claude) No training by default
Training: Not used for training by default
Retention: Auto-deleted within 30 days
Encryption: TLS in-transit + at-rest encryption
Zero retention: Zero retention agreement available
Data ownership: User owns inputs and outputs

Training is opt-in only, via the Developer Partner Program. The cleanest data policy among all providers.

View policy →
OpenAI (GPT-4o / o3) No training by default
Training: Not used by default (since Mar 2023)
Retention: Abuse monitoring logs kept 30 days
Encryption: AES-256 at-rest + TLS 1.2+ in-transit
Zero retention: Available (enterprise, by request)
Data ownership: User owns inputs and outputs

Most transparent policies, richest enterprise options. Supports customer-managed encryption keys (EKM).

View policy →
Google (Gemini) No training by default
Training: Paid API: not used; free tier: may be used
Retention: Abuse detection logs kept 55 days
Encryption: In-transit and at-rest encryption
Zero retention: Not available
Data ownership: Paid API data not used to improve products

Free tier has training risk. Longer retention (55 days) than peers. Enterprise covered by Cloud DPA.

View policy →

China-based Providers

Policies below are for each provider's API platform (not consumer apps). Most API platforms have separate service agreements that may be stricter than consumer products. All providers are subject to PIPL and Data Security Law.

Bailian (Alibaba) No training by default
Training: Explicitly states data will not be used for training
Opt-out: N/A (no training by default)

Official docs state: "We will never use your data for model training." AES-256 encryption. Clearest API policy among China providers.

View policy →
DeepSeek Opt-out available
Training: May use by default (de-identified)
Opt-out: Toggle off "data for experience optimization"

Has a separate Open Platform Terms of Service, but its training policy still defers to the main privacy policy.

View policy →
Doubao (Volcengine) Opt-out available
Training: May use by default (de-identified)
Opt-out: Toggle off "help improve model"

API served via Volcengine (Ark platform) with separate Model Service Agreement and Data Authorization Agreement.

View policy →
Qianfan (Baidu) Separate agreement
Training: Separate security whitepaper for API
Opt-out: See Qianfan-specific terms

API via Qianfan platform (not Ernie Bot app). Has independent agreement, security whitepaper, AES-256 encryption, Level 3 security certification.

View policy →
MiniMax Opt-out available
Training: May use by default (de-identified)
Opt-out: Contact support to opt out

The open platform (platform.minimaxi.com) has a separate platform agreement. MiniMax is gaining prominence with its abab model series and Hailuo AI video generation.

View policy →
Zhipu GLM Vague policy
Training: Not explicitly stated
Opt-out: Not specified

Open platform (bigmodel.cn) has separate privacy policy. Users own their data. Prohibits using outputs to train competing models.

View policy →
Kimi (Moonshot) Vague policy
Training: May use to "improve service"
Opt-out: Not specified

Open platform (platform.moonshot.ai) has separate Terms of Service and privacy policy.

View policy →

Risks Beyond Training

"Not used for training" is good news, but your data still faces multiple risks during transit and storage:

Server Breach
Data kept during retention periods is exposed if servers are compromised. Longer retention = higher risk.
Internal Access
Human reviewers may see your conversations during abuse monitoring and content moderation.
Government Requests
Law enforcement in different jurisdictions can compel providers to hand over user data.
Third-party Subprocessors
Cloud infrastructure providers, content moderation vendors may also access your data.
Retention Window
Even without training, data sitting on servers for 30-55 days widens the exposure window.
Cross-border Transfer
Data may be stored in different countries with varying levels of legal protection.

Overall Safety Rating

Recommended
No training + encryption + clear retention
Anthropic · OpenAI · Google (paid) · Bailian (Alibaba)
Use with care
Manual opt-out needed / separate agreements
DeepSeek · Doubao · Qianfan (Baidu) · MiniMax
Not for sensitive data
Training policy unclear
Zhipu · Kimi · Google (free)

Legal Landscape

Choose by Scenario

High-sensitivity (finance/gov)
Self-host open-source models (DeepSeek / Qwen)
Enterprise / production
Anthropic / OpenAI paid API
Personal dev / learning
Any API (ensure training toggle is off)
Public information
Any provider
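For the high-sensitivity case, self-hosting means your client talks to a local endpoint instead of a cloud API, so prompts never leave your network. A minimal sketch, assuming an Ollama-style server on localhost:11434 (the endpoint URL and model name are illustrative; adapt them to your own serving stack):

```python
import json
import urllib.request

# Assumed local Ollama-style endpoint; nothing here goes to a cloud provider.
LOCAL_ENDPOINT = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "deepseek-r1:7b") -> urllib.request.Request:
    """Build a completion request aimed at the local model server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        LOCAL_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

# Sending is a one-liner once a local server is running:
#   urllib.request.urlopen(build_request("Summarize this server log: ..."))
req = build_request("Summarize this server log: ...")
```

The same pattern works for any OpenAI-compatible local server (vLLM, llama.cpp, etc.); only the URL and payload shape change.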

Security Tips

  1. Prefer APIs over consumer products — API data policies are generally stricter than those of consumer apps like ChatGPT or Ernie Bot
  2. For China-based providers, check settings — confirm "data for model improvement" toggles are disabled
  3. Never paste secrets directly in conversations — even with no-training promises, data still transits their servers
  4. Review provider policies periodically — privacy policies can change, so re-check every 6 months
  5. Consider local models for sensitive environments — data never leaves your machine, the safest option
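Tip 3 can be partially automated: scrub obvious secrets from terminal output before it is sent to any provider. A minimal sketch using regexes for a few common token formats (the patterns are illustrative, not exhaustive; real secret scanning warrants a dedicated tool):

```python
import re

# Illustrative patterns for common secret formats; not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key ID
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),                      # generic "sk-" API key
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                        # GitHub personal access token
    re.compile(r"(?i)(password|passwd|secret)\s*[=:]\s*\S+"),  # key=value credentials
]

def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace anything matching a known secret pattern before sending upstream."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

out = redact("export AWS_KEY=AKIAABCDEFGHIJKLMNOP\npassword: hunter2")
# → "export AWS_KEY=[REDACTED]\n[REDACTED]"
```

Running such a filter on terminal output and command history before it enters the agent's context reduces, but does not eliminate, the exposure from tips 1–3.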

Based on publicly available provider policy documents as of March 2026. For reference only, not legal advice. Policies may change — verify via original links before use.