I'm a Head of IT at a financial company in Switzerland. Multi-site, Cisco switches, Palo Alto firewalls, data center, critical trading systems, telephony, the daily reality of an on-prem infrastructure in a regulated environment.
Since early 2026, I've been using Claude AI (Anthropic) as a co-pilot to manage this infrastructure. Not a chatbot I ask questions to. An agent that connects via SSH to my equipment, reads configs, audits, documents.
Here's what I've learned in 3 months. The results, the limits, and why I think every IT manager should be looking into this now.
The context: a small team facing an infrastructure that never sleeps
If you manage IT infrastructure with a small team, you know the reality:
- Dozens of network devices spread across multiple sites
- Firewalls, VPNs, storage, virtualization
- Critical systems that tolerate zero downtime
- And a task list that grows faster than you can handle it
That's where AI entered my operations.
The trigger: "What if AI could read my configs?"
I started using Claude Code (Anthropic's CLI) to document code. Then I realized the agent could connect via SSH, read running configs, and analyze them.
I created a dedicated read-only user on all my equipment:
- Cisco switches (NX-OS, IOS, IOS-XE): show commands only
- Palo Alto firewalls (Panorama): superreader role
- Storage, servers: read-only CLI access
No write access. No configure terminal. No modification possible. Read-only, audit-only.
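On Cisco gear, such an account can be sketched like this (IOS-style syntax; the username, privilege level, and exact command set are illustrative, and privilege behavior varies by platform and version):

```
! Sketch of a read-only audit account (IOS-style; names and levels
! are illustrative, adapt to your platform and security policy)
username ai-audit privilege 2 secret <strong-password>
! Expose the show commands needed for auditing at privilege level 2
privilege exec level 2 show running-config
privilege exec level 2 show
```

The same idea applies everywhere else: on Panorama a read-only role, on Linux a user with no sudo, on SNMP a read-only community or v3 user.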
And then everything changed.
Result #1: Full switch fleet audit, dozens of findings in one day
The first real test was a security audit across all my switches.
The AI connected via SSH to each device, pulled the running config, and analyzed it against a standardized checklist:
- Password encryption
- ACLs on management lines
- Insecure protocols active (HTTP server, Telnet, CDP)
- Spanning Tree protection (BPDU Guard, Root Guard)
- Unused ports left open
- NTP, logging, login banners
- Firmware and known CVEs
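A single check from that list is simple to sketch: scan a saved running config for known-insecure lines. The patterns below are illustrative, not an exhaustive checklist, and real IOS output varies by version:

```python
# Minimal sketch: flag insecure services in a saved Cisco running config.
# The patterns are illustrative examples, not a complete audit checklist.
INSECURE_LINES = {
    "ip http server": "HTTP management interface enabled",
    "transport input telnet": "Telnet allowed on a vty line",
    "no service password-encryption": "passwords stored unencrypted",
}

def audit_config(config_text: str) -> list[str]:
    findings = []
    for line in config_text.splitlines():
        line = line.strip()
        for pattern, message in INSECURE_LINES.items():
            if line == pattern:
                findings.append(f"{message} ({pattern})")
    return findings

sample = """\
hostname sw-core-01
ip http server
line vty 0 4
 transport input telnet
"""
print(audit_config(sample))
```

The value of the agent is that it runs dozens of checks like this against every device, every time, without getting bored.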
The critical findings were things I knew somewhere in my head, but had never formalized:
- A switch with HTTP server still active
- Management lines without ACL on a remote site
- Firmware with known unpatched CVEs
The report forced me to deal with them. To document is to make visible.
Result #2: Complete infrastructure documentation, from zero to a structured repo
Before AI, my documentation was scattered: notes here, emails there, manually saved configs, diagrams in my head.
In a few months, I built a single Git repo with structured documentation generated from actual configs:
- Complete inventory: every switch, every port, every VLAN
- Network topology: WAN diagrams in ASCII art
- Per-device analysis: detailed documentation of each config (core, distribution, edge, DMZ)
- Security: VPN tunnels inventoried, firewall rules documented, access policies
- Per site: each remote office has its own section
Each file was generated by the AI after reading the production config directly, then validated by me. This isn't theoretical documentation, it's the exact reflection of what's running in prod.
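The generation step itself is mechanical once the config is readable. A toy version of the idea, turning interface stanzas into a Markdown inventory table (the interface names and fields are illustrative):

```python
import re

def interfaces_to_markdown(config_text: str) -> str:
    """Toy extractor: list interfaces and descriptions as a Markdown table."""
    rows = []
    current = None
    for line in config_text.splitlines():
        m = re.match(r"^interface (\S+)", line)
        if m:
            current = [m.group(1), ""]
            rows.append(current)
        elif current and line.strip().startswith("description "):
            current[1] = line.strip()[len("description "):]
    table = ["| Interface | Description |", "|---|---|"]
    table += [f"| {name} | {desc} |" for name, desc in rows]
    return "\n".join(table)

sample = """\
interface GigabitEthernet1/0/1
 description Uplink to core
interface GigabitEthernet1/0/2
"""
print(interfaces_to_markdown(sample))
```

The agent does the equivalent across every device and every section of the config, and I review the diff in Git.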
Result #3: Storage audit, findings divided by 3
My storage system (SAN array, HA, iSCSI + CIFS) had never been deeply audited since deployment.
The AI analyzed the complete configuration and produced a report with dozens of findings:
- Incomplete high availability configuration
- SNMP alerts not routed
- Volumes without snapshot policy
- Service accounts with excessive privileges
After fixing the critical points, a second audit divided the number of findings by three. Everything that remained was informational or low priority.
Without AI, this audit would probably never have been done. Not from negligence, from lack of time.
Result #4: Migrating monitoring to an open source stack
Monitoring is where AI had the most structural impact.
I already had a complete monitoring stack, but proprietary. The problem with proprietary solutions is that AI can't really work with them: closed APIs, limited documentation, opaque formats. The agent hits walls on interfaces it can neither read nor automate.
This worked out well. I've been an open source expert for years, and I was looking for a reason to migrate. AI gave me the accelerator I was missing.
In a few weeks, I deployed a 100% open source stack with the help of AI:
- Prometheus: metrics collection (ICMP, SNMP switches/firewalls/storage)
- Grafana: network, security, storage, compute dashboards
- Loki: centralized log aggregation
- Wazuh SIEM: intrusion detection and compliance
- Alerts: latency, disks, CPU, VPN tunnels down, expiring certificates
Each component was installed, configured, tested, documented. Every new machine added to the infrastructure now follows a mandatory checklist: hardening, fail2ban, SIEM agent, log collection, metrics, dashboard.
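As one concrete piece of that stack, an availability alert might look like this (a sketch: `probe_success` comes from the blackbox exporter's ICMP module, and the labels and thresholds are illustrative):

```yaml
# Sketch of a Prometheus alerting rule; probe_success is exposed by the
# blackbox exporter. Thresholds, labels, and names are illustrative.
groups:
  - name: availability
    rules:
      - alert: HostUnreachable
        expr: probe_success == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has failed ICMP probes for 5 minutes"
```

The same pattern covers VPN tunnels, disk capacity, and certificate expiry, each with its own metric and threshold.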
An unmonitored machine is an invisible machine. AI helped me systematize this principle, and open source gave me the means to apply it.
What AI can NOT do
After 3 months of intensive use, here are the real limits. And they matter.
It doesn't make business decisions
AI can tell me a firmware has CVEs. It can't decide whether the risk of updating in production (downtime on critical systems) is acceptable on a Tuesday at 2pm. That's my job.
It doesn't replace experience
When it analyzes a config and flags a "problem," you need human context to know whether it's a real problem or a deliberate architectural choice. Disabling certain advanced security features can be a deliberate decision, not an oversight. AI can't know that unless you tell it.
It doesn't handle human relationships
Negotiating with a vendor, convincing management to invest in infrastructure, managing a team under pressure, that remains 100% human.
It can be wrong
It can misinterpret a config, confuse a legacy artifact with an active problem, or suggest a command that doesn't work on a specific OS version. Human review is non-negotiable.
It needs guardrails
I configured read-only access everywhere. No writing, no modification. If AI could execute configuration commands on a production switch, the risk would be unacceptable. Trust is built incrementally.
The real change: from reactive to proactive
The biggest impact of AI on my operations isn't speed. It's the shift in posture.
Before, I spent most of my time in reactive mode: incidents, requests, emergencies. Documentation, audits, optimization, all of that was perpetually "for later."
Now, with an AI agent that can read, analyze and document in parallel with my daily work, I can finally do that foundational work:
- Regular audits instead of "when I have time" audits (never)
- Up-to-date documentation instead of scattered notes
- Proactive monitoring instead of discovering problems when things break
- Informed decisions instead of undocumented intuitions
For a small IT team managing multi-site infrastructure in a demanding environment, it's a force multiplier.
How to get started (practical guide)
If you're an IT manager, sysadmin, or infrastructure engineer and want to explore this approach, here's the method I recommend:
1. Start with read-only
Create a dedicated user with minimal rights on your equipment. show commands only on switches. Read-only role on your firewalls. SNMP read-only. Build trust before expanding access.
2. Start with documentation
This is the safest and most immediately useful use case. Have your running configs analyzed and generate structured documentation. You'll discover things you had forgotten.
3. Move to audits
Once documentation is in place, use AI to systematically audit: security, compliance, vendor best practices. Formalize what you know intuitively.
4. Deploy monitoring
Use AI to deploy and configure a monitoring stack. Prometheus, Grafana, Wazuh, the tools are open source and mature. AI greatly accelerates the configuration and alert tuning phase.
5. Stay in control
AI is a tool, not an autonomous colleague. Review everything. Validate everything. Version everything in Git. If you don't understand what AI proposes, don't apply it.
The tools I use
For those who want to reproduce this approach:
- Claude Code (Anthropic): the CLI that lets the AI connect via SSH, read files, and execute commands. It's the core of the setup.
- Git (Gitea self-hosted): everything is versioned. Documentation, redacted configs, audit reports.
- Prometheus + Grafana + Loki: open source monitoring, deployed on Linux containers.
- Wazuh: open source SIEM for detection and compliance.
Everything runs on self-hosted infrastructure. No data in the cloud, no SaaS dependency for critical operations. That's a choice.
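One practical detail when versioning configs: secrets must be stripped before anything reaches the repo. A minimal redaction pass might look like this (the patterns are illustrative; extend them for each platform you manage):

```python
import re

# Illustrative patterns for secrets commonly found in network configs.
REDACTIONS = [
    (re.compile(r"(snmp-server community )\S+"), r"\1<REDACTED>"),
    (re.compile(r"(username \S+ secret(?: \d)? )\S+"), r"\1<REDACTED>"),
    (re.compile(r"(enable secret(?: \d)? )\S+"), r"\1<REDACTED>"),
]

def redact(config_text: str) -> str:
    """Replace known secret fields with a placeholder before committing."""
    for pattern, replacement in REDACTIONS:
        config_text = pattern.sub(replacement, config_text)
    return config_text

sample = "username ai-audit secret 5 $1$abcd\nsnmp-server community s3cret RO"
print(redact(sample))
```

Running the redaction as a pre-commit hook means an unredacted config can never land in the repo by accident.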
Conclusion
In 3 months, AI allowed me to:
- Audit my entire network fleet and discover invisible critical findings
- Document the entire infrastructure from actual configs, in a structured Git repo
- Deploy a complete monitoring stack (metrics, logs, SIEM, alerts)
- Audit my storage and divide findings by 3
- Systematize security with mandatory checklists for every new machine
All while continuing to manage the daily operations of a demanding multi-site infrastructure.
If you're waiting for AI to be perfect before starting, you'll wait too long. Start small, read-only, on a concrete use case. The results speak for themselves.