Ethernet Switches Zoo Challenge

TL;DR: Managing a diverse collection of Ethernet switches in a city-wide ISP network, using Perl, MongoDB, and Mapbox.

This challenge comes from my time at an Internet Service Provider, where managing a diverse collection of Ethernet switches became a significant operational burden.

Problem Description

In a city-wide ISP network, we had to manage a "zoo" of Ethernet switches: different manufacturers, models, firmware versions, and capabilities. This diversity complicated configuration, monitoring, and maintenance.

Key Challenges

  • Inconsistent configuration interfaces across different manufacturers
  • Varying firmware update processes and compatibility issues
  • Different monitoring capabilities and SNMP implementations
  • Inconsistent CLI commands and syntax across devices
  • Managing security patches and vulnerabilities across diverse platforms
  • Training network engineers on multiple device types
  • Inventory management and spare parts for different models

Solution Approach

The solution involved developing a unified management system that could:

  • Abstract device-specific configurations into a common format
  • Automate firmware updates and configuration changes
  • Provide a single dashboard for monitoring all devices
  • Implement standardized security policies across all devices
  • Generate device-specific configurations from templates
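
As a rough illustration of the template idea above, the sketch below renders one vendor-neutral port description into two different command syntaxes. It is written in Python for brevity and is not the original implementation. The vendor names, field names, and command syntax are all hypothetical and do not reproduce any real manufacturer's CLI.

```python
from string import Template

# Hypothetical vendor-neutral description of one access port.
port = {"name": "1", "vlan": 120, "description": "customer-4711"}

# Hypothetical per-vendor templates; the command syntax is illustrative only.
TEMPLATES = {
    "vendor_a": Template(
        "interface $name\n"
        " description $description\n"
        " switchport access vlan $vlan\n"
    ),
    "vendor_b": Template(
        "vlan $vlan\n"
        " untagged $name\n"
        " name $description\n"
    ),
}


def render_port(vendor: str, port: dict) -> str:
    """Render the common port description into one vendor's syntax."""
    try:
        template = TEMPLATES[vendor]
    except KeyError:
        raise ValueError(f"no template for vendor {vendor!r}") from None
    # substitute() raises KeyError if the port lacks a required field,
    # so an incomplete description never produces a partial config.
    return template.substitute(port)
```

Calling render_port("vendor_a", port) and render_port("vendor_b", port) yields two different configuration snippets from the same input, which is the point of keeping the device description separate from the device syntax.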

Perl-Based Log Management

One of the key solutions we implemented was a Perl-based system for log collection and structuring:

  • Developed custom Perl scripts to collect logs from various switch types
  • Created parsers for different log formats and structures
  • Implemented a unified logging format for all devices
  • Built a centralized log storage and analysis system
  • Used regular expressions to extract meaningful data from diverse log formats
  • Automated log rotation and archival processes
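
The per-vendor parsing step can be sketched roughly as follows. The production scripts were written in Perl; this Python version only illustrates the approach of one regular expression per log format feeding a single record shape. Both line formats, the vendor keys, and the field names are assumptions made up for the example.

```python
import re
from typing import Optional

# Hypothetical line formats for two switch families; real devices differ.
PARSERS = {
    "vendor_a": re.compile(
        r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
        r"%(?P<facility>[A-Z]+)-(?P<severity>\d)-(?P<event>[A-Z_]+): "
        r"(?P<message>.*)$"
    ),
    "vendor_b": re.compile(
        r"^\[(?P<ts>[^\]]+)\] (?P<severity>\w+) (?P<event>\w+) (?P<message>.*)$"
    ),
}


def parse_line(vendor: str, host: str, line: str) -> Optional[dict]:
    """Turn one raw log line into a unified record, or None if it does not match."""
    match = PARSERS[vendor].match(line.strip())
    if match is None:
        return None  # unparsed lines can be counted or stored separately
    record = match.groupdict()
    record["host"] = host
    record["vendor"] = vendor
    return record
```

Returning None for lines that do not match, rather than raising, lets a collector keep running when a firmware update changes a log format, while still making the unmatched lines visible.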

Benefits of Perl Solution

  • Powerful text processing capabilities with regular expressions
  • Excellent handling of different character encodings
  • Flexible parsing of various log formats
  • Easy integration with existing Unix/Linux tools
  • Efficient processing of large log files
  • Ability to handle real-time log streaming

Geographic Visualization with Mapbox

To provide a better understanding of the network topology and device locations, we implemented:

  • Integration with Mapbox for interactive geographic visualization
  • Real-time status display of switches on the map
  • Color-coded indicators for device status (online, offline, warning)
  • Clickable markers with detailed device information
  • Filtering options by device type, status, or region
  • Historical status tracking and visualization
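
One way to feed such a map is to serve switch status as GeoJSON, which Mapbox can load as a data source. The sketch below shows a possible server-side conversion. The input field names (id, model, status, lon, lat) and the color values are assumptions for illustration, not the original schema.

```python
# Hypothetical color scheme for the three states mentioned above.
STATUS_COLORS = {"online": "#2ecc71", "warning": "#f1c40f", "offline": "#e74c3c"}
UNKNOWN_COLOR = "#95a5a6"


def to_feature_collection(switches: list) -> dict:
    """Convert switch records into a GeoJSON FeatureCollection of points."""
    features = []
    for sw in switches:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [sw["lon"], sw["lat"]]},
            "properties": {
                "id": sw["id"],
                "model": sw["model"],
                "status": sw["status"],
                "color": STATUS_COLORS.get(sw["status"], UNKNOWN_COLOR),
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

Keeping the status and device type in the feature properties means the map layer can color, filter, and populate marker popups from the same payload.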

Frontend Polling Implementation

  • Implemented an efficient polling mechanism for real-time status updates
  • Tuned polling intervals to device criticality
  • Batched status requests to minimize server load
  • Added caching to reduce redundant data transfer
  • Added a WebSocket fallback for critical status changes
  • Loaded map data progressively for better performance
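
The scheduling logic behind criticality-based polling and request batching can be sketched as below. The real frontend would run in the browser, so this Python version only illustrates the logic. The tier names, interval values, and record fields are hypothetical.

```python
from typing import Dict, List

# Hypothetical polling intervals in seconds, keyed by device criticality tier.
POLL_INTERVAL_S = {"core": 10, "distribution": 30, "access": 120}


def due_devices(devices: List[dict], last_polled: Dict[str, float], now: float) -> List[str]:
    """Return ids of devices whose polling interval has elapsed."""
    due = []
    for device in devices:
        interval = POLL_INTERVAL_S[device["tier"]]
        # Devices that were never polled are always due.
        last = last_polled.get(device["id"], float("-inf"))
        if now - last >= interval:
            due.append(device["id"])
    return due


def batches(ids: List[str], size: int) -> List[List[str]]:
    """Split device ids into chunks so each status request covers several devices."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

With this split, the client sends one request per batch of due devices on each tick instead of one request per device, and less critical devices are simply due less often.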

Evolution of Data Storage

Our data storage solution evolved significantly over time:

Initial Text-Based Approach

  • Started with simple text configuration files in a Git repository
  • Used Git for version control and change history
  • Organized files by device type and location
  • Simple but effective for small-scale deployment
  • Easy to back up and restore
  • Good for tracking configuration changes

Migration to MongoDB

  • Migrated to MongoDB for better performance and flexibility
  • Implemented comprehensive logging of hardware and network health
  • Created time-series data collections for historical analysis
  • Built aggregation pipelines for complex queries
  • Implemented data retention policies
  • Added support for real-time analytics
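
To give a feel for the aggregation pipelines mentioned above, the sketch below builds one that counts offline samples per switch since a given time. The pipeline is plain data, so it is shown without a database connection. The field names (switch_id, ts, status) are assumptions, not the original schema.

```python
from datetime import datetime


def offline_counts_pipeline(since: datetime) -> list:
    """Build a MongoDB aggregation pipeline ranking switches by offline samples."""
    return [
        # Keep only health samples recorded at or after the cutoff.
        {"$match": {"ts": {"$gte": since}}},
        # Per switch: count all samples and those reporting "offline".
        {"$group": {
            "_id": "$switch_id",
            "samples": {"$sum": 1},
            "offline_samples": {
                "$sum": {"$cond": [{"$eq": ["$status", "offline"]}, 1, 0]}
            },
        }},
        # Worst offenders first.
        {"$sort": {"offline_samples": -1}},
    ]
```

With a driver such as PyMongo, a list like this would be passed to a collection's aggregate() method. The collection it runs against is left open here because the original layout is not described.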

Benefits of MongoDB Migration

  • Improved query performance for large datasets
  • Better support for complex data relationships
  • Enhanced scalability for a growing network
  • More sophisticated analytics capabilities
  • Better handling of concurrent access
  • Simplified backup and recovery procedures