Mark Mercado
Systems & Site Reliability Engineer
mamercad@gmail.com | +1.810.620.3977 | github | twitter | blog
Profile
I am an experienced systems engineer who has been working in technology for over 20 years.
I love designing greenfield environments, but don’t mind unwinding and refactoring brownfield ones.
I approach the management of systems from the perspective of a developer.
I thoroughly enjoy implementing automation and monitoring.
I have plenty of real-world experience in datacenter and cloud operations, virtualization, operating systems, databases, development and system integration.
I learn quickly and have very strong analytical problem-solving skills.
I can work on a team but am most productive when given tough problems to solve and the creative freedom to implement the solutions.
I firmly believe in open-source software and giving back.
I am always looking out for creative challenges and complicated systems problems to tackle.
As it pertains to education, I plan on pursuing a PhD when my youngest child enters school (a few years out).
I have set up and maintained, typically from scratch, numerous mission critical multi-tier systems.
In most of my positions, I typically maintain a large role, generally working from the metal all the way to application.
As of late, I spend most of my time working towards “infrastructure from code” and automation in general.
I want to continue evolving as a systems engineer and a developer.
I am comfortable and experienced in developing the full stack.
In order to be happy, I need an unrestricted MacBook, the freedom to use open-source software (and to contribute back to the community), and the ability to work remotely from time to time.
Skills
- Systems Design
- Designing and building systems is, quite simply, what I do best.
I’ve built so many systems over the years.
I’m comfortable with traditional systems (physical and virtual) as well as modern ones (containers and orchestration).
- Systems Monitoring
- When building systems, I build in monitoring as a fundamental feature, not an afterthought.
I thoroughly enjoy implementing meaningful and actionable monitoring.
I’m comfortable with traditional monitoring (Nagios, Icinga and Zabbix) as well as modern implementations (Sensu, Prometheus and Grafana).
- Systems Automation
- I firmly believe in building systems with the use of automation tooling frameworks, not manual steps.
I’m comfortable with traditional configuration management (Chef, Puppet, Salt, Ansible) and modern container-based solutions (Terraform, Docker, Kubernetes and Helm).
Certifications
- Going through the AWS learning paths (in progress)
- Microsoft Certified Azure Administrator Associate (earned)
- Sun Certified Systems Administrator for Solaris 10 Operating System (earned)
Publications
- Cybersecurity in Banking and Financial Sector: Security Analysis of a Mobile Banking Application (2013)
- Panja, B., Fattaleh, D., Mercado, M., Robinson, A., Meharia, P.
- Published in 2013 International Conference on Collaboration Technologies and Systems (CTS) (thanks Dr Mani)
Technical
- Highly proficient at all levels with UNIX-like operating systems
- The major flavors of Linux (Redhat/CentOS/Oracle, Ubuntu/Debian)
- Any of the BSDs (primarily FreeBSD and OpenBSD)
- SunOS/Solaris (10 and 11), though I would not consider these anymore
- Highly proficient in traditional datacenter administration (racking, stacking, cabling)
- Highly proficient in the administration of virtualization environments (VMware, Qemu/KVM and Xen)
- Highly proficient in network monitoring (Zabbix, Nagios, Icinga, Sensu, Prometheus, Grafana)
- Highly proficient in configuration management (Salt, Ansible, Chef, Puppet and Spacewalk)
- Comfortable working with recent versions of Microsoft Windows (Server 2003, 2008, 2012 and 2016; Windows 7, 8 and 10)
- Though I would never choose Windows for a production deployment
- Well-versed in network administration and security
- Network security scanning and auditing, packet filtering, implementing local security policies
- Good understanding of networking (switching, routing, and application layers)
- Highly proficient in the installation, configuration, and maintenance (patching) of server software such as
- OpenSSH, Apache HTTPD and Tomcat, Sendmail, ISC BIND and DHCPD, HAProxy, Nginx
- Comfortable in the administration of MySQL and PostgreSQL
- Well-versed in the C, C++, Perl, and PHP programming languages; written numerous programs in all listed
- Experienced in Python (2.x and 3.x) and Ruby; written numerous programs in all listed
- In the process of learning the Go and Rust languages (so far, I have only modified existing programs)
- Experienced in the use of sed, awk, and Bourne-style shells
- Comfortable with Java, JavaScript, SQL, Oracle PL/SQL, HTML, and Intel assembly language
- Well-versed in source control, primarily CVS, Subversion and, of course, Git
- Well-versed in integration, in general (APIs), enjoy tying systems together and building pipelines
- Well-versed in SSO, responsible for Cosign and Shibboleth (SAML) in the past
- Experienced in Splunk, OSSEC, Elasticsearch, Logstash, Kibana
- Experienced in OpenStack, Docker and Kubernetes (what I’m currently focused on)
Experience
Barracuda Networks (Ann Arbor, MI; Jan 2018 - Sep 2020)
Lead Site Reliability Engineer, Cloud/Infrastructure Operations
Responsible for the overall infrastructure operation of Barracuda’s hybrid cloud (multiple on-premise datacenters as well as a sizable public cloud footprint).
I work in the System Engineering team and our focus is on platform infrastructure development.
Each of us on the team is on-call 24/7 for one week, every few months.
As is typical, we live in two worlds, the old and the new. We spend our time maintaining legacy systems and migrating them into the new ones which we build.
Our infrastructure consists, predominantly, of thousands of Linux machines and a large number of containers.
In the last year, I’ve produced more than 4,300 commits to our codebase and resolved more than 400 bugs.
- Implemented Stratoscale (AWS-like private cloud) for production
- Implemented Ubuntu MAAS for bare-metal provisioning
- Implemented the Foreman for datacenter management
- Physical and virtual provisioning (supporting PXE and HTTP installations)
- Simplifies DHCP and DNS management (DHCP is done, DNS is on the table)
- Implemented an OpenStack PoC deployment consisting of the following sub-projects:
- Horizon, Neutron, Magnum, Keystone, Cinder, Glance, Nova, Zun
- Image building Jenkins pipelines leveraging Packer (and Vagrant for testing)
- Cloud infrastructures leveraging Terraform (fronted by Terragrunt for DRY)
- Administering and maintaining the Atlassian suite (Confluence, Jira and Bitbucket)
- Administering and maintaining Device42 (datacenter inventory management)
- Working on the modernization of numerous legacy systems (cutting down technical debt):
- Operating systems conversions, automating manual deployments, creating systems integrations
- Completely all-in for “infrastructure as code”
- Configuration management, orchestration, testing/QA, deployments
- Just about all Linux and Puppet
- All configuration is stored in Git
- All orchestration goes through Jenkins
- Working with numerous teams to make sure that our cloud meets their requirements
- Manufacturing, platform, testing/QA, engineering, product, cross-functional
- Created custom Prometheus Exporters (Docker-based deployments) and Grafana dashboards
- Built and extended numerous infrastructure profiles, such as:
- Foreman, Qemu/KVM (libvirt), Docker, LibreNMS
- Deprecated legacy software mirrors in favor of Artifactory (virtual lift-and-shift)
- Created numerous packages (typically by way of Jenkins pipelines and fpm)
- Wrote numerous runbooks for the Operations Engineering team (first line support)
- Created our cloud images and our Vagrant boxes (both built with Packer)
Projects:
- Ansible AWX
- Wrote the initial Terraform/Terragrunt deployment in AWS (Docker-based)
- Wrote the final production version (Kubernetes-based)
- Built custom AWX images (baked in our Python virtual environment)
- Cloud DNS
- DNS as code (by way of OctoDNS)
- PowerDNS as the authoritative source
- Consul as the recursive DNS servers
- Jenkins handles the automation
- Device42
- Regular maintenance and upgrades
- Configured for warm-failover
- Recently moved this to AWS (wrote Terraform/Terragrunt for this deployment)
- Sitespeed.io
- Multi-site deployment
- Performs automated web performance benchmarking
- Docker-based
- Speedtest
- Multi-site deployment
- Endpoint for client network bandwidth and latency benchmarking
- Docker-based
- Central Rsyslog
- Tuned for high performance
- Packaged Prometheus exporter for metrics
- Akamai EAA
- Multi-site deployment (on-premise and public cloud)
- Wrote the Terraform to deploy proxies in AWS (CloudFormation templates)
- The Foreman
- Multi-site deployment
- Proxies in each on-premise datacenter (doing DHCP with failover)
- Dynamic DNS by way of PowerDNS’s API
- Stratoscale
- Multi-site deployment (three on-premise clusters)
- Reverse-engineered the bare-metal installation procedure (for Foreman)
- Artifactory
- Implementing the infrastructure (AWS EKS) with Terragrunt and Terraform
- Implementing the application in Helm, Flux and Istio
- Image Pipeline
- Jenkins pipeline which builds our cloud image, notifications go to Slack
- Uses Packer to Kickstart an ISO, followed by shell and Puppet provisioning
- Last step is an automatic disk image upload and import to AWS (AMI)
- OpenStack
- Implemented MAAS to initially bare-metal provision the nodes (from the BMC)
- Small PoC environment consisting of Horizon, Neutron, Magnum, Keystone, Cinder, Glance
- Nova and Zun used to benchmark our key applications and test as a Terraform endpoint
- Atlassian Upgrade
- Hardware upgrade, software lift-and-shift
- Added Consul and Prometheus functionality along the way (node and Apache exporters)
- Device42 Upgrade
- Software lift-and-shift (three major versions, 13.x to 15.x to 16.x)
- Moved from on-prem to AWS (multi-region)
- Implemented warm-standby to eliminate single-instance SPoF (automatic daily refresh)
- Consul Project
- Deployed Consul clusters to all of our DCs
- Using Consul for service discovery
- Implemented “DNS forwarding” (serving up the “consul” TLD)
- Turning all Consul servers into DNS slaves (they’re acting as site name servers)
- SumoLogic Project
- Wrote the Puppet class which leverages the upstream Docker image
- Integrated with Sensu for health checks and Consul for service discovery
- Easy enough for our customers to extend the functionality (additional collectors)
- Configured all of our clients to send syslog to SumoLogic
- Puppet in AWS (ASG)
- Wrote the Terraform/Terragrunt manifest
- Handles remote state by way of S3 and DynamoDB
- The ASG is fronted by ELB (uses crypto from ACM)
- Least-privilege IAM policies are used
- The Puppet servers leverage EFS for shared storage
- Simple CloudWatch policies control the ASG (currently CPU utilization)
- VPCs and SGs are used for network isolation
- The Puppet servers used cloud-init for last-minute configuration
- Air-gapped Root CA
- Created a process to automate the creation of our root CA (all but crypto)
- Leverages Vagrant, Packer and Linux Live Kit (“vagrant up” to USB live distro)
TotalCAE (Ann Arbor, MI; Nov 2016 - Jan 2018)
Senior HPC Consultant
Embedded as a private sub-contractor in one of Detroit’s “big three” automakers working in the HPC (high-performance computing) group.
Responsible for administering a fairly large HPC cluster (more than 1200 nodes) used in vehicular testing and simulation.
I was part of a really small team (three members), and we were on-call every three weeks (24/7 for a week).
The environment was highly driven by SLAs.
We had to work with both the internal and the external (other vendors/contractors) teams, which was challenging.
- Administering a > 1,200 node HPC supercomputing cluster
- Operating system was RHEL 6.x frontend (utility) and backend (compute)
- Administered Seagate CS9000 for the Lustre parallel filesystem (~ 3/4 PB)
- Found and fixed numerous bugs which were submitted back
- Ethernet and Infiniband interconnect (Mellanox UFM)
- HPE Cluster Management Utility for imaging and bare-metal deployment
- Implemented configuration management (Ansible)
- We used this in addition to imaging (making small changes was faster this way)
- Completed various in-house integrations (monitoring, reporting, ticketing, etc)
- Administered the job scheduler (Altair PBS Pro)
- Assisted engineering (users) with backend job problems (troubleshooting)
- Implemented numerous custom dashboards (Collectd, Monit, InfluxDB, Grafana)
- Implemented new monitoring (Zabbix) and maintained legacy (Xymon)
- Implemented centralized logging and analytics (Elastic Stack)
- Implemented Robinhood for Lustre (scratch area) cleanup
- This was far superior to the typical “cron/find/rm” strategy
- Numerous custom scripts developed and maintained to support operation
University of Michigan-Flint (Flint, MI; Feb 2014 - Mar 2017)
Systems Administration (UNIX Systems Administrator, team lead, full-time staff)
Responsibilities:
- Administer the VMware ESX virtualization infrastructure
- Administer the Dell Compellent/EqualLogic storage arrays
- Administer the Oracle Virtual Networking fabric (Infiniband)
- Administer the Banner environment (ERP)
- Applications: Oracle Fusion Middleware, Oracle Forms, Oracle Database
- Operating systems: Oracle Linux 5.x and 6.x, Solaris 10 and 11
- Administer the Blackboard environment (LMS)
- Applications: HAProxy, Keepalived, Tomcat, Apache, NFS, Oracle Database
- Operating Systems: Oracle Linux 6.x
- Implement and administer the WeBWorK environment (math-centric LMS):
- Applications: WeBWorK, Apache + mod_perl, MySQL
- Operating systems: Oracle Linux 6.x
- Custom: Wrote the snapshot process (Blackboard to WeBWorK)
- Implement and administer the campus Active Directory and SSO environments:
- Applications: Shibboleth (v3) and Cosign (configured for HA with Keepalived and Memcache)
- Operating systems: Oracle Linux 6.x
- Architect and administer the campus disaster recovery environment in Amazon AWS
- EC2, S3, RDS, IAM, Route53, Workspaces, VTL
- Implemented BIND in EC2 (off-campus DNS, slaves to on-prem)
- Administer the campus Backup products: ARCserve and Barracuda
- Administer the operating systems for the database group (Ellucian Banner and DegreeWorks)
- Implement and administer the following IT supporting resources:
- Gitlab (source control), Jenkins (CI/CD), Zabbix (monitoring), Redmine (portal, wiki), Oracle Enterprise Manager, PWM (password self-service tied to AD), Shell (generic shell server tied to AD), Slack (messaging, numerous integrations)
- Determine equipment and software lifecycles and budget forecasting
- Work closely on projects with other functional units on campus:
- Registrar, Financial Aid, Cashier, Academic Advising, Reporting
Projects:
- Duo
- U-M initiative; switched Weblogin from RSA to Duo
- Flint runs its own instance of Cosign
- Deployed new Cosign servers to facilitate the migration
- Process almost entirely automated with Ansible
- Manual DNS cutover; little to no interruption in service
- Zabbix 3.x
- Currently running 2.x in production
- Redesigning for 3.x, distributed architecture
- Active/passive frontend tier (Apache httpd) with Heartbeat
- Active/passive server tier (Zabbix server) with Heartbeat
- Active/active database tier (MariaDB cluster)
- Single-instance self-contained proxy tier (Zabbix proxy and MariaDB)
- Shibboleth
- Implemented Shibboleth IdP 3.x (supporting SAML and CAS)
- Authentication comes from Cosign (we support all “three” SSOs)
- Deploying a handful of custom (Flint) attributes
- Rundeck
- Arbitrary job runner (web interface, execution history, etc)
- Currently running shell scripts and Ansible playbooks
- Part of a greater deployment pipeline involving:
- Gitlab (code repository), Gitlab CI (builds and integration testing), Slack (result output)
- Ansible
- Dynamic inventory, based on our single-point-of-truth (Device42)
- Can be run from Rundeck or the CLI, results go to Slack
- Created a system baseline for Linux, installs/configures:
- Spacewalk client, Authentication (Winbind, SSSD), Barracuda backup agent, Collectd (for visualization with Grafana), Duo client, IPtables, Latest packages (dev, test, production, etc), Ksplice (kernel- and user-land online patching), Logwatch, Meaningful /etc/motd (contact, service level, etc), Networking (hostname, adapter, etc), NTPd client, OSSEC agent, Splunk agent, Standard packages, Sendmail (root@ is meaningful and off-box, etc), SSHd (configures banner message), Zabbix agent
- Proofpoint to Barracuda SPAM migration
- Created a Perl script to migrate white/blocklists
- Leverages the Barracuda API (XMLRPC)
University of Michigan-Flint CSEP (Flint, MI; Sep 2012 – Dec 2016)
Lecturer (Adjunct Faculty)
- Taught the following courses, timing varies:
- CSC 122 Intro to Programming (Python 3.x)
- CSC 175 Prob Solving & Programming I (C++)
- CSC 275 Prob Solving & Programming II (C++)
- CSC 377 Operating Systems (Theory)
University of Michigan-Flint ITS (Flint, MI; Aug 2008 – Feb 2014)
Data and Information Management (UNIX Systems Administrator, full-time staff)
- Administer numerous UNIX servers running the Solaris operating system
- Administer Sun StorageTek fiber channel storage arrays
- Administer the campus ERP (Ellucian Banner)
- Administer the campus E-commerce software platform (Touchnet)
- Assist in the administration of Oracle databases
- Assist in the administration of Oracle Application Server and Weblogic
- Wrote numerous custom programs in Oracle PL/SQL
- Administer the source control repository (CVS)
- Wrote a custom frontend in PHP to support CVS (replaced GForge)
- Administer the network monitor (Zabbix)
- Extend Zabbix to support in-house VDI (dashboarding, utilization)
- Implement and administer the campus reporting system (Webfocus)
- Determine equipment and software lifecycles and budget forecasting
- Work closely with other functional units on campus:
- Registrar, Financial Aid, Cashier, Academic Advising, Reporting
University of Michigan-Flint ITS (Flint, MI; Jun 2006 – Jul 2008)
Data and Information Management (Business System Analyst, full-time staff)
- Developed and maintained numerous custom programs (primarily PL/SQL)
- Developed numerous Oracle SQL queries, PL/SQL programs, Bash and Perl scripts
- Solely responsible for creating a web application for the Office of Financial Aid
- Enables complete administration of the scholarship application process
- Created filter program which converted ASCII transcripts into PDF
University of Michigan-Flint ITS (Flint, MI; Sep 2005 – May 2006)
Data and Information Management (Student Worker)
- Wrote custom tools to handle the backup of the Solaris servers
- Wrote XML-RPC server in Perl which runs on the Windows platform and accepts remote requests for filesystem level access (creation of home directories, permission assignment); this daemon allows remote access to a Windows machine from any other (potentially incompatible) networked environment, such as UNIX
- Wrote custom Perl scripts for verifying the consistency of the campus Active Directory database and its accompanying user environment (account status, home directories, permissions, etc)
- Wrote custom Perl scripts for in-house use in determining scholarship eligibility; involves database connectivity and heavy use of regular expressions
University of Michigan-Flint ITS (Flint, MI; Jan 2005 – Aug 2005)
Helpdesk (Student Worker)
- Answer phone calls and troubleshoot various computing problems for the entire campus
- Answer email on the HelpDesk technical support mailing list
- Create and maintain numerous QuickNote guides
Orange351 (St Louis, MO; 1999 - 2003)
(Founding Partner) Systems Administrator / Web Developer
- Implemented numerous client and server-side programs in a variety of programming languages, including PHP, Perl, JavaScript
- Wrote a modular web site search engine
- Created web pages using PHP, HTML, CSS, and JavaScript
- Designed and implemented a real-time web-based image manipulation suite, primarily in Perl
- Created a web-based client support area from scratch in PHP, which facilitated secure communications between the parties
- Designed a web-based billing application using PHP and a MySQL database which the company used for internal bookkeeping
- Designed authentication mechanism which allowed clients to view only pertinent content
Electrografix New Media (St Louis, MO; 1996 - 1999)
(Founding Partner) Systems Administrator / Computer Programmer
- Architected, configured, and administered the local area network for this software company:
- Consisted of approximately 20 workstations (Windows, Linux, and FreeBSD) and 10 servers (Windows NT, FreeBSD, Linux, and Sun Solaris)
- Installed and configured the Cisco router
- Designed and configured the firewall, which utilized FreeBSD’s packet filtering capability
- Performed automated backups of all servers using custom shell scripts (mainly cron and tar)
- Administered all of the servers, which included tasks such as operating system installation and configuration, patching, server software installation and upgrading
- Designed a PHP application which allowed the non-technical staff members to use the company database without having to learn SQL
- Designed a custom publishing system for the online magazine which the company produced
Education
- Master of Science in Computer Science
- University of Michigan-Flint (Flint, MI; 2009 - 2012)
- Cumulative GPA 8.2 (of 9)
- Bachelor of Mathematics in Computer Science (Honors)
- University of Michigan-Flint (Flint, MI; 2003 - 2008)
- Cumulative GPA 3.69 (of 4); Dean’s list numerous semesters
- Bachelor of Science in Computer Science (Honors)
- University of Michigan-Flint (Flint, MI; 2003 - 2008)
- Cumulative GPA 3.69 (of 4); Dean’s list numerous semesters
- University of Missouri-Saint Louis (St Louis, MO; 1996 - 1997)
- Undecided major; took general education courses
- St Charles County Community College (St Charles, MO; 1994 - 1996)
- Undecided major; took general education courses
- Francis Howell North High School (St Charles, MO; 1990 - 1994)
- Completed high school; cumulative GPA of 3.97 (of 4); top 5% of graduating class
Relevant Courses
Beginning, Intermediate, and Advanced C++; Digital Logic; Assembly Language; Java Programming; Computer Networking I & II; Computer Architecture; Theory of Computation; Software Engineering I; UNIX System Administration; Perl Programming; Advanced Operating Systems
Activities
- Previously held the position of Vice-Chair of the University chapter of the Association for Computing Machinery (ACM). Participated in the ACM Regional Programming Competitions in 2004 and 2005. Helped organize and work the Annual High School Programming Competition hosted by our chapter in 2003, 2004, and 2005. Volunteer System Administrator of the club UNIX server.
- Assisted Dr. Turner (Computer Science Professor) with a research paper by writing numerous Perl scripts which simulated proprietary network routing protocols. Demonstrated some of these results at Meeting of the Minds 2006.
- Previous member of the Student Union of Mathematics (SUM). Participated in the Lower Michigan Mathematics Competition in 2005.
- Inactive member of the Chess Club. Held the highest rating before becoming inactive due to lack of time. The club has been abandoned.
References
- Please let me know before contacting them, so that I may provide a courtesy reminder:
- Doug Warner, supervisor, dwarner@barracuda.com
- Travis Newby, supervisor, tnewby@barracuda.com
- Ryan Struber, supervisor, rstruber@barracuda.com
- Ray Chandler, supervisor, rchandler@barracuda.com
- Andrew Baker, coworker, abaker@barracuda.com
- Jeremy Fugate, coworker, jfugate@barracuda.com
- Harvey Sherman, supervisor, harveys@umich.edu
- Dr Steven Turner, professor, swturner@umich.edu
- Adam Robinson, coworker, adarobin@umich.edu
- Phil Erlenbeck, coworker, perlenbe@umich.edu
- Rod Mach, supervisor, rod@totalcae.com
- Wayne Nichols, coworker, wayne@totalcae.com
Mark Mercado — mamercad@gmail.com — +1-810-223-3658