I sat at my desk with my head in my hands.

My task for the day was to uninstall agents of a popular configuration management tool from over a hundred virtual machines located in 17 data centers around the world — quickly, without stopping any other applications. I tried to figure out the easiest, most efficient way to automate this, my evening plans dissolving into a mist of frustration.

There was more bad news. These VMs were provisioned by many different admins over a long time, and many of the agents had stopped communicating with the central configuration management server. I had no idea where in each VM the agent was installed, or if it was installed with Rubygems or yum or rpm. The only access I had to the VMs was through ssh, with a single master private key that would let me in.

When an agent dies in a datacenter, does it send a ping?

I began writing the shell script. Try yum remove first. If that doesn’t work, try rpm -qa. If that doesn’t work, locate the agent and delete the file and folder. Oh, and don’t forget to remove the agent user while you’re at it. But wait, what if they installed it with Rubygems?

I felt like throwing up.

That is when I met Ansible. Here was a YAML-driven automation tool that used pure SSH and no agents! I dove in feet first.

Now, Ansible has a different way of doing things. Instead of issuing commands, you describe the end result you want to see, and Ansible figures out how to get there.

So, instead of saying yum update, you would say yum: pkg=* state=latest. You describe the desired state of the target environment, and Ansible uses its modules to do whatever is needed to bring the environment to that state.
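Dropped into a playbook, that line is just another task. Here’s a minimal sketch of what it might look like (targeting all hosts purely for illustration):

---
   - hosts: all

     tasks:

        - name: Bring every installed package up to its latest version
          yum: pkg=* state=latest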

That’s not how it began for me though.

Commands or Bust

To my shell-script-addled brain, configuring a server meant a set of commands. Whether you typed them in or ran them from a bash script, they were still a sequence of directions the OS had to follow. And if the script was accidentally run twice on the same host, and you hadn’t checked for that one last edge case, Murphy’s law would rub your face in the dirt.

In my rush to Ansibilize my half-complete shell script, I began writing my first playbook like this (we’ll call our agent agent-daemon from here on):

---
   - hosts: all
     
     tasks:
        
        - name: If yum found it, get rid of it with yum
          shell: yum -y remove agent-daemon
          register: cleanedupwithyum

        - name: Do the rpm thang
          shell: rpm -qa agent-daemon
          when: cleanedupwithyum|failed

…and so on.

After a few more lines of this I realized something: I was writing a shell script with Ansible! I wasn’t taking advantage of its modules, or even letting it do its most important job.

Besides running agentlessly over SSH, Ansible’s biggest benefit is something called idempotence: it makes only the changes needed to bring the environment in line with the state described in the playbook. So we can boil the first two tasks down to this:

yum: pkg=agent-daemon state=absent

In the snippet above I’m calling Ansible’s yum module and asking it to make sure that agent-daemon is absent from the system. I let Ansible figure out the commands it needs to run to make that happen. It just works. If agent-daemon doesn’t exist on the server, then Ansible won’t try to uninstall it.
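You don’t even need a playbook for a one-off job like that. As a rough sketch, assuming your VMs are in the inventory, your login user can become root, and the master key sits at something like ~/.ssh/master_key (a made-up path), the same module can be run ad hoc against every host:

ansible all -m yum -a "pkg=agent-daemon state=absent" -b --private-key ~/.ssh/master_key

And because the module is idempotent, running it a second time does no harm; hosts that are already clean simply report that nothing changed.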

There’s a Module for That

Fortunately for me, I landed on Chris Fidao’s excellent Ansible tutorial at Servers for Hackers. I began thinking in terms of describing the ideal system instead of the commands I’d have to use to make it ideal. The clouds opened up, the angels sang, and my evening was rescued.

Nope. Not yet.

Here’s the sad truth. As a relentless optimizer, I refactor until I’m blue in the face, and then optimize some more. In mere moments I began writing my first custom module in Python. My wife called, and I caught myself. What am I doing? Surely someone else has faced this situation before. Then a Google search brought up the Ansible module index. Ansible comes with over 450 modules to address nearly every need and environment. My task grew easier as I found and called the right modules in my playbook instead of hacking shell commands. In about an hour the final playbook was ready. In another hour all the agents were gone.
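I won’t pretend the snippet below is that playbook, but a cleanup along those lines can be sketched with stock modules alone. The package, gem, path, and user names are placeholders meant only to show the shape of it:

---
   - hosts: all

     tasks:

        - name: Remove the agent if yum or rpm installed it
          yum: pkg=agent-daemon state=absent

        - name: Remove the agent if it came in as a gem
          gem: name=agent-daemon state=absent
          ignore_errors: yes   # if rubygems isn't there, the task fails but the play carries on

        - name: Clear out any leftover install directory
          file: path=/opt/agent-daemon state=absent

        - name: Remove the agent's service account
          user: name=agent-daemon state=absent remove=yes

The exact arguments matter less than the pattern: every task names an end state, and Ansible works out, host by host, whether anything needs to be done at all.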

Since then, I’ve used Ansible to provision, deploy, orchestrate, and secure everything from mission-critical, multi-provider infrastructures to cloud-based disposable development environments. It’s easy to grasp, quick to test, and doesn’t need elaborate client setup to run. Ansible plays well with other CM and CI/CD systems too. If you add Ansible Tower, you get a web GUI and a REST API, together with role-based access, centralized credentials, playbook loading from Git, and a solid audit trail. Ansible Galaxy lets you find and use roles written by others in the community, and even contribute some yourself.
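Pulling a community role down from Galaxy, for instance, is a one-liner (the role name here is invented; substitute whatever you actually find on the Galaxy site):

ansible-galaxy install someauthor.hardening

Once it’s installed, you can list the role under roles: in any play and reuse it across projects.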

And The Moral of the Story Is…

For us sysadmins, an automation tool like Ansible can be a game-changer. It can be used to run single ad-hoc commands or to execute playbooks that define infrastructure as code. With dynamic inventory support for cloud providers like Amazon Web Services, Microsoft Azure, and Google Compute Engine, you can provision entire environments within minutes and tear them down just as easily. With Tower, self-service for developers and testers becomes a reality: they can safely deploy environments from approved templates as needed. But like every application, Ansible has its own approach to scripting and control. Here are a few things I have learned that may help you get the best out of this amazing tool.

  • Describe, don’t dictate the environment: Let Ansible know what you want to see in your target environment, not what you want it to do. This can be hard, at least initially. Instead of just issuing a Unix command and being done with it, you’re forced to declare what the final state of the environment should look like. That may seem counterintuitive at first, but there’s a good reason for it: leaving the actual remote execution to Ansible modules gives you a reliable, replicable, and scalable way to version, maintain, and deploy infrastructure, instead of leaving you to debug shell scripts for hours. Virtually any CM tool worth its salt does this in some manner, but Ansible does it without relying on agents or complicated configurations.
  • Use the modules first: For those new to Ansible (and some pretty seasoned folks too), the temptation to use the shell, command, and raw modules to pass Unix commands to the target system can become intense, especially when you just need stuff done quickly. Take a step back and check out the module index first. More often than not, you’ll find a module that does what you need, and does it idempotently. But what if there is no module that does what you need? Then you can fall back to shell commands, or even write your own custom module.
  • Use Git to version control your playbooks: Tools like Ansible make infrastructure as code easy. You can version control your playbooks just like application code, and provide consistent environments from development to production. This is especially important when you’re using Ansible Tower.
  • Test your YAML locally for errors: YAML can be somewhat finicky about spacing and indentation. Fortunately, Ansible has some of the clearest error messages I’ve seen. You can check a playbook for syntax errors with the --syntax-check option, and dry-run it with --check to see what it would change without actually changing anything (see the examples after this list).
  • Be patient while you figure things out: Ansible is easy to learn, but I’d be lying if I said it didn’t have a learning curve. Be patient with its errors, and the rewards will follow. Ansible has excellent documentation with tons of examples, and it’s constantly being improved and augmented. From the command line, use ansible-doc to see man-style documentation for each module.
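To make those last two points concrete, here is roughly what the commands look like; site.yml is a stand-in for your own playbook:

ansible-playbook site.yml --syntax-check    # catch YAML and playbook syntax mistakes
ansible-playbook site.yml --check           # dry run: report what would change, change nothing
ansible-doc yum                             # man-style help and examples for the yum module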