In our small company of 9 people, 3 of us spend most of our time writing code and kicking servers. We are distributed across 5 different states, and most of us work from home or from coworking spaces, so it can be a challenge to stay in sync with what is happening. When I arrived, one of the first things I set up was HipChat, so that we would have a company chatroom and an engineering channel where we would wire in as much broadcast status or information radiator stuff as we could think of.
See also: ChatOps
We have GitHub pushes and merge proposals announced in the chatroom, along with Sprint.ly tickets, HelpScout support requests, PagerDuty alerts, and Capistrano deploy notifications. Without any extra work for the 3 engineers, anyone in the company can get a fantastic pulse of what is being worked on, what needs attention, etc.
The one thing that was totally silent was when someone needed to SSH into a server to do some ops work. Backups would stop working, someone would log in, and we wouldn't get a notification. We did have logs that were reviewed on a monthly basis, but that just wasn't contributing to the real-time pulse I otherwise enjoyed being a part of. Here is how I fixed it.
Here is a script that is executed by the Linux Pluggable Authentication Modules (PAM) system when someone connects to a server via SSH. I call it login-audit.sh, and I drop it in
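    #!/bin/sh
    API_KEY=YOUR_HIPCHAT_KEY
    ROOM=YOUR_ROOM
    SENDER=LoginAudit

    # pam_exec runs this on session open and close; only announce logins.
    if [ "$PAM_TYPE" != "close_session" ]; then
        # The URL is quoted so the shell does not treat "&" as a background operator.
        curl -d "room_id=$ROOM" -d "from=$SENDER" -d "color=green" \
            --data-urlencode "message=$PAM_USER logged in to $(hostname)" \
            "https://api.hipchat.com/v1/rooms/message?auth_token=$API_KEY&format=json"
    fi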
Then, on each of our servers, I added this line to
session required pam_exec.so seteuid /usr/local/bin/login-audit.sh
Presto! Now when someone goes into a server, the rest of the team knows. Of course this wouldn't stop any bad behavior, but it's a great way of keeping folks who are on the same side more in sync with each other. Knowing that Joe has been in and out of a server all morning, or that nobody on the team has touched a server in weeks, helps a lot with situational awareness when we need to respond to an alert.
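The script above leans on environment variables that pam_exec exports to whatever it runs: PAM_TYPE and PAM_USER, and also PAM_RHOST (the remote host). As a rough sketch of how the message could be built and checked without touching PAM or HipChat, here is a small helper. The name build_login_message and the sample host and address are made up for illustration, and adding the remote host to the message is my assumption, not something the original script does. It also tests for open_session rather than "not close_session", which should be equivalent for a session line.

```shell
#!/bin/sh
# Hypothetical helper (not part of login-audit.sh): build the chat message
# for a PAM event, or print nothing if the event should not be announced.
build_login_message() {
    pam_type=$1    # what pam_exec exports as PAM_TYPE
    pam_user=$2    # what pam_exec exports as PAM_USER
    host=$3        # normally the output of `hostname`
    pam_rhost=$4   # what pam_exec exports as PAM_RHOST (may be empty)

    # Only session opens are announced; any other event prints nothing.
    [ "$pam_type" = "open_session" ] || return 0

    if [ -n "$pam_rhost" ]; then
        printf '%s logged in to %s from %s\n' "$pam_user" "$host" "$pam_rhost"
    else
        printf '%s logged in to %s\n' "$pam_user" "$host"
    fi
}

# Simulate the two events one SSH session would produce (sample values).
build_login_message open_session joe web1 203.0.113.7
build_login_message close_session joe web1 203.0.113.7
```

Only the first call prints a line; the close_session event stays quiet. In the real script the arguments would come from "$PAM_TYPE", "$PAM_USER", "$(hostname)", and "$PAM_RHOST", and the result would be passed to curl as the message.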