I've fallen and I can't get up!

Submitted by gwolf on Thu, 06/05/2008 - 16:59

I think I should follow up on Victor's lament. Yes, we have a Rails application which works fine most of the time... but quite often it throws a segmentation fault that I have been unable to pinpoint. It might be related to rmagick, the only non-pure-Ruby component I am using (and I'm tempted to try minimagick instead, even though I prefer in-memory operations to on-disk ones, which mean piping an image out and slurping it back in).
Victor came up with an easy script to check the server. But to reduce the impact of the crashes, I changed the setup: I was running a single Mongrel instance, which meant that whenever it died the whole system became inaccessible for everybody, so I replaced it with a mongrel_cluster of five processes, plus pound as an easy-to-use balancer which looks quite nice. With that change, the very simplistic and to-the-point script no longer worked.
Anyway... Ruby rocks ;-) I'm sharing this with you mostly because I am sure some readers will find more than one useful construct, not because it is particularly beautiful code. And besides, we should work on fixing the cause, not the consequence, of the bug! :)

    #!/usr/bin/ruby
    require 'yaml'

    confdir = '/etc/mongrel-cluster/sites-enabled'
    restart_cmd = '/etc/init.d/mongrel-cluster restart'
    needs_restart = false

    # Each file in confdir is the YAML configuration of one cluster
    (Dir.entries(confdir) - ['.', '..']).each do |site|
      conf = YAML.load_file "#{confdir}/#{site}"
      # One pid file per process: turn "mongrel.pid" into the glob "mongrel*.pid"
      pid_location = [conf['cwd'], conf['pid_file']].join('/').gsub(/\.pid$/, '*.pid')
      pid_files = Dir.glob(pid_location)

      pid_files.each do |pidf|
        pid = File.read(pidf).strip
        begin
          # Raises Errno::ESRCH if no process has this pid
          Process.getpgid(pid.to_i)
        rescue Errno::ESRCH
          warn "Process #{pid} (cluster #{site}) is dead!"
          File.unlink pidf
          needs_restart = true
        end
      end
    end

    # A single dead process is enough to restart the whole cluster
    system(restart_cmd) if needs_restart
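
The liveness check is probably the most reusable construct in there, so here is a minimal sketch of it as a standalone helper. It is a variation and not what the script above does: it sends signal 0 with Process.kill instead of calling Process.getpgid, and it treats an empty or garbled pid file as a dead process. The name process_alive? is made up for this example.

```ruby
# Hypothetical helper: true if a process with this pid exists.
# Accepts the raw contents of a pid file (string, possibly with a newline).
def process_alive?(pid_text)
  pid = pid_text.to_s.strip.to_i
  return false unless pid > 0   # empty or non-numeric pid file
  Process.kill(0, pid)          # signal 0 only checks existence, nothing is delivered
  true
rescue Errno::ESRCH
  false                         # no such process
rescue Errno::EPERM
  true                          # process exists but belongs to another user
end

puts process_alive?(Process.pid)   # the running script itself
```

The EPERM branch matters if the checker does not run as the same user as the Mongrel processes: being refused permission to signal a process still proves that it exists.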

Works out of the box for any Debian-packaged mongrel-cluster. Sadly, mongrel-cluster does not provide a way to restart individual servers, so the script restarts the whole cluster. Of course, I could (should, even) work out the specific command line for each one... but at least it works for now.
Uh-oh... Does that mean it's permanent?
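
For the record, building the command line for a single instance could look roughly like the sketch below. It rests on assumptions that the script above does not verify: that each pid file carries its port before the extension (as in mongrel.8001.pid), that the cluster configuration has the keys 'environment' and 'address' besides 'cwd', and that the instance is started with mongrel_rails start and its -d, -e, -p, -P, -c and -a options. The function name restart_command_for is made up, and the log file option is left out.

```ruby
require 'shellwords'

# Hypothetical helper: build the command that starts one Mongrel instance again.
# conf is the parsed cluster YAML, pid_file the full path of the dead
# instance's pid file. Returns nil if no port can be read from the file name.
def restart_command_for(conf, pid_file)
  port = pid_file[/\.(\d+)\.pid\z/, 1]   # "mongrel.8001.pid" -> "8001"
  return nil unless port

  args = ['mongrel_rails', 'start', '-d',
          '-e', conf['environment'] || 'production',
          '-p', port,
          '-P', pid_file,
          '-c', conf['cwd']]
  args += ['-a', conf['address']] if conf['address']
  Shellwords.join(args)                  # quote each argument for the shell
end

conf = { 'cwd' => '/var/www/app', 'environment' => 'production' }
puts restart_command_for(conf, '/var/www/app/tmp/pids/mongrel.8001.pid')
```

The resulting string could be passed to system in place of restart_cmd, once per dead pid file, so that the surviving processes keep serving requests.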
