Some time ago I spent a day trying to figure out why our Jenkins Master is so slow (~30min) to start up. We have around 1000 jobs and about 100 plugins. The large amount of jobs makes us hit some performance issues that never is an issue in smaller installations. The number of plugins, makes the list of possible culprits long. Furthermore, it might be a combination of plugins causing the problem. And we may in fact have several separate issues.
Template Project + Job Config History = @#*$%*!?!
One issue that we have found is the combination of Template Project and Job Config History plugins. The Template Project implements ItemListener.onLoaded() and in a loop updates (twice) all projects using it (and we use it a lot). However, this seems to be some workaround that never(?) actually does any real work. Since the job is updated, this will trigger the Job Config History plugin (and maybe others listening to this). Which of course the Template plugin didn’t account for.
The Job Config History is potentially writing several times to disk for each job when triggered. Disk I/O is, as every programmer should know, relatively slow, but one write operation per job would be acceptable at startup if there is no other way. HOWEVER, there is a sleep 500 ms statement to avoid some clashing when writing to disk. 500 ms is an eon in computer world! Disk operations are normally a lot faster than that. 50 ms would be more reasonable value, provided that you can’t avoid a sleep call completely.
1000 jobs x 2 calls/job x 500 ms = 1000 s ? 17 minutes !
Oops! Well, that explains a large part of our slow startup time. The funny thing is that load, disk I/O, cpu and memory is low, we’re mostly just waiting. I had to do thread analysis to find this problem.
I reported JENKINS-24915. It is still reported unresolved, but may of course be resolved anyway. I haven’t tested recently since we choose to uninstall the job config history plugin, even though we like it and you could argue that it’s the template plugin that is
Jenkins loads all jobs on startup, which is reasonable. But if, for a large number of jobs, this triggers a slow operation, then all hell can break loose.
In general, you should think twice before you implement remote or disk calls in a loop. Specifically for Jenkins, doing it in an event method that potentially is called for all jobs is not a good idea.