My silence has been deafening regarding Jacques’ post, so it’s time I gave some sort of response!
He has a nice screenshot of an IOL error page captured for all eternity. Damn, and we’d hoped no-one would ever notice… I’ve never seen that specific one before, and no-one here reported an error like that, but it looks like it could have been a problem with the rushed implementation of the tsunami donation page.
We also have a problematic database server (when don’t we?) that may have been the culprit, but I can’t really tell from the shot (Jacques, do you have a full screenshot?).
Our shiny new IOL database server is really fast, but it tends to lock up, and we’re still troubleshooting the problem. Of course I suspect a MySQL bug, but naturally I can’t prove this yet. It could also be a symptom of the move from FreeBSD to Gentoo Linux (some OS setting that we haven’t tracked down), or a consequence of our load balancer also being under load. I dream of one day having a real budget! Unfortunately we can’t fall back on our backup server, as it doesn’t handle the load well, and the new machines are still arriving. On top of that, we currently have no system administrator (the new one starts on Monday) and have just moved premises, with all the hassle that involves, so it’s been a fun week, and a good one to be on leave!
Can’t blame Jam Warehouse though, they had nothing to do with the IOL system. They wrote the initial VNE (newspaper website) systems, but there’s not much left of their code there after Neil’s repair work. The VNE systems now cache much better than IOL, although IOL does cache reasonably well itself.
We’ve a long way to go to get IOL’s architecture right. I’m pretty clear about where I want it to go, but working on a shoestring budget with so few staff, getting anything like architectural changes done seems to take forever. Neil was fairly well shielded when he was here, because I recognise the importance of this work, so he had time to make some fundamental changes. Unfortunately, right now everyone is madly doing all sorts of other things, and I’m doing the IT Manager thing of sitting in meetings all day and wondering whether I could still get a directory listing from the command line should I ever see one again. (OK, it’s not that bad.)
Hmm, this sounds like a page full of excuses, which it isn’t really. Responsibility for IOL tech problems ultimately rests with me (especially database errors, as I’m supposed to know what I’m doing there). So hopefully they’ll soon be fixed, and I can write something interesting about my Transkei trip!
I was running three MySQL servers: one master and two slaves. For an online business directory I developed last year, I sent most read requests to the slaves, while updates and 20-30% of reads went to the master. It was a bit of a weird setup, due to a lack of funds for high-end servers, but it did the job.
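The routing described above (writes to the master, most reads spread across the slaves, a share of reads still sent to the master) can be sketched roughly like this. It is a minimal illustration under assumptions, not the commenter’s actual code: the class name, the server names, the 25% default and the crude "does the SQL start with a write keyword" test are all hypothetical, and it returns server names rather than opening real MySQL connections.

```python
import itertools
import random


class ReadWriteRouter:
    """Hypothetical router: picks which MySQL server should run a query.

    Writes always go to the master; reads go round-robin to the slaves,
    except for a configurable fraction of reads that also go to the master.
    """

    # Crude write detection by leading keyword (an assumption for this sketch).
    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves, master_read_ratio=0.25, rng=None):
        if not slaves:
            raise ValueError("need at least one slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)  # simple round-robin over slaves
        self.master_read_ratio = master_read_ratio
        self._rng = rng or random.Random()

    def is_write(self, sql):
        return sql.lstrip().upper().startswith(self.WRITE_PREFIXES)

    def route(self, sql):
        if self.is_write(sql):
            return self.master
        # Send roughly master_read_ratio of reads to the master as well.
        if self._rng.random() < self.master_read_ratio:
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])
    print(router.route("UPDATE listings SET name = 'Acme' WHERE id = 1"))  # db-master
    print(router.route("SELECT * FROM listings WHERE id = 1"))
```

A ratio parameter keeps the master/slave read split tunable without code changes; in a real deployment the returned name would be used to pick from a pool of open connections.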
Caching also helps. Using memcache to store cached data in RAM rather than on disk is very effective and reduces hits on the MySQL database servers. I normally run a 128MB memcached instance on the various boxes across the network where I use memcache.
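The usual way to put memcache in front of MySQL is the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result. Below is a minimal sketch under stated assumptions. InMemoryCache is a hypothetical stand-in with the same get/set(key, value, expire) shape as a memcached client such as pymemcache’s Client, so the example runs without a server; get_listing, db_lookup, the "listing:<id>" key format and the 300-second TTL are all illustrative names and values, not taken from the comment.

```python
import json
import time


class InMemoryCache:
    """Hypothetical stand-in for a memcached client so the sketch runs locally.

    Mirrors the get(key) / set(key, value, expire) shape of common clients.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at and time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, expire=0):
        # expire=0 means "never expire", as in memcached.
        expires_at = time.monotonic() + expire if expire else 0
        self._data[key] = (value, expires_at)
        return True


def get_listing(cache, db_lookup, listing_id, ttl=300):
    """Cache-aside read: try the cache first, query MySQL only on a miss."""
    key = f"listing:{listing_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # json.loads also accepts bytes from a real client
    row = db_lookup(listing_id)  # the expensive database query
    cache.set(key, json.dumps(row), expire=ttl)
    return row
```

Serialising rows to JSON strings keeps the values compatible with memcached clients that only store strings or bytes. On the server side, starting memcached with "memcached -m 128" caps its memory use at 128 MB, which matches the instance size mentioned above.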
I do have a few screenshots: one showing the article, one showing the error generated when creating the page (à la busrep), and one of the IOL homepage showing the article in question.