Programming Best Practices: Profiling
My first task coming back from my work stress blogging hiatus is to finally fix problems with Akismet Auntie Spam that Lorelle reported over a month ago — if your Akismet spambox has over 10,000 spam comments then Auntie Spam is going to crash hard. Viewing that many comments at once will make Firefox use eight times more memory than normal web browsing, even without using Auntie Spam .
This means it’s time to do some code profiling . In programming, profiling means to measure your code and find out which parts are using the most time and the most memory. Profiling gives you performance analysis measurements so that you can optimize your program for speed and/or memory.
“Don’t prematurely optimize” is a programming Best Practice, and it can be summed up in the words of my grandfather: “measure twice and cut once”. You can guess at what parts need fixing, but it is much more effective to measure how your program performs so that you can focus on the worst parts. They have the most room for improvement. Without profiling you could easily spend several hours optimizing a loop that executes in negligible time and ignore the three lines that copy huge chunks of memory for No Apparent Reason. Get it working, and then use your profiler to get it working fast.
Profiling is a Skill
Another good rule is to always test with large data sets. Ideally you want a fast case for rapid prototyping of new features, and a worst case for stressful testing of that new feature. To often we use small sets of data for development and testing. We never realize how badly our code performs in real world conditions. Speed and responsiveness play a greater factor in whether or not someone becomes a regular user of your program than you might realize.
 One thing WordPress does wrong is it includes all of your comment spam in their WordPress export files. One friend saw his export file decrease from 83 MB to 8 MB once he deleted the comment spam.
 Some of the bad habits that were lurking in Auntie Spam:
- I was using a custom getElementsByClassName instead of an XPATH call. XPATH can be so much faster that walking the DOM.
- I had too many innerHTML assignments instead of leaving HTML as a string and then giving it to the web page to process as a final step
- Inefficient regular expressions
- Too many copies of the comments in memory