Not long ago I was working on a C# project that suffered from long execution times. I tried to understand the situation, and I had to decipher highly optimized code. In the end, my refactoring divided the execution time by 4, without any change to the code’s logic. Let’s dive into a story of code optimization.
To find the problematic code, I first measured segments of code to see where the execution spent its time. To recall Rob Pike’s first rule of programming: “You can’t tell where a program is going to spend its time”. Luckily, the program was structured in a way that made this easy to do without introducing a fancy measurement system. After some tests, it appeared that a single method from one class was responsible for 90% of the execution time.
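A minimal way to take such measurements in C# is `System.Diagnostics.Stopwatch`; the sketch below is illustrative, not the measurement code from the project:

```csharp
using System;
using System.Diagnostics;

// Illustrative sketch: wrap a suspect code segment with a Stopwatch
// to see how much of the total run it accounts for.
var stopwatch = Stopwatch.StartNew();

// ... call the code segment under suspicion here ...

stopwatch.Stop();
Console.WriteLine($"Segment took {stopwatch.ElapsedMilliseconds} ms");
```

Repeating this around a few coarse-grained segments is often enough to locate a single hot method, without pulling in a full profiler.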
Let’s have a look at the culprit.
The Original Code: A Recipe for Disaster
Let’s take a look at the original code:
```csharp
private unsafe int FindRep(int ind, out uint ofs)
{
    int nlen = Math.Min(m_level, m_inp.Length - ind);
    ofs = 0;
    if (ind == 0 || nlen < 3)
    {
        return 0;
    }
    int ista = ind < m_maxdist ? 0 : ind - m_maxdist;
    fixed (byte* pinp = m_inp)
    {
        byte* psta = pinp + ista;
        byte* pcur = pinp + ind;
        int len = 0;
        while (psta < pcur)
        {
            int lenw = 0;
            byte* pfin = psta + nlen;
            //byte* pb2;
            for (byte* pb = psta, pb2 = pcur; pb < pfin; pb++, pb2++, lenw++)
            {
                if (*pb != *pb2) break;
            }
            if (lenw > len && lenw >= 3)
            {
                len = lenw;
                ofs = (uint)(pcur - psta - 1);
                if (len >= nlen) break;
            }
            psta++;
        }
        return len;
    }
}
```
At first glance, this method seems highly optimized. The `unsafe` keyword means we manipulate memory directly, which can significantly speed up execution. Secondly, it minimizes the memory used: it relies on `byte*` pointers and ad hoc data types such as `uint`.
Yet once we start analyzing it, this method is riddled with pitfalls:
- Use of the `unsafe` keyword: `unsafe` blocks can be a sign of poor design. In this case, it was used to access the `m_inp` array directly, which is not only unnecessary but also increases memory usage.
- Complex logic: the method has multiple nested loops and conditional statements, making it difficult to follow and debug.
- Exotic design: overall, the code uses less common ways of defining values. It favors ternary operators over `Min`/`Max` functions and `while` loops over `for` loops.
As a result, the C# solution was stuck in the Debug configuration, which meant that any compiler optimization was lost. But what if I told you that with just a few tweaks, we could not only make this method more readable but also improve its performance?
The Revised Code: A Beacon of Good Practices
Here’s the revised code:
```csharp
/// <summary>
/// Search for the longest repeated sequence in the input data and return its length.
/// </summary>
/// <param name="inputDataIndex">Index in the input data buffer.</param>
/// <param name="offset">Position of the sequence in the input.</param>
/// <returns>Length of the longest repeated sequence.</returns>
private int LongestRepetition(int inputDataIndex, out uint offset)
{
    const int minLength = 3;
    // Limit length to the compression level, truncated to the remaining data length
    int lengthThreshold = Math.Min(m_compressionLevel, m_inputBuffer.Length - inputDataIndex);
    offset = 0;
    if (inputDataIndex == 0 || lengthThreshold < minLength)
    {
        return 0;
    }
    // Start position to find a repeated element, minimum is 0
    int inputStart = Math.Max(inputDataIndex - m_maxIndexDist, 0);
    // Length of the longest repetition found so far
    int maxLength = 0;
    // Start identical sequence search
    for (int leftIterator = inputStart; leftIterator < inputDataIndex; leftIterator++)
    {
        int currentLength = 0;
        while (
            currentLength < lengthThreshold &&
            m_inputBuffer[leftIterator + currentLength] == m_inputBuffer[inputDataIndex + currentLength]
        )
        {
            currentLength++;
        }
        // Check if this sequence is longer than the previous best
        if (currentLength > maxLength && currentLength >= minLength)
        {
            maxLength = currentLength;
            offset = (uint)(inputDataIndex - leftIterator - 1);
            // Stop the search once the length limit is reached
            if (maxLength >= lengthThreshold)
                break;
        }
    }
    return maxLength;
}
```
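To see the search in isolation, here is a self-contained sketch of the same logic with standalone, illustrative names (the real method reads class state such as `m_inputBuffer`), run on a tiny buffer:

```csharp
using System;

// Standalone sketch of the longest-repetition search; all names are
// illustrative, the real method uses fields of its class instead.
static int LongestRepetition(byte[] buffer, int index, int level, int maxDist, out uint offset)
{
    const int minLength = 3;
    int lengthThreshold = Math.Min(level, buffer.Length - index);
    offset = 0;
    if (index == 0 || lengthThreshold < minLength) return 0;

    int start = Math.Max(index - maxDist, 0);
    int maxLength = 0;
    for (int left = start; left < index; left++)
    {
        int current = 0;
        while (current < lengthThreshold && buffer[left + current] == buffer[index + current])
            current++;
        if (current > maxLength && current >= minLength)
        {
            maxLength = current;
            offset = (uint)(index - left - 1);
            if (maxLength >= lengthThreshold) break;
        }
    }
    return maxLength;
}

// "abcabc": at index 3, the sequence "abc" repeats the first three bytes.
byte[] data = System.Text.Encoding.ASCII.GetBytes("abcabc");
int len = LongestRepetition(data, 3, 8, 32, out uint off);
Console.WriteLine($"length={len}, offset={off}"); // length=3, offset=2
```

The offset of 2 follows the original convention: it is the distance back to the start of the match, minus one.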
So what changed? Here are the key improvements:
- Simplified logic: we refactored the method to have fewer nested loops and conditional statements, making it easier to understand and debug.
- Meaningful variable names: we renamed variables to better reflect their purpose (e.g., `inputDataIndex` instead of `ind`).
- Documentation and comments: documentation draws the line between code logic and actual useful implementation. Comments help break down big chunks of code.
But what’s even more impressive is that these changes enabled compiler optimizations. By removing the unsafe block and simplifying the logic, we were able to achieve substantial time gains.
Human Readability
One guiding principle of good code is: “a human should have to remember the least amount of information”. Let’s describe its implementation here:
- The first way to do so is to have meaningful variable names, so that you always remember what is stored where.
- Prefer built-in functions to complex operators. When making sure `inputStart` (`ista` previously) gets a non-negative value, it is simpler to think “assign the larger of a difference and 0” than “compare two numbers, then based on the result assign either the difference of the two, or 0”. Even if there is a time difference between the two forms, it would be insignificant for the overall execution time.
- Don’t stack instructions horizontally. It does not make things faster, and it reduces readability. In the original code, the `for` loop ran two assignments, one termination condition, and three increments on each iteration. It turns out you only need to assign one variable, which already counts how many rounds of the loop you have done.
Performance
On the performance side, we went from highly optimized code to something that looks like standard code, so how is it better? Regarding all the assignments and loops we perform, there is no noticeable difference, which is already a win since we did not optimize anything. The code has O(n²) asymptotic complexity, and that is where the execution time comes from.
Regarding memory processing, the whole idea of the code is to find repeated patterns in `m_inputBuffer` (`m_inp` previously). As it is already an array, it is easy to switch from pointer arithmetic to direct index access. Now that we stick to the original array structure, we can remove the `unsafe` keyword from the method. Let’s measure performance.
The original code’s execution time was 80 seconds. With all the changes, the new code’s execution time is 100 seconds, so we perform 25% slower. But the story does not end here: since we removed all direct memory manipulation, we can now compile the code in the Release configuration. It uses different compiler settings than the Debug configuration and applies many optimizations. With this configuration, the final execution time is a staggering 20 s, four times faster than the original code.
As a side note, you can run unsafe code in Release, which the original code did not do. I could have gone for this option, but the loss in maintainability would not have been worth the 5 seconds gained.
The Results: A Multiple Aspect Gain
As you can see from the revised code, the method now benefits from the optimizations the compiler applies in Release builds. This resulted in a significant reduction in execution time, making the code more efficient and scalable.
On top of that, the code is now much easier to understand, which makes it easy to read and maintain. The power of good code practices is not just about following rules; it’s about delivering high-quality software that meets your users’ needs.
When you try to optimize code, start by measuring execution time to find the bottlenecks. Then stay as close to the standards as possible. Your goal is not to invent a new algorithm but to arrange logical blocks together. If you keep in mind the story your code tells, it should naturally emerge from the code structure itself.
So next time you’re faced with a block of cryptic code, take a step back and ask yourself: “Can I make this better?” With a little effort, you might just find yourself writing more efficient, scalable, and maintainable code.