Plus Plus Operator
Abstract
This entry in Min Blogg deals with the curious behavior of the increment operator. In particular, in lines like a[c++] = b[c]; I found inconsistencies that depended on both the language (C or C#) and the compiler (cl, gcc, csc or mcs).
WARNING: In C, the behavior of "a[c++] = b[c];" is undefined, because c is modified and also read elsewhere in the same statement with no sequence point in between. In C#, on the other hand, it is well-defined: operands are evaluated from left to right.
Headlines are:
- ++ = Increment
- Pointer Arithmetics
- Use of increment on regular integers
- The Problem
- The solution
- Compilation
- Observations
- Conclusions
- See Also
1. ++ = Increment
Some time in ancient history (at the latest when the first version of C was released in 1972) the increment operator was invented. In general, this operator is used to add one to something. For example, if we increment an integer, its value is increased by one:
    #include <stdio.h>

    int main(void)
    {
        int c = 5;
        c++;
        printf("c is now %d\n", c); // prints "c is now 6"
        return 0;
    }
Also note that the "inverse" of increment is called decrement and is written --.
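Since the rest of this entry depends on the difference between c++ and ++c, here is a minimal sketch of what each expression evaluates to. The function names post_increment and pre_increment are hypothetical helpers made up for this illustration.

```c
/* Hypothetical helper: returns the value of the expression c++
   and stores the value c has afterwards in *after. */
int post_increment(int c, int *after)
{
    int value = c++;   /* value receives the old c, then c becomes c + 1 */
    *after = c;
    return value;
}

/* Hypothetical helper: returns the value of the expression ++c
   and stores the value c has afterwards in *after. */
int pre_increment(int c, int *after)
{
    int value = ++c;   /* c becomes c + 1 first, value receives the new c */
    *after = c;
    return value;
}
```

Called with c = 5, post_increment returns 5 and pre_increment returns 6, while c ends up as 6 in both cases.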
All this would be fairly pointless if it weren't for pointers, arrays and pointer arithmetic.
2. Pointer Arithmetics
Pointers have some form of built-in intelligence (remember that we are talking about the computer science of the 1970s; I do not mean the GPS, MP3, self-destruct-button kind of intelligence, I mean a cruder form). For example, if we have an array of some complex data type and assign a pointer to the start of it, then the increment operator (++) will make the pointer point to the next item in the array, no matter how large each item is.
    #include <stdio.h>

    int main(void)
    {
        int arr[10];
        // c points at the item in position 4 (the fifth item)
        int* c = &arr[4];
        // sets the item in position 4 to 1337, then c points at the next item
        *c++ = 1337;
        return 0;
    }
Also note that the increment can be done in another way. The comments in the next example explain the difference:
    #include <stdio.h>

    int main(void)
    {
        int arr[10];
        // d points at the item in position 4 (the fifth item)
        int* d = &arr[4];
        // d first points at the next item, then that item is set to 1337
        *++d = 1337;
        return 0;
    }
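To make the difference between the two forms visible, the following sketch wraps each of them in a small function that reports where the pointer ends up. The names store_post and store_pre are hypothetical and only exist for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: writes value using *p++, starting at arr[start].
   Returns the index the pointer refers to afterwards. */
ptrdiff_t store_post(int *arr, int start, int value)
{
    int *p = &arr[start];
    *p++ = value;      /* writes arr[start], then p moves to arr[start + 1] */
    return p - arr;
}

/* Hypothetical helper: writes value using *++p, starting at arr[start].
   Returns the index the pointer refers to afterwards. */
ptrdiff_t store_pre(int *arr, int start, int value)
{
    int *p = &arr[start];
    *++p = value;      /* p moves to arr[start + 1] first, then writes there */
    return p - arr;
}
```

With start = 4, both functions leave the pointer at position 5, but store_post writes 1337 to position 4 while store_pre writes it to position 5.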
Not all languages have built-in support for pointers. C#, for example, only allows them under certain conditions (in code marked unsafe). Python has no increment operator, but can be considered to use pointers anyway (from my perspective).
3. Use of increment on regular integers
Today I was confronted with the following situation: I was given a very long input vector and a quite short output vector. I was to put N zeros at the beginning of the output vector, then fill the remaining positions with values from the input vector. Pretty much like this:
    double[] A = new double[10] { 12, 23, 24, 34, 45, 35, 56, 56, 57, 53 };
    double[] B = new double[6];
    int n = 3;
    for (int i = 0; i < n; i++)
        B[i] = 0;
    for (int i = n; i < B.Length; i++)
        B[i] = A[i];
    // values of the arrays
    // A: 12 23 24 34 45 35 56 56 57 53
    // B: 0 0 0 34 45 35
Since I had a number of variables and wanted as few silly counters as possible, I tried to keep the code minimal, pretty much like this:
    int c = 0;
    for (int i = 0; i < n; i++) {
        B[c] = 0;
        c++;
    }
    for (int i = n; i < B.Length; i++) {
        B[c] = A[c];
        c++;
    }
The advantage of this is not obvious in this example. But imagine that the loops require a lot of computation and so on to determine the number of items to set in B. Also: perhaps we need to call other functions where we pass c as a parameter to know where in the arrays to insert values. Anyway: the last loop annoyed me and I wanted to lower the number of lines in it from two to one, to something like this:
for (int i = n; i < B.Length; i++) { B[c] = A[c++]; }
Please note that the best way to do exactly what I mean in the above minimal loop is something like this:
for (int i = n; i < B.Length; i++, c++) { B[c] = A[c]; }
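For comparison, here is a sketch of the same idea in C, where it matters even more that c is only read inside the loop body and incremented separately. The function name pad_and_copy and its parameters are hypothetical, and the sketch assumes that n is not larger than the output length and that the input is at least as long as the output.

```c
#include <stddef.h>

/* Hypothetical helper: writes n zeros to the start of out, then fills the
   rest of out from in, using the same index c for both arrays.
   Assumes n <= out_len and that in holds at least out_len items. */
void pad_and_copy(const double *in, double *out, size_t out_len, size_t n)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++, c++)
        out[c] = 0.0;
    for (size_t i = n; i < out_len; i++, c++)
        out[c] = in[c];   /* c is only read here; the increment is separate */
}
```

Called with the array A from above, an output array of six items and n = 3, this fills the output with 0 0 0 34 45 35, the same values as B in the C# example.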
4. The Problem
The thing to think about here is what the line B[c] = A[c++]; is decomposed into. The questions that ran through my mind were:
- Since we set a value in B[c], we must first read A[c++]. Since we read A[c++], c will be incremented before the value of c is used in B, right?
- Since I read code from left to right, the compiler cannot execute things the other way around, right?
- Has Polish notation got anything to do with this?
- Is this compiler-specific (I remember a FORTRAN lecture about the value of i after a loop)?
5. The solution
I created some simple code files that test many possible cases.
They both first contain some declarations creating eight arrays to be filled with values from a ninth array. They also contain this horrible for-loop:
    // c1-c8 are all 1 before this loop
    for (i = 2; i < 5; i++, c3++, ++c7) {
        b1[c1] = a[c1++];
        b2[c2++] = a[c2];
        b3[c3] = a[c3];
        b4[c4++] = a[c4++];
        b5[c5] = a[++c5];
        b6[++c6] = a[c6];
        b7[c7] = a[c7];
        b8[++c8] = a[++c8];
    }
6. Compilation
I compiled using:
- cl, to a native executable from the C (ANSI) file.
- gcc, to the same.
- the built-in csc compiler in .NET 2.x (I later tested csc from 1.1 and the results are the same). mcs from the Mono platform also produces the same result. (Perhaps since they all use Common Intermediate Language (CIL)?)
Output from the CIL versions
    b1: 0 2 3 4 0 0 0 0 0 0
    b2: 0 3 4 5 0 0 0 0 0 0
    b3: 0 2 3 4 0 0 0 0 0 0
    b4: 0 3 0 5 0 7 0 0 0 0
    b5: 0 3 4 5 0 0 0 0 0 0
    b6: 0 0 3 4 5 0 0 0 0 0
    b7: 0 2 3 4 0 0 0 0 0 0
    b8: 0 0 4 0 6 0 8 0 0 0
Output from the cl version
    b1: 0 2 3 4 0 0 0 0 0 0
    b2: 0 2 3 4 0 0 0 0 0 0
    b3: 0 2 3 4 0 0 0 0 0 0
    b4: 0 2 0 4 0 6 0 0 0 0
    b5: 0 0 3 4 5 0 0 0 0 0
    b6: 0 0 3 4 5 0 0 0 0 0
    b7: 0 2 3 4 0 0 0 0 0 0
    b8: 0 0 0 4 0 6 0 8 0 0
Output from the gcc version
    b1: 0 2 3 4 0 0 0 0 0 0
    b2: 0 2 3 4 0 0 0 0 0 0
    b3: 0 2 3 4 0 0 0 0 0 0
    b4: 0 2 0 4 0 6 0 0 0 0
    b5: 0 3 4 5 0 0 0 0 0 0
    b6: 0 0 3 4 5 0 0 0 0 0
    b7: 0 2 3 4 0 0 0 0 0 0
    b8: 0 0 4 0 6 0 8 0 0 0
7. Observations
- Lines that gave identical behavior in all versions:
  - b1[c1] = a[c1++];
  - b3[c3] = a[c3];
  - b6[++c6] = a[c6];
  - b7[c7] = a[c7];
- Lines that gave different behavior:
  - b2[c2++] = a[c2];
  - b4[c4++] = a[c4++];
  - b5[c5] = a[++c5];
  - b8[++c8] = a[++c8];
8. Conclusions
My guess here is that the increment operator is not well defined enough.
- If an increment is last in a line, it can be considered to be performed after the line, e.g.: b1[c1] = a[c1++];
- If an increment is first in a line, it can be considered to be performed before the line, e.g.: b6[++c6] = a[c6];
- If the increment is standalone (like c3 and c7) there is no problem.
- More than one increment on a line can be performed before it, after it and/or in the middle of it (I guess).
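For the lines with two increments, one way to remove the guesswork in C is to give each increment its own statement. The sketch below spells out b4[c4++] = a[c4++]; and b8[++c8] = a[++c8]; in strict left-to-right order, which is the order C# uses. The function names are hypothetical, and the sketch assumes that a holds the values 1 to 10 and that the b arrays start out zeroed, which is consistent with the output tables above but not stated in the test files.

```c
/* Hypothetical helper: b4[c4++] = a[c4++]; spelled out left to right.
   Assumes both arrays have 10 items and the counter starts at 1. */
void fill_b4(const int *a, int *b)
{
    int c4 = 1;
    for (int i = 2; i < 5; i++) {
        int dst = c4++;   /* index used on the left-hand side */
        int src = c4++;   /* index used on the right-hand side */
        b[dst] = a[src];
    }
}

/* Hypothetical helper: b8[++c8] = a[++c8]; spelled out left to right. */
void fill_b8(const int *a, int *b)
{
    int c8 = 1;
    for (int i = 2; i < 5; i++) {
        int dst = ++c8;   /* index used on the left-hand side */
        int src = ++c8;   /* index used on the right-hand side */
        b[dst] = a[src];
    }
}
```

Under those assumptions, fill_b4 produces 0 3 0 5 0 7 0 0 0 0 and fill_b8 produces 0 0 4 0 6 0 8 0 0 0, the same rows as in the output from the CIL versions.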
My interpretation, checked against the output above, is that b2[c2++] = a[c2]; in C# is converted to:
    int i = c2; c2++; b2[i] = a[c2];
and in C (ANSI), with the compilers tested here, to:
    int p = a[c2]; b2[c2] = p; c2++;
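The two orderings can be checked against the output tables by writing each one out as well-defined C. In the C# output the b2 row is 0 3 4 5, which corresponds to taking the left index first, then incrementing, then reading a. In the cl and gcc output the row is 0 2 3 4, which corresponds to incrementing after the whole assignment. The function names below are hypothetical, and the sketch assumes that a holds the values 1 to 10 and that b2 starts out zeroed.

```c
/* Hypothetical helper: the order seen in the C# output.
   The left index is taken, c2 is incremented, then a[c2] is read. */
void fill_b2_left_to_right(const int *a, int *b)
{
    int c2 = 1;
    for (int i = 2; i < 5; i++) {
        int dst = c2++;
        b[dst] = a[c2];
    }
}

/* Hypothetical helper: the order seen in the cl and gcc output.
   The whole assignment is done first, then c2 is incremented. */
void fill_b2_increment_last(const int *a, int *b)
{
    int c2 = 1;
    for (int i = 2; i < 5; i++) {
        b[c2] = a[c2];
        c2++;
    }
}
```

Because C leaves the original statement undefined, the second function only describes what these particular compilers happened to do; another compiler or optimization level could behave differently.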
I used Reflector on the C# version and the loop there looked like this:
    while (i < 5) {
        b1[c1] = a[c1++];
        b2[c2++] = a[c2];
        b3[c3] = a[c3];
        b4[c4++] = a[c4++];
        b5[c5] = a[++c5];
        b6[++c6] = a[c6];
        b7[c7] = a[c7];
        b8[++c8] = a[++c8];
        i++;
        c3++;
        c7++;
    }
so that does not help much. Perhaps some details might be found by reading the CLI, but I don't understand that yet.
9. See Also
- [3] English Wikipedia on increment
- [4] CFAQ (in particular [5])
- [6] A thread I posted in comp.lang.c
- [7] A thread I posted in microsoft.public.dotnet.languages.csharp