ASIC/FPGA Design and Verification Out Source Services
C++ SIGSEGV debug tip
On this page I discuss two SIGSEGV cases that I recently debugged
successfully.
-
For the last six months I have been working with a C++ verification environment to
verify a complex subsystem that includes an on-chip CPU and its peripherals.
-
From what I have seen on the web, there are basically three types of memory
overrun: in global memory, on the stack, and on the heap. A SIGSEGV in global
memory is usually the easiest to solve. In my case it was the other two.
-
The first, a stack overrun, happened in almost half of the tests.
It always showed the same
characteristics, whether the test was run in regression or in development mode
under the gdb debugger.
-
The point of the crash, as indicated by the debugger, was very close to
the memory overrun. The developer of the code had feared that it might
fail under some circumstances, and had added some C++ assert statements as
protection against that.
-
A careful review of the code revealed the problem. The function in question
prints a nicely formatted debug message of exactly 80 characters. The
assert statements mentioned above were meant to catch any attempt
to use a longer message,
so a buffer of 80 bytes, defined on the stack, was used.
After the formatted message had been created, a prefix with the name of
the calling module was added. This caused the memory overrun
(more than 80 bytes were stored in the buffer),
which went undetected by the assert statements.
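
A minimal sketch of this kind of bug and one way to guard against it. The original message code is not shown here, so the names kLineWidth, fits_line and format_debug_line are hypothetical, and the unsafe pattern in the comment is only my assumption of what such code typically looks like. The point is that the length check has to cover the final string (prefix plus message), not the message alone.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical line width, taken from the 80-character message described above.
constexpr std::size_t kLineWidth = 80;

// Assumed shape of the unsafe pattern (not the original code):
//   char line[80];
//   assert(strlen(message) <= 80);   // checks the message only
//   strcpy(line, prefix);            // the prefix is never counted
//   strcat(line, message);           // may write past the end of line[]

// True when prefix + message fit in one line of kLineWidth characters.
bool fits_line(const std::string& prefix, const std::string& message) {
    return prefix.size() + message.size() <= kLineWidth;
}

// Builds the prefixed line in a stack buffer without ever overrunning it.
// snprintf writes at most sizeof(line) bytes and truncates the rest.
std::string format_debug_line(const std::string& prefix,
                              const std::string& message) {
    char line[kLineWidth + 1];  // 80 characters plus the terminating '\0'
    int needed = std::snprintf(line, sizeof(line), "%s%s",
                               prefix.c_str(), message.c_str());
    if (needed < 0) {
        return std::string();   // snprintf reported an error
    }
    return std::string(line);
}
```

With this sketch, a call such as format_debug_line("top.cpu: ", msg) truncates an overlong line instead of corrupting the stack, and fits_line can be used in an assert to flag the condition during development.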
-
The fix was easy. I had no access to the message-processing code, where the problem
was, but I could control the prefix that was added to the message.
-
The second crash was on the heap.
It crashed in about 33% of the runs during
regression and always passed with no errors in development mode
under the debugger.
The memory overrun took place regardless of
the mode of operation, but in some cases it simply wrote to a freed heap
memory location and therefore did not end in a SIGSEGV crash.
I also used a try/catch exception statement to rule out a
possible failure of new (memory allocation on the heap).
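
A minimal sketch of such a check, assuming a plain byte buffer. The function name try_allocate is hypothetical. In standard C++, a failing new throws std::bad_alloc, so catching it separates an allocation failure from a later overrun of memory that was allocated successfully.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Returns a zero-initialized buffer of `bytes` bytes,
// or nullptr when the heap allocation fails.
std::unique_ptr<std::uint8_t[]> try_allocate(std::size_t bytes) {
    try {
        return std::unique_ptr<std::uint8_t[]>(new std::uint8_t[bytes]());
    } catch (const std::bad_alloc&) {
        // Allocation failed: report it instead of letting the test continue.
        return nullptr;
    }
}
```

If try_allocate never returns nullptr in the failing runs, the allocation itself can be ruled out as the cause.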
-
In such cases, code review can be a good path to follow. The idea is to
stop the debugger as close as possible to the line of code that causes
the memory overrun. I reviewed the test code and identified two
possible points of failure. One was a memory copy command; the other was an
interface to a SystemVerilog DPI function, which returned data to a C++
back-door memory read call.
-
So I disabled the test checker, commented out each suspect statement in
turn, and ran the test 20 times for each. Note that the checker has to be
disabled, because it cannot work while parts of the test are commented out.
When only the memory copy was commented out, the rate of SIGSEGV crashes remained unchanged.
When only the read was commented out, the problem disappeared entirely.
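
Counting crashes over repeated runs can be automated. The following is a rough sketch for a POSIX system, assuming that one test run can be wrapped in a C++ callable. The name count_sigsegv is hypothetical, and in a real regression flow the runs would more likely be launched by a script. Each run executes in a child process, so a crash kills only the child and the parent can inspect how it terminated.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `run_once` in `runs` separate child processes and returns how many
// of them were terminated by SIGSEGV. Returns -1 if fork() fails.
int count_sigsegv(int runs, void (*run_once)()) {
    int crashes = 0;
    for (int i = 0; i < runs; ++i) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1;              // could not create a child process
        }
        if (pid == 0) {
            run_once();             // child: execute one test run
            _exit(0);               // normal exit when no crash occurred
        }
        int status = 0;
        if (waitpid(pid, &status, 0) == pid &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV) {
            ++crashes;              // the child was killed by SIGSEGV
        }
    }
    return crashes;
}
```

Calling count_sigsegv(20, run_test) once with each suspect statement commented out gives the crash rates to compare.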
-
This time I stopped the debugger 10 lines before the offending code
and found that the C++ side expected 64 bytes while the SystemVerilog DPI function
returned 128 bytes, causing unpredictable results each time the program was run.
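
A size mismatch like this can be caught with an explicit capacity check at the boundary. The sketch below uses plain C++ only: the name checked_copy is hypothetical, and the std::vector stands in for the data returned across the DPI interface, whose real signature is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `src` into `dst` only when it fits into `dst_capacity` bytes.
// Returns false, without writing anything, on a size mismatch.
bool checked_copy(std::uint8_t* dst, std::size_t dst_capacity,
                  const std::vector<std::uint8_t>& src) {
    if (src.size() > dst_capacity) {
        return false;               // e.g. 128 bytes offered, 64 expected
    }
    if (!src.empty()) {
        std::memcpy(dst, src.data(), src.size());
    }
    return true;
}
```

With a 64-byte destination, a 128-byte source is rejected up front rather than silently overwriting the neighboring heap memory.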