SIGILL and how to fix It.¶
When working on the project’s cross-platformability capabilities, I ran into one of those few errors that programmers hate to see:
Illegal Instruction (core dumped)
This, for those who do not know is a CPU signal that something has gone horribly
wrong. It is also referred to as SIGILL
, and essentially means that the CPU
was given an assembly opcode that it doesn’t know how to interpret properly.
Another thing about this error that may make those familiar with C/C++ shiver is
that it is eerily similar to a different error:
Segmentation Fault (core dumped)
Which it should, as the two errors usually stem from the same problem: memory
corruption. What usually happens to cause these types of errors is that one or
more parts of the code end up reading or writing from a part of memory that they
shouldn’t have. For Segmentation Faults (SIGSEGV
), this is when the program
attempts to read or write to memory outside of the program’s memory space.
With SIGILL
, this generally means that the program has had its stack
corrupted, usually to such an extent that the return address is pointing to
somewhere completely random in memory. When something like this happens, it
would result in either a SIGSEGV
usually, or a SIGILL
if it somehow managed
to jump into memory that your program owns before encountering some data value
that wouldn’t be a valid instruction.
The way to solve these is to use some form of memory debugger, such as Valgrind, to have it check your code for memory leaks and corruptions. This will be able to find any forms of stack corruption that may exist in your code, allowing you to rewrite the code to fix the issue.
It’s not that simple¶
Of course, there is a catch here, I wouldn’t be writing this if there wasn’t.
Notice how earlier I wrote that SIGILL
is raised when an assembly opcode is
encountered that your CPU doesn’t know how to interpret. I never said anything
about it being a memory issue. The issue is usually encountered when a memory
issue is occurring, but that doesn’t mean it has to always be like that.
When Valgrind ended up failing to find any memory problems, I decided to step through it in GDB manually, trying to find out which function specifically it ended up failing in. However, that of course ended in failure as it again wasn’t seeming to fail on any particular line of code, just a generic failure when it finished with the else-clause in this if statement:
if(if_stream)
{
std::string file_as_string(std::istreambuf_iterator<char>(if_stream),
std::istreambuf_iterator<char>());
return jsonSerialization::streamToType(file_as_string);
}
else
{
std::cerr << "Failed to open file at: " << l_path << std::endl;
}
That is to say, according to GDB the program was consistently crashing just
after it left the final }
in the else-clause. With no other option left, I
decided to just step through the program at the assembly level to figure out
exactly which assembly opcode it was crashing on. Normally this wouldn’t be a
good idea since C++, while fairly high-level and simple on the surface, gets
converted to some very nasty assembly through the use of compiler optimizations
and various quirks of STL implementations. However, since the scope was limited,
there wouldn’t be too many instructions to have to step over before I got to
the crash. After a bit of stepping instruction by instruction, I finally came
upon the bad opcode: ud2
. After talking with another person on the team, we
were able to find out that ud2
is an Intel specific instruction meaning
Undefined Instruction. Esentially, the
reason why valgrind was not finding any memory corruption/stack overwrite errors
is because there were none. Clang was placing instructions into the code that
were supposed to raise this signal. But why would it do that? The reason,
becomes fairly obvious once you take a look at the entirety of the function that
was failing:
typeRT persistenceSystem::readFromFile(const std::filesystem::path p_folder,
const std::filesystem::path p_file_name)
{
std::filesystem::path l_path = p_folder / p_file_name;
std::ifstream if_stream(l_path);
if(if_stream)
{
std::string file_as_string(std::istreambuf_iterator<char>(if_stream),
std::istreambuf_iterator<char>());
return jsonSerialization::streamToType(file_as_string);
}
else
{
std::cerr << "Failed to open file at: " << l_path << std::endl;
}
}
If you haven’t spotted it yet, there is a conditional branch statement here,
however a value is only returned in one of those branches. This, according to
the C++ Standard, is undefined behavior, and the compiler is free to do whatever
it wants here. In this case Clang decided to insert a ud2
instruction, which
means that whenever the code was given a file path that it didn’t know existed,
it printed that the file could not be opened and then crashed. The easy fix to
this problem is to simply change the failure case to a throw
like so:
if(if_stream)
{
std::string file_as_string(std::istreambuf_iterator<char>(if_stream),
std::istreambuf_iterator<char>());
return jsonSerialization::streamToType(file_as_string);
}
std::cerr << "Failed to open file at: " << l_path << std::endl;
throw std::filesystem::filesystem_error("Could not open file", l_path,
std::make_error_code(std::errc::no_such_file_or_directory));
}