Let’s write a CodeXM checker (it’s not rocket science!)

All systems are go. We have liftoff. Let’s write some CodeXM.

If you’ve read the previous two posts, you should come away with a sense that writing a CodeXM checker isn’t rocket science. Let’s put that to the test.

In order to get this hands-on experience, you should have access to an installed version of Coverity® and be able to analyze a sandbox codebase. Nothing big, mind you, just something to convince yourself the checkers we’re writing do what they’re supposed to. Yeah, it’s always a good idea to test your code before putting it into production, and CodeXM checkers are no exception.

Our assignment: Enforce naming conventions

From the CodeXM capability standpoint, enforcing naming conventions is a trivial problem. It’s also not something we’ve put into Coverity, because there are far too many naming conventions out there, and traditionally Coverity has gone after smoking-gun-grade bugs. Tuning a tool to enforce your specific rules is more of a customization—but one that CodeXM can help with. This post will take you through the first few steps, and we’ll continue this in a subsequent post.

For the sake of illustration, we’ll invent a naming convention, realizing full well that yours—if you have one—is very likely different. If you do have a convention, you should—with a few tweaks—be able to adapt these examples into your own enforcement regimen. But even if naming conventions aren’t your thing, the examples here should still be good illustrations of how to write a CodeXM checker. With a few additions of your own, anything here could be the basis for an entirely different checker you might be entertaining.

Let’s get started.

First up: Local variable names

Local variable names are almost everywhere, and examining each of them is one of the simplest things you can do with a CodeXM checker. The query part would look like this:

for decl in globalset allFunctionCode 
                      % variableDeclaration {
                          .variable.identifier != myCamelCaseName
                        }

As you get familiar with CodeXM, you’ll realize this query looks for all variable declarations in your code that are not in camelCase form. myCamelCaseName is a pattern we’ll define shortly; it will match strings (such as the local variable’s identifier) when those strings conform to that pattern. Of course, in this case, we’re looking for nonconformance, so we use the != operator.

Surround this query with the checker’s usual declarative matter, and you have the following:

include `C/C++`;

checker {
  name = "NAMING_VIOLATION";
  reports = for decl in globalset allFunctionCode 
                                  % variableDeclaration {
                                      .variable.identifier != myCamelCaseName
                                    } :
    {
      events = [
        { 
          description = "Variable "
                      + decl
                      + " does not follow naming convention for local variables.";
          location = decl.location;
        }
      ];
    };
};

Note: For these posts, I’ll write checkers that analyze C or C++ code. CodeXM supports more languages, and adapting a CodeXM checker to recognize another language typically is as simple as specifying that language in the include directive instead of `C/C++`. But since I haven’t tested my code against those languages, there may be nuanced differences in their behavior. If you try it, your mileage may vary. That’s the lion’s share of productizing, by the way: The idea is usually the easy part, but making it work for everybody everywhere takes most of the effort. Fortunately, if you’re writing your own checker for your own project, you don’t have to satisfy anything more than your own constraints.

Next: Function names

We’ve seen how CodeXM can examine the function bodies looking for things such as variable declarations. Of course, CodeXM can do more. Let’s continue to use the same for loop logic, but with a different set. The loop would look like this:

for func in globalset allFunctionDefinitions % nonconformingFunction

where we are looking for the set of all function definitions in your code (you guessed that, right?) that are nonconforming functions, as determined by the pattern nonconformingFunction.

So what’s a nonconformingFunction? Look here:

pattern nonconformingFunction {
  functionDefinition {
    .functionSymbol.identifier != myProperCaseName;
  }
};

To recap: allFunctionDefinitions is the set of—wait for it—all functionDefinitions discovered in your code. Those that are nonconforming are, naturally enough, the functionDefinitions where the identifier is not a proper-case name.

Sorry. I mean a ProperCase name.

Another note: There are those that call ProperCase “camelCase,” and that’s not entirely wrong. But I make a fine and useful distinction here. You can think of camelCase as having the hump in the middle (but starting with a lowercase). By contrast, ProperCase simply starts, like proper nouns and other names, with a capital. Yeah, some call this TitleCase, too, but I’ve got to stick with one name. Demonstrations are simpler that way…

Adding exemptions

Thus we have a way of enforcing that function names must start with a capital letter but can have a mixture of lowercase, uppercase, and digits—but no underscores.

“Wait! Wait! Wait!” I hear you shout. “That means main is nonconforming, right?”

Yes, in this illustrative example, it is. Let’s just say I planned that in order to motivate the next feature: allowing exemptions to the function naming rule.

for func in globalset allFunctionDefinitions % nonconformingFunction
    where ! func matches functionNameExemptions

Now nonconforming functions have a get-out-of-jail card, if they match the exemption list.

Putting it all together

Let’s go to the big reveal so you can see everything in context. By the way, you can have multiple checkers in one CodeXM file, so what you see below can be in the same file as the checker illustrated above. In fact, as you’ll notice, the different checkers can all have the same name (but they don’t have to). This is useful if you’re cracking a particularly tough nut where the choice is to have complicated logic in a single CodeXM checker or two sibling checkers each finding its own nuanced variant of the problem. It’s all up to you how you want to package and report things.

pattern nonconformingFunction {
  functionDefinition {
    .functionSymbol.identifier != myProperCaseName;
  }
};

pattern functionNameExemptions {
  functionDefinition {
    .functionSymbol.identifier == Regex("^(main|test_?.*)$")
  }
};

checker {
  name = "NAMING_VIOLATION";
  reports = for func in globalset allFunctionDefinitions % nonconformingFunction
                where ! func matches functionNameExemptions
    :
    {
      events = [
        { description = "Function "
                      + (func.functionSymbol.identifier ?? "shown here")
                      + " does not follow naming convention for functions.";
          location = func.location;
        }
      ];
    };
};

We’ve already examined the nonconforming function pattern. The pattern that identifies exempt function names is structurally very similar, but it is different in that it uses a regular expression (a.k.a. regex) that…well, if you don’t know how to read regular expressions (and many don’t), this might need a little bit of explaining. The short-and-sweet is that main is allowed, as is any function whose name starts with the lowercase letters test, with or without an underscore after them: testFeatureX and test_FeatureX and so on. (Since the trailing .* matches anything, the optional underscore is mostly there for the human reader; a name like tester is exempt too.) Of course, exemptions don’t need to be name-based. You could grant exemptions based on whether the function is found in a system header, or is inline, or what have you; all the information the compiler has on the given function’s definition is in the functionDefinition structure, and you can adapt the pattern accordingly.
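If you want to convince yourself which names the exemption regex lets through, you can try it outside of Coverity. Here is a minimal sketch in Python (not CodeXM); the helper name is_exempt_name is made up for illustration, and I'm assuming CodeXM's Regex treats this simple pattern the same way Python's re module does.

```python
import re

# Same regular expression text as in functionNameExemptions.
# Assumption: Python's re dialect agrees with CodeXM's Regex for this pattern.
EXEMPT = re.compile(r"^(main|test_?.*)$")


def is_exempt_name(name: str) -> bool:
    """Hypothetical helper: True when the name matches the exemption regex."""
    return EXEMPT.search(name) is not None


# Print each sample name next to whether it would be exempt.
for name in ["main", "testFeatureX", "test_FeatureX", "tester",
             "TestFeatureX", "mainLoop"]:
    print(name, is_exempt_name(name))
```

TestFeatureX is not exempt because the regex is case-sensitive, but that hardly matters: it already passes the ProperCase rule on its own.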

Adding more exemptions

If we’re just analyzing C, we’re done with functions. But for C++, we need to handle a few more circumstances.

In fact, in C++, the constructor and destructor names are predetermined (by the name of the class they’re in). So they really shouldn’t be subject to the convention on function names. Also, because of the tilde (~), the destructor’s name almost certainly will not pass any naming convention anyway. So both constructors and destructors need to be exempted.

pattern functionNameExemptions {
  functionDefinition {
    .functionSymbol.identifier == Regex("^(main|test_?.*)$")
  }
| functionDefinition {
    .functionSymbol.isConstructor == true;
  }
| functionDefinition {
    .functionSymbol.isDestructor == true;
  }
};

Let’s look at that more closely. The pattern will match (meaning an exemption is granted) in any of these three circumstances:

  1. The function’s identifier (name) matches a regular expression pattern.
  2. The function is a constructor.
  3. The function is a destructor.

Cool.
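The three-way "or" above can be modeled in a few lines of ordinary code. This is only a sketch in Python of the decision the pattern makes, not of how CodeXM evaluates it; FunctionInfo and its fields are hypothetical stand-ins for the parts of functionDefinition the pattern looks at.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Same regular expression text as in functionNameExemptions.
EXEMPT_NAME = re.compile(r"^(main|test_?.*)$")


@dataclass
class FunctionInfo:
    """Hypothetical stand-in for the fields the pattern inspects."""
    identifier: Optional[str]          # may be missing, like a null identifier
    is_constructor: bool = False
    is_destructor: bool = False


def is_exempt(func: FunctionInfo) -> bool:
    """True if any one of the three alternatives matches."""
    name_matches = (func.identifier is not None
                    and EXEMPT_NAME.search(func.identifier) is not None)
    return name_matches or func.is_constructor or func.is_destructor


print(is_exempt(FunctionInfo("main")))                         # by name
print(is_exempt(FunctionInfo("Widget", is_constructor=True)))  # constructor
print(is_exempt(FunctionInfo("~Widget", is_destructor=True)))  # destructor
print(is_exempt(FunctionInfo("computeTotal")))                 # no exemption
```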

Another refinement

Now, one more wrinkle: We’ve defined a pattern that stipulates plain stand-alone functions need to be ProperCase. But let’s say that class members need to be camelCase (though static members can remain ProperCase; they don’t need a this in front of them, and the capitalization helps us remember that). That means our definition of a nonconforming function changes slightly:

pattern nonconformingFunction {
  functionDefinition {
    .functionSymbol.hasThis == false;               // Static methods, plain funcs
    .functionSymbol.identifier != myProperCaseName; // without ProperCase names
  }
| functionDefinition {
    .functionSymbol.hasThis == true;                // Class methods
    .functionSymbol.identifier != myCamelCaseName;  // without camelCase names
  }
};
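To see the two-branch rule in action without running an analysis, here is a rough Python model of it. The function is_nonconforming and its has_this parameter are illustrative names of my own; the regexes are the ones this post uses for myCamelCaseName and myProperCaseName, and the sketch assumes the identifier is always present.

```python
import re

# Regex text taken from this post's myCamelCaseName and myProperCaseName.
CAMEL = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
PROPER = re.compile(r"^[A-Z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def is_nonconforming(identifier: str, has_this: bool) -> bool:
    """Model of the rule: methods with a 'this' need camelCase,
    everything else (plain functions, static methods) needs ProperCase."""
    required = CAMEL if has_this else PROPER
    return required.search(identifier) is None


print(is_nonconforming("ComputeTotal", has_this=False))  # plain function, ProperCase
print(is_nonconforming("computeTotal", has_this=False))  # plain function, camelCase
print(is_nonconforming("computeTotal", has_this=True))   # class method, camelCase
print(is_nonconforming("ComputeTotal", has_this=True))   # class method, ProperCase
```

Only the second and fourth calls describe violations; the same name can be fine or not depending on whether the function has a this.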

One oddity you may have noticed back in the checker’s description string: A function’s identifier can be null, so CodeXM’s strict typing requires us to use the null-coalescing operator ?? to handle the case when a function has no name. That’s unlikely to happen in our case, but CodeXM wants that eventuality handled. CodeXM won’t crash with a null pointer exception, but it will balk and make you write safe code if you forget.

Tying up loose ends

We’re almost done. Remember those patterns myProperCaseName and myCamelCaseName? It should come as no surprise that they’re just regular expressions too.

pattern myCamelCaseName {
  Regex("^[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$") 
};

pattern myProperCaseName {
  Regex("^[A-Z][a-z0-9]*([A-Z][a-z0-9]*)*$") 
};

If these aren’t your idea of what camelCase or ProperCase should be, you can—of course—modify them in your own CodeXM checker.
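Before you tweak the regular expressions, it helps to check what they accept as written. A quick sketch in Python (again, not CodeXM, and assuming the two regex dialects agree on patterns this simple); classify is a hypothetical helper and the sample names are invented.

```python
import re

# Regex text copied from myCamelCaseName and myProperCaseName above.
CAMEL = re.compile(r"^[a-z][a-z0-9]*([A-Z][a-z0-9]*)*$")
PROPER = re.compile(r"^[A-Z][a-z0-9]*([A-Z][a-z0-9]*)*$")


def classify(name: str) -> str:
    """Hypothetical helper: report which convention, if any, a name follows."""
    if CAMEL.search(name):
        return "camelCase"
    if PROPER.search(name):
        return "ProperCase"
    return "neither"


# Invented sample identifiers covering the interesting edge cases.
for name in ["count", "loopIndex2", "parseXML", "LoopIndex",
             "HTTPServer", "loop_index", "_count", "2fast"]:
    print(f"{name}: {classify(name)}")
```

Two things worth noticing: runs of capitals such as parseXML and HTTPServer are accepted, and anything containing an underscore or starting with a digit is rejected. If your convention disagrees on either point, the regex is the place to change it.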

I’ve given you a fair bit to chew here. But stay tuned for the next installment, where we’ll explore how to examine more definitions in your code.




*** This is a Security Bloggers Network syndicated blog from Software Integrity authored by Thomas M. Tuerke. Read the original post at: https://www.synopsys.com/blogs/software-security/write-codexm-checker/