Coding Style Guide

1. Preface

Many Software Engineers consider themselves craftsmen, their code a piece of art. Unfortunately, as with all other artforms, tastes differ. And, as opposed to most other artforms, other people will use your artwork as a tool - so it must be not only artful, but practical at the same time. Never forget that you are both an artist and an engineer. As an artist and craftsman, you should take pride in your work. As an engineer you should feel the responsibility for precision and correctness.

There is a third thing the work of a Software Engineer should be: a workbench. Unlikely as it might seem when you begin to work, there will come the time when you will require help in your project. There will come the time when you will abandon your project, returning at a later time, passing the project to somebody else, or just leaving it where someone might or might not find it by chance. Make sure that, whenever you finish your working day, your workbench is organized and tidy, even if you plan to return to it the next day. There is always a chance that you will not.

Make sure that somebody who never saw your project before can step up to the workbench, find the tools, find the places where additional work is required, and understand what you intended and how you've been working to that goal. Consider that each day at your workbench might be your last. This requires discipline, documentation, and a good coding style - which limits your "artistic freedom".

You have to stick to rules, you have to work within limits. You cannot just step up and play a solo, however beautiful it may be, if you want your work to persist. Individual greatness of a Software Engineer can usually be measured by how well others can follow up on his work.

Some people claim specific techniques described in this document to be impractical, or to be "not the way real programmers do it". Well, I have seen some of the code "real programmers" write, and more often than not, it has been a real pain to work on and with it. Sometimes this even leads to arguments about who is a "real programmer" or not, based on how witty they are in telling each other why the rules have to be broken in this or that case. Generally, I think that statements like "the source is the only real documentation" or "real programmers do not document" are really short-sighted.

A Coding Style Guide is just that, a guide towards good coding style. It is not a set of firm laws, but neither should it be carelessly ignored. It is aimed at making the writing, maintaining, understanding, and using of code easier for everybody. It also reduces the probability of a great number of common (and thus all the more unnecessary) errors, and it makes hunting down the remaining ones easier.

You will find three words scattered liberally across the reasonings given in this document: readability, consistency, explicitness. Consider them the credo of this Style Guide. The appendix offers skeleton files you can use to copy & paste your first couple of source files.

This document mentions Nassi-Shneiderman diagrams (NSD). Those who have not encountered them yet (e.g. as a part of formal CS training) are strongly encouraged to get familiar with them, which should not take longer than a couple of minutes. NSDs visualize control flow, which can be a good guide for clean code structure.

This is the annotated version of the Coding Style Guide. Wherever applicable, a reason is given for a rule in the same format as this paragraph. Some rules are given redundantly since they logically belong to more than one section. For everyday reference, we will provide an abbreviated version just listing the rules themselves.


2. Whitespacing

2.1 Tabulators

Tabulators should be converted to four spaces. Use four spaces for all standard indents. There should be no tab characters in a source file.

Historically, a tabulator is a step to the next column that is a multiple of eight. For various reasons it became popular to change this setting for on-screen representation. The resulting confusion of screen tabs, file tabs and their respective width calls for a simple solution: Not using tab characters in source code at all, and defining what a tab keypress should result in - four spaces.

2.2 Line Breaks

A source code line should not be longer than 76 characters plus newline.

The example files contain several comment lines that are exactly 76 characters long and can serve as an optical reminder. While some might argue our screen resolutions have grown beyond such limitations, terminal windows, e-mail client standard settings, and printout paper formats have not. Use the saved screen real estate for additional windows and be happy.

A line break is '\n' (Unix style), not '\r', "\n\r", or "\r\n". Double-check that your editor is configured correctly.

Make every attempt not to break log and error messages.

2.2.1 Header Files

A blank line is inserted prior to every function declaration (rather, its strategic comment). A line break is inserted after the return type declaration / in front of the function name.

This results in a function declaration occupying two lines, separated by a blank line from previous and subsequent declarations, with all function names left aligned.

In struct or enum definitions, each element is defined on an individual line.

For readability, and to encourage tactical comments.

2.2.2 Implementation Files

One statement per line only. Positively no comma operator.

This means, no int* i, j;. This, together with rule 2.4, helps to avoid mistakes with pointer type declarations.

If breaking compound statements to fit the width of 76 characters per line, use the logical grouping of the statement to find good points for breaking it into multiple lines. Broken lines should end with an operator, preferably the comma of a parameter list. Subsequent lines of the same statement should be indented twice (minimum). Consider using additional spaces to reflect the logical grouping. Do not be afraid to "waste" additional lines if it helps readability.

Try to avoid the use of temporary variables unless for significantly increasing readability of otherwise too complex statements.

Not because temporary variables are costly - modern compilers should optimize them away pretty well - but because they introduce identifiers into the namespace where an anonymous value would do. Use common sense, and comment on temporaries if you use them!

Place a line break directly in front of a function name; qualifiers and the return type of a function thus reside on a line of their own.
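
A minimal sketch of how these line-breaking rules might look in an implementation file. The function formatAddress() and its parameters are hypothetical and serve only as an illustration; they are not part of any project library.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical example function. The return type resides on a line of
// its own; broken lines end with a comma, and the continuation lines
// are indented twice.
int
formatAddress(char* Buffer, size_t BufferSize, const char* Street,
        const char* City, const char* PostalCode)
{
    return snprintf(Buffer, BufferSize, "%s, %s %s",
            Street, PostalCode, City);
}
```

Note that the format string is kept on one line, in the spirit of the rule not to break log and error messages.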

2.3 Brackets

2.3.1 Braces - {...}

Use them in all cases of for, if, while, or case, even for one-line or empty blocks. In case of an empty block, make the emptiness explicit by adding an // EMPTY special comment.

This makes it clear that the block is empty on purpose, not by mistake.

In all cases, the opening brace is placed on a line of its own, at the same indentation level as the introducing statement, and the block indented one level beyond that.

It is consistent, and you do not have to remember when the opening brace is on the same line and when it is not.

The closing brace is on a line of its own, at the same indentation level as the opening brace.

Having the opening and closing brace on the same indentation level also makes it easier to match them up visually. It would be beneficial to have the conditional of the introducing statement repeated as a comment after the closing brace. But such a redundancy could become confusing if the conditional is modified and the comment forgotten; thus, use of this rule is postponed until we have a code beautifier that is able to do this automatically.

2.3.2 Parentheses - (...)

Whenever you are using more than one operator in an expression, always put parentheses around subexpressions to make the evaluation sequence explicit.

What might seem completely obvious to you, can still hold up another developer for a considerable time. And it might just be wrong. Operators have both an associativity and a precedence; taking precautions against getting them wrong is so easy, and it also helps any code reviewer sifting through your code.
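
A classic case where the precedence is easy to get wrong is a bit test. The following sketch assumes a hypothetical helper isFlagClear(); the name and parameters are made up for illustration.

```c
#include <stdbool.h>

// Hypothetical helper: true if none of the bits in Mask are set in Value.
bool
isFlagClear(unsigned int Value, unsigned int Mask)
{
    // Without the inner parentheses, "Value & Mask == 0u" would be
    // parsed as "Value & (Mask == 0u)", since == binds tighter than &.
    return ((Value & Mask) == 0u);
}
```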

2.4 Spaces

Pointer declarations have the "*" adjacent to the type, not the name.

The type is "pointer to XY". Also consider that it is recommended to make only one declaration per line, to initialize a variable upon declaration, and that the comma operator is frowned upon.
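
The following sketch shows what the rule is meant to prevent. The function reverseText() is a hypothetical example, not an existing library call.

```c
#include <stddef.h>

// Hypothetical helper: reverses the first Length characters of Text
// in place.
void
reverseText(char* Text, size_t Length)
{
    // "char* Front, Back;" would declare Back as a plain char, not as
    // a pointer. One initialized declaration per line avoids the trap.
    char* Front = Text;
    char* Back = Text + Length;
    while (Front < Back)
    {
        --Back;
        char tmp = *Front;
        *Front = *Back;
        *Back = tmp;
        ++Front;
    }
}
```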

2.4.1 Where to put Spaces

  • between an operator and a parenthesis;
  • between a parenthesis and its contents;
  • between a control-flow keyword and an opening parenthesis (switch, for, while, if, and return);
  • after a comma in a parameter list;
  • after a semicolon in a for-loop header;
  • between binary operators and their operands (but see 2.4.2 Where not to put Spaces).

2.4.2 Where not to put Spaces

  • between a function name and the opening parenthesis of the parameter list;
  • in front of a comma in a parameter list;
  • in front of a semicolon in a for-loop header;
  • between unary operators and their operand;
  • between the operators member (".") and pointer-member ("->") and their operands.

3. Keywords and Operators

3.1 General

Make the structure of your program match the intent.

if (BooleanExpression)                  // BAD
{
    return false;
}
else
{
    return true;
}

return (!BooleanExpression);            // GOOD

if (BooleanExpression)                  // BAD
{
    return x;
}
return y;

return (BooleanExpression) ? x : y;     // BETTER

if (BooleanExpression)                  // GOOD
{
    return x;
}
else
{
    return y;
}

A condition should always evaluate to type bool. Check for "!= 0" if necessary. Do not use "!" in combination with non-bool values.
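
As a sketch of what spelling out the comparison looks like, consider the hypothetical helper hasContent() below (name and behaviour are assumptions made for this example).

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: true if Text is neither NULL nor empty.
bool
hasContent(const char* Text)
{
    // "Text && *Text" would compile as well, but neither operand is a
    // boolean value. Both comparisons are spelled out instead.
    return ((Text != NULL) && (Text[0] != '\0'));
}
```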

Do not switch operands around in equality checks to have the constant first.

It is sadly popular to write if (0 == x) because it is considered "safer" in case one "=" is omitted. Modern compilers issue a warning when encountering an assignment within a condition, or a non-bool condition. Do not twist your way of thinking around the language.

Do not check floating point types for equality.
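
One common alternative is to compare against a tolerance. The sketch below is only one of several possible approaches; the helper isNearlyEqual() is hypothetical, and picking a sensible tolerance is left to the caller.

```c
#include <stdbool.h>

// Hypothetical helper: true if Left and Right differ by no more than
// Tolerance. The caller has to choose a tolerance that suits the
// magnitude of the values involved.
bool
isNearlyEqual(double Left, double Right, double Tolerance)
{
    double Difference = Left - Right;
    double Distance = (Difference < 0.0) ? -Difference : Difference;
    return (Distance <= Tolerance);
}
```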

"==" checks for precise equality, but floating point arithmetic is not a precise art.

In any comparison, make sure you are comparing the values of the objects in question, not their physical location.

pointer1 == pointer2 is usually not what you want.
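
Strings are the most frequent victim of this mistake. A minimal sketch, with hasSameText() being a hypothetical helper:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true if both strings hold the same characters.
bool
hasSameText(const char* Left, const char* Right)
{
    // "Left == Right" would only tell whether both pointers refer to
    // the same memory location, not whether the contents are equal.
    return (strcmp(Left, Right) == 0);
}
```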

3.2 ?:

Use this operator only to improve readability and avoid redundancy, not because you think it might improve performance.

Optimization is the job of the compiler, and ?: is not to be used as shorthand for if/else.

If you have to use this operator, always put the condition in parentheses ("(condition) ? Xyz : Abc").

3.3 for

Wherever possible, initialize the loop counter directly in the for statement, to limit its visibility.

If initialization, condition, and incrementation do not fit into one line, break the for statement into three individual lines. Vertically align condition and incrementation with the initialization.

Multiple loop counters can be rather confusing. Be especially careful with pointer types (char* src, dest...). Consider using a while construct (with the variables initialized before the while) instead of breaking rules 2.2.2 and/or 2.4.
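
A sketch of the while alternative for two pointer-type "loop counters". The function copyText() is hypothetical and only illustrates the layout.

```c
// Hypothetical helper: copies the zero-terminated Source to Destination.
// The caller has to make sure that Destination is large enough.
void
copyText(char* Destination, const char* Source)
{
    // Each pointer is declared and initialized on a line of its own,
    // then a while loop is used instead of an overloaded for header.
    const char* ReadPosition = Source;
    char* WritePosition = Destination;
    while (*ReadPosition != '\0')
    {
        *WritePosition = *ReadPosition;
        ++ReadPosition;
        ++WritePosition;
    }
    *WritePosition = '\0';
}
```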

The infinite loop is written as while (true), not as for (;;). The former construct is explicit, the latter is not.

See also rule 2.3.1.

3.4 goto, continue, break

Do not use goto, period.

Do not use break anywhere else than to end a case block, period.

Try to avoid continue where possible.

These three basically break structured programming. You will see if you ever attempt to express a control flow containing any of those in a Nassi-Shneiderman diagram. continue is a little less condemnable since it at least honors the loop condition.

Instead, consider:

  • redefining your algorithm so that the goto / break / continue becomes unnecessary;
  • using if-else;
  • putting the code in a subroutine, "breaking" from it with a return.

If using the last option, do not forget to check the return value.
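
The subroutine option could look like the following sketch. The function findIndex() and the convention of returning -1 for "not found" are assumptions made for this example.

```c
// Hypothetical helper: returns the index of the first element equal to
// Wanted, or -1 if there is no such element.
int
findIndex(const int* Values, int Count, int Wanted)
{
    for (int i = 0; i < Count; ++i)
    {
        if (Values[i] == Wanted)
        {
            // Leaves the loop and the subroutine in one step, so no
            // break is required.
            return i;
        }
    }
    return -1;
}
```

The caller is expected to compare the result against -1 before using it as an index.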

3.5 if - else

If you provide both an if part and an else part, do not use a negation in the if conditional.

One corner less to wrap your thinking around.

Try to avoid "chained" else - if constructs. Rather introduce a "nested" but straightforward if - else.

Expressing else - if in a Nassi-Shneiderman diagram ends up in just that - nested if - else.
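
A sketch of the nested form, using a hypothetical three-way comparison function compareValues():

```c
// Hypothetical helper: returns -1, 0, or 1 if Left is less than,
// equal to, or greater than Right.
int
compareValues(int Left, int Right)
{
    if (Left < Right)
    {
        return -1;
    }
    else
    {
        // Nested instead of chained as "else if".
        if (Left > Right)
        {
            return 1;
        }
        else
        {
            return 0;
        }
    }
}
```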

See also rule 2.3.1.

3.6 return

Use parentheses only if the return value is a complex statement including operators.

Do not try to "work around" temporary objects implied by return-by-value, for example by returning pointers where an object should be returned.

Write code for the human reader, not for what you think would be best for the compiler. Return value optimisations have been one of the major improvements in modern compilers; let them do the tweaking.

3.7 sizeof

This is considered a function, i.e. no space between the keyword and the opening parenthesis.

Always use a type as argument to sizeof, instead of a variable name.
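
A short sketch of both sizeof rules applied to a memory allocation. The function createCounters() is hypothetical.

```c
#include <stdlib.h>

// Hypothetical helper: allocates Count counters, all set to zero.
// Returns NULL if the allocation fails. The caller has to free() the
// result.
int*
createCounters(size_t Count)
{
    // A type as argument, and no space between sizeof and "(".
    return (int*)calloc(Count, sizeof(int));
}
```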

3.8 switch - case

case statements are indented one level from the switch braces. The code of a case block is always enclosed in braces ("{}"), and indented one level from the case statement.

The braces help avoid warnings if a case code block introduces a local variable; they also help to keep the variable local.

Every case statement (including the last one in a given switch) should either have a return or a break as last statement.

Every switch should have a default: part at the end, even if that could never possibly be reached. The default: part should likewise have either a return or a break as last statement. If the default: part should not be reached logically, provide a meaningful error message (pointing out where it occurred) and exit the program gracefully. This rule also applies to a switch over an enumeration.

You might expand the enum later on and forget to adapt the switch.

If you have to continue execution across a case statement, the last line in the previous case should be a // FALLTHROUGH special comment to make your intention explicit.

This should be avoided since it breaks structured programming.

Multiple case statements pointing to the same code block should be written one line each. A // FALLTHROUGH special comment is not necessary in this case.
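
The rules of this section combined might look like the following sketch. The enumeration TrafficLight and the function mustStop() are hypothetical, and calling exit() is just one possible reading of "exit the program gracefully".

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical enumeration, used for illustration only.
enum TrafficLight
{
    LIGHT_RED,
    LIGHT_YELLOW,
    LIGHT_GREEN
};

// Hypothetical helper: true if vehicles have to stop for the light.
bool
mustStop(enum TrafficLight Light)
{
    bool Result = false;
    switch (Light)
    {
        case LIGHT_RED:
        case LIGHT_YELLOW:
        {
            Result = true;
            break;
        }
        case LIGHT_GREEN:
        {
            Result = false;
            break;
        }
        default:
        {
            // Not reachable unless the enum is extended later on.
            fprintf(stderr, "%s:%d: unexpected light value %d\n",
                    __FILE__, __LINE__, (int)Light);
            exit(EXIT_FAILURE);
            break;
        }
    }
    return Result;
}
```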

3.10 while

The infinite loop is written as while (true), not as for (;;). The former construct is explicit on what it does, the latter is not.

See also rule 2.3.1.


4. Comments

4.1 General

Real programmers do not comment. What was hard to write should be hard to understand.
-- hacker proverb

Any fool can write code that computers understand. Good programmers write code that humans understand.
-- Martin Fowler

Comments are important. If you feel like they get in your way while writing code, consider how often you missed that one crucial explanation of some other person's code. That other person thinks the same about your code.

Do not think you can add comments to your code later, because "later" never comes. Do not let deadline pressures tempt you into omitting code comments. It will hurt you (or some other developer) later, possibly causing some other deadline to be missed. It might even make you miss the very deadline that made you omit comments in the first place, because co-developers will have a harder time jumping to your aid as the deadline approaches.

All comments are given in verbose US English. Avoid "slang" or "leet" terms; stick with what could be looked up in a generic dictionary. (Leo is a good online reference.) Double-check your spelling. Consider broken language to be equivalent to broken code. (Yes, even in comments.)

Consider uncommented code as broken. Consider linguistic mistakes or lack of clarity a bug.

Yes, this means: Issue a bug report! It is important that comments are easily understood by others. Those "others" most likely include people with significantly different backgrounds, both linguistic and technical.

Avoid "boxing" comments with asterisks etc.

The right-hand side of such boxes gets in the way when your comment has to be edited later.

Comments are always indented to the same level as the surrounding code.

4.2 Tactical Comments

Tactical comments are placed in-code. Their purpose is to clarify things for future developers and code inspectors looking for a bug, or a way to improve the system. They are not intended for clients using the implementation as a black box.

All tactical comments are done C++ style (//) unless otherwise noted. This has been part of the C standard since C99, and should be supported by any compiler!

This allows disabling segments of code with C style comments (/* ... */), e.g. for debugging purposes.

You are expected to explain what a piece of code does, and why you made it do this - information that cannot be gained from the code itself. You are not expected to repeat the how of the code workings.

Explaining the how is redundant since it is already in the code. Also, people tend to forget adapting the comments when changing the working details of the code later.

Documenting the what, however, makes it easier to browse your code later - code inspectors are not required to pry apart your code line by line to understand what is going on.

The why is also important to understand your intentions. For one, it keeps others from falling into a trap you have already identified and avoided. Additionally, while your code might not actually be buggy, it might just not do what you intended. It can be hideously difficult to find logic bugs if the code inspector has no idea what you intended in the first place.
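
A sketch of tactical comments that state the what and the why, but not the how. The function alignSize() and the scenario described in its comments are made up for this example.

```c
#include <stddef.h>

// Hypothetical helper: rounds Size up to a multiple of Alignment.
// Alignment must not be zero.
size_t
alignSize(size_t Size, size_t Alignment)
{
    // Find out how far Size reaches into its last alignment unit.
    size_t Remainder = Size % Alignment;
    if (Remainder == 0u)
    {
        return Size;
    }
    else
    {
        // Round up, not down: the result must never be smaller than
        // Size, so that data of Size bytes always fits into it.
        return (Size + (Alignment - Remainder));
    }
}
```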

If your comment becomes lengthy and unwieldy because you cannot explain your intention with a couple of simple words, chances are you have a flaw in your design and should rethink that first. Another solution would be splitting a complex statement into multiple simpler ones, commenting them individually.

This might result in tactical comments stating the obvious, like
// get database connection
Do not worry about it - redundant comments do not hurt anybody, only omitted comments do.

It is not a shame if others do not know the algorithm you used. Explain it. Provide references. Consider that the reader of your code might not even be familiar with the problem domain.

Do not try to hide weak code. Not providing the optimal solution is not a shame. Point out the suboptimal so others can come up with a better solution easily.

Another use for tactical comments is to relieve the code inspector of the necessity to open every included header file to find out what the non-local functions called actually do. Whenever you use functions that are not from the problem domain (i.e., two or three closely interoperating classes), and are not absolutely self-explaining either, consider if you could give future code inspectors a hint to what you are doing. Such a comment could also be a valuable hint if, at some later point of time, the function you are calling is changed from the semantics you expected (and expressed in your comment) to something else.

Generally, avoid trailing comments that reside on the same line as the commented code. Rather, place tactical comments on an individual line in front of the code they refer to.

Unless you have a very carefully configured syntax highlighting, trailing comments tend to clutter the code. Any vertical alignment also tends to get misaligned as you edit the code. With the line width limit of 76 characters (you still remember that, don't you?), you do not have much space for trailing comments either.

There should be no need to introduce blank lines in your code to point out functional segments - such segmentation should be provided by tactical comments explaining the segment.

4.3 Strategic Comments

With the advent of javadoc, documenting your APIs using special markup comments in the source code itself has become increasingly popular - having source and API documentation close together makes it easier to keep them in sync.

Probably the most powerful tool to do this is Doxygen. However, Doxygen is also rather complex. To avoid people getting swamped in details of a documentation tool, this Style Guide defines a subset of the Doxygen commands to be used in the scope of this project.

This relieves developers of having to read another lengthy manual.

Do not use any Doxygen features not outlined in this chapter.

This would confuse developers not as familiar with Doxygen, and could break things if we want to use some other tool at a later point of time.

4.3.1 General Use

A documentation comment is much like a C style comment, but starts with /** (note the double asterisk). The first sentence of the documentation comment (i.e., the text up to the first period ".") is the brief, the remaining text up to the first tag is the verbose description.

4.3.2 @deprecated [<issue_id>] <description>

Marks parts of an API (class or function) as deprecated, i.e., further use is strongly discouraged, and support for this API call might be dropped altogether in later versions. The description should give a brief rationale, and state alternatives.

Marking a class or function as deprecated is not "natural evolution" as some claim, but more like admitting a serious design error: The implementation details were not properly hidden by the class or function interface. The initial design should take great care that its API is flexible enough to allow changing the implementation without having to deprecate parts of the API. This is the only way clients can enjoy improvements painlessly.

Do not state expected lifespans for deprecated APIs.

You will most likely be proven wrong, and you will almost certainly forget to update the expected lifespan statement. Few things spoil trust in an API as much as reading "will be removed by 1999".

4.3.3 @param <parameter name> <description>

Describes a single parameter of a function. This description should contain any prerequisites for the parameter if such are necessary, and specific behaviour of the function in case of violated prerequisites (e.g. NULL pointer).

Multiple @param comments should be placed in one block, in the sequence they are declared in the function prototype.

4.3.4 @return <description>

Describes the return value of the function - not its syntactical type, which is obvious from the function prototype, but rather its contents, implied format, error conditions, etc.

4.3.5 @see

Gives a cross reference to a function, constant, variable, file, or URL.

Use this to link logically related API calls, like OpenFile() - CloseFile().

4.3.6 @since

States the product version which first introduced the class or function.

This allows clients to keep downward compatibility in mind, and to state correct minimum requirements in their manuals.

This is explicitly a product version number.

Do not use a date here - that would be utterly meaningless for the client. Likewise, clients should not have to worry about internal version numbers of your product.
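
Taken together, a documentation comment using the tags of this chapter might look like the following sketch. The function countWords(), its behaviour, and the version number 1.0 are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/**
 * Counts the words in a text. A word is a run of characters other
 * than the space character.
 *
 * @param Text The zero-terminated text to examine. If NULL is passed,
 *             the function returns 0.
 * @return The number of words found, which is 0 for an empty text.
 * @since 1.0
 */
size_t
countWords(const char* Text)
{
    size_t Count = 0;
    bool InsideWord = false;
    if (Text != NULL)
    {
        for (const char* Position = Text; *Position != '\0'; ++Position)
        {
            if (*Position == ' ')
            {
                InsideWord = false;
            }
            else
            {
                // Count each word once, at its first character.
                if (!InsideWord)
                {
                    ++Count;
                }
                InsideWord = true;
            }
        }
    }
    return Count;
}
```

The first sentence serves as the brief, the second one as the verbose description.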

4.4 Special Comments

There are some special cases where a comment is neither tactical nor strategic, but rather used like a language extension.

4.4.1 // EMPTY

In particular, for loops sometimes have no code in the loop body - all the work is done in the header. switch - case statements can also have empty code blocks. To make it explicit that such a code block is empty on purpose, place an // EMPTY special comment into the code block.
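
A minimal sketch of such a loop, with measureText() being a hypothetical helper:

```c
#include <stddef.h>

// Hypothetical helper: returns the number of characters in front of
// the terminating zero of Text.
size_t
measureText(const char* Text)
{
    size_t Length = 0;
    for (; Text[Length] != '\0'; ++Length)
    {
        // EMPTY
    }
    return Length;
}
```

The counter is declared outside of the for statement here since its value is still required after the loop.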

4.4.2 // FALLTHROUGH

Sometimes it is necessary to have a case block not ending in either a break or return, but instead continuing with the code of the next case block. This behaviour is called "fallthrough". To make it explicit that such a construct is done on purpose, a // FALLTHROUGH special comment is placed where the break or return would be placed usually.

This should be avoided since it breaks structured programming.

4.4.3 // COMPILER: <description>

Using this special comment is about as condemnable as using goto. Avoid it like the plague.

This marks a piece of code known to rely on implementation-defined, non-ISO behaviour of compiler or library, and might break with another compiler or library (or another version of the compiler / library used).

The description should contain a brief explanation of what is non-standard, and why it could not be implemented in a standard way, plus name and version of the compiler or library for which the non-standard code compiles correctly.

Remember that there is only one "standard", and that is not "gcc", but "ISO".

4.4.4 // BUG: <issue_id> <description>

Marks a class or function as containing a known bug. The bug id refers to the bug tracking software employed by the project; the description should give details from a client point of view, i.e. what the client might experience when using this class or function.

Information relevant to developers trying to fix the bug should be given in the bug tracker instead.

If you place this comment in the code, add a note to the corresponding bug tracker entry.

So that the developer fixing the bug in the tracker can also remove the bug comment. The two edits might turn out to be in completely different source files, after all.

4.4.5 // TODO: [<issue_id>] <description>

This marks classes and functions that are unfinished in their functionality, or which are already scheduled for expansion. It also serves as a warning for potential clients that the API might change significantly in future versions.


5. Versioning

Pro-POS will introduce a VersioningStyleGuide in time for our first code releases. The following section of this document is for generic projects wishing to adopt our Coding Style Guide.

Note that the popular tags @author, @date and @version are not included in the Strategic Comments section. That has been done on purpose. They are most often neglected, and outdated information is even more confusing than none. Moreover, the information those comments would add is generally not of interest for the client programmer, and so should not appear in client documentation.

For such purposes (too), every project should employ a Version Control Software (VCS). Since this is a Coding Style Guide, only generic recommendations will be made.

Virtually all VCS packages support the "placeholder string" $Id$, which is expanded automatically upon code retrieval to contain updated administrative information, including last author, last date of modification, and version number of the individual source file. A comment containing this placeholder string should be in the first line of a file (so it shows up prominently on the first page of screen or printout).

Decide for a versioning scheme before you ship the very first draft version. Use x.y, or a.b.c, or whatever, but use it consistently, and make sure that clients understand your way of versioning so they can make an educated decision on which version to use. Using dates for versioning is discouraged, since it leaves the client clueless about intermediate versions, amount of changes between two given versions, etc.

Ideally, your operating system defines a consistent way of both the format of version numbers, and on how to include them properly into your binaries so they can be identified later, including tools to retrieve these version numbers easily. If it does not, consider switching to an OS that pays attention to such a basic service as binary versioning.


6. Naming Conventions

6.1 Language

All identifiers are named in verbose US English. Avoid "slang" or "leet" terms; stick with what could be looked up in a generic dictionary. (Leo is a good online reference.) Double-check your spelling. Consider broken language to be equivalent to broken code.

The best linguistic pun backfires if it takes the person debugging your code extra time to figure out what you meant with it.

With identifier names that consist of more than one word, write the first letter of each word as capital (ConcatenatedLikeThis).

Exception: Constants, enum labels, template parameters and macros are written in ALL_CAPS, with "_" as word separator where necessary. Do not name any of those beginning with "E", "LC", or "SIG".

This helps consistency, and avoids clashes with reserved ANSI/ISO C identifiers. It is actually more strict than necessary, but it makes the wording of the rule simpler.

Avoid abbreviations, unless absolutely necessary. (Try to think of a different, shorter name.) If you abbreviate, do not use all-caps. It is XyzSomething, not XYZSomething.

Set up a list of CodeAbbreviations used in the project, which is readily available for any contributors; use those abbreviations consistently. (Do not switch between abbreviated and non-abbreviated versions of the same term.) Abbreviations should remain pronounceable and intuitive.

Arbitrary mixes of e.g. ObjCpy and ObjectCopy can be a real pain. Pronounceable identifiers help with mnemonics and communication, as does consistency.

Local, automatic variables (like the otherwise unimportant index of a for loop, or a strictly temporary variable) can be named in the canonical short way (e.g. i, tmp). Avoid any identifiers that are only subtly different, either orthographically, acoustically, or visually.

String string; - xtype_t temp; ytype_t tmp; - int i; int j; - double l0; double lO;

If you nest for loops, use i1 - i2 - i3 instead of i, j, k. More expressive index names would be even better.

Functions are the only identifiers to be named starting with a verb, written in lowercase.

Functions do something; everything else merely is something.

6.2 Namespacing

Do not encode type or namespace information into a variable. Yes, this means no Hungarian notation.

Whatever you encode, the reader has to decode. This introduces a learning curve for newcomers, and makes identifiers longer and less pronounceable, hindering communication.

Exception: Static variables have a leading s_, and globals a leading g_.

As opposed to function arguments, local variables etc., the scope of these two is non-obvious. The prefix letter gives the developer a hint at where to look for their declaration.

Avoid statics and globals; they usually break thread safety.

6.3 Reserved & Forbidden

Do not use underscores or numbers except as described above. (Not even the witty infix "2" popular with conversion functions.)

Aside from being consistent, it also avoids clashes with reserved ANSI/ISO C identifiers. Moreover, it avoids the infamous l vs. 1, O vs. 0 confusion.

6.4 Identifiers for People, not for Compilers

Use identical identifiers in declaration, definition, and the Strategic Comments. Do not use "anonymous" parameter declarations.

Being able to refer to a parameter by name helps communication.

If you cannot come up with a good name for a function or variable, or that name consists of more than three words, that is a hint that you should rethink your design.

The construction does not "feel natural", as any good design should.

With enums, prefix the labels so they do not clash with other enums.

enum SwitchState
{
    SWITCH_ERROR = 0, // see 7.1 Variables
    SWITCH_OFF,
    SWITCH_ON
};

A good guideline for choosing identifiers is to see how much sense they make when used. Consider:

if (tmp->state())     // BAD
if (stack->isEmpty()) // GOOD

There should only ever be one name for a thing, and one name should not be used for different things.

If you have to use typedef for practical reasons (e.g., to make a frequently used function pointer more comfortable to handle), declare it at file scope. Do not export typedefs to your clients. If they want, they can create a typedef of their own.

This way a reader will see the typedef statement in the client code, and know what it stands for without being required to look it up elsewhere (where?).

Typedef'ed types are named with trailing _t.
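As a sketch of the typedef rules above, assuming a hypothetical comparison callback: the typedef sits at file scope, right next to the code that uses it, and carries the trailing _t. The names compare_t, compareAscending and indexOfFirst are invented for this example.

```c
#include <stddef.h>

// Hypothetical callback type, declared at file scope next to its use.
typedef int (*compare_t)(int const * left, int const * right);

// Local helper: negative, zero or positive, like strcmp().
static int compareAscending(int const * left, int const * right)
{
    return (*left > *right) - (*left < *right);
}

// Returns the index of the element that orders first according to compare.
size_t indexOfFirst(int const * values, size_t count, compare_t compare)
{
    size_t best = 0;

    for (size_t i = 1; i < count; ++i)
    {
        if (compare(&values[i], &values[best]) < 0)
        {
            best = i;
        }
    }
    return best;
}
```

Without the typedef, the third parameter of indexOfFirst would have to spell out the full function pointer syntax, which is the kind of practical reason the rule allows for.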

The C++ Standard Library should be your guideline for naming things. Where the Standard Library does not provide an unambiguous convention, create and maintain a list of NamingConventions.

It is quite likely that we will switch to C++ sooner or later, so some consistency wouldn't hurt. Besides, C++ coders would find size() and iterator_t much more familiar than length() and enumerator_t.

Look up the NamingConventions if in any doubt about naming. Actually, always look up the NamingConventions. Either there is a convention in there, or you should add it now, even before you commit your code.

Avoid redefining a global identifier locally.

Confusing and easily avoided.

Do not imply implementation details in an identifier.

Instead of UserList, call it UserContainer - it might no longer be a list in the next revision.

Do not limit the identifier of a general solution to a problem domain.

Instead of UserAddress or CustomerAddress, just call it Address, so it will not look misplaced if used in a different problem domain. When instantiating the class, using the problem domain as part of the variable identifier is quite all right.


7. Design

7.1 General

Do not write code depending on implicit type conversions. Pass the correct types to functions. If a function expects a double, pass a double, and do not rely on the int you are passing to be converted implicitly.

An explicit cast is preferable over an implicit conversion. Try to avoid either.
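A small sketch of why this matters, using a hypothetical averageOf function. The commented-out line relies on an implicit conversion that happens only after the integer division has already thrown away the fraction.

```c
// Hypothetical example: average of an integer sum over an integer count.
double averageOf(int sum, int count)
{
    // BAD:  return sum / count;
    //       (integer division first, implicit conversion to double afterwards)
    return (double) sum / (double) count;
}
```

Better still would be to keep sum as a double from the start, so that no cast is needed at all.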

7.2 Consistency

Get familiar with already available project libraries. Do not reimplement functionality that is already implemented and tested. If the library you need is not yet finished, help finish it. If an existing library does not provide enough functionality, extend it.

Keeping functionality in one spot helps maintenance and future improvements. Having two tools doing almost the same can be terribly confusing.

If you absolutely have to reimplement code (e.g. because you need a subtly different behaviour), be sure to document those differences.

An unaware co-developer might replace what he considers "redundant code" with a library call, introducing a bug as subtle as the differences of your new code vs. old library code.

Communicate with the maintainer of the library code to find out the best, least confusing way to provide the two functionalities. Place your implementation "close" to the existing one - two functions into the same source file, for example.

7.3 Memory Management

Never allocate memory and expect others to deallocate it.

Always assign NULL to a pointer that points to deallocated memory. While you are at it, try to avoid having such pointer variables around longer than absolutely necessary.
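The two rules above might look like this in practice. The function copiedLength is a hypothetical example: it allocates and deallocates in the same place, and resets the pointer right after free().

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: measures a scratch copy of text; returns 0 on failure.
size_t copiedLength(char const * text)
{
    size_t length = strlen(text);
    char * scratch = malloc(length + 1);    // + 1 for the trailing '\0'

    if (scratch == NULL)
    {
        return 0;
    }
    memcpy(scratch, text, length + 1);
    length = strlen(scratch);

    free(scratch);
    scratch = NULL;     // no longer points to deallocated memory
    return length;
}
```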

Keep in mind that including a header should never allocate memory - either static or on the heap. Use extern.

Remember that static and global data usually means your code is not thread-safe.

7.4 Variables

Always use unsigned types for variables which cannot reasonably have negative values.

Always use inclusive lower and exclusive upper limits.

Indices start at 0 and end at size() - 1. The size of the interval between the limits is the difference between the limits. The limits are equal if the interval is empty. The upper limit is never less than the lower limit. Such invariants make debugging easier.
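A sketch of the inclusive-lower, exclusive-upper convention, with hypothetical function names. Note how the loop condition, the interval size and the empty interval all fall out without any "+ 1" or "- 1" corrections.

```c
#include <stddef.h>

// Sums values[lower] .. values[upper - 1]; lower is inclusive, upper exclusive.
long sumRange(int const * values, size_t lower, size_t upper)
{
    long sum = 0;

    for (size_t i = lower; i < upper; ++i)
    {
        sum += values[i];
    }
    return sum;
}

// The size of the interval is simply the difference between the limits.
size_t rangeSize(size_t lower, size_t upper)
{
    return upper - lower;
}
```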

Initialize variables upon declaration.

It is actually "cheaper" than assigning a value later, avoids referencing uninitialized variables by accident, and keeps the time during which memory is occupied to a minimum. (Why reserve memory for a value you do not actually need until much later?)

In blocks of variable declarations, align identifiers, initializations, and comments vertically.

Always declare variables with the smallest possible scope (limit visibility).

Consider for (int i = 0; ...) if you do not need i outside of the loop. Smack your compiler vendor if the compiler complains about multiple successive occurrences of this construct, because this is part of the standard.

Prefer prefix increment and decrement (++i) over postfix (i++).

It actually makes a difference in C++.

Generally avoid statics and globals.

They usually break thread safety.

With enums, add an "error / invalid / undefined" state, and make it the first label in the enum, with value set to 0.
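One benefit of this rule, sketched with a hypothetical ConnectionState enum: a structure that was merely zero-initialized reports the invalid state instead of silently posing as a legitimate one.

```c
// Hypothetical enum; the invalid state comes first and has the value 0.
enum ConnectionState
{
    CONNECTION_INVALID = 0,
    CONNECTION_CLOSED,
    CONNECTION_OPEN
};

struct Connection
{
    enum ConnectionState state;
    unsigned int         retries;
};

// Returns non-zero if the connection has been given a real state.
int isUsable(struct Connection const * connection)
{
    return connection->state != CONNECTION_INVALID;
}
```

A struct Connection initialized with { 0 } fails the isUsable() test until somebody assigns a proper state to it.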

Do not abuse #define or enum for defining constants. Use const.

Write what you mean, mean what you write. Do not trick your way around the language. See also 9. Preprocessor.

Avoid "magic numbers". If you consider using a constant value (other than "0") in any kind of expression, declare a const.

If you have to change all occurrences of a magic number later on, you will find that it is no fun to search & replace, say, a "2"...
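A minimal sketch with a hypothetical conversion function. The named constant documents what the number means and gives you exactly one place to change it.

```c
// Hypothetical example: the 60 gets a name instead of appearing bare.
unsigned int toSeconds(unsigned int minutes)
{
    unsigned int const secondsPerMinute = 60;

    return minutes * secondsPerMinute;
}
```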

Avoid embedded assignments.

a = b + c;            // GOOD
d = r + a;

d = r + (a = b + c);  // BAD

Optimizing is the job of the compiler. There are also some subtle traps you might fall into (temporary objects, evaluation order).

7.5 Pointers

Pointers can represent an array, an element of an array, null, and one past the end of an array. They can be reassigned to point somewhere else.

Avoid typedef'ing pointer types.

If it is a pointer to X, it should not be named Y. A pointer type can be NULL, and the client should be made aware of this possibility.

While you are at it, avoid typedef in general.

Name things for what they are, not what you use them for.

Exception: You should try to avoid function pointers altogether. If you have to use function pointers, typedef'ing them into something more manageable can greatly help readability. Keep the typedef prominent and well-documented, though.

7.6 Calling and Returning from Functions

The return value of a function should always be declared explicitly. Do not use "implicit int".

In calling functions, pass objects by reference wherever possible. Use const to protect such parameters against modification.

Call-by-value creates temporary objects at the cost of performance and memory; in just about all cases this is not welcome.

Non-const ("out") parameters - passing an object by reference and modifying it within the function - are strictly forbidden. All parameters passed by reference should be passed const. (See Const Correctness.)

While they are popular with many developers for claims of "performance", consider that the argument object might well reside in read-only memory, be shared among multiple threads etc.
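In C, passing by reference means passing a pointer. The following sketch, with a hypothetical Point type, takes its argument through a pointer to const and hands the result back as the return value instead of writing into an "out" parameter.

```c
// Hypothetical type and function, for illustration only.
struct Point
{
    int x;
    int y;
};

// origin is only read; the moved point is returned, not written back.
struct Point translated(struct Point const * origin, int dx, int dy)
{
    struct Point result = { origin->x + dx, origin->y + dy };

    return result;
}
```

The caller's object is guaranteed to be untouched, so it may just as well live in read-only memory or be shared with another thread.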

There are style guides around that advocate having only one return statement per function, at the "cost" of temporary objects. While having a clear control path is beneficial, this must not lead to awkward constructs. Use Nassi-Shneiderman diagrams as visualization of your control flow; there should be a return in each "bottom box".

In most cases, both rules will yield identical code with only one return; the NSD-based rule results in "cleaner" structures only in tricky cases. Even if unfamiliar with NSDs, it should not take you longer than one day to get used to this to the point where you no longer have to scribble down NSDs on paper.
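A small sketch of what "a return in each bottom box" might look like, using a hypothetical signOf function. Each branch of the diagram ends in its own return, and no temporary result variable is needed.

```c
// Hypothetical example: three bottom boxes, three returns.
int signOf(int value)
{
    if (value < 0)
    {
        return -1;
    }
    else if (value > 0)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}
```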

Make all returns explicit (even in a void function).

7.7 Optimization

There are two stages of optimization: Stage one is done during design, stage two after implementation. Do not attempt to optimize ad-hoc during implementation.

Premature optimization is the root of all evil. -- D. Knuth

Only after implementing and debugging, if you can prove (i.e., with the profiler) that you have a performance problem, and know where exactly, you may consider optimizing. Think twice before you begin. Consider going back to optimizing the design instead of tweaking the implementation; that usually yields much better performance improvements.

If you do not consider a redesign to be worth the effort, think again about whether you need the optimization at all.

Do not use inline in public library interfaces.

Any client of a public interface containing inline will break if you ever change the implementation, because the code is compiled into the client instead of linked from the library.

Exception: If a function is only a simple wrapper for another function.

Header files should have a zero memory "cost". Do not define data elements in header files.

Declaring a data element does not introduce any "cost"; defining a data element does.
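The difference between declaring and defining, sketched in a single listing. The file names in the comments and the identifiers g_tickCount and nextTick are hypothetical; in a real project the two parts would live in separate files.

```c
// --- tick.h (hypothetical header): declarations only, no storage reserved ---
extern unsigned int g_tickCount;
unsigned int nextTick(void);

// --- tick.c (hypothetical implementation file): the one and only definition ---
unsigned int g_tickCount = 0;

unsigned int nextTick(void)
{
    ++g_tickCount;
    return g_tickCount;
}
```

However many source files include the header, the memory for g_tickCount is reserved exactly once, in the implementation file.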

Do not use bit-shift operators for arithmetic operations (e.g. x >>= 3 instead of x /= 8).

Optimizing by "code tweaking" is the job of the compiler, not the developer.

7.8 Multithreading

Take care of multithreading / concurrent access. If there is any chance that your construct will be referenced outside the thread environment it is created in, consider reference counting and access locks (monitors). If in doubt, assume your code will be used re-entrantly.

What is not, might yet be, and then you would end up with non-reproducible bugs that are nearly impossible to fix.

Re-entrant code must not use global non-const or static variables carelessly.
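A deliberately tiny sketch of the difference, with hypothetical function names. The first version hides its state in a static variable and is therefore not re-entrant; the second keeps all state with the caller.

```c
// BAD: hidden static state, shared by every caller and every thread.
unsigned int nextIdShared(void)
{
    static unsigned int s_lastId = 0;

    ++s_lastId;
    return s_lastId;
}

// GOOD: no hidden state; the result depends on the argument only.
unsigned int nextId(unsigned int previousId)
{
    return previousId + 1;
}
```

This is only an illustration of the principle; real shared resources will still need the locking mentioned above.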

7.9 Const Correctness

Use const wherever possible. Make all parameters of function calls const. Make pointers and the objects they point to const, unless you are sure you have to modify those things.

Using const will make your functions more trustworthy, and if you happen to modify a value by accident, the compiler will inform you of your error. (Prefer compiler errors over run-time errors.) It might be awkward and uncomfortable at first, but you will soon find out it makes for much "tighter" code. The compiler is your friend, so use it to the maximum extent.

If you have to "open up" the const correctness of your code, chances are you have discovered a design flaw. Fix the flaw, not the const correctness.

For consistency reasons, the const keyword is always placed to the right of the object declared constant. It is int const * for a pointer to an int constant, int * const for a pointer constant to int, and int const * const for a pointer constant to an int constant.

Note the language convention of "int constant", as opposed to "constant int". Making this customary will help writing correct code and avoids misunderstandings with int const *.
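The three declarations side by side, as a sketch. Read each one from right to left; the function name constPlacement and the variable names are hypothetical.

```c
// Hypothetical demonstration of the three const placements.
int constPlacement(void)
{
    int value = 1;
    int other = 2;

    int const *       toConstant      = &value;  // pointer to an int constant
    int * const       constantPointer = &value;  // pointer constant to int
    int const * const both            = &other;  // pointer constant to an int constant

    toConstant = &other;     // allowed: the pointer itself is not constant
    *constantPointer = 5;    // allowed: the int pointed to is not constant
    // *toConstant = 3;      // would not compile: the int is constant
    // constantPointer = 0;  // would not compile: the pointer is constant

    return *toConstant + *constantPointer + *both;
}
```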

7.10 File Layout

7.10.1 Header Files

Every header file should begin with this:

/* $Id$
   $COPYRIGHT$
*/

#ifndef __HEADERNAME
#define __HEADERNAME __HEADERNAME

And end with this:

#endif // __HEADERNAME

This way, you can globally search & replace the copyright notice you would like to use. Typos, obsolete statements etc. can be corrected in one place this way.

The header should contain, in this order:

  • #ifndef header guard;
  • standard #include <>s;
  • non-standard #include ""s;
  • extern global variables;
  • file-scope static variables;
  • any constants you need in the header;
  • forward declarations;
  • typedefs;
  • structs and unions;
  • the functions.

Remember that global and static variables jeopardize thread safety.

Debuggers usually have trouble with inlined functions, because, well, they are inlined and no longer a function at all. The "inline include kludge" can work around this problem.

_TODO: inline include kludge

7.10.2 Implementation Files

Either use one implementation file per header, or create a directory named after the header file containing one implementation file per function.

Putting functions in separate implementation files allows linkers to link only the binary for the function in question. If a header declares many different functions, of which only a few are used in an executable, this can significantly reduce executable size.

Sometimes it is desirable to have local "helper" functions. Those are to be declared and defined in the implementation file, so their visibility is limited to where they are used.

Define things in the same sequence they were declared in the header file.

Do not nest loops and conditionals too deeply. If the indenting tabs threaten to push your code off screen, that hints towards your algorithm being too complex, your coding style too long winded, or parts of your algorithms being better placed into a subroutine.

Functions should be crisp, short, and simple - no longer than one or two screen pages. They should do one thing, and that one thing they should do well. Do not implement tricky side effects that might help you at the moment, but break the function for others to come.

Experience shows that the understanding of a piece of code rapidly deteriorates once you stop working on it. Short functions with minimal interfaces help in understanding when you - or someone else - return to review the code, looking for bugs or a way to improve performance and/or functionality. It also helps in testing a function thoroughly.

7.11 #include

Include only if you must, use forward declarations where you can.

Unnecessary includes clutter the namespace and introduce unnecessary compilation dependencies.
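A sketch of a forward declaration doing the work of an #include, shown in one listing. The Logger type, the function messageCount and the file split indicated in the comments are hypothetical.

```c
// --- hypothetical header: only a pointer is used, so no #include is needed ---
struct Logger;      // forward declaration
unsigned int messageCount(struct Logger const * logger);

// --- hypothetical implementation file: the full definition is needed here ---
struct Logger
{
    unsigned int messages;
};

unsigned int messageCount(struct Logger const * logger)
{
    return logger->messages;
}
```

Clients of the header never see the members of struct Logger, and do not have to be recompiled when those members change.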

Make your source files "self-contained"; do not rely on the fact that some header you include includes in turn some other header you need. That way, your code will not break if that "second level include" is changed at a later point.


8. DNA (Do Not Assume)

The following rules are stated as "Murphy info" instead of negative "don't"s, since people seem to feel a certain negative attitude towards negative rules. Read them carefully and ponder their implications a bit.

8.1 Cross-Platform DNA

  • There are no native datatypes. The only datatypes you might use are those declared in global/types.h and the standard library.
  • char, short, int and long are of different size each, just like float, double, and long double.
  • int is not 32 bits.
  • char is neither signed nor unsigned.
  • char cannot hold a number, only characters.
  • Casting a short type into a longer one (int -> long) breaks alignment rules of the CPU.
  • int and int* are of different size.
  • int* and long* are of different size (as are pointers to any other datatype).
  • You do remember the native datatypes do not even exist?
  • 'a' - 'A' does not yield the same result as 'z' - 'Z'.
  • 'Z' - 'A' does not yield the same result as 'z' - 'a', and is not equal to 25.
  • You cannot do anything with a NULL pointer except test its value; dereferencing it will crash the system.
  • Arithmetics involving both signed and unsigned types do not work.
  • Alignment rules for datatypes change randomly.
  • Internal layout of datatypes changes randomly.
  • Specific behaviour of over- and underflows changes randomly.
  • Function-call ABIs change randomly.
  • Operands are evaluated in random order.
  • Only the compiler can work around this randomness. The randomness will change with the next release of the CPU / OS / compiler.
  • For pointers, == and != only work for pointers to the exact same datatype.
  • <, >, <=, >= work only for pointers into the same array. They work only for chars explicitly declared unsigned.
  • You still remember the native datatypes do not exist?
  • size_t (the type of the return value of sizeof) can not be cast into any other datatype.
  • ptrdiff_t (the type of the return value of subtracting one pointer from the other) can not be cast into any other datatype.
  • wchar_t (the type of a wide character) can not be cast into any other datatype.
  • Any ..._t datatype cannot be cast into any other datatype.
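Where you do need to know such a property, ask the compiler instead of assuming. A sketch using only what <limits.h> provides; the function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

// Number of bits in an int on this platform, instead of assuming 32.
size_t bitsPerInt(void)
{
    return sizeof(int) * CHAR_BIT;
}

// Non-zero if plain char happens to be signed on this platform.
int plainCharIsSigned(void)
{
    return CHAR_MIN < 0;
}
```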

8.2 Reliability and Security DNA

  • The input data for your functions is always corrupt in a most subtle way.
  • Memory allocations always fail. (Quick: What is the behaviour of malloc() upon failure?)
  • Someone will pull the network plug.
  • The power supply will fail at the worst moment imaginable.
  • Denial-of-service attacks will hit your machine.
  • The RAM in your system corrupts everything you store in it.
  • There is always an Eve.
  • Any Mallory is a major league black hat.
  • Mallory has physical access to your machine.

If you do not know who Eve and Mallory are, or think that "black hat" is a Linux distribution, you are in serious trouble, and should brush up your knowledge on security and cryptography.

In case of a system crash, network or power failure, your system should always recover to its previous state after reboot / network reconnect / power-up. Automatically and without asking for administrative measures. Make sure to log anything that went awry during your recovery, but do not just turn belly-up and wait for the admin to appear.


9. Preprocessor

If defining a "flag" symbol is necessary (e.g. for header guards), it should be done like

#define SYMBOL SYMBOL

Even if there is an identifier by just that name, the source will still compile correctly.
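A sketch of that situation with a hypothetical flag named TRACE. Had the flag been defined empty (#define TRACE), the declaration of the constant would no longer compile, because its name would be replaced by nothing.

```c
#define TRACE TRACE     // hypothetical flag symbol, expands to itself

// An identifier of the same name still compiles: the preprocessor replaces
// TRACE with TRACE and does not expand the result a second time.
static int const TRACE = 1;

int traceEnabled(void)
{
#if defined(TRACE)
    return TRACE;
#else
    return 0;
#endif
}
```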

Macros are strongly discouraged. Use const, inline, or any other things done by the compiler, before you consider writing a macro. All exceptions to this rule must be agreed upon by project decision, documented globally (MacroDefinitions), and defined in one central include (global/macros.h).

Although C gurus swear by them, macros are a first-grade error source, confuse debuggers and developers alike, and usually give zero readability.

If you think otherwise, please contribute a macro definition trick that allows a macro to be used syntactically similar to a function (i.e., including the () operator), both capable of being used as a single statement if written with trailing semi-colon (think if - else, while and for), and a printf() statement (i.e., printf( "%d", MACRO() );).

Keep macros at a documented, consistent, absolute minimum where no other construct would do.

Repeat, if you do any macro, that has to be in global/macros.h, and documented in full (what, how, why) in MacroDefinitions.

Prefer #if defined() over #ifdef.

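Both forms only check whether the symbol is defined; a symbol defined to zero satisfies #ifdef and #if defined() alike. The difference is that #if defined() can be combined with other conditions (&&, ||, !) in a single directive, which #ifdef cannot. To test whether a symbol evaluates to true, use a plain #if.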
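How the directive forms treat a symbol that is defined to zero can be checked with a short sketch. FEATURE_LEVEL and the three function names are hypothetical.

```c
#define FEATURE_LEVEL 0     // hypothetical symbol: defined, but defined to zero

// Returns 1 if #ifdef takes the branch.
int seenByIfdef(void)
{
#ifdef FEATURE_LEVEL
    return 1;
#else
    return 0;
#endif
}

// Returns 1 if #if defined() takes the branch.
int seenByIfDefined(void)
{
#if defined(FEATURE_LEVEL)
    return 1;
#else
    return 0;
#endif
}

// Returns 1 if a plain #if, which tests the value, takes the branch.
int trueByIf(void)
{
#if FEATURE_LEVEL
    return 1;
#else
    return 0;
#endif
}
```

The first two functions return 1 because the symbol is defined; only the plain #if looks at the value and returns 0.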


10. Compiler

The compiler should always be run in the strictest possible error checking mode. Where available, using additional coverage / purify testing tools is strongly recommended.

We are looking at providing a test environment on the Pro-POS server, to make such checks automatic. Any help in this department - pointing out available free tools, bringing in experience in using them - is very welcome.

Compiler warnings are not acceptable, period.

Even if you have to change perfectly legal code to get around it. That other developer might not know that the dozens of warnings zipping across his terminal are perfectly legal code.

If you are using gcc, always run it with -Wall -Werror.

Do not assume that your code will always be compiled with a specific compiler, for a specific platform or a specific CPU. Do not assume it will always be linked with a specific "standard" library. Do not assume.

Check chapter 8, DNA (Do Not Assume).

Stick to the language standard, and mark anything that does not strictly adhere to it with a // COMPILER Special Comment stating the what, why, compiler and library version. Consider each such section as a bug requiring a fix.


11. The Usual Suspects (tm)

Stupid as it may sound, if your code does not work as intended, check for The Usual Suspects (tm):

  • Check, double-check, and re-check for off-by-one errors whenever using pointers, arrays etc.!
  • C-style strings are one byte longer than their contents because of the trailing '\0'. strlen() does not count that '\0'!
  • Be constantly aware of potential buffer overflow. Whenever something is written or read, that could be a major security / stability issue!
  • Floating point variables are not a precise art. Do not use them for loop counters. Do not test them for equality. Be aware that they might hold only an approximation of the value assigned.
  • Do not ignore compiler warnings. The most innocent warning could hide a major bug!
  • Do not assume. Make provisions for corrupt input data, failing memory allocations, dying network connections.
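Two of these suspects, sketched as hypothetical helper functions: the buffer size needed for a C-style string, and a floating point comparison that allows for a tolerance instead of testing for equality. The tolerance is an assumption the caller has to choose to suit the values involved.

```c
#include <stddef.h>
#include <string.h>

// Bytes needed to store text, including the trailing '\0'.
size_t bufferSizeFor(char const * text)
{
    return strlen(text) + 1;
}

// Non-zero if left and right differ by no more than tolerance.
int nearlyEqual(double left, double right, double tolerance)
{
    double difference = left - right;

    if (difference < 0.0)
    {
        difference = -difference;
    }
    return difference <= tolerance;
}
```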

A. References

"The C Programming Language, Second Edition, ANSI C", by Brian W. Kernighan and Dennis M. Ritchie, (c) 1988, 1978 by Bell Telephone Laboratories, Incorporated

 Dinkumware "The Unabridged Library Reference"

 I. Nassi, B. Shneiderman, "Flowchart Techniques for Structured Programming", SIGPLAN Notices 12, August 1973


Last edited on March 3, 2004 10:49 am.