Eternal Coding Standard

  • Introduction

    Having a standard way to format and lay out code helps reduce the chance of common errors, and allows the programmer to easily see the program flow and intent of the original author. Coding is hard, but many simple errors can be avoided by using good coding style practices.

    However, coming to a consensus as to what the one true coding style should be has had programmers debating for decades! I'm sure many programmers will disagree with large portions of this coding standard, but feel free to email me stating your case.

    I have written below the coding standard I currently use, and one that has evolved over 30 years of professional coding. Although some recommendations may seem finicky, applying them as you write the code takes no extra time, and could save a significant amount of time in the future.

    Keep in mind that the actual time taken physically typing the code is minuscule compared to developing the algorithm, compiling, debugging, code maintenance, and documentation. Also, any given code will be viewed and debugged many more times than it is actually written, so spending an extra couple of seconds when writing the code helps save time in the long term.

    Some of the suggestions below will be considered obvious, and it would never occur to many programmers to not do it the way suggested. However, if it is mentioned below, I've seen it in production code, so at some point it did occur to some programmer to do it that way.

    The key to a good coding standard is readability, reducing ambiguity, and the easing of code maintenance.

  • Use your company's coding standard.

    It doesn't matter what it is, it should be applied to the best of your ability. You are free to lobby inside your company for tweaks, updates, and new rules, but this tends to be a very slow and iterative process. Most companies have a set of rules, and a set of recommendations. This means you are free to experiment in some less contentious cases.

    Also, feel free to email me to lobby for a change.

  • Don't edit existing code.

    Apply the coding standard to the code you write or refactor, not to anyone else's. It's easy to go hog-wild reformatting code, but this could introduce bugs, and almost certainly annoy the programmer whose code is being reformatted.

    Consistency is key. If you're editing some code that uses a slightly different style, use that style while in that code. An inconsistent standard is worse than no standard at all.

  • Use of languages.

    Generally:

    • For performance critical, cross-platform code - use C++.
    • For tools, scripts, and prototyping - use C#.

    Obviously, web pages and web APIs can require JavaScript and Java to be exploited properly, so don't try to force an inappropriate language when there is a well designed and supported one already there.

    Keeping the number of used languages to a minimum means more code sharing between projects, less of a learning curve for you and other programmers on your team, and fewer errors made from the programmers having to be a 'Jack of all trades'.

    C# is to C++ as C++ is to assembly. You can write entire projects in assembly, and this was practical in the era of 6502 and 68000, but the additional productivity of C++ eventually made this no longer worthwhile.

    LinqPad is very useful in writing one-off utility scripts in C# that run in immediate mode, and if the script gets large, it trivially can be put into a standalone project to single step debug and, if necessary, compile into a standalone executable. Due to the bytecode nature of compiled C#, most command line based utilities can also run on the Mac or Linux under DotNetCore, and on other processors (such as ARM) without any recompilation. Now that C# is open source, support will only improve over time. Finally, the ability of LinqPad to test SQL/Linq queries has proven invaluable to me in the past. To me, the above negates the need for scripting languages such as Python or Perl.

    C++ is cross platform, but the amount of glue code required to support disparate platforms is not for those without strong intestinal fortitude; it requires a lot of planning and custom code to make it work. However, it is the most performant coding layer practical to use nowadays. The include header model for C++ means that compile times for large projects will cause a significant loss of productivity unless addressed aggressively, especially if the project uses templates heavily.

    C++ now comes in multiple versions: C++14, C++17, and C++20. Feel free to use the latest version that is supported for all your target platforms. If one of your platforms only supports C++14, then that is the version to use. As an aside, I disagree with the direction that C++ is going; the effort seems to be on making the language easier for new users to learn when the real problem is maintaining large legacy codebases.

    If another language is required, try to ensure it can be single step debugged, as this is a critical way to learn how a new language works.

  • Bracing - Always use bracing, and always use the Allman/BSD style.

    
        if( Condition )
        {
            DoThis();
        }
                

    Braces should always be used to prevent programmers adding lines with the correct indentation and expecting it to be part of the scope. The following would be easy to misread, and the intention of the code would be ambiguous; did the original author intend the second function call to be within the condition scope?

    
        if( Condition )
            DoThis();
            DoThisToo();
                

    The Allman/BSD bracing style is easier to read than the K+R style, especially in the case of multi-line conditionals where the condition and the code blur together:

    
        if( ConditionA
            && ConditionB ) {
            DoThis();
        }
                

    If your company's coding standard requires K+R bracing, adding a blank line after the opening brace does make the code less unreadable, if not embracing the spirit of the standard.

    
        if( ConditionA ) {

            DoThis();
        }
                

    K+R does allow more lines of code on screen, and this was of great benefit in the era of small monitors, but that era has long since passed.

    In the past, I have seen compilers fail to compile clauses properly when braces were not included. Granted, this was a very old compiler.

    As an aside, I've found the vast majority of self taught programmers chose the Allman style, and most K+R fans came from academia. It would seem that unless indoctrinated by previous work, the Allman style is more intuitive.

  • 80 column limits.

    As with K+R bracing, this was a boon in the era of small monitors, but with longer variable and function names preferred, this would be very tedious today. Lines should not become too long, but use a sensible limit, such as 200, more as a recommendation than a hard and fast rule.

  • Never use the keywords 'auto' (C++) and 'var' (C#) in production code.

    These are useful tools when prototyping code that will not eventually be used, or making a quick LinqPad utility script, but significantly reduce readability of the code. The code listed below is rendered meaningless without explicit types. Also, if GetResults() changed the container it returned, the automatic iterator generation would happily compile, but may not work with Function(), or generate a non-obvious compile error.

    
        var Results = GetResults();
        foreach( var Result in Results )
        {
            Function( Result );
        }
                

    Possibly the one acceptable case of the 'auto' keyword is when generating complex template iterators:

    
        Dictionary<string, string> Items;
        for( auto& Item : Items )
        {
        }
                

    However, I would argue that making the programmer hunt back in the code for the type of 'Items' is time that could be saved by not using the 'auto' keyword.

    Some would argue that as auto types the variables implicitly, there is guaranteed to be no additional (and potentially slow) implicit conversion. It can also auto const variables to help the compiler optimise. These are good features in theory, but I don't think they outweigh the loss of readability.
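
    The implicit conversion argument can be made concrete with a small sketch. It assumes a std::map of strings standing in for the container above, and the type alias and function names are hypothetical. A near-miss explicit type silently copies every element, whereas the exact element type binds directly, just as 'auto&' would:

```cpp
#include <map>
#include <string>
#include <utility>

using FStringMap = std::map<std::string, std::string>;

// Counts how many elements were visited through a temporary copy.
// The element type here is subtly wrong: the map stores
// std::pair<const std::string, std::string>, so every element is converted.
int CountCopiesWithNearMissType( const FStringMap& Items )
{
    int Copies = 0;
    FStringMap::const_iterator Source = Items.begin();
    for( const std::pair<std::string, std::string>& Item : Items )
    {
        if( &Item.second != &Source->second )
        {
            Copies++;
        }

        ++Source;
    }

    return Copies;
}

// The exact element type refers to the stored element itself, with no copy.
int CountCopiesWithExactType( const FStringMap& Items )
{
    int Copies = 0;
    FStringMap::const_iterator Source = Items.begin();
    for( const FStringMap::value_type& Item : Items )
    {
        if( &Item.second != &Source->second )
        {
            Copies++;
        }

        ++Source;
    }

    return Copies;
}
```

    Spelling out the exact type keeps the readability of an explicit declaration without paying for the conversion.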

    Another argument for 'auto' is that it makes the language more accessible as an amateur programmer doesn't have to know the type. I disagree with this as it makes the compiler more 'magic' in its operation, and doesn't help the amateur understand what is going on. It is not a good thing that he can't construct the correct type, so just types 'auto' and hopes for the best.

    I've never liked the 'auto' keyword, and the more I find it in code the more I dislike it.

    ReSharper for C# and ReSharper C++ for C++ are excellent tools to enforce coding styles and have a 'Use explicit type' refactor option to fix up 'auto' and 'var' defines.

  • Name your variables and functions with sensible and descriptive names.

    
        List<string> UserNames = CollectUserNames();
        foreach( string UserName in UserNames )
        {
            Display( UserName );
        }
                

    We can see from the above code:

    • It is not a simple operation to retrieve the user names; the term is 'Collect' not 'Get'.
    • User names are simple strings.
    • User names are in a container, and plural, so more than one is expected.
    • The user name is being displayed to the user.

    Try to keep the names as generic, but specific, as possible. For example, 'UserNames' is good as it's plural, refers to a container of user names which can be used for whatever required purpose. 'UserNamesToDisplayInUI' is not so good as a subsequent clause may want to do something else with the user names, and the 'DisplayInUI' is redundant as if it is displayed, it will be in the UI.

    Some other tips for naming:

    • Don't name a variable or function the same as a class or struct. e.g. Variable* Variable.
      • Having a naming convention that prefixes each class with a letter can be useful to help not run short of names to use. e.g. CVariable* Variable.
      • I would not recommend using '_' for this as it's a small character that is easy to miss.
    • Don't vary names purely by case. e.g. shutdownHAL() vs. ShutdownHAL().
    • Put the more unique name at the front. e.g. Use LoginWidgetOKClick() rather than ClickLoginWidgetOK(). This is because you wish to group by the unique widget, not the generic operation performed on the widget, during auto-complete.
    • Don't vary names purely by namespace so the class will be different depending on the namespace it is in. e.g. defining a class 'class Renderer : Render::Renderer'.
    • If the variable name is an acronym, capitalize the letters to show this.
    • Do not start a variable name with '_' as this is much more likely to clash with a system variable e.g. '_time'.

  • Limit variable scope as much as possible.

    It's much easier to read code if the variable is only defined for a small section; it makes it clearer which variables are parameters, and which are local (and hence temporary) to the function.

  • Don't use Hungarian Notation.

    Hungarian tends to hinder more than help understand code; use descriptive variable names instead. For example, 'Scale' will almost always be a float, and anything with 'Index' or 'Count' in the name will almost always be an integer. Hungarian can get especially gnarly when it is used at the expense of decent naming e.g. 'lpwcsX'.

    Another problem is Hungarianish notation, where a coder uses his own custom set of prefixes e.g. 'f' for flag and not float.

    The one exception to this is the Unreal Engine convention of prefixing booleans with a 'b'.

  • Define one variable per line.

    For example, don't do the following as it can be mistaken for a longer variable name, and it's harder to comment out or remove either variable when testing or refactoring. The second line highlights how it can be more difficult to differentiate between a comma as a parameter separator, and a comma as a variable separator.

    
        int32 Index, Count;
        int32 Index( Count, Maximum), Count;
                

  • Keep checks and booleans positive when possible.

    Naming a boolean 'bFeatureEnabled' makes it easy to decide whether to enable said feature. Naming a bool 'bFeatureNotDisabled' requires the programmer to waste time evaluating a double negative in their head. This can be made even more convoluted with a poorly constructed if clause:

    
        if( !bFeatureNotDisabled == false )
                

  • Make conditionals boolean.

    Although '0' and 'false' are theoretically the same in C++, keeping conditionals as booleans aids readability:

    
        if( Index != 0 )
                

    This is also a requirement in C#, so is a good practice to get used to.

  • Use the correct literal types.

    This is best illustrated by an example:

    
        float Scale = 0.0f;
        double ExactScale = 0.0;
        int32 Index = 0;
        bool bCondition = true;
        Class* Pointer = nullptr;
                

    C++11 adds the new keyword 'nullptr', and this should be used over NULL if every used compiler is up to date.
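
    One practical reason to prefer 'nullptr' shows up with overloaded functions. The following is a minimal sketch with hypothetical function names; 'nullptr' has its own type that converts only to pointer types, whereas a literal zero is an integer first:

```cpp
#include <string>

// Two hypothetical overloads: one takes an integer, one takes a pointer.
std::string Describe( int )
{
    return "integer";
}

std::string Describe( const char* )
{
    return "pointer";
}

// nullptr cannot convert to int, so only the pointer overload is viable.
std::string DescribeNullptr()
{
    return Describe( nullptr );
}

// A literal 0 is an exact match for int, so the integer overload wins.
std::string DescribeZero()
{
    return Describe( 0 );
}
```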

  • Generously use parentheses.

    The following operation:

    5 + 2 * 3
    - is deterministic, but can easily be misread by a programmer. Writing it as:
    5 + ( 2 * 3 )
    - shows the programmer the exact intent of the code. The same applies to operators such as |, &, || and &&. For example:
    if( ConditionA || ConditionB && ConditionC )
    - is harder to decipher than:
    if( ConditionA || ( ConditionB && ConditionC ) )
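
    To show how the unparenthesized condition can be misread, here is a small sketch with hypothetical function names. The compiler groups the '&&' first, which is not the left to right reading a hurried programmer might assume:

```cpp
// && binds tighter than ||, so this groups as A || ( B && C ).
constexpr bool Unparenthesized( bool A, bool B, bool C )
{
    return A || B && C;
}

// The grouping the compiler actually uses, written out explicitly.
constexpr bool Intended( bool A, bool B, bool C )
{
    return A || ( B && C );
}

// The left to right grouping that a reader may mistakenly assume.
constexpr bool Misread( bool A, bool B, bool C )
{
    return ( A || B ) && C;
}

static_assert( Unparenthesized( true, false, false ) == Intended( true, false, false ), "same grouping" );
static_assert( Unparenthesized( true, false, false ) != Misread( true, false, false ), "different grouping" );
```

    Many compilers warn about the unparenthesized form for exactly this reason.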

  • Multiline conditional formatting.

    Put the operator as the first element of the new line, e.g.

    
        if( ConditionA
            && ( ConditionB || ConditionC )
            && ( ConditionD ) )
        {
        }
                

    This makes it easier to add or remove sections, and more logically represents the actual logical intent of the condition.

  • Pointer and reference identifier formatting.

    These should be flush up against the type, not the variable. This is because these operators are part of the type, and not part of the variable. 'Class*' is the type, which is a different type to 'Class'.

    
        Structure* Pointer = nullptr;
        Structure& Reference = Instance;
                

    This approach also helps the programmer discern pointers from operators. 'x * y' is a multiplication, 'x* y' is a pointer variable, and 'x *y' is an attempt to dereference.

    In the olden days of many variable declarations per line, putting the '*' next to the variable name made sense. In modern day C++, where many variables have been moved to the class definition, local variables are mostly declared on demand, variable names are much longer, and there is a strong recommendation to use a single variable declaration per line - putting the operator next to the class is desirable.

  • Put spaces around binary operators, and single spaces before unary operators.

    With binary and unary operators using the same symbols as pointers and references, proper spacing helps disambiguate the various usage cases:

    
        1 + ( 2 * -3 )
        ( 1 + 2 ) * -3
                

  • Spaces in parentheses.

    I much prefer the style used in previous examples, as the 'if' is the owner of the 'Condition' being checked.

    
        if( Condition )
                

    Another popular style is:

    
        if (Condition)
                

    However, this is less intuitive to me as the 'if' seems independent of the 'Condition' being checked. Logically, I think 'if', then consider the 'condition'; I don't construct the 'condition', then apply an 'if' to it. Also, Visual Assist can get confused with this style as it doesn't associate the '(' with the 'if' keyword, so makes the wrong auto-complete suggestions.

  • Consider the number of parameters being passed to functions.

    As a general rule, a function with any more than four parameters should be rare. Consider refactoring to pass in an initialization class to avoid the argument list getting out of hand.

    Old compilers used to pass four or fewer parameters via registers, and any more via the stack, and this had a negative effect on performance in some cases. This is obviously no longer the case, but it does keep the code much easier to follow when the parameter list is kept to a minimum set. It can become much easier to make a mistake when passing several parameters of the same type. Compare the two function declarations below; which one is easier to read, and which one is less likely to get the order of parameters mixed up?

    
        void RenderRectangle( Context* DeviceContext, int32 Left, int32 Right, int32 Top, int32 Bottom, int32 ExcludeLeft, int32 ExcludeTop, int32 ExcludeRight, int32 ExcludeBottom );
        void RenderRectangle( Context* DeviceContext, Rectangle& Area, Rectangle& ExcludeArea );
            

    To illustrate my point, did you notice the order of parameters in the first line was different for the area compared to the exclude area?
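
    A minimal sketch of the second form follows. The Rectangle layout and the helper functions are assumptions made for illustration, not part of any real rendering API; the point is that named fields carry the ordering, so the call site cannot silently transpose a top and a right:

```cpp
#include <cstdint>

// Hypothetical rectangle type; the field names document what each value means.
struct Rectangle
{
    std::int32_t Left;
    std::int32_t Top;
    std::int32_t Right;
    std::int32_t Bottom;
};

// True if the point is inside the rectangle (right and bottom edges are exclusive).
bool Contains( const Rectangle& Area, std::int32_t X, std::int32_t Y )
{
    return ( X >= Area.Left ) && ( X < Area.Right ) && ( Y >= Area.Top ) && ( Y < Area.Bottom );
}

// True if the point is inside the area to render, but outside the excluded area.
bool ShouldRenderPoint( const Rectangle& Area, const Rectangle& ExcludeArea, std::int32_t X, std::int32_t Y )
{
    return Contains( Area, X, Y ) && !Contains( ExcludeArea, X, Y );
}
```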

  • Basic type names.

    Unreal Engine 4 uses the following, and these are as good as any: int8, int16, int32, int64, uint8, uint16, uint32, uint64, float, double, decimal, bool. The main criterion here is that all fundamental types are named in all lower case.

    For readability, I find byte, char, short, and word aliases are very useful too.

    The 'long' type especially should be avoided. It is dependent on the compiler, the processor architecture, and the language used. For example, in managed C++, 'long' is 32 bits, whereas in C# it is 64 bits.
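
    For a codebase that does not already provide these names, one possible way to define the integer aliases is on top of the fixed width types in the standard header. This is a sketch under that assumption, not how Unreal Engine itself defines them:

```cpp
#include <cstdint>

// Fixed width aliases built on <cstdint>, all in lower case.
using int8 = std::int8_t;
using int16 = std::int16_t;
using int32 = std::int32_t;
using int64 = std::int64_t;
using uint8 = std::uint8_t;
using uint16 = std::uint16_t;
using uint32 = std::uint32_t;
using uint64 = std::uint64_t;

// Unlike 'long', these sizes do not vary with the compiler or architecture.
static_assert( sizeof( int32 ) == 4, "int32 must be exactly 32 bits" );
static_assert( sizeof( int64 ) == 8, "int64 must be exactly 64 bits" );
```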

  • Enums.

    There's a pre-C++11 way of defining enums that works very well:

    
        namespace EnumName
        {
            enum Type
            {
                Alpha,
                Beta,
            };
        }
            

    This means that all references to the enum can be found by searching for 'EnumName', auto-complete works much better, it is strongly typed when being used as a parameter, and this is much more like the C# approach.

    This has been superseded by the C++11 enum class, which is less convoluted and allows forward declarations of enums.

    
        enum class EnumName
        {
            Alpha,
            Beta,
        };
            

    It should go without saying that enums should be used over magic numbers; having an assumption that -2 represents an uninitialized state while 4, 5, and 12 are defined states is simply confusing.
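
    As an illustration of replacing magic numbers, here is a small sketch using an enum class. The state names and helper functions are hypothetical:

```cpp
#include <cstdint>
#include <string>

// Hypothetical connection states, used instead of magic numbers such as -2, 4, 5, and 12.
enum class EConnectionState : std::uint8_t
{
    Uninitialized,
    Connecting,
    Connected,
    Failed,
};

// Converts a state to a readable name for logging.
std::string ToString( EConnectionState State )
{
    switch( State )
    {
    case EConnectionState::Uninitialized:
        return "Uninitialized";
    case EConnectionState::Connecting:
        return "Connecting";
    case EConnectionState::Connected:
        return "Connected";
    case EConnectionState::Failed:
        return "Failed";
    }

    return "Unknown";
}

// The parameter is strongly typed; passing a bare integer does not compile.
bool CanSend( EConnectionState State )
{
    return State == EConnectionState::Connected;
}
```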

  • Copyright messages.

    Always put a comment with a copyright message at the top of all source files. This message should take the form:

    
        // Copyright 2010-2021 Eternal Developments, LLC. All Rights Reserved.
            

    The ASCII approximation '(c)' is not a legally recognized substitute for the copyright symbol (©), so spell out 'Copyright'. This also avoids any problems with non-ASCII characters.

  • Use the override keyword.

    This has ubiquitous support in both C++ and C#, and is another hint to both the compiler and the programmer that the function is meant to override another in a parent class, and if this isn't the case, the compiler should let us know that something is unexpected.

    Related to this, always mark virtual functions as virtual even though this is only required in the base implementation. This shows the programmer that there is additional information in other source files.
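
    A short sketch of what the keyword catches, using hypothetical class names. Dropping the 'const' from the derived function would quietly declare a new function instead of overriding the base one; with 'override' present, that mistake becomes a compile error:

```cpp
class FShape
{
public:
    virtual ~FShape()
    {
    }

    virtual int SideCount() const
    {
        return 0;
    }
};

class FTriangle
    : public FShape
{
public:
    // 'override' makes the compiler verify this matches a virtual function in FShape.
    virtual int SideCount() const override
    {
        return 3;
    }

    // The following would fail to compile, as no base function has this signature:
    // virtual int SideCount() override { return 3; }
};

// Calls through the base class reach the derived implementation.
int CountSides( const FShape& Shape )
{
    return Shape.SideCount();
}
```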

  • Use #pragma once.

    The header file model of C++ can make it very easy to include the same header file many times; using '#pragma once' can prevent the compiler having to read in and parse the entire file every time it is referenced. It is by no means a fix, but it is an improvement on the old approach of defining a define based on the header file name, and checking to see if it is defined. This method has to parse the file to find the '#endif' as there could be code after it, whereas the '#pragma once' approach does not.

    On the subject of header files, try to only include them in C++ files to minimize the number of times a header file is included. For example, if two separate header files include the same base header file, including the base header file in the source file before the inclusion of the other files saves the base header file being included twice. This is often not viable in practice, but something to bear in mind.

  • Define naming.

    Historically speaking, these have always been underscore separated, and all caps. For example:

    
        #define COMPILER_CPP_VERSION 10
                     

    I'm not entirely certain where this originated, but it is a convention that has been used for a long time, and I see no reason to break it.

  • Use #if over #ifdef.

    The #if directive can have logical conditions applied - such as 'and', 'or' and 'not'. This is not possible with the #ifdef directive, so makes the former easier to use. It is also good practice to add a comment after the '#endif' specifying the '#if' it is referring to as that could be many pages of code away.

    
        #if ENGINE_MINOR_VERSION == 5 || ENGINE_MINOR_VERSION == 6
        // Code
        #elif ENGINE_MINOR_VERSION == 4
        // Code
        #else
        #error Unknown ENGINE_MINOR_VERSION!
        #endif // ENGINE_MINOR_VERSION
            

  • Don't use 'this' to disambiguate fields.

    For example, the following code is still ambiguous to the programmer, the 'this' keyword doesn't help. Use a naming convention (such as prefixing with 'In'), or properties, instead.

    
        public void SetFileName( string FileName )
        {
            this.FileName = FileName;
        }
                

  • Blank line after scope.

    A blank line after a scope can avoid issues with misreading an 'else' clause.

    
        if( ConditionA )
        {
        }
        else if( ConditionB )
        {
        }
    
        if( ConditionA )
        {
        }
        if( ConditionB )
        {
        }
                

    Also, add a blank line before any artificial scoping (such as for a scoped profiling macro) so as to not give the impression that the previous line is related to the scope.

  • Group variables of the same type together.

    C and C++ pack variables to their minimal alignment requirements, so grouping types together can reduce memory usage. The following structure is 16 bytes:

    
        int32 Count;
        bool bHasCount;
        int32 Index;
        bool bHasIndex;
                

    The following structure contains the same information, but only needs 10 bytes of data:

    
        int32 Count;
        int32 Index;
        bool bHasCount;
        bool bHasIndex;
                

    Also note that the compiler will pad the above structure to 12 bytes so that every element stays correctly aligned when it is put in an array. Sometimes grouping by use case makes more sense than grouping by type, but bear the alignment in mind when doing that.
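
    The sizes can be checked at compile time. The sketch below uses hypothetical struct names and assumes a typical platform where a bool is 1 byte and a 32 bit integer is aligned to 4 bytes; on such a platform sizeof reports 16 for the first layout and 12 for the second:

```cpp
#include <cstdint>

// Each bool is followed by 3 bytes of padding so the next integer stays aligned.
struct FInterleaved
{
    std::int32_t Count;
    bool bHasCount;
    std::int32_t Index;
    bool bHasIndex;
};

// The two bools share one 4 byte slot, leaving only 2 bytes of trailing padding.
struct FGrouped
{
    std::int32_t Count;
    std::int32_t Index;
    bool bHasCount;
    bool bHasIndex;
};

// For these two layouts, grouping by type never makes the structure larger.
static_assert( sizeof( FGrouped ) <= sizeof( FInterleaved ), "grouping should not increase the size" );

// The size always includes the trailing padding needed for array elements.
static_assert( ( sizeof( FGrouped ) % alignof( FGrouped ) ) == 0, "size is a multiple of the alignment" );
```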

  • Never use 'goto' - ever!

    One of the first rules learned as a programmer is to never use the goto statement, as it breaks the flow of code in unexpected ways. This still makes sense 30 years after I first heard it. If it seems like a goto statement is required, generally extracting that small section of code into a function can avoid it.
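
    A common case where a goto seems tempting is escaping from nested loops. The sketch below, with a hypothetical function name and grid type, shows the extraction approach; an early return from the function does the job of the goto:

```cpp
#include <cstddef>
#include <vector>

// Finds the first cell equal to Target, scanning row by row.
// Returns true and fills in the position if found; the outputs are untouched otherwise.
bool FindFirst( const std::vector<std::vector<int>>& Grid, int Target, std::size_t& OutRow, std::size_t& OutColumn )
{
    for( std::size_t Row = 0; Row < Grid.size(); Row++ )
    {
        for( std::size_t Column = 0; Column < Grid[Row].size(); Column++ )
        {
            if( Grid[Row][Column] == Target )
            {
                OutRow = Row;
                OutColumn = Column;
                return true;
            }
        }
    }

    return false;
}
```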

  • Use casting templates

    reinterpret_cast<> and static_cast<> are compile time directives that show whether the cast is meant to be between related types. For example, use reinterpret_cast<> when converting from a pointer to an integer, and static_cast<> when converting from a child of a class to the base class. In the former case, any future programmer will know that the design requires an unsafe change between unrelated types and this is OK; in the latter, the types will have to be related in some way, so that still needs to be the case.
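
    The two cases might look like the following sketch, where the class and function names are hypothetical:

```cpp
#include <cstdint>

class FBase
{
public:
    int BaseValue = 1;
};

class FChild
    : public FBase
{
public:
    int ChildValue = 2;
};

// Related types: the compiler checks that FChild really derives from FBase.
FBase* ToBase( FChild* Child )
{
    return static_cast<FBase*>( Child );
}

// Unrelated types: a deliberate, unchecked conversion from a pointer to an integer.
std::uintptr_t ToAddress( FChild* Child )
{
    return reinterpret_cast<std::uintptr_t>( Child );
}

// The reverse conversion; only valid for an address that came from ToAddress().
FChild* FromAddress( std::uintptr_t Address )
{
    return reinterpret_cast<FChild*>( Address );
}
```

    If FChild were later changed to no longer derive from FBase, the static_cast<> would stop compiling, whereas a C style cast would carry on regardless.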

  • Never use unsigned by itself.

    A variable declaration such as the following is unclear:

    
        unsigned Count;
                

    It's an unsigned what? Using a type with the unsigned built into the name (e.g. uint32) is a lot clearer.

  • Use const when appropriate.

    const can be a very useful tool to give hints to the compiler as to what it can optimize, to ensure data doesn't get modified when it shouldn't, and to show the programmer its purpose. The main caveat is that adding this to a function prototype can result in a lot of recompilation, so it's sometimes unwise to add consts after the fact. As a general rule, I'll const a pointer variable when safety is required (e.g. const char*), but leave basic types (e.g. int32) de-consted for convenience. It also depends how often the function is called; I'm much more likely to add a const on a low level, performance critical function (such as a string operation) than on a function that's called once per frame.

  • Use tabs not spaces, and one tab is four spaces!

    Using spaces rather than tabs can result in source files being much larger than they need to be, and can have an adverse effect on compilation times. It is also easier to edit tabbed files as it is 1 key press to move columns rather than 4 key presses with spaces.

  • Add class/field inheritances on separate lines.

    
        class Foo
            : public Bar
            , public Qux
        {
        };
                

    This makes it easier to see which classes are inherited, and easier to add or remove inheritances. It also helps differentiate between parameters passed to a constructor, and fields set in the constructor.

    
        Foo( string Input, int32 Count )
            : Bar( Input )
            , Qux( Count )
        {
        }
                

  • Use initializer lists.

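    Along the same lines as putting inheritance on separate lines, use initializer lists and put each member on a unique line. This is the form that compilers can turn into the most efficient code, as each member is constructed directly with its initial value rather than being default constructed and then assigned.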

    
        FSimpleClass()
            : Pointer( nullptr )
            , Index( 0 )
            , Scale( 0.0f )
        {
        }
                

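    Members are always initialized in the order they are declared in the class, regardless of the order they appear in the initializer list, and some compilers warn when the two differ. Even when there is no warning, keeping the list in declaration order is a good practice.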
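
    The order matters most when one member is initialized from another. In this sketch (the class name is hypothetical), the second member relies on the first already being set, which only holds because of the declaration order:

```cpp
#include <cstdint>

class FBuffer
{
public:
    explicit FBuffer( std::int32_t InCount )
        : Count( InCount )
        , Capacity( Count * 2 )
    {
    }

    // Declared in the same order as the initializer list above.
    // Swapping these two lines would make Capacity read an uninitialized Count.
    std::int32_t Count;
    std::int32_t Capacity;
};
```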

  • Use of struct vs. class.

    On the note of passing by reference, there are 3 ways of passing a parameter to a function in C++:

    
        void FSimpleClass::Method( FOtherClass* otherPointer, FOtherClass& otherReference, FOtherClass otherValue )

    The otherPointer is an address in memory that contains that class instance - it uses the '->' operator to access fields and methods. This can be a nullptr which is typically an error or failure condition. The otherReference is also an address in memory but it uses the '.' operator to access fields and methods. It can not be null. Any operation performed on the instance using these methods of parameter passing are on the single passed in instance and will change the instance. The otherValue creates an entirely new class instance that is a copy of the original (this can be very slow). Any operation on this instance will be forgotten when the method exits.
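
    The difference in behavior can be seen in a small sketch. The struct and function names here are hypothetical:

```cpp
struct FCounter
{
    int Value = 0;
};

// Pointer: uses '->', may be nullptr, and changes the caller's instance.
void IncrementByPointer( FCounter* Counter )
{
    if( Counter != nullptr )
    {
        Counter->Value++;
    }
}

// Reference: uses '.', cannot be null, and changes the caller's instance.
void IncrementByReference( FCounter& Counter )
{
    Counter.Value++;
}

// Value: works on a copy, so the caller's instance is left untouched.
// Returns the value of the copy to show that the increment did happen locally.
int IncrementByValue( FCounter Counter )
{
    Counter.Value++;
    return Counter.Value;
}
```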

    For C#, all classes are passed as a reference, and all structs are passed as values (a copy) unless the 'ref' keyword is used.

    The C# convention is reasonable to follow in C++ but not all encompassing. Another differentiator would be that classes perform a function, whereas structs hold data - even though both would have member functions and data fields. For example, a container for a generic value (which could be int, bool, or string) would be a struct, whereas functions utilizing those values would be in a class.

  • Function length.

    There is no hard and fast rule for this, but generally speaking, anything more than a couple of hundred lines is bad. One of the points of splitting code into functions is to break down complex logic into easier to manage chunks; if the function just continues endlessly, the benefit of this is lost. Furthermore, Visual Assist and Intellisense take the programmer to the start of the function, and the programmer could waste time searching for the part he is looking for.

    The process of breaking down a long function into smaller logical sections can often reveal flaws in the program flow that are not obvious in many screens of sequential code. Finally, having the base function call several subfunctions shows the overall logic of the function clearly, whereas having to scroll down several pages of code for each section makes it more difficult than it needs to be.

  • Don't allow a this of nullptr.

    If a method does not access any class members, a typed pointer that is nullptr can call the method successfully. However, this functionality is undefined in the C++ specification and is prone to failure with compiler variations and optimizations. Also, if the pointer type inherits from other classes, a nullptr pointer may not be 0 which makes comparisons dangerous. This article goes into this in much more detail.

    Making the function static means the compiler will verify that no members are accessed and avoid any future problems with code maintenance.

  • Don't use symlinks.

    At least, don't use symlinks in code that is checked in to source control. The PC does not support these properly, and these will cause problems when not syncing to a Unix based file system. Even on supported systems, they just act as an extra layer of indirection leading to more work for anyone maintaining the code.

  • Forward declarations.

    While defining a class that has pointer members to another class or struct, prefix the pointer with the class or struct keyword. This acts as an implicit forward declaration, and can save on having to include some header files, leading to fewer header file dependencies and quicker compile times. It also makes clear whether the pointer is a class or a struct.

    
        class FPublicClass
        {
            class FClientClass* PrivateClass;
            struct FClientStruct* PrivateStruct;
        };
                

    In the above class definition, there is no need to include the header files for the FClientClass class or the FClientStruct struct in the header, just in the C++ file where FPublicClass is implemented. Furthermore, the aforementioned class and struct can be updated without having to recompile anything that includes the FPublicClass definition.

  • Don't put code with side effects inside an assert.

    This should be self evident, but asserts are only run during development, so an alternate code path would be run in a released application. I'm not a huge fan of asserts in the first place as they are terminal, but they can be useful in APIs to verify the caller is making the correct assumptions.
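
    A minimal sketch of the safe pattern, using the standard assert macro and a hypothetical Initialize() function. The side effect runs unconditionally, and only the check on its result lives inside the assert:

```cpp
#include <cassert>

// Tracks how many times the hypothetical initialization has run.
static int InitializeCalls = 0;

// Hypothetical initialization routine with a side effect the program relies on.
bool Initialize()
{
    InitializeCalls++;
    return true;
}

void Startup()
{
    // Wrong: assert( Initialize() ) would not call Initialize() at all
    // in a build where asserts are compiled out (NDEBUG defined).

    // Right: run the side effect first, then assert on the stored result.
    const bool bInitialized = Initialize();
    assert( bInitialized );

    // Avoids an unused variable warning when asserts are compiled out.
    static_cast<void>( bInitialized );
}
```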

  • Function layout.

    Some programmers insist on one entry and one exit for every function. This is an ideal, and not practical for most purposes. I tend to define a result to return as default, then have a series of early outs, then do the logic and return the result if it is set. e.g.

    
        Object* Function( Input* Value )
        {
            Object* Result = nullptr;
            if( Value == nullptr )
            {
                // Potential warning message here
                return Result;
            }
    
            if( ThisIsDoable )
            {
                Result = Logic();
            }
    
            return Result;
        }
                

    This means extra early outs can be added easily and the return value is already set to something sensible. It also shows very clearly the set of prerequisites for the function to work, and if any of these fail for whatever reason, the code does not crash. Finally, the return type can be changed with only editing the declaration and the first line of the function, thereby reducing the chance of a compile error caused by missing a return case.

  • Try to avoid adding generic include folders.

    To illustrate the point, prefer using '#include "include/utils.h"' over '#include "utils.h"' and adding "include" to the list of paths to search for header files. This is to avoid contention when more than one library has an include file called 'utils.h', to speed up the searching for header files (there is nowhere to search in this case; it must exist in the specific location), and to avoid any issues with compiler command lines getting too long.

    Another approach is to only put public headers in the include folder and expose that, but keep most headers local and inaccessible to the rest of the program. As public headers are most likely specifically named (e.g. 'MyThirdPartyLibrary.h'), the likelihood of a name collision is much lower than with something more generically named (e.g. 'utils.h').

  • Inlined blank functions.

    Try and put the scope braces on separate lines. This makes it easier to distinguish between declarations and inline functions as it is easy to miss ';' instead of '{}' at the end of a line. e.g.

    
        void SimpleSetter()
        {
        }
                

  • Use Doxygen style (C++) and Xml style (C#) comments when possible.

    The Doxygen format can be recognized by '/** Comment */', the Xml format for Sandcastle can be recognized by /// <summary>Comment</summary>. The Sandcastle comment header is automatically stubbed in by Visual Studio after typing ///.

  • What to put in comments.

    Some general notes when commenting code:

    • Code should be self commenting to a large degree with appropriate function and variable names.
    • Comment in the header file as that is where most programmers will look first. Optionally copy the comments to the source file, but try to have one authoritative location which propagates elsewhere.
    • Try your best to keep the comments up to date, as I've been misled by out of date comments more times than I care to mention.
    • Try not to repeat the function name in the comments. For example, don't comment the function 'AcquireLatestMatrix' with 'Acquires the latest matrix'.
    • Always mention the units required. For example, is 'float Angle' in degrees or radians?
    • If you ever have to revisit a function to make a fix or update, that is an excellent time to add a comment.

    Here is an excellent article on writing self commenting code, which also explains some of the other benefits of this approach.

    Also, put comments above the line they are commenting, and not at the end of the same line. This prevents the auto formatting in Visual Studio and VAX from misaligning comments on subsequent lines.

  • Multiple keywords.

    It can sometimes clarify code when variable declarations are tabbed out to line up. However, in the era of templated variables, no longer having to declare variables at the opening of scope, and a multitude of script keywords, separating with a single space now makes more sense. Tabbing out can add a lot of space between the class and the variable making it more difficult to associate the two. In the contrived UnrealScript example below, tabbing out Index would separate it from its type.

    
        var const native localizable transient array<int> Counts;
        var int Index;
            

    It is also very easy to get into the bad habit of adding a new field which is slightly longer than the others, and then tabbing out every other field in the class to match.

  • Multiple assignment operators.

    Multiple assignments on the same line (as below) are bad practice as it is unclear what is being assigned to what, and it's non-trivial to decipher whether this is a typo.

    
        x = y = z;
            

  • Lambdas.

    Lambdas have their uses, but can be coded in a way to confuse the programmer and cause performance problems. The C++ lambda format is nowhere near as clear as the C# lambda format.

    One use case which I don't like is using a lambda to enable the use of the auto keyword rather than writing a static helper function. Write the helper function; your future self will thank you.

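    Never put a single & (or =) in the capture statement; capture each variable explicitly by name. A single & implicitly captures, by reference, every local variable the lambda uses, which hides what the lambda depends on and leaves dangling references if the lambda outlives the enclosing scope. A single = hides copies (which could be huge) in the same way.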
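
    A minimal sketch of an explicit capture, where the capture list names each variable and states whether it is copied or referenced. MakeGreeter is a hypothetical function used only for illustration.

```cpp
#include <functional>
#include <string>

// Hypothetical example: builds a callable that can safely run after this
// function has returned, such as on another thread.
std::function<std::string()> MakeGreeter( const std::string& Name )
{
    const std::string Greeting = "Hello, " + Name;

    // Greeting is captured explicitly and by value, so the lambda owns its own copy.
    return [Greeting]()
    {
        return Greeting;
    };
}
```

    Because the capture list spells out exactly one variable, a reader can see at a glance what the lambda depends on and what it costs to create.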

    One excellent use case is to pass a small function that needs to run on another thread. UE4 rendering does this.

C# Specific

  • String formatting.

    Dynamically construct strings as you would read them. Use the following style:

    
        WriteLine( UserName + " has " + ItemCount + " items." );
            

    - and not the style that emulates C++:

    
        WriteLine( "{0} has {1} items.", UserName, ItemCount );
    			

    The first style is easier to parse as it matches how it is read normally. Also, it is impossible to have too many or too few arguments.

    The latest C# has a new string interpolation feature which allows an even nicer format. This has the added benefit of being much more localization friendly; if another language reorders the string to be grammatically correct, the localizer can do this without altering the code.

    
        WriteLine( $"{UserName} has {ItemCount} items." );
            

  • Return empty containers rather than null in C#.

    In most cases, I find it best to return an empty container over null, unless the null also doubles as an error message. For example, if I ask for a list of names to display, returning an empty list is safer as any loop iterations will still work, just not have anything to iterate over. There is no need to check for null in this case. However, if a lack of names meant something was badly wrong, or an early out was required, then returning null would be more suitable.

    Some people disagree.

  • Use List<> over arrays in C#.

    Always prefer lists over arrays in C#; the list is much more usable than an array (such as LINQ extensions), and can be trivially converted to an array when required (with .ToArray()). If lists are not performant for the required tasks, arrays likely won't be either.

  • Alphabetize using directives.

    When many assemblies are referenced, it can be easy to miss that one is included and end up adding it twice. If the directives are in order, it's much easier to find the correct line. Also, try to only reference assemblies that are actually used.

    Resharper has excellent tools for cleaning up references, and C# code in general.

  • Use AssemblyResolve when appropriate.

    C# compiles to bytecode that is compiled for the host processor at run time, so it can run on any processor that supports the framework. However, when an 'Any CPU' assembly interfaces with native code, the assembly will only run on a specific processor. The best approach to handle this is to add an AssemblyResolve callback to the portable code, and put the assemblies in subfolders dependent on their target architecture which can be found by the callback.

    For example, a Perforce utility could be pure managed code, but when it needs a native interface dll it won't be able to find it in the current folder, so will hit the AssemblyResolve callback. The callback will see the request for 'p4api.net, Version=2013.2.69.1914, Culture=neutral, PublicKeyToken=f6b9b9d036c873e1', work out the current architecture and look for the architecture specific assembly needed.

    This approach can also be used to more efficiently share assemblies without having to copy them everywhere.

  • Put interops in their own class called NativeMethods.

    This quarantines the non-portable code into a single class. Should the code fail to run on a different architecture with an exception in the NativeMethods class, the reason will be obvious. Also, should the code require porting to another processor, all changes should be localized to a single source file.

Best Practices

These are not strictly related to a coding style, but I think they are good recommendations to follow.

  • Multithreading.

    Be very careful when multithreading any code - it needs to be designed to operate safely in parallel. The best way is to pick systems that are very much independent of each other, and interact with other threads in a selective surgical manner. When they do interact, try and keep any thread locks as short as possible to minimize the time a thread has to wait. The mutex variable used to handle the locking needs to be written to, and hence cannot be locked in a const function. This can unravel quickly and force de-consting of a large amount of code resulting in a less safe project. Marking the mutex variable as 'mutable' gets around this - it is 'conceptually' const for the system it's locking.
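
    A minimal sketch of the 'mutable' mutex approach, assuming std::mutex and std::lock_guard from the standard library. FCounter is a hypothetical class used only for illustration.

```cpp
#include <mutex>

class FCounter
{
public:
    void Increment()
    {
        std::lock_guard<std::mutex> Lock( Mutex );
        ++Count;
    }

    // The getter stays const; locking is allowed because Mutex is mutable.
    int GetCount() const
    {
        std::lock_guard<std::mutex> Lock( Mutex );
        return Count;
    }

private:
    // Written to when locked, but conceptually not part of the counter's state
    mutable std::mutex Mutex;
    int Count = 0;
};
```

    Without the 'mutable' keyword, GetCount would fail to compile, and removing const from it would ripple out to every caller holding a const reference.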

    Be wary of the time to spin up threads as this can take quite some time. It's best to spawn threads at startup and have them gracefully wait for work using semaphores. Another approach is to spawn a set of generic worker threads at startup and use them for appropriate tasks (a thread pool approach). The std::transform algorithm can execute in parallel on request and seems to work very well. However, I'm skeptical of the efficiency of doing this due to the overhead of using threads for such typically small tasks.

    The 'Parallel Stacks' debugging window in Visual Studio has been invaluable in finding thread locks.

    The TQueue container in Unreal Engine 4 is naturally thread safe; this has been a boon when passing messages from one thread to another. The downside of this is there is no way to know the size of the queue without tracking the number of elements added and removed separately.

  • Multithreading and memory allocations.

    One of the major problems with multithreading is channeling many threads through a single function that requires a thread lock, and this is often the case with memory allocation and deallocation. This can be especially problematic with extensive STL usage as the containers do a lot of memory manipulations under the hood. If you're getting slower performance running in multithreaded mode than in single threaded mode, this is likely the cause. The fix for this is to replace the allocator with one that mitigates this problem as much as possible, and the latest and greatest of these at the time of writing seem to be mimalloc and tbbmalloc (UE4 can use either). How they mitigate this problem is beyond the scope of this document, but your favorite search engine will help.

    mimalloc
    This is the simplest malloc replacement that avoids the problems mentioned above using a single lib file.

    Threading Building Blocks. (This includes TBBMalloc)
    This is a suite of libraries that can be used to generate highly performant code on Intel processors. It includes an allocator that avoids the problems mentioned above that works with all processors. However, it does require using a dll (for non-malloc reasons), which adds a small amount of complexity to any build processes or source control you have.

    In the old days all memory management was handled by the programmer which led to less safe code, but the programmer knew exactly what was going on. With the advent of STL, the vast majority of allocations are handled magically in libraries so the programmer has no direct knowledge of the dragons he is unleashing. This is more problematic as many younger programmers seem to regard memory allocations as almost free - which is the opposite of the case.

  • Fix warnings.

    Always try to use the highest warning setting, and fix all warnings. Sometimes this is impossible, so the correct approach is to disable the warning with a #pragma, and add a comment that states what the warning is, why it had to be disabled, and what version of the tools it applies to. If it is specific to a file, just disable for that file.
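
    A sketch of what a documented, narrowly scoped suppression could look like, assuming MSVC and its C4996 warning for 'getenv'. ReadEnvironment is a hypothetical helper, and the reason and tool version are left as placeholders to fill in for the real case.

```cpp
#include <cstdlib>
#include <string>

#if defined( _MSC_VER )
// Warning: C4996 - 'getenv' is reported as unsafe.
// Reason:  (state here why the call cannot be replaced)
// Tools:   (state here the compiler version this was last checked against)
#pragma warning( push )
#pragma warning( disable : 4996 )
#endif

// Hypothetical helper: reads an environment variable, returning an empty
// string if it is not set.
std::string ReadEnvironment( const char* Name )
{
    std::string Result;
    if( Name == nullptr )
    {
        return Result;
    }

    const char* Value = std::getenv( Name );
    if( Value != nullptr )
    {
        Result = Value;
    }

    return Result;
}

#if defined( _MSC_VER )
// Restore the previous warning state so the suppression does not leak
#pragma warning( pop )
#endif
```

    The push and pop pair keeps the suppression to the one function that needs it, and the comment gives whoever revisits the list enough context to decide whether it still applies.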

    It is good practice to revisit the list of disabled warnings periodically or with each new toolchain release. Sometimes a warning that made sense a couple of years ago no longer applies, or was hiding a bug that someone spent a couple of days tracking down.

    If multiple tool chains are used (such as MSVC and clang), make sure all the warnings match in both. As most programmers will only develop in one or the other, they will only find out about a warning they checked in when a co-worker or a build system sends them an email. This is a waste of time for all concerned.

    The latest Visual Studio has the ability to run code analysis on managed code, and this is very useful to make sure all the common faults are dealt with (the proper use of Dispose() being a prime example). It is very fussy, so it's best to start with complete code analysis, and then enable or disable each warning on its merits.

  • Always use strong versioning for binaries.

    This can take many forms, but use one version throughout the entire system. Propagating the changelist number the binary was built from is a good example. Encoding this as part of the four digit version number is another way of making this easy to find. Having a small text file as part of all builds is very useful for QA to quickly locate build versions without having to run the game (which can sometimes be a long process).

    A stale version number is much better than no version number. Programmers who sync directly to source control may get the version of the previous build, but at least they know which previous build, and can use that information for diffing purposes.

    There is a trick in C# where you can define all the common version attributes in a common AssemblyInfo source file, then add this to each of your projects as a link. The common file has the company and copyright info, as well as a single version number to be used by all your projects in the solution. This can be updated by your build process. The project specific AssemblyInfo just contains the project specific information, such as the title and description.

  • Use UTF8 for text files - and understand the format differences.

    There are many different text formats; I define the common ones below (these are not strict definitions per se, but sufficient for this document).

    • ASCII (1 byte per character. Character codes 32 to 127)
    • ANSI (1 byte per character. ASCII and a code page of codes from 128 to 255)
    • UTF8 (a superset of ASCII, also known as multibyte character encoding, but different from Microsoft's MBCS.)
    • UTF16 (typically 2 bytes per character from the Basic Multilingual Plane, but can also be 4 bytes per character)
    • UTF32 (4 bytes per character)

    ASCII cannot be used for any accented characters or Asian glyphs used in foreign languages as there is no way to represent them.

    ANSI uses different code pages to add the accented characters, which varies depending on which accented characters are needed. This means character code x will represent something different depending on the code page selected, and make international development troublesome. It cannot be used to represent Asian glyphs as there are too many.

    Typically, two bytes per character is sufficient for the vast majority of practical purposes, and UTF16 represents this adequately. However, UTF16 is almost never parsed correctly to properly handle 4 byte characters (which is part of the spec). It is also wasteful on memory as coding languages are based in English, and the format is difficult to automatically detect if there is no byte order mark (BOM).

    UTF-32 takes up twice the memory of UTF-16 for minimal benefit.

    ASCII is the only absolutely safe option to work in all cases, however, if non-ASCII characters are required, UTF-8 is the best character encoding scheme.

    UTF-8 is a superset of ASCII, so if no multibyte characters are used, it takes up no more memory. As it is parsed sequentially, there are no problems caused by little and big endian architectures. It supports every character (all 1,112,064 of them) with up to four bytes. Here is a much more thorough overview on Wikipedia: UTF-8
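
    To make the variable length concrete, here is a minimal sketch of encoding a single code point as UTF-8. EncodeUtf8 is a hypothetical helper, not a library function; it returns an empty string for values that are not valid (surrogates, or anything above 0x10FFFF).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one Unicode code point as 1 to 4 UTF-8 bytes.
std::string EncodeUtf8( std::uint32_t CodePoint )
{
    std::string Result;

    const bool bSurrogate = ( CodePoint >= 0xD800 && CodePoint <= 0xDFFF );
    if( bSurrogate || CodePoint > 0x10FFFF )
    {
        return Result;
    }

    if( CodePoint <= 0x7F )
    {
        // Plain ASCII is stored unchanged in a single byte
        Result += static_cast<char>( CodePoint );
    }
    else if( CodePoint <= 0x7FF )
    {
        Result += static_cast<char>( 0xC0 | ( CodePoint >> 6 ) );
        Result += static_cast<char>( 0x80 | ( CodePoint & 0x3F ) );
    }
    else if( CodePoint <= 0xFFFF )
    {
        Result += static_cast<char>( 0xE0 | ( CodePoint >> 12 ) );
        Result += static_cast<char>( 0x80 | ( ( CodePoint >> 6 ) & 0x3F ) );
        Result += static_cast<char>( 0x80 | ( CodePoint & 0x3F ) );
    }
    else
    {
        Result += static_cast<char>( 0xF0 | ( CodePoint >> 18 ) );
        Result += static_cast<char>( 0x80 | ( ( CodePoint >> 12 ) & 0x3F ) );
        Result += static_cast<char>( 0x80 | ( ( CodePoint >> 6 ) & 0x3F ) );
        Result += static_cast<char>( 0x80 | ( CodePoint & 0x3F ) );
    }

    return Result;
}
```

    The output is a plain sequence of bytes written in a fixed order, which is why the encoding is unaffected by the endianness of the machine.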

    As memory bandwidth is very much a performance limiting factor in modern machines, efficiently storing strings can be helpful.

    Apparently, I'm not alone in coming to this conclusion; UTF-8 Everywhere goes into even more detail as to the reasons why. It also details best practices for writing code that will help push the C++ and STL specifications in the right direction.

  • Encryption types.

    Use AES (a.k.a. Rijndael) for symmetrical encryption to secure the privacy of data - it is a performant standard used world wide. It is the successor to DES.

    For checksumming, use SHA256 or SHA512. SHA-1 and MD5 have known weaknesses that could be exploited, and are being deprecated by Microsoft.

    The 7-Zip library has excellent source code that allows you to write your own implementations.

    Microsoft go into this in more detail here.

    With the advent of quantum computing, the above encryption and checksumming may become worthless. Keep an eye out for quantum computing safe versions.

  • Source control.

    All source should be revision controlled, and there are several systems out there to handle this. If you store binaries or assets in the system, the only practical solution is to use Perforce. This is free for up to twenty users, and has excellent plug-in support, community forums, and tools.

    If the source control system does not store large binary assets, then other solutions such as Team Foundation Server, Git or Mercurial could be used. I've always used the source control system to store, backup, and deliver binary assets, so my experience with these is limited.

  • Be descriptive in error and warning messages.

    Mention the name of the field, class, error code or asset that failed and the reason why. For example, this is a bad error message: 'Failed to load movie', whereas this is a better one: 'Movie Subsystem: Failed to load movie "LoadingScreen" with error code 3 - the asset could not be found. Program will continue, but movie will not be shown.'

    Also, most users of the program will not know if an error is fatal or not. If a user sees an error message, he will wonder if that is related to any problems he is having.

    A prime example of an atrocious error message is 'Permission denied' on the Mac - this could mean literally anything. Mentioning the permission that failed on which file would make this far more useful.

  • Use a standard format for data in text files.

    When storing information in a text file, use a format that is well defined and supported; Xml and Json fit both categories well. Criteria to look for are:

    • Can the format be validated without knowing the original class structure?
    • Are there third party libraries that can trivially serialize this format?
    • Does the format support substructures?
    • Does the format support containers such as lists and dictionaries?
    • Will the format handle and preserve escape sequences properly?
    • If there is a syntax error in the data, will the parser recover automatically?
    • Can the file be resaved without changing the data?

    An example of a bad format is the 'ini' file format; it's reasonable to use it for trivial key equals value style data, but does not handle structures or containers very well at all. It is also very bad at handling escaped strings which can be lost in a resave.

  • Store times as UTC.

    Store times as UTC (also known as GMT or Zulu time), and convert to and from the local time when displaying to the user. This avoids potential issues with calculating the time difference between a user and a server in a different time zone. With the changes to the clock in spring and autumn, this can be a non-trivial calculation to do properly.

    Use a standard date format. I prefer the Perforce standard of most to least significant e.g. the first of February 2015 at four thirty eight pm would be written as '2015/02/01 16:38:00'. This approach allows strings of this ilk to be easily sorted by time. However, databases sometimes require the US standard, which would write the same time as '02/01/2015 16:38:00'. This can be problematic as a client computer outside the US would treat that date as the second of January. The conclusion here is to always specify the format string when making strings for a date, and do not rely on the system to do it as the locale on the local system could generate unexpected results.
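
    A minimal sketch of formatting a UTC time with an explicit format string, assuming the standard std::gmtime and std::strftime functions. FormatUtc is a hypothetical helper; note that std::gmtime returns a pointer to shared storage, so a thread safe alternative would be needed in multithreaded code.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: formats a Unix time as 'YYYY/MM/DD HH:MM:SS' in UTC.
std::string FormatUtc( std::time_t Time )
{
    std::string Result;

    const std::tm* Utc = std::gmtime( &Time );
    if( Utc == nullptr )
    {
        return Result;
    }

    // The format is spelled out explicitly rather than taken from the system locale
    char Buffer[32];
    const std::size_t Length = std::strftime( Buffer, sizeof( Buffer ), "%Y/%m/%d %H:%M:%S", Utc );
    Result.assign( Buffer, Length );

    return Result;
}
```

    For example, FormatUtc( 1422808680 ) returns '2015/02/01 16:38:00', and the resulting strings sort chronologically with a plain string comparison.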

    The two standard ways of storing a generic time are:

    • A 32 bit signed integer of the number of seconds since midnight 1st January 1970 - this is called Unix time, POSIX time, or Epoch Time. This format can not represent any time past 03:14:07 on Tuesday, 19 January 2038.
    • A 64 bit value representing the number of 100 nanosecond ticks since midnight 1st January 0001 C.E. This is designed to represent values ranging from midnight 1st January 0001 C.E. through 11:59:59 P.M. 31st December 9999 C.E. in the Gregorian calendar.
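
    The two representations differ only by a fixed offset and a scale, as this sketch shows. The function and constant names are hypothetical; the offset is the number of seconds between midnight 1st January 0001 and midnight 1st January 1970.

```cpp
#include <cstdint>

// 100 nanosecond ticks in one second
constexpr std::int64_t TicksPerSecond = 10000000;

// Seconds from midnight 1st January 0001 to midnight 1st January 1970
constexpr std::int64_t SecondsFromYearOneToUnixEpoch = 62135596800;

// Converts seconds since the Unix epoch to 100 nanosecond ticks since year one.
constexpr std::int64_t UnixSecondsToTicks( std::int64_t UnixSeconds )
{
    return ( UnixSeconds + SecondsFromYearOneToUnixEpoch ) * TicksPerSecond;
}
```

    Taking the Unix time as a 64 bit value here also sidesteps the 2038 limit of the 32 bit form.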

  • Use macros in project definitions.

    Maintaining C++ project files can be very tedious. To minimize this, use macros to define file names and folders as much as possible to allow setting of multiple configurations at once.

    Setting the OutputDir to '$(SolutionDir)$(Platform)\$(Configuration)\' will mean that new platforms can be added and a standard folder structure will be maintained.

  • Use a standard default folder.

    Generally, all applications will load files using a path of some sort. There can be confusion as to where the path is relative to; for example, relative to the current working directory? Relative to the location of the running executable? The convention of relative to the running executable is a good one, but I prefer relative to the root of the project. This would be defined as the lowest folder where all files can be referenced without a parent ('..') reference. For Unreal Engine 3, this would be the base of the branch e.g. 'UE3'. For Unreal Engine 4, this would be the root of the project e.g. 'UE4' and 'UE4\Samples\Games\'. This is so that paths can be easily constructed programmatically (Branch/game/asset path) without having to collapse any relative paths. The path 'Game/Content/asset.dat' is the same as 'Game/Content/../Content/asset.dat', but the string is not, which is added complexity for no reason. Also, when constructing command lines there is never any need to guess the number of '../'s required.
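
    To illustrate the extra work a parent reference creates, here is a sketch that collapses a path so two spellings of the same file compare equal. It assumes C++17's std::filesystem; NormalizePath is a hypothetical helper.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: collapses '.' and '..' elements purely lexically
// (without touching the disk) and returns the path with '/' separators.
std::string NormalizePath( const std::string& Path )
{
    return std::filesystem::path( Path ).lexically_normal().generic_string();
}
```

    With this, NormalizePath( "Game/Content/../Content/asset.dat" ) yields 'Game/Content/asset.dat'. Paths built from the project root never need this step in the first place.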

  • Use '/' (as opposed to '\') in paths and use consistent casing for file names.

    With some file systems being partially case sensitive, use the same slashing and casing for all includes and source files. As the vast majority of compilation will be done on case insensitive Windows systems, the compile tool should check for this if at all possible.

    In C#, the 'FileInfo' class can be used to standardize the file name (and also collapse any relative pathing). Use the 'FileInfo' class over the 'File' class as it is significantly quicker and more usable.

  • Always use an appropriate extension when naming files.

    As most programs use the extension as the initial filter as to which files they can load, using a sensible extension helps users decide what can and can not be loaded. For example, '.cs' refers to C# files which are known to be some form of text. The extension '.bin' is a little too generic, but at least signals that the file needs a custom tool to load it properly. Files with no extension are the most annoying as they can't be associated with an application by default; double clicking on the file 'makefile' will always ask for the executable to load it even though it's been opened up numerous times by notepad.

  • Unit Tests.

    At Primal Space Systems, we use the unit testing features of Visual Studio to run a plethora of regression tests before each checkin. This has proven invaluable, and I would recommend checking out the system. It's another project that is interrogated on link for tests to run; and these can be categorized with several different criteria. It does not require setting up Team Foundation Server.

Deprecated

These are some old conventions and styles I used to use, but no longer do.

  • Parentheses around return variables.

    I used to put parentheses around return parameters to be consistent with other C++ keywords (if, switch, etc.) but this standard wasn't used at Epic so I trained myself out of it.

    
        return( true );
                

  • void in function declarations.

    I used to put void when no parameters were passed in function declarations, e.g.

    
        void Function( void );
                

    This made it easier to differentiate declarations vs. function calls in the find in files results. It was also required in QuakeC. However, C# doesn't allow this, so I no longer do this in C++ either.

Formatting tools

The rules for formatting code can be set up via the Tools Menu -> Options Dialog -> Text Editor -> C# or C++ -> Formatting in Visual Studio.

EditorConfig is an attempt to standardize code formatting options. Resharper C++ also has some very useful formatting extensions. I personally think this is the way forward.

Further reading

How To Write Unmaintainable Code is a great article on how to obfuscate code. It mentions many items that are the exact opposite of what is written above, and can be used as a guide as to what not to do.

The Codeless Code comic strip is an entertaining take on software development. Their take on threading (here) is a fine example.

An interesting debate on the use of the auto keyword here.

C# is a continually evolving language, and the latest features can be seen here.

Someone else's coding standard. We agree on some items, disagree on others - I Like Whitespace.

Another coding standard. They prioritize some other aspects - C++ Programming Style Guidelines. Again, these are all good points, I just don't agree with all of them.

The Google coding standard - Google C++ Style Guide. I do not agree with many of these points.