Quite a few news reports have revealed that many software attacks were caused by passwords committed to SCM (source code management) systems such as Git and SVN. If you search GitHub, you can still find commits that pushed plain-text passwords to the public site. So how should we store an application's sensitive data, for example a configuration file that contains credentials such as database connection information, tokens issued by third parties, and email addresses?
Methods of storing configuration data
We need those credentials sooner or later, at some point. Before working out how to keep them from leaking, let's list where credentials are normally stored:
- a text file, in plain text or encrypted.
- external storage, such as a web service or a database.
- environment variables provided by the hosting environment.
- command-line parameters.
- an interactive prompt that asks for the password when needed.
Each of these methods has pros and cons. Let's discuss them one by one. First of all, do NOT save credentials directly in the source code.
The easiest way to store configuration data is to save it in a text file in one of various formats, for example XML, JSON, TOML, or INI, to name a few. You have to exclude these configuration files from version control so that they are not committed accidentally. For Git, you can add these files to the ignore list.
In practice, you commit a sample file to the SCM as documentation of the configuration format, and the application uses the real file when it looks for its configuration. For example:
{
  "host": "ipv4 address",
  "port": "ranged from 4000-5000"
}
Name the file config.json.sample and commit it to the SCM. At runtime, the application gets the real configuration values by reading the real, uncommitted file.
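A minimal sketch of this pattern in Python. It assumes the real file is named config.json, which is a hypothetical name since only the sample file is named above, and load_config is an illustrative helper rather than part of any library:

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Load settings from the real, uncommitted configuration file.

    "config.json" is an assumed name; only config.json.sample is committed.
    """
    config_file = Path(path)
    if not config_file.exists():
        # Fail with a hint instead of silently falling back to the sample,
        # which only holds placeholder values.
        raise FileNotFoundError(
            f"{path} not found; copy config.json.sample to {path} "
            "and fill in the real values"
        )
    with config_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Failing early when the real file is missing keeps the placeholder values in the sample from ever being used by accident.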
An encrypted file is not a good option, because it does not address the issue: you still have to decrypt it somewhere. That leads back to the root question of where to put the keys used by the decryption process.
If you have a large number of configuration values, it is practical to store them in a database and add an interface that your application can access. In essence, this is a matter of trusted dependencies: the interface (a web service or a direct database connection) is available only to you, and only specifically configured clients can access those resources.
All you need to do is set up the environment, and you have to update that setup when the environment changes. No source code changes are involved; from the program's point of view, the values are just there. You do not need to care about them; leave that to the infrastructure or DevOps team.
It is common to find source code that reads credential information from environment variables. The usual pattern is to read these configuration values during application startup and terminate the application if any of them is missing.
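The startup check could be sketched as follows. The variable names DB_HOST and DB_PASSWORD and the helper names are hypothetical, chosen only for illustration:

```python
import os
import sys

# Hypothetical variable names; use whatever your application needs.
REQUIRED_VARS = ("DB_HOST", "DB_PASSWORD")


class MissingConfigError(Exception):
    """Raised when a required environment variable is absent or empty."""


def read_required_env(names=REQUIRED_VARS, environ=None):
    """Return the required settings, or raise if any of them is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingConfigError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}


def main():
    try:
        settings = read_required_env()
    except MissingConfigError as error:
        # Terminate during startup rather than fail later mid-request.
        sys.exit(f"startup aborted: {error}")
    print("configuration loaded for host", settings["DB_HOST"])
```

Calling main() from the program's entry point stops the process with a non-zero exit status when a variable is missing. Note that the sketch prints only the host, never the password.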
Nothing changes in your source code; it just reads the environment variables, and the code that reads them is safe to commit. The problem then becomes how to initialize the environment variables.
In my experience, if you manage processes with supervisord, for example, you can set the environment variables in supervisord.conf so that they are available throughout the application's lifecycle.
Command-line parameters
If there are only a few values, such as a username and a password, you can pass them as command-line parameters. A caution: these values may be stored in the shell history, and other users can even inspect them through /proc or a tool like Process Explorer on Windows.
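A small sketch of accepting credentials as parameters, using Python's standard argparse module. The option names --username and --password are assumptions made for this example:

```python
import argparse


def parse_credentials(argv=None):
    """Parse credentials from the command line (or from an explicit list).

    Caution: anything passed this way can end up in the shell history and
    is visible to other users through the process list.
    """
    parser = argparse.ArgumentParser(description="Example credential options")
    parser.add_argument("--username", required=True)
    parser.add_argument("--password", required=True)
    return parser.parse_args(argv)
```

For example, parse_credentials(["--username", "alice", "--password", "pw"]) returns a namespace whose username and password attributes hold the two values. Passing None reads from the real command line.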
Asking for the password interactively
If you develop an application that interacts with the user, you can ask for the password on demand.
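One way to prompt on demand is Python's standard getpass module, which reads the password without echoing it to the terminal. The ask_password wrapper and its reader parameter are illustrative names, added so that the prompt can be replaced in tests:

```python
import getpass


def ask_password(prompt="Password: ", reader=getpass.getpass):
    """Ask the user for a password without echoing it to the terminal."""
    password = reader(prompt)
    if not password:
        raise ValueError("an empty password was entered")
    return password
```

Because the password is typed at the prompt, it never appears in the source code, a configuration file, or the shell history.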
In summary, never embed sensitive data directly in source code, and keep any file that contains sensitive data out of the SCM.