GitGuardian warns against shared source code
According to a GitGuardian study presented to Clusif, much of the source code shared on GitHub exposes embarrassing secrets.
Many corporate development projects build on code shared through platforms such as GitHub. The natural counterpart of drawing on this resource is, of course, to share the changes made in return. But sharing requires first “cleaning” the code of anything that constitutes a company secret. According to a study carried out by GitGuardian and presented at a Clusif conference at the end of October, that does not always happen. For the study, repositories on GitHub were analyzed, and what turned up is sometimes alarming: access codes, API keys, database credentials, private keys, certificates… According to GitGuardian, the pace at which developers are required to deliver is what leads to this negligence.
For practical reasons, these company secrets (access codes, API keys, database credentials, private keys, certificates, etc.) are also sometimes stored in multiple places that are not always under the company's control. They can end up on developers' personal drives or removable media (Google, Dropbox, external hard drives, etc.) or on storage spaces that are internal (SharePoint, etc.) but whose access is poorly controlled. The main problem is that product code often contains the infamous hard-coded secrets directly in its lines of code rather than in separate configuration files. Sometimes it is the configuration files themselves, properly separated, or logs or metadata that get included in the repository! Yet solutions exist to filter out these sensitive files when code is committed to the repository.
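To illustrate the alternative to hard-coding, here is a minimal Python sketch in which the code reads a secret from the process environment at run time instead of embedding it in a source line. The helper `get_secret`, the exception `MissingSecretError` and the variable name `EXAMPLE_DB_PASSWORD` are hypothetical names chosen for this example; the study does not prescribe any particular mechanism.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not set in the environment."""


def get_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Fails loudly rather than falling back to a hard-coded default,
    so that no secret value ever needs to appear in the source code.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # EXAMPLE_DB_PASSWORD is a hypothetical variable name. The dummy value
    # is set here only so the demo runs; in practice it would be provided
    # by the deployment environment, never by the code itself.
    os.environ.setdefault("EXAMPLE_DB_PASSWORD", "dummy-value-for-demo")
    password = get_secret("EXAMPLE_DB_PASSWORD")
    print(f"loaded a secret of length {len(password)}")
```

If the values are kept in a local file instead, that file would typically be listed in `.gitignore` so that Git does not include it in commits.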
France, the world's fifth-largest source of leaks
GitGuardian cited the example of Uber, taken from a Federal Trade Commission report: in 2016, Uber discovered that an AWS S3 key giving access to an Uber storage bucket had been hard-coded in a public code repository in 2014, two years earlier. Another example was the compromise of credentials from the continuous integration pipeline of the software vendor Codecov, which made it possible to target 20,000 of the vendor's customers.
For the quantitative part of the study, GitGuardian analyzed one billion repositories on GitHub (at a rate of five million per day) using 250 algorithms that search for company secrets. Between 2019 and 2020, the proportion of code containing company secrets rose by 20%. Two million secrets were identified, at a rate of 5,000 per day (that is, one in every thousand repositories analyzed). 85% of leaks occur in repositories on personal accounts and 15% on business accounts. Google keys (GCP, Maps, etc.) account for 27.6% of secrets, development tool data (APIs, etc.) for 15.9%, and access to databases or storage spaces for 15.4%. 27.9% of secrets are found in Python code and 18.8% in JavaScript. India is the leading source country for leaks and the United States is third, behind Brazil; France is fifth (behind Nigeria, ahead of Russia).
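The article does not describe how GitGuardian's 250 algorithms work. Purely as an illustration of the general idea of pattern-based secret detection, the following Python sketch flags lines that match three widely known formats: AWS access key IDs, Google API keys and PEM private key headers. The function `scan_text` and the choice of rules are assumptions made for this example, not GitGuardian's method.

```python
import re
from typing import List, Tuple

# Illustrative rules only: a real detector uses far more patterns and
# usually checks candidates further to reduce false positives.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for each line matching a rule."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS
    # documentation, not a real credential.
    sample = (
        'region = "eu-west-1"\n'
        'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
        "-----BEGIN RSA PRIVATE KEY-----\n"
    )
    for line_number, rule in scan_text(sample):
        print(f"line {line_number}: possible {rule}")
```

A check of this kind could run before code is shared, but matching on format alone misses secrets that have no recognizable shape, such as arbitrary passwords.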