A recent study published in the scientific journal Genome Biology describes a concerning error rate in genetic studies caused by default settings in the commonly used spreadsheet software Microsoft Excel. The default cell formatting option in Excel will inadvertently convert gene symbols to dates and floating-point numbers. Examples include:
- gene symbol SEPT2 (Septin 2) is converted by default to the date ‘2-Sep’ or ‘2006/09/02’
- gene symbol MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] is converted by default to the date ‘1-Mar’
- RIKEN identifiers were automatically converted to floating point numbers, such as accession ‘2310009E13’ to ‘2.31E+13’
Researchers Mark Ziemann, Yotam Eren and Assam El-Osta reviewed the supplementary files of genetics studies from 18 journals published between 2005 and 2015, finding that approximately 20% contained gene lists with such inappropriate gene name conversions. This occurs at increasing rates despite the fact that the problem was originally publicly reported in 2004. In other words, the scientific community has failed to learn from its mistakes in this regard. Linear-regression estimates show gene name errors in supplementary files have increased at an annual rate of 15 % over the past five years, growing much faster than the 3.8% increase in published papers per year. This comparative growth may occur because such lists are frequently re-used.
The solution requires foresight in changing the default options on each newly created spreadsheet rather than retroactive adjustment. Neither Excel, nor other spreadsheet software such as LibreOffice Calc or Apache OpenOffice Calc, allow one to reset the default to eliminate conversion. Once converted, a subsequent formatting change will not return the entry to its original text, as the underlying stored data has been permanently altered. Instead one can only troubleshoot by sorting the data, thereby bringing the re-formatted dates to the top of the column for re-entry. Surprisingly, the only spreadsheet software that does not suffer from this issue is Google Sheets, a free program.