It's not a date
Erin Kernohan-Berning
8/6/2025
4 min read
If you’ve ever had to use Microsoft Excel, you’ve probably entered a number into a cell and had the program change it into a date. Or you’ve tried to put a date into a cell and Excel turned it into a different date. There are ways to prevent this annoyance if you dig into Excel’s settings a little, but most people open up a piece of software expecting it to just… work.
The automatic conversion of numbers to dates in Excel has been causing problems in the field of genomics. When genes are discovered, they are given an alphanumeric symbol often referring to whatever protein the gene encodes. Genes that encode septins, for example, might be named SEPT1, SEPT2, SEPT3, and so on. That is, until they are loaded into an Excel spreadsheet where SEPT1 becomes 01-Sep, the short date format for September 1.
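To make the failure concrete, here is a minimal Python sketch of how one might flag gene symbols that are at risk, and spot values that already look converted. The list of month-like prefixes and both function names are illustrative assumptions; this is not a description of Excel's actual parsing rules.

```python
import re

# Month-like prefixes that a spreadsheet might read as the start of a date.
# This list is an illustrative assumption, not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MARCH", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEPT", "SEP", "OCT", "NOV", "DEC",
)

# A month-like prefix followed by one or two digits, e.g. SEPT1 or MARCH1.
AT_RISK = re.compile(r"^(%s)(\d{1,2})$" % "|".join(MONTH_PREFIXES), re.IGNORECASE)

# A value that looks like a short date, e.g. 01-Sep or 1-Mar.
CONVERTED = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def is_at_risk(symbol):
    """Return True if the symbol could plausibly be read as a day and month."""
    match = AT_RISK.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(2)) <= 31


def looks_converted(value):
    """Return True if the value looks like a short date rather than a gene symbol."""
    return bool(CONVERTED.match(value.strip()))


if __name__ == "__main__":
    for name in ["SEPT1", "SEPTIN1", "MARCH1", "MARCHF1", "01-Sep"]:
        print(name, is_at_risk(name), looks_converted(name))
```

A check like this only catches the obvious cases. It would not, for example, notice symbols that a spreadsheet turns into some other kind of number.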
In 2004, the National Institutes of Health raised the alarm that using Excel in bioinformatics research could result in errors because of this date conversion. In 2020, the HUGO Gene Nomenclature Committee (HGNC) introduced new naming conventions, including renaming the SEPT genes to SEPTIN, to make gene symbols more resilient to Excel’s automatic data conversion features. Since the new conventions were introduced, 27 genes have been renamed.
According to an article on Retraction Watch by Mandhri Abeysooriya and Mark Ziemann, 31% of 11,000 articles published between 2014 and 2020 had Excel-related errors in their supplemental data. The issue continues to be so prevalent that Ziemann maintains an online dashboard with monthly reports on how many Excel gene lists have been found to contain date conversion errors, as well as errors that arise when the computer is set to a non-English language.
Reading about this problem reminded me of a story I heard in university linking Microsoft PowerPoint to the Space Shuttle Columbia disaster in 2003. The Columbia Accident Investigation Board (CAIB) undertook a six-month examination of the tragedy that led to the deaths of seven astronauts and the destruction of NASA’s flagship shuttle. The cause of the accident was a large piece of foam that had broken off the external fuel tank during launch, striking the leading edge of Columbia’s left wing and damaging the thermal protection meant to shield the shuttle from the heat and pressure of re-entry.
The investigators released a 248-page report that covered everything from material failures to organizational failures that led to the accident. On page 191 of the CAIB report is a scathing analysis by Dr. Edward Tufte (Yale University) of the PowerPoint slides used by the Debris Assessment Team to communicate the potential extent of the damage to the shuttle. The CAIB singled out the reliance on PowerPoint as contributing to the miscommunication of the true risk the foam strike posed to the Columbia crew. PowerPoint obviously didn’t determine the outcome of the disaster more than the foam did, or more than the parade of decisions that allowed a fatally damaged shuttle to re-enter Earth’s atmosphere. But it certainly became a bullet point in that piece of history.
Some of the genes that have been renamed by the HGNC serve important roles in research. Errors in SEPTIN1 have been implicated in Alzheimer’s disease. MARCHF1 (formerly MARCH1) encodes a protein in the body that can also be a tumour promoter, and so is a possible therapeutic target for cancer treatment. So, Excel accidentally destroying research data about these genes is not a small thing.
In 2023, nearly 20 years after researchers discovered the gene-to-date conversion problem, and three years after genes started to be renamed because of that problem, Microsoft introduced a way to turn off Excel’s automatic date conversions. Researchers have also started to create tools to help update old gene datasets at risk of automatic conversion errors.
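As a rough idea of what updating an old gene list could involve, here is a small Python sketch that rewrites legacy symbols to the newer style. It is not how any published tool works. The rename rules are assumptions generalized from the two examples in this article (SEPT to SEPTIN, and MARCH1 to MARCHF1), and the function name is hypothetical. A real tool would rely on the HGNC’s official list of renamed genes.

```python
import re

# Rename rules generalized from the article's two examples. Treating every
# SEPT<number> and MARCH<number> symbol this way is an assumption made for
# illustration; consult the HGNC's official records for real data.
RENAMES = [
    (re.compile(r"^SEPT(\d+)$"), r"SEPTIN\1"),
    (re.compile(r"^MARCH(\d+)$"), r"MARCHF\1"),
]


def update_symbol(symbol):
    """Return the newer-style symbol if a rule applies, else the cleaned input."""
    cleaned = symbol.strip()
    for pattern, replacement in RENAMES:
        if pattern.match(cleaned):
            return pattern.sub(replacement, cleaned)
    return cleaned


if __name__ == "__main__":
    old_list = ["SEPT1", "MARCH1", "BRCA1", "SEPTIN2"]
    print([update_symbol(name) for name in old_list])
```

Note that this only helps with lists that still contain the old symbols. Once a spreadsheet has turned a symbol into a date, the original name may not be recoverable with certainty.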
One lesson we can take from this is the importance of knowing the ins and outs of the tools we are using and understanding how to use them well. This becomes even more important as tools “powered by AI” become increasingly ubiquitous. As humans, we need to be in the driver’s seat and to not just accept the default of what our software spits out at us.
Another lesson is that, in the words of Cory Doctorow, “how a system fails is every bit as important as how it works.” The consequence of bad design in PowerPoint shouldn’t be fatal, and the consequence of bad design in Excel shouldn’t be to render useless a bunch of research that could have led to better cancer treatment. As far as they can, software companies need to understand how their software is being used, and how its failures may affect their users.
Learn more
Guest post: Genomics has a spreadsheet problem 2023. Mandhri Abeysooriya and Mark Ziemann (Retraction Watch) Last accessed 2025/08/01
Scientists rename human genes to stop Microsoft Excel from misreading them as dates 2020. James Vincent (The Verge) Last accessed 2025/08/01
Gene name errors are widespread in the scientific literature 2016. Mark Ziemann, Yotam Eren and Assam El-Osta (Genome Biology) Last accessed 2025/08/01
Guidelines for Human Gene Nomenclature 2020. Elspeth A. Bruford (Nat Genet) Last accessed 2025/08/01
Columbia Accident Investigation Board Report Volume 1 2003. CAIB (Internet Archive) Last accessed 2025/08/01
Death by PowerPoint: the slide that killed seven people 2019. James Thomas (McDreeamie-Musings) Last accessed 2025/08/01
Dissecting "The Great Power Point Panic of 2003" 2023. Troy Chollar, Nolan Haims, and Sandy Johnson (The Presentation Podcast) Last accessed 2025/08/01
Gene Updater: a web tool that autocorrects and updates for Excel misidentified gene names 2022. Clara W. T. Koh, Justin S. G. Ooi, Gabrielle L. C. Joly and Kuan Rong Chan (Nature) Last accessed 2025/08/01
Microsoft Fixes Excel Feature That Forced Scientists to Rename Human Genes 2023. Dua Rashid (Gizmodo) Last accessed 2025/08/01
Better failure for social media 2022. Cory Doctorow (Pluralistic) Last accessed 2025/08/01