Posted by JJ The Psychotherapist on June 23, 2021 at 2:10 PM
According to a preprint study, Chinese researchers appear to have deleted data that could shed light on the origins of the COVID-19 pandemic from a global database run by the National Institutes of Health (NIH).
The data was recovered from cloud storage by an American scientist, who published his findings on Tuesday. His paper, titled "Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic," suggests that early virus samples from the Wuhan seafood market, which until now have been the focus of most studies on the pandemic's origins, "are not fully representative of the viruses actually present in Wuhan at that time."
The paper has not yet been peer-reviewed, so its findings should not be taken as definitive. According to scientists who have read the paper, the recovered virus samples do not support either the "lab leak" hypothesis or the "natural origins" hypothesis for the emergence of SARS-CoV-2. However, these researchers believe the virus was spreading in Wuhan earlier than the Chinese government claimed, and the paper's author, Dr. Jesse Bloom, believes his findings should raise doubts about China's willingness to share all relevant COVID-19 data.
Bloom, an influenza virus expert at the Fred Hutchinson Cancer Research Center, also believes his research should give scientists hope that they can learn more about the early spread of SARS-CoV-2 without an international investigation.
In the course of his SARS-CoV-2 research, Bloom read a paper that analyzed data from a Wuhan University project that sequenced samples from 45 positive coronavirus cases between January and early February 2020. The Chinese study, which was peer-reviewed and published in June 2020, developed an improved technique for testing for and diagnosing COVID-19.
The Chinese researchers' SARS-CoV-2 sequences were uploaded to the NIH's Sequence Read Archive (SRA), a database that stores raw data from high-throughput ("deep") sequencing experiments, from which scientists can reconstruct the genetic sequences of viruses. These sequences help scientists understand how a virus emerged and evolved over time, and such research could yield information that helps prevent the next pandemic.
When Bloom went to the SRA to look at the Chinese sequences, however, he discovered that the data had been erased. The SRA "is designed as a permanent archive of deep sequencing data," he wrote in his paper. Data can be deleted only if the original researchers send an email requesting the deletion and explaining why, and SRA staff approve the request.
The National Institutes of Health "reviewed the submitting investigator's request to withdraw the data" in June 2020, according to an NIH spokesperson, and then removed it.
"The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues," the spokesperson explained. "Submitting investigators hold the rights to their data and can request withdrawal of the data."
Bloom attempted to contact the Wuhan University researchers to ask about their request to delete the data, but received no response. "There is no plausible scientific reason for the deletion," he wrote in his paper. "It therefore seems likely the sequences were deleted to obscure their existence."
He recovered some of the deleted data from Google Cloud storage, obtaining data from 34 early COVID-19-positive samples; from 13 of these, he reconstructed partial viral sequences.
Bloom explained why these sequences are important for understanding the virus's origins in a Twitter thread about his paper.
"Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats," Bloom said.
"As a result, we expect the first #SARSCoV2 sequences to be more similar to bat coronaviruses, with #SARSCoV2 becoming more divergent from these ancestors as it evolved," he continued. "That, however, is not the case!"
"Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries."
These findings suggest that the first virus samples from the Huanan Seafood Market, which scientists originally suspected was the source of the outbreak, did not represent the earliest form of the virus. That would mean SARS-CoV-2 was circulating before China reported its first confirmed COVID-19 case on December 8, 2019, and did not necessarily originate at the wet market.
Professor Rasmus Nielsen, a genomics expert at the University of California, Berkeley, said the findings "are the most important data that we have received regarding the origins of Covid-19 for more than a year."
Bloom argues that his research has several significant implications.
"First, [the] fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared," he wrote on Twitter, noting that China had ordered many labs to destroy early virus samples.
"Sequence sharing could be further limited by fact that scientists in China are under an order from the State Council requiring central approval of all publications," he added.
The second major implication, according to Bloom, is that even if efforts for more on-the-ground investigations are stymied, "it may be possible to obtain additional information about early spread of #SARSCoV2 in Wuhan."
"It should be immediately possible for the NIH to determine the date and purported reason for deletion of the data set analyzed here," Bloom wrote in his paper, "because the only way sequences can be deleted from the SRA is by e-mail request to SRA staff." He also suggested that SRA email records be examined to see if any more requests to remove early SARS-CoV-2 sequences from the database had been made.
"Importantly, SRA deletions do not imply any malfeasance: there are legitimate reasons for removing sequencing runs, and the SRA houses >13-million runs making it infeasible for its staff to validate the rationale for all requests," Bloom explained. "However, the current study suggests that, at least in one case, science's trusting structures were abused to obfuscate sequences relevant to the early spread of SARS-CoV-2 in Wuhan."
"A careful re-evaluation of other archived forms of scientific communication, reporting, and data could shed additional light on the early emergence of the virus," he added.