Examples of How FDA Has Used Whole Genome Sequencing of Foodborne Pathogens For Regulatory Purposes

Genomic data from foodborne pathogens, by itself and in combination with other information, is a robust resource that can help public health officials identify and understand the source of foodborne illness outbreaks. It can be used: to determine which illnesses are part of an outbreak and which are not; to determine which ingredient in a multi-ingredient food is responsible for an outbreak; to identify geographic regions from which a contaminated ingredient may have originated; to differentiate sources of contamination, even within the same outbreak; to link illnesses to a processing facility even before the food product vector has been identified; to link small numbers of illnesses that otherwise might not have been identified as a common outbreak; and to identify unlikely routes of contamination.

Below are some examples of how FDA has used whole genome sequencing of foodborne pathogens for regulatory purposes.

To differentiate sources of contamination, even within the same outbreak

In 2010 a nationwide salmonellosis outbreak, resulting in over 1,900 illnesses, was attributed to Salmonella Enteritidis (SE) in eggs. Whole genome sequencing played an important role in: 1) determining the size of the outbreak; 2) confirming the sources of the eggs implicated in the outbreak; and 3) revealing which sources were responsible for which illnesses. Here’s how.

In July 2010 CDC identified a nationwide sustained increase in the number of reported salmonellosis illnesses that shared a common PFGE subtyping pattern known as pattern 4 (JEGX01.004). However, because PFGE subtyping is unable to distinguish the many strains of Salmonella Enteritidis that share PFGE pattern 4 (40% of SE share this pattern), public health officials were unable to determine if the increase in illnesses was due to a single outbreak or multiple outbreaks. Using traditional epidemiology methods public health officials determined that shell eggs were a likely source of many of the illnesses. A traceback investigation to determine a common source for the eggs in question pointed to two egg producers in the Midwest United States. A number of environmental samples collected from the two egg producers tested positive for SE and shared a PFGE pattern that was indistinguishable from the clinical samples collected from the individuals who became ill. This was a strong indicator that there was a link between the farms and the illnesses, but with PFGE pattern 4 containing many different strains of SE it wasn’t possible to know for sure if the SE found on the egg farms was the same strain responsible for the illnesses. All that was known was that they shared a common PFGE pattern that represented many strains of SE.

Enter whole genome sequencing. When whole genome sequencing was performed on clinical samples and environmental samples, a much clearer picture emerged. The genomic sequences for the SE found at the two egg production sites were very closely related but distinguishable. The sequences from both sites were also very closely related to the sequences from the clinical samples, so closely related that both were deemed to match the outbreak. Thus, with the increased subtyping resolution that whole genome sequencing provided, public health officials were able to determine which illnesses were part of the outbreak. And because there was a slight difference in the genomic information from the two egg production sites, investigators could further identify the specific egg producer to which an individual illness was linked.

Read FDA’s article in PLOS about Salmonella Enteritidis found in eggs

To determine which ingredient in a multi-ingredient food harbored the pathogen associated with an illness outbreak

In 2009-2010, nearly 300 people in 44 states and the District of Columbia became ill from ingesting the Salmonella Montevideo pathogen. Standard epidemiology tools, such as food consumption questionnaires, suggested spiced salami was the culprit. However, conventional lab methods could not differentiate between the Salmonella Montevideo found in spiced salami produced at a New England processing facility during this outbreak and the Salmonella Montevideo that was isolated from pistachios held in the same facility during an outbreak some years earlier. Which food had really caused this outbreak? Was it the salami, the red and black pepper rub used as a spice coating on the salami, or was it tied back to the pistachios that had been the problem in the past?

To find out, scientists compared the genomic sequences of the Salmonella Montevideo collected from patients who became ill after eating the spiced meat product to the genomic sequences of the Salmonella Montevideo isolated 1) from the finished product, 2) from the red and black pepper rub, 3) from an environmental sample from the processing facility, and 4) from the pistachios collected years earlier. The genomic information revealed that the illnesses were linked to the red and black pepper rub used as the spice coating at the New England facility.
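The comparison described above can be illustrated with a minimal Python sketch. It assumes the isolates have already been aligned to equal-length sequences and simply counts the positions at which they differ, then reports the candidate source closest to a clinical isolate. The function names and the short sequences are invented for illustration; they are not real outbreak data, and this is not a description of the analysis pipeline FDA actually used.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def closest_source(clinical: str, sources: dict[str, str]) -> tuple[str, int]:
    """Return the (name, distance) of the source nearest to the clinical isolate."""
    distances = {name: snp_distance(clinical, seq) for name, seq in sources.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Toy sequences, made up for this example (hypothetical, not real genomes).
clinical_isolate = "ACGTACGTACGT"
candidate_sources = {
    "finished_product": "ACGTACGTACGA",
    "pepper_rub": "ACGTACGTACGT",
    "environmental_sample": "ACGTTCGTACGA",
    "pistachios_earlier": "TCGTTCGAACGA",
}

name, distance = closest_source(clinical_isolate, candidate_sources)
print(f"closest source: {name} ({distance} differences)")
```

Real investigations compare millions of base pairs and interpret the differences alongside epidemiologic evidence; the sketch only shows why a higher-resolution comparison can separate sources that conventional methods cannot.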

Read FDA’s article in The New England Journal of Medicine (March 2011) about how whole genome sequencing helped resolve this salmonellosis outbreak

To narrow the search for the source of a contaminated ingredient, even when the source is halfway around the world

In 2012, 425 individuals in the U.S. became sick from ingesting food that contained either Salmonella Bareilly or Salmonella Nchanga. Through traditional epidemiology methods the illnesses were ultimately linked to a frozen raw yellowfin tuna product known as Nakaochi Scrape, which had been imported from India. (Nakaochi Scrape is tuna backmeat that is scraped from the bones of tuna and may be used in sushi, sashimi, ceviche, and other similar dishes.)

As part of the outbreak investigation FDA performed whole genome sequencing on Salmonella isolated from product samples and from clinical samples to determine their DNA makeup. This helped to more accurately determine which illnesses were part of the outbreak and which illnesses were similar but unrelated.

However, FDA did something else. It performed whole genome sequencing on about a dozen Salmonella Bareilly isolates it had in its freezers from previous Salmonella Bareilly food contamination events. What FDA found was that the Salmonella Bareilly DNA for the samples tied to the 2012 outbreak was very similar to the Salmonella Bareilly DNA isolated from shrimp that came from a processing plant in southwest India several years earlier. In fact, the plant that processed the Nakaochi Scrape was only about 5 miles away from the plant that processed the shrimp. This was a significant finding because it indicated that pairing genomic information with geographic information might be a powerful tool for traceback investigations. If investigators could associate a pathogen’s genomic information with a certain geographic area, they could focus their resources on ingredients originating from the area where the particular pathogen strain responsible for an illness has historically been known to be present. It was this event that provided the impetus for creating the GenomeTrakr Network and for the increased use of genomic information in foodborne outbreak investigations.

As a clue to the possible source of illnesses -- even before a food has been associated with illnesses by traditional epidemiological methods

In 2014, FDA conducted environmental sampling at almond and peanut butter facilities as part of an assignment designed to gather baseline data on the presence of foodborne pathogens in nut butter processing facilities. Samples from one of the facilities tested positive for Salmonella Braenderup. PFGE analysis was performed, and the PFGE patterns were found to be indistinguishable from the PFGE pattern of a small number of clinical isolates collected from individuals who had become ill from salmonellosis in the previous months, but for which no common link between the illnesses had been established. However, PFGE analysis is limited in its ability to differentiate strains of Salmonella, so whole genome sequencing was performed on both the isolates from the environmental samples and the isolates from the clinical samples to determine their relatedness. The pathogens from the samples were an extremely close match, differing only by the amount (2 base pairs) that they would be expected to diverge through pathogen replication during the time they were in a lab. The isolates were, in essence, the same.

In this instance whole genome sequencing achieved multiple foodborne illness investigation tasks.

1) It confirmed that the clinical isolates were related to each other and that the individuals who became sick represented a cluster of illnesses caused by the same pathogenic organism. Whole genome sequencing was later used both to include additional illnesses with the cluster and to exclude others, based on the pathogen strain’s DNA relatedness.

2) Whole genome sequencing showed that the strain of Salmonella Braenderup that caused the illnesses was virtually identical to the strain of Salmonella Braenderup isolated from the processing facility. Thus a connection between the individuals who became ill and a processing plant was confirmed, even though to that point in time traditional epidemiology methods had not revealed a common food consumed by the individuals who had become ill. Subsequent epidemiologic analysis also pointed to the relationship between the illnesses and the pathogen found in the processing plant. This is significant because the high degree of certainty in determining the relatedness of pathogens may provide important traceback investigation clues, even in advance of epidemiologic evidence.

By using whole genome sequencing to definitively identify pathogens, investigators may in the future be able to determine the causes of sporadic foodborne illnesses.
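The idea of including or excluding illnesses from a cluster based on DNA relatedness can be sketched in Python as follows. The sketch assumes aligned, equal-length sequences and a simple cutoff on the number of differing positions. The cutoff value, the function names, and the toy sequences are all hypothetical choices made for illustration; they are not an FDA criterion and not real isolate data.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def isolates_within(reference: str, isolates: dict[str, str], max_snps: int) -> list[str]:
    """List isolates whose distance to the reference isolate is at most max_snps."""
    ref_seq = isolates[reference]
    return sorted(
        name
        for name, seq in isolates.items()
        if name != reference and snp_distance(ref_seq, seq) <= max_snps
    )


# Toy sequences, made up for this example (hypothetical, not real genomes).
isolates = {
    "facility_swab": "AACCGGTTAACC",
    "patient_1": "AACCGGTTAACC",
    "patient_2": "AACCGGTTATCT",
    "patient_3": "TACCGCTTGACA",
}

# Illustrative cutoff only; an assumption, not a regulatory threshold.
MAX_SNPS = 2

related = isolates_within("facility_swab", isolates, MAX_SNPS)
print("isolates grouped with the facility sample:", related)
```

In this toy data set, the two patient isolates that differ from the facility sample by at most two positions are grouped with it, while the more distant isolate is excluded. In practice, relatedness is judged on whole genomes and weighed together with epidemiologic and traceback evidence rather than by a single fixed number.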

To determine unexpected vectors for food contamination

In 2008, a dry cereal manufacturer in the Midwest United States was linked to a foodborne illness outbreak that resulted in 33 consumers becoming sick. When investigators tested samples of the cereal, they found that it contained Salmonella. When the samples were subtyped using PFGE analysis, three different PFGE patterns were revealed, suggesting three different sources of contamination. However, additional analysis using whole genome sequencing revealed that the samples shared a recent common lineage and were in fact the same strain of Salmonella Agona, which suggested a common source. Investigators then found the same strain of Salmonella Agona in an environmental sample collected from the cereal processing plant. It was in the processing plant and in the cereal.

The next step was to try to determine how the contamination took place. As it turns out, CDC had a clinical isolate of Salmonella Agona that was collected in 1998, when a salmonellosis outbreak linked to the same cereal manufacturer occurred. When that clinical isolate was sequenced, it was revealed to be virtually identical to the strain responsible for causing the illnesses 10 years later. But how could a strain match so closely after 10 years? Irrespective of whether it was reintroduced from an outside source or had been present in the processing environment for 10 years, one would expect greater genetic diversity to have arisen over the span of a decade. Investigators think they know how.

The 1998 outbreak was linked to contaminated water used in the cereal processing plant. That same year a renovation was performed at the processing plant, and the same contaminated water was used to mix the mortar used for the renovation. The Salmonella Agona that was in the water then lay dormant in the mortar. In 2008 another renovation was performed at the processing plant, and some of the mortar from the 1998 renovation was disrupted, causing the dormant Salmonella Agona bacteria to be released into the environment and to emerge from their dormant state. The bacteria began to multiply and colonize the processing plant, making their way into the cereal product and causing illnesses. It was the ability of whole genome sequencing to reveal how closely the 1998 and 2008 Salmonella Agona isolates matched genetically that led to uncovering this atypical vector of contamination. Without the detailed subtyping resolution offered by whole genome sequencing, this connection might not have been made.
