Pan genomics as a step in the right direction


A Micro Essay

Dr Callum Goodale-Hall 18/1/25
An aligned selection of reads performed by CACTUS

The history of genetics is paved with racism. Particularly in the 19th and 20th centuries, the study of genetics and inheritance was weaponised by white supremacists to further their ideologies and used as an excuse to persecute non-whites.

Geneticists must work to overcome the racist roots of their science and continue to be aware of the ethically abhorrent views of many of their predecessors. This is one of the areas in which the developing field of pan genomics can offer us a chance to do better science, and to follow good principles of equality and inclusion.  

The use of a reference genome has always been a sticking plaster used to mask complexity. Reductive by nature, a reference genome implies a “normal” human genome and defines any differences as deviances. Hg38 has been the standard human reference genome since 2013. Its architects made some effort to make the genome less Eurocentric by choosing an individual of mixed African and European heritage as the main scaffold of the assembly. They then chose the allele from the scaffold which matched the allele most frequently found in a small panel of other individuals. This genome is the primary assembly to which most human genomic work has been aligned for over a decade.

In recent years, an effort to create a more complete human genome has resulted in CHM13, or the telomere-to-telomere (T2T) genome. Unfortunately, this falls into the same trap, as it was created not only from a single individual, but from only one haplotype of that individual. This further limits its broader applicability.

These genomic references fail to explain the complexity of the human genome at a population level and, as a result of technological limitations, define a default normal of either complete or mixed European ancestry. While likely unintentional, this oversight still marginalises many global populations.

Having explored the historical context, we come to a modern solution. Pan genomics groups seek to create a genomic reference which contains the many complexities of human populations. They do this by defining the human genome not as a single string, but as a mathematical graph which defines genetic variations as nodes, and links them with edges. Groups working on these graphs have been able to compress up to 100 genomes together into a single graph which is able to describe many further genomes by taking a given path through the graph. The groups have chosen to create the graphs from populations all across the world including indigenous and under-represented populations, with peoples from all continents represented in one single mathematical object. This approach can be used to more accurately align sequencing reads from an individual with roots from anywhere in the world. In the future, it is likely that the pan-genome can be expanded further, so that any human genome can be described using a path through the graph.
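To make the idea of a genome as a path concrete, here is a minimal sketch in Python. It is a toy illustration only, not the data model of any real pan-genome tool: the node sequences, the sample names and the `spell_path` helper are all invented for this example. Each node holds a short stretch of DNA, the edges say which node may follow which, and each sample is described by the route it takes through the graph.

```python
# Toy sequence graph (hypothetical data): each node holds a short stretch of DNA.
NODES = {1: "GATTAC", 2: "A", 3: "G", 4: "CATTAG"}

# Directed edges: which node may follow which.
# Nodes 2 and 3 are alternative alleles at the same position.
EDGES = {(1, 2), (1, 3), (2, 4), (3, 4)}

# Each genome is recorded as a walk through the graph, not as a full string.
PATHS = {
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
}


def spell_path(path):
    """Return the DNA sequence spelled by visiting the nodes in order."""
    # Reject walks that use a step the graph does not contain.
    for left, right in zip(path, path[1:]):
        if (left, right) not in EDGES:
            raise ValueError(f"no edge from node {left} to node {right}")
    return "".join(NODES[node] for node in path)


for name, path in PATHS.items():
    print(name, spell_path(path))
# sample_a GATTACACATTAG
# sample_b GATTACGCATTAG
```

The two samples share nodes 1 and 4 and differ only in the single-base node between them, which is how a graph can hold a variant without privileging either version as the "normal" one.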

Increased accuracy in aligning sequencing reads will become more important as we continue along the path towards hyper-personalised medicine and specialised genetic medicine. Pan genomics is the method by which we can be certain that this medicine works well for everyone, regardless of whether their genetic background arbitrarily matches that of the reference genome.

A graph of 99 human genomes takes up less space than the 99 separate genomes. This is because many of the nodes are shared between many individuals. These highly conserved regions remind us that we are more the same than different. The poetry of representing all genomes with one object, connected at billions of points, and still able to showcase all of the diversity of the human race speaks for itself.
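The saving in space can be shown with some simple arithmetic. The sketch below is a hedged illustration using made-up toy data; the `stored_bases` function is hypothetical and it ignores the (much smaller) cost of recording the edges and the paths themselves. It counts the bases held once in the graph against the bases needed if every genome were written out in full.

```python
def stored_bases(nodes, paths):
    """Compare bases stored once in the graph with bases stored per genome."""
    # In the graph, every node's sequence is stored a single time.
    graph_total = sum(len(seq) for seq in nodes.values())
    # As separate genomes, shared sequence is repeated for every sample.
    linear_total = sum(len(nodes[n]) for path in paths.values() for n in path)
    return graph_total, linear_total


# Hypothetical toy data: three samples sharing the same flanking sequence.
toy_nodes = {1: "GATTAC", 2: "A", 3: "G", 4: "CATTAG"}
toy_paths = {
    "sample_a": [1, 2, 4],
    "sample_b": [1, 3, 4],
    "sample_c": [1, 2, 4],
}

graph_total, linear_total = stored_bases(toy_nodes, toy_paths)
print(graph_total, linear_total)  # 14 39
```

Adding another sample that follows an existing route adds a new path but no new sequence, so the gap between the two totals widens with every genome that shares nodes with the rest.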

Visualisation of a graph of a section of chromosome 8.
