10  The C-Value Paradox

10.1 A Surprising Mystery

10.1.1 What Is the C-Value Paradox?

Here’s a puzzle that confused scientists for years:

You might expect: More complex organisms → More DNA

What actually happens: Complexity and DNA amount don’t match!

The “C-value” is the amount of DNA in one haploid set of chromosomes (for example, in a sperm or egg cell), usually given in base pairs or picograms. The “paradox” is that this value doesn’t track organism complexity!

10.1.2 Some Shocking Examples

Let’s compare genome sizes:

| Organism | Genome Size | Complexity |
|---|---|---|
| Humans | 3.2 billion bp | Very complex |
| Mouse | 2.5 billion bp | Complex mammal |
| Chicken | 1 billion bp | Complex bird |
| Fruit fly | 140 million bp | Simple insect |
| Rice | 389 million bp | Plant |
| Onion | 16 billion bp | Simple plant! |
| Paris japonica (plant) | 150 billion bp | Just a flower! |
| Lungfish | 130 billion bp | Fish |
| Amoeba dubia | 670 billion bp | Single-celled! |

Wait, WHAT?!

  • An onion has 5 times more DNA than you!

  • A single-celled amoeba is reported to have 200 times more DNA than a human (though this old measurement is considered uncertain)!

  • A lungfish has 40 times more DNA than you!

Are onions more complex than humans? Of course not!
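The ratios quoted above follow directly from the table. Here is a minimal Python sketch of that arithmetic, using the table's approximate genome sizes (the dictionary and function names are illustrative only):

```python
# Approximate genome sizes in base pairs, copied from the table above.
GENOME_SIZE_BP = {
    "Human": 3.2e9,
    "Onion": 16e9,
    "Lungfish": 130e9,
    "Amoeba dubia": 670e9,
}

def fold_vs_human(organism: str) -> float:
    """Return how many times larger an organism's genome is than the human genome."""
    return GENOME_SIZE_BP[organism] / GENOME_SIZE_BP["Human"]

for name in ("Onion", "Lungfish", "Amoeba dubia"):
    print(f"{name}: about {fold_vs_human(name):.0f}x the human genome")
```

The bullet points round these ratios to 5, 40 and 200.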

10.2 Why Doesn’t More DNA = More Complex?

10.2.1 Reason 1: Most DNA Doesn’t Code for Proteins

Remember from Chapter 7:

  • Only 1-2% of human DNA codes for proteins

  • The rest is non-coding (regulatory, structural, repetitive, etc.)

Different organisms have different amounts of non-coding DNA:

  • Humans: ~98% non-coding

  • Some plants: 99% non-coding

  • Pufferfish: ~90% non-coding (they have LESS junk!)

Think of it like:

  • Two books can be very different sizes

  • But have the same number of actual words

  • One just has bigger margins and more spacing!
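To put rough numbers on the book analogy, the sketch below estimates how much protein-coding DNA two genomes of very different size contain. The human figures come from the text. The pufferfish genome size (about 400 million bp) is an assumed round figure that the text does not give.

```python
def coding_bp(genome_size_bp: float, noncoding_fraction: float) -> float:
    """Return the approximate number of protein-coding base pairs in a genome."""
    return genome_size_bp * (1.0 - noncoding_fraction)

# Human: 3.2 billion bp, ~98% non-coding (from the text).
human = coding_bp(3.2e9, 0.98)
# Pufferfish: ~90% non-coding (from the text); 0.4 billion bp is an assumption.
pufferfish = coding_bp(0.4e9, 0.90)

print(f"Human coding DNA:      ~{human / 1e6:.0f} million bp")
print(f"Pufferfish coding DNA: ~{pufferfish / 1e6:.0f} million bp")
```

Under these assumptions the two genomes differ about 8-fold in total size but less than 2-fold in coding DNA: most of the difference is “margins and spacing”.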

10.2.2 Reason 2: Polyploidy (Extra Chromosome Sets)

Some organisms have multiple complete copies of their genome!

Ploidy levels:

  • Diploid (2n) = 2 sets of chromosomes (like humans)

  • Triploid (3n) = 3 sets

  • Tetraploid (4n) = 4 sets

  • Hexaploid (6n) = 6 sets

  • And so on!

Examples:

  • Wheat: Hexaploid (6 copies!)

  • Strawberries: Octoploid (8 copies!)

  • Goldfish: Tetraploid (about 100 chromosomes from an ancient genome duplication, not 100 sets!)

Having extra sets doesn’t make you more complex—it’s like having 4 copies of the same book instead of 1!
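A small sketch of the same idea in Python: the DNA in a nucleus scales with the number of chromosome sets, while the amount of unique sequence stays the same. The value of 1 billion bp per set is a hypothetical number chosen for illustration.

```python
def nuclear_dna_bp(bp_per_chromosome_set: float, ploidy: int) -> float:
    """Total DNA in a nucleus that carries `ploidy` copies of one chromosome set."""
    return bp_per_chromosome_set * ploidy

BP_PER_SET = 1e9  # hypothetical plant: 1 billion bp per chromosome set

for label, ploidy in [("diploid", 2), ("tetraploid", 4), ("hexaploid", 6), ("octoploid", 8)]:
    total = nuclear_dna_bp(BP_PER_SET, ploidy)
    print(f"{label:>10}: {total / 1e9:.0f} billion bp in total, "
          f"still {BP_PER_SET / 1e9:.0f} billion bp of unique sequence")
```

This treats all sets as identical copies. In real polyploids the sets drift apart over time, so the picture is only a first approximation.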

10.2.3 Reason 3: Transposable Elements and Repetitive Sequences

Transposable elements are pieces of DNA that can copy themselves and jump to new locations.

They’re like:

  • 🦘 DNA that can hop around the genome

  • 📋 Copy-paste functions gone wild

  • 🦠 Ancient viral DNA that got stuck in the genome

How common are they?

  • Humans: 45% of genome is transposable elements!

  • Corn: 85% transposable elements!

  • Some plants: Over 90%!

These elements multiply over time, making genomes bigger without adding new genes!

Think of it like:

  • A book where sentences copy themselves over and over

  • The book gets huge but doesn’t have more unique information
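The self-copying book can be simulated in a few lines. This is a toy model, not a realistic one: it assumes every transposable element copy makes exactly one new copy per round, which is far faster than real elements spread.

```python
def retrotranspose(genome: list[str], rounds: int) -> list[str]:
    """Each round, every "TE" entry inserts one new copy right after itself."""
    for _ in range(rounds):
        next_genome = []
        for element in genome:
            next_genome.append(element)
            if element == "TE":
                next_genome.append("TE")  # copy-and-paste: the original stays in place
        genome = next_genome
    return genome

start = ["geneA", "TE", "geneB", "geneC"]
end = retrotranspose(start, rounds=5)

print(len(start), "->", len(end), "elements")
print("unique elements before:", sorted(set(start)))
print("unique elements after: ", sorted(set(end)))
```

The genome grows from 4 to 35 elements, yet the set of unique elements is unchanged: more DNA, no new genes.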

10.2.3.1 Types of Repetitive DNA

Repetitive sequences are a MAJOR component of eukaryotic genomes!

10.2.4 1. Interspersed Repeats (Scattered Throughout Genome)

SINEs - Short Interspersed Nuclear Elements:

  • Short repetitive sequences (~100-400 bp)

  • Copy themselves via RNA intermediate (retrotransposition)

  • Cannot move on their own (need help from LINEs)

  • Humans: ~1.5 million copies!

Famous example - Alu elements:

  • Most common SINE in humans

  • ~300 bp long

  • ~1.1 million copies in your genome!

  • Makes up ~10% of human genome

  • Named after AluI restriction enzyme that cuts it

LINEs - Long Interspersed Nuclear Elements:

  • Long repetitive sequences (~6,000 bp)

  • Can copy and paste themselves independently

  • Encode their own machinery for movement

  • Humans: ~500,000 copies

Famous example - LINE-1 (L1):

  • ~6 kb long

  • ~17% of human genome!

  • Codes for reverse transcriptase

  • Most copies are “dead” (cannot jump anymore)

  • ~80-100 still active in humans

  • Can cause diseases when they jump into genes!
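A quick consistency check on the Alu and L1 figures above, sketched in Python. The human genome size (3.2 billion bp) is taken from the table earlier in the chapter. Treating the ~500,000 LINE copies as L1 copies is a simplifying assumption.

```python
HUMAN_GENOME_BP = 3.2e9  # from the genome-size table earlier in the chapter

def genome_fraction(copies: float, bp_per_copy: float) -> float:
    """Fraction of the human genome covered by `copies` elements of a given length."""
    return copies * bp_per_copy / HUMAN_GENOME_BP

# Alu: ~1.1 million copies of ~300 bp each.
alu = genome_fraction(1.1e6, 300)
print(f"Alu: {alu:.1%} of the genome")

# L1: what if all ~500,000 copies were full length (~6,000 bp)?
l1_if_full_length = genome_fraction(500_000, 6_000)
print(f"L1 if every copy were full length: {l1_if_full_length:.0%}")

# Average copy length implied by the quoted ~17% share.
implied_mean_l1_bp = 0.17 * HUMAN_GENOME_BP / 500_000
print(f"Implied average L1 copy: ~{implied_mean_l1_bp:.0f} bp")
```

The Alu numbers agree with the quoted ~10%. For L1, full-length copies would cover far more than 17% of the genome, so the average copy must be much shorter than 6 kb. That fits the point that most copies are “dead”: many are truncated fragments.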

LTR retrotransposons (named for their Long Terminal Repeats):

  • From ancient retroviruses that infected our ancestors

  • Virus integrated into germline DNA

  • Got passed down through generations

  • Humans: ~8% of genome

  • Includes HERVs (Human Endogenous Retroviruses)

  • Most are now inactive

DNA Transposons:

  • “Cut and paste” mechanism (not copy-paste)

  • Move directly as DNA (no RNA intermediate)

  • Humans: ~3% of genome

  • All are “dead” in humans (none can move anymore!)

  • Still active in some organisms (bacteria, plants, flies)

10.2.5 2. Tandem Repeats (Clustered Together)

STRs - Short Tandem Repeats (also called microsatellites):

  • Very short sequences (2-6 bp) repeated many times

  • Example: CACACACACACA (CA repeated 6 times)

  • Used in DNA fingerprinting!

  • Used in paternity tests

  • Highly variable between individuals
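As an illustration, short tandem repeats like the CA example can be found with a regular expression. The function below is a minimal sketch with a hypothetical name. Dedicated STR-calling tools do considerably more, such as tolerating mismatches.

```python
import re

def find_strs(seq: str, min_repeats: int = 3):
    """Yield (start, unit, repeat_count) for 2-6 bp units repeated in tandem."""
    # ([ACGT]{2,6}?) captures the shortest candidate unit;
    # \1{n,} requires at least n further back-to-back copies of that unit.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    for m in pattern.finditer(seq):
        unit = m.group(1)
        yield m.start(), unit, len(m.group(0)) // len(unit)

seq = "GGT" + "CA" * 6 + "TTG"   # the CA repeat from the example, with made-up flanks
print(list(find_strs(seq)))      # [(3, 'CA', 6)]
```

Counting the repeats at several STR locations and comparing the counts between samples is the basic idea behind DNA fingerprinting.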

Satellite DNA:

  • Very long arrays of repeats

  • Found at centromeres and telomeres

  • Important for chromosome structure

  • Named because they appear as “satellite” bands in density gradients

Example:

Centromere satellite (schematic): AAATAT-AAATAT-AAATAT-AAATAT (repeated thousands of times; the repeat units at real human centromeres are about 171 bp long)

Impact on Genome Annotation: Repeat Masking

The problem with repeats:

  • Interfere with sequence assembly

  • Confuse gene prediction algorithms

  • Cause misalignment in sequence comparisons

  • Make genome analysis much harder!

Solution: Repeat Masking

What is repeat masking?

  • Computational process to identify and “hide” repetitive sequences

  • Replace repeats with “N”s (hard masking) or lowercase letters (soft masking)

  • Allows gene prediction to focus on unique sequences

How it works (schematic; “ALU” stands in for the bases of an Alu element):

```
Original sequence:
ATGCCCAAAGGGALUALUALUATGCGATAG

After repeat masking:
ATGCCCAAAGGGNNNNNNNNNATGCGATAG
                ↑
        Alu element masked
```
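The masking step itself is simple once the repeat coordinates are known. The sketch below applies hard or soft masking to given intervals. It does not detect repeats: in practice the coordinates come from a tool such as RepeatMasker. The function name and the toy sequence are made up for illustration.

```python
def mask_repeats(seq: str, repeats: list[tuple[int, int]], soft: bool = False) -> str:
    """Mask each half-open interval [start, end) of seq.

    Hard masking replaces bases with "N"; soft masking lowercases them.
    """
    masked = list(seq.upper())
    for start, end in repeats:
        for i in range(start, end):
            masked[i] = masked[i].lower() if soft else "N"
    return "".join(masked)

# Toy sequence: a made-up 9-base "repeat" at positions 12-20 between unique flanks.
seq = "ATGCCCAAAGGG" + "GGCCGGGCG" + "ATGCGATAG"
repeat_intervals = [(12, 21)]

print(mask_repeats(seq, repeat_intervals))             # hard-masked
print(mask_repeats(seq, repeat_intervals, soft=True))  # soft-masked
```

Soft masking keeps the underlying bases, which helps when a downstream tool should be allowed to look inside repeats (for example, to find a domesticated transposon).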

Tools for repeat masking:

  • RepeatMasker: Most widely used

  • RepeatModeler: Identifies novel repeats

  • RepeatMasker relies on libraries of known repeats (such as Repbase or Dfam)

Workflow in genome annotation:

  1. Sequence genome

  2. Mask repeats first! (critical step)

  3. Predict genes in masked sequence

  4. Result: fewer false gene predictions in repetitive regions

Why this matters:

  • Without masking: Find “gene” in Alu element (wrong!)

  • With masking: Ignore Alu, find real genes

  • Improves annotation accuracy dramatically
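A toy demonstration of why masking comes first. The “gene finder” here is deliberately naive: it reports anything that runs from ATG to an in-frame stop codon. Real gene predictors are far more sophisticated, and all sequences below are made up.

```python
import re

# Naive ORF pattern: ATG, then whole codons, then the first in-frame stop codon.
ORF = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

def toy_gene_finder(seq: str) -> list[str]:
    """Return every non-overlapping ATG...stop stretch found in seq."""
    return [m.group(0) for m in ORF.finditer(seq)]

real_gene = "ATGGCCAAATAA"   # stands in for a genuine gene
repeat = "ATGCGCTGA"         # stands in for a repeat that happens to look like an ORF
genome = real_gene + "CCCC" + repeat + "CCCC"

# Hard-mask the repeat with N's; its position is assumed to be already known.
start = genome.index(repeat)
masked = genome[:start] + "N" * len(repeat) + genome[start + len(repeat):]

print("without masking:", toy_gene_finder(genome))
print("with masking:   ", toy_gene_finder(masked))
```

Without masking the finder reports two “genes”, one of them inside the repeat. With masking only the real one remains.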

Additional complications:

  • Some transposons have been “domesticated”

    • Now serve useful functions!

    • Example: SETMAR gene in primates (from transposon)

  • Some regulatory elements evolved from transposons

  • So can’t just ignore all repeats!

10.2.5.1 Evolutionary Perspective on Transposons

Are transposons “junk” or functional?

Arguments for “junk”:

  • Most copies are broken/inactive

  • Seem parasitic (just copy themselves)

  • Cause diseases when they jump

Arguments for functional:

  • Some became regulatory elements

  • Contribute to genome evolution

  • Source of genetic variation

  • Can be activated under stress

Current view: Mostly junk, but some have been repurposed!

Barbara McClintock’s discovery:

  • Discovered transposable elements in corn (1940s-50s)

  • Called them “jumping genes”

  • Nobody believed her at first!

  • Won Nobel Prize in 1983 (finally recognized!)

  • Now we know they’re in nearly ALL organisms

10.2.6 Reason 4: Intron Size Variation

Remember introns (the parts of genes that get removed)?

Different organisms have different sized introns:

  • Compact genomes (pufferfish): Small introns

  • Large genomes (lungfish): HUGE introns

Genes can be the same, but take up different amounts of space!

It’s like:

  • Writing a sentence with normal spaces vs. GIANT spaces

  • Same words, different total length
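In numbers, the “giant spaces” idea looks like this. The sketch compares one gene with identical exons in a compact and an expanded genome. All lengths are hypothetical.

```python
def gene_span_bp(exon_lengths: list[int], intron_lengths: list[int]) -> int:
    """Genomic length of a gene: all exons plus the introns between them."""
    if len(intron_lengths) != len(exon_lengths) - 1:
        raise ValueError("expected exactly one intron between each pair of exons")
    return sum(exon_lengths) + sum(intron_lengths)

exons = [150, 120, 200, 180]  # hypothetical exon lengths, the same in both genomes

compact = gene_span_bp(exons, [80, 90, 100])              # short introns
expanded = gene_span_bp(exons, [8_000, 12_000, 20_000])   # long introns

print(f"spliced mRNA (exons only): {sum(exons)} bp in both cases")
print(f"compact gene:  {compact} bp on the chromosome")
print(f"expanded gene: {expanded} bp on the chromosome")
```

Both versions give the same 650 bp of spliced sequence, but the expanded gene occupies more than 40 times as much of the chromosome.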

10.2.7 Reason 5: Number of Genes ≠ Complexity

Surprisingly, organisms with similar complexity can have very different gene numbers:

| Organism | Estimated Genes |
|---|---|
| Humans | ~20,000-25,000 |
| Rice | ~35,000-40,000 |
| Water flea | ~31,000 |
| Roundworm (C. elegans) | ~20,000 |
| Fruit fly | ~14,000 |

Rice has MORE genes than humans! But humans are clearly more complex.

Why?

  • Humans have more complex gene regulation

  • Humans use alternative splicing more (one gene → many proteins)

  • Quality over quantity!
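To see how one gene can give many proteins, the sketch below enumerates the transcripts possible when each optional (“cassette”) exon is independently kept or skipped. This is an idealized upper bound under that assumption: real genes use only some of the combinations. The exon names are placeholders.

```python
from itertools import product

def isoforms(first_exon: str, cassette_exons: list[str], last_exon: str) -> list[list[str]]:
    """Enumerate transcripts when each cassette exon is independently kept or skipped."""
    results = []
    for keep in product([True, False], repeat=len(cassette_exons)):
        middle = [exon for exon, kept in zip(cassette_exons, keep) if kept]
        results.append([first_exon, *middle, last_exon])
    return results

transcripts = isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(transcripts), "possible transcripts from one gene")
for t in transcripts:
    print("-".join(t))
```

With k cassette exons the count is 2 to the power k, so three optional exons already allow up to 8 different transcripts.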

10.3 Why Plants Often Have Larger Genomes

10.3.1 The Plant Genome Size Mystery

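Many plants have larger genomes than most animals, although huge genomes occur in both groups (lungfish and salamanders are animals). Several explanations have been proposed. Why might plants differ?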

10.3.2 Reason 1: Plants Can Handle “Junk”

Animals:

  • Need to move quickly (flight, running, swimming)

  • Need to make energy-efficient cells

  • Can’t afford to carry too much extra DNA

  • Smaller genomes are favored

Plants:

  • Don’t move around

  • Get energy from the sun (photosynthesis)

  • Can afford to have lots of extra DNA

  • No strong pressure to keep genomes small

Think of it like:

  • Animals = Travelers who pack light

  • Plants = Staying home, can keep everything!

10.3.3 Reason 2: Polyploidy Is Common in Plants

Many plants are polyploid:

  • Whole genome duplications happen often

  • Plants can survive and thrive with extra chromosomes

  • Animals usually can’t (too many chromosomes is often lethal)

Why plants tolerate polyploidy better:

  • More flexible gene regulation

  • Can handle imbalanced gene doses

  • Sometimes gives advantages (bigger fruits, hardier plants)

10.3.4 Reason 3: Transposable Elements Love Plants

Transposable elements seem to proliferate especially well in many plant genomes. Proposed explanations:

  • Less efficient cleanup of transposable elements

  • Plants may have weaker systems to remove them

  • They just accumulate over time

Like a closet that never gets cleaned out!

10.3.5 Reason 4: Less Pressure to Delete DNA

In animals:

  • Non-functional DNA is deleted over evolution

  • Smaller genomes are advantageous (metabolic cost)

In plants:

  • Less pressure to delete non-functional DNA

  • It just stays there

  • Over millions of years, it builds up

10.3.6 Reason 5: Recent Whole Genome Duplications

Many plant lineages have undergone recent genome duplications:

  • Doubles all the DNA at once

  • Some extra genes are lost, but many remain

  • Leads to larger genomes

Example: Bread wheat went through two rounds of hybridization followed by chromosome doubling, combining three ancestral diploid genomes (A, B and D) into 6 sets of chromosomes!

10.4 Implications for Evolution

10.4.1 What the C-Value Paradox Teaches Us

1. Genome Size ≠ Gene Number

  • Big genome doesn’t mean more genes

  • Much DNA is non-coding

2. Gene Number ≠ Complexity

  • It’s about HOW genes are used, not how many

  • Regulation and alternative splicing matter more

3. “Junk DNA” Can Accumulate

  • Not all DNA is functional

  • Evolution doesn’t always optimize

  • Different organisms have different “junk tolerance”

4. Evolution Is Flexible

  • No single “best” genome size

  • Different strategies work for different lifestyles

  • Plants and animals evolved different solutions

10.4.2 Compact vs. Expanded Genomes

Compact genome strategy (pufferfish, fruit flies):

  • Small genes

  • Small introns

  • Less repetitive DNA

  • Efficient!

Expanded genome strategy (salamanders, lungfish, onions):

  • Large genes

  • Large introns

  • Lots of repetitive DNA

  • Inefficient but tolerable!

Both strategies work! There’s no “right” answer.

10.4.3 What Creates Complexity Then?

If genome size and gene number don’t create complexity, what does?

Sources of complexity:

  1. Gene regulation - How and when genes are turned on/off

  2. Alternative splicing - One gene → multiple proteins

  3. Protein modifications - Adding chemical groups to proteins after they’re made

  4. Protein-protein interactions - How proteins work together

  5. Non-coding RNAs - RNA molecules that regulate genes

  6. Epigenetics - Controlling genes without changing DNA sequence

  7. Development - How organisms grow from embryo to adult

Think of it like:

  • Having a simple set of LEGO bricks (genes)

  • But incredibly complex instructions for how to use them (regulation)

  • The complexity is in the instructions, not the number of bricks!

10.5 Real-World Applications

10.5.1 Genome Sequencing Costs

Understanding the C-value paradox helps with:

  • Choosing model organisms: Scientists often pick organisms with small genomes (easier/cheaper to sequence)

  • Crop improvement: Understanding why crop genomes are so large

  • Evolutionary studies: Tracking genome size changes over time
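Why small genomes are cheaper to sequence can be shown with the usual coverage relation: bases to sequence = genome size × coverage. The sketch below applies it to genome sizes from the table earlier in the chapter. The 30x coverage target is an assumed, illustrative value.

```python
def bases_to_sequence(genome_size_bp: float, coverage: float = 30.0) -> float:
    """Total bases of raw reads needed to cover a genome `coverage` times on average."""
    return genome_size_bp * coverage

# Genome sizes from the table earlier in the chapter.
genomes = {"Fruit fly": 140e6, "Human": 3.2e9, "Onion": 16e9}

for name, size in genomes.items():
    needed = bases_to_sequence(size)
    print(f"{name:>9}: {needed / 1e9:,.1f} billion bases of reads at 30x")
```

At the same coverage an onion needs five times as much sequencing as a human, and over a hundred times as much as a fruit fly.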

10.5.2 Agriculture

  • Some crops have huge or highly polyploid genomes (wheat, strawberry)

  • Understanding polyploidy helps with breeding

  • Can create new crop varieties through genome duplication

10.5.3 Medicine

  • Humans have a medium-sized genome (lucky for sequencing!)

  • Understanding that genome size doesn’t equal complexity

  • Realizing that much disease comes from regulation, not just genes

10.6 Fun Facts! 🎉

  • One of the smallest known bacterial genomes (the insect symbiont Carsonella ruddii) is only about 160,000 base pairs!

  • One of the largest reliably measured genomes belongs to a plant (Paris japonica) at 150 billion bp, nearly 50 times larger than the human genome!

  • Pufferfish have one of the most compact vertebrate genomes (about 400 million bp), with short introns and few repeats; why it stayed so small is still debated

  • Salamanders have huge genomes but nobody knows why!

  • Wheat has a larger genome than humans AND is hexaploid (6 sets of chromosomes)!

  • Goldfish have about 100 chromosomes, the legacy of an ancient whole-genome duplication!

10.7 Key Takeaways

  • C-value paradox = Genome size doesn’t correlate with organism complexity

  • Some simple organisms (onions, amoebas) have much larger genomes than humans

  • Reasons for the paradox:

    • Most DNA is non-coding

    • Polyploidy (multiple genome copies)

    • Transposable elements (“junk DNA”)

    • Variable intron sizes

    • Gene number doesn’t equal complexity

  • Plants often have larger genomes because:

    • Can tolerate “junk” DNA (don’t need to move)

    • Polyploidy is common

    • More transposable elements

    • Less pressure to delete non-functional DNA

  • Complexity comes from:

    • Gene regulation (not gene number)

    • Alternative splicing

    • Protein modifications

    • Complex development

  • Evolution lesson: There’s no “optimal” genome size—different strategies work for different lifestyles


Sources: Information adapted from Nature Education, evolutionary genomics research papers, and comparative genomics studies.
