10 The C-Value Paradox

10.1 A Surprising Mystery

10.1.1 What Is the C-Value Paradox?

Here’s a puzzle that confused scientists for years:

You might expect: More complex organisms → More DNA

What actually happens: Complexity and DNA amount don’t match!

The “C-value” is the amount of DNA in one haploid set of an organism’s chromosomes (for example, in a sperm or egg nucleus). The “paradox” is that this value doesn’t track how complex the organism is!

10.1.2 Some Shocking Examples

Let’s compare genome sizes:

Organism	Genome Size	Complexity
Humans	3.2 billion bp	Very complex
Mouse	2.5 billion bp	Complex mammal
Chicken	1 billion bp	Complex bird
Fruit fly	140 million bp	Simple insect
Rice	389 million bp	Plant
Onion	16 billion bp	Simple plant!
Paris japonica (plant)	150 billion bp	Just a flower!
Lungfish	130 billion bp	Fish
Amoeba dubia	~670 billion bp (older, disputed estimate)	Single-celled!

Wait, WHAT?!

An onion has 5 times more DNA than you!
A single-celled amoeba has 200 times more DNA than a human!
A lungfish has 40 times more DNA than you!

Are onions more complex than humans? Of course not!
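The comparisons above are just ratios of the genome sizes in the table. Here is a minimal sketch of that arithmetic; the dictionary and function names are made up for illustration, and the sizes are copied from the table.

```python
# Genome sizes in base pairs, copied from the table above.
GENOME_SIZE_BP = {
    "Human": 3_200_000_000,
    "Onion": 16_000_000_000,
    "Lungfish": 130_000_000_000,
    "Amoeba dubia": 670_000_000_000,
}

def fold_vs_human(organism: str) -> float:
    """Return how many times larger an organism's genome is than the human genome."""
    return GENOME_SIZE_BP[organism] / GENOME_SIZE_BP["Human"]

for name in ("Onion", "Lungfish", "Amoeba dubia"):
    print(f"{name}: about {fold_vs_human(name):.0f}x the human genome")
```

The exact ratios come out near 5, 41, and 209, which the text rounds to 5, 40, and 200.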

10.2 Why Doesn’t More DNA = More Complex?

10.2.1 Reason 1: Most DNA Doesn’t Code for Proteins

Remember from Chapter 7:

Only 1-2% of human DNA codes for proteins
The rest is non-coding (regulatory, structural, repetitive, etc.)

Different organisms have different amounts of non-coding DNA:

Humans: ~98% non-coding
Some plants: 99% non-coding
Pufferfish: ~90% non-coding (they have LESS junk!)

Think of it like:

Two books can be very different sizes
But have the same number of actual words
One just has bigger margins and more spacing!
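To put rough numbers on the book analogy, the sketch below compares how much DNA is left once the non-coding share is removed. The human figures come from this chapter. The pufferfish genome size (about 400 million bp) is not given in the text and is an assumption used only for illustration.

```python
def coding_bp(genome_size_bp: int, noncoding_percent: int) -> int:
    """Base pairs left over once the non-coding percentage is removed."""
    return genome_size_bp * (100 - noncoding_percent) // 100

# Human: 3.2 billion bp, ~98% non-coding (from the text).
human_coding = coding_bp(3_200_000_000, 98)
# Pufferfish: ~90% non-coding (from the text); 400 million bp is an ASSUMED genome size.
puffer_coding = coding_bp(400_000_000, 90)

print(f"Human coding DNA:      {human_coding:,} bp")
print(f"Pufferfish coding DNA: {puffer_coding:,} bp")
```

Under these assumptions the two genomes differ about 8-fold in total size, yet their coding DNA differs by less than 2-fold. The “words” are similar; the “margins” are not.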

10.2.2 Reason 2: Polyploidy (Extra Chromosome Sets)

Some organisms have multiple complete copies of their genome!

Ploidy levels:

Diploid (2n) = 2 sets of chromosomes (like humans)
Triploid (3n) = 3 sets
Tetraploid (4n) = 4 sets
Hexaploid (6n) = 6 sets
And so on!

Examples:

Wheat: Hexaploid (6 copies!)
Strawberries: Octoploid (8 copies!)
Goldfish: Descended from a tetraploid ancestor (about 100 chromosomes in total)

Having extra sets doesn’t make you more complex—it’s like having 4 copies of the same book instead of 1!

10.2.3 Reason 3: Transposable Elements and Repetitive Sequences

Transposable elements are pieces of DNA that can copy themselves and jump to new locations.

They’re like:

🦘 DNA that can hop around the genome
📋 Copy-paste functions gone wild
🦠 In some cases, ancient viral DNA that got stuck in the genome

How common are they?

Humans: 45% of genome is transposable elements!
Corn: 85% transposable elements!
Some plants: Over 90%!

These elements multiply over time, making genomes bigger without adding new genes!

Think of it like:

A book where sentences copy themselves over and over
The book gets huge but doesn’t have more unique information
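The self-copying book can be sketched as a toy simulation. This is a deliberately simplified model, not a description of real transposition rates: it assumes every existing copy of the element makes exactly one new copy per round and that no copy ever goes inactive. All names are hypothetical.

```python
import random

def copy_paste(genome: list[str], element: str, rounds: int, seed: int = 0) -> list[str]:
    """Each round, every existing copy of `element` inserts one new copy at a random position."""
    rng = random.Random(seed)
    genome = list(genome)  # work on a copy so the input is left untouched
    for _ in range(rounds):
        n_copies = genome.count(element)
        for _ in range(n_copies):
            genome.insert(rng.randrange(len(genome) + 1), element)
    return genome

start = ["geneA", "TE", "geneB", "geneC"]
grown = copy_paste(start, "TE", rounds=5)
unique_genes = {item for item in grown if item != "TE"}

print(len(start), "->", len(grown), "items")
print("unique genes:", sorted(unique_genes))
```

After five rounds the single element has become 32 copies and the toy genome has grown from 4 items to 35, but it still contains the same three genes.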

10.2.3.1 Types of Repetitive DNA

Repetitive sequences are a MAJOR component of eukaryotic genomes!

10.2.4 1. Interspersed Repeats (Scattered Throughout Genome)

SINEs - Short Interspersed Nuclear Elements:

Short repetitive sequences (~100-400 bp)
Copy themselves via RNA intermediate (retrotransposition)
Cannot move on their own (need help from LINEs)
Humans: ~1.5 million copies!

Famous example - Alu elements:

Most common SINE in humans
~300 bp long
~1.1 million copies in your genome!
Makes up ~10% of human genome
Named after AluI restriction enzyme that cuts it

LINEs - Long Interspersed Nuclear Elements:

Long repetitive sequences (~6,000 bp)
Can copy and paste themselves independently
Encode their own machinery for movement
Humans: ~500,000 copies

Famous example - LINE-1 (L1):

~6 kb long when full-length (most copies are shorter, truncated fragments)
~17% of human genome!
Codes for reverse transcriptase
Most copies are “dead” (cannot jump anymore)
~80-100 still active in humans
Can cause diseases when they jump into genes!
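A quick sanity check on the numbers above: copy number times element length should roughly match the quoted share of the genome. The sketch below uses only figures from this section, with one approximation: the ~500,000 LINE copy count is applied to L1. Function names are made up for illustration.

```python
HUMAN_GENOME_BP = 3_200_000_000

def genome_percent(copies: int, length_bp: int) -> float:
    """Percent of the human genome covered if every copy were `length_bp` long."""
    return 100 * copies * length_bp / HUMAN_GENOME_BP

def mean_copy_length(copies: int, percent_of_genome: float) -> float:
    """Average copy length implied by a copy number and a genome percentage."""
    return percent_of_genome / 100 * HUMAN_GENOME_BP / copies

alu_percent = genome_percent(1_100_000, 300)      # Alu: 1.1 million copies of ~300 bp
l1_if_full = genome_percent(500_000, 6_000)       # L1 if every copy were a full 6 kb
l1_mean_len = mean_copy_length(500_000, 17)       # L1 length implied by the 17% figure

print(f"Alu: {alu_percent:.1f}% of the genome")
print(f"L1 if all copies were full-length: {l1_if_full:.1f}%")
print(f"Implied average L1 copy: {l1_mean_len:.0f} bp")
```

The Alu numbers agree with the ~10% figure. For L1, full-length copies would cover far more than 17% of the genome, so the average copy must be only about 1 kb. That fits the point that most L1 copies are broken fragments that can no longer jump.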

LTR retrotransposons (elements flanked by Long Terminal Repeats):

From ancient retroviruses that infected our ancestors
Virus integrated into germline DNA
Got passed down through generations
Humans: ~8% of genome
Includes HERVs (Human Endogenous Retroviruses)
Most are now inactive

DNA Transposons:

“Cut and paste” mechanism (not copy-paste)
Move directly as DNA (no RNA intermediate)
Humans: ~3% of genome
All are “dead” in humans (none can move anymore!)
Still active in some organisms (bacteria, plants, flies)

10.2.5 2. Tandem Repeats (Clustered Together)

STRs - Short Tandem Repeats (also called microsatellites):

Very short sequences (2-6 bp) repeated many times
Example: CACACACACACA (CA repeated 6 times)
Used in DNA fingerprinting!
Used in paternity tests
Highly variable between individuals
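Finding STRs in a sequence can be sketched with a regular expression. This is a minimal illustration, not how forensic or genomic STR callers work; the function name and the `min_repeats` threshold are assumptions.

```python
import re

def find_strs(sequence: str, min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for runs of a 2-6 bp unit repeated back to back."""
    # Group 1 is the shortest 2-6 bp unit; \1{n,} demands n more copies right after it.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

# The CA repeat from the example above, with a few flanking bases added.
print(find_strs("GGTCACACACACACATTG"))   # [(3, 'CA', 6)]
```

Because the number of copies at a given STR differs between people, comparing these counts at several locations is what makes DNA fingerprinting possible.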

Satellite DNA:

Very long arrays of repeats
Found mainly at centromeres and in the heterochromatin around them (telomeres have their own short tandem repeats)
Important for chromosome structure
Named because they appear as “satellite” bands in density gradients

Example:

Centromere satellite: AAATAT-AAATAT-AAATAT-AAATAT (repeated thousands of times)


Impact on Genome Annotation: Repeat Masking

The problem with repeats:

Interfere with sequence assembly
Confuse gene prediction algorithms
Cause misalignment in sequence comparisons
Make genome analysis much harder!

Solution: Repeat Masking

What is repeat masking?

Computational process to identify and “hide” repetitive sequences
Replace repeats with “N”s (hard masking) or lowercase letters (soft masking)
Allows gene prediction to focus on unique sequences

How it works (here “ALU” stands in for the bases of an Alu element):

```
Original sequence:
ATGCCCAAAGGGALUALUALUATGCGATAG

After repeat masking:
ATGCCCAAAGGGNNNNNNNNNATGCGATAG
                ↑
       Alu element masked
```
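The masking idea can be sketched in a few lines. This toy version only finds exact matches to a small library, whereas real tools also find diverged copies; the function name, the library, and the 9 bp stand-in repeat are all hypothetical.

```python
def mask_repeats(sequence: str, repeat_library: list[str], soft: bool = False) -> str:
    """Mask every exact occurrence of each library entry with N (hard) or lowercase (soft)."""
    masked = list(sequence)
    for repeat in repeat_library:
        start = sequence.find(repeat)
        while start != -1:
            for i in range(start, start + len(repeat)):
                masked[i] = sequence[i].lower() if soft else "N"
            start = sequence.find(repeat, start + 1)  # also catches overlapping copies
    return "".join(masked)

library = ["TTAACCTTA"]                         # hypothetical stand-in, not a real Alu
sequence = "ATGCCCAAAGGG" + "TTAACCTTA" + "ATGCGATAG"

print(mask_repeats(sequence, library))              # hard masking with N
print(mask_repeats(sequence, library, soft=True))   # soft masking with lowercase
```

Soft masking keeps the original bases visible, which is handy when a later step might still need them.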

Tools for repeat masking:

RepeatMasker: Most widely used; screens a sequence against libraries of known repeats (such as Dfam or Repbase)
RepeatModeler: Builds a library of novel repeats directly from the genome (de novo)

Workflow in genome annotation:

Sequence genome
Mask repeats first! (critical step)
Predict genes in masked sequence
Avoid false gene predictions in repetitive regions

Why this matters:

Without masking: Find “gene” in Alu element (wrong!)
With masking: Ignore Alu, find real genes
Improves annotation accuracy dramatically
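Here is a toy illustration of why masking comes first. Real gene predictors are far more sophisticated; this sketch just scans for open reading frames (ATG to the first in-frame stop codon). The sequences are invented, and the “repeat” is a hypothetical one chosen because it happens to look like a tiny gene.

```python
import re

# Toy "gene finder": ATG, then whole A/C/G/T codons, then the first in-frame stop codon.
ORF = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

def find_orfs(sequence: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of open reading frames at least `min_codons` long."""
    return [
        (m.start(), m.end())
        for m in ORF.finditer(sequence)
        if (m.end() - m.start()) // 3 >= min_codons
    ]

real_gene = "ATGGCCGGTTGA"     # hypothetical tiny gene
repeat = "ATGAAACCCTAA"        # hypothetical repeat that looks like an ORF
unmasked = real_gene + "CC" + repeat
masked = real_gene + "CC" + "N" * len(repeat)   # the same sequence after hard masking

print(find_orfs(unmasked))   # two hits: the gene and the repeat
print(find_orfs(masked))     # one hit: only the gene
```

The masked bases can no longer form codons, so the false “gene” inside the repeat disappears while the real one is still found.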

Additional complications:

Some transposons have been “domesticated”
- Now serve useful functions!
- Example: SETMAR gene in primates (from transposon)
Some regulatory elements evolved from transposons
So can’t just ignore all repeats!

10.2.5.1 Evolutionary Perspective on Transposons

Are transposons “junk” or functional?

Arguments for “junk”:

Most copies are broken/inactive
Seem parasitic (just copy themselves)
Cause diseases when they jump

Arguments for functional:

Some became regulatory elements
Contribute to genome evolution
Source of genetic variation
Can be activated under stress

Current view: Mostly junk, but some have been repurposed!

Barbara McClintock’s discovery:

Discovered transposable elements in corn (1940s-50s)
Called them “jumping genes”
Her ideas were met with skepticism for years!
Won Nobel Prize in 1983 (finally recognized!)
Now we know they’re in nearly ALL organisms

10.2.6 Reason 4: Intron Size Variation

Remember introns (the parts of genes that get removed)?

Different organisms have different sized introns:

Compact genomes (pufferfish): Small introns
Large genomes (lungfish): HUGE introns

Genes can be the same, but take up different amounts of space!

It’s like:

Writing a sentence with normal spaces vs. GIANT spaces
Same words, different total length
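The spacing analogy is easy to put into numbers. In the sketch below, all exon and intron lengths are hypothetical values chosen only to show the effect; they are not measurements from pufferfish or lungfish.

```python
def gene_span(exon_lengths: list[int], intron_lengths: list[int]) -> int:
    """Genomic footprint of a gene: all exons plus the introns between them."""
    if len(intron_lengths) != len(exon_lengths) - 1:
        raise ValueError("expected exactly one intron between each pair of exons")
    return sum(exon_lengths) + sum(intron_lengths)

exons = [150, 120, 200, 180]                             # same exons in both genomes (bp)
compact = gene_span(exons, [80, 90, 100])                # short introns
expanded = gene_span(exons, [8_000, 12_000, 20_000])     # long introns

print(f"Coding length in both cases: {sum(exons)} bp")
print(f"Compact gene span:  {compact:,} bp")
print(f"Expanded gene span: {expanded:,} bp")
```

Both versions carry the same 650 bp of exons, but the gene with long introns takes up more than 40 times as much space on the chromosome.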

10.2.7 Reason 5: Number of Genes ≠ Complexity

Surprisingly, gene number doesn’t track complexity either. Simple organisms can have as many genes as complex ones, or even more:

Organism	Estimated Genes
Humans	~20,000-25,000
Rice	~35,000-40,000
Water flea	~31,000
Roundworm (C. elegans)	~20,000
Fruit fly	~14,000

Rice has MORE genes than humans! But humans are clearly more complex.

Why?

Humans have more complex gene regulation
Humans use alternative splicing more (one gene → many proteins)
Quality over quantity!
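To see how one gene can give many proteins, here is a minimal sketch that lists the transcripts possible when some exons are optional. It assumes each optional (“cassette”) exon is included or skipped independently, which is a simplification of real splicing; the exon names are hypothetical.

```python
from itertools import combinations

def isoforms(first_exon: str, cassette_exons: list[str], last_exon: str) -> list[list[str]]:
    """Enumerate transcripts when each cassette exon can be included or skipped."""
    transcripts = []
    for n_kept in range(len(cassette_exons) + 1):
        # combinations() keeps the exons in their original order
        for kept in combinations(cassette_exons, n_kept):
            transcripts.append([first_exon, *kept, last_exon])
    return transcripts

variants = isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(variants), "possible transcripts from one gene")
for transcript in variants:
    print("-".join(transcript))
```

With just 3 optional exons, one gene can produce 8 different transcripts, and each additional optional exon doubles that number under this simple model.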

10.3 Why Plants Often Have Larger Genomes

10.3.1 The Plant Genome Size Mystery

Plants tend to have larger genomes than animals. Why?

10.3.2 Reason 1: Plants Can Handle “Junk”

Animals:

Need to move quickly (flight, running, swimming)
Need to make energy-efficient cells
Can’t afford to carry too much extra DNA
Smaller genomes are favored

Plants:

Don’t move around
Get energy from the sun (photosynthesis)
Can afford to have lots of extra DNA
No strong pressure to keep genomes small

Think of it like:

Animals = Travelers who pack light
Plants = Staying home, can keep everything!

10.3.3 Reason 2: Polyploidy Is Common in Plants

Many plants are polyploid:

Whole genome duplications happen often
Plants can survive and thrive with extra chromosomes
Animals usually can’t (too many chromosomes is often lethal)

Why plants tolerate polyploidy better:

More flexible gene regulation
Can handle imbalanced gene doses
Sometimes gives advantages (bigger fruits, hardier plants)

10.3.4 Reason 3: Transposable Elements Love Plants

Transposable elements have proliferated especially strongly in many plant genomes. Proposed explanations:

Less efficient cleanup of transposable elements
Plants may have weaker systems to remove them
They just accumulate over time

Like a closet that never gets cleaned out!

10.3.5 Reason 4: Less Pressure to Delete DNA

In animals:

Non-functional DNA is deleted over evolution
Smaller genomes are advantageous (metabolic cost)

In plants:

Less pressure to delete non-functional DNA
It just stays there
Over millions of years, it builds up

10.3.6 Reason 5: Recent Whole Genome Duplications

Many plant lineages have undergone recent genome duplications:

Doubles all the DNA at once
Some extra genes are lost, but many remain
Leads to larger genomes

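Example: Bread wheat arose from two rounds of hybridization followed by genome doubling, which combined three ancestral diploid genomes (A, B, and D) and left it with 6 sets of chromosomes!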

10.4 Implications for Evolution

10.4.1 What the C-Value Paradox Teaches Us

1. Genome Size ≠ Gene Number

Big genome doesn’t mean more genes
Much DNA is non-coding

2. Gene Number ≠ Complexity

It’s about HOW genes are used, not how many
Regulation and alternative splicing matter more

3. “Junk DNA” Can Accumulate

Not all DNA is functional
Evolution doesn’t always optimize
Different organisms have different “junk tolerance”

4. Evolution Is Flexible

No single “best” genome size
Different strategies work for different lifestyles
Plants and animals evolved different solutions

10.4.2 Compact vs. Expanded Genomes

Compact genome strategy (pufferfish, fruit flies):

Small genes
Small introns
Less repetitive DNA
Efficient!

Expanded genome strategy (salamanders, lungfish, onions):

Large genes
Large introns
Lots of repetitive DNA
Inefficient but tolerable!

Both strategies work! There’s no “right” answer.

10.4.3 What Creates Complexity Then?

If genome size and gene number don’t create complexity, what does?

Sources of complexity:

Gene regulation - How and when genes are turned on/off
Alternative splicing - One gene → multiple proteins
Protein modifications - Adding chemical groups to proteins after they’re made
Protein-protein interactions - How proteins work together
Non-coding RNAs - RNA molecules that regulate genes
Epigenetics - Controlling genes without changing DNA sequence
Development - How organisms grow from embryo to adult

Think of it like:

Having a simple set of LEGO bricks (genes)
But incredibly complex instructions for how to use them (regulation)
The complexity is in the instructions, not the number of bricks!

10.5 Real-World Applications

10.5.1 Genome Sequencing Costs

Understanding the C-value paradox helps with:

Choosing model organisms: Scientists often pick organisms with small genomes (easier/cheaper to sequence)
Crop improvement: Understanding why crop genomes are so large
Evolutionary studies: Tracking genome size changes over time

10.5.2 Agriculture

Many crops are polyploid (wheat, strawberry), and some have huge genomes (wheat)
Understanding polyploidy helps with breeding
Can create new crop varieties through genome duplication

10.5.3 Medicine

Humans have a medium-sized genome (lucky for sequencing!)
Understanding that genome size doesn’t equal complexity
Realizing that much disease comes from regulation, not just genes

10.6 Fun Facts! 🎉

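One of the smallest known bacterial genomes (Carsonella ruddii) is only about 160,000 base pairs!
One of the largest reliably measured genomes belongs to a plant (Paris japonica) at about 150 billion bp, nearly 50 times larger than the human genome! (The even bigger Amoeba dubia figure is an older, disputed estimate.)
Pufferfish have one of the most compact vertebrate genomes, with small introns and little repetitive DNA
Salamanders have huge genomes, and exactly why is still debated!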
Wheat has a larger genome than humans AND is hexaploid (6 sets of chromosomes)!
Goldfish have about 100 chromosomes, the legacy of an ancient whole genome duplication!

10.7 Key Takeaways

C-value paradox = Genome size doesn’t correlate with organism complexity
Some simple organisms (onions, amoebas) have much larger genomes than humans
Reasons for the paradox:
- Most DNA is non-coding
- Polyploidy (multiple genome copies)
- Transposable elements (“junk DNA”)
- Variable intron sizes
- Gene number doesn’t equal complexity
Plants often have larger genomes because:
- Can tolerate “junk” DNA (don’t need to move)
- Polyploidy is common
- More transposable elements
- Less pressure to delete non-functional DNA
Complexity comes from:
- Gene regulation (not gene number)
- Alternative splicing
- Protein modifications
- Complex development
Evolution lesson: There’s no “optimal” genome size—different strategies work for different lifestyles

Sources: Information adapted from Nature Education, evolutionary genomics research papers, and comparative genomics studies.
