Why are partial PostgreSQL HASH indices not smaller than full indices?

I want to create the most efficient index for a sparsely populated column. I only need equality operations, so a HASH index should be beneficial.

Now I'm wondering why a partial HASH index isn't smaller than a full hash index:

CREATE INDEX full_hash    ON mytable USING HASH (my_id); -- 256 MB
CREATE INDEX partial_hash ON mytable USING HASH (my_id) WHERE my_id IS NOT NULL; -- 256 MB

CREATE INDEX full_btree    ON mytable (my_id); -- 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_id IS NOT NULL; -- 32 MB

Both hash indices take exactly the same amount of space (as shown in pgHero). With standard BTREE indices, however, the partial index takes only about 6% of the space of the full index.
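For anyone who wants to check the sizes without pgHero, here is a minimal sketch of a catalog query that lists every index on the table with its on-disk size. It assumes the table is named mytable as in the statements above and uses only the built-in pg_relation_size and pg_size_pretty functions; the column aliases are my own.

```sql
-- List all indexes on mytable, largest first.
-- pg_index holds one row per index; indrelid is the indexed table.
SELECT i.indexrelid::regclass                         AS index_name,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM pg_index AS i
WHERE i.indrelid = 'mytable'::regclass
ORDER BY pg_relation_size(i.indexrelid) DESC;
```

pg_relation_size reports only the main fork of the index, which is normally what tools mean by "index size".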

Are partial HASH indices not supported in PostgreSQL 10?

asked yesterday

Ortwin Gentz

postgresql index index-tuning postgresql-10

2 Answers

I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is built. But the pre-sizing code takes neither the NULL fraction of the column nor (apparently) the selectivity of the partial index predicate into account, so it arrives at a number that is too large.

If you create the index first and then populate the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space the index consumed at creation will eventually be put to good use.
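A rough way to try this yourself is sketched below. The table hash_demo, its index names, and the 1-in-1000 fill ratio are all made up for illustration; I have not included sizes because they will depend on your version and settings.

```sql
-- Hypothetical scratch table; all names here are illustrative only.
CREATE TABLE hash_demo (my_id integer);

-- Built while the table is empty, so there is nothing to pre-size for.
CREATE INDEX hash_demo_before ON hash_demo USING HASH (my_id);

-- Sparse data: only every 1000th row gets a non-NULL value.
INSERT INTO hash_demo (my_id)
SELECT CASE WHEN g % 1000 = 0 THEN g END
FROM generate_series(1, 1000000) AS g;

-- Built after loading: this is the case that gets pre-sized.
CREATE INDEX hash_demo_after ON hash_demo USING HASH (my_id);

-- Compare the two indexes, which cover exactly the same rows.
SELECT pg_size_pretty(pg_relation_size('hash_demo_before')) AS built_empty,
       pg_size_pretty(pg_relation_size('hash_demo_after'))  AS built_after_load;

DROP TABLE hash_demo;
```

If the explanation above holds, built_after_load should come out noticeably larger than built_empty even though both indexes contain the same entries.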

answered yesterday

jjanes

3

I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.

– jjanes
23 hours ago

1

Oh, and I submitted a bug already: postgresql.org/message-id/…

– Ortwin Gentz
23 hours ago

It's not explicitly stated in the documentation, but in the source code there is the following comment:

/*
 * We do not insert null values into hash indexes.  This is okay because
 * the only supported search operator is '=', and we assume it is strict.
 */

So the IS NOT NULL predicate indeed changes nothing, because NULL values are never stored in hash indexes (which makes sense, as comparing NULL values with = never returns true).
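The point about = and NULL can be checked with a one-line query. This is only a sketch of standard SQL comparison semantics (with the default transform_null_equals = off), not anything specific to the hash index code; the column aliases are my own.

```sql
-- '=' is strict: if either input is NULL, the result is NULL, never true.
-- IS NULL is the operator that actually detects NULLs.
SELECT NULL::integer = NULL::integer AS eq_result,      -- NULL
       NULL::integer IS NULL         AS is_null_result; -- true
```

Since a WHERE clause keeps only rows for which the condition is true, a lookup with = can never return a row whose my_id is NULL, so there is no need to store those rows in an index that supports only =.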

edited yesterday

answered yesterday

a_horse_with_no_name

2

Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with an even more sparsely populated column (only a few hundred rows out of more than 10 million) and the index took 256 MB as well. So it looks like the size of a hash index depends only on the table size, not on the number of indexable values.

– Ortwin Gentz
yesterday

This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.

– jjanes
yesterday

The full btree index is more than double the size of the hash index.

– Ortwin Gentz
yesterday
