Commit c75171a

SofiaSM45 and pre-commit-ci[bot] authored
BUG: Fix #57608: queries on categorical string columns in HDFStore.select() return unexpected results. (#61225)
* BUG: Fix #57608: queries on categorical string columns in HDFStore.select() return unexpected results.

  In function __init__() of class Selection (pandas/core/io/pytables.py), the method self.terms.evaluate() was not returning the correct value for the where condition. The issue stemmed from the function convert_value() of class BinOp (pandas/core/computation/pytables.py), where the function searchsorted() did not return the correct index when matching the where condition in the metadata (categories table). Replacing searchsorted() with np.where() resolves this issue.

* BUG: Follow-up for #57608: check if metadata is sorted before search

* BUG: Follow-up for #57608: use direct match via np.flatnonzero

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 09a17c7 commit c75171a
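
The commit message attributes the bug to searchsorted() returning the wrong index in the categories table. A minimal sketch of why that can happen, assuming the table is held as a plain NumPy object array in declaration order (the array and both helper names below are illustrative, not pandas internals):

```python
import numpy as np

# Assumed stand-in for the categories table: declaration order, not sorted.
metadata = np.array(["name", "longname", "verylongname"], dtype=object)


def index_via_searchsorted(value):
    # Binary search: only meaningful when `metadata` is sorted ascending.
    return int(metadata.searchsorted(value, side="left"))


def index_via_flatnonzero(value):
    # Linear scan: position of the first exact match, independent of ordering.
    return int(np.flatnonzero(metadata == value)[0])


# Print both results side by side; they can disagree on unsorted input.
for value in metadata:
    print(value, index_via_searchsorted(value), index_via_flatnonzero(value))
```

The linear scan costs O(n) per lookup instead of O(log n), which is likely acceptable for a categories table that is small relative to the stored data.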

3 files changed

+26 -1 lines changed
‎doc/source/whatsnew/v3.0.0.rst‎

Lines changed: 1 addition & 0 deletions
@@ -775,6 +775,7 @@ I/O
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
 - Bug in :meth:`HDFStore.get` was failing to save data of dtype datetime64[s] correctly (:issue:`59004`)
+- Bug in :meth:`HDFStore.select` causing queries on categorical string columns to return unexpected results (:issue:`57608`)
 - Bug in :meth:`read_csv` causing segmentation fault when ``encoding_errors`` is not a string. (:issue:`59059`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``index_col`` is specified and ``na_values`` is a dict containing the key ``None``. (:issue:`57547`)
 - Bug in :meth:`read_csv` raising ``TypeError`` when ``nrows`` and ``iterator`` are specified without specifying a ``chunksize``. (:issue:`59079`)

‎pandas/core/computation/pytables.py‎

Lines changed: 2 additions & 1 deletion
@@ -239,7 +239,8 @@ def stringify(value):
             if conv_val not in metadata:
                 result = -1
             else:
-                result = metadata.searchsorted(conv_val, side="left")
+                # Find the index of the first match of conv_val in metadata
+                result = np.flatnonzero(metadata == conv_val)[0]
             return TermValue(result, result, "integer")
         elif kind == "integer":
             try:

‎pandas/tests/io/pytables/test_store.py‎

Lines changed: 23 additions & 0 deletions
@@ -23,6 +23,9 @@
     timedelta_range,
 )
 import pandas._testing as tm
+from pandas.api.types import (
+    CategoricalDtype,
+)
 from pandas.tests.io.pytables.common import (
     _maybe_remove,
     ensure_clean_store,
@@ -1107,3 +1110,23 @@ def test_store_bool_index(tmp_path, setup_path):
     df.to_hdf(path, key="a")
     result = read_hdf(path, "a")
     tm.assert_frame_equal(expected, result)
+
+
+@pytest.mark.parametrize("model", ["name", "longname", "verylongname"])
+def test_select_categorical_string_columns(tmp_path, model):
+    # Corresponding to BUG: 57608
+
+    path = tmp_path / "test.h5"
+
+    models = CategoricalDtype(categories=["name", "longname", "verylongname"])
+    df = DataFrame(
+        {"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
+    ).astype({"modelId": models, "value": int})
+
+    with HDFStore(path, "w") as store:
+        store.append("df", df, data_columns=["modelId"])
+
+    with HDFStore(path, "r") as store:
+        result = store.select("df", "modelId == model")
+        expected = df[df["modelId"] == model]
+        tm.assert_frame_equal(result, expected)
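
The new test expects each category name to select exactly the rows that carry it. An in-memory analogy of that expectation, assuming the where condition is ultimately compared against integer category codes (select_by_category is a hypothetical helper for illustration and does not reproduce how PyTables executes the query):

```python
import numpy as np
import pandas as pd

categories = ["name", "longname", "verylongname"]
df = pd.DataFrame(
    {"modelId": ["name", "longname", "longname"], "value": [1, 2, 3]}
).astype({"modelId": pd.CategoricalDtype(categories=categories)})


def select_by_category(frame, column, wanted):
    """Hypothetical helper: filter rows by comparing integer category codes."""
    cats = frame[column].cat.categories.to_numpy()
    matches = np.flatnonzero(cats == wanted)
    # Mirror the diff: first exact match, or -1 when `wanted` is not a category.
    # Caveat: pandas also uses code -1 for missing values; this frame has none.
    code = int(matches[0]) if matches.size else -1
    return frame[frame[column].cat.codes == code]


print(select_by_category(df, "modelId", "longname"))
```

If the lookup returned the position of a different category, the comparison would select that category's rows instead, which matches the "unexpected results" described in the issue title.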
