148,726 questions
Advice
3
votes
3
replies
57
views
Reading in XML data in python
I am looking for some assistance with how to convert the below XML data into a dataframe.
I have managed to write working code in R (XML package, the code is messy) but then I realised it might even be ...
4
votes
2
answers
195
views
Pandas rolling window over a time period if some data might be missing within groups
I have a dataset with a column of groups, dates, day of the week and some data columns. For each date in each group, I want to work out the same day average from the last 3 weeks (l3w). I've been ...
0
votes
2
answers
86
views
Polars add elements to list
Suppose I have the following polars DataFrame:
df = pl.DataFrame({"a": [["A111", "A110"], ["Z254"], ["B897", "C768", "D456"]]})
...
0
votes
0
answers
75
views
zscore function not found [closed]
I'm working in R and have a dataframe mtcars with cars having a column wt (weight of the cars). I'm trying to calculate skewness of weights.
The following is the exercise as shown in the book. The 1st ...
4
votes
4
answers
184
views
Changing column values based on values of separate columns
I'm trying to figure out how to change values in a column (Age), based on the values of two separate columns (Species and Length).
I have a dataset of fish lengths, with all of them designated either ...
3
votes
1
answer
123
views
Replacing several rows of data in a column efficiently using condition in a pipeline
I have the following dataframe:
df <- data.frame(
Form=rep(c("Fast", "Medium", "Slow"), each = 3),
Parameter =rep(c("Fmax", "TMAX", "B...
3
votes
1
answer
75
views
faster methods to remove substrings stored in one column from strings stored in another column
hist_df_2["time"] = hist_df_2.apply(lambda row : hist_df_2['timestamp'].replace(str(hist_df_2['date']), ''), axis=1)
I tried this to remove the date part from the timestamp. However, for ...
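A minimal sketch of one possible approach to the per-row removal described above, assuming both columns hold plain strings. Only the column names `timestamp`, `date` and `time` come from the excerpt; the sample values are made up for illustration. Pairing the two columns with `zip` usually avoids the per-row overhead of `DataFrame.apply(axis=1)`, though whether it is fast enough depends on the data.

```python
import pandas as pd

# Made-up sample data; only the column names come from the excerpt above.
hist_df_2 = pd.DataFrame(
    {
        "timestamp": ["2024-01-05 10:30:00", "2024-01-06 11:45:10"],
        "date": ["2024-01-05", "2024-01-06"],
    }
)

# Pair each timestamp with the date from the same row, remove that date
# substring, and trim the whitespace left behind.
hist_df_2["time"] = [
    ts.replace(d, "").strip()
    for ts, d in zip(hist_df_2["timestamp"], hist_df_2["date"])
]
```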
1
vote
1
answer
129
views
Broadcasting DataFrames across NumPy array dimensions
I'm working with a large Pandas DataFrame and a multi-dimensional NumPy array. My goal is to efficiently "broadcast" a specific column of the DataFrame across one or more dimensions of the ...
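As a rough illustration of the kind of broadcasting the question describes, and not necessarily what the asker needs: the column name `scale` and the array shape are assumptions, and the sketch assumes the DataFrame rows line up with axis 0 of the array.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value per row, matching axis 0 of the array.
df = pd.DataFrame({"scale": [1.0, 2.0, 3.0]})
arr = np.ones((3, 4, 5))

# Convert the column to a 1-D array, then add two trailing axes of
# length 1 so NumPy can broadcast it across the remaining dimensions.
col = df["scale"].to_numpy()
result = col[:, None, None] * arr
```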
-1
votes
0
answers
83
views
Best approaches to applying a function to more than one column in df at once? [duplicate]
Say I have a pandas dataframe with more than 2 columns and more than 2 rows. I want to apply a function, such as a datatype conversion, to each element in at least two columns. I would like for it to be efficient,...
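A minimal sketch of the situation described above, with made-up column names and values: selecting the target columns as a list and assigning the result back converts them in one step, and the same selection works with `apply` for other column-wise functions.

```python
import pandas as pd

# Made-up data: two numeric-looking string columns and one to leave alone.
df = pd.DataFrame(
    {
        "a": ["1", "2", "3"],
        "b": ["4.5", "5.5", "6.5"],
        "c": ["x", "y", "z"],
    }
)
cols = ["a", "b"]

# Datatype conversion on several columns at once.
df[cols] = df[cols].astype(float)

# Any vectorised function can be applied to the same selection;
# here each selected column is doubled.
df[cols] = df[cols].apply(lambda s: s * 2)
```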
Best practices
0
votes
4
replies
44
views
unlink a file from dataframe
I have loaded an Excel file into a Python data frame. But once it's loaded, the file gets locked for further changes. Now I want to unlink the file so that I can make changes in the file directly as well.
4
votes
0
answers
135
views
Filter empty string in a polars lazyframe
I am trying to filter out rows whose URI column contains an empty string from a parquet file with over 50 million rows, using
import polars as pl
lf = pl.scan_parquet("data.parquet")
lf.filter(pl....
5
votes
3
answers
249
views
How to query columns that are lists or dicts?
How can I query columns that are lists or dicts? Here is some basic JSON-like data.
[
{
"id": 1,
"name": "John Doe",
"age": 30,
...
1
vote
3
answers
114
views
Access data frame from binary file
If I have saved a data frame using pickle in a binary file, how can I access it?
def create_dataset(path):
    """
    Creates a binary file with the dataset saved in it.
    "...
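For the general case the question raises, a minimal sketch, assuming the file was written with `pickle.dump`: the frame's contents and the file name `dataset.pkl` are hypothetical, and this is not the asker's `create_dataset` function, whose body is cut off above. Only unpickle files from a source you trust.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical frame and file location, for illustration only.
df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")

# Save the data frame to a binary file.
with open(path, "wb") as f:
    pickle.dump(df, f)

# Read it back; the file must be opened in binary mode ("rb").
with open(path, "rb") as f:
    loaded = pickle.load(f)
```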
-3
votes
2
answers
112
views
How to print the value counts of a user-selected column in a pandas DataFrame? [closed]
I’m trying to write a Python script that allows the user to input the name of a column and then prints the value counts of that column from a pandas DataFrame. Here's what I currently have:
def ...
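One way the task described above could be sketched, under assumptions: the helper name `value_counts_for` and the sample data are hypothetical, and the column name is passed as an argument here so the sketch runs without prompting (in a script it could come from `input()`).

```python
import pandas as pd


def value_counts_for(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the value counts of `column`, or raise if it does not exist."""
    if column not in df.columns:
        raise KeyError(f"No column named {column!r}; available: {list(df.columns)}")
    return df[column].value_counts()


# Made-up data for illustration.
df = pd.DataFrame({"colour": ["red", "blue", "red"]})
counts = value_counts_for(df, "colour")
print(counts)
```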
4
votes
2
answers
124
views
Is it the expected behaviour for `pl.int_ranges(scalar1, scalar2).list.sample(n)` to fill a column with the same sample, and why?
Given a DataFrame with a column of multiple rows, I am trying to generate a column with a different random sample for each row from the same range, so I tried to write this:
>>> import polars as ...