I have a DataFrame A with a single column, location_ms. I want to split its values on ; and : to get DataFrame B.
DataFrame A(Beginning):
DataFrame B(Final):
My code below seems very roundabout, and I would love to see a better implementation. The splits give me a structure in which each element is a list of lists. I then flatten that list of lists to create the final DataFrame.
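import pandas as pd

def locpapersrc_table(df):
    # Split each row on ';', then split every piece on ':' (one list of lists per row).
    toflattenrows = df['location_ms'].str.split(';').apply(lambda x: [c.split(':') for c in x]).values.tolist()
    # Flatten the per-row lists into one list of sublists.
    singlelistoflist = [item for sublist in toflattenrows for item in sublist]
    tmp = pd.DataFrame(singlelistoflist)
    return tmp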
version2 below is slower than the first function, and it is just as roundabout.
def version2(df):
    xx = df["location_ms"].str.split(';', expand=True).T
    tmp = pd.melt(xx).dropna().drop(['variable'], axis=1)['value'].str.split(':', expand=True)
    return tmp
Thank You!
Please do not post code or DataFrames as images; post them as text. – U13-Forward, Nov 16, 2018 at 1:40
1 Answer
Try something like this.
split_df = df['location_ms'].str.split(pat=";", expand=True)
Throw in something like this if you want to merge it back into the original dataframe.
df = df.merge(split_df, left_index=True, right_index=True)
df = df.drop(columns='location_ms')
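To see what those two steps produce, here is a minimal sketch on made-up data. The two sample strings are assumptions, since the question's actual values aren't shown, and the name merged is just for illustration. Note that the drop has to name the column explicitly; a bare positional label is looked up in the row index.

```python
import pandas as pd

# Hypothetical sample data; the real contents of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3"]})

# One new column per ';'-separated piece; shorter rows are padded with missing values.
split_df = df["location_ms"].str.split(pat=";", expand=True)

# Attach the pieces to the original frame by index, then drop the source column.
merged = df.merge(split_df, left_index=True, right_index=True)
merged = merged.drop(columns="location_ms")

print(merged)
```

The second row has only one piece, so its second column holds a missing value.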
For your new problem (splitting by ; and :):
split_df = df['location_ms'].str.split(pat=";", expand=True)
subsplit_df = pd.DataFrame(index = split_df.index)
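for i in range(split_df.shape[1]):
    # Split the i-th ';' piece on ':' and append the resulting columns.
    subsplit_df = subsplit_df.merge(split_df.iloc[:, i].str.split(pat=":", expand=True), left_index=True, right_index=True)
    # Renumber the columns after each merge so suffixed names from overlapping labels do not accumulate.
    subsplit_df.columns = range(subsplit_df.shape[1])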
You can merge it back in as above if you want.
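If what you actually want is one row per ;-separated piece (which is what your two functions appear to build), a shorter route may be Series.explode. This is a different technique from the loop above, and only a sketch: it assumes pandas 0.25 or later, and the sample strings and the name pairs are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real contents of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3"]})

pairs = (
    df["location_ms"]
    .str.split(";")               # each cell becomes a list of 'x:y' pieces
    .explode()                    # one row per piece; the original index is repeated
    .str.split(":", expand=True)  # one column per ':'-separated part
    .reset_index(drop=True)       # replace the repeated index with 0..n-1
)

print(pairs)
```

Keeping the repeated index instead of resetting it lets you trace each piece back to its source row. A trailing ; in a value would yield an empty piece, so you may need to filter those out for real data.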
4 Comments
; and ; to get DataFrame B" is a direct quote from your problem. I've edited the answer to match your new criteria.