I have a data frame looks like this:
P Q L
1 2 3
2 3
4 5 6,7
The objective is to check if there is any value in L, if yes, extract the value on L and P column:
P L
1 3
4,6
4,7
Note there might more than one values in L, in the case of more than 1 value, I would need two rows.
Bellow is my current script, it cannot generate the expected result.
df2 = []
ego
other
newrow = []
for item in data_DF.iterrows():
if item[1]["L"] is not None:
ego = item[1]['P']
other = item[1]['L']
newrow = ego + other + "\n"
df2.append(newrow)
data_DF2 = pd.DataFrame(df2)
解决方案
First I extract multiple values of column L to new dataframe s with duplicity index from original index. Remove unnecessary columns L and Q. Then output join to original df and drop rows with NaN values.
print df
P Q L
0 1 2 3
1 2 3 NaN
2 4 5 6,7
s = df['L'].str.split(',').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1) # to line up with df's index
s.name = 'L'
print s
0 3
2 6
2 7
Name: L, dtype: object
df = df.drop( ['L', 'Q'], axis=1)
df = df.join(s)
print df
P L
0 1 3
1 2 NaN
2 4 6
2 4 7
df = df.dropna().reset_index(drop=True)
print df
P L
0 1 3
1 4 6
2 4 7