Skip to content

Added footer to read_html #8552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 6, 2014
Merged

Added footer to read_html #8552

merged 1 commit into from
Dec 6, 2014

Conversation

mjsu
Copy link
Contributor

@mjsu mjsu commented Oct 13, 2014

read_html neglected table footers. Although rare, some sites use the footer for data. First time pull request so hopefully not messed up.

@cpcloud
Copy link
Member

cpcloud commented Oct 13, 2014

Needs a test.

@cpcloud
Copy link
Member

cpcloud commented Oct 13, 2014

Add it in tests/test_html.py.

It might be good to have a test that exercises the <footer> parsing on a very simple example (maybe even hand written HTML) and then another test with an example of some HTML from the wild that contains <footer>s.

@cpcloud
Copy link
Member

cpcloud commented Oct 13, 2014

Should probably also test an empty footer like <footer></footer>.

@@ -594,7 +594,7 @@ def _expand_elements(body):

def _data_to_frame(data, header, index_col, skiprows, infer_types,
parse_dates, tupleize_cols, thousands):
head, body, _ = data # _ is footer which is rarely used: ignore for now
head, body, foot = data # _ is footer which is rarely used: ignore for now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the comment now

@mjsu
Copy link
Contributor Author

mjsu commented Oct 14, 2014

Ok thanks! I was going to add my local test to tests/test_html.py but found an error which meant I hadn't tested this modification properly (and now getting NaNs where there shouldn't be). Will keep working on it, and getting my head around how to use git.

@mjsu
Copy link
Contributor Author

mjsu commented Oct 14, 2014

I was getting NaN as my test (and real world example) used <td> rather than <th> inside <tfoot></tfoot> so it wasn't parsing correctly.

@jreback jreback added IO HTML read_html, to_html, Styler.apply, Styler.applymap IO Data IO issues that don't fit into a more specific label labels Oct 15, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 15, 2014
@jreback
Copy link
Contributor

jreback commented Nov 24, 2014

@mjsu

  • pls add a release note documenting this in enhancements in v0.15.2 (reference this issue)
  • pls squash

@cpcloud will review after

@mjsu
Copy link
Contributor Author

mjsu commented Nov 26, 2014

Hmm, I think I have messed this up! (still very novice at git/github)

@jreback
Copy link
Contributor

jreback commented Nov 30, 2014

@mjsu ?

@mjsu
Copy link
Contributor Author

mjsu commented Nov 30, 2014

Yep?

@jreback
Copy link
Contributor

jreback commented Nov 30, 2014

@mjsu just pinging in u. This needs a rebase / squash and to add those items mentioned above.

@mjsu
Copy link
Contributor Author

mjsu commented Nov 30, 2014

I thought I did it. I wasn't quite sure what I was doing so may have done it wrong?

@jreback
Copy link
Contributor

jreback commented Nov 30, 2014

see here https://github.com/pydata/pandas/wiki/Using-Git

try

git rebase -i origin/master

squash to 1 commit

then

git push myfork thisbranchname -f

@mjsu
Copy link
Contributor Author

mjsu commented Nov 30, 2014

That done it? (slowly getting my head around this stuff, thanks for the help!)

@jreback
Copy link
Contributor

jreback commented Nov 30, 2014

good.

@cpcloud ?

@jreback
Copy link
Contributor

jreback commented Dec 4, 2014

@cpcloud pls have a look and merge if ok (you prob have to rebase it)

@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

@cpcloud ?

@cpcloud
Copy link
Member

cpcloud commented Dec 6, 2014

looks good to me, needs a rebase looks like there's some merge conflicts.

@jreback jreback merged commit 7587bf1 into pandas-dev:master Dec 6, 2014
@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

@mjsu thanks!

@mjsu mjsu deleted the add-footer-to-read-html branch December 6, 2014 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Data IO issues that don't fit into a more specific label IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants