ENH: read-html fixes #3616
let me know when you need merging on any PRs
this closes #3606, right?
yep, that is fixed already. i might be able to get to the rest of this today, i know the 0.11.1 rls is due today...the annoyances of the parsing may have to wait tho or i might just open up the flavor argument to allow one of
the main issue is the import errors.....
that is also fixed in this.
yep...
take your time...btw
ok thanks. i'm working on a cmdline interface to store neurophys data and it's due tmrw so pandas may have to wait...
@jreback @y-p what do u think about removing the pure
@cpcloud I would rather see correct and slow than wrong but fast! Let's see... premature optimization is evil. You can always add it back in 0.12 (or after) if you discover how to fix it. And you have the flavor option, so it's sort of 'easy' to add. (Of course you have to edit stuff to take it out: docs, install docs, docstrings...) Of course, if there are cases where lxml can do better (and is correct) but bombs on other cases, then you could always raise on those (but that may be more trouble than it's worth).
i think the xpath implementation of lxml might be broken... :( @jreback can i leave the code for
ok
@jreback this ready 2 go as soon as travis passes.
that is odd. travis is not running arg
ah there we go
@cpcloud thanks...this is great... I edited the v0.11.1 a bit (as this is new, just announcing it). I think an example is warranted. Maybe take a df, do a separate PR
see this: I don't think travis was actually testing html5lib stuff....(I just added it in) add in ci/install.sh (right after bs4)....and test
going to put in a separate issue
ok.
Some updates and bug fixes. See release notes for more details.
- vbench stuff (sort of pointless right now since we don't really have control over the speed of the parsing library)
- Figure out why lxml chooses to ignore things (reported a bug w/ example to lxml people)
- Figure out why bs4's thead.find_all(['th', 'td']) parses differently than lxml's thead.xpath('.//thead//th|.//thead//td') even when lxml is the bs4 backend (same as above)
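
When chasing a discrepancy like the one in the last item, it can help to have a baseline that depends on neither library. The following is only a sketch under that assumption, using the standard library's html.parser; TheadCellCollector, thead_cells and MARKUP are hypothetical names made up for illustration and are not part of pandas, bs4 or lxml. Its result could be compared against what the find_all and xpath calls return on the same markup.

```python
from html.parser import HTMLParser


class TheadCellCollector(HTMLParser):
    """Collect the text of every <th>/<td> that appears inside a <thead>."""

    def __init__(self):
        super().__init__()
        self.in_thead = False
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "thead":
            self.in_thead = True
        elif self.in_thead and tag in ("th", "td"):
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "thead":
            self.in_thead = False
            self.in_cell = False
        elif tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        # Only text inside a header cell is kept.
        if self.in_cell:
            self.cells[-1] += data


def thead_cells(markup):
    """Return the stripped text of each header cell, in document order."""
    collector = TheadCellCollector()
    collector.feed(markup)
    collector.close()
    return [cell.strip() for cell in collector.cells]


# Hypothetical sample table: one <th> and one <td> in the header row.
MARKUP = (
    "<table><thead><tr><th>a</th><td>b</td></tr></thead>"
    "<tbody><tr><td>1</td><td>2</td></tr></tbody></table>"
)

if __name__ == "__main__":
    print(thead_cells(MARKUP))
```

If the bs4 and lxml selections disagree with each other on some input, checking which one matches this baseline narrows down where cells are being dropped.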