Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add test automation #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sideshowbarker wants to merge 15 commits into master from sideshowbarker/test-automation
Closed

Conversation

@sideshowbarker
Copy link
Member

@sideshowbarker sideshowbarker commented Aug 17, 2020
edited
Loading

The changes in this pull-request branch enable us, for all pull requests and pushes, to automate testing of the parser against the html5lib-tests suite — including CI execution (using GitHub Actions) of the tests .

The changes include the following commits:

...along with some related minor changes to get things working smoothly.

Copy link

I pulled your branch to investigate the error, but I ran into this:

[INFO] [java] Exception in thread "main" java.io.FileNotFoundException: C:\Users\****\oss\htmlparser\html5lib-tests\encoding (Acceso denegado)
[INFO] [java] at java.base/java.io.FileInputStream.open0(Native Method)
[INFO] [java] at java.base/java.io.FileInputStream.open(FileInputStream.java:212)
[INFO] [java] at java.base/java.io.FileInputStream.<init>(FileInputStream.java:154)
[INFO] [java] at java.base/java.io.FileInputStream.<init>(FileInputStream.java:109)
[INFO] [java] at nu.validator.htmlparser.test.EncodingTester.main(EncodingTester.java:116)

which is caused by ant giving a directory name to EncodingTester while that class expects a filename. Obviously the FileInputStream is not happy.

Is that branch fully updated?

Copy link
Contributor

Indeed, EncodingTester hasn't been adapted yet to take a directory, though the others have ( see 62baf8a )

Copy link

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0. Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

sideshowbarker reacted with thumbs up emoji

Copy link
Member Author

For the record, I was able to execute the tests without the fork="true" and all I had to do is to update the version of the maven-antrun-plugin from 1.7 to 3.0.0

Yes — I changed the version locally, and removed the fork="true" attributes, and confirmed it works as expected.

Updating the maven-bundle-plugin from 2.3.7 to 5.1.1 doesn't hurt either.

OK, went ahead and made that change too.

Pushed the updates to this branch. Thanks again very much.

Copy link
Member Author

Well I just now see that dropping fork="true" causes the Java 8 build to fail.

https://github.com/validator/htmlparser/pull/44/checks?check_run_id=1015950576

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

Copy link

carlosame commented Aug 22, 2020
edited
Loading

Maybe I should just drop Java 8 from the test matrix? Or do we want to be testing Java 8?

If you test with Java 8, ant is going to look for tools.jar which you removed from classpath in d0dd5ab.

Testing with version 11 and then the latest JDK (14 currently) makes sense, Java 8 is possibly unnecessary.

sideshowbarker reacted with thumbs up emoji

Copy link

Incidentally, in css4j I'm compiling my tests for target 8 due to a JPMS bug in the maven-compiler-plugin.

That bug is caused by using test classes in one module from the test classes in another module (yes the test-jar is listed as a dependency), and in principle this project has nothing close to that so it should not be affected.

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/test-automation branch 7 times, most recently from 1b971e8 to 016b9ce Compare August 23, 2020 22:30
Copy link
Member Author

So as y’all may have noticed, I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

That approach currently requires adding an extra config setting, to teach Surefire where the html5lib-tests directory is.

That config setting would be unnecessary if/when we end up moving to a new directory layout that follows Maven conventions. But after we do ever get there, the POM can be updated to drop the extra config setting.

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

If so, do you want to have a Copyright (c) 2020 Anthony Vanelverdinghe line included in the license/copyright header comment in the source — along with a Copyright (c) 2020 Mozilla Foundation line?

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

Copy link
Contributor

@anthonyvdotbe Can you confirm you’re agreeing to license the Html5libTest code under the existing project license in the https://github.com/validator/htmlparser/blob/master/LICENSE.txt file? (It’s essentially the MIT license.)

Yes, I confirm that I agree.

Or are you instead OK with the license header having only the Copyright (c) 2020 Mozilla Foundation line? (In other words, in that case, could you confirm that you’re waiving your right to have your name included in the copyright statement.)

I confirm that I'm waiving my right to have my name included in the copyright statement.

Copy link

I changed the code in this PR branch to remove the kludgey AntRun additions I’d made, and instead use @anthonyvdotbe’s Html5libTest class.

The conversion of all the tests (including Html5libTest) to JUnit is something that could be done after this PR is merged, and mostly looks like a search-and-replace trick.

@sideshowbarker sideshowbarker force-pushed the sideshowbarker/test-automation branch 6 times, most recently from f6a787f to d84dbbb Compare September 4, 2020 04:41
sideshowbarker and others added 15 commits September 4, 2020 14:25
This change causes com.sun.tools to be included as a dependency
only if the JDK version is 1.8.
Otherwise, without this change, the build fails when run under any
Java version later than Java 8.
This change causes the Maven AntRun plugin to invoke the javac and java
commands with fork=true when run in a Java 8 environment.
Otherwise, without this change, the build fails when run under Java 8.
This change updates the TokenizerTester code to expect its input test
data to have the string "Data state" to identify PCDATA tests — rather
than the string "DATA state".
The test data in the html5lib-tests suite uses "Data state", so without
this change, running TokenizerTester against html5lib-tests causes
TokenizerTester to fail with a "Broken test data" harness failure.
This change makes TokenizerTester, TreeTester, and EncodingTester exit
with status code 1 if any test cases fail.
Otherwise, without this change, we won’t catch the test failures when
running tests under CI.
This change makes TokenizerTester, TreeTester, and EncodingTester stop
emitting the word "Success" to standard error every time a test passes.
Otherwise, without this change, in test output, we end up with so many
"Success" lines that the actual test failures are effectively obscured
(you have to scroll back through the output/log to hunt for failures).
This change makes the sniffing limit in HtmlInputStreamReader settable.
Without this change, the HtmlInputStreamReader sniffing limit is
hardcoded to 1024 — and in the context of testing, that has the effect
of limiting HtmlInputStreamReader to only being useful for testing
expected output of the meta prescan.
So this change makes it possible for HtmlInputStreamReader to also be
used for testing the results for the state where the expected character
encoding is not limited to what can be determined by checking the first
1024 bytes of the input stream.
This change updates EncodingTester to make it test the result for cases
when the expected character encoding is not limited to what can be
determined by checking only the first 1024 bytes of the input stream.
Otherwise, without this change, EncodingTester is limited to only being
useful for testing the output of the meta prescan.
This change makes the TokenizerTester(InputStream stream) constructor
public — as the corresponding constructors for TreeTester and
EncodingTester already are.
This change refines how Html5libTest handles filenames; it adds a
mechanism to allow a required/expected file extension to be specified
for each test type, and uses the mechanism to specify that ".test" is
the required/expected extension for tokenizer tests, and that ".dat" is
the required/expected extension for tree-construction test files and for
encoding test files.
Without this change, Html5libTest only deals correctly with the ".test"
extension for tokenizer test files — but not with the ".dat" extension
for tree-construction test files and encoding test files.
This change makes Html5libTest correctly handle tests in the
html5lib-tests suite which have cases with so-called "double-escaped"
"input" and "output" values — for example, values that contain the
literals "\\u0000" and "\\uFFFD" rather than "\u0000" and "\uFFFD".
This change replaces Path.of() calls in Html5libTest with Paths.get().
Per https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html#of(java.net.URI)
Path.of() was introduced in Java 11. So Java 8 has no Path.of(); see
also https://docs.oracle.com/javase/8/docs/api/java/nio/file/Path.html
We need to continue to support Java 8 for the time being. It seems
Paths.get() will eventually end up being formally deprecated; by the
time it finally is, we may also be able to quit supporting Java 8 — and
so we can just switch to Path.of() then.
Copy link

I see that you re-enabled testing for Java 8, which is a good thing (commits 05b9024 and a27c7b6), but using Java 8 is unlikely to get along very well with the current modularization patches (as long as there is the requirement that the minimum JDK to process the POM is 11). And running automated testing with Java 11 or higher is going to be a quite different game once the project is modularized.

It's not that you cannot do either of those things (you can even use Java 8 if the POM is complicated enough) but modularization is highly coupled with testing, and setting up an automated testing infrastructure without accounting for modules has the potential of being a waste of time.

IMHO either this PR or the modularization PR should be merged as soon as possible (BTW after making any necessary changes), and then the other PR can rebase on top of it. Yes I assume that you can't do a lot either way, just making sure that you are aware of this.

Copy link
Member Author

@carlosame Thanks for the heads-up about the issue dealing with Java 8 in the face of the modularization patches.

It’d be pretty disappointing if Maven, in all its baroque complexity, didn’t provide a way to build a modularized jar for current Java versions while also remaining able to ensure the code at least continues to compile under Java 8.

I personally am not willing to cavalierly ignore all the users who possibly might be constrained to needing to use the parser in a Java 8 environment. I don’t have a way to measure how many such users the parser might have. For all I know, it could be there are so few users of the parser that need Java 8 compatibility that we don’t need to worry about it. Or it could be that there are some important users for whom we’d be abandoning when we rightly shouldn’t be.

But if we can’t even ourselves continue to have an automated way to test that the code continues compiles under Java 8, then we have no way to know when some code change we made has broken compatibility with Java 8.

Copy link
Member Author

On further reflection, it occurs to me to ask: As far as building with Maven goes, couldn’t we just have one pom.xml for building/testing the "normal" build for Java 11+, and then a second pom.xml for building/testing the Java 8 build?

Copy link

ensure the code at least continues to compile under Java 8

Hmm that's not what I was meaning. The modularization patches in PR #43 generate bytecode for Java 8 (IMHO should be 7) except for the module-info file. One thing is "compiling for Java 8", another is "testing with Java 8". In fact, the most likely outcome for this project's modularization is that the actual bytecode being tested will be version 8 but the JDK used during the tests is 11, 15 or whatever.

Actually, your current patches in this PR are always testing Java 8 bytecode, you are just producing and testing that level-8 bytecode with different JDKs. And source-level compatibility is set to 8 as well.

Copy link

Additional clarification to my previous post: the idea is that Java 8 users will be able to use the compiled jar files, but won't be able to build them. A modular JDK is required to build.

If you still want to test with Java 8 and want to avoid undesirable POM hacks, you could set up a separate test project that would depend on a level-8 test-jar produced by the main build. But I see no real reason to do that.

Copy link
Member Author

I believe everything in this PR branch is now also in the main branch — so I’m going ahead and closing this.

@sideshowbarker sideshowbarker deleted the sideshowbarker/test-automation branch September 2, 2021 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

1 more reviewer

@carlosame carlosame carlosame left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /