Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Optimise module to filename #4600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
guibou wants to merge 4 commits into master
base: master
Choose a base branch
Loading
from optimise_module_to_filename
Draft

Conversation

Copy link
Collaborator

@guibou guibou commented May 26, 2025

(only latests commit are relevant, the other are from other MR which should be merged before this work is merged. Or we can rebase)

This is related to #4598.

This changes the file to module associating logic done during dependency
graph building.

Before, each time a module Foo.Bar is found, HLS is testing inside all
the import path for the existence of a relevant fiel.. It means that for
i import paths and m modules to locate, m * n filesystem
operations are done. Note also that this involves a lot of complex
string concatenation primitive to build the FilePath.

A module is tested for each import for each of the file of the
project. We also test for boot files, doubling the number of test.

In #4598 we have a project with 1100 modules, in more than 250 import
paths and we count more than 17000 import statments, resulting on
over 6 millions test for file existences. This project was blocking for
more than 3 minutes during HLS startup.

This commit changes the way this is computed:

  • At startup, a Map ModuleName FilePath (the real type is a bit more
    involved for performance, multiples unit and boot files handling) is
    built by scanning all the import paths for files representing the
    different modules.
  • Directory scanning is efficient and if import path only contains
    haskell module, this will never do more job that listing the files of
    the project.
  • The lookup is now simplify a Map lookup.

The performance improvement is as follows:

  • The number of IO operation is dramatically reduced, from multiples
    millions to a few recursive directories listing.
  • A lot of the boilerplate of converting path had be removed.
  • TODO: add an RTS stats before / after with number of allocations
  • On my project, the graph building time is reduced from a few minutes
    to 3s.

Limitations:

  • How to rebuild the Map if the content of one directory change?
  • If one directory is filled with millions of files which are not of
    interested, performance can be damaged. TODO: add a diagnostic during
    this phase so the user can learn about this issue.

Code status:

  • The lookup is not fully restored, especially it does not include the
    handling of home unit as well as reexport.
  • The initialisation phase is cached inside a TVar stored as a top
    level identifier using unsafePerformIO. This is to be improved.

soulomoon reacted with rocket emoji
Copy link
Collaborator Author

guibou commented May 28, 2025

I'm running this code in production on our large codebase for the 3rd day.

a) My users are super happy, a few quotes:

image

(it was more than 5 minutes before ;)

image

image

Ok, that's all for the shameless plug ;)

I think I observed a few reload / performance issues which are not related and would definitely do some followup issues. I'm investigating (this is difficult to describe right now. Looks like HLS is not able to give diagnostic back if it is typechecking a file which depends on the file I just modified and if this typechecking takes time).

I also observed a surprising behavior, I need to investigate if that's linked to my MR or to something else:

  • I had started HLS in a context where one of the dependencies (from the package db) was version X (which actually exposes a reexported symbol. e.g. Database.Selda which reexports Text).
  • I then had changed my working copy to a version of my code which uses a new version of Database.Selda, without the reexport of Text, and my code was updated to import Data.Text (Text) everywhere it was previously using the reexport.
  • From my editor, I restarted hls
  • Warnings started to appear in HLS about redundant import for import Data.Text (Text) (which was true with the previous version of Selda). Note that the warning appeared on files which are NOT open in the editor, but HLS got notified about the change and reevaluated them.
  • Opening the file in the editor, the warning disappears.

It looks like a multi-home issue, see. #3237, which would make sense considering that I had done nothing to fix multiple home. But this is still rather surprising because HLS was supposed to restart and hence should have completely ignored the old selda version. Maybe it picked some information from the cache.

Copy link
Collaborator

soulomoon commented May 28, 2025
edited
Loading

Maybe it picked some information from the cache

We handle FOI(File of interest) and none-FOI differently.
See getModIfaceRule. This might be causing what you see @guibou

Copy link
Collaborator Author

guibou commented May 29, 2025

We handle FOI(File of interest) and none-FOI differently.
See getModIfaceRule. This might be causing what you see @guibou

@soulomoon Thank you. This is indeed interesting. I will confirm that the bug I'm observing is not linked to my change and open an issue. This is not a dramatic issue (e.g. it is not everyday that you restart the editor with breaking changes in the packages from the package database), but it is not convenient and can be confusing.

Copy link
Collaborator Author

guibou commented May 31, 2025

I've found a bug, when looking for module in the import path, an exception is raised if the import path is empty.

This is related to #4598.
This changes the file to module associating logic done during dependency
graph building.
Before, each time a module `Foo.Bar` is found, HLS is testing inside all
the import path for the existence of a relevant fiel.. It means that for
`i` import paths and `m` modules to locate, `m * n` filesystem
operations are done. Note also that this involves a lot of complex
string concatenation primitive to build the `FilePath`.
A module is tested for each `import` for each of the file of the
project. We also test for `boot` files, doubling the number of test.
In #4598 we have a project with `1100` modules, in more than 250 import
paths and we count more than `17000` `import` statments, resulting on
over 6 millions test for file existences. This project was blocking for
more than 3 minutes during HLS startup.
This commit changes the way this is computed:
- At startup, a `Map ModuleName FilePath` (the real type is a bit more
 involved for performance, multiples unit and boot files handling) is
 built by scanning all the import paths for files representing the
 different modules.
- Directory scanning is efficient and if import path only contains
 haskell module, this will never do more job that listing the files of
 the project.
- The lookup is now simplify a `Map` lookup.
The performance improvement is as follows:
- The number of IO operation is dramatically reduced, from multiples
 millions to a few recursive directories listing.
- A lot of the boilerplate of converting path had be removed.
- TODO: add an RTS stats before / after with number of allocations
- On my project, the graph building time is reduced from a few minutes
 to 3s.
Limitations:
- How to rebuild the `Map` if the content of one directory change?
- If one directory is filled with millions of files which are not of
 interested, performance can be damaged. TODO: add a diagnostic during
 this phase so the user can learn about this issue.
Code status:
- The `lookup` is not fully restored, especially it does not include the
 handling of home unit as well as reexport.
- The initialisation phase is cached inside a `TVar` stored as a top
 level identifier using `unsafePerformIO`. This is to be improved.
A note about performance
Most users won't see the benefits of these change, but I think they
apply to everbody:
- We are still doing 1 lookup per `import` per module. But the lookup
 result is not multiples IO, so it should be faster by a large amount.
- Most project only have 1 (or a few) import paths so won't benefit as
 dramatically as me from this.
TODO for allocations
It does not seem to be used and apparently had never been used (after a
quick look in the repository diff).
Ask m.pickering about this.
@guibou guibou force-pushed the optimise_module_to_filename branch from 2d8386c to fdf3085 Compare September 21, 2025 11:59
Copy link
Collaborator Author

guibou commented Sep 21, 2025

I've found a bug, when looking for module in the import path, an exception is raised if the import path is empty.

Fixed in latest commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers

@berberman berberman Awaiting requested review from berberman berberman will be requested when the pull request is marked ready for review berberman is a code owner

@michaelpj michaelpj Awaiting requested review from michaelpj michaelpj will be requested when the pull request is marked ready for review michaelpj is a code owner

@wz1000 wz1000 Awaiting requested review from wz1000 wz1000 will be requested when the pull request is marked ready for review wz1000 is a code owner

At least 1 approving review is required to merge this pull request.

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants

AltStyle によって変換されたページ (->オリジナル) /