Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit 51ad657

Browse files
committed
refactor: WIP. Module name to filepath optimisation
This is related to #4598. This changes the file to module associating logic done during dependency graph building. Before, each time a module `Foo.Bar` is found, HLS is testing inside all the import path for the existence of a relevant fiel.. It means that for `i` import paths and `m` modules to locate, `m * n` filesystem operations are done. Note also that this involves a lot of complex string concatenation primitive to build the `FilePath`. A module is tested for each `import` for each of the file of the project. We also test for `boot` files, doubling the number of test. In #4598 we have a project with `1100` modules, in more than 250 import paths and we count more than `17000` `import` statments, resulting on over 6 millions test for file existences. This project was blocking for more than 3 minutes during HLS startup. This commit changes the way this is computed: - At startup, a `Map ModuleName FilePath` (the real type is a bit more involved for performance, multiples unit and boot files handling) is built by scanning all the import paths for files representing the different modules. - Directory scanning is efficient and if import path only contains haskell module, this will never do more job that listing the files of the project. - The lookup is now simplify a `Map` lookup. The performance improvement is as follows: - The number of IO operation is dramatically reduced, from multiples millions to a few recursive directories listing. - A lot of the boilerplate of converting path had be removed. - TODO: add an RTS stats before / after with number of allocations - On my project, the graph building time is reduced from a few minutes to 3s. Limitations: - How to rebuild the `Map` if the content of one directory change? - If one directory is filled with millions of files which are not of interested, performance can be damaged. TODO: add a diagnostic during this phase so the user can learn about this issue. Code status: - The `lookup` is not fully restored, especially it does not include the handling of home unit as well as reexport. - The initialisation phase is cached inside a `TVar` stored as a top level identifier using `unsafePerformIO`. This is to be improved. A note about performance Most users won't see the benefits of these change, but I think they apply to everbody: - We are still doing 1 lookup per `import` per module. But the lookup result is not multiples IO, so it should be faster by a large amount. - Most project only have 1 (or a few) import paths so won't benefit as dramatically as me from this. TODO for allocations
1 parent 321e12e commit 51ad657

File tree

5 files changed

+97
-32
lines changed

5 files changed

+97
-32
lines changed

‎ghcide/ghcide.cabal‎

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ library
107107
, unliftio-core
108108
, unordered-containers >=0.2.10.0
109109
, vector
110+
, pretty-simple
110111

111112
if os(windows)
112113
build-depends: Win32

‎ghcide/src/Development/IDE/Core/RuleTypes.hs‎

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -392,6 +392,9 @@ type instance RuleResult GetModSummary = ModSummaryResult
392392
-- | Generate a ModSummary with the timestamps and preprocessed content elided, for more successful early cutoff
393393
type instance RuleResult GetModSummaryWithoutTimestamps = ModSummaryResult
394394

395+
type instance RuleResult GetModulesPaths = (M.Map ModuleName (UnitId, NormalizedFilePath),
396+
M.Map ModuleName (UnitId, NormalizedFilePath))
397+
395398
data GetParsedModule = GetParsedModule
396399
deriving (Eq, Show, Generic)
397400
instance Hashable GetParsedModule
@@ -494,6 +497,13 @@ data GetModSummaryWithoutTimestamps = GetModSummaryWithoutTimestamps
494497
instance Hashable GetModSummaryWithoutTimestamps
495498
instance NFData GetModSummaryWithoutTimestamps
496499

500+
-- | Scan all the import directory for existing modules and build a map from
501+
-- module name to paths
502+
data GetModulesPaths = GetModulesPaths
503+
deriving (Eq, Show, Generic)
504+
instance Hashable GetModulesPaths
505+
instance NFData GetModulesPaths
506+
497507
data GetModSummary = GetModSummary
498508
deriving (Eq, Show, Generic)
499509
instance Hashable GetModSummary

‎ghcide/src/Development/IDE/Core/Rules.hs‎

Lines changed: 58 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
{-# LANGUAGE CPP #-}
55
{-# LANGUAGE DuplicateRecordFields #-}
66
{-# LANGUAGE TypeFamilies #-}
7+
{-# LANGUAGE PartialTypeSignatures #-}
78

89
-- | A Shake implementation of the compiler service, built
910
-- using the "Shaker" abstraction layer for in-memory use.
@@ -93,7 +94,7 @@ import Data.Proxy
9394
import qualified Data.Text as T
9495
import qualified Data.Text.Encoding as T
9596
import qualified Data.Text.Utf16.Rope.Mixed as Rope
96-
import Data.Time (UTCTime (..))
97+
import Data.Time (UTCTime (..), getCurrentTime, diffUTCTime)
9798
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
9899
import Data.Tuple.Extra
99100
import Data.Typeable (cast)
@@ -173,6 +174,12 @@ import System.Info.Extra (isWindows)
173174

174175
import qualified Data.IntMap as IM
175176
import GHC.Fingerprint
177+
import Text.Pretty.Simple
178+
import qualified Data.Map.Strict as Map
179+
import System.FilePath (takeExtension, takeFileName, normalise, dropTrailingPathSeparator, dropExtension, splitDirectories)
180+
import Data.Char (isUpper)
181+
import System.Directory.Extra (listFilesRecursive, listFilesInside)
182+
import System.IO.Unsafe
176183

177184
data Log
178185
= LogShake Shake.Log
@@ -311,30 +318,21 @@ getParsedModuleDefinition packageState opt file ms = do
311318
getLocatedImportsRule :: Recorder (WithPriority Log) -> Rules ()
312319
getLocatedImportsRule recorder =
313320
define (cmapWithPrio LogShake recorder) $ \GetLocatedImports file -> do
321+
314322
ModSummaryResult{msrModSummary = ms} <- use_ GetModSummaryWithoutTimestamps file
315-
(KnownTargets targets targetsMap) <- useNoFile_ GetKnownTargets
323+
-- TODO: should we reverse this concatenation, there are way less
324+
-- source import than normal import in theory, so it should be faster
316325
let imports = [(False, imp) | imp <- ms_textual_imps ms] ++ [(True, imp) | imp <- ms_srcimps ms]
317326
env_eq <- use_ GhcSession file
318327
let env = hscEnv env_eq
319328
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
320329
let dflags = hsc_dflags env
321330
opt <- getIdeOptions
322-
let getTargetFor modName nfp
323-
| Just (TargetFile nfp') <- HM.lookup (TargetFile nfp) targetsMap = do
324-
-- reuse the existing NormalizedFilePath in order to maximize sharing
325-
itExists <- getFileExists nfp'
326-
return $ if itExists then Just nfp' else Nothing
327-
| Just tt <- HM.lookup (TargetModule modName) targets = do
328-
-- reuse the existing NormalizedFilePath in order to maximize sharing
329-
let ttmap = HM.mapWithKey const (HashSet.toMap tt)
330-
nfp' = HM.lookupDefault nfp nfp ttmap
331-
itExists <- getFileExists nfp'
332-
return $ if itExists then Just nfp' else Nothing
333-
| otherwise = do
334-
itExists <- getFileExists nfp
335-
return $ if itExists then Just nfp else Nothing
331+
332+
moduleMaps <- use_ GetModulesPaths file
336333
(diags, imports') <- fmap unzip $ forM imports $ \(isSource, (mbPkgName, modName)) -> do
337-
diagOrImp <- locateModule (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
334+
335+
diagOrImp <- locateModule moduleMaps (hscSetFlags dflags env) import_dirs (optExtensions opt) modName mbPkgName isSource
338336
case diagOrImp of
339337
Left diags -> pure (diags, Just (modName, Nothing))
340338
Right (FileImport path) -> pure ([], Just (modName, Just path))
@@ -624,6 +622,43 @@ getModuleGraphRule recorder = defineEarlyCutOffNoFile (cmapWithPrio LogShake rec
624622
fs <- toKnownFiles <$> useNoFile_ GetKnownTargets
625623
dependencyInfoForFiles (HashSet.toList fs)
626624

625+
{-# NOINLINE cacheVar #-}
626+
cacheVar = unsafePerformIO (newTVarIO mempty)
627+
628+
getModulesPathsRule :: Recorder (WithPriority Log) -> Rules ()
629+
getModulesPathsRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModulesPaths file -> do
630+
env_eq <- use_ GhcSession file
631+
632+
cache <- liftIO (readTVarIO cacheVar)
633+
case Map.lookup (envUnique env_eq) cache of
634+
Just res -> pure (mempty, ([], Just res))
635+
Nothing -> do
636+
let env = hscEnv env_eq
637+
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
638+
opt <- getIdeOptions
639+
let exts = (optExtensions opt)
640+
let acceptedExtensions = concatMap (\x -> ['.':x, '.':x <> "-boot"]) exts
641+
642+
(unzip -> (a, b)) <- flip mapM import_dirs $ \(u, dyn) -> do
643+
(unzip -> (a, b)) <- flip mapM (importPaths dyn) $ \dir' -> do
644+
let dir = dropTrailingPathSeparator dir'
645+
let predicate path = pure (path == dir || isUpper (head (takeFileName path)))
646+
let dir_number_directories = length (splitDirectories dir)
647+
let toModule file = mkModuleName (intercalate "." $ drop dir_number_directories (splitDirectories (dropExtension file)))
648+
649+
-- TODO: we are taking/droping extension, this could be factorized to save a few cpu cycles ;)
650+
-- TODO: do acceptedextensions needs to be a set ? or a vector?
651+
modules <- fmap (\path -> (toModule path, toNormalizedFilePath' path)) . filter (\y -> takeExtension y `elem` acceptedExtensions) <$> liftIO (listFilesInside predicate dir)
652+
let isSourceModule (_, path) = "-boot" `isSuffixOf` fromNormalizedFilePath path
653+
let (sourceModules, notSourceModules) = partition isSourceModule modules
654+
pure $ (Map.fromList notSourceModules, Map.fromList sourceModules)
655+
pure (fmap (u,) $ mconcat a, fmap (u, ) $ mconcat b)
656+
657+
let res = (mconcat a, mconcat b)
658+
liftIO $ atomically $ modifyTVar' cacheVar (Map.insert (envUnique env_eq) res)
659+
660+
pure (mempty, ([], Just $ (mconcat a, mconcat b)))
661+
627662
getModuleGraphSingleFileRule :: Recorder (WithPriority Log) -> Rules ()
628663
getModuleGraphSingleFileRule recorder =
629664
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetFileModuleGraph file -> do
@@ -632,8 +667,12 @@ getModuleGraphSingleFileRule recorder =
632667

633668
dependencyInfoForFiles :: [NormalizedFilePath] -> Action (BS.ByteString, DependencyInformation)
634669
dependencyInfoForFiles fs = do
670+
-- liftIO $ print ("fs length", length fs)
635671
(rawDepInfo, bm) <- rawDependencyInformation fs
672+
-- liftIO $ print ("ok with raw deps")
673+
-- liftIO $ pPrint rawDepInfo
636674
let (all_fs, _all_ids) = unzip $ HM.toList $ pathToIdMap $ rawPathIdMap rawDepInfo
675+
-- liftIO $ print ("all_fs length", length all_fs)
637676
msrs <- uses GetModSummaryWithoutTimestamps all_fs
638677
let mss = map (fmap msrModSummary) msrs
639678
let deps = map (\i -> IM.lookup (getFilePathId i) (rawImports rawDepInfo)) _all_ids
@@ -712,6 +751,7 @@ loadGhcSession recorder ghcSessionDepsConfig = do
712751
IdeGhcSession{loadSessionFun} <- useNoFile_ GhcSessionIO
713752
-- loading is always returning a absolute path now
714753
(val,deps) <- liftIO $ loadSessionFun $ fromNormalizedFilePath file
754+
-- TODO: this is responsible for a LOT of allocations
715755

716756
-- add the deps to the Shake graph
717757
let addDependency fp = do
@@ -1232,6 +1272,7 @@ mainRule recorder RulesConfig{..} = do
12321272
getModIfaceRule recorder
12331273
getModSummaryRule templateHaskellWarning recorder
12341274
getModuleGraphRule recorder
1275+
getModulesPathsRule recorder
12351276
getModuleGraphSingleFileRule recorder
12361277
getFileHashRule recorder
12371278
knownFilesRule recorder

‎ghcide/src/Development/IDE/Import/FindImports.hs‎

Lines changed: 27 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55

66
module Development.IDE.Import.FindImports
77
( locateModule
8-
, locateModuleFile
98
, Import(..)
109
, ArtifactsLocation(..)
1110
, modSummaryToArtifactsLocation
@@ -14,9 +13,8 @@ module Development.IDE.Import.FindImports
1413
) where
1514

1615
import Control.DeepSeq
17-
import Control.Monad.Extra
1816
import Control.Monad.IO.Class
19-
import Data.List (find, isSuffixOf)
17+
import Data.List (isSuffixOf)
2018
import Data.Maybe
2119
import qualified Data.Set as S
2220
import Development.IDE.GHC.Compat as Compat
@@ -26,7 +24,8 @@ import Development.IDE.Types.Diagnostics
2624
import Development.IDE.Types.Location
2725
import GHC.Types.PkgQual
2826
import GHC.Unit.State
29-
import System.FilePath
27+
import Data.Map.Strict (Map)
28+
import qualified Data.Map.Strict as Map
3029

3130

3231
#if MIN_VERSION_ghc(9,11,0)
@@ -70,6 +69,7 @@ data LocateResult
7069
| LocateFoundReexport UnitId
7170
| LocateFoundFile UnitId NormalizedFilePath
7271

72+
{-
7373
-- | locate a module in the file system. Where we go from *daml to Haskell
7474
locateModuleFile :: MonadIO m
7575
=> [(UnitId, [FilePath], S.Set ModuleName)]
@@ -94,6 +94,7 @@ locateModuleFile import_dirss exts targetFor isSource modName = do
9494
maybeBoot ext
9595
| isSource = ext ++ "-boot"
9696
| otherwise = ext
97+
-}
9798

9899
-- | This function is used to map a package name to a set of import paths.
99100
-- It only returns Just for unit-ids which are possible to import into the
@@ -110,36 +111,45 @@ mkImportDirs _env (i, flags) = Just (i, (importPaths flags, reexportedModules fl
110111
-- Haskell
111112
locateModule
112113
:: MonadIO m
113-
=> HscEnv
114+
=> (Map ModuleName (UnitId, NormalizedFilePath),Map ModuleName (UnitId, NormalizedFilePath))
115+
-> HscEnv
114116
-> [(UnitId, DynFlags)] -- ^ Import directories
115117
-> [String] -- ^ File extensions
116-
-> (ModuleName -> NormalizedFilePath -> m (Maybe NormalizedFilePath)) -- ^ does file exist predicate
117118
-> Located ModuleName -- ^ Module name
118119
-> PkgQual -- ^ Package name
119120
-> Bool -- ^ Is boot module
120121
-> m (Either [FileDiagnostic] Import)
121-
locateModule env comp_info exts targetFor modName mbPkgName isSource = do
122+
locateModule moduleMaps@(moduleMap, moduleMapSource) env comp_info exts modName mbPkgName isSource = do
122123
case mbPkgName of
123124
-- 'ThisPkg' just means some home module, not the current unit
124125
ThisPkg uid
126+
-- TODO: there are MANY lookup on import_paths, which is a problem considering that it can be large.
125127
| Just (dirs, reexports) <- lookup uid import_paths
126-
-> lookupLocal uid dirs reexports
128+
-> lookupLocal moduleMaps uid dirs reexports
127129
| otherwise -> return $ Left $ notFoundErr env modName $ LookupNotFound []
128130
-- if a package name is given we only go look for a package
129131
OtherPkg uid
130132
| Just (dirs, reexports) <- lookup uid import_paths
131-
-> lookupLocal uid dirs reexports
133+
-> lookupLocal moduleMaps uid dirs reexports
132134
| otherwise -> lookupInPackageDB
133135
NoPkgQual -> do
134136

135137
-- Reexports for current unit have to be empty because they only apply to other units depending on the
136138
-- current unit. If we set the reexports to be the actual reexports then we risk looping forever trying
137139
-- to find the module from the perspective of the current unit.
138-
mbFile <- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
140+
---- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
141+
--
142+
-- TODO: handle the other imports, the unit id, ..., reexport.
143+
-- - TODO: should we look for file existence now? If the file was
144+
-- removed from the disk, how will it behaves? How do we invalidate
145+
-- that?
146+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
147+
Nothing -> LocateNotFound
148+
Just (uid, file) -> LocateFoundFile uid file
139149
case mbFile of
140150
LocateNotFound -> lookupInPackageDB
141151
-- Lookup again with the perspective of the unit reexporting the file
142-
LocateFoundReexport uid -> locateModule (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
152+
LocateFoundReexport uid -> locateModule moduleMaps (hscSetActiveUnitId uid env) comp_info exts modName noPkgQual isSource
143153
LocateFoundFile uid file -> toModLocation uid file
144154
where
145155
dflags = hsc_dflags env
@@ -180,12 +190,15 @@ locateModule env comp_info exts targetFor modName mbPkgName isSource = do
180190
let genMod = mkModule (RealUnit $ Definite uid) (unLoc modName) -- TODO support backpack holes
181191
return $ Right $ FileImport $ ArtifactsLocation file (Just loc) (not isSource) (Just genMod)
182192

183-
lookupLocal uid dirs reexports = do
184-
mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
193+
lookupLocal moduleMaps@(moduleMapSource, moduleMap) uid dirs reexports = do
194+
-- mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
195+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
196+
Nothing -> LocateNotFound
197+
Just (uid, file) -> LocateFoundFile uid file
185198
case mbFile of
186199
LocateNotFound -> return $ Left $ notFoundErr env modName $ LookupNotFound []
187200
-- Lookup again with the perspective of the unit reexporting the file
188-
LocateFoundReexport uid' -> locateModule (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
201+
LocateFoundReexport uid' -> locateModule moduleMaps (hscSetActiveUnitId uid' env) comp_info exts modName noPkgQual isSource
189202
LocateFoundFile uid' file -> toModLocation uid' file
190203

191204
lookupInPackageDB = do

‎ghcide/src/Development/IDE/Types/HscEnvEq.hs‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
{-# LANGUAGE CPP #-}
22
module Development.IDE.Types.HscEnvEq
33
( HscEnvEq,
4-
hscEnv, newHscEnvEq,
4+
hscEnv, newHscEnvEq, envUnique,
55
updateHscEnvEq,
66
envPackageExports,
77
envVisibleModuleNames,

0 commit comments

Comments
(0)

AltStyle によって変換されたページ (->オリジナル) /