Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit ab06e51

Browse files
committed
refactor: WIP. Module name to filepath optimisation
This is related to #4598. This changes the file to module associating logic done during dependency graph building. Before, each time a module `Foo.Bar` is found, HLS is testing inside all the import path for the existence of a relevant fiel.. It means that for `i` import paths and `m` modules to locate, `m * n` filesystem operations are done. Note also that this involves a lot of complex string concatenation primitive to build the `FilePath`. A module is tested for each `import` for each of the file of the project. We also test for `boot` files, doubling the number of test. In #4598 we have a project with `1100` modules, in more than 250 import paths and we count more than `17000` `import` statments, resulting on over 6 millions test for file existences. This project was blocking for more than 3 minutes during HLS startup. This commit changes the way this is computed: - At startup, a `Map ModuleName FilePath` (the real type is a bit more involved for performance, multiples unit and boot files handling) is built by scanning all the import paths for files representing the different modules. - Directory scanning is efficient and if import path only contains haskell module, this will never do more job that listing the files of the project. - The lookup is now simplify a `Map` lookup. The performance improvement is as follows: - The number of IO operation is dramatically reduced, from multiples millions to a few recursive directories listing. - A lot of the boilerplate of converting path had be removed. - TODO: add an RTS stats before / after with number of allocations - On my project, the graph building time is reduced from a few minutes to 3s. Limitations: - How to rebuild the `Map` if the content of one directory change? - If one directory is filled with millions of files which are not of interested, performance can be damaged. TODO: add a diagnostic during this phase so the user can learn about this issue. Code status: - The `lookup` is not fully restored, especially it does not include the handling of home unit as well as reexport. - The initialisation phase is cached inside a `TVar` stored as a top level identifier using `unsafePerformIO`. This is to be improved.
1 parent 321e12e commit ab06e51

File tree

5 files changed

+96
-16
lines changed

5 files changed

+96
-16
lines changed

‎ghcide/ghcide.cabal‎

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ library
107107
, unliftio-core
108108
, unordered-containers >=0.2.10.0
109109
, vector
110+
, pretty-simple
110111

111112
if os(windows)
112113
build-depends: Win32

‎ghcide/src/Development/IDE/Core/RuleTypes.hs‎

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -392,6 +392,9 @@ type instance RuleResult GetModSummary = ModSummaryResult
392392
-- | Generate a ModSummary with the timestamps and preprocessed content elided, for more successful early cutoff
393393
type instance RuleResult GetModSummaryWithoutTimestamps = ModSummaryResult
394394

395+
type instance RuleResult GetModulesPaths = (M.Map ModuleName (UnitId, NormalizedFilePath),
396+
M.Map ModuleName (UnitId, NormalizedFilePath))
397+
395398
data GetParsedModule = GetParsedModule
396399
deriving (Eq, Show, Generic)
397400
instance Hashable GetParsedModule
@@ -494,6 +497,13 @@ data GetModSummaryWithoutTimestamps = GetModSummaryWithoutTimestamps
494497
instance Hashable GetModSummaryWithoutTimestamps
495498
instance NFData GetModSummaryWithoutTimestamps
496499

500+
-- | Scan all the import directory for existing modules and build a map from
501+
-- module name to paths
502+
data GetModulesPaths = GetModulesPaths
503+
deriving (Eq, Show, Generic)
504+
instance Hashable GetModulesPaths
505+
instance NFData GetModulesPaths
506+
497507
data GetModSummary = GetModSummary
498508
deriving (Eq, Show, Generic)
499509
instance Hashable GetModSummary

‎ghcide/src/Development/IDE/Core/Rules.hs‎

Lines changed: 55 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
{-# LANGUAGE CPP #-}
55
{-# LANGUAGE DuplicateRecordFields #-}
66
{-# LANGUAGE TypeFamilies #-}
7+
{-# LANGUAGE PartialTypeSignatures #-}
78

89
-- | A Shake implementation of the compiler service, built
910
-- using the "Shaker" abstraction layer for in-memory use.
@@ -93,7 +94,7 @@ import Data.Proxy
9394
import qualified Data.Text as T
9495
import qualified Data.Text.Encoding as T
9596
import qualified Data.Text.Utf16.Rope.Mixed as Rope
96-
import Data.Time (UTCTime (..))
97+
import Data.Time (UTCTime (..), getCurrentTime, diffUTCTime)
9798
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
9899
import Data.Tuple.Extra
99100
import Data.Typeable (cast)
@@ -173,6 +174,12 @@ import System.Info.Extra (isWindows)
173174

174175
import qualified Data.IntMap as IM
175176
import GHC.Fingerprint
177+
import Text.Pretty.Simple
178+
import qualified Data.Map.Strict as Map
179+
import System.FilePath (takeExtension, takeFileName, normalise, dropTrailingPathSeparator, dropExtension, splitDirectories)
180+
import Data.Char (isUpper)
181+
import System.Directory.Extra (listFilesRecursive, listFilesInside)
182+
import System.IO.Unsafe
176183

177184
data Log
178185
= LogShake Shake.Log
@@ -311,6 +318,7 @@ getParsedModuleDefinition packageState opt file ms = do
311318
getLocatedImportsRule :: Recorder (WithPriority Log) -> Rules ()
312319
getLocatedImportsRule recorder =
313320
define (cmapWithPrio LogShake recorder) $ \GetLocatedImports file -> do
321+
314322
ModSummaryResult{msrModSummary = ms} <- use_ GetModSummaryWithoutTimestamps file
315323
(KnownTargets targets targetsMap) <- useNoFile_ GetKnownTargets
316324
let imports = [(False, imp) | imp <- ms_textual_imps ms] ++ [(True, imp) | imp <- ms_srcimps ms]
@@ -333,8 +341,11 @@ getLocatedImportsRule recorder =
333341
| otherwise = do
334342
itExists <- getFileExists nfp
335343
return $ if itExists then Just nfp else Nothing
344+
345+
moduleMaps <- use_ GetModulesPaths file
336346
(diags, imports') <- fmap unzip $ forM imports $ \(isSource, (mbPkgName, modName)) -> do
337-
diagOrImp <- locateModule (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
347+
348+
diagOrImp <- locateModule moduleMaps (hscSetFlags dflags env) import_dirs (optExtensions opt) getTargetFor modName mbPkgName isSource
338349
case diagOrImp of
339350
Left diags -> pure (diags, Just (modName, Nothing))
340351
Right (FileImport path) -> pure ([], Just (modName, Just path))
@@ -624,6 +635,43 @@ getModuleGraphRule recorder = defineEarlyCutOffNoFile (cmapWithPrio LogShake rec
624635
fs <- toKnownFiles <$> useNoFile_ GetKnownTargets
625636
dependencyInfoForFiles (HashSet.toList fs)
626637

638+
{-# NOINLINE cacheVar #-}
639+
cacheVar = unsafePerformIO (newTVarIO mempty)
640+
641+
getModulesPathsRule :: Recorder (WithPriority Log) -> Rules ()
642+
getModulesPathsRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModulesPaths file -> do
643+
env_eq <- use_ GhcSession file
644+
645+
cache <- liftIO (readTVarIO cacheVar)
646+
case Map.lookup (envUnique env_eq) cache of
647+
Just res -> pure (mempty, ([], Just res))
648+
Nothing -> do
649+
let env = hscEnv env_eq
650+
let import_dirs = map (second homeUnitEnv_dflags) $ hugElts $ hsc_HUG env
651+
opt <- getIdeOptions
652+
let exts = (optExtensions opt)
653+
let acceptedExtensions = concatMap (\x -> ['.':x, '.':x <> "-boot"]) exts
654+
655+
(unzip -> (a, b)) <- flip mapM import_dirs $ \(u, dyn) -> do
656+
(unzip -> (a, b)) <- flip mapM (importPaths dyn) $ \dir' -> do
657+
let dir = dropTrailingPathSeparator dir'
658+
let predicate path = pure (path == dir || isUpper (head (takeFileName path)))
659+
let dir_number_directories = length (splitDirectories dir)
660+
let toModule file = mkModuleName (intercalate "." $ drop dir_number_directories (splitDirectories (dropExtension file)))
661+
662+
-- TODO: we are taking/droping extension, this could be factorized to save a few cpu cycles ;)
663+
-- TODO: do acceptedextensions needs to be a set ? or a vector?
664+
modules <- fmap (\path -> (toModule path, toNormalizedFilePath' path)) . filter (\y -> takeExtension y `elem` acceptedExtensions) <$> liftIO (listFilesInside predicate dir)
665+
let isSourceModule (_, path) = "-boot" `isSuffixOf` fromNormalizedFilePath path
666+
let (sourceModules, notSourceModules) = partition isSourceModule modules
667+
pure $ (Map.fromList notSourceModules, Map.fromList sourceModules)
668+
pure (fmap (u,) $ mconcat a, fmap (u, ) $ mconcat b)
669+
670+
let res = (mconcat a, mconcat b)
671+
liftIO $ atomically $ modifyTVar' cacheVar (Map.insert (envUnique env_eq) res)
672+
673+
pure (mempty, ([], Just $ (mconcat a, mconcat b)))
674+
627675
getModuleGraphSingleFileRule :: Recorder (WithPriority Log) -> Rules ()
628676
getModuleGraphSingleFileRule recorder =
629677
defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetFileModuleGraph file -> do
@@ -632,8 +680,12 @@ getModuleGraphSingleFileRule recorder =
632680

633681
dependencyInfoForFiles :: [NormalizedFilePath] -> Action (BS.ByteString, DependencyInformation)
634682
dependencyInfoForFiles fs = do
683+
-- liftIO $ print ("fs length", length fs)
635684
(rawDepInfo, bm) <- rawDependencyInformation fs
685+
-- liftIO $ print ("ok with raw deps")
686+
-- liftIO $ pPrint rawDepInfo
636687
let (all_fs, _all_ids) = unzip $ HM.toList $ pathToIdMap $ rawPathIdMap rawDepInfo
688+
-- liftIO $ print ("all_fs length", length all_fs)
637689
msrs <- uses GetModSummaryWithoutTimestamps all_fs
638690
let mss = map (fmap msrModSummary) msrs
639691
let deps = map (\i -> IM.lookup (getFilePathId i) (rawImports rawDepInfo)) _all_ids
@@ -1232,6 +1284,7 @@ mainRule recorder RulesConfig{..} = do
12321284
getModIfaceRule recorder
12331285
getModSummaryRule templateHaskellWarning recorder
12341286
getModuleGraphRule recorder
1287+
getModulesPathsRule recorder
12351288
getModuleGraphSingleFileRule recorder
12361289
getFileHashRule recorder
12371290
knownFilesRule recorder

‎ghcide/src/Development/IDE/Import/FindImports.hs‎

Lines changed: 29 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55

66
module Development.IDE.Import.FindImports
77
( locateModule
8-
, locateModuleFile
98
, Import(..)
109
, ArtifactsLocation(..)
1110
, modSummaryToArtifactsLocation
@@ -14,9 +13,8 @@ module Development.IDE.Import.FindImports
1413
) where
1514

1615
import Control.DeepSeq
17-
import Control.Monad.Extra
1816
import Control.Monad.IO.Class
19-
import Data.List (find, isSuffixOf)
17+
import Data.List (isSuffixOf)
2018
import Data.Maybe
2119
import qualified Data.Set as S
2220
import Development.IDE.GHC.Compat as Compat
@@ -26,7 +24,8 @@ import Development.IDE.Types.Diagnostics
2624
import Development.IDE.Types.Location
2725
import GHC.Types.PkgQual
2826
import GHC.Unit.State
29-
import System.FilePath
27+
import Data.Map.Strict (Map)
28+
import qualified Data.Map.Strict as Map
3029

3130

3231
#if MIN_VERSION_ghc(9,11,0)
@@ -70,6 +69,7 @@ data LocateResult
7069
| LocateFoundReexport UnitId
7170
| LocateFoundFile UnitId NormalizedFilePath
7271

72+
{-
7373
-- | locate a module in the file system. Where we go from *daml to Haskell
7474
locateModuleFile :: MonadIO m
7575
=> [(UnitId, [FilePath], S.Set ModuleName)]
@@ -94,6 +94,7 @@ locateModuleFile import_dirss exts targetFor isSource modName = do
9494
maybeBoot ext
9595
| isSource = ext ++ "-boot"
9696
| otherwise = ext
97+
-}
9798

9899
-- | This function is used to map a package name to a set of import paths.
99100
-- It only returns Just for unit-ids which are possible to import into the
@@ -110,36 +111,47 @@ mkImportDirs _env (i, flags) = Just (i, (importPaths flags, reexportedModules fl
110111
-- Haskell
111112
locateModule
112113
:: MonadIO m
113-
=> HscEnv
114+
=> (Map ModuleName (UnitId, NormalizedFilePath),Map ModuleName (UnitId, NormalizedFilePath))
115+
-> HscEnv
114116
-> [(UnitId, DynFlags)] -- ^ Import directories
115117
-> [String] -- ^ File extensions
116118
-> (ModuleName -> NormalizedFilePath -> m (Maybe NormalizedFilePath)) -- ^ does file exist predicate
117119
-> Located ModuleName -- ^ Module name
118120
-> PkgQual -- ^ Package name
119121
-> Bool -- ^ Is boot module
120122
-> m (Either [FileDiagnostic] Import)
121-
locateModule env comp_info exts targetFor modName mbPkgName isSource = do
123+
locateModule moduleMaps@(moduleMap, moduleMapSource) env comp_info exts targetFor modName mbPkgName isSource = do
122124
case mbPkgName of
123125
-- 'ThisPkg' just means some home module, not the current unit
124126
ThisPkg uid
127+
-- TODO: there are MANY lookup on import_paths, which is a problem considering that it can be large.
125128
| Just (dirs, reexports) <- lookup uid import_paths
126-
-> lookupLocal uid dirs reexports
129+
-> lookupLocal moduleMaps uid dirs reexports
127130
| otherwise -> return $ Left $ notFoundErr env modName $ LookupNotFound []
128131
-- if a package name is given we only go look for a package
129132
OtherPkg uid
130133
| Just (dirs, reexports) <- lookup uid import_paths
131-
-> lookupLocal uid dirs reexports
134+
-> lookupLocal moduleMaps uid dirs reexports
132135
| otherwise -> lookupInPackageDB
133136
NoPkgQual -> do
134137

135138
-- Reexports for current unit have to be empty because they only apply to other units depending on the
136139
-- current unit. If we set the reexports to be the actual reexports then we risk looping forever trying
137140
-- to find the module from the perspective of the current unit.
138-
mbFile <- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
141+
---- locateModuleFile ((homeUnitId_ dflags, importPaths dflags, S.empty) : other_imports) exts targetFor isSource $ unLoc modName
142+
--
143+
-- TODO: handle the other imports, the unit id, ..., reexport.
144+
-- - Previous implementation was using homeUnitId dflags
145+
-- - Handle the -boot
146+
-- - Have a look at "targetFor"
147+
--
148+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
149+
Nothing -> LocateNotFound
150+
Just (uid, file) -> LocateFoundFile uid file
139151
case mbFile of
140152
LocateNotFound -> lookupInPackageDB
141153
-- Lookup again with the perspective of the unit reexporting the file
142-
LocateFoundReexport uid -> locateModule (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
154+
LocateFoundReexport uid -> locateModule moduleMaps (hscSetActiveUnitId uid env) comp_info exts targetFor modName noPkgQual isSource
143155
LocateFoundFile uid file -> toModLocation uid file
144156
where
145157
dflags = hsc_dflags env
@@ -180,12 +192,16 @@ locateModule env comp_info exts targetFor modName mbPkgName isSource = do
180192
let genMod = mkModule (RealUnit $ Definite uid) (unLoc modName) -- TODO support backpack holes
181193
return $ Right $ FileImport $ ArtifactsLocation file (Just loc) (not isSource) (Just genMod)
182194

183-
lookupLocal uid dirs reexports = do
184-
mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
195+
lookupLocal moduleMaps@(moduleMapSource, moduleMap) uid dirs reexports = do
196+
error "MOXOOO"
197+
-- mbFile <- locateModuleFile [(uid, dirs, reexports)] exts targetFor isSource $ unLoc modName
198+
let mbFile = case Map.lookup (unLoc modName) (if isSource then moduleMapSource else moduleMap) of
199+
Nothing -> LocateNotFound
200+
Just (uid, file) -> LocateFoundFile uid file
185201
case mbFile of
186202
LocateNotFound -> return $ Left $ notFoundErr env modName $ LookupNotFound []
187203
-- Lookup again with the perspective of the unit reexporting the file
188-
LocateFoundReexport uid' -> locateModule (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
204+
LocateFoundReexport uid' -> locateModule moduleMaps (hscSetActiveUnitId uid' env) comp_info exts targetFor modName noPkgQual isSource
189205
LocateFoundFile uid' file -> toModLocation uid' file
190206

191207
lookupInPackageDB = do

‎ghcide/src/Development/IDE/Types/HscEnvEq.hs‎

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
{-# LANGUAGE CPP #-}
22
module Development.IDE.Types.HscEnvEq
33
( HscEnvEq,
4-
hscEnv, newHscEnvEq,
4+
hscEnv, newHscEnvEq, envUnique,
55
updateHscEnvEq,
66
envPackageExports,
77
envVisibleModuleNames,

0 commit comments

Comments
(0)

AltStyle によって変換されたページ (->オリジナル) /