> This release is a nightly pre-release and not intended for production yet. We recommend using a new virtual environment. For more details on the new features and usage guides, see the v3 documentation.

pip install -U spacy-nightly --pre
- SentenceRecognizer, Morphologizer, Lemmatizer, AttributeRuler and Transformer.
- DependencyMatcher for matching patterns within the dependency parse using Semgrex operators.
- Matcher.

For more info on how to migrate from spaCy v2.x, see the detailed migration guide.
- The link command and shortcut names are now deprecated. There can be many different trained pipelines and not just one "English model", so you should always use the full package name like en_core_web_sm explicitly.
- meta.json is now only used to provide meta information like the package name, author, license and labels. It's not used to construct the processing pipeline anymore. This is all defined in the config.cfg, which also includes all settings used to train the pipeline.
- The train, pretrain and debug data commands now only take a config.cfg.
- Language.add_pipe now takes the string name of the component factory instead of the component function.
- Custom components need to be registered with the @Language.component or @Language.factory decorator.
- The Language.update, Language.evaluate and TrainablePipe.update methods now all take batches of Example objects instead of Doc and GoldParse objects, or raw text and a dictionary of annotations.
- The begin_training methods have been renamed to initialize and now take a function that returns a sequence of Example objects to initialize the model instead of a list of tuples.
- Matcher.add and PhraseMatcher.add now only accept a list of patterns as the second argument (instead of a variable number of arguments). The on_match callback becomes an optional keyword argument.
- The Doc flags like Doc.is_parsed or Doc.is_tagged have been replaced by Doc.has_annotation.
- The spacy.gold module has been renamed to spacy.training.
- The PRON_LEMMA symbol and -PRON- as an indicator for pronoun lemmas have been removed.
- TAG_MAP and MORPH_RULES in the language data have been replaced by the more flexible AttributeRuler.
- The Lemmatizer is now a standalone pipeline component and doesn't provide lemmas by default or switch automatically between lookup and rule-based lemmas. You can now add it to your pipeline explicitly and set its mode on initialization.

| Removed | Replacement |
| --- | --- |
| Language.disable_pipes | Language.select_pipes, Language.disable_pipe, Language.enable_pipe |
| Language.begin_training, Pipe.begin_training, ... | Language.initialize, Pipe.initialize, ... |
| Doc.is_tagged, Doc.is_parsed, ... | Doc.has_annotation |
| GoldParse | Example |
| GoldCorpus | Corpus |
| KnowledgeBase.load_bulk, KnowledgeBase.dump | KnowledgeBase.from_disk, KnowledgeBase.to_disk |
| Matcher.pipe, PhraseMatcher.pipe | not needed |
| gold.offsets_from_biluo_tags, gold.spans_from_biluo_tags, gold.biluo_tags_from_offsets | training.biluo_tags_to_offsets, training.biluo_tags_to_spans, training.offsets_to_biluo_tags |
| spacy init-model | spacy init vectors |
| spacy debug-data | spacy debug data |
| spacy profile | spacy debug profile |
| spacy link, util.set_data_path, util.get_data_path | not needed, symlinks are deprecated |
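Several of the changes listed above can be illustrated with a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained package is needed. The component name token_counter, the match key HELLO_WORLD and the example texts are hypothetical and chosen only for illustration.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.training import Example


# Custom components are registered under a string name via the decorator.
# "token_counter" is a hypothetical name used only in this sketch.
@Language.component("token_counter")
def token_counter(doc):
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")
# add_pipe takes the string name of the factory, not the function itself.
nlp.add_pipe("sentencizer")
nlp.add_pipe("token_counter", last=True)

doc = nlp("Hello world. This is spaCy.")

# Doc.has_annotation replaces flags like Doc.is_tagged or Doc.is_parsed.
has_sentences = doc.has_annotation("SENT_START")
has_tags = doc.has_annotation("TAG")  # no tagger in this blank pipeline

# Matcher.add takes a list of patterns as its second argument;
# on_match would be passed as an optional keyword argument.
matcher = Matcher(nlp.vocab)
matcher.add("HELLO_WORLD", [[{"LOWER": "hello"}, {"LOWER": "world"}]])
matches = matcher(doc)

# Example objects (from spacy.training) replace GoldParse and raw
# text/annotation tuples when updating or evaluating a pipeline.
example = Example.from_dict(
    nlp.make_doc("I like London"),
    {"entities": [(7, 13, "GPE")]},
)
```

A batch of such Example objects is what Language.update and Language.evaluate expect in v3.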

The following deprecated methods, attributes and arguments were removed in v3.0. Most of them have been deprecated for a while and many would previously raise errors. Many of them were also mostly internals. If you've been working with more recent versions of spaCy v2.x, it's unlikely that your code relied on them.

| Removed | Replacement |
| --- | --- |
| Doc.tokens_from_list | Doc.__init__ |
| Doc.merge, Span.merge | Doc.retokenize |
| Token.string, Span.string, Span.upper, Span.lower | Span.text, Token.text |
| Language.tagger, Language.parser, Language.entity | Language.get_pipe |
| keyword-arguments like vocab=False on to_disk, from_disk, to_bytes, from_bytes | exclude=["vocab"] |
| n_threads argument on Tokenizer, Matcher, PhraseMatcher | n_process |
| verbose argument on Language.evaluate | logging (DEBUG) |
| SentenceSegmenter hook, SimilarityHook | user hooks, Sentencizer, SentenceRecognizer |
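A few of the replacements in the table above could look like this in practice. This is a minimal sketch, assuming spaCy v3 is installed. The words and the blank English pipeline are illustrative choices, not taken from the release notes.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Doc.tokens_from_list -> construct the Doc directly from a list of words.
doc = Doc(
    nlp.vocab,
    words=["New", "York", "is", "big"],
    spaces=[True, True, True, False],
)

# Doc.merge / Span.merge -> the Doc.retokenize context manager.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[0:2])

# Token.string / Span.string -> the text attribute.
first_token_text = doc[0].text

# Keyword arguments like vocab=False -> exclude=["vocab"].
data = nlp.to_bytes(exclude=["vocab"])

# Language.tagger, Language.parser, ... -> Language.get_pipe by name.
nlp.add_pipe("sentencizer")
sentencizer = nlp.get_pipe("sentencizer")
```

Merges inside the retokenize block are applied when the block exits, so the Doc only changes after the with statement.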