Commit 46cde80

Merge pull request avinashkranjan#144 from kazuyoshi-tech/feature-document-summary
add document summary creater
2 parents a633552 + 710cb93 commit 46cde80

7 files changed: +129 -0 lines changed

‎DocumentSummaryCreater/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Document-Summary-Creater
A Python script that creates an extractive summary of a text file.

## Prerequisites
##### This script needs Python 3.*

Install these libraries from requirements.txt with pip:
* sumy
* spacy
* neologdn

Then run these commands to download the spaCy English model and the NLTK sentence tokenizer data:

```bash
$ python -m spacy download en_core_web_sm
$ python -c "import nltk; nltk.download('punkt')"
```

## Usage:
* Run main.py and enter the path of the text file
* A summary file is then written next to the input, with `_summary.txt` in place of the `.txt` extension; it keeps about two tenths of the input's sentences
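
The "two tenths" figure comes from integer arithmetic on the sentence count in summary_make.py (`len(corpus)*2//10`). A minimal sketch of that calculation follows; the helper name `summary_sentence_count` is hypothetical and only illustrates the rounding behavior.

```python
def summary_sentence_count(total_sentences: int) -> int:
    """Sentences kept in the summary: two tenths of the input, rounded down."""
    return total_sentences * 2 // 10


# Show how many sentences survive for a few input sizes.
for n in (4, 10, 53):
    print(n, "->", summary_sentence_count(n))
```

Because the division rounds down, an input with fewer than five sentences would request zero sentences, so very short files may produce an empty summary.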

‎DocumentSummaryCreater/main.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
import os

from summary_make import summarize_sentences


def main():
    filepath = input("Enter the path to the text file: ")
    # Read every line and join them into one string
    with open(filepath, encoding="utf-8") as f:
        sentences = f.readlines()
    sentences = ' '.join(sentences)

    summary = summarize_sentences(sentences)

    # Replace the extension with "_summary.txt"; str.find('.txt') would
    # return -1 for other extensions and silently drop the last character
    root, _ = os.path.splitext(filepath)
    outputpath = root + '_summary.txt'

    # Write one summary sentence per line
    with open(outputpath, 'w', encoding="utf-8") as w:
        for sentence in summary:
            w.write(str(sentence) + '\n')


if __name__ == "__main__":
    main()
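
The output file name is derived from the input path by swapping the extension for `_summary.txt`. The sketch below shows one way to do that with `pathlib`; the helper name `summary_path` is hypothetical and this is an alternative illustration, not the code in this commit. Deriving the name from the parsed path rather than from a substring search keeps paths without a `.txt` suffix intact.

```python
from pathlib import Path


def summary_path(filepath: str) -> str:
    """Return the sibling path '<stem>_summary.txt' for the given input file."""
    p = Path(filepath)
    # p.stem is the file name without its final extension.
    return str(p.with_name(p.stem + "_summary.txt"))


print(summary_path("origin_text.txt"))
print(summary_path("report.md"))
```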

‎DocumentSummaryCreater/origin_text.txt

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
Water is an inorganic, transparent, tasteless, odorless, and nearly colourless chemical substance, which is the main constituent of Earth's hydrosphere and the fluids of all known living organisms. It is vital for all known forms of life, even though it provides no calories or organic nutrients. Its chemical formula is H2O, meaning that each of its molecules contains one oxygen and two hydrogen atoms, connected by covalent bonds.

"Water" is the name of the liquid state of H2O at standard ambient temperature and pressure. It forms precipitation in the form of rain and aerosols in the form of fog. Clouds are formed from suspended droplets of water and ice, its solid state. When finely divided, crystalline ice may precipitate in the form of snow. The gaseous state of water is steam or water vapor. Water moves continually through the water cycle of evaporation, transpiration (evapotranspiration), condensation, precipitation, and runoff, usually reaching the sea.

Water covers 71% of the Earth's surface, mostly in seas and oceans.[1] Small portions of water occur as groundwater (1.7%), in the glaciers and the ice caps of Antarctica and Greenland (1.7%), and in the air as vapor, clouds (formed of ice and liquid water suspended in air), and precipitation (0.001%).[2][3]

Water plays an important role in the world economy. Approximately 70% of the freshwater used by humans goes to agriculture.[4] Fishing in salt and fresh water bodies is a major source of food for many parts of the world. Much of the long-distance trade of commodities (such as oil, natural gas, and manufactured products) is transported by boats through seas, rivers, lakes, and canals. Large quantities of water, ice, and steam are used for cooling and heating, in industry and homes. Water is an excellent solvent for a wide variety of substances both mineral and organic; as such it is widely used in industrial processes, and in cooking and washing. Water, ice and snow are also central to many sports and other forms of entertainment, such as swimming, pleasure boating, boat racing, surfing, sport fishing, diving, ice skating and skiing.

The word water comes from Old English wæter, from Proto-Germanic *watar (source also of Old Saxon watar, Old Frisian wetir, Dutch water, Old High German wazzar, German Wasser, vatn, Gothic 𐍅𐌰𐍄𐍉 (wato), from Proto-Indo-European *wod-or, suffixed form of root *wed- ("water"; "wet").[5] Also cognate, through the Indo-European root, with Greek ύδωρ (ýdor), Russian вода́ (vodá), Irish uisce, and Albanian ujë.

Water (H2O) is a polar inorganic compound that is at room temperature a tasteless and odorless liquid, nearly colorless with a hint of blue. This simplest hydrogen chalcogenide is by far the most studied chemical compound and is described as the "universal solvent" for its ability to dissolve many substances.[6][7] This allows it to be the "solvent of life":[8] indeed, water as found in nature almost always includes various dissolved substances, and special steps are required to obtain chemically pure water. Water is the only common substance to exist as a solid, liquid, and gas in normal terrestrial conditions.[9]

Along with oxidane, water is one of the two official names for the chemical compound H2O;[10] it is also the liquid phase of H2O.[11] The other two common states of matter of water are the solid phase, ice, and the gaseous phase, water vapor or steam. The addition or removal of heat can cause phase transitions: freezing (water to ice), melting (ice to water), vaporization (water to vapor), condensation (vapor to water), sublimation (ice to vapor) and deposition (vapor to ice).

Water differs from most liquids in that it becomes less dense as it freezes.[14] In 1 atm pressure, it reaches its maximum density of 1,000 kg/m3 (62.43 lb/cu ft) at 3.98 °C (39.16 °F).[15] The density of ice is 917 kg/m3 (57.25 lb/cu ft), an expansion of 9%.[16][17] This expansion can exert enormous pressure, bursting pipes and cracking rocks (see Frost weathering).[18]

In a lake or ocean, water at 4 °C sinks to the bottom and ice forms on the surface, floating on the liquid water. This ice insulates the water below, preventing it from freezing solid. Without this protection, most aquatic organisms would perish during the winter.

On a pressure/temperature phase diagram (see figure), there are curves separating solid from vapor, vapor from liquid, and liquid from solid. These meet at a single point called the triple point, where all three phases can coexist. The triple point is at a temperature of 273.16 K (0.01 °C) and a pressure of 611.657 pascals (0.00604 atm);[30] it is the lowest pressure at which liquid water can exist. Until 2019, the triple point was used to define the Kelvin temperature scale.[31][32]

The water/vapor phase curve terminates at 647.096 K (373.946 °C; 705.103 °F) and 22.064 megapascals (3,200.1 psi; 217.75 atm).[33] This is known as the critical point. At higher temperatures and pressures the liquid and vapor phases form a continuous phase called a supercritical fluid. It can be gradually compressed or expanded between gas-like and liquid-like densities, its properties (which are quite different from those of ambient water) are sensitive to density. For example, for suitable pressures and temperatures it can mix freely with nonpolar compounds, including most organic compounds. This makes it useful in a variety of applications including high-temperature electrochemistry and as an ecologically benign solvent or catalyst in chemical reactions involving organic compounds. In Earth's mantle, it acts as a solvent during mineral formation, dissolution and deposition.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
Water " is the name of the liquid state of H2O at standard ambient temperature and pressure .
Clouds are formed from suspended droplets of water and ice , its solid state .
Water covers 71 % of the Earth 's surface , mostly in seas and oceans.
Water is an excellent solvent for a wide variety of substances both mineral and organic ; as such it is widely used in industrial processes , and in cooking and washing .
This simplest hydrogen chalcogenide is by far the most studied chemical compound and is described as the " universal solvent " for its ability to dissolve many substances.
[11 ] The other two common states of matter of water are the solid phase , ice , and the gaseous phase , water vapor or steam .
The addition or removal of heat can cause phase transitions : freezing ( water to ice ) , melting ( ice to water ) , vaporization ( water to vapor ) , condensation ( vapor to water ) , sublimation ( ice to vapor ) and deposition ( vapor to ice ) .
On a pressure / temperature phase diagram ( see figure ) , there are curves separating solid from vapor , vapor from liquid , and liquid from solid .
The triple point is at a temperature of 273.16 K ( 0.01 ° C ) and a pressure of 611.657 pascals ( 0.00604 atm);[30 ] it is the lowest pressure at which liquid water can exist .
At higher temperatures and pressures the liquid and vapor phases form a continuous phase called a supercritical fluid .

‎DocumentSummaryCreater/preprocessing.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
import spacy
import neologdn


class EnglishCorpus:
    # Load the spaCy English model used for tokenization and sentence splitting
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")
        self.sentence_spans = []

    # Replace line breaks with spaces (so words on adjacent lines are not
    # fused together) and normalize characters with neologdn
    def preprocessing(self, text: str) -> str:
        text = text.replace("\n", " ")
        text = neologdn.normalize(text)

        return text

    # Split the text into sentences while keeping the spaCy analysis results
    def make_sentence_list(self, sentences: str) -> list:
        doc = self.nlp(sentences)
        # doc.sents is a generator, so store a list that can be iterated more than once
        self.sentence_spans = list(doc.sents)

        return self.sentence_spans

    # Join the tokens of each sentence with single spaces
    def make_corpus(self) -> list:
        corpus = []
        for s in self.sentence_spans:
            tokens = [str(t) for t in s]
            corpus.append(" ".join(tokens))

        return corpus
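
`make_corpus` rebuilds each sentence by joining its tokens with single spaces, which is why the generated summary shows spacing such as `71 %` and `ice ,`. The sketch below imitates that step without spaCy; the regular expression is a stand-in tokenizer chosen for illustration (an assumption, not spaCy's tokenization rules), and `space_join_tokens` is a hypothetical name.

```python
import re


def space_join_tokens(sentence: str) -> str:
    """Split a sentence into word and punctuation tokens, then join with spaces."""
    # Stand-in tokenizer (assumption): a word, optionally with inner dots
    # such as "273.16", or any single non-space punctuation character.
    tokens = re.findall(r"\w+(?:\.\w+)*|[^\w\s]", sentence)
    return " ".join(tokens)


print(space_join_tokens("Water covers 71% of the surface."))
```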

‎DocumentSummaryCreater/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
sumy==0.8.1
spacy==2.3.2
neologdn==0.4
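
Because the versions are pinned exactly, it can help to confirm that the active environment matches them before running the script. The following is a rough sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper names `parse_pins` and `check_pins` are hypothetical and not part of this commit.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Parse 'name==version' lines into a {name: version} mapping."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins: dict) -> dict:
    """Map each package to (pinned version, installed version or None)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found)
    return report


pins = parse_pins("sumy==0.8.1\nspacy==2.3.2\nneologdn==0.4\n")
for name, (wanted, found) in check_pins(pins).items():
    print(f"{name}: pinned {wanted}, installed {found}")
```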

‎DocumentSummaryCreater/summary_make.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
from preprocessing import EnglishCorpus

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.utils import get_stop_words
from sumy.summarizers.lex_rank import LexRankSummarizer


def summarize_sentences(sentences: str, language="english") -> list:
    # Prepare the sentences
    corpus_maker = EnglishCorpus()
    preprocessed_sentences = corpus_maker.preprocessing(sentences)
    # make_sentence_list must run before make_corpus, which reads the sentences it stores
    corpus_maker.make_sentence_list(preprocessed_sentences)
    corpus = corpus_maker.make_corpus()
    parser = PlaintextParser.from_string(" ".join(corpus), Tokenizer(language))

    # Run LexRank and keep two tenths of the sentences (rounded down);
    # inputs with fewer than five sentences therefore request zero sentences
    summarizer = LexRankSummarizer()
    summarizer.stop_words = get_stop_words(language)
    summary = summarizer(document=parser.document, sentences_count=len(corpus) * 2 // 10)

    return list(summary)
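
The ranking itself is delegated to sumy's `LexRankSummarizer`. To show the general idea behind LexRank, here is a simplified, dependency-free sketch: sentences become nodes in a similarity graph, and a PageRank-style power iteration scores each node by its centrality. It is an illustration under stated assumptions, not sumy's implementation: it uses plain term counts instead of TF-IDF, does no stop-word removal, and the threshold, damping, and iteration values are arbitrary choices. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def _tf_vector(sentence: str) -> Counter:
    """Bag-of-words term counts for one sentence (lowercased)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [_tf_vector(s) for s in sentences]
    # Adjacency matrix: 1 where two sentences are similar enough, else 0.
    adj = [[1.0 if _cosine(vectors[i], vectors[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Normalize each row so it sums to 1 (uniform if a row is empty).
    for row in adj:
        total = sum(row)
        for j in range(n):
            row[j] = row[j] / total if total else 1.0 / n
    # Power iteration toward the stationary distribution.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * adj[i][j] for i in range(n))
                  for j in range(n)]
    return scores


def summarize(sentences, count):
    """Return the `count` highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:count]
    return [sentences[i] for i in sorted(top)]


example = [
    "Ice floats on lakes.",
    "Water freezes into ice and evaporates into steam.",
    "Steam drives turbines.",
]
print(summarize(example, 1))
```

In the example, the middle sentence shares a term with each of the other two, so it has the most connections in the graph and receives the highest score.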
