README.md: 10 additions & 4 deletions
@@ -2,11 +2,15 @@

 **The present version of code-bert is available for Linux and Mac only. We are working on the Windows release. Please hang on.**

+codeBERT is a package to **automatically review your code documentation**. codeBERT currently works for Python code.
+
+🔨 Given a function body `f` as a string of code tokens (including special tokens such as `indent` and `dedent`) and a docstring `d` as a string of natural language tokens, predict whether `f` and `d` are associated (that is, whether they represent the same concept).
+
 This is the [CodistAI](https://codist-ai.com/) open source version, which makes it easy to use the fine-tuned model based on our open source MLM code model [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2)

 [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) is a RoBERTa model trained using the Hugging Face Transformers library. We then fine-tuned the model on the task of predicting the following -

-Given a function body `f` as a string of code tokens (including special tokens such as `indent` and `dedent`) and a docstring `d` as a string of natural language tokens, predict whether `f` and `d` are associated (that is, whether they represent the same concept).
+

 ## An example

@@ -25,7 +29,7 @@ def get_file(filename):

 ```

-Using another of our open source libraries, [tree-hugger](https://github.com/autosoft-dev/tree-hugger), it is fairly trivial to get the code and separate out the function body and the docstring with a single API call.
+💡 Using another of our open source libraries, [tree-hugger](https://github.com/autosoft-dev/tree-hugger), it is fairly trivial to get the code and separate out the function body and the docstring with a single API call.

 We can then use the [`process_code`](https://github.com/autosoft-dev/code-bert/blob/2dd35f16fa2cdb96f75e21bb0a9393aa3164d885/code_bert/core/data_reader.py#L136) method from this repo to put the code lines into the format that [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) expects.

@@ -84,7 +88,7 @@ So, let's say you have a directory called `test_files` with some python files in

 A prompt will appear to confirm the model location. Once you confirm it, the algorithm analyzes one file at a time, recursing through the whole directory.

-It should produce a report like the following -
+🏆 It should produce a report like the following -


 ```
@@ -107,4 +111,6 @@ Function "get_file" with Dcostring """opens a url"""
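The README text in this diff describes splitting each function into its body and its docstring before the pair is checked, and running that over a directory recursively. As a rough illustration only, the sketch below does a similar split with Python's standard `ast` module. It is not tree-hugger's or code-bert's API: `function_docstring_pairs` and `scan_directory` are hypothetical names, the sample function body is assumed (only the signature `get_file(filename)` and the docstring `opens a url` appear in the diff), and `ast.unparse` needs Python 3.9 or later.

```python
import ast
from pathlib import Path


def function_docstring_pairs(source):
    """Yield (name, docstring, body_source) for every documented function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is None:
                continue  # nothing to compare the body against
            # The first statement is the docstring expression, so skip it.
            body = "\n".join(ast.unparse(stmt) for stmt in node.body[1:])
            yield node.name, doc, body


def scan_directory(root):
    """Walk `root` recursively and yield (path, name, docstring, body)."""
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for name, doc, body in function_docstring_pairs(source):
            yield path, name, doc, body


# Assumed sample: the body is made up to mismatch the docstring on purpose.
SAMPLE = '''
def get_file(filename):
    """opens a url"""
    with open(filename) as f:
        return f.read()
'''

for name, doc, body in function_docstring_pairs(SAMPLE):
    print(f'Function "{name}" with docstring """{doc}"""')
    print(body)
```

Each `(docstring, body)` pair produced this way is the kind of input the association task takes; the tokenization that the model actually expects is handled by `process_code` in the repository, which this sketch does not reproduce.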