Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

prototype code lifetime #17

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
asurkov merged 6 commits into master from clnext
Aug 24, 2017
Merged

prototype code lifetime #17

asurkov merged 6 commits into master from clnext
Aug 24, 2017

Conversation

@asurkov
Copy link
Contributor

@asurkov asurkov commented Aug 17, 2017
edited
Loading

here's a new version. not yet finished (I still need to sort out a rename thing), but I think it's ready for more feedback: code style and alg comments are welcome. I'm also not yet clear what exactly we want to land. I guess all those prints/test methods should go away or do we want to keep them until the code integrated and possibly covered by unit testing?

I didn't tried it on something heavy yet, so we might come with something more memory thrifty, for example, replace those line huge arrays on continues line chucks, if we find out it's a problem.

Copy link
Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the first pass. I don't believe I quite understand the algorithm, but I think it can be done simpler. I'll get back with more feedback after a new revision is uploaded.

@@ -0,0 +1,231 @@
// Copyright 2017 Sourcerer Inc. All Rights Reserved.
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

author

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, I'll file a follow up email on it

*/
class CodeLine(val from: RevCommitLine, val to: RevCommitLine, val text: String) {

// XXX(Alex): oldId and newId may be computed as a hash built from commit,
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO
the idea is that we can grep for TODO to find nuggets like that, even though it's not 100% actionable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np


/**
* Id of the code line in a revision when the line was added. Used to
* update the line's lifetime computed in previous iterations.
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are 'previous iterations'?

compute()
}

fun test(email: String) {
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should go to separate unit tests, and they should be based on JUnit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a testing suit :) check out the pull request description, I left a more generic question about it


// Build a map of file names and their code lines.
while (treeWalk.next()) {
// TODO(alex): skip binary files.
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this TODO needs to be addressed before this goes to production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the patch cannot be landed at master before it's addressed?

val insCount = edit.getLengthB()
println("del ($delStart, $delEnd), ins ($insStart, $insEnd)");

// Deletion case. Chaise down the deleted lines through the
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chase

* update the line's lifetime computed in previous iterations.
*
*/
val oldId: String = ""
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these oldId and newId being used at all. What are they for?

*
* TODO(Alex): the text arg is solely for testing proposes (remove it)
*/
class CodeLine(val from: RevCommitLine, val to: RevCommitLine, val text: String) {
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since our goal is to compute a distribution of line ages per author, I don't believe you need this class at all. Your algo only adds entries to it, and your test code iterates through all CodeLine instances, without regard to from/to, etc. You could simply keep track of author->array of line ages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a bunch of related comments: this one, about oldId/newId one, probably the one about the alg is more complicated than it should be and what I'm doing about lines that are not yet deleted and couple more; I'll try to answer them here.

Existing lines are reported too, their age is the head's date minus date when they were inserted. When you got a living line and stored its age somewhere, then you will have to update its age on next iteration. That's why I have oldId and newId: newId in the head from previous iteration matches the oldId in the tail in a next iteration.

codeLines.add(cl)
idx++
}
newLines.subList(insStart, insEnd).clear()
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, at this point, it's too many comments inline for me to understand what the code does :). I will wait for another iteration, and then comment more.

}
while (parentCommit != null && parentCommit != tail)

// If tail revision is specified then we have file code lines unclaimed,
Copy link
Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how do we treat lines that are still alive in the head revision?


/**
* Scans through the repo for alive and deleted code lines, and stores them
* in the |codeLines| list.
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use square brackets to link to property.


// File was deleted, put its lines into the files map.
if (diff.changeType == DiffEntry.ChangeType.DELETE) {
var lines = getFileLines(oldId, oldPath, commit)!!
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you prefer to throw NPE here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't happen because we filter out binary files above so I use !! to terminate the program since something bad and unpredictable happening. Is NPE more suitable for this kind of things?

commit: RevCommit) : ArrayList<RevCommitLine>? {
val fileLoader = repo.open(fileId)
if (RawText.isBinary(fileLoader.openStream())) {
return null
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe empty array is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels more complicated with me, null is sort of convenient way to report about error. I'm good with empty array as a result, however I would say the code will be less efficient. I don't think though it will be ever a bottleneck. So if you really prefer your way, then I can do it, otherwise I would keep it as is.

Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to say that ideology of a Kotlin is to get rid of a null references so I don't think that using null where it can be empty list is a good practice at least stdlib don't do it.

Kotlin's type system is aimed at eliminating the danger of null references from code, also known as the The Billion Dollar Mistake. Null Safety

sergey48k reacted with thumbs up emoji
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you double checking for binary format, in a mean of function reusability I think. If you want to do so isn't it better to throw exception IllegalStateException("Binary file format") it will terminate execution, it's self-explanatory and doesn't affect other code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kotlin indeed eliminated the danger of null references, and there's no way I could introduce it myself writing in kotlin :) the worst thing can happen is a program termination if I do !!

I can go with empty arrays, it's rather a matter of style, but it's ok.
I like the self explaining exceptions too, will use it

Copy link
Member

@astansler astansler Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you mean by shouldn't fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant no exception. At first we collect the lines, and all binary files are ignored (no exception), then we reuse the function to collect the lines for deleted files, and no binary files is expected here, because we filtered them above, so if we run into a binary file at this point, then threre's something wrong with code above, that's why !! or exceptions.

I could inline the body of the helper function, if you like that more, at least that would address your concerns about null and !!. It might not look less nicer, but I can live with that.

Copy link
Member

@astansler astansler Aug 23, 2017
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically what you do:

fun getFileLines(fileId, filePath, commit) : ArrayList? {
 if (RawText.isBinary(fileLoader.openStream())) {
 return null
 }
}
val lines = getFileLines(fileId, filePath, commit)
if (lines == null) { // !!
 throw NullPointerException()
}

What you can do:

fun getFileLines(fileId, filePath, commit) : ArrayList {
 if (RawText.isBinary(fileLoader.openStream())) {
 throw IllegalStateException()
 }
}
val lines = getFileLines(fileId, filePath, commit) // Lines never null.

In that way when calling helper function there is no need to deal with nulls and exceptions anymore.

Copy link
Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after carefully reading this debate, I'd like to break the tie. I agree with @anatolykislov for the following reasons: (i) it's better to have a descriptive exception, and (ii) any function needs to be written defensively and not assume the caller did a good job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm told twice that I'm wrong that I has to be wrong, let me defend myself though :).

I'm not sure how I break (ii) since Kotlin takes care of it on a compiling stage, and not sure why we have to have an exception in the first place (i). I think the concept of exceptions is not quite suitable for this case: exceptions is a good way to say something wrong happens, mostly to react to unexpected (bad) input. If you need to catch a problem in your own code, then assertions concept is used. The assertions are seen in debug build only and totally ignored in release builds, and thus they have zero impact (cost) for release builds. Assertions unlike exceptions are used for something that should never happen, to debug your app. I believe that this case suits better for assertions than exceptions, regardless this concept is present in Kotlin or not.

So getFileLines is called twice: a) when the files map is created and b) when deleted files are processed. I'd say assertions is more appropriate for the latter case, and no exceptions is for the first case, otherwise we end up with try/catch constructions, which is somewhat ugly, because binary files is a normal thing for a common repo. You don't suggest to use exceptions mechanism as a return value, like hey here's is a binary file, and thus no lines for you, right?

Anyway, let me unwrap body of this function right into code, those few lines are not worth such long debates and probably don't even deserve its own function, at least it will be less ambiguous and not so concernful.

* identify a line and update its lifetime computed at the previous
* iteration.
*/
val oldId: String = ""
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we don't send lines to server oldId and newId is hashes of commits in which they appeared and deleted?
Do we want to send lines to server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the server should keep (hash, age) pairs, which are used for age calculation for sure and to update age of living lines on a subsequent run. The server may but doesn't have to keep the records of deleted lines.

Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of storing them on the server? We could use local storage.

Copy link
Member

@sergey48k sergey48k Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we should avoid sending anything to server except for stuff we can't do without. we don't need some blogger saying to everyone that sourcerer uploads per-line information of your proprietary code. if it can be done locally, let's do it locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how commit info is different from line info when uploaded from a local machine in terms of blogger's speculations, but I guess lines can be stored and processed locally if we can set up a local DB for it.

Copy link
Member

@sergey48k sergey48k Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no line info uploads to server. Local DB is fine assuming it does not introduce new dependencies to the app. A plain text file is probably the best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok, however it's out of scope of this pull request, which was about alg to compute lines age. So, what's left before we can call it done? I assume, revert Main.kt changes, remove testing func, anything else? Or do you want to extend the scope to support lines age storage?

Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case, it's not a wanted change, right? :)

*/
class CodeLongevity(repoPath: String, tailRev: String) {
val repo = FileRepository(repoPath)
val head: RevCommit = RevWalk(repo).parseCommit(repo.resolve("HEAD"));
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be master branch. If no master presented then no computation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have on top of your head how to get a master branch? getOriginHead apparently doesn't work for local repos.

Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo.resolve("refs/heads/master")


// A step back in commits history. Update the files map according
// to the diff.
val diffs = df.scan(parentCommit, commit)
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scan works if parentCommit is null? Probably precondition while would be more readable and no additional test would be required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the way to obtain a first commit changes / could you give an example?

Copy link
Member

@astansler astansler Aug 18, 2017
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

var commit: RevCommit? = revWalk.next()
while (commit != null && commit != tail) {
 var parentCommit: RevCommit? = revWalk.next()
 ...
 commit = parentCommit
}

var insStart = edit.getBeginB()
var insEnd = edit.getEndB()
val insCount = edit.getLengthB()
println("del ($delStart, $delEnd), ins ($insStart, $insEnd)");
Copy link
Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolon here and below.

Copy link
Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this actually looks very good. But I found a couple of things I'd like to see changed, and I would like the problem with the exception in getFileLines to be resolved.

* identify a line and update its lifetime computed at the previous
* iteration.
*/
val oldId: String = ""
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it now.

/**
* A pretty print of a code line; debugging.
*/
fun printme() {
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you could make toString() which would make it more standard. I guess.

revWalk.markStart(head)

var commit: RevCommit? = revWalk.next() // move the walker to the head
while (commit != null && commit != tail) {
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just while (commit != tail) will work just as well.

astansler reacted with thumbs up emoji
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tail it not empty and not a tail (just a random revision), then we have to crash, no?

Copy link
Member

@astansler astansler Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asurkov yes, you right, keep it then

while (commit != null && commit != tail) {
var parentCommit: RevCommit? = revWalk.next()

println("commit: ${commit!!.getName()}; '${commit.getShortMessage()}'")
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably prints will become logs or will go away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick hint for me: what goes into logs?


// Skip binary files.
var fileId =
if (newPath != DiffEntry.DEV_NULL) newId else oldId
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this fit one line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you know that?! it's exactly 80 chars

files.put(oldPath, lines)
}

val path = if (newPath != DiffEntry.DEV_NULL) newPath else oldPath
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here. It's not entirely clear what DEV_NULL is.

if (delCount > 0) {
var tmpLines = ArrayList<RevCommitLine>(delCount)
var idx = delStart
while (idx < delEnd) {
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while loop here but for loop on line 214? I sense inconsistency!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do you find those!

lines.addAll(delStart, tmpLines)
}

// Insertion case. Report it.
Copy link
Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Track it" maybe?

commit: RevCommit) : ArrayList<RevCommitLine>? {
val fileLoader = repo.open(fileId)
if (RawText.isBinary(fileLoader.openStream())) {
return null
Copy link
Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after carefully reading this debate, I'd like to break the tie. I agree with @anatolykislov for the following reasons: (i) it's better to have a descriptive exception, and (ii) any function needs to be written defensively and not assume the caller did a good job.

/**
* Returns lines array of a text file of the given revision.
*/
private fun getFileLines(fileId: ObjectId, filePath: String,
Copy link
Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename this? Usually 'get' accompanies functions that are 'light' in computing. This does a lot, so maybe readFileLines? or collectFileLines?

Copy link
Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if (diff.changeType == DiffEntry.ChangeType.DELETE) {
// Text files only at this point, throw if not.
var lines = getFileLines(oldId, oldPath, commit)!!
val fileLoader = repo.open(oldId)
Copy link
Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually liked it better in a separate function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a positive side, there's not double binary check anymore

continue
}
line.printme()
println(line.toString())
Copy link
Member

@astansler astansler Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In prod logger should be used for debug output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove those on integration stage I think

@asurkov asurkov merged commit e3ca0f0 into master Aug 24, 2017
@asurkov asurkov deleted the clnext branch August 24, 2017 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@sergey48k sergey48k sergey48k approved these changes

@astansler astansler astansler approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /