Search code, repositories, users, issues, pull requests...

Copy link

Contributor

@asurkov asurkov commented Aug 17, 2017 •

edited

Loading

here's a new version. not yet finished (I still need to sort out a rename thing), but I think it's ready for more feedback: code style and alg comments are welcome. I'm also not yet clear what exactly we want to land. I guess all those prints/test methods should go away or do we want to keep them until the code integrated and possibly covered by unit testing?

I didn't tried it on something heavy yet, so we might come with something more memory thrifty, for example, replace those line huge arrays on continues line chucks, if we find out it's a problem.


 prototype code lifetime

47bbf5f

@asurkov asurkov requested review from astansler and sergey48k

August 17, 2017 02:27

sergey48k

sergey48k requested changes

Aug 17, 2017

Copy link

Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the first pass. I don't believe I quite understand the algorithm, but I think it can be done simpler. I'll get back with more feedback after a new revision is uploaded.

@@ -0,0 +1,231 @@

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

author

Copy link

Contributor Author

@asurkov asurkov Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np, I'll file a follow up email on it

src/main/kotlin/app/CodeLongevity.kt Outdated

class CodeLine(val from: RevCommitLine, val to: RevCommitLine, val text: String) {

// XXX(Alex): oldId and newId may be computed as a hash built from commit,

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO
the idea is that we can grep for TODO to find nuggets like that, even though it's not 100% actionable.

Copy link

Contributor Author

@asurkov asurkov Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/main/kotlin/app/CodeLongevity.kt Outdated

/**

* Id of the code line in a revision when the line was added. Used to

* update the line's lifetime computed in previous iterations.

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are 'previous iterations'?

src/main/kotlin/app/CodeLongevity.kt Outdated

compute()

}

fun test(email: String) {

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should go to separate unit tests, and they should be based on JUnit.

Copy link

Contributor Author

@asurkov asurkov Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a testing suit :) check out the pull request description, I left a more generic question about it

src/main/kotlin/app/CodeLongevity.kt Outdated

// Build a map of file names and their code lines.

while (treeWalk.next()) {

// TODO(alex): skip binary files.

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this TODO needs to be addressed before this goes to production.

Copy link

Contributor Author

@asurkov asurkov Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the patch cannot be landed at master before it's addressed?

src/main/kotlin/app/CodeLongevity.kt Outdated

val insCount = edit.getLengthB()

println("del ($delStart, $delEnd), ins ($insStart, $insEnd)");

// Deletion case. Chaise down the deleted lines through the

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chase

* update the line's lifetime computed in previous iterations.

val oldId: String = ""

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these oldId and newId being used at all. What are they for?

* TODO(Alex): the text arg is solely for testing proposes (remove it)

class CodeLine(val from: RevCommitLine, val to: RevCommitLine, val text: String) {

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since our goal is to compute a distribution of line ages per author, I don't believe you need this class at all. Your algo only adds entries to it, and your test code iterates through all CodeLine instances, without regard to from/to, etc. You could simply keep track of author->array of line ages.

Copy link

Contributor Author

@asurkov asurkov Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a bunch of related comments: this one, about oldId/newId one, probably the one about the alg is more complicated than it should be and what I'm doing about lines that are not yet deleted and couple more; I'll try to answer them here.

Existing lines are reported too, their age is the head's date minus date when they were inserted. When you got a living line and stored its age somewhere, then you will have to update its age on next iteration. That's why I have oldId and newId: newId in the head from previous iteration matches the oldId in the tail in a next iteration.

src/main/kotlin/app/CodeLongevity.kt Outdated

codeLines.add(cl)

idx++

}

newLines.subList(insStart, insEnd).clear()

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, at this point, it's too many comments inline for me to understand what the code does :). I will wait for another iteration, and then comment more.

src/main/kotlin/app/CodeLongevity.kt Outdated

}

while (parentCommit != null && parentCommit != tail)

// If tail revision is specified then we have file code lines unclaimed,

Copy link

Member

@sergey48k sergey48k Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how do we treat lines that are still alive in the head revision?


 support files deletion and renaming

c52048a

astansler

astansler requested changes

Aug 18, 2017

src/main/kotlin/app/CodeLongevity.kt Outdated

/**

* Scans through the repo for alive and deleted code lines, and stores them

* in the |codeLines| list.

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use square brackets to link to property.

src/main/kotlin/app/CodeLongevity.kt Outdated

// File was deleted, put its lines into the files map.

if (diff.changeType == DiffEntry.ChangeType.DELETE) {

var lines = getFileLines(oldId, oldPath, commit)!!

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you prefer to throw NPE here?

Copy link

Contributor Author

@asurkov asurkov Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't happen because we filter out binary files above so I use !! to terminate the program since something bad and unpredictable happening. Is NPE more suitable for this kind of things?

src/main/kotlin/app/CodeLongevity.kt Outdated

commit: RevCommit) : ArrayList<RevCommitLine>? {

val fileLoader = repo.open(fileId)

if (RawText.isBinary(fileLoader.openStream())) {

return null

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe empty array is better?

Copy link

Contributor Author

@asurkov asurkov Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels more complicated with me, null is sort of convenient way to report about error. I'm good with empty array as a result, however I would say the code will be less efficient. I don't think though it will be ever a bottleneck. So if you really prefer your way, then I can do it, otherwise I would keep it as is.

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to say that ideology of a Kotlin is to get rid of a null references so I don't think that using null where it can be empty list is a good practice at least stdlib don't do it.

Kotlin's type system is aimed at eliminating the danger of null references from code, also known as the The Billion Dollar Mistake. Null Safety

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you double checking for binary format, in a mean of function reusability I think. If you want to do so isn't it better to throw exception IllegalStateException("Binary file format") it will terminate execution, it's self-explanatory and doesn't affect other code.

Copy link

Contributor Author

@asurkov asurkov Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kotlin indeed eliminated the danger of null references, and there's no way I could introduce it myself writing in kotlin :) the worst thing can happen is a program termination if I do !!

I can go with empty arrays, it's rather a matter of style, but it's ok.
I like the self explaining exceptions too, will use it

Copy link

Member

@astansler astansler Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you mean by shouldn't fail?

Copy link

Contributor Author

@asurkov asurkov Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant no exception. At first we collect the lines, and all binary files are ignored (no exception), then we reuse the function to collect the lines for deleted files, and no binary files is expected here, because we filtered them above, so if we run into a binary file at this point, then threre's something wrong with code above, that's why !! or exceptions.

I could inline the body of the helper function, if you like that more, at least that would address your concerns about null and !!. It might not look less nicer, but I can live with that.

Copy link

Member

@astansler astansler Aug 23, 2017 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically what you do:

fun getFileLines(fileId, filePath, commit) : ArrayList? {
 if (RawText.isBinary(fileLoader.openStream())) {
 return null
 }
}
val lines = getFileLines(fileId, filePath, commit)
if (lines == null) { // !!
 throw NullPointerException()
}

What you can do:

fun getFileLines(fileId, filePath, commit) : ArrayList {
 if (RawText.isBinary(fileLoader.openStream())) {
 throw IllegalStateException()
 }
}
val lines = getFileLines(fileId, filePath, commit) // Lines never null.

In that way when calling helper function there is no need to deal with nulls and exceptions anymore.

Copy link

Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after carefully reading this debate, I'd like to break the tie. I agree with @anatolykislov for the following reasons: (i) it's better to have a descriptive exception, and (ii) any function needs to be written defensively and not assume the caller did a good job.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm told twice that I'm wrong that I has to be wrong, let me defend myself though :).

I'm not sure how I break (ii) since Kotlin takes care of it on a compiling stage, and not sure why we have to have an exception in the first place (i). I think the concept of exceptions is not quite suitable for this case: exceptions is a good way to say something wrong happens, mostly to react to unexpected (bad) input. If you need to catch a problem in your own code, then assertions concept is used. The assertions are seen in debug build only and totally ignored in release builds, and thus they have zero impact (cost) for release builds. Assertions unlike exceptions are used for something that should never happen, to debug your app. I believe that this case suits better for assertions than exceptions, regardless this concept is present in Kotlin or not.

So getFileLines is called twice: a) when the files map is created and b) when deleted files are processed. I'd say assertions is more appropriate for the latter case, and no exceptions is for the first case, otherwise we end up with try/catch constructions, which is somewhat ugly, because binary files is a normal thing for a common repo. You don't suggest to use exceptions mechanism as a return value, like hey here's is a binary file, and thus no lines for you, right?

Anyway, let me unwrap body of this function right into code, those few lines are not worth such long debates and probably don't even deserve its own function, at least it will be less ambiguous and not so concernful.

* identify a line and update its lifetime computed at the previous

* iteration.

val oldId: String = ""

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we don't send lines to server oldId and newId is hashes of commits in which they appeared and deleted?
Do we want to send lines to server?

Copy link

Contributor Author

@asurkov asurkov Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the server should keep (hash, age) pairs, which are used for age calculation for sure and to update age of living lines on a subsequent run. The server may but doesn't have to keep the records of deleted lines.

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of storing them on the server? We could use local storage.

Copy link

Member

@sergey48k sergey48k Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we should avoid sending anything to server except for stuff we can't do without. we don't need some blogger saying to everyone that sourcerer uploads per-line information of your proprietary code. if it can be done locally, let's do it locally.

Copy link

Contributor Author

@asurkov asurkov Aug 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how commit info is different from line info when uploaded from a local machine in terms of blogger's speculations, but I guess lines can be stored and processed locally if we can set up a local DB for it.

Copy link

Member

@sergey48k sergey48k Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no line info uploads to server. Local DB is fine assuming it does not introduce new dependencies to the app. A plain text file is probably the best.

Copy link

Contributor Author

@asurkov asurkov Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok, however it's out of scope of this pull request, which was about alg to compute lines age. So, what's left before we can call it done? I assume, revert Main.kt changes, remove testing func, anything else? Or do you want to extend the scope to support lines age storage?

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it now.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case, it's not a wanted change, right? :)

src/main/kotlin/app/CodeLongevity.kt Outdated

class CodeLongevity(repoPath: String, tailRev: String) {

val repo = FileRepository(repoPath)

val head: RevCommit = RevWalk(repo).parseCommit(repo.resolve("HEAD"));

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be master branch. If no master presented then no computation.

Copy link

Contributor Author

@asurkov asurkov Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have on top of your head how to get a master branch? getOriginHead apparently doesn't work for local repos.

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repo.resolve("refs/heads/master")

// A step back in commits history. Update the files map according

// to the diff.

val diffs = df.scan(parentCommit, commit)

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is scan works if parentCommit is null? Probably precondition while would be more readable and no additional test would be required.

Copy link

Contributor Author

@asurkov asurkov Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the way to obtain a first commit changes / could you give an example?

Copy link

Member

@astansler astansler Aug 18, 2017 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

var commit: RevCommit? = revWalk.next()
while (commit != null && commit != tail) {
 var parentCommit: RevCommit? = revWalk.next()
 ...
 commit = parentCommit
}

src/main/kotlin/app/CodeLongevity.kt Outdated

var insStart = edit.getBeginB()

var insEnd = edit.getEndB()

val insCount = edit.getLengthB()

println("del ($delStart, $delEnd), ins ($insStart, $insEnd)");

Copy link

Member

@astansler astansler Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolon here and below.


 address reviewer's comments

c461a24

sergey48k

sergey48k requested changes

Copy link

Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this actually looks very good. But I found a couple of things I'd like to see changed, and I would like the problem with the exception in getFileLines to be resolved.

* identify a line and update its lifetime computed at the previous

* iteration.

val oldId: String = ""

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading it now.

src/main/kotlin/app/CodeLongevity.kt Outdated

/**

* A pretty print of a code line; debugging.

fun printme() {

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you could make toString() which would make it more standard. I guess.

revWalk.markStart(head)

var commit: RevCommit? = revWalk.next() // move the walker to the head

while (commit != null && commit != tail) {

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably just while (commit != tail) will work just as well.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tail it not empty and not a tail (just a random revision), then we have to crash, no?

Copy link

Member

@astansler astansler Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asurkov yes, you right, keep it then

src/main/kotlin/app/CodeLongevity.kt Outdated

while (commit != null && commit != tail) {

var parentCommit: RevCommit? = revWalk.next()

println("commit: ${commit!!.getName()}; '${commit.getShortMessage()}'")

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably prints will become logs or will go away.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick hint for me: what goes into logs?

src/main/kotlin/app/CodeLongevity.kt Outdated

// Skip binary files.

var fileId =

if (newPath != DiffEntry.DEV_NULL) newId else oldId

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this fit one line?

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you know that?! it's exactly 80 chars

files.put(oldPath, lines)

}

val path = if (newPath != DiffEntry.DEV_NULL) newPath else oldPath

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment here. It's not entirely clear what DEV_NULL is.

src/main/kotlin/app/CodeLongevity.kt Outdated

if (delCount > 0) {

var tmpLines = ArrayList<RevCommitLine>(delCount)

var idx = delStart

while (idx < delEnd) {

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while loop here but for loop on line 214? I sense inconsistency!

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do you find those!

src/main/kotlin/app/CodeLongevity.kt Outdated

lines.addAll(delStart, tmpLines)

}

// Insertion case. Report it.

Copy link

Member

@sergey48k sergey48k Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Track it" maybe?

src/main/kotlin/app/CodeLongevity.kt Outdated

commit: RevCommit) : ArrayList<RevCommitLine>? {

val fileLoader = repo.open(fileId)

if (RawText.isBinary(fileLoader.openStream())) {

return null

Copy link

Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/main/kotlin/app/CodeLongevity.kt Outdated

/**

* Returns lines array of a text file of the given revision.

private fun getFileLines(fileId: ObjectId, filePath: String,

Copy link

Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rename this? Usually 'get' accompanies functions that are 'light' in computing. This does a lot, so maybe readFileLines? or collectFileLines?


 address more comments

d6ddc17

sergey48k

sergey48k approved these changes

Copy link

Member

@sergey48k sergey48k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if (diff.changeType == DiffEntry.ChangeType.DELETE) {

// Text files only at this point, throw if not.

var lines = getFileLines(oldId, oldPath, commit)!!

val fileLoader = repo.open(oldId)

Copy link

Member

@sergey48k sergey48k Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually liked it better in a separate function.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a positive side, there's not double binary check anymore

astansler

astansler approved these changes

continue

}

line.printme()

println(line.toString())

Copy link

Member

@astansler astansler Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In prod logger should be used for debug output.

Copy link

Contributor Author

@asurkov asurkov Aug 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove those on integration stage I think

asurkov added 2 commits

August 24, 2017 19:34


 rollback Main.kt changes

1943bd3


 rollback Main.kt changes, part2

@asurkov asurkov merged commit e3ca0f0 into master