-
Notifications
You must be signed in to change notification settings - Fork 275
prototype code lifetime #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the first pass. I don't believe I quite understand the algorithm, but I think it can be done simpler. I'll get back with more feedback after a new revision is uploaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
author
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, I'll file a follow up email on it
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
the idea is that we can grep for TODO to find nuggets like that, even though it's not 100% actionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are 'previous iterations'?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should go to separate unit tests, and they should be based on JUnit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a testing suit :) check out the pull request description, I left a more generic question about it
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this TODO needs to be addressed before this goes to production.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the patch cannot be landed at master before it's addressed?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these oldId and newId being used at all. What are they for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since our goal is to compute a distribution of line ages per author, I don't believe you need this class at all. Your algo only adds entries to it, and your test code iterates through all CodeLine instances, without regard to from/to, etc. You could simply keep track of author->array of line ages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a bunch of related comments: this one, about oldId/newId one, probably the one about the alg is more complicated than it should be and what I'm doing about lines that are not yet deleted and couple more; I'll try to answer them here.
Existing lines are reported too, their age is the head's date minus date when they were inserted. When you got a living line and stored its age somewhere, then you will have to update its age on next iteration. That's why I have oldId and newId: newId in the head from previous iteration matches the oldId in the tail in a next iteration.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, at this point, it's too many comments inline for me to understand what the code does :). I will wait for another iteration, and then comment more.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how do we treat lines that are still alive in the head revision?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use square brackets to link to property.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you prefer to throw NPE here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That shouldn't happen because we filter out binary files above so I use !! to terminate the program since something bad and unpredictable happening. Is NPE more suitable for this kind of things?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe empty array is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels more complicated with me, null is sort of convenient way to report about error. I'm good with empty array as a result, however I would say the code will be less efficient. I don't think though it will be ever a bottleneck. So if you really prefer your way, then I can do it, otherwise I would keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to say that ideology of a Kotlin is to get rid of a null references so I don't think that using null where it can be empty list is a good practice at least stdlib don't do it.
Kotlin's type system is aimed at eliminating the danger of null references from code, also known as the The Billion Dollar Mistake. Null Safety
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you double checking for binary format, in a mean of function reusability I think. If you want to do so isn't it better to throw exception IllegalStateException("Binary file format") it will terminate execution, it's self-explanatory and doesn't affect other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kotlin indeed eliminated the danger of null references, and there's no way I could introduce it myself writing in kotlin :) the worst thing can happen is a program termination if I do !!
I can go with empty arrays, it's rather a matter of style, but it's ok.
I like the self explaining exceptions too, will use it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you mean by shouldn't fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant no exception. At first we collect the lines, and all binary files are ignored (no exception), then we reuse the function to collect the lines for deleted files, and no binary files is expected here, because we filtered them above, so if we run into a binary file at this point, then threre's something wrong with code above, that's why !! or exceptions.
I could inline the body of the helper function, if you like that more, at least that would address your concerns about null and !!. It might not look less nicer, but I can live with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically what you do:
fun getFileLines(fileId, filePath, commit) : ArrayList? {
if (RawText.isBinary(fileLoader.openStream())) {
return null
}
}
val lines = getFileLines(fileId, filePath, commit)
if (lines == null) { // !!
throw NullPointerException()
}
What you can do:
fun getFileLines(fileId, filePath, commit) : ArrayList {
if (RawText.isBinary(fileLoader.openStream())) {
throw IllegalStateException()
}
}
val lines = getFileLines(fileId, filePath, commit) // Lines never null.
In that way when calling helper function there is no need to deal with nulls and exceptions anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, after carefully reading this debate, I'd like to break the tie. I agree with @anatolykislov for the following reasons: (i) it's better to have a descriptive exception, and (ii) any function needs to be written defensively and not assume the caller did a good job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm told twice that I'm wrong that I has to be wrong, let me defend myself though :).
I'm not sure how I break (ii) since Kotlin takes care of it on a compiling stage, and not sure why we have to have an exception in the first place (i). I think the concept of exceptions is not quite suitable for this case: exceptions is a good way to say something wrong happens, mostly to react to unexpected (bad) input. If you need to catch a problem in your own code, then assertions concept is used. The assertions are seen in debug build only and totally ignored in release builds, and thus they have zero impact (cost) for release builds. Assertions unlike exceptions are used for something that should never happen, to debug your app. I believe that this case suits better for assertions than exceptions, regardless this concept is present in Kotlin or not.
So getFileLines is called twice: a) when the files map is created and b) when deleted files are processed. I'd say assertions is more appropriate for the latter case, and no exceptions is for the first case, otherwise we end up with try/catch constructions, which is somewhat ugly, because binary files is a normal thing for a common repo. You don't suggest to use exceptions mechanism as a return value, like hey here's is a binary file, and thus no lines for you, right?
Anyway, let me unwrap body of this function right into code, those few lines are not worth such long debates and probably don't even deserve its own function, at least it will be less ambiguous and not so concernful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we don't send lines to server oldId and newId is hashes of commits in which they appeared and deleted?
Do we want to send lines to server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the server should keep (hash, age) pairs, which are used for age calculation for sure and to update age of living lines on a subsequent run. The server may but doesn't have to keep the records of deleted lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of storing them on the server? We could use local storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, we should avoid sending anything to server except for stuff we can't do without. we don't need some blogger saying to everyone that sourcerer uploads per-line information of your proprietary code. if it can be done locally, let's do it locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see how commit info is different from line info when uploaded from a local machine in terms of blogger's speculations, but I guess lines can be stored and processed locally if we can set up a local DB for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please no line info uploads to server. Local DB is fine assuming it does not introduce new dependencies to the app. A plain text file is probably the best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok, however it's out of scope of this pull request, which was about alg to compute lines age. So, what's left before we can call it done? I assume, revert Main.kt changes, remove testing func, anything else? Or do you want to extend the scope to support lines age storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just in case, it's not a wanted change, right? :)
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be master branch. If no master presented then no computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have on top of your head how to get a master branch? getOriginHead apparently doesn't work for local repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repo.resolve("refs/heads/master")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is scan works if parentCommit is null? Probably precondition while would be more readable and no additional test would be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's the way to obtain a first commit changes / could you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
var commit: RevCommit? = revWalk.next()
while (commit != null && commit != tail) {
var parentCommit: RevCommit? = revWalk.next()
...
commit = parentCommit
}
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semicolon here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this actually looks very good. But I found a couple of things I'd like to see changed, and I would like the problem with the exception in getFileLines to be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading it now.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, you could make toString() which would make it more standard. I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably just while (commit != tail) will work just as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if tail it not empty and not a tail (just a random revision), then we have to crash, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asurkov yes, you right, keep it then
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably prints will become logs or will go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a quick hint for me: what goes into logs?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this fit one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you know that?! it's exactly 80 chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here. It's not entirely clear what DEV_NULL is.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while loop here but for loop on line 214? I sense inconsistency!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do you find those!
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Track it" maybe?
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, after carefully reading this debate, I'd like to break the tie. I agree with @anatolykislov for the following reasons: (i) it's better to have a descriptive exception, and (ii) any function needs to be written defensively and not assume the caller did a good job.
src/main/kotlin/app/CodeLongevity.kt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rename this? Usually 'get' accompanies functions that are 'light' in computing. This does a lot, so maybe readFileLines? or collectFileLines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually liked it better in a separate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a positive side, there's not double binary check anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In prod logger should be used for debug output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove those on integration stage I think
Uh oh!
There was an error while loading. Please reload this page.
here's a new version. not yet finished (I still need to sort out a rename thing), but I think it's ready for more feedback: code style and alg comments are welcome. I'm also not yet clear what exactly we want to land. I guess all those prints/test methods should go away or do we want to keep them until the code integrated and possibly covered by unit testing?
I didn't tried it on something heavy yet, so we might come with something more memory thrifty, for example, replace those line huge arrays on continues line chucks, if we find out it's a problem.