The code below will go through an excel spread sheet taken from kaggle, is named "SteamGames (71k games)" and the creator is "MEXWELL".
inside the for
loop I take only the required information, create a Game object which is very straight forward with only gets
, sets
and a compareTo
function for sorting later also a score
variable that I can assign and sort by.
I create ArrayList for the list of all games and one for only the name. This is important because I realized that there are duplicates inside the spreadsheet I downloaded so I have to check to see if the the game I’m adding is already on the list or not. This causes a bit of performance issue as it takes about 5 seconds on my machine to run the code but I don't see a better solution aside from finding another dataset.
as for my second part this is where I choose games that will get a score and then assign it, I included games that cost at least five dollars and have at least 20 reviews in total. Then I assign the score based on positive negative reviews and the price the higher amount of positive reviews will result in higher score while higher price and negative review count will result in lower score.
Also important to note that some games have some missing information hence the try
and catch
and the counter of "corrupted" games. In total there were about 250 out of 71k corrupted games.
Then I just sorted using a compareTo function implemented in the Game class chose the number of games that will get written into the txt file, got it written and thats it.
I deleted the titles from the excel sheet to make it easier go through as I knew in which places the names prices and reviews were.
If you have any feedback I would appreciate it, I’m not too disappointed with the performance issue but if you have some insight feel free to share and maybe where to go next with this project
import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Main{
public static void main(String[] args) throws Exception{
//first part
String line = "";
String splitBy = ",";
List<String[]> games=new ArrayList<String[]>();
try {
BufferedReader br = new BufferedReader(new FileReader("games.csv"));
while ((line = br.readLine()) != null) {
String[] game = line.split(splitBy); // use comma as separator
games.add(game);
}
List<String> listOfNames = new ArrayList<String>();
listOfNames.add(games.get(0)[1]);
List<Game> bestGames = new ArrayList<Game>();
for (String[] game : games) {
String name = game[1];
String price = game[7];
String positive = game[23];
String negative = game[24];
Game newGame = new Game(name, price, positive, negative);
if(listOfNames.contains(name) && !name.equals(games.get(0)[1])){
continue;
}
listOfNames.add(name);
bestGames.add(newGame);
}
//second part
int numberOfFails = 0;
for (int i = 1; i < bestGames.size(); i++) {
try {
double price = Double.parseDouble(bestGames.get(i).getPrice());
int positive = Integer.parseInt(bestGames.get(i).getPositive());
int negative = Integer.parseInt(bestGames.get(i).getNegative());
if(price >= 5.0 && (positive + negative >= 20)){
double score = (positive)/(price + negative);
bestGames.get(i).setScore(score);
}
} catch (Exception e) {
numberOfFails++;
}
}
Collections.sort(bestGames);
File bestGamesFile = new File("C:\\Users\\guybo\\Java_Data_Projects\\Main.java\\bestGames.txt");
FileWriter myWriter = new FileWriter("bestGames.txt");
int numberOfBestGames = 50;
for(int i = 0; i < numberOfBestGames; i++){
String name = bestGames.get(i).getName();
String price = bestGames.get(i).getPrice();
myWriter.write(name + "\n" + price + "$" + "\n" + "\n");
}
myWriter.close();
} catch (Exception e) {
e.getMessage();
}
}
}
public class Game implements Comparable<Game>{
private String name;
private String price;
private String positive;
private String negative;
private double score;
public Game(String name, String price, String positive, String negative){
this.name = name;
this.price = price;
this.positive = positive;
this.negative = negative;
this.score = 0;
}
public double getScore() {
return score;
}
public void setScore(double score) {
this.score = score;
}
public String getName() {
return name;
}
public String getNegative() {
return negative;
}
public String getPositive() {
return positive;
}
public String getPrice() {
return price;
}
public void setName(String name) {
this.name = name;
}
public void setNegative(String negative) {
this.negative = negative;
}
public void setPositive(String positive) {
this.positive = positive;
}
public void setPrice(String price) {
this.price = price;
}
@Override
public int compareTo(Game game) {
return -Double.compare(this.score, game.getScore());
}
}
-
1\$\begingroup\$ Nice idea! Frame challenge: Rather than factoring price into the score, you could approach this as a classic knapsack problem: What is the highest total score of games you can purchase within a given budget? \$\endgroup\$Davislor– Davislor2023年07月03日 19:30:56 +00:00Commented Jul 3, 2023 at 19:30
-
\$\begingroup\$ @Davislor cool idea i would definetly do that! \$\endgroup\$Ellie– Ellie2023年07月04日 16:38:18 +00:00Commented Jul 4, 2023 at 16:38
4 Answers 4
First you should split your main method into several shorter ones to clearly identify the steps of your algorithm. By naming them correctly your code will be easier to understand and to maintain.
Second you should effectively use constants instead of magic numbers to identify the columns in your CSV. As a side note, if your project become more complex, you should use a library to load your CSV file instead of doing it manually to handle escaping (columns containing the "," character for example...)
You could simplify your code by using the features offered by Java natively : For example, you can read a file in one line: List<String> lines = Files.readAllLines(Paths.get(filename));
, same for writing the result to a file.
You could use java streams.
I think your List<String> listOfNames
is not necessary and should be removed.
The score calculation should be delegate to the Game
class.
You could make the properties of Game
final and remove setters.
I prefer not to make my classes Comparable so that the sort should be externalized. (For example if you display the results in a table, the end user will click on column names to sort the results): Collections.sort(games, Comparator.comparing(Game::getScore).reversed());
This is a proposition of what could be done:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
public class Main {
public static final String CSV_SEPARATOR = ",";
private static final int NAME_INDEX = 1;
private static final int PRICE_INDEX = 7;
private static final int POSITIVE_INDEX = 23;
private static final int NEGATIVE_INDEX = 24;
public static void main(String[] args) throws Exception {
// Read the games from file
List<Game> games = readGamesFromFile("games.csv");
// Sort the games by score in descending order
Collections.sort(games, Comparator.comparing(Game::getScore).reversed());
writeBestGamesToFile("bestGames.txt", games);
}
private static List<Game> readGamesFromFile(String filename) throws IOException {
List<String> lines = Files.readAllLines(Paths.get(filename));
return lines.stream()
.map(line -> {
String[] columns = line.split(CSV_SEPARATOR);
return new Game(columns[NAME_INDEX], columns[PRICE_INDEX], columns[POSITIVE_INDEX], columns[NEGATIVE_INDEX]);
})
.toList();
}
private static void writeBestGamesToFile(String outputFilename, List<Game> games) throws IOException {
Files.write(Paths.get(outputFilename), games.stream().map(game -> game.getName() + " : " + game.getPrice() + "$").collect(Collectors.toList()));
}
}
public class Game {
private final String name;
private final String price;
private final String positive;
private final String negative;
public Game(String name, String price, String positive, String negative) {
this.name = name;
this.price = price;
this.positive = positive;
this.negative = negative;
}
public double getScore() {
try {
double price = Double.parseDouble(getPrice());
int positive = Integer.parseInt(getPositive());
int negative = Integer.parseInt(getNegative());
if (price >= 5.0 && (positive + negative >= 20)) {
return (positive) / (price + negative);
}
} catch (Exception e) {
}
return -1;
}
public String getName() {
return name;
}
public String getNegative() {
return negative;
}
public String getPositive() {
return positive;
}
public String getPrice() {
return price;
}
}
Structure your code properly (Single responsibility principle)
Don't pile all the logic into one method (like it can be observed in the main
).
Dissect it into several self-contained narrow-focused, easy to comprehend, easy to test parts.
Use Try-with-resources
Resources are might be leaked when they are not closed properly.
There's no attempt to close the reader in the code. And the writer wouldn't be closed in case if an exception occurs.
In order to ensure that IO-streams are closed in any scenario, you should either invoke close()
on every resource inside the finally
block, or use Java 7 try-with-resources statement. The latter option is way more convenient (you're not forced to declare resources outside the try
block, and there's no need in invoking close
manually).
Leverage NIO API to dial with files
Making use of methods featured by the Files
class will simplify your code.
You can implement IO-related operations in a more concise and declarative manner, also it might save you from some mistakes which might occur while messing with low-level logic (for instance, in your code you forget to wrap the FileWritter
with a BufferedWriter
which is always advisable).
Comparable vs Comparator
There's only one valid use case when a class should implement Comparable
interface: its instances should have a natural ordering (akin to the ordering of strings and integer numbers).
This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's
compareTo
method is referred to as its natural comparison method.
The implementation of compareTo
shown above looks rather as a hack than an implementation imposing a genuine natural ordering. Suppose that arranging Game
instances by score in ascending order might be required instead in the future, changing the behavior of compareTo
wouldn't be a prudent move. Hence, I'd suggest defining a Comparator
instead.
Favor Immutability
Use mutation with caution and only when needed.
Since there's no clear reason for changing Game
instance after it was constructed, it's better to make it immutable to begin with.
Use Sets to eliminate duplicates
When you need to get rid of duplicates, consider using a Set
.
In this case, we can utilize a HashSet
, but note that it requires to implement properly equals
and hasCode
in the Game class, otherwise duplicated instances would not be spotted. Also note that in on order to reap the performance benefits of using HashSet
, the hash function should produce hashes that would disperse the elements more or less evenly among the buckets.
In the code below, Stream.distinct()
makes use of a LinkedHashSet
under the hood.
List
is not a good choice of the data structure when you need to perform contains
check.
Proper descriptive names
.split(splitBy)
-> .split(DELIMITER)
(or SEPARATOR, etc.)
int positive
-> int positiveReviews
int negative
-> int negativeReviews
Diamond operator
Since Java 7 there's no need to repeat type argument on the right (unless you're using var
identifier on the left).
List<Game> bestGames = new ArrayList<Game>();
instead:
List<Game> bestGames = new ArrayList<>();
Or with Java 10 local variable type inference:
var bestGames = new ArrayList<Game>();
Proper Logging
Do not ignore abnormal behavior.
Choose a logging framework that fits your purposes and log the cases that might require investigating.
For instance, instead of:
catch (Exception e) {
e.getMessage(); // which is not doing anything usefull
}
write something along these lines:
catch (Exception e) {
log.warn("Failed to parse", line);
}
Refactored version
Firstly, here's the method to be invoked from the client code, which takes a path to the source CSV-file and a path to the destination file:
public static void processGames(Path source,
Path destination) throws IOException {
try (var lines = Files.lines(source)) {
var mostReviewed = mostReviewedGames(lines);
Files.write(destination, mostReviewed);
}
}
Files.lines
returns a stream of lines from the file (not an IO-stream, but java.util.stream.Stream
), so we can express the logic in a declarative way:
private static Iterable<String> mostReviewedGames(Stream<String> lines) {
return lines
.mapMulti(GameParser::parseLine)
.filter(game -> game.getScore() > 0)
.distinct() // to remove duplicates; NB: equals/hashCode contract should be properly implemented
.sorted(Game.BY_SCORE_DESC)
.map(Game::toString)
.toList();
}
A class encapsulating functionality for parsing Game
instances:
import static java.lang.Double.parseDouble;
import static java.lang.Integer.parseInt;
public class GameParser {
public static final String DELIMITER = ",";
private static final int NAME_COLUMN = 1;
private static final int PRICE_COLUMN = 7;
private static final int POSITIVE_REVIEWS_COLUMN = 23;
private static final int NEGATIVE_REVIEWS_COLUMN = 24;
private static void parseLine(String line, Consumer<Game> downstream) {
try {
String[] columns = line.split(DELIMITER);
var game = Main.Game.builder()
.name(columns[NAME_COLUMN])
.price(parseDouble(columns[PRICE_COLUMN]))
.positiveReviews(parseInt(columns[POSITIVE_REVIEWS_COLUMN]))
.negativeReviews(parseInt(columns[NEGATIVE_REVIEWS_COLUMN]))
.build();
downstream.accept(game);
} catch (Exception e) {
log.warn("Failed to parse", line); // leave useful information to investigate the problem
}
}
}
Game
class:
import static java.lang.System.lineSeparator;
import static java.util.Comparator.comparingDouble;
public class Game {
public static final Comparator<Game> BY_SCORE_DESC =
comparingDouble(Game::getScore).reversed();
private static final double MIN_PRICE = 5.0;
private static final int MIN_REVIEW = 20;
private final String name;
private final double price;
private final double score;
public Game(String name, double price, double score) {
this.name = name;
this.price = price;
this.score = score;
}
public static double score(double price, int positiveReviews, int negativeReviews) {
return price >= MIN_PRICE && positiveReviews + negativeReviews >= MIN_REVIEW
? positiveReviews / (price + negativeReviews)
: 0;
}
public static GameBuilder builder() {
return new GameBuilder();
}
public double getScore() {
return score;
}
public String getName() {
return name;
}
@Override
public boolean equals(Object o) {
return o instanceof Game other
&& Objects.equals(this.name, other.name);
}
@Override
public int hashCode() {
return Objects.hash(name);
}
@Override
public String toString() {
return name + lineSeparator() + price + "$" + lineSeparator();
}
public static class GameBuilder {
private String name;
private double price = -1;
private int positiveReviews = -1;
private int negativeReviews = -1;
private GameBuilder() {}
public GameBuilder name(String name) {
this.name = name;
return this;
}
public GameBuilder price(double price) {
this.price = price;
return this;
}
public GameBuilder positiveReviews(int positiveReviews) {
this.positiveReviews = positiveReviews;
return this;
}
public GameBuilder negativeReviews(int negativeReviews) {
this.negativeReviews = negativeReviews;
return this;
}
public Game build() {
if (name == null ||
price == -1 || positiveReviews == -1 || negativeReviews == -1) {
throw new IllegalStateException("Incomplete data");
}
return new Game(
name,
price,
score(price, positiveReviews, negativeReviews)
);
}
}
}
String name = game[1];
String price = game[7];
String positive = game[23];
String negative = game[24];
A better practice (I don't know if there is an even better practice than that in Java) would be to mark these indices in a named global variables, so you can access them in a more expressive way.
It might look something like that:
int NameIndex = 1;
int PriceIndex = 7;
/* ... */
String name = game[NameIndex]
String price = game[PriceIndex]
/* ... */
This way, even if the indices will be changed in the future, it will be easy to update the code, without getting into the implementation details again.
Good luck! :)
appropriate datastructure
List<String> listOfNames = new ArrayList<String>();
...
for (... : games) {
...
if (listOfNames.contains(name) && ...) {
That .contains()
call has a \$O(n)\$ linear loop,
leading to overall \$O(n^2)\$ quadratic complexity.
Prefer a Set
, so \$O(1)\$ constant time lookups
lead to \$O(n)\$ linear complexity.
Or equivalently you might choose to turn this into a Map
:
List<String[]> games = new ArrayList<String[]>();
sort
Alternatively, call .sort()
on the ArrayList,
or use a datastructure which remains sorted.
Then you're scanning names in alphabetical order,
and it's trivial to
uniq
them by testing if current name equals previous name.
short-circuiting operator
if (listOfNames.contains(name) && !name.equals(games.get(0)[1])) {
Here we have if (A && B)
,
where A is expensive (linear in \$n\$), and
B is cheap to evaluate (\$O(1)\$ constant cost).
List the cheap one first, so if it turns out to
be False we don't do any of the work that A requires.