I really like using MongoDB to store my data, and I recently tried out GridFS, which fits my use case well.
My problem is the space requirement, which seems quite odd. I have ~107GB of images in Amazon S3, around one million files (all images, mostly small ones). I wrote a simple Java project to download the images from S3 and insert them into two separate MongoDB GridFS collections (single server, MongoDB 3.6.5, 64-bit, Windows Server 2016). The problem is that when the transfer completes, the GridFS collections take up more than 300GB of storage on the server. Is this acceptable for this kind of collection, or should I worry about the tripled size?
Note: I simply inserted the images using the Java Mongo driver (Spring Boot) without any significant changes; the growth is in the image chunks. I never delete or update any images (though I defined a unique index on the MD5 field to skip duplicates), so compact and repair do not change the collection sizes. As far as I can see, the collections are not overly preallocated (I don't think my problem is similar to this: Huge size on mongodb's gridfs. Should I compact?).
Also, this is currently a single MongoDB server, without a replica set.
Thank you very much for your help!
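For reference, one way to tell whether the extra ~200GB is real chunk data or just allocated-but-unused storage is to compare the `dataSize` and `storageSize` fields reported by `db.fs.chunks.stats()` (the `collStats` command). A minimal sketch of that comparison, with hypothetical numbers matching the question (the class and method names are my own, not part of any driver API):

```java
// Sketch: interpret the `dataSize` and `storageSize` fields that
// db.fs.chunks.stats() reports. dataSize is the logical BSON size of the
// chunk documents; storageSize is the on-disk allocation. A ratio well
// above 1.0 points at fragmentation or uncompacted space rather than
// duplicate chunk data.
public class GridFsStorageCheck {

    // Ratio of on-disk storage to logical BSON data.
    static double storageOverhead(long dataSizeBytes, long storageSizeBytes) {
        if (dataSizeBytes <= 0) {
            throw new IllegalArgumentException("dataSize must be positive");
        }
        return (double) storageSizeBytes / dataSizeBytes;
    }

    public static void main(String[] args) {
        // Hypothetical numbers taken from the question: ~107GB of images,
        // ~300GB on disk. Real values come from db.fs.chunks.stats().
        long dataSize = 107L * 1024 * 1024 * 1024;
        long storageSize = 300L * 1024 * 1024 * 1024;
        System.out.printf("overhead factor: %.2f%n",
                storageOverhead(dataSize, storageSize)); // prints "overhead factor: 2.80"
    }
}
```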
1 Answer
Add the MongoDB Java driver dependency to your project's pom.xml file:
```xml
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-sync</artifactId>
    <version>4.4.2</version>
</dependency>
```
Create a MongoDB client bean in your application configuration class:
```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoConfig {

    @Value("${spring.data.mongodb.uri}")
    private String mongoUri;

    @Bean
    public MongoClient mongoClient() {
        ConnectionString connectionString = new ConnectionString(mongoUri);
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(connectionString)
                .build();
        return MongoClients.create(settings);
    }

    @Bean
    public MongoDatabase mongoDatabase(MongoClient mongoClient) {
        return mongoClient.getDatabase("your_database_name");
    }
}
```
Define a service class to handle the GridFS operations:
```java
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import com.mongodb.client.gridfs.GridFSDownloadStream;
import com.mongodb.client.gridfs.model.GridFSUploadOptions;
import org.bson.Document;
import org.bson.types.ObjectId;
import org.springframework.stereotype.Service;

import java.io.InputStream;

@Service
public class GridFsService {

    private final GridFSBucket gridFSBucket;

    public GridFsService(MongoDatabase mongoDatabase) {
        // Uses the default bucket name "fs" (collections fs.files and fs.chunks)
        this.gridFSBucket = GridFSBuckets.create(mongoDatabase);
    }

    public ObjectId uploadFile(String filename, InputStream inputStream, String contentType) {
        GridFSUploadOptions options = new GridFSUploadOptions()
                .chunkSizeBytes(256 * 1024) // Set the desired chunk size
                .metadata(new Document("contentType", contentType)); // Additional metadata if needed
        return gridFSBucket.uploadFromStream(filename, inputStream, options);
    }

    public GridFSDownloadStream downloadFile(ObjectId fileId) {
        return gridFSBucket.openDownloadStream(fileId);
    }
}
```
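It can also help to reason about how many `fs.chunks` documents a given file produces. A rough sketch of that arithmetic (the class and method names are mine, and the per-document overhead is only qualitative): each chunk is its own BSON document with `_id`, `files_id`, `n`, and `data` fields plus index entries, so a collection of many small images carries proportionally more per-document overhead than a few large files.

```java
// Sketch: number of fs.chunks documents produced by a file of a given size
// at a given chunkSizeBytes (ceiling division).
public class ChunkMath {

    static long chunkCount(long fileSizeBytes, int chunkSizeBytes) {
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        int chunkSize = 256 * 1024; // matches the upload options above
        System.out.println(chunkCount(1_000_000, chunkSize)); // ~1MB image -> prints 4
        System.out.println(chunkCount(50_000, chunkSize));    // small thumbnail -> prints 1
    }
}
```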
Use the GridFsService in your application logic to upload and download files:
```java
import com.mongodb.client.gridfs.GridFSDownloadStream;
import org.bson.types.ObjectId;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;
import java.io.InputStream;

@Service
public class YourService {

    private final GridFsService gridFsService;

    public YourService(GridFsService gridFsService) {
        this.gridFsService = gridFsService;
    }

    public void uploadFile(MultipartFile file) throws IOException {
        try (InputStream inputStream = file.getInputStream()) {
            gridFsService.uploadFile(file.getOriginalFilename(), inputStream, file.getContentType());
        }
    }

    public InputStream downloadFile(ObjectId fileId) {
        // GridFSDownloadStream extends InputStream, so it can be returned directly
        GridFSDownloadStream downloadStream = gridFsService.downloadFile(fileId);
        return downloadStream;
    }
}
```
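Since `GridFSDownloadStream` extends `InputStream`, any `InputStream`-consuming API can persist the download directly. A minimal offline sketch (the in-memory stream below stands in for the real `gridFsService.downloadFile(fileId)` result, so this runs without a MongoDB server):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveToDisk {

    // Files.copy drains the stream and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for gridFsService.downloadFile(fileId), to keep the sketch offline.
        InputStream fake = new ByteArrayInputStream("image bytes".getBytes());
        Path tmp = Files.createTempFile("gridfs-", ".bin");
        System.out.println(save(fake, tmp)); // prints 11
        Files.delete(tmp);
    }
}
```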
But why, what does it do? – Rohit Gupta, Jul 20, 2023 at 20:25
It would be helpful to include the `ls -l` output in the question. Please also post the output of `db.fs.chunks.stats()` (assuming your chunks collection is using the default name `fs.chunks`).