Feature/scala code/ch02 biman (#8) #9

Original file line number	Diff line number	Diff line change
		@@ -1,12 +1,12 @@
		# What are [Lambda Functions](./Lambda_Expressions.pdf)?

		Lambda functions = Anonymous functions

		A lambda function is a small anonymous function that consists of a
		single expression. Because it needs no name, it is handy for short,
		one-off tasks that do not justify a full function definition.


		# Python Example

		Python programming language supports the creation of anonymous functions
		@@ -63,4 +63,3 @@ You may use lambda expressions or functions in PySpark:
		# rdd: RDD[(String, Integer)]
		rdd2 = rdd.filter(filter_function)
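
The body of `filter_function` is not visible in this hunk, so the following is only a minimal sketch of the idea in plain Python (no Spark needed). The sample records and the threshold of 10 are assumptions made for illustration; the point is that a named predicate and a lambda are interchangeable wherever a function is expected.

```python
# Hypothetical records shaped like the RDD[(String, Integer)] noted above.
records = [("alex", 20), ("bob", 5), ("jane", 40)]

# Named predicate; the threshold of 10 is an assumption for illustration.
def filter_function(pair):
    return pair[1] > 10

# The same predicate written as an anonymous (lambda) function.
kept_named = list(filter(filter_function, records))
kept_lambda = list(filter(lambda pair: pair[1] > 10, records))
```

Both calls keep the same records; in PySpark the same two forms could be passed to `rdd.filter(...)`.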

4 changes: 4 additions & 0 deletions code/chap01/scala/.gitignore

4 changes: 4 additions & 0 deletions code/chap02/scala/.gitignore

43 changes: 42 additions & 1 deletion code/chap02/scala/README.md

100644 → 100755

		@@ -1 +1,42 @@
	Scala Solutions
	# Chapter 2

	## DNA-Base-Count Programs using FASTA Input Format

	Three versions of DNA-Base-Count are provided for FASTA input files:

	* Version-1:
	    * Uses basic MapReduce programs
	    * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER1`)

	* Version-2:
	    * Uses InMapper Combiner design pattern
	    * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER2`)

	* Version-3:
	    * Uses InMapper Combiner design pattern (by using mapPartitions() transformations)
	    * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER3`)
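
The three versions listed above appear to differ mainly in where partial counts are combined. The following plain-Python sketch (not the Scala classes from this PR) illustrates that difference. The helper names, the lowercasing of bases, the rule of skipping `>` header lines, and the two-partition sample input are all assumptions made for illustration.

```python
from collections import Counter

def is_header(line):
    # FASTA description lines start with '>'.
    return line.startswith(">")

# Version-1 style: emit one (base, 1) pair per character.
def map_basic(line):
    if is_header(line):
        return []
    return [(base, 1) for base in line.strip().lower()]

# Version-2 style: combine counts within a single record before emitting.
def map_inmapper(line):
    if is_header(line):
        return []
    return list(Counter(line.strip().lower()).items())

# Version-3 style: combine counts across a whole partition,
# which is what a mapPartitions()-like transformation allows.
def map_partition(lines):
    counts = Counter()
    for line in lines:
        if not is_header(line):
            counts.update(line.strip().lower())
    return list(counts.items())

# Stand-in for a reduce-by-key step: sum the values for each key.
def reduce_by_key(pairs):
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical input split into two partitions.
partitions = [[">seq1", "ACGT", "aacc"], [">seq2", "ggNN"]]

v1 = reduce_by_key(p for part in partitions for line in part for p in map_basic(line))
v2 = reduce_by_key(p for part in partitions for line in part for p in map_inmapper(line))
v3 = reduce_by_key(p for part in partitions for p in map_partition(part))
```

All three produce the same totals; what changes is how many intermediate pairs are emitted before the final reduction, with the per-partition variant emitting the fewest.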


	## DNA-Base-Count Programs using FASTQ Input Format

	Using FASTQ input files, the following solution is available:

	* Uses InMapper Combiner design pattern (by using mapPartitions() transformations)
	* Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountFastq`)
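
FASTQ differs from FASTA in that each record normally spans four lines: an `@` identifier, the sequence, a `+` separator, and a quality string. Only the sequence line should be counted. The plain-Python sketch below shows one way to do that; it assumes unwrapped records (exactly four lines each) and uses hypothetical helper names and sample reads. It is not a description of how `DNABaseCountFastq` is implemented.

```python
from collections import Counter

def fastq_sequences(lines):
    # Assumes each record is exactly four lines:
    # '@' identifier, sequence, '+' separator, quality string.
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()

def count_bases(lines):
    # Accumulate counts in one dictionary, in-mapper-combiner style.
    counts = Counter()
    for sequence in fastq_sequences(lines):
        counts.update(sequence.lower())
    return dict(counts)

# Hypothetical two-record FASTQ input.
sample = ["@read1", "ACGT", "+", "IIII", "@read2", "GGNA", "+", "!!!!"]
fastq_counts = count_bases(sample)
```

Selecting lines by position works for a single in-memory file; in a distributed setting, records split across partitions would need separate handling.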


	## FASTA Files to Test DNA-Base-Count

	* A small sample FASTA file (`data/sample.fasta`) is provided.

	* To test the DNA-Base-Count programs with larger FASTA files,
	you may download them from:


	````
	ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/

	ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/

	ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
	````

23 changes: 23 additions & 0 deletions code/chap02/scala/build.gradle

		@@ -0,0 +1,23 @@
	apply plugin: 'scala'
	apply plugin: 'application'

	ext.scalaClassifier = '2.13'
	ext.scalaVersion = '2.13.7'

	group 'com.spark.algos.data'
	version '1.0-SNAPSHOT'

	repositories {
	    mavenLocal()
	    mavenCentral()
	}

	dependencies {
	    implementation group: "org.scala-lang", name: "scala-library", version: "2.13.7"
	    implementation group: "org.apache.spark", name: "spark-core_2.13", version: "3.2.0"
	    implementation group: "org.apache.spark", name: "spark-sql_2.13", version: "3.2.0"
	}

	application {
	    mainClass = project.hasProperty("mainClass") ? project.getProperty("mainClass") : "NULL"
	}

12 changes: 12 additions & 0 deletions code/chap02/scala/data/sample.fasta

		@@ -0,0 +1,12 @@
	>seq1
	cGTAaccaataaaaaaacaagcttaacctaattc
	>seq2
	agcttagTTTGGatctggccgggg
	>seq3
	gcggatttactcCCCCCAAAAANNaggggagagcccagataaatggagtctgtgcgtccaca
	gaattcgcacca
	AATAAAACCTCACCCAT
	agagcccagaatttactcCCC
	>seq4
	gcggatttactcaggggagagcccagGGataaatggagtctgtgcgtccaca
	gaattcgcacca
