Feature/scala code/ch02 biman (#8) #9

Merged
deepakmca05 merged 2 commits into master from feature/scala-code/ch02
Jan 7, 2022
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,5 +1,7 @@
.DS_Store
.idea/
build
gradle
.gradle
.idea
build
!gradle-wrapper.jar
3 changes: 1 addition & 2 deletions code/bonus_chapters/lambda_expressions/README.md
@@ -1,12 +1,12 @@
# What are [Lambda Functions](./Lambda_Expressions.pdf)?

Lambda functions = Anonymous functions

A lambda function is a small function consisting of a single expression.
Lambda functions are anonymous: they do not require a name. They are
helpful when a small task can be expressed in very little code.


# Python Example

Python programming language supports the creation of anonymous functions
@@ -63,4 +63,3 @@ You may use lambda expressions or functions in PySpark:
# rdd: RDD[(String, Integer)]
rdd2 = rdd.filter(filter_function)
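
As a minimal sketch of the same idea outside Spark, Python's built-in `filter` accepts either a named function or a lambda as its predicate. The predicate (`value > 2`), the names `keep_large` and `keep_large_lambda`, and the sample pairs are illustrative assumptions, not taken from the README; the list stands in for an RDD of `(String, Integer)` pairs.

```python
# Plain-Python stand-in for an RDD[(String, Integer)]; the data is illustrative.
pairs = [("a", 1), ("b", 5), ("c", 3), ("d", 2)]

# Named predicate: keep pairs whose integer value is greater than 2 (assumed rule).
def keep_large(pair):
    return pair[1] > 2

# Equivalent anonymous function: a single expression with no name of its own.
keep_large_lambda = lambda pair: pair[1] > 2

filtered_named = list(filter(keep_large, pairs))
filtered_lambda = list(filter(keep_large_lambda, pairs))
```

Both calls keep the same pairs. In PySpark, either predicate could be passed to `rdd.filter(...)` in the same way.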


4 changes: 4 additions & 0 deletions code/chap01/scala/.gitignore
@@ -0,0 +1,4 @@
.idea
.idea/
.gradle/
build/
4 changes: 4 additions & 0 deletions code/chap02/scala/.gitignore
@@ -0,0 +1,4 @@
.idea
.idea/
.gradle/
build/
43 changes: 42 additions & 1 deletion code/chap02/scala/README.md
100644 → 100755
@@ -1 +1,42 @@
Scala Solutions
# Chapter 2

## DNA-Base-Count Programs using FASTA Input Format

There are three versions of DNA-Base-Count that take FASTA input files:

* Version-1:
  * Uses basic MapReduce programs
  * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER1`)

* Version-2:
  * Uses InMapper Combiner design pattern
  * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER2`)

* Version-3:
  * Uses InMapper Combiner design pattern (using the `mapPartitions()` transformation)
  * Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountVER3`)
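
The three versions differ mainly in how many key-value pairs each mapper emits. The sketch below mimics them in plain Python without Spark. The function names, the two-partition input, and the lowercase normalization are assumptions made for illustration and are not the code of the `DNABaseCountVER*` classes. Version-1 emits one `(base, 1)` pair per character, Version-2 emits one pair per distinct base in a record, and Version-3 emits one pair per distinct base in a whole partition.

```python
from collections import Counter

def is_header(line):
    # FASTA description lines start with '>' and carry no bases.
    return line.startswith(">")

def map_basic(line):
    # Version-1 style: one (base, 1) pair per character.
    if is_header(line):
        return []
    return [(base, 1) for base in line.strip().lower()]

def map_in_mapper_combiner(line):
    # Version-2 style: combine within a record before emitting.
    if is_header(line):
        return []
    return list(Counter(line.strip().lower()).items())

def map_partition(lines):
    # Version-3 style: combine across a whole partition, as a function
    # passed to mapPartitions() might do.
    counts = Counter()
    for line in lines:
        if not is_header(line):
            counts.update(line.strip().lower())
    return list(counts.items())

def reduce_by_key(pairs):
    # Sum the values for each key, like a reduceByKey step.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical partitions of a small FASTA input.
partitions = [[">seq1", "ACGTa", "ccgg"], [">seq2", "ggTT"]]
all_lines = [line for part in partitions for line in part]

v1_pairs = [p for line in all_lines for p in map_basic(line)]
v2_pairs = [p for line in all_lines for p in map_in_mapper_combiner(line)]
v3_pairs = [p for part in partitions for p in map_partition(part)]

result_v1 = reduce_by_key(v1_pairs)
result_v2 = reduce_by_key(v2_pairs)
result_v3 = reduce_by_key(v3_pairs)
```

All three strategies produce the same totals, but on this input they emit 13, 8, and 6 intermediate pairs respectively. Fewer intermediate pairs means less data to shuffle before the reduce step, which is the point of the InMapper Combiner pattern.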


## DNA-Base-Count Programs using FASTQ Input Format

Using FASTQ input files, the following solution is available:

* Uses InMapper Combiner design pattern (using the `mapPartitions()` transformation)
* Using Spark (`org.data.algorithms.spark.ch02.DNABaseCountFastq`)
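
A FASTQ record spans four lines: an `@` identifier, the sequence, a `+` separator, and a quality string, so only the second line of each record contributes bases. The sketch below counts bases for one partition in plain Python. The helper name `count_fastq_partition` and the sample reads are hypothetical, and the code assumes the partition holds whole, well-formed four-line records, which a real Spark partition need not guarantee; it is not the logic of `DNABaseCountFastq`.

```python
from collections import Counter

def count_fastq_partition(lines):
    # Keep only the sequence line (index 1) of each four-line record.
    counts = Counter()
    for index, line in enumerate(lines):
        if index % 4 == 1:
            counts.update(line.strip().lower())
    return counts

# Hypothetical partition holding two FASTQ records.
fastq_lines = [
    "@read1", "ACGTN", "+", "IIIII",
    "@read2", "ggca", "+", "!!!!",
]

fastq_counts = dict(count_fastq_partition(fastq_lines))
```

Selecting lines by position rather than by first character is a deliberate choice in this sketch: quality strings may themselves begin with `@` or `+`, so a prefix test alone can misclassify them.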


## FASTA Files to Test DNA-Base-Count

* A small sample FASTA file (`data/sample.fasta`) is provided.

* To test the DNA-Base-Count programs with larger FASTA files,
you may download them from:


````
ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/

ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/rs_fasta/

ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
````
23 changes: 23 additions & 0 deletions code/chap02/scala/build.gradle
@@ -0,0 +1,23 @@
apply plugin: 'scala'
apply plugin: 'application'

ext.scalaClassifier = '2.13'
ext.scalaVersion = '2.13.7'

group 'com.spark.algos.data'
version '1.0-SNAPSHOT'

repositories {
    mavenLocal()
    mavenCentral()
}

dependencies {
    implementation group: "org.scala-lang", name: "scala-library", version: "2.13.7"
    implementation group: "org.apache.spark", name: "spark-core_2.13", version: "3.2.0"
    implementation group: "org.apache.spark", name: "spark-sql_2.13", version: "3.2.0"
}

application {
    mainClass = project.hasProperty("mainClass") ? project.getProperty("mainClass") : "NULL"
}
12 changes: 12 additions & 0 deletions code/chap02/scala/data/sample.fasta
@@ -0,0 +1,12 @@
>seq1
cGTAaccaataaaaaaacaagcttaacctaattc
>seq2
agcttagTTTGGatctggccgggg
>seq3
gcggatttactcCCCCCAAAAANNaggggagagcccagataaatggagtctgtgcgtccaca
gaattcgcacca
AATAAAACCTCACCCAT
agagcccagaatttactcCCC
>seq4
gcggatttactcaggggagagcccagGGataaatggagtctgtgcgtccaca
gaattcgcacca