I'm trying to split an incoming stream of strings into cumulative tokens per line item, using the function below:
def cumulativeTokenise(string: String): Array[String] = {
  val array = string.split(" +")
  var result: Array[String] = Array()
  array.map { i => (
      result = result :+ (
        if (result.lastOption == None) i
        else result.lastOption.getOrElse("") + " " + i
      )
    )
  }
  result
}
For example, cumulativeTokenise("TEST VALUE DESCRIPTION . AS") returns Array(TEST, TEST VALUE, TEST VALUE DESCRIPTION, TEST VALUE DESCRIPTION ., TEST VALUE DESCRIPTION . AS).
I'm trying to figure out whether there's a more efficient built-in method in Scala, or a better way to do this in a functional style without any mutable array. Any help is much appreciated.
Have you tried scanLeft where the initial parameter is an empty list? – shanif, Apr 22, 2020 at 12:53
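A minimal sketch of the scanLeft hint, taking the comment literally by starting from an empty List and appending one token per step. The object name ScanLeftTokenise is hypothetical, and splitting on " +" simply mirrors the question's code.

```scala
object ScanLeftTokenise {
  // scanLeft starts from an empty List and emits every intermediate prefix,
  // so no mutable accumulator is needed.
  def cumulativeTokenise(string: String): Array[String] =
    string.split(" +")
      .scanLeft(List.empty[String])((prefix, token) => prefix :+ token)
      .tail                 // drop the initial empty prefix
      .map(_.mkString(" ")) // join each prefix back into a single string

  def main(args: Array[String]): Unit =
    // prints Array(TEST, TEST VALUE, TEST VALUE DESCRIPTION, TEST VALUE DESCRIPTION ., TEST VALUE DESCRIPTION . AS)
    println(cumulativeTokenise("TEST VALUE DESCRIPTION . AS").mkString("Array(", ", ", ")"))
}
```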
1 Answer
You can get the same results a little more directly.
def cumulativeTokenise(string: String): Array[String] =
string.split("\\s+")
.inits
.map(_.mkString(" "))
.toArray
.reverse
.tail
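To see why the pipeline ends with reverse and tail, the sketch below (hypothetical object InitsDemo) prints what inits produces on its own: it yields the longest prefix first and finishes with the empty one.

```scala
object InitsDemo {
  // Same split/inits/mkString steps as the answer, stopping before reverse/tail.
  def prefixesLongestFirst(string: String): List[String] =
    string.split("\\s+").inits.map(_.mkString(" ")).toList

  def main(args: Array[String]): Unit = {
    val raw = prefixesLongestFirst("TEST VALUE DESCRIPTION")
    println(raw)              // List(TEST VALUE DESCRIPTION, TEST VALUE, TEST, )
    println(raw.reverse.tail) // List(TEST, TEST VALUE, TEST VALUE DESCRIPTION)
  }
}
```

The trailing empty string comes from the empty prefix, which is the one that tail removes after reversing.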
Or, perhaps simpler, a two-step procedure:
def cumulativeTokenise(string: String): Array[String] = {
val tokens = string.split("\\s+")
Array.tabulate(tokens.length)(n => tokens.take(n+1).mkString(" "))
}
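The tabulate version re-slices and re-joins tokens.take(n+1) for every n. Because the question also asks about efficiency, here is a possible variant that extends the previous prefix with scanLeft instead. It assumes that empty tokens (for example from leading whitespace) should be dropped, and that empty input should give an empty array. The object name IncrementalTokenise is hypothetical.

```scala
object IncrementalTokenise {
  // Each prefix is built from the previous one rather than re-joining the tokens.
  // Assumption: empty tokens from leading whitespace are discarded.
  def cumulativeTokenise(string: String): Array[String] = {
    val tokens = string.split("\\s+").filter(_.nonEmpty)
    if (tokens.isEmpty) Array.empty[String] // tokens.head would throw on empty input
    else tokens.tail.scanLeft(tokens.head)((prefix, token) => prefix + " " + token)
  }

  def main(args: Array[String]): Unit =
    println(cumulativeTokenise("  TEST VALUE ").mkString("Array(", ", ", ")")) // Array(TEST, TEST VALUE)
}
```

Each output string is still a fresh copy of its prefix, so total work remains quadratic in the number of tokens. The gain is only that the token array is not sliced and joined again for every prefix.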
One problem I see here is that you rely on whitespace to separate tokens. That might not always be the case.
def cumulativeTokenise(string: String): Array[String] =
string.split("((?=\\W)|(?<=\\W))")
.filter(_.trim.nonEmpty)
.inits
.map(_.mkString(" "))
.toArray
.reverse
.tail
cumulativeTokenise("here@there")
//res0: Array[String] = Array(here, here @, here @ there)
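The split pattern uses zero-width lookahead and lookbehind, so it cuts the string just before and just after every non-word character (\W). Punctuation and whitespace therefore become separate pieces, and filter(_.trim.nonEmpty) then drops the whitespace pieces. The small demo below (hypothetical object RegexSplitDemo) shows both stages.

```scala
object RegexSplitDemo {
  // Same pattern as the answer: split at positions before/after any \W character.
  val pattern = "((?=\\W)|(?<=\\W))"

  // Raw pieces, including whitespace pieces.
  def pieces(s: String): List[String] = s.split(pattern).toList

  // Whitespace-only pieces removed, as in the answer.
  def tokens(s: String): List[String] = pieces(s).filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    println(pieces("here@there")) // List(here, @, there)
    println(tokens("a b.c"))      // List(a, b, ., c)
  }
}
```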
Probably not the best solution to the problem, but it's something to think about.
I like the Array.tabulate approach. I'm using whitespace because that was the requirement given to me. Thanks, jwvh. – Wiki_91, Apr 23, 2020 at 13:51