There is no intended end use for this. It is for learning and development only.
I got carried away following an example and ended up with a miniaturised vehicle registration system. You can bulk generate records, each of which is given a unique registration number in keeping with British registration format. I've written a script that compares and lists all registration numbers that fall afoul of the criteria.
It does work, but it is incredibly slow (over an hour to check 1 million records). I am looking for critique of the logic and any optimisations I may have missed.
Example string: AA99AAA
Example criteria: A?9?AAA
def full_search(offensive_list)
  p 'Full check:'
  p "Comparing #{$all_vehicles.count} records against #{offensive_list.count} banned combinations"
  p 'This will take a few minutes'
  vrm_array, example_array = [], []
  vrm_list = $all_vehicles.keys.sort
  vrm_list.each do |vrm|
    vrm_array << vrm.split("") # We split each reg into an array of characters
  end
  offensive_list.each do |example|
    example.strip!
    example_array << example.split("") # and the same with our banned combinations
  end
  vrm_array.each do |vrm|
    example_array.each do |example| # iterate through vrms x examples
      @formatted_vrm = vrm.dup
      if example.length == vrm.length
        example.each_index do |index|
          if example[index] == "?" # for each wildcard we add a wildcard to the vrm for comparison
            @formatted_vrm[index] = "?"
          end
        end
        if @formatted_vrm == example then offensive_found(vrm, example) end
      end
    end
  end
end
def offensive_found(vrm, example)
  built_vrm = ""
  built_example = ""
  if vrm.class == Array # clean up formatting so we can store it
    vrm.each do |character|
      built_vrm << character
    end
    example.each do |character|
      built_example << character
    end
  else
    built_example = example # clearly redundant, but it works so...
    built_vrm = vrm
  end
  if $bad_vrms[built_example] # if we already have a record
    prev_matched = $bad_vrms[built_example] # just add to the array
    prev_matched << built_vrm
    $bad_vrms.store(built_example, prev_matched)
  else
    new_match = [built_vrm] # or create a new hash key
    $bad_vrms.store(built_example, new_match)
  end
  #p "#{built_vrm} - matched with #{built_example}"
end
If you'd prefer, you can clone the full thing on GitHub: https://github.com/Niall47/RubySomDemo
1 Answer
You could consider using regular expressions to speed up the search. The cleanest way to do so would be to change the entries in your full_search.txt to regular expressions. For example, A?9?AAA would need to be changed to A.9.AAA (in a regex, the . means any single character).
Then you could change your full_search method to look like this:
def full_search(offensive_list)
  vrm_list = $all_vehicles.keys.sort
  offensive_examples = offensive_list.map(&:strip)
  offensive_examples.each do |offensive_example|
    vrm_list.grep(/^#{offensive_example}$/).each do |offensive_vrm|
      offensive_found(offensive_vrm, offensive_example)
    end
  end
end
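In the regex, ^ marks the start and $ marks the end of a line, which for single-line values like these registrations is the whole string (\A and \z are Ruby's strict whole-string anchors). This ensures that substrings are not matched: for example, abcd matches the regex .c but not ^.c$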
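To make the effect of the anchors concrete, here is a small sketch. The sample strings and variable names are made up for illustration and are not taken from the project.

```ruby
vrm = "AA99AAA"

unanchored = /A.9/   # may match anywhere inside the string
anchored   = /^A.9$/ # the line must consist of exactly these three characters

unanchored_hit = unanchored.match?(vrm) # true: "AA9" is a substring of the VRM
anchored_hit   = anchored.match?(vrm)   # false: the VRM is longer than the pattern

# ^ and $ are applied per line, so a value containing a newline can still match.
multiline  = "ZZ99ZZZ\nA19"
line_hit   = /^A.9$/.match?(multiline)   # true: the second line is "A19"
string_hit = /\AA.9\z/.match?(multiline) # false: the whole string is not "A?9"
```

If the registrations can never contain a newline, the two anchor styles behave the same here; \A and \z are simply the stricter choice.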
If you don't want to modify your list, you could do something that dynamically creates the regex in Ruby. For example, offensive_example.gsub("?", ".") would replace every ? with a .
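A minimal sketch of that dynamic conversion, assuming the banned patterns contain only literal characters and ? wildcards. The helper name wildcard_to_regex is hypothetical, not something from the original project. It uses Regexp.escape so that any other character with a special meaning in a regex is treated literally, and it builds each Regexp once so it can be reused for every registration.

```ruby
# Hypothetical helper: turn a wildcard pattern such as "A?9?AAA" into an anchored Regexp.
def wildcard_to_regex(pattern)
  # Split on the wildcard (limit -1 keeps trailing empty pieces), escape each
  # literal piece, then rejoin the pieces with "." where each "?" used to be.
  body = pattern.strip.split("?", -1).map { |part| Regexp.escape(part) }.join(".")
  /\A#{body}\z/
end

regex = wildcard_to_regex("A?9?AAA")

# Made-up sample registrations for illustration.
vrms = %w[AA99AAA AB97AAA AA99AAB XAA99AAA]
matches = vrms.grep(regex)
# matches is ["AA99AAA", "AB97AAA"]
```

Array#grep then filters the whole list in one call per banned pattern, which replaces the character-by-character comparison in the original loop.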
A couple unrelated pointers...
- Ruby has a String#chars method that is (arguably) more readable and might have some minor performance improvements over string.split("").
- Instead of assigning an initial empty value and building it in an each, consider using Array#map to reduce the amount of variable reassignments.
# before
vrm_array = []
vrm_list.each do |vrm|
  vrm_array << vrm.split("")
end
vrm_array.each do |vrm|
  # do stuff
end
# after
vrm_array = vrm_list.map { |vrm| vrm.split("") } # this calls `split` on each element in `vrm_list`
vrm_array.each do |vrm|
  # do stuff
end
- If you need to take an array of characters and join them together into a single string, consider using Array#join.
irb(main):001:0> ['a', 'b', 'c'].join
=> "abc"
- If you have a Hash and you want an empty array if a key doesn't exist, consider using a default value (but be careful with some gotchas related to mutable defaults):
# the block runs whenever a missing key is read and stores a new empty array under that key
$bad_vrms = Hash.new { |h, k| h[k] = [] }
# now we don't need to check if a key exists and can simply push to it
$bad_vrms[example] << vrm
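The mutable-default gotcha mentioned above can be sketched as follows, together with one way the join and default-value pointers might combine. The names shared, bad_vrms and record_match are illustrative only, and a local hash stands in for the $bad_vrms global so the snippet is self-contained.

```ruby
# Gotcha: a default passed as an argument is a single object shared by every missing key.
shared = Hash.new([])
shared["A?9?AAA"] << "AA99AAA" # mutates the shared default; no key is ever stored
shared_keys  = shared.keys     # []
shared_other = shared["B?9?BBB"] # ["AA99AAA"], leaked from the other key

# The block form creates and stores a fresh array for each missing key.
bad_vrms = Hash.new { |hash, key| hash[key] = [] }

# Hypothetical condensed stand-in for offensive_found: accepts either strings
# or arrays of characters and stores both as plain strings.
record_match = lambda do |vrm, example|
  bad_vrms[Array(example).join] << Array(vrm).join
end

record_match.call(%w[A A 9 9 A A A], %w[A ? 9 ? A A A])
record_match.call("AB97AAA", "A?9?AAA")
# bad_vrms is {"A?9?AAA" => ["AA99AAA", "AB97AAA"]}
```

With the block form there is no need for the "key already exists" branch, since reading a missing key creates its array.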