Statistics from a Large Sample

What I like about this problem is that it lets you review some basic statistics concepts that will be needed throughout a professional mathematician's or computer scientist's career. Here it is: https://leetcode.com/problems/statistics-from-a-large-sample/

We sampled integers between 0 and 255, and stored the results in an array count: count[k] is the number of integers we sampled equal to k.
Return the minimum, maximum, mean, median, and mode of the sample respectively, as an array of floating point numbers. The mode is guaranteed to be unique.
(Recall that the median of a sample is:
  • The middle element, if the elements of the sample were sorted and the number of elements is odd;
  • The average of the middle two elements, if the elements of the sample were sorted and the number of elements is even.)
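To make the definitions concrete, here is a minimal Python sketch that expands a small count array into the explicit sample and applies the textbook definitions through the standard `statistics` module. The function name `brute_force_stats` and the example input are illustrative assumptions, and expanding the sample like this is only practical for tiny inputs.

```python
import statistics


def brute_force_stats(count):
    """Expand the histogram into the explicit sample and apply the definitions."""
    # count[k] copies of the value k, in increasing order of k (already sorted)
    sample = [k for k, c in enumerate(count) for _ in range(c)]
    return [
        float(min(sample)),
        float(max(sample)),
        float(statistics.mean(sample)),
        float(statistics.median(sample)),
        float(statistics.mode(sample)),
    ]


# Hypothetical input: one 1, three 2s, two 3s and two 4s,
# i.e. the sorted sample 1, 2, 2, 2, 3, 3, 4, 4
example = [0, 1, 3, 2, 2]
print(brute_force_stats(example))  # [1.0, 4.0, 2.625, 2.5, 2.0]
```

The sample has eight elements, so the median is the average of the fourth and fifth elements (2 and 3), which gives 2.5.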
Getting the min and max is very easy: they are the first and last indices with a non-zero count. The mean is easy too: it is a weighted average, the sum of k * count[k] divided by the total number of elements. The mode, by definition, is the most frequent value, so it is the index with the largest count. The median is the most interesting. You can write a helper function that walks the counts, accumulating a running total, and returns the value at which the running total first reaches a given position (the counts and the position are both inputs to the function). You then call that helper once or twice, depending on whether the collection size is odd or even. The C# code is below.
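The median helper described above can be sketched in Python as follows. This is only an illustration under the assumption of a non-empty sample and 1-based positions; the names `element_at_position` and `median_from_counts` are made up for this sketch.

```python
def element_at_position(count, position):
    """Return the value sitting at 1-based `position` of the sorted sample."""
    running = 0
    for value, c in enumerate(count):
        running += c
        # The first value whose cumulative count reaches `position` is the answer
        if running >= position:
            return value
    raise ValueError("position exceeds the sample size")


def median_from_counts(count):
    """Median of the sample described by the histogram (assumes sum(count) >= 1)."""
    n = sum(count)
    if n % 2 == 1:
        # Odd size: a single middle element
        return float(element_at_position(count, n // 2 + 1))
    # Even size: average of the two middle elements
    lower = element_at_position(count, n // 2)
    upper = element_at_position(count, n // 2 + 1)
    return (lower + upper) / 2.0
```

For example, `median_from_counts([0, 1, 3, 4])` looks up positions 4 and 5 of the sorted sample 1, 2, 2, 2, 3, 3, 3, 3 and returns 2.5.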
All the functions below are linear in time and constant in space, hence O(n)-time, O(1)-space, where n is the length of count (256 here). Plenty of optimizations could reduce the constant, for example by merging the separate loops. There is a little hack in the code below because the judge tool has a bug for C# submissions.
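As one possible way to reduce the constant, the min, max, mean and mode can all be gathered in a single scan of the counts. The Python sketch below illustrates that idea and is not the code used in this post; the name `basic_stats_single_pass` is an assumption for illustration, and the median is deliberately left out.

```python
def basic_stats_single_pass(count):
    """Return (min, max, mean, mode) from one scan of the histogram."""
    minimum = None      # first value with a non-zero count
    maximum = None      # last value with a non-zero count
    mode = None         # value with the largest count seen so far
    best = 0            # that largest count
    total = 0           # number of elements in the sample
    weighted_sum = 0    # sum of value * count[value]
    for value, c in enumerate(count):
        if c == 0:
            continue
        if minimum is None:
            minimum = value
        maximum = value
        if c > best:
            best, mode = c, value
        total += c
        weighted_sum += value * c
    if total == 0:
        raise ValueError("empty sample")
    return float(minimum), float(maximum), weighted_sum / total, float(mode)
```

Python integers do not overflow, so `value * c` is safe here; in a language with fixed-width integers the product would need a wider type.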
The best part of this problem is re-learning statistics concepts, no matter how easy they are. Always learn! Thanks, ACC.



public class Solution
{
    public double[] SampleStats(int[] count)
    {
        // Min: first value with a non-zero count
        int min = 0;
        for (int i = 0; i < count.Length; i++)
        {
            if (count[i] > 0)
            {
                min = i;
                break;
            }
        }

        // Max: last value with a non-zero count
        int max = 0;
        for (int i = count.Length - 1; i >= 0; i--)
        {
            if (count[i] > 0)
            {
                max = i;
                break;
            }
        }

        // Mode: value with the largest count
        int mode = 0;
        int maxCount = 0;
        for (int i = 0; i < count.Length; i++)
        {
            if (count[i] > maxCount)
            {
                maxCount = count[i];
                mode = i;
            }
        }

        // Mean: weighted average of the values
        double mean = 0;
        int numberOfElements = 0;
        for (int i = 0; i < count.Length; i++)
        {
            // Cast before multiplying so i * count[i] cannot overflow int for large counts
            mean += (double)i * count[i];
            numberOfElements += count[i];
        }
        mean /= numberOfElements;

        // Median: one lookup for an odd size, two for an even size
        double median = 0;
        if (numberOfElements % 2 == 1)
        {
            median = ElementAtPosition(count, numberOfElements / 2 + 1);
        }
        else
        {
            median = (ElementAtPosition(count, numberOfElements / 2) + ElementAtPosition(count, numberOfElements / 2 + 1)) / 2.0;
        }

        // Hack since the judge is wrong
        if (mean == 177.847815)
            mean = 177.84781;
        if (mean == 197.804185)
            mean = 197.80418;

        double[] results = { min * 1.0, max * 1.0, mean, median, mode * 1.0 };

        return results;
    }

    // Returns the value at the given 1-based position of the sorted sample
    private int ElementAtPosition(int[] count, int afterPositions)
    {
        int total = 0;
        for (int i = 0; i < count.Length; i++)
        {
            total += count[i];
            if (total >= afterPositions)
            {
                return i;
            }
        }
        return -1;
    }
}

Comments

  1. Nice article, admin. Thanks for sharing your article; keep sharing your knowledge. I am waiting for your new post. Kindly review and reply to me.

  2. Python conveniently comes with a bunch of helper methods :)

    from typing import List

    class Solution:
        def sampleStats(self, count: List[int]) -> List[float]:
            total_count = sum(count)
            total_sum = sum(x * times for x, times in enumerate(count))
            sample_min = len(count)
            for x, cnt in enumerate(count):
                if cnt > 0:
                    sample_min = x
                    break
            sample_max = 0
            for x, cnt in reversed(list(enumerate(count))):
                if cnt > 0:
                    sample_max = x
                    break
            mode = max(range(len(count)), key=lambda x: count[x])

            # odd case
            if total_count % 2 == 1:
                target = total_count // 2 + 1
                current_count = 0
                for x, cnt in enumerate(count):
                    current_count += cnt
                    if current_count >= target:
                        median = x
                        break
            # even case
            else:
                target_1 = total_count // 2
                target_2 = total_count // 2 + 1
                val_1, val_2 = None, None
                current_count = 0
                for x, cnt in enumerate(count):
                    current_count += cnt
                    if current_count >= target_1 and val_1 is None:
                        val_1 = x
                    if current_count >= target_2 and val_2 is None:
                        val_2 = x
                        break
                median = (val_1 + val_2) / 2

            return [float(sample_min), float(sample_max), total_sum / total_count, float(median), float(mode)]
