Logo
(追記) (追記ここまで)

32565번 - Finding Suspicious Proteins 다국어

시간 제한메모리 제한제출정답맞힌 사람정답 비율
2 초 1024 MB129975.000%

문제

Little Claire studies proteins, which are sequences of amino acids. There are 20 amino acids from which proteins are built. While amino acids all have proper names, such as alanine or glycine, they are often denoted by single letters, so that proteins can be seen as sequences of different lengths, such as DTASDAAAAAALTAABAAAAAKLTABBAAAAAAATAA, TIFLQQQQQQQQQQQ or even maybe RICKRQLL.

Comparing two proteins can be difficult, because they may contain active sites, which determine their function in a cell, and less important parts of the sequence. Recent advances in artificial neural networks made it possible to train a network that, given a protein, outputs a sequence of $l$ numbers, where each number roughly corresponds to a feature of a protein that correlates with its possible functions in a cell. Such a sequence is called an embedding.

Claire is particularly interested in suspicious proteins, those which are really different from others. For this purpose, she considers the so-called Manhattan distance between embeddings of proteins. For two protein embeddings $p$ and $q$ of length $l,ドル the distance $\mathcal{D}(p, q)$ is computed as follows: \begin{equation*} \mathcal{D}(p, q) = \sum_{i=1}^l |p_i - q_i|, \end{equation*} where $p_i$ is the $i$-th element of the embedding $p$.

Claire wants to find $k$ suspicious proteins in the given list of $n$ proteins. As a baseline for her studies, Claire wants to use the following greedy algorithm:

  • Find a protein $p^{(1)}$ which is the most distant from the first protein in the list.
  • The second protein, $p^{(2)},ドル is chosen as the most distant protein from $p^{(1)}$.
  • The third one, $p^{(3)},ドル is chosen so that $\min\{\mathcal{D}(p^{(1)}, p^{(3)}), \mathcal{D}(p^{(2)}, p^{(3)})\}$ is maximum possible. That is, it must be far away from both previously chosen proteins.
  • All subsequent proteins $p^{(i)},ドル 4ドル \le i \le k,ドル are chosen in a similar way: the minimum of the distances to all the previously chosen proteins should be maximum possible.

Note that, in the case of ties, the first matching protein in the list must be chosen.

Claire's implementation works nicely for small numbers $n$ and $k,ドル but becomes too slow as they increase. You must find a way to optimise this.

입력

The first line contains three numbers $n$ $(3 \le n \le 10^4),ドル $l$ $(1 \le l \le 100)$ and $k$ $(2 \le k \le \min\{n, 256\})$: the overall number of proteins, the length of each protein embedding, and the number of proteins to choose.

Each of the following $n$ lines starts with a protein identifier, which is a sequence of at least one and most ten capital letters and/or numbers. Then, separated by whitespace, come $l$ single-digit integer numbers $v_{1 \ldots l}$ (0ドル \le v \le 9$), which define the embedding of the protein. All protein identifiers will be different.

출력

Output the identifiers of $k$ chosen proteins, one per line, in their respective order ($p^{(1)}$ to $p^{(k)}$).

제한

예제 입력 1

4 2 2
FIRST 3 4
SECOND 1 2
THIRD 8 7
FOURTH 5 6

예제 출력 1

THIRD
SECOND

예제 입력 2

6 5 3
1OGLOBIN 1 1 1 1 1
GLU10 9 9 9 9 9
8EIN 8 9 8 9 9
COLLA6EN 6 5 4 3 2
7ILK 3 4 5 6 7
0LBUMIN 1 2 0 2 1

예제 출력 2

GLU10
1OGLOBIN
7ILK

힌트

출처

ICPC > Regionals > Europe > Northwestern European Regional Contest > The UK & Ireland Programming Contest > UKIEPC 2024 F번

  • 문제를 만든 사람: Maxim Buzdalov
(追記) (追記ここまで)

출처

대학교 대회

  • 사업자 등록 번호: 541-88-00682
  • 대표자명: 최백준
  • 주소: 서울시 서초구 서초대로74길 29 서초파라곤 412호
  • 전화번호: 02-521-0487 (이메일로 연락 주세요)
  • 이메일: contacts@startlink.io
  • 통신판매신고번호: 제 2017-서울서초-2193 호

AltStyle によって変換されたページ (->オリジナル) /