\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,graphicx}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\eps}{\epsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
 \noindent
 \begin{center}
 \framebox{
 \vbox{
 \hbox to 5.78in { {\bf CS 395T: Sublinear Algorithms } \hfill #2 }
 \vspace{4mm}
 \hbox to 5.78in { {\Large \hfill #5 \hfill} }
 \vspace{2mm}
 \hbox to 5.78in { {\em #3 \hfill #4} }
 }
 }
 \end{center}
 \vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}

% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex

\begin{document}

\lecture{6 --- Sept 16, 2014}{Fall 2014}{Prof.\ Eric Price}{Taewan Kim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 1. Overview %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Overview}

In the last lecture we covered
\begin{enumerate}
\item A review of the Count-Sketch algorithm, which originates in a paper by Charikar, Chen, and Farach-Colton \cite{CCF02}.
\item Finishing an improved version of Count-Sketch introduced by Minton and Price \cite{MP14}.
\item An algorithm with fast $O(\log^c n)$ recovery time, developed by Gilbert, Li, Porat, and Strauss \cite{GLPS12}.
\end{enumerate}
In this lecture we will discuss a new topic: graph sketching.
The main algorithm to be learned in this part of the course was introduced by Ahn, Guha, and McGregor in 2012 \cite{AGM12}. To build intuition for the graph sketching problem, we cover the following sub-topics.
\begin{enumerate}
\item Defining the graph sketching problem
\item Warm-up problem: insertions of elements only
\item Warm-up problem: insertions and deletions
\item Warm-up problem: as a linear sketch
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2. Preliminaries for Graph Sketching %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Preliminaries for Graph Sketching}

A graph consists of a set of vertices and a set of edges; graphs arise naturally all around us, for example from friendships in a social network. When a given graph is dense, it is natural to ask for a sketch of the graph that preserves the structural characteristics of the original graph. The streaming model for graphs is usually defined by a stream of edges, with the number of nodes given as $n$; the stream may contain both edge insertions and deletions. Many problems cannot be solved in $o(n)$ space, so the ``semistreaming'' model allows $O(n \log^c n)$ space. A number of graph problems can be solved in the semistreaming model using a remarkable technique due to \cite{AGM12}. These include:
\begin{itemize}
\item Finding a spanning tree
\item $(1+\epsilon)$-approximation of the minimum spanning tree (MST)
\item Testing whether the graph is bipartite
\item Estimating the size of cuts
\item Sampling a uniform edge from a cut
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2.1 Warm up problem %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Warm Up Problem}

Before introducing the graph sketching algorithm directly, let's consider a simpler problem that will be used as a building block in the graph problems.
\begin{description}
\item[Given] The stream consists of insertions/deletions of elements. We want to sample a random element.
If there are $k$ distinct elements, we should output each element with probability $\approx \frac{1}{k}$.
\item[Goal] $O(\log ^c n)$ space
\end{description}

\begin{description}
\item[Case 1] Insertions only
\item We can solve this problem using a simple hash function:
\begin{enumerate}
\item $h:[n] \rightarrow [0,1]$ uniformly random
\item Store the element $a$ with the minimum value of $h(a)$
\end{enumerate}
\end{description}
This simple algorithm outputs each distinct element with probability $\approx \frac{1}{k}$: the hash function $h$ is independent of the input stream, so each of the $k$ distinct elements is equally likely to achieve the minimum hash value. However, this algorithm cannot handle a stream that includes deletions.
\begin{description}
\item[Case 2] Insertions and deletions
\item[$\bullet$] We can assume that we know $k$ up to a factor of 2
\item[$\bullet$] We may output FAIL with probability $\frac{1}{2}$
\item[$\bullet$] Use $S$: a random $\frac{1}{k}$ fraction of the coordinates
\begin{itemize}
\item[-] Before: in the distinct elements algorithm, we checked ``$|S \cap \text{stream}|>0$?''
\item[-] Now: record $S \cap \text{stream}$ (as long as it has at most one element).\\
Finally: output it.
\end{itemize}
\end{description}
To fully build an algorithm for the case with both insertions and deletions, we need sparse recovery algorithms that check the number of distinct elements $k$ in the stream.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2.2 Sparse recovery %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Sparse Recovery for $k = 1$}

The goal of this section is to introduce algorithms that check the number of nonzero entries of a vector. The first step is to find the index of the nonzero entry, i.e.\ the support of the vector. The following states the problem formally.
\begin{description}
\item[Stream] Input: $(i,\alpha) \Rightarrow$ Update: $x_i \rightarrow x_i + \alpha$
\item[Guarantee] At the end, $\exists ~i^*$ such that $x_j = 0$ $~(\forall j \neq i^* )$ and $x_{i^*} \neq 0$
\item[Goal] Find $i^*$
\item Choose a matrix $A\in \mathbb{R}^{m\times n}$ with a small number of rows $m$ such that $Ax$ yields $i^*$ via some algorithm, i.e.\ $Ax \Rightarrow i^*$.
\end{description}
So the basic goal is to recover the index of the nonzero entry, given that there is exactly one nonzero entry. To develop an algorithm, let's start with a simple case in which the nonzero entry has value 1ドル$.
\begin{description}
\item[Simple case] $x_{i^*}=1$
\item Consider the vector $v = [1,2,3,4,\cdots , n]$:
\begin{equation}
\langle v,x \rangle = v_{i^*}=i^* ~~\Rightarrow~~ \text{gives } i^* \nonumber
\end{equation}
\end{description}
By taking $A$ to be the single row $v^T=[1,2,3,4,\cdots,n]$ we can easily recover the index $i^*$. Now let's consider a more general case.
\begin{description}
\item[General case] $x_{i^*}=\alpha$
\item Consider the vector $v = [1,2,3,4,\cdots , n]$ again. There are two possible methods.
\item[(Method 1)] \textit{This method was proposed during the class}
\item Consider another vector $v'=[0,1,2,3,\cdots ,n-1]$:
\begin{gather}
y'=\langle v',x \rangle = \alpha (i^* -1)\nonumber \\
y=\langle v,x \rangle = \alpha i^* \nonumber \\
\Rightarrow \alpha = y-y' \nonumber \\
\text{So we can recover } i^* \text{ by } i^* = \frac{y}{y-y'} \nonumber
\end{gather}
\end{description}
Actually, $v'$ does not have to be exactly as above: any vector whose inner products with $x$ can be combined to recover $i^*$ will do, so other methods are possible.
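As a quick numerical sanity check (a toy sketch, not part of the lecture; the function name \texttt{recover\_method1} is hypothetical, and integer updates are assumed so that exact division works), Method 1 amounts to two inner products and one division:

```python
def recover_method1(x):
    """Recover (i_star, alpha) from a vector with exactly one nonzero entry,
    using v = [1, 2, ..., n] and v' = [0, 1, ..., n-1] (Method 1)."""
    y = sum((i + 1) * xi for i, xi in enumerate(x))   # y = alpha * i_star
    y_prime = sum(i * xi for i, xi in enumerate(x))   # y' = alpha * (i_star - 1)
    alpha = y - y_prime
    return y // alpha, alpha                          # i_star = y / (y - y')

# single nonzero entry alpha = 7 at 1-based position i_star = 4
print(recover_method1([0, 0, 0, 7, 0, 0]))   # -> (4, 7)
```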
For example:
\begin{description}
\item[(Method 2)]
\item Consider another vector $v'=[1,1,1,\cdots,1]$:
\begin{gather}
y'=\langle v',x \rangle = \alpha \nonumber \\
y=\langle v,x \rangle = \alpha i^* \nonumber \\
\Rightarrow i^* = \frac{y}{y'} \nonumber
\end{gather}
\end{description}
So we have shown how to recover the index of the nonzero entry when the support has size one. However, the \textbf{Guarantee} condition above does not always hold, so we need an algorithm to check whether $k\neq 1,ドル where $k$ is the number of nonzero entries of the given vector $x$. A sketch-based algorithm can check the number of nonzero entries with high probability. The number of nonzero entries is the $\ell_0$-norm $\|x\|_0,ドル and we want to check whether $\|x\|_0 \geq 2$. This $\ell_0$-norm check can be done using a linear sketch $Ax$ with only $O(\log n)$ rows. We will start by checking whether $\|x\|_0 \geq 1,ドル then use this to check whether $\|x\|_0 \geq 2$.
\begin{description}
\item[Simple case] ($k=1$) Check whether $\|x\|_0 \geq 1,ドル i.e.\ $x \neq 0$
\item[Algorithm NZ] (Based on random generation.)
\begin{enumerate}
\item Randomly generate a vector $v \in \{\pm1\}^n$ whose coordinates are independently $+1$ or $-1,ドル each with probability $\frac{1}{2}$.
\item Output whether $\langle v,x \rangle \neq 0$.
\end{enumerate}
\end{description}
This strategy checks $\|x\|_0 \geq 1$ with high probability. Suppose $x_i \neq 0$. The following identity always holds, where $y_{-i}$ denotes the vector $y$ with its $i$th coordinate removed:
\begin{equation}
\langle v,x \rangle = \langle v_{-i} , x_{-i} \rangle + v_i x_i \nonumber
\end{equation}
So $\langle v,x \rangle = 0$ requires $v_i x_i = -\langle v_{-i} , x_{-i} \rangle$. Since $x_i \neq 0,ドル at most one of the two possible values of $v_i$ can satisfy this equation, so this happens with probability at most $\frac{1}{2}$.
In this sense,
\begin{gather}
\Pr\left[\langle v,x \rangle = 0 ~\middle|~ v_{-i} \right] \leq \frac{1}{2} \nonumber \\
\Rightarrow \Pr\left[\langle v,x \rangle = 0\right] \leq \frac{1}{2} \nonumber
\end{gather}
If $\|x\|_0 = 0$ then $\langle v,x \rangle = 0$ deterministically, so repeating this test $O(\log n)$ times lets us check whether $\|x\|_0 \geq 1$ with high probability.
\begin{description}
\item[Advanced case] ($k=2$) Check whether $\|x\|_0 \geq 2~\Leftrightarrow ~\|x\|_0> 1$. This is equivalent to checking $\|x\|_0 \neq 1$ given $\|x\|_0 \neq 0$.
\item[Algorithm C] (Uses Algorithm NZ; outputs FAIL when $\|x\|_0>1$.)
\begin{enumerate}
\item Repeat $O(\log n)$ times:
\item Choose a set $S \subseteq [n]$ of size $\frac{n}{2}$ uniformly at random. (A subset of the coordinates.)
\item Run \textbf{Algorithm NZ} on $(\text{stream} \cap S)$ and $(\text{stream} \cap \bar{S})$.\\
($\bar{S}$ denotes the complement of $S$. See Figure \ref{fig:sparse_matrix}.)
\item If the runs on both $S$ and $\bar{S}$ report nonzero, output FAIL. (When $\|x\|_0>1,ドル each iteration does so with probability $\geq \frac{1}{8}$.)
\end{enumerate}
\end{description}
So Algorithm C detects $\|x\|_0>1$ with probability 1ドル-(\frac{7}{8})^{8 \log n} \geq 1-\frac{1}{e^{\log n}} = 1-\frac{1}{n},ドル which is a high probability. Combining all the introduced algorithms, it is possible to recover $x$ whenever $\|x\|_0 = 1,ドル and to output FAIL (with high probability) whenever $\|x\|_0 > 1$.
\begin{figure}[ht]
\centering
\includegraphics[width = 0.9\textwidth]{fig_sparse2.pdf}
\caption{Structure of the matrix $A$ for checking $\|x\|_0 \geq 2$. Colored columns represent the randomly selected set of coordinates $S$; the other entries are set to 0ドル$.}
\label{fig:sparse_matrix}
\end{figure}
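To make the pipeline concrete, here is a toy Python sketch of Algorithm NZ and Algorithm C (illustrative only, not part of the lecture: it stores the random sign vectors and sets explicitly rather than hashing, so it is not actually small-space, and the repetition count is an arbitrary stand-in for $O(\log n)$):

```python
import random

def algorithm_nz(x, reps=32):
    """Test whether x != 0. Each round draws v uniformly from {-1,+1}^n;
    if x has a nonzero entry, <v, x> != 0 with probability >= 1/2."""
    n = len(x)
    for _ in range(reps):
        v = [random.choice((-1, 1)) for _ in range(n)]
        if sum(vi * xi for vi, xi in zip(v, x)) != 0:
            return True       # a nonzero inner product certifies x != 0
    return False              # either x == 0, or we were unlucky every round

def algorithm_c(x, reps=32):
    """Output "FAIL" if some random half S / S-bar of the coordinates has a
    nonzero entry on both sides -- evidence that ||x||_0 > 1 -- else None."""
    n = len(x)
    for _ in range(reps):
        S = set(random.sample(range(n), n // 2))
        x_S = [xi if i in S else 0 for i, xi in enumerate(x)]
        x_Sbar = [xi if i not in S else 0 for i, xi in enumerate(x)]
        if algorithm_nz(x_S) and algorithm_nz(x_Sbar):
            return "FAIL"
    return None

print(algorithm_c([0, 0, 5, 0, 0, 0, 0, 0]))    # 1-sparse -> None
print(algorithm_c([0, 3, 0, 0, 0, -2, 0, 0]))   # 2-sparse -> FAIL (w.h.p.)
```

Note that Algorithm NZ is one-sided: a nonzero inner product always certifies $x \neq 0,ドル while a zero result is only probabilistic evidence that $x = 0$; this is why repeating the test drives the error probability down.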
\subsection{Distinct Element Sampling}

Now let's return to the warm-up problem of Section 2.1, Case 2, where both insertions and deletions are allowed. We want to output each distinct element with probability $\approx \frac{1}{k}$.
\begin{description}
\item[Distinct Element Sampling] (insertions/deletions)
\begin{itemize}
\item[] FOR $\hat{k}=1,2,4,\cdots,n$
\begin{itemize}
\item[] FOR $\log n$ different choices of $S \subseteq [n]$ of expected size $\frac{n}{\hat{k}}$
\begin{itemize}
\item[] $h:[n] \rightarrow [\hat{k}]$
\item[] $S=\{i \mid h(i)=1\}$
\item[] Run \textbf{Algorithm C} on $(\text{stream} \cap S)$\\
IF the result $\neq$ FAIL: return it
\end{itemize}
\item[] END
\end{itemize}
\item[] END
\end{itemize}
\end{description}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 3. Next lecture %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Next lecture}

In the next lecture, we will return to the problem of graph sketching, building on the preliminary algorithms covered today. The graph sketching algorithm of \cite{AGM12} uses $O(n \log ^c n)$ space. The flow of ideas closely follows the original paper \cite{AGM12}, so reading it is recommended.

\bibliographystyle{alpha}
\begin{thebibliography}{42}

\bibitem[CCF02]{CCF02}
Moses Charikar, Kevin Chen, and Martin Farach-Colton.
\newblock Finding frequent items in data streams.
\newblock {\em Automata, Languages and Programming}, 693--703, Springer, 2002.

\bibitem[MP14]{MP14}
Gregory T. Minton and Eric Price.
\newblock Improved concentration bounds for Count-Sketch.
\newblock {\em SODA}, 669--686, 2014.

\bibitem[GLPS12]{GLPS12}
Anna C. Gilbert, Yi Li, Ely Porat, and Martin J. Strauss.
\newblock Approximate sparse recovery: optimizing time and measurements.
\newblock {\em SIAM Journal on Computing}, 41(2):436--453, 2012.

\bibitem[AGM12]{AGM12}
Kook Jin Ahn, Sudipto Guha, and Andrew McGregor.
\newblock Graph sketches: sparsification, spanners, and subgraphs.
\newblock {\em Proceedings of the 31st symposium on Principles of Database Systems}, 5--14, ACM, 2012. \end{thebibliography} \end{document}