7 posts in the category 'Papers'

  1. 2014.03.22 Robust Visual Tracking using l1 Minimization (ICCV 2009)
  2. 2014.03.19 Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent
  3. 2014.03.17 Nonparametric Adaptive Control using Gaussian Processes with Online Hyperparameter Estimation (CDC 2013)
  4. 2014.03.14 Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision (ICCV 2013)
  5. 2014.03.13 Divide-and-Conquer Matrix Factorization (NIPS 2011)
  6. 2014.03.12 Distributed QR Factorization Based on Randomized Algorithms (PPAM 2011)
  7. 2014.02.13 Robust Matrix Factorization with Unknown Noise (ICCV 2013)

Robust Visual Tracking using l1 Minimization (ICCV 2009)

  • Propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework.

  • Occlusion, corruption, and other challenging issues are addressed seamlessly through a set of trivial templates.

  • To find the tracking target at a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates (linear combination of the template set composed of both target templates and trivial templates).

  • A good target candidate can be efficiently represented by the target templates.

  • In the case of occlusion, a limited number of trivial coefficients will be activated.

  • The candidate with the smallest target template projection error is chosen as the tracking result:
    y ≈ Ta  (y: target result,  T: target templates,  a: target coefficients)

  • Particle filter is used for propagating sample distributions over time.

  • Summary: the tracking result y (a target candidate) is assumed to lie approximately in the linear span of T.
    (Each t is a target template, i.e., a vectorized image patch.)

  • The formulation including an error term for noise such as occlusion:
  • The nonzero entries of epsilon correspond to the corrupted or occluded pixels of y.
  • Remodeling the problem with this error term included gives:

Here, e denotes the trivial coefficient vector and I is the identity matrix.

Each column of I is called a trivial template (again, a vectorized patch).

  • A nonnegativity constraint is imposed on the coefficients.
    (This seems to be used to filter out spurious patches that resemble the target template with reversed intensity patterns.)

  • Adding the nonnegativity constraint yields the improved result shown below.
  • The formulation with the nonnegativity constraint is as follows.
  • -I acts as the set of negative trivial templates.
  • In the end, B stacks the target templates and the trivial templates, and their corresponding coefficients are a and e. When the problem is solved via convex optimization, the errors are modeled sparsely in e, while the coefficient in a corresponding to the most plausible target template takes a large value. This can be seen in the following figure.
  • The first 10 coefficients correspond to the target templates; the rest correspond to the trivial templates.

  • Now, let us look at how this problem is solved as a convex optimization.

    As above, it can be formulated as an l1-regularized least squares problem.
  • Finally, a is read off from the c that minimizes the objective above.
  • The problem is solved with an interior-point method, using preconditioned conjugate gradients internally.
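The paper solves this with an interior-point method; for illustration, the same objective can be minimized with a much simpler projected proximal-gradient (ISTA-style) loop. Below is a minimal numpy sketch of that stand-in (the function name and solver choice are mine, not the authors'):

```python
import numpy as np

def l1_tracking_solve(B, y, lam=0.01, n_iter=500):
    """Minimize 0.5*||B c - y||^2 + lam*||c||_1 subject to c >= 0,
    via projected proximal gradient (a simple stand-in for the
    interior-point solver used in the paper)."""
    L = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ c - y)
        # gradient step, soft-threshold, then project onto c >= 0
        c = np.maximum(c - (grad + lam) / L, 0.0)
    return c
```

With `B = np.hstack([T, np.eye(d), -np.eye(d)])`, the target coefficients a would be the first `T.shape[1]` entries of the returned c.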

  • For continued tracking, the templates must be updated using the a obtained above. This proceeds within the particle filter framework, where each particle is a set of affine transformation parameters; applying each particle's rotation and translation presumably yields the corresponding image patch.

  • The target templates are updated from the coefficients and the tracking result: several templates are generated by applying slight perturbations to the tracking result y. (If y is not similar to any of the current target templates, the least important template is replaced by y.)

  • In the current frame, y (the selected target candidate) is determined, and particles are scattered around it again (i.e., new target candidates are generated).
  • In the next frame, the target candidates generated by those particles are matched, via the optimization problem above, against the target templates built from the previous frame's y; the candidate most similar to the target templates becomes the new tracking result.
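The propagate-score-select loop can be sketched as a generic particle-filter skeleton. Particles are reduced to 2-D translations for brevity (the paper uses full affine parameters), and all function names here are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(particles, sigma=2.0):
    """Diffuse the parameters of each particle (here just a 2-D
    translation, for brevity) with Gaussian noise."""
    return particles + rng.normal(0.0, sigma, particles.shape)

def resample(particles, recon_error):
    """Weight particles by exp(-error) and resample with replacement,
    concentrating particles around well-explained candidates."""
    w = np.exp(-np.asarray(recon_error))
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

def select_target(particles, recon_error):
    """The candidate with the smallest reconstruction error under the
    target templates becomes the tracking result y."""
    return particles[np.argmin(recon_error)]
```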



License: CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives)

Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent

  • For one randomly chosen entry, the local gradient is computed and then scaled up (multiplied by N) to obtain the gradient. (Why? Presumably so that the single-entry gradient is an unbiased estimate of the full-sum gradient.)

  • The stochastic gradient used here works like online gradient descent: unlike the standard (batch) gradient descent method, the gradient is computed sequentially for each incoming data point.

  • In a MapReduce style, SGD can be run on independent subsets and the results averaged.

  • What the figure above illustrates is that not all entries depend on each other. This is the key idea that enables the following distributed SGD.

  • This shows a possible scenario when the data matrix is divided into 3x3 blocks. Within each stratum Zi, the gray blocks can be processed in parallel. (Are 3! strata really needed? 3 would already cover all the blocks.)

  • The following is the algorithm of the Distributed SGD (DSGD) proposed in this paper.

  • The first for-loop picks one stratum Zi (consisting of d blocks), as in Figure 1, and the second for-loop processes its blocks in parallel.
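The stratum structure and the two nested loops can be sketched as below. This is an illustrative serial version (a real DSGD implementation would run each stratum's blocks on separate workers), and the cyclic-shift choice of strata is just one valid scheme:

```python
import numpy as np

def dsgd_strata(d):
    """Enumerate d strata for a d x d blocking of the data matrix.
    Stratum s holds blocks (i, (i+s) % d): no two blocks in a stratum
    share a row block or a column block, so they touch disjoint rows of
    W and disjoint columns of H and can be updated in parallel."""
    return [[(i, (i + s) % d) for i in range(d)] for s in range(d)]

def dsgd_sweep(V, W, H, d, lr=0.01):
    """One DSGD sweep over V ~ W @ H: for each stratum, run SGD on its
    blocks (sequentially here; in parallel in the real algorithm)."""
    m, n = V.shape
    rb, cb = m // d, n // d
    for stratum in dsgd_strata(d):
        for bi, bj in stratum:                      # independent blocks
            for i in range(bi * rb, (bi + 1) * rb):
                for j in range(bj * cb, (bj + 1) * cb):
                    err = V[i, j] - W[i] @ H[:, j]
                    wi = W[i].copy()
                    W[i] += lr * err * H[:, j]
                    H[:, j] += lr * err * wi
    return W, H
```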

  • Results on the Netflix dataset. DSGD converges much faster than plain SGD, while the final accuracy appears to be about the same.

  • Issues worth thinking about: this paper factors the full data matrix quickly by computing independent blocks in parallel, but if convergence is guaranteed for each local block, is convergence on the whole data matrix also guaranteed? And how much does the performance differ?




Nonparametric Adaptive Control using Gaussian Processes with Online Hyperparameter Estimation (CDC 2013)

  • Many model reference adaptive control (MRAC) methods employ parametric adaptive elements in which the number of parameters is fixed a priori and the hyperparameters are pre-defined.
  • Radial Basis Function Networks (RBFNs) are perhaps the most widely used adaptive elements when a basis for the modeling uncertainty is unknown.
  • However, if the system operates outside of the domain covered by the RBF kernel centers, or if the kernel hyperparameters are not correctly specified, the modeling error will be high and stability cannot be guaranteed globally.
  • To overcome the aforementioned limitations of RBFNs and other fixed-parameter adaptive elements, one can use the Gaussian process (GP) nonparametric adaptive elements.
  • Learning hyperparameters in GPs in an online setting provides a unique set of challenges. Typically, it is assumed in the GP literature that data is available in batch mode.
  • A significant amount of research has been pursued in sparsification to reduce computation time, but these approaches only consider batch data.
  • Therefore, only a subset of the data can be used to ensure online tractability; a random or windowed subset of the data can be used.
  • In this paper, they propose online hyperparameter optimization methods to enhance the robustness of GP-MRAC to a wrong choice of hyperparameters. They also provide stability and convergence guarantees in the presence of hyperparameter adaptation.
  • Using a modified likelihood function, the hyperparameters are updated using stochastic gradient ascent.

  • In order to learn the most likely hyperparameters, we would ideally maximize the posterior estimate. However, the calculation of P(theta) at each iteration is intractable in general, so instead we maximize the data likelihood.
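A minimal sketch of the likelihood-gradient update, assuming an RBF kernel with a single lengthscale hyperparameter and using the standard closed-form gradient of the GP log marginal likelihood (the paper's modified likelihood and its full hyperparameter set are not reproduced here):

```python
import numpy as np

def rbf(X, ell):
    """RBF kernel matrix (and squared distances) for 1-D inputs."""
    D2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-D2 / (2.0 * ell ** 2)), D2

def loglik_grad(X, y, ell, noise=1e-2):
    """Gradient of the GP log marginal likelihood w.r.t. the lengthscale:
    dL/d(ell) = 0.5 * tr((alpha alpha^T - K^{-1}) dK/d(ell))."""
    Krbf, D2 = rbf(X, ell)
    K = Krbf + noise * np.eye(len(X))
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    dK = Krbf * D2 / ell ** 3              # derivative of the RBF kernel
    return 0.5 * np.trace((np.outer(alpha, alpha) - Kinv) @ dK)

def ascent_step(X, y, ell, lr=0.05):
    """One gradient-ascent step on the lengthscale; the stochastic variant
    would evaluate the gradient on a random subset of the data."""
    return ell + lr * loglik_grad(X, y, ell)
```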

  • Instead of placing a kernel at every data point, they place kernels at a budgeted active basis vector set BV that adequately covers the domain. From an information-theoretic standpoint, if the conditional entropy of a new point exceeds a certain threshold, the point is added to the current budget.

    • Once the preset budget has been reached, basis vectors are deleted from BV to accommodate new data, according to one of two possible methods.
    • First (OP): simply delete the oldest vector.
    • Second (KL): compute the KL divergence between the current GP and each of the t+1 GPs missing one basis vector, and select the one with the greatest KL divergence; this deletes the point that contributes the least uncertainty reduction to the GP model.
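The budget logic can be sketched as follows, using the predictive variance of a candidate point as a stand-in for the conditional-entropy test and the simple OP (oldest-point) deletion rule; `maybe_add` and its thresholds are illustrative, not from the paper:

```python
import numpy as np

def rbf_k(x, Z, ell=1.0):
    """RBF similarities between a scalar input x and basis points Z."""
    return np.exp(-((x - np.asarray(Z)) ** 2) / (2 * ell ** 2))

def maybe_add(BV, x, budget, tol=0.1, ell=1.0, noise=1e-2):
    """Add x to the active basis set BV if its predictive variance given
    BV (a proxy for the conditional-entropy test) exceeds tol; if the
    budget is full, drop the oldest vector first (the 'OP' rule)."""
    if not BV:
        return [x]
    K = np.exp(-(np.subtract.outer(BV, BV) ** 2) / (2 * ell ** 2))
    k = rbf_k(x, BV, ell)
    var = 1.0 - k @ np.linalg.solve(K + noise * np.eye(len(BV)), k)
    if var > tol:
        if len(BV) >= budget:
            BV = BV[1:]                    # OP: delete the oldest vector
        BV = BV + [x]
    return BV
```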



Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision (ICCV 2013)

  • When the number of observations is limited, which is common in practice, the results from RPCA can degenerate; e.g., correct samples might be considered outliers and vice versa.

  • Introduce an alternative objective function that can efficiently deal with deficient examples in the rank minimization problem.

  • The proposed alternative objective function can control the lower bound of the rank with a simple and efficient minimizer.

  • They demonstrate the effectiveness of the proposed objective function through thorough experiments.

  • Contributions:

    1. We present a partial sum objective function and its corresponding minimization method for RPCA.

    2. We empirically study the partial sum objective function and claim that it outperforms conventional rank minimization when the number of observed samples is very limited.

    3. We demonstrate how low-level vision problems can be formulated into our framework.

  • Cost function (minimize the partial sum of singular values with l1 regularization term)

where N is the target rank of A, which follows from the problem definition (e.g., N=1 for background subtraction, N=3 for photometric stereo).

Solve A* using the Partial Singular Value Thresholding (PSVT).

Solve E* using the soft-thresholding operator.
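The two proximal operators can be sketched directly in numpy. The key point of PSVT is that it leaves the first N singular values untouched and soft-thresholds only the tail, which is what distinguishes it from the full singular value thresholding used in standard RPCA:

```python
import numpy as np

def psvt(Y, tau, N):
    """Partial Singular Value Thresholding: keep the first N singular
    values intact and soft-threshold only the remaining ones by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[N:] = np.maximum(s[N:] - tau, 0.0)   # shrink only the tail
    return U @ np.diag(s) @ Vt

def soft_threshold(X, tau):
    """Elementwise shrinkage, used for the sparse error term E."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```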


  • The target rank is assumed to be known in advance.
  • When there are many outliers, it might not work well. (RPCA methods assume that only a modest number of noise entries exist in the data.)
  • This paper assumes that the data samples are limited. (Is that the case in practice?)

  • Differences between the proposed method and the RPCA method:
    • To compute A (the low-rank matrix), the proposed method takes the SVD of the raw data matrix and then applies soft-thresholding (the shrinkage operator) only to the (target rank+1)-th through min(m,n)-th singular values, whereas RPCA applies the shrinkage to all singular values after the SVD.




Divide-and-Conquer Matrix Factorization (NIPS 2011)



 Goal: Given a matrix M (m×n) formed by deleting and corrupting entries of L0, recover the underlying low-rank matrix L0.

  • Matrix completion (MC): Small fraction of entries revealed
  • Robust matrix factorization (RMF): Fraction of entries grossly corrupted




State of the art MF algorithms

  • Strong recovery guarantees
  • Plagued by expensive subroutines (e.g., truncated SVD)




Bad news: Provably accurate MC algorithms are still too expensive for large-scale or real-time matrix completion

Question: How can we scale up a given matrix completion algorithm and still retain recovery guarantees?

Idea: Divide and conquer (DFC)

  1. Divide M into submatrices
  2. Factor each submatrix in parallel
  3. Combine submatrix estimates to recover L0


  • Factoring a submatrix is often much cheaper than factoring M
  • Multiple submatrix factorizations can be carried out in parallel
  • DFC works with any base MC algorithm
  • With the right choice of division and recombination, yields recovery guarantees comparable to those of the base algorithm

Two algorithms:

  1. DFC-Nys (Generalized Nystrom Decomposition) 
  2. DFC-Proj (Partition and Project)
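A minimal sketch of the DFC-Proj idea, with a rank-r truncated SVD standing in for the base MC/RMF algorithm (function names are mine; the paper's recombination step also has refinements, such as ensembling over projections, that are omitted here):

```python
import numpy as np

def trunc_svd(P, r):
    """Rank-r truncated SVD, standing in for the base MC/RMF solver."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def dfc_proj(M, t, r, base_mf):
    """DFC-Proj sketch: split the columns of M into t submatrices, run
    the base factorization on each (ideally in parallel), then project
    every estimate onto the column space of the first to align them."""
    parts = np.array_split(M, t, axis=1)
    ests = [base_mf(P) for P in parts]      # embarrassingly parallel step
    U = np.linalg.svd(ests[0], full_matrices=False)[0][:, :r]
    return np.hstack([U @ (U.T @ E) for E in ests])
```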





Theoretical Analysis (If you are interested in the analysis, see the paper for more details)

  1. Matrix Coherence
  2. DFC Master Theorem
  3. Consequences for Noisy MC
  4. Consequences for Noisy RMF







Distributed QR Factorization Based on Randomized Algorithms (PPAM 2011)

  • We investigate randomized algorithms based on gossiping (the push-sum algorithm) for the distributed computation of the QR factorization.
  • The main goal of this paper is a comprehensive comparison of this new randomized QR decomposition approach with the existing parallel algorithms.
  • We use modified Gram-Schmidt orthogonalization (mGS) as the basis for our new algorithm.
  • The push-sum algorithm is a gossip algorithm for approximating the sum or the average of values distributed over a network with N nodes.

  • The left algorithm shows mGS, while the right algorithm shows dmGS.

  • The only difference is that the sums in mGS needed for computing a 2-norm (line 4) and a dot product (line 11) are replaced by the push-sum algorithm.
  • dmGS assumes as input a matrix A distributed row-wise over N nodes.
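The push-sum primitive that dmGS substitutes for these sums can be sketched as below: every node keeps a value/weight pair, halves it each round, and pushes one half to a random node; the ratio s/w at each node converges to the network-wide average (multiply by N to recover the sum). This is a generic centralized simulation, not the paper's distributed implementation:

```python
import numpy as np

def push_sum(values, rounds=300, seed=0):
    """Simulate push-sum gossip: each node holds (s, w); per round it
    keeps half of both and pushes the other half to a uniformly random
    node. Every node's ratio s/w converges to the average of `values`."""
    rng = np.random.default_rng(seed)
    n = len(values)
    s = np.array(values, dtype=float)
    w = np.ones(n)
    for _ in range(rounds):
        targets = rng.integers(0, n, size=n)
        s_new, w_new = s / 2, w / 2          # halves each node keeps
        for i, t in enumerate(targets):      # halves pushed to targets
            s_new[t] += s[i] / 2
            w_new[t] += w[i] / 2
        s, w = s_new, w_new
    return s / w
```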



Robust Matrix Factorization with Unknown Noise (ICCV 2013)

  • The most popular loss functions are the L2 and L1 losses.
  • The L2 loss is optimal for Gaussian noise, while the L1 loss is optimal for Laplacian-distributed noise.
  • However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian.
  • To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model.
  • The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions.
  • The parameters of the proposed MoG model can be estimated with the traditional Expectation-Maximization (EM) algorithm under a Maximum Likelihood Estimation (MLE) framework.
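A sketch of the EM updates for a zero-mean MoG fitted to the factorization residuals; in the full method this alternates with a responsibility-weighted update of the factors W and H, which is omitted here, and the function name is mine:

```python
import numpy as np

def mog_em(e, K=2, n_iter=50, seed=0):
    """EM for a zero-mean Mixture of Gaussians fitted to residuals e:
    the E-step computes responsibilities, the M-step re-estimates the
    mixing weights pi and the component variances sig2."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    sig2 = rng.uniform(0.5, 2.0, K) * np.var(e)   # random variance init
    for _ in range(n_iter):
        # E-step: posterior probability of each component per residual
        dens = pi / np.sqrt(2 * np.pi * sig2) * \
               np.exp(-e[:, None] ** 2 / (2 * sig2))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for zero-mean components
        nk = r.sum(axis=0)
        pi = nk / len(e)
        sig2 = (r * e[:, None] ** 2).sum(axis=0) / nk
    return pi, sig2
```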


