notes are mainly from DLCV 2021 (NTU COMME5052)

For DNN

For the weights $W$ in $y = W * x + b$

  • $W_i$ looks very much like the average of the training data of class $i$ ($W_i$ can be viewed as an exemplar of the corresponding class), since $y_1 = W_1^T * x + b_1, \dots, y_n = W_n^T * x + b_n$ for $n$-class classification; a large inner product means the two vectors being compared are similar (see the sketch below)
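A minimal NumPy sketch of this view (the class count, dimensions, and random values are made up for illustration): each row of $W$ acts as a template that is compared against $x$ via an inner product.

```python
import numpy as np

# Hypothetical 3-class linear classifier on 4-dimensional inputs.
W = np.random.randn(3, 4)   # row i ~ "template" / exemplar of class i
b = np.random.randn(3)
x = np.random.randn(4)      # one input sample

scores = W @ x + b          # y_i = W_i^T x + b_i for every class i
pred = np.argmax(scores)    # the class whose template matches x best
```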

Why softmax?

==> interpret classifier scores as probabilities $$P(Y=k|X=x_i) = \frac{\exp(s_k)}{\sum_{j}\exp(s_j)}$$ with $s = f(x_i;W)$ as the classifier output

  • exp –> makes every score positive
  • normalize –> outputs sum to 1, so they can be read as probabilities
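A small sketch of the softmax step; subtracting the max before exponentiating is a standard numerical-stability detail added here, not something stated in the notes.

```python
import numpy as np

def softmax(s):
    # exp makes every score positive; subtracting the max avoids overflow
    e = np.exp(s - np.max(s))
    # normalize so the outputs sum to 1 and can be read as probabilities
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # [0.659 0.242 0.099] 1.0
```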

Why $-\log$ in the loss function?

  • $L_i = -\log(1) = 0$ ==> if the prediction is very good, the loss is 0
  • $L_i = -\log(0.00001) \to \infty$ ==> if the prediction is very bad, the loss blows up

Cross-Entropy loss

==> measures how similar the predicted probability vector and the ground-truth vector are
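A sketch of the cross-entropy loss for one sample, assuming a one-hot ground-truth vector (so the loss reduces to $-\log$ of the probability assigned to the correct class); the small epsilon is only a numerical guard against $\log(0)$.

```python
import numpy as np

def cross_entropy(probs, target_onehot):
    # H(p, q) = -sum_k p_k * log(q_k); with a one-hot target this is
    # simply -log(probability of the correct class)
    return -np.sum(target_onehot * np.log(probs + 1e-12))

target = np.array([0.0, 1.0, 0.0])
print(cross_entropy(np.array([0.01, 0.98, 0.01]), target))      # ~0.02, good prediction
print(cross_entropy(np.array([0.99, 0.00001, 0.00999]), target))  # ~11.5, bad prediction
```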

Gradient Descent vs. Stochastic Gradient Descent

  • GD: uses all training data to compute the gradient for each update
  • SGD: uses only a minibatch of training data per iteration (the minibatch is re-sampled every iteration); see the sketch below
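A rough sketch of the difference, assuming some `grad_fn` that returns the gradient of the loss on whatever data it is given (all names and values here are placeholders, not a specific framework API).

```python
import numpy as np

def gd_step(w, X, y, grad_fn, lr=0.1):
    # Gradient Descent: gradient computed on ALL training data each iteration
    return w - lr * grad_fn(w, X, y)

def sgd_step(w, X, y, grad_fn, lr=0.1, batch_size=32):
    # SGD: re-sample a minibatch every iteration and use only its gradient
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    return w - lr * grad_fn(w, X[idx], y[idx])
```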

Why Sigmoid?

$$\sigma(t) = \frac{1}{1+e^{-t}}$$ ==> turns the model from linear to non-linear. Stacking multiple NN layers only makes sense with a non-linearity in between; otherwise, if everything is linear, $W_1 \times W_2 \times W_3$ could just be folded into a single big $W$.
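A quick NumPy check of the collapse argument (shapes chosen arbitrarily): without a non-linearity, three stacked linear layers equal one linear layer with the product of the weight matrices; inserting a sigmoid between layers breaks that equivalence.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

W1, W2, W3 = (np.random.randn(5, 5) for _ in range(3))
x = np.random.randn(5)

linear_stack = W3 @ (W2 @ (W1 @ x))
single_big_W = (W3 @ W2 @ W1) @ x
print(np.allclose(linear_stack, single_big_W))        # True: the layers collapse

nonlinear_stack = W3 @ sigmoid(W2 @ sigmoid(W1 @ x))  # cannot be rewritten as one W @ x
```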

Why multi-layer?

So that the data can be separated better: stacking layers lets the network learn more complex decision boundaries.

Input-output function of a single neuron

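A sketch of a single neuron's input-output function, assuming a sigmoid activation: a weighted sum of the inputs plus a bias, passed through the non-linearity (the numbers below are arbitrary).

```python
import numpy as np

def neuron(x, w, b):
    # single neuron: activation(w^T x + b)
    z = np.dot(w, x) + b              # weighted sum of inputs plus bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1))  # 0.5 (since z = 0)
```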

Why regularization?

$$E(w) = \frac{1}{2} \sum_{i}{w_i^2}$$ ==> the regulariser discourages the network from using extreme weights ==> helps avoid **overfitting**
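A sketch of adding this L2 penalty to a task loss; `lambda_reg` and the loss value are placeholder numbers for illustration.

```python
import numpy as np

def l2_penalty(weights):
    # E(w) = 1/2 * sum_i w_i^2 -- penalizes extreme weight values
    return 0.5 * np.sum(weights ** 2)

w = np.random.randn(10)
data_loss = 0.7          # placeholder value for the task loss
lambda_reg = 1e-3        # regularization strength (hyperparameter)
total_loss = data_loss + lambda_reg * l2_penalty(w)
```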

For CNN

  • Property 1: local connectivity ==> each unit only looks at a small region of the image at a time
  • Property 2: weight sharing (a left eye and a right eye are both eyes, so they can probably share the same filter weights); see the comparison below
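A rough PyTorch comparison (the layer sizes are arbitrary) showing how these two properties cut the parameter count relative to a fully-connected layer producing an output of the same size.

```python
import torch.nn as nn

# 3x32x32 input; 16 output channels vs. 16*32*32 fully-connected output units
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv))  # 448 = 16*3*3*3 + 16  (local, shared 3x3 filters)
print(count(fc))    # ~50M (every output unit connects to every input pixel)
```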

Meaning of convolution

==> a weighted moving sum; it is like taking the inner product of the filter and the image patch beneath it, and when the two look alike the inner product is large (close to 1 if both are normalized). In other words, convolution searches the input image for patterns that resemble the filter.
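A small NumPy sketch of this "pattern matching" view (written as cross-correlation, which is what DL frameworks actually compute under the name convolution); the response peaks where the image patch looks like the filter.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # slide the kernel over the image; at each position take the inner
    # product (weighted moving sum) of the kernel and the patch under it
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

kernel = np.eye(3)                  # a diagonal-edge filter
image = np.zeros((5, 5))
image[1:4, 1:4] = np.eye(3)         # place the same pattern inside the image
print(conv2d_valid(image, kernel))  # peak response (3.0) at the matching location
```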


Why padding?

So that the pixels at the image border also get covered by the filter, and so that the output feature map does not shrink.

Why stride?

Move the filter more than one pixel at a time (downsamples the output and reduces computation).

Output size?

$$\left\lfloor \frac{W + 2P - K}{S} \right\rfloor + 1$$ i.e. floor((input_size + 2*padding - kernel_size) / stride) + 1
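A quick check of this formula against PyTorch (the sizes below are arbitrary); note the floor division matters whenever $(W + 2P - K)$ is not a multiple of $S$.

```python
import torch
import torch.nn as nn

W_in, K, P, S = 32, 3, 1, 2
expected = (W_in + 2 * P - K) // S + 1   # floor((32 + 2 - 3) / 2) + 1 = 16

conv = nn.Conv2d(1, 1, kernel_size=K, stride=S, padding=P)
out = conv(torch.zeros(1, 1, W_in, W_in))
print(expected, out.shape[-1])           # 16 16
```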