
Book Notes: Probability and Information Theory

## Probability and Information Theory

Probability theory is a mathematical framework for representing uncertain statements. It provides a means of quantifying uncertainty, as well as axioms for deriving new uncertain statements. In artificial intelligence applications, we use probability theory in two major ways:

  • The laws of probability tell us how AI systems should reason, so we design our algorithms to compute or approximate various expressions derived using probability theory;
  • We can use probability and statistics to theoretically analyze the behavior of proposed AI systems.

While probability theory allows us to make uncertain statements and to reason in the presence of uncertainty, information theory enables us to quantify the amount of uncertainty in a probability distribution.
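
To make that last point concrete, here is a minimal sketch in plain NumPy (my own illustration, not from the book) computing the Shannon entropy $H(P) = -\sum_i p_i \log p_i$ of two distributions: the uniform one carries maximal uncertainty, the peaked one very little.

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy H(P) = -sum_i p_i * log(p_i), in bits by default."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken to be 0
    return -np.sum(p * np.log(p)) / np.log(base)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximal uncertainty over 4 outcomes
peaked  = [0.97, 0.01, 0.01, 0.01]   # almost deterministic

print(entropy(uniform))  # 2.0 bits
print(entropy(peaked))   # ~0.24 bits
```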

Book Notes: Deep-Forest Model

  • Online paper; follow the link for all the details.

In this paper, we extend our preliminary study, which proposed the gcForest (multi-Grained Cascade Forest) approach for constructing a deep forest, a non-NN-style deep model. It is a novel decision-tree ensemble with a cascade structure that enables representation learning by forests. Its representation-learning ability can be further enhanced by multi-grained scanning, potentially making gcForest contextually or structurally aware. The number of cascade levels can be determined automatically, so that model complexity is set in a data-dependent way rather than designed manually before training; this enables gcForest to work well even on small-scale data and lets users control training cost according to the computational resources available. Moreover, gcForest has far fewer hyper-parameters than DNNs. Even better, its performance is quite robust to hyper-parameter settings; our experiments show that in most cases it achieves excellent performance with the default settings, even across data from different domains.
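
To fix intuition only, below is a heavily simplified sketch of the cascade idea using scikit-learn's RandomForestClassifier: each level's forests emit class-probability vectors, which are concatenated with the original features and fed to the next level. It omits multi-grained scanning, the automatic level-count control, and the paper's mix of random and completely-random forests, so it is an assumption-laden toy, not the official gcForest implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Toy data standing in for any tabular task (parameters are arbitrary).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_levels, forests_per_level = 2, 2
aug_tr, aug_te = X_tr, X_te
for level in range(n_levels):
    tr_probs, te_probs = [], []
    for k in range(forests_per_level):
        rf = RandomForestClassifier(n_estimators=100, random_state=10 * level + k)
        # Out-of-fold class probabilities act as the learned representation
        # (the paper uses k-fold cross validation for the same reason).
        tr_probs.append(cross_val_predict(rf, aug_tr, y_tr, cv=3,
                                          method="predict_proba"))
        te_probs.append(rf.fit(aug_tr, y_tr).predict_proba(aug_te))
    # The next level sees the original features plus this level's probabilities.
    aug_tr = np.hstack([X_tr] + tr_probs)
    aug_te = np.hstack([X_te] + te_probs)

# Final prediction: average the last level's probability vectors.
y_pred = np.mean(te_probs, axis=0).argmax(axis=1)
print("toy cascade accuracy:", accuracy_score(y_te, y_pred))
```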

Book Notes: Tree-based Models

## Tree-based models

### Part-I: Theorist views

Basic terminology and notational conventions

In general, let $D = \{x_1, x_2, \ldots, x_m\}$ denote a dataset containing $m$ instances, each described by $d$ attributes. Each instance $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is then a vector in the $d$-dimensional sample space $\mathcal{X}$, i.e. $x_i \in \mathcal{X}$, where $x_{ij}$ is the value of $x_i$ on the $j$-th attribute, and $d$ is called the "dimensionality" of the sample $x_i$.
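
As a concrete illustration (NumPy, with toy numbers of my own), such a dataset is naturally stored as an $m \times d$ matrix whose $(i, j)$ entry is $x_{ij}$:

```python
import numpy as np

# m = 3 instances, each described by d = 2 attributes (toy values).
D = np.array([[0.697, 0.460],    # x_1
              [0.774, 0.376],    # x_2
              [0.634, 0.264]])   # x_3
m, d = D.shape                   # m = 3, d = 2 (the "dimensionality")

x_1  = D[0]       # the first instance, a vector in the sample space X
x_12 = D[0, 1]    # value of x_1 on the 2nd attribute: 0.460
```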

To build a model for "prediction", instance data alone (also called samples) are not enough; we also need the "outcome" information of the training samples, e.g. a record describing a watermelon: "((color = green; root = curled; sound = muffled), good melon)". Here, the information about an instance's outcome, such as "good melon", is called its "label", and an instance together with its label is called an "example".

In general, $(x_i, y_i)$ denotes the $i$-th example, where $y_i \in \mathcal{Y}$ is the label of instance $x_i$ and $\mathcal{Y}$ is the set of all labels, also called the "label space" or "output space".

If we want to predict a discrete value, e.g. "good melon" vs. "bad melon", the learning task is called "classification"; if we want to predict a continuous value, e.g. the ripeness of a watermelon, say 0.9 or 0.4, the task is called "regression". In binary classification we usually let $\mathcal{Y} = \{-1, +1\}$ or $\mathcal{Y} = \{0, 1\}$; in multi-class classification, $|\mathcal{Y}| > 2$; for regression, $\mathcal{Y} = \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers.
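
Since this post is about tree-based models, a minimal scikit-learn sketch (toy data of my own) contrasts the two label spaces: a DecisionTreeClassifier for $\mathcal{Y} = \{0, 1\}$ and a DecisionTreeRegressor for $\mathcal{Y} = \mathbb{R}$.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Attribute vectors, e.g. (density, sugar content) of four melons (toy values).
X = [[0.697, 0.460],
     [0.774, 0.376],
     [0.343, 0.099],
     [0.245, 0.057]]

# Classification: discrete label space Y = {0, 1} (0 = bad melon, 1 = good melon).
y_cls = [1, 1, 0, 0]
clf = DecisionTreeClassifier(max_depth=2).fit(X, y_cls)
print(clf.predict([[0.70, 0.45]]))   # -> [1]

# Regression: continuous label space Y = R (e.g. ripeness).
y_reg = [0.9, 0.8, 0.4, 0.2]
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_reg)
print(reg.predict([[0.70, 0.45]]))   # -> a real value, here 0.9
```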