How should --scenecut be understood?

hdfuck · posted 2010-6-14 02:25

Last edited by hdfuck on 2010-6-14 02:39

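My grounding in x264 parameters has always been weak. I have only recently started reading about them now and then, and there is a lot I still don't follow.
For the past two days the --scenecut parameter has been giving me a headache. I went through quite a bit of material and arrived at the understanding below. I'm not sure it is right, so corrections are welcome.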

Reference material:

From the wiki
http://mewiki.project357.com/wiki/X264_Settings#scenecut

scenecut
Default: 40

Sets the threshold for I/IDR frame placement (read: scene change detection).

x264 calculates a metric for every frame to estimate how different it is from the previous frame. If the value is lower than scenecut, a 'scenecut' is detected. An I-frame is placed if it has been less than --min-keyint frames since the last IDR-frame, otherwise an IDR-frame is placed. Higher values of scenecut increase the number of scenecuts detected. For more information on how the scenecut comparison works, see this doom9 thread.

Setting scenecut to 0 is equivalent to setting --no-scenecut.

Recommendation: Default

Akupenguin's explanation 1 (2005)
http://forum.doom9.org/showthread.php?p=702708#post702708

Short answer: scenecut threshold is the max % extra bits it's willing to spend to switch to an I-frame.

First it encodes the frame as P. While it's doing so, it keeps track of how much it costs to code it that way, vs how much it would cost if it had selected intra mode for all macroblocks. ("cost" is either SAD or SATD or RD depending on the value of --subme.) When it has finished coding the frame as P, it computes a threshold proportional to scenecut_threshold and depending on the distance from the previous I-frame. If( P_cost > I_cost * (1 - threshold)) then it recodes the frame as I.

This is not quite the same as XviD. XviD's decision is: if( (total SAD > some threshold) or (more than half of macroblocks are intra) ) then frame is I. (The threshold still depends on distance from the previous I-frame.)

Akupenguin's explanation 2 (2007)
http://forum.doom9.org/showthread.php?p=942548#post942548

encode current frame as (a really fast approximation of) a P-frame and an I-frame.

if ((distance from previous keyframe) > keyint) then
    set IDR-frame
else if (1 - (bit size of P-frame) / (bit size of I-frame) < (scenecut / 100) * (distance from previous keyframe) / keyint) then
    if ((distance from previous keyframe) >= minkeyint) then
        set IDR-frame
    else
        set I-frame
else
    set P-frame

encode frame for real.

====================================
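Sets the threshold for placing I/IDR frames. The larger the value, the more willing the encoder is to place an I/IDR frame.
Most people already know this conclusion, but the actual decision process is confusing.
1. The wiki says a "metric" measures how different the current frame is from the previous one. Why is a "scenecut" detected only when that value is lower than the --scenecut value?
2. Is the --scenecut value itself the "threshold" being referred to?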

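My reading:

scenecut means scene change. At a scene change the current frame differs a lot from the previous frame, so it is generally a good place for an I/IDR frame. If a P-frame were placed there instead, it would have to record a large difference and would itself be large. The bigger the difference between the two frames, the closer the P-frame's size gets to the I-frame's size, so the case for a P-frame weakens and the case for an I/IDR frame strengthens.
According to Akupenguin, when x264 encodes a frame it first tries coding it as both a P-frame and an I-frame (in practice a fast estimate) and records the cost of each. (The cost is SAD, SATD or RD depending on the value of --subme. I don't really understand this part: what exactly is the encoding cost? It seems related to frame size, with a higher cost meaning a bigger frame, and his 2007 post seems to support that guess.) Once the P-frame cost (P_cost) and the I-frame cost (I_cost) are close "to a certain degree", the encoder places an I/IDR frame rather than a P-frame. That "certain degree" depends on the --scenecut value we set and on the distance between the current frame and the previous keyframe (I-frame or IDR-frame). Akupenguin expresses it with the following relations: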

Expression 1:
P_cost > I_cost * (1 - threshold)

Expression 2:
1 - (bit size of P-frame) / (bit size of I-frame) < (scenecut / 100) * (distance from previous keyframe) / keyint

The two expressions mean the same thing but are written differently, which makes them hard to compare. Rearranged into a common form:

Expression 3:
P_cost / I_cost > 1 - threshold

Expression 4:
(bit size of P-frame) / (bit size of I-frame) > 1 - (scenecut / 100) * (distance from previous keyframe) / keyint

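Comparing the two, the term (scenecut / 100) * (distance from previous keyframe) / keyint in Expression 4 plays the role of threshold in Expression 3. This is where our --scenecut setting affects the threshold. For a given frame, (bit size of P-frame) / (bit size of I-frame) and (distance from previous keyframe) are already fixed, so the larger the --scenecut value, the more easily the relation is satisfied and the more likely an I/IDR frame is placed.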
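To make the effect of the setting concrete, here is a minimal Python sketch of Expression 2. It only transcribes the relation quoted above and is not x264's actual code; the function names and the frame sizes are made up for illustration.

```python
def scenecut_threshold(scenecut: float, distance: int, keyint: int) -> float:
    """Right-hand side of Expression 2: grows with scenecut and with distance."""
    return (scenecut / 100) * distance / keyint


def prefers_intra(p_bits: int, i_bits: int, scenecut: float,
                  distance: int, keyint: int) -> bool:
    """Expression 2: True when coding as P saves too little over coding as I."""
    saving = 1 - p_bits / i_bits
    return saving < scenecut_threshold(scenecut, distance, keyint)


if __name__ == "__main__":
    # Hypothetical frame: the P estimate is 70% of the I estimate,
    # 100 frames after the previous keyframe, with keyint = 250.
    for scenecut in (40, 80):
        print(scenecut, prefers_intra(70_000, 100_000, scenecut, 100, 250))
```

With these made-up numbers the saving is about 0.30, while the threshold is 0.16 at scenecut 40 and 0.32 at scenecut 80, so only the higher setting switches this frame to intra.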

Rearranging Expression 4 once more, to relate it to the wiki's wording:

Expression 5:
[1 - (bit size of P-frame) / (bit size of I-frame)] * 100 * keyint / (distance from previous keyframe) < scenecut

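The "metric" that the wiki says measures the difference between the current frame and the previous frame is presumably the left-hand side of the inequality above. The more the current frame differs from the previous one, the closer the sizes of the frame coded as P and coded as I become, so the ratio (bit size of P-frame) / (bit size of I-frame) grows (this ratio should lie between 0 and 1) and [1 - (bit size of P-frame) / (bit size of I-frame)] * 100 * keyint / (distance from previous keyframe) shrinks. When it drops below the --scenecut value, a scene change is detected and an I/IDR frame is placed. Expression 5 makes it clear that a larger --scenecut value gives a higher chance of placing an I/IDR frame. The wiki and Akupenguin phrase it differently, but they say the same thing.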
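A small Python sketch of that left-hand side shows the direction of the comparison. It assumes, as guessed above, that this quantity is what the wiki calls the metric; the function name and the bit sizes are hypothetical.

```python
def scenecut_metric(p_bits: int, i_bits: int, distance: int, keyint: int) -> float:
    """Left-hand side of Expression 5 (assumed to be the wiki's 'metric')."""
    return (1 - p_bits / i_bits) * 100 * keyint / distance


if __name__ == "__main__":
    # Hypothetical sizes: the I estimate is fixed at 100000 bits while the
    # P estimate grows towards it, i.e. the frame differs more and more
    # from the previous one.
    for p_bits in (50_000, 90_000, 99_000):
        metric = scenecut_metric(p_bits, 100_000, distance=100, keyint=250)
        print(p_bits, round(metric, 1), metric < 40)
```

As the P estimate approaches the I estimate the metric falls, and it drops below the default scenecut of 40 only for the two frames that differ most from their predecessor.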

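Finally, here is Akupenguin's decision flow and how keyint, minkeyint and scenecut relate to each other:

(1) If the distance between the current frame and the previous keyframe is greater than keyint, an IDR-frame is placed at the current frame.
(2) If the distance between the current frame and the previous keyframe is not greater than keyint, go on to the next test:
   ① If the current frame differs from the previous frame enough to satisfy
[1 - (bit size of P-frame) / (bit size of I-frame)] * 100 * keyint / (distance from previous keyframe) < scenecut
and the distance from the previous keyframe is greater than or equal to minkeyint, an IDR-frame is placed at the current frame.
   ② If the relation above holds but the distance from the previous keyframe is less than minkeyint, an I-frame (a non-IDR I-frame) is placed at the current frame.
   ③ If the relation above does not hold, a P-frame is placed at the current frame.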
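The flow above can be sketched in Python as follows. This is a direct transcription of the 2007 pseudocode quoted earlier, not x264's real implementation; the function name, the parameter names and the sample numbers are illustrative assumptions.

```python
def choose_frame_type(p_bits: int, i_bits: int, distance: int,
                      keyint: int, min_keyint: int, scenecut: float) -> str:
    """Return "IDR", "I" or "P" following the quoted 2007 pseudocode."""
    # (1) Too far from the previous keyframe: force an IDR-frame.
    if distance > keyint:
        return "IDR"
    # (2) Scenecut test (Expression 2): the P-frame saves too little.
    if 1 - p_bits / i_bits < (scenecut / 100) * distance / keyint:
        # IDR only if far enough from the previous keyframe, else plain I.
        return "IDR" if distance >= min_keyint else "I"
    # (3) Otherwise keep the frame as P.
    return "P"


if __name__ == "__main__":
    # Hypothetical settings: keyint=250, min_keyint=25, scenecut=40.
    print(choose_frame_type(50_000, 100_000, 300, 250, 25, 40))
    print(choose_frame_type(95_000, 100_000, 100, 250, 25, 40))
    print(choose_frame_type(99_500, 100_000, 10, 250, 25, 40))
    print(choose_frame_type(50_000, 100_000, 100, 250, 25, 40))
```

The four calls exercise each branch in turn: a forced IDR beyond keyint, a scenecut far enough from the last keyframe, a scenecut closer than min_keyint, and an ordinary P-frame.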

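Part of my earlier confusion actually came from reading a Chinese translation of the wiki, which says roughly this:

Sets the threshold for I/IDR frame placement. x264 calculates the difference between each frame and the previous frame. If the difference value is lower than "scenecut", a scenecut is detected; if it is also fewer than min-keyint frames, an I-frame is then placed, otherwise an IDR-frame is placed. The higher this value, the greater the chance that a scenecut is detected.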

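The phrase "if the difference value is lower than scenecut" kept confusing me. Intuitively, an I/IDR frame should be placed only when the difference between the two frames is large enough, yet here it says lower.
Going back to the wiki original, it does not say that the difference is lower than scenecut. It says a value (a metric) is used to gauge the difference between the two frames, and that value is then compared with the scenecut value. As shown above, the more the two frames differ, the smaller this value becomes.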

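I think that translation got it wrong. Here is my retranslation of the wiki's explanation:

Sets the threshold for placing I/IDR frames. x264 calculates a value that gauges how different each frame is from the previous frame. If this value is lower than the --scenecut value, a "scenecut" (scene change) is detected. If the frame is also less than --min-keyint frames from the last IDR-frame, an I-frame is placed there; otherwise an IDR-frame is placed. The higher this value, the more likely a scenecut is detected.
--scenecut 0 is equivalent to --no-scenecut
Default: 40
Recommendation: Default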

Having written all this, my head still hurts...
Then I came across this remark from DS:
Again, "max keyint" in scenecut routines should be the distance between keyframes, not the distance between IDR frames.
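and now I'm confused again.
Is --keyint the "maximum interval between IDR frames" or the "maximum GOP length"?
I have the same kind of doubt about --min-keyint.
Some GOPs contain non-IDR I-frames. Under Akupenguin's decision flow above, a non-IDR I-frame can be closer than minkeyint to another keyframe. MeGUI also calls this parameter "minimum GOP size". Is that a mistake? Who has it wrong?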

Experts, please advise. Thanks.

hdfuck · posted 2010-6-14 02:41

My thoughts are rather muddled...

angering · posted 2010-6-18 09:17

Last edited by angering on 2010-6-18 10:21

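Complete newbie here, just bumping the thread for the OP~

I recall aki's blog mentioning the difference between IDR and I frames... = =
roozhou has mentioned it too... but I forget where... a thread on dwing, I think...
The OP is impressive... so much English...
I spent a long time reading the OP's post...
Looking at 1 - P/I, my feeling is:
isn't it that a big size gap between I and P means the frame differs little from the preceding picture (P is inter, I is intra), so 1 - P/I is large?
And if the size gap between the P-frame and the I-frame is small, the current frame differs a lot from the preceding one, so 1 - P/I is small and an I or IDR should be inserted?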

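And judging from that block of akupenguin's 2007 code at the end (is that code?),
IDR should correspond to --keyint, right?

As for the GOP and IDR question, I'd suggest reading that article on aki's blog. It explains things quite well~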

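Finally, I have the same question about this:
"Short answer: scenecut threshold is the max % extra bits it's willing to spend to switch to an I-frame."
My rough translation: the scenecut threshold is a maximum extra-bitrate figure that (the encoder) uses to judge whether it is willing to use an I-frame.
Read that way, doesn't it look as if a larger scenecut threshold means fewer I-frames?
Also, this:
"(1 - (bit size of P-frame) / (bit size of I-frame) < (scenecut / 100) * (distance from previous keyframe) / keyint)"
Is the keyint here the max or the min? And why does the weight increase the further we are from the previous keyframe?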

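akiduki · posted 2010-6-19 22:03

The definition of a GOP is not clear-cut, because GOP was only formally defined in SVC. Apart from the first frame of the first GOP having to be an I-frame, the start and end of the other GOPs need not be I-frames...
Since x264 is an AVC encoder, it does not really have the GOP concept, so keyint and min-keyint can be regarded as limits on how far a GOP may grow.
scenecut is nothing more than an extra Lagrangian multiplier added during mode decision. And mode decision is nothing more than a bit-allocation optimization problem carried out under an RD model.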
