[Learning Python] Three Ways to Call the X-13 Program for Seasonal Adjustment - 经纬's Caixin Blog - Caixin (财新网)

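Seasonal adjustment is the first step in working with macroeconomic data. For Chinese macro data, the Spring Festival (Chinese New Year) also needs its own adjustment. This post briefly compares three ways of doing seasonal adjustment from Python.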

1. The built-in function in the statsmodels module

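Python's statsmodels module ships with x13_arima_analysis, which can run seasonal adjustment directly. Let's look at the function's parameters first (an excerpt of its signature and docstring):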

def x13_arima_analysis(endog, maxorder=(2, 1), maxdiff=(2, 1), diff=None,
                       exog=None, log=None, outlier=True, trading=False,
                       forecast_periods=None, retspec=False,
                       speconly=False, start=None, freq=None,
                       print_stdout=False, x12path=None, prefer_x13=True):
    """
    Perform x13-arima analysis for monthly or quarterly data.

    Parameters
    ----------
    endog : array_like, pandas.Series
        The series to model. It is best to use a pandas object with a
        DatetimeIndex or PeriodIndex. However, you can pass an array-like
        object. If your object does not have a dates index then ``start`` and
        ``freq`` are not optional.
    maxorder : tuple
        The maximum order of the regular and seasonal ARMA polynomials to
        examine during the model identification. The order for the regular
        polynomial must be greater than zero and no larger than 4. The
        order for the seasonal polynomial may be 1 or 2.
    maxdiff : tuple
        The maximum orders for regular and seasonal differencing in the
        automatic differencing procedure. Acceptable inputs for regular
        differencing are 1 and 2. The maximum order for seasonal differencing
        is 1. If ``diff`` is specified then ``maxdiff`` should be None.
        Otherwise, ``diff`` will be ignored. See also ``diff``.
    diff : tuple
        Fixes the orders of differencing for the regular and seasonal
        differencing. Regular differencing may be 0, 1, or 2. Seasonal
        differencing may be 0 or 1. ``maxdiff`` must be None, otherwise
        ``diff`` is ignored.
    exog : array_like
        Exogenous variables.
    log : bool or None
        If None, it is automatically determined whether to log the series or
        not. If False, logs are not taken. If True, logs are taken.
    outlier : bool
        Whether or not outliers are tested for and corrected, if detected.
    trading : bool
        Whether or not trading day effects are tested for.
    forecast_periods : int
        Number of forecasts produced. The default is None.
    retspec : bool
        Whether to return the created specification file. Can be useful for
        debugging.
    speconly : bool
        Whether to create the specification file and then return it without
        performing the analysis. Can be useful for debugging.
    start : str, datetime
        Must be given if ``endog`` does not have date information in its index.
        Anything accepted by pandas.DatetimeIndex for the start value.
    freq : str
        Must be given if ``endog`` does not have date information in its
        index. Anything accepted by pandas.DatetimeIndex for the freq value.
    print_stdout : bool
        The stdout from X12/X13 is suppressed. To print it out, set this
        to True. Default is False.
    x12path : str or None
        The path to x12 or x13 binary. If None, the program will attempt
        to find x13as or x12a on the PATH or by looking at X13PATH or
        X12PATH depending on the value of prefer_x13.
    prefer_x13 : bool
        If True, will look for x13as first and will fallback to the X13PATH
        environmental variable. If False, will look for x12a first and will
        fallback to the X12PATH environmental variable. If x12path points
        to the path for the X12/X13 binary, it does nothing.

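This function runs an X-13-ARIMA adjustment on macroeconomic data. From the parameter list we can see:

maxorder: maximum orders of the regular and seasonal ARMA polynomials examined during model identification; the regular order must be between 1 and 4, the seasonal order 1 or 2

maxdiff: maximum orders of regular and seasonal differencing for the automatic differencing procedure

diff: fixes the differencing orders of the ARIMA model directly; only takes effect when maxdiff is None

exog: exogenous variables (array-like), if any

log: whether to take logs; if left as None, the program decides automatically

outlier: whether to test for and correct outliers

trading: whether to test for trading-day effects

forecast_periods: number of forecast periods; default is None

retspec: whether to also return the generated specification (spec) file

speconly: only generate and return the spec file, without running the analysis

start: start date of the data; required if the input has no date index

freq: data frequency; required if the input has no date index

x12path: path to the X-12/X-13 binary

prefer_x13: look for the x13as binary first, rather than x12a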
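To make the parameter list concrete, here is a minimal sketch of a call. It assumes statsmodels and pandas are installed and that an x13as binary can be found on PATH or via X13PATH. The wrapper name adjust_monthly and its arguments are hypothetical and not part of statsmodels; only x13_arima_analysis and the result attributes seasadj, trend, irregular and spec come from the library.

```python
def adjust_monthly(values, start, x12path=None):
    """Seasonally adjust a monthly series with statsmodels' X-13 wrapper.

    `values` is a plain sequence of floats and `start` a date string such as
    "2010-01"; both names are illustrative. The imports live inside the
    function so that defining it does not require statsmodels or pandas.
    """
    import pandas as pd
    from statsmodels.tsa.x13 import x13_arima_analysis

    # With a DatetimeIndex, statsmodels can read the start date and frequency
    # from the series, so `start` and `freq` need not be passed to it.
    index = pd.date_range(start=start, periods=len(values), freq="MS")
    series = pd.Series(list(values), index=index, name="y")

    res = x13_arima_analysis(
        series,
        log=None,         # let X-13 decide whether to take logs
        outlier=True,     # test for and correct outliers
        trading=False,    # no trading-day test
        retspec=True,     # also return the generated spec for inspection
        x12path=x12path,  # None: search PATH / X13PATH
    )
    # seasadj, trend and irregular correspond to the saved tables d11, d12, d13.
    return res.seasadj, res.trend, res.irregular, res.spec
```

A call such as adjust_monthly(values, "2010-01") should hand back the three adjusted components as pandas Series plus the spec text, which is handy for checking what was actually sent to X-13.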

Pros: the function lives in a mature package, needs little format conversion, and is easy to use.

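Con 1: too little output. The spc file it generates sets x11{ save=(d11 d12 d13) } for saving results. In other words, after calling the X-13 program it keeps only three series: d11 (the seasonally adjusted series), d12 (the trend), and d13 (the irregular component). Getting anything beyond these requires modifying the source code, which is inconvenient.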

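Con 2: Spring Festival adjustment is awkward. The X-13 program has built-in adjustments for Western moving holidays such as Easter, but an adjustment for Chinese New Year has to be configured by hand through the exogenous variables, which is not intuitive.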
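As an illustration of what "configuring it yourself" can involve, here is a rough sketch of one common way to build a Spring Festival regressor: for each year, take a window of days around Chinese New Year and record what share of that window falls in each month. The function name cny_regressor, the window lengths, and the choice not to center the regressor are assumptions made for this example, not something prescribed by statsmodels or X-13.

```python
from datetime import date, timedelta

# Chinese New Year dates (Gregorian calendar) for a few sample years.
CNY_DATES = [
    date(2018, 2, 16),
    date(2019, 2, 5),
    date(2020, 1, 25),
    date(2021, 2, 12),
    date(2022, 2, 1),
]


def cny_regressor(cny_dates, days_before=7, days_after=0):
    """Return {(year, month): share of the holiday window in that month}.

    The window runs from `days_before` days before each Chinese New Year
    through `days_after` days after it, with the holiday itself included.
    Months that contain no window days are simply absent (treat them as 0).
    """
    shares = {}
    for cny in cny_dates:
        window = [cny + timedelta(days=offset)
                  for offset in range(-days_before, days_after + 1)]
        for day in window:
            key = (day.year, day.month)
            shares[key] = shares.get(key, 0.0) + 1.0 / len(window)
    return shares


regressor = cny_regressor(CNY_DATES, days_before=7, days_after=0)
```

In 2022, for example, the eight-day window runs from January 25 to February 1, so seven eighths of the effect is assigned to January and one eighth to February. A monthly series built from these shares is the kind of array that could then be passed as exog.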

2. Calling R's seasonal package through rpy2

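R has a package called seasonal that is dedicated to seasonal adjustment, and it has a built-in holiday option that can adjust for Chinese New Year directly. Below is an excerpt of the spec produced on the R side. As it shows, usertype = holiday handles the Spring Festival adjustment. The R package also integrates flexible ways of defining the Spring Festival effect: for example, the periods before and after the festival can be set up separately, and the number of days is up to you.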

regression{
  aictest = td
  usertype = holiday
  user = xreg
  file = "/tmp/RtmpKuV8kT/x1322219a8fc82/xreg.dta"
  format = "datevalue"
}
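The spec above points X-13 at a user-defined regressor stored in a "datevalue" file, which as far as I understand holds one observation per line as year, month and value in free format. The following sketch writes such a file and renders a matching regression block from Python. The helper names write_datevalue and regression_spec, the file name, and the sample values are all made up for illustration.

```python
import os
import tempfile


def write_datevalue(path, rows):
    """Write (year, month, value) rows, one observation per line."""
    with open(path, "w", encoding="ascii") as fh:
        for year, month, value in rows:
            fh.write(f"{year} {month} {value:.6f}\n")


def regression_spec(xreg_path, name="xreg"):
    """Render a regression{} block that points at a user holiday regressor."""
    return (
        "regression{\n"
        "  aictest = td\n"
        "  usertype = holiday\n"
        f"  user = {name}\n"
        f'  file = "{xreg_path}"\n'
        '  format = "datevalue"\n'
        "}\n"
    )


# Illustrative values only: share of a pre-holiday window in each month.
rows = [(2022, 1, 0.875), (2022, 2, 0.125), (2022, 3, 0.0)]
xreg_path = os.path.join(tempfile.mkdtemp(), "xreg.dta")
write_datevalue(xreg_path, rows)
spec_block = regression_spec(xreg_path)

# Read the file back to see exactly what X-13 would be given.
with open(xreg_path, encoding="ascii") as fh:
    written_lines = fh.read().splitlines()
```

Generating the block as text keeps the Python side independent of R; the same string can be pasted into a hand-written spc file.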

Pros: R's seasonal package makes the Spring Festival adjustment convenient and intuitive to set up.

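Con 1: rpy2 has poor compatibility. Python calls the R package through rpy2, but in my experience rpy2's compatibility is poor; in my tests so far, it did not work with Python 3.8 or later.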

Con 2: the extra cost of learning R. You have to become familiar with another language.

3. Calling the X-13 program through cmd

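Most mainstream statistical software, such as EViews and the R package mentioned above, does its seasonal adjustment by calling the X-13-ARIMA program. We can call it directly from Python as well. The rough steps are: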

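Convert the DataFrame in Python into a data format that the X-13-ARIMA program can read;

Write the spc file for the data being used;

Use the subprocess module to call cmd, and through cmd run the X-13-ARIMA program on the generated spc file;

Convert the program's output back into a DataFrame;

Present the results.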
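The steps above can be sketched roughly as follows, using only the standard library so that the DataFrame conversion is left to the reader. Everything here is an assumption made for illustration rather than a tested pipeline: the helper names, the file names, the small spec, the binary name x13as, and the layout of the saved tables (two header lines followed by tab-separated rows of YYYYMM and a value).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def write_series(path, observations):
    """Step 1: write (year, month, value) tuples in 'datevalue' layout."""
    lines = [f"{y} {m} {v}" for y, m, v in observations]
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def build_spec(data_path, title="demo"):
    """Step 2: a deliberately small spec; extend x11{ save=... } as needed."""
    return (
        f'series{{ title="{title}" file="{data_path}" '
        'format="datevalue" period=12 }\n'
        "transform{ function=auto }\n"
        "automdl{ }\n"
        "x11{ save=(d10 d11 d12 d13) }\n"
    )


def run_x13(spec_text, workdir, binary="x13as"):
    """Step 3: run X-13 on the spec; returns None if the binary is missing."""
    exe = shutil.which(binary)
    if exe is None:
        return None
    spec_file = Path(workdir) / "demo.spc"
    spec_file.write_text(spec_text, encoding="ascii")
    # The program is given the spec path without its .spc extension.
    subprocess.run([exe, str(spec_file.with_suffix(""))],
                   check=True, capture_output=True, cwd=workdir)
    return spec_file.with_suffix(".d11")


def parse_saved_table(text):
    """Step 4: parse a saved table into {(year, month): value}."""
    result = {}
    for line in text.splitlines()[2:]:  # skip the two assumed header lines
        if not line.strip():
            continue
        stamp, value = line.split("\t")[:2]
        result[(int(stamp[:4]), int(stamp[4:6]))] = float(value)
    return result


workdir = tempfile.mkdtemp()
data_path = Path(workdir) / "demo.dat"
write_series(data_path, [(2018, 1, 100.0), (2018, 2, 105.5)])
spec_text = build_spec(data_path)

# Made-up text that mimics the assumed layout of a saved d11 table.
sample = ("date\tdemo.d11\n------\t----------\n"
          "201801\t+0.100000000000000E+03\n"
          "201802\t+0.105500000000000E+03\n")
parsed = parse_saved_table(sample)
```

With a real series of adequate length, run_x13(spec_text, workdir) would be the actual call to the program, and the parsed dictionary can be turned into a DataFrame for the final presentation step. Because the spec is just a string, adding a regression block for the Spring Festival or saving more tables only means editing build_spec.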

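Pros: the most customizable option. You can set every model parameter to suit your needs, including the Spring Festival factors, and you have access to all of the model's output.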

Cons: it takes a fair amount of code, since you have to write the data conversion functions and so on yourself.

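On balance, I think that unless you are specifically studying the Spring Festival effect, the first method is generally sufficient, possibly with some custom development on top of the statsmodels module. If you are specifically studying the Spring Festival effect or similar questions, the third method is where the in-depth development should go.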
