Piwik Source Code Analysis and Writing an API

Piwik provides a complete set of event statistics and analysis features. It can return data as XML or JSON, and it can also generate charts.

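Piwik provides a scheme for writing APIs, so the coder only has to focus on processing the data. The Piwik root directory contains a plugins folder, which holds all the current plugins. The easiest way to learn is to read ExampleAPI and ExamplePlugin; the code style is quite good and the code is easy to follow.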

The simplest way to create an API is to add a folder under the plugins directory, for example Example, and create API.php inside it with a class named Piwik_PluginName_API:

<?php
    class Piwik_Example_API{
        static private $instance = null;

        static public function getInstance()
        {
            if (self::$instance == null)
            {
                self::$instance = new self;
            }
            return self::$instance;
        }

        public function getDatatable(){
            $datatable = array();
            $datatable["start"]="OK";
            return $datatable;
        }
    }

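You can then call it at localhost/piwik/index.php?module=API&method=Example.getDatatable&format=xml&token_auth=* (this assumes Piwik is installed on the local machine; the token can be looked up after logging in). The class uses the singleton pattern so that it is not instantiated more than once.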
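As a small illustration of the request described above, the sketch below assembles the same URL from its parts. The helper name buildApiUrl and the placeholder token YOUR_TOKEN are assumptions made for this example; they are not part of Piwik.

```php
<?php
// Hypothetical helper (not part of Piwik): builds an API request URL.
function buildApiUrl($baseUrl, $method, $format, $tokenAuth)
{
    $query = http_build_query(array(
        'module'     => 'API',
        'method'     => $method,   // "PluginName.methodName"
        'format'     => $format,   // e.g. xml or json
        'token_auth' => $tokenAuth,
    ));
    return $baseUrl . '/index.php?' . $query;
}

// YOUR_TOKEN is a placeholder; use the token shown after logging in.
$url = buildApiUrl('http://localhost/piwik', 'Example.getDatatable', 'xml', 'YOUR_TOKEN');
echo $url, "\n";
```

The method parameter is the plugin name followed by the public method name, which is why the class above has to be called Piwik_Example_API.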

Here are a few things to watch out for.

If you need to read GET/POST values, do not use $_GET/$_POST directly; use the built-in Piwik_Common::getRequestVar('NAME');

To avoid SQL injection attacks, never run a query directly with a parameter taken from GET.
For example:

<?php
$idsite = $_GET['value'];
Piwik_Query( "SELECT * FROM ".Piwik_Common::prefixTable('site')." WHERE idsite = $idsite" );

Someone can then submit a string such as "1 OR 1", and the query becomes "SELECT * FROM piwik_site WHERE idsite = 1 OR 1",
which returns every site.

You can of course also guard against injection yourself, with mysql_real_escape_string() and the like.

Database queries go through Piwik's own functions:

• function Piwik_Query( $sqlQuery, $parameters = array())
• function Piwik_FetchAll( $sqlQuery, $parameters = array())
• function Piwik_FetchOne( $sqlQuery, $parameters = array())

<?php
$feedburnerFeedName = Piwik_FetchOne(
    'SELECT feedburnerName
     FROM ' . Piwik_Common::prefixTable('site') . '
     WHERE idsite = ? AND name = ?',
    array(Piwik_Common::getRequestVar('idSite'), Piwik_Common::getRequestVar('name'))
);

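Each "?" in $sqlQuery is matched, in order, with one value from the parameters array, which keeps the data separate from the SQL statement.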
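To make the difference concrete, here is a minimal sketch that only builds the query text and never touches a database. The function names buildUnsafeQuery and buildBoundQuery are made up for this illustration; the array returned by the second one simply mirrors the ($sqlQuery, $parameters) pair that the Piwik functions listed above accept.

```php
<?php
// Unsafe: the user-supplied value becomes part of the SQL text itself.
function buildUnsafeQuery($table, $idsite)
{
    return "SELECT * FROM " . $table . " WHERE idsite = " . $idsite;
}

// Safer: the SQL text is fixed and the value travels separately.
function buildBoundQuery($table, $idsite)
{
    return array(
        'sql'  => "SELECT * FROM " . $table . " WHERE idsite = ?",
        'bind' => array($idsite),
    );
}

$input = '1 OR 1'; // the malicious value from the example above

echo buildUnsafeQuery('piwik_site', $input), "\n";

$bound = buildBoundQuery('piwik_site', $input);
echo $bound['sql'], "\n";
```

In the first case the injected "OR 1" changes the structure of the statement. In the second the statement text never changes, and the whole input string is handed over as a single value.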

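Piwik supports using a segment as a filter on the data that has already been selected. The gist that accompanied this post showed the array the source code builds when segment=browserName==FF, together with all supported segments.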

For the database queries, here is a walk-through of the code behind Live.getLastVisitsDetails (idSite, period, date, segment = '', filter_limit = '', maxIdVisit = '', minTimestamp = ''):

$visitorDetails = $this->loadLastVisitorDetailsFromDatabase($idSite, $period, $date, $segment, $filter_limit, $maxIdVisit, $visitorId = false, $minTimestamp);

$dataTable = $this->getCleanedVisitorsFromDetails($visitorDetails, $idSite);

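First comes loadLastVisitorDetailsFromDatabase(). All this function really does is prepare the parameters and then read from the database, and the rest of the system works the same way.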

$where
array(
    [0] => 'log_visit.idsite = ? '
    [1] => 'log_visit.visit_last_action_time >= ?'
)
$whereBind
array(
    [0] =>4
    [1] =>'2013-04-30 16:00:00'
)

The conditions are combined into one string with join:

if(count($where) > 0)
{
    $where = join(" 
        AND ", $where);
}

log_visit.idsite = ? AND log_visit.visit_last_action_time >= ?
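The pattern here is to collect the condition fragments and their bound values in two parallel arrays and to glue the fragments together at the end. Below is a minimal sketch of that idea; buildWhere is a hypothetical function written for this post, not Piwik's code, and it uses implode, of which join is an alias in PHP.

```php
<?php
// Hypothetical stand-in for the filtering step; names mirror the dump above.
function buildWhere($idSite, $minDatetime = null)
{
    $where     = array('log_visit.idsite = ?');
    $whereBind = array($idSite);

    // Optional conditions add one fragment and one bound value each,
    // so the two arrays always stay in the same order.
    if ($minDatetime !== null) {
        $where[]     = 'log_visit.visit_last_action_time >= ?';
        $whereBind[] = $minDatetime;
    }

    return array(implode(' AND ', $where), $whereBind);
}

list($whereSql, $whereBind) = buildWhere(4, '2013-04-30 16:00:00');
echo $whereSql, "\n";
```

Keeping the values in $whereBind rather than in the string means optional filters can be added without ever concatenating user input into the SQL.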

Finally it is combined with the segment:

segment = Piwik_SegmentExpression(
    joins =
    valuesBind =
    parsedTree =
    tree = array(
        [0] => array(
            [0] =>
            [1] => 'browserName==FF'
        )
    )
    parsedSubExpressions = array(
        [0] => array(
            [0] =>
            [1] => array(
                [0] => 'log_visit.config_browser_name'
                [1] => '=='
                [2] => 'FF'
            )
        )
    )
    string = 'browserName==FF'
)

    $select = "log_visit.*";
    $from = "log_visit";
    $subQuery = $segment->getSelectQuery($select, $from, $where, $whereBind, $orderBy);


array(
    ['sql'] =>'
        SELECT
        log_visit.*
        FROM
        piwik_log_visit AS log_visit
        WHERE
        ( log_visit.idsite = ?
        AND log_visit.visit_last_action_time >= ? )
        AND
        ( log_visit.config_browser_name = ? )
        ORDER BY
        idsite, visit_last_action_time DESC'
    ['bind'] =>array(
        [0] =>4
        [1] =>'2013-04-30 16:00:00'
        [2] =>'FF'
    )
)
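The step from 'browserName==FF' to the extra "config_browser_name = ?" condition can be pictured with the heavily simplified sketch below. It is not how Piwik_SegmentExpression is implemented: appendSegment and its one-entry column map are assumptions made for this example, and only a single name==value expression is handled.

```php
<?php
// Hypothetical, simplified stand-in for the segment step. The real class
// supports more operators and AND/OR combinations.
function appendSegment($segment, $whereSql, array $whereBind)
{
    // Assumed mapping from segment name to column, taken from the dump above.
    $columns = array('browserName' => 'log_visit.config_browser_name');

    $parts = explode('==', $segment, 2);
    if (count($parts) !== 2 || !isset($columns[$parts[0]])) {
        throw new InvalidArgumentException("Unsupported segment: $segment");
    }

    // The existing conditions and the segment condition are each wrapped in
    // parentheses, and the segment value is appended to the bound values.
    $sql = '( ' . $whereSql . ' ) AND ( ' . $columns[$parts[0]] . ' = ? )';
    $whereBind[] = $parts[1];

    return array('sql' => $sql, 'bind' => $whereBind);
}

$result = appendSegment(
    'browserName==FF',
    'log_visit.idsite = ? AND log_visit.visit_last_action_time >= ?',
    array(4, '2013-04-30 16:00:00')
);
echo $result['sql'], "\n";
```

The segment value ends up as the last bound parameter, which matches the order of the bind array in the output above.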

The SQL is then assembled once more:

    $sql = "
        SELECT sub.* 
        FROM ( 
            ".$subQuery['sql']."
            $sqlLimit
        ) AS sub
        GROUP BY sub.idvisit
        ORDER BY $orderByParent
    "; 


SELECT sub.* 
FROM ( 
    SELECT log_visit.* 
    FROM piwik_log_visit AS log_visit 
    WHERE ( log_visit.idsite = ? AND log_visit.visit_last_action_time >= ? ) 
        AND ( log_visit.config_browser_name = ? ) 
    ORDER BY idsite, visit_last_action_time DESC LIMIT 100 ) 
AS sub 
GROUP BY sub.idvisit 
ORDER BY sub.visit_last_action_time DESC 

$data = Piwik_FetchAll($sql, $subQuery['bind']);

This runs the query.

When the query finishes it returns $visitorDetails. getCleanedVisitorsFromDetails() then uses this data to run the detailed queries and to filter out empty values.


[9] =>array(
    ['type'] =>1
    ['url'] =>'localhost/test/'
    ['url_prefix'] =>0
    ['pageTitle'] =>
    ['pageIdAction'] =>11
    ['pageId'] =>9101
    ['serverTimePretty'] =>'2013-05-25 05:59:38'
    ['timeSpentRef'] =>1
    ['custom_var_k1'] =>'section1'
    ['custom_var_v1'] =>'third'
    ['custom_var_k2'] =>'section2'
    ['custom_var_v2'] =>'second'
)
[10] =>array(
    ['type'] =>1
    ['url'] =>'localhost/test/'
    ['url_prefix'] =>0
    ['pageTitle'] =>
    ['pageIdAction'] =>11
    ['pageId'] =>9102
    ['serverTimePretty'] =>'2013-05-25 06:00:08'
    ['timeSpentRef'] =>30
    ['custom_var_k1'] =>'section1'
    ['custom_var_v1'] =>'first'
    ['custom_var_k2'] =>'section2'
    ['custom_var_v2'] =>'second'
)
[11] =>array(
    ['type'] =>1
    ['url'] =>'localhost/test/'
    ['url_prefix'] =>0
    ['pageTitle'] =>
    ['pageIdAction'] =>11
    ['pageId'] =>9103
    ['serverTimePretty'] =>'2013-05-25 06:00:50'
    ['timeSpentRef'] =>42
)

The above is the raw data returned by $actionDetails = Piwik_FetchAll($sql, array($idvisit));

<row>
    <type>action</type>
    <url>http://localhost/test/</url>
    <pageTitle/>
    <pageIdAction>11</pageIdAction>
    <pageId>9101</pageId>
    <serverTimePretty>周六 25 五月 13:59:38</serverTimePretty>
    <customVariables>
        <row>
            <customVariableName1>section1</customVariableName1>
            <customVariableValue1>third</customVariableValue1>
        </row>
        <row>
            <customVariableName2>section2</customVariableName2>
            <customVariableValue2>second</customVariableValue2>
        </row>
    </customVariables>
    <timeSpent>30</timeSpent>
    <timeSpentPretty>30 秒</timeSpentPretty>
    <icon/>
</row>
<row>
    <type>action</type>
    <url>http://localhost/test/</url>
    <pageTitle/>
    <pageIdAction>11</pageIdAction>
    <pageId>9102</pageId>
    <serverTimePretty>周六 25 五月 14:00:08</serverTimePretty>
    <customVariables>
        <row>
            <customVariableName1>section1</customVariableName1>
            <customVariableValue1>first</customVariableValue1>
        </row>
        <row>
            <customVariableName2>section2</customVariableName2>
            <customVariableValue2>second</customVariableValue2>
        </row>
    </customVariables>
    <timeSpent>42</timeSpent>
    <timeSpentPretty>42 秒</timeSpentPretty>
    <icon/>
</row>

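Notice that the times are shifted: the timeSpent reported for page 9101 is 30, which is the timeSpentRef stored on the row for page 9102. Digging through the source turns up a loop with a comment: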

// set the time spent for this action (which is the timeSpentRef of the next action)
if (isset($actionDetails[$actionIdx + 1]))
{
    $actionDetail['timeSpent'] = $actionDetails[$actionIdx + 1]['timeSpentRef'];
    $actionDetail['timeSpentPretty'] = Piwik::getPrettyTimeFromSeconds($actionDetail['timeSpent']);

}

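In other words, the duration stored with each row is read back as the time spent on the previous action, so every action takes its timeSpent from the row that follows it. That seems like an odd way to store it.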
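The shift can be reproduced with the timeSpentRef values from the raw rows shown earlier. This is a stand-alone sketch of the same idea as the loop quoted above, not the Piwik code itself; the arrays are reduced to the two fields that matter here.

```php
<?php
// timeSpentRef values taken from the raw rows shown earlier.
$actionDetails = array(
    array('pageId' => 9101, 'timeSpentRef' => 1),
    array('pageId' => 9102, 'timeSpentRef' => 30),
    array('pageId' => 9103, 'timeSpentRef' => 42),
);

// Each action takes its timeSpent from the timeSpentRef of the next action.
foreach ($actionDetails as $actionIdx => &$actionDetail) {
    if (isset($actionDetails[$actionIdx + 1])) {
        $actionDetail['timeSpent'] = $actionDetails[$actionIdx + 1]['timeSpentRef'];
    }
}
unset($actionDetail); // break the reference left behind by foreach

foreach ($actionDetails as $row) {
    echo $row['pageId'], ': ', isset($row['timeSpent']) ? $row['timeSpent'] : 'n/a', "\n";
}
```

Page 9101 ends up with 30 and page 9102 with 42, as in the XML output, while the last action gets no timeSpent because no later row exists to read it from.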

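The system is also careless in a few other places. For example,
once the data has been assembled, usort($actions, array($this, 'sortByServerTime')); sorts all of it by ServerTime.
This leads to a silly problem: when two page views land in the same second their timestamps are identical, whereas the PageID is fixed at the moment of the visit.
The result is the output below. I do not see why the code sorts by timestamp at all when PageIDs are already assigned in visit order, and even if it does sort by timestamp, why it does not fall back to PageID when the timestamps are equal.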

<row>
<pageId>9100</pageId>
<serverTimePretty>周六 25 五月 13:59:37</serverTimePretty>
</row>
<row>
<pageId>9099</pageId>
<serverTimePretty>周六 25 五月 13:59:37</serverTimePretty>
</row>
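One possible fix is a comparator that falls back to the page ID when the timestamps are equal. The sketch below is only a suggestion: sortByServerTimeThenPageId is a made-up name, the serverTime field holding a Unix timestamp is an assumption, and the timestamp values are illustrative.

```php
<?php
// Hypothetical comparator: order by server time, then by pageId on ties.
// 'serverTime' is an assumed field holding a Unix timestamp.
function sortByServerTimeThenPageId($a, $b)
{
    if ($a['serverTime'] !== $b['serverTime']) {
        return $a['serverTime'] < $b['serverTime'] ? -1 : 1;
    }
    if ($a['pageId'] === $b['pageId']) {
        return 0;
    }
    return $a['pageId'] < $b['pageId'] ? -1 : 1;
}

$actions = array(
    array('pageId' => 9100, 'serverTime' => 1369461577),
    array('pageId' => 9099, 'serverTime' => 1369461577), // same second
    array('pageId' => 9101, 'serverTime' => 1369461578),
);

usort($actions, 'sortByServerTimeThenPageId');

foreach ($actions as $action) {
    echo $action['pageId'], "\n";
}
```

With the tie-break in place, rows that share a timestamp come out in PageID order instead of whatever order usort happens to leave them in.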