Piwik源码分析及API编写
Piwik提供了一整套的事件统计,分析功能,并且可以以xml,json提供返回数据,还可以生成图表.
Piwik提供了Api的编写的一套方案,Coder只需要专注于数据的的处理即可.在Piwik根目录下有一个plugins的文件夹,文件夹下即当前所有的插件.最简单的额学习方式就是查看ExampleAPI和ExamplePlugin,代码风格已经很不错了,看起来其实也还不错.
一个API建立的最简单的流程就是在plugin目录下建立一个文件夹,例如Example.目录下新建API.php,类名为Piwik_PluginName_API
<?php
class Piwik_Example_API{
static private $instance = null;
static public function getInstance()
{
if (self::$instance == null)
{
self::$instance = new self;
}
return self::$instance;
}
public function getDatatable(){
$datatable = array();
$datatable["start"]="OK";
return $datatable;
}
}
然后就可以通过localhost/piwik/index.php?module=API&method=Example.getDatatable&format=xml&token_auth=*访问(搭建在本机的piwik,token在登录以后可以查询).显然,这里用了单例模式,防止多次被实例化.
下面提示几个需要注意的地方.
如果需要接受GET/POST的值,不要直接使用$_GET/POST,使用系统自带的Piwik_Common::getRequestVar(‘NAME’);
为了避免数据库注入攻击,不要直接用GET得到的参数执行查询语句
例如
<?php
$idsite = $_GET['value'];
Piwik_Query( "SELECT * FROM ".Piwik_Common::prefixTable('site')." WHERE idsite = $idsite" );
此时可以提交”1 OR 1”这样的字符串,查询语句就会是:”SELECT * FROM piwik_site WHERE idsite = 1 OR 1”
这样就会显示所有的站点.
当然,也可以自己对注入攻击进行处理,mysql_real_escape_string()什么的.
数据库的查询是用Piwik自带的函数
• function Piwik_Query( $sqlQuery, $parameters = array())
• function Piwik_FetchAll( $sqlQuery, $parameters = array())
• function Piwik_FetchOne( $sqlQuery, $parameters = array())
<?php
$feedburnerFeedName = Piwik_FetchOne('SELECT feedburnerName
FROM '.Piwik_Common::prefixTable('site').
' WHERE idsite = ? and name = ?',
array( Piwik_Common::getRequestVar('idSite'), Piwik_Common::getRequestVar('name') )
);
$sqlQuery中为”?”的地方可以一一匹配数据值,将数据和sql语句分开.
Piwik支持以segment作为filiter对已选择的数据进行过滤,以下的gist就是当segment=browser==FF的情况下源码生成的数组及所有所有支持的segment
对于数据库的查询,给出Live.getLastVisitsDetails (idSite, period, date, segment = '', filter_limit = '', maxIdVisit = '', minTimestamp = '')
的代码分析
$visitorDetails = $this->loadLastVisitorDetailsFromDatabase($idSite, $period, $date, $segment, $filter_limit, $maxIdVisit, $visitorId = false, $minTimestamp);
$dataTable = $this->getCleanedVisitorsFromDetails($visitorDetails, $idSite);
首先是loadLastVisitorDetailsFromDatabase()这个函数里面,实际上就只是处理数据然后对数据库进行读取,整套系统其实也都是这么做而已
$where
array(
[0] => 'log_visit.idsite = ? '
[1] => 'log_visit.visit_last_action_time >= ?'
)
$whereBind
array(
[0] =>4
[1] =>'2013-04-30 16:00:00'
)
使用join组合成字符串
if(count($where) > 0)
{
$where = join("
AND ", $where);
}
log_visit.idsite = ? AND log_visit.visit_last_action_time >= ?
最后通过segment组合
segment =Piwik_SegmentExpression(
joins =
valuesBind =
parsedTree =
tree = array(
[0] =>array(
[0] =>
[1] => 'browserName==FF'
)
)
parsedSubExpressions = array(
[0] =>array(
[0] =>
[1] =>array(
[0] =>'log_visit.config_browser_name'
[1] =>'=='
[2] =>'FF'
)
)
)
string = 'browserName==FF'
)
$select = "log_visit.*";
$from = "log_visit";
$subQuery = $segment->getSelectQuery($select, $from, $where, $whereBind, $orderBy);
array(
['sql'] =>'
SELECT
log_visit.*
FROM
piwik_log_visit AS log_visit
WHERE
( log_visit.idsite = ?
AND log_visit.visit_last_action_time >= ? )
AND
( log_visit.config_browser_name = ? )
ORDER BY
idsite, visit_last_action_time DESC'
['bind'] =>array(
[0] =>4
[1] =>'2013-04-30 16:00:00'
[2] =>'FF'
)
)
SQL的再度组装
$sql = "
SELECT sub.*
FROM (
".$subQuery['sql']."
$sqlLimit
) AS sub
GROUP BY sub.idvisit
ORDER BY $orderByParent
";
SELECT sub.*
FROM (
SELECT log_visit.*
FROM piwik_log_visit AS log_visit
WHERE ( log_visit.idsite = ? AND log_visit.visit_last_action_time >= ? )
AND ( log_visit.config_browser_name = ? )
ORDER BY idsite, visit_last_action_time DESC LIMIT 100 )
AS sub
GROUP BY sub.idvisit
ORDER BY sub.visit_last_action_time DESC
$data = Piwik_FetchAll($sql, $subQuery['bind']);
进行查询
查询结束就会返回$visitorDetails,再通过这个数据进行详细查询和空值的过滤getCleanedVisitorsFromDetails()
[9] =>array(
['type'] =>1
['url'] =>'localhost/test/'
['url_prefix'] =>0
['pageTitle'] =>['pageIdAction'] =>11
['pageId'] =>9101
['serverTimePretty'] =>'2013-05-25 05:59:38'
['timeSpentRef'] =>1
['custom_var_k1'] =>'section1'
['custom_var_v1'] =>'third'
['custom_var_k2'] =>'section2'
['custom_var_v2'] =>'second'
)
[10] =>array(
['type'] =>1
['url'] =>'localhost/test/'
['url_prefix'] =>0
['pageTitle'] =>['pageIdAction'] =>11
['pageId'] =>9102
['serverTimePretty'] =>'2013-05-25 06:00:08'
['timeSpentRef'] =>30
['custom_var_k1'] =>'section1'
['custom_var_v1'] =>'first'
['custom_var_k2'] =>'section2'
['custom_var_v2'] =>'second'
)
[11] =>array(
['type'] =>1
['url'] =>'localhost/test/'
['url_prefix'] =>0
['pageTitle'] =>['pageIdAction'] =>11
['pageId'] =>9103
['serverTimePretty'] =>'2013-05-25 06:00:50'
['timeSpentRef'] =>42
)
以上为执行$actionDetails = Piwik_FetchAll($sql, array($idvisit));之后的原始数据
<row>
<type>action</type>
<url>http://localhost/test/</url>
<pageTitle/>
<pageIdAction>11</pageIdAction>
<pageId>9101</pageId>
<serverTimePretty>周六 25 五月 13:59:38 </serverTimePretty>
<customVariables>
<row>
<customVariableName1>section1</customVariableName1>
<customVariableValue1>third</customVariableValue1>
</row>
<row>
<customVariableName2>section2</customVariableName2>
<customVariableValue2>second</customVariableValue2>
</row>
</customVariables>
<timeSpent>30</timeSpent>
<timeSpentPretty>30 秒</timeSpentPretty>
<icon/>
</row>
<row>
<type>action</type>
<url>http://localhost/test/</url>
<pageTitle/>
<pageIdAction>11</pageIdAction>
<pageId>9102</pageId>
<serverTimePretty>周六 25 五月 14:00:08</serverTimePretty>
<customVariables>
<row>
<customVariableName1>section1
</customVariableName1>
<customVariableValue1>first
</customVariableValue1>
</row>
<row>
<customVariableName2>section2</customVariableName2>
<customVariableValue2>second</customVariableValue2>
</row>
</customVariables>
<timeSpent>42</timeSpent>
<timeSpentPretty>42 秒</timeSpentPretty>
<icon/>
</row>
可以看到时间上是错位了的= =,然后翻源代码,可以看到一个循环外加一个 注释
// set the time spent for this action (which is the timeSpentRef of the next action)
if (isset($actionDetails[$actionIdx + 1]))
{
$actionDetail['timeSpent'] = $actionDetails[$actionIdx + 1]['timeSpentRef'];
$actionDetail['timeSpentPretty'] = Piwik::getPrettyTimeFromSeconds($actionDetail['timeSpent']);
}
= =..也就是说,每次储存的时间长度是下一个动作的时间,这就很奇怪了.
另外,这套系统还有一些很不注意的地方,例如
数据整理结束以后,usort($actions, array($this, ‘sortByServerTime’));按照ServerTime来对所有数据进行排序
这样就有个很愚蠢的问题,如果有人同时访问了一个页面,那么时间戳是相同的,PageID会在访问的时候被确定
就会出现以下的问题..所以我很不明白既然都已经根据访问次序给了PageID了为什么不直接用PageID排序反而是用时间戳排序,再说了,就算用时间戳排序,难道不会在时间戳相同的情况下用PageID排序么?
<row>
<pageId>9100</pageId>
<serverTimePretty>周六 25 五月 13:59:37</serverTimePretty>
</row>
<row>
<pageId>9099</pageId>
<serverTimePretty>周六 25 五月 13:59:37</serverTimePretty>
</row>