2019-06-29

Python based ops

Sometimes plain shell feels inadequate for a task, and a little Python makes it much simpler.

format json

Use the following command:

echo '{"foo": "bar"}' | python -m json.tool
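The same formatting can be done inside a script with the standard json module. This is a minimal sketch: the helper name pretty_json is made up for illustration, and indent=4 is chosen to mirror what json.tool prints by default.

```python
import json


def pretty_json(text: str) -> str:
    """Parse a JSON string and return it re-serialized with 4-space indentation."""
    return json.dumps(json.loads(text), indent=4)


print(pretty_json('{"foo": "bar"}'))
```

Invalid input raises json.JSONDecodeError, which makes this a quick validity check as well.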

analyze results from timer

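Performance investigations often require timing parts of a system. Once the timers are in place, they print a lot of formatted data into the log. I used to filter the relevant log lines with shell and then process them in Excel, but that was inefficient, and Excel is awkward to use on Linux, so I switched to combining shell with Python.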

For example, suppose the log contains the following:

timer1: the time 1.0
timer1: the time 2.0
timer1: the time 3.0

The data can first be filtered out with the following bash command:

cat testdata.txt |cut -d " " -f 4

Then send the data through a pipe to a Python script, for example:

cat testdata.txt |cut -d " " -f 4 | python3 analysis.py

The Python script is shown below. Whatever further computation the list needs can be added easily and directly.

import sys
import statistics


def isfloat(value):
    """Return True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


# Take the first whitespace-separated token of every non-empty input line.
data_list = []
for line in sys.stdin:
    tokens = line.split()
    if tokens:
        data_list.append(tokens[0])

# Drop tokens that are not numbers.
real_data_list = [float(d) for d in data_list if isfloat(d)]

print("len", len(real_data_list))
print("avg", statistics.mean(real_data_list))
print("min", min(real_data_list))
print("max", max(real_data_list))
# statistics.stdev needs at least two data points.
print("std", statistics.stdev(real_data_list))
print("sum", sum(real_data_list))
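
As a variation, the cut step can be folded into Python, so the script no longer depends on the value being exactly the fourth space-separated field. This is only a sketch, assuming log lines shaped like the timer1 sample above. The function name extract_times and the regular expression are illustrative and not part of the original script.

```python
import re
import statistics

# Matches lines such as "timer1: the time 1.0" and captures the timer name and the number.
TIMER_LINE = re.compile(r"^(\w+): the time (\d+(?:\.\d+)?)\s*$")


def extract_times(lines, timer="timer1"):
    """Return the float values logged by the given timer, skipping unrelated lines."""
    values = []
    for line in lines:
        match = TIMER_LINE.match(line)
        if match and match.group(1) == timer:
            values.append(float(match.group(2)))
    return values


sample = [
    "timer1: the time 1.0",
    "some unrelated log line",
    "timer1: the time 2.0",
    "timer1: the time 3.0",
]
times = extract_times(sample)
print("len", len(times))
print("avg", statistics.mean(times))
```

Any iterable of lines works as input, so an open file object for testdata.txt or sys.stdin can be passed in place of the sample list.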

In fact, SciPy provides a function that computes most of these values at once:

import numpy as np
from scipy.stats import describe
v = np.random.normal(size=100)
res = describe(v)
print(res)
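
The result of describe is a named tuple, so individual fields can be read by name. Here is a small sketch using the three sample values from the log above. Note that describe reports the variance rather than the standard deviation, so the standard deviation is derived with a square root.

```python
import math

from scipy.stats import describe

values = [1.0, 2.0, 3.0]
res = describe(values)

print("len", res.nobs)
print("min", res.minmax[0])
print("max", res.minmax[1])
print("avg", res.mean)
# describe uses ddof=1 by default, i.e. the sample variance.
print("std", math.sqrt(res.variance))
```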

Beyond these basic descriptive statistics, another useful tool is the statistical significance test. This video gives a good theoretical background on hypothesis testing.

Suppose the collected values are execution times, and you observe that 50% of them completed within 1 second. It is more rigorous to support such a claim with a hypothesis test. For example, take the null hypothesis to be that the execution time is larger than a specific threshold, and reject it when the p-value falls below a significance level of 0.01.

A Python library can run the t-test quickly, with a parameter to choose a left-tailed, right-tailed, or two-sided test.
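
Below is a minimal sketch of such a test with scipy.stats.ttest_1samp, assuming the quantity under test is the mean execution time. The sample values and the 1.0 second threshold are made up for illustration, and the alternative parameter assumes SciPy 1.6 or newer.

```python
from scipy.stats import ttest_1samp

# Hypothetical execution times in seconds; not real measurements.
times = [0.82, 0.91, 0.78, 0.88, 0.95, 0.80, 0.86, 0.90, 0.84, 0.87]
threshold = 1.0  # seconds
alpha = 0.01     # significance level

# H0: mean time >= threshold; H1: mean time < threshold (left-tailed test).
# alternative="greater" gives a right-tailed test; "two-sided" is the default.
result = ttest_1samp(times, popmean=threshold, alternative="less")

print("t statistic", result.statistic)
print("p value", result.pvalue)
if result.pvalue < alpha:
    print("reject H0: the mean time is below the threshold")
else:
    print("cannot reject H0")
```

A one-sample t-test assumes roughly normal data. Execution times are often skewed, so the result is best read as a rough indication on small samples.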

AverageMind
