system.time() function to analyse function performance
microbenchmark package and identically named function to time function calls
microbenchmark, otherwise times of individual runs are returned
library("microbenchmark")
apply() function (as in Week 9)
colMeans() function
system.time and microbenchmark
set.seed(2021)
# Here we create a data frame of 1000 observations of 50 variables
# where each variable is a random draw from a normal distribution with mean
# drawn from a uniform distribution between 0 and 10 and standard deviation 1
dat <- data.frame(mapply(
  function(x) cbind(rnorm(n = 1000, mean = x, sd = 1)),
  runif(n = 50, min = 0, max = 10)
))
dim(dat)
[1] 1000 50
time module
timeit module provides a better alternative: like microbenchmark in R, it repeats the call automatically and averages over many runs
%timeit
Kernel, Change kernel and pick Python from the drop-down menu
import random
import numpy as np
import pandas as pd
# Random numbers in Python can be generated using either
# the built-in `random` module or the external `numpy`
# module (which underlies many `pandas` operations)
random.gauss(mu = 0, sigma = 1)
-0.7261368325293743
# `numpy` returns an array rather than a single float
np.random.randn(1)
array([-0.88354514])
# Let's start our benchmarking experiments by looking
# at random number generation in Python.
# First, let's draw a sample of 1M using both the built-in `random` module
# and `numpy`'s methods
N = 1000000
# We can use `for _` to indicate that the loop variable is not used
%timeit [random.gauss(mu = 0, sigma = 1) for _ in range(N)]
358 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# `numpy` is an order of magnitude faster than the built-in module
%timeit np.random.normal(size = N)
18.2 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
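Outside a notebook, where the %timeit magic is not available, the difference between the time and timeit modules can be sketched with the standard library alone. The block below is a minimal illustration rather than part of the original material: the function name draw_sample and the reduced sample size n are assumptions chosen to keep the run short.

```python
import random
import time
import timeit

n = 100_000  # assumption: smaller than the 1M sample above, to keep the run short


def draw_sample():
    """Draw n standard normal variates with the built-in `random` module."""
    return [random.gauss(0, 1) for _ in range(n)]


# `time`: one manual measurement, which can be noisy
start = time.perf_counter()
sample = draw_sample()
single_run = time.perf_counter() - start

# `timeit`: 5 repeats of 3 calls each; every entry is the total for 3 calls
totals = timeit.repeat(draw_sample, repeat=5, number=3)
per_call = [total / 3 for total in totals]

print(f"time, single run:   {single_run:.4f} s")
print(f"timeit, best of 5:  {min(per_call):.4f} s per call")
print(f"timeit, mean of 5:  {sum(per_call) / len(per_call):.4f} s per call")
```

The single measurement from time depends on whatever else the machine was doing at that moment, while the repeated measurements from timeit give a sense of the spread between runs.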
pandas DataFrame
mean() (from statistics module)
from statistics import mean
# Here we are, essentially, replicating the data frame creation from the R example above:
# each variable is a random draw from a normal distribution with mean
# drawn from a uniform distribution between 0 and 10 and standard deviation 1
dat2 = pd.DataFrame(np.concatenate([
    np.random.normal(loc = x, scale = 1, size = (1000, 1))
    for x
    in np.random.uniform(low = 0, high = 10, size = 50)
], axis = 1))
dat2.shape
(1000, 50)
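With a data frame of this shape in place, column means can be computed in several ways and timed against each other. The sketch below is one assumed way to set up such a comparison, not the original benchmark. It rebuilds a frame of the same shape with an arbitrary seed, uses the standard-library timeit module in place of the %timeit magic so that it runs as a plain script, and the names candidates and timings are illustrative.

```python
import timeit
from statistics import mean

import numpy as np
import pandas as pd

rng = np.random.default_rng(2021)  # arbitrary seed, for reproducibility only
# 1000 observations of 50 variables, each variable with its own mean in [0, 10)
col_means = rng.uniform(low=0, high=10, size=50)
dat2 = pd.DataFrame(rng.normal(loc=col_means, scale=1, size=(1000, 50)))

# Three ways of computing the 50 column means
candidates = {
    "apply + statistics.mean": lambda: dat2.apply(mean),
    "DataFrame.mean": lambda: dat2.mean(),
    "np.mean on the array": lambda: np.mean(dat2.to_numpy(), axis=0),
}

timings = {}
for label, func in candidates.items():
    # 3 repeats of 5 calls each; keep the best per-call time in seconds
    totals = timeit.repeat(func, repeat=3, number=5)
    timings[label] = min(totals) / 5
    print(f"{label:<26} {timings[label] * 1e3:8.3f} ms per call")
```

All three candidates return the same 50 values up to floating-point error, so any difference in the printed timings reflects the implementation rather than the result.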