I am struggling a bit with the many int data types in Cython.
np.int, np.int_, np.int_t, int
I guess int in pure Python is equivalent to np.int_, so where does np.int come from? I cannot find it in the NumPy documentation. Also, why does np.int_ exist given that we already have int?
In cython, I guess int becomes a C type when used as cdef int or ndarray[int], and when used as int() it stays as the python caster?
Is np.int_ equivalent to long in C? If so, is cdef long identical to cdef np.int_?
Under what circumstances should I use np.int_t instead of np.int? e.g. cdef np.int_t, ndarray[np.int_t] ...
Can someone briefly explain how the wrong use of those types would affect the performance of compiled cython code?
13 Answers
It's a bit complicated because the names have different meanings depending on the context.
int
In Python
The int is normally just a Python type. It is of arbitrary precision, meaning that you can store any conceivable integer inside it (as long as you have enough memory).
>>> int(10**50)
100000000000000000000000000000000000000000000000000
However, when you use it as dtype for a NumPy array it will be interpreted as np.int_ [1], which is not of arbitrary precision. It will have the same size as C's long:
>>> np.array(10**50, dtype=int)
OverflowError: Python int too large to convert to C long
That also means the following two are equivalent:
np.array([1,2,3], dtype=int)
np.array([1,2,3], dtype=np.int_)
As a Cython type identifier it has another meaning: here it stands for the C type int. It is of limited precision (typically 32 bits). You can use it as a Cython type, for example when defining variables with cdef:
cdef int value = 100    # variable
cdef int[:] arr = ...   # memoryview
As return value or argument value for cdef or cpdef functions:
cdef int my_function(int argument1, int argument2):
    # ...
As "generic" for ndarray:
cimport numpy as cnp
cdef cnp.ndarray[int, ndim=1] val = ...
For type casting:
avalue = <int>(another_value)
And probably many more.
In Cython but as Python type. You can still call int and you'll get a "Python int" (of arbitrary precision), or use it for isinstance or as dtype argument for np.array. Here the context is important, so converting to a Python int is different from converting to a C int:
cdef object py_val = int(10)   # Python int
cdef int c_val = <int>(10)     # C int
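The NumPy side of this can be checked from plain Python. The following is only a small sketch (the variable names are mine, not part of any API): it contrasts an arbitrary-precision Python int with an array created with dtype=int, which is backed by a fixed-size integer.

```python
import numpy as np

big = 10**50

# A plain Python int holds the value without any loss.
python_value = int(big)

# The same value does not fit into a fixed-size NumPy integer.
try:
    np.array(big, dtype=int)
    overflowed = False
except OverflowError:
    overflowed = True

# dtype=int and dtype=np.int_ select the same NumPy dtype.
same_dtype = (
    np.array([1, 2, 3], dtype=int).dtype
    == np.array([1, 2, 3], dtype=np.int_).dtype
)

print(python_value, overflowed, same_dtype)
```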
np.int
Actually this is very easy. It's just an alias for int:
>>> int is np.int
True
So everything from above applies to np.int as well. However, you can't use it as a type identifier except when you use it on the cimported package. In that case it represents the Python integer type.
cimport numpy as cnp
cpdef func(cnp.int obj):
    return obj
This will expect obj to be a Python integer, not a NumPy type:
>>> func(np.int_(10))
TypeError: Argument 'obj' has incorrect type (expected int, got numpy.int32)
>>> func(10)
10
My advice regarding np.int: avoid it whenever possible. In Python code it's equivalent to int, and in Cython code it's also equivalent to Python's int, but if used as a type identifier it will probably confuse you and everyone who reads the code! It certainly confused me...
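One more reason to avoid it: the np.int alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that relies on it can break on newer installations. A minimal sketch for checking what your installed version does (lookup_np_int is a hypothetical helper name of mine):

```python
import warnings
import numpy as np

def lookup_np_int():
    """Return what `np.int` refers to, or None if this NumPy no longer has it."""
    with warnings.catch_warnings():
        # Releases that still ship the alias may emit a DeprecationWarning on access.
        warnings.simplefilter("ignore")
        return getattr(np, "int", None)

alias = lookup_np_int()
print("np.int is", "missing" if alias is None else alias)
```

If the alias is present it is the builtin int itself, so replacing np.int with int changes nothing in behaviour.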
np.int_
Actually it only has one meaning: it's a Python type that represents a scalar NumPy type. You use it like Python's int:
>>> np.int_(10) # looks like a normal Python integer
10
>>> type(np.int_(10)) # but isn't (output may vary depending on your system!)
numpy.int32
Or you use it to specify the dtype, for example with np.array:
>>> np.array([1,2,3], dtype=np.int_)
array([1, 2, 3])
But you cannot use it as a type identifier in Cython.
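To make the "looks like a Python integer but isn't" point concrete, here is a short sketch, assuming Python 3 (the variable names are only for illustration):

```python
import numpy as np

scalar = np.int_(10)

# On Python 3 a NumPy integer scalar is not an instance of the builtin int ...
is_python_int = isinstance(scalar, int)
# ... but it is a NumPy integer and converts cleanly to a Python int.
is_numpy_integer = isinstance(scalar, np.integer)
as_python = int(scalar)

# Used as a dtype, np.int_ gives an array with the matching dtype.
arr = np.array([1, 2, 3], dtype=np.int_)
print(type(scalar).__name__, is_python_int, is_numpy_integer, arr.dtype)
```

The printed type name depends on the platform and NumPy version, which is why the sketch does not show an expected output.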
cnp.int_t
It's the type-identifier version for np.int_. That means you can't use it as dtype argument. But you can use it as type for cdef declarations:
cimport numpy as cnp
import numpy as np
cdef cnp.int_t[:] arr = np.array([1,2,3], dtype=np.int_)
     |---TYPE---|                         |---DTYPE---|
This example (hopefully) shows that the type identifier with the trailing _t actually represents the type of an array using the dtype without the trailing _t. You can't interchange them in Cython code!
Notes
There are several more numeric types in NumPy. I'll include a list containing the NumPy dtype, the Cython type identifier, and the C type identifier that could also be used in Cython. It's basically taken from the NumPy documentation and the Cython NumPy pxd file:
NumPy dtype     NumPy Cython type    C Cython type identifier
np.bool_        None                 None
np.int_         cnp.int_t            long
np.intc         None                 int
np.intp         cnp.intp_t           ssize_t
np.int8         cnp.int8_t           signed char
np.int16        cnp.int16_t          signed short
np.int32        cnp.int32_t          signed int
np.int64        cnp.int64_t          signed long long
np.uint8        cnp.uint8_t          unsigned char
np.uint16       cnp.uint16_t         unsigned short
np.uint32       cnp.uint32_t         unsigned int
np.uint64       cnp.uint64_t         unsigned long long
np.float_       cnp.float64_t        double
np.float32      cnp.float32_t        float
np.float64      cnp.float64_t        double
np.complex_     cnp.complex128_t     double complex
np.complex64    cnp.complex64_t      float complex
np.complex128   cnp.complex128_t     double complex
Actually there are Cython types for np.bool_: cnp.npy_bool and bint, but neither can currently be used for NumPy arrays. For scalars cnp.npy_bool will just be an unsigned integer while bint will be a boolean. Not sure what's going on there...
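The fixed-width rows of this table can be sanity-checked from Python. This is only a sketch: the expected_bytes mapping is written by hand from the bit counts in the dtype names, it is not something NumPy provides.

```python
import numpy as np

# Expected width in bytes for the fixed-size dtypes from the table.
expected_bytes = {
    np.int8: 1, np.int16: 2, np.int32: 4, np.int64: 8,
    np.uint8: 1, np.uint16: 2, np.uint32: 4, np.uint64: 8,
    np.float32: 4, np.float64: 8,
    np.complex64: 8, np.complex128: 16,
}

# Compare each hand-written size with what NumPy reports for the dtype.
all_match = all(
    np.dtype(scalar_type).itemsize == size
    for scalar_type, size in expected_bytes.items()
)
print(all_match)
```

Only the fixed-width names are included because np.int_, np.intc and np.intp depend on the platform.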
[1] Taken from the NumPy documentation, "Data type objects":
Built-in Python types
Several Python types are equivalent to a corresponding array scalar when used to generate a dtype object:
int            np.int_
bool           np.bool_
float          np.float_
complex        np.cfloat
bytes          np.bytes_
str            np.bytes_ (Python 2) or np.unicode_ (Python 3)
unicode        np.unicode_
buffer         np.void
(all others)   np.object_
np.int_ is the default integer type (as defined in the NumPy docs); on a 64-bit system this would be a C long. np.intc is the default C int, either int32 or int64. np.int is an alias to the built-in int function:
>>> np.int(2.4)
2
>>> np.int is int # object id equality
True
The Cython datatypes should reflect C datatypes, so cdef int a is a C int and so on.
As for np.int_t, that is the Cython compile-time equivalent of the NumPy np.int_ datatype; np.int64_t is the Cython compile-time equivalent of np.int64.
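The correspondence between the platform-dependent NumPy types and C types can be inspected without compiling anything. The sketch below uses the standard ctypes module as a stand-in for the C compiler's view of the sizes, which is an assumption that holds when Python and NumPy were built with the same ABI:

```python
import ctypes
import numpy as np

# np.intc mirrors the C int.
intc_bytes = np.dtype(np.intc).itemsize
c_int_bytes = ctypes.sizeof(ctypes.c_int)

# np.intp is the pointer-sized integer used for indexing (ssize_t in the table above).
intp_bytes = np.dtype(np.intp).itemsize
c_ssize_t_bytes = ctypes.sizeof(ctypes.c_ssize_t)

print(intc_bytes, c_int_bytes, intp_bytes, c_ssize_t_bytes)
```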
This is a clarification on the difference between int and np.int_t in Cython code, which are not the same:
np.int_t maps to long and not to int in Cython code.
That means:
- On 64-bit Windows (i.e. compiled with MSVC), int is 4 bytes, but so is long (and thus np.int_t).
- On 64-bit Linux (i.e. compiled with gcc), int is 4 bytes, but long (and thus np.int_t) is 8 bytes!
A NumPy array with dtype np.int would map to an np.int_t[:] memoryview in Cython, which is correct because the following code:
import numpy as np
a = np.zeros(1, np.int_) # or np.zeros(1, np.int)
print(a.itemsize)
would yield 4 (the size of long in bytes) on Windows and 8 on Linux.
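A slightly extended version of that check is sketched here. It prints the size of the default integer next to the size of C long and of a pointer. As far as I know, NumPy 2.0 changed the default integer to be pointer-sized (like np.intp) instead of following C long, so newer versions may report 8 on 64-bit Windows too; treat that as an assumption to verify on your own installation.

```python
import ctypes
import numpy as np

a = np.zeros(1, np.int_)
default_int_bytes = a.itemsize
c_long_bytes = ctypes.sizeof(ctypes.c_long)
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)

# The default integer should match either C long or the pointer size.
matches_long_or_pointer = default_int_bytes in (c_long_bytes, pointer_bytes)
print(default_int_bytes, c_long_bytes, pointer_bytes)
```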
Often it makes sense to specify exactly how big the values are, e.g. by using np.int32 and np.int64 which would map to np.int32_t and np.int64_t in Cython and have the same size on all platforms.
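On the Python side this usually means converting the input to a fixed-width dtype before handing it to the compiled code, because Cython rejects a buffer whose dtype does not match the declared type. A minimal sketch, where as_int64_array is a hypothetical helper of mine and the receiving Cython function is assumed to take an np.int64_t[:] argument:

```python
import numpy as np

def as_int64_array(values):
    """Return a C-contiguous int64 array, copying only when needed."""
    return np.ascontiguousarray(values, dtype=np.int64)

data = as_int64_array([1, 2, 3])
print(data.dtype, data.itemsize)
```

With this, the array has 8-byte items on every platform, regardless of how large long happens to be.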