Test if a value is numeric in python
This post explains how to test if a value is numeric. It is strange to me that there is not a built in method for this. And if there is, I would rather use it!
is-numeric
If you just want to use a module, use is-numeric that I wrote as part of this effort.
pip install is-numeric
Really simple. A method to test if a value is_numeric
.
from is_numeric import is_numeric
print( is_numeric(1) ) # True
print( is_numeric(-1) ) # True
print( is_numeric(123) ) # True
print( is_numeric(123.456) ) # True
print( is_numeric("123.456") ) # True
print( is_numeric("x") ) # False
print( is_numeric("1x") ) # False
The rest of this post will explain different approaches.
Notes on the algorithm
I tested 4 algorithms:
- str.isinstance
- error-driven
- regex
- regex (plus str.isinstance)
str.isinstance
Using the built in .isinstance is limited in functionality, as it only looks for all characters being a digit, and fails on ‘-1’ or ‘1.2’. Thus discard this approach.
def _str_isnumeric(value):
if isinstance(value, (int, float)):
return True
if isinstance(value, str) and value.isnumeric():
return True
return False
error-driven
This is ugly in that it relies on errors in flow control. I hate it. It is like nails on chalk board to me. (spoiler alert: it is the most efficient of the test approaches)
def _is_numeric_using_error(value):
try:
float(value)
return True
except ValueError:
return False
regex
This seems elegant. Check for a pattern.
def _is_numeric_regex(value):
if isinstance(value, (int, float)):
return True
if isinstance(value, str):
return bool(re.match(r"^-?\d+(\.\d+)?$", value))
return False
regex (plus str.isinstance)
The regex solution, with a small change to check if all numeric digits prior to doing regex.
def _is_numeric_regex_plus_isnumeric(value):
if isinstance(value, (int, float)):
return True
if isinstance(value, str) and value.isnumeric():
return True
if isinstance(value, str):
return bool(re.match(r"^-?\d+(\.\d+)?$", value))
return False
Performance Comparison
I ran a very non-scientific test.
Approach
Create a list of inputs, with a mix of float/int, strings that represent numeric values, and non-numeric values.
Iterate x time over the list for a given algorithm.
Repeat for each algorithm.
Variable
- % of inputs that are non-numeric (theory is that error-driven will perform better when the value is numeric, and less efficient when the value cannot be cast to float and an error is raised)
Findings
When most values are numeric 1 (0-50%), the error-driven approach outperforms other approaches (up to 3x).
When most values are non-numeric 2 (80%), the regex (plus str.isinstance) approach has a very slight advantage (4%).
Conclusion
without knowning the % of non-numeric values, it is recommended to use the error-driven approach. As such, the error-drive approach is exposed through the is_numeric
function in this package.
Raw Data
const: iterations=1000000
Algorithm | % Non-Numeric | Time in Seconds | Iterations per Second |
---|---|---|---|
is_numeric_regex_plus_isnumeric | 0.0 | 4.1 | 239,459.52 |
is_numeric_using_error | 0.0 | 1.4 | 680,668.39 |
is_numeric_regex | 0.0 | 4.7 | 211,903.44 |
is_numeric_regex_plus_isnumeric | 0.2 | 5.8 | 169,665.37 |
is_numeric_using_error | 0.2 | 3.2 | 303,433.65 |
is_numeric_regex | 0.2 | 6.3 | 156,751.32 |
is_numeric_regex_plus_isnumeric | 0.5 | 10.6 | 93,566.92 |
is_numeric_using_error | 0.5 | 8.3 | 119,920.32 |
is_numeric_regex | 0.5 | 10.8 | 92,182.89 |
is_numeric_regex_plus_isnumeric | 0.8 | 29.7 | 33,612.89 |
is_numeric_using_error | 0.8 | 28.5 | 35,076.76 |
is_numeric_regex | 0.8 | 28.4 | 35,162.57 |