Checking for Uppercase Characters in NumPy String Arrays: char.chararray.isupper()
Functionality
- It operates element-wise on a NumPy character array (chararray).
- It checks whether all cased characters in each element are uppercase. Non-cased characters such as spaces, digits, or symbols are ignored.
- It returns a boolean array of the same shape as the input array, where True indicates that the element contains at least one cased character and all of its cased characters are uppercase, and False indicates otherwise (mixed case, lowercase, no cased characters, or an empty string).
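The rule above mirrors Python's built-in str.isupper() applied to each element. A minimal sketch, assuming a small sample array (the names samples, vectorized, and per_element are illustrative, not from the original) and using the function form np.char.isupper():

```python
import numpy as np

# Illustrative sample data (an assumption, not from the original article)
samples = np.array(['ABC', 'AbC', 'A B-C', '42', ''])

# Element-wise check performed by NumPy
vectorized = np.char.isupper(samples)

# The same rule applied one string at a time with Python's str.isupper()
per_element = np.array([s.isupper() for s in samples])

print(vectorized)
print(np.array_equal(vectorized, per_element))
```

Only 'ABC' and 'A B-C' pass: the space and hyphen are ignored, while '42' and '' fail because they contain no cased characters.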
Example
import numpy as np
data = np.array(['HELLO', 'world', '123'])
is_upper = np.char.isupper(data)  # element-wise check; equivalent to calling isupper() on a chararray
print(data)
print(is_upper)
This code outputs:
['HELLO' 'world' '123']
[ True False False]
- 'HELLO' is all uppercase, so is_upper[0] is True.
- 'world' contains lowercase characters, so is_upper[1] is False.
- '123' has no cased characters (all digits), so is_upper[2] is False as well.
Important Points
- char.chararray.isupper() is case-sensitive.
- Non-cased characters (spaces, special characters, numbers) are not considered for the uppercase check.
- Empty strings ('') also return False since there are no cased characters.
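Because isupper() returns False both for lowercase text and for elements without any cased characters, it can be useful to tell the two cases apart. A minimal sketch, assuming ASCII input (the names values, no_cased, and has_lowercase are illustrative only): an element with no cased characters is left unchanged by both np.char.upper() and np.char.lower().

```python
import numpy as np

# Illustrative sample data (an assumption, not from the original article)
values = np.array(['ABC 123', '123', '', 'abc'])

is_upper = np.char.isupper(values)

# An element without cased characters is identical in upper and lower form
no_cased = np.char.upper(values) == np.char.lower(values)

# False from isupper() because of lowercase letters, not because letters are missing
has_lowercase = ~is_upper & ~no_cased

print(is_upper)
print(no_cased)
print(has_lowercase)
```

Here '123' and '' are flagged by no_cased, while only 'abc' is flagged by has_lowercase.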
Alternative
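The isupper() check returns False for elements that contain no cased characters. If such elements should count as uppercase too, you can compare each element with its uppercased form from np.char.upper():
is_all_uppercase = data == np.char.upper(data)
print(is_all_uppercase)
For the data array above, this approach returns [ True False  True]: '123' is unchanged by upper(), so it compares equal to itself.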
Finding elements containing at least one uppercase character
import numpy as np
data = np.array(['hello', 'HeLlO', 'WORLD', '123'])
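has_uppercase = data != np.char.lower(data)  # True where lowercasing changes the string
print(data[has_uppercase])
This code compares each element with its lowercased form. Lowercasing changes a string only if it contains at least one uppercase character, so the comparison is True for those elements. (Calling any(axis=1) on the result of np.char.isupper() would not work here: the array is one-dimensional, and isupper() tests whole elements rather than individual characters.) It then uses boolean indexing to print only the elements that have True in has_uppercase. This would output: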
['HeLlO' 'WORLD']
Converting elements to lowercase if not all uppercase
import numpy as np
data = np.array(['HELLO', 'World', 'MIXED'])
is_upper = np.char.isupper(data)
data[~is_upper] = np.char.lower(data[~is_upper])
print(data)
This example uses vectorized operations to convert elements that are not entirely uppercase to lowercase. ~is_upper inverts the boolean array, targeting elements where is_upper is False. Then, it applies np.char.lower() to those elements and writes the result back into the original data array. 'HELLO' and 'MIXED' are already all uppercase, so only 'World' changes. This would print:
['HELLO' 'world' 'MIXED']
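When the original array should stay untouched, the same selection can be expressed with np.where instead of in-place assignment. A minimal sketch, assuming a sample array with a genuinely mixed-case element (the name normalized and the value 'MiXeD' are illustrative only):

```python
import numpy as np

# Illustrative sample data with one all-uppercase and two mixed-case elements
data = np.array(['HELLO', 'World', 'MiXeD'])

is_upper = np.char.isupper(data)

# Keep all-uppercase elements as they are and lowercase the rest;
# np.where builds a new array, so `data` itself is not modified
normalized = np.where(is_upper, data, np.char.lower(data))

print(normalized)
print(data)
```

This trades a little extra memory for leaving the input intact, which can be convenient when the original casing is still needed later.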
Counting uppercase characters in each element
import numpy as np
data = np.array(['HELLO WORLD', 'hello world', '123 AbC'])
# np.char.count only counts occurrences of a fixed substring, so count per element in Python
uppercase_count = np.array([sum(ch.isupper() for ch in s) for s in data])
print(uppercase_count)
This code iterates over the elements of the array and, for each string, sums the characters for which str.isupper() is True. It outputs:
[10  0  2]
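For large arrays, the per-character counting can also be done without a Python-level loop by looking at the underlying code points. A rough sketch under two assumptions: the array has a fixed-width Unicode dtype (as np.array produces for Python strings) and is contiguous, and only ASCII 'A' to 'Z' should count as uppercase. The name codes is illustrative only.

```python
import numpy as np

data = np.array(['HELLO WORLD', 'hello world', '123 AbC'])

# View the fixed-width Unicode buffer as one uint32 code point per character;
# strings shorter than the widest element are padded with zeros
codes = data.view(np.uint32).reshape(len(data), -1)

# Assumption: only ASCII 'A'..'Z' are treated as uppercase here
uppercase_count = ((codes >= ord('A')) & (codes <= ord('Z'))).sum(axis=1)

print(uppercase_count)
```

Unlike str.isupper(), this range test does not recognize non-ASCII uppercase letters, so it fits only data known to be ASCII.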
Regular Expressions (re module)
import numpy as np
import re
data = np.array(['HELLO', 'world', '123'])
is_upper = np.vectorize(lambda x: bool(re.match(r'^[A-Z]+$', x)))(data)
print(data)
print(is_upper)
This approach uses the re (regular expressions) module and np.vectorize to apply a regular expression that checks whether the entire string (anchored by ^ and $) consists only of uppercase ASCII letters ([A-Z]+). Unlike isupper(), it rejects strings that contain spaces, digits, or symbols. This method offers flexibility for more complex patterns, but np.vectorize is essentially a Python-level loop, so it is typically slower than NumPy's built-in string functions.
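One detail of the pattern above is that $ also matches just before a trailing newline, so a string such as 'HELLO\n' would pass. A minimal sketch of a stricter variant, assuming a precompiled pattern and re.fullmatch (the name pattern and the extra sample element are illustrative only):

```python
import re
import numpy as np

# Illustrative sample data; the last element ends with a newline
data = np.array(['HELLO', 'world', '123', 'HELLO\n'])

# Compile once and reuse; fullmatch requires the whole string to match
pattern = re.compile(r'[A-Z]+')

is_upper = np.array([pattern.fullmatch(s) is not None for s in data])

print(is_upper)
```

A plain list comprehension is used here instead of np.vectorize; both loop in Python, so the choice is mostly a matter of readability.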
np.char.upper() and Comparison
is_all_uppercase = data == np.char.upper(data)
print(data)
print(is_all_uppercase)
As mentioned earlier, this approach converts all characters to uppercase and then compares element-wise for equality. An element passes when it contains no lowercase characters, so strings without any cased characters, such as '123' or an empty string, also yield True. This is suitable when you want to treat such strings as uppercase.
- If you need an element-wise check specifically for all-uppercase strings, char.chararray.isupper() (or the equivalent function np.char.isupper()) remains the most direct option within NumPy.
- If strings without cased characters should also count as uppercase, comparing against char.upper() provides a simple solution.
- If you need to restrict the allowed characters or handle more complex patterns, consider using regular expressions.
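To make the differences between the three approaches concrete, they can be run side by side on the same input. A minimal sketch, assuming an illustrative sample array (the names by_isupper, by_comparison, and by_regex are not from the original):

```python
import re
import numpy as np

# Illustrative sample data covering the cases where the approaches disagree
data = np.array(['HELLO', 'HELLO WORLD', 'world', '123', ''])

# 1. isupper(): needs at least one cased character, all of them uppercase
by_isupper = np.char.isupper(data)

# 2. Comparison with upper(): passes whenever there are no lowercase characters
by_comparison = data == np.char.upper(data)

# 3. Regular expression: only the letters A-Z are allowed, nothing else
by_regex = np.array([re.fullmatch(r'[A-Z]+', s) is not None for s in data])

for row in zip(data, by_isupper, by_comparison, by_regex):
    print(row)
```

All three agree on 'HELLO' and 'world'. They differ on 'HELLO WORLD' (rejected only by the regular expression) and on '123' and '' (accepted only by the comparison).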