Checking for Uppercase Characters in NumPy String Arrays: char.chararray.isupper()


Functionality

  • This function checks whether every cased character in each string of an array is uppercase. Non-cased characters such as digits, spaces, and symbols are ignored.
  • It operates element-wise on a NumPy character array (chararray). The function form, np.char.isupper(), also accepts ordinary string arrays.
  • It returns a boolean array of the same shape as the input array, where:
    • True indicates that the element has at least one cased character and all of its cased characters are uppercase.
    • False indicates otherwise (mixed case, lowercase, no cased characters, or an empty string).

Example

import numpy as np

data = np.array(['HELLO', 'world', '123'])
# np.char.isupper applies the uppercase check to each element of a string array
is_upper = np.char.isupper(data)

print(data)
print(is_upper)

This code outputs:

['HELLO' 'world' '123']
[ True False False]
  • 'HELLO' is all uppercase, so is_upper[0] is True.
  • 'world' contains lowercase characters, so is_upper[1] is False.
  • '123' has no cased characters (all digits), so is_upper[2] is also False.
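
The same check is also available as a method on an actual chararray object. The following is a minimal sketch, assuming the array is built with np.char.array; the variable names are illustrative only. Note that NumPy's documentation describes chararray as a legacy class kept for backwards compatibility, so the function form is usually preferable in new code.

```python
import numpy as np

# np.char.array builds a chararray, which exposes isupper() as a method.
words = np.char.array(['HELLO', 'world', '123'])
method_result = words.isupper()

# np.char.isupper is the function form and works on a plain string ndarray.
function_result = np.char.isupper(np.array(['HELLO', 'world', '123']))

print(method_result)
print(function_result)
```

Both calls should give the same boolean result for the same strings.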

Important Points

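  • The check depends on the case of each character, so 'Hello' and 'HELLO' give different results.
  • Non-cased characters (spaces, special characters, digits) are ignored by the uppercase check, but a string needs at least one cased character to return True.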
  • Empty strings ('') also return False since there are no cased characters.
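
The points above can be checked with a small sketch. The sample strings and variable names below are chosen for illustration and are not part of the original example.

```python
import numpy as np

# Illustrative inputs: uppercase with non-cased characters, mixed case,
# digits only, a single space, and an empty string.
samples = np.array(['HELLO WORLD!', 'ABC123', 'Hello', '123', ' ', ''])
result = np.char.isupper(samples)

# Print each string next to its result for easy comparison.
for text, flag in zip(samples, result):
    print(repr(str(text)), bool(flag))
```

Only the first two strings yield True: both contain cased characters, and all of those are uppercase.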

Alternative

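A looser check compares each element with its uppercase form, which is True whenever the element contains no lowercase characters:

is_all_uppercase = data == np.char.upper(data)
print(is_all_uppercase)

This approach returns [ True False  True]. Unlike isupper(), it also reports True for '123' (and for empty strings), because converting a string with no cased characters to uppercase leaves it unchanged.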



Finding elements containing at least one uppercase character

import numpy as np

data = np.array(['hello', 'HeLlO', 'WORLD', '123'])
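# An element contains an uppercase letter if lowercasing changes it
has_uppercase = data != np.char.lower(data)

print(data[has_uppercase])

This code compares each element with its lowercase form; any element that differs must contain at least one uppercase character. np.char.isupper() alone would not work here, since it is True only when every cased character is uppercase. Boolean indexing then prints only the elements that have True in has_uppercase. This would output: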

['HeLlO' 'WORLD']

Converting elements to lowercase if not all uppercase

import numpy as np

data = np.array(['HELLO', 'World', 'MiXeD'])
is_upper = np.char.isupper(data)
# Lowercase only the elements that are not entirely uppercase
data[~is_upper] = np.char.lower(data[~is_upper])

print(data)

This example uses vectorized operations to convert elements that are not entirely uppercase to lowercase. ~is_upper inverts the boolean array, targeting elements where is_upper is False. Then, it applies np.char.lower() to those elements and writes the results back into the original data array. This would print:

['HELLO' 'world' 'mixed']
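
If the original array should stay unchanged, the same conversion could be sketched with np.where, which selects between the original and the lowercased strings element by element. This is an alternative under that assumption; the name normalized is illustrative.

```python
import numpy as np

data = np.array(['HELLO', 'World', 'MiXeD'])

# Keep elements that are entirely uppercase; lowercase all the others.
# np.where builds a new array, so data itself is not modified.
normalized = np.where(np.char.isupper(data), data, np.char.lower(data))

print(normalized)
print(data)
```

The trade-off is that np.char.lower() runs on every element rather than only on the selected ones, in exchange for not mutating the input.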

Counting uppercase characters in each element

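import numpy as np

data = np.array(['HELLO WORLD', 'hello world', '123 AbC'])
# np.char.count only counts occurrences of a fixed substring,
# so the uppercase characters are counted per element in Python
uppercase_count = np.array([sum(ch.isupper() for ch in s) for s in data])

print(uppercase_count)

NumPy has no vectorized function for counting uppercase characters (np.char.count counts occurrences of a given substring), so this code iterates over each element and sums str.isupper() over its characters. It outputs:

[10  0  2]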


Regular Expressions (re module)

import numpy as np
import re

data = np.array(['HELLO', 'world', '123'])
is_upper = np.vectorize(lambda x: bool(re.match(r'^[A-Z]+$', x)))(data)

print(data)
print(is_upper)

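This approach uses the re (regular expressions) module and the vectorize function to apply a regular expression that checks if the entire string (anchored by ^ and $) consists only of uppercase letters ([A-Z]+). It is stricter than isupper(): a string such as 'HELLO WORLD' fails because the space is not in [A-Z]. This method offers flexibility for more complex patterns, but np.vectorize is essentially a Python-level loop, so it is typically slower than the built-in NumPy string functions.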
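
To make the difference between the pattern-based check and isupper() concrete, here is a small sketch. It assumes a precompiled pattern with fullmatch and a list comprehension instead of np.vectorize; the sample data and names are illustrative.

```python
import re
import numpy as np

data = np.array(['HELLO', 'HELLO WORLD', 'world', '123'])

# Compile once; fullmatch requires the whole string to match the pattern.
only_upper_letters = re.compile(r'[A-Z]+')
regex_result = np.array(
    [only_upper_letters.fullmatch(s) is not None for s in data]
)

# isupper ignores non-cased characters such as the space.
isupper_result = np.char.isupper(data)

print(regex_result)
print(isupper_result)
```

The two results differ only for 'HELLO WORLD', which the pattern rejects because of the space while isupper() accepts it.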

np.char.upper() and Comparison

is_all_uppercase = data == np.char.upper(data)
print(data)
print(is_all_uppercase)

As mentioned earlier, this approach converts all characters to uppercase and then compares element-wise for equality. It is True for any element without lowercase characters, including strings with no cased characters such as '123', so it suits cases where digit-only or empty strings should pass the check.
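
If the comparison should behave like isupper(), one possible sketch is to also require that lowercasing changes the string, which rules out elements with no cased characters. This assumes ASCII-style text where uppercase letters have distinct lowercase forms; the variable names are illustrative.

```python
import numpy as np

data = np.array(['HELLO', 'world', '123', ''])

# True when the element has no lowercase characters.
unchanged_by_upper = data == np.char.upper(data)
# True when the element has at least one uppercase character.
changed_by_lower = data != np.char.lower(data)

# Both conditions together match what isupper() reports for this data.
strict = unchanged_by_upper & changed_by_lower

print(unchanged_by_upper)
print(strict)
print(np.char.isupper(data))
```

Here '123' and the empty string pass the plain comparison but are excluded by the combined check.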

  • If you need an element-wise check specifically for all-uppercase strings, isupper() (as a chararray method or as np.char.isupper()) is the most direct option within NumPy.
  • If strings without any cased characters should also pass, comparing against np.char.upper() provides a simple solution.
  • If you need to restrict the allowed characters or handle more complex patterns, consider using regular expressions.