Checking for Uppercase Characters in NumPy String Arrays: char.chararray.isupper()

Functionality

char.chararray.isupper() operates element-wise on a NumPy character array (chararray); np.char.isupper() is the equivalent function for ordinary string arrays.
It checks whether all cased characters in each string are uppercase, ignoring non-cased characters such as spaces, digits, and symbols.
It returns a boolean array of the same shape as the input array, where:
- True indicates that the element has at least one cased character and all of its cased characters are uppercase.
- False indicates otherwise (mixed case, lowercase, no cased characters, or an empty string).
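To illustrate the "same shape" behaviour, here is a minimal sketch on a 2-D array. The array name grid and its contents are hypothetical sample values, and np.char.isupper() is used as the function form of the method.

```python
import numpy as np

# Hypothetical 2x2 sample; any shape of string array is handled the same way
grid = np.array([['ABC', 'abc'],
                 ['A1', '']])

mask = np.char.isupper(grid)

print(mask.shape)  # same shape as grid
print(mask)
```

Each position in mask corresponds to the string at the same position in grid.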

Example

import numpy as np

data = np.array(['HELLO', 'world', '123'])
# np.char.isupper() is the function form of chararray.isupper() and accepts a plain string array
is_upper = np.char.isupper(data)

print(data)
print(is_upper)

This code outputs:

['HELLO' 'world' '123']
[ True False False]

'HELLO' is all uppercase, so is_upper[0] is True.
'world' contains lowercase characters, so is_upper[1] is False.
'123' has no cased characters (all digits), so is_upper[2] is False as well.

Important Points

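char.chararray.isupper() follows the semantics of Python's str.isupper(), applied to each element.
Non-cased characters (spaces, special characters, digits) are ignored by the uppercase check, so a string such as 'HELLO WORLD 1' still returns True.
A string needs at least one cased character to return True, so strings made up only of digits or symbols return False.
Empty strings ('') also return False since there are no cased characters.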
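The points above can be checked with a short sketch. The sample strings are hypothetical and chosen only to exercise each edge case.

```python
import numpy as np

# Hypothetical samples: symbols, digits mixed with letters, digits only, empty, mixed case
samples = np.array(['HELLO WORLD!', 'ABC123', '123', '', 'Hello'])
result = np.char.isupper(samples)

for text, flag in zip(samples, result):
    print(repr(str(text)), bool(flag))
```

Only the first two strings report True: they contain cased characters, and all of those are uppercase.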

Alternative

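For a looser check that also accepts strings without any cased characters, you can combine np.char.upper() with an element-wise comparison:

is_all_uppercase = data == np.char.upper(data)
print(is_all_uppercase)

This approach returns [ True False  True] for the data above. Unlike isupper(), it reports True for '123' (and for empty strings), because converting them to uppercase leaves them unchanged.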
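The two checks disagree on strings that contain no cased characters. A minimal sketch that puts them side by side follows; the sample data and the names strict and loose are hypothetical.

```python
import numpy as np

# Hypothetical sample data, including strings without cased characters
data = np.array(['HELLO', 'world', '123', ''])

strict = np.char.isupper(data)        # needs at least one cased character
loose = data == np.char.upper(data)   # True whenever uppercasing changes nothing

print(strict)
print(loose)
print(data[strict != loose])          # elements on which the two checks differ
```

Which variant is appropriate depends on whether digit-only or empty strings should count as uppercase in your data.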

Finding elements containing at least one uppercase character

import numpy as np

data = np.array(['hello', 'HeLlO', 'WORLD', '123'])
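# An element that contains an uppercase character changes when it is lowercased
has_uppercase = data != np.char.lower(data)

print(data[has_uppercase])

This code compares each element with its lowercase form; the two differ when the element contains at least one uppercase character. (np.char.isupper() alone would not work here, because it requires every cased character to be uppercase.) It then uses boolean indexing to print only the elements that have True in has_uppercase. This would output: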

['HeLlO' 'WORLD']

Converting elements to lowercase if not all uppercase

import numpy as np

data = np.array(['HELLO', 'World', 'MIXED'])
is_upper = np.char.isupper(data)
# Lowercase only the elements that are not entirely uppercase
data[~is_upper] = np.char.lower(data[~is_upper])

print(data)

This example uses vectorized operations to convert elements that are not entirely uppercase to lowercase. ~is_upper inverts the boolean array, targeting elements where is_upper is False. Then, it applies np.char.lower() to those elements and updates the original data array. 'MIXED' is already all uppercase, so it is left unchanged. This would print:

['HELLO' 'world' 'MIXED']
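If the original array should stay untouched, the same selection can be sketched with np.where, which builds a new array instead of assigning in place. The sample values and the name normalized are hypothetical.

```python
import numpy as np

# Hypothetical sample with one all-uppercase and two mixed-case elements
data = np.array(['HELLO', 'World', 'MiXeD'])

# Keep all-uppercase elements as they are; take the lowercased form everywhere else
normalized = np.where(np.char.isupper(data), data, np.char.lower(data))

print(data)        # unchanged
print(normalized)
```

This variant lowercases every element up front and then picks per position, which is simple but does slightly more work than lowercasing only the selected elements.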

Counting uppercase characters in each element

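import numpy as np

data = np.array(['HELLO WORLD', 'hello world', '123 AbC'])
# np.char.count() only counts a fixed substring, so count uppercase characters per element with str.isupper()
uppercase_count = np.array([sum(ch.isupper() for ch in s) for s in data])

print(uppercase_count)

np.char.count() counts occurrences of a given substring, so it cannot count uppercase characters directly. This code instead iterates over each element and sums str.isupper() over its characters. It outputs:

[10  0  2]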
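For large arrays, a loop-free variant is possible by viewing the fixed-width Unicode buffer as integer code points. This is only a sketch under two assumptions that are not part of the original example: the array uses the native byte order, and only ASCII letters A-Z count as uppercase. The names codes and ascii_upper_count are hypothetical.

```python
import numpy as np

data = np.array(['HELLO WORLD', 'hello world', '123 AbC'])

# Reinterpret the string buffer as one row of 32-bit code points per element.
# Shorter strings are zero-padded up to the array's fixed width.
codes = np.ascontiguousarray(data).view(np.uint32).reshape(data.size, -1)

# Assumption: ASCII only, so "uppercase" means code points 'A'..'Z'
ascii_upper_count = ((codes >= ord('A')) & (codes <= ord('Z'))).sum(axis=1)

print(ascii_upper_count)
```

Non-ASCII uppercase letters such as 'É' are not counted by this variant, so it is not a drop-in replacement for a str.isupper()-based count.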

Regular Expressions (re module)

import numpy as np
import re

data = np.array(['HELLO', 'world', '123'])
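# re.fullmatch requires the whole string to match, so no ^ or $ anchors are needed
is_upper = np.vectorize(lambda x: bool(re.fullmatch(r'[A-Z]+', x)))(data)

print(data)
print(is_upper)

This approach uses the re (regular expressions) module and np.vectorize to apply a regular expression that checks whether the entire string consists only of ASCII uppercase letters ([A-Z]+). It is stricter than isupper(): a string such as 'HELLO WORLD' or 'ABC1' fails, because spaces and digits are not in the character class. This method offers flexibility for more complex patterns, but np.vectorize is essentially a Python-level loop, so it is typically slower than the np.char functions.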
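The difference between the regular expression and isupper() shows up on strings that mix uppercase letters with non-cased characters. A minimal sketch with a precompiled pattern follows; the name UPPER_ONLY and the sample data are hypothetical.

```python
import re
import numpy as np

# Hypothetical pattern name; matches strings made up solely of ASCII uppercase letters
UPPER_ONLY = re.compile(r'[A-Z]+')

data = np.array(['HELLO', 'HELLO WORLD', 'world', '123'])

regex_result = np.array([UPPER_ONLY.fullmatch(s) is not None for s in data])
isupper_result = np.char.isupper(data)

print(regex_result)
print(isupper_result)
```

'HELLO WORLD' passes isupper() because the space is ignored, but it fails the pattern because the space is not in [A-Z].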

np.char.upper() and Comparison

is_all_uppercase = data == np.char.upper(data)
print(data)
print(is_all_uppercase)

As mentioned earlier, this is a looser check. It converts every element to uppercase and then compares the result element-wise with the original, so an element passes whenever uppercasing leaves it unchanged. This is suitable when strings without cased characters, such as '123', should also count as uppercase.

If you need an element-wise check specifically for all-uppercase strings, char.chararray.isupper() (or np.char.isupper()) is the most direct option within NumPy.
If strings without cased characters should also count as uppercase, comparing against np.char.upper() provides a simple solution.
If you need to restrict the allowed characters or handle more complex patterns, consider using regular expressions.
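Relative performance depends on the NumPy version, the array size, and the string lengths, so it is worth measuring on your own data. The following is a rough timing sketch, not a benchmark; the array contents, the repetition count, and the names candidates and timings are hypothetical.

```python
import re
import timeit
import numpy as np

# Hypothetical workload: 3000 short strings
data = np.array(['HELLO', 'world', '123'] * 1000)
pattern = re.compile(r'[A-Z]+')

candidates = {
    'np.char.isupper': lambda: np.char.isupper(data),
    'upper + compare': lambda: data == np.char.upper(data),
    'regex loop': lambda: np.array([pattern.fullmatch(s) is not None for s in data]),
}

# Total seconds for 20 runs of each approach
timings = {name: timeit.timeit(fn, number=20) for name, fn in candidates.items()}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.4f} s')
```

Keep in mind that the three approaches do not return identical results for every input, so speed should only decide between options that are all correct for your data.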
