The card may function fine, but if the accuracy of your GPU calculations matters, you cannot trust the results while the GPU is in this state.
I would recommend that you try flashing the InfoROM; it may fix both errors. If the errors persist after flashing, the card is likely only viable for fault-tolerant applications such as rendering.
First, try rebooting. There have been cases (such as sleep/suspend in Linux) where the driver entered an invalid state and spuriously reported problems with the InfoROM.
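After the reboot, one way to check whether the warning is still being reported is to scan the `nvidia-smi` output for it. The sketch below is only an illustration: the helper names are made up for this example, and because the exact wording of the warning is an assumption here, it matches loosely on the keywords "warning" and "inforom" rather than on a fixed message.

```python
import subprocess
from typing import List


def find_inforom_warnings(smi_output: str) -> List[str]:
    """Return output lines that look like an InfoROM warning.

    Matching is deliberately loose (case-insensitive keywords), since the
    exact message text printed by the driver is an assumption.
    """
    hits = []
    for line in smi_output.splitlines():
        lowered = line.lower()
        if "warning" in lowered and "inforom" in lowered:
            hits.append(line.strip())
    return hits


def run_nvidia_smi() -> str:
    """Run nvidia-smi with no arguments and return stdout and stderr combined."""
    result = subprocess.run(
        ["nvidia-smi"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,  # a non-zero exit code is reported, not raised
    )
    return result.stdout


# Example use on a machine with the NVIDIA driver installed:
#   for warning in find_inforom_warnings(run_nvidia_smi()):
#       print(warning)
```

If the list comes back empty after a clean reboot, the earlier report was probably the spurious driver-state case rather than real InfoROM damage.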
If that does not work, you can attempt to fix it by reflashing the InfoROM with the nvflash tool.
Page retirement error:
* This means that the GPU page retirement table is full and cannot store more bad pages. As a result, the GPU will continue to use whatever frame buffer cells it has left, regardless of whether they begin to exhibit double-bit ECC errors or multiple single-bit ECC errors at the same address.
* On the surface, this only matters if you run the GPU in ECC mode. You may still get generally usable results from workloads that are more fault-tolerant, such as rendering or AI.
* The page retirement table is stored in the InfoROM. If the InfoROM is bad, however, this error can be reported spuriously.
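To make the "table is full" failure mode concrete, here is a toy model of a fixed-capacity retirement table. It is not NVIDIA's implementation: the class name, the capacity, and the threshold for "multiple single-bit errors" are all hypothetical values chosen for illustration.

```python
from collections import Counter


class RetirementTable:
    """Toy model of a fixed-capacity bad-page table (illustrative only)."""

    def __init__(self, capacity: int, sbe_threshold: int = 2):
        self.capacity = capacity            # hypothetical table size
        self.sbe_threshold = sbe_threshold  # hypothetical "multiple single-bit" cutoff
        self.retired = set()                # pages already taken out of use
        self.sbe_counts = Counter()         # single-bit error count per page

    def report_error(self, page: int, double_bit: bool) -> str:
        """Record an ECC error on a page and return what happened to the page."""
        if page in self.retired:
            return "already retired"
        if not double_bit:
            # A lone single-bit error is only logged; repeats trigger retirement.
            self.sbe_counts[page] += 1
            if self.sbe_counts[page] < self.sbe_threshold:
                return "logged"
        if len(self.retired) >= self.capacity:
            # No room left to record the bad page, so it keeps being used.
            return "table full: page stays in use"
        self.retired.add(page)
        return "retired"


table = RetirementTable(capacity=2)
print(table.report_error(0x10, double_bit=True))   # retired
print(table.report_error(0x20, double_bit=False))  # logged
print(table.report_error(0x20, double_bit=False))  # retired
print(table.report_error(0x30, double_bit=True))   # table full: page stays in use
```

The last call is the situation this error describes: the page is known to be bad, but there is nowhere left to record that, so it stays in service.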
Corrupt InfoROM error:
* This means that the data in the InfoROM (a small non-volatile storage device on the GPU that holds various data for the GPU's operation, including the page retirement table) does not match its expected checksum. You can try flashing the InfoROM, but if the error persists, the InfoROM is damaged and cannot store data reliably.
* While in this state, the GPU will ignore the InfoROM. NVIDIA does not have any specifications for a GPU running without its InfoROM and treats it as an undefined case: it may produce erroneous results or behave unpredictably.
* A bug has also been observed in which Linux sleep causes the driver to think that the InfoROM is corrupted; rebooting resolves it.
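The checksum mismatch behind this error can be sketched in a few lines. The real InfoROM layout and checksum algorithm are not described here, so this example uses CRC32 from Python's standard library purely as a stand-in, and the payload string and function names are made up.

```python
import zlib


def make_image(payload: bytes) -> bytes:
    """Append a 4-byte CRC32 of the payload (stand-in for the real checksum)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def is_valid(image: bytes) -> bool:
    """Return True if the trailing 4 bytes match the CRC32 of the rest."""
    if len(image) < 4:
        return False
    payload, stored = image[:-4], image[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


image = make_image(b"retired-pages: 0x10, 0x20")   # hypothetical contents
corrupted = bytes([image[0] ^ 0x01]) + image[1:]   # flip a single bit

print(is_valid(image))      # True
print(is_valid(corrupted))  # False
```

Reflashing corresponds to writing a fresh, self-consistent image. If the storage itself is damaged, the check fails again after the flash, which is why a persistent error points to bad hardware.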
Sources:
* NVIDIA Developer Forums (forums.developer.nvidia.com): "Configuration Setup - CentOS Linux release 7.6.1810 (Core) on system Precision T7610 + Driver 418.39 + NVIDIA Corporation GP102 [TITAN Xp]. There is no warning message observed in nvidia-smi output. Steps Taken to Attempt for repro - Open vscode, Hibernate System, Powered on Back, Ran..."
* https://www.reddit.com/r/homelab/comments/w7sb1l