Checking
if a email addresses is valid in SQL can be very convinient from time to time.
Here are two ways to do it in SQL, and a bit about them.
Here is a
full SQL way to do this:
Create function VALIDEMAIL(@s varchar(255))
--Returns true if the string is a valid email address.
returns bit
as
BEGIN
DECLARE @u VARCHAR(60), @v VARCHAR(60), @x VARCHAR(60), @i int, @j int, @result bit
SET @result=1
set @s = rtrim(ltrim(@s))
SET @i=len(@s)-CHARINDEX('@',reverse(@s))+1
if @i <= 1 or len(@s) < 5
begin
set @result = 0
goto done
end
SET @u=LEFT(@s,@i-1)
SET @j=len(@s)-CHARINDEX('.',reverse(@s))+1
SET @v=RIGHT(@s,LEN(@s)-@j)
if @j < 3 or @j < @i
begin
set @result = 0
goto done
end
SET @x=substring(@s,@i+1,@j-@i-1)
IF LEN(@x)<2
BEGIN
SET @result=0
GOTo done
END
IF (LEN(@x)=3) AND (@x NOT LIKE '[a-zA-Z][a-zA-Z][a-zA-Z]'Collate Latin1_General_BIN)
BEGIN
SET @result=0
GOTo done
END
IF (LEN(@x)=2) AND (@x NOT LIKE '[a-zA-Z][a-zA-Z]' Collate Latin1_General_BIN)
BEGIN
SET @result=0
GOTo done
END
SET @i=1
WHILE (@i<LEN(@u))
BEGIN
IF SUBSTRING(@u,@i,1) NOT LIKE '[a-zA-Z0-9_\-.+=/!#$&''*%?^`{|}"@\\ ]' escape '\' Collate Latin1_General_BIN
BEGIN
SET @result=0
GOTo done
END
SET @i=@i+1
END
SET @i=1
WHILE (@i<LEN(@v))
BEGIN
IF SUBSTRING(@v,@i,1) NOT LIKE '[a-zA-Z]' Collate Latin1_General_BIN
BEGIN
SET @result=0
GOTo done
END
SET @i=@i+1
END
SET @i=1
WHILE (@i<LEN(@x))
BEGIN
IF SUBSTRING(@x,@i,1) NOT LIKE N'[a-zA-Z\-.0-9]' escape '\' Collate Latin1_General_BIN
BEGIN
SET @result=0
goto done
END
SET @i=@i+1
END
done:
return @result
END
GO
Here is
another way, where you insert .NET CLR code in the database. Here I use the .NET RegEx.Ismatch, so I can use it for tons of other stuff as well with the right input.
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
namespace SQLSignature
{
public class SqlFunctions
{
[SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
public static SqlBoolean fn_RegexIsMatch(SqlString input, SqlString pattern)
{
return (SqlBoolean)Regex.IsMatch(input.Value, pattern.Value);
}
}
}
In SQL:
CREATE ASSEMBLY SQLSignature
FROM 'C:\CLR Assemblies\Performance\SQLFunctions\bin\SQLFunctions.dll';
GO
-- Create fn_SQLSigCLR and fn_RegexReplace functions
CREATE FUNCTION dbo.fn_RegexIsMatch(@input AS NVARCHAR(MAX),@pattern AS NVARCHAR(MAX))RETURNS BIT
WITH RETURNS NULL ON NULL INPUT
EXTERNAL NAME SQLSignature.SQLFunctions.fn_RegexIsMatch;
GO
This can be run with a command like:
select *
from [AW].[person].emailaddress as p
where dbo.fn_RegexIsMatch(RTRIM(LTRIM(lower(P.emailaddress))),
'^[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$')=1
I did
some timings on the two functions and as you can see below there are big
differences. The .NET CLR one is almost 8 times faster(cpu time) than the T-SQL
one. It's also a whole lot easier to read, and can be used for all other
RegEx's needed. You do however need to get out of management studio to make it,
and 1.5 seconds is for processing almost 20000 email addresses still should be
fast enough for most uses.
SET STATISTICS TIME ON
select *
from [AW].[person].emailaddress as p
where dbo.VALIDEMAIL(P.emailaddress)=1
select *
from [AW].[person].emailaddress as p
where dbo.fn_RegexIsMatch(RTRIM(LTRIM(lower(P.emailaddress))),
'^[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$')=1
(19929 row(s) affected)
SQL Server Execution Times:
CPU time = 1466 ms, elapsed time = 1911 ms.
(19972 row(s) affected)
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 380 ms.
It's pretty cool to see that .NET performs good within SQL as well. I am primarily a .NET developer but do have to use SQL quite a bit as well, but the fact that .NET outperforms SQL both in readability and speed inside SQL on some actions make my job a lot easier.