Category Archives: sql

ASCII Delimited Text: Why the “Right” Answer Isn’t Very Useful

To overcome some issues with the SQL Server bulk import/export process, I went shopping for a an unquoted, delimited format this morning.  There’s a very old, very simple solution that would have worked fine in my specific circumstance.  Here’s why I’m not going to use it:

The problem I had was that I need to export using the SQL Server bcp utility and re-import using BULK INSERT.  While you can specify the separators with these tools, they don’t handle quoted fields at all, i.e. quoting, if present, doesn’t actually protect the separators should they appear in data.  Per the documentation:

If a terminator character occurs within the data, it is interpreted as a terminator, not as data, and the data after that character is interpreted as belonging to the next field or record. Therefore, choose your terminators carefully to make sure that they never appear in your data.

This was a problem, because my data actually could contain tabs, carriage returns and line feeds, which were the default separators.  It also definitely contained commas.  That puts all the common delimiters out of commission.  “Choose your terminators carefully” is pretty vague, but I figured why not give it a shot?

The approach I almost took was to use the ASCII “record separator” and “unit separator” (ASCII 30 and 31, HEX 0x1E and 0x1F respectively).   Here’s a blog post that nicely sums up the format and why it’s historically and logically valid.

Though it’s not well-documented (somewhat contra-documented), SQL Server supports this.   Even though they suggest that only “printable” characters and a small set of control characters are valid, I had no problem exporting an ASCII-delimited text file using a command like this:

bcp SELECT "x,y FROM t" queryout "c:\filename.asc" -c -r 0x1e -t 0x1f -UTF8 -T -Slocalhost"

I didn’t get as far as trying the BULK INSERT on the other end, though, and here’s why…

Once I had that “ASCII delimited” file, I opened it in NotePad++ to verify that the format was readable and correct.  It was, but the effect wasn’t pretty.  I immediately realized that if I wanted to do anything else with this data–extract a column as text, import into Excel–I was going to have problems.  Excel’s text import wizard, for example, doesn’t support control characters other than tab.  This wasn’t really news to me as I see weird control characters in ASCII all the time working with magnetic stripes, bar codes, RFID, and other encoding and storage mechanisms.  Yes, you can eventually get at the data to a usable form with search and replace, or worst-case regular expressions, but why make it hard to manage if you don’t have to?

In my case, the whitespace control characters in the data improved readability but weren’t functionally required–the data payload itself was code.  Plus, I had comment blocks available as a “quote-like” protection structure.  So, I ended up compromising on replacing the whitespace control characters in a such a way that I can get them back if I want to, or leave them alone for the automated process.  What I ended up doing was this:

bcp SELECT "x,REPLACE(REPLACE(REPLACE(y,CHAR(13),'/*CR*/'),CHAR(10),'/*LF*/'),CHAR(9),'/*TAB*/') FROM t" queryout "c:\filename.tsv" -c -UTF8 -T -Slocalhost"

That produces a “normal” tab-separated file with CRLF as the record separator.  I knew that “x” couldn’t contain those characters, so by replacing them out of “y” I have a file that safely exports and imports while being viewable, editable and importable using normal tools without excessive processing.

I wish we had kept better support for ASCII control characters in our tools as we moved forward with new technologies–it would have been useful to have distinct separators that never appeared in data (until, inevitably, your data payloads themselves contained ASCII-delimited data, at which point you’re back to quoting or replacing or choosing new separators… rinse… repeat).  Of course another solution would have been making the SQL Server utilities just slightly more robust so they could process modern quoted .csv and .tsv formats.  There’s always version.next, right?

SQL Server Upgrade – Transfering User Names and Passwords

When upgrading to a new version of SQL Server, moving the databases themselves is pretty straightforward–you can use either backup and restore, or detach and attach.  However, you’re likely to end up with orphaned users if you do this.  Recreating the users manually is tedious (especially when you consider tracking down and/or changing all the passwords) and may cause SID conflicts.  Fortunately, Microsoft provides a helper script in this knowledgebase article:

How to transfer the logins and the passwords between instances of SQL Server 2005 and SQL Server 2008

Some things to be aware of:

  • You need to review and edit the output of this script as it will capture all logins, including some you likely don’t want to recreate (built-in accounts, Windows groups, plus “real” accounts that may be dead wood).
  • Even though the page specifically says

    “The steps in this article do not transfer the default database information for a particular login. This is because the default database may not always exist on server B. To define the default database for a login, use the ALTER LOGIN statement by passing in the login name and the default database as arguments.”

    the resulting CREATE LOGIN scripts do include the DEFAULT DATABASE parameter.  As they indicate above, this can cause problems if the database isn’t present.  I set most of mine back to “master.”

I run this, selectively, before restoring target databases and everything “just works.”  Used judiciously, it’s a real time-saver.

Stored Procedure to Generate INSERT Statements for All Table Rows Under SQL Server 2005

If you need to create a “data dump” for Microsoft SQL Server, here’s a good free tool:

http://vyaskn.tripod.com/code.htm#inserts

The one thing I haven’t gotten working is any way to automatically add the “SET INDENTITY_INSERT ON/OFF” statements bracketing a table which contains an identity when not using the @ommit_identity = 1 switch. It might be in the comments, I just didn’t dig through it.

I’ll attach them in case that page ever goes away.

SQL 2000 Version

SQL 2005 Version

Copy a SQL Record with LINQ Using Object Shallow Copy

I wish I had kept my reference material on this, so if anyone recognizes borrowed code here, please let me know…

The basic idea is, I needed to “copy” a SQL record within the same table, avoiding some fields. With SQL this could be accomplished with something like:

INSERT INTO ShippingInfo (Name,Value) SELECT Name,Value FROM ShippingInfo where id=5

But imagine there are 20 fields or so. I couldn’t find a native way to do this with LINQ. What I came up with is an object shallow copy with the ability to punch out (omit) fields by name. Here’s the method:

public static class AnyObject<t> where T : class, new()
{
    public static T ShallowCopy(T item)
    {
        return ShallowCopy(item, "");
    }
    
    public static T ShallowCopy(T item, string OmitPropeties)
    {
        if (item == null)
        {
            return null;
        }
        T newItem = new T();
        foreach (System.Reflection.PropertyInfo prop in item.GetType().GetProperties())
        {
            if ((prop.PropertyType.IsValueType || prop.PropertyType.Name == "String") && 
                !prop.PropertyType.Name.StartsWith("EntitySet") &&
                !("|"+OmitPropeties+"|").ToLower().Contains("|"+prop.Name.ToLower()+"|"))
            {
                System.Reflection.PropertyInfo prop2 = item.GetType().GetProperty(prop.Name);
                prop2.SetValue(newItem, prop.GetValue(item, null), null);
            }
        }
        return newItem;
    }
}

Then you call it like…

// get the source record
ShippingInfo TempShipping;
TempShipping = (from s in db.ShippingInfos 
          where s.Id == tempShippingId select s).FirstOrDefault();

// shallow copy the object
ShippingInfo NewShipping = Helpers.AnyObject<shippinginfo>.ShallowCopy(TempShipping, 
                                              "Id|ModifiedBy|ModifiedTime");

// override some copied fields
NewShipping.CreatedBy = PageContext.UserId;
NewShipping.CreatedTime = DateTime.Now;

// insert it
db.ShippingInfos.InsertOnSubmit(NewShipping);
db.SubmitChanges();

Returning a Strongly-Type IEnumerable from a SQL Stored Procedure Using LINQ to SQL

This is surprisingly difficult, but there’s a relatively robust way to do it:

Create your stored procedure
Create a view that mimics the return signature of the stored procedure
In the model (.dbml), import both the stored procedure and view
Set the “Return Type” property of the stored procedure Data Function to the view type
In code, use the .ToList() method of the stored procedure function of the model and cast it to an IEnumerable of the view type, for example:

DBDataContext db = new DBDataContext();
IEnumerable<vwUspEventCalendarResultSet> CalendarResults = db.uspEventCalendar(CustomerId, StartDate, EndDate, DataStartDate, DataEndDate).ToList();

One thing to be careful of is if your stored procedure might return null values in integer key columns you may have to ISNULL them to 0 or some other meaningful value (in the stored procedure) since LINQ to SQL does not pick up a nullable int type from the view signature.

Here are the code bits to back this up…

View:

SELECT     dbo.AllDates.TheDate, dbo.Events.Id, dbo.Events.Name, dbo.Events.Description, dbo.Events.Url, dbo.Events.CustomerId, dbo.Events.TeamId, 
                      dbo.Events.ShowCityId, dbo.Events.EventStartDate, dbo.Events.EventEndDate, dbo.Events.EventMoveInDate, dbo.Events.EventMoveOutDate, 
                      dbo.Events.EventAddressId, dbo.Events.InventoryPullDate, dbo.Events.InventoryShipDate, dbo.Events.InventoryReturnDate, 
                      dbo.Events.InventoryAvailableDate, dbo.Events.EventShippingInfoId, dbo.Events.EventReturnAddressId, dbo.Events.EventAttendance, 
                      dbo.Events.BoothNumber, dbo.Events.BoothSize, dbo.Events.CeilingHeight, dbo.Events.BoothAttendance, dbo.Events.LeadsCaptured, 
                      dbo.Events.CreatedBy, dbo.Events.CreatedTime, dbo.Events.ModifiedBy, dbo.Events.ModifiedTime, dbo.Events.Deleted, dbo.Events.Active, 
                      dbo.Events.Notes
FROM         dbo.Events INNER JOIN
                      dbo.AllDates ON dbo.Events.EventStartDate = dbo.AllDates.TheDate

Stored Procedure:

ALTER PROCEDURE [dbo].[uspEventCalendar] 
	-- Add the parameters for the stored procedure here
	@CustomerId int = 0, 
	@StartDate datetime = null,
	@EndDate datetime = null,
	@DataStartDate datetime = null,
	@DataEndDate datetime = null
AS
BEGIN
	-- default to "this month"
	if @StartDate is null 
		SET @StartDate = cast(cast(DatePart(month,getdate()) as varchar) + '/1/' + cast (DatePart(year,getdate()) as varchar) as datetime)
	if @EndDate is null 
		SET @EndDate = dateadd(day,-1,dateadd(month,1,@StartDate))
	if @DataStartDate is null 
		SET @DataStartDate = @StartDate
	if @DataEndDate is null 
		SET @DataEndDate = @StartDate

	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	--SET NOCOUNT ON;
	--SELECT @StartDate,@EndDate
    -- Insert statements for procedure here
	
	SELECT CalendarDates.TheDate,
					ISNULL(EventCalendarEntries.Id,0) as Id, EventCalendarEntries.Name, EventCalendarEntries.Description, EventCalendarEntries.Url, EventCalendarEntries.CustomerId, 
                      EventCalendarEntries.TeamId, EventCalendarEntries.ShowCityId, EventCalendarEntries.EventStartDate, EventCalendarEntries.EventEndDate, EventCalendarEntries.EventMoveInDate, 
                      EventCalendarEntries.EventMoveOutDate, EventCalendarEntries.EventAddressId, EventCalendarEntries.InventoryPullDate, EventCalendarEntries.InventoryShipDate, 
                      EventCalendarEntries.InventoryReturnDate, EventCalendarEntries.InventoryAvailableDate, EventCalendarEntries.EventShippingInfoId, 
                      EventCalendarEntries.EventReturnAddressId, EventCalendarEntries.EventAttendance, EventCalendarEntries.BoothNumber, EventCalendarEntries.BoothSize, 
                      EventCalendarEntries.CeilingHeight, EventCalendarEntries.BoothAttendance, EventCalendarEntries.LeadsCaptured, EventCalendarEntries.CreatedBy, 
                      EventCalendarEntries.CreatedTime, EventCalendarEntries.ModifiedBy, EventCalendarEntries.ModifiedTime, EventCalendarEntries.Deleted, EventCalendarEntries.Active, 
                      EventCalendarEntries.Notes
		 FROM 
		(SELECT thedate from alldates where thedate BETWEEN @StartDate AND @EndDate) as CalendarDates LEFT OUTER JOIN
		(SELECT     EventDates.TheDate, EventsInRange.Id, EventsInRange.Name, EventsInRange.Description, EventsInRange.Url, EventsInRange.CustomerId, 
                      EventsInRange.TeamId, EventsInRange.ShowCityId, EventsInRange.EventStartDate, EventsInRange.EventEndDate, EventsInRange.EventMoveInDate, 
                      EventsInRange.EventMoveOutDate, EventsInRange.EventAddressId, EventsInRange.InventoryPullDate, EventsInRange.InventoryShipDate, 
                      EventsInRange.InventoryReturnDate, EventsInRange.InventoryAvailableDate, EventsInRange.EventShippingInfoId, 
                      EventsInRange.EventReturnAddressId, EventsInRange.EventAttendance, EventsInRange.BoothNumber, EventsInRange.BoothSize, 
                      EventsInRange.CeilingHeight, EventsInRange.BoothAttendance, EventsInRange.LeadsCaptured, EventsInRange.CreatedBy, 
                      EventsInRange.CreatedTime, EventsInRange.ModifiedBy, EventsInRange.ModifiedTime, EventsInRange.Deleted, EventsInRange.Active, 
                      EventsInRange.Notes
FROM         (SELECT     Id, Name, Description, Url, CustomerId, TeamId, ShowCityId, EventStartDate, EventEndDate, EventMoveInDate, EventMoveOutDate, 
                                              EventAddressId, InventoryPullDate, InventoryShipDate, InventoryReturnDate, InventoryAvailableDate, EventShippingInfoId, 
                                              EventReturnAddressId, EventAttendance, BoothNumber, BoothSize, CeilingHeight, BoothAttendance, LeadsCaptured, CreatedBy, 
                                              CreatedTime, ModifiedBy, ModifiedTime, Deleted, Active, Notes
                       FROM          Events
                       WHERE      (EventStartDate < = @DataEndDate) AND (EventEndDate >= @DataStartDate)) AS EventsInRange CROSS JOIN
                          (SELECT     TheDate
                            FROM          AllDates
                            WHERE      (TheDate BETWEEN @StartDate AND @EndDate)) AS EventDates
WHERE     (EventDates.TheDate BETWEEN EventsInRange.EventStartDate AND EventsInRange.EventEndDate)) EventCalendarEntries ON EventCalendarEntries.thedate = CalendarDates.thedate
END

Insert a Record for Every Date Between Two Dates

Revisiting the Common Table Expression from an earlier post, I had already forgotten how to do this thing I actually ended up using this for, which was populating an “AllDates” table for my database. It turned out to be a lot easier to do this and then JOIN to it than to incorporate the CTE code into a larger query. So, here’s how to turn the CTE from the earlier post into an insert statement:

GO
/****** Object: Table [dbo].[allDates] Script Date: 03/27/2008 13:34:54 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[allDates](
[thedate] [datetime] NOT NULL,
CONSTRAINT [PK_allDates] PRIMARY KEY CLUSTERED
(
[thedate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

WITH datecte(anydate) AS (SELECT CAST(‘1/1/2000’ AS datetime) AS anydate
UNION ALL
SELECT anydate + 1 AS anydate
FROM datecte AS datecte_1
WHERE (anydate < CAST('1/1/2021' AS datetime) - 1)) INSERT INTO allDates SELECT anydate FROM datecte AS datecte_2 OPTION (MAXRECURSION 10000)

Return a Record for Each Date Between Two Dates in SQL Server >= 2005

Blogging this so I don’t forget it…

It used to require some fairly ugly, resource intensive hacks (cursors, temp tables, etc.) to emit an inclusive list between two data points when the source data might not include an entry for every point (for example, a calendar, where not every day contains an event). In SQL Server 2005 and above, this is trivially easy, with a Common Table Expression (CTE) and a Recursive Query. To emit one record for every date between 1/1/2008 and 1/31/2008, you do this:


WITH datecte(anydate) AS (SELECT CAST('1/1/2008' AS datetime) AS anydate
UNION ALL
SELECT anydate + 1 AS anydate
FROM datecte AS datecte_1
WHERE (anydate < CAST('2/1/2008' AS datetime) - 1)) SELECT anydate FROM datecte AS datecte_2

If you need more than 100 days (the recursion limit is 100), add this to the end:

OPTION (MAXRECURSION 1000)

The fact that they stop recursion short at 100 by default would seem to indicate that this is an expensive procedure, but even if you're just using this to produce a dummy table with all the dates for several years, it's a nice shortcut.

I just tried the following query, which emits a record for every day between 1/1/2000 and 12/31/2020:


WITH datecte(anydate) AS (SELECT CAST('1/1/2000' AS datetime) AS anydate
UNION ALL
SELECT anydate + 1 AS anydate
FROM datecte AS datecte_1
WHERE (anydate < CAST('1/1/2021' AS datetime) - 1)) SELECT anydate FROM datecte AS datecte_2 OPTION (MAXRECURSION 10000)

On my P4-641+ the script emits 7671 records in 0 (that's zero) seconds and "spikes" the processor to all of 3%. Granted this is not a complex query, but at least we know the recursion (if it really is recursion internally, which I doubt) isn't expensive by itself.